digitalmars.D - Early std.crypto

Piotr Szturmaj (10/10) Oct 24 2011 https://github.com/pszturmaj/phobos/tree/master/std/crypto

Martin Nowak (28/39) Oct 24 2011 AC =

Piotr Szturmaj (22/41) Oct 25 2011 The reason why hash and block length are runtime variables is that some

Martin Nowak (35/78) Oct 25 2011 e =

Piotr Szturmaj (10/89) Oct 25 2011 Yes, I thought about that, but current way is faster because only

Brad Roberts (6/17) Oct 25 2011 A key element to a lot of crypto code is speed. I really don't think we...

Piotr Szturmaj (16/31) Oct 25 2011 You are of course right about speed but there are some reasons for

Walter Bright (15/24) Oct 25 2011 Thanks for championing this.

Piotr Szturmaj (5/37) Oct 25 2011 I share your opinion. I was thinking about such filter concept for

Walter Bright (3/7) Oct 25 2011 Andrei and I have pretty much failed at articulating this vision for Pho...

Jonathan M Davis (10/19) Oct 25 2011 Ranges in general are not something that is comunicated very well. They'...

Walter Bright (2/10) Oct 25 2011 Yup. Guilty as charged.

Brad Anderson (5/27) Oct 25 2011 I learned about them from Andrei's boostcon talk. Not exactly the first
Andrej Mitrovic (6/6) Oct 25 2011 If you want a bit of history:
Piotr Szturmaj (12/21) Oct 25 2011 Personally, I learned ranges from Phobos docs/src and Andrei's article

Walter Bright (4/8) Oct 25 2011 An easy test is that if the interface takes a T[] as input, consider a r...

Steve Teale (24/28) Oct 25 2011 I have a question about ranges that occurred to me when I was composing ...

Jonathan M Davis (30/66) Oct 25 2011 It's likely to cause issues if the type returned by save differs from th...

Steve Teale (16/47) Oct 26 2011 Well, yes Jonathan, that's just what I'm getting at. I want to save just...

Jonathan M Davis (14/63) Oct 26 2011 Ranges are defined per templates in std.range. isForwardRange, isInputRa...

Steve Teale (14/78) Oct 26 2011 But is it just required by std.range, or is it required at some

Jonathan M Davis (32/60) Oct 26 2011 It complicates things to allow for the range returned by save not be the...

Dmitry Olshansky (9/37) Oct 26 2011 Yay! Range with 'restore' strikes again.

Jonathan M Davis (18/46) Oct 25 2011 That and an article that Andrei wrote on ranges a while back which was n...
bcs (5/15) Oct 28 2011 Are you re-implementing the function kernels your self or are you using

Piotr Szturmaj (4/8) Nov 04 2011 Until now, I implemented some hash functions. There are no branching

bcs (10/19) Nov 04 2011 From my very incomplete memory I found the source I was looking for (I

Walter Bright (3/7) Nov 04 2011 You've got a good point. While I'd like to see native D implementations,...
Piotr Szturmaj (33/53) Nov 20 2011 Sorry for late answer.

bcs (12/67) Nov 21 2011 Np, but I still have a number of concerns:

Piotr Szturmaj (13/52) Nov 22 2011 The first advantage is that Phobos will be independent of any crypto

Regan Heath (26/76) Nov 22 2011 Ultimately I think it comes down to the question of whether we want/expe...
bcs (20/37) Nov 25 2011 "Wrote some crypto code" is a rather weak recommendation. Depending on

Brad Anderson (4/46) Nov 26 2011 How about putting a disclaimer on the module warning the code hasn't bee...

bcs (8/11) Nov 27 2011 What does that gain over implementing the first itteration in terms of

Jude Young (4/16) Nov 27 2011 I do like this idea.

Piotr Szturmaj (4/23) Nov 27 2011 This was my goal... to write native implementation along with OpenSSL

bcs (5/29) Nov 27 2011 Yes, though I'd prefer to see them distinct and non-mutually exclusive.

Brad Anderson (5/18) Nov 27 2011 That's even better but isn't the issue over bundling incompatibly licens...

bcs (7/12) Nov 27 2011 If we can't find something with a licence allowing bundling then we just...

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

https://github.com/pszturmaj/phobos/tree/master/std/crypto

This is some early work on std.crypto proposal. Currently only MD5, HMAC 
and all SHA family functions (excluding SHA0 which is very old, broken 
and no longer in use). I plan to add other crypto primitives later.

I know about one SHA1 pull request optimized for SSSE3. I think native 
code must be there to support other non x86 CPUs and SIMD optimization 
may be added at any time later.

Any opinions are welcome. Especially if such design is good or bad, and 
what needs to be changed.

Thanks :)

Oct 24 2011

"Martin Nowak" <dawg dawgfoto.de> writes:

On Tue, 25 Oct 2011 02:10:49 +0200, Piotr Szturmaj <bncrbme jadamspam.pl=
  =

wrote:

 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HM=

AC  =

 and all SHA family functions (excluding SHA0 which is very old, broken=

  =

 and no longer in use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native=

  =

 code must be there to support other non x86 CPUs and SIMD optimization=

  =

 may be added at any time later.

 Any opinions are welcome. Especially if such design is good or bad, an=

d  =

 what needs to be changed.

 Thanks :)

Great to push this a little.


I have to say though that I like the current struct based interface
much better.

struct Hash
{
     // enhanced by some compile time traits
     enum hashLength  =3D 16;
     enum blockLength =3D  0;

     // three interface functions
     void start();
     void update(const(ubyte)[] data);
     void finish(ref ubyte[hashLength] digest);
}

You wouldn't need the save, restore functions.
Some unnecessary allocations could go away.
Most important instances would have less mutable state.

You could probably parameterize a Merkle Damg=C3=A5rd base with free
functions for the transformation.

A dynamic interface can be obtaines by templated instances similar to wh=
at  =

std.range does.

Oct 24 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Martin Nowak wrote:
 I have to say though that I like the current struct based interface
 much better.

 struct Hash
 {
 // enhanced by some compile time traits
 enum hashLength = 16;
 enum blockLength = 0;

The reason why hash and block length are runtime variables is that some 
hash functions are parametrized with variables of great amplitude, for 
example CubeHash may have any number of rounds, and any size of block 
and hash output.

 // three interface functions
 void start();
 void update(const(ubyte)[] data);
 void finish(ref ubyte[hashLength] digest);
 }

There, it is:

reset();
put();
finish();

The put() function makes hash implementation an OutputRange.

 You wouldn't need the save, restore functions.

They're not needed. They only serve as speed optimization when hashing 
many messages which have the same beginning block. This is used in HMAC, 
which is:

HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message));

when func supports saving the IV, the first parts are precomputed, when 
not HMAC resorts to full hashing. This optimization is also mentioned in 
HMAC spec.

 Some unnecessary allocations could go away.
 Most important instances would have less mutable state.

Could you specify which ones, please?

 You could probably parameterize a Merkle Damgård base with free
 functions for the transformation.

What would be the difference from current class parametrization?

 A dynamic interface can be obtaines by templated instances similar to
 what std.range does.

Could you elaborate? I don't know exactly what do you mean. Function 
templates?

Thanks a lot!

Oct 25 2011

"Martin Nowak" <dawg dawgfoto.de> writes:

On Tue, 25 Oct 2011 09:43:48 +0200, Piotr Szturmaj <bncrbme jadamspam.pl=
  =

wrote:

 Martin Nowak wrote:
 I have to say though that I like the current struct based interface
 much better.

 struct Hash
 {
 // enhanced by some compile time traits
 enum hashLength =3D 16;
 enum blockLength =3D 0;

 The reason why hash and block length are runtime variables is that som=

e  =

 hash functions are parametrized with variables of great amplitude, for=

  =

 example CubeHash may have any number of rounds, and any size of block =

 =

 and hash output.

 // three interface functions
 void start();
 void update(const(ubyte)[] data);
 void finish(ref ubyte[hashLength] digest);
 }

 There, it is:

 reset();
 put();
 finish();

Reset does two different things depending on the internal state. Not so =
 =

good.

 The put() function makes hash implementation an OutputRange.

 You wouldn't need the save, restore functions.

 They're not needed. They only serve as speed optimization when hashing=

  =

 many messages which have the same beginning block. This is used in HMA=

C,  =

 which is:

 HMAC(func, key, message) =3D func(key ^ opad, func(key ^ ipad, message=

));
 when func supports saving the IV, the first parts are precomputed, whe=

n  =

 not HMAC resorts to full hashing. This optimization is also mentioned =

in  =

 HMAC spec.

If hash contexts were value type you could simply do.
auto saved =3D hash_ctx;
Or alternatively one could add a 'save()' function to an isSaveableHash(=
H)
concept.

 Some unnecessary allocations could go away.
 Most important instances would have less mutable state.

 Could you specify which ones, please?

Basically every 'new' in std.hash.crypto.base but especially the ones in=
  =

hash(T) and hashToHex(T).

 You could probably parameterize a Merkle Damg=C3=A5rd base with free
 functions for the transformation.

 What would be the difference from current class parametrization?

Just wanted to point out a specific alternative if code reuse is of a  =

concern.
If not using classes you need a way to inject the transformation which  =

could
be done like.
alias MerkleDamgard!(uint, 5, 80, 16, 20, sha1Transform) SHA1;

 A dynamic interface can be obtaines by templated instances similar to=


 what std.range does.

 Could you elaborate? I don't know exactly what do you mean. Function  =

 templates?

http://www.digitalmars.com/d/2.0/phobos/std_range.html#InputRange
DynamicAllocatorTemplate at  =

https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/allocato=
r.d

These are to support cases where you either want a stable ABI or
have a template firewall for scalability issues (e.g. could be sensible =
 =

for the HMAC implementation although not really necessary).

 Thanks a lot!

Oct 25 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Martin Nowak wrote:
 On Tue, 25 Oct 2011 09:43:48 +0200, Piotr Szturmaj
 <bncrbme jadamspam.pl> wrote:

 Martin Nowak wrote:
 I have to say though that I like the current struct based interface
 much better.

 struct Hash
 {
 // enhanced by some compile time traits
 enum hashLength = 16;
 enum blockLength = 0;

 The reason why hash and block length are runtime variables is that
 some hash functions are parametrized with variables of great
 amplitude, for example CubeHash may have any number of rounds, and any
 size of block and hash output.

 // three interface functions
 void start();
 void update(const(ubyte)[] data);
 void finish(ref ubyte[hashLength] digest);
 }

 There, it is:

 reset();
 put();
 finish();

 Reset does two different things depending on the internal state. Not so
 good.

I think that's negligible, but it may be "unbranched" easily.

 The put() function makes hash implementation an OutputRange.

 You wouldn't need the save, restore functions.

 They're not needed. They only serve as speed optimization when hashing
 many messages which have the same beginning block. This is used in
 HMAC, which is:

 HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message));

 when func supports saving the IV, the first parts are precomputed,
 when not HMAC resorts to full hashing. This optimization is also
 mentioned in HMAC spec.

 If hash contexts were value type you could simply do.
 auto saved = hash_ctx;
 Or alternatively one could add a 'save()' function to an isSaveableHash(H)
 concept.

Yes, I thought about that, but current way is faster because only 
initialization vector is saved, and api name emphasises it. General 
save() on SHA512 would need to copy IV and 80 ulongs (internal state).

 Some unnecessary allocations could go away.
 Most important instances would have less mutable state.

 Could you specify which ones, please?

 Basically every 'new' in std.hash.crypto.base but especially the ones in
 hash(T) and hashToHex(T).

Yes, this will be fixed.

 You could probably parameterize a Merkle Damgård base with free
 functions for the transformation.

 What would be the difference from current class parametrization?

 Just wanted to point out a specific alternative if code reuse is of a
 concern.
 If not using classes you need a way to inject the transformation which
 could
 be done like.
 alias MerkleDamgard!(uint, 5, 80, 16, 20, sha1Transform) SHA1;

Either way this function must be written. Either as free function or as 
class method. I don't think someone would use transformation function 
directly.

 A dynamic interface can be obtaines by templated instances similar to
 what std.range does.

 Could you elaborate? I don't know exactly what do you mean. Function
 templates?

 http://www.digitalmars.com/d/2.0/phobos/std_range.html#InputRange
 DynamicAllocatorTemplate at
 https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/allocator.d

 These are to support cases where you either want a stable ABI or
 have a template firewall for scalability issues (e.g. could be sensible
 for the HMAC implementation although not really necessary).

I will look into it. Thanks!

Oct 25 2011

Brad Roberts <braddr puremagic.com> writes:

On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto
 
 This is some early work on std.crypto proposal. Currently only MD5, HMAC and
all SHA family functions (excluding SHA0
 which is very old, broken and no longer in use). I plan to add other crypto
primitives later.
 
 I know about one SHA1 pull request optimized for SSSE3. I think native code
must be there to support other non x86 CPUs
 and SIMD optimization may be added at any time later.
 
 Any opinions are welcome. Especially if such design is good or bad, and what
needs to be changed.
 
 Thanks :)

A key element to a lot of crypto code is speed.  I really don't think we want
to re-invent all the optimizations on all
the platforms.  To that end, I really suggest that we stick to wrapping
existing implementations, like openssl.  While I
hate the openssl apis, I do respect the continual effort that various companies
invest in optimizing the code.

My 2 cents,
Brad

Oct 25 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Brad Roberts wrote:
 On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HMAC and
all SHA family functions (excluding SHA0
 which is very old, broken and no longer in use). I plan to add other crypto
primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native code
must be there to support other non x86 CPUs
 and SIMD optimization may be added at any time later.

 Any opinions are welcome. Especially if such design is good or bad, and what
needs to be changed.

 Thanks :)

 A key element to a lot of crypto code is speed.  I really don't think we want
to re-invent all the optimizations on all
 the platforms.  To that end, I really suggest that we stick to wrapping
existing implementations, like openssl.  While I
 hate the openssl apis, I do respect the continual effort that various
companies invest in optimizing the code.

You are of course right about speed but there are some reasons for 
having our own code _if_ we want std.crypto:

1. Phobos independence
2. Non D friendly API of openssl
3. No need to link with openssl to compute a simple hash.
4. Licensing

We also have some options for speed improvement, while retaining our API

1. Wrap openssl _on demand_, this is transparent to the user, API 
doesn't change
2. Many (if not all) openssl asm code may be obtained using CRYPTOGAMS 
license (BSD). But reading http://www.openssl.org/~appro/cryptogams/ 
suggests that author might be willing to license it under Boost.
3. Adapt Crypto++ asm code which is public domain (x86/64 only)

The 1st one should be easy, and user would have choice between openssl 
wrapped within std.crypto or direct access to etc.c.openssl.

Oct 25 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HMAC and
all
 SHA family functions (excluding SHA0 which is very old, broken and no longer in
 use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native code
must
 be there to support other non x86 CPUs and SIMD optimization may be added at
any
 time later.

 Any opinions are welcome. Especially if such design is good or bad, and what
 needs to be changed.


Thanks for championing this.

The input to the functions should be a range, not an array (although an array
is 
a range).

In general, for Phobos, all arbitrary input data should be in the form of 
ranges, and all arbitrary output data should present itself as a range. This 
facilitates the idea of:

     range => algorithm => range

So, for example, I want to encrypt and then zip a file and send the output to a 
socket:

     file => encrypt => compress => socket

All the components here will just "snap" together. With the existing design of 
crypto, I'd have to read the file into an array, then pass the array to
encrypt, 
etc.

Think of it like the filter concept in Unix that has been so successful.

Oct 25 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Walter Bright wrote:
 On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5,
 HMAC and all
 SHA family functions (excluding SHA0 which is very old, broken and no
 longer in
 use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native
 code must
 be there to support other non x86 CPUs and SIMD optimization may be
 added at any
 time later.

 Any opinions are welcome. Especially if such design is good or bad,
 and what
 needs to be changed.


 Thanks for championing this.

 The input to the functions should be a range, not an array (although an
 array is a range).

 In general, for Phobos, all arbitrary input data should be in the form
 of ranges, and all arbitrary output data should present itself as a
 range. This facilitates the idea of:

 range => algorithm => range

 So, for example, I want to encrypt and then zip a file and send the
 output to a socket:

 file => encrypt => compress => socket

 All the components here will just "snap" together. With the existing
 design of crypto, I'd have to read the file into an array, then pass the
 array to encrypt, etc.

 Think of it like the filter concept in Unix that has been so successful.

I share your opinion. I was thinking about such filter concept for 
std.crypto.cipher (TBD), but I will also try to convert current hash 
function code to ranges.

Thanks for pointing that out.

Oct 25 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash function
 code to ranges.

 Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We 
need to get our act together.

Oct 25 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, October 25, 2011 15:55 Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function code to ranges.
 
 Thanks for pointing that out.

 
 Andrei and I have pretty much failed at articulating this vision for
 Phobos. We need to get our act together.

Ranges in general are not something that is comunicated very well. They're not 
in TDPL, and none of the online documentation discusses them in detail. You 
pretty much only learn them from reading Phobos' documentation and using 
Phobos or from discussing it with people who know about them already. And 
there's no overall plan or design for Phobos articulated _anywhere_ that I'm 
aware of, so between those two facts, there's really nothing to get such a 
vision across to anyone other than word of mouth (though I suppose that it's 
more "word of keyboard" in many cases).

- Jonathan M Davis

Oct 25 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/25/2011 4:12 PM, Jonathan M Davis wrote:
 Ranges in general are not something that is comunicated very well. They're not
 in TDPL, and none of the online documentation discusses them in detail. You
 pretty much only learn them from reading Phobos' documentation and using
 Phobos or from discussing it with people who know about them already. And
 there's no overall plan or design for Phobos articulated _anywhere_ that I'm
 aware of, so between those two facts, there's really nothing to get such a
 vision across to anyone other than word of mouth (though I suppose that it's
 more "word of keyboard" in many cases).

Yup. Guilty as charged.

Oct 25 2011

Brad Anderson <eco gnuk.net> writes:

On Tue, Oct 25, 2011 at 5:12 PM, Jonathan M Davis <jmdavisProg gmx.com>wrote:

 On Tuesday, October 25, 2011 15:55 Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function code to ranges.

 Thanks for pointing that out.

 Andrei and I have pretty much failed at articulating this vision for
 Phobos. We need to get our act together.

 Ranges in general are not something that is comunicated very well. They're
 not
 in TDPL, and none of the online documentation discusses them in detail. You
 pretty much only learn them from reading Phobos' documentation and using
 Phobos or from discussing it with people who know about them already. And
 there's no overall plan or design for Phobos articulated _anywhere_ that
 I'm
 aware of, so between those two facts, there's really nothing to get such a
 vision across to anyone other than word of mouth (though I suppose that
 it's
 more "word of keyboard" in many cases).

 - Jonathan M Davis

I learned about them from Andrei's boostcon talk.  Not exactly the first
place you'd look for information about D but the talk was very informative.

Regards,
Brad Anderson

Oct 25 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

If you want a bit of history:

http://www.digitalmars.com/d/archives/digitalmars/D/announce/RFC_on_range_design_for_D2_12922.html#N12922
http://www.digitalmars.com/d/archives/digitalmars/D/announce/Revised_RFC_on_range_design_for_D2_13211.html

http://web.archive.org/web/20090112000313/http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

That last link (although outdated) is probably more descriptive than
what we have now.

Oct 25 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function
 code to ranges.

 Thanks for pointing that out.

 Andrei and I have pretty much failed at articulating this vision for
 Phobos. We need to get our act together.

Personally, I learned ranges from Phobos docs/src and Andrei's article 
about them. In the beginning I was surprised that I could not find any 
"range howto", yet on this NG some range designs were considered 
obvious. But I finally (somewhat) understood and used them in my 
postgres client implementation. And I like them.

I wish there was some article about ranges and their uses. Some tips and 
tricks thing for users and guidelines for range writers.

But let's get back to std.crypto. I planned to add input ranges to 
function put, because it already forms an output range. I thought that 
std.range.put() would serve the need if UFCS will be supported. But, as 
I guess, the best method is to support input ranges directly.

Oct 25 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 10/25/2011 5:22 PM, Piotr Szturmaj wrote:
 But let's get back to std.crypto. I planned to add input ranges to function
put,
 because it already forms an output range. I thought that std.range.put() would
 serve the need if UFCS will be supported. But, as I guess, the best method is
to
 support input ranges directly.

An easy test is that if the interface takes a T[] as input, consider a range 
instead. Ditto for output. If an interface takes a File as input, it's a red 
flag that something is wrong.

Oct 25 2011

Steve Teale <steve.teale britseyeview.com> writes:

 
 An easy test is that if the interface takes a T[] as input, consider a
 range instead. Ditto for output. If an interface takes a File as input,
 it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a 
MySQL result set into a random access range.

To do that I have to provide the save capability defined by

R r1;
R r2 = r1.save;

Would it harm the use of ranges as elements of operation sequences if the 
type of the entity that got saved was not the same as the original range 
provider type. Something like:

R r1;
S r2 = r1.save;
r1.restore(r2);

For entities with a good deal of state, that implement a range, storing 
the whole state, and more significantly, restoring it, may be non-
trivial, but saving and restoring the state of the range may be quite a 
lightweight affair.

I'm also puzzled by the semantics of the random access range.

If I restrict myself to indexing into the range, then things are fine.

auto x = r[0];
x = r[10];
x = r[0];

But if I slip in a few popFront's, then presumably x = r[0] will give me 
a different result.

This just makes me slightly uneasy.

Steve

Oct 25 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider a
 range instead. Ditto for output. If an interface takes a File as input,
 it's a red flag that something is wrong.

 
 I have a question about ranges that occurred to me when I was composing a
 MySQL result set into a random access range.
 
 To do that I have to provide the save capability defined by
 
 R r1;
 R r2 = r1.save;
 
 Would it harm the use of ranges as elements of operation sequences if the
 type of the entity that got saved was not the same as the original range
 provider type. Something like:
 
 R r1;
 S r2 = r1.save;
 r1.restore(r2);
 
 For entities with a good deal of state, that implement a range, storing
 the whole state, and more significantly, restoring it, may be non-
 trivial, but saving and restoring the state of the range may be quite a
 lightweight affair.

It's likely to cause issues if the type returned by save differs from the 
original type. From your description, it sounds like the range is over 
something, and it's that something which you don't want to copy. If that's the 
case, then the range just doesn't contain that data. Rather, it's a view into 
the data, so saving it just save that view.

The fact that arrays are currently the most used example of ranges definitely 
makes them more confusing IMHO, since people often think of ranges as 
containers, whereas ranges aren't. And really, dynamic arrays in D _aren't_ 
containers. No dynamic array actually owns its elements. The runtime owns it, 
and dynamic arrays are just ranges over that data.

It's easier to understand when you think about a container, such as vector 
type (e.g. std.container.Array) or a linked list. The container holds the 
data. The container is _not_ a range. A range is a view of that data. So, 
popping elements off of the range has no effect on the original container - the 
same goes with many range-based functions. As long as the elements in the 
range aren't mutated or rearranged, the elements in the container are 
unaltered. So, calling save on a range just gives you a copy of that view of 
the container, allowing you to alter the range and still have that original 
view of the container.

 I'm also puzzled by the semantics of the random access range.
 
 If I restrict myself to indexing into the range, then things are fine.
 
 auto x = r[0];
 x = r[10];
 x = r[0];
 
 But if I slip in a few popFront's, then presumably x = r[0] will give me
 a different result.
 
 This just makes me slightly uneasy.

r[0] _is_ r.front, so of course popFront would change what r[0] is. You're 
consuming the range as you pop elements off of its front. So, if you do 
_anything_ which alters a range (rather than its elements), at least some of 
(if not all of) the indices in a random access range change. The reason that 
this is making you uneasy is likely at least partially because of the fact 
that you're probably thinking about arrays and how they're used rather than 
thinking more abstractly about ranges. Arrays are really a bad example of 
ranges, in spite of the fact that they are unfortunately, our most common 
example of ranges at this point.

- Jonathan M Davis

Oct 25 2011

Steve Teale <steve.teale britseyeview.com> writes:

On Tue, 25 Oct 2011 23:45:58 -0700, Jonathan M Davis wrote:

 On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider
 a range instead. Ditto for output. If an interface takes a File as
 input, it's a red flag that something is wrong.

 
 I have a question about ranges that occurred to me when I was composing
 a MySQL result set into a random access range.
 
 To do that I have to provide the save capability defined by
 
 R r1;
 R r2 = r1.save;
 
 Would it harm the use of ranges as elements of operation sequences if
 the type of the entity that got saved was not the same as the original
 range provider type. Something like:
 
 R r1;
 S r2 = r1.save;
 r1.restore(r2);
 
 For entities with a good deal of state, that implement a range, storing
 the whole state, and more significantly, restoring it, may be non-
 trivial, but saving and restoring the state of the range may be quite a
 lightweight affair.

 
 It's likely to cause issues if the type returned by save differs from
 the original type. From your description, it sounds like the range is
 over something, and it's that something which you don't want to copy. If
 that's the case, then the range just doesn't contain that data. Rather,
 it's a view into the data, so saving it just save that view.

Well, yes Jonathan, that's just what I'm getting at. I want to save just 
those things that are pertinent to the range/view, not the whole panoply 
of the result set representation or whatever other object is providing 
the data that the range is a view of.

I could do it by saving an instance of the object representing the result 
set containing only the relevant data, but that would be misleading for 
the user, because it would not match the object it was cloned from in any 
respect other than representing a range.

Should not the requirement for the thing provided by save be only that it 
should provide the same view as that provided by the source range at the 
time of the save, and the same range 'interface'.

Is there an accepted term for the way ranges are defined, i.e. as an 
entity that satisfies some template evaluating roughly to a bool?

Steve

Steve

Oct 26 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, October 26, 2011 03:17 Steve Teale wrote:
 On Tue, 25 Oct 2011 23:45:58 -0700, Jonathan M Davis wrote:
 On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider
 a range instead. Ditto for output. If an interface takes a File as
 input, it's a red flag that something is wrong.

 
 I have a question about ranges that occurred to me when I was composing
 a MySQL result set into a random access range.
 
 To do that I have to provide the save capability defined by
 
 R r1;
 R r2 = r1.save;
 
 Would it harm the use of ranges as elements of operation sequences if
 the type of the entity that got saved was not the same as the original
 range provider type. Something like:
 
 R r1;
 S r2 = r1.save;
 r1.restore(r2);
 
 For entities with a good deal of state, that implement a range, storing
 the whole state, and more significantly, restoring it, may be non-
 trivial, but saving and restoring the state of the range may be quite a
 lightweight affair.

 
 It's likely to cause issues if the type returned by save differs from
 the original type. From your description, it sounds like the range is
 over something, and it's that something which you don't want to copy. If
 that's the case, then the range just doesn't contain that data. Rather,
 it's a view into the data, so saving it just save that view.

 
 Well, yes Jonathan, that's just what I'm getting at. I want to save just
 those things that are pertinent to the range/view, not the whole panoply
 of the result set representation or whatever other object is providing
 the data that the range is a view of.
 
 I could do it by saving an instance of the object representing the result
 set containing only the relevant data, but that would be misleading for
 the user, because it would not match the object it was cloned from in any
 respect other than representing a range.
 
 Should not the requirement for the thing provided by save be only that it
 should provide the same view as that provided by the source range at the
 time of the save, and the same range 'interface'.
 
 Is there an accepted term for the way ranges are defined, i.e. as an
 entity that satisfies some template evaluating roughly to a bool?

Ranges are defined per templates in std.range. isForwardRange, isInputRange, 
isRandomAccessRange, etc. And it looks like isForwardRange specifically checks 
that the return type of save is the same type as the range itself. I was 
thinking that it didn't necessarily require that but that you'd have definite 
issues having save return a different type, because it's generally assumed 
that it's the same type and code will reflect that. However, it looks like 
it's actually required.

What you should probably do is have the result set be an object holding the 
data but not be a range itself and then overload opSlice to give you a range 
over that data (as would be done with a container). Then the range isn't in a 
position where it's trying to own the data, and it's clear to the programmer 
where the data is stored.

- Jonathan M Davis

Oct 26 2011

Steve Teale <steve.teale britseyeview.com> writes:

On Wed, 26 Oct 2011 13:01:14 -0400, Jonathan M Davis wrote:

 On Wednesday, October 26, 2011 03:17 Steve Teale wrote:
 On Tue, 25 Oct 2011 23:45:58 -0700, Jonathan M Davis wrote:
 On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input,
 consider a range instead. Ditto for output. If an interface takes
 a File as input, it's a red flag that something is wrong.

 
 I have a question about ranges that occurred to me when I was
 composing a MySQL result set into a random access range.
 
 To do that I have to provide the save capability defined by
 
 R r1;
 R r2 = r1.save;
 
 Would it harm the use of ranges as elements of operation sequences
 if the type of the entity that got saved was not the same as the
 original range provider type. Something like:
 
 R r1;
 S r2 = r1.save;
 r1.restore(r2);
 
 For entities with a good deal of state, that implement a range,
 storing the whole state, and more significantly, restoring it, may
 be non- trivial, but saving and restoring the state of the range may
 be quite a lightweight affair.

 
 It's likely to cause issues if the type returned by save differs from
 the original type. From your description, it sounds like the range is
 over something, and it's that something which you don't want to copy.
 If that's the case, then the range just doesn't contain that data.
 Rather, it's a view into the data, so saving it just save that view.

 
 Well, yes Jonathan, that's just what I'm getting at. I want to save
 just those things that are pertinent to the range/view, not the whole
 panoply of the result set representation or whatever other object is
 providing the data that the range is a view of.
 
 I could do it by saving an instance of the object representing the
 result set containing only the relevant data, but that would be
 misleading for the user, because it would not match the object it was
 cloned from in any respect other than representing a range.
 
 Should not the requirement for the thing provided by save be only that
 it should provide the same view as that provided by the source range at
 the time of the save, and the same range 'interface'.
 
 Is there an accepted term for the way ranges are defined, i.e. as an
 entity that satisfies some template evaluating roughly to a bool?

 
 Ranges are defined per templates in std.range. isForwardRange,
 isInputRange, isRandomAccessRange, etc. And it looks like isForwardRange
 specifically checks that the return type of save is the same type as the
 range itself. I was thinking that it didn't necessarily require that but
 that you'd have definite issues having save return a different type,
 because it's generally assumed that it's the same type and code will
 reflect that. However, it looks like it's actually required.

But is it just required by std.range, or is it required at some 
theoretical level by algorithms that can work with ranges as inputs and 
outputs?

 What you should probably do is have the result set be an object holding
 the data but not be a range itself and then overload opSlice to give you
 a range over that data (as would be done with a container). Then the
 range isn't in a position where it's trying to own the data, and it's
 clear to the programmer where the data is stored.
 

But in my mind, just from a quick reading of the language definition, 
slices are closely tied to arrays, which as you have already noted, are 
not the best example of a range.

Are ranges just a library artefact, or are they supported by the 
language. They appear to be recognized by foreach, so if they are 
recognized by D, then we should presumably have operator functions 
opInputRange, opForwardRange, and so on, whatever those terms might mean.

If not, then facilities like std.algorithm should have a warning notice 
like cigarettes that says "this facility may not be usable with POD".

Steve

Oct 26 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, October 26, 2011 19:00:02 Steve Teale wrote:
 On Wed, 26 Oct 2011 13:01:14 -0400, Jonathan M Davis wrote:
 Ranges are defined per templates in std.range. isForwardRange,
 isInputRange, isRandomAccessRange, etc. And it looks like isForwardRange
 specifically checks that the return type of save is the same type as the
 range itself. I was thinking that it didn't necessarily require that but
 that you'd have definite issues having save return a different type,
 because it's generally assumed that it's the same type and code will
 reflect that. However, it looks like it's actually required.

 
 But is it just required by std.range, or is it required at some
 theoretical level by algorithms that can work with ranges as inputs and
 outputs?

It complicates things to allow for the range returned by save not be the same 
type as the original range. It's probably possible to alter isForwardRange to 
allow save to return a different type, but I don't know what all of the side 
effects of such a decision would be. I'm certain however that it would break a 
fair bit of code.

 What you should probably do is have the result set be an object holding
 the data but not be a range itself and then overload opSlice to give you
 a range over that data (as would be done with a container). Then the
 range isn't in a position where it's trying to own the data, and it's
 clear to the programmer where the data is stored.

 
 But in my mind, just from a quick reading of the language definition,
 slices are closely tied to arrays, which as you have already noted, are
 not the best example of a range.
 
 Are ranges just a library artefact, or are they supported by the
 language. They appear to be recognized by foreach, so if they are
 recognized by D, then we should presumably have operator functions
 opInputRange, opForwardRange, and so on, whatever those terms might mean.

foreach supports the API of an input range: front, empty, and popFront. 
There's no need for overloaded operators of any kind. If you want to overload 
an operator for foreach, then use opApply. foreach is the _only_ place in the 
language that specfically supports ranges.

Arrays are a poor example of ranges because they don't have an obviously 
associated container. A dynamic array is a range over a block of memory that 
the runtime maintains. With a full-blown container such as Array or 
RedBlackTree, it's the container that holds the data, and a range for one of 
them is a range over that container. People tend to think of arrays as being 
containers rather than just slices, so it becomes confusing to people.

It's much easier to properly understand ranges if you think of them as being 
associated with a container like an iterator would be in C++. They don't 
normally own the elements that they're iterating over. When they do, you get 
situations such as input ranges where you can't save them (an input stream 
being a prime example).

 If not, then facilities like std.algorithm should have a warning notice
 like cigarettes that says "this facility may not be usable with POD".

I'm afraid that I don't understand why a warning would be necessary. std.range 
lists the functions that each type of range must implement and it has 
templates for testing whether a particular type implements those functions for 
a given sort of range. I suspect that you're misunderstanding something 
fundamental about ranges.

If you haven't read this article yet - 
http://www.informit.com/articles/printerfriendly.aspx?p=1407357 - I suggest 
that you do. It doesn't entirely match what std.range does (at least as far as 
naming goes) but it does explain the core concepts fairly well and as far as I 
know is the best explanation on ranges that currently exists.

- Jonathan M Davis

Oct 26 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 26.10.2011 10:28, Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider a
 range instead. Ditto for output. If an interface takes a File as input,
 it's a red flag that something is wrong.

 I have a question about ranges that occurred to me when I was composing a
 MySQL result set into a random access range.

 To do that I have to provide the save capability defined by

 R r1;
 R r2 = r1.save;

 Would it harm the use of ranges as elements of operation sequences if the
 type of the entity that got saved was not the same as the original range
 provider type. Something like:

 R r1;
 S r2 = r1.save;
 r1.restore(r2);

Yay! Range with 'restore' strikes again.
Seriously, last time something about range was discussed I noticed this 
very same thing - full copy is costly, there are ranges that can restore 
their state using some special tiny savepoint object.
Though currently this kind of thing is not supported nor expected, plus 
there are cases where copy needs to be made.

 For entities with a good deal of state, that implement a range, storing
 the whole state, and more significantly, restoring it, may be non-
 trivial, but saving and restoring the state of the range may be quite a
 lightweight affair.

 I'm also puzzled by the semantics of the random access range.

 If I restrict myself to indexing into the range, then things are fine.

 auto x = r[0];
 x = r[10];
 x = r[0];

 But if I slip in a few popFront's, then presumably x = r[0] will give me
 a different result.

 This just makes me slightly uneasy.

 Steve


-- 
Dmitry Olshansky

Oct 26 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, October 25, 2011 16:16 Brad Anderson wrote:
 On Tue, Oct 25, 2011 at 5:12 PM, Jonathan M Davis 

<jmdavisProg gmx.com>wrote:
 On Tuesday, October 25, 2011 15:55 Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function code to ranges.
 
 Thanks for pointing that out.

 
 Andrei and I have pretty much failed at articulating this vision for
 Phobos. We need to get our act together.

 
 Ranges in general are not something that is comunicated very well.
 They're not
 in TDPL, and none of the online documentation discusses them in detail.
 You pretty much only learn them from reading Phobos' documentation and
 using Phobos or from discussing it with people who know about them
 already. And there's no overall plan or design for Phobos articulated
 _anywhere_ that I'm
 aware of, so between those two facts, there's really nothing to get such
 a vision across to anyone other than word of mouth (though I suppose
 that it's
 more "word of keyboard" in many cases).
 
 - Jonathan M Davis

 
 I learned about them from Andrei's boostcon talk. Not exactly the first
 place you'd look for information about D but the talk was very informative.

That and an article that Andrei wrote on ranges a while back which was not D-
specific are the only online sources on ranges that I'm aware of, and to my 
knowledge, neither of them are referenced on D's site. Ranges are a powerful 
concept, but they're not well known, in part because they're relatively new - 
particularly when it comes to being used in a major library, let alone the 
standard library for a language. So, they definitely need some explaining.

I was working on article on ranges a while back, but tabled it due to bugs 
related to std.container.Array rendering my examples unworkable. I should 
probably get back to working on that and see if those issues still exist or 
whether I can better work around them in the article. But the fact that 
dynamic arrays are actually a very poor example of ranges in some regards 
(given the fact that people tend to think of them as containers even though 
they really aren't in D) complicates things in a way that I'd really like to 
be able to explain some of the concepts using a real container rather than 
arrays.

- Jonathan M Davis

Oct 25 2011

bcs <bcs example.com> writes:

On 10/24/2011 05:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HMAC
 and all SHA family functions (excluding SHA0 which is very old, broken
 and no longer in use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native
 code must be there to support other non x86 CPUs and SIMD optimization
 may be added at any time later.

 Any opinions are welcome. Especially if such design is good or bad, and
 what needs to be changed.

 Thanks :)

Are you re-implementing the function kernels your self or are you using 
an existing implementation? Given what I've heard about things like 
side-channel attacks using processing times to recover keys, I'd rather 
not see Phobos use anything written by less than the best expert available.

Oct 28 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert available.

Until now, I implemented some hash functions. There are no branching 
instructions in their transform() routines, so theoretically processing 
time is independent of the function input.

Nov 04 2011

bcs <bcs example.com> writes:

On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert
 available.

 Until now, I implemented some hash functions. There are no branching
 instructions in their transform() routines, so theoretically processing
 time is independent of the function input.

 From my very incomplete memory I found the source I was looking for (I 
googled for "aes interperative dance") here 
http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html 
Look for "Foot-Shooting Prevention Agreement" in one of the images 
~20-25% of the way down.

tl;dr; It mentions "cache-based, timing, and other side channel 
attacks". Unless you can explain to me what those are, in painful 
detail, I don't think we should trust you to avoid them. Get a good 
vetted C implementation and wrap it with a nice D API and call it a day.

Nov 04 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 11/4/2011 7:52 PM, bcs wrote:
 tl;dr; It mentions "cache-based, timing, and other side channel attacks".
Unless
 you can explain to me what those are, in painful detail, I don't think we
should
 trust you to avoid them. Get a good vetted C implementation and wrap it with a
 nice D API and call it a day.


You've got a good point. While I'd like to see native D implementations, crypto 
security is such a big issue we'd probably be better off with your suggestion.

Nov 04 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert
 available.

 Until now, I implemented some hash functions. There are no branching
 instructions in their transform() routines, so theoretically processing
 time is independent of the function input.

  From my very incomplete memory I found the source I was looking for (I
 googled for "aes interperative dance") here
 http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html
 Look for "Foot-Shooting Prevention Agreement" in one of the images
 ~20-25% of the way down.

 tl;dr; It mentions "cache-based, timing, and other side channel
 attacks". Unless you can explain to me what those are, in painful
 detail, I don't think we should trust you to avoid them. Get a good
 vetted C implementation and wrap it with a nice D API and call it a day.

Sorry for late answer.

I didn't implement AES yet, but it can be implemented without lookup 
tables (countermeasure to cache-timing attacks).

Branch-free code without secret-data dependent memory accesses is free 
from cache-timing vurnelabilities on most architectures. The remaining 
are architectures where instruction execution times (cycles) may depend 
on data (f.i. multiply by 0 may take 1 cycle, and >2 cycles when 
multiplying by other numbers). I don't know any concrete examples but I 
realize such cycle-varying instructions _may_ exist. Of course such 
instructions must be avoided in secure code.

The most hard to avoid side channel attack is Power Analysis. On almost 
all CPUs instruction power drain depends on the operands. Again 
multiplying by 0 consume less power than multiplying by other numbers 
and that may be distinguished on the oscilloscope. There are some 
countermeasures including operand blinding, but not all crypto 
algorithms support that.

I only implemented some hashing functions: MD5, SHA1/224/256/384/512 and 
Tiger1/2. MD5 and SHA have code that is branch-free and does not use any 
data dependent lookups so their execution time should be constant, 
making timing attacks harmless.

Unfortunately Tiger function does lookups based on the source bytes and 
is vurnelable to cache-timing attacks. This may be considered insecure 
and removed (openssl does not support it either) as making it secure may 
be not worth it.

The only problem that remain is the power analysis which require 
physical access to the computer. We need to ask ourselfs to what degree 
we must secure our code. I'm going to argue that no single 
implementation is secure to power analysis on all architectures. I think 
we must stick to cache/branch timing level and forget about power 
analysis as its scope is beyond D's specification. We simply can't avoid 
most power analysis attacks because most of the countermeasures belong 
to the hardware instead of the software level.

Nov 20 2011

bcs <bcs example.com> writes:

On 11/20/2011 08:10 AM, Piotr Szturmaj wrote:
 bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert
 available.

 Until now, I implemented some hash functions. There are no branching
 instructions in their transform() routines, so theoretically processing
 time is independent of the function input.

 From my very incomplete memory I found the source I was looking for (I
 googled for "aes interperative dance") here
 http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html
 Look for "Foot-Shooting Prevention Agreement" in one of the images
 ~20-25% of the way down.

 tl;dr; It mentions "cache-based, timing, and other side channel
 attacks". Unless you can explain to me what those are, in painful
 detail, I don't think we should trust you to avoid them. Get a good
 vetted C implementation and wrap it with a nice D API and call it a day.

 Sorry for late answer.

Np, but I still have a number of concerns:

What is the advantage to implementing the kernels of any of these 
functions in D? Will the code be faster? Smaller? More secure? More 
maintainable? (On the other hand, the value of doing the API code in D 
goes with no debate.)

How many people in the D community have the experience and know-how to 
review the security of an implementation? If there are less than 2 or 3 
people who can do that, can we afford to include native kernels? We 
can't have just one and if we have only two and one leaves for some 
reason the code becomes un-maintainable for lack of a reviewer. *I* 
wouldn't be comfortable with less than about 4-5.

 I didn't implement AES yet, but it can be implemented without lookup
 tables (countermeasure to cache-timing attacks).

 Branch-free code without secret-data dependent memory accesses is free
 from cache-timing vurnelabilities on most architectures. The remaining
 are architectures where instruction execution times (cycles) may depend
 on data (f.i. multiply by 0 may take 1 cycle, and >2 cycles when
 multiplying by other numbers). I don't know any concrete examples but I
 realize such cycle-varying instructions _may_ exist. Of course such
 instructions must be avoided in secure code.

 The most hard to avoid side channel attack is Power Analysis. On almost
 all CPUs instruction power drain depends on the operands. Again
 multiplying by 0 consume less power than multiplying by other numbers
 and that may be distinguished on the oscilloscope. There are some
 countermeasures including operand blinding, but not all crypto
 algorithms support that.

 I only implemented some hashing functions: MD5, SHA1/224/256/384/512 and
 Tiger1/2. MD5 and SHA have code that is branch-free and does not use any
 data dependent lookups so their execution time should be constant,
 making timing attacks harmless.

 Unfortunately Tiger function does lookups based on the source bytes and
 is vurnelable to cache-timing attacks. This may be considered insecure
 and removed (openssl does not support it either) as making it secure may
 be not worth it.

 The only problem that remain is the power analysis which require
 physical access to the computer. We need to ask ourselfs to what degree
 we must secure our code. I'm going to argue that no single
 implementation is secure to power analysis on all architectures. I think
 we must stick to cache/branch timing level and forget about power
 analysis as its scope is beyond D's specification. We simply can't avoid
 most power analysis attacks because most of the countermeasures belong
 to the hardware instead of the software level.

Nov 21 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

bcs wrote:
 On 11/20/2011 08:10 AM, Piotr Szturmaj wrote:
 bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you
 using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd
 rather
 not see Phobos use anything written by less than the best expert
 available.

 Until now, I implemented some hash functions. There are no branching
 instructions in their transform() routines, so theoretically processing
 time is independent of the function input.

 From my very incomplete memory I found the source I was looking for (I
 googled for "aes interperative dance") here
 http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html
 Look for "Foot-Shooting Prevention Agreement" in one of the images
 ~20-25% of the way down.

 tl;dr; It mentions "cache-based, timing, and other side channel
 attacks". Unless you can explain to me what those are, in painful
 detail, I don't think we should trust you to avoid them. Get a good
 vetted C implementation and wrap it with a nice D API and call it a day.


  >
 Sorry for late answer.

 Np, but I still have a number of concerns:

 What is the advantage to implementing the kernels of any of these
 functions in D? Will the code be faster? Smaller? More secure? More
 maintainable? (On the other hand, the value of doing the API code in D
 goes with no debate.)

The first advantage is that Phobos will be independent of any crypto 
libraries. The second one is that there will be no licensing issues. All 
crypto code will be under Boost license like the rest of Phobos.

 How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

I know Regan Heath who wrote some crypto code for Tango. Also, I suspect 
that D _will_ gain more (crypto) contributors, especially after joining GCC.

Minimum number of contributors/reviewers requirement in open-source 
project is at least unfortunate in my opinion. Nevertheless, I respect 
your thoughts. But imagine what could happen if Walter waited for 
contributors instead of starting D project on his own?

Please realize that we do not implement every possible crypto algorithm 
at once. We need to start with something like hashing, then add 
encryption and other cryptographic primitives.

Nov 22 2011

"Regan Heath" <regan netmail.co.nz> writes:

On Tue, 22 Nov 2011 16:16:21 -0000, Piotr Szturmaj <bncrbme jadamspam.pl>
wrote:
 bcs wrote:
 On 11/20/2011 08:10 AM, Piotr Szturmaj wrote:
 bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you
 using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd
 rather
 not see Phobos use anything written by less than the best expert
 available.

 Until now, I implemented some hash functions. There are no branching
 instructions in their transform() routines, so theoretically  
 processing
 time is independent of the function input.

 From my very incomplete memory I found the source I was looking for (I
 googled for "aes interperative dance") here
 http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html
 Look for "Foot-Shooting Prevention Agreement" in one of the images
 ~20-25% of the way down.

 tl;dr; It mentions "cache-based, timing, and other side channel
 attacks". Unless you can explain to me what those are, in painful
 detail, I don't think we should trust you to avoid them. Get a good
 vetted C implementation and wrap it with a nice D API and call it a  
 day.


  >
 Sorry for late answer.

 Np, but I still have a number of concerns:

 What is the advantage to implementing the kernels of any of these
 functions in D? Will the code be faster? Smaller? More secure? More
 maintainable? (On the other hand, the value of doing the API code in D
 goes with no debate.)

 The first advantage is that Phobos will be independent of any crypto  
 libraries. The second one is that there will be no licensing issues. All  
 crypto code will be under Boost license like the rest of Phobos.

Ultimately I think it comes down to the question of whether we want/expect  
to have native D implementations of things, or whether certain  
projects/libraries will always be too large to maintain in D.  The answer  
to that, comes down to whether we think D will gain a sufficient user base  
to include enough people able to produce and maintain those libraries, or  
.. whether D will gain sufficient importance that existing developers in  
those libraries produce D bindings themselves.

Under the assumption that D will gain sufficient traction it makes sense  
to start implementing what we can in D, now.  At the same time, in order  
to gain that traction D needs bindings to existing libraries.  The latter  
is probably a slightly higher priority at present, but it's not a reason  
not to develop native D implementations at the same time.

If we're lucky having initially incomplete native D implementations will  
actually encourage people with those skills to contribute to D, as there  
is a certain sort of satisfaction in doing so.

 How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

 I know Regan Heath who wrote some crypto code for Tango. Also, I suspect  
 that D _will_ gain more (crypto) contributors, especially after joining  
 GCC.

I wrote those as a crypto novice and haven't had any cause to advance
since then.  There are some obvious things to watch out for, for example
the hashing routines should not make copies of the input data, or if they
do they should 'scrub' that memory clean afterwards.  But, that's the
limit of my knowledge, there are bound to be more advanced problems and
solutions that I'm simply not aware of.

Regan

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Nov 22 2011

bcs <bcs example.com> writes:

On 11/22/2011 08:16 AM, Piotr Szturmaj wrote:
 bcs wrote:

 How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

 I know Regan Heath who wrote some crypto code for Tango. Also, I suspect
 that D _will_ gain more (crypto) contributors, especially after joining
 GCC.

"Wrote some crypto code" is a rather weak recommendation. Depending on 
how you interpret it, that would recommend *me*. A better recommendation 
would be "Mr Y gets paid by security company X to do crypto analysis" or 
"Mrs Z has published several well review papers on vulnerabilities in 
this kind of code".

 Minimum number of contributors/reviewers requirement in open-source
 project is at least unfortunate in my opinion. Nevertheless, I respect
 your thoughts. But imagine what could happen if Walter waited for
 contributors instead of starting D project on his own?

 Please realize that we do not implement every possible crypto algorithm
 at once. We need to start with something like hashing, then add
 encryption and other cryptographic primitives.

I have no problem with that comment. My concern revolves around the 
point that the implementation of cryptographic primitives has security 
implications. I'm worried that we don't have the resources to 
demonstrate that our implementation is at least as good as the currently 
available implementation. I'd rather Phobos not include a given 
primitive than contain one of unknown quality. What I'd like to see is 
that the crypto package quickly contain interfaces for all the 
primitives we can find pre-vetted Boost licensed implementations for. At 
that point I would have no issue with as methodical and meticulous 
effort to divest ourselves of external dependencies as we can get access 
to the expertises needed to vet our own implementations (to the same 
level as the code they are replacing).

Yes, I'm intentionally being paranoid here but this is security and 
paranoia is part of the job description darn-it.

Nov 25 2011

Brad Anderson <eco gnuk.net> writes:

On Fri, Nov 25, 2011 at 10:31 PM, bcs <bcs example.com> wrote:

 On 11/22/2011 08:16 AM, Piotr Szturmaj wrote:

 bcs wrote:

  How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

 I know Regan Heath who wrote some crypto code for Tango. Also, I suspect
 that D _will_ gain more (crypto) contributors, especially after joining
 GCC.

 "Wrote some crypto code" is a rather weak recommendation. Depending on how
 you interpret it, that would recommend *me*. A better recommendation would
 be "Mr Y gets paid by security company X to do crypto analysis" or "Mrs Z
 has published several well review papers on vulnerabilities in this kind of
 code".


  Minimum number of contributors/reviewers requirement in open-source
 project is at least unfortunate in my opinion. Nevertheless, I respect
 your thoughts. But imagine what could happen if Walter waited for
 contributors instead of starting D project on his own?

 Please realize that we do not implement every possible crypto algorithm
 at once. We need to start with something like hashing, then add
 encryption and other cryptographic primitives.

 I have no problem with that comment. My concern revolves around the point
 that the implementation of cryptographic primitives has security
 implications. I'm worried that we don't have the resources to demonstrate
 that our implementation is at least as good as the currently available
 implementation. I'd rather Phobos not include a given primitive than
 contain one of unknown quality. What I'd like to see is that the crypto
 package quickly contain interfaces for all the primitives we can find
 pre-vetted Boost licensed implementations for. At that point I would have
 no issue with as methodical and meticulous effort to divest ourselves of
 external dependencies as we can get access to the expertises needed to vet
 our own implementations (to the same level as the code they are replacing).

 Yes, I'm intentionally being paranoid here but this is security and
 paranoia is part of the job description darn-it.

How about putting a disclaimer on the module warning the code hasn't been
through a rigorous security audit and point them at well established C
libraries if they need that sort of assurance.

Nov 26 2011

bcs <bcs example.com> writes:

On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

What does that gain over implementing the first itteration in terms of 
well established C libraries and then replacing that with native 
implementations as the code goes been through a rigorous security audit?

Or how about do both as API compatible implementations? That would work 
for people who need the proven security and people who can't afford 
external dependencies as well as allow them to be swapped out for each 
other with minimal effort once the native code is proven.

Nov 27 2011

Jude Young <10equals2 gmail.com> writes:

On Sun 27 Nov 2011 10:27:58 AM CST, bcs wrote:
 On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

 What does that gain over implementing the first itteration in terms of
 well established C libraries and then replacing that with native
 implementations as the code goes been through a rigorous security audit?

 Or how about do both as API compatible implementations? That would
 work for people who need the proven security and people who can't
 afford external dependencies as well as allow them to be swapped out
 for each other with minimal effort once the native code is proven.

I do like this idea.
swap implementations by simply swapping import and linking?
nice.

Nov 27 2011

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Jude Young wrote:
 On Sun 27 Nov 2011 10:27:58 AM CST, bcs wrote:
 On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

 What does that gain over implementing the first itteration in terms of
 well established C libraries and then replacing that with native
 implementations as the code goes been through a rigorous security audit?

 Or how about do both as API compatible implementations? That would
 work for people who need the proven security and people who can't
 afford external dependencies as well as allow them to be swapped out
 for each other with minimal effort once the native code is proven.

 I do like this idea.
 swap implementations by simply swapping import and linking?
 nice.

This was my goal... to write native implementation along with OpenSSL 
wrapper and add 'useOpenSSL' version identifier. Would that satisfy 
everyone?

Nov 27 2011

bcs <bcs example.com> writes:

On 11/27/2011 12:15 PM, Piotr Szturmaj wrote:
 Jude Young wrote:
 On Sun 27 Nov 2011 10:27:58 AM CST, bcs wrote:
 On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

 What does that gain over implementing the first itteration in terms of
 well established C libraries and then replacing that with native
 implementations as the code goes been through a rigorous security audit?

 Or how about do both as API compatible implementations? That would
 work for people who need the proven security and people who can't
 afford external dependencies as well as allow them to be swapped out
 for each other with minimal effort once the native code is proven.

 I do like this idea.
 swap implementations by simply swapping import and linking?
 nice.

 This was my goal... to write native implementation along with OpenSSL
 wrapper and add 'useOpenSSL' version identifier. Would that satisfy
 everyone?

Yes, though I'd prefer to see them distinct and non-mutually exclusive. 
For one things, someone may well consider the native implementation of 
one primitive vetted before they consider another to be. Both results 
could be had by the suitable application of aliases.

Nov 27 2011

Brad Anderson <eco gnuk.net> writes:

On Sun, Nov 27, 2011 at 9:27 AM, bcs <bcs example.com> wrote:

 On 11/26/2011 04:19 PM, Brad Anderson wrote:

 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

 What does that gain over implementing the first itteration in terms of
 well established C libraries and then replacing that with native
 implementations as the code goes been through a rigorous security audit?

 Or how about do both as API compatible implementations? That would work
 for people who need the proven security and people who can't afford
 external dependencies as well as allow them to be swapped out for each
 other with minimal effort once the native code is proven.

That's even better but isn't the issue over bundling incompatibly licensed
libraries with phobos?  Nothing is stopping someone from writing bindings
for these libraries as some random library on D Source or Github already.
 An agreed upon API would be very nice in any case.

Nov 27 2011

bcs <bcs example.com> writes:

On 11/27/2011 12:14 PM, Brad Anderson wrote:
 That's even better but isn't the issue over bundling incompatibly
 licensed libraries with phobos?  Nothing is stopping someone from
 writing bindings for these libraries as some random library on D Source
 or Github already.

If we can't find something with a licence allowing bundling then we just 
include the D language bits (including bindings) and note that along 
with where to get the lib.

 An agreed upon API would be very nice in any case.

That is necessary (or at least very desirable) in any case as it would 
allow swapping one cipher for another just as easily as it would allow 
swapping one implementation for another.

Nov 27 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Early std.crypto