www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Early std.crypto

reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
https://github.com/pszturmaj/phobos/tree/master/std/crypto

This is some early work on std.crypto proposal. Currently only MD5, HMAC 
and all SHA family functions (excluding SHA0 which is very old, broken 
and no longer in use). I plan to add other crypto primitives later.

I know about one SHA1 pull request optimized for SSSE3. I think native 
code must be there to support other non x86 CPUs and SIMD optimization 
may be added at any time later.

Any opinions are welcome. Especially if such design is good or bad, and 
what needs to be changed.

Thanks :)
Oct 24 2011
next sibling parent reply "Martin Nowak" <dawg dawgfoto.de> writes:
On Tue, 25 Oct 2011 02:10:49 +0200, Piotr Szturmaj <bncrbme jadamspam.pl=
  =

wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HM=

 and all SHA family functions (excluding SHA0 which is very old, broken=

 and no longer in use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native=

 code must be there to support other non x86 CPUs and SIMD optimization=

 may be added at any time later.

 Any opinions are welcome. Especially if such design is good or bad, an=

 what needs to be changed.

 Thanks :)

Great to push this a little. I have to say though that I like the current struct based interface much better. struct Hash { // enhanced by some compile time traits enum hashLength =3D 16; enum blockLength =3D 0; // three interface functions void start(); void update(const(ubyte)[] data); void finish(ref ubyte[hashLength] digest); } You wouldn't need the save, restore functions. Some unnecessary allocations could go away. Most important instances would have less mutable state. You could probably parameterize a Merkle Damg=C3=A5rd base with free functions for the transformation. A dynamic interface can be obtaines by templated instances similar to wh= at = std.range does.
Oct 24 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Martin Nowak wrote:
 I have to say though that I like the current struct based interface
 much better.

 struct Hash
 {
 // enhanced by some compile time traits
 enum hashLength = 16;
 enum blockLength = 0;

The reason why hash and block length are runtime variables is that some hash functions are parametrized with variables of great amplitude, for example CubeHash may have any number of rounds, and any size of block and hash output.
 // three interface functions
 void start();
 void update(const(ubyte)[] data);
 void finish(ref ubyte[hashLength] digest);
 }

There, it is: reset(); put(); finish(); The put() function makes hash implementation an OutputRange.
 You wouldn't need the save, restore functions.

They're not needed. They only serve as speed optimization when hashing many messages which have the same beginning block. This is used in HMAC, which is: HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message)); when func supports saving the IV, the first parts are precomputed, when not HMAC resorts to full hashing. This optimization is also mentioned in HMAC spec.
 Some unnecessary allocations could go away.
 Most important instances would have less mutable state.

Could you specify which ones, please?
 You could probably parameterize a Merkle Damgård base with free
 functions for the transformation.

What would be the difference from current class parametrization?
 A dynamic interface can be obtaines by templated instances similar to
 what std.range does.

Could you elaborate? I don't know exactly what do you mean. Function templates? Thanks a lot!
Oct 25 2011
parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Martin Nowak wrote:
 On Tue, 25 Oct 2011 09:43:48 +0200, Piotr Szturmaj
 <bncrbme jadamspam.pl> wrote:

 Martin Nowak wrote:
 I have to say though that I like the current struct based interface
 much better.

 struct Hash
 {
 // enhanced by some compile time traits
 enum hashLength = 16;
 enum blockLength = 0;

The reason why hash and block length are runtime variables is that some hash functions are parametrized with variables of great amplitude, for example CubeHash may have any number of rounds, and any size of block and hash output.
 // three interface functions
 void start();
 void update(const(ubyte)[] data);
 void finish(ref ubyte[hashLength] digest);
 }

There, it is: reset(); put(); finish();

good.

I think that's negligible, but it may be "unbranched" easily.
 The put() function makes hash implementation an OutputRange.

 You wouldn't need the save, restore functions.

They're not needed. They only serve as speed optimization when hashing many messages which have the same beginning block. This is used in HMAC, which is: HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message)); when func supports saving the IV, the first parts are precomputed, when not HMAC resorts to full hashing. This optimization is also mentioned in HMAC spec.

auto saved = hash_ctx; Or alternatively one could add a 'save()' function to an isSaveableHash(H) concept.

Yes, I thought about that, but current way is faster because only initialization vector is saved, and api name emphasises it. General save() on SHA512 would need to copy IV and 80 ulongs (internal state).
 Some unnecessary allocations could go away.
 Most important instances would have less mutable state.

Could you specify which ones, please?

hash(T) and hashToHex(T).

Yes, this will be fixed.
 You could probably parameterize a Merkle Damgård base with free
 functions for the transformation.

What would be the difference from current class parametrization?

concern. If not using classes you need a way to inject the transformation which could be done like. alias MerkleDamgard!(uint, 5, 80, 16, 20, sha1Transform) SHA1;

Either way this function must be written. Either as free function or as class method. I don't think someone would use transformation function directly.
 A dynamic interface can be obtaines by templated instances similar to
 what std.range does.

Could you elaborate? I don't know exactly what do you mean. Function templates?

DynamicAllocatorTemplate at https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/allocator.d These are to support cases where you either want a stable ABI or have a template firewall for scalability issues (e.g. could be sensible for the HMAC implementation although not really necessary).

I will look into it. Thanks!
Oct 25 2011
prev sibling next sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
On Tue, 25 Oct 2011 09:43:48 +0200, Piotr Szturmaj <bncrbme jadamspam.pl=
  =

wrote:
 Martin Nowak wrote:
 I have to say though that I like the current struct based interface
 much better.

 struct Hash
 {
 // enhanced by some compile time traits
 enum hashLength =3D 16;
 enum blockLength =3D 0;

The reason why hash and block length are runtime variables is that som=

 hash functions are parametrized with variables of great amplitude, for=

 example CubeHash may have any number of rounds, and any size of block =

 and hash output.

 // three interface functions
 void start();
 void update(const(ubyte)[] data);
 void finish(ref ubyte[hashLength] digest);
 }

There, it is: reset(); put(); finish();

= good.
 The put() function makes hash implementation an OutputRange.

 You wouldn't need the save, restore functions.

They're not needed. They only serve as speed optimization when hashing=

 many messages which have the same beginning block. This is used in HMA=

 which is:

 HMAC(func, key, message) =3D func(key ^ opad, func(key ^ ipad, message=

 when func supports saving the IV, the first parts are precomputed, whe=

 not HMAC resorts to full hashing. This optimization is also mentioned =

 HMAC spec.

auto saved =3D hash_ctx; Or alternatively one could add a 'save()' function to an isSaveableHash(= H) concept.
 Some unnecessary allocations could go away.
 Most important instances would have less mutable state.

Could you specify which ones, please?

= hash(T) and hashToHex(T).
 You could probably parameterize a Merkle Damg=C3=A5rd base with free
 functions for the transformation.

What would be the difference from current class parametrization?

concern. If not using classes you need a way to inject the transformation which = could be done like. alias MerkleDamgard!(uint, 5, 80, 16, 20, sha1Transform) SHA1;
 A dynamic interface can be obtaines by templated instances similar to=


 what std.range does.

Could you elaborate? I don't know exactly what do you mean. Function =

 templates?

DynamicAllocatorTemplate at = https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/allocato= r.d These are to support cases where you either want a stable ABI or have a template firewall for scalability issues (e.g. could be sensible = = for the HMAC implementation although not really necessary).
 Thanks a lot!

Oct 25 2011
prev sibling next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto
 
 This is some early work on std.crypto proposal. Currently only MD5, HMAC and
all SHA family functions (excluding SHA0
 which is very old, broken and no longer in use). I plan to add other crypto
primitives later.
 
 I know about one SHA1 pull request optimized for SSSE3. I think native code
must be there to support other non x86 CPUs
 and SIMD optimization may be added at any time later.
 
 Any opinions are welcome. Especially if such design is good or bad, and what
needs to be changed.
 
 Thanks :)

A key element to a lot of crypto code is speed. I really don't think we want to re-invent all the optimizations on all the platforms. To that end, I really suggest that we stick to wrapping existing implementations, like openssl. While I hate the openssl apis, I do respect the continual effort that various companies invest in optimizing the code. My 2 cents, Brad
Oct 25 2011
parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Brad Roberts wrote:
 On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HMAC and
all SHA family functions (excluding SHA0
 which is very old, broken and no longer in use). I plan to add other crypto
primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native code
must be there to support other non x86 CPUs
 and SIMD optimization may be added at any time later.

 Any opinions are welcome. Especially if such design is good or bad, and what
needs to be changed.

 Thanks :)

A key element to a lot of crypto code is speed. I really don't think we want to re-invent all the optimizations on all the platforms. To that end, I really suggest that we stick to wrapping existing implementations, like openssl. While I hate the openssl apis, I do respect the continual effort that various companies invest in optimizing the code.

You are of course right about speed but there are some reasons for having our own code _if_ we want std.crypto: 1. Phobos independence 2. Non D friendly API of openssl 3. No need to link with openssl to compute a simple hash. 4. Licensing We also have some options for speed improvement, while retaining our API 1. Wrap openssl _on demand_, this is transparent to the user, API doesn't change 2. Many (if not all) openssl asm code may be obtained using CRYPTOGAMS license (BSD). But reading http://www.openssl.org/~appro/cryptogams/ suggests that author might be willing to license it under Boost. 3. Adapt Crypto++ asm code which is public domain (x86/64 only) The 1st one should be easy, and user would have choice between openssl wrapped within std.crypto or direct access to etc.c.openssl.
Oct 25 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HMAC and
all
 SHA family functions (excluding SHA0 which is very old, broken and no longer in
 use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native code
must
 be there to support other non x86 CPUs and SIMD optimization may be added at
any
 time later.

 Any opinions are welcome. Especially if such design is good or bad, and what
 needs to be changed.

Thanks for championing this. The input to the functions should be a range, not an array (although an array is a range). In general, for Phobos, all arbitrary input data should be in the form of ranges, and all arbitrary output data should present itself as a range. This facilitates the idea of: range => algorithm => range So, for example, I want to encrypt and then zip a file and send the output to a socket: file => encrypt => compress => socket All the components here will just "snap" together. With the existing design of crypto, I'd have to read the file into an array, then pass the array to encrypt, etc. Think of it like the filter concept in Unix that has been so successful.
Oct 25 2011
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Walter Bright wrote:
 On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5,
 HMAC and all
 SHA family functions (excluding SHA0 which is very old, broken and no
 longer in
 use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native
 code must
 be there to support other non x86 CPUs and SIMD optimization may be
 added at any
 time later.

 Any opinions are welcome. Especially if such design is good or bad,
 and what
 needs to be changed.

Thanks for championing this. The input to the functions should be a range, not an array (although an array is a range). In general, for Phobos, all arbitrary input data should be in the form of ranges, and all arbitrary output data should present itself as a range. This facilitates the idea of: range => algorithm => range So, for example, I want to encrypt and then zip a file and send the output to a socket: file => encrypt => compress => socket All the components here will just "snap" together. With the existing design of crypto, I'd have to read the file into an array, then pass the array to encrypt, etc. Think of it like the filter concept in Unix that has been so successful.

I share your opinion. I was thinking about such filter concept for std.crypto.cipher (TBD), but I will also try to convert current hash function code to ranges. Thanks for pointing that out.
Oct 25 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash function
 code to ranges.

 Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We need to get our act together.
Oct 25 2011
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function
 code to ranges.

 Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We need to get our act together.

Personally, I learned ranges from Phobos docs/src and Andrei's article about them. In the beginning I was surprised that I could not find any "range howto", yet on this NG some range designs were considered obvious. But I finally (somewhat) understood and used them in my postgres client implementation. And I like them. I wish there was some article about ranges and their uses. Some tips and tricks thing for users and guidelines for range writers. But let's get back to std.crypto. I planned to add input ranges to function put, because it already forms an output range. I thought that std.range.put() would serve the need if UFCS will be supported. But, as I guess, the best method is to support input ranges directly.
Oct 25 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/25/2011 5:22 PM, Piotr Szturmaj wrote:
 But let's get back to std.crypto. I planned to add input ranges to function
put,
 because it already forms an output range. I thought that std.range.put() would
 serve the need if UFCS will be supported. But, as I guess, the best method is
to
 support input ranges directly.

An easy test is that if the interface takes a T[] as input, consider a range instead. Ditto for output. If an interface takes a File as input, it's a red flag that something is wrong.
Oct 25 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider a
 range instead. Ditto for output. If an interface takes a File as input,
 it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a MySQL result set into a random access range. To do that I have to provide the save capability defined by R r1; R r2 = r1.save; Would it harm the use of ranges as elements of operation sequences if the type of the entity that got saved was not the same as the original range provider type. Something like: R r1; S r2 = r1.save; r1.restore(r2); For entities with a good deal of state, that implement a range, storing the whole state, and more significantly, restoring it, may be non- trivial, but saving and restoring the state of the range may be quite a lightweight affair.

It's likely to cause issues if the type returned by save differs from the original type. From your description, it sounds like the range is over something, and it's that something which you don't want to copy. If that's the case, then the range just doesn't contain that data. Rather, it's a view into the data, so saving it just save that view. The fact that arrays are currently the most used example of ranges definitely makes them more confusing IMHO, since people often think of ranges as containers, whereas ranges aren't. And really, dynamic arrays in D _aren't_ containers. No dynamic array actually owns its elements. The runtime owns it, and dynamic arrays are just ranges over that data. It's easier to understand when you think about a container, such as vector type (e.g. std.container.Array) or a linked list. The container holds the data. The container is _not_ a range. A range is a view of that data. So, popping elements off of the range has no effect on the original container - the same goes with many range-based functions. As long as the elements in the range aren't mutated or rearranged, the elements in the container are unaltered. So, calling save on a range just gives you a copy of that view of the container, allowing you to alter the range and still have that original view of the container.
 I'm also puzzled by the semantics of the random access range.
 
 If I restrict myself to indexing into the range, then things are fine.
 
 auto x = r[0];
 x = r[10];
 x = r[0];
 
 But if I slip in a few popFront's, then presumably x = r[0] will give me
 a different result.
 
 This just makes me slightly uneasy.

r[0] _is_ r.front, so of course popFront would change what r[0] is. You're consuming the range as you pop elements off of its front. So, if you do _anything_ which alters a range (rather than its elements), at least some of (if not all of) the indices in a random access range change. The reason that this is making you uneasy is likely at least partially because of the fact that you're probably thinking about arrays and how they're used rather than thinking more abstractly about ranges. Arrays are really a bad example of ranges, in spite of the fact that they are unfortunately, our most common example of ranges at this point. - Jonathan M Davis
Oct 25 2011
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 26.10.2011 10:28, Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider a
 range instead. Ditto for output. If an interface takes a File as input,
 it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a MySQL result set into a random access range. To do that I have to provide the save capability defined by R r1; R r2 = r1.save; Would it harm the use of ranges as elements of operation sequences if the type of the entity that got saved was not the same as the original range provider type. Something like: R r1; S r2 = r1.save; r1.restore(r2);

Yay! Range with 'restore' strikes again. Seriously, last time something about range was discussed I noticed this very same thing - full copy is costly, there are ranges that can restore their state using some special tiny savepoint object. Though currently this kind of thing is not supported nor expected, plus there are cases where copy needs to be made.
 For entities with a good deal of state, that implement a range, storing
 the whole state, and more significantly, restoring it, may be non-
 trivial, but saving and restoring the state of the range may be quite a
 lightweight affair.

 I'm also puzzled by the semantics of the random access range.

 If I restrict myself to indexing into the range, then things are fine.

 auto x = r[0];
 x = r[10];
 x = r[0];

 But if I slip in a few popFront's, then presumably x = r[0] will give me
 a different result.

 This just makes me slightly uneasy.

 Steve

-- Dmitry Olshansky
Oct 26 2011
prev sibling parent Steve Teale <steve.teale britseyeview.com> writes:
On Tue, 25 Oct 2011 23:45:58 -0700, Jonathan M Davis wrote:

 On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider
 a range instead. Ditto for output. If an interface takes a File as
 input, it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a MySQL result set into a random access range. To do that I have to provide the save capability defined by R r1; R r2 = r1.save; Would it harm the use of ranges as elements of operation sequences if the type of the entity that got saved was not the same as the original range provider type. Something like: R r1; S r2 = r1.save; r1.restore(r2); For entities with a good deal of state, that implement a range, storing the whole state, and more significantly, restoring it, may be non- trivial, but saving and restoring the state of the range may be quite a lightweight affair.

It's likely to cause issues if the type returned by save differs from the original type. From your description, it sounds like the range is over something, and it's that something which you don't want to copy. If that's the case, then the range just doesn't contain that data. Rather, it's a view into the data, so saving it just save that view.

Well, yes Jonathan, that's just what I'm getting at. I want to save just those things that are pertinent to the range/view, not the whole panoply of the result set representation or whatever other object is providing the data that the range is a view of. I could do it by saving an instance of the object representing the result set containing only the relevant data, but that would be misleading for the user, because it would not match the object it was cloned from in any respect other than representing a range. Should not the requirement for the thing provided by save be only that it should provide the same view as that provided by the source range at the time of the save, and the same range 'interface'. Is there an accepted term for the way ranges are defined, i.e. as an entity that satisfies some template evaluating roughly to a bool? Steve Steve
Oct 26 2011
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/25/2011 4:12 PM, Jonathan M Davis wrote:
 Ranges in general are not something that is comunicated very well. They're not
 in TDPL, and none of the online documentation discusses them in detail. You
 pretty much only learn them from reading Phobos' documentation and using
 Phobos or from discussing it with people who know about them already. And
 there's no overall plan or design for Phobos articulated _anywhere_ that I'm
 aware of, so between those two facts, there's really nothing to get such a
 vision across to anyone other than word of mouth (though I suppose that it's
 more "word of keyboard" in many cases).

Yup. Guilty as charged.
Oct 25 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, October 25, 2011 15:55 Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function code to ranges.
 
 Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We need to get our act together.

Ranges in general are not something that is comunicated very well. They're not in TDPL, and none of the online documentation discusses them in detail. You pretty much only learn them from reading Phobos' documentation and using Phobos or from discussing it with people who know about them already. And there's no overall plan or design for Phobos articulated _anywhere_ that I'm aware of, so between those two facts, there's really nothing to get such a vision across to anyone other than word of mouth (though I suppose that it's more "word of keyboard" in many cases). - Jonathan M Davis
Oct 25 2011
prev sibling next sibling parent Brad Anderson <eco gnuk.net> writes:
--0015174489fc4a697c04b027bc1f
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Oct 25, 2011 at 5:12 PM, Jonathan M Davis <jmdavisProg gmx.com>wrote:

 On Tuesday, October 25, 2011 15:55 Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function code to ranges.

 Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We need to get our act together.

Ranges in general are not something that is comunicated very well. They're not in TDPL, and none of the online documentation discusses them in detail. You pretty much only learn them from reading Phobos' documentation and using Phobos or from discussing it with people who know about them already. And there's no overall plan or design for Phobos articulated _anywhere_ that I'm aware of, so between those two facts, there's really nothing to get such a vision across to anyone other than word of mouth (though I suppose that it's more "word of keyboard" in many cases). - Jonathan M Davis

I learned about them from Andrei's boostcon talk. Not exactly the first place you'd look for information about D but the talk was very informative. Regards, Brad Anderson --0015174489fc4a697c04b027bc1f Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Tue, Oct 25, 2011 at 5:12 PM, Jonathan M Davis <span dir=3D"ltr">&lt;<a = href=3D"mailto:jmdavisProg gmx.com">jmdavisProg gmx.com</a>&gt;</span> wrot= e:<br><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D= "margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> <div><div></div><div class=3D"h5">On Tuesday, October 25, 2011 15:55 Walter= Bright wrote:<br> &gt; On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:<br> &gt; &gt; I share your opinion. I was thinking about such filter concept fo= r<br> &gt; &gt; std.crypto.cipher (TBD), but I will also try to convert current h= ash<br> &gt; &gt; function code to ranges.<br> &gt; &gt;<br> &gt; &gt; Thanks for pointing that out.<br> &gt;<br> &gt; Andrei and I have pretty much failed at articulating this vision for<b= r> &gt; Phobos. We need to get our act together.<br> <br> </div></div>Ranges in general are not something that is comunicated very we= ll. They&#39;re not<br> in TDPL, and none of the online documentation discusses them in detail. You= <br> pretty much only learn them from reading Phobos&#39; documentation and usin= g<br> Phobos or from discussing it with people who know about them already. And<b= r> there&#39;s no overall plan or design for Phobos articulated _anywhere_ tha= t I&#39;m<br> aware of, so between those two facts, there&#39;s really nothing to get suc= h a<br> vision across to anyone other than word of mouth (though I suppose that it&= #39;s<br> more &quot;word of keyboard&quot; in many cases).<br> <font color=3D"#888888"><br> - Jonathan M Davis<br> </font></blockquote></div><br><div>I learned about them from Andrei&#39;s b= oostcon talk. =A0Not exactly the first place you&#39;d look for information= about D but the talk was very informative.</div><div><br></div><div>Regard= s,</div> <div>Brad Anderson</div> --0015174489fc4a697c04b027bc1f--
Oct 25 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
If you want a bit of history:

http://www.digitalmars.com/d/archives/digitalmars/D/announce/RFC_on_range_design_for_D2_12922.html#N12922
http://www.digitalmars.com/d/archives/digitalmars/D/announce/Revised_RFC_on_range_design_for_D2_13211.html

http://web.archive.org/web/20090112000313/http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

That last link (although outdated) is probably more descriptive than
what we have now.
Oct 25 2011
prev sibling parent Steve Teale <steve.teale britseyeview.com> writes:
 
 An easy test is that if the interface takes a T[] as input, consider a
 range instead. Ditto for output. If an interface takes a File as input,
 it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a MySQL result set into a random access range. To do that I have to provide the save capability defined by R r1; R r2 = r1.save; Would it harm the use of ranges as elements of operation sequences if the type of the entity that got saved was not the same as the original range provider type. Something like: R r1; S r2 = r1.save; r1.restore(r2); For entities with a good deal of state, that implement a range, storing the whole state, and more significantly, restoring it, may be non- trivial, but saving and restoring the state of the range may be quite a lightweight affair. I'm also puzzled by the semantics of the random access range. If I restrict myself to indexing into the range, then things are fine. auto x = r[0]; x = r[10]; x = r[0]; But if I slip in a few popFront's, then presumably x = r[0] will give me a different result. This just makes me slightly uneasy. Steve
Oct 25 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, October 25, 2011 16:16 Brad Anderson wrote:
 On Tue, Oct 25, 2011 at 5:12 PM, Jonathan M Davis 

 On Tuesday, October 25, 2011 15:55 Walter Bright wrote:
 On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
 I share your opinion. I was thinking about such filter concept for
 std.crypto.cipher (TBD), but I will also try to convert current hash
 function code to ranges.
 
 Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We need to get our act together.

Ranges in general are not something that is comunicated very well. They're not in TDPL, and none of the online documentation discusses them in detail. You pretty much only learn them from reading Phobos' documentation and using Phobos or from discussing it with people who know about them already. And there's no overall plan or design for Phobos articulated _anywhere_ that I'm aware of, so between those two facts, there's really nothing to get such a vision across to anyone other than word of mouth (though I suppose that it's more "word of keyboard" in many cases). - Jonathan M Davis

I learned about them from Andrei's boostcon talk. Not exactly the first place you'd look for information about D but the talk was very informative.

That and an article that Andrei wrote on ranges a while back which was not D- specific are the only online sources on ranges that I'm aware of, and to my knowledge, neither of them are referenced on D's site. Ranges are a powerful concept, but they're not well known, in part because they're relatively new - particularly when it comes to being used in a major library, let alone the standard library for a language. So, they definitely need some explaining. I was working on article on ranges a while back, but tabled it due to bugs related to std.container.Array rendering my examples unworkable. I should probably get back to working on that and see if those issues still exist or whether I can better work around them in the article. But the fact that dynamic arrays are actually a very poor example of ranges in some regards (given the fact that people tend to think of them as containers even though they really aren't in D) complicates things in a way that I'd really like to be able to explain some of the concepts using a real container rather than arrays. - Jonathan M Davis
Oct 25 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, October 26, 2011 03:17 Steve Teale wrote:
 On Tue, 25 Oct 2011 23:45:58 -0700, Jonathan M Davis wrote:
 On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input, consider
 a range instead. Ditto for output. If an interface takes a File as
 input, it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a MySQL result set into a random access range. To do that I have to provide the save capability defined by R r1; R r2 = r1.save; Would it harm the use of ranges as elements of operation sequences if the type of the entity that got saved was not the same as the original range provider type. Something like: R r1; S r2 = r1.save; r1.restore(r2); For entities with a good deal of state, that implement a range, storing the whole state, and more significantly, restoring it, may be non- trivial, but saving and restoring the state of the range may be quite a lightweight affair.

It's likely to cause issues if the type returned by save differs from the original type. From your description, it sounds like the range is over something, and it's that something which you don't want to copy. If that's the case, then the range just doesn't contain that data. Rather, it's a view into the data, so saving it just save that view.

Well, yes Jonathan, that's just what I'm getting at. I want to save just those things that are pertinent to the range/view, not the whole panoply of the result set representation or whatever other object is providing the data that the range is a view of. I could do it by saving an instance of the object representing the result set containing only the relevant data, but that would be misleading for the user, because it would not match the object it was cloned from in any respect other than representing a range. Should not the requirement for the thing provided by save be only that it should provide the same view as that provided by the source range at the time of the save, and the same range 'interface'. Is there an accepted term for the way ranges are defined, i.e. as an entity that satisfies some template evaluating roughly to a bool?

Ranges are defined per templates in std.range. isForwardRange, isInputRange, isRandomAccessRange, etc. And it looks like isForwardRange specifically checks that the return type of save is the same type as the range itself. I was thinking that it didn't necessarily require that but that you'd have definite issues having save return a different type, because it's generally assumed that it's the same type and code will reflect that. However, it looks like it's actually required. What you should probably do is have the result set be an object holding the data but not be a range itself and then overload opSlice to give you a range over that data (as would be done with a container). Then the range isn't in a position where it's trying to own the data, and it's clear to the programmer where the data is stored. - Jonathan M Davis
Oct 26 2011
prev sibling next sibling parent Steve Teale <steve.teale britseyeview.com> writes:
On Wed, 26 Oct 2011 13:01:14 -0400, Jonathan M Davis wrote:

 On Wednesday, October 26, 2011 03:17 Steve Teale wrote:
 On Tue, 25 Oct 2011 23:45:58 -0700, Jonathan M Davis wrote:
 On Wednesday, October 26, 2011 06:28:52 Steve Teale wrote:
 An easy test is that if the interface takes a T[] as input,
 consider a range instead. Ditto for output. If an interface takes
 a File as input, it's a red flag that something is wrong.

I have a question about ranges that occurred to me when I was composing a MySQL result set into a random access range. To do that I have to provide the save capability defined by R r1; R r2 = r1.save; Would it harm the use of ranges as elements of operation sequences if the type of the entity that got saved was not the same as the original range provider type. Something like: R r1; S r2 = r1.save; r1.restore(r2); For entities with a good deal of state, that implement a range, storing the whole state, and more significantly, restoring it, may be non- trivial, but saving and restoring the state of the range may be quite a lightweight affair.

It's likely to cause issues if the type returned by save differs from the original type. From your description, it sounds like the range is over something, and it's that something which you don't want to copy. If that's the case, then the range just doesn't contain that data. Rather, it's a view into the data, so saving it just save that view.

Well, yes Jonathan, that's just what I'm getting at. I want to save just those things that are pertinent to the range/view, not the whole panoply of the result set representation or whatever other object is providing the data that the range is a view of. I could do it by saving an instance of the object representing the result set containing only the relevant data, but that would be misleading for the user, because it would not match the object it was cloned from in any respect other than representing a range. Should not the requirement for the thing provided by save be only that it should provide the same view as that provided by the source range at the time of the save, and the same range 'interface'. Is there an accepted term for the way ranges are defined, i.e. as an entity that satisfies some template evaluating roughly to a bool?

Ranges are defined per templates in std.range. isForwardRange, isInputRange, isRandomAccessRange, etc. And it looks like isForwardRange specifically checks that the return type of save is the same type as the range itself. I was thinking that it didn't necessarily require that but that you'd have definite issues having save return a different type, because it's generally assumed that it's the same type and code will reflect that. However, it looks like it's actually required.

But is it just required by std.range, or is it required at some theoretical level by algorithms that can work with ranges as inputs and outputs?
 What you should probably do is have the result set be an object holding
 the data but not be a range itself and then overload opSlice to give you
 a range over that data (as would be done with a container). Then the
 range isn't in a position where it's trying to own the data, and it's
 clear to the programmer where the data is stored.
 

But in my mind, just from a quick reading of the language definition, slices are closely tied to arrays, which as you have already noted, are not the best example of a range. Are ranges just a library artefact, or are they supported by the language. They appear to be recognized by foreach, so if they are recognized by D, then we should presumably have operator functions opInputRange, opForwardRange, and so on, whatever those terms might mean. If not, then facilities like std.algorithm should have a warning notice like cigarettes that says "this facility may not be usable with POD". Steve
Oct 26 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 26, 2011 19:00:02 Steve Teale wrote:
 On Wed, 26 Oct 2011 13:01:14 -0400, Jonathan M Davis wrote:
 Ranges are defined per templates in std.range. isForwardRange,
 isInputRange, isRandomAccessRange, etc. And it looks like isForwardRange
 specifically checks that the return type of save is the same type as the
 range itself. I was thinking that it didn't necessarily require that but
 that you'd have definite issues having save return a different type,
 because it's generally assumed that it's the same type and code will
 reflect that. However, it looks like it's actually required.

But is it just required by std.range, or is it required at some theoretical level by algorithms that can work with ranges as inputs and outputs?

It complicates things to allow for the range returned by save not be the same type as the original range. It's probably possible to alter isForwardRange to allow save to return a different type, but I don't know what all of the side effects of such a decision would be. I'm certain however that it would break a fair bit of code.
 What you should probably do is have the result set be an object holding
 the data but not be a range itself and then overload opSlice to give you
 a range over that data (as would be done with a container). Then the
 range isn't in a position where it's trying to own the data, and it's
 clear to the programmer where the data is stored.

But in my mind, just from a quick reading of the language definition, slices are closely tied to arrays, which as you have already noted, are not the best example of a range. Are ranges just a library artefact, or are they supported by the language. They appear to be recognized by foreach, so if they are recognized by D, then we should presumably have operator functions opInputRange, opForwardRange, and so on, whatever those terms might mean.

foreach supports the API of an input range: front, empty, and popFront. There's no need for overloaded operators of any kind. If you want to overload an operator for foreach, then use opApply. foreach is the _only_ place in the language that specfically supports ranges. Arrays are a poor example of ranges because they don't have an obviously associated container. A dynamic array is a range over a block of memory that the runtime maintains. With a full-blown container such as Array or RedBlackTree, it's the container that holds the data, and a range for one of them is a range over that container. People tend to think of arrays as being containers rather than just slices, so it becomes confusing to people. It's much easier to properly understand ranges if you think of them as being associated with a container like an iterator would be in C++. They don't normally own the elements that they're iterating over. When they do, you get situations such as input ranges where you can't save them (an input stream being a prime example).
 If not, then facilities like std.algorithm should have a warning notice
 like cigarettes that says "this facility may not be usable with POD".

I'm afraid that I don't understand why a warning would be necessary. std.range lists the functions that each type of range must implement and it has templates for testing whether a particular type implements those functions for a given sort of range. I suspect that you're misunderstanding something fundamental about ranges. If you haven't read this article yet - http://www.informit.com/articles/printerfriendly.aspx?p=1407357 - I suggest that you do. It doesn't entirely match what std.range does (at least as far as naming goes) but it does explain the core concepts fairly well and as far as I know is the best explanation on ranges that currently exists. - Jonathan M Davis
Oct 26 2011
prev sibling parent reply bcs <bcs example.com> writes:
On 10/24/2011 05:10 PM, Piotr Szturmaj wrote:
 https://github.com/pszturmaj/phobos/tree/master/std/crypto

 This is some early work on std.crypto proposal. Currently only MD5, HMAC
 and all SHA family functions (excluding SHA0 which is very old, broken
 and no longer in use). I plan to add other crypto primitives later.

 I know about one SHA1 pull request optimized for SSSE3. I think native
 code must be there to support other non x86 CPUs and SIMD optimization
 may be added at any time later.

 Any opinions are welcome. Especially if such design is good or bad, and
 what needs to be changed.

 Thanks :)

Are you re-implementing the function kernels your self or are you using an existing implementation? Given what I've heard about things like side-channel attacks using processing times to recover keys, I'd rather not see Phobos use anything written by less than the best expert available.
Oct 28 2011
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert available.

Until now, I implemented some hash functions. There are no branching instructions in their transform() routines, so theoretically processing time is independent of the function input.
Nov 04 2011
parent reply bcs <bcs example.com> writes:
On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert
 available.

Until now, I implemented some hash functions. There are no branching instructions in their transform() routines, so theoretically processing time is independent of the function input.

From my very incomplete memory I found the source I was looking for (I googled for "aes interperative dance") here http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html Look for "Foot-Shooting Prevention Agreement" in one of the images ~20-25% of the way down. tl;dr; It mentions "cache-based, timing, and other side channel attacks". Unless you can explain to me what those are, in painful detail, I don't think we should trust you to avoid them. Get a good vetted C implementation and wrap it with a nice D API and call it a day.
Nov 04 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 11/4/2011 7:52 PM, bcs wrote:
 tl;dr; It mentions "cache-based, timing, and other side channel attacks".
Unless
 you can explain to me what those are, in painful detail, I don't think we
should
 trust you to avoid them. Get a good vetted C implementation and wrap it with a
 nice D API and call it a day.

You've got a good point. While I'd like to see native D implementations, crypto security is such a big issue we'd probably be better off with your suggestion.
Nov 04 2011
prev sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert
 available.

Until now, I implemented some hash functions. There are no branching instructions in their transform() routines, so theoretically processing time is independent of the function input.

From my very incomplete memory I found the source I was looking for (I googled for "aes interperative dance") here http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html Look for "Foot-Shooting Prevention Agreement" in one of the images ~20-25% of the way down. tl;dr; It mentions "cache-based, timing, and other side channel attacks". Unless you can explain to me what those are, in painful detail, I don't think we should trust you to avoid them. Get a good vetted C implementation and wrap it with a nice D API and call it a day.

Sorry for late answer. I didn't implement AES yet, but it can be implemented without lookup tables (countermeasure to cache-timing attacks). Branch-free code without secret-data dependent memory accesses is free from cache-timing vurnelabilities on most architectures. The remaining are architectures where instruction execution times (cycles) may depend on data (f.i. multiply by 0 may take 1 cycle, and >2 cycles when multiplying by other numbers). I don't know any concrete examples but I realize such cycle-varying instructions _may_ exist. Of course such instructions must be avoided in secure code. The most hard to avoid side channel attack is Power Analysis. On almost all CPUs instruction power drain depends on the operands. Again multiplying by 0 consume less power than multiplying by other numbers and that may be distinguished on the oscilloscope. There are some countermeasures including operand blinding, but not all crypto algorithms support that. I only implemented some hashing functions: MD5, SHA1/224/256/384/512 and Tiger1/2. MD5 and SHA have code that is branch-free and does not use any data dependent lookups so their execution time should be constant, making timing attacks harmless. Unfortunately Tiger function does lookups based on the source bytes and is vurnelable to cache-timing attacks. This may be considered insecure and removed (openssl does not support it either) as making it secure may be not worth it. The only problem that remain is the power analysis which require physical access to the computer. We need to ask ourselfs to what degree we must secure our code. I'm going to argue that no single implementation is secure to power analysis on all architectures. I think we must stick to cache/branch timing level and forget about power analysis as its scope is beyond D's specification. We simply can't avoid most power analysis attacks because most of the countermeasures belong to the hardware instead of the software level.
Nov 20 2011
parent reply bcs <bcs example.com> writes:
On 11/20/2011 08:10 AM, Piotr Szturmaj wrote:
 bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd rather
 not see Phobos use anything written by less than the best expert
 available.

Until now, I implemented some hash functions. There are no branching instructions in their transform() routines, so theoretically processing time is independent of the function input.

From my very incomplete memory I found the source I was looking for (I googled for "aes interperative dance") here http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html Look for "Foot-Shooting Prevention Agreement" in one of the images ~20-25% of the way down. tl;dr; It mentions "cache-based, timing, and other side channel attacks". Unless you can explain to me what those are, in painful detail, I don't think we should trust you to avoid them. Get a good vetted C implementation and wrap it with a nice D API and call it a day.

Sorry for late answer.

Np, but I still have a number of concerns: What is the advantage to implementing the kernels of any of these functions in D? Will the code be faster? Smaller? More secure? More maintainable? (On the other hand, the value of doing the API code in D goes with no debate.) How many people in the D community have the experience and know-how to review the security of an implementation? If there are less than 2 or 3 people who can do that, can we afford to include native kernels? We can't have just one and if we have only two and one leaves for some reason the code becomes un-maintainable for lack of a reviewer. *I* wouldn't be comfortable with less than about 4-5.
 I didn't implement AES yet, but it can be implemented without lookup
 tables (countermeasure to cache-timing attacks).

 Branch-free code without secret-data dependent memory accesses is free
 from cache-timing vurnelabilities on most architectures. The remaining
 are architectures where instruction execution times (cycles) may depend
 on data (f.i. multiply by 0 may take 1 cycle, and >2 cycles when
 multiplying by other numbers). I don't know any concrete examples but I
 realize such cycle-varying instructions _may_ exist. Of course such
 instructions must be avoided in secure code.

 The most hard to avoid side channel attack is Power Analysis. On almost
 all CPUs instruction power drain depends on the operands. Again
 multiplying by 0 consume less power than multiplying by other numbers
 and that may be distinguished on the oscilloscope. There are some
 countermeasures including operand blinding, but not all crypto
 algorithms support that.

 I only implemented some hashing functions: MD5, SHA1/224/256/384/512 and
 Tiger1/2. MD5 and SHA have code that is branch-free and does not use any
 data dependent lookups so their execution time should be constant,
 making timing attacks harmless.

 Unfortunately Tiger function does lookups based on the source bytes and
 is vurnelable to cache-timing attacks. This may be considered insecure
 and removed (openssl does not support it either) as making it secure may
 be not worth it.

 The only problem that remain is the power analysis which require
 physical access to the computer. We need to ask ourselfs to what degree
 we must secure our code. I'm going to argue that no single
 implementation is secure to power analysis on all architectures. I think
 we must stick to cache/branch timing level and forget about power
 analysis as its scope is beyond D's specification. We simply can't avoid
 most power analysis attacks because most of the countermeasures belong
 to the hardware instead of the software level.

Nov 21 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
bcs wrote:
 On 11/20/2011 08:10 AM, Piotr Szturmaj wrote:
 bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you
 using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd
 rather
 not see Phobos use anything written by less than the best expert
 available.

Until now, I implemented some hash functions. There are no branching instructions in their transform() routines, so theoretically processing time is independent of the function input.

From my very incomplete memory I found the source I was looking for (I googled for "aes interperative dance") here http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html Look for "Foot-Shooting Prevention Agreement" in one of the images ~20-25% of the way down. tl;dr; It mentions "cache-based, timing, and other side channel attacks". Unless you can explain to me what those are, in painful detail, I don't think we should trust you to avoid them. Get a good vetted C implementation and wrap it with a nice D API and call it a day.


 Sorry for late answer.

Np, but I still have a number of concerns: What is the advantage to implementing the kernels of any of these functions in D? Will the code be faster? Smaller? More secure? More maintainable? (On the other hand, the value of doing the API code in D goes with no debate.)

The first advantage is that Phobos will be independent of any crypto libraries. The second one is that there will be no licensing issues. All crypto code will be under Boost license like the rest of Phobos.
 How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

I know Regan Heath who wrote some crypto code for Tango. Also, I suspect that D _will_ gain more (crypto) contributors, especially after joining GCC. Minimum number of contributors/reviewers requirement in open-source project is at least unfortunate in my opinion. Nevertheless, I respect your thoughts. But imagine what could happen if Walter waited for contributors instead of starting D project on his own? Please realize that we do not implement every possible crypto algorithm at once. We need to start with something like hashing, then add encryption and other cryptographic primitives.
Nov 22 2011
parent reply bcs <bcs example.com> writes:
On 11/22/2011 08:16 AM, Piotr Szturmaj wrote:
 bcs wrote:

 How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

I know Regan Heath who wrote some crypto code for Tango. Also, I suspect that D _will_ gain more (crypto) contributors, especially after joining GCC.

"Wrote some crypto code" is a rather weak recommendation. Depending on how you interpret it, that would recommend *me*. A better recommendation would be "Mr Y gets paid by security company X to do crypto analysis" or "Mrs Z has published several well review papers on vulnerabilities in this kind of code".
 Minimum number of contributors/reviewers requirement in open-source
 project is at least unfortunate in my opinion. Nevertheless, I respect
 your thoughts. But imagine what could happen if Walter waited for
 contributors instead of starting D project on his own?

 Please realize that we do not implement every possible crypto algorithm
 at once. We need to start with something like hashing, then add
 encryption and other cryptographic primitives.

I have no problem with that comment. My concern revolves around the point that the implementation of cryptographic primitives has security implications. I'm worried that we don't have the resources to demonstrate that our implementation is at least as good as the currently available implementation. I'd rather Phobos not include a given primitive than contain one of unknown quality. What I'd like to see is that the crypto package quickly contain interfaces for all the primitives we can find pre-vetted Boost licensed implementations for. At that point I would have no issue with as methodical and meticulous effort to divest ourselves of external dependencies as we can get access to the expertises needed to vet our own implementations (to the same level as the code they are replacing). Yes, I'm intentionally being paranoid here but this is security and paranoia is part of the job description darn-it.
Nov 25 2011
parent reply bcs <bcs example.com> writes:
On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

What does that gain over implementing the first itteration in terms of well established C libraries and then replacing that with native implementations as the code goes been through a rigorous security audit? Or how about do both as API compatible implementations? That would work for people who need the proven security and people who can't afford external dependencies as well as allow them to be swapped out for each other with minimal effort once the native code is proven.
Nov 27 2011
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Jude Young wrote:
 On Sun 27 Nov 2011 10:27:58 AM CST, bcs wrote:
 On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

What does that gain over implementing the first itteration in terms of well established C libraries and then replacing that with native implementations as the code goes been through a rigorous security audit? Or how about do both as API compatible implementations? That would work for people who need the proven security and people who can't afford external dependencies as well as allow them to be swapped out for each other with minimal effort once the native code is proven.

I do like this idea. swap implementations by simply swapping import and linking? nice.

This was my goal... to write native implementation along with OpenSSL wrapper and add 'useOpenSSL' version identifier. Would that satisfy everyone?
Nov 27 2011
parent bcs <bcs example.com> writes:
On 11/27/2011 12:15 PM, Piotr Szturmaj wrote:
 Jude Young wrote:
 On Sun 27 Nov 2011 10:27:58 AM CST, bcs wrote:
 On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

What does that gain over implementing the first itteration in terms of well established C libraries and then replacing that with native implementations as the code goes been through a rigorous security audit? Or how about do both as API compatible implementations? That would work for people who need the proven security and people who can't afford external dependencies as well as allow them to be swapped out for each other with minimal effort once the native code is proven.

I do like this idea. swap implementations by simply swapping import and linking? nice.

This was my goal... to write native implementation along with OpenSSL wrapper and add 'useOpenSSL' version identifier. Would that satisfy everyone?

Yes, though I'd prefer to see them distinct and non-mutually exclusive. For one things, someone may well consider the native implementation of one primitive vetted before they consider another to be. Both results could be had by the suitable application of aliases.
Nov 27 2011
prev sibling parent bcs <bcs example.com> writes:
On 11/27/2011 12:14 PM, Brad Anderson wrote:
 That's even better but isn't the issue over bundling incompatibly
 licensed libraries with phobos?  Nothing is stopping someone from
 writing bindings for these libraries as some random library on D Source
 or Github already.

If we can't find something with a licence allowing bundling then we just include the D language bits (including bindings) and note that along with where to get the lib.
 An agreed upon API would be very nice in any case.

That is necessary (or at least very desirable) in any case as it would allow swapping one cipher for another just as easily as it would allow swapping one implementation for another.
Nov 27 2011
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 22 Nov 2011 16:16:21 -0000, Piotr Szturmaj <bncrbme jadamspam.pl>
wrote:
 bcs wrote:
 On 11/20/2011 08:10 AM, Piotr Szturmaj wrote:
 bcs wrote:
 On 11/04/2011 04:27 AM, Piotr Szturmaj wrote:
 bcs wrote:
 Are you re-implementing the function kernels your self or are you
 using
 an existing implementation? Given what I've heard about things like
 side-channel attacks using processing times to recover keys, I'd
 rather
 not see Phobos use anything written by less than the best expert
 available.

Until now, I implemented some hash functions. There are no branching instructions in their transform() routines, so theoretically processing time is independent of the function input.

From my very incomplete memory I found the source I was looking for (I googled for "aes interperative dance") here http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html Look for "Foot-Shooting Prevention Agreement" in one of the images ~20-25% of the way down. tl;dr; It mentions "cache-based, timing, and other side channel attacks". Unless you can explain to me what those are, in painful detail, I don't think we should trust you to avoid them. Get a good vetted C implementation and wrap it with a nice D API and call it a day.


 Sorry for late answer.

Np, but I still have a number of concerns: What is the advantage to implementing the kernels of any of these functions in D? Will the code be faster? Smaller? More secure? More maintainable? (On the other hand, the value of doing the API code in D goes with no debate.)

The first advantage is that Phobos will be independent of any crypto libraries. The second one is that there will be no licensing issues. All crypto code will be under Boost license like the rest of Phobos.

Ultimately I think it comes down to the question of whether we want/expect to have native D implementations of things, or whether certain projects/libraries will always be too large to maintain in D. The answer to that, comes down to whether we think D will gain a sufficient user base to include enough people able to produce and maintain those libraries, or .. whether D will gain sufficient importance that existing developers in those libraries produce D bindings themselves. Under the assumption that D will gain sufficient traction it makes sense to start implementing what we can in D, now. At the same time, in order to gain that traction D needs bindings to existing libraries. The latter is probably a slightly higher priority at present, but it's not a reason not to develop native D implementations at the same time. If we're lucky having initially incomplete native D implementations will actually encourage people with those skills to contribute to D, as there is a certain sort of satisfaction in doing so.
 How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

I know Regan Heath who wrote some crypto code for Tango. Also, I suspect that D _will_ gain more (crypto) contributors, especially after joining GCC.

I wrote those as a crypto novice and haven't had any cause to advance since then. There are some obvious things to watch out for, for example the hashing routines should not make copies of the input data, or if they do they should 'scrub' that memory clean afterwards. But, that's the limit of my knowledge, there are bound to be more advanced problems and solutions that I'm simply not aware of. Regan -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Nov 22 2011
prev sibling next sibling parent Brad Anderson <eco gnuk.net> writes:
--f46d040838bd9155a504b2ac56bc
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Nov 25, 2011 at 10:31 PM, bcs <bcs example.com> wrote:

 On 11/22/2011 08:16 AM, Piotr Szturmaj wrote:

 bcs wrote:

How many people in the D community have the experience and know-how to
 review the security of an implementation? If there are less than 2 or 3
 people who can do that, can we afford to include native kernels? We
 can't have just one and if we have only two and one leaves for some
 reason the code becomes un-maintainable for lack of a reviewer. *I*
 wouldn't be comfortable with less than about 4-5.

I know Regan Heath who wrote some crypto code for Tango. Also, I suspect that D _will_ gain more (crypto) contributors, especially after joining GCC.

"Wrote some crypto code" is a rather weak recommendation. Depending on how you interpret it, that would recommend *me*. A better recommendation would be "Mr Y gets paid by security company X to do crypto analysis" or "Mrs Z has published several well review papers on vulnerabilities in this kind of code". Minimum number of contributors/reviewers requirement in open-source
 project is at least unfortunate in my opinion. Nevertheless, I respect
 your thoughts. But imagine what could happen if Walter waited for
 contributors instead of starting D project on his own?

 Please realize that we do not implement every possible crypto algorithm
 at once. We need to start with something like hashing, then add
 encryption and other cryptographic primitives.

I have no problem with that comment. My concern revolves around the point that the implementation of cryptographic primitives has security implications. I'm worried that we don't have the resources to demonstrate that our implementation is at least as good as the currently available implementation. I'd rather Phobos not include a given primitive than contain one of unknown quality. What I'd like to see is that the crypto package quickly contain interfaces for all the primitives we can find pre-vetted Boost licensed implementations for. At that point I would have no issue with as methodical and meticulous effort to divest ourselves of external dependencies as we can get access to the expertises needed to vet our own implementations (to the same level as the code they are replacing). Yes, I'm intentionally being paranoid here but this is security and paranoia is part of the job description darn-it.

How about putting a disclaimer on the module warning the code hasn't been through a rigorous security audit and point them at well established C libraries if they need that sort of assurance. --f46d040838bd9155a504b2ac56bc Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Fri, Nov 25, 2011 at 10:31 PM, bcs <span dir=3D"ltr">&lt;<a href=3D"mail= to:bcs example.com">bcs example.com</a>&gt;</span> wrote:<br><div class=3D"= gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;b= order-left:1px #ccc solid;padding-left:1ex;"> <div class=3D"im">On 11/22/2011 08:16 AM, Piotr Szturmaj wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> bcs wrote:<br> </blockquote> <br> </div><div class=3D"im"><blockquote class=3D"gmail_quote" style=3D"margin:0= 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class=3D= "gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding= -left:1ex"> How many people in the D community have the experience and know-how to<br> review the security of an implementation? If there are less than 2 or 3<br> people who can do that, can we afford to include native kernels? We<br> can&#39;t have just one and if we have only two and one leaves for some<br> reason the code becomes un-maintainable for lack of a reviewer. *I*<br> wouldn&#39;t be comfortable with less than about 4-5.<br> </blockquote> <br> I know Regan Heath who wrote some crypto code for Tango. Also, I suspect<br=

GCC.<br> </blockquote> <br></div> &quot;Wrote some crypto code&quot; is a rather weak recommendation. Dependi= ng on how you interpret it, that would recommend *me*. A better recommendat= ion would be &quot;Mr Y gets paid by security company X to do crypto analys= is&quot; or &quot;Mrs Z has published several well review papers on vulnera= bilities in this kind of code&quot;.<div class=3D"im"> <br> <br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> Minimum number of contributors/reviewers requirement in open-source<br> project is at least unfortunate in my opinion. Nevertheless, I respect<br> your thoughts. But imagine what could happen if Walter waited for<br> contributors instead of starting D project on his own?<br> <br> Please realize that we do not implement every possible crypto algorithm<br> at once. We need to start with something like hashing, then add<br> encryption and other cryptographic primitives.<br> </blockquote> <br></div> I have no problem with that comment. My concern revolves around the point t= hat the implementation of cryptographic primitives has security implication= s. I&#39;m worried that we don&#39;t have the resources to demonstrate that= our implementation is at least as good as the currently available implemen= tation. I&#39;d rather Phobos not include a given primitive than contain on= e of unknown quality. What I&#39;d like to see is that the crypto package q= uickly contain interfaces for all the primitives we can find pre-vetted Boo= st licensed implementations for. At that point I would have no issue with a= s methodical and meticulous effort to divest ourselves of external dependen= cies as we can get access to the expertises needed to vet our own implement= ations (to the same level as the code they are replacing).<br> <br> Yes, I&#39;m intentionally being paranoid here but this is security and par= anoia is part of the job description darn-it.<br> </blockquote></div><br><div>How about putting a disclaimer on the module wa= rning the code hasn&#39;t been through a=A0rigorous=A0security audit and po= int them at well established C libraries if they need that sort of assuranc= e.</div> --f46d040838bd9155a504b2ac56bc--
Nov 26 2011
prev sibling next sibling parent Jude Young <10equals2 gmail.com> writes:
On Sun 27 Nov 2011 10:27:58 AM CST, bcs wrote:
 On 11/26/2011 04:19 PM, Brad Anderson wrote:
 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

What does that gain over implementing the first itteration in terms of well established C libraries and then replacing that with native implementations as the code goes been through a rigorous security audit? Or how about do both as API compatible implementations? That would work for people who need the proven security and people who can't afford external dependencies as well as allow them to be swapped out for each other with minimal effort once the native code is proven.

I do like this idea. swap implementations by simply swapping import and linking? nice.
Nov 27 2011
prev sibling parent Brad Anderson <eco gnuk.net> writes:
--f46d040712a7b533f504b2bd0922
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Nov 27, 2011 at 9:27 AM, bcs <bcs example.com> wrote:

 On 11/26/2011 04:19 PM, Brad Anderson wrote:

 How about putting a disclaimer on the module warning the code hasn't
 been through a rigorous security audit and point them at well
 established C libraries if they need that sort of assurance.

What does that gain over implementing the first itteration in terms of well established C libraries and then replacing that with native implementations as the code goes been through a rigorous security audit? Or how about do both as API compatible implementations? That would work for people who need the proven security and people who can't afford external dependencies as well as allow them to be swapped out for each other with minimal effort once the native code is proven.

That's even better but isn't the issue over bundling incompatibly licensed libraries with phobos? Nothing is stopping someone from writing bindings for these libraries as some random library on D Source or Github already. An agreed upon API would be very nice in any case. --f46d040712a7b533f504b2bd0922 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Sun, Nov 27, 2011 at 9:27 AM, bcs <span dir=3D"ltr">&lt;<a href=3D"mailt= o:bcs example.com">bcs example.com</a>&gt;</span> wrote:<br><div class=3D"g= mail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;bo= rder-left:1px #ccc solid;padding-left:1ex;"> <div class=3D"im">On 11/26/2011 04:19 PM, Brad Anderson wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> <br> How about putting a disclaimer on the module warning the code hasn&#39;t<br=

established C libraries if they need that sort of assurance.<br> </blockquote> <br></div> What does that gain over implementing the first itteration in terms of well= established C libraries and then replacing that with native implementation= s as the code goes been through a rigorous security audit?<br> <br> Or how about do both as API compatible implementations? That would work for= people who need the proven security and people who can&#39;t afford extern= al dependencies as well as allow them to be swapped out for each other with= minimal effort once the native code is proven.<br> </blockquote></div><br><div>That&#39;s even better but isn&#39;t the issue = over bundling incompatibly licensed libraries with phobos? =A0Nothing is st= opping someone from writing bindings for these libraries as some random lib= rary on D Source or Github already. =A0An agreed upon API would be very nic= e in any case.</div> --f46d040712a7b533f504b2bd0922--
Nov 27 2011