www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - RFC: std.uuid

reply Johannes Pfau <spam example.com> writes:
I've finished the port of boost.uuid to D and I'd hope to get some initial 
feedback.

Quoting the module documentation:

This is a port of boost.uuid from the boost project with some minor 
additions and API changes for a more D-like API.
A UUID, or Universally unique identifier, is intended to uniquely identify 
information in a distributed environment without significant central 
coordination. It can be used to tag objects with very short lifetimes, or to 
reliably identify very persistent objects across a network [...]

Documentation:
http://dl.dropbox.com/u/24218791/d/src/uuid.html

Source Code:
https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.rl
https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d

There's one special thing about this module: It uses the ragel 
(http://www.complang.org/ragel/) state machine compiler. I hope this is not 
a problem for phobos, ragel doesn't introduce any runtime dependencies and 
has no effect on licensing. There's also no need to integrate ragel in the 
build process. We can publish a pregenerated .d file, which should be 
updated manually whenever changes to uuid.rl are made.

This module also depends on Piotr Szturmaj's crypto library to generate 
level 3&5 UUIDS. The code for this is written, but wouldn't be included in 
phobos until official SHA1 and MD5 implementations are in phobos. Swapping 
the MD5/SHA1 implementations against a different implementation should be 
very easy.

Some things I'd especially like feedback for:
* I'd really like to get suggestions for type/function names. Should the 
UUID struct be UUID/uuid/Uuid ?
* the names nameMD5UUID/nameSHAUUID look especially ugly. ideas?
* comments on typos/language etc
Dec 22 2011
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau 
wrote:
 There's one special thing about this module: It uses the ragel 
 (http://www.complang.org/ragel/) state machine compiler. I hope 
 this is not a problem for phobos, ragel doesn't introduce any 
 runtime dependencies and has no effect on licensing. There's 
 also no need to integrate ragel in the build process. We can 
 publish a pregenerated .d file, which should be updated 
 manually whenever changes to uuid.rl are made.

Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.
Dec 22 2011
next sibling parent reply Johannes Pfau <spam example.com> writes:
Vladimir Panteleev wrote:

 On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau
 wrote:
 There's one special thing about this module: It uses the ragel
 (http://www.complang.org/ragel/) state machine compiler. I hope
 this is not a problem for phobos, ragel doesn't introduce any
 runtime dependencies and has no effect on licensing. There's
 also no need to integrate ragel in the build process. We can
 publish a pregenerated .d file, which should be updated
 manually whenever changes to uuid.rl are made.

Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.

It wouldn't be too difficult. Ragel is only used to parse the text form of uuids. As that's a pretty simple task, such a parser could be written manually. But I also have a almost complete HTTP header parser/formatter implementation using ragel which I wanted to propose for phobos eventually. And I'm not going to rewrite those, so I hoped std.uuid could clear the way for ragel based code in phobos ;-)
Dec 22 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
 On Thursday, 22 December 2011 at 23:30:06 UTC, Johannes Pfau wrote:
 Vladimir Panteleev wrote:

 On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau
 wrote:
 There's one special thing about this module: It uses the ragel
 (http://www.complang.org/ragel/) state machine compiler. I hope
 this is not a problem for phobos, ragel doesn't introduce any
 runtime dependencies and has no effect on licensing. There's
 also no need to integrate ragel in the build process. We can
 publish a pregenerated .d file, which should be updated
 manually whenever changes to uuid.rl are made.

Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.

It wouldn't be too difficult. Ragel is only used to parse the text form of uuids. As that's a pretty simple task, such a parser could be written manually. But I also have a almost complete HTTP header parser/formatter implementation using ragel which I wanted to propose for phobos eventually. And I'm not going to rewrite those, so I hoped std.uuid could clear the way for ragel based code in phobos ;-)

Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature, it seems like a redundant complication. Parsing/formatting HTTP headers isn't exactly rocket-science, either.

Agreed. I think it would be wonderful to achieve ragel capabilities with compile-time execution. If we need ragel in D we've failed. Andrei
Dec 22 2011
parent reply Johannes Pfau <spam example.com> writes:
Andrei Alexandrescu wrote:

 On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
 Using 3rd-party technology such as source preprocessors makes future
 contributions more difficult for the vast majority. Considering that the
 language provides nearly the same thing as a fully-supported feature


I must be missing something. Your talking of mixins? Ragel is not a macro/mixin system, it's a state machine compiler, compiling a 20 line machine definition into 700 line fast, goto based code? Or are you talking about the new regex? I seriously doubt that any regex implementation can compare with ragel generated code in terms of performance. (Not important for the uuid id parser, but if you ever want to implement a fast http server...)
 it seems like a redundant complication. Parsing/formatting HTTP headers
 isn't exactly rocket-science, either.


Nope, but it gets exhausting if you write all parsers by hand (and it's more probable that you introduce bugs). Mongrel (ruby http server) uses ragel, lighttpd2 (small, fast webserver) does as well. Also how can writing a state machine definition which looks exactly like the BNF in the spec be more difficult than writing a parser manually?
 Agreed. I think it would be wonderful to achieve ragel capabilities with
 compile-time execution. If we need ragel in D we've failed.
 
 Andrei

I agree that a state machine compiler in D using ctfe would be awesome, but currently there is none. If ragel based code isn't acceptable for phobos at all, I'll just rewrite the uuid parsers using for, switch and to!ubyte.
Dec 23 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/23/11 2:40 AM, Johannes Pfau wrote:
 Andrei Alexandrescu wrote:

 On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
 Using 3rd-party technology such as source preprocessors makes future
 contributions more difficult for the vast majority. Considering that the
 language provides nearly the same thing as a fully-supported feature


I must be missing something. Your talking of mixins? Ragel is not a macro/mixin system, it's a state machine compiler, compiling a 20 line machine definition into 700 line fast, goto based code?

I'm talking about ragel. D should have enough capabilities to generate 700 lines of fast code from a 20-line spec. Andrei
Dec 23 2011
parent Johannes Pfau <spam example.com> writes:
Andrei Alexandrescu wrote:

 On 12/23/11 2:40 AM, Johannes Pfau wrote:
 Andrei Alexandrescu wrote:

 On 12/22/11 5:38 PM, Vladimir Panteleev wrote:
 Using 3rd-party technology such as source preprocessors makes future
 contributions more difficult for the vast majority. Considering that
 the language provides nearly the same thing as a fully-supported
 feature


I must be missing something. Your talking of mixins? Ragel is not a macro/mixin system, it's a state machine compiler, compiling a 20 line machine definition into 700 line fast, goto based code?

I'm talking about ragel. D should have enough capabilities to generate 700 lines of fast code from a 20-line spec. Andrei

I meant especially this sentence: "Considering that the language provides nearly the same thing as a fully-supported feature", what is this feature? I can only think of string mixins, which sure allows to implement a state machine compiler in ctfe, but it's not "nearly the same"
Dec 23 2011
prev sibling parent Johannes Pfau <spam example.com> writes:
Vladimir Panteleev wrote:

 On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau
 wrote:
 There's one special thing about this module: It uses the ragel
 (http://www.complang.org/ragel/) state machine compiler. I hope
 this is not a problem for phobos, ragel doesn't introduce any
 runtime dependencies and has no effect on licensing. There's
 also no need to integrate ragel in the build process. We can
 publish a pregenerated .d file, which should be updated
 manually whenever changes to uuid.rl are made.

Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.

I uploaded a new version at https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d which gets rid of the ragel dependency.
Dec 23 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Thursday, 22 December 2011 at 23:30:06 UTC, Johannes Pfau 
wrote:
 Vladimir Panteleev wrote:

 On Thursday, 22 December 2011 at 23:12:07 UTC, Johannes Pfau
 wrote:
 There's one special thing about this module: It uses the ragel
 (http://www.complang.org/ragel/) state machine compiler. I 
 hope
 this is not a problem for phobos, ragel doesn't introduce any
 runtime dependencies and has no effect on licensing. There's
 also no need to integrate ragel in the build process. We can
 publish a pregenerated .d file, which should be updated
 manually whenever changes to uuid.rl are made.

Would it be hard to reimplement the ragel part using string mixins? From a quick glance at it, it doesn't seem too difficult.

It wouldn't be too difficult. Ragel is only used to parse the text form of uuids. As that's a pretty simple task, such a parser could be written manually. But I also have a almost complete HTTP header parser/formatter implementation using ragel which I wanted to propose for phobos eventually. And I'm not going to rewrite those, so I hoped std.uuid could clear the way for ragel based code in phobos ;-)

Using 3rd-party technology such as source preprocessors makes future contributions more difficult for the vast majority. Considering that the language provides nearly the same thing as a fully-supported feature, it seems like a redundant complication. Parsing/formatting HTTP headers isn't exactly rocket-science, either.
Dec 22 2011
prev sibling next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Johannes Pfau wrote:
 I've finished the port of boost.uuid to D and I'd hope to get some initial
 feedback.

Very nice. I will need UUIDs in one of my D projects :)
 This module also depends on Piotr Szturmaj's crypto library to generate
 level 3&5 UUIDS. The code for this is written, but wouldn't be included in
 phobos until official SHA1 and MD5 implementations are in phobos. Swapping
 the MD5/SHA1 implementations against a different implementation should be
 very easy.

I want to contribute it to Phobos. I will be working on a project which will make extensive use of cryptography. So if I'm about to write D crypto code anyway, I thought it might be better to contribute it to std (if everyone would like it). There are couple of issues though: * there is a pull request with SHA1 implementation using SSSE3. But it is only SHA1. My implementation contains all SHA flavors up to SHA-512 without SHA-0 (which is broken). I think we should combine these implementations to get the best of both. * comments about side-channel vurnelability. I think each crypto primitive should have a note in the docs if its vurnelable or not. That should be enough IMHO. It is impractical to make it safe on all platforms - no single general purpose crypto library is 100% safe against side channel attacks. * it is not finished yet. Currently there are no ciphers, only hashes. * after reading some posts in "Early std.crypto" thread I don't know if it is still welcome to Phobos. I need a "green light" first.
 Some things I'd especially like feedback for:
 * I'd really like to get suggestions for type/function names. Should the
 UUID struct be UUID/uuid/Uuid ?

UUID is the standard name. It is a shortcut similar to "UTF" which in Phobos is uppercase.
 * the names nameMD5UUID/nameSHAUUID look especially ugly. ideas?

uuidMD5 / uuidSHA1 ?
Dec 23 2011
parent reply Johannes Pfau <spam example.com> writes:
Piotr Szturmaj wrote:

 Johannes Pfau wrote:
 I've finished the port of boost.uuid to D and I'd hope to get some
 initial feedback.

Very nice. I will need UUIDs in one of my D projects :)
 This module also depends on Piotr Szturmaj's crypto library to generate
 level 3&5 UUIDS. The code for this is written, but wouldn't be included
 in phobos until official SHA1 and MD5 implementations are in phobos.
 Swapping the MD5/SHA1 implementations against a different implementation
 should be very easy.

I want to contribute it to Phobos. I will be working on a project which will make extensive use of cryptography. So if I'm about to write D crypto code anyway, I thought it might be better to contribute it to std (if everyone would like it). There are couple of issues though: * there is a pull request with SHA1 implementation using SSSE3. But it is only SHA1. My implementation contains all SHA flavors up to SHA-512 without SHA-0 (which is broken). I think we should combine these implementations to get the best of both. * comments about side-channel vurnelability. I think each crypto primitive should have a note in the docs if its vurnelable or not. That should be enough IMHO. It is impractical to make it safe on all platforms - no single general purpose crypto library is 100% safe against side channel attacks. * it is not finished yet. Currently there are no ciphers, only hashes. * after reading some posts in "Early std.crypto" thread I don't know if it is still welcome to Phobos. I need a "green light" first.

and md5 at compile time, so using a C library as proposed in that thread would be bad for std.uuid. I hope you'll get your crypto code into phobos ;-) Related question to the SHA/MD5 hash functions: could those be pure?
 
 Some things I'd especially like feedback for:
 * I'd really like to get suggestions for type/function names. Should the
 UUID struct be UUID/uuid/Uuid ?

UUID is the standard name. It is a shortcut similar to "UTF" which in Phobos is uppercase.

OK
 * the names nameMD5UUID/nameSHAUUID look especially ugly. ideas?

uuidMD5 / uuidSHA1 ?

Dec 23 2011
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Johannes Pfau wrote:
 Related question to the SHA/MD5 hash functions: could those be pure?

Weakly pure, yes - but for what?. Btw. I just gave a try at compile time evaluation of these hashes. I got rid of memcpy but then endianness issues arise. According to CTFE docs: "non-portable casts (eg, from int[] to float[]), including casts which depend on endianness, are not permitted." and the sad part is that hash source code involves endianness. So, there will be no compile-time hash support unless compiler would allow casting from uint[] to ubyte[] or uint* to ubyte*.
Dec 23 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Jonathan M Davis wrote:
 On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
 Johannes Pfau wrote:
 Related question to the SHA/MD5 hash functions: could those be pure?

Weakly pure, yes - but for what?

In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.

Yes, Johannes probably want to mark uuid hash gen as pure. I just wanted to know if its something important as my code used memcpy which is impure.
Dec 23 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Vladimir Panteleev wrote:
 On Saturday, 24 December 2011 at 00:31:43 UTC, Piotr Szturmaj wrote:
 Jonathan M Davis wrote:
 On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
 Johannes Pfau wrote:
 Related question to the SHA/MD5 hash functions: could those be pure?

Weakly pure, yes - but for what?

In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.

Yes, Johannes probably want to mark uuid hash gen as pure. I just wanted to know if its something important as my code used memcpy which is impure.

Where does your code use memcpy? I see one mention in the comments, but none in the code.

See putArray() in base.d
 Anyway, I believe you can do without memcpy by using array copy? Array
 copy might even be faster, since memcpy is not a DMD compiler intrinsic
 like in many C/C++ compilers.

I converted memcpy calls to array copy but it become about 1 Mbps slower.
Dec 24 2011
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Vladimir Panteleev wrote:
 On Sunday, 25 December 2011 at 01:08:04 UTC, Piotr Szturmaj wrote:
 Anyway, I believe you can do without memcpy by using array copy? Array
 copy might even be faster, since memcpy is not a DMD compiler intrinsic
 like in many C/C++ compilers.

I converted memcpy calls to array copy but it become about 1 Mbps slower.


That should be MBps.
 That's strange. I've tried optimizing some of my code today, and
 changing slice copies to memcpy had the opposite effect. You are
 benchmarking with -O -release -inline, right?

Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is observed only with Windows DMD.
Dec 26 2011
parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Vladimir Panteleev wrote:
 On Monday, 26 December 2011 at 17:37:17 UTC, Piotr Szturmaj wrote:
 Yes. Here are the results: http://pastebin.com/rD8kiaQy. This is
 observed only with Windows DMD.

I'd be more interested in seeing the code.

Sorry for late answer. For memcpy cases code is the same as in my github Phobos fork. Here is the change to slice copying: http://pastebin.com/EteqEper
 I've done some more research on this. In release builds, DMD on Windows
 emits a memcpy call for a slice copy. However, the auto-generated memcpy
 call has slightly less overhead (register/stack shuffling) than a manual
 memcpy call, which explains the performance difference I was seeing.

Jan 06 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
 Johannes Pfau wrote:
 Related question to the SHA/MD5 hash functions: could those be pure?

Weakly pure, yes - but for what?

In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it. - Jonathan M Davis
Dec 23 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 24 December 2011 at 00:31:43 UTC, Piotr Szturmaj 
wrote:
 Jonathan M Davis wrote:
 On Friday, December 23, 2011 23:09:32 Piotr Szturmaj wrote:
 Johannes Pfau wrote:
 Related question to the SHA/MD5 hash functions: could those 
 be pure?

Weakly pure, yes - but for what?

In general, if a function _can_ be pure, it _should_ be pure. If it can be and it isn't, it artificially restricts the types of functions which can call it.

Yes, Johannes probably want to mark uuid hash gen as pure. I just wanted to know if its something important as my code used memcpy which is impure.

Where does your code use memcpy? I see one mention in the comments, but none in the code. Anyway, I believe you can do without memcpy by using array copy? Array copy might even be faster, since memcpy is not a DMD compiler intrinsic like in many C/C++ compilers.
Dec 24 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 25 December 2011 at 01:08:04 UTC, Piotr Szturmaj wrote:
 Where does your code use memcpy? I see one mention in the 
 comments, but
 none in the code.

See putArray() in base.d

Sorry, I lost track of the conversation. I was looking at uuid.d.
 Anyway, I believe you can do without memcpy by using array 
 copy? Array
 copy might even be faster, since memcpy is not a DMD compiler 
 intrinsic
 like in many C/C++ compilers.

I converted memcpy calls to array copy but it become about 1 Mbps slower.

I guess array copy is currently a runtime call rather than an intrinsic, then...
Dec 24 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 25 December 2011 at 01:08:04 UTC, Piotr Szturmaj wrote:
 Anyway, I believe you can do without memcpy by using array 
 copy? Array
 copy might even be faster, since memcpy is not a DMD compiler 
 intrinsic
 like in many C/C++ compilers.

I converted memcpy calls to array copy but it become about 1 Mbps slower.

That's strange. I've tried optimizing some of my code today, and changing slice copies to memcpy had the opposite effect. You are benchmarking with -O -release -inline, right?
Dec 24 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 26 December 2011 at 17:37:17 UTC, Piotr Szturmaj wrote:
 Yes. Here are the results: http://pastebin.com/rD8kiaQy. This 
 is observed only with Windows DMD.

I'd be more interested in seeing the code. I've done some more research on this. In release builds, DMD on Windows emits a memcpy call for a slice copy. However, the auto-generated memcpy call has slightly less overhead (register/stack shuffling) than a manual memcpy call, which explains the performance difference I was seeing.
Dec 28 2011
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Friday, 6 January 2012 at 21:10:50 UTC, Piotr Szturmaj wrote:
 Vladimir Panteleev wrote:
 On Monday, 26 December 2011 at 17:37:17 UTC, Piotr Szturmaj 
 wrote:
 Yes. Here are the results: http://pastebin.com/rD8kiaQy. This 
 is
 observed only with Windows DMD.

I'd be more interested in seeing the code.

Sorry for late answer. For memcpy cases code is the same as in my github Phobos fork. Here is the change to slice copying: http://pastebin.com/EteqEper

I haven't looked at the disassembly yet, but I'd suggest to rewrite your code so that the left side of the assignment is a slice expression beginning with 0. I think DMD will generate optimal code (memcpy with slightly less overhead than a manual call) if you make it clear to the compiler that the left-hand slice and the right-hand slice have the same length. Also, it looks like the slice version wastes an extra variable (bw).
Jan 06 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 24, 2011 01:31:46 Piotr Szturmaj wrote:
 Yes, Johannes probably want to mark uuid hash gen as pure. I just wanted
 to know if its something important as my code used memcpy which is impure.

We really need to find a way for the C memory functions to be considered pure like the D ones are short of having to use casts inside functions using them. Appender has the same problem for the same reason, I believe, which makes a lot of functions in Phobos not be able to be pure when they should be able to be. - Jonathan M Davis
Dec 23 2011