www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Review: std.uuid

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
The review process stalled long enough, let's kick start it with a small 
yet a valuable module that was there for quite some time.

std.uuid by Johannes Pfau is a rehash of it's C++ twin from well known 
Boost library.

The review cycle takes the usual 2 weeks starting today 9th June, ending 
at 23 June 00:00 UTC followed by a one week vote process ending at 30 June.

The module gives the following description:
---------------------
This is a port of boost.uuid from the boost project with some minor
additions and API changes for a more D-like API. A UUID, or Universally
unique identifier, is intended to uniquely identify information in a
distributed environment without significant central coordination. It
can be used to tag objects with very short lifetimes, or to reliably
identify very persistent objects across a network. UUIDs have many
applications. [...]
---------------------

Links & Info

Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

Additional note from Johannes:

The code and documentation for shaUUID has already been written,
but until phobos has support for SHA1, that can't be included. The code
is currently commented out in the source file (it's well tested
with some 3rd party SHA1 code), but the documentation for those
functions is included in the API-docs. I think those functions should
be reviewed as well, so that it's possible to add them to phobos with a
simple pull request at a later date.

-- 
Dmitry Olshansky
Jun 09 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2012 10:30 AM, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it with a small yet a
 valuable module that was there for quite some time.

Great! My initial suggestion is that the parsing routines should accept input ranges, rather just arrays.
Jun 09 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/10/2012 2:05 AM, Johannes Pfau wrote:
 Am Sat, 09 Jun 2012 23:06:25 -0700
 Great! My initial suggestion is that the parsing routines should
 accept input ranges, rather just arrays.

That could be done, but I currently keep the complete input string in the ParserException. That wouldn't be possible if the parser has to operate on an InputRange. It could work with a ForwardRange, but converting the Range back to a string wouldn't be very efficient. This would work best with some kind of BufferedRange/Stream concept. I'm just wondering whether it really makes sense to use ranges in this case. The text representation of UUIDs is usually exactly 36 characters long (for the simple parser it must be exactly 36 characters), so when using some kind of range interface and std.uuid it should be easy enough to read the UUID string into a stack buffer, then pass it to std.uuid. But if it's really desired, I can sure add an implementation using ForwardRanges.

I can see it wouldn't make sense for the simple parser, but for the more general one it may. You can also specialize the more general parser for arrays, so there won't be the overhead of making a copy.
Jun 10 2012
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.06.2012 21:30, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it with a small
 yet a valuable module that was there for quite some time.

Ok, let's grill it a bit ;) From browsing the docs alone: ParserException - too general name likely to collide with user code, call it UuidException instead InsufficientInputException - maybe merge it with ParserException, just add small <reason> field Why? - Typical user can't do much with this extra info. By the end of day an illegal UUID string is an illegal UUID string. Add a constructor that takes variadic ubyte[] arr.../byte[] arr.. that should just assert that argument has 16 bytes to hold UUID. (all these casts in docs ...) isNil - can we follow the precedent and use 'empty' for "has no useful state" as does for instance std.regex.Regex and all of Range API. Variant - may be confusing with as I thought of std.Variant till the bitter end :) uuidVersion - sadly we can't fix it's name ;) nilUUID should be a property that returns UUID.init or better an enum, no need to clutter object file with _TLS_ globals. parseUUID - like Walter pointed out, could use input range BTW functions taking a namespace UUID can take const UUID. extractRegex ---> uuidRegex what's easier from user standpoint? Now the code: line 587: swap surely could do better then this: swapRanges(this.data[], rhs.data[]); e.g. swap it as 4 uints or 2/4 size_ts. Main reason for built-in swap is efficiency. As it is now it won't surprise me if std.swap would be faster with its 3 bit-blits. line 1011: overload with no args Creating Random generator on each call is unacceptable. It's better to either remove it or use thread-local _static_ RNG here. -- Dmitry Olshansky
Jun 10 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 10.06.2012 15:22, Johannes Pfau wrote:
 Am Sun, 10 Jun 2012 13:02:16 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:

 On 09.06.2012 21:30, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it with a
 small yet a valuable module that was there for quite some time.

Ok, let's grill it a bit ;) From browsing the docs alone: ParserException - too general name likely to collide with user code, call it UuidException instead InsufficientInputException - maybe merge it with ParserException, just add small<reason> field Why? - Typical user can't do much with this extra info. By the end of day an illegal UUID string is an illegal UUID string.

OK. Is UUIDParserException too long? I'd like to keep 'parser' or something similar in the name, as almost all UUID operations are nothrow and 'UUIDException' sounds like a UUID may blow up at any time.
 Add a constructor that takes variadic ubyte[] arr.../byte[] arr..
 that should just assert that argument has 16 bytes to hold UUID. (all
 these casts in docs ...)

Good idea! But wouldn't it be better to use this(ubyte[16] uuid...) as described on http://dlang.org/function.html (Typesafe Variadic Functions: For static arrays) Meh - we still can't get rid of the casts: DMD complains: -------------- std/uuid.d(260): Error: template std.uuid.UUID.__ctor cannot deduce template function from argument types !()(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int) -------------- and AFAIK there's no integer suffix to specify a ubyte value, so we end up with this: UUID tmp = UUID(cast(ubyte)0,cast(ubyte)1,cast(ubyte)2,cast(ubyte)3, cast(ubyte)4,cast(ubyte)5,cast(ubyte)6,cast(ubyte)7,cast(ubyte)8, cast(ubyte)9,cast(ubyte)10,cast(ubyte)11,cast(ubyte)12, cast(ubyte)13,cast(ubyte)14,cast(ubyte)15);
 isNil - can we follow the precedent and use 'empty' for "has no
 useful state" as does for instance std.regex.Regex and all of Range
 API.

OK
 Variant - may be confusing with as I thought of std.Variant till the
 bitter end :)

Do you know a better name? Both the wikipedia article and rfc4122 call it variant though, so I'd like to keep that name. I could probably add a Note: Do not confuse with std.variant.Variant warning.
 uuidVersion - sadly we can't fix it's name ;)

We could use '_version' or 'version_' but I think uuidVersion is the better solution.
 nilUUID should be a property that returns UUID.init or better an
 enum, no need to clutter object file with _TLS_ globals.

It already is an enum? line 771: enum UUID nilUUID = UUID.init;
 parseUUID - like Walter pointed out, could use input range

See my response to Walter.
 BTW functions taking a namespace UUID can take const UUID.

OK. (Although the namespace uuids are passed by value, so that shouldn't be necessary)
 extractRegex --->  uuidRegex what's easier from user standpoint?

OK, renamed to uuidRegex
 Now the code:

 line 587: swap surely could do better then this:
               swapRanges(this.data[], rhs.data[]);
 e.g. swap it as 4 uints or 2/4 size_ts. Main reason for built-in swap
 is efficiency. As it is  now it won't surprise me if std.swap would
 be faster with its 3 bit-blits.

you mean std.algorithm.swap?
 line 1011: overload with no args
 	Creating Random generator on each call is unacceptable. It's
 better to either remove it or use thread-local _static_ RNG here.

You mean simply declaring randomGen static in line 1013? Then the seed function would still be called every time. Or making randomGen a private module level TLS variable and calling seed in the module constructor? What about the overload only taking a type? It also creates the RNG on every call. To avoid this I could make randomGen static, but there's no way to avoid calling seed every time. Another possibility is to make uuid generators structs instead of functions, but that complicates the API and the randomUUID overload which takes a RNG variable should already be as fast as that. BTW: Is it OK to update the code& docs while the review is running or should I wait with those changes till the review is finished?

It's fine as long as you keep as informed of updates ;) -- Dmitry Olshansky
Jun 10 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sat, 09 Jun 2012 23:06:25 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 6/9/2012 10:30 AM, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it with a
 small yet a valuable module that was there for quite some time.

Great! My initial suggestion is that the parsing routines should accept input ranges, rather just arrays.

That could be done, but I currently keep the complete input string in the ParserException. That wouldn't be possible if the parser has to operate on an InputRange. It could work with a ForwardRange, but converting the Range back to a string wouldn't be very efficient. This would work best with some kind of BufferedRange/Stream concept. I'm just wondering whether it really makes sense to use ranges in this case. The text representation of UUIDs is usually exactly 36 characters long (for the simple parser it must be exactly 36 characters), so when using some kind of range interface and std.uuid it should be easy enough to read the UUID string into a stack buffer, then pass it to std.uuid. But if it's really desired, I can sure add an implementation using ForwardRanges.
Jun 10 2012
prev sibling next sibling parent "Jonas Drewsen" <jdrewsen nospam.com> writes:
On Saturday, 9 June 2012 at 17:31:02 UTC, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it 
 with a small yet a valuable module that was there for quite 
 some time.

When looking at the associated RFC I noticed that a UUID can contain e.g. timestap, clock, node fields. What about providing properties that could return these fields? It would also be nice to have an index in the top of the page like std.algorithm has. Jonas
Jun 10 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 10 Jun 2012 12:03:15 +0200
schrieb "Jonas Drewsen" <jdrewsen nospam.com>:

 When looking at the associated RFC I noticed that a UUID can 
 contain e.g. timestap, clock, node fields. What about providing 
 properties that could return these fields?

Why would you need those? A UUID only needs to be unique, which is guaranteed by those node, timestamp and clock fields. But the exact value of node, timestamp, clock is never important, as long as it's unique. std.uuid can't create version 1&2 uuids so we never have to set these fields either.
 It would also be nice to have an index in the top of the page 
 like std.algorithm has.

OK, I'll have a look at that.
Jun 10 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 10 Jun 2012 13:02:16 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 On 09.06.2012 21:30, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it with a
 small yet a valuable module that was there for quite some time.

Ok, let's grill it a bit ;) From browsing the docs alone: ParserException - too general name likely to collide with user code, call it UuidException instead InsufficientInputException - maybe merge it with ParserException, just add small <reason> field Why? - Typical user can't do much with this extra info. By the end of day an illegal UUID string is an illegal UUID string.

OK. Is UUIDParserException too long? I'd like to keep 'parser' or something similar in the name, as almost all UUID operations are nothrow and 'UUIDException' sounds like a UUID may blow up at any time.
 
 Add a constructor that takes variadic ubyte[] arr.../byte[] arr..
 that should just assert that argument has 16 bytes to hold UUID. (all
 these casts in docs ...)

Good idea! But wouldn't it be better to use this(ubyte[16] uuid...) as described on http://dlang.org/function.html (Typesafe Variadic Functions: For static arrays) Meh - we still can't get rid of the casts: DMD complains: -------------- std/uuid.d(260): Error: template std.uuid.UUID.__ctor cannot deduce template function from argument types !()(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int) -------------- and AFAIK there's no integer suffix to specify a ubyte value, so we end up with this: UUID tmp = UUID(cast(ubyte)0,cast(ubyte)1,cast(ubyte)2,cast(ubyte)3, cast(ubyte)4,cast(ubyte)5,cast(ubyte)6,cast(ubyte)7,cast(ubyte)8, cast(ubyte)9,cast(ubyte)10,cast(ubyte)11,cast(ubyte)12, cast(ubyte)13,cast(ubyte)14,cast(ubyte)15);
 
 isNil - can we follow the precedent and use 'empty' for "has no
 useful state" as does for instance std.regex.Regex and all of Range
 API.

OK
 
 Variant - may be confusing with as I thought of std.Variant till the 
 bitter end :)

Do you know a better name? Both the wikipedia article and rfc4122 call it variant though, so I'd like to keep that name. I could probably add a Note: Do not confuse with std.variant.Variant warning.
 uuidVersion - sadly we can't fix it's name ;)

We could use '_version' or 'version_' but I think uuidVersion is the better solution.
 
 nilUUID should be a property that returns UUID.init or better an
 enum, no need to clutter object file with _TLS_ globals.

It already is an enum? line 771: enum UUID nilUUID = UUID.init;
 parseUUID - like Walter pointed out, could use input range

See my response to Walter.
 BTW functions taking a namespace UUID can take const UUID.

OK. (Although the namespace uuids are passed by value, so that shouldn't be necessary)
 extractRegex ---> uuidRegex what's easier from user standpoint?

OK, renamed to uuidRegex
 
 Now the code:
 
 line 587: swap surely could do better then this:
              swapRanges(this.data[], rhs.data[]);
 e.g. swap it as 4 uints or 2/4 size_ts. Main reason for built-in swap
 is efficiency. As it is  now it won't surprise me if std.swap would
 be faster with its 3 bit-blits.

you mean std.algorithm.swap?
 
 line 1011: overload with no args
 	Creating Random generator on each call is unacceptable. It's
 better to either remove it or use thread-local _static_ RNG here.
 
 

You mean simply declaring randomGen static in line 1013? Then the seed function would still be called every time. Or making randomGen a private module level TLS variable and calling seed in the module constructor? What about the overload only taking a type? It also creates the RNG on every call. To avoid this I could make randomGen static, but there's no way to avoid calling seed every time. Another possibility is to make uuid generators structs instead of functions, but that complicates the API and the randomUUID overload which takes a RNG variable should already be as fast as that. BTW: Is it OK to update the code & docs while the review is running or should I wait with those changes till the review is finished?
Jun 10 2012
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 09 Jun 2012 21:30:57 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges Both the source file and the documentation have been updated.
Jun 10 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.06.2012 15:46, Johannes Pfau wrote:
 Am Mon, 11 Jun 2012 13:12:49 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sun, 10 Jun 2012 18:49:03 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)

* randomUUID now creates the RNG only once per thread. * randomUUID lazily seeds the RNG once (per thread) on the first function call * randomUUID now uses the new seed overload (requires https://github.com/D-Programming-Language/phobos/pull/627 ) * the unsafe randomUUID overload taking only a type has been removed.

Judging by the swift rate of fixes and keeping in mind that it's only ~1.5 KLOC with abundance of unit tests I'm tempted to suggest the following. Let's trim the review time span a bit, so that review ends at 18-19 June and voting ends at 25-26 June. That's, of course, given that you and others don't mind it. How does it sound? -- Dmitry Olshansky
Jun 12 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.06.2012 18:18, Johannes Pfau wrote:
 Am Tue, 12 Jun 2012 16:15:48 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:

 Judging by the swift rate of fixes and keeping in mind that it's only
 ~1.5 KLOC with abundance of unit tests I'm tempted to suggest the
 following.

 Let's trim the review time span a bit, so that review ends at 18-19
 June and voting ends at 25-26 June. That's, of course, given that you
 and others don't mind it.

 How does it sound?

Sure, sounds good to me.

Great! I'll wait a day or two just to make sure there is no opposition among reviewers :) -- Dmitry Olshansky
Jun 12 2012
prev sibling next sibling parent "Tobias Pankrath" <tobias pankrath.net> writes:
There are some references to a nilUUID left and the intro says, 
that UUIDs default to nil. I'd recommend not to use nil at all 
and instead talk about an empty UUID or UUID.init. The intro 
should state that an UUID is empty iff it eqauls UUID.init and 
that UUID.init is an UUID with all bytes zero.
Jun 12 2012
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16.06.2012 15:30, Johannes Pfau wrote:
 Am Tue, 12 Jun 2012 13:46:17 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Mon, 11 Jun 2012 13:12:49 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sun, 10 Jun 2012 18:49:03 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)

* randomUUID now creates the RNG only once per thread. * randomUUID lazily seeds the RNG once (per thread) on the first function call * randomUUID now uses the new seed overload (requires https://github.com/D-Programming-Language/phobos/pull/627 ) * the unsafe randomUUID overload taking only a type has been removed.

* Add changes suggested by Jonathan M Davis (see https://github.com/jpf91/phobos/commit/91cd1c1f4385cfe1cd868c0720aff257d9436822 )

Surely, a better way then making a global is to routinely check for zeros. Plus, you can cast .ptr to size_t* and compare word by word in 4 or 2 ops(!). You'd better watch out for __ctfe though, it deon't allow casting pointers this way. -- Dmitry Olshansky
Jun 16 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16.06.2012 21:06, Dmitry Olshansky wrote:
 On 16.06.2012 15:30, Johannes Pfau wrote:
 Am Tue, 12 Jun 2012 13:46:17 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Mon, 11 Jun 2012 13:12:49 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sun, 10 Jun 2012 18:49:03 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)

* randomUUID now creates the RNG only once per thread. * randomUUID lazily seeds the RNG once (per thread) on the first function call * randomUUID now uses the new seed overload (requires https://github.com/D-Programming-Language/phobos/pull/627 ) * the unsafe randomUUID overload taking only a type has been removed.

* Add changes suggested by Jonathan M Davis (see https://github.com/jpf91/phobos/commit/91cd1c1f4385cfe1cd868c0720aff257d9436822 )

Surely, a better way then making a global is to routinely check for zeros. Plus, you can cast .ptr to size_t* and compare word by word in 4 or 2 ops(!). You'd better watch out for __ctfe though, it deon't allow casting pointers this way.

Ah and another way to go about it is: union { ubyte[16] uuid; size_t[16/size_t.sizeof] by_word; } -- Dmitry Olshansky
Jun 16 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.06.2012 12:04, Johannes Pfau wrote:
 Am Sat, 16 Jun 2012 21:11:51 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Ah and another way to go about it is:
 union {
 	ubyte[16] uuid;
 	size_t[16/size_t.sizeof] by_word;
 }

Isn't that an optimization which should really be done by the compiler? It already knows that it's supposed to compare two ubyte[16]...

It knows that you compare two ubyte[16] that it. It easily might miss the fact that one of them is always 0 in all 16 cells.
 Also how could the union solution be used without having to copy the
 data?

There is no copy it union, aka overlapped storage. In other words as these 16 bytes represented as (on 32bit) 4 size_t. They share memory location. It doesn't play nice with CTFE though, as it thinks unions to be plain struct last time I checked. -- Dmitry Olshansky
Jun 17 2012
parent reply =?ISO-8859-1?Q?Alex_R=F8nne_Petersen?= <alex lycus.org> writes:
On 17-06-2012 10:22, Johannes Pfau wrote:
 Am Sun, 17 Jun 2012 12:15:00 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:

 On 17.06.2012 12:04, Johannes Pfau wrote:
 Am Sat, 16 Jun 2012 21:11:51 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Ah and another way to go about it is:
 union {
 	ubyte[16] uuid;
 	size_t[16/size_t.sizeof] by_word;
 }

Isn't that an optimization which should really be done by the compiler? It already knows that it's supposed to compare two ubyte[16]...

It knows that you compare two ubyte[16] that it. It easily might miss the fact that one of them is always 0 in all 16 cells.
 Also how could the union solution be used without having to copy the
 data?

There is no copy it union, aka overlapped storage. In other words as these 16 bytes represented as (on 32bit) 4 size_t. They share memory location. It doesn't play nice with CTFE though, as it thinks unions to be plain struct last time I checked.

Yes, I thought about using the union nested in the empty function, so a copy would have been needed: bool empty() { union { ubyte[16] uuid; .... } uuid = data; //copy } I didn't know that it's possible to make members of a union private though, so I could use this in the UUID struct: union { ubyte[16] data; private size_t[16/size_t.sizeof] by_word; } which indeed wouldn't require copying. However, with this code added std.uuid fails some unrelated ctfe UUID comparison unittests...

I wouldn't expect unions to be CTFE-safe at all. They allow reinterpreting anything as anything, which could screw everything up in a managed environment like the CTFE interpreter. Anyway, isn't this optimizing something that really doesn't need optimization? I'd be very surprised if this shows up in any profiling results at all. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 17 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.06.2012 12:36, Alex Rønne Petersen wrote:
 On 17-06-2012 10:22, Johannes Pfau wrote:
 Am Sun, 17 Jun 2012 12:15:00 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:

 On 17.06.2012 12:04, Johannes Pfau wrote:
 Am Sat, 16 Jun 2012 21:11:51 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Ah and another way to go about it is:
 union {
 ubyte[16] uuid;
 size_t[16/size_t.sizeof] by_word;
 }

Isn't that an optimization which should really be done by the compiler? It already knows that it's supposed to compare two ubyte[16]...

It knows that you compare two ubyte[16] that it. It easily might miss the fact that one of them is always 0 in all 16 cells.
 Also how could the union solution be used without having to copy the
 data?

There is no copy it union, aka overlapped storage. In other words as these 16 bytes represented as (on 32bit) 4 size_t. They share memory location. It doesn't play nice with CTFE though, as it thinks unions to be plain struct last time I checked.

Yes, I thought about using the union nested in the empty function, so a copy would have been needed: bool empty() { union { ubyte[16] uuid; .... } uuid = data; //copy } I didn't know that it's possible to make members of a union private though, so I could use this in the UUID struct: union { ubyte[16] data; private size_t[16/size_t.sizeof] by_word; } which indeed wouldn't require copying. However, with this code added std.uuid fails some unrelated ctfe UUID comparison unittests...


bool empty() nothrow safe ... { if(__ctfe) return find!"a!=0"(data).empty; //simple size_t* p = cast(size_t*)data.ptr; static if(size_t.szieof == 4) return p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0; else static if(size_t.sizeof == 8) return p[0] == 0 && [1] == 0; else static assert(false, "nonsense, it's not 32 or 64 bit"); }
 I wouldn't expect unions to be CTFE-safe at all. They allow
 reinterpreting anything as anything, which could screw everything up in
 a managed environment like the CTFE interpreter.

 Anyway, isn't this optimizing something that really doesn't need
 optimization? I'd be very surprised if this shows up in any profiling
 results at all.

The "pleasure" of writing standard library is that you'll never know what is bottleneck. All depends on users. And the more you have them the greater impact of things that have "one in a million" probability. -- Dmitry Olshansky
Jun 17 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 10 Jun 2012 07:22:21 -0700
schrieb Walter Bright <newshound2 digitalmars.com>:

 On 6/10/2012 2:05 AM, Johannes Pfau wrote:
 Am Sat, 09 Jun 2012 23:06:25 -0700
 Great! My initial suggestion is that the parsing routines should
 accept input ranges, rather just arrays.

That could be done, but I currently keep the complete input string in the ParserException. That wouldn't be possible if the parser has to operate on an InputRange. It could work with a ForwardRange, but converting the Range back to a string wouldn't be very efficient. This would work best with some kind of BufferedRange/Stream concept. I'm just wondering whether it really makes sense to use ranges in this case. The text representation of UUIDs is usually exactly 36 characters long (for the simple parser it must be exactly 36 characters), so when using some kind of range interface and std.uuid it should be easy enough to read the UUID string into a stack buffer, then pass it to std.uuid. But if it's really desired, I can sure add an implementation using ForwardRanges.

I can see it wouldn't make sense for the simple parser, but for the more general one it may. You can also specialize the more general parser for arrays, so there won't be the overhead of making a copy.

OK I'll do that
Jun 10 2012
prev sibling next sibling parent "Jonas Drewsen" <jdrewsen nospam.com> writes:
On Sunday, 10 June 2012 at 11:00:51 UTC, Johannes Pfau wrote:
 Am Sun, 10 Jun 2012 12:03:15 +0200
 schrieb "Jonas Drewsen" <jdrewsen nospam.com>:

 When looking at the associated RFC I noticed that a UUID can 
 contain e.g. timestap, clock, node fields. What about 
 providing properties that could return these fields?

Why would you need those? A UUID only needs to be unique, which is guaranteed by those node, timestamp and clock fields. But the exact value of node, timestamp, clock is never important, as long as it's unique. std.uuid can't create version 1&2 uuids so we never have to set these fields either.

I do not know actually. If you believe that they are not usable anyway then please ignore my suggestion.
 It would also be nice to have an index in the top of the page 
 like std.algorithm has.

OK, I'll have a look at that.

Great /Jonas
Jun 10 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 10 Jun 2012 18:49:03 +0200
schrieb Johannes Pfau <nospam example.com>:

 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)
Jun 11 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Mon, 11 Jun 2012 13:12:49 +0200
schrieb Johannes Pfau <nospam example.com>:

 Am Sun, 10 Jun 2012 18:49:03 +0200
 schrieb Johannes Pfau <nospam example.com>:
 
 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)

* randomUUID now creates the RNG only once per thread. * randomUUID lazily seeds the RNG once (per thread) on the first function call * randomUUID now uses the new seed overload (requires https://github.com/D-Programming-Language/phobos/pull/627 ) * the unsafe randomUUID overload taking only a type has been removed.
Jun 12 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Tue, 12 Jun 2012 14:24:15 +0200
schrieb "Tobias Pankrath" <tobias pankrath.net>:

 There are some references to a nilUUID left and the intro says, 
 that UUIDs default to nil. I'd recommend not to use nil at all 
 and instead talk about an empty UUID or UUID.init. The intro 
 should state that an UUID is empty iff it eqauls UUID.init and 
 that UUID.init is an UUID with all bytes zero.

oops. The references to nil definitely have to go. Thanks for letting me know. Do you mean I should also drop nilUUID completely and use UUID.init instead everywhere in the module?
Jun 12 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Tue, 12 Jun 2012 16:15:48 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 
 Judging by the swift rate of fixes and keeping in mind that it's only 
 ~1.5 KLOC with abundance of unit tests I'm tempted to suggest the
 following.
 
 Let's trim the review time span a bit, so that review ends at 18-19
 June and voting ends at 25-26 June. That's, of course, given that you
 and others don't mind it.
 
 How does it sound?
 

Sure, sounds good to me.
Jun 12 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
If we use all caps for abbreviations then the names should be 
SHA1UUID, MD5UUID and UUIDVersion?
Jun 13 2012
prev sibling next sibling parent Kevin Cox <kevincox.ca gmail.com> writes:
--0015175cb67af84f7004c258e8ce
Content-Type: text/plain; charset=UTF-8

On Jun 13, 2012 7:23 AM, "Kagamin" <spam here.lot> wrote:
 If we use all caps for abbreviations then the names should be SHA1UUID,

I believe tr naming scheme is acronyms have the same case. So if an acronym is first it is all lowercase otherwise all uppercase. --0015175cb67af84f7004c258e8ce Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable <p><br> On Jun 13, 2012 7:23 AM, &quot;Kagamin&quot; &lt;spam here.lot&gt; wrote:<b= r> &gt;<br> &gt; If we use all caps for abbreviations then the names should be SHA1UUID= , MD5UUID and UUIDVersion?</p> <p>I believe tr naming scheme is acronyms have the same case.=C2=A0 So if a= n acronym is first it is all lowercase otherwise all uppercase.<br> </p> --0015175cb67af84f7004c258e8ce--
Jun 13 2012
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, June 09, 2012 21:30:57 Dmitry Olshansky wrote:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

* I'm not terribly fond of having names like Variant and Version, but they _are_ local to UUID, so I guess that they're clear enough. But the fact that you have to put a note about it _not_ being std.variant.Variant definitely indicates that another name might be better. I know that the RFC uses the word variant, but it also indicates that type would probably be a better name, and if Variant is unclear enough that a note is needed, then perhaps it really isn't the right name. * nameBasedMd5 should probably be nameBasedMD5, and nameBasedSha1 should probably be nameBasedSHA1. The correct names are MD5 and SHA-1, not md5 and sha1, so it would be more correct to use nameBasedMD5 and nameBasedSHA1. With acronyms, the thing that most of Phobos does is to keep all of the letters the same case, so if the first letter is lowercase, then the whole thing is lowercase, and if the first letter is uppercase, then the whole thing is uppercase. I'm not sure if MD5 and SHA-1 are acronyms or not, but they're similar enough, that I'd say that the same naming scheme should be used, which you do in most cases, but you didn't with the enums for some reason. * For clarity's sake, please always put const after the function signatures for all const functions. In same cases you do, and in others you don't. * If all that's preventing toString from being nothrow is idup, then just do this: string toString() safe const pure nothrow { try return _toString().idup; catch(Exception e) assert(0, "It should be impossible for idup to throw."); } and if there's a bug report for idup, then list it in a comment so that we know to get rid of the try-catch block once that's been fixed (and if there isn't a bug report, one should be created). * Get rid of nilUUID. Just use UUID.init. It just seems like an extra, unnecessary complication when the two are identical. * Please use SHA-1 in the documentation rather than SHA1, since SHA-1 is the correct name. Also, make sure that it's SHA-1 and not just SHA. You're not always consistent (e.g. take a look at sha1UUID). * parseUUID should be clear on whether it takes both lowercase and uppercase hex digits, and I'd argue that it should definitely accept both. [0-9a-f] implies that it only accepts lowercase hex digits. * Use std.exception's assertThrown. It'll make your tests cleaner. And if you can't do that, because you need to look at the exception itself, then use collectException. So, thrown = false; try //make sure 36 characters in total or we'll get a 'tooMuch' reason id = UUID("{8ab3060e-2cba-4f23-b74c-b52db3bdf6}"); catch(UUIDParserException e) { assert(e.reason == UUIDParserException.Reason.invalidChar); thrown = true; } assert(thrown); becomes //make sure 36 characters in total or we'll get a 'tooMuch' reason auto e = collectException!UUIDParserException(UUID("{8ab3060e-2cba-4f23-b74c- b52db3bdf6}")); assert(e !is null); assert(e.reason == UUIDParserException.Reason.invalidChar); It's much shorter that way and less error-prone. * You should probably change empty's body to enum ubyte[16] emptyUUID = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]; return data == emptyUUID[]; I'm not sure whether it currently allocates or not if you do that (I _think_ that it'll do the heap allocation at compile time and then only have the value type at runtime), but regardless of whether it does or not, I'd expect that once it's fixed so that assigning an array literal to a static array, it definitely _won't_ allocate on the heap, whereas your current implementation will regardless of that fix. And if that doesn't seem like a good change, then you can just make a module- level or struct variable which is immutable ubyte[16] and have empty use that. As it stands, it's always going to allocate the array literal on the heap. It's probably not a big deal, since empty probably won't be called all that often, but if we can easily avoid the extra heap allocation, we should. * I know that void toString(scope void delegate(const(char)[]) sink) is what has been previously discussed for a toString which doesn't allocate, but I have to wonder why we don't just use an output range. I suppose that that's a whole other discussion, but that's what I would have suggested rather than a delegate - though in some circumstances it probably _is_ easier to just create a nested function rather than a whole output range. Maybe we should be moving to having 3 overloads of toString (normal, accepts delegate, and accepts output range). * rndGen returns a Random (which is already aliased to Mt19937), and already initializes it with an unpredictable seed. So, why not just us rndGen in randomUUID? It becomes return randomUUID(rndGen()); It's much shorter and just as good as far as I can tell. * Change randomValue in randomUUID to auto. The default random number generator _does_ use uint, but others could use ubyte, ushort, or ulong instead. I'd also be tempted to argue that you should make it immutable and use a different variable for each random value (you won't ever accidentally fail to set it to a new value that way - that I don't think that reusing variables is a good idea in general), but the benefits of that are debatable. Actually, just make it a loop. Something like foreach(size_t i; iota(0, 16, 4)) { randomGen.popFront(); immutable randomValue = randomGen.front; u.data[i .. i + 4] = *cast(ubyte[4]*)&randomValue; } That's probably slightly less efficient (though I wouldn't think by much), but it's much cleaner. And actually, since the randomValue isn't guarantee to be a uint, but your code needs it to be 4 bytes, you should probably change that line to immutable uint randomValue = cast(uint)randomGen.front; * I'd advise giving more descriptive names to parseError's parameters. It's true that it's a nested function and not in the actual API, but it would make the function easier to understand. * You should probably use R or Range for template parameters which are ranges. That's the typical thing to do. T is usually used for generic types rather than for ranges. The code will be somewhat easier to read if you use R or Range given that that's how the rest of Phobos is. * In parseUUID, element should be a size_t, since it's used specifically for indexing. As it stands, the compiler is going to have to do extra conversions in some of the places that it's used (e.g. converting it to size_t when indexing). * In the unit tests, I'd argue that it would be cleaner to just use the result directly in an assertion rather than saving it if it's not going to be used outside of the assertion. For instance, id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d); assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 136, 106]); should probably be assert(parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d).data == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 136, 106]); It's generally shorter, cleaner, and avoids the risk of reusing the variable without setting it again. * For your functions which can take range types, test with filter!"true" so that you get a range which isn't an array or string. * When testing various string types, you should take advantage of TypeTuple and std.conv.to to be thorough. For instance, instead of id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d); assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 136, 106]); id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"w); assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 136, 106]); do something like foreach(S; TypeTuple!(char, const char, immutable char, wchar, const wchar, immutable wchar, dchar, const dchar, immutable dchar)) { assert(parseUUID(to!S("5668122d-9df0-49a4-ad0b-b9b0a57f886a")).data == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 136, 106]); } And while you're at it, you could throw _all_ of your string tests in that loop save for those which wouldn't really make sense to put in there (and you probably don't want to put very many exception tests in the loop, since those get to be expensive). You get much better testing this way. However, do make sure that some of the exception tests use multiple string types even if they aren't in the TypeTuple loop, since some of your exceptions (in parseUUID in particular) depend on the type of range passed in. * I'm not sure if you do or not, but if you have a spot where you have to create a UUIDParserException from another exception but don't want to have to specify the file and line number, then you should create an overload of UUIDParserException which takes the Throwable first (and probably forwards to the other constructor to avoid code duplication). Exception does this, and it can be quite useful. But UUIDParserException can't be publicly constructed, so only do it if it's actually useful within std.uuid. * Personally, I'd prefer UUIDParsingException to UUIDParserException, since you're not writing a full-on parser but rather just doing parsing in some of your functions. But that's fairly subjective. - Jonathan M Davis
Jun 13 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-06-14 03:57, Jonathan M Davis wrote:
 On Saturday, June 09, 2012 21:30:57 Dmitry Olshansky wrote:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

* I'm not terribly fond of having names like Variant and Version, but they _are_ local to UUID, so I guess that they're clear enough. But the fact that you have to put a note about it _not_ being std.variant.Variant definitely indicates that another name might be better. I know that the RFC uses the word variant, but it also indicates that type would probably be a better name, and if Variant is unclear enough that a note is needed, then perhaps it really isn't the right name.

I understand exactly what you're saying here and it's probably good to not have too many conflicting names in the standard library. But at the same time it feels like the whole point of modules go to waste and we're back to C where everything is in the same global namespace. -- /Jacob Carlborg
Jun 13 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 14.06.2012 10:35, Jacob Carlborg wrote:
 On 2012-06-14 03:57, Jonathan M Davis wrote:
 On Saturday, June 09, 2012 21:30:57 Dmitry Olshansky wrote:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

* I'm not terribly fond of having names like Variant and Version, but they _are_ local to UUID, so I guess that they're clear enough. But the fact that you have to put a note about it _not_ being std.variant.Variant definitely indicates that another name might be better. I know that the RFC uses the word variant, but it also indicates that type would probably be a better name, and if Variant is unclear enough that a note is needed, then perhaps it really isn't the right name.

I understand exactly what you're saying here and it's probably good to not have too many conflicting names in the standard library. But at the same time it feels like the whole point of modules go to waste and we're back to C where everything is in the same global namespace.

It feels this way because by default we import all symbols. The good thing is that you don't care for conflicts as long as you don't touch the conflicting name. One day we'd just have to use static import more often. -- Dmitry Olshansky
Jun 13 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-06-14 08:57, Dmitry Olshansky wrote:

 It feels this way because by default we import all symbols. The good
 thing is that you don't care for conflicts as long as you don't touch
 the conflicting name.

That's a good thing.
 One day we'd just have to use static import more often.

Or aliases, or renamed imports. -- /Jacob Carlborg
Jun 14 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-06-14 09:49, Jonathan M Davis wrote:

 It is and it isn't. It makes it much easier to end up with conflicts. It would
 be less of a problem if not all symbols were imported by default. We _do_ have
 some great tools for dealing with conflicts, but the fact that everything gets
 imported by default when you import a module means that it's very easy to
 create conflicts, and when that's coupled with the fact that D is very quick to
 declare ambiguities and conflicts with overload sets and whatnot rather than
 trying to determine which one you really meant (like C++ would) makes it that
 much worse. There are definite pros to the situation (e.g. being so picky with
 overload sets makes it much harder to call the wrong function without knowing
 it), but there's no question that conflicts happen quite easily. It's not as
 big a deal with introducing a new module, since the programmer will deal with
 it when they import it for the first time, but it tends to break code when
 adding new stuff to old modules.

 So, in general, if we can pick good names which don't conflict, we're better
 off. But if the best names are ones that are used in other modules, then that's
 what we'll end up going with. It has to be judged on a case-by-case basis.

I was referring to that it's good that a conflict will not occur just by importing modules with conflicting names. You also have to _use_ them.
 aliases are useless for dealing with conflicts as long as private aliases
 aren't hidden. At present, I'd argue that private aliases are bad practice
 unless you alias things to very esoteric names which aren't likely to conflict.
 It would really be nice to have that fixed though.

Right, but that's a bug. -- /Jacob Carlborg
Jun 14 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-06-14 12:04, Jonathan M Davis wrote:

 Actually, it's not. It's by design. private never hides _anything_ and is
 always considered _after_ overload resolution. That's how it works in C++, and
 that's how it works in D. If we want it to change, we need to convince Walter.
 Personally, I think that we'd be better off if we made private hidden, but I
 don't know if we can convince Walter of that. Regardless, someone else will
 probably have to implement the changes. IIRC, someone had done something along
 those lines and created a pull request, but I don't know what happened with
 it.

Oh, yeah, I remember that now. I really don't a reason why private aliases shouldn't be hidden. -- /Jacob Carlborg
Jun 14 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, June 14, 2012 09:41:53 Jacob Carlborg wrote:
 On 2012-06-14 08:57, Dmitry Olshansky wrote:
 It feels this way because by default we import all symbols. The good
 thing is that you don't care for conflicts as long as you don't touch
 the conflicting name.

That's a good thing.

It is and it isn't. It makes it much easier to end up with conflicts. It would be less of a problem if not all symbols were imported by default. We _do_ have some great tools for dealing with conflicts, but the fact that everything gets imported by default when you import a module means that it's very easy to create conflicts, and when that's coupled with the fact that D is very quick to declare ambiguities and conflicts with overload sets and whatnot rather than trying to determine which one you really meant (like C++ would) makes it that much worse. There are definite pros to the situation (e.g. being so picky with overload sets makes it much harder to call the wrong function without knowing it), but there's no question that conflicts happen quite easily. It's not as big a deal with introducing a new module, since the programmer will deal with it when they import it for the first time, but it tends to break code when adding new stuff to old modules. So, in general, if we can pick good names which don't conflict, we're better off. But if the best names are ones that are used in other modules, then that's what we'll end up going with. It has to be judged on a case-by-case basis.
 One day we'd just have to use static import more often.

Or aliases, or renamed imports.

aliases are useless for dealing with conflicts as long as private aliases aren't hidden. At present, I'd argue that private aliases are bad practice unless you alias things to very esoteric names which aren't likely to conflict. It would really be nice to have that fixed though. - Jonathan M Davis
Jun 14 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 09.06.2012 21:30, Dmitry Olshansky wrote:
 The review process stalled long enough, let's kick start it with a small
 yet a valuable module that was there for quite some time.

 std.uuid by Johannes Pfau is a rehash of it's C++ twin from well known
 Boost library.

 The review cycle takes the usual 2 weeks starting today 9th June, ending
 at 23 June 00:00 UTC followed by a one week vote process ending at 30 June.

Attention: we changing the review period, it'll take 5 days less in review. ----- Given the small scale of module with approval from Johannes, the review period is shortened: Review ends at 18-19 00:00 UTC and similarly voting ends at 25-26 June. ----- -- Dmitry Olshansky
Jun 14 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, June 14, 2012 11:55:51 Jacob Carlborg wrote:
 aliases are useless for dealing with conflicts as long as private aliases
 aren't hidden. At present, I'd argue that private aliases are bad practice
 unless you alias things to very esoteric names which aren't likely to
 conflict. It would really be nice to have that fixed though.

Right, but that's a bug.

Actually, it's not. It's by design. private never hides _anything_ and is always considered _after_ overload resolution. That's how it works in C++, and that's how it works in D. If we want it to change, we need to convince Walter. Personally, I think that we'd be better off if we made private hidden, but I don't know if we can convince Walter of that. Regardless, someone else will probably have to implement the changes. IIRC, someone had done something along those lines and created a pull request, but I don't know what happened with it. - Jonathan M Davis
Jun 14 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Wed, 13 Jun 2012 18:57:51 -0700
schrieb Jonathan M Davis <jmdavisProg gmx.com>:

 * I'm not terribly fond of having names like Variant and Version, but
 they _are_ local to UUID, so I guess that they're clear enough. But
 the fact that you have to put a note about it _not_ being
 std.variant.Variant definitely indicates that another name might be
 better. I know that the RFC uses the word variant, but it also
 indicates that type would probably be a better name, and if Variant
 is unclear enough that a note is needed, then perhaps it really isn't
 the right name.

Well I added the note as a result of this review. I think whether the name variant is confusing depends a lot on whether you've already used a UUID library or std.variant. Naming it 'Type' might confuse those who already worked with UUIDs. And 'Type' is a very generic name, even more than UUID and at some point we'll have other 'Type' symbols in phobos as well. So I'd rather like to keep the name consistent with the RFC.
 
 * nameBasedMd5 should probably be nameBasedMD5, and nameBasedSha1
 should probably be nameBasedSHA1. The correct names are MD5 and
 SHA-1, not md5 and sha1, so it would be more correct to use
 nameBasedMD5 and nameBasedSHA1. With acronyms, the thing that most of
 Phobos does is to keep all of the letters the same case, so if the
 first letter is lowercase, then the whole thing is lowercase, and if
 the first letter is uppercase, then the whole thing is uppercase. I'm
 not sure if MD5 and SHA-1 are acronyms or not, but they're similar
 enough, that I'd say that the same naming scheme should be used,
 which you do in most cases, but you didn't with the enums for some
 reason.

 * For clarity's sake, please always put const after the function
 signatures for all const functions. In same cases you do, and in
 others you don't.

list, as in ' safe nothrow const pure')
 
 * If all that's preventing toString from being nothrow is idup, then
 just do this:
 
 string toString()  safe const pure nothrow
 {
     try
         return _toString().idup;
     catch(Exception e)
         assert(0, "It should be impossible for idup to throw.");
 }
 
 and if there's a bug report for idup, then list it in a comment so
 that we know to get rid of the try-catch block once that's been fixed
 (and if there isn't a bug report, one should be created).

_toString is already nothrow, so yes, it's only idups fault ;-) Added links to http://d.puremagic.com/issues/show_bug.cgi?id=5700
 
 * Get rid of nilUUID. Just use UUID.init. It just seems like an
 extra, unnecessary complication when the two are identical.

 
 * Please use SHA-1 in the documentation rather than SHA1, since SHA-1
 is the correct name. Also, make sure that it's SHA-1 and not just
 SHA. You're not always consistent (e.g. take a look at sha1UUID).

 
 * parseUUID should be clear on whether it takes both lowercase and
 uppercase hex digits, and I'd argue that it should definitely accept
 both. [0-9a-f] implies that it only accepts lowercase hex digits.

 * Use std.exception's assertThrown. It'll make your tests cleaner.
 And if you can't do that, because you need to look at the exception
 itself, then use collectException. So,
 
 thrown = false;
 try
     //make sure 36 characters in total or we'll get a 'tooMuch' reason
     id = UUID("{8ab3060e-2cba-4f23-b74c-b52db3bdf6}");
 catch(UUIDParserException e)
 {
     assert(e.reason == UUIDParserException.Reason.invalidChar);
     thrown = true;
 }
 assert(thrown);
 
 becomes
 
 //make sure 36 characters in total or we'll get a 'tooMuch' reason
 auto e =
 collectException!UUIDParserException(UUID("{8ab3060e-2cba-4f23-b74c-
 b52db3bdf6}")); assert(e !is null);
 assert(e.reason == UUIDParserException.Reason.invalidChar);
 
 It's much shorter that way and less error-prone.

OK
 * You should probably change empty's body to
 
 enum ubyte[16] emptyUUID = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
 return data == emptyUUID[];
 
 I'm not sure whether it currently allocates or not if you do that (I
 _think_ that it'll do the heap allocation at compile time and then
 only have the value type at runtime), but regardless of whether it
 does or not, I'd expect that once it's fixed so that assigning an
 array literal to a static array, it definitely _won't_ allocate on
 the heap, whereas your current implementation will regardless of that
 fix.
 
 And if that doesn't seem like a good change, then you can just make a
 module- level or struct variable which is immutable ubyte[16] and
 have empty use that. As it stands, it's always going to allocate the
 array literal on the heap. It's probably not a big deal, since empty
 probably won't be called all that often, but if we can easily avoid
 the extra heap allocation, we should.

I totally agree, we shouldn't allocate there.
 * I know that
 
 void toString(scope void delegate(const(char)[]) sink)
 
 is what has been previously discussed for a toString which doesn't
 allocate, but I have to wonder why we don't just use an output range.
 I suppose that that's a whole other discussion, but that's what I
 would have suggested rather than a delegate - though in some
 circumstances it probably _is_ easier to just create a nested
 function rather than a whole output range. Maybe we should be moving
 to having 3 overloads of toString (normal, accepts delegate, and
 accepts output range).

Probably, but as you said that's a whole other discussion :-)
 
 * rndGen returns a Random (which is already aliased to Mt19937), and
 already initializes it with an unpredictable seed. So, why not just
 us rndGen in randomUUID? It becomes
 
 return randomUUID(rndGen());
 
 It's much shorter and just as good as far as I can tell.

Good catch. We'd have to change rndGen to use the new seed function though. And the documentation for Random says: "You may want to use it if [...] you don't care for the minutiae of the method being used" I'm not sure, do we care about the method? We need something which generates 'good' random numbers, I don't think a LCG would be acceptable. But rndGen could default to something like that on embedded systems? Then again, I know nothing about random number generators, so someone please tell me: Do we need Mersenne twister for random UUIDS or could we use any RNG?
 
 * Change randomValue in randomUUID to auto. The default random number 
 generator _does_ use uint, but others could use ubyte, ushort, or
 ulong instead. I'd also be tempted to argue that you should make it
 immutable and use a different variable for each random value (you
 won't ever accidentally fail to set it to a new value that way - that
 I don't think that reusing variables is a good idea in general), but
 the benefits of that are debatable.
 
 Actually, just make it a loop. Something like
 
 foreach(size_t i; iota(0, 16, 4))
 {
     randomGen.popFront();
     immutable randomValue = randomGen.front;
     u.data[i .. i + 4] = *cast(ubyte[4]*)&randomValue;
 }
 
 That's probably slightly less efficient (though I wouldn't think by
 much), but it's much cleaner. And actually, since the randomValue
 isn't guarantee to be a uint, but your code needs it to be 4 bytes,
 you should probably change that line to
 
 immutable uint randomValue = cast(uint)randomGen.front;

I don't think that's a good idea. If the randomGen.front the is e.g. ubyte, casting it to uint will produce 3 bytes which are not set to a random value. That's not acceptable. ------------ trusted UUID randomUUID(RNG)(ref RNG randomGen) if(isUniformRNG!(RNG) && isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof <= 16) { enum elemSize = typeof(RNG.front).sizeof; UUID u; foreach(size_t i; iota(0, 16, elemSize)) { randomGen.popFront(); immutable randomValue = randomGen.front; u.data[i .. i + elemSize] = *cast(ubyte[elemSize]*)&randomValue; } ------------ ?
 
 * I'd advise giving more descriptive names to parseError's
 parameters. It's true that it's a nested function and not in the
 actual API, but it would make the function easier to understand.

 * You should probably use R or Range for template parameters which
 are ranges. That's the typical thing to do. T is usually used for
 generic types rather than for ranges. The code will be somewhat
 easier to read if you use R or Range given that that's how the rest
 of Phobos is.

 
 * In parseUUID, element should be a size_t, since it's used
 specifically for indexing. As it stands, the compiler is going to
 have to do extra conversions in some of the places that it's used
 (e.g. converting it to size_t when indexing).

 
 * In the unit tests, I'd argue that it would be cleaner to just use
 the result directly in an assertion rather than saving it if it's not
 going to be used outside of the assertion. For instance,
 
 id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d);
 assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173,
         11, 185, 176, 165, 127, 136, 106]);
 
 should probably be
 
 assert(parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d).data ==
            [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176,
 165, 127, 136, 106]);
 
 It's generally shorter, cleaner, and avoids the risk of reusing the
 variable without setting it again.

 * For your functions which can take range types, test with
 filter!"true" so that you get a range which isn't an array or string.

that.... I can't use filter (and map) though: std/algorithm.d(1141): Error: nested structs with constructors are not yet supported in CTFE (Bug 6419) std/uuid.d(1273): called from here: filter("00000000-0000-0000-0000-000000000000"d)
 
 * When testing various string types, you should take advantage of
 TypeTuple and std.conv.to to be thorough. For instance, instead of
 
 id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d);
 assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173,
     11, 185, 176, 165, 127, 136, 106]);
 id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"w);
 assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173,
     11, 185, 176, 165, 127, 136, 106]);
 
 do something like
 
 foreach(S; TypeTuple!(char, const char, immutable char,
                       wchar, const wchar, immutable wchar,
                       dchar, const dchar, immutable dchar))
 {
     assert(parseUUID(to!S("5668122d-9df0-49a4-ad0b-b9b0a57f886a")).data
 == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 
 136, 106]);
 }
 
 And while you're at it, you could throw _all_ of your string tests in
 that loop save for those which wouldn't really make sense to put in
 there (and you probably don't want to put very many exception tests
 in the loop, since those get to be expensive). You get much better
 testing this way.

exception tests in total and throwing/catching 100 exceptions takes ~120 msec here, so it should be fast enough)
 
 However, do make sure that some of the exception tests use multiple
 string types even if they aren't in the TypeTuple loop, since some of
 your exceptions (in parseUUID in particular) depend on the type of
 range passed in.

 
 * I'm not sure if you do or not, but if you have a spot where you
 have to create a UUIDParserException from another exception but don't
 want to have to specify the file and line number, then you should
 create an overload of UUIDParserException which takes the Throwable
 first (and probably forwards to the other constructor to avoid code
 duplication). Exception does this, and it can be quite useful. But
 UUIDParserException can't be publicly constructed, so only do it if
 it's actually useful within std.uuid.

 
 * Personally, I'd prefer UUIDParsingException to UUIDParserException,
 since you're not writing a full-on parser but rather just doing
 parsing in some of your functions. But that's fairly subjective.

 
 - Jonathan M Davis

Thanks for your feedback!
Jun 16 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Tue, 12 Jun 2012 13:46:17 +0200
schrieb Johannes Pfau <nospam example.com>:

 Am Mon, 11 Jun 2012 13:12:49 +0200
 schrieb Johannes Pfau <nospam example.com>:
 
 Am Sun, 10 Jun 2012 18:49:03 +0200
 schrieb Johannes Pfau <nospam example.com>:
 
 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)

* randomUUID now creates the RNG only once per thread. * randomUUID lazily seeds the RNG once (per thread) on the first function call * randomUUID now uses the new seed overload (requires https://github.com/D-Programming-Language/phobos/pull/627 ) * the unsafe randomUUID overload taking only a type has been removed.

* Add changes suggested by Jonathan M Davis (see https://github.com/jpf91/phobos/commit/91cd1c1f4385cfe1cd868c0720aff257d9436822 )
Jun 16 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, June 16, 2012 13:27:54 Johannes Pfau wrote:
 Am Wed, 13 Jun 2012 18:57:51 -0700
 So I'd rather like to keep the name consistent with the RFC.

I understand. I don't think that there's a really a good solution. So, if you think that keeping it as-is is best, then we can do that.
 _toString is already nothrow, so yes, it's only idups fault ;-)
 Added links to http://d.puremagic.com/issues/show_bug.cgi?id=5700

Please just put it as a comment above the try-catch rather than putting it in the ddoc. No one using the function needs to know that there's a workaround for nothrow. They just need to know that it's nothrow. It's the Phobos maintainers that need to know that there's a workaround. It seems fairly common to do something like // BUG workaround for bugzilla 5700 so that it's nicely greppable.
 * parseUUID should be clear on whether it takes both lowercase and
 uppercase hex digits, and I'd argue that it should definitely accept
 both. [0-9a-f] implies that it only accepts lowercase hex digits.

OK

Make sure that you have tests which test UUIDs with both uppercase and lowercase letters. I only see tests with lowercase letters. You seem to have updated an example with some uppercase letters but not any tests from what I can see. Speaking of which, please make sure that you have unittest blocks with your examples in them to make sure that they compile. Personally, I always put //Verify Example. unittest { //example code... } after a function with an example and make sure that the example and the code in the unittest block match exactly (save for indentation, since any indentation in the example in the ddoc ends up in the final ddoc, and we do want the indentation in the unittest block).
 * rndGen returns a Random (which is already aliased to Mt19937), and
 already initializes it with an unpredictable seed. So, why not just
 us rndGen in randomUUID? It becomes
 
 return randomUUID(rndGen());
 
 It's much shorter and just as good as far as I can tell.

Good catch. We'd have to change rndGen to use the new seed function though. And the documentation for Random says: "You may want to use it if [...] you don't care for the minutiae of the method being used" I'm not sure, do we care about the method? We need something which generates 'good' random numbers, I don't think a LCG would be acceptable. But rndGen could default to something like that on embedded systems? Then again, I know nothing about random number generators, so someone please tell me: Do we need Mersenne twister for random UUIDS or could we use any RNG?

Umm. My point was that as far as I can tell, what you're doing is _identical_ to what rndGen is doing. Yours ----------- trusted UUID randomUUID()() { static Mt19937 randomGen; static bool seeded = false; if(!seeded) { randomGen.seed(map!((a) => unpredictableSeed)(repeat(0))); seeded = true; } return randomUUID(randomGen); } ----------- vs ----------- alias Mt19937 Random; property ref Random rndGen() { static Random result; static bool initialized; if (!initialized) { result = Random(unpredictableSeed); initialized = true; } return result; } ----------- Really, the only differences are the fact that you're passing the Random into randomUUID and returning that rather than returing the Random, and you're using a map to do the seed for reasons that completely escape me. In fact, I'm not sure how that code even compiles. It creates an infinite range whose every value is zero and then maps all of those zeroes to unpredictableSeed. So, you've created an infinite range of unpredictableSeed results. But Mt19937's seed takes a single uint, not a range. So, I'm baffled as to how that line even compiles. In any case, it looks like they're both using Random, and they're both seeding it with an unpredictableSeed. It's just that in noe case, it's using Random's constructor to do it, while in the other, seed is being called explicitly (thereby _re_seeding the Random). So, I don't see what you're doing that's any different from just doing return randomUUID(rndGen()); aside from the fact that you seem to have an impossible line of code for seeding your random number generator. And trying to compile that line on my computer actually fails (as I'd expect), so I don't know how it's compiling on yours.
 I don't think that's a good idea. If the randomGen.front the is e.g.
 ubyte, casting it to uint will produce 3 bytes which are not set to a
 random value. That's not acceptable.
 
 ------------
  trusted UUID randomUUID(RNG)(ref RNG randomGen) if(isUniformRNG!(RNG)
 &&
     isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof <= 16)
 {
     enum elemSize = typeof(RNG.front).sizeof;
     UUID u;
     foreach(size_t i; iota(0, 16, elemSize))
     {
         randomGen.popFront();
         immutable randomValue = randomGen.front;
         u.data[i .. i + elemSize] = *cast(ubyte[elemSize]*)&randomValue;
     }
 ------------
 ?

Looks good except for the fact that the typeof(RNG.front).sizeof <= 16 is completely unnecessary. isIntgeral guarantees that the type is a built-in integral type, the largest of which is 8 bytes (and even if we added cent and ucent, it would still be only 16, not greater than 16).
 * For your functions which can take range types, test with
 filter!"true" so that you get a range which isn't an array or string.

Does take!(string, int) return a slice? Should have thought about that....

No. It doesn't, because hasSlicing!string is false.
 I can't use filter (and map) though:
 std/algorithm.d(1141): Error: nested structs with constructors are not
 yet supported in CTFE (Bug 6419) std/uuid.d(1273):        called from
 here: filter("00000000-0000-0000-0000-000000000000"d)

filter!"true"("00000000-0000-0000-0000-000000000000") should compile just fine. You then pass that to your range-based uuid functions which accept ranges or dchar. What does CTFE have to do with it? That would only be an issue if you were trying to use filter in CTFE, and even if you have CTFE tests where you'd like to do that and can't due to that bug, there are plenty of tests that you could run which weren't CTFE. I think that one or both of us is misunderstanding something here.
 Thanks for your feedback!

Also, I just noticed that it looks like you don't have any tests which verify that empty is false when it's supposed to be. You should probably add a few. And by the way, before this actually gets merged into Phobos (as I'm guessing that it will be, since no way seems to particularly dislike the module thus far), you should fix the ddoc to use LREF and XREF when referencing symbols elsewhere in the module and elsewhere in std respectively. - Jonathan M Davis
Jun 16 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Jun 2012 16:57:25 -0700
schrieb Jonathan M Davis <jmdavisProg gmx.com>:

 Umm. My point was that as far as I can tell, what you're doing is
 _identical_ to what rndGen is doing.
 

Yes right now it's exactly the same. But there's no guarantee that the 'Random' alias (and therefore rndGen) will always use Mt19997. The docs explicitly say: -------------- The "default", "favorite", "suggested" random number generator type on the current platform. It is an alias for one of the previously-defined generators. You may want to use it if (1) you need to generate some nice random numbers, and (2) you don't care for the minutiae of the method being used. -------------- so it could be an alias to a 'worse' RNG on some platforms and I'm not sure if that's OK. I don't mind changing it to rndGen if someone can tell me that it's fine to use 'worse' RNGs for UUIDs, but boost used MersenneTwister and I don't want to change that unless I'm sure it won't cause problems.
 
 So, I don't see what you're doing that's any different from just doing
 
 return randomUUID(rndGen());
 
 aside from the fact that you seem to have an impossible line of code
 for seeding your random number generator. And trying to compile that
 line on my computer actually fails (as I'd expect), so I don't know
 how it's compiling on yours.

Yes, that code requires a new overload for seed, see https://github.com/D-Programming-Language/phobos/pull/627 And yes, it's generating a infinite range of unpredictableSeed (Mt19937 uses 624 values from that range) I don't know anything about RNGs, but Andrew Talbot suggested in the thread called "Mersenne Twister Seeding and UUIDs" that we need this new, better seeding method.
 
 I don't think that's a good idea. If the randomGen.front the is e.g.
 ubyte, casting it to uint will produce 3 bytes which are not set to
 a random value. That's not acceptable.
 
 ------------
  trusted UUID randomUUID(RNG)(ref RNG randomGen)
 if(isUniformRNG!(RNG) &&
     isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof <=
 16) {
     enum elemSize = typeof(RNG.front).sizeof;
     UUID u;
     foreach(size_t i; iota(0, 16, elemSize))
     {
         randomGen.popFront();
         immutable randomValue = randomGen.front;
         u.data[i .. i + elemSize] =
 *cast(ubyte[elemSize]*)&randomValue; }
 ------------
 ?

Looks good except for the fact that the typeof(RNG.front).sizeof <= 16 is completely unnecessary. isIntgeral guarantees that the type is a built-in integral type, the largest of which is 8 bytes (and even if we added cent and ucent, it would still be only 16, not greater than 16).

I'll make it a static assert. It might not be necessary, but it at least documents that elemSize can't be > 16 and it's another safety measure.
 * For your functions which can take range types, test with
 filter!"true" so that you get a range which isn't an array or
 string.

Does take!(string, int) return a slice? Should have thought about that....

No. It doesn't, because hasSlicing!string is false.
 I can't use filter (and map) though:
 std/algorithm.d(1141): Error: nested structs with constructors are
 not yet supported in CTFE (Bug 6419) std/uuid.d(1273):
 called from here: filter("00000000-0000-0000-0000-000000000000"d)

filter!"true"("00000000-0000-0000-0000-000000000000") should compile just fine. You then pass that to your range-based uuid functions which accept ranges or dchar. What does CTFE have to do with it? That would only be an issue if you were trying to use filter in CTFE, and even if you have CTFE tests where you'd like to do that and can't due to that bug, there are plenty of tests that you could run which weren't CTFE. I think that one or both of us is misunderstanding something here.

I _do_ have CTFE tests. parseUUID and UUID.this(string) are guaranteed to work in CTFE as well. And if I have to use a custom Range for the CTFE tests anyway, why not use it for all tests? (This also allows me to explicitly test Forward and Input Ranges)
 
 Thanks for your feedback!

Also, I just noticed that it looks like you don't have any tests which verify that empty is false when it's supposed to be. You should probably add a few.

 
 And by the way, before this actually gets merged into Phobos (as I'm
 guessing that it will be, since no way seems to particularly dislike
 the module thus far), you should fix the ddoc to use LREF and XREF
 when referencing symbols elsewhere in the module and elsewhere in std
 respectively.

OK, I replaced some $(D ) with XREF/LREF but MYREF1,2,3 probably have to stay. Those are needed for the table (at least it's done that way in std.algorithm)
Jun 17 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sat, 16 Jun 2012 21:11:51 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 On 16.06.2012 21:06, Dmitry Olshansky wrote:
 On 16.06.2012 15:30, Johannes Pfau wrote:
 Am Tue, 12 Jun 2012 13:46:17 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Mon, 11 Jun 2012 13:12:49 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sun, 10 Jun 2012 18:49:03 +0200
 schrieb Johannes Pfau<nospam example.com>:

 Am Sat, 09 Jun 2012 21:30:57 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Code: https://github.com/jpf91/phobos/blob/std.uuid/std/uuid.d
 API-Docs: http://dl.dropbox.com/u/24218791/d/src/uuid.html

I pushed these changes suggested by Dmitry Olshansky and Jonas Drewsen: * Add documentation table * Rename UUID.isNil --> UUID.empty * Merge ParserException and IsufficientInputException into UUIDParserException * Add note about std.variant.Variant * Rewrite example to avoid cast * Add non-working(commented out) variadic constructor * Rename extractRegex --> uuidRegex * Use std.algorithm.swap instead of swapRanges

* parseUUID now supports InputRanges with ElementType == dchar (all string types are still supported as well)

* randomUUID now creates the RNG only once per thread. * randomUUID lazily seeds the RNG once (per thread) on the first function call * randomUUID now uses the new seed overload (requires https://github.com/D-Programming-Language/phobos/pull/627 ) * the unsafe randomUUID overload taking only a type has been removed.

* Add changes suggested by Jonathan M Davis (see https://github.com/jpf91/phobos/commit/91cd1c1f4385cfe1cd868c0720aff257d9436822 )

Surely, a better way then making a global is to routinely check for zeros. Plus, you can cast .ptr to size_t* and compare word by word in 4 or 2 ops(!). You'd better watch out for __ctfe though, it deon't allow casting pointers this way.

Ah and another way to go about it is: union { ubyte[16] uuid; size_t[16/size_t.sizeof] by_word; }

Isn't that an optimization which should really be done by the compiler? It already knows that it's supposed to compare two ubyte[16]... Also how could the union solution be used without having to copy the data?
Jun 17 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, June 17, 2012 09:42:40 Johannes Pfau wrote:
 Am Sat, 16 Jun 2012 16:57:25 -0700
 
 schrieb Jonathan M Davis <jmdavisProg gmx.com>:
 Umm. My point was that as far as I can tell, what you're doing is
 _identical_ to what rndGen is doing.

Yes right now it's exactly the same. But there's no guarantee that the 'Random' alias (and therefore rndGen) will always use Mt19997. The docs explicitly say: -------------- The "default", "favorite", "suggested" random number generator type on the current platform. It is an alias for one of the previously-defined generators. You may want to use it if (1) you need to generate some nice random numbers, and (2) you don't care for the minutiae of the method being used. -------------- so it could be an alias to a 'worse' RNG on some platforms and I'm not sure if that's OK. I don't mind changing it to rndGen if someone can tell me that it's fine to use 'worse' RNGs for UUIDs, but boost used MersenneTwister and I don't want to change that unless I'm sure it won't cause problems.

I seriously question that it will _ever_ be anything worse then Mt19997, but if you're that worried about it, maybe you should add a static assert that Random is Mt19997? If you want to leave it as-is, then you should probably at least put a comment in there as to why you're using Mt19997 instead of just using rndGen, otherwise, someone may come along and change it to use rndGen later.
 So, I don't see what you're doing that's any different from just doing
 
 return randomUUID(rndGen());
 
 aside from the fact that you seem to have an impossible line of code
 for seeding your random number generator. And trying to compile that
 line on my computer actually fails (as I'd expect), so I don't know
 how it's compiling on yours.

Yes, that code requires a new overload for seed, see https://github.com/D-Programming-Language/phobos/pull/627 And yes, it's generating a infinite range of unpredictableSeed (Mt19937 uses 624 values from that range) I don't know anything about RNGs, but Andrew Talbot suggested in the thread called "Mersenne Twister Seeding and UUIDs" that we need this new, better seeding method.

Ah, okay. Apparently I missed that pull request. I'll have to look it over, particularly since I seem to be pretty much the only one merging anything in of late (particularly since Andrei has been too busy to do much with D lately).
 I don't think that's a good idea. If the randomGen.front the is e.g.
 ubyte, casting it to uint will produce 3 bytes which are not set to
 a random value. That's not acceptable.
 
 ------------
  trusted UUID randomUUID(RNG)(ref RNG randomGen)
 if(isUniformRNG!(RNG) &&
 
     isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof <=
 
 16) {
 
     enum elemSize = typeof(RNG.front).sizeof;
     UUID u;
     foreach(size_t i; iota(0, 16, elemSize))
     {
     
         randomGen.popFront();
         immutable randomValue = randomGen.front;
         u.data[i .. i + elemSize] =
 
 *cast(ubyte[elemSize]*)&randomValue; }
 ------------
 ?

Looks good except for the fact that the typeof(RNG.front).sizeof <= 16 is completely unnecessary. isIntgeral guarantees that the type is a built-in integral type, the largest of which is 8 bytes (and even if we added cent and ucent, it would still be only 16, not greater than 16).

I'll make it a static assert. It might not be necessary, but it at least documents that elemSize can't be > 16 and it's another safety measure.

But it's _completely_ unnecessary. You've already guaranteed it with isIntegral. I guess that you can leave it there if you want to, but as far as I can see, it's pointless.
 * For your functions which can take range types, test with
 filter!"true" so that you get a range which isn't an array or
 string.

Does take!(string, int) return a slice? Should have thought about that....

No. It doesn't, because hasSlicing!string is false.
 I can't use filter (and map) though:
 std/algorithm.d(1141): Error: nested structs with constructors are
 not yet supported in CTFE (Bug 6419) std/uuid.d(1273):
 called from here: filter("00000000-0000-0000-0000-000000000000"d)

filter!"true"("00000000-0000-0000-0000-000000000000") should compile just fine. You then pass that to your range-based uuid functions which accept ranges or dchar. What does CTFE have to do with it? That would only be an issue if you were trying to use filter in CTFE, and even if you have CTFE tests where you'd like to do that and can't due to that bug, there are plenty of tests that you could run which weren't CTFE. I think that one or both of us is misunderstanding something here.

I _do_ have CTFE tests. parseUUID and UUID.this(string) are guaranteed to work in CTFE as well. And if I have to use a custom Range for the CTFE tests anyway, why not use it for all tests? (This also allows me to explicitly test Forward and Input Ranges)

If you have to create other ranges for your tests anyway, then that's fine. But you seemed to be saying that filter didn't work at all, and it should.
 Thanks for your feedback!

Also, I just noticed that it looks like you don't have any tests which verify that empty is false when it's supposed to be. You should probably add a few.

There's one such test, but I can add some more.

You probably don't need a lot, but it does need to be tested, and I didn't see any tests for empty which were on a UUID which wasn't supposed to be empty.
 And by the way, before this actually gets merged into Phobos (as I'm
 guessing that it will be, since no way seems to particularly dislike
 the module thus far), you should fix the ddoc to use LREF and XREF
 when referencing symbols elsewhere in the module and elsewhere in std
 respectively.

OK, I replaced some $(D ) with XREF/LREF but MYREF1,2,3 probably have to stay. Those are needed for the table (at least it's done that way in std.algorithm)

Stuff for the table is fine. It's primarily an issue of the ddoc directly on functions which you've been using $(D ) with, since that doesn't create links to anything. - Jonathan M Davis
Jun 17 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Jun 2012 12:15:00 +0400
schrieb Dmitry Olshansky <dmitry.olsh gmail.com>:

 On 17.06.2012 12:04, Johannes Pfau wrote:
 Am Sat, 16 Jun 2012 21:11:51 +0400
 schrieb Dmitry Olshansky<dmitry.olsh gmail.com>:
 Ah and another way to go about it is:
 union {
 	ubyte[16] uuid;
 	size_t[16/size_t.sizeof] by_word;
 }

Isn't that an optimization which should really be done by the compiler? It already knows that it's supposed to compare two ubyte[16]...

It knows that you compare two ubyte[16] that it. It easily might miss the fact that one of them is always 0 in all 16 cells.
 Also how could the union solution be used without having to copy the
 data?

There is no copy it union, aka overlapped storage. In other words as these 16 bytes represented as (on 32bit) 4 size_t. They share memory location. It doesn't play nice with CTFE though, as it thinks unions to be plain struct last time I checked.

Yes, I thought about using the union nested in the empty function, so a copy would have been needed: bool empty() { union { ubyte[16] uuid; .... } uuid = data; //copy } I didn't know that it's possible to make members of a union private though, so I could use this in the UUID struct: union { ubyte[16] data; private size_t[16/size_t.sizeof] by_word; } which indeed wouldn't require copying. However, with this code added std.uuid fails some unrelated ctfe UUID comparison unittests...
Jun 17 2012
prev sibling parent Johannes Pfau <nospam example.com> writes:
Am Sun, 17 Jun 2012 01:10:16 -0700
schrieb Jonathan M Davis <jmdavisProg gmx.com>:

 I seriously question that it will _ever_ be anything worse then
 Mt19997, but if you're that worried about it, maybe you should add a
 static assert that Random is Mt19997? If you want to leave it as-is,
 then you should probably at least put a comment in there as to why
 you're using Mt19997 instead of just using rndGen, otherwise, someone
 may come along and change it to use rndGen later.

OK, I added the static assert and use rndGen now. I'll have to update the pull request so that the new seeding method is used in rndGen though.
 So, I don't see what you're doing that's any different from just
 doing
 
 return randomUUID(rndGen());
 
 aside from the fact that you seem to have an impossible line of
 code for seeding your random number generator. And trying to
 compile that line on my computer actually fails (as I'd expect),
 so I don't know how it's compiling on yours.

Yes, that code requires a new overload for seed, see https://github.com/D-Programming-Language/phobos/pull/627 And yes, it's generating a infinite range of unpredictableSeed (Mt19937 uses 624 values from that range) I don't know anything about RNGs, but Andrew Talbot suggested in the thread called "Mersenne Twister Seeding and UUIDs" that we need this new, better seeding method.

Ah, okay. Apparently I missed that pull request. I'll have to look it over, particularly since I seem to be pretty much the only one merging anything in of late (particularly since Andrei has been too busy to do much with D lately).
 I don't think that's a good idea. If the randomGen.front the is
 e.g. ubyte, casting it to uint will produce 3 bytes which are
 not set to a random value. That's not acceptable.
 
 ------------
  trusted UUID randomUUID(RNG)(ref RNG randomGen)
 if(isUniformRNG!(RNG) &&
 
     isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof
 <=
 
 16) {
 
     enum elemSize = typeof(RNG.front).sizeof;
     UUID u;
     foreach(size_t i; iota(0, 16, elemSize))
     {
     
         randomGen.popFront();
         immutable randomValue = randomGen.front;
         u.data[i .. i + elemSize] =
 
 *cast(ubyte[elemSize]*)&randomValue; }
 ------------
 ?

Looks good except for the fact that the typeof(RNG.front).sizeof <= 16 is completely unnecessary. isIntgeral guarantees that the type is a built-in integral type, the largest of which is 8 bytes (and even if we added cent and ucent, it would still be only 16, not greater than 16).

I'll make it a static assert. It might not be necessary, but it at least documents that elemSize can't be > 16 and it's another safety measure.

But it's _completely_ unnecessary. You've already guaranteed it with isIntegral. I guess that you can leave it there if you want to, but as far as I can see, it's pointless.
 * For your functions which can take range types, test with
 filter!"true" so that you get a range which isn't an array or
 string.

Does take!(string, int) return a slice? Should have thought about that....

No. It doesn't, because hasSlicing!string is false.
 I can't use filter (and map) though:
 std/algorithm.d(1141): Error: nested structs with constructors
 are not yet supported in CTFE (Bug 6419) std/uuid.d(1273):
 called from here:
 filter("00000000-0000-0000-0000-000000000000"d)

filter!"true"("00000000-0000-0000-0000-000000000000") should compile just fine. You then pass that to your range-based uuid functions which accept ranges or dchar. What does CTFE have to do with it? That would only be an issue if you were trying to use filter in CTFE, and even if you have CTFE tests where you'd like to do that and can't due to that bug, there are plenty of tests that you could run which weren't CTFE. I think that one or both of us is misunderstanding something here.

I _do_ have CTFE tests. parseUUID and UUID.this(string) are guaranteed to work in CTFE as well. And if I have to use a custom Range for the CTFE tests anyway, why not use it for all tests? (This also allows me to explicitly test Forward and Input Ranges)

If you have to create other ranges for your tests anyway, then that's fine. But you seemed to be saying that filter didn't work at all, and it should.
 Thanks for your feedback!

Also, I just noticed that it looks like you don't have any tests which verify that empty is false when it's supposed to be. You should probably add a few.

There's one such test, but I can add some more.

You probably don't need a lot, but it does need to be tested, and I didn't see any tests for empty which were on a UUID which wasn't supposed to be empty.
 And by the way, before this actually gets merged into Phobos (as
 I'm guessing that it will be, since no way seems to particularly
 dislike the module thus far), you should fix the ddoc to use LREF
 and XREF when referencing symbols elsewhere in the module and
 elsewhere in std respectively.

OK, I replaced some $(D ) with XREF/LREF but MYREF1,2,3 probably have to stay. Those are needed for the table (at least it's done that way in std.algorithm)

Stuff for the table is fine. It's primarily an issue of the ddoc directly on functions which you've been using $(D ) with, since that doesn't create links to anything. - Jonathan M Davis

Jun 17 2012