www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Andrei's std.path review

reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
I am replying to this in a new thread, so as not to pollute the voting 
thread.  (Speaking from experience as a review manager, noise in the 
voting thread makes it harder to count votes.)


On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:

 On 8/5/11 7:26 AM, dsimcha wrote:
 My apologies for not announcing this yesterday. For some reason I
 thought today was the official end of review. Anyhow, Lars
 Kyllingstad's new std.path module, which has been in review for the
 past 3 weeks, is up for vote. Please vote yes or no in this thread.

 [...]

My vote is "yes", with a few advisory comments. Comments on the documentation: * "This module is used to parse path strings." -> "This module is used to manipulate path strings."

Ok.
 * "perform any I/O" -> "perform any actual file system actions"

Ok.
 * "use std.file.isDir and std.file.exists" -> use the XREF macro to
 generate cross-reference links.

Cool, didn't know about that one. Will fix. Should it also be used to generate internal references (such as linking to filenameCharCmp in the filenameCmp documentation) or is there a separate macro for this?
 * "backslashes on this platform" -> "backslashes on that platform"

Ok.
 * "The result of calling a function on an ill-formed path is undefined."
 This simplifies documentation but is a bit extreme. We could and should
 specify the behavior for strings that don't look quite like paths.

I am not so sure about this. This is easy in the old std.path, because its behaviour is generally very simplistic. "Search for some character, return everything to the left/right of it" and so on. The new version is designed to correctly handle a lot of corner cases, and such specifications will often tend to be very convoluted. For example, the old getDrive() simply searches for a colon and returns everything to the left of it, period. The new driveName() also includes support for UNC paths, i.e. assert (driveName(`\\server\share\file`) == `\\server\share`); Furthermore, making assumptions about the well-formedness of the input paths allows for simpler, more performant code. Using Windows drive letters as an example again, only one-character drive specifiers are allowed -- c:, d:, etc. It is therefore unnecessary to search the entire string for a colon, it suffices to look at the second character.
 * BTW we should have a validPath() function that tells whether a string
 looks like a valid path or not.

I actually had such a function in an earlier version of std.path, but I removed it for some reason I can't recall at the moment. I agree such a function could be useful, so I'll dig it out again. One use case for validPath() -- or isValidPath(), to follow the module's convention -- would be in "in" contracts for more or less every function in the module. Then the statement you criticised above could be moderated to "well-formedness is checked in non-release mode".
 * Add example:
 
 assert (baseName("dir/file.ext", ".xyz") == "file.ext");

Agreed.
 * defaultExtension -> setDefaultExtension?

I think it's too late to change this now that the voting is almost finished.
 * absolutePath calls getcwd(), negating the assertion in the beginning
 that there's no real directory access going on.

Strictly speaking, now that getcwd() is the default value for the "base" argument to absolutePath() and relativePath(), it is implicitly called by the user as he or she neglects to provide a value for this parameter. For practical purposes, I guess there isn't a real difference between this and getcwd() getting called inside the functions, so I will change the documentation to reflect this.
 * absolutePath and others use string, others use generic characters.
 Why? (This is my strongest comment.)

absolutePath() and relativePath() depend, in the majority of cases, on the result of getcwd(), which returns string. expandTilde() depends on the contents of $HOME or /etc/passwd, which will both also be of string type. I have tried keeping the number of allocations in this module at an absolute minimum, and decided that if any UTF-8/16/32 transcoding is to happen, it should happen at the explicit request of the user, not behind the scenes.
 * filenameCharCmp and filenameCmp -> why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the corresponding signed type is long. filenameCmp() returns long because filenameCharCmp() does.
 * Example in expandTilde uses odd ALL_UPERCASE variable names.

Ah, that would be an example from the old std.path. Can't believe I didn't spot this before. Will fix.
 Comments on the implementation:
 
 * We're increasingly moving towards consolidating imports into one.

Seriously? I haven't noticed this. While I will certainly follow the Phobos conventions, it would be nice to know the rationale for this. Is it simply for vertical compactness? As you may have noticed, I'm not as worried about that as you are. ;) Arguments in favour of separate imports are, in my opinion: - Adding/removing imports is easier. - Seeing which modules are imported is easier.
 * "." is hardcoded as a symbol for the current dir.

Yes. Unlike dir and path separators, the symbols "." and ".." are uniform across all platforms. You may also have noticed that I've removed the curdir and pardir constants, defined in the old std.path as "." and "..", respectively. Their utter uselessness was pointed out at some point during the review, and no one could provide a compelling use case (actually, no one could provide *any* use case) for them.
 * Would be interesting to figure what it would take to make pathSplitter
 reuse splitter.

It would be interesting, yes, but due to the various corner cases it handles, and the special nature of the front-most segment of an absolute/ rooted path, I suspect it will be impossible. At the very least, it won't be worth it.
 * Misalignment in lines 2195--2238. I think you don't need the extra
 scope there anyway, but if you do, don't make a special rule for that
 case - obey normal brace indentation.

This code is not mine -- it's from the old std.path -- and I didn't notice the extra braces until now. I don't think there is a good reason to have them there, so I'll try to remove them. Thanks for the review! -Lars
Aug 11 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 I am replying to this in a new thread, so as not to pollute the voting
 thread.  (Speaking from experience as a review manager, noise in the
 voting thread makes it harder to count votes.)
 
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * "use std.file.isDir and std.file.exists" -> use the XREF macro to
 generate cross-reference links.

Cool, didn't know about that one. Will fix. Should it also be used to generate internal references (such as linking to filenameCharCmp in the filenameCmp documentation) or is there a separate macro for this?

XREF deals with linking to functions in other Phobos modules. CXREF deals with linking to functions in druntime modules. LREF deals with linking to functions within a module.
 * filenameCharCmp and filenameCmp -> why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the corresponding signed type is long. filenameCmp() returns long because filenameCharCmp() does.

I'd argue that you should just cast it to int and return int. All the various compare functions promise is whether the return value is less than, equal to, or greater than 0. Relying on the exact value is wrong. And normally such functions return int. So, I don't see any reason why these shouldn't be change to return int.
 * Example in expandTilde uses odd ALL_UPERCASE variable names.

Ah, that would be an example from the old std.path. Can't believe I didn't spot this before. Will fix.
 Comments on the implementation:
 
 * We're increasingly moving towards consolidating imports into one.

Seriously? I haven't noticed this. While I will certainly follow the Phobos conventions, it would be nice to know the rationale for this. Is it simply for vertical compactness? As you may have noticed, I'm not as worried about that as you are. ;) Arguments in favour of separate imports are, in my opinion: - Adding/removing imports is easier. - Seeing which modules are imported is easier.

A large portion of Phobos consolidates imports, but not all of it. Nothing I've been doing has been unless the module already did, since I much prefer separate imports, but I know that Andrei favors using one import statement. We don't really have an official rule on the matter though. I think that the only time that it's come up is when Andrei has mentioned it for one reason or another, which hasn't happened very often. I'm not sure what the opinions of any of the other Phobos devs are on the matter. Personally, I _much_ prefer separate imports per module. - Jonathan M Davis
Aug 11 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/13/11 7:20 AM, Lars T. Kyllingstad wrote:
 On Thu, 11 Aug 2011 00:27:54 -0700, Jonathan M Davis wrote:
 On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * filenameCharCmp and filenameCmp ->  why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the corresponding signed type is long. filenameCmp() returns long because filenameCharCmp() does.

I'd argue that you should just cast it to int and return int. All the various compare functions promise is whether the return value is less than, equal to, or greater than 0. Relying on the exact value is wrong. And normally such functions return int. So, I don't see any reason why these shouldn't be change to return int.

But what do we gain by making it an int? long just seems more natural in this case, IMO.

Unicode characters range in between 0 through 1,114,111. So the most natural type of the difference is int. This would be the first time I'm seeing an API returning a ternary value as a long. (Also, 64-bit machines can operate on two 32-bit integrals simultaneously (literally) so 32-bit integrals may be faster. Probably not a material advantage.) Andrei
Aug 13 2011
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Thu, 11 Aug 2011 00:27:54 -0700, Jonathan M Davis wrote:
 On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * filenameCharCmp and filenameCmp -> why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the corresponding signed type is long. filenameCmp() returns long because filenameCharCmp() does.

I'd argue that you should just cast it to int and return int. All the various compare functions promise is whether the return value is less than, equal to, or greater than 0. Relying on the exact value is wrong. And normally such functions return int. So, I don't see any reason why these shouldn't be change to return int.

But what do we gain by making it an int? long just seems more natural in this case, IMO. -Lars
Aug 13 2011
prev sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sat, 13 Aug 2011 09:26:45 -0500, Andrei Alexandrescu wrote:

 On 8/13/11 7:20 AM, Lars T. Kyllingstad wrote:
 On Thu, 11 Aug 2011 00:27:54 -0700, Jonathan M Davis wrote:
 On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * filenameCharCmp and filenameCmp ->  why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the corresponding signed type is long. filenameCmp() returns long because filenameCharCmp() does.

I'd argue that you should just cast it to int and return int. All the various compare functions promise is whether the return value is less than, equal to, or greater than 0. Relying on the exact value is wrong. And normally such functions return int. So, I don't see any reason why these shouldn't be change to return int.

But what do we gain by making it an int? long just seems more natural in this case, IMO.

Unicode characters range in between 0 through 1,114,111. So the most natural type of the difference is int. This would be the first time I'm seeing an API returning a ternary value as a long. (Also, 64-bit machines can operate on two 32-bit integrals simultaneously (literally) so 32-bit integrals may be faster. Probably not a material advantage.)

All right, you have convinced me. int it is. :) -Lars
Aug 13 2011