digitalmars.D - Andrei's std.path review

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

I am replying to this in a new thread, so as not to pollute the voting 
thread.  (Speaking from experience as a review manager, noise in the 
voting thread makes it harder to count votes.)


On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:

 On 8/5/11 7:26 AM, dsimcha wrote:
 My apologies for not announcing this yesterday. For some reason I
 thought today was the official end of review. Anyhow, Lars
 Kyllingstad's new std.path module, which has been in review for the
 past 3 weeks, is up for vote. Please vote yes or no in this thread.

 [...]

 
 My vote is "yes", with a few advisory comments.
 
 Comments on the documentation:
 
 * "This module is used to parse path strings." -> "This module is used
 to manipulate path strings."

Ok.


 * "perform any I/O" -> "perform any actual file system actions"

Ok.


 * "use std.file.isDir and std.file.exists" -> use the XREF macro to
 generate cross-reference links.

Cool, didn't know about that one.  Will fix.  Should it also be used to 
generate internal references (such as linking to filenameCharCmp in the 
filenameCmp documentation) or is there a separate macro for this?


 * "backslashes on this platform" -> "backslashes on that platform"

Ok.


 * "The result of calling a function on an ill-formed path is undefined."
 This simplifies documentation but is a bit extreme. We could and should
 specify the behavior for strings that don't look quite like paths.

I am not so sure about this.  This is easy in the old std.path, because 
its behaviour is generally very simplistic.  "Search for some character, 
return everything to the left/right of it" and so on.

The new version is designed to correctly handle a lot of corner cases, 
and such specifications will often tend to be very convoluted.  For 
example, the old getDrive() simply searches for a colon and returns 
everything to the left of it, period.  The new driveName() also includes 
support for UNC paths, i.e.

  assert (driveName(`\\server\share\file`) == `\\server\share`);

Furthermore, making assumptions about the well-formedness of the input 
paths allows for simpler, more performant code.  Using Windows drive 
letters as an example again, only one-character drive specifiers are 
allowed -- c:, d:, etc.  It is therefore unnecessary to search the entire 
string for a colon, it suffices to look at the second character.
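To make the drive-letter argument concrete, here is a minimal sketch of a driveName()-like function. It assumes well-formed input, as described above. The name driveSketch is hypothetical, and this is not the actual std.path implementation. For drive letters it only inspects the second character. For UNC paths it stops after the share component.

```d
import std.ascii : isAlpha;
import std.string : indexOf;

/// Hypothetical sketch (not std.path's driveName): returns the drive part
/// of a well-formed Windows path, or null if there is none.
string driveSketch(string path)
{
    // Drive letters are single characters ("c:", "d:", ...), so looking at
    // path[1] suffices; there is no need to scan the string for a colon.
    if (path.length >= 2 && path[1] == ':' && isAlpha(path[0]))
        return path[0 .. 2];

    // UNC path: \\server\share\rest
    if (path.length > 2 && path[0 .. 2] == `\\`)
    {
        const serverLen = path[2 .. $].indexOf('\\');
        if (serverLen < 0)
            return null; // no share component
        const shareStart = 2 + cast(size_t) serverLen + 1;
        const shareLen = path[shareStart .. $].indexOf('\\');
        return shareLen < 0 ? path
                            : path[0 .. shareStart + cast(size_t) shareLen];
    }
    return null;
}
```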


 * BTW we should have a validPath() function that tells whether a string
 looks like a valid path or not.

I actually had such a function in an earlier version of std.path, but I 
removed it for some reason I can't recall at the moment.  I agree such a 
function could be useful, so I'll dig it out again.

One use case for validPath() -- or isValidPath(), to follow the module's 
convention -- would be in "in" contracts for more or less every function 
in the module.  Then the statement you criticised above could be 
moderated to "well-formedness is checked in non-release mode".
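A rough sketch of the contract idea follows. Both isValidPathSketch and checkedBaseName are hypothetical. The validity check is deliberately minimal and is not the isValidPath() discussed here. The point is only that an "in" contract is checked in non-release builds and compiled out with -release.

```d
import std.path : baseName;

/// Hypothetical stand-in for the proposed isValidPath(). It only rejects
/// empty strings and embedded NUL characters; a real check would be
/// platform-specific and far more thorough.
bool isValidPathSketch(const(char)[] path) pure nothrow @safe
{
    if (path.length == 0)
        return false;
    foreach (c; path)
        if (c == '\0')
            return false;
    return true;
}

/// Hypothetical wrapper: the precondition is checked in non-release
/// builds only.
string checkedBaseName(string path)
in
{
    assert(isValidPathSketch(path), "ill-formed path");
}
do
{
    return baseName(path);
}
```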


 * Add example:
 
 assert (baseName("dir/file.ext", ".xyz") == "file.ext");

Agreed.
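The proposed example pairs naturally with the case where the suffix does match. The following assertions use the two-argument baseName overload that std.path provides today:

```d
import std.path : baseName;

unittest
{
    // The optional suffix is stripped only when the name actually ends with it.
    assert(baseName("dir/file.ext", ".ext") == "file");
    assert(baseName("dir/file.ext", ".xyz") == "file.ext");
}
```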


 * defaultExtension -> setDefaultExtension?

I think it's too late to change this now that the voting is almost 
finished.


 * absolutePath calls getcwd(), negating the assertion in the beginning
 that there's no real directory access going on.

Strictly speaking, now that getcwd() is the default value for the "base" 
argument to absolutePath() and relativePath(), it is implicitly called by 
the user whenever he or she omits a value for this parameter.

For practical purposes, I guess there isn't a real difference between 
this and getcwd() getting called inside the functions, so I will change 
the documentation to reflect this.
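A minimal sketch of the default-argument arrangement is shown below. The name absolutePathSketch is hypothetical, and this is not the std.path code. If the caller omits base, getcwd() supplies it. Because the parameter is lazy, that value is only evaluated if the function actually reads base.

```d
import std.file : getcwd;
import std.path : buildPath, isAbsolute;

/// Hypothetical sketch of an absolutePath()-like function.
/// `base` defaults to getcwd(); being lazy, it is evaluated only when read.
string absolutePathSketch(string path, lazy string base = getcwd())
{
    if (isAbsolute(path))
        return path;          // base (and thus getcwd()) is never evaluated
    return buildPath(base, path);
}
```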


 * absolutePath and others use string, others use generic characters.
 Why? (This is my strongest comment.)

absolutePath() and relativePath() depend, in the majority of cases, on 
the result of getcwd(), which returns string.  expandTilde() depends on 
the contents of $HOME or /etc/passwd, which will both also be of string 
type.  I have tried keeping the number of allocations in this module at 
an absolute minimum, and decided that if any UTF-8/16/32 transcoding is 
to happen, it should happen at the explicit request of the user, not 
behind the scenes.
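The split described here can be sketched as follows. Functions templated on the character type accept wstring directly. When a function only returns string, the caller converts the result explicitly, so the allocation is visible. The helper underTmp is hypothetical and merely stands in for a string-only function such as absolutePath().

```d
import std.conv : to;
import std.path : baseName, buildPath;

/// Hypothetical string-only helper, standing in for functions whose
/// result ultimately comes from a string source such as getcwd().
string underTmp(string name)
{
    return buildPath("tmp", name);
}

unittest
{
    // Generic functions work on any character width with no transcoding.
    assert(baseName("dir/file"w) == "file"w);

    // Converting a string-only result happens only at the caller's request.
    wstring w = to!wstring(underTmp("x"));
    assert(w == buildPath("tmp"w, "x"w));
}
```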


 * filenameCharCmp and filenameCmp -> why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the 
corresponding signed type is long.  filenameCmp() returns long because 
filenameCharCmp() does.
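The reasoning behind long can be checked directly. In D, dchar operands are promoted to uint, so a raw a - b wraps around instead of going negative. The signed type wide enough for any uint difference is long. Here is a minimal sketch; the name charDiffLong is hypothetical.

```d
// Subtracting two dchars yields a uint (dchar promotes to uint), so the
// naive difference can never be negative.
static assert(is(typeof(dchar.init - dchar.init) == uint));

/// Hypothetical sketch: widen both operands before subtracting so that
/// a < b produces a negative result.
long charDiffLong(dchar a, dchar b) pure nothrow @safe @nogc
{
    return cast(long) a - cast(long) b;
}
```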


 * Example in expandTilde uses odd ALL_UPPERCASE variable names.

Ah, that would be an example from the old std.path.  Can't believe I 
didn't spot this before.  Will fix.


 Comments on the implementation:
 
 * We're increasingly moving towards consolidating imports into one.

Seriously?  I haven't noticed this.  While I will certainly follow the 
Phobos conventions, it would be nice to know the rationale for this.  Is 
it simply for vertical compactness?  As you may have noticed, I'm not as 
worried about that as you are. ;)

Arguments in favour of separate imports are, in my opinion:

 - Adding/removing imports is easier.
 - Seeing which modules are imported is easier.
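For comparison, the two styles look like this in D. The helper numberedFile is hypothetical and exists only so that the imports are used. The block repeats the same imports purely for side-by-side illustration, which D permits.

```d
// Consolidated style: one import declaration naming several modules.
import std.conv, std.path;

// Separate style: one declaration per module, so each line can be added,
// removed or reviewed on its own.
import std.conv;
import std.path;

/// Hypothetical helper that uses both imported modules.
string numberedFile(string dir, int n)
{
    return buildPath(dir, "file" ~ to!string(n));
}
```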


 * "." is hardcoded as a symbol for the current dir.

Yes.  Unlike dir and path separators, the symbols "." and ".." are 
uniform across all platforms.

You may also have noticed that I've removed the curdir and pardir 
constants, defined in the old std.path as "." and "..", respectively.  
Their utter uselessness was pointed out at some point during the review, 
and no one could provide a compelling use case (actually, no one could 
provide *any* use case) for them.
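Because "." and ".." are the same on every platform, plain literals work wherever curdir and pardir were used before. This is a small sketch with a hypothetical helper, isDotOrDotDot. It also shows that std.path's buildNormalizedPath resolves these literals when normalizing.

```d
import std.path : buildNormalizedPath;

/// Hypothetical helper: true for the "current" and "parent" directory names.
bool isDotOrDotDot(const(char)[] name) pure nothrow @safe @nogc
{
    return name == "." || name == "..";
}

unittest
{
    assert(isDotOrDotDot(".") && isDotOrDotDot(".."));
    assert(!isDotOrDotDot("..."));
    // std.path itself resolves the literals when normalizing.
    assert(buildNormalizedPath("a/./b/..") == "a");
}
```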


 * Would be interesting to figure what it would take to make pathSplitter
 reuse splitter.

It would be interesting, yes, but due to the various corner cases it 
handles, and the special nature of the front-most segment of an absolute/
rooted path, I suspect it will be impossible.  At the very least, it 
won't be worth it.
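The corner cases are easy to see side by side. A plain separator split, wrapped here in a hypothetical naiveSegments, produces empty fields for the root, for doubled separators and for a trailing separator. In contrast, pathSplitter keeps the root as its own segment and skips the empty ones. The pathSplitter assertion is guarded for POSIX, where this behaviour is documented.

```d
import std.algorithm : equal, splitter;
import std.path : pathSplitter;

/// Hypothetical naive alternative: split on '/' only.
auto naiveSegments(string path)
{
    return path.splitter('/');
}

unittest
{
    // Empty fields appear for the root, "//" and the trailing "/".
    assert(equal(naiveSegments("/usr//lib/"), ["", "usr", "", "lib", ""]));
    // pathSplitter treats the root specially and drops empty segments.
    version (Posix)
        assert(equal(pathSplitter("/usr//lib/"), ["/", "usr", "lib"]));
}
```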


 * Misalignment in lines 2195--2238. I think you don't need the extra
 scope there anyway, but if you do, don't make a special rule for that
 case - obey normal brace indentation.

This code is not mine -- it's from the old std.path -- and I didn't 
notice the extra braces until now.  I don't think there is a good reason 
to have them there, so I'll try to remove them.

Thanks for the review!

-Lars

Aug 11 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 I am replying to this in a new thread, so as not to pollute the voting
 thread.  (Speaking from experience as a review manager, noise in the
 voting thread makes it harder to count votes.)
 
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * "use std.file.isDir and std.file.exists" -> use the XREF macro to
 generate cross-reference links.

 
 Cool, didn't know about that one.  Will fix.  Should it also be used to
 generate internal references (such as linking to filenameCharCmp in the
 filenameCmp documentation) or is there a separate macro for this?

XREF deals with linking to functions in other Phobos modules. CXREF deals with 
linking to functions in druntime modules. LREF deals with linking to functions 
within a module.
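As an illustration of these macros, here are Ddoc comments on two hypothetical functions, isExistingDir and isExistingFile. The argument form assumed for XREF is the module name without the "std." prefix, followed by the symbol. That form is an assumption and may not match later Phobos versions.

```d
import std.file : exists, isDir, isFile;

/**
 * Returns true if `path` names an existing directory.
 *
 * Links to other Phobos modules use XREF, e.g. $(XREF file, exists) and
 * $(XREF file, isDir); links within this module use LREF:
 * see also $(LREF isExistingFile).
 */
bool isExistingDir(string path)
{
    return exists(path) && isDir(path);
}

/// Returns true if `path` names an existing regular file.
/// See also $(LREF isExistingDir).
bool isExistingFile(string path)
{
    return exists(path) && isFile(path);
}
```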

 * filenameCharCmp and filenameCmp -> why long and not int?

 
 filenameCharCmp() returns a-b, and since a and b are dchars, the
 corresponding signed type is long.  filenameCmp() returns long because
 filenameCharCmp() does.

I'd argue that you should just cast it to int and return int. All the various 
compare functions promise is whether the return value is less than, equal to, 
or greater than 0. Relying on the exact value is wrong. And normally such 
functions return int. So, I don't see any reason why these shouldn't be changed 
to return int.
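Since callers may rely only on the sign, a comparison need not return a difference at all. The sketch below uses a common idiom under the hypothetical name charCmpSign; it is not what std.path does. It returns exactly -1, 0 or 1, which avoids the question of how wide a difference can be.

```d
/// Hypothetical sketch: three-way comparison that never subtracts.
/// (a > b) and (a < b) are bools, which promote to int.
int charCmpSign(dchar a, dchar b) pure nothrow @safe @nogc
{
    return (a > b) - (a < b);
}

unittest
{
    // Callers test only the sign, never the magnitude.
    assert(charCmpSign('a', 'b') < 0);
    assert(charCmpSign('b', 'b') == 0);
    assert(charCmpSign('c', 'b') > 0);
}
```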

 * Example in expandTilde uses odd ALL_UPPERCASE variable names.

 
 Ah, that would be an example from the old std.path.  Can't believe I
 didn't spot this before.  Will fix.
 
 Comments on the implementation:
 
 * We're increasingly moving towards consolidating imports into one.

 
 Seriously?  I haven't noticed this.  While I will certainly follow the
 Phobos conventions, it would be nice to know the rationale for this.  Is
 it simply for vertical compactness?  As you may have noticed, I'm not as
 worried about that as you are. ;)
 
 Arguments in favour of separate imports are, in my opinion:
 
  - Adding/removing imports is easier.
  - Seeing which modules are imported is easier.

A large portion of Phobos consolidates imports, but not all of it. None of the 
code I've written consolidates them unless the module already did, since I much 
prefer separate imports, but I know that Andrei favors using one import statement. We 
don't really have an official rule on the matter though. I think that the only 
time that it's come up is when Andrei has mentioned it for one reason or 
another, which hasn't happened very often. I'm not sure what the opinions of 
any of the other Phobos devs are on the matter. Personally, I _much_ prefer 
separate imports per module.

- Jonathan M Davis

Aug 11 2011

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

On Thu, 11 Aug 2011 00:27:54 -0700, Jonathan M Davis wrote:
 On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * filenameCharCmp and filenameCmp -> why long and not int?

 
 filenameCharCmp() returns a-b, and since a and b are dchars, the
 corresponding signed type is long.  filenameCmp() returns long because
 filenameCharCmp() does.

 
 I'd argue that you should just cast it to int and return int. All the
 various compare functions promise is whether the return value is less
 than, equal to, or greater than 0. Relying on the exact value is wrong.
 And normally such functions return int. So, I don't see any reason why
 these shouldn't be changed to return int.

But what do we gain by making it an int?  long just seems more natural in 
this case, IMO.

-Lars

Aug 13 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/13/11 7:20 AM, Lars T. Kyllingstad wrote:
 On Thu, 11 Aug 2011 00:27:54 -0700, Jonathan M Davis wrote:
 On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * filenameCharCmp and filenameCmp ->  why long and not int?

 filenameCharCmp() returns a-b, and since a and b are dchars, the
 corresponding signed type is long.  filenameCmp() returns long because
 filenameCharCmp() does.

 I'd argue that you should just cast it to int and return int. All the
 various compare functions promise is whether the return value is less
 than, equal to, or greater than 0. Relying on the exact value is wrong.
 And normally such functions return int. So, I don't see any reason why
 these shouldn't be changed to return int.

 But what do we gain by making it an int?  long just seems more natural in
 this case, IMO.

Unicode characters range from 0 through 1,114,111. So the most 
natural type of the difference is int.
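The range argument can be checked mechanically. dchar.max is 0x10FFFF (1,114,111), so once both operands are cast to int, their difference lies within plus or minus 1,114,111 and cannot overflow. Here is a sketch under the hypothetical name charCmpInt. It assumes valid code points and ignores any case folding.

```d
// Valid code points run from 0 to 0x10FFFF (1,114,111); both fit in an int.
static assert(dchar.max == 0x10FFFF);

/// Hypothetical sketch: for valid code points the difference ranges from
/// -1_114_111 to 1_114_111, so int is wide enough.
int charCmpInt(dchar a, dchar b) pure nothrow @safe @nogc
{
    return cast(int) a - cast(int) b;
}
```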

This would be the first time I'm seeing an API returning a ternary value 
as a long.

(Also, 64-bit machines can operate on two 32-bit integrals 
simultaneously (literally) so 32-bit integrals may be faster. Probably 
not a material advantage.)


Andrei

Aug 13 2011

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

On Sat, 13 Aug 2011 09:26:45 -0500, Andrei Alexandrescu wrote:

 On 8/13/11 7:20 AM, Lars T. Kyllingstad wrote:
 On Thu, 11 Aug 2011 00:27:54 -0700, Jonathan M Davis wrote:
 On Thursday, August 11, 2011 07:10:08 Lars T. Kyllingstad wrote:
 On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:
 * filenameCharCmp and filenameCmp ->  why long and not int?

 filenameCharCmp() returns a-b, and since a and b are dchars, the
 corresponding signed type is long.  filenameCmp() returns long
 because filenameCharCmp() does.

 I'd argue that you should just cast it to int and return int. All the
 various compare functions promise is whether the return value is less
 than, equal to, or greater than 0. Relying on the exact value is
 wrong. And normally such functions return int. So, I don't see any
 reason why these shouldn't be changed to return int.

 But what do we gain by making it an int?  long just seems more natural
 in this case, IMO.

 
 Unicode characters range from 0 through 1,114,111. So the most
 natural type of the difference is int.
 
 This would be the first time I'm seeing an API returning a ternary value
 as a long.
 
 (Also, 64-bit machines can operate on two 32-bit integrals
 simultaneously (literally) so 32-bit integrals may be faster. Probably
 not a material advantage.)

All right, you have convinced me.  int it is. :)

-Lars

Aug 13 2011
