
digitalmars.D - std.path review: update

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Based on your comments, I have made some changes to my std.path 
proposal.  A list of the changes I have made can be found at the 
following address (look at the commits dated 2011-07-17):

  https://github.com/kyllingstad/phobos/commits/std-path

I believe I have covered most of your requests, with a few exceptions:

Firstly, Jonathan argued very convincingly that the contents of the 
current std.path should be put back in, marked as "scheduled for 
deprecation".  I intend to do this when the review is over, if my 
submission gets accepted.  For now, ignore the bottommost deprecated: 
block.

Secondly, David and Jonathan suggested I optimise functions like 
setExtension() using ~= to append when possible.  I have tried doing so 
for setExtension(), and I'm not convinced the extra complexity is worth 
the relatively modest gain.  The specialised, optimised version can be 
found here:

  https://github.com/kyllingstad/phobos/blob/std-path/std/path.d#L529

Finally, there are some requests with which I don't personally agree.  
Therefore, I'd like to get more opinions before making any changes:

- Should I add toNativePath(), which replaces '/' with '\' on Windows and 
vice versa on POSIX?

- Should it be specified/documented whether a function returns "" or 
null?  Specifically, is it important that

    extension("foo") is null
    extension("foo.") !is null && extension("foo.") == ""

- Do people agree with Jonathan's views on function names?


As before, code and docs can be found here:

https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
http://www.kyllingen.net/code/new-std-path/phobos-prerelease/std_path.html

-Lars
Jul 17 2011
bearophile <bearophileHUGS lycos.com> writes:
Lars T. Kyllingstad:

 I believe I have covered most of your requests, with a few exceptions:
compatibleStrings is a template still. Bye, bearophile
Jul 17 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sun, 17 Jul 2011 17:43:42 -0400, bearophile wrote:

 Lars T. Kyllingstad:
 
 I believe I have covered most of your requests, with a few exceptions:
compatibleStrings is a template still.
I know. Sorry, forgot to mention that. For now, I'd like to keep it the way it is. I can't find any precedent in Phobos for turning these kinds of tests into CTFEable functions, and if compatibleStrings were to end up in std.traits, for instance, it would stand out as being different from everything else in there. If it is decided that it is better to write these tests as ordinary functions, that should probably be done throughout Phobos. -Lars
Jul 18 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
And it _should_ be a template. All of the stuff like that are templates. And I'm not even sure that it _can_ be a function. And even if it can, what would we gain by making it a function anyway? It's operating on types. It's of no use at runtime. It's a perfect candidate for an eponymous template. std.traits, std.range, etc. do this sort of thing in pretty much exactly the same way. There may be a cleaner way to write it than it currently is, but using an eponymous template like that is the correct thing to do. - Jonathan M Davis
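
For readers following along, here is a minimal sketch of what an eponymous template of this kind can look like. It is not the code from the std.path proposal: the body, the helper alias `CharOf`, and the exact semantics (every argument must be a string type with the same unqualified character type) are assumptions made for illustration.

```d
import std.traits : isSomeString, Unqual;

// Hypothetical helper: the unqualified character type of a string type S.
alias CharOf(S) = Unqual!(typeof(S.init[0]));

// Hypothetical sketch of a recursive eponymous template.
// It evaluates to true only if all arguments are string types
// that share one unqualified character type.
template compatibleStrings(Strings...) if (Strings.length > 0)
{
    static if (!isSomeString!(Strings[0]))
        enum compatibleStrings = false;
    else static if (Strings.length == 1)
        enum compatibleStrings = true;
    else static if (!isSomeString!(Strings[1]))
        enum compatibleStrings = false;
    else
        enum compatibleStrings =
            is(CharOf!(Strings[0]) == CharOf!(Strings[1]))
            && compatibleStrings!(Strings[1 .. $]);
}

void main()
{
    static assert( compatibleStrings!(char[], const(char)[], string));
    static assert( compatibleStrings!(wchar[], wstring));
    static assert(!compatibleStrings!(char[], wchar[]));
    static assert(!compatibleStrings!(int[], const(int)[]));
}
```

The result is a compile-time constant, so it can be used directly in a template constraint such as `if (compatibleStrings!(S1, S2))`.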
Jul 18 2011
bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 And it _should_ be a template. All of the stuff like that are templates. And 
 I'm not even sure that it _can_ be a function. And even if it can, what would 
 we gain by making it a function anyway? It's operating on types. It's of no 
 use at runtime. It's a perfect candidate for an eponymous template. 
 std.traits, std.range, etc. do this sort of thing in pretty much exactly the 
 same way. There may be a cleaner way to write it than it currently is, but 
 using an eponymous template like that is the correct thing to do.
This seems to work:

    import std.traits: isSomeChar, Unqual, isSomeString;

    bool compatibleStrings(Strings...)() if (Strings.length) {
        static if (isSomeString!(Strings[0])) {
            alias Unqual!(typeof(Strings[0].init[0])) TC;
            foreach (s; Strings[1 .. $])
                static if (isSomeString!s && !is(TC == Unqual!(typeof(s.init[0]))))
                    return false;
            return true;
        } else
            return false;
    }

    version (unittest) {
        static assert (compatibleStrings!(char[], const(char)[], string)());
        static assert (compatibleStrings!(wchar[], const(wchar)[], wstring)());
        static assert (compatibleStrings!(dchar[], const(dchar)[], dstring)());
        static assert (!compatibleStrings!(int[], const(int)[], immutable(int)[])());
        static assert (!compatibleStrings!(char[], wchar[])());
        static assert (!compatibleStrings!(char[], dstring)());
    }

    void main() {}

I have written tons of such things in dlibs1, and generally I have seen that recursive templates are slower and need more RAM than similar functions.

Bye,
bearophile
Jul 18 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
Okay. Yes, you could do that. But what you're doing is basically the same as the eponymous template except that it's saving the value in a function so that it can be called at runtime. The gain is 0 and potentially confusing. It's no better than

 bool compatibleStringsFunc(Strings...)()
 {
 	enum retval = compatibleStrings!Strings;
 	return retval;
 }

But you _did_ find a way to turn it into a function.

- Jonathan M Davis
Jul 18 2011
bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 But what you're doing is basically the same as 
 the eponymous template except that it's saving the value in a function so 
 that it can be called at runtime. The gain is 0 and potentially confusing.
 It's no better than
 
 bool compatibleStringsFunc(Strings...)()
 {
 	enum retval = compatibleStrings!Strings;
 	return retval;
 }
The gain of my version is that it doesn't generate tons of templates. From my experience such functions lead to faster compile times and less memory used by the compiler compared to using recursive templates. And for me a foreach is usually less confusing than recursive templates :-) Bye, bearophile
Jul 18 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 7/17/11, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:
 - Should I add toNativePath(), which replaces '/' with '\' on Windows and
 vice versa on POSIX?
Actually I withdraw that feature request. Some tools will work with only forward slashes, others only backward slashes, but this is regardless of what platform they're on. E.g. some tools don't work with forward slashes, while GIT doesn't work with backward slashes when running on Windows. I think .replace(r"\", "/") and .replace("/", r"\") are good enough, but maybe an alias to each version wouldn't be bad. E.g. "toForwardSlash" and "toBackslash". It's not hard to define this in our own code, so it's not really a feature request.
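
The two one-line helpers suggested above could be sketched as follows. The names `toForwardSlash` and `toBackslash` come from the post itself; they are hypothetical and not part of the std.path proposal. The sketch uses `std.array.replace` and is purely textual, so it applies regardless of platform.

```d
import std.array : replace;

// Hypothetical helpers; the names are the ones floated in the post above.
string toForwardSlash(string path)
{
    return path.replace(`\`, "/");
}

string toBackslash(string path)
{
    return path.replace("/", `\`);
}

void main()
{
    assert(toForwardSlash(`C:\dir\file.txt`) == "C:/dir/file.txt");
    assert(toBackslash("dir/sub/file.txt") == `dir\sub\file.txt`);
}
```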
Jul 17 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 18 Jul 2011 00:24:30 +0200, Andrej Mitrovic wrote:

 On 7/17/11, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:
 - Should I add toNativePath(), which replaces '/' with '\' on Windows
 and vice versa on POSIX?
Actually I withdraw that feature request. Some tools will work with only forward slashes, others only backward slashes, but this is regardless of what platform they're on.
Noted. :) -Lars
Jul 18 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 17 Jul 2011 18:24:30 -0400, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 7/17/11, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:
 - Should I add toNativePath(), which replaces '/' with '\' on Windows  
 and
 vice versa on POSIX?
Actually I withdraw that feature request. Some tools will work with only forward slashes, others only backward slashes, but this is regardless of what platform they're on. E.g. some tools don't work with forward slashes, while GIT doesn't work with backward slashes when running on Windows. I think .replace(r"\", "/") and .replace("/", r"\") are good enough, but maybe an alias to each version wouldn't be bad. E.g. "toForwardSlash" and "toBackslash". It's not hard to define this in our own code, so it's not really a feature request.
Hum... I wonder if normalize should do this... Is normalize supposed to create a canonical path? If so, then this needs to happen. -Steve
Jul 18 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
normalize does this on Windows, where '/' is also a directory separator, but not on POSIX, where '\' is an ordinary filename character.

I am not entirely sure what the exact definition of "canonical path" is, but according to some it entails resolving symlinks. normalize does not do this, but it does everything else:

- resolves . and .. to the extent possible
- collapses redundant directory separators
- changes '/' to '\' on Windows

-Lars
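
The behaviour described here can be tried out with present-day Phobos, on the assumption that `std.path.buildNormalizedPath` is the closest current counterpart of the `normalize` function under review (that name is not used in this thread). The sketch is string manipulation only; symlinks are not resolved.

```d
import std.path : buildNormalizedPath;

void main()
{
    version (Posix)
    {
        // Resolves "." and "..", collapses repeated separators.
        assert(buildNormalizedPath("/foo/./bar/..//baz/") == "/foo/baz");
        // ".." is resolved only to the extent possible.
        assert(buildNormalizedPath("../foo/.") == "../foo");
    }
    version (Windows)
    {
        // On Windows '/' is also a separator and is changed to '\'.
        assert(buildNormalizedPath(`c:\foo\.\bar/..\\baz\`) == `c:\foo\baz`);
    }
}
```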
Jul 18 2011
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
OK, this is what I meant. By canonical path, I mean I should be able to take two paths that point to the same filename and normalize should output the same string for both. I agree that the posix version should not replace \ with /, since that's a Windows specific issue. I realize there are some limitations when all you are doing is string manipulation. For example ~steves/blah resolves to the canonical path /home/steves/blah. Same thing with symlinks. I guess normalize is the best term for it, don't want to confuse it with full canonical. -Steve
Jul 18 2011
Jesse Phillips <jessekphillips+d gmail.com> writes:
On Sun, 17 Jul 2011 21:27:41 +0000, Lars T. Kyllingstad wrote:

 - Should I add toNativePath(), which replaces '/' with '\' on Windows
 and vice versa on POSIX?
I'm not sure of my opinion on this. It seems like a useful idea, but as Andrej points out it may just cause other issues.
 - Should it be specified/documented whether a function returns "" or
 null?  Specifically, is it important that
 
     extension("foo") is null
     extension("foo.") !is null && extension("foo.") == ""
I don't think it is important, but probably should be documented.
 - Do people agree with Jonathan's views on function names?
I think I did.
Jul 17 2011
Brian Schott <brian-schott cox.net> writes:
The documentation comments for driveName say that the return value will
be an empty string in some circumstances, but the code and unit tests
both say that the behavior is to return null.
Jul 17 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 17 July 2011 22:08:27 Brian Schott wrote:
 The documentation comments for driveName say that the return value will
 be an empty string in some circumstances, but the code and unit tests
 both say that the behavior is to return null.
The fun part with that is that "" == null and a null string is empty per std.array.empty, so it _is_ the empty string. The only difference is that "" !is null. So, if the function says that it returns null, then it needs to return null. Since it says that it returns the empty string, it could return either. Now, in spite of all that, there's still a problem since the tests verify that the return value is null, not empty. Either the documentation should say that it returns null, or the tests should be checking for empty, not null. But still, the documentation isn't incorrect. And the tests are perfectly valid, but they really shouldn't be testing for is null instead of empty when the function is supposed to return empty. - Jonathan M Davis
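
The distinctions drawn above can be checked with a few assertions. This is a small sketch, not code from the proposal; it imports `empty` from `std.range.primitives`, which is where current Phobos defines it for arrays.

```d
import std.range.primitives : empty;

void main()
{
    string n = null;
    string e = "";

    assert(n == e);              // equal by value: both have length 0
    assert(n is null);           // identity test: null pointer, length 0
    assert(e !is null);          // the literal "" has a non-null pointer
    assert(n.empty && e.empty);  // empty treats them alike
    assert(e == null);           // also true, which is why `== null` misleads

    n ~= "abc";                  // appending to a null string works
    assert(n == "abc");
}
```

Only the `is` / `!is` tests tell the two values apart; `==` and `empty` do not.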
Jul 17 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Pending a decision on the null vs. empty issue, I have now standardised on using empty() for testing whether functions return empty strings. -Lars
Jul 18 2011
torhu <no spam.invalid> writes:
I'd like to make a case for null as the 'nothing here' value. The advantage of using null is that all possible ways of testing for 'nothingness' (is, ==, as a boolean condition, empty range) will work. But if you return an empty string, you can't do 'str is null', because that will be false. With null there's just no doubt, and no way to get the test wrong. As far as I can tell by the testing I've done, you can use a null string in every way that you can use an empty string, even append to it with ~=. The distinction between null and empty strings is significant in C and Java, but in D it's not, and the tiny difference that actually exists mainly serves to confuse people. It doesn't help that the actual differences are largely undocumented either. One difference is that a statically allocated empty string is null terminated, but I think that can be safely ignored in the case of return values. By the way, did you read my post in the other thread?
Jul 18 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
True, but the question was not whether one should use null or "" for the "nothing here" return value of a function. The question was whether the function returning null should mean something different than it returning "".
 By the way, did you read my post in the other thread?
Yes, I read it, but I forgot to answer. Sorry about that. I've answered now. -Lars
Jul 18 2011
torhu <no spam.invalid> writes:
I meant to imply that null and empty should not be used to mean two different things, sorry if I didn't make myself clear. AFAIK, none of the Phobos functions that take string arguments care about the difference. If the length is zero, the pointer value is ignored. In light of this, I don't know what different meanings null and empty would or should have.
Jul 18 2011
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
There are definitely situations where it is valuable to differentiate between null and empty, but in the case of D arrays, they really aren't designed for it, because nearly everything in the language treats them as being the same thing. There may be some value in differentiating them in spite of that, but it doesn't generally work very well. One of the few places would be the return value of a function. So, if there could reasonably be a difference between "" and null for the return value of a function, then it could be reasonable to have null mean something different than "". But the truth is that that's going to be error prone, because people are likely to use == null instead of is null, not realizing that == null doesn't do what they want (in fact, arguably, == null merits a warning). So, if there's no clear gain in returning null, the documentation should just say that it returns empty, and then it doesn't matter whether it returns "" or null. It _is_ a bit of a conundrum though. I'm not sure that making null and "" virtually identical was ultimately a good idea, but we're stuck with it at this point. - Jonathan M Davis
Jul 18 2011
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/18/11 7:23 AM, torhu wrote:
I'd like to make a case for null as the 'nothing here' value. The advantage of using null is that all possible ways of testing for 'nothingness' (is, ==, as a boolean condition, empty range) will work. But if you return an empty string, you can't do 'str is null', because that will be false. With null there's just no doubt, and no way to get the test wrong.
Note that there are two aspects: generating 'nothing here' values, and testing for 'nothing here'. In keeping with the "be generous with what you receive and conservative with what you send" mantra, good functions should test string inputs with str.empty and return null strings. Andrei
Jul 18 2011
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 18 Jul 2011 09:38:08 -0500, Andrei Alexandrescu wrote:

Note that there are two aspects: generating 'nothing here' values, and testing for 'nothing here'.
Some have argued that there is an extra dimension to this, namely the distinction between "nothing here" and "something here, but that something is an empty string". I am not convinced we should make that distinction.
 In keeping with the "be generous with what you receive and conservative
 with what you send" mantra, good functions should test string inputs
 with str.empty and return null strings.
The specific example which spurred the debate was the following: While there is no doubt that extension("foo") should return null, Vladimir Panteleev argued that extension("foo.") should be *specified* to return "" (specifically, an empty slice from the end of the input string) to indicate that there is an "empty extension". I disagree, I don't think null and "" should have different semantics. The fact that extension() currently *does* behave as Vladimir wants is, in my opinion, an implementation detail. Note that extension() seems to be the only function for which the controversy has arisen so far, so it may not be worth taking this discussion too far. -Lars
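
To make the disputed behaviour concrete, here is a minimal sketch of a function that returns null when there is no dot and an empty, non-null slice of the input when the name ends in a dot. The function `extOf` is hypothetical and is not the proposal's extension(); it ignores directory separators and other details a real implementation would handle.

```d
import std.string : lastIndexOf;

// Hypothetical stand-in for extension(); illustration only.
string extOf(string name)
{
    auto i = name.lastIndexOf('.');
    if (i < 0)
        return null;             // no dot at all: "nothing here"
    return name[i + 1 .. $];     // may be an empty, non-null slice of the input
}

void main()
{
    assert(extOf("foo") is null);
    assert(extOf("foo.") !is null && extOf("foo.") == "");
    assert(extOf("foo.txt") == "txt");
}
```

A caller that tests the result with `empty` or `== ""` sees no difference between the first two cases; only `is null` exposes it, which is the crux of the disagreement.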
Jul 18 2011
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 18 Jul 2011 18:07:12 +0300, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 The fact that extension() currently *does* behave as Vladimir wants is,
 in my opinion, an implementation detail.
Is it still an implementation detail if it's documented behavior? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Jul 18 2011
parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
Sorry, I thought you meant the old getExt().
Jul 18 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 18 Jul 2011 08:23:18 -0400, torhu <no spam.invalid> wrote:

 On 18.07.2011 11:42, Lars T. Kyllingstad wrote:
 On Sun, 17 Jul 2011 22:38:43 -0700, Jonathan M Davis wrote:

  On Sunday 17 July 2011 22:08:27 Brian Schott wrote:
  The documentation comments for driveName say that the return value  
 will
  be an empty string in some circumstances, but the code and unit tests
  both say that the behavior is to return null.
The fun part with that is that "" == null and a null string is empty per std.array.empty, so it _is_ the empty string. The only difference is that "" !is null. So, if the function says that it returns null, then it needs to return null. Since it says that it returns the empty string, it could return either. Now, in spite of all that, there's still a problem since the tests verify that the return value is null, not empty. Either the documentation should say that it returns null, or the tests should be checking for empty, not null. But still, the documentation isn't incorrect. The tests are perfectly valid, but they really shouldn't be testing for is null instead of empty when the function is supposed to return empty.
Pending a decision on the null vs. empty issue, I have now standardised on using empty() for testing whether functions return empty strings.
I'd like to make a case for null as the 'nothing here' value. The advantage of using null is that all possible ways of testing for 'nothingness' (is, ==, as a boolean condition, empty range) will work. But if you return an empty string, you can't do 'str is null', because that will be false. With null there's just no doubt, and no way to get the test wrong.
The one that's kind of nice is the if(path.extension), which reads not only much better than if(path.extension == null), but it's a very common idiom in many languages (using if to test a string's emptiness). People are likely to get this wrong (in fact, it may make sense for *all* empty arrays to evaluate as false for an if condition). I personally think if there's no real difference, returning null is the better option based on these points. However, if there is some performance/maintenance advantage to not returning null, then just return an empty non-null array and specify in the API docs that the function returns an empty string. -Steve
Jul 18 2011
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sun, 17 Jul 2011 21:27:41 +0000, Lars T. Kyllingstad wrote:

 Based on your comments, I have made some changes to my std.path
 proposal.  A list of the changes I have made can be found at the
 following address (look at the commits dated 2011-07-17):
 
   https://github.com/kyllingstad/phobos/commits/std-path
 
 I believe I have covered most of your requests, with a few exceptions:
It seems I forgot about the CTFEability tests. I'll fix that too, and push the updated code later today. -Lars
Jul 18 2011
parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 18 Jul 2011 10:05:07 +0000, Lars T. Kyllingstad wrote:

 On Sun, 17 Jul 2011 21:27:41 +0000, Lars T. Kyllingstad wrote:
 
 Based on your comments, I have made some changes to my std.path
 proposal.  A list of the changes I have made can be found at the
 following address (look at the commits dated 2011-07-17):
 
   https://github.com/kyllingstad/phobos/commits/std-path
 
 I believe I have covered most of your requests, with a few exceptions:
It seems I forgot about the CTFEability tests. I'll fix that too, and push the updated code later today.
Done. Most functions were CTFEable without any modifications (thanks, Don!). :) The exceptions are relativePath (because of std.algorithm.cmp) and expandTilde (which is strictly a run-time function). -Lars
Jul 18 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 17 Jul 2011 17:27:41 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 Based on your comments, I have made some changes to my std.path
 proposal.  A list of the changes I have made can be found at the
 following address (look at the commits dated 2011-07-17):

   https://github.com/kyllingstad/phobos/commits/std-path
This is a review of the docs/design. I'll review the code separately:

basename's standards section says: (with suitable adaptions for Windows paths)

adaptions => adaptations

This occurs twice.

In driveName:

Should std.path handle UNC paths? i.e. \\servername\share\path (I think if it does, it should specify \\servername\share as the drive)

joinPath:

Does this normalize the paths? For example:

joinPath("/home/steves", "../lars") => /home/steves/../lars or /home/lars ?

If so, the docs should reflect that. If not, maybe it should :) If it doesn't, at least the docs should state that it doesn't.

pathSplitter:

I think this should be a bi-directional range (no technical limitation I can think of).

fcmp:

"On Windows, fcmp is an alias for std.string.icmp, which yields a case insensitive comparison. On POSIX, it is an alias for std.algorithm.cmp, i.e. a case sensitive comparison."

What about comparing c:/foo with c:\foo? This isn't going to be equal with icmp.

expandTilde:

I've commented on expandTilde from the other posts, but if it is kept a posix-only function, the documentation should reflect that.
Jul 18 2011
parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:

 On Sun, 17 Jul 2011 17:27:41 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:
 
 Based on your comments, I have made some changes to my std.path
 proposal.  A list of the changes I have made can be found at the
 following address (look at the commits dated 2011-07-17):

   https://github.com/kyllingstad/phobos/commits/std-path
This is a review of the docs/design. I'll review the code separately: basename's standards section says: (with suitable adaptions for Windows paths) adaptions => adaptations
Oops. Thanks!
 This occurs twice.
Copy+paste. :)
 In driveName:
 
 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now:

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"]));

I guess you would rather have that

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"]));

then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?

As I understand it, some POSIX systems also mount network drives using similar paths. Does anyone know whether "//foo" is a valid path on these systems, or does it have to be "//foo/bar"?
 joinPath:
 
 Does this normalize the paths?  For example:
 
 joinPath("/home/steves", "../lars") => /home/steves/../lars or
 /home/lars ?
 
 If so, the docs should reflect that.  If not, maybe it should :)  If it
 doesn't, at least the docs should state that it doesn't.
No, it doesn't, and I don't think it should. It is better to let the user choose whether they want the overhead of normalization by calling normalize() explicitly. I will specify this in the docs.
 pathSplitter:
 
 I think this should be a bi-directional range (no technical limitation I
 can think of).
It is more of a complexity vs. benefit thing, but as you are the second person to ask for this, I will look into it. A convincing use case would be nice, though. :)
 fcmp:
 "On Windows, fcmp is an alias for std.string.icmp, which yields a case
 insensitive comparison. On POSIX, it is an alias for std.algorithm.cmp,
 i.e. a case sensitive comparison."
 
 What about comparing c:/foo with c:\foo?  This isn't going to be equal
 with icmp.
I am a bit unsure what to do about the comparison functions (fcmp, pathCharMatch and globMatch). Aside from the issue with directory separators it is, as was pointed out by someone else, entirely possible to mount case-sensitive file systems on Windows and case-insensitive file systems on POSIX. (The latter is not uncommon on OSX, I believe.) I am open to suggestions.
 expandTilde:
 
 I've commented on expandTilde from the other posts, but if it is kept a
 posix-only function, the documentation should reflect that.
It does; look at the "Returns" section. Perhaps it should be moved to a more prominent location? -Lars
Jul 18 2011
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On 2011-07-18 11:25, Lars T. Kyllingstad wrote:
 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 On Sun, 17 Jul 2011 17:27:41 -0400, Lars T. Kyllingstad
 
 <public kyllingen.nospamnet> wrote:
 Based on your comments, I have made some changes to my std.path
 proposal. A list of the changes I have made can be found at the
 
 following address (look at the commits dated 2011-07-17):
 https://github.com/kyllingstad/phobos/commits/std-path
This is a review of the docs/design. I'll review the code separately: basename's standards section says: (with suitable adaptions for Windows paths) adaptions => adaptations
Oops. Thanks!
 This occurs twice.
Copy+paste. :)
 In driveName:
 
 Should std.path handle UNC paths? i.e. \\servername\share\path (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now: assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"])); I guess you would rather have that assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"])); then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own? As I understand it, some POSIX systems also mount network drives using similar paths. Does anyone know whether "//foo" is a valid path on these systems, or does it have to be "//foo/bar"?
 joinPath:
 
 Does this normalize the paths? For example:
 
 joinPath("/home/steves", "../lars") => /home/steves/../lars or
 /home/lars ?
 
 If so, the docs should reflect that. If not, maybe it should :) If it
 doesn't, at least the docs should state that it doesn't.
No, it doesn't, and I don't think it should. It is better to let the user choose whether they want the overhead of normalization by calling normalize() explicitly. I will specify this in the docs.
 pathSplitter:
 
 I think this should be a bi-directional range (no technical limitation I
 can think of).
It is more of a complexity vs. benefit thing, but as you are the second person to ask for this, I will look into it. A convincing use case would be nice, though. :)
 fcmp:
 "On Windows, fcmp is an alias for std.string.icmp, which yields a case
 insensitive comparison. On POSIX, it is an alias for std.algorithm.cmp,
 i.e. a case sensitive comparison."
 
 What about comparing c:/foo with c:\foo? This isn't going to be equal
 with icmp.
I am a bit unsure what to do about the comparison functions (fcmp, pathCharMatch and globMatch). Aside from the issue with directory separators it is, as was pointed out by someone else, entirely possible to mount case-sensitive file systems on Windows and case-insensitive file systems on POSIX. (The latter is not uncommon on OSX, I believe.) I am open to suggestions.
 expandTilde:
 
 I've commented on expandTilde from the other posts, but if it is kept a
 posix-only function, the documentation should reflect that.
It does; look at the "Returns" section. Perhaps it should be moved to a more prominent location?
I suggest that you do what I did in std.file (e.g. with getTimesWin). I put this at the very top of the ddoc comment: $(BLUE This function is Windows-Only.) or if it's only on Posix: $(BLUE This function is Posix-Only.) - Jonathan M Davis
Jul 18 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 18 Jul 2011 14:25:57 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 In driveName:

 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now: assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"])); I guess you would rather have that assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"])); then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?
It is and it isn't. It's *not* a normal directory, because only shares can be in that directory. In other words, the point at which a UNC path turns into normal directory structure is after the share name. An easy way to compare is, you can only map drive letters to shares, not to servers.
 As I understand it, some POSIX systems also mount network drives using
 similar paths.  Does anyone know whether "//foo" is a valid path on these
 systems, or does it have to be "//foo/bar"?
Typically, Linux uses URLs, i.e. smb://server/share

URL parsing is probably not in std.path's charter. However, I have used a command like:

    mount -t cifs //server/share /mnt/serverfiles

But this is only in very special contexts. In general I don't think //foo should be considered a server path on Posix systems.
 joinPath:

 Does this normalize the paths?  For example:

 joinPath("/home/steves", "../lars") => /home/steves/../lars or
 /home/lars ?

 If so, the docs should reflect that.  If not, maybe it should :)  If it
 doesn't, at least the docs should state that it doesn't.
No, it doesn't, and I don't think it should. It is better to let the user choose whether they want the overhead of normalization by calling normalize() explicitly. I will specify this in the docs.
In fact, if you do not normalize during the join, it's *more* overhead to normalize afterwards. If normalization is done while joining, then you only build one string. There's no need to build a non-normalized string, then build a normalized string based on that. Plus the data is only iterated once. I think it's at least worth an option, but I'm not going to hold back my vote based on this :)
 pathSplitter:

 I think this should be a bi-directional range (no technical limitation I
 can think of).
It is more of a complexity vs. benefit thing, but as you are the second person to ask for this, I will look into it. A convincing use case would be nice, though. :)
Well a path is more like a stack than a queue. You are usually operating more on the back side of it. To provide back and popBack makes a lot of sense to me. For example, to implement the command cd ../foo, you need to popBack the topmost directory.
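The stack-like use described above can be sketched as follows. This assumes a current Phobos in which std.path.pathSplitter is a bidirectional range (the behaviour requested here, not what the proposal offered at the time of this post); the asserts are guarded by version (Posix) because the root segment is spelled differently on Windows.

```d
import std.algorithm.comparison : equal;
import std.path : pathSplitter;

void main()
{
    version (Posix)
    {
        auto parts = pathSplitter("/home/steves/src");

        // The last segment is available without walking the whole path.
        assert(parts.back == "src");

        // "cd .." amounts to dropping the last segment.
        parts.popBack();
        assert(equal(parts, ["/", "home", "steves"]));
    }
}
```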
 fcmp:
 "On Windows, fcmp is an alias for std.string.icmp, which yields a case
 insensitive comparison. On POSIX, it is an alias for std.algorithm.cmp,
 i.e. a case sensitive comparison."

 What about comparing c:/foo with c:\foo?  This isn't going to be equal
 with icmp.
I am a bit unsure what to do about the comparison functions (fcmp, pathCharMatch and globMatch). Aside from the issue with directory separators it is, as was pointed out by someone else, entirely possible to mount case-sensitive file systems on Windows and case-insensitive file systems on POSIX. (The latter is not uncommon on OSX, I believe.) I am open to suggestions.
It's definitely something to think about. At the very least, I think the default file system case sensitivity should be mapped to a certain function. It doesn't hurt to expose the opposite sensitivity as an alternate (you need to implement both anyway). A template with all options defaulted for the current OS makes good sense I think.

Actually, expanding/renaming pathCharMatch provides a perfect way to default these: e.g.:

    version(Windows)
    {
        enum defaultOSSensitivity = false;
        enum defaultOSDirSeps = `\/`;
    }
    else version(Posix)
    {
        enum defaultOSSensitivity = true;
        enum defaultOSDirSeps = "/";
    }

    // replaces pathCharMatch
    int pathCharCmp(bool caseSensitive = defaultOSSensitivity, string dirseps = defaultOSDirSeps)(dchar a, dchar b);

    int fcmp(alias pred = "pathCharCmp(a, b)", S1, S2)(S1 filename1, S2 filename2);

Anyone who wants to do alternate comparisons is free to do so using other options from pathCharCmp.
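One way the declarations above might be filled in is sketched below. The function bodies are assumptions, not part of the proposal, and the predicate is passed as an alias to a function rather than as the string "pathCharCmp(a, b)" used in the signature above, purely to keep the sketch short.

```d
import std.algorithm.searching : canFind;
import std.range.primitives : empty, front, popFront;
import std.uni : toLower;

version (Windows)
{
    enum defaultOSSensitivity = false;
    enum defaultOSDirSeps = `\/`;
}
else
{
    enum defaultOSSensitivity = true;
    enum defaultOSDirSeps = "/";
}

// Compares two path characters; any two directory separators are equal.
int pathCharCmp(bool caseSensitive = defaultOSSensitivity,
                string dirseps = defaultOSDirSeps)(dchar a, dchar b)
{
    if (dirseps.canFind(a) && dirseps.canFind(b))
        return 0;
    static if (!caseSensitive)
    {
        a = toLower(a);
        b = toLower(b);
    }
    return (a > b) - (a < b);
}

// Compares two file names character by character using pred.
int fcmp(alias pred = pathCharCmp!(), S1, S2)(S1 filename1, S2 filename2)
{
    for (; !filename1.empty && !filename2.empty;
         filename1.popFront(), filename2.popFront())
    {
        immutable c = pred(filename1.front, filename2.front);
        if (c != 0)
            return c;
    }
    // The shorter name sorts first; equal lengths compare equal.
    return cast(int) !filename1.empty - cast(int) !filename2.empty;
}

void main()
{
    // OS defaults.
    assert(fcmp("abc", "abc") == 0);
    assert(fcmp("abc", "abd") < 0);

    // Windows-style comparison requested explicitly, on any platform.
    alias winCmp = pathCharCmp!(false, `\/`);
    assert(fcmp!winCmp(`C:\Foo`, "c:/foo") == 0);

    // POSIX-style comparison requested explicitly.
    alias posixCmp = pathCharCmp!(true, "/");
    assert(fcmp!posixCmp("Foo", "foo") != 0);
}
```

With this shape, the c:/foo versus c:\foo case raised earlier compares equal under the Windows settings, and a caller on a case-insensitive POSIX file system can opt in to pathCharCmp!(false, "/") without a separate function.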
 expandTilde:

 I've commented on expandTilde from the other posts, but if it is kept a
 posix-only function, the documentation should reflect that.
It does; look at the "Returns" section. Perhaps it should be moved to a more prominent location?
Yes. It should say (Posix-only). I believe technically that it should fail to compile on Windows if it does not map to a "home" directory there. Note that as named, it's possible to confuse with expanding the DOS 8.3 name of a file, i.e. Progra~1 -Steve
Jul 18 2011
parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 18 Jul 2011 14:51:06 -0400, Steven Schveighoffer wrote:

 On Mon, 18 Jul 2011 14:25:57 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:
 
 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 In driveName:

 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now: assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"])); I guess you would rather have that assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"])); then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?
It is and it isn't.
Well, that certainly cleared things up. ;)
 It's *not* a normal directory, because only shares
 can be in that directory.  In other words, the point at which a UNC path
 turns into normal directory structure is after the share name.
 
 An easy way to compare is, you can only map drive letters to shares, not
 to servers.
Then driveName() should probably return the full share path. But, of the following asserts, which should pass?

    assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo\bar`);
    assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo`);

    assert (baseName(`\\foo\bar`) == `\\foo\bar`);
    assert (baseName(`\\foo\bar`) == "bar");

    assert (dirName(`\\foo\bar`) == `\\foo\bar`);
    assert (dirName(`\\foo\bar`) == `\\foo`);

Note that if you replace `\\foo\bar` with `c:\` in the above, the first assert in each pair will pass. Same with "/" on POSIX. Basically, that choice corresponds to treating `\\foo\bar` as a filesystem root.
 As I understand it, some POSIX systems also mount network drives using
 similar paths.  Does anyone know whether "//foo" is a valid path on
 these systems, or does it have to be "//foo/bar"?
Typically, linux uses URL's, i.e. smb://server/share URL parsing is probably not in std.path's charter. However, I have used a command like: mount -t cifs //server/share /mnt/serverfiles But this is only in very special contexts. In general I don't think //foo should be considered a server path on Posix systems.
I actually got a request on the Phobos list that std.path should support such paths. Furthermore, the POSIX standard explicitly mentions "//" paths (though it basically says it is implementation-defined whether to bother dealing with them).
 joinPath:

 Does this normalize the paths?  For example:

 joinPath("/home/steves", "../lars") => /home/steves/../lars or
 /home/lars ?

 If so, the docs should reflect that.  If not, maybe it should :)  If
 it doesn't, at least the docs should state that it doesn't.
No, it doesn't, and I don't think it should. It is better to let the user choose whether they want the overhead of normalization by calling normalize() explicitly. I will specify this in the docs.
In fact, if you do not normalize during the join, it's *more* overhead to normalize afterwards. If normalization is done while joining, then you only build one string. There's no need to build a non-normalized string, then build a normalized string based on that. Plus the data is only iterated once. I think it's at least worth an option, but I'm not going to hold back my vote based on this :)
If it doesn't turn out to be a huge undertaking, I think I'll replace joinPath() with a function buildPath() that takes an input range of path segments and joins them together, with optional normalization. Then, normalize(path) can be implemented as: buildPath(pathSplitter(path)); Does that sound sensible?
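To make the difference between plain joining and joining with normalization concrete, here is a small sketch. It assumes a current Phobos, where the two behaviours are exposed as std.path.buildPath and std.path.buildNormalizedPath; the second name is not part of the proposal as discussed here. The asserts are guarded by version (Posix) because the separator differs on Windows.

```d
import std.path : buildNormalizedPath, buildPath;

void main()
{
    version (Posix)
    {
        // Plain joining keeps the ".." segment as written.
        assert(buildPath("/home/steves", "../lars") == "/home/steves/../lars");

        // Joining with normalization resolves it while building the result.
        assert(buildNormalizedPath("/home/steves", "../lars") == "/home/lars");
    }
}
```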
 pathSplitter:

 I think this should be a bi-directional range (no technical limitation
 I can think of).
It is more of a complexity vs. benefit thing, but as you are the second person to ask for this, I will look into it. A convincing use case would be nice, though. :)
Well a path is more like a stack than a queue. You are usually operating more on the back side of it. To provide back and popBack makes a lot of sense to me. For example, to implement the command cd ../foo, you need to popBack the topmost directory.
Ok, I'll see what I can do about it. :)
 fcmp:
 "On Windows, fcmp is an alias for std.string.icmp, which yields a case
 insensitive comparison. On POSIX, it is an alias for
 std.algorithm.cmp, i.e. a case sensitive comparison."

 What about comparing c:/foo with c:\foo?  This isn't going to be equal
 with icmp.
I am a bit unsure what to do about the comparison functions (fcmp, pathCharMatch and globMatch). Aside from the issue with directory separators it is, as was pointed out by someone else, entirely possible to mount case-sensitive file systems on Windows and case-insensitive file systems on POSIX. (The latter is not uncommon on OSX, I believe.) I am open to suggestions.
It's definitely something to think about. At the very least, I think the default file system case sensitivity should be mapped to a certain function. It doesn't hurt to expose the opposite sensitivity as an alternate (you need to implement both anyway). A template with all options defaulted for the current OS makes good sense I think. Actually, expanding/renaming pathCharMatch provides a perfect way to default these: e.g.: version(Windows) { enum defaultOSSensitivity = false; enum defaultOSDirSeps = `\/`; } else version(Posix) { enum defaultOSSensitivity = true; enum defaultOSDirSeps = "/"; } // replaces pathCharMatch int pathCharCmp(bool caseSensitive = defaultOSSensitivity, string dirseps = defaultOSDirSeps)(dchar a, dchar b); int fcmp(alias pred = "pathCharCmp(a, b)", S1, S2)(S1 filename1, S2 filename2); Anyone who wants to do alternate comparisons is free to do so using other options from pathCharCmp.
Good idea. I'll probably implement something like that.
 expandTilde:

 I've commented on expandTilde from the other posts, but if it is kept
 a posix-only function, the documentation should reflect that.
It does; look at the "Returns" section. Perhaps it should be moved to a more prominent location?
Yes. It should say (Posix-only). I believe technically that it should fail to compile on Windows if it does not map to a "home" directory there. Note that as named, it's possible to confuse with expanding the DOS 8.3 name of a file, i.e. Progra~1
I agree. I'll put it inside a version(Posix) block. -Lars
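Combining the version(Posix) block with the ddoc marker suggested earlier in the thread could look roughly like this. expandHomeSketch is a hypothetical stand-in, not the proposal's expandTilde: it only handles a leading "~" or "~/" via the HOME environment variable and ignores the "~user" form.

```d
import std.algorithm.searching : startsWith;
import std.process : environment;

version (Posix)
{
    /++
        $(BLUE This function is Posix-Only.)

        Replaces a leading "~" with the value of the HOME environment
        variable. Hypothetical helper for illustration only.
     +/
    string expandHomeSketch(string path)
    {
        if (path != "~" && !path.startsWith("~/"))
            return path;
        auto home = environment.get("HOME");
        return home is null ? path : home ~ path[1 .. $];
    }
}

void main()
{
    version (Posix)
    {
        // Paths without a leading "~" are returned unchanged.
        assert(expandHomeSketch("foo/bar") == "foo/bar");
        // "~user" is deliberately not handled by this sketch.
        assert(expandHomeSketch("~user/bar") == "~user/bar");
    }
}
```

Because the declaration lives inside version (Posix), any call to it from Windows-only code fails to compile, which is the behaviour asked for above.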
Jul 20 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 20 Jul 2011 13:36:51 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Mon, 18 Jul 2011 14:51:06 -0400, Steven Schveighoffer wrote:

 On Mon, 18 Jul 2011 14:25:57 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:

 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 In driveName:

 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now: assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"])); I guess you would rather have that assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"])); then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?
It is and it isn't.
Well, that certainly cleared things up. ;)
It is in that if you open explorer and type in \\servername, it will give you a list of shares you can try. But I don't think it's a valid *path*, except in explorer. So my intuition is to declare it never a valid path. I'm not sure how \\server interacts with the low level functions of Windows (such as CreateFile). Some research/experimentation is probably warranted.
 It's *not* a normal directory, because only shares
 can be in that directory.  In other words, the point at which a UNC path
 turns into normal directory structure is after the share name.

 An easy way to compare is, you can only map drive letters to shares, not
 to servers.
Then driveName() should probably return the full share path. But, of the following asserts, which should pass? assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo\bar`); assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo`); assert (baseName(`\\foo\bar`) == `\\foo\bar`); assert (baseName(`\\foo\bar`) == "bar"); assert (dirName(`\\foo\bar`) == `\\foo\bar`); assert (dirName(`\\foo\bar`) == `\\foo`); Note that if you replace `\\foo\bar` with `c:\` in the above, the first assert in each pair will pass. Same with "/" on POSIX. Basically, that choice corresponds to treating `\\foo\bar` as a filesystem root.
Yes, I think this sounds right (pending research/experimentation cited above).
 As I understand it, some POSIX systems also mount network drives using
 similar paths.  Does anyone know whether "//foo" is a valid path on
 these systems, or does it have to be "//foo/bar"?
Typically, linux uses URL's, i.e. smb://server/share URL parsing is probably not in std.path's charter. However, I have used a command like: mount -t cifs //server/share /mnt/serverfiles But this is only in very special contexts. In general I don't think //foo should be considered a server path on Posix systems.
I actually got a request on the Phobos list that std.path should support such paths. Furthermore, the POSIX standard explicitly mentions "//" paths (though it basically says it is implementation-defined whether to bother dealing with them).
ls //root lists the contents of /root. I'd guess that opening //root with open() would simply open /root. Given that context, they should not be considered to be a server path IMO.
 joinPath:

 Does this normalize the paths?  For example:

 joinPath("/home/steves", "../lars") => /home/steves/../lars or
 /home/lars ?

 If so, the docs should reflect that.  If not, maybe it should :)  If
 it doesn't, at least the docs should state that it doesn't.
No, it doesn't, and I don't think it should. It is better to let the user choose whether they want the overhead of normalization by calling normalize() explicitly. I will specify this in the docs.
In fact, if you do not normalize during the join, it's *more* overhead to normalize afterwards. If normalization is done while joining, then you only build one string. There's no need to build a non-normalized string, then build a normalized string based on that. Plus the data is only iterated once. I think it's at least worth an option, but I'm not going to hold back my vote based on this :)
If it doesn't turn out to be a huge undertaking, I think I'll replace joinPath() with a function buildPath() that takes an input range of path segments and joins them together, with optional normalization. Then, normalize(path) can be implemented as: buildPath(pathSplitter(path)); Does that sound sensible?
That sounds good. -Steve
Jul 20 2011
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Wed, 20 Jul 2011 14:16:04 -0400, Steven Schveighoffer wrote:

 I'm not sure how \\server interacts with the low level functions of
 Windows (such as CreateFile).  Some research/experimentation is probably
 warranted.
Any .NET programmers out there? Can you please tell me what the following functions return?

    System.IO.Path.GetDirectoryName("\\foo\bar")
    System.IO.Path.GetPathRoot("\\foo\bar\baz")

-Lars
Jul 20 2011
parent reply Jussi Jumppanen <jussij zeusedit.com> writes:
Lars T. Kyllingstad Wrote:

 On Wed, 20 Jul 2011 14:16:04 -0400, Steven Schveighoffer wrote:

 Any .NET programmers out there?  Can you please tell me what the 
 following functions return?
 
   System.IO.Path.GetDirectoryName("\\foo\bar")
   System.IO.Path.GetPathRoot("\\foo\bar\baz")
This code:

    using System;

    namespace Test
    {
        static class Program
        {
            [STAThread]
            static void Main()
            {
                string test;

                // Verbatim strings, so that backslashes are not treated as escapes.
                test = @"\\foo\bar\";
                Console.WriteLine("System.IO.Path.GetDirectoryName(" + test + ")");
                Console.WriteLine(System.IO.Path.GetDirectoryName(test));

                test = @"\\foo\bar";
                Console.WriteLine("System.IO.Path.GetDirectoryName(" + test + ")");
                Console.WriteLine(System.IO.Path.GetDirectoryName(test));

                // Note: the label printed here says GetDirectoryName,
                // but the value printed is that of GetPathRoot.
                test = @"\\foo\bar\baz";
                Console.WriteLine("System.IO.Path.GetDirectoryName(" + test + ")");
                Console.WriteLine(System.IO.Path.GetPathRoot(test));
            }
        }
    }

produced this output:

    C:\temp>test.exe
    System.IO.Path.GetDirectoryName(\\foo\bar\)
    \\foo\bar
    System.IO.Path.GetDirectoryName(\\foo\bar)

    System.IO.Path.GetDirectoryName(\\foo\bar\baz)
    \\foo\bar

Cheers
Jussi
Jul 21 2011
parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Thu, 21 Jul 2011 03:36:37 -0400, Jussi Jumppanen wrote:

 Lars T. Kyllingstad Wrote:
 
 On Wed, 20 Jul 2011 14:16:04 -0400, Steven Schveighoffer wrote:

 Any .NET programmers out there?  Can you please tell me what the
 following functions return?
 
   System.IO.Path.GetDirectoryName("\\foo\bar")
   System.IO.Path.GetPathRoot("\\foo\bar\baz")
This code: using System; namespace Test { static class Program { [STAThread] static void Main() { string test; test = "\\foo\bar\"; Console.WriteLine("System.IO.Path.GetDirectoryName(" + test + ")"); Console.WriteLine(System.IO.Path.GetDirectoryName(test)); test = "\\foo\bar"; Console.WriteLine("System.IO.Path.GetDirectoryName(" + test + ")"); Console.WriteLine(System.IO.Path.GetDirectoryName(test)); test = "\\foo\bar\baz"; Console.WriteLine("System.IO.Path.GetDirectoryName(" + test + ")"); Console.WriteLine(System.IO.Path.GetPathRoot(test)); } } } produced this output: C:\temp>test.exe System.IO.Path.GetDirectoryName(\\foo\bar\) \\foo\bar System.IO.Path.GetDirectoryName(\\foo\bar) System.IO.Path.GetDirectoryName(\\foo\bar\baz) \\foo\bar Cheers Jussi
Thanks, this is very helpful. Now we know that MS's APIs treat \\foo\bar as a root directory, so we should do the same. This means that, once I get around to implementing it, the following asserts will pass on Windows:

    assert (baseName(`\\foo\bar`) == `\\foo\bar`);
    assert (dirName(`\\foo\bar`) == `\\foo\bar`);
    assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo\bar`);

This is analogous to the following on POSIX (where the behaviour mimics that of the basename and dirname shell utilities):

    assert (baseName("/") == "/");
    assert (dirName("/") == "/");
    assert (pathSplitter("/").front == "/");

-Lars
Jul 21 2011
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 20.07.2011 20:16, Steven Schveighoffer wrote:
 ls //root lists the contents of /root. I'd guess that opening //root
 with open() would simply open /root. Given that context, they should not
 be considered to be a server path IMO.
If that's true for the bare open() without going through possible translations in "ls", I'd guess that "//server/share" would look for a file/directory "share" in "/server", so std.path should treat it this way for posix, too. Sorry, if my previous comments in the phobos-list caused confusion, I must have confused the mount share with a directory specification.
Jul 21 2011
parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Thu, 21 Jul 2011 09:09:52 +0200, Rainer Schuetze wrote:

 On 20.07.2011 20:16, Steven Schveighoffer wrote:
 ls //root lists the contents of /root. I'd guess that opening //root
 with open() would simply open /root. Given that context, they should
 not be considered to be a server path IMO.
If that's true for the bare open() without going through possible translations in "ls", I'd guess that "//server/share" would look for a file/directory "share" in "/server", so std.path should treat it this way for POSIX, too. Sorry if my previous comments on the Phobos list caused confusion; I must have confused the mount share with a directory specification.
All right, I'll remove "//path" support again. That simplifies things for POSIX, at least. -Lars
Jul 21 2011
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Wed, 20 Jul 2011 17:36:51 +0000, Lars T. Kyllingstad wrote:

 On Mon, 18 Jul 2011 14:51:06 -0400, Steven Schveighoffer wrote:
 
 On Mon, 18 Jul 2011 14:25:57 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:
 
 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 In driveName:

 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now:

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"]));

I guess you would rather have that

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"]));

then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?
It is and it isn't.
Well, that certainly cleared things up. ;)
 It's *not* a normal directory, because only shares can be in that
 directory.  In other words, the point at which a UNC path turns into
 normal directory structure is after the share name.
 
 An easy way to compare is, you can only map drive letters to shares,
 not to servers.
Then driveName() should probably return the full share path. But, of the following asserts, which should pass?

    assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo\bar`);
    assert (pathSplitter(`\\foo\bar\baz`).front == `\\foo`);

    assert (baseName(`\\foo\bar`) == `\\foo\bar`);
    assert (baseName(`\\foo\bar`) == "bar");

    assert (dirName(`\\foo\bar`) == `\\foo\bar`);
    assert (dirName(`\\foo\bar`) == `\\foo`);

Note that if you replace `\\foo\bar` with `c:\` in the above, the first assert in each pair will pass. Same with "/" on POSIX. Basically, that choice corresponds to treating `\\foo\bar` as a filesystem root.
 As I understand it, some POSIX systems also mount network drives using
 similar paths.  Does anyone know whether "//foo" is a valid path on
 these systems, or does it have to be "//foo/bar"?
Typically, Linux uses URLs, i.e. smb://server/share. URL parsing is probably not in std.path's charter. However, I have used a command like:

    mount -t cifs //server/share /mnt/serverfiles

But this is only in very special contexts. In general I don't think //foo should be considered a server path on POSIX systems.
I actually got a request on the Phobos list that std.path should support such paths. Furthermore, the POSIX standard explicitly mentions "//" paths (though it basically says it is implementation-defined whether to bother dealing with them).
 joinPath:

 Does this normalize the paths?  For example:

 joinPath("/home/steves", "../lars") => /home/steves/../lars or
 /home/lars ?

 If so, the docs should reflect that.  If not, maybe it should :)  If
 it doesn't, at least the docs should state that it doesn't.
No, it doesn't, and I don't think it should. It is better to let the user choose whether they want the overhead of normalization by calling normalize() explicitly. I will specify this in the docs.
In fact, if you do not normalize during the join, it's *more* overhead to normalize afterwards. If normalization is done while joining, then you only build one string. There's no need to build a non-normalized string, then build a normalized string based on that. Plus the data is only iterated once. I think it's at least worth an option, but I'm not going to hold back my vote based on this :)
If it doesn't turn out to be a huge undertaking, I think I'll replace joinPath() with a function buildPath() that takes an input range of path segments and joins them together, with optional normalization. Then, normalize(path) can be implemented as:

    buildPath(pathSplitter(path));

Does that sound sensible?
Actually, I realise now, it doesn't. :) Since joinPath/buildPath needs to support path segments containing multiple directories, normalize would just be

    buildPath(path)

-Lars
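As an illustration of normalizing while joining, here is a rough sketch that resolves "." and ".." in the same pass that collects the segments, so only one result string is built. The name buildNormalized is hypothetical, it handles only POSIX-style '/' separators, and it is not the proposed buildPath() implementation.

```d
import std.algorithm.iteration : splitter;
import std.array : join;
import std.stdio : writeln;

/// Hypothetical sketch: joins path segments with '/' and resolves "." and
/// ".." while doing so. Segments may contain several directories each.
string buildNormalized(string[] segments...)
{
    bool absolute = false;
    string[] parts;

    foreach (seg; segments)
    {
        if (seg.length == 0) continue;

        // An absolute segment discards everything collected so far.
        if (seg[0] == '/')
        {
            absolute = true;
            parts = null;
        }

        foreach (name; seg.splitter('/'))
        {
            if (name.length == 0 || name == ".") continue;
            if (name == "..")
            {
                if (parts.length > 0 && parts[$ - 1] != "..")
                    parts = parts[0 .. $ - 1];  // step out of the last directory
                else if (!absolute)
                    parts ~= "..";              // keep leading ".." in relative paths
                // ".." directly under the root is dropped
            }
            else
                parts ~= name;
        }
    }

    string result = parts.join("/");
    if (absolute) return "/" ~ result;
    return result.length ? result : ".";
}

void main()
{
    writeln(buildNormalized("/home/steves", "../lars"));
    // A single segment works too, which is why normalize(path) could simply
    // forward to the builder.
    writeln(buildNormalized("/home/steves/../lars"));
}
```

Because each segment is split on '/' before being processed, a segment that contains multiple directories needs no special handling.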
Jul 20 2011
prev sibling parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 20.07.2011 19:36, Lars T. Kyllingstad wrote:
 On Mon, 18 Jul 2011 14:51:06 -0400, Steven Schveighoffer wrote:
 It's definitely something to think about.  At the very least, I think
 the default file system case sensitivity should be mapped to a certain
 function.  It doesn't hurt to expose the opposite sensitivity as an
 alternate (you need to implement both anyway).  A template with all
 options defaulted for the current OS makes good sense I think.
 Actually, expanding/renaming pathCharMatch provides a perfect way to
 default these:

 e.g.:
 version(Windows)
 {
      enum defaultOSSensitivity = false;
      enum defaultOSDirSeps = `\/`;
 }
 else version(Posix)
 {
      enum defaultOSSensitivity = true;
      enum defaultOSDirSeps = "/";
 }

 // replaces pathCharMatch
 int pathCharCmp(bool caseSensitive = defaultOSSensitivity, string
 dirseps = defaultOSDirSeps)(dchar a, dchar b);

 int fcmp(alias pred = "pathCharCmp(a, b)", S1, S2)(S1 filename1, S2
 filename2);

 Anyone who wants to do alternate comparisons is free to do so using
 other options from pathCharCmp.
Good idea. I'll probably implement something like that.
I like the direction that this is heading. If the idea gets extended to other functions as well, you won't have to reimplement std.path if you have to deal with posix paths on windows and vice versa, e.g. when transferring data containing paths between different systems.
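For reference, here is one way the pathCharCmp idea quoted above might be fleshed out. This is only a sketch under assumptions: pathCmp is a hypothetical stand-in for fcmp, case folding is done with std.uni.toLower, and the POSIX defaults are used for every non-Windows platform so that the example compiles everywhere.

```d
import std.algorithm.searching : canFind;
import std.range.primitives : empty, front, popFront;
import std.uni : toLower;

version (Windows)
{
    enum defaultOSSensitivity = false;
    enum defaultOSDirSeps = `\/`;
}
else
{
    enum defaultOSSensitivity = true;
    enum defaultOSDirSeps = "/";
}

/// Compares two path characters; all directory separators compare equal.
int pathCharCmp(bool caseSensitive = defaultOSSensitivity,
                string dirSeps = defaultOSDirSeps)(dchar a, dchar b)
{
    if (dirSeps.canFind(a) && dirSeps.canFind(b)) return 0;
    static if (!caseSensitive)
    {
        a = toLower(a);
        b = toLower(b);
    }
    return a < b ? -1 : (a > b ? 1 : 0);
}

/// Hypothetical whole-path comparison built on pathCharCmp.
int pathCmp(bool caseSensitive = defaultOSSensitivity,
            string dirSeps = defaultOSDirSeps)(string p1, string p2)
{
    while (!p1.empty && !p2.empty)
    {
        immutable c = pathCharCmp!(caseSensitive, dirSeps)(p1.front, p2.front);
        if (c != 0) return c;
        p1.popFront();
        p2.popFront();
    }
    // The shorter path sorts first; equal lengths mean equal paths.
    return p1.empty ? (p2.empty ? 0 : -1) : 1;
}

void main()
{
    // Windows rules, usable on any platform by passing the options explicitly.
    assert(pathCmp!(false, `\/`)(`c:/foo`, `C:\FOO`) == 0);
    // POSIX rules: case matters and only '/' is a separator.
    assert(pathCmp!(true, "/")("foo", "Foo") != 0);
}
```

Passing the options explicitly is what would make it possible to compare Windows paths on POSIX and vice versa without reimplementing the module.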
Jul 21 2011
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> wrote in message 
news:j01trl$2ia$6 digitalmars.com...
 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 In driveName:

 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now:

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"]));

I guess you would rather have that

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"]));

then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?
I don't know whether or not it's "never" a valid path, but "dir \\server" always fails and "dir \\server\share" always works (assuming it exists, at least). So treating the whole thing as a drive might be the right thing to do. (Of course, it's completely moronic that Windows works that way...)
 fcmp:
 "On Windows, fcmp is an alias for std.string.icmp, which yields a case
 insensitive comparison. On POSIX, it is an alias for std.algorithm.cmp,
 i.e. a case sensitive comparison."

 What about comparing c:/foo with c:\foo?  This isn't going to be equal
 with icmp.
I am a bit unsure what to do about the comparison functions (fcmp, pathCharMatch and globMatch). Aside from the issue with directory separators it is, as was pointed out by someone else, entirely possible to mount case-sensitive file systems on Windows and case-insensitive file systems on POSIX. (The latter is not uncommon on OSX, I believe.) I am open to suggestions.
If such mountings are possible, it would seem that there must be some way to check the sensitivity (otherwise the OS itself would probably crap out on it).

Although, at least in the case of case-insensitive mountings on POSIX, doesn't that mean such paths would have both case-sensitive and case-insensitive parts?

Ex: /mount/damnWinDrive/dir/subdir

Wouldn't the "mount/damnWinDrive" part be case-sensitive and the "dir/subdir" part be insensitive?

(I'm starting to really despise case-insensitive filesystems.)
Jul 19 2011
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Here's some relevant info:
http://msdn.microsoft.com/en-us/library/aa365247%28v=vs.85%29.aspx
Jul 19 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 19 Jul 2011 15:55:29 -0400, Nick Sabalausky <a a.a> wrote:

 If such mountings are possible, it would seem that there must be some  
 way to
 check the sensitivity (otherwise the OS itself would probably crap out on
 it).
I've done it before, mounted a Windows share on a Linux box via CIFS. What happens is, everything thinks it's case sensitive (i.e. any user-space tools), but when you go to open a file, write a file, rename a file, the share performs as if it were case insensitive. For example:

    ls /mnt/winshare
    File.txt
    find /mnt/winshare -name FILE.TXT
    No files found
    touch /mnt/winshare/FILE.TXT  => updates date/time on File.txt
    cat /mnt/winshare/FILE.TXT  => outputs File.txt

So as long as you are performing operations *blindly*, the case insensitivity kicks in. For example, open a file without first searching for it. But if you start reading directories, tools have no idea it's on a case-insensitive filesystem.
 Although, at least in the case of case-insensitive mountings on posix,
 doesn't that mean such paths would have both case-sensitive and
 case-insensitive parts?

 Ex: /mount/damnWinDrive/dir/subdir

 Wouldn't the "mount/damnWinDrive" part be case-sensitive and the
 "dir/subdir" part be insensitive?
Yes, actually, this is a very good point. And there's no way for std.path to make that distinction.
 (I'm starting to really despise case-insensitive filesystems.)
I've never understood why they have any benefits whatsoever. The only reason I can think of them having any use is legacy. -Steve
Jul 20 2011
prev sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Tue, 19 Jul 2011 15:55:29 -0400, Nick Sabalausky wrote:

 "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> wrote in message
 news:j01trl$2ia$6 digitalmars.com...
 On Mon, 18 Jul 2011 13:16:29 -0400, Steven Schveighoffer wrote:
 In driveName:

 Should std.path handle UNC paths?  i.e. \\servername\share\path  (I
 think if it does, it should specify \\servername\share as the drive)
Yes, std.path is supposed to support UNC paths. For instance, the following works now:

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo`, "bar", "baz"]));

I guess you would rather have that

    assert (equal(pathSplitter(`\\foo\bar\baz`), [`\\foo\bar`, "baz"]));

then? I am not very familiar with Windows network shares; is \\foo never a valid path on its own?
I don't know whether or not it's "never" a valid path, but "dir \\server" always fails and "dir \\server\share" always works (assuming it exists, at least). So treating the whole thing as a drive might be the right thing to do. (Of course, it's completely moronic that Windows works that way...)
 fcmp:
 "On Windows, fcmp is an alias for std.string.icmp, which yields a case
 insensitive comparison. On POSIX, it is an alias for
 std.algorithm.cmp, i.e. a case sensitive comparison."

 What about comparing c:/foo with c:\foo?  This isn't going to be equal
 with icmp.
I am a bit unsure what to do about the comparison functions (fcmp, pathCharMatch and globMatch). Aside from the issue with directory separators it is, as was pointed out by someone else, entirely possible to mount case-sensitive file systems on Windows and case-insensitive file systems on POSIX. (The latter is not uncommon on OSX, I believe.) I am open to suggestions.
If such mountings are possible, it would seem that there must be some way to check the sensitivity (otherwise the OS itself would probably crap out on it).
That check would probably be orders of magnitude more expensive than a simple string operation.
 Although, at least in the case of case-insensitive mountings on posix,
 doesn't that mean such paths would have both case-sensitive and
 case-insensitive parts?
 
 Ex: /mount/damnWinDrive/dir/subdir
 
 Wouldn't the "mount/damnWinDrive" part be case-sensitive and the
 "dir/subdir" part be insensitive?
Argh.
 (I'm starting to really despise case-insensitive filesystems.)
Me too. Does anyone know whether Windows' case insensitivity is limited to ASCII? If not, is the filesystem Unicode-aware, or does it use some locale-specific codepage to compare file names? -Lars
Jul 20 2011
next sibling parent reply Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
On 20/07/2011 20:57, Lars T. Kyllingstad wrote:
 Does anyone know whether Windows' case insensitivity is limited to ASCII?
 If not, is the filesystem Unicode-aware, or does it use some locale
 specific codepage to compare file names?

 -Lars
Wikipedia says Windows long file names are up to 255 UTF-16 characters (or code points, depending which article you refer to >< ) Seems consistent with Microsoft's approach to character encoding throughout the rest of the Windows API.
 http://en.wikipedia.org/wiki/Long_filename
A...
Jul 20 2011
parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Wed, 20 Jul 2011 22:20:16 +0100, Alix Pexton wrote:

 On 20/07/2011 20:57, Lars T. Kyllingstad wrote:
 Does anyone know whether Windows' case insensitivity is limited to
 ASCII? If not, is the filesystem Unicode-aware, or does it use some
 locale specific codepage to compare file names?

 -Lars
Wikipedia says Windows long file names are up to 255 UTF-16 characters (or code points, depending which article you refer to >< ) Seems consistent with Microsoft's approach to character encoding throughout the rest of the Windows API.
 http://en.wikipedia.org/wiki/Long_filename
Thanks! In other words, fcmp() needs to do UTF-16 decoding... -Lars
Jul 21 2011
prev sibling parent Rainer Schuetze <r.sagitario gmx.de> writes:
 Does anyone know whether Windows' case insensitivity is limited to ASCII?
 If not, is the filesystem Unicode-aware, or does it use some locale
 specific codepage to compare file names?
I just tried a few examples: using umlauts works as expected, i.e. upper and lower case characters are treated as the same. I then used the Greek omega (U+03A9 and U+03C9); files with upper and lower case names are still treated as the same, even on a FAT16 USB drive (though some ~-magic is going on there which might not work in Windows 3.1 and earlier).
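For comparison, the same two cases can be checked on the D side. This is a small sketch that assumes a Phobos recent enough to provide std.uni.icmp, which compares using Unicode case folding; that is not necessarily the same rule Windows applies to file names, so treat it as an approximation only.

```d
import std.uni : icmp;

void main()
{
    // Umlauts: U+00C4 (capital A with diaeresis) vs U+00E4 (small).
    assert(icmp("\u00C4rger.txt", "\u00E4rger.TXT") == 0);

    // Greek omega: U+03A9 (capital) vs U+03C9 (small).
    assert(icmp("\u03A9.txt", "\u03C9.txt") == 0);

    // Names that differ by more than case still compare unequal.
    assert(icmp("a.txt", "b.txt") < 0);
}
```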
 -Lars
Jul 20 2011
prev sibling parent torhu <no spam.invalid> writes:
On 17.07.2011 23:27, Lars T. Kyllingstad wrote:
 - Should it be specified/documented whether a function returns "" or
 null?  Specifically, is it important that

      extension("foo") is null
      extension("foo.") !is null && extension("foo.") == ""
I guess you've already thought about this, but one solution is to just return the dot as part of the extension. Then you get extension("foo.") == ".". I noticed that .NET's GetExtension method includes the dot in the extension it returns. setExtension and defaultExtension would probably have to change to at least accept extensions that include the dot, if extension() is changed.
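A minimal sketch of what such a dot-including variant could look like follows. The name extensionWithDot is hypothetical, and the rule that a leading dot (as in ".bashrc") does not start an extension is an assumption of this sketch, not something decided in this thread.

```d
import std.stdio : writeln;

/// Hypothetical variant of extension() that keeps the leading dot.
/// Returns null when the last path component has no extension.
string extensionWithDot(string path)
{
    foreach_reverse (i, char c; path)
    {
        // Stop at a directory separator: dots in directories do not count.
        if (c == '/' || c == '\\') break;
        if (c == '.')
        {
            // Assumption: a dot that starts the file name is not an extension.
            if (i == 0 || path[i - 1] == '/' || path[i - 1] == '\\') break;
            return path[i .. $];
        }
    }
    return null;
}

void main()
{
    writeln(extensionWithDot("foo.txt"));
    // "no extension" and "empty extension" are now told apart by value,
    // with no need to distinguish null from "".
    assert(extensionWithDot("foo") is null);
    assert(extensionWithDot("foo.") == ".");
}
```

With this convention the caller compares against "." instead of relying on the difference between null and an empty string.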
Jul 21 2011