www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.path review: second update

reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Here's a new update based on your comments and requests.  Code and docs 
are in the usual place,

  https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
  http://www.kyllingen.net/code/std-path/phobos-prerelease/std_path.html

Here are the highlights:

* UNC paths, i.e. \\server\share\..., are now (hopefully) handled 
correctly on Windows.  See the baseName(), dirName(), rootName(), 
driveName(), pathSplitter(), etc. docs for examples.

* Support for //foo/bar paths on POSIX has been removed.

* rootName() is new.

* pathSplitter() now returns a bidirectional range.

* pathCharMatch() and fcmp() have been replaced by improved functions 
filenameCharCmp() and filenameCmp(), respectively, with optional case 
sensitivity.

* joinPath() has been renamed to buildPath().

* Added a function buildNormalizedPath(), which performs normalization 
while joining the segments.

* Redefined normalize() in terms of buildNormalizedPath().

I am still unsure of the extent to which long UNC paths (i.e. \\?\...) 
should be supported, if at all.  If anyone has anything to say on the 
matter, please do. :)

-Lars
Jul 29 2011
next sibling parent Johannes Pfau <spam example.com> writes:
Lars T. Kyllingstad wrote:
Here's a new update based on your comments and requests.  Code and
docs are in the usual place,

  https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
  http://www.kyllingen.net/code/std-path/phobos-prerelease/std_path.html

Here are the highlights:

* UNC paths, i.e. \\server\share\..., are now (hopefully) handled 
correctly on Windows.  See the baseName(), dirName(), rootName(), 
driveName(), pathSplitter(), etc. docs for examples.

* Support for //foo/bar paths on POSIX has been removed.

* rootName() is new.

* pathSplitter() now returns a bidirectional range.

* pathCharMatch() and fcmp() have been replaced by improved functions 
filenameCharCmp() and filenameCmp(), respectively, with optional case 
sensitivity.

* joinPath() has been renamed to buildPath().

* Added a function buildNormalizedPath(), which performs normalization 
while joining the segments.

* Redefined normalize() in terms of buildNormalizedPath().

I am still unsure of the extent to which long UNC paths (i.e. \\?\...) 
should be supported, if at all.  If anyone has anything to say on the 
matter, please do. :)

-Lars

As globMatch's "individual character comparisons are done calling filenameCharCmp()" and filenameCharCmp has case-sensitive and case-insensitive versions, could we also get this behavior for globMatch? I even have a use case for this ;-) The FreeDesktop MIME database uses it: http://standards.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html "Applications MUST match globs case-insensitively, except when the case-sensitive attribute is set to true. This is so that e.g. main.C will be seen as a C++ file, but IMAGE.GIF will still use the *.gif pattern." -- Johannes Pfau
Jul 29 2011
prev sibling next sibling parent reply KennyTM~ <kennytm gmail.com> writes:
On Jul 30, 11 02:06, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests.  Code and docs
 are in the usual place,

    https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
    http://www.kyllingen.net/code/std-path/phobos-prerelease/std_path.html

 Here are the highlights:

 * UNC paths, i.e. \\server\share\..., are now (hopefully) handled
 correctly on Windows.  See the baseName(), dirName(), rootName(),
 driveName(), pathSplitter(), etc. docs for examples.

 * Support for //foo/bar paths on POSIX has been removed.

 * rootName() is new.

 * pathSplitter() now returns a bidirectional range.

 * pathCharMatch() and fcmp() have been replaced by improved functions
 filenameCharCmp() and filenameCmp(), respectively, with optional case
 sensitivity.

 * joinPath() has been renamed to buildPath().

 * Added a function buildNormalizedPath(), which performs normalization
 while joining the segments.

 * Redefined normalize() in terms of buildNormalizedPath().

 I am still unsure of the extent to which long UNC paths (i.e. \\?\...)
 should be supported, if at all.  If anyone has anything to say on the
 matter, please do. :)

 -Lars

Thanks for the nice work! Comments: - pathSplitter: empty, front, back could be const. Also, make the struct 'static struct'. - hasDrive, isDriveRoot: the path "#:\x" should not pass hasDrive. In Windows only /[a-zA-Z]/ are supported drive letters. '#:' may work but is not officially supported. - Bug 6390 (= 6337) has been fixed. Some CTFE tests can be enabled. - baseName, dirName "TODO: safe pure nothrow": Mention which bugs are preventing this (std.conv.to and std.string.chomp?). - buildPath, More than two path components: looks like a perfect candidate for std.algorithm.reduce. And this should probably accept a range, but I'm not sure. - CaseSensitive.osDefault: As I mentioned before, version(OSX) should set this to 'no'. - globMatch: only UTF-8 is supported? - extension: Note that the DDoc text of 'extension' is highlighted.
Jul 29 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 29 Jul 2011 15:47:55 -0400, KennyTM~ <kennytm gmail.com> wrote:


 Comments:

 - pathSplitter: empty, front, back could be const.

I know it seems like "if it could be const, we should mark it const", but does it always make sense? I mean, consider this example. Should front/back/empty be const? What are ever the chance that someone has a const range? Consider that a const range is near useless (it can't do it's primary function -- iterate). Just wondering... -Steve
Jul 29 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 29 July 2011 15:58:14 Steven Schveighoffer wrote:
 On Fri, 29 Jul 2011 15:47:55 -0400, KennyTM~ <kennytm gmail.com> wrote:
 Comments:
 
 - pathSplitter: empty, front, back could be const.

I know it seems like "if it could be const, we should mark it const", but does it always make sense? I mean, consider this example. Should front/back/empty be const? What are ever the chance that someone has a const range? Consider that a const range is near useless (it can't do it's primary function -- iterate). Just wondering...

I'd generally argue that if it can be const, it should be const. So, it's probably better to make empty, front, and back const. However, const functions _range_ rather pointless for ranges. So, I'm not really sure that it matters. It could go either way. - Jonathan M Davis
Jul 29 2011
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sat, 30 Jul 2011 03:47:55 +0800, KennyTM~ wrote:

 On Jul 30, 11 02:06, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests.  Code and docs
 are in the usual place,

    https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
    http://www.kyllingen.net/code/std-path/phobos-prerelease/


 Here are the highlights:

 * UNC paths, i.e. \\server\share\..., are now (hopefully) handled
 correctly on Windows.  See the baseName(), dirName(), rootName(),
 driveName(), pathSplitter(), etc. docs for examples.

 * Support for //foo/bar paths on POSIX has been removed.

 * rootName() is new.

 * pathSplitter() now returns a bidirectional range.

 * pathCharMatch() and fcmp() have been replaced by improved functions
 filenameCharCmp() and filenameCmp(), respectively, with optional case
 sensitivity.

 * joinPath() has been renamed to buildPath().

 * Added a function buildNormalizedPath(), which performs normalization
 while joining the segments.

 * Redefined normalize() in terms of buildNormalizedPath().

 I am still unsure of the extent to which long UNC paths (i.e. \\?\...)
 should be supported, if at all.  If anyone has anything to say on the
 matter, please do. :)

 -Lars

Thanks for the nice work! Comments: - pathSplitter: empty, front, back could be const. Also, make the struct 'static struct'.

Will do.
 - hasDrive, isDriveRoot: the path
 
      "#:\x"
 
 should not pass hasDrive. In Windows only /[a-zA-Z]/ are supported drive
 letters. '#:' may work but is not officially supported.

True, but then "#:" must be considered a directory name, which is not much better. The module generally assumes that any path you pass to it is well-formed. If not, the results are undefined. The rationale for this is that you don't know whether a path is valid, i.e. well-formed AND correct (pointing to the right place in the file system), until you try to use it. std.path can in principle verify well- formedness, but since it cannot verify correctness, both may just as well be taken care of by the OS.
 - Bug 6390 (= 6337) has been fixed. Some CTFE tests can be enabled.

I know, but the fix hasn't been released yet. I'd prefer if people didn't have to build the compiler from repo to test the module. I'll be sure to enable the tests when DMD 2.055 has been released.
 - baseName, dirName "TODO:  safe pure nothrow": Mention which bugs are
 preventing this (std.conv.to and std.string.chomp?).

Will do.
 - buildPath, More than two path components: looks like a perfect
 candidate for  std.algorithm.reduce. And this should probably accept a
 range, but I'm not sure.

I'll look into using reduce. I have considered letting the function take a general range, but variadic parameters likely cover the vast majority of use cases. If there is a real need for a range version, however, I'd be happy to add it. Otherwise, I'd rather avoid the API clutter.
 - CaseSensitive.osDefault: As I mentioned before, version(OSX) should
 set this to 'no'.

Sorry, forgot about that. I'll fix it.
 - globMatch: only UTF-8 is supported?

Except for the name change (fnmatch -> globMatch), this function is not written by me. It is taken from the current std.path, and is for all intents and purposes not part of my submission. However, if it is trivial to redefine it in terms of general string types, I will do so. I'll check.
 - extension: Note that the DDoc text of 'extension' is highlighted.

Thanks! -Lars
Jul 30 2011
parent reply KennyTM~ <kennytm gmail.com> writes:
On Jul 30, 11 17:00, Lars T. Kyllingstad wrote:
 On Sat, 30 Jul 2011 03:47:55 +0800, KennyTM~ wrote:

 On Jul 30, 11 02:06, Lars T. Kyllingstad wrote:


 Thanks for the nice work!

 Comments:

 - pathSplitter: empty, front, back could be const. Also, make the struct
 'static struct'.

Will do.
 - hasDrive, isDriveRoot: the path

       "#:\x"

 should not pass hasDrive. In Windows only /[a-zA-Z]/ are supported drive
 letters. '#:' may work but is not officially supported.

True, but then "#:" must be considered a directory name, which is not much better. The module generally assumes that any path you pass to it is well-formed. If not, the results are undefined. The rationale for this is that you don't know whether a path is valid, i.e. well-formed AND correct (pointing to the right place in the file system), until you try to use it. std.path can in principle verify well- formedness, but since it cannot verify correctness, both may just as well be taken care of by the OS.

Fair enough.
 - Bug 6390 (= 6337) has been fixed. Some CTFE tests can be enabled.

I know, but the fix hasn't been released yet. I'd prefer if people didn't have to build the compiler from repo to test the module. I'll be sure to enable the tests when DMD 2.055 has been released.

If this module is accepted, it will be released along with 2.055 which carries the fix of 6337 also. Have to delay those tests to 2.056 doesn't sound reasonable to me. It's OK if those are just disabled to ease reviewing, but I'd argue that in the final commit, the tests must be enabled whenever the trunk allows it.
 - baseName, dirName "TODO:  safe pure nothrow": Mention which bugs are
 preventing this (std.conv.to and std.string.chomp?).

Will do.
 - buildPath, More than two path components: looks like a perfect
 candidate for  std.algorithm.reduce. And this should probably accept a
 range, but I'm not sure.

I'll look into using reduce. I have considered letting the function take a general range, but variadic parameters likely cover the vast majority of use cases. If there is a real need for a range version, however, I'd be happy to add it. Otherwise, I'd rather avoid the API clutter.

OK.
 - CaseSensitive.osDefault: As I mentioned before, version(OSX) should
 set this to 'no'.

Sorry, forgot about that. I'll fix it.
 - globMatch: only UTF-8 is supported?

Except for the name change (fnmatch -> globMatch), this function is not written by me. It is taken from the current std.path, and is for all intents and purposes not part of my submission. However, if it is trivial to redefine it in terms of general string types, I will do so. I'll check.

OK.
 - extension: Note that the DDoc text of 'extension' is highlighted.

Thanks! -Lars

Jul 30 2011
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday 30 July 2011 15:38:33 Lars T. Kyllingstad wrote:
 On Sat, 30 Jul 2011 23:27:33 +0800, KennyTM~ wrote:
 On Jul 30, 11 17:00, Lars T. Kyllingstad wrote:
 On Sat, 30 Jul 2011 03:47:55 +0800, KennyTM~ wrote:
 - hasDrive, isDriveRoot: the path
 
       "#:\x"
 
 should not pass hasDrive. In Windows only /[a-zA-Z]/ are supported
 drive letters. '#:' may work but is not officially supported.

True, but then "#:" must be considered a directory name, which is not much better. The module generally assumes that any path you pass to it is well-formed. If not, the results are undefined. The rationale for this is that you don't know whether a path is valid, i.e. well-formed AND correct (pointing to the right place in the file system), until you try to use it. std.path can in principle verify well- formedness, but since it cannot verify correctness, both may just as well be taken care of by the OS.

Fair enough.

I've added a note stating this to the module documentation.
 - Bug 6390 (= 6337) has been fixed. Some CTFE tests can be enabled.

I know, but the fix hasn't been released yet. I'd prefer if people didn't have to build the compiler from repo to test the module. I'll be sure to enable the tests when DMD 2.055 has been released.

If this module is accepted, it will be released along with 2.055 which carries the fix of 6337 also. Have to delay those tests to 2.056 doesn't sound reasonable to me. It's OK if those are just disabled to ease reviewing, but I'd argue that in the final commit, the tests must be enabled whenever the trunk allows it.

I think the current policy is that Phobos trunk must compile with the latest released compiler. I'll check up on this, and if I'm wrong, I will of course enable the tests when I commit the code to Phobos.

That's wrong. The latest druntime and Phobos in github do _not_ have to compile with the latest released version of the compiler - just the latest version of the compiler on github. The autotester is constantly running with the latest of all 3, and it's not altogether uncommon that a change is made in druntime and/or Phobos so that they'll still compile after a change to the compiler (or even upon occasion that a change is made in the compiler to make something in druntime or Phobos possible). It would be problematic for dmd's tests if Phobos and druntime were forced to work with the released version of the compiler instead of the latest in github, and it could cause problems for druntime and Phobos when you went to do a release. Such issues are likely to be fewer and farther between than they used to be, since the language doesn't have huge changes from release to release, but both the compiler and the libraries are in enough flux that tying the libraries to the last released version of the compiler would be a problem. I don't think that it's unreasonable to comment out some tests during the review process so that people aren't forced to grab the latest dmd from github, but they should be uncommented if/when the code gets merged into Phobos. - Jonathan M Davis
Jul 30 2011
prev sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sat, 30 Jul 2011 14:09:47 -0700, Jonathan M Davis wrote:

 On Saturday 30 July 2011 15:38:33 Lars T. Kyllingstad wrote:
 On Sat, 30 Jul 2011 23:27:33 +0800, KennyTM~ wrote:
 On Jul 30, 11 17:00, Lars T. Kyllingstad wrote:
 On Sat, 30 Jul 2011 03:47:55 +0800, KennyTM~ wrote:
 - hasDrive, isDriveRoot: the path
 
       "#:\x"
 
 should not pass hasDrive. In Windows only /[a-zA-Z]/ are supported
 drive letters. '#:' may work but is not officially supported.

True, but then "#:" must be considered a directory name, which is not much better. The module generally assumes that any path you pass to it is well-formed. If not, the results are undefined. The rationale for this is that you don't know whether a path is valid, i.e. well-formed AND correct (pointing to the right place in the file system), until you try to use it. std.path can in principle verify well- formedness, but since it cannot verify correctness, both may just as well be taken care of by the OS.

Fair enough.

I've added a note stating this to the module documentation.
 - Bug 6390 (= 6337) has been fixed. Some CTFE tests can be enabled.

I know, but the fix hasn't been released yet. I'd prefer if people didn't have to build the compiler from repo to test the module. I'll be sure to enable the tests when DMD 2.055 has been released.

If this module is accepted, it will be released along with 2.055 which carries the fix of 6337 also. Have to delay those tests to 2.056 doesn't sound reasonable to me. It's OK if those are just disabled to ease reviewing, but I'd argue that in the final commit, the tests must be enabled whenever the trunk allows it.

I think the current policy is that Phobos trunk must compile with the latest released compiler. I'll check up on this, and if I'm wrong, I will of course enable the tests when I commit the code to Phobos.

That's wrong. The latest druntime and Phobos in github do _not_ have to compile with the latest released version of the compiler - just the latest version of the compiler on github. The autotester is constantly running with the latest of all 3, and it's not altogether uncommon that a change is made in druntime and/or Phobos so that they'll still compile after a change to the compiler (or even upon occasion that a change is made in the compiler to make something in druntime or Phobos possible). It would be problematic for dmd's tests if Phobos and druntime were forced to work with the released version of the compiler instead of the latest in github, and it could cause problems for druntime and Phobos when you went to do a release. Such issues are likely to be fewer and farther between than they used to be, since the language doesn't have huge changes from release to release, but both the compiler and the libraries are in enough flux that tying the libraries to the last released version of the compiler would be a problem. I don't think that it's unreasonable to comment out some tests during the review process so that people aren't forced to grab the latest dmd from github, but they should be uncommented if/when the code gets merged into Phobos.

All right, thanks for clearing that up. -Lars
Jul 31 2011
prev sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sat, 30 Jul 2011 23:27:33 +0800, KennyTM~ wrote:

 On Jul 30, 11 17:00, Lars T. Kyllingstad wrote:
 On Sat, 30 Jul 2011 03:47:55 +0800, KennyTM~ wrote:
 - hasDrive, isDriveRoot: the path

       "#:\x"

 should not pass hasDrive. In Windows only /[a-zA-Z]/ are supported
 drive letters. '#:' may work but is not officially supported.

True, but then "#:" must be considered a directory name, which is not much better. The module generally assumes that any path you pass to it is well-formed. If not, the results are undefined. The rationale for this is that you don't know whether a path is valid, i.e. well-formed AND correct (pointing to the right place in the file system), until you try to use it. std.path can in principle verify well- formedness, but since it cannot verify correctness, both may just as well be taken care of by the OS.

Fair enough.

I've added a note stating this to the module documentation.
 - Bug 6390 (= 6337) has been fixed. Some CTFE tests can be enabled.

I know, but the fix hasn't been released yet. I'd prefer if people didn't have to build the compiler from repo to test the module. I'll be sure to enable the tests when DMD 2.055 has been released.

carries the fix of 6337 also. Have to delay those tests to 2.056 doesn't sound reasonable to me. It's OK if those are just disabled to ease reviewing, but I'd argue that in the final commit, the tests must be enabled whenever the trunk allows it.

I think the current policy is that Phobos trunk must compile with the latest released compiler. I'll check up on this, and if I'm wrong, I will of course enable the tests when I commit the code to Phobos. -Lars
Jul 30 2011
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Fri, 29 Jul 2011 21:10:30 +0200, Johannes Pfau wrote:

 Lars T. Kyllingstad wrote:
Here's a new update based on your comments and requests.  Code and docs
are in the usual place,

  https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
  http://www.kyllingen.net/code/std-path/phobos-prerelease/std_path.html

Here are the highlights:

* UNC paths, i.e. \\server\share\..., are now (hopefully) handled
correctly on Windows.  See the baseName(), dirName(), rootName(),
driveName(), pathSplitter(), etc. docs for examples.

* Support for //foo/bar paths on POSIX has been removed.

* rootName() is new.

* pathSplitter() now returns a bidirectional range.

* pathCharMatch() and fcmp() have been replaced by improved functions
filenameCharCmp() and filenameCmp(), respectively, with optional case
sensitivity.

* joinPath() has been renamed to buildPath().

* Added a function buildNormalizedPath(), which performs normalization
while joining the segments.

* Redefined normalize() in terms of buildNormalizedPath().

I am still unsure of the extent to which long UNC paths (i.e. \\?\...)
should be supported, if at all.  If anyone has anything to say on the
matter, please do. :)

-Lars

As globMatch's "individual character comparisons are done calling filenameCharCmp()" and filenameCharCmp has case-sensitive and case-insensitive versions, could we also get this behavior for globMatch?

Sure. That was my intention all along, but I forgot. I'll fix it as soon as possible.
 I even have a use case for this ;-)
 The FreeDesktop MIME database uses it:
 http://standards.freedesktop.org/shared-mime-info-spec/shared-mime-info-

 
 "Applications MUST match globs case-insensitively, except when the
 case-sensitive attribute is set to true. This is so that e.g. main.C
 will be seen as a C++ file, but IMAGE.GIF will still use the *.gif
 pattern."

:) -Lars
Jul 30 2011
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Ok, I've pushed some improvements based on Johannes Pfau and KennyTM~'s 
comments now.  The relevant commits are dated 2011-07-30:

  https://github.com/kyllingstad/phobos/commits/std-path

-Lars
Jul 30 2011
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Fri, 29 Jul 2011 18:06:58 +0000, Lars T. Kyllingstad wrote:

 Here's a new update based on your comments and requests.  [...]

You may have noticed that I did not incorporate Steve's suggestion to make the directory separator(s) a template parameter. The reason is that this is the *smallest* difference between paths on the different platforms. Drive letters and UNC paths are a much bigger difference, and make interoperability near impossible. Therefore, I would just like to reiterate a suggestion I made on the Phobos list a while ago, which was then shot down. What if we put the entire module inside a template, like this: enum Platform { windows, posix } enum CaseSensitive { yes, no } template Path(Platform platform, CaseSensitive caseSensitive) { /* Every function in the module goes inside this template, and instead of version (Windows) { ... } and so on, we use static if (platform == Platform.windows) { ... } That way, if people for some reason need to deal with POSIX paths on Windows, for instance, they can just write Path!(Platform.posix).someFunction(myPath); */ } // Of course, we don't want to add a cumbersome prefix every time // we call a function for the current platform. By mixing in the // template, std.path can be used *exactly* like now: version (Windows) private enum currentPlatform = Platform.windows; else version (Posix) private enum currentPlatform = Platform.posix; mixin Path!(currentPlatform, platformDefaultCaseSensitivity); What do you think? -Lars
Jul 31 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 31 July 2011 14:24:30 Lars T. Kyllingstad wrote:
 On Fri, 29 Jul 2011 18:06:58 +0000, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests.  [...]

You may have noticed that I did not incorporate Steve's suggestion to make the directory separator(s) a template parameter. The reason is that this is the *smallest* difference between paths on the different platforms. Drive letters and UNC paths are a much bigger difference, and make interoperability near impossible. Therefore, I would just like to reiterate a suggestion I made on the Phobos list a while ago, which was then shot down. What if we put the entire module inside a template, like this: enum Platform { windows, posix } enum CaseSensitive { yes, no } template Path(Platform platform, CaseSensitive caseSensitive) { /* Every function in the module goes inside this template, and instead of version (Windows) { ... } and so on, we use static if (platform == Platform.windows) { ... } That way, if people for some reason need to deal with POSIX paths on Windows, for instance, they can just write Path!(Platform.posix).someFunction(myPath); */ } // Of course, we don't want to add a cumbersome prefix every time // we call a function for the current platform. By mixing in the // template, std.path can be used *exactly* like now: version (Windows) private enum currentPlatform = Platform.windows; else version (Posix) private enum currentPlatform = Platform.posix; mixin Path!(currentPlatform, platformDefaultCaseSensitivity); What do you think?

Honestly, it seems like overkill and overly messy. Doing something that drastic would have to add a lot of value IMHO, and I just don't see it. And honestly, even if it _did_ add a lot of value, I think that I'd want a cleaner solution to be found. - Jonathan M Davis
Jul 31 2011
parent "Nick Sabalausky" <a a.a> writes:
"Jonathan M Davis" <jmdavisProg gmx.com> wrote in message 
news:mailman.1989.1312140117.14074.digitalmars-d puremagic.com...
 On Sunday 31 July 2011 14:24:30 Lars T. Kyllingstad wrote:
 On Fri, 29 Jul 2011 18:06:58 +0000, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests.  [...]

You may have noticed that I did not incorporate Steve's suggestion to make the directory separator(s) a template parameter. The reason is that this is the *smallest* difference between paths on the different platforms. Drive letters and UNC paths are a much bigger difference, and make interoperability near impossible. Therefore, I would just like to reiterate a suggestion I made on the Phobos list a while ago, which was then shot down. What if we put the entire module inside a template, like this: enum Platform { windows, posix } enum CaseSensitive { yes, no } template Path(Platform platform, CaseSensitive caseSensitive) { /* Every function in the module goes inside this template, and instead of version (Windows) { ... } and so on, we use static if (platform == Platform.windows) { ... } That way, if people for some reason need to deal with POSIX paths on Windows, for instance, they can just write Path!(Platform.posix).someFunction(myPath); */ } // Of course, we don't want to add a cumbersome prefix every time // we call a function for the current platform. By mixing in the // template, std.path can be used *exactly* like now: version (Windows) private enum currentPlatform = Platform.windows; else version (Posix) private enum currentPlatform = Platform.posix; mixin Path!(currentPlatform, platformDefaultCaseSensitivity); What do you think?


I like it.
 Honestly, it seems like overkill and overly messy.

Doesn't seem so bad to me.
 Doing something that
 drastic would have to add a lot of value IMHO, and I just don't see it. 
 And
 honestly, even if it _did_ add a lot of value, I think that I'd want a 
 cleaner
 solution to be found.

I wouldn't necessarily be opposed to a cleaner solution if one were found.
Jul 31 2011
prev sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Sun, 31 Jul 2011 12:21:46 -0700, Jonathan M Davis wrote:

 On Sunday 31 July 2011 14:24:30 Lars T. Kyllingstad wrote:
 On Fri, 29 Jul 2011 18:06:58 +0000, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests.  [...]

You may have noticed that I did not incorporate Steve's suggestion to make the directory separator(s) a template parameter. The reason is that this is the *smallest* difference between paths on the different platforms. Drive letters and UNC paths are a much bigger difference, and make interoperability near impossible. Therefore, I would just like to reiterate a suggestion I made on the Phobos list a while ago, which was then shot down. What if we put the entire module inside a template, like this: enum Platform { windows, posix } enum CaseSensitive { yes, no } template Path(Platform platform, CaseSensitive caseSensitive) { /* Every function in the module goes inside this template, and instead of version (Windows) { ... } and so on, we use static if (platform == Platform.windows) { ... } That way, if people for some reason need to deal with POSIX paths on Windows, for instance, they can just write Path!(Platform.posix).someFunction(myPath); */ } // Of course, we don't want to add a cumbersome prefix every time // we call a function for the current platform. By mixing in the // template, std.path can be used *exactly* like now: version (Windows) private enum currentPlatform = Platform.windows; else version (Posix) private enum currentPlatform = Platform.posix; mixin Path!(currentPlatform, platformDefaultCaseSensitivity); What do you think?

Honestly, it seems like overkill and overly messy. Doing something that drastic would have to add a lot of value IMHO, and I just don't see it. And honestly, even if it _did_ add a lot of value, I think that I'd want a cleaner solution to be found.

I don't see what's so messy or drastic about it. For the most part, you'd use the module *exactly* as you do now. Handling paths from different platforms are not an uncommon need; people have requested more than once that this be possible with the new std.path. Zip archives and FTP clients/servers are good examples of use cases. What's nice is that this doesn't have to be added now. We can make the change I suggest at any later time without breaking backwards compatibility. -Lars
Aug 01 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
 On Sun, 31 Jul 2011 12:21:46 -0700, Jonathan M Davis wrote:
 On Sunday 31 July 2011 14:24:30 Lars T. Kyllingstad wrote:
 On Fri, 29 Jul 2011 18:06:58 +0000, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests. [...]

You may have noticed that I did not incorporate Steve's suggestion to make the directory separator(s) a template parameter. The reason is that this is the *smallest* difference between paths on the different platforms. Drive letters and UNC paths are a much bigger difference, and make interoperability near impossible. Therefore, I would just like to reiterate a suggestion I made on the Phobos list a while ago, which was then shot down. What if we put the entire module inside a template, like this: enum Platform { windows, posix } enum CaseSensitive { yes, no } template Path(Platform platform, CaseSensitive caseSensitive) { /* Every function in the module goes inside this template, and instead of version (Windows) { ... } and so on, we use static if (platform == Platform.windows) { ... } That way, if people for some reason need to deal with POSIX paths on Windows, for instance, they can just write Path!(Platform.posix).someFunction(myPath); */ } // Of course, we don't want to add a cumbersome prefix every time // we call a function for the current platform. By mixing in the // template, std.path can be used *exactly* like now: version (Windows) private enum currentPlatform = Platform.windows; else version (Posix) private enum currentPlatform = Platform.posix; mixin Path!(currentPlatform, platformDefaultCaseSensitivity); What do you think?

Honestly, it seems like overkill and overly messy. Doing something that drastic would have to add a lot of value IMHO, and I just don't see it. And honestly, even if it _did_ add a lot of value, I think that I'd want a cleaner solution to be found.

I don't see what's so messy or drastic about it. For the most part, you'd use the module *exactly* as you do now. Handling paths from different platforms are not an uncommon need; people have requested more than once that this be possible with the new std.path. Zip archives and FTP clients/servers are good examples of use cases. What's nice is that this doesn't have to be added now. We can make the change I suggest at any later time without breaking backwards compatibility.

It seems very messy to me from an implementation stand point. There's also the question about what it does to the spellchecker. If you type extensio instead of extension, will the spellchecker find it, or does the additional template obscure it? Also, can ddoc handle it? I'm not sure that a template mixin like that would be properly documented. Maybe it's ultimately a good idea, but it seems to me like it stands a good chance of causing problems. We'd have to have a clear grasp of all of the side effects first. - Jonathan M Davis
Aug 01 2011
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
 Here's a new update based on your comments and requests. Code and docs
 are in the usual place,
 
 https://github.com/kyllingstad/phobos/blob/std-path/std/path.d
 http://www.kyllingen.net/code/std-path/phobos-prerelease/std_path.html
 
 Here are the highlights:
 
 * UNC paths, i.e. \\server\share\..., are now (hopefully) handled
 correctly on Windows. See the baseName(), dirName(), rootName(),
 driveName(), pathSplitter(), etc. docs for examples.
 
 * Support for //foo/bar paths on POSIX has been removed.
 
 * rootName() is new.
 
 * pathSplitter() now returns a bidirectional range.
 
 * pathCharMatch() and fcmp() have been replaced by improved functions
 filenameCharCmp() and filenameCmp(), respectively, with optional case
 sensitivity.
 
 * joinPath() has been renamed to buildPath().
 
 * Added a function buildNormalizedPath(), which performs normalization
 while joining the segments.
 
 * Redefined normalize() in terms of buildNormalizedPath().
 
 I am still unsure of the extent to which long UNC paths (i.e. \\?\...)
 should be supported, if at all. If anyone has anything to say on the
 matter, please do. :)

Well, my previous comments on function names still stand - both in terms of names using Ext instead of Extension and Sep instead of sep - and also with regards to making non-property functions verbs (e.g. absolutePath and relativePath). Aside from that, wasn't it discussed that extensions should include the . in them in order to cover the case where a file ends in a .? Otherwise assert(extension("file.") == extension("file")); So, all of the stuff dealing with extensions should probably include the dot in the extension. It looks like both the current std.path and your std.path handle it by using "" for dot and null for nothing, but distinguishing between null and the empty string like that isn't necessarily a good idea, and your documentation doesn't make the distinction clear, unlike the documentation for the current std.path. So, at minimum, the docs should be clearer about null vs empty when it comes to the extension (and there should be tests for it), but it would arguably be better to just include the . in the extension. - Jonathan M Davis
Aug 01 2011
parent torhu <no spam.invalid> writes:
On 01.08.2011 20:06, Lars T. Kyllingstad wrote:
 On Mon, 01 Aug 2011 17:24:14 +0000, Jonathan M Davis wrote:

  Here's a new update based on your comments and requests. Code and docs
  are in the usual place,
  [...]

of names using Ext instead of Extension and Sep instead of sep -

Since the feedback has been minimal since my last update, I take it people are more or less happy with the module's functionality now. (Unless they're just tired of the whole discussion, of course. ;)) That means it is time to let the bikeshedding begin. Since we have been unable to reach a consensus in previous discussions, and there are now only three days left of the review, I would really like to get a good sample of the general opinion here. Folks, please state your preferences in terms of function names. I'll try to put personal bias aside and compose a naming scheme that is both internally consistent and consistent with the majority opinion.
  and also with regards to making non-property functions verbs (e.g.
  absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?

I tend to like the current naming scheme. I tried taking the list of functions and turning them into verbs, resulting in names like toAbsolute, ensureExtension, getBaseName, etc. It makes it a bit easier to tell what's a function and what's not. But given just a little bit of context, that's usually pretty clear. Then those "to" and "get" prefixes start to feel like more like noise. Leaving out prefixes in some cases leaves more room for making the names descriptive. I guess that's another way of seeing it, so I thought I'd mention it. It makes it a bit harder to get an overview of the module by reading the symbol list at the top of the doc page, though.
  Aside from that, wasn't it discussed that extensions should include the
  . in them in order to cover the case where a file ends in a .? Otherwise

  assert(extension("file.") == extension("file"));

  So, all of the stuff dealing with extensions should probably include the
  dot in the extension. It looks like both the current std.path and your
  std.path handle it by using "" for dot and null for nothing, but
  distinguishing between null and the empty string like that isn't
  necessarily a good idea, and your documentation doesn't make the
  distinction clear, unlike the documentation for the current std.path.

  So, at minimum, the docs should be clearer about null vs empty when it
  comes to the extension (and there should be tests for it), but it would
  arguably be better to just include the . in the extension.

I'm slowly coming around to the idea of including the dot. Unless I hear any loud protests I will probably do so.

I mentioned this in the other thread, but I'm hard pressed to find an actual use case for it. The only thing I can think of is that with some Windows applications specifying a file name that ends with a dot will suppress the addition of a default extension. Notepad is an example, it adds .txt by default. I'm not sure if that's really relevant, though. On Windows, I believe it's generally true that a file name that ends with a dot is not considered to have an extension. You can't normally create a file name ending with a dot, IIRC, fopen() will just ignore the dot and create the file without it. In light of that, I think the current behavior of extension() makes perfect sense. I the rare case that a file name ending with a dot exists on the file system, it could easily fool naive user code that would likely never have seen such a file name in testing: --- if (extension(fname).length == 0) { writeln("Missing extension, adding default!); setExtension(fname, "xyz"); } --- Returning null for the special case of a name ending with a dot seems reasonable, I'm fine with that. A slightly obscure solution for a slightly obscure problem. I think setExtension could be smarter, same goes for defaultExtension. This is what I think they should behave like: assert(setExtension("foo", "bar") == "foo.bar"); assert(setExtension("foo", ".bar") == "foo.bar"); assert(setExtension("foo.", "bar") == "foo.bar"); assert(setExtension("foo.", ".bar") == "foo.bar"); This is what .NET does. It's treating extensions not just like substrings, but recognizing that there is syntax and conventions associated with them. Which is what you want 90% of the time. Just my pragmatic, Windows-sentric view of things :)
Aug 01 2011
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 01 Aug 2011 17:24:07 +0000, Jonathan M Davis wrote:

 On Sun, 31 Jul 2011 12:21:46 -0700, Jonathan M Davis wrote:
 On Sunday 31 July 2011 14:24:30 Lars T. Kyllingstad wrote:
 On Fri, 29 Jul 2011 18:06:58 +0000, Lars T. Kyllingstad wrote:
 Here's a new update based on your comments and requests. [...]

You may have noticed that I did not incorporate Steve's suggestion to make the directory separator(s) a template parameter. The reason is that this is the *smallest* difference between paths on the different platforms. Drive letters and UNC paths are a much bigger difference, and make interoperability near impossible. Therefore, I would just like to reiterate a suggestion I made on the Phobos list a while ago, which was then shot down. What if we put the entire module inside a template, like this: enum Platform { windows, posix } enum CaseSensitive { yes, no } template Path(Platform platform, CaseSensitive caseSensitive) { /* Every function in the module goes inside this template, and instead of version (Windows) { ... } and so on, we use static if (platform == Platform.windows) { ... } That way, if people for some reason need to deal with POSIX paths on Windows, for instance, they can just write Path!(Platform.posix).someFunction(myPath); */ } // Of course, we don't want to add a cumbersome prefix every time // we call a function for the current platform. By mixing in the // template, std.path can be used *exactly* like now: version (Windows) private enum currentPlatform = Platform.windows; else version (Posix) private enum currentPlatform = Platform.posix; mixin Path!(currentPlatform, platformDefaultCaseSensitivity); What do you think?

Honestly, it seems like overkill and overly messy. Doing something that drastic would have to add a lot of value IMHO, and I just don't see it. And honestly, even if it _did_ add a lot of value, I think that I'd want a cleaner solution to be found.

I don't see what's so messy or drastic about it. For the most part, you'd use the module *exactly* as you do now. Handling paths from different platforms are not an uncommon need; people have requested more than once that this be possible with the new std.path. Zip archives and FTP clients/servers are good examples of use cases. What's nice is that this doesn't have to be added now. We can make the change I suggest at any later time without breaking backwards compatibility.

It seems very messy to me from an implementation stand point. There's also the question about what it does to the spellchecker. If you type extensio instead of extension, will the spellchecker find it, or does the additional template obscure it?

I don't think we should design the library around shortcomings of the compiler.
 Also, can ddoc handle it? I'm not
 sure that a template mixin like that would be properly documented.

The functions would be documented as members of the Path template, and we'd have to state in the module introduction that the template itself can, for the most part, be ignored unless you really need the extra functionality. The fact that the docs become slightly more convoluted is, of course, an argument against my solution. However, I don't think it outweighs the benefits.
 Maybe it's ultimately a good idea, but it seems to me like it stands a
 good chance of causing problems. We'd have to have a clear grasp of all
 of the side effects first.

Again, it doesn't have to be decided upon now. Adding it later is a non- breaking change. Actually, it would probably be a bad idea to do it now, when there are only a few days left of the review. -Lars
Aug 01 2011
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 01 Aug 2011 17:24:14 +0000, Jonathan M Davis wrote:

 Here's a new update based on your comments and requests. Code and docs
 are in the usual place,
 [...]

of names using Ext instead of Extension and Sep instead of sep -

Since the feedback has been minimal since my last update, I take it people are more or less happy with the module's functionality now. (Unless they're just tired of the whole discussion, of course. ;)) That means it is time to let the bikeshedding begin. Since we have been unable to reach a consensus in previous discussions, and there are now only three days left of the review, I would really like to get a good sample of the general opinion here. Folks, please state your preferences in terms of function names. I'll try to put personal bias aside and compose a naming scheme that is both internally consistent and consistent with the majority opinion.
 and also with regards to making non-property functions verbs (e.g.
 absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?
 Aside from that, wasn't it discussed that extensions should include the
 . in them in order to cover the case where a file ends in a .? Otherwise
 
 assert(extension("file.") == extension("file"));
 
 So, all of the stuff dealing with extensions should probably include the
 dot in the extension. It looks like both the current std.path and your
 std.path handle it by using "" for dot and null for nothing, but
 distinguishing between null and the empty string like that isn't
 necessarily a good idea, and your documentation doesn't make the
 distinction clear, unlike the documentation for the current std.path.
 
 So, at minimum, the docs should be clearer about null vs empty when it
 comes to the extension (and there should be tests for it), but it would
 arguably be better to just include the . in the extension.

I'm slowly coming around to the idea of including the dot. Unless I hear any loud protests I will probably do so. -Lars
Aug 01 2011
next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
 On 01.08.2011 20:06, Lars T. Kyllingstad wrote:
 On Mon, 01 Aug 2011 17:24:14 +0000, Jonathan M Davis wrote:
 Here's a new update based on your comments and requests. Code and docs
 are in the usual place,
 [...]

Well, my previous comments on function names still stand - both in terms of names using Ext instead of Extension and Sep instead of sep -

Since the feedback has been minimal since my last update, I take it people are more or less happy with the module's functionality now. (Unless they're just tired of the whole discussion, of course. ;)) That means it is time to let the bikeshedding begin. Since we have been unable to reach a consensus in previous discussions, and there are now only three days left of the review, I would really like to get a good sample of the general opinion here. Folks, please state your preferences in terms of function names. I'll try to put personal bias aside and compose a naming scheme that is both internally consistent and consistent with the majority opinion.
 and also with regards to making non-property functions verbs (e.g.
 absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?

I tend to like the current naming scheme. I tried taking the list of functions and turning them into verbs, resulting in names like toAbsolute, ensureExtension, getBaseName, etc. It makes it a bit easier to tell what's a function and what's not. But given just a little bit of context, that's usually pretty clear. Then those "to" and "get" prefixes start to feel like more like noise. Leaving out prefixes in some cases leaves more room for making the names descriptive. I guess that's another way of seeing it, so I thought I'd mention it. It makes it a bit harder to get an overview of the module by reading the symbol list at the top of the doc page, though.
 Aside from that, wasn't it discussed that extensions should include the
 . in them in order to cover the case where a file ends in a .?
 Otherwise
 
 assert(extension("file.") == extension("file"));
 
 So, all of the stuff dealing with extensions should probably include
 the dot in the extension. It looks like both the current std.path and
 your std.path handle it by using "" for dot and null for nothing, but
 distinguishing between null and the empty string like that isn't
 necessarily a good idea, and your documentation doesn't make the
 distinction clear, unlike the documentation for the current std.path.
 
 So, at minimum, the docs should be clearer about null vs empty when it
 comes to the extension (and there should be tests for it), but it would
 arguably be better to just include the . in the extension.

I'm slowly coming around to the idea of including the dot. Unless I hear any loud protests I will probably do so.

I mentioned this in the other thread, but I'm hard pressed to find an actual use case for it. The only thing I can think of is that with some Windows applications specifying a file name that ends with a dot will suppress the addition of a default extension. Notepad is an example, it adds .txt by default. I'm not sure if that's really relevant, though. On Windows, I believe it's generally true that a file name that ends with a dot is not considered to have an extension. You can't normally create a file name ending with a dot, IIRC, fopen() will just ignore the dot and create the file without it. In light of that, I think the current behavior of extension() makes perfect sense. I the rare case that a file name ending with a dot exists on the file system, it could easily fool naive user code that would likely never have seen such a file name in testing: --- if (extension(fname).length == 0) { writeln("Missing extension, adding default!); setExtension(fname, "xyz"); } --- Returning null for the special case of a name ending with a dot seems reasonable, I'm fine with that. A slightly obscure solution for a slightly obscure problem. I think setExtension could be smarter, same goes for defaultExtension. This is what I think they should behave like: assert(setExtension("foo", "bar") == "foo.bar"); assert(setExtension("foo", ".bar") == "foo.bar"); assert(setExtension("foo.", "bar") == "foo.bar"); assert(setExtension("foo.", ".bar") == "foo.bar"); This is what .NET does. It's treating extensions not just like substrings, but recognizing that there is syntax and conventions associated with them. Which is what you want 90% of the time. Just my pragmatic, Windows-sentric view of things :)

The problem with that is that then you _can't_ have something like "foo..bar". Granted, that's not a normal thing to do, but it would be problematic if it ever happened. The primary danger I see with using null for no extension and empty for "." is assert(extension("file.") == extension("file")); That could cause problems for any program that actually runs into a file which ends with a ".". Now, if Linux, Windows, and Mac OS X all prevent files ending in a dot (which I seriously doubt), then it isn't really an issue. Having the difference between null and empty does make it possible to distinguish if you start doing stuff like auto ext1 = extension("file."); auto ext2 = extension("file"); assert(ext1 !is null && ext2 !is null && ext1 == ext2); but it's ugly and it means that your average program is going to deal with it incorrectly. True, it's unlikely to happen, but if it _does_ happen, the program will be buggy. It might be reasonable to have all of the functions in std.path which return an extension include the dot and have them accept extensions with or without the dot (in which case they just add a dot if there wasn't one - regardless of whether the file's baseName ended with a dot or not). But that inconsistency might be problematic - though considering that the alternative is pretty much to assert or throw if given an extension which doesn't start with a dot, they might pretty much have to have the behavior of adding the dot if it isn't there. Ideally, this would always be true assert(setExtension(stripExtension(fileName), fileName.extension) == fileName); In your suggestion, that would not be the case for a file such as "foo..bar", and I'm not quite sure what it does with Lars' current implementation for a file such as "file.". I'd have to check. But unless there's a really good reason why forcing assert(setExtension(stripExtension(fileName), fileName.extension) == fileName); to always be true wouldn't work, then I definitely think that it should _always_ be true. - Jonathan M Davis
Aug 01 2011
prev sibling next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> wrote in message 
news:j16pv5$id8$3 digitalmars.com...
 Folks, please state your preferences in terms of function names.  I'll
 try to put personal bias aside and compose a naming scheme that is both
 internally consistent and consistent with the majority opinion.

I'm happy either way with Sep/Ext or Separator/Extension. I guess my preference would be for the shorter versions. They're perfectly clear (who isn't familiar with the "ext" abbreviation for "extension"?) and they're easier to spell. And one-line file/path manipulations are less likely to grow too far past the 80-char mark. But if we kept the long ones, I wouldn't complain.
 and also with regards to making non-property functions verbs (e.g.
 absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?

I agree with "function names should be verbs" as a general guideline, but I don't think it should be taken so strictly that it gets forced on in situations (like this one) where it just doesn't work quite as well. Despite not being verbs, "absolutePath/relativePath" are perfectly clear and much more descriptive than "makeAbsolute/makeRelative" (Make an absolute or relative what?). And then "makeAbsolutePath/makeRelativePath" is just starting to get verbose. I think this is one case where it's just not worthwhile to force the "function names should be verbs" guideline.
 I'm slowly coming around to the idea of including the dot.  Unless I hear
 any loud protests I will probably do so.

Sounds good to me.
Aug 01 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday 03 August 2011 08:35:10 Lars T. Kyllingstad wrote:
 On Mon, 01 Aug 2011 18:18:39 -0400, Nick Sabalausky wrote:
 "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> wrote in message
 news:j16pv5$id8$3 digitalmars.com...
 
 Folks, please state your preferences in terms of function names.  I'll
 try to put personal bias aside and compose a naming scheme that is
 both
 internally consistent and consistent with the majority opinion.

I'm happy either way with Sep/Ext or Separator/Extension. I guess my preference would be for the shorter versions. They're perfectly clear (who isn't familiar with the "ext" abbreviation for "extension"?) and they're easier to spell. And one-line file/path manipulations are less likely to grow too far past the 80-char mark. But if we kept the long ones, I wouldn't complain.

If I rename setExtension() and defaultExtension() to setExt() and defaultExt(), I feel like extension() should also be renamed to ext() for consistency's sake. I would *really* dislike that, hence my resistance to the shorter names.

Well, it _is_ somewhat less consistent that way, but I think that gain in having shorter function names makes it worth it. So, I'd stick with extension but shorten all of the others to Ext. It _is_ consistent insomuch as all of the functions which have more than one word in them shorten it to Ext. The only odd man out is the one which is only one word, and I think that the reason why it was different would be fairly obvious to anyone using std.path. So, yes. You have less consistency with setExt, defaultExt, and extension, but I definitely think that the benefits make it worth it.
 and also with regards to making non-property functions verbs (e.g.
 absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?

I agree with "function names should be verbs" as a general guideline, but I don't think it should be taken so strictly that it gets forced on in situations (like this one) where it just doesn't work quite as well. Despite not being verbs, "absolutePath/relativePath" are perfectly clear and much more descriptive than "makeAbsolute/makeRelative" (Make an absolute or relative what?). And then "makeAbsolutePath/makeRelativePath" is just starting to get verbose.

I actually preferred the names from my original proposal: toAbsolute and toRelative. But someone (can't remember who) pointed out that toSomething functions by convention convert between types, and they're probably right.
 I think this is one case where it's just not worthwhile to force the
 "function names should be verbs" guideline.

That seems to be the general consensus. I'll keep the names as they are.

I'd argue that it's worth it for absolutePath and relativePath (I see no problem with makeAbsolute and makeRelative), and since Absolute and Relative aren't types and can't cause confusion in that regard, I don't really mind toAbsolute and toRelative either, but it does seem that not everyone agrees on that point. - Jonathan M Davis
Aug 03 2011
prev sibling next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Mon, 01 Aug 2011 20:50:21 +0000, Jonathan M Davis wrote:

st my pragmatic, Windows-sentric view of things :)
 
 The problem with that is that then you _can't_ have something like
 "foo..bar". Granted, that's not a normal thing to do, but it would be
 problematic if it ever happened.

You can, you just wouldn't build it with these functions. And frankly if you are using these then you probably wouldn't want foo..bar and wouldn't care that it ended up foo.bar or foo..bar. You just want the proper extension on the proper name, which is what you'd get.
 The primary danger I see with using null for no extension and empty for
 "." is
 
 assert(extension("file.") == extension("file"));
 
 That could cause problems for any program that actually runs into a file
 which ends with a ".". Now, if Linux, Windows, and Mac OS X all prevent
 files ending in a dot (which I seriously doubt), then it isn't really an
 issue. Having the difference between null and empty does make it
 possible to distinguish if you start doing stuff like

I disagree, that assertion should pass. They both have the same extension. If you really want to know if a file name ends in a dot, assert("file."[$-1] == "."); I just don't see a case where you'd be dealing with these too files and care to know the extension is explicitly empty or umm just empty.
Aug 01 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday 01 August 2011 22:55:02 Jonathan M Davis wrote:
 On Tuesday 02 August 2011 05:41:11 Jesse Phillips wrote:
 On Mon, 01 Aug 2011 20:50:21 +0000, Jonathan M Davis wrote:
 
 st my pragmatic, Windows-sentric view of things :)
 
 The problem with that is that then you _can't_ have something like
 "foo..bar". Granted, that's not a normal thing to do, but it would
 be
 problematic if it ever happened.

You can, you just wouldn't build it with these functions. And frankly if you are using these then you probably wouldn't want foo..bar and wouldn't care that it ended up foo.bar or foo..bar. You just want the proper extension on the proper name, which is what you'd get.
 The primary danger I see with using null for no extension and empty
 for
 "." is
 
 assert(extension("file.") == extension("file"));
 
 That could cause problems for any program that actually runs into a
 file which ends with a ".". Now, if Linux, Windows, and Mac OS X
 all prevent files ending in a dot (which I seriously doubt), then
 it isn't really an issue. Having the difference between null and
 empty does make it possible to distinguish if you start doing stuff
 like

I disagree, that assertion should pass. They both have the same extension. If you really want to know if a file name ends in a dot, assert("file."[$-1] == "."); I just don't see a case where you'd be dealing with these too files and care to know the extension is explicitly empty or umm just empty.

The main issue that I see is that it becomes too easy to have a program think that the names of two files are identical if you're dealing with their pieces instead of their whole. Granted, this is not exactly a normal situation, but I'd much prefer to have it handled cleanly so that programs that get into such a situation don't end up being buggy rather than declaring that two items which aren't identical as being equal.

Actually, I suppose that I could sum up my take on it by saying that if the two aren't absolutely identical, they shouldn't be treated as equal. "file." and "file" do _not_ have the same extension. One has an empty extension whereas the other has none. Most stuff won't care about the difference, in which case it probably won't care whether extension("file.") == extension("file") or not (in fact, all it probably care about is whether the extension is a particular extension or not). But stuff which _does_ care about the difference _will_ care whether extension("file.") == extension("file"). It _won't_ want them to be equal. In general, I don't like the idea of null and empty being conflated - they're two separate things. It's bad enough that arrays act that way in D. There's no need to propagate it to file extensions as well. - Jonathan M Davis
Aug 01 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/02/2011 01:02 AM, Jonathan M Davis wrote:
 Actually, I suppose that I could sum up my take on it by saying that if the
 two aren't absolutely identical, they shouldn't be treated as equal.

 "file." and "file" do _not_ have the same extension. One has an empty extension
 whereas the other has none.

I agree. A simple solution is to make the dot part of the extension. So extension("file.") returns "." and extension("file") returns the empty string (be it null or not). Problem solved. One nice side effect is that concatenating the base and the extension gives back the name. Andrei
Aug 02 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 02.08.2011, 08:02 Uhr, schrieb Jonathan M Davis <jmdavisProg gmx.com>:

 "file." and "file" do _not_ have the same extension. One has an empty  
 extension whereas the other has none.

Still I would expect a get extension function to return the empty string for both. Why is that so? As Wikipedia states the interpretation depends on the filesystem (or maybe on the originating OS, but you can use ext3 on Windows and NTFS on Linux nowadays). But others seem to have problems as well: Trailing dots disappear in Samba: http://lists.samba.org/archive/rsync/2002-September/003636.html On Windows files ending in a dot cannot be deleted: http://cygwin.com/ml/cygwin/2004-01/msg00848.html http://blog.dotsmart.net/2008/06/12/solved-cannot-read-from-the-source-file-or-disk/ Mozilla Linux cannot open files ending in a dot: https://bugzilla.mozilla.org/show_bug.cgi?id=149586 The file extension is what is following the last dot. On Windows it cannot be empty, thus 'foo.' will be an inaccessible file. Yet 'foo..bar' is perfectly fine, which is causing us trouble now, since 'foo.' is 'foo..bar' stripped from its extension, but 'foo.' itself - while valid on Posix - is an ambiguous name in Windows. Camp A thinks: - it has no extension as long as the dot isn't followed by one - changing the extension must result in 'foo..ext' - getExtension should never return null, but be either '' or include the dot as in '.ext' - disassembling and reassembling a filename by string concatenation should return the original filename in all cases Camp B thinks: - no dot = no extension, otherwise what follows the dot is the extension - changing the extension must result in 'foo.ext' - getExtension returns null if no dot is found, an empty string if the file ends in a dot or otherwise what is following the dot - disassembling and reassembling a filename isn't a portable process I started at camp A, but now I'm really caught in the middle. Their arguments make as much sense. Funny enough even Sun avoided file extension methods in their Java File class, so I checked Python for that matter: os.path.splitext ( "foo.bar" ) -> '.bar' os.path.splitext ( "foo." ) -> '.' os.path.splitext ( "foo" ) -> '' Although there is no routine to change the extension, the obvious approach would result in changeExt('foo.', '.bar') == 'foo.bar'. This is what Jonathan prefers and I agree with this solution now that I made up my mind. It's just inconvenient that by this convention you cannot change the extension of 'Keep my dot.' in a way that the result is 'Keep my dot..ext'.
Aug 02 2011
prev sibling parent Jose Armando Garcia <jsancio gmail.com> writes:
On Tue, Aug 2, 2011 at 8:57 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 On 08/02/2011 01:02 AM, Jonathan M Davis wrote:
 Actually, I suppose that I could sum up my take on it by saying that if
 the
 two aren't absolutely identical, they shouldn't be treated as equal.

 "file." and "file" do _not_ have the same extension. One has an empty
 extension
 whereas the other has none.

I agree. A simple solution is to make the dot part of the extension. So extension("file.") returns "." and extension("file") returns the empty string (be it null or not). Problem solved. One nice side effect is that concatenating the base and the extension gives back the name.

For what it is worth. This also matches how std.log deals with extensions. Well, std.log leaves it up to the user to include the period ('.'). /++ Specifies the extension for the name of log files. The default value is $(D ".log"). +/ property string fileNameExtension(string extension) { return _fileNameExtension = extension; }
Aug 02 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 01.08.2011, 22:50 Uhr, schrieb Jonathan M Davis <jmdavisProg gmx.com>:
 The problem with that is that then you _can't_ have something like  
 "foo..bar".

I used to have a library of sound files when I was young. Some of them were spoken sentences and I named the files according to what was said. While I couldn't use a question mark I did use ! and . "If I was torturing someone, he wouldn't be allowed to sit..wav" With Explorer hiding the known extension it looked fine. The .NET way is fine for the general case, but I learnt that a file only has an extension when the file name doesn't end with the dot. So I'd expect the semantics to be setExtension('I am a file name', 'bar') == 'I am a file name.bar' setExtension('I am a file name.', 'bar') == 'I am a file name..bar' setExtension('I am a file name', '.bar') == 'I am a file name.bar' setExtension('I am a file name.', '.bar') == 'I am a file name..bar'
Aug 01 2011
prev sibling parent amanda <maamanda26 yahoo.com> writes:
People with herpes are not alone now! Because you have " herpesanddating.net  "
It helps people with herpes in all aspects.
Aug 02 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
 On Mon, 01 Aug 2011 17:24:14 +0000, Jonathan M Davis wrote:
 Here's a new update based on your comments and requests. Code and docs
 are in the usual place,
 [...]

Well, my previous comments on function names still stand - both in terms of names using Ext instead of Extension and Sep instead of sep -

Since the feedback has been minimal since my last update, I take it people are more or less happy with the module's functionality now. (Unless they're just tired of the whole discussion, of course. ;)) That means it is time to let the bikeshedding begin. Since we have been unable to reach a consensus in previous discussions, and there are now only three days left of the review, I would really like to get a good sample of the general opinion here. Folks, please state your preferences in terms of function names. I'll try to put personal bias aside and compose a naming scheme that is both internally consistent and consistent with the majority opinion.

If you want much feedback, you're probably going to need a new thread for that. I think that most people looked at the initial std.path implementation that you gave, had their say, and moved on. It wouldn't entirely surprise me if you could get away with making some incredibly stupid changes and then get them merged into Phobos simply because next to no one really looked at your most recent changes (I would hope that you wouldn't do that though).
 and also with regards to making non-property functions verbs (e.g.
 absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?

I'd just go with makeAbsolute and makeRelative. Maybe makeAbolutePath and makeRelativePath (or makeAbsPath and makeRelPath) would be better, but those are getting a bit long. It's defaultExtension which is problematic (aside from the fact that I think that Extension should be shortened to Ext). What it's trying to do is set the extension with a default extension if one isn't set but do nothing if it is, and I can't think of a good name for that. setExtToDefaultIfNoExt just doesn't roll off the tongue. ;) So, I don't have a good fix for defaultExtension, but I think that going with makeAbsolute and makeRelative would be better, since they _do_ make sense and are proper verbs. - Jonathan M Davis
Aug 01 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 02 August 2011 05:41:11 Jesse Phillips wrote:
 On Mon, 01 Aug 2011 20:50:21 +0000, Jonathan M Davis wrote:
 
 st my pragmatic, Windows-sentric view of things :)
 
 The problem with that is that then you _can't_ have something like
 "foo..bar". Granted, that's not a normal thing to do, but it would be
 problematic if it ever happened.

You can, you just wouldn't build it with these functions. And frankly if you are using these then you probably wouldn't want foo..bar and wouldn't care that it ended up foo.bar or foo..bar. You just want the proper extension on the proper name, which is what you'd get.
 The primary danger I see with using null for no extension and empty for
 "." is
 
 assert(extension("file.") == extension("file"));
 
 That could cause problems for any program that actually runs into a file
 which ends with a ".". Now, if Linux, Windows, and Mac OS X all prevent
 files ending in a dot (which I seriously doubt), then it isn't really an
 issue. Having the difference between null and empty does make it
 possible to distinguish if you start doing stuff like

I disagree, that assertion should pass. They both have the same extension. If you really want to know if a file name ends in a dot, assert("file."[$-1] == "."); I just don't see a case where you'd be dealing with these too files and care to know the extension is explicitly empty or umm just empty.

The main issue that I see is that it becomes too easy to have a program think that the names of two files are identical if you're dealing with their pieces instead of their whole. Granted, this is not exactly a normal situation, but I'd much prefer to have it handled cleanly so that programs that get into such a situation don't end up being buggy rather than declaring that two items which aren't identical as being equal. - Jonathan M Davis
Aug 01 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 02 August 2011 10:19:54 Marco Leise wrote:
 Am 02.08.2011, 08:02 Uhr, schrieb Jonathan M Davis <jmdavisProg gmx.com>:
 "file." and "file" do _not_ have the same extension. One has an empty
 extension whereas the other has none.

Still I would expect a get extension function to return the empty string for both. Why is that so? As Wikipedia states the interpretation depends on the filesystem (or maybe on the originating OS, but you can use ext3 on Windows and NTFS on Linux nowadays). But others seem to have problems as well: Trailing dots disappear in Samba: http://lists.samba.org/archive/rsync/2002-September/003636.html On Windows files ending in a dot cannot be deleted: http://cygwin.com/ml/cygwin/2004-01/msg00848.html http://blog.dotsmart.net/2008/06/12/solved-cannot-read-from-the-source-file- or-disk/ Mozilla Linux cannot open files ending in a dot: https://bugzilla.mozilla.org/show_bug.cgi?id=149586 The file extension is what is following the last dot. On Windows it cannot be empty, thus 'foo.' will be an inaccessible file. Yet 'foo..bar' is perfectly fine, which is causing us trouble now, since 'foo.' is 'foo..bar' stripped from its extension, but 'foo.' itself - while valid on Posix - is an ambiguous name in Windows. Camp A thinks: - it has no extension as long as the dot isn't followed by one - changing the extension must result in 'foo..ext' - getExtension should never return null, but be either '' or include the dot as in '.ext' - disassembling and reassembling a filename by string concatenation should return the original filename in all cases Camp B thinks: - no dot = no extension, otherwise what follows the dot is the extension - changing the extension must result in 'foo.ext' - getExtension returns null if no dot is found, an empty string if the file ends in a dot or otherwise what is following the dot - disassembling and reassembling a filename isn't a portable process I started at camp A, but now I'm really caught in the middle. Their arguments make as much sense. Funny enough even Sun avoided file extension methods in their Java File class, so I checked Python for that matter: os.path.splitext ( "foo.bar" ) -> '.bar' os.path.splitext ( "foo." ) -> '.' os.path.splitext ( "foo" ) -> '' Although there is no routine to change the extension, the obvious approach would result in changeExt('foo.', '.bar') == 'foo.bar'. This is what Jonathan prefers and I agree with this solution now that I made up my mind. It's just inconvenient that by this convention you cannot change the extension of 'Keep my dot.' in a way that the result is 'Keep my dot..ext'.

Except that that's two extensions, which shouldn't pose a problem. Actually, that raises the argument that we should have an addExtension function. After all, files such as file.tar.gz are quite common (on Linux at least), and std.path should be able to handle files with multiple extensions. IIRC, on the whole both the old std.path and the new std.path handle multiple extensions fairly well, but I don't think that either the old std.path or the new std.path has a function which handles the case where you want to add an extension to a file regardless of whether it already has an extension. - Jonathan M Davis
Aug 02 2011
parent "Nick Sabalausky" <a a.a> writes:
"Jonathan M Davis" <jmdavisProg gmx.com> wrote in message 
news:mailman.2046.1312274305.14074.digitalmars-d puremagic.com...
 Actually, that raises the argument that we should have an addExtension
 function.

We do: ~
Aug 02 2011
prev sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 01 Aug 2011 18:18:39 -0400, Nick Sabalausky wrote:

 "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> wrote in message
 news:j16pv5$id8$3 digitalmars.com...
 Folks, please state your preferences in terms of function names.  I'll
 try to put personal bias aside and compose a naming scheme that is both
 internally consistent and consistent with the majority opinion.

preference would be for the shorter versions. They're perfectly clear (who isn't familiar with the "ext" abbreviation for "extension"?) and they're easier to spell. And one-line file/path manipulations are less likely to grow too far past the 80-char mark. But if we kept the long ones, I wouldn't complain.

If I rename setExtension() and defaultExtension() to setExt() and defaultExt(), I feel like extension() should also be renamed to ext() for consistency's sake. I would *really* dislike that, hence my resistance to the shorter names.
 and also with regards to making non-property functions verbs (e.g.
 absolutePath and relativePath).

I'd be happy to change it, but I'm at loss for good alternatives. I seem to remember you suggesting makeAbsolute and makeRelative, but not being 100% happy with them yourself. Any other suggestions?

but I don't think it should be taken so strictly that it gets forced on in situations (like this one) where it just doesn't work quite as well. Despite not being verbs, "absolutePath/relativePath" are perfectly clear and much more descriptive than "makeAbsolute/makeRelative" (Make an absolute or relative what?). And then "makeAbsolutePath/makeRelativePath" is just starting to get verbose.

I actually preferred the names from my original proposal: toAbsolute and toRelative. But someone (can't remember who) pointed out that toSomething functions by convention convert between types, and they're probably right.
 I think this is one case where it's just not worthwhile to force the
 "function names should be verbs" guideline.

That seems to be the general consensus. I'll keep the names as they are. -Lars
Aug 03 2011