digitalmars.D - Path as an object in std.path

reply "Dylan Knutson" <tcdknutson gmail.com> writes:
Hello,
I'd like to open up the idea of Path being an object in std.path. 
I've submitted a pull 
(https://github.com/D-Programming-Language/phobos/pull/1333) that 
adds a Path struct to std.path, "which exposes a much more 
palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed. 
However, I can't find that discussion, and I think that including 
an OO way to deal with paths would be a serious gain for the 
standard library.

Why I think it should be reconsidered for inclusion in the std 
(listed in the pull):
* Adds a (more) platform independent abstraction for path strings.
* Path provides a type safe way to pass, compare, and manipulate 
arbitrary path strings.
* It wraps over the functions defined in std.path, so the behavior 
of methods on Path is, in most cases, identical to that of their 
corresponding module functions.
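
To make the last point concrete, here is a minimal sketch of a forwarding wrapper. It is not the struct from the pull request: the private `_path` field and the exact member set are assumptions chosen for illustration, and only the std.path functions being called are real.

```d
static import std.path;

// Hypothetical wrapper for illustration; not the Path from the pull request.
struct Path
{
    private string _path;

    this(string path) { _path = path; }

    // Each member forwards to the std.path function of the same name.
    // The fully qualified call avoids recursing into the member itself.
    @property bool isAbsolute() { return std.path.isAbsolute(_path); }
    @property string baseName() { return std.path.baseName(_path); }
    @property string extension() { return std.path.extension(_path); }
    @property Path dirName() { return Path(std.path.dirName(_path)); }

    // Appends one segment using the platform's directory separator.
    Path join(string segment) { return Path(std.path.buildPath(_path, segment)); }

    string toString() { return _path; }
}

unittest
{
    auto p = Path("foo").join("bar.txt");
    assert(p.baseName == "bar.txt");
    assert(p.extension == ".txt");
    assert(!p.isAbsolute);
    assert(p.dirName.toString() == "foo");
    assert(p.toString() == std.path.buildPath("foo", "bar.txt"));
}
```

Because every member is a one-line forward, the results should match what the free functions return for the same string.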

I'd like some feedback on what others think about this; I'd hate 
to see this commit closed due to a discussion that happened at a 
different point in D's development when the language had 
different needs.

Thank you.
Jun 04 2013
next sibling parent reply "Joshua Niehus" <jm.niehus gmail.com> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 "which exposes a much more palatable interface to path string 
 manipulation".
 [...snip...]
 I'd like some feedback on what others think about this;
Personally, I prefer the current implementation and found it easy to use for the multitude of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.
Jun 05 2013
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-06-05 09:11, Joshua Niehus wrote:

 personally, I prefer the current implementation and found it easy to use
 for the multitudes of tiny scripts I've written.  I wouldn't like to
 create an "object" just to call isAbsolute.
I agree. But if you're passing around a lot of paths it would probably be a good idea to have a proper type for the paths.
 That being said, I don't see why having the struct would hurt.
-- /Jacob Carlborg
Jun 05 2013
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 "which exposes a much more palatable interface to path string 
 manipulation".
 [...snip...]
 I'd like some feedback on what others think about this;
personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work by the way
Is there any reason why we couldn't keep the string-based free functions around as well?
Jun 05 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 7:33 AM, John Colvin wrote:
 On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 "which exposes a much more palatable interface to path string
 manipulation".
 [...snip...]
 I'd like some feedback on what others think about this;
personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work by the way
Is there any reason why we couldn't keep the string-based free functions around as well?
I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both. Andrei
Jun 05 2013
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 13:26:39 UTC, Andrei Alexandrescu 
wrote:
 On 6/5/13 7:33 AM, John Colvin wrote:
 On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson 
 wrote:
 "which exposes a much more palatable interface to path string
 manipulation".
 [...snip...]
 I'd like some feedback on what others think about this;
personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work by the way
Is there any reason why we couldn't keep the string-based free functions around as well?
I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both. Andrei
Because of duplication of implementation? Or is it simply that "2 ways to do the same thing" is bad?

I was imagining the following situation:
* Free functions, similar/identical to the current ones.
* A struct that provides all current functionality by wrapping the free functions, plus any extra stuff that is only appropriate for a path object.

Unfortunately the current naming scheme doesn't really suit this idea that well.
Jun 05 2013
prev sibling next sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 14:26:39 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 6/5/13 7:33 AM, John Colvin wrote:
 On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 "which exposes a much more palatable interface to path string
 manipulation".
 [...snip...]
 I'd like some feedback on what others think about this;
personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work by the way
Is there any reason why we couldn't keep the string-based free functions around as well?
I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both.
.NET has both:
1. System.IO.FileInfo and System.IO.DirectoryInfo: non-static (instance) classes with methods, e.g. Delete()
2. System.IO.File and System.IO.Directory: static classes with methods accepting strings, e.g. Delete(string name)

R
Jun 05 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 15:12:22 +0100, Regan Heath <regan netmail.co.nz>  
wrote:

 On Wed, 05 Jun 2013 14:26:39 +0100, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 On 6/5/13 7:33 AM, John Colvin wrote:
 On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 "which exposes a much more palatable interface to path string
 manipulation".
 [...snip...]
 I'd like some feedback on what others think about this;
personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work by the way
Is there any reason why we couldn't keep the string-based free functions around as well?
I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both.
1. System.IO.FileInfo and System.IO.DirectoryInfo non-static/instance classes with methods i.e. Delete() 2. System.File and System.Directory static classes with methods accepting strings i.e. Delete(string name)
I forgot to say.. I've used both in different situations. Sometimes you get a FileInfo/DirectoryInfo from another method, or you have created one because you're going to re-use the path/information a lot (to get file attributes etc) and sometimes you just need to build a path using Path.Combine (into a string) and delete it, or similar.

R
Jun 05 2013
prev sibling parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
 I don't have a strong opinion regarding Path object vs. string 
 functions, and I agree both have advantages and disadvantages. 
 But I would be opposed to having both.
Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types). However, I can't imagine that'd be feasible as it'd break a lot of code. My suggestion would be to just keep the freestanding functions to operate on raw strings, and then migrate over code that depends on std.path to use the Path struct as needed. I realize that this is easier said than done, but even then it shouldn't be a lot of work as Path can implicitly be converted to a string. This makes its integration into existing codebases/Phobos literally as easy as using "Path my_path = `foo\bar`" instead of "string my_path = `foo\bar`". You lose no functionality but gain type safety if you have to do any future refactoring.
  I wouldn't like to create an "object" just to call isAbsolute.
Agreed. The best course of action would probably be to keep the raw functions as they exist (think of them as the static versions of Path methods). However, for large applications, the type safety of a Path object makes working with paths much easier.
Jun 05 2013
next sibling parent Timothee Cour <thelastmammoth gmail.com> writes:
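Currently there's no way to perform cross-platform operations.

What about:
---
enum Platform { Posix, Windows }
version (Posix) enum PlatformDefault = Platform.Posix;
else enum PlatformDefault = Platform.Windows;
// The platform is a compile-time value parameter; Path!() is the native flavour.
struct Path(Platform platform = PlatformDefault)
{
    string value;
    this(string s) { value = s; }
    // Sketch only: a real conversion would also translate separators.
    Path!target to(Platform target = PlatformDefault)() { return Path!target(value); }
}
void main()
{
    Path!(Platform.Posix) path = `a\b`;
    auto path2 = path.to!(Platform.Windows); // convert between flavours
}
---

It allows current usage with no modification, and allows cross-platform
logic.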


On Wed, Jun 5, 2013 at 1:19 PM, Dylan Knutson <tcdknutson gmail.com> wrote:

 I don't have a strong opinion regarding Path object vs. string functions,
 and I agree both have advantages and disadvantages. But I would be opposed
 to having both.
Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types). However, I can't imagine that'd be feasible as it'd break a lot of code. My suggestion would be to just keep the freestanding functions to operate on raw strings, and then migrate over code that depends on std.path to use the Path struct as needed. I realize that this is easier said than done, but even then it shouldn't be a lot of work as Path can implicitly be converted to a string. This makes its integration into existing codebases/Phobos literally as easy as using "Path my_path = `foo\bar`" instead of "string my_path = `foo\bar`". You lose no functionality but gain type safety if you have to do any future refactoring. I wouldn't like to create an "object" just to call isAbsolute.

 Agreed. The best course of action would probably be keep the raw functions
 as they exist (think of them as the static versions of Path methods).
 However, for large applications, the type safety of a Path object makes
 working with paths much easier.
Jun 05 2013
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, June 05, 2013 22:19:21 Dylan Knutson wrote:
 I don't have a strong opinion regarding Path object vs. string
 functions, and I agree both have advantages and disadvantages.
 But I would be opposed to having both.
Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types).
There's a significant difference between a type which has a value and units and one which is basically just a string or array of strings wrapped by another type. Not that a Path struct is without value, but I think that there's a very large difference in the amount of value that the two provide. AFAIK, very few bugs are caused by treating paths as strings, but there are a lot of time-related bugs out there caused by using naked values instead of values with units.
 This makes its integration into existing codebases/Phobos
 literally as easy as
[snip]
See, this is exactly the sort of thing I'm afraid of. I don't want to have to have arguments over whether a particular function should accept a path as a string or a struct. Right now, we have one way to do it, so it's clear, and it works just fine. If we add a Path struct, then we have two ways to do the same thing, and we're going to have a division among APIs as to which way to handle paths. And I think that we'll be very much worse off because of it. While there is value in having a path struct rather than a string, I don't think that it's worth the extra confusion and division that it'll cause. If we were going to have a path struct, we should have done that in the first place rather than using strings.

- Jonathan M Davis
Jun 05 2013
parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
 There's a significant difference between a type which has a 
 value and units and
 one which is basically just a string or array of strings 
 wrapped by another
 type. Not that a Path struct is without value, but I think that 
 there's a very
 large difference in the amount of value that the two provide. 
 AFAIK, very few
 bugs are caused by treating paths as strings, but there are a 
 lot of time-
 related bugs out there caused by using naked values instead of 
 values with
 units.
Dub is forced to define its own separate Path type because, as its author states, using a string to represent a path "more often than not results in hidden bugs." (https://github.com/rejectedsoftware/dub/issues/79). Representing a path as a string is just fine in a small script, but the moment you've got to handle stuff like path comparison, building, and general manipulation, you're better off defining an abstraction for it.
 See, this is exactly the sort of thing I'm afraid of. I don't 
 want to have to
 have arguments over whether a particular function should accept 
 a path as a
 string or a struct. Right now, we have one way do to it, so 
 it's clear, and it
 works just fine.
I see no problem with just keeping Phobos as it is; it was just a suggestion to make use of new functionality. A function that takes a string can accept a Path *or* a string, and it'll work just fine, thanks to subtyping.

void bar(Path path) { return; }
void foo(string str) { return; }

Path p = `baz\quixx`;
bar(p);
foo(p);

So there doesn't have to be an argument over what a function should accept; that's up to the function's internal implementation. From the outside, it'll accept both.
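
For readers who want to try the subtyping argument, here is a self-contained sketch using `alias this`. The `Path` below is a hypothetical stand-in with a single `value` field, not the struct from the pull request, and `takesString`/`takesPath` are made-up names.

```d
// Hypothetical stand-in; the real Path in the pull request may differ.
struct Path
{
    string value;
    this(string s) { value = s; }
    alias value this; // a Path can be used wherever a string is expected
}

size_t takesString(string s) { return s.length; }
size_t takesPath(Path p) { return p.value.length; }

unittest
{
    Path p = `baz\quixx`;
    assert(takesPath(p) == 9);
    assert(takesString(p) == 9); // implicit Path -> string via alias this

    // The conversion only goes one way: a plain string is not a Path.
    static assert(!__traits(compiles, takesPath("plain string")));
}
```

Note the last line: in this sketch a function declared to take `string` accepts both, while a function declared to take `Path` accepts only a `Path`, so the choice of parameter type still matters to callers.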
Jun 05 2013
parent "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 5 June 2013 at 20:52:24 UTC, Dylan Knutson wrote:
 Dub is forced to define its own separate Path type because, as 
 its author states, using a string to represent a path "more 
 often than not results in hidden bugs."
You're misquoting here. "usually will be places where the path is modified using string operations [...]" While I've had desires to have my functions accept a Path so that I can document what is being accepted (also helps with function overloads), std.path has been working well for me as I move my code from string operations to path operations.
Jun 05 2013
prev sibling parent Timothee Cour <thelastmammoth gmail.com> writes:
On Wed, Jun 5, 2013 at 1:34 PM, Jonathan M Davis <jmdavisProg gmx.com> wrote:

 On Wednesday, June 05, 2013 22:19:21 Dylan Knutson wrote:
 I don't have a strong opinion regarding Path object vs. string
 functions, and I agree both have advantages and disadvantages.
 But I would be opposed to having both.
Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types).
There's a significant difference between a type which has a value and units and one which is basically just a string or array of strings wrapped by another type. Not that a Path struct is without value, but I think that there's a very large difference in the amount of value that the two provide. AFAIK, very few bugs are caused by treating paths as strings,
I disagree. A Path type allows catching bugs early (e.g. giving a $mypath environment variable to a binary, where the env variable wasn't set or was set to an invalid path name). Constructing a Path object from it will fail immediately, as opposed to later down the code with possibly evil artifacts (e.g. when using std.process.shell functions).

Other advantage: a central location for all path object creation allows instrumenting the code, for example to log all path names mentioned. That would be impossible with a raw string type.

Other advantage: it makes it easy to work with cross-platform code (i.e. operating on Windows paths from Posix); see my previous post in this thread.

[...] have such an abstraction. Given D's alias this functionality, this abstraction comes at 0 runtime cost and makes it work with 0 adaptation for most existing code. What will it break? We should discuss that.
 but there are a lot of time-
 related bugs out there caused by using naked values instead of values with
 units.

 This makes its integration into existing codebases/Phobos
 literally as easy as
[snip] See, this is exactly the sort of thing I'm afraid of. I don't want to have to have arguments over whether a particular function should accept a path as a string or a struct. Right now, we have one way do to it, so it's clear, and it works just fine. If we add a Path struct, then we have two ways to do the same thing, and we're going to have a division among APIs as to which way to handle paths. And I think that we'll be very much worse of because of it. While there is value in having a path struct rather than a string, I don't think that it's worth the extra confusion and division that it'll cause. If we were going to have a path struct, we should have done that in the first place rather than using strings. - Jonathan M Davis
Jun 05 2013
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 2:27 AM, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in std.path. I've
 submitted a pull
 (https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
 Path struct to std.path, "which exposes a much more palatable interface
 to path string manipulation".
[snip] Great, thanks for this work. I agree that the proposal deserves a fair shake. Andrei
Jun 05 2013
prev sibling next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much more 
 palatable interface to path string manipulation".
For the record, there are some existing D path object implementations: * Tango's FilePath class: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d * Vibe's Path struct: https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/inet/path.d
Jun 05 2013
next sibling parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Wednesday, 5 June 2013 at 22:06:52 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much 
 more palatable interface to path string manipulation".
For the record, there are some existing D path object implementations: * Tango's FilePath class: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d * Vibe's Path struct: https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/inet/path.d
The design of Path was prompted by Dub's own internal path module, might I add. And if anything, this just goes to show that a Path object indeed does have its use cases.
Jun 05 2013
parent "Dylan Knutson" <tcdknutson gmail.com> writes:
I'd like to point out some of the pitfalls of using a raw string
as a representation of a path, too.

You've got to manually normalize strings before any comparison is
done. Even a single directory delimiter at the end of the string
means that the paths won't compare correctly. Normalizing takes a
good amount of extra code, and you've got to remember to
normalize *everywhere*, or you've got a bug waiting to happen.
string s1 = `baz/../foo/bar/`;
string s2 = `foo/bar/`;
string s3 = `foo/bar`;

assert(s1 == s2); // Fails
assert(s2 == s3); // Fails
assert(s1 == s3); // Fails
assert(buildNormalizedPath(s1) == buildNormalizedPath(s2)); // Passes, with many more keystrokes.

Comparing with Paths:
Path p1 = `baz/../foo/bar/`;
Path p2 = `foo/bar/`;
Path p3 = `foo/bar`;

assert(p1 == p2); // Passes.
assert(p2 == p3); // Passes.
assert(p1 == p3); // Passes.

As you can see, Path is just generally easier to work with,
because it encapsulates the concept of a path. There's no having to
normalize strings, because that's done for you. It just works.

Building a path with strings isn't difficult, but the function
calls are unwieldy.
string s1 = buildNormalizedPath(`foo`, `bar`);
string s2 = buildNormalizedPath(s1, `baz`);
assert(s2 == `foo/bar/baz`); // Will fail on some platforms.

Building a Path, IMO, just looks cleaner, and it's obvious what
you're doing.
Path p1 = Path(`foo`, `bar`);
Path p2 = p1.join(`baz`);
assert(p2 == `foo/bar/baz`); // Passes on all platforms.

As a sidenote, I'd like to point out that using Path has *no more
overhead* than passing around and manipulating a raw string.
As far as I can tell, all use cases for Path take less code, and
more easily convey what you're doing. D's support for object
oriented design is great; why not make use of it?
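
A minimal sketch of how the comparison behaviour above could be obtained, assuming the struct normalizes once in its constructor with std.path's `buildNormalizedPath`. The struct below is hypothetical, not the one from the pull request. The normalization is purely lexical, so symbolic links are not resolved.

```d
import std.path : buildNormalizedPath;

// Hypothetical Path that stores only the normalized form.
struct Path
{
    private string _path;

    this(string path) { _path = buildNormalizedPath(path); }

    // Two Paths are equal when their normalized forms are equal.
    bool opEquals(const Path other) const { return _path == other._path; }

    // Comparing against a raw string normalizes the string first.
    bool opEquals(string other) const { return _path == buildNormalizedPath(other); }
}

unittest
{
    auto p1 = Path(`baz/../foo/bar/`);
    auto p2 = Path(`foo/bar/`);
    auto p3 = Path(`foo/bar`);

    assert(p1 == p2);
    assert(p2 == p3);
    assert(p1 == p3);
    assert(p3 == `foo/bar/`); // raw strings are normalized on comparison
}
```

Normalizing at construction means the cost is paid once per path rather than once per comparison, at the price of no longer remembering the exact string the caller passed in.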
Jun 05 2013
prev sibling parent "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 5 June 2013 at 22:06:52 UTC, Vladimir Panteleev 
wrote:
 * Tango's FilePath class:
   
 https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d
Note that Tango code should not be used for code intended for Phobos unless all authors of that piece have stated they will license under Boost. It is a firm stance to prevent any potential legal issues (whether perceived or real)
Jun 05 2013
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much more 
 palatable interface to path string manipulation".
Since I am the designer and primary author of std.path, I should probably say something.

When I first started working on "the new std.path" a couple of years ago, I initially entertained the idea of writing it in terms of a dedicated Path type. I was quickly convinced otherwise by others, and proceeded to design the module around normal strings.

For the last two years I've been working more in C++ than in D (by necessity, not by desire), and for all my path-manipulation needs I've been using boost::filesystem. This library has a dedicated path type, so I've gained some experience with this kind of API. And I am *really* happy we went with the string solution for std.path.

Paths are usually obtained in string form, and they are normally passed to other functions and third party libraries in string form. Having to convert them to something else just to do what is, in fact, string manipulations, is just annoying. (One of my biggest gripes with boost::filesystem is that conversions between path and string necessitate a copy, which is not a problem with your Path type, so in that respect it is better than Boost's solution.)
 [...]

 Why I think it should be reconsidered for inclusion in the std 
 (listed in the pull):
 * Adds a (more) platform independent abstraction for path 
 strings.
How is this more platform independent? It is just a simple wrapper around a string, with methods that forward to already-extant module-level functions.
 * Path provides a type safe way to pass, compare, and 
 manipulate arbitrary path strings.
How is it safer? I would agree with this if it verified that isValidPath(_path) on construction and maintained this as an invariant, but I cannot see that it does.
 * It wraps over the functions defined in std.path, so behavior 
 of methods on Path are, in most cases, identical to their 
 corresponding module function.
Then what is the added value? Having Path together with normal string functions in the same module will be confusing (there are two almost-equal ways of doing the same thing; which one should I choose?), and it will add code duplication (now my code has to accept paths both as strings and as Paths).

Coming from the author of std.path, this may come across as hostile or jealous, but I don't see that the proposed change improves anything.

Lars
Jun 06 2013
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad 
wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much 
 more palatable interface to path string manipulation".
[...]
Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:

1. Maintain an isValidPath() invariant, for early error detection. (On POSIX, this is rather trivial, as any string that does not contain a null character is in principle a valid path, but on Windows, the situation is different.)
2. Add in-place versions of path modifiers (setExtension, setDrive, etc.), for improved performance.

One solution would be for Path to be a trivial string wrapper which does (1) and not (2). In this case, it is justified to have Path *in addition to* the existing functions.

Another solution would be for Path to do (2), possibly in addition to (1). However, in this case it should be a *replacement* for the existing functions, and not an addition. Otherwise, we have two almost-equal ways of doing the same thing, which should be avoided. (I am not advocating this, however, as it will massively break user code all over again.)

Lars
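
Option (1) could look roughly like the sketch below. The `ValidPath` name and its members are hypothetical; `isValidPath` and `enforce` are the existing Phobos functions. The constructor rejects invalid input at run time, and the `invariant` block re-checks the condition in non-release builds.

```d
import std.exception : enforce;
import std.path : isValidPath;

// Hypothetical string wrapper that can only hold a valid path.
struct ValidPath
{
    private string _path;

    this(string path)
    {
        // Early error detection: fail at construction, not at first use.
        enforce(isValidPath(path), "invalid path: " ~ path);
        _path = path;
    }

    // A default-initialized ValidPath holds null, which is tolerated here.
    invariant()
    {
        assert(_path is null || isValidPath(_path));
    }

    @property string value() const { return _path; }
    alias value this; // still usable wherever a string is expected
}

unittest
{
    import std.exception : assertThrown;

    auto ok = ValidPath("foo/bar");
    string s = ok; // implicit conversion back to string
    assert(s == "foo/bar");

    // A null character is not allowed in a path on POSIX or Windows.
    assertThrown(ValidPath("foo\0bar"));
}
```

Since the wrapper adds only the check and exposes the string through `alias this`, it could coexist with the free functions in the way described for the first solution.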
Jun 06 2013
next sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad  
<public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in std.path. I've  
 submitted a pull  
 (https://github.com/D-Programming-Language/phobos/pull/1333) that adds  
 a Path struct to std.path, "which exposes a much more palatable  
 interface to path string manipulation".
[...]
Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:
Does System.IO.DirectoryInfo:
http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx
add sufficient value to justify its existence to your mind, vs. just having System.IO.Directory:
http://msdn.microsoft.com/en-us/library/system.io.directory.aspx

R
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
 On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad 
 <public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad 
 wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson 
 wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much 
 more palatable interface to path string manipulation".
[...]
Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:
Does System.IO.DirectoryInfo: http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx Add sufficient value to justify it's existence to your mind? vs just having System.IO.Directory: http://msdn.microsoft.com/en-us/library/system.io.directory.aspx
They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry. The added value is mainly in the performance benefits; for example,

if (exists(f) && isFile(f) && timeLastModified(f) < d) ...

requires three filesystem lookups (stat() calls), whereas

auto de = dirEntry(f);
if (de.exists && de.isFile && de.timeLastModified < d) ...

is just one. I see no such benefit in the proposed Path type.
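
The two styles can be put side by side in a compilable form. This is a sketch against present-day Phobos rather than the 2013 API: the `dirEntry(f)` helper is written as the `DirEntry(f)` constructor, and because `DirEntry` has no `exists` member as far as I know, existence is checked separately before constructing it. The function names and the temporary file name are made up for the example.

```d
import core.time : hours;
import std.datetime : Clock, SysTime;
import std.file : DirEntry, exists, isFile, remove, tempDir, timeLastModified, write;
import std.path : buildPath;

// Free functions: every call may query the filesystem again.
bool olderThanFreeFunctions(string f, SysTime d)
{
    return exists(f) && isFile(f) && timeLastModified(f) < d;
}

// DirEntry: the attribute queries share the file information it caches.
bool olderThanDirEntry(string f, SysTime d)
{
    if (!exists(f))
        return false; // DirEntry(f) throws for a missing file
    auto de = DirEntry(f);
    return de.isFile && de.timeLastModified < d;
}

unittest
{
    auto f = buildPath(tempDir, "direntry_sketch.tmp");
    write(f, "x");
    scope (exit) remove(f);

    auto later = Clock.currTime + 1.hours;
    assert(olderThanFreeFunctions(f, later));
    assert(olderThanDirEntry(f, later));
    assert(!olderThanDirEntry(f ~ ".missing", later));
}
```

Both functions give the same answer; the difference is only in how often the filesystem is consulted, which is the benefit being described.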
Jun 06 2013
parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad 
wrote:
 On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
 On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad 
 <public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad 
 wrote:
 [...]
Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:
Does System.IO.DirectoryInfo: http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx Add sufficient value to justify it's existence to your mind? vs just having System.IO.Directory: http://msdn.microsoft.com/en-us/library/system.io.directory.aspx
They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry. The added value is mainly in the performance benefits; for example, if (exists(f) && isFile(f) && timeLastModified(f) < d) ... requires three filesystem lookups (stat() calls), whereas auto de = dirEntry(f); if (de.exists && de.isFile && de.timeLastModified < d) ... is just one. I see no such benefit in the proposed Path type.
Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose function is querying filesystem information.
Jun 06 2013
next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:54:25 UTC, Dylan Knutson wrote:
 On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad
 They add great value, but that is a completely different 
 discussion, as these are more similar to std.file.DirEntry.  
 [...]
Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose is querying filesystem information.
Which is why my first sentence said "that is a completely different discussion".
Jun 06 2013
prev sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 15:54:24 +0100, Dylan Knutson <tcdknutson gmail.com>  
wrote:

 On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad wrote:
 On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
 On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad  
 <public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
 [...]
Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:
Does System.IO.DirectoryInfo: http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx add sufficient value to justify its existence to your mind? vs just having System.IO.Directory: http://msdn.microsoft.com/en-us/library/system.io.directory.aspx
They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry. The added value is mainly in the performance benefits; for example, if (exists(f) && isFile(f) && timeLastModified(f) < d) ... requires three filesystem lookups (stat() calls), whereas auto de = dirEntry(f); if (de.exists && de.isFile && de.timeLastModified < d) ... is just one. I see no such benefit in the proposed Path type.
Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose is querying filesystem information.
Yeah, my fault. I didn't take the time to look at the proposed module in detail. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jun 07 2013
prev sibling parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
 Let me add some more to this.  To justify the addition of such 
 a type, it needs to pull its own weight.  For added value, it 
 could do one or both of the following:

 1. Maintain an isValidPath() invariant, for early error 
 detection.  (On POSIX, this is rather trivial, as any string 
 that does not contain a null character is in principle a valid 
 path, but on Windows, the situation is different.)
That's a possibility.
 2. Add in-place versions of path modifiers (setExtension, 
 setDrive, etc.), for improved performance.
I don't think that there'll be any performance improvements by making in place modification functions. Considering under the hood the path object is just a string, and that string's reference needs to be changed with each modification, I don't see how manipulation can be made faster.
 One solution would be for Path to be a trivial string wrapper 
 which does (1) and not (2).  In this case, it is justified to 
 have Path *in addition to* the existing functions.

 Another solution would be for Path to do (2), possibly in 
 addition to (1).  However, in this case it should be a 
 *replacement* for the existing functions, and not an addition.  
 Otherwise, we have two almost-equal ways of doing the same 
 thing, which should be avoided.  (I am not advocating this, 
 however, as it will massively break user code all over again.)
The more I think about it, the more partial I am to removing the existing string methods in std.path. At most, using a Path object increases number of characters typed by 6 (`Path()`). And even then, chances are you'll be saving characters as method names can be simplified to remove `path` from them: buildNormalizedPath -> normalized, isValidPath -> isValid, etc. Even with user code breaking, 1) D isn't exactly considered a stable language quite yet; I'm sure that users expect code breakage with each new release, and 2) it's trivial to convert code that uses the string based API to the object based API.
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
 [...]

 I don't think that there'll be any performance improvements by 
 making in place modification functions. Considering under the 
 hood the path object is just a string, and that string's 
 reference needs to be changed with each modification, I don't 
 see how manipulation can be made faster.
Why does _path have to be an immutable string? It could just as well be a char[], or it could be templated on the character type.
 [...]

 The more I think about it, the more partial I am to removing 
 the existing string methods in std.path. At most, using a Path 
 object increases number of characters typed by 6 (`Path()`). 
 And even then, chances are you'll be saving characters as 
 method names can be simplified to remove `path` from them: 
 buildNormalizedPath -> normalized, isValidPath -> isValid, etc. 
 Even with user code breaking, 1) D isn't exactly considered a 
 stable language quite yet; I'm sure that users expect code 
 breakage with each new release, and 2) it's trivial to convert 
 code that uses the string based API to the object based API.
I know D isn't 100% stable yet, but bear in mind that this module was introduced no more than two years ago, as part of the (still-ongoing) effort to revamp the old modules from the D1 days. It was accepted with a unanimous vote after a comprehensive review by the D community. And already you want another breaking redesign? I am strongly opposed to this.
Jun 06 2013
parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 15:24:09 UTC, Lars T. Kyllingstad 
wrote:
 On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
 [...]

 I don't think that there'll be any performance improvements by 
 making in place modification functions. Considering under the 
 hood the path object is just a string, and that string's 
 reference needs to be changed with each modification, I don't 
 see how manipulation can be made faster.
Why does _path have to be an immutable string? It could just as well be a char[], or it could be templated on the character type.
 [...]

 The more I think about it, the more partial I am to removing 
 the existing string methods in std.path. At most, using a Path 
 object increases number of characters typed by 6 (`Path()`). 
 And even then, chances are you'll be saving characters as 
 method names can be simplified to remove `path` from them: 
 buildNormalizedPath -> normalized, isValidPath -> isValid, 
 etc. Even with user code breaking, 1) D isn't exactly 
 considered a stable language quite yet; I'm sure that users 
 expect code breakage with each new release, and 2) it's 
 trivial to convert code that uses the string based API to the 
 object based API.
I know D isn't 100% stable yet, but bear in mind that this module was introduced no more than two years ago, as part of the (still-ongoing) effort to revamp the old modules from the D1 days. It was accepted with a unanimous vote after a comprehensive review by the D community. And already you want another breaking redesign? I am strongly opposed to this.
Well, keep in mind that D 2 years ago was a different beast. AFAIK, D only recently got `alias X this`, which solves 90% of breakage problems when passing around Paths. FWIW, having Path be an object adds consistency with the rest of Phobos, which has many entities which could be expressed as primitives, expressed as objects. To name a few, DateTime is an object, File is an object, and DirEntry is an object. Yes, they could be described as integers, or a pointer, or a string, but it's less cognitive load on the developer to recognize them as separate types.
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
 FWIW, having Path be an object adds consistency with the rest 
 of Phobos, which has many entities which could be expressed as 
 primitives, expressed as objects. To name a few, DateTime is an 
 object, File is an object, and DirEntry is an object. Yes, they 
 could be described as integers, or a pointer, or a string, but 
 it's less cognitive load on the developer to recognize them as 
 separate types.
"Reducing cognitive load" is not the main reason these are objects. DateTime lumps together no less than six integers. File adds automatic resource management via reference counting. DirEntry caches file information to avoid repeated filesystem lookups. And so on.
Jun 06 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 8:57 AM, Lars T. Kyllingstad wrote:
 On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
 FWIW, having Path be an object adds consistency with the rest of Phobos, which
 has many entities which could be expressed as primitives, expressed as
 objects. To name a few, DateTime is an object, File is an object, and DirEntry
 is an object. Yes, they could be described as integers, or a pointer, or a
 string, but it's less cognitive load on the developer to recognize them as
 separate types.
"Reducing cognitive load" is not the main reason these are objects. DateTime lumps together no less than six integers. File adds automatic resource management via reference counting. DirEntry caches file information to avoid repeated filesystem lookups. And so on.
It's hard to see what value there is in a type that is simply a wrapper around an existing type, and which provides implicit conversions to/from that existing type so that they can be intermixed arbitrarily. At the end, that's nothing more than: alias string Path;
Jun 06 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:41 PM, Walter Bright wrote:
 On 6/6/2013 8:57 AM, Lars T. Kyllingstad wrote:
 On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
 FWIW, having Path be an object adds consistency with the rest of Phobos, which
 has many entities which could be expressed as primitives, expressed as
 objects. To name a few, DateTime is an object, File is an object, and DirEntry
 is an object. Yes, they could be described as integers, or a pointer, or a
 string, but it's less cognitive load on the developer to recognize them as
 separate types.
"Reducing cognitive load" is not the main reason these are objects. DateTime lumps together no less than six integers. File adds automatic resource management via reference counting. DirEntry caches file information to avoid repeated filesystem lookups. And so on.
It's hard to see what value there is in a type that is simply a wrapper around an existing type, and which provides implicit conversions to/from that existing type so that they can be intermixed arbitrarily. At the end, that's nothing more than: alias string Path;
No, you get to check the conversions going one way. If you destroy, destroy in style. This is a wrong argument. Andrei
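A small sketch of the one-way conversion being referred to, assuming a wrapper built with `alias this`. The struct layout and the helper names `byteLength` and `takesPath` are hypothetical and are not taken from the pull request.

```d
// Hypothetical wrapper: a string field exposed through alias this.
struct Path
{
    string _path;
    alias _path this;   // lets a Path be used wherever a string is expected
}

size_t byteLength(string s) { return s.length; }
bool takesPath(Path p) { return p._path.length > 0; }

// Path -> string is implicit ...
static assert(__traits(compiles, byteLength(Path("a/b"))));
// ... but string -> Path is not; the caller has to write Path("a/b").
static assert(!__traits(compiles, takesPath("a/b")));
static assert(__traits(compiles, takesPath(Path("a/b"))));
```

With `alias string Path;` all three lines would compile, which is the difference between the two designs.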
Jun 06 2013
prev sibling next sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad  
<public kyllingen.net> wrote:
 Paths are usually obtained in string form, and they are normally passed  
 to other functions and third party libraries in string form.  Having to  
 convert them to something else just to do what is, in fact, string  
 manipulations, is just annoying.
Agree 100%. and this is good. It also has System.File and System.Directory static classes with static methods taking string, also good. constructed from a string, and then have methods which mirror the static methods from System.File plus a refresh method to update the cached file attributes etc obtained from the file system. I find these objects useful. It would be nice for D to have similar objects, IMO. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 10:30:05 UTC, Regan Heath wrote:
 On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad 
 <public kyllingen.net> wrote:
 Paths are usually obtained in string form, and they are 
 normally passed to other functions and third party libraries 
 in string form.  Having to convert them to something else just 
 to do what is, in fact, string manipulations, is just annoying.
Agree 100%. a string and this is good. It also has System.File and System.Directory static classes with static methods taking string, also good. which are constructed from a string, and then have methods which mirror the static methods from System.File plus a refresh method to update the cached file attributes etc obtained from the file system. I find these objects useful. It would be nice for D to have similar objects, IMO.
It does have a similar type: std.file.DirEntry.
Jun 06 2013
parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 11:43:51 +0100, Lars T. Kyllingstad  
<public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 10:30:05 UTC, Regan Heath wrote:
 On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad  
 <public kyllingen.net> wrote:
 Paths are usually obtained in string form, and they are normally  
 passed to other functions and third party libraries in string form.   
 Having to convert them to something else just to do what is, in fact,  
 string manipulations, is just annoying.
Agree 100%. and this is good. It also has System.File and System.Directory static classes with static methods taking string, also good. are constructed from a string, and then have methods which mirror the static methods from System.File plus a refresh method to update the cached file attributes etc obtained from the file system. I find these objects useful. It would be nice for D to have similar objects, IMO.
It does have a similar type: std.file.DirEntry.
Ahh.. excellent. In that case, I don't think we want/need the Path being proposed. Side-note; DirEntry is a very UNIX centric name - I only know that because I have coded with it, I wonder what pure windows developers make of it.. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jun 06 2013
prev sibling parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad 
wrote:
 Paths are usually obtained in string form, and they are 
 normally passed to other functions and third party libraries in 
 string form.  Having to convert them to something else just to 
 do what is, in fact, string manipulations, is just annoying.
Well, when designing Path, I didn't want to add much, if any, programmer overhead. Conversion to a Path is trivial: Change the type to Path, and 90% of the time it'll just work. The only case that comes to mind where a string can't be implicitly assigned/converted to a Path is when passing it to a function, in which case all it needs to be wrapped in is Path(). Or, have an overloaded version that takes a string (which all path using functions do now anyways).
 (One of my biggest gripes with boost::filesystem is that 
 conversions between path and string necessitate a copy, which 
 is not a problem with your Path type, so in that respect it is 
 better than Boost's solution.)


 [...]

 Why I think it should be reconsidered for inclusion in the std 
 (listed in the pull):
 * Adds a (more) platform independent abstraction for path 
 strings.
How is this more platform independent? It is just a simple wrapper around a string, with methods that forward to already-extant module-level functions.
I should have said "makes it easier to be platform independent". Normalization is done automatically on comparison. There's nothing you can't do with normal std.path functions, but that's not the point. It's to be type safe and add convenience.
 * Path provides a type safe way to pass, compare, and 
 manipulate arbitrary path strings.
How is it safer? I would agree with this if it verified that isValidPath(_path) on construction and maintained this as an invariant, but I cannot see that it does.
Type safe. Once you've got a huge program with many concepts floating around, you don't want to have to keep track of which strings are paths and which aren't, and you don't want to do all the specifics like splitting, normalization, and joining with raw string functions. This isn't just conjecture either; there are D programs in the wild that abstract away path strings because it's easier to deal with them that way. I didn't want to force paths passed in to be valid, because the programmer might want an invalid path passed around for whatever reason.
 * It wraps over the functions defined in std.path, so behavior 
 of methods on Path are, in most cases, identical to their 
 corresponding module function.
Then what is the added value?
See above. I didn't want to change functionality, just make it easier to use.
 As the author of std.path this may come across as hostile or 
 jealous, but I don't see that the proposed change improves 
 anything.
You came off as quite constructive; thank you :-)
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
 I should have said "makes it easier to be platform 
 independent". Normalization is done automatically on comparison.
Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form. (Should it? I dunno.)
 This isn't just conjecture either; there are D programs in the 
 wild that abstract away path strings because it's easier to 
 deal with them that way.
 I didn't want to force paths passed in to be valid, because the 
 programmer might want an invalid path passed around for 
 whatever reason.
As others have pointed out, there are examples of the opposite too.
 You came off as quite constructive; thank you :-)
:)
Jun 06 2013
parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad 
wrote:
 On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
 I should have said "makes it easier to be platform 
 independent". Normalization is done automatically on 
 comparison.
Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form. (Should it? I dunno.)
It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between

    buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

    p1 == p2;
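As a concrete illustration of the string-based form, here is a minimal sketch using only `std.path.buildNormalizedPath`. The helper name `sameAfterNormalizing` is made up, and the example takes the intended path to be `foo/../bar`.

```d
import std.path : buildNormalizedPath;

// Hypothetical helper: compares two paths after normalizing both.
// Each call to buildNormalizedPath builds a new string.
bool sameAfterNormalizing(string s1, string s2)
{
    return buildNormalizedPath(s1) == buildNormalizedPath(s2);
}
```

Here `sameAfterNormalizing("foo/../bar", "bar")` is true, while the plain comparison `"foo/../bar" == "bar"` is false.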
Jun 06 2013
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson <tcdknutson gmail.com>  
wrote:

 On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad wrote:
 On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
 I should have said "makes it easier to be platform independent".  
 Normalization is done automatically on comparison.
Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form. (Should it? I dunno.)
It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between buildNormalizedPath(s1) == buildNormalizedPath(s2); and p1 == p2;
This can be done without allocations. -Steve
Jun 06 2013
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer 
wrote:
 On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson 
 <tcdknutson gmail.com> wrote:

 It doesn't do any allocations that the user won't have to do 
 anyways. Paths have to be normalized before comparison; not 
 doing so isn't correct behavior. Eg, the strings `foo/../bar` 
 != `bar`, yet they're equivalent paths. Path encapsulates the 
 behavior. So it's the difference between

 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
This can be done without allocations.
I know. There are a few additions that I've been planning to make for std.path for the longest time, I just haven't found the time to do so yet. Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings. The first one is a lazy "path normaliser":

    assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]), ["foo", "baz"]));

With this, non-allocating path comparison is easy. The verbose version of p1 == p2, which could be wrapped for convenience, is then:

    equal(pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected. Very general and composable, and easily wrappable.

The second thing I'd like to add is an overload of buildPath() that takes a range of path segments. (Then buildNormalizedPath(p) can also be implemented as buildPath(pathNormalizer(p)).) Maybe now is a good time to get this done. :)
Jun 06 2013
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad  
<public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer wrote:
 On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson  
 <tcdknutson gmail.com> wrote:

 It doesn't do any allocations that the user won't have to do anyways.  
 Paths have to be normalized before comparison; not doing so isn't  
 correct behavior. Eg, the strings `foo/../bar` != `bar`, yet they're  
 equivalent paths. Path encapsulates the behavior. So it's the  
 difference between

 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
This can be done without allocations.
I know. There are a few additions that I've been planning to make for std.path for the longest time, I just haven't found the time to do so yet. Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings. The first one is a lazy "path normaliser": assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]), ["foo", "baz"])); With this, non-allocating path comparison is easy. The verbose version of p1 == p2, which could be wrapped for convenience, is then: equal(pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2))) You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected. Very general and composable, and easily wrappable.
Great! I'd highly suggest pathEqual which takes two ranges of dchar and does the composition and OS-specific comparison for you. -Steve
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer 
wrote:
 On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad 
 <public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer 
 wrote:
 On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson 
 <tcdknutson gmail.com> wrote:

 It doesn't do any allocations that the user won't have to do 
 anyways. Paths have to be normalized before comparison; not 
 doing so isn't correct behavior. Eg, the strings `foo/../bar` 
 != `bar`, yet they're equivalent paths. Path encapsulates 
 the behavior. So it's the difference between

 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
This can be done without allocations.
I know. There are a few additions that I've been planning to make for std.path for the longest time, I just haven't found the time to do so yet. Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings. The first one is a lazy "path normaliser": assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]), ["foo", "baz"])); With this, non-allocating path comparison is easy. The verbose version of p1 == p2, which could be wrapped for convenience, is then: equal(pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2))) You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected. Very general and composable, and easily wrappable.
Great! I'd highly suggest pathEqual which takes two ranges of dchar and does the composition and OS-specific comparison for you.
They don't have to be dchar if all the building blocks are templates (as the existing ones are):

    bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2)
        (const(C1)[] p1, const(C2)[] p2)
        if (isSomeChar!C1 && isSomeChar!C2)
    {
        // The ranges yield path segments, so compare them with filenameCmp.
        return equal!((a, b) => filenameCmp!cs(a, b) == 0)
            (pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2)));
    }
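For anyone who wants something that compiles today, here is a hedged sketch of the same idea. It does not use the `pathNormalizer` proposed above, which is hypothetical in this thread. Instead it assumes a later Phobos in which `std.path.asNormalizedPath` returns a lazy range of characters, and it restricts the arguments to `string` for brevity.

```d
import std.algorithm.comparison : equal;
import std.path : asNormalizedPath, CaseSensitive, filenameCharCmp;

// Sketch of pathEqual: walks both normalized paths lazily and compares them
// character by character, without first building two normalized strings.
bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault)(string p1, string p2)
{
    return equal!((a, b) => filenameCharCmp!cs(a, b) == 0)
        (asNormalizedPath(p1), asNormalizedPath(p2));
}
```

With the default `cs`, `pathEqual("foo/../bar", "bar")` holds on any OS, while `pathEqual!(CaseSensitive.no)("Foo/Bar", "foo/bar")` shows the explicit case-insensitive form.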
Jun 06 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:40:37 -0400, Lars T. Kyllingstad  
<public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer wrote:
 Great!  I'd highly suggest pathEqual which takes two ranges of dchar  
 and does the composition and OS-specific comparison for you.
They don't have to be dchar if all the building blocks are templates (as the existing ones are): bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2) (const(C1)[] p1, const(C2)[] p2) if (isSomeChar!C1 && isSomeChar!C2)
Actually, all string variants are dchar ranges :) And your solution is less general, dchar ranges don't have to be arrays. However, I don't think in practice there are any real non-array dchar ranges... One thing your version does do is explicitly say the parameters are const, which you couldn't do with a non-array dchar range. -Steve
Jun 06 2013
parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:48:59 UTC, Steven Schveighoffer 
wrote:
 On Thu, 06 Jun 2013 13:40:37 -0400, Lars T. Kyllingstad 
 <public kyllingen.net> wrote:

 On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer 
 wrote:
 Great!  I'd highly suggest pathEqual which takes two ranges 
 of dchar and does the composition and OS-specific comparison 
 for you.
They don't have to be dchar if all the building blocks are templates (as the existing ones are): bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2) (const(C1)[] p1, const(C2)[] p2) if (isSomeChar!C1 && isSomeChar!C2)
Actually, all string variants are dchar ranges :) And your solution is less general, dchar ranges don't have to be arrays.
Ok, now I see what you meant.
 However, I don't think in practice there are any real non-array 
 dchar ranges...
At least not any that also support slicing, which I think it is fair to require of "path ranges".
Jun 06 2013
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 19:25:56 Lars T. Kyllingstad wrote:
 I know. There are a few additions that I've been planning to
 make for std.path for the longest time, I just haven't found the
 time to do so yet. Specifically, I want to add a couple of
 functions that deal with ranges of path segments rather than full
 path strings.
Another thing to consider is overloads of some of the functions which take an output range as their first argument. There has been an increased push lately to cut down on GC allocations in Phobos, and so we're probably going to start having more functions be overloaded such that they can be used with output ranges in order to give the folks who want to avoid the GC more control - similar to how we have the overload of toString that takes a delegate (though outside of classes, since we can templatize stuff, using an output range is more flexible than a delegate, though a delegate does qualify as an output range apparently). - Jonathan M Davis
Jun 06 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
This can be done without allocations.
Interesting. "Show me the code!" Andrei
Jun 06 2013
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:47:42 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
This can be done without allocations.
Interesting. "Show me the code!"
I think Lars summed it up nicely. It's not full working code yet, but it shows how one can do the path splitting and normalization lazily. However, it should be noted that buildNormalizedPath cannot be done without allocations, just the full comparison. -Steve
Jun 06 2013
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:47 AM, Andrei Alexandrescu wrote:
 On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
This can be done without allocations.
Interesting. "Show me the code!"
Not necessary - it is trivially obvious to the most casual observer! (You just use the same logic that normalizes the path to do the comparison.)
Jun 06 2013
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 16:14:31 UTC, Dylan Knutson wrote:
 On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad 
 wrote:
 It doesn't do any allocations that the user won't have to do 
 anyways. Paths have to be normalized before comparison; not 
 doing so isn't correct behavior. Eg, the strings `foo/../bar` != 
 `bar`, yet they're equivalent paths. Path encapsulates the 
 behavior. So it's the difference between

 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
To me, at least, the first one practically screams "expensive operation", whereas the second one does the exact opposite.
Jun 06 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:14 AM, Dylan Knutson wrote:
 It doesn't do any allocations that the user won't have to do anyways. Paths have
 to be normalized before comparison; not doing so isn't correct behavior. Eg, the
 strings `foo/../bar` != `bar`, yet they're equivalent paths. Path encapsulates
 the behavior. So it's the difference between

 buildNormalizedPath(s1) == buildNormalizedPath(s2);

 and

 p1 == p2;
I believe it is a mistake to try and automatically hide the difference between ./bar and bar. Paths being == and 'referring to the same file' are different things. For example, what about symlinks? For performance reasons, also, I'd want to normalize sometime after building the entire path, I wouldn't want to normalize at each step. Normalization should be an explicit step, not implicit.
Jun 06 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 For example, what about symlinks?
Path operations should not require a real filesystem. They are string manipulations, nothing more. There is huge value in that. -Steve
Jun 06 2013
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 13:53:51 Steven Schveighoffer wrote:
 On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright
 
 <newshound2 digitalmars.com> wrote:
 For example, what about symlinks?
Path operations should not require a real filesystem. They are string manipulations, nothing more. There is huge value in that.
Agreed, but symlinks highlight the fact that there is a difference between paths being equal and paths referring to the same file. - Jonathan M Davis
Jun 06 2013
prev sibling next sibling parent reply "Flamaros" <flamaros.xavier gmail.com> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much more 
 palatable interface to path string manipulation".

 As jmdavis points out, this has previously been discussed. 
 However, I can't find that discussion, and I think that the 
 benefits of including an OO way to deal with paths is a serious 
 gain for the standard library.

 Why I think it should be reconsidered for inclusion in the std 
 (listed in the pull):
 * Adds a (more) platform independent abstraction for path 
 strings.
 * Path provides a type safe way to pass, compare, and 
 manipulate arbitrary path strings.
 * It wraps over the functions defined in std.path, so behavior 
 of methods on Path are, in most cases, identical to their 
 corresponding module function.

 I'd like some feedback on what others think about this; I'd 
 hate to see this commit closed due to a discussion that 
 happened at a different point in D's development when the 
 language had different needs.

 Thank you.
I like the idea of manipulating paths through an object. An API that takes a Path parameter is better typed than one that takes a string. It's really useful for file loaders: it affirms that the method will do path-related operations and expects a particular string format. Some methods seem to be missing, like completeBaseName and completeSuffix. You can take a look at the Qt API: http://qt-project.org/doc/qt-4.8/qfileinfo.html The bad thing with the Qt API is that we can't know which methods access the file system; that's why I prefer having 2 separate objects. It would be good to have the FileInfo object.
Jun 06 2013
parent "Flamaros" <flamaros.xavier gmail.com> writes:
On Thursday, 6 June 2013 at 07:26:53 UTC, Flamaros wrote:
 On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
 Hello,
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a Path struct to std.path, "which exposes a much 
 more palatable interface to path string manipulation".

 As jmdavis points out, this has previously been discussed. 
 However, I can't find that discussion, and I think that the 
 benefits of including an OO way to deal with paths is a 
 serious gain for the standard library.

 Why I think it should be reconsidered for inclusion in the std 
 (listed in the pull):
 * Adds a (more) platform independent abstraction for path 
 strings.
 * Path provides a type safe way to pass, compare, and 
 manipulate arbitrary path strings.
 * It wraps over the functions defined in std.path, so behavior 
 of methods on Path are, in most cases, identical to their 
 corresponding module function.

 I'd like some feedback on what others think about this; I'd 
 hate to see this commit closed due to a discussion that 
 happened at a different point in D's development when the 
 language had different needs.

 Thank you.
I like the idea of manipulating paths through an object. An API that takes a Path parameter is better typed than one that takes a string. It's really useful for file loaders: it affirms that the method will do path-related operations and expects a particular string format. Some methods seem to be missing, like completeBaseName and completeSuffix. You can take a look at the Qt API: http://qt-project.org/doc/qt-4.8/qfileinfo.html The bad thing with the Qt API is that we can't know which methods access the file system; that's why I prefer having 2 separate objects. It would be good to have the FileInfo object.
Having an object would also remove the need for repeated format normalization; with a string as parameter, the normalization method always has to be called.
Jun 06 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:27 PM, Dylan Knutson wrote:
 I'd like to open up the idea of Path being an object in std.path. I've submitted
 a pull (https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
 Path struct to std.path, "which exposes a much more palatable interface to path
 string manipulation".
I've succumbed to the temptation to do this several times over the years.

I always wind up backing it out and going back to strings.

The objections have all been already mentioned by others in this thread. I understand the motivation for doing it, it seems like a great idea, but I am strongly opposed to it.

To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing rainbows. It really isn't better, it is just different. In many ways, it is worse because it cannot hope to duplicate the rich interface available for strings.

2. APIs that deal with filenames take strings and return strings, not Path objects. Your code gets littered with path and filename components that are sometimes Paths and sometimes strings and sometimes both.

3. Every time you deal with a filename or path, you have to decide whether to use a Path or a string. This may seem like a small thing, but when writing a lot of code to deal with paths, this becomes a fracking annoyance.

4. An awful lot of path manipulation is done using string functions. Ever do regexes on paths? I do. But regex deals with strings, not Path objects. Ditto for the rest of the universe of code that deals with strings.

5. You wind up with two parallel universes of functions to deal with paths - one dealing with strings, one with Paths, oh, and a third universe of crap that deals with mixed strings and Paths.

6. If you try not to do (5), you break all existing code.

7. People like writing paths as "/etc/hosts", not Path("/etc/hosts"). People will not stand for a Path constructor that winds up allocating memory so it can rewrite the string in a canonical path representation.

8. There really isn't any such thing as a portable path representation. It's more than just \ vs /. There are the drive prefixes in Windows that have no analog in Linux. Sometimes case matters in Linux, where it would be ignored under Windows. There are 8.3 issues sometimes. The only thing you can do is come up with a subset of what works across systems, and then of course you have to go back to using strings when you need to access D:\foo\abc.c

9. People think about paths in terms of strings, not Path objects. Adding an abstraction layer always produces the feeling of "what is it doing, is it allocating memory, is it slow, is it doing something clever that I don't need/want?". This is cognitive baggage, and interferes with writing clear, correct code.

I've written a lot of cross-platform path code, I've tried the Path object thing multiple times, and I wrote the original std.path, and it uses strings because of my experience.
Jun 06 2013
next sibling parent "David Nadlinger" <code klickverbot.at> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
 I've succumbed to the temptation to do this several times over 
 the years.

 I always wind up backing it out and going back to strings.
As another data point (which may or may not be relevant for the discussion here), the LLVM system/support library was initially based on Path objects, but recently has been rewritten to use raw strings: http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html David
Jun 06 2013
prev sibling next sibling parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
 I've succumbed to the temptation to do this several times over 
 the years.

 I always wind up backing it out and going back to strings.

 The objections have all been already mentioned by others in 
 this thread. I understand the motivation for doing it, it seems 
 like a great idea,
Yay!
 but I am strongly opposed to it.
Oh.
 To repeat the objections:

 1. Making a more 'palatable' interface is pretty much chasing 
 rainbows. It really isn't better, it is just different. In many 
 ways, it is worse because it cannot hope to duplicate the rich 
 interface available for strings.
.toString ?
 2. APIs that deal with filenames take strings and return 
 strings, not Path objects. Your code gets littered with path 
 and filename components that are sometimes Paths and sometimes 
 strings and sometimes both.
As for APIs that return strings, a `Path toPath(string)` function could be added in std.path? Another solution would be to migrate the parts of Phobos that use path strings to using actual paths. They could be overloaded with a counterpart that also takes a string, but the toPath function would be pretty useful here.
 3. Every time you deal with a filename or path, you have to 
 decide whether to use a Path or a string. This may seem like a 
 small thing, but when writing a lot of code to deal with paths, 
 this becomes a fracking annoyance.
If there should only be one API used, I'd suggest just use Path.
 4. An awful lot of path manipulation is done using string 
 functions. Ever do regexes on paths? I do. But regex deals with 
 strings, not Path objects. Ditto for the rest of the universe 
 of code that deals with strings.
Path implicitly converts to a string.
 5. You wind up with two parallel universes of functions to deal 
 with paths - one dealing with strings, one with Paths, oh, and 
 a third universe of crap that deals with mixed strings and 
 Paths.
Well, I didn't say this in my OP, but I did a few comments back: I'm more partial to deprecating the string API and moving to Path. I didn't think many would go for this, but the more I think about it, the more I realize how little code would break, and how easy it'd be to fix that.
 6. If you try not to do (5), you break all existing code.
 7. People like writing paths as "/etc/hosts", not 
 Path("/etc/hosts"). People will not stand for a Path 
 constructor that winds up allocating memory so it can rewrite 
 the string in a canonical path representation.
string s = "/etc/hosts"
Path s = "/etc/hosts"

It even takes less chars :-P and it only allocates on Path == Path and Path == string comparison. Which would have been done manually anyways.
 8. There really isn't any such thing as a portable path 
 representation. It's more than just \ vs /. There are the drive 
 prefixes in Windows that have no analog in Linux. Sometimes 
 case matters in Linux, where it would be ignored under Windows. 
 There are 8.3 issues sometimes. The only thing you can do is 
 come up with a subset of what works across systems, and then of 
 course you have to go back to using strings when you need to 
 access D:\foo\abc.c
Well, that's not so much a limitation of Path or path functions as much as it is with the operating systems themselves. You still run into that with strings. I'm not trying to do anything groundbreaking, just abstract away the concept of a path so it's easy to write larger applications.
 9. People think about paths in terms of strings, not Path 
 objects. Adding an abstraction layer always produces the 
 feeling of "what is it doing, is it allocating memory, is it 
 slow, is it doing something clever that I don't need/want?". 
 This is cognitive baggage, and interferes with writing clear, 
 correct code.
It's easy to think about a path as a string for trivial code. Once the application uses paths in a nontrivial manner, people write wrappers around path functions anyways. Type safety is very useful. Good practice says don't worry about the implementation of what you can't see. If the programmer is worried about the speed of the abstraction, deal with that separately. FWIW, the Path wrapper doesn't allocate unless it needs to :-)
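
For readers following along, the behavior being argued over (implicit conversion to string, `Path s = "/etc/hosts"`, and a normalizing `==`) can be sketched roughly as below. This is an illustration only and is not the code from the pull request; the field name `value` and the helper `normalized` are assumptions made for the sketch.

```d
import std.path : buildNormalizedPath;

/// Illustrative wrapper only; NOT the struct from the pull request.
struct Path
{
    string value;
    alias value this;   // subtyping: a Path can be used where a string is expected

    this(string s) { value = s; }

    /// Path == Path: both sides are normalized, which may allocate.
    bool opEquals(const Path rhs) const
    {
        return normalized(value) == normalized(rhs.value);
    }

    /// Path == string: same idea for a raw string on the right-hand side.
    bool opEquals(string rhs) const
    {
        return normalized(value) == normalized(rhs);
    }

    private static string normalized(string s)
    {
        return buildNormalizedPath(s);
    }
}

unittest
{
    Path p = "foo/../bar";       // constructed directly from a string literal
    assert(p == Path("bar"));    // equal after normalization
    assert(p == "bar");

    string s = p;                // implicit conversion through alias this
    assert(s == "foo/../bar");   // the stored text itself is untouched
}
```

Nothing is normalized at construction in this sketch; the cost is paid on each comparison instead.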
Jun 06 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:00 AM, Dylan Knutson wrote:
 1. Making a more 'palatable' interface is pretty much chasing rainbows. It
 really isn't better, it is just different. In many ways, it is worse because
 it cannot hope to duplicate the rich interface available for strings.
.toString ?
 2. APIs that deal with filenames take strings and return strings, not Path
 objects. Your code gets littered with path and filename components that are
 sometimes Paths and sometimes strings and sometimes both.
As for APIs that return strings, a `Path toPath(string)` function could be added in std.path? Another solution would be to migrate the parts of Phobos that use path strings to using actual paths. They could be overloaded with a counterpart that also takes a string, but the toPath function would be pretty useful here.
Yes, your code becomes littered with conversions. Ugh.
 3. Every time you deal with a filename or path, you have to decide whether to
 use a Path or a string. This may seem like a small thing, but when writing a
 lot of code to deal with paths, this becomes a fracking annoyance.
If there should only be one API used, I'd suggest just use Path.
Except that just doesn't work out in practice. An awful lot uses strings, and again, people want to use the incredibly rich string manipulation code out there on paths.
 the more I realize how little
 code would break, and how easy it'd be to fix that.
That's been used to justify every code breakage. And yet, people eschew using D because of constant code breakage. It must stop.
 It even takes less chars :-P and it only allocates on Path == Path and Path ==
 string comparison. Which would have been done manually anyways.
Doing memory allocation to do == is a bad idea. People intuitively think of == as a cheap operation.
 Well, that's not so much a limitation of Path or path functions as much as it is
 with the operating systems themselves. You still run into that with strings. I'm
 not trying to do anything groundbreaking, just abstract away the concept of a
 path so it's easy to write larger applications.
But it isn't easier to use a Path object. That's one of the things I discovered when using them - it's never easier.
 Good practice says don't worry about the implementation of what you can't see.
Yeah, well, you said that == allocates memory under the hood, which is surprising behavior. Real programs definitely worry about the implementation.
 If the programmer is worried about the speed of the abstraction, deal with that
 separately.
Yes, he goes back to using strings.
Jun 06 2013
parent reply "Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 16:24:11 UTC, Walter Bright wrote:
 As for APIs that return strings, a `Path toPath(string)` 
 function could be added
 in std.path? Another solution would be to migrate the parts of 
 Phobos that use
 path strings to using actual paths. They could be overloaded 
 with a counterpart
 that also takes a string, but the toPath function would be 
 pretty useful here.
Yes, your code becomes littered with conversions. Ugh.
As opposed to the rest of the conventions that Phobos uses?
 If there should only be one API used, I'd suggest just use 
 Path.
Except that just doesn't work out in practice. An awful lot uses strings, and again, people want to use the incredibly rich string manipulation code out there on paths.
Hence subtyping.
 the more I realize how little
 code would break, and how easy it'd be to fix that.
That's been used to justify every code breakage. And yet, people eschew using D because of constant code breakage. It must stop.
Well, it comes down to are we willing to marginally break code for the sake of a better API. D and Phobos aren't considered stable by any standard; I don't think we should treat them like they're set in stone. Also, deprecation gives developers plenty of time to update their code (if they have to at all).
 It even takes less chars :-P and it only allocates on Path == 
 Path and Path ==
 string comparison. Which would have been done manually anyways.
Doing memory allocation to do == is a bad idea. People intuitively think of == as a cheap operation.
It only allocates if buildNormalizedPath allocates. And if you aren't using buildNormalizedPath in the first place before comparing strings, you're comparing paths wrong.
 Well, that's not so much a limitation of Path or path 
 functions as much as it is
 with the operating systems themselves. You still run into that 
 with strings. I'm
 not trying to do anything groundbreaking, just abstract away 
 the concept of a
 path so it's easy to write larger applications.
But it isn't easier to use a Path object. That's one of the things I discovered when using them - it's never easier.
Projects such as Dub, Vibe, and to an extent Tango disagree.
 Good practice says don't worry about the implementation of 
 what you can't see.
Yeah, well, you said that == allocates memory under the hood, which is surprising behavior. Real programs definitely worry about the implementation.
Well, they shouldn't. Profile code first, see where the hotspots are, and fix those. I'd be very surprised if path comparison and manipulation is so heavily used, it becomes a slow spot for programs. And if it does, that's not the fault of the Path struct itself, but rather of the underlying functions it uses.
 If the programmer is worried about the speed of the 
 abstraction, deal with that
 separately.
Yes, he goes back to using strings.
See above; I can't think of any use case for paths where they account for a considerable amount of run time.
Jun 06 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:50 AM, Dylan Knutson wrote:
 Well, it comes down to are we willing to marginally break code for the sake of a
 better API. D and Phobos aren't considered stable by any standard; I don't think
 we should treat them like they're set in stone. Also, deprecation gives
 developers plenty of time to update their code (if they have to at all).
I don't believe that because we broke A, therefore it's ok to break B. And secondly, it isn't clear that Path is a better API. I'm not opposed to breakage in all cases. But there needs to be a big win to justify it. I'm not seeing even a small net win for Path types. I'm not talking hypothetical either, like I said, I've tried them several times.
 Projects such as Dub, Vibe, and to an extent Tango disagree.
I agree there's a strong temptation to create a Path object, and I've succumbed myself to it several times. A corollary is that people often wanted to create a String class, too, though that has died out. You might also consider David Nadlinger's counter example: "As another data point (which may or may not be relevant for the discussion here), the LLVM system/support library was initially based on Path objects, but recently has been rewritten to use raw strings: http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html" I've rewritten my Path code to go back to raw strings, too.
Jun 06 2013
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 10:37:27 Walter Bright wrote:
 On 6/6/2013 9:50 AM, Dylan Knutson wrote:
 Well, it comes down to are we willing to marginally break code for the
 sake of a better API. D and Phobos aren't considered stable by any
 standard; I don't think we should treat them like they're set in stone.
 Also, deprecation gives developers plenty of time to update their code
 (if they have to at all).
I don't believe that because we broke A, therefore it's ok to break B. And secondly, it isn't clear that Path is a better API. I'm not opposed to breakage in all cases. But there needs to be a big win to justify it. I'm not seeing even a small net win for Path types. I'm not talking hypothetical either, like I said, I've tried them several times.
Some modules have needed to be redone. Some still do. But we already _did_ rework std.path. We agreed that we liked the new API, and it's been working great. It's one thing to revisit an API that's been around since before we had ranges or a review process. It's an entirely different thing to be constantly reworking entire modules. I think that we need _very_ strong justification to redesign a module that we already put through the review process. And I really don't think that we have it here. - Jonathan M Davis
Jun 06 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
 Some modules have needed to be redone. Some still do. But we already _did_
 rework std.path. We agreed that we liked the new API, and it's been working
 great. It's one thing to revisit an API that's been around since before we had
 ranges or a review process. It's an entirely different thing to be constantly
 reworking entire modules. I think that we need _very_ strong justification to
 redesign a module that we already put through the review process. And I really
 don't think that we have it here.
I think we're in violent agreement. An example of a strong justification for a redo is, for example, conversion to use ranges. std.zip needs that treatment.
Jun 06 2013
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 11:09:29 Walter Bright wrote:
 On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
 Some modules have needed to be redone. Some still do. But we already _did_
 rework std.path. We agreed that we liked the new API, and it's been
 working
 great. It's one thing to revisit an API that's been around since before we
 had ranges or a review process. It's an entirely different thing to be
 constantly reworking entire modules. I think that we need _very_ strong
 justification to redesign a module that we already put through the review
 process. And I really don't think that we have it here.
I think we're in violent agreement.
Yes. I was replying in support of your argument rather than replying directly to Dylan.
 An example of a strong justification for a redo is, for example, conversion
 to use ranges. std.zip needs that treatment.
Agreed. - Jonathan M Davis
Jun 06 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
 An example of a strong justification for a redo is, for example, conversion
 to use ranges. std.zip needs that treatment.
Agreed.
Key to success for Path: somehow get it on the ranges bandwagon :o). Andrei
Jun 06 2013
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
 On 6/6/13 2:13 PM, Jonathan M Davis wrote:
 An example of a strong justification for a redo is, for example,
 conversion
 to use ranges. std.zip needs that treatment.
Agreed.
Key to success for Path: somehow get it on the ranges bandwagon :o).
LOL. Well, given that strings are _already_ ranges, that wouldn't help it anywhere near as much as it does with other cases of code breakage, since std.path is already quite range-ready. - Jonathan M Davis
Jun 06 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 6 June 2013 at 19:29:08 UTC, Jonathan M Davis wrote:
 On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
 On 6/6/13 2:13 PM, Jonathan M Davis wrote:
 An example of a strong justification for a redo is, for 
 example,
 conversion
 to use ranges. std.zip needs that treatment.
Agreed.
Key to success for Path: somehow get it on the ranges bandwagon :o).
LOL. Well, given that strings are _already_ ranges, that wouldn't help it anywhere near as much as it does with other cases of code breakage, since std.path is already quite range-ready. - Jonathan M Davis
Something I wanted to add: I think using string as the main form of representation for a path is fine. However, there are times where it is convenient to be able to explode a path into a structure, where each part is clearly separate from the next. This makes it easy to do certain otherwise hard to do operations. eg:

Change:
    C:\Users\Monarch\Docs\MyFile.txt
to
    D:\Users\Monarch\MyFile.txt

Regexes are fun and all, but they do come with their own complications, and pitfalls. And they *do* require effort to write.

Or use the existing interface. It works, I won't argue against it, but I do find times where it is kind of clunky.

I'd be in favor of having a "Path" object, if only for being able to help in the construction or modification of string paths. For example, I imagine something as:

    string oldPath = `C:\Users\Monarch\Docs\MyFile.txt`;
    Path myPath = Path(oldPath);
    myPath.drive = 'D';
    myPath.folders = myPath.folders[0 .. $ - 1];
    string newPath = myPath.build;

I think it would be useful to have that. None of the existing interfaces change. It's just an optional tool that I think would be convenient.

--------

If I may present an analogy: C deals with "time" using the arithmetic "time_t" primitive. It works, is mostly convenient, and is the standard API. Still, C also proposes the "struct tm", which is a time, exploded into year/month/day/hours/min/sec. You can do nothing with this type, except, well, read and write to it, and convert it back to/from time_t. Yet, it has its uses, if only being presented in a way that might be more natural to manipulate. And that is reason enough for its existence.
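
For comparison, here is roughly what the folder-dropping part of that change looks like with the existing string interface. This is a sketch under assumptions: the helper name `dropLastFolder` is made up, and the expected result shown holds for POSIX-style paths (drive handling on Windows is deliberately left out).

```d
import std.array : array;
import std.path : buildPath, isRooted, pathSplitter;

/// Hypothetical helper: removes the innermost directory from a file path,
/// e.g. "/home/monarch/docs/file.txt" becomes "/home/monarch/file.txt".
string dropLastFolder(string path)
{
    // Split into segments; on POSIX a leading "/" is its own segment.
    auto parts = pathSplitter(path).array;

    // Nothing to drop if there is no directory, or only the root, before the file.
    if (parts.length < 2 || isRooted(parts[$ - 2]))
        return path;

    // Rejoin everything except the second-to-last segment.
    return buildPath(parts[0 .. $ - 2] ~ parts[$ - 1]);
}

unittest
{
    assert(dropLastFolder("/home/monarch/docs/file.txt") == "/home/monarch/file.txt");
    assert(dropLastFolder("file.txt") == "file.txt");
}
```

It works, but the slicing arithmetic is the kind of clunkiness the post is describing.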
Jun 07 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/7/13 1:04 PM, monarch_dodra wrote:
 I think using string as the main form of representation for a path is fine.

 However, there are times where it is convenient to be able to explode a
 path into a structure, where each part is clearly separate from the
 next.
    Tuple!(
        string, "drive",
        string[], "folders",
        string, "basename",
        string, "extension"
    )
    parsePath(string path);

    string buildPath(string drive, string[] folders, string basename, string extension);

Andrei
Jun 07 2013
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
 On 6/7/13 1:04 PM, monarch_dodra wrote:
 I think using string as the main form of representation for a 
 path is fine.

 However, there are times where it is convenient to be able to 
 explode a
 path into a structure, where each part is clearly separate 
 from the
 next.
Tuple!( string, "drive", string[], "folders", string, "basename", string, "extension" ) parsePath(string path); string buildPath(string drive, string[] folders, string basename, string extension); Andrei
Yeah. That's pretty much what I was describing. Except "buildPath" would take your (unnamed) tuple type directly. There'd also be a "filename" member/ufcs function in there for convenience. I think that would be a small, but useful, addition to std.path.
Jun 07 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/7/13 2:10 PM, monarch_dodra wrote:
 On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
 On 6/7/13 1:04 PM, monarch_dodra wrote:
 I think using string as the main form of representation for a path is
 fine.

 However, there are times where it is convenient to be able to explode a
 path into a structure, where each part is clearly separate from the
 next.
Tuple!( string, "drive", string[], "folders", string, "basename", string, "extension" ) parsePath(string path); string buildPath(string drive, string[] folders, string basename, string extension); Andrei
Yeah. That's pretty much more or less what I was describing. Except "buildPath" would take your (unnamed) tuple type directly.
No, the version I wrote is more flexible. You get to pass separate arguments to it or just pass a tuple with .expand. buildPath(parsePath("/bin/sh").expand) should rebuild "/bin/sh".
 There'd also be a "filename" member/ufcs function in there for
 convenience.

 I think that would be a small, but useful, addition to std.path.
Me 2. Andrei
Jun 07 2013
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 7 June 2013 at 18:26:42 UTC, Andrei Alexandrescu wrote:
 On 6/7/13 2:10 PM, monarch_dodra wrote:
 On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu 
 wrote:
 On 6/7/13 1:04 PM, monarch_dodra wrote:
 I think using string as the main form of representation for 
 a path is
 fine.

 However, there are times where it is convenient to be able 
 to explode a
 path into a structure, where each part is clearly separate 
 from the
 next.
Tuple!( string, "drive", string[], "folders", string, "basename", string, "extension" ) parsePath(string path); string buildPath(string drive, string[] folders, string basename, string extension); Andrei
Yeah. That's pretty much more or less what I was describing. Except "buildPath" would take your (unnamed) tuple type directly.
No, the version I wrote is more flexible. You get to pass separate arguments to it or just pass a tuple with .expand. buildPath(parsePath("/bin/sh").expand) should rebuild "/bin/sh".
 There'd also be a "filename" member/ufcs function in there 
 for
 convenience.

 I think that would be a small, but useful, addition to 
 std.path.
Me 2. Andrei
An overload for buildPath that took the tuple directly would be good. Typing expand all the time would get tiresome if you were doing lots of this.
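
A rough sketch of how the pair could look on top of the existing std.path functions, including the tuple-taking overload. The names `PathParts` and `joinParts` are invented here (`joinParts` only to avoid clashing with the existing `std.path.buildPath`), and the round trip shown assumes POSIX-style absolute paths; a bare file name would come back as `./name`, because `dirName` returns `"."` for it.

```d
import std.array : array;
import std.path : baseName, buildPath, dirName, driveName, extension,
    pathSplitter, stripDrive, stripExtension;
import std.typecons : Tuple;

/// Hypothetical result type, mirroring the tuple proposed in the thread.
alias PathParts = Tuple!(
    string, "drive",
    string[], "folders",
    string, "basename",
    string, "extension");

/// Sketch of the proposed parsePath, built from existing std.path functions.
PathParts parsePath(string path)
{
    PathParts parts;
    parts.drive = driveName(path);                                  // empty on POSIX
    parts.folders = pathSplitter(stripDrive(dirName(path))).array;  // allocates
    parts.basename = stripExtension(baseName(path));
    parts.extension = extension(path);                              // includes the dot
    return parts;
}

/// Rebuilds a path from separate parts.
string joinParts(string drive, string[] folders, string basename, string ext)
{
    return drive ~ buildPath(folders ~ (basename ~ ext));
}

/// Convenience overload taking the tuple directly, so callers can skip .expand.
string joinParts(PathParts parts)
{
    return joinParts(parts.expand);
}

unittest
{
    assert(joinParts(parsePath("/bin/sh").expand) == "/bin/sh");
    assert(joinParts(parsePath("/bin/sh")) == "/bin/sh");

    auto p = parsePath("/home/me/notes.txt");
    assert(p.folders == ["/", "home", "me"]);
    assert(p.basename == "notes");
    assert(p.extension == ".txt");
}
```

Both call styles coexist: separate arguments or `.expand` for flexibility, the overload for brevity.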
Jun 07 2013
prev sibling parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
 On 6/7/13 1:04 PM, monarch_dodra wrote:
 I think using string as the main form of representation for a 
 path is fine.

 However, there are times where it is convenient to be able to 
 explode a
 path into a structure, where each part is clearly separate 
 from the
 next.
Tuple!( string, "drive", string[], "folders", string, "basename", string, "extension" ) parsePath(string path); string buildPath(string drive, string[] folders, string basename, string extension);
This is a good idea. Not only is it convenient, but as there is a lot of overlap in the work done by the various path decomposition functions, it will also improve performance when you need the results of several of them.

But why stop at the parts you have listed there? Why not offer every possible decomposition the user could ever want? It's about the same amount of work, because the number of "split points" you need to find is exactly the same. Splitting the directory part into separate segments should be optional, since it allocates.

    DecomposedPath!(inout(C)) decompose(inout(C)[] path, bool splitDir = true);

    struct DecomposedPath(C) if (isSomeChar!C)
    {
        C[] driveName;   /// Equal to driveName()
        C[] dirName;     /// Equal to dirName()
        C[] noDriveDir;  /// Equal to dirName().stripDrive()
        C[] rootName;    /// Equal to rootName()
        C[] baseName;    /// Equal to baseName()
        C[] stem;        /// Equal to baseName().stripExtension()
        C[] extension;   /// Equal to extension()

        /// Equal to dirName().pathSplitter().array() (optional)
        C[][] dirSegments;
    }
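
A string-only sketch of what `decompose` would return, assuming the field meanings given in the comments above. It simply calls each existing std.path function in turn, so it shows the shape of the result but not the single-pass performance benefit; the optional `dirSegments` member is left out. The expected values are for POSIX-style paths.

```d
import std.path : baseName, dirName, driveName, extension, rootName,
    stripDrive, stripExtension;

/// Simplified, string-only version of the proposed struct (illustration only).
struct DecomposedPath
{
    string driveName;
    string dirName;
    string noDriveDir;
    string rootName;
    string baseName;
    string stem;
    string extension;
}

/// Naive decomposition: one std.path call per field, no shared scanning.
DecomposedPath decompose(string path)
{
    DecomposedPath d;
    d.driveName  = driveName(path);
    d.dirName    = dirName(path);
    d.noDriveDir = stripDrive(dirName(path));
    d.rootName   = rootName(path);
    d.baseName   = baseName(path);
    d.stem       = stripExtension(baseName(path));
    d.extension  = extension(path);
    return d;
}

unittest
{
    auto d = decompose("/usr/lib/libfoo.so");
    assert(d.dirName == "/usr/lib");
    assert(d.rootName == "/");
    assert(d.baseName == "libfoo.so");
    assert(d.stem == "libfoo");
    assert(d.extension == ".so");
}
```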
Jun 08 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad 
wrote:
 On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu 
 wrote:
 However, there are times where it is convenient to be able to 
 explode a
 path into a structure, where each part is clearly separate 
 from the
 next.
Tuple!( string, "drive", string[], "folders", string, "basename", string, "extension" ) parsePath(string path); string buildPath(string drive, string[] folders, string basename, string extension);
[...] But why stop at the parts you have listed there?
The moment I clicked "Send", I realised that offering multiple decompositions would prevent recomposition, because you'd have to choose which parts to combine.
Jun 08 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 8 June 2013 at 14:14:33 UTC, Lars T. Kyllingstad 
wrote:
 On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad 
 wrote:
 On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu 
 wrote:
 However, there are times where it is convenient to be able 
 to explode a
 path into a structure, where each part is clearly separate 
 from the
 next.
Tuple!(
    string, "drive",
    string[], "folders",
    string, "basename",
    string, "extension"
) parsePath(string path);

string buildPath(string drive, string[] folders, string basename, string extension);
[...] But why stop at the parts you have listed there?
The moment I clicked "Send", I realised that offering multiple decompositions would prevent recomposition, because you'd have to choose which parts to combine.
Using D's property functions, this should not actually be a problem. The struct could be opaque in regards to which members are actually attributes, and which are functions. E.g.:

Path path = Path(`C:\MyFile.txt`);
path.filename = "main.cpp";
path.extension = "d";
assert(path.buildPath() == `C:\main.d`);

I don't see any reason for that to not work.
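One way this could work, as a minimal sketch: keep a single string as the only stored state and make every component a property that reads from or rewrites that string, so there is never a question of which parts to recombine. The Path struct below is hypothetical and deliberately tiny; it forwards to existing std.path functions, uses toString() where the example above uses buildPath(), and the check in main is guarded for Posix since separators differ on Windows.

```d
static import std.path;

/// Hypothetical wrapper: the full path string is the single source of truth.
struct Path
{
    private string _path;

    this(string path) { _path = path; }

    /// Last path segment, e.g. "main.cpp".
    @property string filename() { return std.path.baseName(_path); }

    /// Replaces the last segment, keeping the directory part.
    @property void filename(string name)
    {
        _path = std.path.buildPath(std.path.dirName(_path), name);
    }

    /// Extension including the leading dot, e.g. ".d".
    @property string extension() { return std.path.extension(_path); }

    /// Replaces (or adds) the extension.
    @property void extension(string ext)
    {
        _path = std.path.setExtension(_path, ext);
    }

    string toString() { return _path; }
}

void main()
{
    version (Posix)
    {
        auto path = Path("/tmp/MyFile.txt");
        path.filename = "main.cpp";
        path.extension = "d";
        assert(path.filename == "main.d");
        assert(path.extension == ".d");
        assert(path.toString() == "/tmp/main.d");
    }
}
```

A known quirk of this sketch: for a bare file name, dirName() yields ".", so assigning filename on Path("a.txt") produces a path that starts with "./".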
Jun 08 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/8/13 10:45 AM, monarch_dodra wrote:
 On Saturday, 8 June 2013 at 14:14:33 UTC, Lars T. Kyllingstad wrote:
 On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad wrote:
 On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
 However, there are times where it is convenient to be able to
 explode a
 path into a structure, where each part is clearly separate from the
 next.
Tuple!(
    string, "drive",
    string[], "folders",
    string, "basename",
    string, "extension"
) parsePath(string path);

string buildPath(string drive, string[] folders, string basename, string extension);
[...] But why stop at the parts you have listed there?
The moment I clicked "Send", I realised that offering multiple decompositions would prevent recomposition, because you'd have to choose which parts to combine.
Using D's property functions, this should not actually be a problem. The struct could be opaque in regards to which members are actually attributes, and which are functions. E.g.:

Path path = Path(`C:\MyFile.txt`);
path.filename = "main.cpp";
path.extension = "d";
assert(path.buildPath() == `C:\main.d`);

I don't see any reason for that to not work.
Looks like the proposal may be converted into something liked by all - a small PathComponents struct with the appropriate primitives. A high ratio of usefulness to size would be key to acceptance. Andrei
Jun 08 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 06, 2013 at 02:38:41PM -0400, Andrei Alexandrescu wrote:
 On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example,
conversion to use ranges. std.zip needs that treatment.
Agreed.
Key to success for Path: somehow get it on the ranges bandwagon :o).
[...]

Hmm. Let's see:

assert(isInputRange!Path);

version(Windows)
    auto p = Path(`..\blah\blah\..\bluh`);
else version(linux)
    auto p = Path(`../blah/blah/../bluh`);

// I'm assuming auto normalization; if you don't like that,
// pretend I also wrote this line:
// p.normalize();

assert(p.equals([ "..", "blah", "bluh" ]));

What about that? ;-)

While the above may *look* attractive, it's actually a minefield full of pitfalls. Consider this directory tree in Posix:

/home/user/test
/home/user/test/symlink -> /home/user/real/1
/home/user/test/real
/home/user/test/real/1/myfile
/home/user/test/real/2/anotherfile

Let's say the current working directory is /home/user. Now consider this:

auto p = Path(`test/symlink/../2/anotherfile`);
assert(std.path.exists(p)); // should this work?

The only way the above can actually work is if normalization queries the filesystem. That is to say, it is NOT mere string manipulation. However, *should* normalization always check the filesystem? What if the program is constructing a list of paths that it's going to create, which don't exist in the filesystem yet? Then normalization will fail, even though the paths are valid.

Conclusion: correct path normalization depends on intent, which only the programmer knows -- the library can't possibly figure this out without being told.

(And I haven't even started getting into OS-dependent path manipulation yet... what should Path(`C:\Program Files\abc.def`) do on a Posix system?)

IOW, the programmer *already* has to know about system-dependent details of paths, so I'm not sure what value Path is really adding. At least, I'm not finding it compelling enough to eschew plain old string manipulations.

Besides, should glob patterns like "/home/user/prog/*/*.d" be Paths or strings? What about path regexes? Should Path export a whole suite of parallel methods for constructing such patterns?
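To make the symlink pitfall concrete, here is a small sketch using std.path.buildNormalizedPath, which works on the string alone and never consults the filesystem. The expected values are guarded for Posix, since separators differ on Windows.

```d
import std.path : buildNormalizedPath;

void main()
{
    version (Posix)
    {
        // Purely textual: ".." cancels the preceding segment.
        assert(buildNormalizedPath("../blah/blah/../bluh") == "../blah/bluh");

        // The same rule removes "symlink", whatever it points to on disk.
        assert(buildNormalizedPath("test/symlink/../2/anotherfile")
            == "test/2/anotherfile");
    }
}
```

The second result names test/2/anotherfile, while the operating system would first follow the symlink and only then apply "..", ending up in the parent of the link target. The two can therefore refer to different files, which is why purely textual normalization cannot be assumed to preserve meaning.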
One can always interconvert to/from strings, of course, but if we'd started out with strings in the first place, we wouldn't need any conversions. The OS ultimately takes only strings anyway, so is there really a need to insert a convert to/from Path in between?

I do see a lot of value in providing *functions* for manipulating path strings (normalizations, parsing path components, splitting file extensions, etc.), but I've a hard time with encapsulating a path string in an opaque object when it doesn't really give that much more value.

If you *really* like the idea of Path, nothing stops you from writing one yourself, and have it implicitly convert to string so that you can pass it directly to OS functions that take paths. I just don't see value in requiring Phobos functions to only take Path objects.

T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan
Jun 06 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 12:50 PM, Dylan Knutson wrote:
 Well, it comes down to are we willing to marginally break code for the
 sake of a better API.
Well the position of "marginally" in the sentence above may be contested by some.
 D and Phobos aren't considered stable by any
 standard; I don't think we should treat them like they're set in stone.
 Also, deprecation gives developers plenty of time to update their code
 (if they have to at all).
I think this opinion is very unlikely to enjoy popularity. We actively /want/ to make Phobos more stable, so using the argument that it's not yet stable to add more instability is sure to fit the pattern of some list of fallacies. Besides, the corresponding benefits (the best solid argument that could be constructed) are at least according to some not that large to justify the cost of breakage. Andrei
Jun 06 2013
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 13:45:44 Andrei Alexandrescu wrote:
 D and Phobos aren't considered stable by any
 standard; I don't think we should treat them like they're set in stone.
 Also, deprecation gives developers plenty of time to update their code
 (if they have to at all).
I think this opinion is very unlikely to enjoy popularity. We actively /want/ to make Phobos more stable, so using the argument that it's not yet stable to add more instability is sure to fit the pattern of some list of fallacies. Besides, the corresponding benefits (the best solid argument that could be constructed) are at least according to some not that large to justify the cost of breakage.
Agreed. Breaking stuff in an effort to create a solid, stable API is one thing (and at this point, we want to minimize even that as much as we reasonably can). Constantly going back and rebreaking stuff is quite another. We already redid std.path. It went through the full review process and was voted in. We want to move towards being _more_ stable not less. Some API breakage will still be necessary (like replacing std.xml or the streaming modules), but it's a cost that we want to avoid when it isn't necessary. Each module redesign must justify itself, and the simple fact that other modules have already been redesigned is not enough for that. Not to mention, over time, it should arguably require _more_ justification to redo a module (or make any breaking change in Phobos), because more people are using it, and we really do want to be stable. - Jonathan M Davis
Jun 06 2013
parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
Just want to chime in and say that I'm also against this change.

I can see some small benefits, but I also see problems, all of 
which have already been covered.

Even if it is a small net improvement, I don't think it's 
anywhere near a big enough improvement to warrant an API change.
Jun 06 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 11:36 AM, Walter Bright wrote:
 To repeat the objections:
Now with devil's advocate interjections:
 1. Making a more 'palatable' interface is pretty much chasing rainbows.
 It really isn't better, it is just different. In many ways, it is worse
 because it cannot hope to duplicate the rich interface available for
 strings.
Subtyping (Path is a subtype of string by means of alias this) should make getting from paths to strings easy, and getting back from strings to paths one constructor call away (which adds correctness).
 2. APIs that deal with filenames take strings and return strings, not
 Path objects. Your code gets littered with path and filename components
 that are sometimes Paths and sometimes strings and sometimes both.
Subtyping should make it easy to pass paths to APIs that expect strings.
 3. Every time you deal with a filename or path, you have to decide
 whether to use a Path or a string. This may seem like a small thing, but
 when writing a lot of code to deal with paths, this becomes a fracking
 annoyance.
If there's a reward for using paths the annoyance factor may be reduced.
 4. An awful lot of path manipulation is done using string functions.
 Ever do regexes on paths? I do. But regex deals with strings, not Path
 objects. Ditto for the rest of the universe of code that deals with
 strings.
Subtyping should take care of this.
 5. You wind up with two parallel universes of functions to deal with
 paths - one dealing with strings, one with Paths, oh, and a third
 universe of crap that deals with mixed strings and Paths.
Subtyping makes one way easy and constructors make the other way affordable. Again, this comes back to perceived gains that compensate for the shortcomings.
 6. If you try not to do (5), you break all existing code.
Only "half".
 7. People like writing paths as "/etc/hosts", not Path("/etc/hosts").
 People will not stand for a Path constructor that winds up allocating
 memory so it can rewrite the string in a canonical path representation.
Lazy canonicalization may help.
 8. There really isn't any such thing as a portable path representation.
 It's more than just \ vs /. There are the drive prefixes in Windows that
 have no analog in Linux. Sometimes case matters in Linux, where it would
 be ignored under Windows. There are 8.3 issues sometimes. The only thing
 you can do is come up with a subset of what works across systems, and
 then of course you have to go back to using strings when you need to
 access D:\foo\abc.c
That is actually an argument in favor of good encapsulation, not against.
 9. People think about paths in terms of strings, not Path objects.
 Adding an abstraction layer always produces the feeling of "what is it
 doing, is it allocating memory, is it slow, is it doing something clever
 that I don't need/want?". This is cognitive baggage, and interferes with
 writing clear, correct code.
I'm not sure whether the generalization holds. Andrei
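For reference, the subtyping mentioned in the replies to points 1, 2, 4 and 5 can be sketched as follows. This is a minimal illustration, not the struct from the pull request: Path and pathLength are hypothetical names, and pathLength stands in for any existing API that only accepts strings.

```d
/// Hypothetical wrapper that is a subtype of string via alias this.
struct Path
{
    string value;
    alias value this; // a Path can be used wherever a string is expected

    this(string s) { value = s; }
}

/// Stand-in for an existing string-only API.
size_t pathLength(string s) { return s.length; }

void main()
{
    auto p = Path("/etc/hosts");

    // Path -> string is implicit.
    string s = p;
    assert(s == "/etc/hosts");
    assert(pathLength(p) == 10);
    assert(p.length == 10); // string members are reachable through alias this

    // string -> Path takes one explicit constructor call.
    auto backup = Path(s ~ ".bak");
    assert(backup.value == "/etc/hosts.bak");
}
```

Going from Path to string costs nothing at the call site, while the way back is explicit. That asymmetry is the trade-off the points above argue about.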
Jun 06 2013
parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu 
wrote:
 [...]

 8. There really isn't any such thing as a portable path 
 representation.
 It's more than just \ vs /. There are the drive prefixes in 
 Windows that
 have no analog in Linux. Sometimes case matters in Linux, 
 where it would
 be ignored under Windows. There are 8.3 issues sometimes. The 
 only thing
 you can do is come up with a subset of what works across 
 systems, and
 then of course you have to go back to using strings when you 
 need to
 access D:\foo\abc.c
That is actually an argument in favor of good encapsulation, not against.
The proposed API change does not introduce good encapsulation. It introduces a super-thin wrapper around a built-in type, and replaces free functions with methods, for what gain?
Jun 06 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:04 PM, Lars T. Kyllingstad wrote:
 On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu wrote:
 [...]

 8. There really isn't any such thing as a portable path representation.
 It's more than just \ vs /. There are the drive prefixes in Windows that
 have no analog in Linux. Sometimes case matters in Linux, where it would
 be ignored under Windows. There are 8.3 issues sometimes. The only thing
 you can do is come up with a subset of what works across systems, and
 then of course you have to go back to using strings when you need to
 access D:\foo\abc.c
That is actually an argument in favor of good encapsulation, not against.
The proposed API change does not introduce good encapsulation. It introduces a super-thin wrapper around a built-in type, and replaces free functions with methods, for what gain?
I was talking in principle. I agree that the argument "it was as easy as wrapping the already existing functions" works against the current proposal, not in favor of it. Andrei
Jun 06 2013
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 15:36:15 +0000, Walter Bright <newshound2 digitalmars.com> said:

 8. There really isn't any such thing as a portable path representation. 
 It's more than just \ vs /. There are the drive prefixes in Windows 
 that have no analog in Linux. Sometimes case matters in Linux, where it 
 would be ignored under Windows. There are 8.3 issues sometimes. The 
 only thing you can do is come up with a subset of what works across 
 systems, and then of course you have to go back to using strings when 
 you need to access D:\foo\abc.c
Actually, there is one portable representation for paths: URLs. More specifically "file:" URLs if we're limiting ourselves to filesystem paths. Relative URLs should probably count too.

But otherwise, that's all true. To correctly normalize a path, you need to know which underlying filesystem is in use. Today's operating systems can mix and match case-sensitive, case-preserving, and case-insensitive filesystems, different restrictions on file names, and sometimes have obscure restrictions/normalization when using old APIs on newer filesystems. You can't really normalize a path without making a lot of assumptions.

Of course, that's not an argument for or against having a path object to encapsulate the differences. But I'd tend to say that what the path object can do is more limited than one might think at first glance.

As a side note, Apple is currently asking application developers to use URLs instead of raw paths to local files. Using URLs makes it possible for instance to attach "bookmarks" keys on path (in the query string) that can more or less automatically punch a hole in the sandbox when accessing a file (which can expire or be revoked). Pretty much all recent Cocoa APIs take url objects instead of path strings.

-- 
Michel Fortin michel.fortin michelf.ca http://michelf.ca/
Jun 06 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:23 AM, Michel Fortin wrote:
 Actually, there is one portable representation for paths: URLs. More
 specifically "file:" URLs if we're limiting ourselves to filesystem paths.
 Relative URLs should probably count too.
That doesn't work for case sensitivity/insensitivity differences, nor does it work for drive letters like "C:" (which don't exist on Apple systems, hence they can afford to dismiss them). In D source code, we deal with this with the convention that package and module names must be lower case. But there's no getting around the fact that "File" and "file" are different paths under Windows, and are the same under Linux. There is no generic abstraction to account for that - the programmer must be aware of it and adjust as appropriate for his application.
Jun 06 2013
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 10:27:28 Walter Bright wrote:
 But there's no getting around the fact
 that "File" and "file" are different paths under Windows, and are the same
 under Linux.
I think you got that backwards. ;) - Jonathan M Davis
Jun 06 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:59 AM, Jonathan M Davis wrote:
 On Thursday, June 06, 2013 10:27:28 Walter Bright wrote:
 But there's no getting around the fact
 that "File" and "file" are different paths under Windows, and are the same
 under Linux.
I think you got that backwards. ;)
Dang, I should have written some unittests!
Jun 06 2013
prev sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2 digitalmars.com> said:

 That doesn't work for case sensitivity/insensitivity differences nor 
 does it work for drive letters like "C:" (which don't exist on Apple 
 systems, hence they can afford to dismiss them).
Have you never opened a local file in a Windows web browser and taken a look at the URL? The drive letter is there.

     file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.
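As an illustration of that mapping, here is a minimal sketch that turns an absolute Windows path with a drive letter into a "file:" URL using only string operations. windowsPathToFileUrl is a hypothetical helper, not a Phobos function. It relies on std.uri.encode, which leaves reserved URI characters such as '#' and '?' untouched, so paths containing those would need extra handling, and UNC paths are not covered.

```d
import std.array : replace;
import std.uri : encode;

/// Hypothetical helper: absolute Windows path with drive letter -> "file:" URL.
string windowsPathToFileUrl(string path)
{
    // Use forward slashes, prefix the scheme, then percent-encode
    // characters such as spaces.
    return encode("file:///" ~ path.replace(`\`, "/"));
}

void main()
{
    assert(windowsPathToFileUrl(`c:\path\to\the file.txt`)
        == "file:///c:/path/to/the%20file.txt");
}
```

Since this is plain string manipulation, it behaves the same regardless of the OS the program runs on.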
 But there's no getting around the fact that "File" and "file" are 
 different paths under Windows, and are the same under Linux.
Actually, it doesn't depend on Linux or Windows or OS X. It depends on the filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive HFS+, etc. If you assume a specific case sensitivity setting by looking at the OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole issue about which locale to use for Unicode case-insensitive comparisons. I'd bet that different filesystems choose different approaches to this tricky problem.

So there's no way to normalize for case-sensitivity just by looking at a path or a URL, even if you know which OS you're on. If you want to know for sure whether two paths are the same, or what is the normalized path, you need to ask the filesystem at some point. Anything else is based on fragile assumptions.

-- 
Michel Fortin michel.fortin michelf.ca http://michelf.ca/
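For what it's worth, std.path already exposes case sensitivity as a policy the caller can choose rather than something derived from the filesystem. A small sketch with filenameCmp and the CaseSensitive enum follows; sameName is a hypothetical helper added only for the example.

```d
import std.path : CaseSensitive, filenameCmp;

/// Hypothetical helper: compares two names under an explicitly chosen policy.
bool sameName(string a, string b, bool caseSensitive)
{
    return caseSensitive
        ? filenameCmp!(CaseSensitive.yes)(a, b) == 0
        : filenameCmp!(CaseSensitive.no)(a, b) == 0;
}

void main()
{
    assert(!sameName("File", "file", true));  // case-sensitive policy
    assert(sameName("File", "file", false));  // case-insensitive policy
    assert(sameName("file", "file", true));
}
```

The default, CaseSensitive.osDefault, is picked from the OS at compile time, which is exactly the kind of assumption described above as fragile. Neither explicit policy is guaranteed to match what a given mounted filesystem does.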
Jun 06 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 1:02 PM, Michel Fortin wrote:
 On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2 digitalmars.com> said:

 That doesn't work for case sensitivity/insensitivity differences nor does it
 work for drive letters like "C:" (which don't exist on Apple systems, hence
 they can afford to dismiss them).
Have you never opened a local file in a windows web browser and took a look at the URL? The drive letter is there. file:///c:/path/to/the%20file.txt The drive letter is simply the first part of the path on Windows.
I didn't know that, but that doesn't make it a canonical path. It just combines the notion of url with a path.
 But there's no getting around the fact that "File" and "file" are different
 paths under Windows, and are the same under Linux.
Actually, it doesn't depend on Linux or Windows or OS X. It depends on the filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive HFS+, etc. If you assume a specific case sensitivity setting by looking at the OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has Case-sensitive HFS+ for OS X and its the default on iOS. Then there's the whole issue about which locale to use for Unicode case-insensitive comparisons. I'd bet that different filesystems choose different approaches to this tricky problem. So there's no way to normalize for case-sensitivity just by looking at a path or a URL, even if you know on which OS you're on. If you want to know for sure whether two paths are the same, or what is the normalized path, you need to ask the filesystem at some point. Anything else is based on fragile assumptions.
It may be a bug, and I personally try to never depend on path code that is case sensitive or not, but I bet there's a *lot* of code out there that makes those assumptions. BTW, Windows still has only erratic support for using / as path separators, even in the system commands. Not even the "DIR" command can deal with it.
Jun 06 2013
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 BTW, Windows still has only erratic support for using / as path  
 separators, even in the system commands. Not even the "DIR" command can  
 deal with it.
We don't program using DIR. That is irrelevant. (not contesting that Windows doesn't work well with '/', just that DIR, or any other command line tool, is evidence) -Steve
Jun 06 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 1:54 PM, Steven Schveighoffer wrote:
 On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:

 BTW, Windows still has only erratic support for using / as path separators,
 even in the system commands. Not even the "DIR" command can deal with it.
We don't program using DIR. That is irrelevant. (not contesting that Windows doesn't work well with '/', just that DIR, or any other command line tool, is evidence)
The fact that DIR, probably the most widely used command in Windows, doesn't support it is indicative. I've also noticed Windows file dialog boxes not supporting it, and those are supposed to be standard components. DIR is used in .bat files and makefiles, it is certainly used in programming.
Jun 06 2013
prev sibling parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 20:25:58 +0000, Walter Bright <newshound2 digitalmars.com> said:

 On 6/6/2013 1:02 PM, Michel Fortin wrote:
 Have you never opened a local file in a windows web browser and took a look at
 the URL? The drive letter is there.
 
      file:///c:/path/to/the%20file.txt
 
 The drive letter is simply the first part of the path on Windows.
I didn't know that, but that doesn't make it a canonical path. It just combines the notion of url with a path.
It's not a canonical path, but it's a platform-neutral representation of a path. You can perform the same operations with a URL (including regular expressions) irrespective the underlying OS. I was replying initially to your claim that there was no portable way to represent a path. I don't think the definition of a "portable path" needs to include any notion of canonical, because not even non-portable paths can be canonical these days.
 Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
 filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
 HFS+, etc. If you assume a specific case sensitivity setting by looking at the
 OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
 Case-sensitive HFS+ for OS X and its the default on iOS. Then there's the whole
 issue about which locale to use for Unicode case-insensitive comparisons. I'd
 bet that different filesystems choose different approaches to this 
 tricky problem.
 
 So there's no way to normalize for case-sensitivity just by looking at 
 a path or
 a URL, even if you know on which OS you're on. If you want to know for sure
 whether two paths are the same, or what is the normalized path, you need to ask
 the filesystem at some point. Anything else is based on fragile assumptions.
It may be a bug, and I personally try to never depend on path code that is case sensitive or not, but I bet there's a *lot* of code out there that makes those assumptions.
That's a good way to deal with paths (don't assume anything). And I'd bet even case-sensitive filesystems differ in behaviour when presented with different normalization of Unicode (using pre-combined characters vs. combining ones). -- Michel Fortin michel.fortin michelf.ca http://michelf.ca/
Jun 06 2013
prev sibling parent reply Brad Roberts <braddr puremagic.com> writes:
On 6/6/13 1:02 PM, Michel Fortin wrote:
 and Apple has Case-sensitive HFS+ for OS X and its the default on iOS.
Careful.. While HFS+ can be case sensitive, it's not by default. Nor is it recommended due to the number of osx applications that just aren't designed with that in mind.
Jun 07 2013
parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-07 20:52:30 +0000, Brad Roberts <braddr puremagic.com> said:

 On 6/6/13 1:02 PM, Michel Fortin wrote:
 and Apple has Case-sensitive HFS+ for OS X and its the default on iOS.
Careful.. While HFS+ can be case sensitive, it's not by default. Nor is it recommended due to the number of osx applications that just aren't designed with that in mind.
True. But what I meant is that it's the default on iOS, not OS X. (Funnily, if you're running things in the iOS Simulator you'll run on the same file system as OS X, case-sensitive most likely.) -- Michel Fortin michel.fortin michelf.ca http://michelf.ca/
Jun 07 2013
prev sibling parent "Robert Clipsham" <robert octarineparrot.com> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
 On 6/4/2013 11:27 PM, Dylan Knutson wrote:
 I'd like to open up the idea of Path being an object in 
 std.path. I've submitted
 a pull 
 (https://github.com/D-Programming-Language/phobos/pull/1333) 
 that adds a
 Path struct to std.path, "which exposes a much more palatable 
 interface to path
 string manipulation".
I've succumbed to the temptation to do this several times over the years. I always wind up backing it out and going back to strings.
As another data point: Java 7 introduces new Path and Paths objects:

http://docs.oracle.com/javase/7/docs/api/java/nio/file/Paths.html

So they clearly think using an object(s) for it is useful.

-----

Without even thinking about the API, just using it, all the code I've written in the past couple of weeks looks something like this:

Path p = Paths.get(someDir, someOtherDir);
p = p.subpath(otherPath, p.getNameCount());
Path file = p.resolve(someFile);
print(file.toString());
file.toFile().doSomething();

i.e. all my code is converting to/from a Path object purely for dealing with Windows and Posix / vs \ differences and doing sub-paths. Seems a bit pointless when we could just use free functions, in my opinion.
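For comparison, a rough D counterpart of the Java snippet above using only the existing free functions in std.path. The directory and file names are placeholders, and the slice of the segment array is only an approximation of subpath().

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.path : buildPath, pathSplitter;

void main()
{
    // Counterpart of Paths.get(someDir, someOtherDir) followed by resolve(someFile).
    string dir  = buildPath("someDir", "someOtherDir");
    string file = buildPath(dir, "someFile.txt");
    assert(equal(pathSplitter(file), ["someDir", "someOtherDir", "someFile.txt"]));

    // Approximate counterpart of subpath(1, getNameCount()): drop the first segment.
    string[] segments = pathSplitter(file).array;
    string sub = buildPath(segments[1 .. $]);
    assert(equal(pathSplitter(sub), ["someOtherDir", "someFile.txt"]));

    // The separator is chosen by buildPath, so only this check is platform-specific.
    version (Posix) assert(file == "someDir/someOtherDir/someFile.txt");
}
```

The values stay plain strings throughout, so they can be printed or handed to file APIs without any conversion step.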
Jun 06 2013