## digitalmars.D - Path as an object in std.path

"Dylan Knutson" <tcdknutson gmail.com> writes:
Hello,
I'd like to open up the idea of Path being an object in std.path.
I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333) that
adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed.
However, I can't find that discussion, and I think that including
an OO way to deal with paths would be a serious gain for the
standard library.

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform-independent abstraction for path strings.
* Path provides a type-safe way to pass, compare, and manipulate
arbitrary path strings.
* It wraps over the functions defined in std.path, so the behavior of
methods on Path is, in most cases, identical to that of the
corresponding module function.

to see this commit closed due to a discussion that happened at a
different point in D's development when the language had
different needs.

Thank you.
Jun 04 2013
"Joshua Niehus" <jm.niehus gmail.com> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

Personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.
Jun 05 2013
Jacob Carlborg <doob me.com> writes:
On 2013-06-05 09:11, Joshua Niehus wrote:

Personally, I prefer the current implementation and found it easy to use
for the multitudes of tiny scripts I've written.  I wouldn't like to
create an "object" just to call isAbsolute.

I agree. But if you're passing around a lot of paths it would probably be a good idea to have a proper type for the paths.
That being said, I don't see why having the struct would hurt.

-- /Jacob Carlborg
Jun 05 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

Personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.

Is there any reason why we couldn't keep the string-based free functions around as well?

I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both. Andrei
Jun 05 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

Personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.

Is there any reason why we couldn't keep the string-based free functions around as well?
Jun 05 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 2:27 AM, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in std.path. I've
submitted a pull
Path struct to std.path, "which exposes a much more palatable interface
to path string manipulation".

Great, thanks for this work. I agree that the proposal deserves a fair shake. Andrei
Jun 05 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 13:26:39 UTC, Andrei Alexandrescu
wrote:
On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson
wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

Personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.

Is there any reason why we couldn't keep the string-based free functions around as well?

I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both. Andrei

Because of duplication of implementation? Or is it simply that having "2 ways to do the same thing" is bad?

I was imagining the following situation:
* Free functions, similar/identical to the current ones
* A struct that provides all current functionality by wrapping the free functions, plus any extra stuff that is only appropriate for a path object

Unfortunately the current naming scheme doesn't really suit this idea that well.
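The arrangement described above (free functions remain the single implementation, the struct only forwards to them) might look roughly like this in D. This is a sketch under assumptions: the Path struct and its method set are hypothetical, not the code from the pull request; only the std.path functions it calls are real. The `static import` forces fully qualified calls, which in this sketch avoids any clash between the method names and the free functions they wrap.

```d
static import std.path; // calls must be written std.path.xxx, so method names cannot clash

/// Hypothetical Path (not the pull request's struct): every method
/// forwards to the std.path free function of the same name.
struct Path
{
    string value;
    alias value this; // a Path still converts implicitly to string

    bool isAbsolute() const { return std.path.isAbsolute(value); }
    string baseName() const { return std.path.baseName(value); }
    string extension() const { return std.path.extension(value); }
    Path dirName() const { return Path(std.path.dirName(value)); }

    /// Object-only convenience: append a component via buildPath.
    Path opBinary(string op : "~")(string child) const
    {
        return Path(std.path.buildPath(value, child));
    }
}

unittest
{
    auto p = Path("dir/file.txt");
    assert(p.baseName() == "file.txt");
    assert(p.extension() == ".txt");
    assert(!p.isAbsolute());
    assert(p.dirName().value == "dir");
    string s = p; // usable wherever a string is expected
    assert(s == "dir/file.txt");
}
```

Because the struct holds no logic of its own, the string API and the object API cannot drift apart; the cost is that every new std.path function needs a matching forwarding method.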
Jun 05 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 14:26:39 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

Personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.

Is there any reason why we couldn't keep the string-based free functions around as well?

I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both.

C# has both:

1. System.IO.FileInfo and System.IO.DirectoryInfo: non-static (instance) classes with methods, e.g. Delete()
2. System.IO.File and System.IO.Directory: static classes with methods accepting strings, e.g. Delete(string name)

R
Jun 05 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 15:12:22 +0100, Regan Heath <regan netmail.co.nz>
wrote:

On Wed, 05 Jun 2013 14:26:39 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

Personally, I prefer the current implementation and found it easy to use for the multitudes of tiny scripts I've written. I wouldn't like to create an "object" just to call isAbsolute. That being said, I don't see why having the struct would hurt. Nice work, by the way.

Is there any reason why we couldn't keep the string-based free functions around as well?

I don't have a strong opinion regarding Path object vs. string functions, and I agree both have advantages and disadvantages. But I would be opposed to having both.

C# has both:

1. System.IO.FileInfo and System.IO.DirectoryInfo: non-static (instance) classes with methods, e.g. Delete()
2. System.IO.File and System.IO.Directory: static classes with methods accepting strings, e.g. Delete(string name)

I forgot to say: I've used both in different situations. Sometimes you get a FileInfo/DirectoryInfo from another method, or you have created one because you're going to reuse the path/information a lot (to get file attributes, etc.), and sometimes you just need to build a path using Path.Combine (into a string) and delete it, or similar.

R
Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
I don't have a strong opinion regarding Path object vs. string
But I would be opposed to having both.

Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types).

However, I can't imagine that'd be feasible, as it'd break a lot of code. My suggestion would be to just keep the freestanding functions to operate on raw strings, and then migrate code that depends on std.path over to the Path struct as needed. I realize that this is easier said than done, but even then it shouldn't be a lot of work, as Path can implicitly be converted to a string.

This makes its integration into existing codebases/Phobos literally as easy as writing `Path my_path = r"foo\bar";` instead of `string my_path = r"foo\bar";`. You lose no functionality but gain type safety if you have to do any future refactoring.
I wouldn't like to create an "object" just to call isAbsolute.

Agreed. The best course of action would probably be to keep the raw functions as they exist (think of them as the static versions of Path methods). However, for large applications, the type safety of a Path object makes working with paths much easier.
Jun 05 2013
Timothee Cour <thelastmammoth gmail.com> writes:

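Currently there's no way to perform cross-platform operations.

What about:

---
enum Platform { Posix, Windows }
version (Posix) enum PlatformDefault = Platform.Posix;
else enum PlatformDefault = Platform.Windows;

struct Path(Platform P = PlatformDefault)
{
    string value;
    this(string s) { value = s; }
    // path operations following the rules of platform P go here
}

void main()
{
    Path!(Platform.Posix) path = `a\b`; // a path interpreted with POSIX rules
    // auto path2 = path.to!(Path!()); // conversion to the native flavor (not defined here)
}
---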

It allows current usage with no modification, and allows cross-platform
logic.
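The platform-parameterized idea above, fleshed out just enough to run. This is a hypothetical sketch in which only separator handling depends on the platform parameter; the names platformDefault and convertTo are assumptions made for illustration (convertTo stands in for the `path.to!Path` conversion), and the conversion is deliberately naive.

```d
import std.array : replace;

enum Platform { Posix, Windows }

version (Windows) enum Platform platformDefault = Platform.Windows;
else enum Platform platformDefault = Platform.Posix;

/// Hypothetical path type tagged with the platform whose conventions it follows.
struct Path(Platform P = platformDefault)
{
    string value;
    alias value this; // existing string-based code keeps working

    /// Separator this flavor uses when joining components.
    enum string sep = P == Platform.Windows ? `\` : "/";

    /// Append one component using this flavor's separator.
    Path opBinary(string op : "~")(string component) const
    {
        if (value.length == 0)
            return Path(component);
        return Path(value ~ sep ~ component);
    }

    /// Re-express the path with another platform's separator. Naive on
    /// purpose: a POSIX file name may legally contain '\', and this sketch
    /// cannot tell such a character apart from a Windows separator.
    .Path!Q convertTo(Platform Q)() const
    {
        string v = value; // plain (head-mutable) copy of the stored string
        static if (P == Q)
            return .Path!Q(v);
        else static if (Q == Platform.Windows)
            return .Path!Q(v.replace("/", `\`));
        else
            return .Path!Q(v.replace(`\`, "/"));
    }
}

unittest
{
    auto posix = Path!(Platform.Posix)("a") ~ "b";
    assert(posix.value == "a/b");
    auto win = posix.convertTo!(Platform.Windows)();
    assert(win.value == `a\b`);
    assert(win.convertTo!(Platform.Posix)().value == "a/b");
}
```

Because the platform is a compile-time parameter, Windows paths can be built and inspected on a POSIX machine (and vice versa), while `Path!()` still means "the native flavor".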

On Wed, Jun 5, 2013 at 1:19 PM, Dylan Knutson <tcdknutson gmail.com> wrote:

I don't have a strong opinion regarding Path object vs. string functions,
and I agree both have advantages and disadvantages. But I would be opposed
to having both.

Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types).

However, I can't imagine that'd be feasible, as it'd break a lot of code. My suggestion would be to just keep the freestanding functions to operate on raw strings, and then migrate code that depends on std.path over to the Path struct as needed. I realize that this is easier said than done, but even then it shouldn't be a lot of work, as Path can implicitly be converted to a string.

This makes its integration into existing codebases/Phobos literally as easy as writing `Path my_path = r"foo\bar";` instead of `string my_path = r"foo\bar";`. You lose no functionality but gain type safety if you have to do any future refactoring.

I wouldn't like to create an "object" just to call isAbsolute.

Agreed. The best course of action would probably be to keep the raw functions as they exist (think of them as the static versions of Path methods). However, for large applications, the type safety of a Path object makes working with paths much easier.

Jun 05 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, June 05, 2013 22:19:21 Dylan Knutson wrote:
I don't have a strong opinion regarding Path object vs. string
But I would be opposed to having both.

Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types).

There's a significant difference between a type which has a value and units and one which is basically just a string or array of strings wrapped by another type. Not that a Path struct is without value, but I think that there's a very large difference in the amount of value that the two provide. AFAIK, very few bugs are caused by treating paths as strings, but there are a lot of time-related bugs out there caused by using naked values instead of values with units.
This makes its integration into existing codebases/Phobos
literally as easy as

See, this is exactly the sort of thing I'm afraid of. I don't want to have to have arguments over whether a particular function should accept a path as a string or a struct. Right now, we have one way to do it, so it's clear, and it works just fine. If we add a Path struct, then we have two ways to do the same thing, and we're going to have a division among APIs as to which way to handle paths. And I think that we'll be very much worse off because of it. While there is value in having a path struct rather than a string, I don't think that it's worth the extra confusion and division that it'll cause. If we were going to have a path struct, we should have done that in the first place rather than using strings.

- Jonathan M Davis
Jun 05 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Wed, Jun 5, 2013 at 1:34 PM, Jonathan M Davis <jmdavisProg gmx.com> wrote:

On Wednesday, June 05, 2013 22:19:21 Dylan Knutson wrote:
I don't have a strong opinion regarding Path object vs. string
But I would be opposed to having both.

Personally, I'd prefer to just use the Path struct in std.path. While a Path can be represented as a string, it's not the same concept (the same way that a DateTime can be represented as an integer, but we don't just pass times around as raw integer types).

There's a significant difference between a type which has a value and units and one which is basically just a string or array of strings wrapped by another type. Not that a Path struct is without value, but I think that there's a very large difference in the amount of value that the two provide. AFAIK, very few bugs are caused by treating paths as strings,

I disagree.

It allows catching bugs early (e.g. passing a $mypath environment variable to a binary, where the variable wasn't set or was set to an invalid path name). Constructing a Path object from it would fail immediately, as opposed to later in the code with possibly evil artifacts (e.g. when using the std.process.shell functions).

Another advantage: a central location for all path object creation allows instrumenting the code, for example to log every path name mentioned. That would be impossible with a raw string type.

Another advantage: it makes it easy to work with cross-platform code (i.e. operating on Windows paths from POSIX); see my previous post in this thread.

I very much welcome this. There's a reason why other languages (C#, Java) have such an abstraction. Given D's alias this functionality, this abstraction comes at zero runtime cost and works with zero adaptation for most existing code.

What will it break? We should discuss that.

but there are a lot of time-related bugs out there caused by using naked values instead of values with units.

This makes its integration into existing codebases/Phobos literally as easy as [snip]

See, this is exactly the sort of thing I'm afraid of. I don't want to have to have arguments over whether a particular function should accept a path as a string or a struct. Right now, we have one way to do it, so it's clear, and it works just fine. If we add a Path struct, then we have two ways to do the same thing, and we're going to have a division among APIs as to which way to handle paths. And I think that we'll be very much worse off because of it. While there is value in having a path struct rather than a string, I don't think that it's worth the extra confusion and division that it'll cause. If we were going to have a path struct, we should have done that in the first place rather than using strings.
- Jonathan M Davis
Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
There's a significant difference between a type which has a
value and units and
one which is basically just a string or array of strings
wrapped by another
type. Not that a Path struct is without value, but I think that
there's a very
large difference in the amount of value that the two provide.
AFAIK, very few
bugs are caused by treating paths as strings, but there are a
lot of time-related bugs out there caused by using naked values instead of
values with
units.

Dub is forced to define its own separate Path type because, as its author states, using a string to represent a path "more often than not results in hidden bugs." (https://github.com/rejectedsoftware/dub/issues/79). Representing a path as a string is just fine in a small script, but the moment you've got to handle stuff like path comparison, building, and general manipulation, you're better off defining an abstraction for it.
See, this is exactly the sort of thing I'm afraid of. I don't
want to have to
have arguments over whether a particular function should accept
a path as a
string or a struct. Right now, we have one way do to it, so
it's clear, and it
works just fine.

I see no problem with just keeping Phobos as it is; it was just a suggestion to make use of new functionality. A function that takes a string can accept a Path *or* a string, and it'll work just fine, thanks to subtyping:

void bar(Path path) { return; }
void foo(string str) { return; }
Path p = r"baz\quixx";
bar(p);
foo(p);

So there doesn't have to be an argument over what a function should accept; that's up to the function's internal implementation. From the outside, it'll accept both.
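The subtyping described above can be checked with a minimal D sketch. The Path here is hypothetical (a single string field with `alias this`, plus a constructor so that `Path p = "...";` compiles), not the pull request's struct. It confirms that foo(string) accepts a Path, and also shows where the conversion stops: D performs no implicit constructor calls for function arguments, so bar(Path) does not accept a plain string.

```d
/// Hypothetical minimal Path: a distinct type that converts to string.
struct Path
{
    string value;
    alias value this; // subtyping: a Path can be used where a string is expected

    this(string s) { value = s; }
}

void bar(Path path) {}
void foo(string str) {}

unittest
{
    Path p = "baz/quixx"; // initialization calls the constructor
    bar(p);
    foo(p);               // accepted through alias this
    foo("baz/quixx");

    // The reverse direction is not implicit: a Path parameter rejects a string.
    static assert(!__traits(compiles, bar("baz/quixx")));
    bar(Path("baz/quixx")); // explicit construction works
}
```

So string-taking functions accept both kinds of argument, while a function declared to take Path forces its callers to construct one.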
Jun 05 2013
"Vladimir Panteleev" writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

For the record, there are some existing D path object implementations:

* Tango's FilePath class: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d
* Vibe's Path struct: https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/inet/path.d
Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Wednesday, 5 June 2013 at 22:06:52 UTC, Vladimir Panteleev
wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

For the record, there are some existing D path object implementations:

* Tango's FilePath class: https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d
* Vibe's Path struct: https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/inet/path.d

The design of Path was prompted by Dub's own internal path module, might I add. And if anything, this just goes to show that a Path object indeed does have its use cases.
Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
I'd like to point out some of the pitfalls of using a raw string
as a representation of a path, too.

You've got to manually normalize strings before any comparison is
done. Even a single directory delimiter at the end of the string
means that the paths won't compare correctly. This takes a good
amount of extra code, and you've got to remember to
normalize *everywhere*, or you've got a bug waiting to happen.
string s1 = "baz/../foo/bar/";
string s2 = "foo/bar/";
string s3 = "foo/bar";

assert(s1 != s2); // s1 == s2 would fail
assert(s2 != s3); // s2 == s3 would fail
assert(s1 != s3); // s1 == s3 would fail
assert(buildNormalizedPath(s1) == buildNormalizedPath(s2)); // Passes, with many more keystrokes.

Comparing with Paths:
Path p1 = "baz/../foo/bar/";
Path p2 = "foo/bar/";
Path p3 = "foo/bar";

assert(p1 == p2); // Passes.
assert(p2 == p3); // Passes.
assert(p1 == p3); // Passes.

As you can see, Path is just generally easier to work with,
because it encapsulates the concept of a path. There's no having to
normalize strings, because that's done for you. It just works.
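A minimal D sketch of how such a Path could implement this comparison, assuming (as the post describes) that equality means equality after normalization. The struct and its members are hypothetical, not the pull request's code. It also touches the overhead question: the struct stores nothing but the string, although each comparison now normalizes both sides, which typically allocates new strings.

```d
import std.path : buildNormalizedPath;

/// Hypothetical Path whose == compares normalized forms.
struct Path
{
    string value;
    alias value this; // still usable as a string

    this(string s) { value = s; }

    bool opEquals(const Path other) const
    {
        // buildNormalizedPath resolves "." and "..", drops redundant and
        // trailing separators, and uses the native separator. It is purely
        // lexical: the filesystem is not consulted, so symlinks are ignored.
        return buildNormalizedPath(value) == buildNormalizedPath(other.value);
    }
}

// Storage is exactly one string: no extra fields.
static assert(Path.sizeof == string.sizeof);

unittest
{
    Path p1 = "baz/../foo/bar/";
    Path p2 = "foo/bar/";
    Path p3 = "foo/bar";

    assert(p1 == p2);
    assert(p2 == p3);
    assert(p1 == p3);
    assert(p1 != Path("foo/baz"));
}
```

A production version would also need a matching toHash so that equal paths hash equally when used as associative-array keys.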

Building a path with strings isn't difficult, but the function
calls are unwieldy.
string s1 = buildNormalizedPath("foo", "bar");
string s2 = buildNormalizedPath(s1, "baz");
assert(s2 == "foo/bar/baz"); // Fails on Windows, where the separator is '\'.

Building a Path, IMO, just looks cleaner, and it's obvious what
you're doing.
Path p1 = Path("foo", "bar");
Path p2 = p1.join("baz");
assert(p2 == "foo/bar/baz"); // Passes on all platforms.
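For comparison, the string-based functions can also give a platform-independent check, provided both sides are normalized the same way. A minimal sketch; samePath is a hypothetical helper (it is not part of std.path), while buildNormalizedPath and dirSeparator are existing std.path symbols.

```d
import std.path : buildNormalizedPath, dirSeparator;

/// Hypothetical helper (not part of std.path): two path strings name the
/// same location, lexically, if their normalized forms are equal.
bool samePath(string a, string b)
{
    return buildNormalizedPath(a) == buildNormalizedPath(b);
}

unittest
{
    string s1 = buildNormalizedPath("foo", "bar");
    string s2 = buildNormalizedPath(s1, "baz");

    // The result uses the native separator, so a '/' literal only matches on POSIX.
    assert(s2 == "foo" ~ dirSeparator ~ "bar" ~ dirSeparator ~ "baz");

    // Normalizing both sides gives a check that holds on every platform.
    assert(samePath(s2, "foo/bar/baz"));
}
```

The difference from a Path type is where the normalization lives: here each call site has to remember to use the helper.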

As a side note, I'd like to point out that using Path has *no more
overhead* than passing around and manipulating a raw string.
As far as I can tell, all use cases for Path take less code, and
more easily convey what you're doing. D's support for object-oriented
design is great; why not make use of it?
Jun 05 2013
"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 5 June 2013 at 20:52:24 UTC, Dylan Knutson wrote:
Dub is forced to define its own separate Path type because, as
its author states, using a string to represent a path "more
often than not results in hidden bugs."

You're misquoting here: "usually will be places where the path is modified using string operations [...]". While I've wanted my functions to accept a Path so that I can document what is being accepted (it also helps with function overloads), std.path has been working well for me as I move my code from string operations to path operations.
Jun 05 2013
"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 5 June 2013 at 22:06:52 UTC, Vladimir Panteleev
wrote:
* Tango's FilePath class:

https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d

Note that Tango code should not be used for code intended for Phobos unless all authors of that piece have stated they will license it under Boost. It is a firm stance to prevent any potential legal issues (whether perceived or real).
Jun 05 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

Since I am the designer and primary author of std.path, I should probably say something.

When I first started working on "the new std.path" a couple of years ago, I initially entertained the idea of writing it in terms of a dedicated Path type. I was quickly convinced otherwise by others, and proceeded to design the module around normal strings.

For the last two years I've been working more in C++ than in D (by necessity, not by desire), and for all my path-manipulation needs I've been using boost::filesystem. This library has a dedicated path type, so I've gained some experience with this kind of API. And I am *really* happy we went with the string solution for std.path.

Paths are usually obtained in string form, and they are normally passed to other functions and third-party libraries in string form. Having to convert them to something else just to do what is, in fact, string manipulation is just annoying. (One of my biggest gripes with boost::filesystem is that conversions between path and string necessitate a copy, which is not a problem with your Path type, so in that respect it is better than Boost's solution.)
[...]

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.

How is this more platform independent? It is just a simple wrapper around a string, with methods that forward to already-extant module-level functions.
* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.

How is it safer? I would agree with this if it verified isValidPath(_path) on construction and maintained this as an invariant, but I cannot see that it does.
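The validation asked for above can be sketched as follows, assuming std.path.isValidPath is the check wanted. CheckedPath is a hypothetical name, not part of the proposal. The constructor rejects invalid input with enforce, so (as Timothee Cour describes for environment variables earlier in the thread) a bad path fails where it enters the program rather than later; the invariant re-checks the stored value in non-release builds, since invariants are compiled out with -release.

```d
import std.exception : enforce;
import std.path : isValidPath;

/// Hypothetical checked path type: construction rejects strings that
/// std.path.isValidPath considers invalid.
struct CheckedPath
{
    private string value_;

    this(string s)
    {
        // Fail at the point where the path enters the program.
        enforce(isValidPath(s), "invalid path: " ~ s);
        value_ = s;
    }

    invariant
    {
        // A default-initialized CheckedPath holds null, so allow that state.
        assert(value_ is null || isValidPath(value_));
    }

    string value() const { return value_; }
    alias value this; // still usable wherever a string is expected
}

unittest
{
    import std.exception : assertThrown;

    auto p = CheckedPath("foo/bar");
    string s = p;
    assert(s == "foo/bar");
    assertThrown(CheckedPath("foo\0bar")); // an embedded NUL is invalid on every platform
}
```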
* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

Then what is the added value? Having Path together with normal string functions in the same module will be confusing (there are two almost-equal ways of doing the same thing; which one should I choose?), and it will add code duplication (now my code has to accept paths both as strings and as Paths). Coming from the author of std.path, this may come across as hostile or jealous, but I don't see that the proposed change improves anything.

Lars
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 8:57 AM, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
FWIW, having Path be an object adds consistency with the rest of Phobos, which
has many entities which could be expressed as primitives, expressed as
objects. To name a few, DateTime is an object, File is an object, and DirEntry
is an object. Yes, they could be described as integers, or a pointer, or a
string, but it's less cognitive load on the developer to recognize them as
separate types.

"Reducing cognitive load" is not the main reason these are objects. DateTime lumps together no less than six integers. File adds automatic resource management via reference counting. DirEntry caches file information to avoid repeated filesystem lookups. And so on.

It's hard to see what value there is in a type that is simply a wrapper around an existing type, and which provides implicit conversions to/from that existing type so that they can be intermixed arbitrarily. At the end, that's nothing more than: alias string Path;
Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:41 PM, Walter Bright wrote:
On 6/6/2013 8:57 AM, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
FWIW, having Path be an object adds consistency with the rest of
Phobos, which
has many entities which could be expressed as primitives, expressed as
objects. To name a few, DateTime is an object, File is an object, and
DirEntry
is an object. Yes, they could be described as integers, or a pointer,
or a
string, but it's less cognitive load on the developer to recognize
them as
separate types.

"Reducing cognitive load" is not the main reason these are objects. DateTime lumps together no less than six integers. File adds automatic resource management via reference counting. DirEntry caches file information to avoid repeated filesystem lookups. And so on.

It's hard to see what value there is in a type that is simply a wrapper around an existing type, and which provides implicit conversions to/from that existing type so that they can be intermixed arbitrarily. At the end, that's nothing more than: alias string Path;

No, you get to check the conversions going one way. If you destroy, destroy in style. This is a wrong argument. Andrei
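Andrei's distinction can be sketched in a few lines of D. The names below (`PathAlias`, `Path`, `describe`) are hypothetical and this is not the code from pull request 1333; it only assumes a wrapper that subtypes `string` via `alias this`. The alias is literally the same type, while the struct converts to `string` implicitly but can only be created from a `string` through its constructor, which is the one place a check could be added.

```d
// Walter's formulation: just another name for string, so nothing can be checked.
alias PathAlias = string;

// Hypothetical wrapper that subtypes string via alias this.
struct Path
{
    private string _path;

    // string -> Path: only through this constructor (a natural place to validate).
    this(string p) { _path = p; }

    // Path -> string: implicit, so a Path can go wherever a string is expected.
    @property string str() const { return _path; }
    alias str this;
}

// An existing, non-template API that only knows about strings.
string describe(string s)
{
    return "path: " ~ s;
}

static assert(is(PathAlias == string)); // the alias adds no new type
static assert(is(Path : string));       // Path converts to string implicitly
static assert(!is(string : Path));      // but a string never silently becomes a Path
```

`describe(Path("a/b"))` compiles through the `alias this` conversion. One caveat, as far as I understand it: Phobos templates constrained with `isSomeString` reject struct types, so generic string functions may still need the explicit `p.str`.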
Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

Interesting. "Show me the code!" Andrei
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:47 AM, Andrei Alexandrescu wrote:
On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

Interesting. "Show me the code!"

Not necessary - it is trivially obvious to the most casual observer! (You just use the same logic that normalizes the path to do the comparison.)
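Walter's remark can be made concrete with `std.path.asNormalizedPath`, which, as far as I know, was added to Phobos after this 2013 thread and returns a lazy range rather than a newly allocated string. A minimal sketch with a hypothetical function name; note that the comparison is purely lexical.

```d
import std.algorithm.comparison : equal;
import std.path : asNormalizedPath;

// Compare two path strings after lexical normalization. asNormalizedPath
// yields the normalized characters lazily, so no new string is built.
bool sameNormalizedPath(string a, string b)
{
    return equal(asNormalizedPath(a), asNormalizedPath(b));
}
```

For example, `sameNormalizedPath("foo/../bar", "bar")` is true; symlinks and case-insensitive filesystems are not taken into account.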
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:14 AM, Dylan Knutson wrote:
It doesn't do any allocations that the user won't have to do anyways. Paths
have
to be normalized before comparison; not doing so isn't correct behavior. Eg,
the
strings foo/../bar != bar, yet they're equivalent paths. Path encapsulates
the behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

I believe it is a mistake to try and automatically hide the difference between ./bar and bar. Paths being == and 'referring to the same file' are different things. For example, what about symlinks? For performance reasons, also, I'd want to normalize sometime after building the entire path, I wouldn't want to normalize at each step. Normalization should be an explicit step, not implicit.
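Walter's last point (build the whole path first, then normalize once, explicitly) might look like the following sketch. `joinThenNormalize` is a hypothetical helper, and the normalization is again lexical only: it says nothing about whether two paths name the same file through a symlink.

```d
import std.path : buildPath, buildNormalizedPath;

// Join all the pieces first, without any normalization ...
// ... then normalize exactly once, as a deliberate step.
string joinThenNormalize(string root, string sub, string name)
{
    immutable raw = buildPath(root, sub, name); // e.g. "assets/./textures/../logo.png" on POSIX
    return buildNormalizedPath(raw);            // e.g. "assets/logo.png" on POSIX
}
```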
Jun 06 2013
"Flamaros" <flamaros.xavier gmail.com> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed.
However, I can't find that discussion, and I think that the
benefits of including an OO way to deal with paths is a serious
gain for the standard library.

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.
* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.
* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

hate to see this commit closed due to a discussion that
happened at a different point in D's development when the

Thank you.

I like the idea of manipulating paths through an object. An API that takes a path as a parameter is better typed than one that takes a string. It's really useful for file loaders: it affirms that the method will do path-related operations and expects a particular string format. Some methods seem to be missing, like completeBaseName and completeSuffix. You can take a look at the Qt API: http://qt-project.org/doc/qt-4.8/qfileinfo.html

The bad thing with the Qt API is that we can't know which methods do a file system access; that's why I prefer having 2 separate objects. It would be good to have the FileInfo object.
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

[...]

Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:

1. Maintain an isValidPath() invariant, for early error detection. (On POSIX, this is rather trivial, as any string that does not contain a null character is in principle a valid path, but on Windows, the situation is different.)

2. Add in-place versions of path modifiers (setExtension, setDrive, etc.), for improved performance.

One solution would be for Path to be a trivial string wrapper which does (1) and not (2). In this case, it is justified to have Path *in addition to* the existing functions.

Another solution would be for Path to do (2), possibly in addition to (1). However, in this case it should be a *replacement* for the existing functions, and not an addition. Otherwise, we have two almost-equal ways of doing the same thing, which should be avoided. (I am not advocating this, however, as it will massively break user code all over again.)

Lars
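Option (1) could look roughly like this hypothetical `ValidPath`, assuming `std.exception.enforce` for error reporting. It is only a sketch: it checks `isValidPath` once in the constructor and exposes no mutators, so the property established there holds for the object's lifetime.

```d
import std.exception : enforce;
import std.path : isValidPath;

// Hypothetical type implementing option (1): it only ever holds strings
// that pass isValidPath, so bad input is rejected at construction time.
struct ValidPath
{
    private string _path;

    @disable this(); // no default-constructed (empty, hence invalid) ValidPath

    this(string p)
    {
        enforce(isValidPath(p), "invalid path: " ~ p);
        _path = p;
    }

    // Read-only access; with no mutators, the constructor's check stays valid.
    @property string str() const { return _path; }
    alias str this;
}
```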
Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:
Paths are usually obtained in string form, and they are normally passed
to other functions and third party libraries in string form.  Having to
convert them to something else just to do what is, in fact, string
manipulations, is just annoying.

Agree 100%. C# has Path.Combine, which builds paths from strings and returns a string, and this is good. It also has the System.IO.File and System.IO.Directory static classes with static methods taking strings, also good. But C# also has System.IO.FileInfo and System.IO.DirectoryInfo, which are constructed from a string and then have methods that mirror the static methods of System.IO.File, plus a Refresh method to update the cached file attributes etc. obtained from the file system. I find these objects useful. It would be nice for D to have similar objects, IMO.

R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in std.path. I've
submitted a pull
a Path struct to std.path, "which exposes a much more palatable
interface to path string manipulation".

[...]

Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:

Does System.IO.DirectoryInfo (http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx) add sufficient value to justify its existence, to your mind, vs. just having System.IO.Directory (http://msdn.microsoft.com/en-us/library/system.io.directory.aspx)?

R
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 10:30:05 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:
Paths are usually obtained in string form, and they are
normally passed to other functions and third party libraries
in string form.  Having to convert them to something else just
to do what is, in fact, string manipulations, is just annoying.

Agree 100%. C# has Path.Combine, which builds paths from strings and returns a string, and this is good. It also has the System.IO.File and System.IO.Directory static classes with static methods taking strings, also good. But C# also has System.IO.FileInfo and System.IO.DirectoryInfo, which are constructed from a string and then have methods that mirror the static methods of System.IO.File, plus a Refresh method to update the cached file attributes etc. obtained from the file system. I find these objects useful. It would be nice for D to have similar objects, IMO.

It does have a similar type: std.file.DirEntry. http://dlang.org/phobos/std_file.html#.DirEntry
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson
wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

[...]

Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:

Does System.IO.DirectoryInfo (http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx) add sufficient value to justify its existence, to your mind, vs. just having System.IO.Directory (http://msdn.microsoft.com/en-us/library/system.io.directory.aspx)?

They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry. The added value is mainly in the performance benefits; for example, if (exists(f) && isFile(f) && timeLastModified(f) < d) ... requires three filesystem lookups (stat() calls), whereas auto de = dirEntry(f); if (de.exists && de.isFile && de.timeLastModified < d) ... is just one. I see no such benefit in the proposed Path type.
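A rough D sketch of the two styles Lars describes, with hypothetical function names. As far as I know, `DirEntry` has no `exists` property and its constructor throws for a missing path, so the second version checks `exists` first; after that, `isFile` and `timeLastModified` are answered from attributes the entry fetches once and caches, rather than each issuing its own filesystem query.

```d
import std.datetime : SysTime;
import std.file : DirEntry, exists, isFile, timeLastModified;

// Each free function below queries the filesystem on its own.
bool olderFileSeparate(string f, SysTime d)
{
    return exists(f) && isFile(f) && timeLastModified(f) < d;
}

// DirEntry caches the attributes it reads, so isFile and
// timeLastModified do not each trigger a separate lookup.
bool olderFileCached(string f, SysTime d)
{
    if (!exists(f))
        return false; // DirEntry's constructor would throw for a missing path
    auto de = DirEntry(f);
    return de.isFile && de.timeLastModified < d;
}
```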
Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 11:43:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 10:30:05 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:
Paths are usually obtained in string form, and they are normally
passed to other functions and third party libraries in string form.
Having to convert them to something else just to do what is, in fact,
string manipulations, is just annoying.

Agree 100%. C# has Path.Combine, which builds paths from strings and returns a string, and this is good. It also has the System.IO.File and System.IO.Directory static classes with static methods taking strings, also good. But C# also has System.IO.FileInfo and System.IO.DirectoryInfo, which are constructed from a string and then have methods that mirror the static methods of System.IO.File, plus a Refresh method to update the cached file attributes etc. obtained from the file system. I find these objects useful. It would be nice for D to have similar objects, IMO.

It does have a similar type: std.file.DirEntry. http://dlang.org/phobos/std_file.html#.DirEntry

Ahh... excellent. In that case, I don't think we want/need the Path being proposed.

Side note: DirEntry is a very UNIX-centric name - I only know that because I have coded with it; I wonder what pure Windows developers make of it.

R
Jun 06 2013
"Flamaros" <flamaros.xavier gmail.com> writes:
On Thursday, 6 June 2013 at 07:26:53 UTC, Flamaros wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed.
However, I can't find that discussion, and I think that the
benefits of including an OO way to deal with paths is a
serious gain for the standard library.

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.
* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.
* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

hate to see this commit closed due to a discussion that
happened at a different point in D's development when the

Thank you.

I like the idea of manipulating paths through an object. An API that takes a path as a parameter is better typed than one that takes a string. It's really useful for file loaders: it affirms that the method will do path-related operations and expects a particular string format. Some methods seem to be missing, like completeBaseName and completeSuffix. You can take a look at the Qt API: http://qt-project.org/doc/qt-4.8/qfileinfo.html

The bad thing with the Qt API is that we can't know which methods do a file system access; that's why I prefer having 2 separate objects. It would be good to have the FileInfo object.

Having an object would also remove the need for repeated format normalization: with a string as the parameter, the normalization function always has to be called.
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
Let me add some more to this.  To justify the addition of such
a type, it needs to pull its own weight.  For added value, it
could do one or both of the following:

1. Maintain an isValidPath() invariant, for early error
detection.  (On POSIX, this is rather trivial, as any string
that does not contain a null character is in principle a valid
path, but on Windows, the situation is different.)

That's a possibility.
2. Add in-place versions of path modifiers (setExtension,
setDrive, etc.), for improved performance.

I don't think that there'll be any performance improvements by making in place modification functions. Considering under the hood the path object is just a string, and that string's reference needs to be changed with each modification, I don't see how manipulation can be made faster.
One solution would be for Path to be a trivial string wrapper
which does (1) and not (2).  In this case, it is justified to
have Path *in addition to* the existing functions.

Another solution would be for Path to do (2), possibly in
addition to (1).  However, in this case it should be a
*replacement* for the existing functions, and not an addition.
Otherwise, we have two almost-equal ways of doing the same
thing, which should be avoided.  (I am not advocating this,
however, as it will massively break user code all over again.)

The more I think about it, the more partial I am to removing the existing string methods in std.path. At most, using a Path object increases the number of characters typed by 6 (Path()). And even then, chances are you'll be saving characters, as method names can be simplified to remove "path" from them: buildNormalizedPath -> normalized, isValidPath -> isValid, etc. Even with user code breaking, 1) D isn't exactly considered a stable language quite yet; I'm sure that users expect code breakage with each new release, and 2) it's trivial to convert code that uses the string based API to the object based API.
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
Paths are usually obtained in string form, and they are
normally passed to other functions and third party libraries in
string form.  Having to convert them to something else just to
do what is, in fact, string manipulations, is just annoying.

Well, when designing Path, I didn't want to add much, if any, programmer overhead. Conversion to a Path is trivial: Change the type to Path, and 90% of the time it'll just work. The only case that comes to mind where a string can't be implicitly assigned/converted to a Path is when passing it to a function, in which case all it needs to be wrapped in is Path(). Or, have an overloaded version that takes a string (which all path using functions do now anyways).
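Dylan's distinction between assignment and argument passing reflects how D struct constructors behave: a declaration such as `Path p = "a/b/c";` calls the constructor, but a `string` argument is not implicitly converted to a struct parameter. A hypothetical sketch of the forwarding-overload approach he mentions (`Path` and `componentCount` are illustrative names only):

```d
import std.path : pathSplitter;
import std.range : walkLength;

// Hypothetical minimal Path: a string plus a constructor.
struct Path
{
    string value;
    this(string s) { value = s; }
}

// An API written against Path ...
size_t componentCount(Path p)
{
    return p.value.pathSplitter.walkLength;
}

// ... and a forwarding overload so plain strings still work as arguments.
size_t componentCount(string s)
{
    return componentCount(Path(s));
}
```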
(One of my biggest gripes with boost::filesystem is that
conversions between path and string necessitate a copy, which
is not a problem with your Path type, so in that respect it is
better than Boost's solution.)

[...]

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.

How is this more platform independent? It is just a simple wrapper around a string, with methods that forward to already-extant module-level functions.

I should have said "makes it easier to be platform independent". Normalization is done automatically on comparison. There's nothing you can't do with normal std.path functions, but that's not the point. It's to be type safe and add convenience.
* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.

How is it safer? I would agree with this if it verified that isValidPath(_path) on construction and maintained this as an invariant, but I cannot see that it does.

Type safe. Once you've got a huge program with many concepts floating around, you don't want to have to keep track of which strings are paths and which aren't, and you don't want to do all the specifics like splitting, normalization, and joining with raw string functions. This isn't just conjecture either; there are D programs in the wild that abstract away path strings because it's easier to deal with them that way. I didn't want to force paths passed in to be valid, because the programmer might want an invalid path passed around for whatever reason.
* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

Then what is the added value?

See above. I didn't want to change functionality, just make it easier to use.
As the author of std.path this may come across as hostile or
jealous, but I don't see that the proposed change improves
anything.

You came off as quite constructive; thank you :-)
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad
wrote:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
[...]

Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:

Does System.IO.DirectoryInfo (http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx) add sufficient value to justify its existence, to your mind, vs. just having System.IO.Directory (http://msdn.microsoft.com/en-us/library/system.io.directory.aspx)?

They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry. The added value is mainly in the performance benefits; for example, if (exists(f) && isFile(f) && timeLastModified(f) < d) ... requires three filesystem lookups (stat() calls), whereas auto de = dirEntry(f); if (de.exists && de.isFile && de.timeLastModified < d) ... is just one. I see no such benefit in the proposed Path type.

Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose function is querying filesystem information.
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
[...]

I don't think that there'll be any performance improvements by
making in place modification functions. Considering under the
hood the path object is just a string, and that string's
reference needs to be changed with each modification, I don't
see how manipulation can be made faster.

Why does _path have to be an immutable string? It could just as well be a char[], or it could be templated on the character type.
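To illustrate Lars's suggestion, here is a hypothetical mutable-buffer version of `setExtension`. It is a sketch only: it relies on druntime's `assumeSafeAppend` so that appends after truncation may reuse the existing allocation, and whether they actually do depends on the buffer's capacity.

```d
import std.path : extension;

// Hypothetical sketch of option (2): a path held in a mutable char[]
// buffer, so modifiers can edit it in place instead of building a new string.
struct MutablePath
{
    char[] buf;

    this(const(char)[] p) { buf = p.dup; }

    // In-place analogue of std.path.setExtension (lexical only).
    void setExtension(const(char)[] ext)
    {
        // Drop the current extension, including its dot, if there is one.
        buf = buf[0 .. $ - extension(buf).length];
        // Allow the runtime to overwrite the truncated tail, so the appends
        // below can reuse the existing allocation when it is large enough.
        buf.assumeSafeAppend();
        if (ext.length && ext[0] != '.')
            buf ~= '.';
        buf ~= ext;
    }
}
```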
[...]

The more I think about it, the more partial I am to removing
the existing string methods in std.path. At most, using a Path
object increases number of characters typed by 6 (Path()).
And even then, chances are you'll be saving characters as
method names can be simplified to remove path from them:
buildNormalizedPath -> normalized, isValidPath -> isValid, etc.
Even with user code breaking, 1) D isn't exactly considered a
stable language quite yet; I'm sure that users expect code
breakage with each new release, and 2) it's trivial to convert
code that uses the string based API to the object based API.

I know D isn't 100% stable yet, but bear in mind that this module was introduced no more than two years ago, as part of the (still-ongoing) effort to revamp the old modules from the D1 days. It was accepted with a unanimous vote after a comprehensive review by the D community. And already you want another breaking redesign? I am strongly opposed to this.
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:54:25 UTC, Dylan Knutson wrote:
On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad
They add great value, but that is a completely different
discussion, as these are more similar to std.file.DirEntry.
[...]

Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose function is querying filesystem information.

Which is why my first sentence said "that is a completely different discussion".
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:27 PM, Dylan Knutson wrote:
I'd like to open up the idea of Path being an object in std.path. I've
submitted
a pull (https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
Path struct to std.path, "which exposes a much more palatable interface to path
string manipulation".

I've succumbed to the temptation to do this several times over the years. I always wind up backing it out and going back to strings. The objections have all been already mentioned by others in this thread. I understand the motivation for doing it, it seems like a great idea, but I am strongly opposed to it.

To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing rainbows. It really isn't better, it is just different. In many ways, it is worse because it cannot hope to duplicate the rich interface available for strings.

2. APIs that deal with filenames take strings and return strings, not Path objects. Your code gets littered with path and filename components that are sometimes Paths and sometimes strings and sometimes both.

3. Every time you deal with a filename or path, you have to decide whether to use a Path or a string. This may seem like a small thing, but when writing a lot of code to deal with paths, this becomes a fracking annoyance.

4. An awful lot of path manipulation is done using string functions. Ever do regexes on paths? I do. But regex deals with strings, not Path objects. Ditto for the rest of the universe of code that deals with strings.

5. You wind up with two parallel universes of functions to deal with paths - one dealing with strings, one with Paths, oh, and a third universe of crap that deals with mixed strings and Paths.

6. If you try not to do (5), you break all existing code.

7. People like writing paths as "/etc/hosts", not Path("/etc/hosts"). People will not stand for a Path constructor that winds up allocating memory so it can rewrite the string in a canonical path representation.

8. There really isn't any such thing as a portable path representation. It's more than just \ vs /. There are the drive prefixes in Windows that have no analog in Linux. Sometimes case matters in Linux, where it would be ignored under Windows. There are 8.3 issues sometimes. The only thing you can do is come up with a subset of what works across systems, and then of course you have to go back to using strings when you need to access D:\foo\abc.c

9. People think about paths in terms of strings, not Path objects. Adding an abstraction layer always produces the feeling of "what is it doing, is it allocating memory, is it slow, is it doing something clever that I don't need/want?". This is cognitive baggage, and interferes with writing clear, correct code.

I've written a lot of cross-platform path code, I've tried the Path object thing multiple times, and I wrote the original std.path, and it uses strings because of my experience.
Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 11:36 AM, Walter Bright wrote:
To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing rainbows.
It really isn't better, it is just different. In many ways, it is worse
because it cannot hope to duplicate the rich interface available for
strings.

Subtyping (Path is a subtype of string by means of alias this) should make getting from paths to strings easy, and getting back from strings to paths one constructor call away (which adds correctness).
2. APIs that deal with filenames take strings and return strings, not
Path objects. Your code gets littered with path and filename components
that are sometimes Paths and sometimes strings and sometimes both.

Subtyping should make it easy to pass paths to APIs that expect strings.
3. Every time you deal with a filename or path, you have to decide
whether to use a Path or a string. This may seem like a small thing, but
when writing a lot of code to deal with paths, this becomes a fracking
annoyance.

If there's a reward for using paths the annoyance factor may be reduced.
4. An awful lot of path manipulation is done using string functions.
Ever do regexes on paths? I do. But regex deals with strings, not Path
objects. Ditto for the rest of the universe of code that deals with
strings.

Subtyping should take care of this.
5. You wind up with two parallel universes of functions to deal with
paths - one dealing with strings, one with Paths, oh, and a third
universe of crap that deals with mixed strings and Paths.

Subtyping makes one way easy and constructors make the other way affordable. Again, this comes back to perceived gains that compensate for the shortcomings.
6. If you try not to do (5), you break all existing code.

Only "half".
7. People like writing paths as "/etc/hosts", not Path("/etc/hosts").
People will not stand for a Path constructor that winds up allocating
memory so it can rewrite the string in a canonical path representation.

Lazy canonicalization may help.
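"Lazy canonicalization" could mean something like this hypothetical `LazyPath`: the constructor just stores the string, so constructing one from a literal does not allocate, and the normalized form is computed and cached only when first requested. A sketch under those assumptions, not a proposal for Phobos.

```d
import std.path : buildNormalizedPath;

// Hypothetical sketch: construction only stores the string; the canonical
// form is computed on first use and then cached.
struct LazyPath
{
    private string _raw;
    private string _normalized; // null until first requested

    this(string raw) { _raw = raw; }

    @property string raw() const { return _raw; }

    @property string normalized()
    {
        if (_normalized is null)
            _normalized = buildNormalizedPath(_raw);
        return _normalized;
    }
}
```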
8. There really isn't any such thing as a portable path representation.
It's more than just \ vs /. There are the drive prefixes in Windows that
have no analog in Linux. Sometimes case matters in Linux, where it would
be ignored under Windows. There are 8.3 issues sometimes. The only thing
you can do is come up with a subset of what works across systems, and
then of course you have to go back to using strings when you need to
access D:\foo\abc.c

That is actually an argument in favor of good encapsulation, not against.
9. People think about paths in terms of strings, not Path objects.
Adding an abstraction layer always produces the feeling of "what is it
doing, is it allocating memory, is it slow, is it doing something clever
that I don't need/want?". This is cognitive baggage, and interferes with
writing clear, correct code.

I'm not sure whether the generalization holds. Andrei
Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:04 PM, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu wrote:
[...]

8. There really isn't any such thing as a portable path representation.
It's more than just \ vs /. There are the drive prefixes in Windows that
have no analog in Linux. Sometimes case matters in Linux, where it would
be ignored under Windows. There are 8.3 issues sometimes. The only thing
you can do is come up with a subset of what works across systems, and
then of course you have to go back to using strings when you need to
access D:\foo\abc.c

That is actually an argument in favor of good encapsulation, not against.

The proposed API change does not introduce good encapsulation. It introduces a super-thin wrapper around a built-in type, and replaces free functions with methods, for what gain?

I was talking in principle. I agree that the argument "it was as easy as wrapping the already existing functions" works against the current proposal, not in favor of it. Andrei
Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 15:36:15 +0000, Walter Bright <newshound2 digitalmars.com> said:

8. There really isn't any such thing as a portable path representation.
It's more than just \ vs /. There are the drive prefixes in Windows
that have no analog in Linux. Sometimes case matters in Linux, where it
would be ignored under Windows. There are 8.3 issues sometimes. The
only thing you can do is come up with a subset of what works across
systems, and then of course you have to go back to using strings when
you need to access D:\foo\abc.c

Actually, there is one portable representation for paths: URLs. More specifically "file:" URLs if we're limiting ourselves to filesystem paths. Relative URLs should probably count too.

But otherwise, that's all true. To correctly normalize a path, you need to know which underlying filesystem is in use. Today's operating systems can mix and match case-sensitive, case-preserving, and case-insensitive filesystems, different restrictions on file names, and sometimes have obscure restrictions/normalization when using old APIs on newer filesystems. You can't really normalize a path without making a lot of assumptions.

Of course, that's not an argument for or against having a path object to encapsulate the differences. But I'd tend to say that what the path object can do is more limited than one might think at first glance.

As a side note, Apple is currently asking application developers to use URLs instead of raw paths to local files. Using URLs makes it possible, for instance, to attach "bookmark" keys to a path (in the query string) that can more or less automatically punch a hole in the sandbox when accessing a file (which can expire or be revoked). Pretty much all recent Cocoa APIs take URL objects instead of path strings.

-- Michel Fortin michel.fortin michelf.ca http://michelf.ca/
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:23 AM, Michel Fortin wrote:
Actually, there is one portable representation for paths: URLs. More
specifically "file:" URLs if we're limiting ourselves to filesystem paths.
Relative URLs should probably count too.

That doesn't work for case sensitivity/insensitivity differences, nor does it work for drive letters like "C:" (which don't exist on Apple systems, hence they can afford to dismiss them). In D source code, we deal with this with the convention that package and module names must be lower case. But there's no getting around the fact that "File" and "file" are different paths under Windows, and are the same under Linux. There is no generic abstraction to account for that - the programmer must be aware of it and adjust as appropriate for his application.
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:59 AM, Jonathan M Davis wrote:
On Thursday, June 06, 2013 10:27:28 Walter Bright wrote:
But there's no getting around the fact
that "File" and "file" are different paths under Windows, and are the same
under Linux.

I think you got that backwards. ;)

Dang, I should have written some unittests!
Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2 digitalmars.com> said:

That doesn't work for case sensitivity/insensitivity differences nor
does it work for drive letters like "C:" (which don't exist on Apple
systems, hence they can afford to dismiss them).

Have you never opened a local file in a Windows web browser and taken a look at the URL? The drive letter is there.

file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.
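To make the URL form concrete, here is a minimal D sketch that turns an absolute drive-letter path into a `file:` URL of the shape shown above. The function name `windowsPathToFileUrl` is hypothetical, and the sketch assumes the input is an absolute path with a drive letter: it only swaps `\` for `/` and percent-encodes bytes outside a small safe set, ignoring UNC paths and the other cases a real converter would have to handle.

```d
import std.array : appender;
import std.ascii : isAlphaNum;
import std.format : formattedWrite;
import std.string : representation;

/// Hypothetical helper (not part of Phobos): converts an absolute
/// drive-letter path such as `c:\path\to\the file.txt` into a file: URL.
/// Assumes the input is absolute; UNC paths are not handled.
string windowsPathToFileUrl(string path)
{
    auto url = appender!string();
    url.put("file:///");
    foreach (ubyte b; path.representation)
    {
        immutable char c = cast(char) b;
        if (c == '\\')
            url.put('/');                      // Windows separator becomes the URL separator
        else if (b < 0x80 && (isAlphaNum(c) || c == '-' || c == '.' || c == '_'
                              || c == '~' || c == '/' || c == ':'))
            url.put(c);                        // safe ASCII characters are kept verbatim
        else
            url.formattedWrite("%%%02X", b);   // everything else, including UTF-8 bytes, becomes %XX
    }
    return url.data;
}
```

The result is a portable spelling of the path, not a canonical one: `the file.txt` keeps whatever case it was typed in, so the case-sensitivity question is left exactly where it was.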
But there's no getting around the fact that "File" and "file" are
different paths under Windows, and are the same under Linux.

Actually, it doesn't depend on Linux or Windows or OS X. It depends on the filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive HFS+, etc. If you assume a specific case sensitivity setting by looking at the OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole issue about which locale to use for Unicode case-insensitive comparisons. I'd bet that different filesystems choose different approaches to this tricky problem. So there's no way to normalize for case-sensitivity just by looking at a path or a URL, even if you know which OS you're on. If you want to know for sure whether two paths are the same, or what the normalized path is, you need to ask the filesystem at some point. Anything else is based on fragile assumptions.
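std.path already exposes this choice: `filenameCmp` takes a `CaseSensitive` template argument whose default, `CaseSensitive.osDefault`, is precisely a guess based on the OS rather than on the filesystem. A small sketch that makes the policy explicit instead; the wrapper name `sameName` is illustrative, not a Phobos function.

```d
import std.path : CaseSensitive, filenameCmp;

/// Illustrative wrapper: compares two file names under an explicitly chosen
/// case policy. Picking the right policy still requires knowing which
/// filesystem the names live on; this function cannot find that out.
bool sameName(CaseSensitive cs)(string a, string b)
{
    return filenameCmp!cs(a, b) == 0;
}
```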
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 1:02 PM, Michel Fortin wrote:
On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2 digitalmars.com> said:

That doesn't work for case sensitivity/insensitivity differences nor does it
work for drive letters like "C:" (which don't exist on Apple systems, hence
they can afford to dismiss them).

Have you never opened a local file in a Windows web browser and taken a look at the URL? The drive letter is there.
file:///c:/path/to/the%20file.txt
The drive letter is simply the first part of the path on Windows.

I didn't know that, but that doesn't make it a canonical path. It just combines the notion of a URL with a path.
But there's no getting around the fact that "File" and "file" are different
paths under Windows, and are the same under Linux.

Actually, it doesn't depend on Linux or Windows or OS X. It depends on the filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive HFS+, etc. If you assume a specific case sensitivity setting by looking at the OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole issue about which locale to use for Unicode case-insensitive comparisons. I'd bet that different filesystems choose different approaches to this tricky problem. So there's no way to normalize for case-sensitivity just by looking at a path or a URL, even if you know which OS you're on. If you want to know for sure whether two paths are the same, or what the normalized path is, you need to ask the filesystem at some point. Anything else is based on fragile assumptions.

It may be a bug, and I personally try to never depend on path code that is case sensitive or not, but I bet there's a *lot* of code out there that makes those assumptions. BTW, Windows still has only erratic support for using / as path separators, even in the system commands. Not even the "DIR" command can deal with it.
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 1:54 PM, Steven Schveighoffer wrote:
On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright <newshound2 digitalmars.com>
wrote:

BTW, Windows still has only erratic support for using / as path separators,
even in the system commands. Not even the "DIR" command can deal with it.

We don't program using DIR. That is irrelevant. (not contesting that Windows doesn't work well with '/', just that DIR, or any other command line tool, is evidence)

The fact that DIR, probably the most widely used command in Windows, doesn't support it is indicative. I've also noticed Windows file dialog boxes not supporting it, and those are supposed to be standard components. DIR is used in .bat files and makefiles, it is certainly used in programming.
Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 20:25:58 +0000, Walter Bright <newshound2 digitalmars.com> said:

On 6/6/2013 1:02 PM, Michel Fortin wrote:
Have you never opened a local file in a Windows web browser and taken a look at
the URL? The drive letter is there.

file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.

I didn't know that, but that doesn't make it a canonical path. It just combines the notion of url with a path.

It's not a canonical path, but it's a platform-neutral representation of a path. You can perform the same operations with a URL (including regular expressions) irrespective of the underlying OS. I was replying initially to your claim that there was no portable way to represent a path. I don't think the definition of a "portable path" needs to include any notion of canonical, because not even non-portable paths can be canonical these days.
Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
HFS+, etc. If you assume a specific case sensitivity setting by looking at the
OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole
issue about which locale to use for Unicode case-insensitive comparisons. I'd
bet that different filesystems choose different approaches to this
tricky problem.

So there's no way to normalize for case-sensitivity just by looking at
a path or a URL, even if you know which OS you're on. If you want to know for sure
whether two paths are the same, or what the normalized path is, you need to ask
the filesystem at some point. Anything else is based on fragile assumptions.

It may be a bug, and I personally try to never depend on path code that is case sensitive or not, but I bet there's a *lot* of code out there that makes those assumptions.

That's a good way to deal with paths (don't assume anything). And I'd bet even case-sensitive filesystems differ in behaviour when presented with different Unicode normalizations of the same name (precomposed characters vs. combining sequences).
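The Unicode point can be illustrated with std.uni: the same visible name can be spelled with a precomposed `é` (U+00E9) or with `e` followed by a combining acute accent (U+0301). The two spellings are different strings until they are normalized; whether a particular filesystem treats them as one name is a separate, filesystem-specific question that this sketch does not answer. The helper name `sameUnderNFC` is illustrative.

```d
import std.uni : normalize, NFC;

/// Illustrative helper: compares two names after converting both to
/// Unicode Normalization Form C (precomposed characters where possible).
bool sameUnderNFC(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}
```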
Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:

On 6/6/13 1:02 PM, Michel Fortin wrote:
and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS.

Careful.. While HFS+ can be case sensitive, it's not by default. Nor is it recommended due to the number of osx applications that just aren't designed with that in mind.

True. But what I meant is that it's the default on iOS, not OS X. (Funnily, if you're running things in the iOS Simulator you'll run on the same file system as OS X, case-insensitive most likely.)
Jun 07 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:00 AM, Dylan Knutson wrote:
1. Making a more 'palatable' interface is pretty much chasing rainbows. It
really isn't better, it is just different. In many ways, it is worse because
it cannot hope to duplicate the rich interface available for strings.

.toString ?
2. APIs that deal with filenames take strings and return strings, not Path
objects. Your code gets littered with path and filename components that are
sometimes Paths and sometimes strings and sometimes both.

As for APIs that return strings, a Path toPath(string) function could be added in std.path? Another solution would be to migrate the parts of Phobos that use path strings to using actual paths. They could be overloaded with a counterpart that also takes a string, but the toPath function would be pretty useful here.

Yes, your code becomes littered with conversions. Ugh.
3. Every time you deal with a filename or path, you have to decide whether to
use a Path or a string. This may seem like a small thing, but when writing a
lot of code to deal with paths, this becomes a fracking annoyance.

If there should only be one API used, I'd suggest just use Path.

Except that just doesn't work out in practice. An awful lot uses strings, and again, people want to use the incredibly rich string manipulation code out there on paths.
the more I realize how little
code would break, and how easy it'd be to fix that.

That's been used to justify every code breakage. And yet, people eschew using D because of constant code breakage. It must stop.
It even takes fewer chars :-P and it only allocates on Path == Path and Path ==
string comparison. Which would have been done manually anyways.

Doing memory allocation to do == is a bad idea. People intuitively think of == as a cheap operation.
Well, that's not so much a limitation of Path or path functions as much as it
is with the operating systems themselves. You still run into that with strings.
I'm not trying to do anything groundbreaking, just abstract away the concept of a
path so it's easy to write larger applications.

But it isn't easier to use a Path object. That's one of the things I discovered when using them - it's never easier.
Good practice says don't worry about the implementation of what you can't see.

Yeah, well, you said that == allocates memory under the hood, which is surprising behavior. Real programs definitely worry about the implementation.
If the programmer is worried about the speed of the abstraction, deal with that
separately.

Yes, he goes back to using strings.
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:50 AM, Dylan Knutson wrote:
Well, it comes down to are we willing to marginally break code for the sake of
a better API. D and Phobos aren't considered stable by any standard; I don't
think we should treat them like they're set in stone. Also, deprecation gives
developers plenty of time to update their code (if they have to at all).

I don't believe that because we broke A, therefore it's ok to break B. And secondly, it isn't clear that Path is a better API. I'm not opposed to breakage in all cases. But there needs to be a big win to justify it. I'm not seeing even a small net win for Path types. I'm not talking hypothetical either, like I said, I've tried them several times.
Projects such as Dub, Vibe, and to an extent Tango disagree.

I agree there's a strong temptation to create a Path object, and I've succumbed myself to it several times. A corollary is that people often wanted to create a String class, too, though that has died out. You might also consider David Nadlinger's counter example: "As another data point (which may or may not be relevant for the discussion here), the LLVM system/support library was initially based on Path objects, but recently has been rewritten to use raw strings: http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html" I've rewritten my Path code to go back to raw strings, too.
Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
Some modules have needed to be redone. Some still do. But we already _did_
rework std.path. We agreed that we liked the new API, and it's been working
great. It's one thing to revisit an API that's been around since before we had
ranges or a review process. It's an entirely different thing to be constantly
reworking entire modules. I think that we need _very_ strong justification to
redesign a module that we already put through the review process. And I really
don't think that we have it here.

I think we're in violent agreement. An example of a strong justification for a redo is, for example, conversion to use ranges. std.zip needs that treatment.
Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example, conversion
to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o). Andrei
Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a path is fine.

However, there are times where it is convenient to be able to explode a
path into a structure, where each part is clearly separate from the
next.

Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension") parsePath(string path);
string buildPath(string drive, string[] folders, string basename, string extension);

Andrei
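A rough sketch of what those two signatures could look like on top of primitives that current std.path provides (`driveName`, `stripDrive`, `pathSplitter`, `extension`, `stripExtension`, and the overload of `buildPath` that takes a range of segments). Assumptions: the rebuilding function is called `buildFromParts` here so it does not clash with Phobos's own `buildPath`; the extension keeps its leading dot; a trailing separator is not preserved; and the examples use POSIX paths.

```d
import std.array : array;
import std.path : buildPath, driveName, extension, pathSplitter, stripDrive, stripExtension;
import std.range : chain, only;
import std.typecons : Tuple;

alias PathParts = Tuple!(string, "drive", string[], "folders",
                         string, "basename", string, "extension");

/// Sketch: split a path into drive, folders, base name and extension.
/// The extension includes its leading dot (".txt"), as std.path.extension returns it.
PathParts parsePath(string path)
{
    PathParts parts;
    parts.drive = driveName(path);                    // always empty on POSIX
    auto segments = pathSplitter(stripDrive(path)).array;
    if (segments.length == 0)
        return parts;
    string file = segments[$ - 1];
    parts.folders = segments[0 .. $ - 1];             // a root "/" shows up as the first folder
    parts.extension = extension(file);
    parts.basename = stripExtension(file);
    return parts;
}

/// Hypothetical counterpart to parsePath (named so it does not clash with std.path.buildPath).
string buildFromParts(string drive, string[] folders, string basename, string ext)
{
    return drive ~ buildPath(chain(folders, only(basename ~ ext)));
}
```

Because the parts come back as a `Tuple`, the `.expand` round trip works as is in this sketch: `buildFromParts(parsePath("/bin/sh").expand)` rebuilds `"/bin/sh"`.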
Jun 07 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/7/13 2:10 PM, monarch_dodra wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a path is
fine.

However, there are times where it is convenient to be able to explode a
path into a structure, where each part is clearly separate from the
next.

Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension") parsePath(string path);
string buildPath(string drive, string[] folders, string basename, string extension);

Andrei

Yeah. That's pretty much more or less what I was describing. Except "buildPath" would take your (unnamed) tuple type directly.

No, the version I wrote is more flexible. You get to pass separate arguments to it or just pass a tuple with .expand. buildPath(parsePath("/bin/sh").expand) should rebuild "/bin/sh".
There'd also be a "filename" member/UFCS function in there for
convenience.

I think that would be a small, but useful, addition to std.path.

Me 2. Andrei
Jun 07 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/8/13 10:45 AM, monarch_dodra wrote:
On Saturday, 8 June 2013 at 14:14:33 UTC, Lars T. Kyllingstad wrote:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate from the
next.

Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension") parsePath(string path);
string buildPath(string drive, string[] folders, string basename, string extension);

[...] But why stop at the parts you have listed there?

The moment I clicked "Send", I realised that offering multiple decompositions would prevent recomposition, because you'd have to choose which parts to combine.

Using D's property functions, this should not actually be a problem. The struct could be opaque with regard to which members are actually stored fields and which are functions. E.g.:
Path path = Path(`C:\MyFile.txt`);
path.filename = "main.cpp";
path.extension = "d";
assert(path.buildPath() == `C:\main.d`);
I don't see any reason for that to not work.
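A minimal sketch of the property-based idea, under these assumptions: the struct is called `PathComponents` (the name suggested in the reply below; no such type exists in Phobos), it stores only a directory and a file name, and `extension` is exposed without its leading dot. It uses POSIX paths, since on POSIX `\` is not a separator.

```d
import std.path : baseName, dirName, setExtension,
                  pathExtension = extension, stdBuildPath = buildPath;

/// Hypothetical struct, not part of Phobos. Callers cannot tell which
/// properties are stored fields and which are computed.
struct PathComponents
{
    private string dir;   // result of dirName, e.g. "/home/user"
    private string file;  // last path segment, e.g. "MyFile.txt"

    this(string path)
    {
        dir = dirName(path);
        file = baseName(path);
    }

    /// Stored: the full file name, including any extension.
    @property string filename() const { return file; }
    @property void filename(string name) { file = name; }

    /// Computed: the extension without its leading dot, e.g. "txt".
    @property string extension() const
    {
        auto ext = pathExtension(file);
        return ext.length ? ext[1 .. $] : "";
    }
    @property void extension(string ext) { file = setExtension(file, ext); }

    /// Reassemble the path. A bare file name comes back as "./name",
    /// because dirName returns "." when there is no directory part.
    string buildPath() const { return stdBuildPath(dir, file); }
}
```

`filename` is a stored field while `extension` is computed from it, yet both are read and assigned with the same syntax.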

Looks like the proposal may be converted into something liked by all - a small PathComponents struct with the appropriate primitives. A high ratio of usefulness to size would be key to acceptance. Andrei
Jun 08 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 12:50 PM, Dylan Knutson wrote:
Well, it comes down to are we willing to marginally break code for the
sake of a better API.

Well the position of "marginally" in the sentence above may be contested by some.
D and Phobos aren't considered stable by any
standard; I don't think we should treat them like they're set in stone.
Also, deprecation gives developers plenty of time to update their code
(if they have to at all).

I think this opinion is very unlikely to enjoy popularity. We actively /want/ to make Phobos more stable, so using the argument that it's not yet stable to add more instability is sure to fit the pattern of some list of fallacies. Besides, the corresponding benefits (the best solid argument that could be constructed) are, at least according to some, not large enough to justify the cost of breakage. Andrei
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 15:24:09 UTC, Lars T. Kyllingstad
wrote:
On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
[...]

I don't think that there'll be any performance improvements by
making in place modification functions. Considering under the
hood the path object is just a string, and that string's
reference needs to be changed with each modification, I don't
see how manipulation can be made faster.

Why does _path have to be an immutable string? It could just as well be a char[], or it could be templated on the character type.
[...]

The more I think about it, the more partial I am to removing
the existing string methods in std.path. At most, using a Path
object increases number of characters typed by 6 (Path()).
And even then, chances are you'll be saving characters as
method names can be simplified to remove path from them:
buildNormalizedPath -> normalized, isValidPath -> isValid,
etc. Even with user code breaking, 1) D isn't exactly
considered a stable language quite yet; I'm sure that users
expect code breakage with each new release, and 2) it's
trivial to convert code that uses the string based API to the
object based API.

I know D isn't 100% stable yet, but bear in mind that this module was introduced no more than two years ago, as part of the (still-ongoing) effort to revamp the old modules from the D1 days. It was accepted with a unanimous vote after a comprehensive review by the D community. And already you want another breaking redesign? I am strongly opposed to this.

Well, keep in mind that D 2 years ago was a different beast. AFAIK, D only recently got alias X this, which solves 90% of breakage problems when passing around Paths. FWIW, having Path be an object adds consistency with the rest of Phobos, which has many entities which could be expressed as primitives, expressed as objects. To name a few, DateTime is an object, File is an object, and DirEntry is an object. Yes, they could be described as integers, or a pointer, or a string, but it's less cognitive load on the developer to recognize them as separate types.
Jun 06 2013
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
I've succumbed to the temptation to do this several times over
the years.

I always wind up backing it out and going back to strings.

As another data point (which may or may not be relevant for the discussion here), the LLVM system/support library was initially based on Path objects, but recently has been rewritten to use raw strings: http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html David
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
FWIW, having Path be an object adds consistency with the rest
of Phobos, which has many entities which could be expressed as
primitives, expressed as objects. To name a few, DateTime is an
object, File is an object, and DirEntry is an object. Yes, they
could be described as integers, or a pointer, or a string, but
it's less cognitive load on the developer to recognize them as
separate types.

"Reducing cognitive load" is not the main reason these are objects. DateTime lumps together no less than six integers. File adds automatic resource management via reference counting. DirEntry caches file information to avoid repeated filesystem lookups. And so on.
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
I've succumbed to the temptation to do this several times over
the years.

I always wind up backing it out and going back to strings.

The objections have all been already mentioned by others in
this thread. I understand the motivation for doing it, it seems
like a great idea,

but I am strongly opposed to it.

To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing
rainbows. It really isn't better, it is just different. In many
ways, it is worse because it cannot hope to duplicate the rich
interface available for strings.

.toString ?
2. APIs that deal with filenames take strings and return
strings, not Path objects. Your code gets littered with path
and filename components that are sometimes Paths and sometimes
strings and sometimes both.

As for APIs that return strings, a Path toPath(string) function could be added in std.path? Another solution would be to migrate the parts of Phobos that use path strings to using actual paths. They could be overloaded with a counterpart that also takes a string, but the toPath function would be pretty useful here.
3. Every time you deal with a filename or path, you have to
decide whether to use a Path or a string. This may seem like a
small thing, but when writing a lot of code to deal with paths,
this becomes a fracking annoyance.

If there should only be one API used, I'd suggest just use Path.
4. An awful lot of path manipulation is done using string
functions. Ever do regexes on paths? I do. But regex deals with
strings, not Path objects. Ditto for the rest of the universe
of code that deals with strings.

Path implicitly converts to a string.
5. You wind up with two parallel universes of functions to deal
with paths - one dealing with strings, one with Paths, oh, and
a third universe of crap that deals with mixed strings and
Paths.

Well, I didn't say this in my OP, but I did a few comments back: I'm more partial to deprecating the string API and moving to Path. I didn't think many would go for this, but the more I think about it, the more I realize how little code would break, and how easy it'd be to fix that.
6. If you try not to do (5), you break all existing code.

7. People like writing paths as "/etc/hosts", not
Path("/etc/hosts"). People will not stand for a Path
constructor that winds up allocating memory so it can rewrite
the string in a canonical path representation.

string s = "/etc/hosts";
Path s = "/etc/hosts";
It even takes fewer chars :-P and it only allocates on Path == Path and Path == string comparison. Which would have been done manually anyways.
8. There really isn't any such thing as a portable path
representation. It's more than just \ vs /. There are the drive
prefixes in Windows that have no analog in Linux. Sometimes
case matters in Linux, where it would be ignored under Windows.
There are 8.3 issues sometimes. The only thing you can do is
come up with a subset of what works across systems, and then of
course you have to go back to using strings when you need to
access D:\foo\abc.c

Well, that's not so much a limitation of Path or path functions as much as it is with the operating systems themselves. You still run into that with strings. I'm not trying to do anything groundbreaking, just abstract away the concept of a path so it's easy to write larger applications.
9. People think about paths in terms of strings, not Path
objects. Adding an abstraction layer always produces the
feeling of "what is it doing, is it allocating memory, is it
slow, is it doing something clever that I don't need/want?".
This is cognitive baggage, and interferes with writing clear,
correct code.

It's easy to think about a path as a string for trivial code. Once the application uses paths in a nontrivial manner, people write wrappers around path functions anyways. Type safety is very useful. Good practice says don't worry about the implementation of what you can't see. If the programmer is worried about the speed of the abstraction, deal with that separately. FWIW, the Path wrapper doesn't allocate unless it needs to :-)
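For readers following the allocation argument, here is a minimal sketch of a Path wrapper along the lines described in this thread. It is not the code from the pull request: the field name `value` and both `opEquals` overloads are assumptions. It shows the three properties under discussion: `Path s = "/etc/hosts";` compiles through the constructor, `alias this` lets a Path be used where a string is expected, and `==` normalizes both sides with `buildNormalizedPath`, which builds a new string for each operand.

```d
import std.path : buildNormalizedPath;

/// Hypothetical minimal Path wrapper (not the code from the pull request).
struct Path
{
    string value;
    alias value this;            // a Path converts implicitly to string

    this(string s) { value = s; }

    /// Lexical comparison: both sides are normalized first, which builds
    /// a new string for each operand on every call.
    bool opEquals(const Path other) const
    {
        return buildNormalizedPath(value) == buildNormalizedPath(other.value);
    }

    bool opEquals(string other) const
    {
        return buildNormalizedPath(value) == buildNormalizedPath(other);
    }
}
```

The comparison is purely lexical: it resolves `..` textually, but does not consult the filesystem about symlinks or case.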
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
I should have said "makes it easier to be platform
independent". Normalization is done automatically on comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form. (Should it? I dunno.)
This isn't just conjecture either; there are D programs in the
wild that abstract away path strings because it's easier to
deal with them that way.
I didn't want to force paths passed in to be valid, because the
programmer might want an invalid path passed around for
whatever reason.

As others have pointed out, there are examples of the opposite too.
You came off as quite constructive; thank you :-)

:)
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad
wrote:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
I should have said "makes it easier to be platform
independent". Normalization is done automatically on
comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form. (Should it? I dunno.)

It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. E.g., the strings foo/../bar != bar, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between
buildNormalizedPath(s1) == buildNormalizedPath(s2);
and
p1 == p2;
Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 16:24:11 UTC, Walter Bright wrote:
As for APIs that return strings, a Path toPath(string)
function could be added in std.path? Another solution would be to migrate the parts of
Phobos that use
path strings to using actual paths. They could be overloaded
with a counterpart
that also takes a string, but the toPath function would be
pretty useful here.

Yes, your code becomes littered with conversions. Ugh.

As opposed to the rest of the conventions that Phobos uses?
If there should only be one API used, I'd suggest just use
Path.

Except that just doesn't work out in practice. An awful lot uses strings, and again, people want to use the incredibly rich string manipulation code out there on paths.

Hence subtyping.
the more I realize how little
code would break, and how easy it'd be to fix that.

That's been used to justify every code breakage. And yet, people eschew using D because of constant code breakage. It must stop.

Well, it comes down to are we willing to marginally break code for the sake of a better API. D and Phobos aren't considered stable by any standard; I don't think we should treat them like they're set in stone. Also, deprecation gives developers plenty of time to update their code (if they have to at all).
It even takes fewer chars :-P and it only allocates on Path ==
Path and Path == string comparison. Which would have been done manually anyways.

Doing memory allocation to do == is a bad idea. People intuitively think of == as a cheap operation.

It only allocates if buildNormalizedPath allocates. And if you aren't using buildNormalizedPath in the first place before comparing strings, you're comparing paths wrong.
Well, that's not so much a limitation of Path or path
functions as much as it is with the operating systems themselves. You still run into that
with strings. I'm not trying to do anything groundbreaking, just abstract away
the concept of a path so it's easy to write larger applications.

But it isn't easier to use a Path object. That's one of the things I discovered when using them - it's never easier.

Projects such as Dub, Vibe, and to an extent Tango disagree.
Good practice says don't worry about the implementation of
what you can't see.

Yeah, well, you said that == allocates memory under the hood, which is surprising behavior. Real programs definitely worry about the implementation.

Well, they shouldn't. Profile code first, see where the hotspots are, and fix those. I'd be very surprised if path comparison and manipulation were so heavily used that they became a slow spot for programs. And if they did, that's not the fault of the Path struct itself, but rather of the underlying functions it uses.
If the programmer is worried about the speed of the
abstraction, deal with that
separately.

Yes, he goes back to using strings.

See above; I can't think of any use case for paths where they account for a considerable amount of run time.
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu
wrote:
[...]

8. There really isn't any such thing as a portable path
representation.
It's more than just \ vs /. There are the drive prefixes in
Windows that have no analog in Linux. Sometimes case matters in Linux,
where it would
be ignored under Windows. There are 8.3 issues sometimes. The
only thing
you can do is come up with a subset of what works across
systems, and
then of course you have to go back to using strings when you
need to
access D:\foo\abc.c

That is actually an argument in favor of good encapsulation, not against.

The proposed API change does not introduce good encapsulation. It introduces a super-thin wrapper around a built-in type, and replaces free functions with methods, for what gain?
Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson <tcdknutson gmail.com>
wrote:

On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
I should have said "makes it easier to be platform independent".
Normalization is done automatically on comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it comes at the cost of several memory allocations, and it does not perform a case-insensitive comparison on Windows in its current form. (Should it? I dunno.)

It doesn't do any allocations that the user won't have to do anyways. Paths have to be normalized before comparison; not doing so isn't correct behavior. E.g., the strings foo/../bar != bar, yet they're equivalent paths. Path encapsulates the behavior. So it's the difference between
buildNormalizedPath(s1) == buildNormalizedPath(s2);
and
p1 == p2;

This can be done without allocations. -Steve
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 16:14:31 UTC, Dylan Knutson wrote:
On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad
wrote:
It doesn't do any allocations that the user won't have to do
anyway. Paths have to be normalized before comparison; not
doing so isn't correct behavior. E.g., the strings foo/../bar !=
bar, yet they're equivalent paths. Path encapsulates the
behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

To me, at least, the first one practically screams "expensive operation", whereas the second one does the exact opposite.
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson
<tcdknutson gmail.com> wrote:

It doesn't do any allocations that the user won't have to do
anyway. Paths have to be normalized before comparison; not
doing so isn't correct behavior. E.g., the strings foo/../bar
!= bar, yet they're equivalent paths. Path encapsulates the
behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

I know. There are a few additions that I've been planning to make for std.path for the longest time; I just haven't found the time to do so yet. Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings. The first one is a lazy "path normaliser":

assert(equal(pathNormalizer(["foo", "bar", "..", "baz"]), ["foo", "baz"]));

With this, non-allocating path comparison is easy. The verbose version of p1 == p2, which could be wrapped for convenience, is then:

equal(pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected. Very general and composable, and easily wrappable.

The second thing I'd like to add is an overload of buildPath() that takes a range of path segments. (Then buildNormalizedPath(p) can also be implemented as buildPath(pathNormalizer(p)).)

Maybe now is a good time to get this done. :)
Jun 06 2013
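The segment-level normaliser Lars proposes can be approximated with the existing pathSplitter. The function below, normalizeSegments, is a hypothetical and simplified stand-in for the proposed pathNormalizer: it is eager rather than lazy (it builds an array of slices), but the slices still point into the original string, so no path text is copied.

```d
import std.path : isRooted, pathSplitter;

/// Hypothetical, simplified stand-in for the proposed pathNormalizer.
/// Eager: returns an array of slices into the original segments.
string[] normalizeSegments(R)(R segments)
{
    string[] result;
    foreach (string seg; segments)
    {
        if (seg == ".")
            continue;                        // "." never changes the path
        if (seg == "..")
        {
            if (result.length && isRooted(result[$ - 1]))
                continue;                    // cannot climb above the root
            if (result.length && result[$ - 1] != "..")
            {
                result = result[0 .. $ - 1]; // "name/.." cancels out
                continue;
            }
        }
        result ~= seg;                       // ordinary names, or leading ".."
    }
    return result;
}
```

A truly lazy version, as proposed, would avoid even the slice array, at the cost of a more involved range implementation.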
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer wrote:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson
<tcdknutson gmail.com> wrote:

It doesn't do any allocations that the user won't have to do anyway.
Paths have to be normalized before comparison; not doing so isn't
correct behavior. E.g., the strings foo/../bar != bar, yet they're
equivalent paths. Path encapsulates the behavior. So it's the
difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

I know. There are a few additions that I've been planning to make for std.path for the longest time; I just haven't found the time to do so yet. Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings. The first one is a lazy "path normaliser":

assert(equal(pathNormalizer(["foo", "bar", "..", "baz"]), ["foo", "baz"]));

With this, non-allocating path comparison is easy. The verbose version of p1 == p2, which could be wrapped for convenience, is then:

equal(pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected. Very general and composable, and easily wrappable.

Great! I'd highly suggest pathEqual which takes two ranges of dchar and does the composition and OS-specific comparison for you. -Steve
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson
<tcdknutson gmail.com> wrote:

It doesn't do any allocations that the user won't have to do
anyway. Paths have to be normalized before comparison; not
doing so isn't correct behavior. E.g., the strings foo/../bar
!= bar, yet they're equivalent paths. Path encapsulates
the behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

I know. There are a few additions that I've been planning to make for std.path for the longest time; I just haven't found the time to do so yet. Specifically, I want to add a couple of functions that deal with ranges of path segments rather than full path strings. The first one is a lazy "path normaliser":

assert(equal(pathNormalizer(["foo", "bar", "..", "baz"]), ["foo", "baz"]));

With this, non-allocating path comparison is easy. The verbose version of p1 == p2, which could be wrapped for convenience, is then:

equal(pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make the comparison case-insensitive on OSes where this is expected. Very general and composable, and easily wrappable.

Great! I'd highly suggest pathEqual which takes two ranges of dchar and does the composition and OS-specific comparison for you.

They don't have to be dchar if all the building blocks are templates (as the existing ones are):

    bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2)
        (const(C1)[] p1, const(C2)[] p2)
        if (isSomeChar!C1 && isSomeChar!C2)
    {
        // The ranges being compared yield whole segments, so the
        // predicate compares segments with filenameCmp.
        return equal!((a, b) => filenameCmp!cs(a, b) == 0)
            (pathNormalizer(pathSplitter(p1)), pathNormalizer(pathSplitter(p2)));
    }
Jun 06 2013
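pathNormalizer never existed under that name, but later Phobos releases added a lazy character-level normaliser, std.path.asNormalizedPath. Assuming that function is available, a pathEqual along the lines Lars and Steve describe can be sketched as below. Because asNormalizedPath yields characters rather than segments, the predicate here uses filenameCharCmp instead of filenameCmp. The name pathEqual is still hypothetical.

```d
import std.algorithm.comparison : equal;
import std.path : asNormalizedPath, CaseSensitive, filenameCharCmp;
import std.utf : byDchar;

/// Sketch of the suggested pathEqual: normalise both paths lazily and
/// compare them code point by code point, using filenameCharCmp so that
/// case sensitivity follows the chosen (or OS default) rule.
bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault)(string p1, string p2)
{
    return equal!((a, b) => filenameCharCmp!cs(a, b) == 0)(
        asNormalizedPath(p1).byDchar,
        asNormalizedPath(p2).byDchar);
}
```

The default, CaseSensitive.osDefault, is case-insensitive on Windows and OS X and case-sensitive on other POSIX systems.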
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 19:25:56 Lars T. Kyllingstad wrote:
I know. There are a few additions that I've been planning to
make for std.path for the longest time, I just haven't found the
time to do so yet. Specifically, I want to add a couple of
functions that deal with ranges of path segments rather than full
path strings.

Another thing to consider is overloads of some of the functions which take an output range as their first argument. There has been an increased push lately to cut down on GC allocations in Phobos, and so we're probably going to start having more functions be overloaded such that they can be used with output ranges in order to give the folks who want to avoid the GC more control - similar to how we have the overload of toString that takes a delegate (though outside of classes, since we can templatize stuff, using an output range is more flexible than a delegate, though a delegate does qualify as an output range apparently). - Jonathan M Davis
Jun 06 2013
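The output-range idea can be sketched as follows. normalizedPathInto is a hypothetical name, not a Phobos function, and the sketch assumes the later std.path.asNormalizedPath for the lazy normalisation. The function only writes characters into whatever sink the caller provides, so the caller decides whether and how memory is allocated.

```d
import std.array : appender;
import std.path : asNormalizedPath;
import std.range.primitives : isOutputRange, put;

/// Hypothetical output-range variant: instead of returning a new string,
/// write the normalised path into a caller-supplied sink.
/// The sink is taken by ref so that sinks with value semantics
/// (e.g. a fresh Appender) keep what was written.
void normalizedPathInto(Sink)(ref Sink sink, string path)
    if (isOutputRange!(Sink, char))
{
    foreach (c; asNormalizedPath(path))
        put(sink, c);
}
```

A caller could, for instance, reuse one Appender across many calls (calling clear() between them) instead of allocating a fresh string each time.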
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:40:37 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer wrote:

Great!  I'd highly suggest pathEqual which takes two ranges of dchar
and does the composition and OS-specific comparison for you.

They don't have to be dchar if all the building blocks are templates (as the existing ones are):

    bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2)
        (const(C1)[] p1, const(C2)[] p2)
        if (isSomeChar!C1 && isSomeChar!C2)

Actually, all string variants are dchar ranges :) And your solution is less general, dchar ranges don't have to be arrays. However, I don't think in practice there are any real non-array dchar ranges... One thing your version does do is explicitly say the parameters are const, which you couldn't do with a non-array dchar range. -Steve
Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 10:37:27 Walter Bright wrote:
On 6/6/2013 9:50 AM, Dylan Knutson wrote:
Well, it comes down to are we willing to marginally break code for the
sake of a better API. D and Phobos aren't considered stable by any
standard; I don't think we should treat them like they're set in stone.
Also, deprecation gives developers plenty of time to update their code
(if they have to at all).

And secondly, it isn't clear that Path is a better API. I'm not opposed to breakage in all cases. But there needs to be a big win to justify it. I'm not seeing even a small net win for Path types. I'm not talking hypothetical either, like I said, I've tried them several times.

Some modules have needed to be redone. Some still do. But we already _did_ rework std.path. We agreed that we liked the new API, and it's been working great. It's one thing to revisit an API that's been around since before we had ranges or a review process. It's an entirely different thing to be constantly reworking entire modules. I think that we need _very_ strong justification to redesign a module that we already put through the review process. And I really don't think that we have it here. - Jonathan M Davis
Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:47:42 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

Interesting. "Show me the code!"

I think Lars summed it up nicely. It's not full working code yet, but it shows how one can do the path splitting and normalization lazily. However, it should be noted that buildNormalizedPath cannot be done without allocations, just the full comparison. -Steve
Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright
<newshound2 digitalmars.com> wrote:

Path operations should not require a real filesystem. They are string manipulations, nothing more. There is huge value in that. -Steve
Jun 06 2013
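To illustrate the "string manipulations, nothing more" point: std.path functions such as dirName, baseName, extension and setExtension work on paths that do not exist anywhere. The helper outputNameFor is a made-up name for this sketch.

```d
import std.path : baseName, dirName, extension, setExtension;

/// Hypothetical helper: derive an output file name from an input file
/// name purely by string manipulation. Nothing here touches the
/// filesystem, so the paths do not need to exist.
string outputNameFor(string input, string newExt)
{
    return setExtension(input, newExt); // a leading '.' is added if missing
}
```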
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 10:27:28 Walter Bright wrote:
But there's no getting around the fact
that "File" and "file" are different paths under Windows, and are the same
under Linux.

I think you got that backwards. ;) - Jonathan M Davis
Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 13:45:44 Andrei Alexandrescu wrote:
D and Phobos aren't considered stable by any
standard; I don't think we should treat them like they're set in stone.
Also, deprecation gives developers plenty of time to update their code
(if they have to at all).

I think this opinion is very unlikely to enjoy popularity. We actively /want/ to make Phobos more stable, so using the argument that it's not yet stable to add more instability is sure to fit the pattern of some list of fallacies. Besides, the corresponding benefits (the best solid argument that could be constructed) are, at least according to some, not large enough to justify the cost of breakage.

Agreed. Breaking stuff in an effort to create a solid, stable API is one thing (and at this point, we want to minimize even that as much as we reasonably can). Constantly going back and rebreaking stuff is quite another. We already redid std.path. It went through the full review process and was voted in. We want to move towards being _more_ stable not less. Some API breakage will still be necessary (like replacing std.xml or the streaming modules), but it's a cost that we want to avoid when it isn't necessary. Each module redesign must justify itself, and the simple fact that other modules have already been redesigned is not enough for that. Not to mention, over time, it should arguably require _more_ justification to redo a module (or make any breaking change in Phobos), because more people are using it, and we really do want to be stable. - Jonathan M Davis
Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:48:59 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 13:40:37 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer
wrote:

Great!  I'd highly suggest pathEqual which takes two ranges
of dchar and does the composition and OS-specific comparison
for you.

They don't have to be dchar if all the building blocks are templates (as the existing ones are):

    bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2)
        (const(C1)[] p1, const(C2)[] p2)
        if (isSomeChar!C1 && isSomeChar!C2)

Actually, all string variants are dchar ranges :) And your solution is less general, dchar ranges don't have to be arrays.

Ok, now I see what you meant.
However, I don't think in practice there are any real non-array
dchar ranges...

At least not any that also support slicing, which I think it is fair to require of "path ranges".
Jun 06 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
Just want to chime in and say that I'm also against this change.

I can see some small benefits, but I also see problems, all of

Even if it is a small net improvement, I don't think it's
anywhere near a big enough improvement to warrant an API change.
Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 11:09:29 Walter Bright wrote:
On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
Some modules have needed to be redone. Some still do. But we already _did_
rework std.path. We agreed that we liked the new API, and it's been
working
great. It's one thing to revisit an API that's been around since before we
had ranges or a review process. It's an entirely different thing to be
constantly reworking entire modules. I think that we need _very_ strong
justification to redesign a module that we already put through the review
process. And I really don't think that we have it here.

I think we're in violent agreement.

An example of a strong justification for a redo is, for example, conversion
to use ranges. std.zip needs that treatment.

Agreed. - Jonathan M Davis
Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 13:53:51 Steven Schveighoffer wrote:
On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright

<newshound2 digitalmars.com> wrote:

Path operations should not require a real filesystem. They are string manipulations, nothing more. There is huge value in that.

Agreed, but symlinks highlight the fact that there is a difference between paths being equal and paths referring to the same file. - Jonathan M Davis
Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example,
conversion
to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

LOL. Well, given that strings are _already_ ranges, that wouldn't help it anywhere near as much as it does with other cases of code breakage, since std.path is already quite range-ready. - Jonathan M Davis
Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright
<newshound2 digitalmars.com> wrote:

BTW, Windows still has only erratic support for using / as path
separators, even in the system commands. Not even the "DIR" command can
deal with it.

We don't program using DIR. That is irrelevant. (not contesting that Windows doesn't work well with '/', just that DIR, or any other command line tool, is evidence) -Steve
Jun 06 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 06, 2013 at 02:38:41PM -0400, Andrei Alexandrescu wrote:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example,
conversion to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

Hmm. Let's see:

    assert(isInputRange!Path);
    version (Windows)
        auto p = Path(r"..\blah\blah\..\bluh");
    else version (Posix)
        auto p = Path("../blah/blah/../bluh");
    // I'm assuming auto normalization; if you don't like that,
    // pretend I also wrote this line:
    // p.normalize();
    assert(p.equals(["..", "blah", "bluh"]));

What about that? ;-)

While the above may *look* attractive, it's actually a minefield full of pitfalls. Consider this directory tree in Posix:

    /home/user/test
    /home/user/test/symlink -> /home/user/test/real/1
    /home/user/test/real
    /home/user/test/real/1/myfile
    /home/user/test/real/2/anotherfile

Let's say the current working directory is /home/user. Now consider this:

    auto p = Path("test/symlink/../2/anotherfile");
    assert(std.file.exists(p)); // should this work?

The only way the above can actually work is if normalization queries the filesystem. That is to say, it is NOT mere string manipulation. However, *should* normalization always check the filesystem? What if the program is constructing a list of paths that it's going to create, which don't exist in the filesystem yet? Then normalization will fail, even though the paths are valid.

Conclusion: correct path normalization depends on intent, which only the programmer knows -- the library can't possibly figure this out without being told. (And I haven't even started getting into OS-dependent path manipulation yet... what should Path(r"C:\Program Files\abc.def") do on a Posix system?)

IOW, the programmer *already* has to know about system-dependent details of paths, so I'm not sure what value Path is really adding. At least, I'm not finding it compelling enough to eschew plain old string manipulations.

Besides, should glob patterns like "/home/user/prog/*/*.d" be Paths or strings? What about path regexes? Should Path export a whole suite of parallel methods for constructing such patterns? One can always interconvert to/from strings, of course, but if we'd started out with strings in the first place, we wouldn't need any conversions. The OS ultimately takes only strings anyway, so is there really a need to insert a conversion to/from Path in between?

I do see a lot of value in providing *functions* for manipulating path strings (normalization, parsing path components, splitting file extensions, etc.), but I have a hard time with encapsulating a path string in an opaque object when it doesn't really give that much more value. If you *really* like the idea of Path, nothing stops you from writing one yourself and having it implicitly convert to string so that you can pass it directly to OS functions that take paths. I just don't see value in requiring Phobos functions to only take Path objects.

T
Jun 06 2013
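The symlink pitfall is easy to reproduce with the existing string-only functions: lexical normalisation cancels "symlink/.." no matter where the link actually points, so on the tree above it yields a path that does not exist. lexicalTarget is a made-up wrapper name for this sketch; getting the filesystem's view instead would require resolving links, e.g. with the C library's realpath(3) on POSIX.

```d
import std.path : buildNormalizedPath, buildPath;

/// Hypothetical wrapper: purely lexical normalisation. "symlink/.." is
/// cancelled as a string, even if "symlink" points elsewhere on disk.
string lexicalTarget(string path)
{
    return buildNormalizedPath(path);
}
```

On the example tree, the result is test/2/anotherfile, whereas following the link would reach test/real/2/anotherfile.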
"Robert Clipsham" <robert octarineparrot.com> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
On 6/4/2013 11:27 PM, Dylan Knutson wrote:
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

I've succumbed to the temptation to do this several times over the years. I always wind up backing it out and going back to strings.

As another data point: Java 7 introduces new Path and Paths types:

http://docs.oracle.com/javase/7/docs/api/java/nio/file/Paths.html

So they clearly think using an object(s) for it is useful.

-----

Without even thinking about the API, just using it, all the code I've written in the past couple of weeks looks something like this:

    Path p = Paths.get(someDir, someOtherDir);
    p = p.subpath(otherPath, p.getNameCount());
    Path file = p.resolve(someFile);
    print(file.toString());
    file.toFile().doSomething();

i.e. all my code is converting to/from a Path object purely for dealing with Windows and Posix / vs \ differences and doing sub-paths. Seems a bit pointless when we could just use free functions, in my opinion.
Jun 06 2013
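For comparison, the Java snippet's operations map onto existing std.path free functions: Paths.get(a, b) and resolve(f) correspond to buildPath, and a subpath can be built from pathSplitter plus the range overload of buildPath. subpathFrom is a hypothetical name; unlike Java's subpath, it would count a root such as "/" as segment 0 for absolute paths.

```d
import std.path : buildPath, pathSplitter;
import std.range : drop;

/// Hypothetical free-function counterpart of Java's
/// p.subpath(begin, p.getNameCount()): keep the segments from
/// index `begin` onwards and rejoin them with the OS separator.
string subpathFrom(string path, size_t begin)
{
    return buildPath(pathSplitter(path).drop(begin));
}
```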
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 15:54:24 +0100, Dylan Knutson <tcdknutson gmail.com>
wrote:

On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
[...]

Let me add some more to this. To justify the addition of such a type, it needs to pull its own weight. For added value, it could do one or both of the following:

Does System.IO.DirectoryInfo (http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx) add sufficient value to justify its existence, to your mind, vs. just having System.IO.Directory (http://msdn.microsoft.com/en-us/library/system.io.directory.aspx)?

They add great value, but that is a completely different discussion, as these are more similar to std.file.DirEntry. The added value is mainly in the performance benefits; for example,

    if (exists(f) && isFile(f) && timeLastModified(f) < d) ...

requires three filesystem lookups (stat() calls), whereas

    auto de = dirEntry(f);
    if (de.exists && de.isFile && de.timeLastModified < d) ...

is just one. I see no such benefit in the proposed Path type.

Path and dirEntry are different modules with different goals to fulfill. I don't think it's appropriate to compare a module whose function is path manipulation with one whose is querying filesystem information.

Yeah, my fault. I didn't take the time to look at the proposed module in detail. R
Jun 07 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 6 June 2013 at 19:29:08 UTC, Jonathan M Davis wrote:
On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for
example,
conversion
to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

LOL. Well, given that strings are _already_ ranges, that wouldn't help it anywhere near as much as it does with other cases of code breakage, since std.path is already quite range-ready. - Jonathan M Davis

Something I wanted to add: I think using string as the main form of representation for a path is fine. However, there are times where it is convenient to be able to explode a path into a structure, where each part is clearly separate from the next. This makes certain operations that are otherwise hard to do easy. E.g., change:

C:\Users\Monarch\Docs\MyFile.txt

to

D:\Users\Monarch\MyFile.txt

Regexes are fun and all, but they do come with their own complications and pitfalls. And they *do* require effort to write. Or use the existing interface. It works, and I won't argue against it, but I do find times where it is kind of clunky.

I'd be in favor of having a "Path" object, if only for being able to help in the construction or modification of string paths. For example, I imagine something like:

    string oldPath = r"C:\Users\Monarch\Docs\MyFile.txt";
    Path myPath = Path(oldPath);
    myPath.drive = 'D';
    myPath.folders = myPath.folders[0 .. $ - 1];
    string newPath = myPath.build;

I think it would be useful to have that. None of the existing interfaces change. It's just an optional tool that I think would be convenient.

--------

If I may present an analogy: C deals with "time" using the arithmetic "time_t" primitive. It works, is mostly convenient, and is the standard API. Still, C also provides "struct tm", which is a time exploded into year/month/day/hours/min/sec. You can do nothing with this type except, well, read and write to it, and convert it back to/from time_t. Yet it has its uses, if only because it presents a time in a way that might be more natural to manipulate. And that is reason enough for its existence.
Jun 07 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a
path is fine.

However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate
from the
next.

    Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension")
    parsePath(string path);

    string buildPath(string drive, string[] folders, string basename, string extension);

Andrei

Yeah. That's more or less what I was describing, except "buildPath" would take your (unnamed) tuple type directly. There'd also be a "filename" member/UFCS function in there for convenience. I think that would be a small but useful addition to std.path.
Jun 07 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 7 June 2013 at 18:26:42 UTC, Andrei Alexandrescu wrote:
On 6/7/13 2:10 PM, monarch_dodra wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu
wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for
a path is
fine.

However, there are times where it is convenient to be able
to explode a
path into a structure, where each part is clearly separate
from the
next.

    Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension")
    parsePath(string path);

    string buildPath(string drive, string[] folders, string basename, string extension);

Andrei

Yeah. That's pretty much more or less what I was describing. Except "buildPath" would take your (unnamed) tuple type directly.

No, the version I wrote is more flexible. You get to pass separate arguments to it or just pass a tuple with .expand. buildPath(parsePath("/bin/sh").expand) should rebuild "/bin/sh".
There'd also be a "filename" member/UFCS function in there for
convenience.

I think that would be a small, but useful, addition to
std.path.

Me 2. Andrei

An overload for buildPath that took the tuple directly would be good. Typing expand all the time would get tiresome if you were doing lots of this.
Jun 07 2013
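A sketch of the parsePath/buildPath pair discussed above, using the tuple layout Andrei proposed. The inverse is named joinParts here (a hypothetical name) to avoid clashing with std.path.buildPath, and it gets both forms: separate arguments (usable with .expand) and the tuple-taking overload John suggests. "basename" is taken to mean the file name without its extension. Known simplification: a leading "./" is not preserved.

```d
import std.array : array;
import std.path : baseName, buildPath, dirName, driveName, extension,
    pathSplitter, stripDrive, stripExtension;
import std.typecons : Tuple;

/// Tuple layout as proposed in the thread.
alias PathParts = Tuple!(string, "drive", string[], "folders",
                         string, "basename", string, "extension");

/// Hypothetical parsePath: explode a path into its parts.
PathParts parsePath(string path)
{
    auto rest = stripDrive(path);    // drive is only non-empty on Windows
    auto dir = dirName(rest);        // "." means "no directory part"
    string[] folders = dir == "." ? null : pathSplitter(dir).array;
    auto base = baseName(rest);
    return PathParts(driveName(path), folders,
                     stripExtension(base), extension(base));
}

/// Hypothetical inverse, taking the parts as separate arguments...
string joinParts(string drive, string[] folders,
                 string basename, string extension)
{
    return drive ~ buildPath(folders ~ (basename ~ extension));
}

/// ...and an overload taking the tuple directly.
string joinParts(PathParts parts)
{
    return joinParts(parts.expand);
}
```

On Windows, monarch_dodra's example then becomes: parse, set drive to "D:", drop the last folder, and call joinParts.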
On 6/6/13 1:02 PM, Michel Fortin wrote:
and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS.

Careful: while HFS+ can be case-sensitive, it's not by default. Nor is it recommended, due to the number of OS X applications that just aren't designed with that in mind.
Jun 07 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a
path is fine.

However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate
from the
next.

    Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension")
    parsePath(string path);

    string buildPath(string drive, string[] folders, string basename, string extension);

This is a good idea. Not only is it convenient, but as there is a lot of overlap in the work done by the various path decomposition functions, it will also improve performance when you need the results of several of them.

But why stop at the parts you have listed there? Why not offer every possible decomposition the user could ever want? It's about the same amount of work, because the number of "split points" you need to find is exactly the same. Splitting the directory part into separate segments should be optional, since it allocates.

    DecomposedPath!(inout(C)) decompose(inout(C)[] path, bool splitDir = true);

    struct DecomposedPath(C) if (isSomeChar!C)
    {
        C[] driveName;   /// Equal to driveName()
        C[] dirName;     /// Equal to dirName()
        C[] noDriveDir;  /// Equal to dirName().stripDrive()
        C[] rootName;    /// Equal to rootName()
        C[] baseName;    /// Equal to baseName()
        C[] stem;        /// Equal to baseName().stripExtension()
        C[] extension;   /// Equal to extension()

        /// Equal to dirName().pathSplitter().array() (optional)
        C[][] dirSegments;
    }
Jun 08 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad
wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu
wrote:
However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate
from the
next.

    Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension")
    parsePath(string path);

    string buildPath(string drive, string[] folders, string basename, string extension);

[...] But why stop at the parts you have listed there?

The moment I clicked "Send", I realised that offering multiple decompositions would prevent recomposition, because you'd have to choose which parts to combine.
Jun 08 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 8 June 2013 at 14:14:33 UTC, Lars T. Kyllingstad
wrote:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad
wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu
wrote:
However, there are times where it is convenient to be able
to explode a
path into a structure, where each part is clearly separate
from the
next.

    Tuple!(string, "drive", string[], "folders", string, "basename", string, "extension")
    parsePath(string path);

    string buildPath(string drive, string[] folders, string basename, string extension);

[...] But why stop at the parts you have listed there?

The moment I clicked "Send", I realised that offering multiple decompositions would prevent recomposition, because you'd have to choose which parts to combine.

Using D's property functions, this should not actually be a problem. The struct could be opaque with regard to which members are actually attributes and which are functions. E.g.:

    Path path = Path(r"C:\MyFile.txt");
    path.filename = "main.cpp";
    path.extension = "d";
    assert(path.buildPath() == r"C:\main.d");

I don't see any reason for that to not work.
Jun 08 2013
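A minimal sketch of the property-based approach, under the assumption that the struct stores only the full path string and computes everything else on demand. EditablePath is a hypothetical name, not the Path type from the pull request. Because filename and extension are property functions, callers cannot tell stored members from computed ones, which sidesteps the recomposition problem Lars raised.

```d
import std.path : baseName, buildPath, dirName, setExtension,
    pathExtension = extension;

/// Hypothetical Path-like struct: only `value` is stored; `filename`
/// and `extension` are computed from it through property functions.
struct EditablePath
{
    string value;

    @property string filename() { return baseName(value); }

    /// Replace the last segment, keeping the directory part.
    @property void filename(string name)
    {
        auto dir = dirName(value);
        // dirName returns "." when there is no directory part; don't add "./".
        value = dir == "." ? name : buildPath(dir, name);
    }

    @property string extension() { return pathExtension(value); }

    /// Replace (or add) the extension; the leading '.' is optional.
    @property void extension(string ext) { value = setExtension(value, ext); }
}
```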