digitalmars.D - Path as an object in std.path

"Dylan Knutson" <tcdknutson gmail.com> writes:
Hello,
I'd like to open up the idea of Path being an object in std.path.
I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333) that
adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed.
However, I can't find that discussion, and I think that
including an OO way to deal with paths would be a serious
gain for the standard library.

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path strings.
* Path provides a type safe way to pass, compare, and manipulate
arbitrary path strings.
* It wraps over the functions defined in std.path, so behavior of
methods on Path are, in most cases, identical to their
corresponding module function.

I'd hate to see this commit closed due to a discussion that
happened at a different point in D's development when the
language had different needs.

Thank you.

Jun 04 2013
"Joshua Niehus" <jm.niehus gmail.com> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

personally, I prefer the current implementation and found it easy
to use for the multitudes of tiny scripts I've written.  I
wouldn't like to create an "object" just to call isAbsolute.

That being said, I don't see why having the struct would hurt.

Nice work by the way

Jun 05 2013
Jacob Carlborg <doob me.com> writes:
On 2013-06-05 09:11, Joshua Niehus wrote:

personally, I prefer the current implementation and found it easy to use
for the multitudes of tiny scripts I've written.  I wouldn't like to
create an "object" just to call isAbsolute.

I agree. But if you're passing around a lot of paths it would probably
be a good idea to have a proper type for the paths.

That being said, I don't see why having the struct would hurt.

--
/Jacob Carlborg

Jun 05 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

personally, I prefer the current implementation and found it easy to
use for the multitudes of tiny scripts I've written. I wouldn't like
to create an "object" just to call isAbsolute.

That being said, I don't see why having the struct would hurt.

Nice work by the way

Is there any reason why we couldn't keep the string-based free functions
around as well?

I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages. But I
would be opposed to having both.

Andrei

Jun 05 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

personally, I prefer the current implementation and found it
easy to use for the multitudes of tiny scripts I've written.  I
wouldn't like to create an "object" just to call isAbsolute.

That being said, I don't see why having the struct would hurt.

Nice work by the way

Is there any reason why we couldn't keep the string-based free
functions around as well?

Jun 05 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/13 2:27 AM, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in std.path. I've
submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
Path struct to std.path, "which exposes a much more palatable interface
to path string manipulation".

Great, thanks for this work. I agree that the proposal deserves a fair
shake.

Andrei

Jun 05 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 5 June 2013 at 13:26:39 UTC, Andrei Alexandrescu
wrote:
On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson
wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

personally, I prefer the current implementation and found it
easy to
use for the multitudes of tiny scripts I've written. I
wouldn't like
to create an "object" just to call isAbsolute.

That being said, I don't see why having the struct would hurt.

Nice work by the way

Is there any reason why we couldn't keep the string-based free
functions
around as well?

I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages.
But I would be opposed to having both.

Andrei

Because of duplication of implementation? Or is it simply "2 ways
to do the same thing" is bad?

I was imagining the following situation:

Free functions, similar/identical to current

Struct that provides all current functionality by wrapping
the free functions, plus any extra stuff that is only appropriate
for a path object.

Unfortunately the current naming scheme doesn't really suit this
idea that well.

Jun 05 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 14:26:39 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

personally, I prefer the current implementation and found it easy to
use for the multitudes of tiny scripts I've written. I wouldn't like
to create an "object" just to call isAbsolute.

That being said, I don't see why having the struct would hurt.

Nice work by the way

Is there any reason why we couldn't keep the string-based free functions
around as well?

I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages. But I
would be opposed to having both.

C# has both:
1. System.IO.FileInfo and System.IO.DirectoryInfo non-static/instance
classes with methods, e.g. Delete()
2. System.IO.File and System.IO.Directory static classes with methods
accepting strings, e.g. Delete(string name)

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jun 05 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Wed, 05 Jun 2013 15:12:22 +0100, Regan Heath <regan netmail.co.nz>
wrote:

On Wed, 05 Jun 2013 14:26:39 +0100, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On 6/5/13 7:33 AM, John Colvin wrote:
On Wednesday, 5 June 2013 at 07:11:49 UTC, Joshua Niehus wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
"which exposes a much more palatable interface to path string
manipulation".
[...snip...]

personally, I prefer the current implementation and found it easy to
use for the multitudes of tiny scripts I've written. I wouldn't like
to create an "object" just to call isAbsolute.

That being said, I don't see why having the struct would hurt.

Nice work by the way

Is there any reason why we couldn't keep the string-based free
functions
around as well?

I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages. But I
would be opposed to having both.

C# has both:
1. System.IO.FileInfo and System.IO.DirectoryInfo non-static/instance
classes with methods, e.g. Delete()
2. System.IO.File and System.IO.Directory static classes with methods
accepting strings, e.g. Delete(string name)

I forgot to say.. I've used both in different situations.  Sometimes you
get a FileInfo/DirectoryInfo from another method, or you have created one
because you're going to re-use the path/information a lot (to get file
attributes etc) and sometimes you just need to build a path using
Path.Combine (into a string) and delete it, or similar.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages.
But I would be opposed to having both.

Personally, I'd prefer to just use the Path struct in std.path.
While a Path can be represented as a string, it's not the same
concept (the same way that a DateTime can be represented as an
integer, but we don't just pass times around as raw integer
types).

However, I can't imagine that'd be feasible as it'd break a lot
of code. My suggestion would be to just keep the freestanding
functions to operate on raw strings, and then migrate over code
that depends on std.path to use the Path struct as needed. I
realize that this is easier said than done, but even then it
shouldn't be a lot of work as Path can implicitly be converted to
a string.

This makes its integration into existing codebases/Phobos
literally as easy as using "Path my_path = `foo\bar`" instead of
"string my_path = `foo\bar`". You lose no functionality but gain
type safety if you have to do any future refactoring.

I wouldn't like to create an "object" just to call isAbsolute.

Agreed. The best course of action would probably be keep the raw
functions as they exist (think of them as the static versions of
Path methods). However, for large applications, the type safety
of a Path object makes working with paths much easier.

Jun 05 2013
Timothee Cour <thelastmammoth gmail.com> writes:

Currently there's no way to perform cross-platform operations.

---
enum Platform { Posix, Windows }
version (Posix) enum PlatformDefault = Platform.Posix;
else enum PlatformDefault = Platform.Windows;
// Sketch only: the struct body and the conversion are not implemented.
struct Path(Platform platform = PlatformDefault) {}

void main()
{
    Path!(Platform.Posix) path = "a\b";
    auto path2 = path.to!Path;
}
---

it allows current usage with no modification, and allows cross-platform
logic.
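
A platform-tagged path type along these lines might look roughly like the sketch below. It is only an illustration under stated assumptions: `PlatformPath`, its `value` field, and the `to` method are hypothetical names (not the API from the pull request), and the conversion merely swaps separators, ignoring drive letters, UNC paths, and escaping.

```d
import std.array : replace;

enum Platform { posix, windows }

// Hypothetical path type tagged with the platform whose syntax it uses.
struct PlatformPath(Platform platform)
{
    string value;

    // Directory separator for this platform's path syntax.
    enum string separator = (platform == Platform.windows) ? `\` : "/";

    // Naive conversion: rewrites separators and nothing else.
    PlatformPath!target to(Platform target)()
    {
        return PlatformPath!target(
            value.replace(separator, PlatformPath!target.separator));
    }
}

unittest
{
    // A Windows-style path can be handled on any host platform.
    auto win = PlatformPath!(Platform.windows)(`a\b\c`);
    auto posix = win.to!(Platform.posix)();
    assert(posix.value == "a/b/c");
}
```

Because the platform is a template parameter, mixing a Windows path with a Posix path by accident becomes a compile-time type error rather than a runtime surprise.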

On Wed, Jun 5, 2013 at 1:19 PM, Dylan Knutson <tcdknutson gmail.com> wrote:

I don't have a strong opinion regarding Path object vs. string functions,
and I agree both have advantages and disadvantages. But I would be opposed
to having both.

Personally, I'd prefer to just use the Path struct in std.path. While a
Path can be represented as a string, it's not the same concept (the same
way that a DateTime can be represented as an integer, but we don't just
pass times around as raw integer types).

However, I can't imagine that'd be feasible as it'd break a lot of code.
My suggestion would be to just keep the freestanding functions to operate
on raw strings, and then migrate over code that depends on std.path to use
the Path struct as needed. I realize that this is easier said than done,
but even then it shouldn't be a lot of work as Path can implicitly be
converted to a string.

This makes its integration into existing codebases/Phobos literally as
easy as using "Path my_path = `foo\bar`" instead of "string my_path =
`foo\bar`". You lose no functionality but gain type safety if you have to
do any future refactoring.

I wouldn't like to create an "object" just to call isAbsolute.

Agreed. The best course of action would probably be keep the raw functions
as they exist (think of them as the static versions of Path methods).
However, for large applications, the type safety of a Path object makes
working with paths much easier.


Jun 05 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, June 05, 2013 22:19:21 Dylan Knutson wrote:
I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages.
But I would be opposed to having both.

Personally, I'd prefer to just use the Path struct in std.path.
While a Path can be represented as a string, it's not the same
concept (the same way that a DateTime can be represented as an
integer, but we don't just pass times around as raw integer
types).

There's a significant difference between a type which has a value and units and
one which is basically just a string or array of strings wrapped by another
type. Not that a Path struct is without value, but I think that there's a very
large difference in the amount of value that the two provide. AFAIK, very few
bugs are caused by treating paths as strings, but there are a lot of
time-related bugs out there caused by using naked values instead of values
with units.

This makes its integration into existing codebases/Phobos
literally as easy as
[snip]

See, this is exactly the sort of thing I'm afraid of. I don't want to have to
have arguments over whether a particular function should accept a path as a
string or a struct. Right now, we have one way to do it, so it's clear, and it
works just fine. If we add a Path struct, then we have two ways to do the same
thing, and we're going to have a division among APIs as to which way to handle
paths. And I think that we'll be very much worse off because of it. While there
is value in having a path struct rather than a string, I don't think that it's
worth the extra confusion and division that it'll cause. If we were going to
have a path struct, we should have done that in the first place rather than
using strings.

- Jonathan M Davis

Jun 05 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Wed, Jun 5, 2013 at 1:34 PM, Jonathan M Davis <jmdavisProg gmx.com> wrote:

On Wednesday, June 05, 2013 22:19:21 Dylan Knutson wrote:
I don't have a strong opinion regarding Path object vs. string
functions, and I agree both have advantages and disadvantages.
But I would be opposed to having both.

Personally, I'd prefer to just use the Path struct in std.path.
While a Path can be represented as a string, it's not the same
concept (the same way that a DateTime can be represented as an
integer, but we don't just pass times around as raw integer
types).

There's a significant difference between a type which has a value and
units and one which is basically just a string or array of strings wrapped
by another type. Not that a Path struct is without value, but I think that
there's a very large difference in the amount of value that the two
provide. AFAIK, very few bugs are caused by treating paths as strings,

I disagree.

It allows catching bugs early (e.g. giving a $mypath environment variable
to a binary, where the env variable wasn't set or was set to an invalid
path name). Constructing a Path object from it will fail immediately, as
opposed to later down the code with possibly evil artifacts (e.g. when
using std.process.shell functions).

Other advantage: a central location for all path object creations allows
instrumenting the code, for example for logging all path names mentioned.
That would be impossible with a raw string type.

Other advantage: it makes it easy to work with cross-platform code (i.e.
operating on Windows paths from Posix), see my previous post in this
thread.

I very much welcome this. There's a reason why other languages (C#, Java)
have such an abstraction. Given D's alias this functionality, this
abstraction comes at 0 runtime cost and makes it work with 0 adaptation
for most existing code.

What will it break? We should discuss that.

but there are a lot of time-related bugs out there caused by using naked
values instead of values with units.

This makes its integration into existing codebases/Phobos
literally as easy as
[snip]

See, this is exactly the sort of thing I'm afraid of. I don't want to have
to have arguments over whether a particular function should accept a path
as a string or a struct. Right now, we have one way to do it, so it's
clear, and it works just fine. If we add a Path struct, then we have two
ways to do the same thing, and we're going to have a division among APIs
as to which way to handle paths. And I think that we'll be very much worse
off because of it. While there is value in having a path struct rather
than a string, I don't think that it's worth the extra confusion and
division that it'll cause. If we were going to have a path struct, we
should have done that in the first place rather than using strings.

- Jonathan M Davis

Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
There's a significant difference between a type which has a
value and units and one which is basically just a string or
array of strings wrapped by another type. Not that a Path
struct is without value, but I think that there's a very large
difference in the amount of value that the two provide. AFAIK,
very few bugs are caused by treating paths as strings, but
there are a lot of time-related bugs out there caused by using
naked values instead of values with units.

Dub is forced to define its own separate Path type because, as
its author states, using a string to represent a path "more often
than not results in hidden bugs."
(https://github.com/rejectedsoftware/dub/issues/79). Representing
a path is just fine in a small script, but the moment you've got
to handle stuff like path comparison, building, and general
manipulation, you're better off defining an abstraction for it.

See, this is exactly the sort of thing I'm afraid of. I don't
want to have to have arguments over whether a particular
function should accept a path as a string or a struct. Right
now, we have one way to do it, so it's clear, and it works
just fine.

I see no problem with just keeping Phobos as it is, it was just a
suggestion to make use of new functionality. A function that
takes a string can accept a Path *or* a string, and it'll work
just fine, thanks to subtyping.

void bar(Path path) { return; }
void foo(string str) { return; }

Path p = `baz\quixx`;

bar(p);
foo(p);

So there doesn't have to be an argument over what a function
should accept; that's up to the function's internal
implementation. From the outside, it'll accept both.
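
The subtyping described above relies on D's `alias this`. A minimal sketch, assuming a much simpler struct than the one in the pull request (the field name `value` and the normalizing constructor are illustrative choices, not the proposed API):

```d
import std.path : buildNormalizedPath;

// Hypothetical minimal wrapper; the struct in the pull request is richer.
struct Path
{
    string value;

    // Normalize once, at construction time.
    this(string s) { value = buildNormalizedPath(s); }

    // Subtyping: a Path can be used wherever a string is expected.
    alias value this;
}

size_t foo(string str) { return str.length; }        // string-based API
size_t bar(Path path) { return path.value.length; }  // Path-based API

unittest
{
    Path p = "baz/quixx/";     // the trailing separator is normalized away
    assert(foo(p) == bar(p));  // both calls compile
    string s = p;              // implicit conversion via alias this
    assert(s.length == 9);
}
```

Note that the conversion is one-directional: `foo(p)` compiles, but passing a plain string literal to `bar` would not, since D does not construct a struct implicitly for a function argument.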

Jun 05 2013
"Vladimir Panteleev" writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

For the record, there are some existing D path object
implementations:

* Tango's FilePath class:

https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d

* Vibe's Path struct:

https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/inet/path.d

Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Wednesday, 5 June 2013 at 22:06:52 UTC, Vladimir Panteleev
wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

For the record, there are some existing D path object
implementations:

* Tango's FilePath class:

https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d

* Vibe's Path struct:

https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/inet/path.d

The design of Path was prompted by Dub's own internal path
module, might I add. And if anything, this just goes to show that
a Path object indeed does have its use cases.

Jun 05 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
I'd like to point out some of the pitfalls of using a raw string
as a representation of a path, too.

You've got to manually normalize strings before any comparison is
done. Even a single directory delimiter at the end of the string
means that the paths won't compare correctly. Normalizing takes a
good amount of extra code, and you've got to remember to
normalize *everywhere*, or you've got a bug waiting to happen.
string s1 = `baz/../foo/bar/`;
string s2 = `foo/bar/`;
string s3 = `foo/bar`;

assert(s1 == s2); // Fails
assert(s2 == s3); // Fails
assert(s1 == s3); // Fails
assert(buildNormalizedPath(s1) == buildNormalizedPath(s2)); // Passes, with many more keystrokes.

Comparing with Paths:
Path p1 = `baz/../foo/bar/`;
Path p2 = `foo/bar/`;
Path p3 = `foo/bar`;

assert(p1 == p2); // Passes.
assert(p2 == p3); // Passes.
assert(p1 == p3); // Passes.

As you can see, Path is just generally easier to work with,
because it encapsulates the concept of a path. There's no having to
normalize strings, because that's done for you. It just works.
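
One way such normalizing equality could be implemented is sketched below. This is not the code from the pull request: `NormalizedPath` and its `raw` field are hypothetical, and the comparison simply delegates to `std.path.buildNormalizedPath`, which allocates a new string on every call.

```d
import std.path : buildNormalizedPath;

// Hypothetical wrapper whose equality ignores redundant path components.
struct NormalizedPath
{
    string raw;

    // Two paths are equal when their normalized forms are equal.
    bool opEquals(const NormalizedPath other) const
    {
        return normalized(raw) == normalized(other.raw);
    }

    private static string normalized(string s)
    {
        return buildNormalizedPath(s);
    }
}

unittest
{
    auto p1 = NormalizedPath("baz/../foo/bar/");
    auto p2 = NormalizedPath("foo/bar/");
    auto p3 = NormalizedPath("foo/bar");
    assert(p1 == p2);
    assert(p2 == p3);
    assert(p1 == p3);
}
```

A real implementation would probably also define a `toHash` consistent with `opEquals`, so that equal paths hash equally when used as associative array keys.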

Building a path with strings isn't difficult, but the function
calls are unwieldy.
string s1 = buildNormalizedPath(`foo`, `bar`);
string s2 = buildNormalizedPath(s1, `baz`);
assert(s2 == `foo/bar/baz`); // Will fail on some platforms.

Building a Path, IMO, just looks cleaner, and it's obvious what
you're doing.
Path p1 = Path(`foo`, `bar`);
Path p2 = p1.join(`baz`);
assert(p2 == `foo/bar/baz`); // Passes on all platforms.
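
The string comparison fails on some platforms because `buildNormalizedPath` joins segments with the host's directory separator. A small sketch of a portable check using only `std.path` and `std.array` (the helper name `expectedPath` is made up for this example):

```d
import std.array : join;
import std.path : buildNormalizedPath, dirSeparator;

// Hypothetical helper: spell the expected result with the host's separator.
string expectedPath(string[] segments...)
{
    return segments.join(dirSeparator);
}

unittest
{
    string s = buildNormalizedPath("foo", "bar", "baz");
    assert(s == expectedPath("foo", "bar", "baz")); // holds on Posix and Windows
    version (Posix)   assert(s == "foo/bar/baz");
    version (Windows) assert(s == `foo\bar\baz`);
}
```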

As a sidenote, I'd like to point out that using Path has *no more
overhead* than passing around and manipulating a raw string.
As far as I can tell, all use cases for Path take less code, and
more easily convey what you're doing. D's support for object
oriented design is great; why not make use of it?

Jun 05 2013
"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 5 June 2013 at 20:52:24 UTC, Dylan Knutson wrote:
Dub is forced to define its own separate Path type because, as
its author states, using a string to represent a path "more
often than not results in hidden bugs."

You're misquoting here. "usually will be places where the path
is modified using string operations [...]"

While I've had desires to have my functions accept a Path so that
I can document what is being accepted (also helps with function
overloads), std.path has been working well for me as I move my
code from string operations to path operations.

Jun 05 2013
"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 5 June 2013 at 22:06:52 UTC, Vladimir Panteleev
wrote:
* Tango's FilePath class:

https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/FilePath.d

Note that Tango code should not be used for code intended for
Phobos unless all authors of that piece have stated they will
license under Boost. It is a firm stance to prevent any potential
legal issues (whether perceived or real)

Jun 05 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

Since I am the designer and primary author of std.path, I should
probably say something.

When I first started working on "the new std.path" a couple of
years ago, I initially entertained the idea of writing it in
terms of a dedicated Path type.  I was quickly convinced
otherwise by others, and proceeded to design the module around
normal strings.

For the last two years I've been working more in C++ than in D
(by necessity, not by desire), and for all my path-manipulation
needs I've been using boost::filesystem.  This library has a
dedicated path type, so I've gained some experience with this
kind of API.  And I am *really* happy we went with the string
solution for std.path.

Paths are usually obtained in string form, and they are normally
passed to other functions and third party libraries in string
form.  Having to convert them to something else just to do what
is, in fact, string manipulations, is just annoying.

(One of my biggest gripes with boost::filesystem is that
conversions between path and string necessitate a copy, which is
not a problem with your Path type, so in that respect it is
better than Boost's solution.)

[...]

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.

How is this more platform independent?  It is just a simple
wrapper around a string, with methods that forward to the
std.path functions.

* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.

How is it safer?  I would agree with this if it verified that
isValidPath(_path) on construction and maintained this as an
invariant, but I cannot see that it does.
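
For illustration, a type that does check validity on construction and keeps it as an invariant could look something like the following. This is a sketch under assumptions, not the struct from the pull request: `CheckedPath` and its members are hypothetical names, and only `std.path.isValidPath` and `std.exception.enforce` come from Phobos.

```d
import std.exception : enforce;
import std.path : isValidPath;

// Hypothetical path type that refuses to hold an invalid path.
struct CheckedPath
{
    private string _value;

    this(string s)
    {
        // Fail at the point of construction rather than at first use.
        enforce(isValidPath(s), "not a valid path: " ~ s);
        _value = s;
    }

    // Checked around public member calls in non-release builds.
    invariant()
    {
        assert(_value.length == 0 || isValidPath(_value));
    }

    string value() const { return _value; }
}

unittest
{
    import std.exception : assertThrown;

    assert(CheckedPath("foo/bar.txt").value == "foo/bar.txt");
    assertThrown(CheckedPath("foo\0bar")); // embedded null is never valid
}
```

The empty-string case is allowed in the invariant so that a default-initialized `CheckedPath` does not trip it.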

* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

Then what is the added value?

Having Path together with normal string functions in the same
module will be confusing (there are two almost-equal ways of
doing the same thing; which one should I choose?), and it will
add code duplication (now my code has to accept paths both as
strings and as Paths).

As the author of std.path this may come across as hostile or
jealous, but I don't see that the proposed change improves
anything.

Lars

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 8:57 AM, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
FWIW, having Path be an object adds consistency with the rest of Phobos, which
has many entities which could be expressed as primitives, expressed as
objects. To name a few, DateTime is an object, File is an object, and DirEntry
is an object. Yes, they could be described as integers, or a pointer, or a
string, but it's less cognitive load on the developer to recognize them as
separate types.

"Reducing cognitive load" is not the main reason these are objects.  DateTime
lumps together no less than six integers. File adds automatic resource
management via reference counting. DirEntry caches file information to avoid
repeated filesystem lookups.  And so on.

It's hard to see what value there is in a type that is simply a wrapper around
an existing type, and which provides implicit conversions to/from that existing
type so that they can be intermixed arbitrarily.

At the end, that's nothing more than:

alias string Path;

Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:41 PM, Walter Bright wrote:
On 6/6/2013 8:57 AM, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
FWIW, having Path be an object adds consistency with the rest of Phobos,
which has many entities which could be expressed as primitives, expressed
as objects. To name a few, DateTime is an object, File is an object, and
DirEntry is an object. Yes, they could be described as integers, or a
pointer, or a string, but it's less cognitive load on the developer to
recognize them as separate types.

"Reducing cognitive load" is not the main reason these are objects.
DateTime lumps together no less than six integers. File adds automatic
resource management via reference counting. DirEntry caches file
information to avoid repeated filesystem lookups. And so on.

It's hard to see what value there is in a type that is simply a wrapper
around an existing type, and which provides implicit conversions
to/from that existing type so that they can be intermixed arbitrarily.

At the end, that's nothing more than:

alias string Path;

No, you get to check the conversions going one way.

If you destroy, destroy in style. This is a wrong argument.
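
To make the "one way" point concrete, here is a minimal sketch. It assumes a hypothetical thin wrapper with alias this; it is not the struct from the pull request, and countChars and looksAbsolute are made-up names for illustration. The wrapper converts to string implicitly, but a bare string is not accepted where a Path parameter is expected, which a plain alias cannot enforce.

```d
// Hypothetical thin wrapper, for illustration only.
struct Path
{
    string _path;
    alias _path this;            // Path -> string converts implicitly

    this(string s) { _path = s; }
}

// Stands in for an existing string-based API.
size_t countChars(string s) { return s.length; }

// Stands in for a new API that insists on a Path.
// length and indexing are forwarded to the string via alias this.
bool looksAbsolute(Path p) { return p.length > 0 && p[0] == '/'; }

unittest
{
    auto p = Path("/etc/hosts");

    // Path -> string: no conversion call needed.
    assert(countChars(p) == p._path.length);

    // string -> Path: must go through the constructor.
    assert(looksAbsolute(Path("/etc/hosts")));
    static assert(!__traits(compiles, looksAbsolute("/etc/hosts")));
}
```

With `alias string Path;` both directions would be implicit, so the static assert above would fail.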

Andrei

Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

Interesting. "Show me the code!"

Andrei

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:47 AM, Andrei Alexandrescu wrote:
On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

Interesting. "Show me the code!"

Not necessary - it is trivially obvious to the most casual observer!

(You just use the same logic that normalizes the path to do the comparison.)
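
A partial sketch of what such a comparison might look like, under a loud assumption: it only ignores "." components and repeated separators, and does NOT resolve "..", which would need extra bookkeeping. samePathIgnoringDots is a hypothetical name; pathSplitter, filter and equal are existing Phobos functions, and all three work lazily on slices of the input strings.

```d
import std.algorithm : equal, filter;
import std.path : pathSplitter;

/// True if a and b have the same components once "." entries and
/// repeated separators are ignored. ".." is NOT resolved, and the
/// file system is never consulted.
bool samePathIgnoringDots(string a, string b)
{
    // pathSplitter yields slices of the input one component at a time;
    // filter and equal consume them lazily, so no normalized copy of
    // either path is built.
    return equal(pathSplitter(a).filter!(c => c != "."),
                 pathSplitter(b).filter!(c => c != "."));
}

unittest
{
    assert(samePathIgnoringDots("./bar", "bar"));
    assert(!samePathIgnoringDots("foo/bar", "bar"));
}
```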

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:14 AM, Dylan Knutson wrote:
It doesn't do any allocations that the user won't have to do anyways. Paths have
to be normalized before comparison; not doing so isn't correct behavior. Eg, the
strings foo/../bar != bar, yet they're equivalent paths. Path encapsulates
the behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

I believe it is a mistake to try and automatically hide the difference between
./bar and bar. Paths being == and 'referring to the same file' are different
things.

For performance reasons, also, I'd want to normalize sometime after building the
entire path, I wouldn't want to normalize at each step. Normalization should be
an explicit step, not implicit.
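
As a small sketch of that workflow with the existing string functions: join with buildPath while assembling, then call buildNormalizedPath once at the end. outputFile and its parameters are hypothetical names used only for illustration.

```d
import std.path : buildNormalizedPath, buildPath;

/// Assemble a path in several steps and normalize exactly once.
string outputFile(string root, string sub, string name)
{
    // Plain joins while assembling: "." and ".." are left alone here.
    string p = buildPath(root, sub);
    p = buildPath(p, name);

    // One explicit normalization at the end.
    return buildNormalizedPath(p);
}

unittest
{
    // Compared against buildPath so the check holds with either
    // directory separator.
    assert(outputFile("build", "../out", "./a.o") == buildPath("out", "a.o"));
}
```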

Jun 06 2013
"Flamaros" <flamaros.xavier gmail.com> writes:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much more
palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed.
However, I can't find that discussion, and I think that the
benefits of including an OO way to deal with paths is a serious
gain for the standard library.

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.
* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.
* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

I'd hate to see this commit closed due to a discussion that
happened at a different point in D's development when the
language had different needs.

Thank you.

I like the idea of manipulating paths through an object. An API
that takes a Path as a parameter is better typed than one that
takes a string. It's really useful for file loaders: it affirms
that the method will do path-related operations and expects a
particular string format.

Some methods seem to be missing, like completeBaseName and completeSuffix.
You can take a look at the Qt API:
http://qt-project.org/doc/qt-4.8/qfileinfo.html

The bad thing about the Qt API is that we can't know which methods do
a file system access; that's why I prefer having 2 separate objects.

It would be good to have the FileInfo object.

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

[...]

Let me add some more to this.  To justify the addition of such a
type, it needs to pull its own weight.  For added value, it could
do one or both of the following:

1. Maintain an isValidPath() invariant, for early error
detection.  (On POSIX, this is rather trivial, as any string that
does not contain a null character is in principle a valid path,
but on Windows, the situation is different.)

2. Add in-place versions of path modifiers (setExtension,
setDrive, etc.), for improved performance.

One solution would be for Path to be a trivial string wrapper
which does (1) and not (2).  In this case, it is justified to
have Path *in addition to* the existing functions.

Another solution would be for Path to do (2), possibly in
addition to (1).  However, in this case it should be a
*replacement* for the existing functions, and not an addition.
Otherwise, we have two almost-equal ways of doing the same thing,
which should be avoided.  (I am not advocating this, however, as
it will massively break user code all over again.)
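
Option (1) could be sketched roughly as follows. This is only an illustration under stated assumptions: CheckedPath and str are hypothetical names, not part of the pull request, and the check runs only in the constructor, so CheckedPath.init (an empty string, which isValidPath rejects) still slips past it.

```d
import std.exception : enforce;
import std.path : isValidPath;

/// Hypothetical string wrapper that rejects invalid paths on construction.
struct CheckedPath
{
    private string _path;

    this(string s)
    {
        // Early error detection: throws an Exception for an invalid path.
        enforce(isValidPath(s), "invalid path");
        _path = s;
    }

    /// Read-only access to the underlying string.
    @property string str() const { return _path; }
}

unittest
{
    assert(CheckedPath("foo/bar.txt").str == "foo/bar.txt");
}
```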

Lars

Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:
Paths are usually obtained in string form, and they are normally passed
to other functions and third party libraries in string form.  Having to
convert them to something else just to do what is, in fact, string
manipulations, is just annoying.

Agree 100%.

C# has Path.Combine which builds paths from strings, returning a string
and this is good.

It also has System.File and System.Directory static classes with static
methods taking string, also good.

But, C# also has System.IO.FileInfo and System.IO.DirectoryInfo which are
constructed from a string, and then have methods which mirror the static
methods from System.File plus a refresh method to update the cached file
attributes etc obtained from the file system.  I find these objects useful.

It would be nice for D to have similar objects, IMO.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in std.path. I've
submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333) that adds
a Path struct to std.path, "which exposes a much more palatable
interface to path string manipulation".

[...]

Let me add some more to this.  To justify the addition of such a type,
it needs to pull its own weight.  For added value, it could do one or
both of the following:

Does System.IO.DirectoryInfo:
http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx

vs just having System.IO.Directory:
http://msdn.microsoft.com/en-us/library/system.io.directory.aspx

R


Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 10:30:05 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:
Paths are usually obtained in string form, and they are
normally passed to other functions and third party libraries
in string form.  Having to convert them to something else just
to do what is, in fact, string manipulations, is just annoying.

Agree 100%.

C# has Path.Combine which builds paths from strings, returning
a string and this is good.

It also has System.File and System.Directory static classes
with static methods taking string, also good.

But, C# also has System.IO.FileInfo and System.IO.DirectoryInfo
which are constructed from a string, and then have methods
which mirror the static methods from System.File plus a refresh
method to update the cached file attributes etc obtained from
the file system.  I find these objects useful.

It would be nice for D to have similar objects, IMO.

It does have a similar type: std.file.DirEntry.
http://dlang.org/phobos/std_file.html#.DirEntry

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson
wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

[...]

Let me add some more to this.  To justify the addition of such
a type, it needs to pull its own weight.  For added value, it
could do one or both of the following:

Does System.IO.DirectoryInfo:
http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx

vs just having System.IO.Directory:
http://msdn.microsoft.com/en-us/library/system.io.directory.aspx

They add great value, but that is a completely different
discussion, as these are more similar to std.file.DirEntry.  The
added value is mainly in the performance benefits; for example,

if (exists(f) && isFile(f) && timeLastModified(f) < d) ...

requires three filesystem lookups (stat() calls), whereas

auto de = dirEntry(f);
if (de.exists && de.isFile && de.timeLastModified < d) ...

is just one.

I see no such benefit in the proposed Path type.

Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 11:43:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 10:30:05 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:05:51 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:
Paths are usually obtained in string form, and they are normally
passed to other functions and third party libraries in string form.
Having to convert them to something else just to do what is, in fact,
string manipulations, is just annoying.

Agree 100%.

C# has Path.Combine which builds paths from strings, returning a string
and this is good.

It also has System.File and System.Directory static classes with static
methods taking string, also good.

But, C# also has System.IO.FileInfo and System.IO.DirectoryInfo which
are constructed from a string, and then have methods which mirror the
static methods from System.File plus a refresh method to update the
cached file attributes etc obtained from the file system.  I find these
objects useful.

It would be nice for D to have similar objects, IMO.

It does have a similar type: std.file.DirEntry.
http://dlang.org/phobos/std_file.html#.DirEntry

Ahh.. excellent.  In that case, I don't think we want/need the Path being
proposed.

Side-note;  DirEntry is a very UNIX centric name - I only know that
because I have coded with it, I wonder what pure windows developers make
of it..

R


Jun 06 2013
"Flamaros" <flamaros.xavier gmail.com> writes:
On Thursday, 6 June 2013 at 07:26:53 UTC, Flamaros wrote:
On Wednesday, 5 June 2013 at 06:27:46 UTC, Dylan Knutson wrote:
Hello,
I'd like to open up the idea of Path being an object in
std.path. I've submitted a pull
(https://github.com/D-Programming-Language/phobos/pull/1333)
that adds a Path struct to std.path, "which exposes a much
more palatable interface to path string manipulation".

As jmdavis points out, this has previously been discussed.
However, I can't find that discussion, and I think that the
benefits of including an OO way to deal with paths is a
serious gain for the standard library.

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.
* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.
* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

I'd hate to see this commit closed due to a discussion that
happened at a different point in D's development when the
language had different needs.

Thank you.

I like the idea of manipulating paths through an object. An API
that takes a Path as a parameter is better typed than one that
takes a string. It's really useful for file loaders: it affirms
that the method will do path-related operations and expects a
particular string format.

Some methods seem to be missing, like completeBaseName and
completeSuffix.
You can take a look at the Qt API:
http://qt-project.org/doc/qt-4.8/qfileinfo.html

The bad thing about the Qt API is that we can't know which methods
do a file system access; that's why I prefer having 2 separate
objects.

It would be good to have the FileInfo object.

Having an object would also remove the need for repeated format
normalization; with a string as the parameter, the normalization
function always has to be called.

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
 Let me add some more to this.  To justify the addition of such
a type, it needs to pull its own weight.  For added value, it
could do one or both of the following:

1. Maintain an isValidPath() invariant, for early error
detection.  (On POSIX, this is rather trivial, as any string
that does not contain a null character is in principle a valid
path, but on Windows, the situation is different.)

That's a possibility.

2. Add in-place versions of path modifiers (setExtension,
setDrive, etc.), for improved performance.

I don't think that there'll be any performance improvements by
making in place modification functions. Considering under the
hood the path object is just a string, and that string's
reference needs to be changed with each modification, I don't see
how manipulation can be made faster.

One solution would be for Path to be a trivial string wrapper
which does (1) and not (2).  In this case, it is justified to
have Path *in addition to* the existing functions.

Another solution would be for Path to do (2), possibly in
addition to (1).  However, in this case it should be a
*replacement* for the existing functions, and not an addition.
Otherwise, we have two almost-equal ways of doing the same
thing, which should be avoided.  (I am not advocating this,
however, as it will massively break user code all over again.)

The more I think about it, the more partial I am to removing the
existing string methods in std.path. At most, using a Path object
increases number of characters typed by 6 (Path()). And even
then, chances are you'll be saving characters as method names can
be simplified to remove path from them: buildNormalizedPath ->
normalized, isValidPath -> isValid, etc. Even with user code
breaking, 1) D isn't exactly considered a stable language quite
yet; I'm sure that users expect code breakage with each new
release, and 2) it's trivial to convert code that uses the string
based API to the object based API.

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
Paths are usually obtained in string form, and they are
normally passed to other functions and third party libraries in
string form.  Having to convert them to something else just to
do what is, in fact, string manipulations, is just annoying.

Well, when designing Path, I didn't want to add much, if any,
programmer overhead. Conversion to a Path is trivial: Change the
type to Path, and 90% of the time it'll just work. The only case
that comes to mind where a string can't be implicitly
assigned/converted to a Path is when passing it to a function, in
which case all it needs to be wrapped in is Path(). Or, have an
overloaded version that takes a string (which all path using
functions do now anyways).

(One of my biggest gripes with boost::filesystem is that
conversions between path and string necessitate a copy, which
is not a problem with your Path type, so in that respect it is
better than Boost's solution.)

[...]

Why I think it should be reconsidered for inclusion in the std
(listed in the pull):
* Adds a (more) platform independent abstraction for path
strings.

How is this more platform independent?  It is just a simple
wrapper around a string, with methods that forward to

I should have said "makes it easier to be platform independent".
Normalization is done automatically on comparison. There's
nothing you can't do with normal std.path functions, but that's
not the point. It's to be type safe and add convenience.

* Path provides a type safe way to pass, compare, and
manipulate arbitrary path strings.

How is it safer?  I would agree with this if it verified that
isValidPath(_path) on construction and maintained this as an
invariant, but I cannot see that it does.

Type safe. Once you've got a huge program with many concepts
floating around, you don't want to have to keep track of which
strings are paths and which aren't, and you don't want to do all
the specifics like splitting, normalization, and joining with raw
string functions. This isn't just conjecture either; there are D
programs in the wild that abstract away path strings because it's
easier to deal with them that way.
I didn't want to force paths passed in to be valid, because the
programmer might want an invalid path passed around for whatever
reason.

* It wraps over the functions defined in std.path, so behavior
of methods on Path are, in most cases, identical to their
corresponding module function.

Then what is the added value?

See above. I didn't want to change functionality, just make it
easier to use.

As the author of std.path this may come across as hostile or
jealous, but I don't see that the proposed change improves
anything.

You came off as quite constructive; thank you :-)

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad
wrote:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad
wrote:
[...]

Let me add some more to this.  To justify the addition of
such a type, it needs to pull its own weight.  For added
value, it could do one or both of the following:

Does System.IO.DirectoryInfo:
http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx

vs just having System.IO.Directory:
http://msdn.microsoft.com/en-us/library/system.io.directory.aspx

They add great value, but that is a completely different
discussion, as these are more similar to std.file.DirEntry.
The added value is mainly in the performance benefits; for
example,

if (exists(f) && isFile(f) && timeLastModified(f) < d) ...

requires three filesystem lookups (stat() calls), whereas

auto de = dirEntry(f);
if (de.exists && de.isFile && de.timeLastModified < d) ...

is just one.

I see no such benefit in the proposed Path type.

Path and dirEntry are different modules with different goals to
fulfill. I don't think it's appropriate to compare a module whose
function is path manipulation with one whose is querying
filesystem information.

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
[...]

I don't think that there'll be any performance improvements by
making in place modification functions. Considering under the
hood the path object is just a string, and that string's
reference needs to be changed with each modification, I don't
see how manipulation can be made faster.

Why does _path have to be an immutable string?  It could just as
well be a char[], or it could be templated on the character type.

[...]

The more I think about it, the more partial I am to removing
the existing string methods in std.path. At most, using a Path
object increases number of characters typed by 6 (Path()).
And even then, chances are you'll be saving characters as
method names can be simplified to remove path from them:
buildNormalizedPath -> normalized, isValidPath -> isValid, etc.
Even with user code breaking, 1) D isn't exactly considered a
stable language quite yet; I'm sure that users expect code
breakage with each new release, and 2) it's trivial to convert
code that uses the string based API to the object based API.

I know D isn't 100% stable yet, but bear in mind that this module
was introduced no more than two years ago, as part of the
(still-ongoing) effort to revamp the old modules from the D1
days.  It was accepted with a unanimous vote after a
comprehensive review by the D community.  And already you want
another breaking redesign?  I am strongly opposed to this.

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:54:25 UTC, Dylan Knutson wrote:
On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad
They add great value, but that is a completely different
discussion, as these are more similar to std.file.DirEntry.
[...]

Path and dirEntry are different modules with different goals to
fulfill. I don't think it's appropriate to compare a module
whose function is path manipulation with one whose is querying
filesystem information.

Which is why my first sentence said "that is a completely
different discussion".

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2013 11:27 PM, Dylan Knutson wrote:
I'd like to open up the idea of Path being an object in std.path. I've
submitted
a pull (https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
Path struct to std.path, "which exposes a much more palatable interface to path
string manipulation".

I've succumbed to the temptation to do this several times over the years.

I always wind up backing it out and going back to strings.

The objections have all been already mentioned by others in this thread. I
understand the motivation for doing it, it seems like a great idea, but I am
strongly opposed to it.

To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing rainbows. It
really isn't better, it is just different. In many ways, it is worse because it
cannot hope to duplicate the rich interface available for strings.

2. APIs that deal with filenames take strings and return strings, not Path
objects. Your code gets littered with path and filename components that are
sometimes Paths and sometimes strings and sometimes both.

3. Every time you deal with a filename or path, you have to decide whether to
use a Path or a string. This may seem like a small thing, but when writing a
lot of code to deal with paths, this becomes a fracking annoyance.

4. An awful lot of path manipulation is done using string functions. Ever do
regexes on paths? I do. But regex deals with strings, not Path objects. Ditto
for the rest of the universe of code that deals with strings.

5. You wind up with two parallel universes of functions to deal with paths -
one dealing with strings, one with Paths, oh, and a third universe of crap that
deals with mixed strings and Paths.

6. If you try not to do (5), you break all existing code.

7. People like writing paths as "/etc/hosts", not Path("/etc/hosts"). People
will not stand for a Path constructor that winds up allocating memory so it can
rewrite the string in a canonical path representation.

8. There really isn't any such thing as a portable path representation. It's
more than just \ vs /. There are the drive prefixes in Windows that have no
analog in Linux. Sometimes case matters in Linux, where it would be ignored
under Windows. There are 8.3 issues sometimes. The only thing you can do is
come up with a subset of what works across systems, and then of course you have
to go back to using strings when you need to access D:\foo\abc.c

9. People think about paths in terms of strings, not Path objects. Adding an
abstraction layer always produces the feeling of "what is it doing, is it
allocating memory, is it slow, is it doing something clever that I don't
need/want?". This is cognitive baggage, and interferes with writing clear,
correct code.

I've written a lot of cross-platform path code, I've tried the Path object
thing multiple times, and I wrote the original std.path, and it uses strings
because of my experience.

Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 11:36 AM, Walter Bright wrote:
To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing rainbows.
It really isn't better, it is just different. In many ways, it is worse
because it cannot hope to duplicate the rich interface available for
strings.

Subtyping (Path is a subtype of string by means of alias this) should
make getting from paths to strings easy, and getting back from strings
to paths one constructor call away (which adds correctness).

2. APIs that deal with filenames take strings and return strings, not
Path objects. Your code gets littered with path and filename components
that are sometimes Paths and sometimes strings and sometimes both.

Subtyping should make it easy to pass paths to APIs that expect strings.

3. Every time you deal with a filename or path, you have to decide
whether to use a Path or a string. This may seem like a small thing, but
when writing a lot of code to deal with paths, this becomes a fracking
annoyance.

If there's a reward for using paths the annoyance factor may be reduced.

4. An awful lot of path manipulation is done using string functions.
Ever do regexes on paths? I do. But regex deals with strings, not Path
objects. Ditto for the rest of the universe of code that deals with
strings.

Subtyping should take care of this.

5. You wind up with two parallel universes of functions to deal with
paths - one dealing with strings, one with Paths, oh, and a third
universe of crap that deals with mixed strings and Paths.

Subtyping makes one way easy and constructors make the other way
affordable. Again, this comes back to perceived gains that compensate
for the shortcomings.

6. If you try not to do (5), you break all existing code.

Only "half".

7. People like writing paths as "/etc/hosts", not Path("/etc/hosts").
People will not stand for a Path constructor that winds up allocating
memory so it can rewrite the string in a canonical path representation.

Lazy canonicalization may help.

8. There really isn't any such thing as a portable path representation.
It's more than just \ vs /. There are the drive prefixes in Windows that
have no analog in Linux. Sometimes case matters in Linux, where it would
be ignored under Windows. There are 8.3 issues sometimes. The only thing
you can do is come up with a subset of what works across systems, and
then of course you have to go back to using strings when you need to
access D:\foo\abc.c

That is actually an argument in favor of good encapsulation, not against.

9. People think about paths in terms of strings, not Path objects.
Adding an abstraction layer always produces the feeling of "what is it
doing, is it allocating memory, is it slow, is it doing something clever
that I don't need/want?". This is cognitive baggage, and interferes with
writing clear, correct code.

I'm not sure whether the generalization holds.

Andrei

Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 1:04 PM, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu wrote:
[...]

8. There really isn't any such thing as a portable path representation.
It's more than just \ vs /. There are the drive prefixes in Windows that
have no analog in Linux. Sometimes case matters in Linux, where it would
be ignored under Windows. There are 8.3 issues sometimes. The only thing
you can do is come up with a subset of what works across systems, and
then of course you have to go back to using strings when you need to
access D:\foo\abc.c

That is actually an argument in favor of good encapsulation, not against.

The proposed API change does not introduce good encapsulation. It
introduces a super-thin wrapper around a built-in type, and replaces
free functions with methods, for what gain?

I was talking in principle. I agree that the argument "it was as easy as
wrapping the already existing functions" works against the current
proposal, not in favor of it.

Andrei

Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 15:36:15 +0000, Walter Bright <newshound2 digitalmars.com> said:

8. There really isn't any such thing as a portable path representation.
It's more than just \ vs /. There are the drive prefixes in Windows
that have no analog in Linux. Sometimes case matters in Linux, where it
would be ignored under Windows. There are 8.3 issues sometimes. The
only thing you can do is come up with a subset of what works across
systems, and then of course you have to go back to using strings when
you need to access D:\foo\abc.c

Actually, there is one portable representation for paths: URLs. More
specifically "file:" URLs if we're limiting ourselves to filesystem
paths. Relative URLs should probably count too.

But otherwise, that's all true. To correctly normalize a path, you need
to know which underlying filesystem is in use. Today's operating
systems can mix and match case-sensitive, case-preserving, and
case-insensitive filesystems, different restrictions on file names, and
sometimes have obscure restrictions/normalization when using old APIs on
newer filesystems. You can't really normalize a path without making a
lot of assumptions.

Of course, that's not an argument for or against having a path object
to encapsulate the differences. But I'd tend to say that what the path
object can do is more limited than one might think at first glance.

As a side note, Apple is currently asking application developers to use
URLs instead of raw paths to local files. Using URLs makes it possible
for instance to attach "bookmarks" keys on path (in the query string)
that can more or less automatically punch a hole in the sandbox when
accessing a file (which can expire or be revoked). Pretty much all
recent Cocoa APIs take url objects instead of path strings.

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:23 AM, Michel Fortin wrote:
Actually, there is one portable representation for paths: URLs. More
specifically "file:" URLs if we're limiting ourselves to filesystem paths.
Relative URLs should probably count too.

That doesn't work for case sensitivity/insensitivity differences, nor does it
work for drive letters like "C:" (which don't exist on Apple systems, hence
they can afford to dismiss them).

In D source code, we deal with this with the convention that package and module
names must be lower case. But there's no getting around the fact that "File" and
"file" are different paths under Windows, and are the same under Linux.

There is no generic abstraction to account for that - the programmer must be
aware of it and adjust as appropriate for his application.

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:59 AM, Jonathan M Davis wrote:
On Thursday, June 06, 2013 10:27:28 Walter Bright wrote:
But there's no getting around the fact
that "File" and "file" are different paths under Windows, and are the same
under Linux.

I think you got that backwards. ;)

Dang, I should have written some unittests!

Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2 digitalmars.com> said:

That doesn't work for case sensitivity/insensitivity differences nor
does it work for drive letters like "C:" (which don't exist on Apple
systems, hence they can afford to dismiss them).

Have you never opened a local file in a Windows web browser and taken a
look at the URL? The drive letter is there.

file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.
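
A minimal sketch of that mapping in D, for illustration only. The helper name toFileUrl is made up (it is not part of Phobos), and it assumes a plain drive-letter path with no UNC prefix; std.uri.encode is used for the percent-escaping:

```d
import std.array : replace;
import std.uri : encode;

// Hypothetical helper, not part of Phobos: turn a Windows path into a "file:" URL.
string toFileUrl(string windowsPath)
{
    // Backslashes become forward slashes; encode() percent-escapes characters
    // such as spaces but leaves ':' and '/' alone.
    return "file:///" ~ encode(windowsPath.replace(`\`, "/"));
}

unittest
{
    assert(toFileUrl(`c:\path\to\the file.txt`) == "file:///c:/path/to/the%20file.txt");
}
```

Note that this only changes the spelling of the path; it says nothing about case sensitivity.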

But there's no getting around the fact that "File" and "file" are
different paths under Windows, and are the same under Linux.

Actually, it doesn't depend on Linux or Windows or OS X. It depends on
the filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+,
Case-sensitive HFS+, etc. If you assume a specific case sensitivity
setting by looking at the OS, that's a bug. You can mount NTFS and FAT
on Linux or OS X, and Apple has Case-sensitive HFS+ for OS X and it's
the default on iOS. Then there's the whole issue about which locale to
use for Unicode case-insensitive comparisons. I'd bet that different
filesystems choose different approaches to this tricky problem.

So there's no way to normalize for case-sensitivity just by looking at
a path or a URL, even if you know on which OS you're on. If you want to
know for sure whether two paths are the same, or what is the normalized
path, you need to ask the filesystem at some point. Anything else is
based on fragile assumptions.

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 1:02 PM, Michel Fortin wrote:
On 2013-06-06 17:27:28 +0000, Walter Bright <newshound2 digitalmars.com> said:

That doesn't work for case sensitivity/insensitivity differences nor does it
work for drive letters like "C:" (which don't exist on Apple systems, hence
they can afford to dismiss them).

Have you never opened a local file in a Windows web browser and taken a look at
the URL? The drive letter is there.

file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.

I didn't know that, but that doesn't make it a canonical path. It just combines
the notion of url with a path.

But there's no getting around the fact that "File" and "file" are different
paths under Windows, and are the same under Linux.

Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
HFS+, etc. If you assume a specific case sensitivity setting by looking at the
OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole
issue about which locale to use for Unicode case-insensitive comparisons. I'd
bet that different filesystems choose different approaches to this tricky
problem.

So there's no way to normalize for case-sensitivity just by looking at a path or
a URL, even if you know on which OS you're on. If you want to know for sure
whether two paths are the same, or what is the normalized path, you need to ask
the filesystem at some point. Anything else is based on fragile assumptions.

It may be a bug, and I personally try to never depend on path code that is case
sensitive or not, but I bet there's a *lot* of code out there that makes those
assumptions.

BTW, Windows still has only erratic support for using / as path separators, even
in the system commands. Not even the "DIR" command can deal with it.

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 1:54 PM, Steven Schveighoffer wrote:
On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright <newshound2 digitalmars.com>
wrote:

BTW, Windows still has only erratic support for using / as path separators,
even in the system commands. Not even the "DIR" command can deal with it.

We don't program using DIR.  That is irrelevant.  (not contesting that Windows
doesn't work well with '/', just that DIR, or any other command line tool, is
evidence)

The fact that DIR, probably the most widely used command in Windows, doesn't
support it is indicative.

I've also noticed Windows file dialog boxes not supporting it, and those are
supposed to be standard components.

DIR is used in .bat files and makefiles, it is certainly used in programming.

Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-06 20:25:58 +0000, Walter Bright <newshound2 digitalmars.com> said:

On 6/6/2013 1:02 PM, Michel Fortin wrote:
Have you never opened a local file in a Windows web browser and taken a look at
the URL? The drive letter is there.

file:///c:/path/to/the%20file.txt

The drive letter is simply the first part of the path on Windows.

I didn't know that, but that doesn't make it a canonical path. It just
combines the notion of url with a path.

It's not a canonical path, but it's a platform-neutral representation
of a path. You can perform the same operations with a URL (including
regular expressions) irrespective of the underlying OS.

I was replying initially to your claim that there was no portable way
to represent a path. I don't think the definition of a "portable path"
needs to include any notion of canonical, because not even non-portable
paths can be canonical these days.

Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
HFS+, etc. If you assume a specific case sensitivity setting by looking at the
OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole
issue about which locale to use for Unicode case-insensitive comparisons. I'd
bet that different filesystems choose different approaches to this
tricky problem.

So there's no way to normalize for case-sensitivity just by looking at a path or
a URL, even if you know on which OS you're on. If you want to know for sure
whether two paths are the same, or what is the normalized path, you need to ask
the filesystem at some point. Anything else is based on fragile assumptions.

It may be a bug, and I personally try to never depend on path code that
is case sensitive or not, but I bet there's a *lot* of code out there
that makes those assumptions.

That's a good way to deal with paths (don't assume anything). And I'd
bet even case-sensitive filesystems differ in behaviour when presented
with different normalization of Unicode (using pre-combined characters
vs. combining ones).

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/

Jun 06 2013
Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-07 20:52:30 +0000, Brad Roberts <braddr puremagic.com> said:

On 6/6/13 1:02 PM, Michel Fortin wrote:
and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS.

Careful.. While HFS+ can be case sensitive, it's not by default.  Nor
is it recommended due to the number of osx applications that just
aren't designed with that in mind.

True. But what I meant is that it's the default on iOS, not OS X.
(Funnily, if you're running things in the iOS Simulator you'll run on
the same file system as OS X, case-insensitive most likely.)

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/

Jun 07 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:00 AM, Dylan Knutson wrote:
1. Making a more 'palatable' interface is pretty much chasing rainbows. It
really isn't better, it is just different. In many ways, it is worse because
it cannot hope to duplicate the rich interface available for strings.

.toString ?

2. APIs that deal with filenames take strings and return strings, not Path
objects. Your code gets littered with path and filename components that are
sometimes Paths and sometimes strings and sometimes both.

As for APIs that return strings, a Path toPath(string) function could be added
in std.path? Another solution would be to migrate the parts of Phobos that use
path strings to using actual paths. They could be overloaded with a counterpart
that also takes a string, but the toPath function would be pretty useful here.

Yes, your code becomes littered with conversions. Ugh.

3. Every time you deal with a filename or path, you have to decide whether to
use a Path or a string. This may seem like a small thing, but when writing a
lot of code to deal with paths, this becomes a fracking annoyance.

If there should only be one API used, I'd suggest just use Path.

Except that just doesn't work out in practice. An awful lot uses strings, and
again, people want to use the incredibly rich string manipulation code out there
on paths.

the more I realize how little
code would break, and how easy it'd be to fix that.

That's been used to justify every code breakage. And yet, people eschew using D
because of constant code breakage. It must stop.

It even takes less chars :-P and it only allocates on Path == Path and Path ==
string comparison. Which would have been done manually anyways.

Doing memory allocation to do == is a bad idea. People intuitively think of ==
as a cheap operation.

Well, that's not so much a limitation of Path or path functions as much as it is
with the operating systems themselves. You still run into that with strings. I'm
not trying to do anything groundbreaking, just abstract away the concept of a
path so it's easy to write larger applications.

But it isn't easier to use a Path object. That's one of the things I discovered
when using them - it's never easier.

Good practice says don't worry about the implementation of what you can't see.

Yeah, well, you said that == allocates memory under the hood, which is
surprising behavior. Real programs definitely worry about the implementation.

If the programmer is worried about the speed of the abstraction, deal with that
separately.

Yes, he goes back to using strings.

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 9:50 AM, Dylan Knutson wrote:
Well, it comes down to are we willing to marginally break code for the sake of a
better API. D and Phobos aren't considered stable by any standard; I don't think
we should treat them like they're set in stone. Also, deprecation gives
developers plenty of time to update their code (if they have to at all).

I don't believe that because we broke A, therefore it's ok to break B.

And secondly, it isn't clear that Path is a better API.

I'm not opposed to breakage in all cases. But there needs to be a big win to
justify it. I'm not seeing even a small net win for Path types. I'm not talking
hypothetical either, like I said, I've tried them several times.

Projects such as Dub, Vibe, and to an extent Tango disagree.

I agree there's a strong temptation to create a Path object, and I've succumbed
myself to it several times. A corollary is that people often wanted to create a
String class, too, though that has died out.

You might also consider David Nadlinger's counter example:

"As another data point (which may or may not be relevant for the discussion
here), the LLVM system/support library was initially based on Path objects, but
recently has been rewritten to use raw strings:
http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html"

I've rewritten my Path code to go back to raw strings, too.

Jun 06 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
Some modules have needed to be redone. Some still do. But we already _did_
rework std.path. We agreed that we liked the new API, and it's been working
great. It's one thing to revisit an API that's been around since before we had
ranges or a review process. It's an entirely different thing to be constantly
reworking entire modules. I think that we need _very_ strong justification to
redesign a module that we already put through the review process. And I really
don't think that we have it here.

I think we're in violent agreement.

An example of a strong justification for a redo is, for example, conversion to
use ranges. std.zip needs that treatment.

Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example, conversion
to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

Andrei

Jun 06 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a path is fine.

However, there are times where it is convenient to be able to explode a
path into a structure, where each part is clearly separate from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string basename, string
extension);

Andrei

Jun 07 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/7/13 2:10 PM, monarch_dodra wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a path is
fine.

However, there are times where it is convenient to be able to explode a
path into a structure, where each part is clearly separate from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string basename,
string extension);

Andrei

Yeah. That's pretty much more or less what I was describing. Except
"buildPath" would take your (unnamed) tuple type directly.

No, the version I wrote is more flexible. You get to pass separate
arguments to it or just pass a tuple with .expand.

buildPath(parsePath("/bin/sh").expand)

should rebuild "/bin/sh".
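
A rough sketch of that round trip, under some loud assumptions: parsePath does not exist in std.path, composePath is a made-up name chosen so it does not clash with the real std.path.buildPath, and the drive field is dropped because pathSplitter already hands back the root as the first segment on POSIX. Only POSIX behaviour is asserted here:

```d
import std.array : array;
import std.path : baseName, buildPath, extension, pathSplitter, stripExtension;
import std.typecons : Tuple;

// Hypothetical decomposition, simplified from the proposal (no separate drive).
alias PathParts = Tuple!(string[], "folders", string, "basename", string, "extension");

PathParts parsePath(string path)
{
    PathParts parts;
    auto segments = pathSplitter(path).array;   // e.g. ["/", "bin", "sh"]
    if (segments.length == 0)
        return parts;
    string last = segments[$ - 1];
    parts.folders = segments[0 .. $ - 1];
    parts.basename = stripExtension(last);
    parts.extension = extension(last);           // includes the leading dot, if any
    return parts;
}

// Takes the pieces as separate arguments, so a tuple can be passed with .expand.
string composePath(string[] folders, string basename, string ext)
{
    return buildPath(folders ~ (basename ~ ext));
}

version (Posix) unittest
{
    auto parts = parsePath("/bin/sh");
    assert(parts.folders == ["/", "bin"]);
    assert(parts.basename == "sh" && parts.extension == "");
    assert(composePath(parts.expand) == "/bin/sh");
}
```

Keeping the composing function variadic-by-position rather than tuple-only is what lets both call styles work.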

There'd also be a "filename" member/ufcs function in there for
convenience.

I think that would be a small, but useful, addition to std.path.

Me 2.

Andrei

Jun 07 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/8/13 10:45 AM, monarch_dodra wrote:
On Saturday, 8 June 2013 at 14:14:33 UTC, Lars T. Kyllingstad wrote:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string basename,
string extension);

[...]

But why stop at the parts you have listed there?

The moment I clicked "Send", I realised that offering multiple
decompositions would prevent recomposition, because you'd have to
choose which parts to combine.

Using D's property functions, this should not actually be a problem. The
struct could be opaque in regards to which members are actually
attributes, and which are functions.

Eg:
Path path = Path(`C:\MyFile.txt`);
path.filename = "main.cpp";
path.extension = "d";
assert(path.buildPath() == `C:\main.d`);

I don't see any reason for that to not work.

Looks like the proposal may be converted into something liked by all - a
small PathComponents struct with the appropriate primitives. A high
ratio of usefulness to size would be key to acceptance.
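
For illustration, a small sketch of what such a struct might look like. The name PathComponents comes from the thread, but the layout, the property set and the toPath name are all assumptions, and only the setters needed for the example above are shown. The assertion uses a POSIX path:

```d
import std.path : baseName, buildPath, dirName, setExtension;

// Hypothetical PathComponents; the field layout is assumed, not proposed API.
struct PathComponents
{
    string dir;   // everything before the last segment
    string name;  // last segment, extension included

    this(string path)
    {
        dir = dirName(path);
        name = baseName(path);
    }

    @property string filename() { return name; }
    @property void filename(string f) { name = f; }

    // Replace whatever extension the current file name carries.
    @property void extension(string ext) { name = setExtension(name, ext); }

    // Recompose the full path from the stored pieces.
    string toPath() { return buildPath(dir, name); }
}

version (Posix) unittest
{
    auto p = PathComponents("/home/user/MyFile.txt");
    p.filename = "main.cpp";
    p.extension = "d";
    assert(p.toPath() == "/home/user/main.d");
}
```

Storing one canonical decomposition and deriving the rest through properties sidesteps the "which parts do I recombine" problem mentioned above.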

Andrei

Jun 08 2013
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/6/13 12:50 PM, Dylan Knutson wrote:
Well, it comes down to are we willing to marginally break code for the
sake of a better API.

Well the position of "marginally" in the sentence above may be contested
by some.

D and Phobos aren't considered stable by any
standard; I don't think we should treat them like they're set in stone.
Also, deprecation gives developers plenty of time to update their code
(if they have to at all).

I think this opinion is very unlikely to enjoy popularity. We actively
/want/ to make Phobos more stable, so using the argument that it's not
yet stable to add more instability is sure to fit the pattern of some
list of fallacies. Besides, the corresponding benefits (the best solid
argument that could be constructed) are at least according to some not
that large to justify the cost of breakage.

Andrei

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 15:24:09 UTC, Lars T. Kyllingstad
wrote:
On Thursday, 6 June 2013 at 14:39:03 UTC, Dylan Knutson wrote:
[...]

I don't think that there'll be any performance improvements by
making in place modification functions. Considering under the
hood the path object is just a string, and that string's
reference needs to be changed with each modification, I don't
see how manipulation can be made faster.

Why does _path have to be an immutable string?  It could just
as well be a char[], or it could be templated on the character
type.

[...]

The more I think about it, the more partial I am to removing
the existing string methods in std.path. At most, using a Path
object increases number of characters typed by 6 (Path()).
And even then, chances are you'll be saving characters as
method names can be simplified to remove path from them:
buildNormalizedPath -> normalized, isValidPath -> isValid,
etc. Even with user code breaking, 1) D isn't exactly
considered a stable language quite yet; I'm sure that users
expect code breakage with each new release, and 2) it's
trivial to convert code that uses the string based API to the
object based API.

I know D isn't 100% stable yet, but bear in mind that this
module was introduced no more than two years ago, as part of
the (still-ongoing) effort to revamp the old modules from the
D1 days.  It was accepted with a unanimous vote after a
comprehensive review by the D community.  And already you want
another breaking redesign?  I am strongly opposed to this.

Well, keep in mind that D 2 years ago was a different beast.
AFAIK, D only recently got alias X this, which solves 90% of
breakage problems when passing around Paths.
FWIW, having Path be an object adds consistency with the rest of
Phobos, which has many entities which could be expressed as
primitives, expressed as objects. To name a few, DateTime is an
object, File is an object, and DirEntry is an object. Yes, they
could be described as integers, or a pointer, or a string, but
it's less cognitive load on the developer to recognize them as
separate types.

Jun 06 2013
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
I've succumbed to the temptation to do this several times over
the years.

I always wind up backing it out and going back to strings.

As another data point (which may or may not be relevant for the
discussion here), the LLVM system/support library was initially
based on Path objects, but recently has been rewritten to use raw
strings:
http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html

David

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 15:41:51 UTC, Dylan Knutson wrote:
FWIW, having Path be an object adds consistency with the rest
of Phobos, which has many entities which could be expressed as
primitives, expressed as objects. To name a few, DateTime is an
object, File is an object, and DirEntry is an object. Yes, they
could be described as integers, or a pointer, or a string, but
it's less cognitive load on the developer to recognize them as
separate types.

"Reducing cognitive load" is not the main reason these are
objects.  DateTime lumps together no less than six integers.
File adds automatic resource management via reference counting.
DirEntry caches file information to avoid repeated filesystem
lookups.  And so on.

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
I've succumbed to the temptation to do this several times over
the years.

I always wind up backing it out and going back to strings.

The objections have all been already mentioned by others in
this thread. I understand the motivation for doing it, it seems
like a great idea,

but I am strongly opposed to it.

To repeat the objections:

1. Making a more 'palatable' interface is pretty much chasing
rainbows. It really isn't better, it is just different. In many
ways, it is worse because it cannot hope to duplicate the rich
interface available for strings.

.toString ?

2. APIs that deal with filenames take strings and return
strings, not Path objects. Your code gets littered with path
and filename components that are sometimes Paths and sometimes
strings and sometimes both.

As for APIs that return strings, a Path toPath(string) function
could be added in std.path? Another solution would be to migrate
the parts of Phobos that use path strings to using actual paths.
They could be overloaded with a counterpart that also takes a
string, but the toPath function would be pretty useful here.

3. Every time you deal with a filename or path, you have to
decide whether to use a Path or a string. This may seem like a
small thing, but when writing a lot of code to deal with paths,
this becomes a fracking annoyance.

If there should only be one API used, I'd suggest just use Path.

4. An awful lot of path manipulation is done using string
functions. Ever do regexes on paths? I do. But regex deals with
strings, not Path objects. Ditto for the rest of the universe
of code that deals with strings.

Path implicitly converts to a string.
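
A stripped-down sketch of the subtyping idea, not the code from the pull request: the struct below is an assumed minimal shape, and isDSource is a made-up function standing in for any existing string-based API. It also shows the cost being argued about, since == goes through buildNormalizedPath and therefore allocates:

```d
import std.algorithm : endsWith;
import std.path : buildNormalizedPath;

// Assumed minimal shape of a Path wrapper; the real pull request may differ.
struct Path
{
    string raw;
    alias raw this;   // lets a Path be passed wherever a string is expected

    // Lexical equality after normalization; this allocates two new strings.
    bool opEquals(const Path other) const
    {
        return buildNormalizedPath(raw) == buildNormalizedPath(other.raw);
    }
}

// Stand-in for any existing API that only knows about strings.
bool isDSource(string file)
{
    return file.endsWith(".d");
}

unittest
{
    assert(Path("foo/../bar") == Path("bar"));
    assert(isDSource(Path("src/main.d")));   // implicit conversion via alias this
    string s = Path("a/b");
    assert(s == "a/b");
}
```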

5. You wind up with two parallel universes of functions to deal
with paths - one dealing with strings, one with Paths, oh, and
a third universe of crap that deals with mixed strings and
Paths.

Well, I didn't say this in my OP, but I did a few comments back:
I'm more partial to deprecating the string API and moving to
Path. I didn't think many would go for this, but the more I think
about it, the more I realize how little code would break, and how
easy it'd be to fix that.

6. If you try not to do (5), you break all existing code.

7. People like writing paths as "/etc/hosts", not
Path("/etc/hosts"). People will not stand for a Path
constructor that winds up allocating memory so it can rewrite
the string in a canonical path representation.

string s = "/etc/hosts"
Path s = "/etc/hosts"

It even takes less chars :-P and it only allocates on Path ==
Path and Path == string comparison. Which would have been done
manually anyways.

8. There really isn't any such thing as a portable path
representation. It's more than just \ vs /. There are the drive
prefixes in Windows that have no analog in Linux. Sometimes
case matters in Linux, where it would be ignored under Windows.
There are 8.3 issues sometimes. The only thing you can do is
come up with a subset of what works across systems, and then of
course you have to go back to using strings when you need to
access D:\foo\abc.c

Well, that's not so much a limitation of Path or path functions
as much as it is with the operating systems themselves. You still
run into that with strings. I'm not trying to do anything
groundbreaking, just abstract away the concept of a path so it's
easy to write larger applications.

9. People think about paths in terms of strings, not Path
objects. Adding an abstraction layer always produces the
feeling of "what is it doing, is it allocating memory, is it
slow, is it doing something clever that I don't need/want?".
This is cognitive baggage, and interferes with writing clear,
correct code.

It's easy to think about a path as a string for trivial code.
Once the application uses paths in a nontrivial manner, people
write wrappers around path functions anyways. Type safety is very
useful.
Good practice says don't worry about the implementation of what
you can't see. If the programmer is worried about the speed of
the abstraction, deal with that separately. FWIW, the Path
wrapper doesn't allocate unless it needs to :-)

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
I should have said "makes it easier to be platform
independent". Normalization is done automatically on comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user,
it comes at the cost of several memory allocations, and it does
not perform a case-insensitive comparison on Windows in its
current form.  (Should it?  I dunno.)

This isn't just conjecture either; there are D programs in the
wild that abstract away path strings because it's easier to
deal with them that way.
I didn't want to force paths passed in to be valid, because the
programmer might want an invalid path passed around for
whatever reason.

As others have pointed out, there are examples of the opposite
too.

You came off as quite constructive; thank you :-)

:)

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad
wrote:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
I should have said "makes it easier to be platform
independent". Normalization is done automatically on
comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user,
it comes at the cost of several memory allocations, and it does
not perform a case-insensitive comparison on Windows in its
current form.  (Should it?  I dunno.)

It doesn't do any allocations that the user won't have to do
anyways. Paths have to be normalized before comparison; not doing
so isn't correct behavior. Eg, the strings foo/../bar != bar,
yet they're equivalent paths. Path encapsulates the behavior. So
it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

Jun 06 2013
"Dylan Knutson" <tcdknutson gmail.com> writes:
On Thursday, 6 June 2013 at 16:24:11 UTC, Walter Bright wrote:
As for APIs that return strings, a Path toPath(string) function
could be added in std.path? Another solution would be to migrate
the parts of Phobos that use path strings to using actual paths.
They could be overloaded with a counterpart that also takes a
string, but the toPath function would be pretty useful here.

Yes, your code becomes littered with conversions. Ugh.

As opposed to the rest of the conventions that Phobos uses?

If there should only be one API used, I'd suggest just use
Path.

Except that just doesn't work out in practice. An awful lot
uses strings, and again, people want to use the incredibly rich
string manipulation code out there on paths.

Hence subtyping.

the more I realize how little
code would break, and how easy it'd be to fix that.

That's been used to justify every code breakage. And yet,
people eschew using D because of constant code breakage. It
must stop.

Well, it comes down to are we willing to marginally break code
for the sake of a better API. D and Phobos aren't considered
stable by any standard; I don't think we should treat them like
they're set in stone. Also, deprecation gives developers plenty
of time to update their code (if they have to at all).

It even takes less chars :-P and it only allocates on Path ==
Path and Path == string comparison. Which would have been done
manually anyways.

Doing memory allocation to do == is a bad idea. People
intuitively think of == as a cheap operation.

It only allocates if buildNormalizedPath allocates. And if you
aren't using buildNormalizedPath in the first place before
comparing strings, you're comparing paths wrong.

Well, that's not so much a limitation of Path or path functions
as much as it is with the operating systems themselves. You still
run into that with strings. I'm not trying to do anything
groundbreaking, just abstract away the concept of a path so it's
easy to write larger applications.

But it isn't easier to use a Path object. That's one of the
things I discovered when using them - it's never easier.

Projects such as Dub, Vibe, and to an extent Tango disagree.

Good practice says don't worry about the implementation of
what you can't see.

Yeah, well, you said that == allocates memory under the hood,
which is surprising behavior. Real programs definitely worry
about the implementation.

Well, they shouldn't. Profile code first, see where the hotspots
are, and fix those. I'd be very surprised if path comparison and
manipulation is so heavily used, it becomes a slow spot for
programs. And if it does, that's not the fault of the Path struct
itself, but rather of the underlying functions it uses.

If the programmer is worried about the speed of the
abstraction, deal with that separately.

Yes, he goes back to using strings.

See above; I can't think of any use case for paths where they
account for a considerable amount of run time.

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 16:03:15 UTC, Andrei Alexandrescu
wrote:
[...]

8. There really isn't any such thing as a portable path
representation. It's more than just \ vs /. There are the drive
prefixes in Windows that have no analog in Linux. Sometimes case
matters in Linux, where it would be ignored under Windows. There
are 8.3 issues sometimes. The only thing you can do is come up
with a subset of what works across systems, and then of course
you have to go back to using strings when you need to access
D:\foo\abc.c

That is actually an argument in favor of good encapsulation,
not against.

The proposed API change does not introduce good encapsulation.
It introduces a super-thin wrapper around a built-in type, and
replaces free functions with methods, for what gain?

Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson <tcdknutson gmail.com>
wrote:

On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 14:51:13 UTC, Dylan Knutson wrote:
I should have said "makes it easier to be platform independent".
Normalization is done automatically on comparison.

Yes, p1 == p2 sure looks nice, but unbeknownst to the API user, it
comes at the cost of several memory allocations, and it does not
perform a case-insensitive comparison on Windows in its current form.
(Should it?  I dunno.)

It doesn't do any allocations that the user won't have to do anyways.
Paths have to be normalized before comparison; not doing so isn't
correct behavior. Eg, the strings foo/../bar != bar, yet they're
equivalent paths. Path encapsulates the behavior. So it's the difference
between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

-Steve

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 16:14:31 UTC, Dylan Knutson wrote:
On Thursday, 6 June 2013 at 16:06:50 UTC, Lars T. Kyllingstad
wrote:
It doesn't do any allocations that the user won't have to do
anyways. Paths have to be normalized before comparison; not
doing so isn't correct behavior. Eg, the strings foo/../bar !=
bar, yet they're equivalent paths. Path encapsulates the
behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

To me, at least, the first one practically screams "expensive
operation", whereas the second one does the exact opposite.
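
For reference, the allocating comparison being discussed can be sketched with the existing std.path API. The helper name samePathString is hypothetical and only serves the illustration; buildNormalizedPath is the real function, and each call builds a new string.

```d
import std.path : buildNormalizedPath;

/// Hypothetical helper: compares two paths after purely lexical
/// normalization. Both calls allocate a new normalized string.
bool samePathString(string a, string b)
{
    return buildNormalizedPath(a) == buildNormalizedPath(b);
}
```

The cost is hidden inside the helper exactly as it would be inside an opEquals, which is the concern raised above.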

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson
<tcdknutson gmail.com> wrote:

It doesn't do any allocations that the user won't have to do
anyways. Paths have to be normalized before comparison; not
doing so isn't correct behavior. E.g., the strings "foo/../bar"
!= "bar", yet they're equivalent paths. Path encapsulates the
behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

I know.  There are a few additions that I've been planning to
make for std.path for the longest time, I just haven't found the
time to do so yet.  Specifically, I want to add a couple of
functions that deal with ranges of path segments rather than full
path strings.

The first one is a lazy "path normaliser":

assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]),
["foo", "baz"]));

With this, non-allocating path comparison is easy.  The verbose
version of p1 == p2, which could be wrapped for convenience, is
then:

equal(pathNormalizer(pathSplitter(p1)),
pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make
the comparison case-insensitive on OSes where this is expected.
Very general and composable, and easily wrappable.

The second thing I'd like to add is an overload of buildPath()
that takes a range of path segments.  (Then
buildNormalizedPath(p) can also be implemented as
buildPath(pathNormalizer(p)).)

Maybe now is a good time to get this done. :)
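
As a rough illustration of the composable style described here, the following sketch compares two paths segment by segment using only functions that already exist in std.path (pathSplitter and filenameCmp). The name segmentsEqual is hypothetical, and the sketch deliberately does not resolve "." or "..", since the lazy normalizer is only a proposal at this point.

```d
import std.algorithm : equal;
import std.path : filenameCmp, pathSplitter;

/// Hypothetical helper: true if both paths consist of the same
/// segments, using the OS-default case sensitivity of filenameCmp.
/// pathSplitter yields slices of its input, so no path string is built.
bool segmentsEqual(string p1, string p2)
{
    return equal!((a, b) => filenameCmp(a, b) == 0)
        (pathSplitter(p1), pathSplitter(p2));
}
```

Redundant and trailing separators are ignored by pathSplitter, so "foo//bar/" and "foo/bar" compare equal, while "foo/../bar" and "bar" do not.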

Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer wrote:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson
<tcdknutson gmail.com> wrote:

It doesn't do any allocations that the user won't have to do anyways.
Paths have to be normalized before comparison; not doing so isn't
correct behavior. E.g., the strings "foo/../bar" != "bar", yet they're
equivalent paths. Path encapsulates the behavior. So it's the
difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

I know.  There are a few additions that I've been planning to make for
std.path for the longest time, I just haven't found the time to do so
yet.  Specifically, I want to add a couple of functions that deal with
ranges of path segments rather than full path strings.

The first one is a lazy "path normaliser":

assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]),
["foo", "baz"]));

With this, non-allocating path comparison is easy.  The verbose version
of p1 == p2, which could be wrapped for convenience, is then:

equal(pathNormalizer(pathSplitter(p1)),
pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to make the
comparison case-insensitive on OSes where this is expected.  Very
general and composable, and easily wrappable.

Great!  I'd highly suggest pathEqual which takes two ranges of dchar and
does the composition and OS-specific comparison for you.

-Steve

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 13:25:56 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:13:10 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 12:14:30 -0400, Dylan Knutson
<tcdknutson gmail.com> wrote:

It doesn't do any allocations that the user won't have to do
anyways. Paths have to be normalized before comparison; not
doing so isn't correct behavior. E.g., the strings "foo/../bar"
!= "bar", yet they're equivalent paths. Path encapsulates
the behavior. So it's the difference between

buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

I know.  There are a few additions that I've been planning to
make for std.path for the longest time, I just haven't found
the time to do so yet.  Specifically, I want to add a couple
of functions that deal with ranges of path segments rather
than full path strings.

The first one is a lazy "path normaliser":

assert (equal(pathNormalizer(["foo", "bar", "..", "baz"]),
["foo", "baz"]));

With this, non-allocating path comparison is easy.  The
verbose version of p1 == p2, which could be wrapped for
convenience, is then:

equal(pathNormalizer(pathSplitter(p1)),
pathNormalizer(pathSplitter(p2)))

You can also use filenameCmp() as a predicate to equal() to
make the comparison case-insensitive on OSes where this is
expected.  Very general and composable, and easily wrappable.

Great!  I'd highly suggest pathEqual which takes two ranges of
dchar and does the composition and OS-specific comparison for
you.

They don't have to be dchar if all the building blocks are
templates (as the existing ones are):

bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2)
    (const(C1)[] p1, const(C2)[] p2)
    if (isSomeChar!C1 && isSomeChar!C2)
{
    // Each element is a whole path segment, so compare with filenameCmp.
    return equal!((a, b) => filenameCmp!cs(a, b) == 0)
        (pathNormalizer(pathSplitter(p1)),
         pathNormalizer(pathSplitter(p2)));
}
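
Since pathNormalizer does not exist yet, here is a minimal stand-in sketch of the same comparison that compiles against the current std.path. The names normalizedSegments and pathEqualSketch are hypothetical. Note the assumption: this version normalizes eagerly into an array, so unlike the proposed lazy range it does allocate; it only shows the intended semantics.

```d
import std.algorithm : equal;
import std.path : filenameCmp, isRooted, pathSplitter;

/// Hypothetical, eager stand-in for the proposed lazy pathNormalizer:
/// drops "." and resolves ".." lexically (no filesystem access).
string[] normalizedSegments(string path)
{
    string[] stack;
    foreach (seg; pathSplitter(path))
    {
        if (seg == ".")
            continue;
        if (seg == ".." && stack.length > 0 && stack[$ - 1] != "..")
        {
            // ".." directly below a root stays at the root.
            if (!isRooted(stack[$ - 1]))
                stack = stack[0 .. $ - 1];
        }
        else
            stack ~= seg;
    }
    return stack;
}

/// Hypothetical wrapper: segment-wise comparison after normalization,
/// with the OS-default case sensitivity of filenameCmp.
bool pathEqualSketch(string p1, string p2)
{
    return equal!((a, b) => filenameCmp(a, b) == 0)
        (normalizedSegments(p1), normalizedSegments(p2));
}
```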

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 19:25:56 Lars T. Kyllingstad wrote:
I know. There are a few additions that I've been planning to
make for std.path for the longest time, I just haven't found the
time to do so yet. Specifically, I want to add a couple of
functions that deal with ranges of path segments rather than full
path strings.

Another thing to consider is overloads of some of the functions which take an
output range as their first argument. There has been an increased push lately
to cut down on GC allocations in Phobos, and so we're probably going to start
having more functions be overloaded such that they can be used with output
ranges in order to give the folks who want to avoid the GC more control -
similar to how we have the overload of toString that takes a delegate (though
outside of classes, since we can templatize stuff, using an output range is
more flexible than a delegate, though a delegate does qualify as an output range
apparently).
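
A minimal sketch of what such an output-range overload could look like, assuming hypothetical names (writeSegmentsTo, collapseSeparators) that are not part of std.path. The caller supplies the sink, so the caller decides whether and how memory is allocated; here an Appender is used only for the convenience wrapper.

```d
import std.array : appender;
import std.path : dirSeparator, isDirSeparator, pathSplitter;
import std.range : put;

/// Hypothetical: writes the segments of `path` to `sink`, joined by a
/// single directory separator. The function itself builds no string.
void writeSegmentsTo(Sink)(ref Sink sink, string path)
{
    string prev;
    foreach (seg; pathSplitter(path))
    {
        // No extra separator after a root segment such as "/".
        if (prev.length && !isDirSeparator(prev[$ - 1]))
            put(sink, dirSeparator);
        put(sink, seg);
        prev = seg;
    }
}

/// Convenience wrapper for callers who are fine with a GC allocation.
string collapseSeparators(string path)
{
    auto app = appender!string();
    writeSegmentsTo(app, path);
    return app.data;
}
```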

- Jonathan M Davis

Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:40:37 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer wrote:

Great!  I'd highly suggest pathEqual which takes two ranges of dchar
and does the composition and OS-specific comparison for you.

They don't have to be dchar if all the building blocks are templates (as
the existing ones are):

bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1, C2)
(const(C1)[] p1, const(C2)[] p2)
if (isSomeChar!C1 && isSomeChar!C2)

Actually, all string variants are dchar ranges :)  And your solution is
less general, dchar ranges don't have to be arrays.

However, I don't think in practice there are any real non-array dchar
ranges...

One thing your version does do is explicitly say the parameters are const,
which you couldn't do with a non-array dchar range.

-Steve

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 10:37:27 Walter Bright wrote:
On 6/6/2013 9:50 AM, Dylan Knutson wrote:
Well, it comes down to are we willing to marginally break code for the
sake of a better API. D and Phobos aren't considered stable by any
standard; I don't think we should treat them like they're set in stone.
Also, deprecation gives developers plenty of time to update their code
(if they have to at all).

And secondly, it isn't clear that Path is a better API.

I'm not opposed to breakage in all cases. But there needs to be a big win to
justify it. I'm not seeing even a small net win for Path types. I'm not
talking hypothetical either, like I said, I've tried them several times.

Some modules have needed to be redone. Some still do. But we already _did_
rework std.path. We agreed that we liked the new API, and it's been working
great. It's one thing to revisit an API that's been around since before we had
ranges or a review process. It's an entirely different thing to be constantly
reworking entire modules. I think that we need _very_ strong justification to
redesign a module that we already put through the review process. And I really
don't think that we have it here.

- Jonathan M Davis

Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:47:42 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

On 6/6/13 1:13 PM, Steven Schveighoffer wrote:
buildNormalizedPath(s1) == buildNormalizedPath(s2);

and

p1 == p2;

This can be done without allocations.

Interesting. "Show me the code!"

I think Lars summed it up nicely.  It's not full working code yet, but it
shows how one can do the path splitting and normalization lazily.

However, it should be noted that buildNormalizedPath cannot be done
without allocations, just the full comparison.

-Steve

Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright
<newshound2 digitalmars.com> wrote:

Path operations should not require a real filesystem.  They are string
manipulations, nothing more.

There is huge value in that.

-Steve

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 10:27:28 Walter Bright wrote:
But there's no getting around the fact
that "File" and "file" are different paths under Windows, and are the same
under Linux.

I think you got that backwards. ;)

- Jonathan M Davis

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 13:45:44 Andrei Alexandrescu wrote:
D and Phobos aren't considered stable by any
standard; I don't think we should treat them like they're set in stone.
Also, deprecation gives developers plenty of time to update their code
(if they have to at all).

I think this opinion is very unlikely to enjoy popularity. We actively
/want/ to make Phobos more stable, so using the argument that it's not
yet stable to add more instability is sure to fit the pattern of some
list of fallacies. Besides, the corresponding benefits (the best solid
argument that could be constructed) are at least according to some not
that large to justify the cost of breakage.

Agreed. Breaking stuff in an effort to create a solid, stable API is one thing
(and at this point, we want to minimize even that as much as we reasonably
can). Constantly going back and rebreaking stuff is quite another. We already
redid std.path. It went through the full review process and was voted in. We
want to move towards being _more_ stable not less. Some API breakage will
still be necessary (like replacing std.xml or the streaming modules), but it's
a cost that we want to avoid when it isn't necessary. Each module redesign
must justify itself, and the simple fact that other modules have already been
redesigned is not enough for that. Not to mention, over time, it should
arguably require _more_ justification to redo a module (or make any breaking
change in Phobos), because more people are using it, and we really do want to
be stable.

- Jonathan M Davis

Jun 06 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Thursday, 6 June 2013 at 17:48:59 UTC, Steven Schveighoffer
wrote:
On Thu, 06 Jun 2013 13:40:37 -0400, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 17:28:56 UTC, Steven Schveighoffer
wrote:

Great!  I'd highly suggest pathEqual which takes two ranges
of dchar and does the composition and OS-specific comparison
for you.

They don't have to be dchar if all the building blocks are
templates (as the existing ones are):

bool pathEqual(CaseSensitive cs = CaseSensitive.osDefault, C1,
C2)
(const(C1)[] p1, const(C2)[] p2)
if (isSomeChar!C1 && isSomeChar!C2)

Actually, all string variants are dchar ranges :)  And your
solution is less general, dchar ranges don't have to be arrays.

Ok, now I see what you meant.

However, I don't think in practice there are any real non-array
dchar ranges...

At least not any that also support slicing, which I think it is
fair to require of "path ranges".

Jun 06 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
Just want to chime in and say that I'm also against this change.

I can see some small benefits, but I also see problems, all of

Even if it is a small net improvement, I don't think it's
anywhere near a big enough improvement to warrant an API change.

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 11:09:29 Walter Bright wrote:
On 6/6/2013 10:50 AM, Jonathan M Davis wrote:
Some modules have needed to be redone. Some still do. But we already _did_
rework std.path. We agreed that we liked the new API, and it's been
working
great. It's one thing to revisit an API that's been around since before we
had ranges or a review process. It's an entirely different thing to be
constantly reworking entire modules. I think that we need _very_ strong
justification to redesign a module that we already put through the review
process. And I really don't think that we have it here.

I think we're in violent agreement.

to Dylan.

An example of a strong justification for a redo is, for example, conversion
to use ranges. std.zip needs that treatment.

Agreed.

- Jonathan M Davis

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 13:53:51 Steven Schveighoffer wrote:
On Thu, 06 Jun 2013 13:50:13 -0400, Walter Bright

<newshound2 digitalmars.com> wrote:

Path operations should not require a real filesystem. They are string
manipulations, nothing more.

There is huge value in that.

Agreed, but symlinks highlight the fact that there is a difference between
paths being equal and paths referring to the same file.

- Jonathan M Davis

Jun 06 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example,
conversion
to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

LOL. Well, given that strings are _already_ ranges, that wouldn't help it
anywhere near as much as it does with other cases of code breakage, since

- Jonathan M Davis

Jun 06 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 06 Jun 2013 16:25:58 -0400, Walter Bright
<newshound2 digitalmars.com> wrote:

BTW, Windows still has only erratic support for using / as path
separators, even in the system commands. Not even the "DIR" command can
deal with it.

We don't program using DIR.  That is irrelevant.  (not contesting that
Windows doesn't work well with '/', just that DIR, or any other command
line tool, is evidence)

-Steve

Jun 06 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 06, 2013 at 02:38:41PM -0400, Andrei Alexandrescu wrote:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for example,
conversion to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges bandwagon :o).

Hmm. Let's see:

assert(isInputRange!Path);
version(Windows)
    auto p = Path(`..\blah\blah\..\bluh`);
else version(linux)
    auto p = Path(`../blah/blah/../bluh`);

// I'm assuming auto normalization; if you don't like that,
// pretend I also wrote this line:
//	p.normalize();

assert(p.equals([
    "..",
    "blah",
    "bluh"
]));

While the above may *look* attractive, it's actually a minefield full of
pitfalls. Consider this directory tree in Posix:

/home/user/test
/home/user/test/real
/home/user/test/symlink -> real/1
/home/user/test/real/1/myfile
/home/user/test/real/2/anotherfile

Let's say the current working directory is /home/user. Now consider
this:

auto p = Path("test/symlink/../2/anotherfile");
assert(std.file.exists(p));	// should this work?

The only way the above can actually work is if normalization queries the
filesystem. That is to say, it is NOT mere string manipulations.

However, *should* normalization always check the filesystem? What if the
program is constructing a list of paths that it's going to create, which
don't exist in the filesystem yet? Then normalization will fail, even
though the paths are valid.
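
To make the string-only behaviour concrete, here is a small sketch using the existing buildNormalizedPath, which never consults the filesystem. The wrapper name lexicallyNormalized is hypothetical. For the example path it simply cancels "symlink/.." textually, whatever "symlink" points to on disk.

```d
import std.path : buildNormalizedPath, buildPath;

/// Hypothetical wrapper: purely lexical normalization, no filesystem
/// lookups, so symbolic links are not resolved.
string lexicallyNormalized(string path)
{
    return buildNormalizedPath(path);
}
```

With this, "test/symlink/../2/anotherfile" becomes the equivalent of "test/2/anotherfile", which need not name the same file as the original if "symlink" is a symbolic link.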

Conclusion: correct path normalization depends on intent, which only the
programmer knows -- the library can't possibly figure this out without
being told. (And I haven't even started getting into OS-dependent path
manipulation yet... what should Path(`C:\Program Files\abc.def`) do on a
system-dependent details of paths, so I'm not sure what value Path is
really adding. At least, I'm not finding it compelling enough to eschew
plain old string manipulations.

Besides, should glob patterns like "/home/user/prog/*/*.d" be Paths or
strings? What about path regexes? Should Path export a whole suite of
parallel methods for constructing such patterns? One can always
interconvert to/from strings, of course, but if we'd started out with
strings in the first place, we wouldn't need any conversions. The OS
ultimately takes only strings anyway, so is there really a need to
insert a convert to/from Path in between?

I do see a lot of value in providing *functions* for manipulating path
strings (normalizations, parsing path components, splitting file
extensions, etc.), but I've a hard time with encapsulating a path string
in an opaque object when it doesn't really give that much more value. If
you *really* like the idea of Path, nothing stops you from writing one
yourself, and have it implicitly convert to string so that you can pass
it directly to OS functions that take paths. I just don't see value in
requiring Phobos functions to only take Path objects.

T

--
WINDOWS = Will Install Needless Data On Whole System -- CompuMan

Jun 06 2013
"Robert Clipsham" <robert octarineparrot.com> writes:
On Thursday, 6 June 2013 at 15:36:17 UTC, Walter Bright wrote:
On 6/4/2013 11:27 PM, Dylan Knutson wrote:
I'd like to open up the idea of Path being an object in
std.path. I've submitted
a pull
(https://github.com/D-Programming-Language/phobos/pull/1333) that adds a
Path struct to std.path, "which exposes a much more palatable
interface to path
string manipulation".

I've succumbed to the temptation to do this several times over
the years.

I always wind up backing it out and going back to strings.

As another data point: Java 7 introduces new Path and Paths
objects:

http://docs.oracle.com/javase/7/docs/api/java/nio/file/Paths.html

So they clearly think using an object(s) for it is useful.

-----

Without even thinking about the API, just using it, all the code
I've written in the past couple of weeks looks something like
this:

Path p = Paths.get(someDir, someOtherDir);
p = p.subpath(otherPath, p.getNameCount());
Path file = p.resolve(someFile);
print(file.toString());
file.toFile().doSomething();

ie. All my code is converting to/from a Path object purely for
dealing with Windows and Posix / vs \ differences and doing
sub-paths. Seems a bit pointless when we could just use free
functions in my opinion.

Jun 06 2013
"Regan Heath" <regan netmail.co.nz> writes:
On Thu, 06 Jun 2013 15:54:24 +0100, Dylan Knutson <tcdknutson gmail.com>
wrote:

On Thursday, 6 June 2013 at 10:48:54 UTC, Lars T. Kyllingstad wrote:
On Thursday, 6 June 2013 at 10:32:36 UTC, Regan Heath wrote:
On Thu, 06 Jun 2013 08:55:50 +0100, Lars T. Kyllingstad
<public kyllingen.net> wrote:

On Thursday, 6 June 2013 at 07:05:52 UTC, Lars T. Kyllingstad wrote:
[...]

Let me add some more to this.  To justify the addition of such a
type, it needs to pull its own weight.  For added value, it could do
one or both of the following:

Does System.IO.DirectoryInfo:
http://msdn.microsoft.com/en-us/library/system.io.directoryinfo.aspx

vs just having System.IO.Directory:
http://msdn.microsoft.com/en-us/library/system.io.directory.aspx

They add great value, but that is a completely different discussion, as
these are more similar to std.file.DirEntry.  The added value is mainly
in the performance benefits; for example,

if (exists(f) && isFile(f) && timeLastModified(f) < d) ...

requires three filesystem lookups (stat() calls), whereas

auto de = dirEntry(f);
if (de.exists && de.isFile && de.timeLastModified < d) ...

is just one.

I see no such benefit in the proposed Path type.
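
A compilable sketch of the DirEntry pattern, assuming a Phobos version in which the std.file.DirEntry struct is constructed directly from a path. The function name isFileOlderThan is hypothetical. DirEntry has no exists member, so existence is checked separately first; how many stat() calls result depends on the platform and Phobos version, so the "one lookup" figure is not something this sketch verifies.

```d
import std.datetime : SysTime;
import std.file : DirEntry, exists;

/// Hypothetical helper: true if `path` is an existing regular file
/// last modified before `cutoff`. The DirEntry caches the file's
/// attributes, so isFile and timeLastModified share the same lookup.
bool isFileOlderThan(string path, SysTime cutoff)
{
    if (!exists(path))
        return false;
    auto entry = DirEntry(path);
    return entry.isFile && entry.timeLastModified < cutoff;
}
```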

Path and dirEntry are different modules with different goals to fulfill.
I don't think it's appropriate to compare a module whose function is
path manipulation with one whose is querying filesystem information.

Yeah, my fault.  I didn't take the time to look at the proposed module in
detail.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/

Jun 07 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 6 June 2013 at 19:29:08 UTC, Jonathan M Davis wrote:
On Thursday, June 06, 2013 14:38:41 Andrei Alexandrescu wrote:
On 6/6/13 2:13 PM, Jonathan M Davis wrote:
An example of a strong justification for a redo is, for
example,
conversion
to use ranges. std.zip needs that treatment.

Agreed.

Key to success for Path: somehow get it on the ranges
bandwagon :o).

LOL. Well, given that strings are _already_ ranges, that
wouldn't help it
anywhere near as much as it does with other cases of code
breakage, since

- Jonathan M Davis

I think using string as the main form of representation for a
path is fine.

However, there are times where it is convenient to be able to
explode a path into a structure, where each part is clearly
separate from the next. This makes it easy to do certain
otherwise hard to do operations. eg:

Change:
C:\Users\Monarch\Docs\MyFile.txt
to
D:\Users\Monarch\MyFile.txt

Regexes are fun and all, but they do come with their own
complications, and pitfalls. And they *do* require efforts to
write. Or use the existing interface. It works, I won't argue
against it, but I do find times where it is kind of clunky.

I'd be in favor of having a "Path" object, if only for being able
to help in the construction or modification of string paths.

For example, I imagine something as:
string oldPath = `C:\Users\Monarch\Docs\MyFile.txt`;
Path   myPath  = Path(oldPath);
myPath.drive = 'D';
myPath.folders = myPath.folders[0 .. $ - 1];
string newPath = myPath.build;

I think it would be useful to have that. None of the existing
interfaces change. It's just an optional tool that I think would
be convenient.

--------

If I may present an analogy: C deals with "time" using the
arithmetic "time_t" primitive. It works, is mostly convenient,
and is the standard API. Still, C also proposes the "struct tm",
which is a time, exploded into year/month/day/hours/min/sec.

You can do nothing with this type, except, well read and write to
it, and convert it back to/from time_t. Yet, it has its uses, if
only being presented in a way that might be more natural to
manipulate. And that is reason enough for its existence.

Jun 07 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a
path is fine.

However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate
from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string
basename, string extension);

Andrei

Yeah. That's pretty much more or less what I was describing.
Except "buildPath" would take your (unnamed) tuple type directly.

There'd also be a "filename" member/ufcs function in there for
convenience.

I think that would be a small, but useful, addition to std.path.
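
A minimal sketch of how such a decomposition could be built on top of the existing std.path functions. Everything named here (PathParts, parsePathSketch, rebuildSketch) is hypothetical and only mirrors the tuple layout quoted above; it assumes the path names a file and it allocates an array for the folder segments.

```d
import std.array : array;
import std.path : buildPath, driveName, extension, pathSplitter,
    stripDrive, stripExtension;
import std.typecons : Tuple;

alias PathParts = Tuple!(
    string, "drive",
    string[], "folders",
    string, "basename",
    string, "extension");

/// Hypothetical: splits a path into drive, folder segments, stem and
/// extension (the extension keeps its leading dot, as in std.path).
PathParts parsePathSketch(string path)
{
    string[] none;
    auto segments = pathSplitter(stripDrive(path)).array;
    if (segments.length == 0)
        return PathParts(driveName(path), none, "", "");
    auto last = segments[$ - 1];
    return PathParts(driveName(path), segments[0 .. $ - 1],
        stripExtension(last), extension(last));
}

/// Hypothetical inverse: joins the parts back into a path string.
string rebuildSketch(PathParts parts)
{
    string result;
    foreach (seg; parts.folders ~ (parts.basename ~ parts.extension))
        result = result.length ? buildPath(result, seg) : seg;
    return parts.drive ~ result;
}
```

Dropping the last folder is then just parts.folders = parts.folders[0 .. $ - 1] followed by rebuildSketch(parts).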

Jun 07 2013
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 7 June 2013 at 18:26:42 UTC, Andrei Alexandrescu wrote:
On 6/7/13 2:10 PM, monarch_dodra wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu
wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for
a path is
fine.

However, there are times where it is convenient to be able
to explode a
path into a structure, where each part is clearly separate
from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string
basename,
string extension);

Andrei

Yeah. That's pretty much more or less what I was describing.
Except
"buildPath" would take your (unnamed) tuple type directly.

No, the version I wrote is more flexible. You get to pass
separate arguments to it or just pass a tuple with .expand.

buildPath(parsePath("/bin/sh").expand)

should rebuild "/bin/sh".

There'd also be a "filename" member/ufcs function in there
for
convenience.

I think that would be a small, but useful, addition to
std.path.

Me 2.

Andrei

An overload for buildPath that took the tuple directly would be
good. Typing expand all the time would get tiresome if you were
doing lots of this.

Jun 07 2013
On 6/6/13 1:02 PM, Michel Fortin wrote:
and Apple has Case-sensitive HFS+ for OS X and it's the default on iOS.

Careful.. While HFS+ can be case sensitive, it's not by default.  Nor is it
recommended due to the
number of osx applications that just aren't designed with that in mind.

Jun 07 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu wrote:
On 6/7/13 1:04 PM, monarch_dodra wrote:
I think using string as the main form of representation for a
path is fine.

However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate
from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string
basename, string extension);

This is a good idea.  Not only is it convenient, but as there is
a lot of overlap in the work done by the various path
decomposition functions, it will also improve performance when
you need the results of several of them.

But why stop at the parts you have listed there?  Why not offer
every possible decomposition the user could ever want?  It's
about the same amount of work, because the number of "split
points" you need to find is exactly the same.

Splitting the directory part into separate segments should be
optional, since it allocates.

DecomposedPath!(inout(C)) decompose(inout(C)[] path, bool
splitDir = true);

struct DecomposedPath(C) if (isSomeChar!C)
{
C[] driveName;      /// Equal to driveName()
C[] dirName;        /// Equal to dirName()
C[] noDriveDir;     /// Equal to dirName().stripDrive()
C[] rootName;       /// Equal to rootName()
C[] baseName;       /// Equal to baseName()
C[] stem;           /// Equal to baseName().stripExtension()
C[] extension;      /// Equal to extension()

/// Equal to dirName().pathSplitter().array()  (optional)
C[][] dirSegments;
}

Jun 08 2013
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad
wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu
wrote:
However, there are times where it is convenient to be able to
explode a
path into a structure, where each part is clearly separate
from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string
basename, string extension);

[...]

But why stop at the parts you have listed there?

The moment I clicked "Send", I realised that offering multiple
decompositions would prevent recomposition, because you'd have to
choose which parts to combine.

Jun 08 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 8 June 2013 at 14:14:33 UTC, Lars T. Kyllingstad
wrote:
On Saturday, 8 June 2013 at 14:08:59 UTC, Lars T. Kyllingstad
wrote:
On Friday, 7 June 2013 at 17:27:16 UTC, Andrei Alexandrescu
wrote:
However, there are times where it is convenient to be able
to explode a
path into a structure, where each part is clearly separate
from the
next.

Tuple!(
string, "drive",
string[], "folders",
string, "basename",
string, "extension"
)
parsePath(string path);

string buildPath(string drive, string[] folders, string
basename, string extension);

[...]

But why stop at the parts you have listed there?

The moment I clicked "Send", I realised that offering multiple
decompositions would prevent recomposition, because you'd have
to choose which parts to combine.

Using D's property functions, this should not actually be a
problem. The struct could be opaque in regards to which members
are actually attributes, and which are functions.

Eg:
Path path = Path(`C:\MyFile.txt`);
path.filename  = "main.cpp";
path.extension = "d";
assert(path.buildPath() == `C:\main.d`);

I don't see any reason for that to not work.
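
A rough sketch of the property-function idea, assuming a hypothetical PathSketch struct that stores a single string and derives every part on demand with the existing std.path functions. Because nothing is stored twice, there is no ambiguity about which parts to recombine. The helper renameSketch is also hypothetical.

```d
static import std.path;

/// Hypothetical wrapper: one stored string, parts computed on demand.
struct PathSketch
{
    string value;

    @property string filename() { return std.path.baseName(value); }

    /// Replaces the last segment, keeping the directory part.
    @property void filename(string name)
    {
        value = std.path.buildPath(std.path.dirName(value), name);
    }

    @property string extension() { return std.path.extension(value); }

    /// Replaces (or adds) the extension; a leading dot is optional.
    @property void extension(string ext)
    {
        value = std.path.setExtension(value, ext);
    }
}

/// Hypothetical usage helper mirroring the example above.
string renameSketch(string path, string newName, string newExt)
{
    auto p = PathSketch(path);
    p.filename = newName;
    p.extension = newExt;
    return p.value;
}
```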

Jun 08 2013