digitalmars.D.bugs - [Issue 7002] New: std.path needs a isValidFilePath function

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

Summary: std.path needs a isValidFilePath function
Product: D
Version: D2
Platform: Other
OS/Version: Windows
Status: NEW
Severity: enhancement
Priority: P2
Component: Phobos
AssignedTo: nobody puremagic.com
ReportedBy: andrej.mitrovich gmail.com

--- Comment #0 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-24
13:10:38 PST ---
I currently see no way of verifying that a given path is a path to a *file* and
not a folder, IOW I want this assert to pass:

assert(isValidFilePath(r"C:\folder\file"))

but the following one to fail, because 'file' in this case is syntactically a
folder:

assert(isValidFilePath(r"C:\folder\file\"))

Using basename and isValidFilename isn't a workaround as basename strips the
slash:

auto file = r"C:\folder\file\";
assert(file.baseName.isValidFilename);  // passes, baseName returns "file"

Either we implement a isValidFilePath function, or a "fileName" function which
could be used like so:

auto file = r"C:\folder\file\";
assert(file.fileName.isValidFilename);  // this would fail, as expected

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 24 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

----------------------------------------------------------------------------
Status|NEW                         |RESOLVED
Resolution|                            |INVALID

--- Comment #1 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-26
00:57:48 PST ---
Err, no actually what happened was I was reading:
isValidFilename

and the docs said it verifies *directories* as well, but it doesn't. It
verifies filenames only, and this is what I need.

The docs confused me, sorry.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 26 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

----------------------------------------------------------------------------
Status|RESOLVED                    |REOPENED
Resolution|INVALID                     |

--- Comment #2 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-26
01:11:38 PST ---
Double confusion for the win.

isValidFilename checks filenames, not file paths. I was right the first time.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 26 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

Jonathan M Davis <jmdavisProg gmx.com> changed:

----------------------------------------------------------------------------
CC|                            |jmdavisProg gmx.com
Platform|Other                       |All
OS/Version|Windows                     |All

--- Comment #3 from Jonathan M Davis <jmdavisProg gmx.com> 2011-11-27 20:14:40
PST ---
The _only_ way to know for sure if a path points to a file is to use std.file -
typically either std.file.isFile on a string or isFile on a std.file.DirEntry.

The fact that a path does not end in a directory separator does not say
anything about whether it's a directory or a file. As such, depending on
whether it does or not seems like a bad idea. A path which _does_ end in a
directory separator _is_ directory (assuming that the path is valid), but the
lack of one means nothing.

isValidFilename essentially assumes that it's getting the base name of a file
or directory and verifies that it's valid. isValidPath essentilally splits on
the directory separator and calls isValidFilename on each of those components
(I believe that it's somewhat more complicated than that, but that's
essentially what it's doing. So, isValidFilename is going to return false if
the name that it's given contains a directory separator and isValidPath doesn't
care about the directory separator, since it's effectively splitting on it. In
_neither_ case are they concerned about whether ending in a directory separator
or not indicates a file or a directory.

So, if you want a function which returns whether a particular path is
definitively for a directory or not (i.e. ends in a directory separator), then
a new one needs to be created as opposed to changing what any of the current
functions are doing. But it seems to me that if you want to do _that_, all you
need to do is

assert(path.endsWith(dirSeparator));

What are you looking for beyond that?

As I said, the _lack_ of a directory separator doesn't mean that the path is
for a file, so you can only _really_ know if you use std.file.isFile. So, given
that and the fact that endsWith will tell you whether a path ends with a
directory separator, I'm not sure what more you're really looking for here.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

--- Comment #4 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-27
20:30:54 PST ---
I don't need to know if it's a file, that's std.file's business. I'm talking

I want to reject paths that are clearly directories and not files. In other
words, *accept* paths which are valid file paths. I know well that a file can
be extensionless.

assert(path.endsWith(dirSeparator)) is what I want (but a call to isValidPath
beforehand is also required). I think it's standard enough to be included as a
"isValidFilePath" function.

========

Consider a build system where the user can specify a path to an output file,
even if that path or file might not exist:

build --output=C:\dir\name   <- pass
build --output=C:\dir\name\  <- reject this

I can't use isValidFilename here, it will reject both of these cases because it
only works on "name" and "name.ext" variants, if it sees any separators it
rejects it.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

--- Comment #5 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-27
20:39:12 PST ---
One more thing, your solution doesn't work because a user might pass a
different dir separator, IOW on Windows this would pass while it should be
rejected:
C:\dir\name/

It needs to be in the library so people don't reinvent it all the damn time.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

--- Comment #6 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-27
20:42:34 PST ---
bool isValidFilePath(string filePath)
{
immutable end = filePath[$-1]; return (filePath.isValidPath && end != '/' && end != '\\' && filePath.baseName.isValidFilename); } But I can't tell if this is 100% right. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------  Nov 27 2011 d-bugmail puremagic.com writes: http://d.puremagic.com/issues/show_bug.cgi?id=7002 --- Comment #7 from Jonathan M Davis <jmdavisProg gmx.com> 2011-11-27 20:56:35 PST --- My point was that the lack of a directory separator on the end says _nothing_ about whether a path refers to a directory or a file. I can do both /home/jmdavis and /home/jmdavis/ and _both_ refer to a directory on my system. And as for across OSes, I can legally put a backslash in a filename on Linux. It would be very wrong to say that there was anything invalid about having one at the end of a file name except on Windows. path.endsWith(dirSeparator) tells you exactly whether the file name ends with a directory separator or not. If you want to check that a path is valid and does not end with a directory separator, then do assert(isValidPath(path) && !path.endsWith(dirSeparator)); I don't see why a new function is needed for this, _especially_ when it's perfectly legal to refer to directories _without_ a directory separator on the end of the path. isValidPath _already_ makes sure that an incorrect directory separator isn't used. I don't see how you can really be using paths alone to try and determine whether a path refers to a file or not when the path doesn't give you enough information to do that. It would make no sense for std.path to try and determine that IMHO precisely because the information isn't there. You need to actually check the disk for that - which is what std.file.isFile is for. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------  Nov 27 2011 d-bugmail puremagic.com writes: http://d.puremagic.com/issues/show_bug.cgi?id=7002 --- Comment #8 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2011-11-27 23:09:18 PST --- I think there's some confusion here. When I say "valid file path" I mean *syntactically* valid, not whether they exist and whether something that looks like a file path is actually a directory on a hard drive. (In reply to comment #7) My point was that the lack of a directory separator on the end says _nothing_ about whether a path refers to a directory or a file. I can do both /home/jmdavis and /home/jmdavis/ and _both_ refer to a directory on my system. I would reject /home/jmdavis/ as a valid file path on Windows only. /home/jdavis would be accepted as a valid file path, even if it's *actually* a directory. I could easily do a std.file.isDir check *after* that if I need to. And as for across OSes, I can legally put a backslash in a filename on Linux. It would be very wrong to say that there was anything invalid about having one at the end of a file name except on Windows. On Linux this would pass as a valid file path. Putting it in the library means it will work in a cross-platform and platform-specific way. Forward slashes are also *valid* on Windows, and yet dirSeparator only returns r"\" on Windows. path.endsWith(dirSeparator) tells you exactly whether the file name ends with a directory separator or not. If you want to check that a path is valid and does not end with a directory separator, then do assert(isValidPath(path) && !path.endsWith(dirSeparator)); Wrong. A forward slash is a valid directory separator on Windows, your assert will pass for this path even though it *can't* be a file path specifier on windows: import std.path; import std.algorithm; void main() { string path = r"C:\foo\bar/"; assert(isValidPath(path) && !path.endsWith(dirSeparator)); } I don't see why a new function is needed for this, _especially_ when it's perfectly legal to refer to directories _without_ a directory separator on the end of the path. The focus is on files, not directories. Read my post with the build example I've pasted before. I don't see how you can really be using paths alone to try and determine whether a path refers to a file or not when the path doesn't give you enough information to do that. It would make no sense for std.path to try and determine that IMHO precisely because the information isn't there. You need to actually check the disk for that - which is what std.file.isFile is for. Of course I don't do it with *just* a syntactical check, but if I can possibly reject a string as syntactically invalid, I'd rather do that than wait for the OS to tell me if it's a file or directory. This could be a useful optimization technique, e.g. if you're working on a large list of paths and you need to quickly verify that they're all syntactically file paths. This could save you time compared to just blindly doing: try { foreach (path; possibleFilePaths) auto file = File(path, "w"); // could fail if it's a directory string } catch { } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------  Nov 27 2011 d-bugmail puremagic.com writes: http://d.puremagic.com/issues/show_bug.cgi?id=7002 --- Comment #9 from Jonathan M Davis <jmdavisProg gmx.com> 2011-11-27 23:27:50 PST --- Wrong. A forward slash is a valid directory separator on Windows, your assert will pass for this path even though it *can't* be a file path specifier on windows Then use assert(isValidPath(path) && !isDirSeparator(path[$-]);

Of course I don't do it with *just* a syntactical check, but if I can possibly
reject a string as syntactically invalid, I'd rather do that than wait for the
OS to tell me if it's a file or directory.

This could be a useful optimization technique, e.g. if you're working on a
large list of paths and you need to quickly verify that they're all
syntactically file paths.

I'd argue that std.file.isFile is fast enough given that you're going to be
doing I/O anyway and that you're going to need to use it regardless if you want
to verify that a path is a valid file, since the syntactic check is not enough.
So, the extra benefit of verifying that the path doesn't end with a directory
separator is minimal. And if it matters, then do the check that I suggested
above. Personally, I just always use std.file.exists and std.file.isFile. If
you want to avoid trying to open invalid files, you always have to call them
before opening any file anyway.

I seriously question that needing to verify whether a path ends in a directory
separator deserves its own function - even when coupled with verifying that the
path doesn't contain any invalid characters. It's simple enough to do with two
checks if you actually need it, and in the general case, it should be
completely unnecessary, since there are other checks that you need to done
anyway which will catch it. It strikes me as a minor optimization which is
completely unnecessary in the general case.

There's nothing wrong with an enhancement request for it, but I don't think
that such a function merits being added to std.path.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Nov 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

Lars T. Kyllingstad <bugzilla kyllingen.net> changed:

----------------------------------------------------------------------------
CC|                            |bugzilla kyllingen.net

--- Comment #10 from Lars T. Kyllingstad <bugzilla kyllingen.net> 2012-02-01
14:27:36 PST ---
Sorry for coming late to the party, but I totally agree with Jonathan on this.

Such a function does not seem generic enough to warrant inclusion in the
standard library, and std.path already provides the building blocks making it a
one-liner:

{
return isValidPath(path) && isDirSeparator(path[\$-1]);
}

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Feb 01 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

Andrej Mitrovic <andrej.mitrovich gmail.com> changed: