www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 3848] New: functions in std.file don't take symbolic links into account

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848

           Summary: functions in std.file don't take symbolic links into
                    account
           Product: D
           Version: 2.040
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: jmdavisProg gmail.com



00:24:31 PST ---
Created an attachment (id=572)
Changes for std.file.d to make it work with symlinks.

At present, if you call isfile() or isdir() on a symbolic link, it reports on
what the link points to rather than the link itself. Presumably this is fine on
Windows since as far as I know, Windows doesn't have symbolic links. But it is
a big problem for Linux.

The other problem is rmdirRecursive(). At present, I believe that it currently
would follow symbolic links to directories rather than just deleting the
symlink itself. Ideally, it would not follow symlinks.



I have included a patch which I believe fixes the problems. Whether it's the
best solution to the problem, I don't know. I added getLinkAttributes() which
uses lstat instead of stat and make isdir() and isfile() use
getLinkAttributes() rather than getAttributes(). I also added issymlink() to go
with isdir() and isfile(). DirEntry was changed to use lstat as well. That
should fix, isfile(), isdir(), and rmdirRecursive(). Most of the other
functions like lastModified() continue to use fstat, so they continue to give
information on the file which is pointed to rather than the symlink, which I
think is what we want.

The one wart is that having DirEntry use lstat means that all of the attributes
that it gives are for the symlink itself which is not always what we want. So,
I added a followLink() function which returns a DirEntry with the values for
stat if it's a symbolic link or just returns a copy of the DirEntry if it's
not. followLink() can almost certainly been done in a cleaner manner. I'd have
used init, but I don't know how to get dirent for a single file rather than a
whole directory. So, it uses stat and then checks the mode to figure out which
value to set d_type for. DT_WHT is currently not used (it's commented out)
because I have no clue what it represents, so I don't know what mode value from
stat it corresponds to.

In any case, if my patch isn't quite what you'd want to use, hopefully it's at
least a step in the right direction. Hopefully the patch is in an appropriate
format. It has changes for std.file.d

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 24 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com



05:11:32 PST ---
Thanks, I'll look into it. Just like you, I'm unclear whether isdir() vs.
isfile() should automatically dereference the link. I think we should keep the
current behavior and add an islink() test that tells whether the thing is
actually a link.

I think removal should remove the actual link.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 24 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848




08:46:30 PST ---
The main problem with having isdir() and isfile() continuing to use stat, and
having the programmer just call issymlink() to check whether it's a link is
that that's a call to stat _and_ a call to lstat, and from what I understand
stat and lstat are a bit expensive (presumably because they have to talk to
disk). Of course, if there's a high chance that a programmer will have to check
whether a symlink points to a file or a directory, then they'll end up calling
both isdir()/isfile() and issymlink() and so will have a call to both stat and
lstat anyway, so it may be better to keep isdir() and isfile() using stat.
Really, it depends on the most likely use case. rmdirRecursive() is definitely
more efficient using just lstat though.

Personally, I wish that there were a function which returned information for
the file which is pointed to while telling you whether it's a symlink or not,
but unfortunately, from what I've found, there is no such beast.

From an efficiency perspective, it would be nice if there were a function which
returned a DirEntry for a single file so that you can query that as to whether
a file is a file, dir, or symlink without calling stat or lstat every time you
call isfile(), isdir(), or issymlink(), but I couldn't figure out how to do
that, so I don't know if it can be reasonably done (since presumably it would
be silly and expensive to get the contents of the directory holding the file
just to get the DirEntry for the file itself).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 24 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848




23:31:35 PST ---
Actually, after playing around with this a bit more, I think that I'm leaning
towards it being more useful for isdir() and isfile() to use stat. It can
result in less efficient code, but it's a lot easier to deal with the symlinks
that way. If isdir() and isfile() use lstat, you have to call getAttributes()
on the symlink to find out what it is, and that means anding the return value
with the appropriate flags, and that's not terribly user-friendly.

In fact, it would actually be nice if getAttributes() returned an enum or
struct rather a uint. I believe that it's quite OS-dependent as it is, and you
have to look at file.d to have any clue what the return value represents.
Ideally, you wouldn't have to look a function's source code to figure out how
to deal with its return value. But the whole issue of getAttributes() is more
or less a separate (albeit connected) issue.

In any case, I think that from a user-friendliness perspective, you're right in
your assessment that it would be better to have isdir() and isfile() use stat
rather than lstat. It does, however, mean that you have to be that much more
careful with isdir() or we risk recursive issues or removing stuff in the
directory that it points to rather than just the link in functions like
rmdirRecursive().

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 24 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



00:32:57 PST ---
Created an attachment (id=573)
Next attempt at fixing problem.

Okay. I tried to fix this more cleanly this time. The result is that DirEntry
has changed a fair bit, but now isdir() and isfile() use stat rather than
lstat, and you can now get a DirEntry for a specific file without getting all
the DirEntries for a directory. I don't know whether it will be to your liking,
and I have not tested the Windows code at all, but from the testing I did of
the linux code, it appears to work (though it could always use more unittests).

By making DirEntry as lazy/efficient as possible and making it possible to get
one for any file, it should fix the efficiency issue of calling both stat and
lstat as much as is possible (though it would still be nice if there were a
posix function which did stat but also told you whether it was a symlink).

There are also versions of isdir(), isfile(), and issymlink() which take an
attributes (uint) value to make them the attributes values easier to use and
facilitate rmdirRecursive() only calling lstat. I didn't add any for block
devices and such, but perhaps they would be good to add as well.

DirEntry is using properties now with its member variables being private,
allowing for guarantees as to the state of its members. I don't know whether
you'd consider that a problem or not, but I think that it's cleaner.

I have attached the updated std.file.d.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 26 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848




00:35:59 PST ---
Created an attachment (id=574)
Patch file for next attempt at fixing problem (instead of the full file).

Here's a patch if you'd prefer that to the fully changed file. You'd need to
apply dos2unix to the original file.d first for it work.

Hopefully this at least comes close to solving the problem for you and saves
you some work. Thanks.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 26 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848




02:43:05 PST ---
I would point out that my changes appear to fix bug 1635, so fixing this bug
could result in that one getting fixed at the same time.

I also noticed that I should have named the direntry() function dirEntry(). I
was not paying enough attention to the capitalization for dirEntries() and
thought that it was direntries(), so my attempt to make their capitalization
match failed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848




07:04:11 PST ---

 Thanks, I'll look into it. Just like you, I'm unclear whether isdir() vs.
 isfile() should automatically dereference the link. I think we should keep the
 current behavior and add an islink() test that tells whether the thing is
 actually a link.
Actually windows gone the same way: if you want to examine symlink, you should state it explicitly.
 The main problem with having isdir() and isfile() continuing to use stat, and
 having the programmer just call issymlink() to check whether it's a link is
 that that's a call to stat _and_ a call to lstat, and from what I understand
 stat and lstat are a bit expensive (presumably because they have to talk to
 disk).
They don't talk to disk, they talk to disk cache. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 01 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



23:39:06 PST ---
Created an attachment (id=577)
Now fixes DirIterator and listdir() as well.

I don't know anything about the whole reading from disk reading from disk cache
thing, but it's still I/O of some variety and, as I understand it, is not a
cheap operation, so it's desirable to avoid extraneous stat and lstat calls
regardless.

Also, I figured out that DirIterator and listdir() did not deal with symlinks
correctly. In particular, they always followed them, which could be very bad in
some situations. So, I'm attaching another version which fixes that problem. It
also makes DirEntry on Linux a bit better. It occurred to me that we might as
well just save the result from stat rather than copying some of its values to
member variables to be used for the properties. That way, we can make the
resulting struct available. It makes the DirEntry struct a bit larger, but it
seems useful enough to me that it should be worth it. However, I didn't add any
more of stat's properties (like inode or block size) to DirEntry since those
are likely used rarely enough that the programmer can just access the stat
struct in such cases. That and it allows DirEntry's API to be mostly the same
between platforms.

As far as I've determined, this fix works on Linux (certainly, all of the unit
tests pass, and all of the other testing that I've done looks good), but I
haven't tested the Windows code at all. Most of the changes are to the Posix
side though, so odds are that the Windows side is okay.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 02 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



23:40:53 PST ---
Created an attachment (id=578)
Patch version of "Now fixes DirIterator and listdir() as well."

Here's the patch version of the most recent fix. Again, you'll need to run
dos2unix on the original std.file.d before applying it.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 02 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



23:51:13 PST ---
Created an attachment (id=579)
Current fix with DirIterator and listdir() fixed.

My apologies. I forgot to change rmdirRecurse() back to use SpanMode.depth
after fixing DirIterator. Here's the version with it back at SpanMode.depth.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 02 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



23:52:20 PST ---
Created an attachment (id=580)
Patch for "Current fix with DirIterator and listdir() fixed."

Patch version.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 02 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



22:25:35 PDT ---
Created an attachment (id=670)
Changes based off of dmd 2.047

File after changes using version from dmd 2.047 as the base without running
dos2uinx on it this time.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



22:27:13 PDT ---
Created an attachment (id=671)
Patch with changes based off of dmd 2.047 without needing dos2unix first

Here's the patch version based off of dmd 2.047 but without needing to run
dos2unix on the original first.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



22:31:22 PDT ---
Created an attachment (id=672)
Changes based off of dmd 2.047

Okay. It's mildly dumb to attach this again, but apparently I obsoleted the
wrong file, so I'm reattaching the changed file based on dmd 2.047 in order to
obsolete the correct file (I think that I obsoleted this file when attaching
the patch).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



23:19:37 PDT ---
Created an attachment (id=705)
Changes based off of phobos revision 1817

Various changes have been made to std.file since the release of dmd 2.047, so
it seems prudent to update my patch. Here's the fully updated file, merging my
changes and the changes since dmd 2.047 as of phobos revision 1817.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 07 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



23:22:38 PDT ---
Created an attachment (id=706)
Patch with changes based off of phobos revision 1817

Here's the patch version of the changes.

By the way, is there any particular reason that these changes have not been
applied to Phobos beyond no one taking the time to look at the issue? That is,
is there anything definitively wrong with my patch that I would need to fix? Or
is it just that the Phobos devs are busy with other things and haven't looked
at my patch?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 07 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au




 Created an attachment (id=706) [details]
 Patch with changes based off of phobos revision 1817
 
 Here's the patch version of the changes.
 
 By the way, is there any particular reason that these changes have not been
 applied to Phobos beyond no one taking the time to look at the issue? That is,
 is there anything definitively wrong with my patch that I would need to fix? Or
 is it just that the Phobos devs are busy with other things and haven't looked
 at my patch?
The latter. There's a huge backlog of quality patches -- for the compiler as well. Unfortunately a couple of issues have been grabbing all the attention. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 08 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848




16:05:16 PDT ---
I figured as much, but I didn't want to keep trying to keep my patch up-to-date
if there were definitely a problem with it that made it unacceptable. So, I'll
just keep making sure it's at least reasonably up-to-date until someone gets
around to it. Thanks.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 08 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



PDT ---
Created an attachment (id=769)
Patch with changes based off of phobos revision 2046

Okay, I've updated the patch for dmd 2.049 (the diff is off of phobos svn
r2046, but there aren't any changes between 2.049 and r2046). I also added a


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 23 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



PDT ---
Created an attachment (id=770)
Changes based off of phobos revision 2046

Here's the full file with my changes.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 23 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



PDT ---
Created an attachment (id=795)
Changes based off of dmd 2.050.

Here's the file updated for dmd 2.050. I also added a deprecated alias from
fullname to name (which I should have done in the beginning). Other than that,
it's essentially the same as before except that it's been updated to reflect
the changes in file.d in the official release.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 29 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

           obsolete|                            |



PDT ---
Created an attachment (id=796)
Patch with changes based off of dmd 2.050.

Here's the updated patch.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 29 2010
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3848


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



PST ---
Fixed in revision 2351.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 19 2011