digitalmars.D - Why doesn't std.file.exists follow symbolic links?

Jack Applegame <japplegame gmail.com> writes:
```d
import std.stdio : writeln;
import std.file : exists, write, symlink, remove;

void main() {
     write("file.txt", "hello");
     symlink("file.txt", "link.txt");
     writeln(exists("link.txt")); // true
     remove("file.txt");
     writeln(exists("link.txt")); // true, why?
}
```

In other languages (including C++) similar functions follow 
symbolic links.
Jul 02
jfondren <julian.fondren gmail.com> writes:
On Friday, 2 July 2021 at 12:09:20 UTC, Jack Applegame wrote:
 ```d
 import std.stdio : writeln;
 import std.file : exists, write, symlink, remove;

 void main() {
     write("file.txt", "hello");
     symlink("file.txt", "link.txt");
     writeln(exists("link.txt")); // true
     remove("file.txt");
     writeln(exists("link.txt")); // true, why?
 }
 ```

 In other languages (including C++) similar functions follow 
 symbolic links.
Some thoughts:

1. This is a dubious test anyway, as the status of the file can change immediately after the test. If at all possible, the better way to deal with a file system is to "ask for forgiveness" (gracefully react to errors) rather than "ask for permission" (use tests like this and then be surprised by an error that can still happen).

2. Saying that a symlink "doesn't exist" when it clearly does exist could also be confusing.

3. System software that's trying to make secure use of the filesystem should really be using the openat() and other *at syscalls with dir fds. The kernel APIs have developed a lot in the past few decades and one reason I prefer D over traditional 'scripting languages' is that those languages have all refused to track these developments, so e.g. D but not Perl can swap two files atomically (with renameat2), without worrying about race conditions where a process might notice that one of the files doesn't exist.

4. Actually for the reasons above, if std.file.exists were freshly made I think it would also be completely fine to change it if indeed stat() is the more popular underlying call...

5. ... but it's been like this since 2015. So people who wanted to know what the function actually did have already peeked into the library, saw it was lstat() on POSIX, and are now relying on that.

I would resolve this in the direction of clearly documenting the interaction with symlinks.
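For readers who want the link-following behaviour today, here is a minimal sketch of the difference between the two calls, assuming a POSIX system. `existsFollowingLinks` is a hypothetical helper name, not part of Phobos; it calls `stat()` from druntime's POSIX bindings, which resolves symbolic links, whereas `std.file.exists` relies on `lstat()` as noted above.

```d
import core.sys.posix.sys.stat : stat, stat_t;
import std.file : exists, remove, symlink, write;
import std.stdio : writeln;
import std.string : toStringz;

// Hypothetical helper (not a Phobos function): true only if the file that
// `path` ultimately refers to exists, because stat() follows symlinks.
bool existsFollowingLinks(string path)
{
    stat_t buf;
    return stat(path.toStringz, &buf) == 0;
}

void main()
{
    write("file.txt", "hello");
    symlink("file.txt", "link.txt");
    scope (exit) remove("link.txt"); // clean up the dangling link afterwards

    remove("file.txt");

    // exists() inspects the link itself, which is still present.
    writeln(exists("link.txt"));
    // The helper inspects the link's target, which has been removed.
    writeln(existsFollowingLinks("link.txt"));
}
```

Like `exists`, this is still only a snapshot: the result can be stale by the time the caller acts on it.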
Jul 02
Jack Applegame <japplegame gmail.com> writes:
On Friday, 2 July 2021 at 12:32:21 UTC, jfondren wrote:
 On Friday, 2 July 2021 at 12:09:20 UTC, Jack Applegame wrote:
 ```d
 import std.stdio : writeln;
 import std.file : exists, write, symlink, remove;

 void main() {
     write("file.txt", "hello");
     symlink("file.txt", "link.txt");
     writeln(exists("link.txt")); // true
     remove("file.txt");
     writeln(exists("link.txt")); // true, why?
 }
 ```

 In other languages (including C++) similar functions follow 
 symbolic links.
 Some thoughts:

 1. This is a dubious test anyway, as the status of the file can change immediately after the test. If at all possible, the better way to deal with a file system is to "ask for forgiveness" (gracefully react to errors) rather than "ask for permission" (use tests like this and then be surprised by an error that can still happen).
This is a completely different topic. The above code is just a demonstration that `std.file.exists` does not follow symbolic links, and not the real code.
 2. Saying that a symlink "doesn't exist" when it clearly does 
 exist
 could also be confusing.
I do not think so. The symbolic link should be transparent by default.
 3. System software that's trying to make secure use of the 
 filesystem
 should really be using the openat() and other *at syscalls with
 dir fds. The kernel APIs have developed a lot in the past few 
 decades
 and one reason I prefer D over traditional 'scripting 
 languages' is
 that those languages have all refused to track these 
 developments, so
 e.g. D but not Perl can swap two files atomically (with 
 renameat2),
 without worrying about race conditions where a process might 
 notice
 that one of the files doesn't exist.
This is also a completely different topic.
 4. Actually for the reasons above, if std.file.exists were 
 freshly
 made I think it would also be completely fine to change it if 
 indeed
 stat() is the more popular underlying call...

 5. ... but it's been like this since 2015. So people who wanted 
 to know what the function actually did have already peeked into 
 the library, saw it was lstat() on POSIX, and are now relying 
 on that.

 I would resolve this in the direction of clearly documenting 
 the interaction with symlinks.
Maybe you're right. I don't know how to fix this correctly.
Jul 02
jfondren <julian.fondren gmail.com> writes:
On Friday, 2 July 2021 at 14:11:22 UTC, Jack Applegame wrote:
 This is a completely different topic. The above code is just a 
 demonstration that `std.file.exists` does not follow symbolic 
 links, and not the real code.
...
 This is also a completely different topic.
The unifying topic is "there is a correct way to work with the filesystem, and exists() isn't it, so who cares if languages vary on the implementation of a wrong way to work with the filesystem?" A naive user of any implementation of exists() is going to have a lot more to worry about. A non-naive user of it will be aware of how it is implemented.
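A rough sketch of the "ask for forgiveness" style referred to here, as opposed to testing with exists() first. `readOrCreate` and the file name are hypothetical, chosen only for illustration; `readText`, `write` and `FileException` are from std.file.

```d
import std.file : FileException, readText, write;
import std.stdio : writeln;

// Hypothetical helper: try to read the file and react to failure, instead
// of checking for existence beforehand.
string readOrCreate(string path, string defaultContents)
{
    try
    {
        return readText(path);
    }
    catch (FileException e)
    {
        // Reached when the read fails, e.g. for a missing file or a
        // dangling symlink. If the write fails too, that exception
        // propagates to the caller.
        write(path, defaultContents);
        return defaultContents;
    }
}

void main()
{
    writeln(readOrCreate("settings.txt", "default"));
}
```

This is not atomic either: another process could create the file between the failed read and the write. It does, however, handle the error at the point where it actually occurs.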
Jul 02
Jack Applegame <japplegame gmail.com> writes:
On Friday, 2 July 2021 at 15:04:33 UTC, jfondren wrote:
 On Friday, 2 July 2021 at 14:11:22 UTC, Jack Applegame wrote:
 This is a completely different topic. The above code is just a 
 demonstration that `std.file.exists` does not follow symbolic 
 links, and not the real code.
...
 This is also a completely different topic.
 The unifying topic is "there is a correct way to work with the filesystem, and exists() isn't it, so who cares if languages vary on the implementation of a wrong way to work with the filesystem?"
I disagree. In many simple cases, this option is quite acceptable:
```d
void main() {
    try {
        ...
        auto data = readData(file_name);
        ...
    } catch(Exception e) {
        // Fatal error
    }
}

auto readData(string file_name) {
    ...
    if(exists(file_name)) {
        ...
        read_file(file_name);
        ...
    } else {
        ...
        create_file(file_name);
        ...
    }
    ...
}
```
 A naive user of any implementation of exists() is going to have 
 a lot
 more to worry about. A non-naive user of it will be aware of 
 how it is
 implemented.
I am a "naive user" of `exists()` in production and have not encountered any problems with it. Why do people think that any program should be written as if it will work on the International Space Station?
Jul 02
jfondren <julian.fondren gmail.com> writes:
On Friday, 2 July 2021 at 15:30:43 UTC, Jack Applegame wrote:
 I am a "naive user" of `exists()` in production and have not 
 encountered any problems with it.
OK, let's add a third category:

1. someone who uses exists() without an awareness of race conditions. (I argue this person has more to worry about than symlink resolution.)

2. someone who uses exists() with an acceptance of race conditions, but whose familiarity with similar functionality from other languages results in an unpleasant surprise with D. (I argue that this is a documentation problem. Incidentally, stuff like https://github.com/dlang/phobos/blob/master/std/file.d#L1957 should really just be in the generated phobos docs. That's useful information and very much like the topic at hand. Perhaps there are also people who expected exists() to be implemented with access.)

3. someone who distrusts these abstractions of the POSIX API and therefore doesn't use them without confirming exactly how they're implemented. (I've offended you by presenting this as the "non-naive"
Jul 02
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Friday, 2 July 2021 at 12:09:20 UTC, Jack Applegame wrote:
 In other languages (including C++) similar functions follow 
 symbolic links.
To try to answer the "why":

https://github.com/dlang/phobos/pull/1142

Looks like eight years ago I thought that not using lstat would somehow break code in that circumstance, but it's difficult to figure out the details given that the sands of time have eroded the previous iterations of that pull request.
Jul 02