digitalmars.D.learn - dirEntries removes entire branches of empty directories

reply Ali Çehreli <acehreli yahoo.com> writes:
In case it matters, the file system is ext4.

1) Create a directory:

   mkdir deleteme

and then run the following program:

import std;

void main() {
     foreach (e; dirEntries(absolutePath("./deleteme"), SpanMode.breadth)) {
         writeln(e.name);
     }
}

Understandably, the top level directory 'deleteme' will not be printed.

2) Make a sub-directory:

   mkdir deleteme/a

Running the program shows no output; 'a' is not visited as a directory 
entry.

3) Create a file inside the sub-directory:

   touch deleteme/a/x

Now the program will show 2 entries; the branch is accessible:

/home/ali/d/./deleteme/a
/home/ali/d/./deleteme/a/x

Imagine a program that wants to make sure the directory structure is 
intact, even the empty directories should exist. Can you think of a 
workaround to achieve that?

Do you think this is buggy behavior for dirEntries?

Ali
Nov 09 2022
next sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
 Running the program shows no output; 'a' is not visited as a 
 directory entry.
That's not what happens for me:

```d
import std.exception;
import std.file;
import std.path;
import std.stdio;

void ls()
{
    foreach (e; dirEntries(absolutePath("./deleteme"), SpanMode.breadth))
    {
        writeln(e.name);
    }
}

void main()
{
    "./deleteme".rmdirRecurse.collectException;
    "./deleteme".mkdir();

    writeln("empty");
    ls();

    writeln("only a directory");
    mkdir("./deleteme/a");
    ls();

    writeln("directory and file");
    std.file.write("./deleteme/a/x", "");
    ls();
}
```

Locally and on run.dlang.io I get:

```
empty
only a directory
/sandbox/./deleteme/a
directory and file
/sandbox/./deleteme/a
/sandbox/./deleteme/a/x
```
Nov 09 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 11/9/22 11:30, Vladimir Panteleev wrote:
 On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
 Running the program shows no output; 'a' is not visited as a directory
 entry.
That's not what happens for me:
Does not happen for me today either. (?) I must have confused myself both with my actual program and with a trivial isolated program that I had written to test it.

Unless others have seen the same behavior yesterday, there is no bug here today. :p

Ali
"walks away with a confused look on his face"
Nov 10 2022
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 10 November 2022 at 16:34:53 UTC, Ali Çehreli wrote:
 On 11/9/22 11:30, Vladimir Panteleev wrote:
 On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli
wrote:
 Running the program shows no output; 'a' is not visited as a
directory
 entry.
That's not what happens for me:
Does not happen for me today either. (?) I must have confused myself both with my actual program and with a trivial isolated program that I had written to test it. Unless others have seen the same behavior yesterday there is no bug here today. :p Ali "walks away with a confused look on his face"
Oh, did you run the program on Wednesday? Fool!
Nov 10 2022
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Nov 10, 2022 at 07:07:33PM +0000, Imperatorn via Digitalmars-d-learn
wrote:
 On Thursday, 10 November 2022 at 16:34:53 UTC, Ali Çehreli wrote:
 On 11/9/22 11:30, Vladimir Panteleev wrote:
 On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli
 wrote:
 Running the program shows no output; 'a' is not visited as a
 directory entry.
That's not what happens for me:
Does not happen for me today either. (?) I must have confused myself both with my actual program and with a trivial isolated program that I had written to test it. Unless others have seen the same behavior yesterday there is no bug here today. :p Ali "walks away with a confused look on his face"
Oh, did you run the program on Wednesday? Fool!
I think it was because yesterday MSFT stock dipped, but today it rose by 15, so Windows is working properly again. :-P

T

--
"You are a very disagreeable person." "NO."
Nov 10 2022
prev sibling next sibling parent kdevel <kdevel vogtner.de> writes:
On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
 In case it matters, the file system is ext4.
My code runs in tmp (tmpfs).
 2) Make a sub-directory:

   mkdir deleteme/a

 Running the program shows no output; 'a' is not visited as a 
 directory entry.
What do strace/ltrace say?

```d
// didi.d
import std.stdio;
import std.file;

void main (string [] args)
{
    auto de = dirEntries (args[1], SpanMode.breadth);
    foreach (e; de)
        writeln(e.name);
}
```

```
$ mkdir -p deleteme/a
$ dmd didi
$ ./didi deleteme
deleteme/a
```
 Do you think this is buggy behavior for dirEntries?
Sure.
Nov 09 2022
prev sibling next sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Wednesday, 9 November 2022 at 19:05:58 UTC, Ali Çehreli wrote:
 In case it matters, the file system is ext4.

 1) Create a directory:

 [...]
That's not the behaviour I get in Windows. When I create the subdirectory, I see it even if it's empty
Nov 09 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 11/9/22 11:48, Imperatorn wrote:

 That's not the behaviour I get in Windows.
Windows users deserve it! :p (At least it is better in this case. :) )
 When I create the subdirectory, I see it even if it's empty
struct DirIteratorImpl has different implementations for Windows, etc. Ali
Nov 09 2022
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Wednesday, 9 November 2022 at 19:59:57 UTC, Ali Çehreli wrote:
 On 11/9/22 11:48, Imperatorn wrote:

 That's not the behaviour I get in Windows.
Windows users deserve it! :p (At least it is better in this case. :) )
 When I create the subdirectory, I see it even if it's empty
struct DirIteratorImpl has different implementations for Windows, etc. Ali
Anyway, it's definitely a bug in that implementation
Nov 09 2022
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 11/9/22 11:05, Ali Çehreli wrote:

 Can you think of a workaround to achieve that?
Me, me, me! :)

I've learned about the Posix function 'nftw' (but I am using its sibling 'ftw').

It was pretty easy to use but there is a quality issue there: They failed to support a 'void*' context for the user! You can walk the tree but can't put the results into your local context! Boo!

I guess it was designed by someone who is happy with global variables. :) At least D makes it easy to guard access to module variables with 'synchronized', shared, etc.

Ali
Nov 09 2022
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Wednesday, 9 November 2022 at 20:06:15 UTC, Ali Çehreli wrote:
 On 11/9/22 11:05, Ali Çehreli wrote:

 It was pretty easy to use but there is a quality issue there: 
 They failed to support a 'void*' context for the user! You can 
 walk the tree but can't put the results into your local 
 context! Boo!
👻
Nov 09 2022
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 11/9/22 12:06, Ali Çehreli wrote:

 I am using its sibling 'ftw'
Now that we know that dirEntries works properly, I decided not to use ftw.

However, ftw performs about twice as fast as dirEntries (despite some common code in the implementation below). I am leaving it here in case somebody finds it useful. (Why don't I put it on github then; ok, some day I will.)

import core.sys.posix.sys.stat;
import std.algorithm;
import std.exception;
import std.file;
import std.path;
import std.range;
import std.string;

// The Posix "file tree walker" function
extern (C)
int ftw(const char *dirpath,
        int function (const char *fpath, const stat_t *sb, int typeflag) fn,
        int nopenfd);

enum TypeFlag {
    FTW_F,   // regular file
    FTW_D,   // directory
    // See 'man nftw' or /usr/include/ftw.h for the other values
}

struct DirectoryEntry {
    string name;
    ulong size;
}

struct WalkResult {
    DirectoryEntry[] entries;
    string[] emptyDirs;
}

WalkResult directoryWalk_ftw(string root) {
    WalkResult impl_() {
        // These have to be 'static' because ftw() does not allow us to pass a
        // context. And that's why this function must only be called from a
        // synchronized block.
        static DirectoryEntry[] entries;
        static string[] dirs;

        entries.length = 0;
        entries.assumeSafeAppend();

        dirs.length = 0;
        dirs.assumeSafeAppend();

        // This is the callback that ftw() uses.
        extern (C)
        int handler(const char *fpath, const stat_t *sb, int typeflag) {
            const path = fpath.fromStringz.idup;

            switch (typeflag) {
            case TypeFlag.FTW_F:
                entries ~= DirectoryEntry(path, sb.st_size);
                break;

            case TypeFlag.FTW_D:
                dirs ~= path;
                break;

            default:
                import std.stdio;
                writefln!"Ignoring type %s file: %s\n(See 'man nftw')b"(
                    path, typeflag);
                break;
            }

            return 0;
        }

        // The tree walk will be faster up-to this "search depth" (See 'man nftw')
        enum nopenfd = 32;

        const ret = ftw(root.toStringz, &handler, nopenfd);
        enforce(ret == 0,
                format!"Failed walking the directory tree at %s; error: %s"(
                    root, ret));

        string[] nonEmptyDirs = chain(entries.map!(e => e.name), dirs)
                                .map!dirName
                                .array
                                .sort
                                .uniq
                                .array;
        sort(dirs);

        string[] emptyDirs = setDifference(dirs, nonEmptyDirs)
                             .array;

        return WalkResult(entries.dup, emptyDirs);
    }

    synchronized {
        return impl_();
    }
}

WalkResult directoryWalk_dirEntries(string root) {
    DirectoryEntry[] entries;
    string[] dirs;

    foreach (entry; dirEntries(root, SpanMode.depth)) {
        if (entry.isDir) {
            dirs ~= entry;

        } else {
            entries ~= DirectoryEntry(entry, entry.getSize);
        }
    }

    string[] nonEmptyDirs = chain(entries.map!(e => e.name), dirs)
                            .map!dirName
                            .array
                            .sort
                            .uniq
                            .array;
    sort(dirs);

    string[] emptyDirs = setDifference(dirs, nonEmptyDirs)
                         .array;

    return WalkResult(entries.dup, emptyDirs);
}

int main(string[] args) {
    import std.datetime.stopwatch;
    import std.stdio;
    import std.path;

    if (args.length != 2) {
        stderr.writefln!"Please provide the directory to walk:\n\n  %s <directory>\n"
            (args[0].baseName);
        return 1;
    }

    const dir = buildNormalizedPath("/home/ali/dlang");

    auto timings = benchmark!({ directoryWalk_ftw(dir); },
                              { directoryWalk_dirEntries(dir); })(10);

    writefln!("ftw       : %s\n" ~
              "dirEntries: %s")(timings[0], timings[1]);

    return 0;
}

Ali
Nov 10 2022
next sibling parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 10 November 2022 at 21:27:28 UTC, Ali Çehreli wrote:
 On 11/9/22 12:06, Ali Çehreli wrote:

 I am using its sibling 'ftw'
Now that we know that dirEntries works properly, I decided not to use ftw. However, ftw performs about twice as fast as dirEntries (despite some common code in the implementation below).
dmd -O compiled patched (see below!) version applied to /usr/bin on my desktop yields:

    ftw       : 363 ms, 750 μs, and 5 [*]
    dirEntries: 18 secs, 831 ms, 738 μs, and 3 [*]

(* = offending units removed)
 [...]
     foreach (entry; dirEntries(root, SpanMode.depth)) {
         if (entry.isDir) {
             dirs ~= entry;

         } else {
             entries ~= DirectoryEntry(entry, entry.getSize);
         }
strace reports that entry.getSize invokes stat on the file a second time. Isn't the stat buf saved in the entry?

This also gives rise to a complication with symlinks pointing to the directory which contains them:

    $ pwd
    /tmp/k/sub
    $ ln -s . foo
    $ ../direntrybenchmark .
    std.file.FileException 8[...]/linux/bin64/../../src/phobos/std/file.d(1150): ./foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo: Too many levels of symbolic links
    [...]
 [...]
     if (args.length != 2) {
         stderr.writefln!"Please provide the directory to 
 walk:\n\n  %s <directory>\n"
             (args[0].baseName);
         return 1;
     }

     const dir = buildNormalizedPath("/home/ali/dlang");
diff --git a/direntrybenchmark.d b/direntrybenchmark.d
index 661df51..a9a5616 100644
--- a/direntrybenchmark.d
+++ b/direntrybenchmark.d
@@ -102,8 +102,9 @@ WalkResult directoryWalk_dirEntries(string root) {
         if (entry.isDir) {
             dirs ~= entry;
 
-        } else {
-            entries ~= DirectoryEntry(entry, entry.getSize);
+        }
+        else {
+            entries ~= DirectoryEntry(entry, 0);
         }
     }
 
@@ -133,7 +134,7 @@ int main(string[] args) {
         return 1;
     }
 
-    const dir = buildNormalizedPath("/home/ali/dlang");
+    const dir = buildNormalizedPath(args[1]);
 
     auto timings = benchmark!({ directoryWalk_ftw(dir); },
                               { directoryWalk_dirEntries(dir); })(10);
Nov 11 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 11/11/22 05:13, kdevel wrote:

 dmd -O compiled patched (see below!) version applied to /usr/bin on my
 desktop
 yields:

 ftw       : 363 ms, 750 μs, and 5 [*]
 dirEntries: 18 secs, 831 ms, 738 μs, and 3 [*]
Great. I did not use -O with my test. It may have to do something with the performance of the hard disk. ftw wins big time. Being just a D binding of a C library function, its compilation should be quick too.
             entries ~= DirectoryEntry(entry, entry.getSize);
         }
strace reports that entry.getSize invokes stat on the file a second time. Isn't the stat buf saved in the entry?
That's my bad. entry.size is the cached version of the file size.
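
A minimal sketch of that difference, assuming a POSIX-like system. `entry.getSize` resolves through UFCS to `std.file.getSize(entry.name)`, which is the extra stat call that strace showed, whereas the `size` property of `DirEntry` reads from the stat data the entry already keeps. The helper name `totalSize` and the scratch directory name are made up for illustration.

```d
import std.file;
import std.path : buildPath;
import std.stdio : writeln;

// Hypothetical helper: sums the sizes of all regular files below 'root'.
ulong totalSize(string root)
{
    ulong total = 0;
    // 'false' asks dirEntries not to descend into symlinked directories.
    foreach (DirEntry entry; dirEntries(root, SpanMode.depth, false))
    {
        if (entry.isFile)
            total += entry.size;   // DirEntry property, not std.file.getSize
    }
    return total;
}

void main()
{
    // Scratch directory for the demonstration (the name is arbitrary).
    const root = buildPath(tempDir, "direntry_size_demo");
    if (root.exists)
        rmdirRecurse(root);
    mkdirRecurse(buildPath(root, "a"));
    scope (exit) rmdirRecurse(root);

    std.file.write(buildPath(root, "a", "x"), "12345");
    std.file.write(buildPath(root, "y"), "123");

    writeln(totalSize(root));
}
```

Running both variants under strace should show whether the second stat per file disappears on a given Phobos version.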
 This also gives rise for a complication with symlinks pointing to the
 directory
 which contain them:

     $ pwd
     /tmp/k/sub
     $ ln -s . foo
     $ ../direntrybenchmark .
 
 std.file.FileException 8[...]/linux/bin64/../../src/phobos/std/file.d(1150): ./foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo/foo: Too many levels of symbolic links

So, ftw does not have that problem? Perhaps because of its default symlink behavior? There is also the more capable nftw, where the caller can specify some flags. And yes, there it is:

    FTW_PHYS
        If set, do not follow symbolic links. (This is what you want.) If not set, symbolic links are followed, but no file is reported twice.

        If FTW_PHYS is not set, but FTW_DEPTH is set, then the function fn() is never called for a directory that would be a descendant of itself.
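
A rough sketch of calling nftw with FTW_PHYS from D, under several assumptions: the binding is written by hand rather than taken from druntime, the numeric values of `FTW_PHYS` and `FTW_SL` are the ones in glibc's /usr/include/ftw.h and may differ on other C libraries, and the names `NftwCallback`, `collect`, `visited` and `symlinks` are invented for the example. It relies on D module-level variables being thread-local by default, which sidesteps the missing user context as long as the walk runs on a single thread.

```d
import core.sys.posix.sys.stat : stat_t;
import std.stdio : writeln;
import std.string : fromStringz, toStringz;

// Extra argument passed to the nftw() callback (see 'man nftw').
struct FTW
{
    int base;   // offset of the file name within fpath
    int level;  // depth relative to the starting directory
}

extern (C) alias NftwCallback =
    int function(const char* fpath, const stat_t* sb, int typeflag, FTW* ftwbuf);

// Hand-written binding; an assumption, not a druntime declaration.
extern (C) int nftw(const char* dirpath, NftwCallback fn, int nopenfd, int flags);

// ASSUMPTION: numeric values as defined by glibc's ftw.h.
enum FTW_PHYS = 1;   // flag: do not follow symbolic links
enum FTW_SL = 4;     // typeflag: the entry is a symbolic link

// Thread-local module variables stand in for the missing user context.
string[] visited;
string[] symlinks;

extern (C) int collect(const char* fpath, const stat_t* sb, int typeflag, FTW* ftwbuf)
{
    const path = fpath.fromStringz.idup;
    if (typeflag == FTW_SL)
        symlinks ~= path;   // reported, but not descended into
    else
        visited ~= path;
    return 0;               // zero tells nftw() to keep walking
}

int main(string[] args)
{
    const root = args.length > 1 ? args[1] : ".";
    const ret = nftw(root.toStringz, &collect, 32, FTW_PHYS);
    writeln("visited : ", visited);
    writeln("symlinks: ", symlinks);
    return ret == 0 ? 0 : 1;
}
```

With a link created by `ln -s . foo`, the link itself should land in `symlinks` instead of being followed.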
 -    const dir = buildNormalizedPath("/home/ali/dlang");
 +    const dir = buildNormalizedPath(args[1]);
That one, and I had switched the arguments on the following call. One more example where string interpolation would be useful:

    writefln!"Ignoring type %s file: %s\n(See 'man nftw')b"(
        path, typeflag);

I meant the arguments in the reverse order there.

OT: And there is a 'b' character at the end of that format string which almost certainly appeared when I botched a Ctrl-b command in my editor. :)

Ali
Nov 11 2022
next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 11/11/22 08:00, Ali Çehreli wrote:

 It may have to do something with the performance of the hard disk.
I meant "the reason you got a much better improvement" may have to do something with the performance differences of your hard disk and mine. Ali
Nov 11 2022
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 11 November 2022 at 16:00:12 UTC, Ali Çehreli wrote:
 On 11/11/22 05:13, kdevel wrote:

 dmd -O compiled patched (see below!) version applied to
/usr/bin on my
 desktop
 yields:

 ftw       : 363 ms, 750 μs, and 5 [*]
 dirEntries: 18 secs, 831 ms, 738 μs, and 3 [*]
Great. I did not use -O with my test. It may have to do something with the performance of the hard disk.
It has to do with the large number of symlinks. When I use

    dirEntries(root, SpanMode.depth, false)

the runtime is dramatically reduced and with

    entries ~= DirectoryEntry(entry, entry.size);

the runtimes are

    ftw       : 98 ms, 470 μs, and 2 *beeep*
    dirEntries: 170 ms, 515 μs, and 2 *beeep*

(to be continued)
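
The effect of passing `false` as the third (`followSymlink`) argument can be tried on the self-referencing link from earlier in the thread. This is only a sketch for POSIX systems: the helper name `walkNoFollow` and the scratch directory are hypothetical, and `std.file.symlink` is used in place of `ln -s`.

```d
import std.file;
import std.path : buildPath;
import std.stdio : writeln;

// Hypothetical helper: collects entry names below 'root' without
// descending into directories that are reached through a symlink.
string[] walkNoFollow(string root)
{
    string[] names;
    foreach (DirEntry entry; dirEntries(root, SpanMode.depth, false))
        names ~= entry.name;
    return names;
}

void main()
{
    // Scratch directory for the demonstration (the name is arbitrary).
    const root = buildPath(tempDir, "direntry_symlink_demo");
    if (root.exists)
        rmdirRecurse(root);
    mkdirRecurse(buildPath(root, "sub"));
    scope (exit) rmdirRecurse(root);

    // Same situation as 'ln -s . foo' run inside 'sub'.
    symlink(".", buildPath(root, "sub", "foo"));

    foreach (name; walkNoFollow(root))
        writeln(name);
}
```

The link is still listed as an entry; it is just not stepped into, so the walk terminates instead of failing with "Too many levels of symbolic links".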
Nov 14 2022
parent reply kdevel <kdevel vogtner.de> writes:
On Monday, 14 November 2022 at 21:05:01 UTC, kdevel wrote:
 [...]
 the runtimes are

    ftw       : 98 ms, 470 μs, and 2 *beeep*
    dirEntries: 170 ms, 515 μs, and 2 *beeep*

 (to be continued)
When I examine the process with strace it appears that the ftw version gets the whole information from readdir alone.

The dirEntries version seems to call lstat on every file (in order to check that it is not a symlink)

    Breakpoint 1, 0xf7cc59d4 in lstat64 () from [...]gcc-12.1/lib/libgphobos.so.3
    (gdb) bt
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    (root=..., dump=false) at direntrybenchmark.d:111

and after that an additional stat on the same file in order to check if it is a directory:

    Breakpoint 2, 0xf7cc5954 in stat64 () from [...]gcc-12.1/lib/libgphobos.so.3
    (gdb) bt
    from [...]gcc-12.1/lib/libgphobos.so.3
    [...]gcc-12.1/lib/libgphobos.so.3
    (root=..., dump=<optimized out>) at direntrybenchmark.d:112
    direntrybenchmark.d:158
    at /md11/sda2-usr2l/gcc-12.1/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/d/std/datetime/stopwatch.d:421
    __applyArg1=...) at direntrybenchmark.d:162
    [...]gcc-12.1/lib/libgphobos.so.3
    direntrybenchmark.d:161
Nov 14 2022
parent Ali Çehreli <acehreli yahoo.com> writes:
On 11/14/22 14:41, kdevel wrote:

 the ftw version gets the whole information from readdir alone.
Created an enhancement request: https://issues.dlang.org/show_bug.cgi?id=23512 Ali
Nov 26 2022
prev sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 10 November 2022 at 21:27:28 UTC, Ali Çehreli wrote:
 However, ftw performs about twice as fast as dirEntries
Yes, `dirEntries` isn't as fast as it could be. Here is a directory iterator which tries to strictly not do more work than what it must: https://github.com/CyberShadow/ae/blob/86b016fd258ebc26f0da3239a6332c4ebecd3215/sys/file.d#L178
Nov 29 2022