www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - SpanMode uses incorrect terminology (breadth)

reply Ben Davis <entheh cantab.net> writes:
According to http://dlang.org/phobos/std_file.html :

depth
Spans the directory depth-first, i.e. the content of any subdirectory is 
spanned before that subdirectory itself. Useful e.g. when recursively 
deleting files.

breadth
Spans the directory breadth-first, i.e. the content of any subdirectory 
is spanned right after that subdirectory itself.

That's not what breadth-first means at all. See 
http://en.wikipedia.org/wiki/Tree_traversal which has the correct 
definition: breadth-first means ALL directories at one level are 
returned before ANY child of ANY directory is considered.

Wikipedia uses the terms preorder and postorder for what's intended 
here, but those are a bit obtuse. Something like parentFirst and 
childrenFirst would get the message across clearly.

I don't much mind what the names change to, but the current completely 
wrong situation really irks me! Deprecate it at once! :P

Ben :)
Sep 16 2012
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 9/16/12, Ben Davis <entheh cantab.net> wrote:
 According to http://dlang.org/phobos/std_file.html :

 depth
 Spans the directory depth-first, i.e. the content of any subdirectory is
 spanned before that subdirectory itself. Useful e.g. when recursively
 deleting files.

 breadth
 Spans the directory breadth-first, i.e. the content of any subdirectory
 is spanned right after that subdirectory itself.
I don't understand why "depth" is used in the first place. If you have "shallow" then the opposite of that is "deep", not "depth". I always make that typo when using spanmode.
Sep 16 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/16/12 2:57 PM, Ben Davis wrote:
 According to http://dlang.org/phobos/std_file.html :

 depth
 Spans the directory depth-first, i.e. the content of any subdirectory is
 spanned before that subdirectory itself. Useful e.g. when recursively
 deleting files.

 breadth
 Spans the directory breadth-first, i.e. the content of any subdirectory
 is spanned right after that subdirectory itself.

 That's not what breadth-first means at all. See
 http://en.wikipedia.org/wiki/Tree_traversal which has the correct
 definition: breadth-first means ALL directories at one level are
 returned before ANY child of ANY directory is considered.

 Wikipedia uses the terms preorder and postorder for what's intended
 here, but those are a bit obtuse. Something like parentFirst and
 childrenFirst would get the message across clearly.

 I don't much mind what the names change to, but the current completely
 wrong situation really irks me! Deprecate it at once! :P

 Ben :)
What would be an example illustrating that "breadth" is doing the wrong thing? Andrei
Sep 16 2012
parent reply "Jesse Phillips" <jessekphillips+D gmail.com> writes:
 What would be an example illustrating that "breadth" is doing 
 the wrong thing?

 Andrei
Linux 64bit: import std.file; import std.stdio; void main() { mkdir("a"); mkdir("a/b"); mkdir("a/c"); mkdir("a/c/z"); std.file.write("a/1.txt", ""); std.file.write("a/2.txt", ""); std.file.write("a/b/1.txt", ""); std.file.write("a/b/2.txt", ""); std.file.write("a/c/1.txt", ""); std.file.write("a/c/z/1.txt", ""); foreach(string file; dirEntries("a/", SpanMode.breadth)) writeln(file); rmdirRecurse("a"); // Expected Approximation // a/2.txt // a/1.txt // a/b // a/c // a/c/1.txt // a/c/z // a/c/z/1.txt // a/b/1.txt // a/b/2.txt // // Actual // a/c // a/c/z // a/c/z/1.txt // a/c/1.txt // a/b // a/b/2.txt // a/b/1.txt // a/2.txt // a/1.txt }
Sep 16 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17-Sep-12 09:30, Jesse Phillips wrote:
 What would be an example illustrating that "breadth" is doing the
 wrong thing?

 Andrei
Shouldn't be hard to add "true" breadth first then. Since it's a stack based visitation one just needs to instead use queue (FIFO) and change code so that it puts all directories on given level into the queue and only then picks next one from queue.
      // Expected Approximation
      // a/2.txt
      // a/1.txt
      // a/b
      // a/c
      // a/c/1.txt
      // a/c/z
      // a/c/z/1.txt
      // a/b/1.txt
      // a/b/2.txt
      //
      // Actual
      // a/c
      // a/c/z
      // a/c/z/1.txt
      // a/c/1.txt
      // a/b
      // a/b/2.txt
      // a/b/1.txt
      // a/2.txt
      // a/1.txt
P.S. Sorry for pinging you by mail. Damn Thunderbird UI caught me by surprise again :( -- Dmitry Olshansky
Sep 17 2012
parent Ben Davis <entheh cantab.net> writes:
On 17/09/2012 19:15, Dmitry Olshansky wrote:
 On 17-Sep-12 09:30, Jesse Phillips wrote:
  >
  >> What would be an example illustrating that "breadth" is doing the
  >> wrong thing?
  >>
  >> Andrei
  >

 Shouldn't be hard to add "true" breadth first then.
 Since it's a stack based visitation one just needs to instead use queue
 (FIFO) and change code so that it puts all directories on given level
 into the queue and only then picks next one from queue.
Yeah, I think that would work for true breadth-first. :) Doubt anyone actually needs that though...?
Sep 17 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/17/12 1:30 AM, Jesse Phillips wrote:
 What would be an example illustrating that "breadth" is doing the
 wrong thing?

 Andrei
Linux 64bit: import std.file; import std.stdio; void main() { mkdir("a"); mkdir("a/b"); mkdir("a/c"); mkdir("a/c/z"); std.file.write("a/1.txt", ""); std.file.write("a/2.txt", ""); std.file.write("a/b/1.txt", ""); std.file.write("a/b/2.txt", ""); std.file.write("a/c/1.txt", ""); std.file.write("a/c/z/1.txt", ""); foreach(string file; dirEntries("a/", SpanMode.breadth)) writeln(file); rmdirRecurse("a"); // Expected Approximation // a/2.txt // a/1.txt // a/b // a/c // a/c/1.txt // a/c/z // a/c/z/1.txt // a/b/1.txt // a/b/2.txt // // Actual // a/c // a/c/z // a/c/z/1.txt // a/c/1.txt // a/b // a/b/2.txt // a/b/1.txt // a/2.txt // a/1.txt }
Thanks, that does seem to be a bug. Please make sure it's in bugzilla. Probably the best way to go is to adjust the behavior so it matches the specification. Andrei
Sep 17 2012
parent reply "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 17 September 2012 at 18:20:55 UTC, Andrei Alexandrescu 
wrote:

 Thanks, that does seem to be a bug. Please make sure it's in 
 bugzilla. Probably the best way to go is to adjust the behavior 
 so it matches the specification.

 Andrei
I should have noted that I'm using ReiserFS, as it use to fail completely to span subdirs in the past. Anyway. Testing on windows shows a smaller issue. http://d.puremagic.com/issues/show_bug.cgi?id=8680 Files show before spanning but directories are shown before spanning itself. a/1.txt a/2.txt a/b a/b\1.txt a/b\2.txt a/c a/c\1.txt a/c\z a/c\z\1.txt
Sep 17 2012
parent reply Ben Davis <entheh cantab.net> writes:
On 17/09/2012 19:47, Jesse Phillips wrote:
 On Monday, 17 September 2012 at 18:20:55 UTC, Andrei Alexandrescu wrote:

 Thanks, that does seem to be a bug. Please make sure it's in bugzilla.
 Probably the best way to go is to adjust the behavior so it matches
 the specification.

 Andrei
I should have noted that I'm using ReiserFS, as it use to fail completely to span subdirs in the past. Anyway. Testing on windows shows a smaller issue. http://d.puremagic.com/issues/show_bug.cgi?id=8680 Files show before spanning but directories are shown before spanning itself. a/1.txt a/2.txt a/b a/b\1.txt a/b\2.txt a/c a/c\1.txt a/c\z a/c\z\1.txt
These results also show a correct preorder depth-first search. The bug report as it stands is misleading; should I post a follow-up on the bug tracker, or should we close it and start again?
Sep 17 2012
parent "Jesse Phillips" <jessek.phillips+D gmail.m> writes:
On Monday, 17 September 2012 at 23:55:17 UTC, Ben Davis wrote:
 On 17/09/2012 19:47, Jesse Phillips wrote:
 a/1.txt
 a/2.txt
 a/b
 a/b\1.txt
 a/b\2.txt
 a/c
 a/c\1.txt
 a/c\z
 a/c\z\1.txt
These results also show a correct preorder depth-first search. The bug report as it stands is misleading; should I post a follow-up on the bug tracker, or should we close it and start again?
You can probably just update. But I fail to see what is missleading? Did I get the expected results wrong? Pre-order depth first does not satisfy breadth first so what am I missin? Sorry Im on a tablet making organization hard.
Sep 17 2012
prev sibling parent reply Ben Davis <entheh cantab.net> writes:
I have a feeling most people have missed the point here. Thanks for the 
example - it's a good one to work with. The 'expected approximation' was 
a bit of a mix of traversal strategies, so I've snipped it out below and 
put my own examples. Hope this helps clarify what I was getting at:

On 17/09/2012 06:30, Jesse Phillips wrote:
 What would be an example illustrating that "breadth" is doing the
 wrong thing?

 Andrei
Linux 64bit: import std.file; import std.stdio; void main() { mkdir("a"); mkdir("a/b"); mkdir("a/c"); mkdir("a/c/z"); std.file.write("a/1.txt", ""); std.file.write("a/2.txt", ""); std.file.write("a/b/1.txt", ""); std.file.write("a/b/2.txt", ""); std.file.write("a/c/1.txt", ""); std.file.write("a/c/z/1.txt", ""); foreach(string file; dirEntries("a/", SpanMode.breadth)) writeln(file); rmdirRecurse("a");
[snip] Here are the CORRECT tree traversal patterns: Breadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...? Depth-first search has two useful variants, preorder and postorder. Preorder depth-first (currently wrongly called SpanMode.breadth): a/b a/b/1.txt a/b/2.txt a/c a/c/z a/c/z/1.txt a/c/1.txt a/1.txt a/2.txt Defining property: every directory is immediately followed by all its children. Postorder depth-first (currently only called SpanMode.depth): a/b/1.txt a/b/2.txt a/b a/c/z/1.txt a/c/z a/c/1.txt a/c a/1.txt a/2.txt Defining property: every directory is immediately *preceded* by all its children. Going back to the 'actual' output you quoted:
      // Actual
      // a/c
      // a/c/z
      // a/c/z/1.txt
      // a/c/1.txt
      // a/b
      // a/b/2.txt
      // a/b/1.txt
      // a/2.txt
      // a/1.txt
This looks like a preorder depth-first iteration (albeit with child order reversed at each level). The implementation is correct in that it matches the spec as documented. The problem is that the spec is using the term 'breadth' wrongly. Here is what I would do to fix it (sorry if I got the syntax wrong): enum SpanMode { shallow, preorder, postorder; deprecated alias preorder breadth; deprecated alias postorder depth; } I initially said that possibly preorder and postorder weren't clear, but they grew on me very quickly. I think people could learn that "preorder means parents first" or "preorder is the one I usually want". Also, these are the standard terms (refer to the Wikipedia link), so once you know them, you can use them for everything, both in and outside of :D. Thanks, Ben :)
Sep 17 2012
next sibling parent "Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Monday, 17 September 2012 at 23:27:22 UTC, Ben Davis wrote:

 Note how the deeper you go, the more spread out the children 
 become. It's ALL children, then ALL grandchildren, then ALL 
 great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful 
 that anyone would want it, surely...?
Ah sorry, yes. I didn't pay too close attention as I've never wanted to traverse a tree in that manner. I'll update the title of the issue, add your comments.
Sep 17 2012
prev sibling parent reply "David Piepgrass" <qwertie256 gmail.com> writes:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note 
 how the deeper you go, the more spread out the children become. 
 It's ALL children, then ALL grandchildren, then ALL 
 great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful 
 that anyone would want it, surely...?
Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Sep 18 2012
next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 18 September 2012 at 14:24:30 UTC, David Piepgrass 
wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note 
 how the deeper you go, the more spread out the children 
 become. It's ALL children, then ALL grandchildren, then ALL 
 great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful 
 that anyone would want it, surely...?
Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
We really need iterative deepening.
Sep 18 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/18/12 10:25 AM, David Piepgrass wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note how the
 deeper you go, the more spread out the children become. It's ALL
 children, then ALL grandchildren, then ALL great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful that
 anyone would want it, surely...?
Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Yes, that was my intent too. Andrei
Sep 18 2012
parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 18 September 2012 at 15:19:08 UTC, Andrei 
Alexandrescu wrote:
 On 9/18/12 10:25 AM, David Piepgrass wrote:
 Actually I prefer breadth-first search when searching the file 
 system. When I search an entire volume, inevitably the 
 (depth-first) search gets stuck in a few giant, deep 
 directories like the source code of Mono or some other cave of 
 source code, you know, something 12 directories deep with 
 100,000 files in it. A breadth-first search would be more 
 likely to find the thing I'm looking for BEFORE going 
 spelunking in these 12-deep caves.
Yes, that was my intent too. Andrei
I just wanted to point out, BFS is generally a pretty BAD idea for searching a file system. You instead probably want to implement DFS, and then alter it slightly to use iterative deepening instead of doing plain breadth-first search. That way you won't be keeping open a million OS file handles.
Oct 08 2012
prev sibling parent reply Ben Davis <entheh cantab.net> writes:
On 18/09/2012 15:25, David Piepgrass wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note how the
 deeper you go, the more spread out the children become. It's ALL
 children, then ALL grandchildren, then ALL great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful that
 anyone would want it, surely...?
Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Fair point. :) So there is a case for implementing breadth-first iteration (or as Mehrdad suggests, iterative deepening - it's just a CPU-memory tradeoff). However, the currently implemented 'breadth' (preorder depth-first) is probably used a lot, and someone might be relying on its current behaviour (e.g. to create a tree view). So how do we proceed without silently breaking existing code? Do we use 'breadthFirst' instead of 'breadth'? Do we just gulp and change it? Do we rename 'SpanMode' to 'TreeMode' or something?
Sep 18 2012
parent reply Ben Davis <entheh cantab.net> writes:
On 18/09/2012 21:08, Ben Davis wrote:
 On 18/09/2012 15:25, David Piepgrass wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note how the
 deeper you go, the more spread out the children become. It's ALL
 children, then ALL grandchildren, then ALL great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful that
 anyone would want it, surely...?
Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Fair point. :) So there is a case for implementing breadth-first iteration (or as Mehrdad suggests, iterative deepening - it's just a CPU-memory tradeoff). However, the currently implemented 'breadth' (preorder depth-first) is probably used a lot, and someone might be relying on its current behaviour (e.g. to create a tree view). So how do we proceed without silently breaking existing code? Do we use 'breadthFirst' instead of 'breadth'? Do we just gulp and change it? Do we rename 'SpanMode' to 'TreeMode' or something?
So maybe we'd have: enum SpanMode { shallow, depthFirstPreorder, depthFirstPostorder, breadthFirst; deprecated alias depthFirstPostorder depth; deprecated alias depthFirstPreorder breadth; } Also don't forget the intent conveyed by the current docs: "depth Spans the directory depth-first, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. breadth Spans the directory breadth-first, i.e. the content of any subdirectory is spanned right after that subdirectory itself." This would need to be rewritten to say something like: "depthFirstPostorder Spans the directory depth-first postorder, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. depthFirstPreorder Spans the directory depth-first preorder, i.e. the content of any subdirectory is spanned right after that subdirectory itself. breadthFirst Spans the directory breadth-first, i.e. all children are returned first, followed by all grandchildren, followed by all great-grandchildren, and so on. Useful e.g. when searching for a file that is likely not to be very deep." I personally won't object if everyone feels comfortable with reusing 'breadth' and changing existing behaviour. I just have a feeling it's generally considered a bad thing. Otherwise I'm all for shortening the names (perhaps by just omitting 'First' from the depth ones). :)
Sep 18 2012
parent Ben Davis <entheh cantab.net> writes:
Another thing worth mentioning occurred to me today.

Whoever implements breadth-first - don't get caught out by symlinks. For 
the depth-first variants, you can just detect them by looking in the 
stack. For the breadth-first, you won't be able to detect them by 
looking in the queue that replaced the stack. :)
Sep 18 2012