digitalmars.D - SpanMode uses incorrect terminology (breadth)

Ben Davis (18/18) Sep 16 2012 According to http://dlang.org/phobos/std_file.html :

Andrej Mitrovic (4/12) Sep 16 2012 I don't understand why "depth" is used in the first place. If you have
Andrei Alexandrescu (4/22) Sep 16 2012 What would be an example illustrating that "breadth" is doing the wrong

Jesse Phillips (39/42) Sep 16 2012 Linux 64bit:

Dmitry Olshansky (9/34) Sep 17 2012 Shouldn't be hard to add "true" breadth first then.

Ben Davis (3/14) Sep 17 2012 Yeah, I think that would work for true breadth-first. :)

Andrei Alexandrescu (5/48) Sep 17 2012 Thanks, that does seem to be a bug. Please make sure it's in bugzilla.

Jesse Phillips (17/21) Sep 17 2012 I should have noted that I'm using ReiserFS, as it use to fail

Ben Davis (4/25) Sep 17 2012 These results also show a correct preorder depth-first search.

Jesse Phillips (5/19) Sep 17 2012 You can probably just update. But I fail to see what is

Ben Davis (67/98) Sep 17 2012 I have a feeling most people have missed the point here. Thanks for the

Jesse Phillips (4/9) Sep 17 2012 Ah sorry, yes. I didn't pay too close attention as I've never
David Piepgrass (7/23) Sep 18 2012 Actually I prefer breadth-first search when searching the file

Mehrdad (3/28) Sep 18 2012 We really need iterative deepening.
Andrei Alexandrescu (3/26) Sep 18 2012 Yes, that was my intent too.

Mehrdad (8/19) Oct 08 2012 I just wanted to point out, BFS is generally a pretty BAD idea

Ben Davis (10/33) Sep 18 2012 Fair point. :)

Ben Davis (36/70) Sep 18 2012 So maybe we'd have:

Ben Davis (5/5) Sep 18 2012 Another thing worth mentioning occurred to me today.

Ben Davis <entheh cantab.net> writes:

According to http://dlang.org/phobos/std_file.html :

depth
Spans the directory depth-first, i.e. the content of any subdirectory is 
spanned before that subdirectory itself. Useful e.g. when recursively 
deleting files.

breadth
Spans the directory breadth-first, i.e. the content of any subdirectory 
is spanned right after that subdirectory itself.

That's not what breadth-first means at all. See 
http://en.wikipedia.org/wiki/Tree_traversal which has the correct 
definition: breadth-first means ALL directories at one level are 
returned before ANY child of ANY directory is considered.

Wikipedia uses the terms preorder and postorder for what's intended 
here, but those are a bit obtuse. Something like parentFirst and 
childrenFirst would get the message across clearly.

I don't much mind what the names change to, but the current completely 
wrong situation really irks me! Deprecate it at once! :P

Ben :)

Sep 16 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 9/16/12, Ben Davis <entheh cantab.net> wrote:
 According to http://dlang.org/phobos/std_file.html :

 depth
 Spans the directory depth-first, i.e. the content of any subdirectory is
 spanned before that subdirectory itself. Useful e.g. when recursively
 deleting files.

 breadth
 Spans the directory breadth-first, i.e. the content of any subdirectory
 is spanned right after that subdirectory itself.

I don't understand why "depth" is used in the first place. If you have
"shallow" then the opposite of that is "deep", not "depth". I always
make that typo when using spanmode.

Sep 16 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/16/12 2:57 PM, Ben Davis wrote:
 According to http://dlang.org/phobos/std_file.html :

 depth
 Spans the directory depth-first, i.e. the content of any subdirectory is
 spanned before that subdirectory itself. Useful e.g. when recursively
 deleting files.

 breadth
 Spans the directory breadth-first, i.e. the content of any subdirectory
 is spanned right after that subdirectory itself.

 That's not what breadth-first means at all. See
 http://en.wikipedia.org/wiki/Tree_traversal which has the correct
 definition: breadth-first means ALL directories at one level are
 returned before ANY child of ANY directory is considered.

 Wikipedia uses the terms preorder and postorder for what's intended
 here, but those are a bit obtuse. Something like parentFirst and
 childrenFirst would get the message across clearly.

 I don't much mind what the names change to, but the current completely
 wrong situation really irks me! Deprecate it at once! :P

 Ben :)

What would be an example illustrating that "breadth" is doing the wrong 
thing?

Andrei

Sep 16 2012

"Jesse Phillips" <jessekphillips+D gmail.com> writes:

 What would be an example illustrating that "breadth" is doing 
 the wrong thing?

 Andrei

Linux 64bit:

import std.file;
import std.stdio;

void main() {
     mkdir("a");
     mkdir("a/b");
     mkdir("a/c");
     mkdir("a/c/z");
     std.file.write("a/1.txt", "");
     std.file.write("a/2.txt", "");
     std.file.write("a/b/1.txt", "");
     std.file.write("a/b/2.txt", "");
     std.file.write("a/c/1.txt", "");
     std.file.write("a/c/z/1.txt", "");
     foreach(string file; dirEntries("a/", SpanMode.breadth))
         writeln(file);

     rmdirRecurse("a");
     // Expected Approximation
     // a/2.txt
     // a/1.txt
     // a/b
     // a/c
     // a/c/1.txt
     // a/c/z
     // a/c/z/1.txt
     // a/b/1.txt
     // a/b/2.txt
     //
     // Actual
     // a/c
     // a/c/z
     // a/c/z/1.txt
     // a/c/1.txt
     // a/b
     // a/b/2.txt
     // a/b/1.txt
     // a/2.txt
     // a/1.txt
}

Sep 16 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 17-Sep-12 09:30, Jesse Phillips wrote:
 What would be an example illustrating that "breadth" is doing the
 wrong thing?

 Andrei


Shouldn't be hard to add "true" breadth first then.
Since it's a stack based visitation one just needs to instead use queue 
(FIFO) and change code so that it puts all directories on given level 
into the queue and only then picks next one from queue.

      // Expected Approximation
      // a/2.txt
      // a/1.txt
      // a/b
      // a/c
      // a/c/1.txt
      // a/c/z
      // a/c/z/1.txt
      // a/b/1.txt
      // a/b/2.txt
      //
      // Actual
      // a/c
      // a/c/z
      // a/c/z/1.txt
      // a/c/1.txt
      // a/b
      // a/b/2.txt
      // a/b/1.txt
      // a/2.txt
      // a/1.txt

P.S. Sorry for pinging you by mail.
Damn Thunderbird UI caught me by surprise again :(

-- 
Dmitry Olshansky

Sep 17 2012

Ben Davis <entheh cantab.net> writes:

On 17/09/2012 19:15, Dmitry Olshansky wrote:
 On 17-Sep-12 09:30, Jesse Phillips wrote:
  >
  >> What would be an example illustrating that "breadth" is doing the
  >> wrong thing?
  >>
  >> Andrei
  >

 Shouldn't be hard to add "true" breadth first then.
 Since it's a stack based visitation one just needs to instead use queue
 (FIFO) and change code so that it puts all directories on given level
 into the queue and only then picks next one from queue.

Yeah, I think that would work for true breadth-first. :)

Doubt anyone actually needs that though...?

Sep 17 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/17/12 1:30 AM, Jesse Phillips wrote:
 What would be an example illustrating that "breadth" is doing the
 wrong thing?

 Andrei

 Linux 64bit:

 import std.file;
 import std.stdio;

 void main() {
 mkdir("a");
 mkdir("a/b");
 mkdir("a/c");
 mkdir("a/c/z");
 std.file.write("a/1.txt", "");
 std.file.write("a/2.txt", "");
 std.file.write("a/b/1.txt", "");
 std.file.write("a/b/2.txt", "");
 std.file.write("a/c/1.txt", "");
 std.file.write("a/c/z/1.txt", "");
 foreach(string file; dirEntries("a/", SpanMode.breadth))
 writeln(file);

 rmdirRecurse("a");
 // Expected Approximation
 // a/2.txt
 // a/1.txt
 // a/b
 // a/c
 // a/c/1.txt
 // a/c/z
 // a/c/z/1.txt
 // a/b/1.txt
 // a/b/2.txt
 //
 // Actual
 // a/c
 // a/c/z
 // a/c/z/1.txt
 // a/c/1.txt
 // a/b
 // a/b/2.txt
 // a/b/1.txt
 // a/2.txt
 // a/1.txt
 }

Thanks, that does seem to be a bug. Please make sure it's in bugzilla. 
Probably the best way to go is to adjust the behavior so it matches the 
specification.

Andrei

Sep 17 2012

"Jesse Phillips" <Jessekphillips+D gmail.com> writes:

On Monday, 17 September 2012 at 18:20:55 UTC, Andrei Alexandrescu 
wrote:

 Thanks, that does seem to be a bug. Please make sure it's in 
 bugzilla. Probably the best way to go is to adjust the behavior 
 so it matches the specification.

 Andrei

I should have noted that I'm using ReiserFS, as it use to fail 
completely to span subdirs in the past. Anyway. Testing on 
windows shows a smaller issue.

http://d.puremagic.com/issues/show_bug.cgi?id=8680

Files show before spanning but directories are shown before 
spanning itself.

a/1.txt
a/2.txt
a/b
a/b\1.txt
a/b\2.txt
a/c
a/c\1.txt
a/c\z
a/c\z\1.txt

Sep 17 2012

Ben Davis <entheh cantab.net> writes:

On 17/09/2012 19:47, Jesse Phillips wrote:
 On Monday, 17 September 2012 at 18:20:55 UTC, Andrei Alexandrescu wrote:

 Thanks, that does seem to be a bug. Please make sure it's in bugzilla.
 Probably the best way to go is to adjust the behavior so it matches
 the specification.

 Andrei

 I should have noted that I'm using ReiserFS, as it use to fail
 completely to span subdirs in the past. Anyway. Testing on windows shows
 a smaller issue.

 http://d.puremagic.com/issues/show_bug.cgi?id=8680

 Files show before spanning but directories are shown before spanning
 itself.

 a/1.txt
 a/2.txt
 a/b
 a/b\1.txt
 a/b\2.txt
 a/c
 a/c\1.txt
 a/c\z
 a/c\z\1.txt

These results also show a correct preorder depth-first search.

The bug report as it stands is misleading; should I post a follow-up on 
the bug tracker, or should we close it and start again?

Sep 17 2012

"Jesse Phillips" <jessek.phillips+D gmail.m> writes:

On Monday, 17 September 2012 at 23:55:17 UTC, Ben Davis wrote:
 On 17/09/2012 19:47, Jesse Phillips wrote:
 a/1.txt
 a/2.txt
 a/b
 a/b\1.txt
 a/b\2.txt
 a/c
 a/c\1.txt
 a/c\z
 a/c\z\1.txt

 These results also show a correct preorder depth-first search.

 The bug report as it stands is misleading; should I post a 
 follow-up on the bug tracker, or should we close it and start 
 again?

You can probably just update. But I fail to see what is 
missleading? Did I get the expected results wrong?  Pre-order 
depth first does not satisfy breadth first so what am I missin? 
Sorry Im on a tablet making organization hard.

Sep 17 2012

Ben Davis <entheh cantab.net> writes:

I have a feeling most people have missed the point here. Thanks for the 
example - it's a good one to work with. The 'expected approximation' was 
a bit of a mix of traversal strategies, so I've snipped it out below and 
put my own examples. Hope this helps clarify what I was getting at:

On 17/09/2012 06:30, Jesse Phillips wrote:
 What would be an example illustrating that "breadth" is doing the
 wrong thing?

 Andrei

 Linux 64bit:

 import std.file;
 import std.stdio;

 void main() {
      mkdir("a");
      mkdir("a/b");
      mkdir("a/c");
      mkdir("a/c/z");
      std.file.write("a/1.txt", "");
      std.file.write("a/2.txt", "");
      std.file.write("a/b/1.txt", "");
      std.file.write("a/b/2.txt", "");
      std.file.write("a/c/1.txt", "");
      std.file.write("a/c/z/1.txt", "");
      foreach(string file; dirEntries("a/", SpanMode.breadth))
          writeln(file);

      rmdirRecurse("a");

[snip]

Here are the CORRECT tree traversal patterns:

Breadth-first (probably never required):
a/b
a/c
a/1.txt
a/2.txt
a/b/1.txt
a/b/2.txt
a/c/z
a/c/1.txt
a/c/z/1.txt
Defining property: number of /'s increases monotonically. Note how the 
deeper you go, the more spread out the children become. It's ALL 
children, then ALL grandchildren, then ALL great-grandchildren, etc.

I wouldn't bother implementing breadth-first. It's doubtful that anyone 
would want it, surely...?

Depth-first search has two useful variants, preorder and postorder.

Preorder depth-first (currently wrongly called SpanMode.breadth):
a/b
a/b/1.txt
a/b/2.txt
a/c
a/c/z
a/c/z/1.txt
a/c/1.txt
a/1.txt
a/2.txt
Defining property: every directory is immediately followed by all its 
children.

Postorder depth-first (currently only called SpanMode.depth):
a/b/1.txt
a/b/2.txt
a/b
a/c/z/1.txt
a/c/z
a/c/1.txt
a/c
a/1.txt
a/2.txt
Defining property: every directory is immediately *preceded* by all its 
children.

Going back to the 'actual' output you quoted:

      // Actual
      // a/c
      // a/c/z
      // a/c/z/1.txt
      // a/c/1.txt
      // a/b
      // a/b/2.txt
      // a/b/1.txt
      // a/2.txt
      // a/1.txt

This looks like a preorder depth-first iteration (albeit with child 
order reversed at each level). The implementation is correct in that it 
matches the spec as documented. The problem is that the spec is using 
the term 'breadth' wrongly.

Here is what I would do to fix it (sorry if I got the syntax wrong):

enum SpanMode {
   shallow,
   preorder,
   postorder;

    deprecated alias preorder breadth;
    deprecated alias postorder depth;
}

I initially said that possibly preorder and postorder weren't clear, but 
they grew on me very quickly. I think people could learn that "preorder 
means parents first" or "preorder is the one I usually want". Also, 
these are the standard terms (refer to the Wikipedia link), so once you 
know them, you can use them for everything, both in and outside of :D.

Thanks,

Ben :)

Sep 17 2012

"Jesse Phillips" <jessekphillips+D gmail.com> writes:

On Monday, 17 September 2012 at 23:27:22 UTC, Ben Davis wrote:

 Note how the deeper you go, the more spread out the children 
 become. It's ALL children, then ALL grandchildren, then ALL 
 great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful 
 that anyone would want it, surely...?

Ah sorry, yes. I didn't pay too close attention as I've never 
wanted to traverse a tree in that manner.

I'll update the title of the issue, add your comments.

Sep 17 2012

"David Piepgrass" <qwertie256 gmail.com> writes:

 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note 
 how the deeper you go, the more spread out the children become. 
 It's ALL children, then ALL grandchildren, then ALL 
 great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful 
 that anyone would want it, surely...?

Actually I prefer breadth-first search when searching the file 
system. When I search an entire volume, inevitably the 
(depth-first) search gets stuck in a few giant, deep directories 
like the source code of Mono or some other cave of source code, 
you know, something 12 directories deep with 100,000 files in it. 
A breadth-first search would be more likely to find the thing I'm 
looking for BEFORE going spelunking in these 12-deep caves.

Sep 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Tuesday, 18 September 2012 at 14:24:30 UTC, David Piepgrass 
wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note 
 how the deeper you go, the more spread out the children 
 become. It's ALL children, then ALL grandchildren, then ALL 
 great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful 
 that anyone would want it, surely...?

 Actually I prefer breadth-first search when searching the file 
 system. When I search an entire volume, inevitably the 
 (depth-first) search gets stuck in a few giant, deep 
 directories like the source code of Mono or some other cave of 
 source code, you know, something 12 directories deep with 
 100,000 files in it. A breadth-first search would be more 
 likely to find the thing I'm looking for BEFORE going 
 spelunking in these 12-deep caves.

We really need iterative deepening.

Sep 18 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/18/12 10:25 AM, David Piepgrass wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note how the
 deeper you go, the more spread out the children become. It's ALL
 children, then ALL grandchildren, then ALL great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful that
 anyone would want it, surely...?

 Actually I prefer breadth-first search when searching the file system.
 When I search an entire volume, inevitably the (depth-first) search gets
 stuck in a few giant, deep directories like the source code of Mono or
 some other cave of source code, you know, something 12 directories deep
 with 100,000 files in it. A breadth-first search would be more likely to
 find the thing I'm looking for BEFORE going spelunking in these 12-deep
 caves.

Yes, that was my intent too.

Andrei

Sep 18 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Tuesday, 18 September 2012 at 15:19:08 UTC, Andrei 
Alexandrescu wrote:
 On 9/18/12 10:25 AM, David Piepgrass wrote:
 Actually I prefer breadth-first search when searching the file 
 system. When I search an entire volume, inevitably the 
 (depth-first) search gets stuck in a few giant, deep 
 directories like the source code of Mono or some other cave of 
 source code, you know, something 12 directories deep with 
 100,000 files in it. A breadth-first search would be more 
 likely to find the thing I'm looking for BEFORE going 
 spelunking in these 12-deep caves.

 Yes, that was my intent too.

 Andrei



I just wanted to point out, BFS is generally a pretty BAD idea 
for searching a file system.

You instead probably want to implement DFS, and then alter it 
slightly to use iterative deepening instead of doing plain 
breadth-first search.

That way you won't be keeping open a million OS file handles.

Oct 08 2012

Ben Davis <entheh cantab.net> writes:

On 18/09/2012 15:25, David Piepgrass wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note how the
 deeper you go, the more spread out the children become. It's ALL
 children, then ALL grandchildren, then ALL great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful that
 anyone would want it, surely...?

 Actually I prefer breadth-first search when searching the file system.
 When I search an entire volume, inevitably the (depth-first) search gets
 stuck in a few giant, deep directories like the source code of Mono or
 some other cave of source code, you know, something 12 directories deep
 with 100,000 files in it. A breadth-first search would be more likely to
 find the thing I'm looking for BEFORE going spelunking in these 12-deep
 caves.

Fair point. :)

So there is a case for implementing breadth-first iteration (or as 
Mehrdad suggests, iterative deepening - it's just a CPU-memory 
tradeoff). However, the currently implemented 'breadth' (preorder 
depth-first) is probably used a lot, and someone might be relying on its 
current behaviour (e.g. to create a tree view). So how do we proceed 
without silently breaking existing code? Do we use 'breadthFirst' 
instead of 'breadth'? Do we just gulp and change it? Do we rename 
'SpanMode' to 'TreeMode' or something?

Sep 18 2012

Ben Davis <entheh cantab.net> writes:

On 18/09/2012 21:08, Ben Davis wrote:
 On 18/09/2012 15:25, David Piepgrass wrote:
 Breadth-first (probably never required):
 a/b
 a/c
 a/1.txt
 a/2.txt
 a/b/1.txt
 a/b/2.txt
 a/c/z
 a/c/1.txt
 a/c/z/1.txt
 Defining property: number of /'s increases monotonically. Note how the
 deeper you go, the more spread out the children become. It's ALL
 children, then ALL grandchildren, then ALL great-grandchildren, etc.

 I wouldn't bother implementing breadth-first. It's doubtful that
 anyone would want it, surely...?

 Actually I prefer breadth-first search when searching the file system.
 When I search an entire volume, inevitably the (depth-first) search gets
 stuck in a few giant, deep directories like the source code of Mono or
 some other cave of source code, you know, something 12 directories deep
 with 100,000 files in it. A breadth-first search would be more likely to
 find the thing I'm looking for BEFORE going spelunking in these 12-deep
 caves.

 Fair point. :)

 So there is a case for implementing breadth-first iteration (or as
 Mehrdad suggests, iterative deepening - it's just a CPU-memory
 tradeoff). However, the currently implemented 'breadth' (preorder
 depth-first) is probably used a lot, and someone might be relying on its
 current behaviour (e.g. to create a tree view). So how do we proceed
 without silently breaking existing code? Do we use 'breadthFirst'
 instead of 'breadth'? Do we just gulp and change it? Do we rename
 'SpanMode' to 'TreeMode' or something?

So maybe we'd have:

enum SpanMode {
   shallow,
   depthFirstPreorder,
   depthFirstPostorder,
   breadthFirst;

    deprecated alias depthFirstPostorder depth;
    deprecated alias depthFirstPreorder breadth;
}

Also don't forget the intent conveyed by the current docs:

"depth
Spans the directory depth-first, i.e. the content of any subdirectory is 
spanned before that subdirectory itself. Useful e.g. when recursively 
deleting files.

breadth
Spans the directory breadth-first, i.e. the content of any subdirectory 
is spanned right after that subdirectory itself."

This would need to be rewritten to say something like:

"depthFirstPostorder
Spans the directory depth-first postorder, i.e. the content of any 
subdirectory is spanned before that subdirectory itself. Useful e.g. 
when recursively deleting files.

depthFirstPreorder
Spans the directory depth-first preorder, i.e. the content of any 
subdirectory is spanned right after that subdirectory itself.

breadthFirst
Spans the directory breadth-first, i.e. all children are returned first, 
followed by all grandchildren, followed by all great-grandchildren, and 
so on. Useful e.g. when searching for a file that is likely not to be 
very deep."

I personally won't object if everyone feels comfortable with reusing 
'breadth' and changing existing behaviour. I just have a feeling it's 
generally considered a bad thing. Otherwise I'm all for shortening the 
names (perhaps by just omitting 'First' from the depth ones).

:)

Sep 18 2012

Ben Davis <entheh cantab.net> writes:

Another thing worth mentioning occurred to me today.

Whoever implements breadth-first - don't get caught out by symlinks. For 
the depth-first variants, you can just detect them by looking in the 
stack. For the breadth-first, you won't be able to detect them by 
looking in the queue that replaced the stack. :)

Sep 18 2012

D Programming

C/C++ Programming

Other

digitalmars.D - SpanMode uses incorrect terminology (breadth)