digitalmars.D.learn - prune with dirEntries

"Dan" <dbdavidson yahoo.com> writes:

Is there a way to walk files with std.file.dirEntries such that 
certain directories are skipped (i.e. how to avoid .git 
entirely/recursively)?

Thanks
Dan

Nov 29 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, November 30, 2012 01:24:07 Dan wrote:
 Is there a way to walk files with std.file.dirEntries such that
 certain directories are skipped (i.e. how to avoid .git
 entirely/recursively)?

You can use std.algorithm.filter on its result. Then when it would iterate to 
something which doesn't match filter's predicate, it skips it.

- Jonathan M Davis

Nov 29 2012

"Dan" <dbdavidson yahoo.com> writes:

On Friday, 30 November 2012 at 01:13:13 UTC, Jonathan M Davis 
wrote:
 On Friday, November 30, 2012 01:24:07 Dan wrote:
 Is there a way to walk files with std.file.dirEntries such that
 certain directories are skipped (i.e. how to avoid .git
 entirely/recursively)?

 You can use std.algorithm.filter on its result. Then when it 
 would iterate to
 something which doesn't match filter's predicate, it skips it.

 - Jonathan M Davis

That will do the filtering correctly, but what I was hoping for 
was to actually prune at the directory level and not drill down 
into the files of an unwanted directory (e.g. .git). The problem 
I'm trying to overcome is that the walk still accesses lots of 
files and directories recursively, all of which I want to skip. 
Much like there is a *followSymlink* option, it would be nice if 
dirEntries accepted a *followDirectory* predicate, or offered 
some other way to get that effect.

---------------

// Keep an entry unless its path matches `\.git\b`.
static bool desired(string m) {
   bool unwanted = match(m, _uninterestRe)? true : false;
   writeln("Is unwanted ", m, " ", unwanted);
   return !unwanted;
}
static Regex!(char) _uninterestRe = regex(`\.git\b`);
foreach(entry; filter!(desired)(dirEntries(root, SpanMode.depth))) {
...
}

Nov 29 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, November 30, 2012 02:57:20 Dan wrote:
 On Friday, 30 November 2012 at 01:13:13 UTC, Jonathan M Davis
 
 wrote:
 On Friday, November 30, 2012 01:24:07 Dan wrote:
 Is there a way to walk files with std.file.dirEntries such that
 certain directories are skipped (i.e. how to avoid .git
 entirely/recursively)?

 
 You can use std.algorithm.filter on its result. Then when it
 would iterate to
 something which doesn't match filter's predicate, it skips it.
 
 - Jonathan M Davis

 
 That will do the filtering correctly - but what I was hoping was
 to actually prune at the directory level and not drill down to
 the files in of an unwanted directory (e.g. .git). The problem
 with this and what I'm trying to overcome is accessing lots of
 files and directories recursively all of which I want to skip.
 Much like there is a *followSymlink* it would be nice if a
 predicate were accepted to *followDirectory* in general or some
 way to cause that.
 
 ---------------
 
 static bool desired(string m) {
 bool unwanted = match(m, _uninterestRe)? true : false;
 writeln("Is unwanted ", m, " ", unwanted);
 return !unwanted;
 }
 static Regex!(char) _uninterestRe = regex(`\.git\b`);
 filter!(desired)(dirEntries(root, SpanMode.depth))) {
 ...
 }

You can use the glob matching overload then:

auto dirEntries(string path, string pattern, SpanMode mode,
 bool followSymlink = true)

I don't really know how to use it though, so you'll have to read the docs and 
figure it out.

- Jonathan M Davis

Nov 29 2012
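
For reference, a minimal sketch of the glob overload mentioned above. The current Phobos documentation describes the pattern as filtering the results by file name (using std.path.globMatch), so it selects which entries are returned; it should not be read as a way to stop the walk from descending into .git. The helper name `matching` and the "." root are assumptions made for illustration only.

```d
import std.algorithm : map;
import std.array : array;
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

// Hypothetical helper: names of all entries under root whose
// base name matches the glob pattern.
string[] matching(string root, string pattern)
{
    return dirEntries(root, pattern, SpanMode.depth)
        .map!(e => e.name)
        .array;
}

void main()
{
    // "." is a placeholder root directory.
    foreach (name; matching(".", "*.d"))
        writeln(name);
}
```

Because the pattern only filters results, a tree with a large .git directory is presumably still traversed in full.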

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Friday, 30 November 2012 at 01:57:21 UTC, Dan wrote:
 That will do the filtering correctly - but what I was hoping 
 was to actually prune at the directory level and not drill down 
 to the files in of an unwanted directory (e.g. .git). The 
 problem with this and what I'm trying to overcome is accessing 
 lots of files and directories recursively all of which I want 
 to skip. Much like there is a *followSymlink* it would be nice 
 if a predicate were accepted to *followDirectory* in general or 
 some way to cause that.

what about the following?

import std.algorithm, std.array, std.regex;
import std.stdio, std.file;
void main()
{
   auto exclude = regex(r"\.git", "g");
   dirEntries("/path/GIT", SpanMode.breadth)
     .filter!(a => match(a.name, exclude).empty)
     .writeln();
}

I think if you go breadth first, you can filter out the unwanted 
directories before it delves into them

Nov 29 2012

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus wrote:
 I think if you go breadth first, you can filter out the 
 unwanted directories before it delves into them

Oh wait... it probably still looks through all those dirs.
What about this?

import std.algorithm, std.regex, std.stdio, std.file;
import std.parallelism;
DirEntry[] prune(string path, ref DirEntry[] files)
{
   auto exclude = regex(r"\.git|\.DS_Store", "g");
   foreach(_path; taskPool.parallel(
       dirEntries(path, SpanMode.shallow)
         .filter!(a => match(a.name, exclude).empty)))
   {
     files ~= _path;
     if (isDir(_path.name)) { prune(_path.name, files); }
   }
   return files;
}

void main()
{
   DirEntry[] files;
   prune("/path", files);
   foreach(file;files) { writeln(file.name); }
}

Nov 29 2012

"Dan" <dbdavidson yahoo.com> writes:

On Friday, 30 November 2012 at 07:29:59 UTC, Joshua Niehus wrote:
 On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus 
 wrote:
 I think if you go breadth first, you can filter out the 
 unwanted directories before it delves into them


Good idea, thanks. I could not get the original to compile as is, 
but the concept is just what was needed. I got an error on line 8:
Error: not a property dirEntries(path, cast(SpanMode)0, 
true).filter!(__lambda2)
I'm using a quite recent version of dmd and phobos.

But I pulled the lambda out into a function and it works great. I 
assume the parallel is for performance, but it actually runs 
significantly slower than the serial version on my test case. No 
work is being done other than building the list of files, so that 
is probably normal. For my case the breakdown is:

No Pruning: 11 sec
Pruning Parallel: 4.78 sec
Pruning Serial: 0.377 sec

Thanks
Dan

---------------------
import std.algorithm, std.regex, std.stdio, std.file;
import std.parallelism;

bool interested(DirEntry path) {
   static auto exclude = regex(r"\.git|\.DS_Store", "g");
   return match(path.name, exclude).empty;
}

DirEntry[] prune(string path, ref DirEntry[] files)
{
   static if(0) {
     foreach(_path; taskPool.parallel(
         filter!interested(dirEntries(path, SpanMode.shallow))))  {
       files ~= _path;
       if (isDir(_path.name)) { prune(_path.name, files); }
     }
   } else {
     foreach(_path; filter!(interested)(
         dirEntries(path, SpanMode.shallow)))  {
       files ~= _path;
       if (isDir(_path.name)) { prune(_path.name, files); }
     }
   }
   return files;
}

void main()
{
   DirEntry[] files;
   prune("/path", files);
   //  foreach(file;files) { writeln(file.name); }
}

Nov 30 2012

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Friday, 30 November 2012 at 12:02:51 UTC, Dan wrote:
 Good idea, thanks. I could not get original to compile as is - 
 but the concept is just what was needed. I got an error on line 
 8:
 Error: not a property dirEntries(path, cast(SpanMode)0, 
 true).filter!(__lambda2)
 I'm using a quite recent version of dmd and phobos.

Hmm, strange... I'm using 2.060 (on a Mac).

 But, I pulled the lamda out into a function and it works great. 
 I assume the parallel is for performance, and it actually runs 
 significantly slower than without on my test case - but no work 
 is being done other than build the list of files, so that is 
 probably normal. For my case the breakdown is:

 No Pruning: 11 sec
 Pruning Parallel: 4.78 sec
 Pruning Serial: 0.377 sec

That's cool.
Yeah, I thought parallel would make a big difference (in the 
positive sense) for large directories, but I guess if we are 
recursively spawning parallel tasks, the overhead involved starts 
accumulating, resulting in worse performance (my best guess 
anyway).

Nov 30 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, November 30, 2012 13:02:50 Dan wrote:
 On Friday, 30 November 2012 at 07:29:59 UTC, Joshua Niehus wrote:
 On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus
 
 wrote:
 I think if you go breadth first, you can filter out the
 unwanted directories before it delves into them


 
 Good idea, thanks. I could not get original to compile as is -
 but the concept is just what was needed. I got an error on line 8:
 Error: not a property dirEntries(path, cast(SpanMode)0,
 true).filter!(__lambda2)
 I'm using a quite recent version of dmd and phobos.

If you're compiling with -property, filter must have the parens for the 
function call as it's a function, not a property. The !() is for the template 
arguments and is separate from the parens for the function call. That means 
that if you're compiling with -property and using UFCS, then you end up with 
range.filter!(pred)(), whereas you have range.filter!(pred).

- Jonathan M Davis

Nov 30 2012
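
A small sketch of the two sets of parentheses described above, using an int array instead of dirEntries so it runs anywhere. The predicate `isEven` is a hypothetical name used only for illustration.

```d
import std.algorithm : filter;
import std.array : array;

// Hypothetical predicate, for illustration only.
bool isEven(int x) { return x % 2 == 0; }

void main()
{
    auto xs = [1, 2, 3, 4];

    // Template arguments go in !(...), call arguments in the (...) after it.
    auto a = filter!(isEven)(xs).array;

    // UFCS form: xs becomes the first call argument, and the trailing ()
    // is the explicit function call that -property insisted on.
    auto b = xs.filter!(isEven)().array;

    assert(a == [2, 4]);
    assert(b == [2, 4]);
}
```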

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Friday, 30 November 2012 at 19:52:26 UTC, Jonathan M Davis 
wrote:
 If you're compiling with -property, filter must have the parens 
 for the
 function call as it's a function, not a property. The !() is 
 for the template
 arguments and is separate from the parens for the function 
 call. That means
 that if you're compiling with -property and using UFCS, then 
 you end up with
 range.filter!(pred)(), whereas you have range.filter!(pred).

 - Jonathan M Davis


Ahh... well, I hope those silly parens never become mandatory. 
Ruby seems to be doing just fine with or without them.
Sorry Jonathan ;)

Nov 30 2012

"Dan" <dbdavidson yahoo.com> writes:

On Friday, 30 November 2012 at 19:52:26 UTC, Jonathan M Davis 
wrote:
 If you're compiling with -property, filter must have the parens 
 for the
 function call as it's a function, not a property. The !() is 
 for the template
 arguments and is separate from the parens for the function 
 call. That means
 that if you're compiling with -property and using UFCS, then 
 you end up with
 range.filter!(pred)(), whereas you have range.filter!(pred).

That is it, thanks. The first project I looked at was vibe and 
they used that flag, so I put it in my script.

Regarding the timings, the relative orderings are the same, but 
the magnitude of the difference is much more reasonable now that 
I switched back to the release builds of phobos and druntime 
(oops :-).

No filtering:
   parallel: 0.268 sec
   serial: 0.125 sec

With filtering:
   parallel: 0.119 sec
   serial: 0.064 sec

Nov 30 2012
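
The thread does not say how these timings were taken. One way to get comparable numbers is std.datetime.stopwatch, sketched below under the assumption that counting entries is a fair stand-in for "building the list of files". The helper name `countEntries` and the "." root are hypothetical.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : dirEntries, SpanMode;
import std.range : walkLength;
import std.stdio : writefln;

// Hypothetical helper: walks the whole tree and counts what it sees.
size_t countEntries(string root)
{
    return dirEntries(root, SpanMode.depth).walkLength;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    immutable count = countEntries(".");   // "." is a placeholder root
    sw.stop();
    writefln("%s entries in %s ms", count, sw.peek().total!"msecs");
}
```

As the numbers above suggest, the build settings of the program and of phobos/druntime can change the result by a large factor, so they are worth holding constant between runs.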

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

11/30/2012 11:29 AM, Joshua Niehus wrote:
 On Friday, 30 November 2012 at 06:29:01 UTC, Joshua Niehus wrote:
 I think if you go breadth first, you can filter out the unwanted
 directories before it delves into them

 oh wait... it probably still looks through all those dir's.
 What about this?

 import std.algorithm, std.regex, std.stdio, std.file;
 import std.parallelism;
 DirEntry[] prune(string path, ref DirEntry[] files)
 {
    auto exclude = regex(r"\.git|\.DS_Store", "g");
    foreach(_path; taskPool.parallel(dirEntries(path, SpanMode.shallow)
      .filter!(a => match(a.name, exclude).empty)))
    {
      files ~= _path;

I do think that there is a race on the 'files' variable. parallel 
doesn't auto-magically lock anything.

      if (isDir(_path.name)) { prune(_path.name, files); }

And yes, I have a bad feeling that spawning a few threads per directory 
recursively is a bad idea.


    }
 return files;
 }

 void main()
 {
    DirEntry[] files;
    prune("/path", files);
    foreach(file;files) { writeln(file.name); }
 }

Otherwise, I think there should be a better way to filter out directories 
inside dirEntries itself, because here you are basically doing what the 
dirEntries depth search does (but with recursion vs. a queue).

Maybe file it as an enhancement?


-- 
Dmitry Olshansky

Nov 30 2012
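
Pulling the points of the thread together, here is a minimal single-threaded sketch of a pruned walk. It is not a Phobos feature: the name `prunedWalk`, the `skip` list, and the choice not to descend through symlinks are all assumptions made for this illustration. It lists one directory at a time with SpanMode.shallow and keeps an explicit list of pending directories instead of recursing, so there is no shared state between threads and nothing to race on.

```d
import std.algorithm : canFind;
import std.file : DirEntry, dirEntries, SpanMode;
import std.path : baseName;
import std.stdio : writeln;

// Hypothetical helper: collects entries under root, never descending
// into directories whose base name appears in skip.
DirEntry[] prunedWalk(string root, string[] skip = [".git"])
{
    DirEntry[] files;
    string[] pending = [root];            // directories still to be listed
    while (pending.length)
    {
        auto dir = pending[$ - 1];        // take the most recently added one
        pending = pending[0 .. $ - 1];
        foreach (DirEntry entry; dirEntries(dir, SpanMode.shallow))
        {
            if (skip.canFind(baseName(entry.name)))
                continue;                 // pruned: its contents are never read
            files ~= entry;
            // Assumption: symlinked directories are not followed, to avoid cycles.
            if (entry.isDir && !entry.isSymlink)
                pending ~= entry.name;
        }
    }
    return files;
}

void main()
{
    // "." is a placeholder root directory.
    foreach (entry; prunedWalk("."))
        writeln(entry.name);
}
```

Matching on the exact base name rather than a regex over the whole path means that a file such as .gitignore is kept while the .git directory itself is skipped.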