digitalmars.D.learn - Directory recursive walking

reply dog2002 <742617000027 aaathats3as.com> writes:
I need to perform some operations on all the files in a directory 
and its subdirectories. Currently, I do it like this:

import std;

void DirIteration(string path) {
     try {
         //SpanMode.shallow allows skipping a directory if any error happens
         foreach(entry; dirEntries(path, SpanMode.shallow, false)) {
             if (entry.isFile && !entry.isSymlink)
                 writeln(entry); //Or something instead of this
             if (entry.isDir)
                 DirIteration(entry);
         }
     }
     catch (Throwable) {}
}

void main()
{
     DirIteration("C:\\Users\\angrypuppy\\MyDir");
}

But this method consumes a huge amount of memory (up to 4 GB and 
more). Is there a more appropriate way to walk directories 
recursively that does not consume a lot of memory?
Jan 14 2021
parent reply drug <drug2004 bk.ru> writes:
DirEntry is a struct. First of all I would try this:
```D
foreach(ref entry; dirEntries(path, SpanMode.shallow, false))
```
Jan 14 2021
parent reply drug <drug2004 bk.ru> writes:
Does your directory simply contain a large number of files?
Jan 14 2021
parent reply dog2002 <742617000027 aaathats3as.com> writes:
Yes. I forgot to add it in the original post.
Jan 14 2021
next sibling parent reply drug <drug2004 bk.ru> writes:
Did using `ref` change anything? Try the following:
```
import std;

void DirIteration(ref DirEntry dir) {
    try {
        //SpanMode.shallow allows skipping a directory if any error happens
        foreach(ref entry; dirEntries(dir, SpanMode.shallow, false)) {
            if (entry.isFile && !entry.isSymlink)
                writeln(entry); //Or something instead of this
            if (entry.isDir)
                DirIteration(entry);
        }
    }
    catch (Throwable) {}
}

void main()
{
    auto de = DirEntry(".");
    DirIteration(de);
}
```
Jan 14 2021
parent reply dog2002 <742617000027 aaathats3as.com> writes:
No, it doesn't. It seems the memory is never freed.
Jan 14 2021
parent drug <drug2004 bk.ru> writes:
This is recursion: memory held by each call is freed only after the calls nested inside it complete. In that case I would try to get rid of the recursion.
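
One way to drop the recursion is to keep the directories that still need scanning in an explicit work list. The following is only a sketch under assumptions: `walkIteratively` and its `visit` callback are names made up for illustration, and unlike the original code it skips symlinks entirely rather than descending into symlinked directories.

```d
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

/// Calls `visit` for every regular file below `root` without recursive calls.
/// Directories still to be scanned live in a heap-allocated work list.
void walkIteratively(string root, void delegate(string path) visit)
{
    string[] pending = [root];
    while (pending.length > 0)
    {
        // Take the most recently added directory off the work list.
        string dir = pending[$ - 1];
        pending = pending[0 .. $ - 1];
        try
        {
            foreach (entry; dirEntries(dir, SpanMode.shallow, false))
            {
                if (entry.isSymlink)
                    continue;             // assumption: ignore symlinks completely
                if (entry.isDir)
                    pending ~= entry.name; // scan later instead of recursing now
                else if (entry.isFile)
                    visit(entry.name);
            }
        }
        catch (Exception)
        {
            // Unreadable directory: skip it and continue with the rest.
        }
    }
}

void main()
{
    walkIteratively(".", (string path) { writeln(path); });
}
```

Only one `dirEntries` range is open at any time, and the work list holds just the names of directories not yet visited.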
Jan 14 2021
prev sibling parent reply drug <drug2004 bk.ru> writes:
How many files do you have? A DirEntry is only 168 bytes and dirEntries returns a lazy range, so I'm curious what causes the huge memory consumption. Are you using 32-bit Windows, by the way?
Jan 14 2021
parent reply dog2002 <742617000027 aaathats3as.com> writes:
About 1000 large files.

I want to replace the first several bytes in all the files, so I just copy the remaining bytes into a new file. Might this be the reason for the high memory consumption? If so, is there a way to avoid copying the entire file, and just delete the first bytes and write the replacement bytes at the beginning of the file?

I use Windows x64.
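
If the replacement bytes have exactly the same length as the bytes they replace, the file can be patched in place by opening it for reading and writing. This is a minimal sketch under that assumption; `patchHeader`, its `transform` callback, and the demo file name are hypothetical. If the length changes, the rest of the file has to be shifted, so a copy is still needed.

```d
import std.file : read, remove, write;
import std.stdio : File;

/// Reads the first `headerSize` bytes of `path`, lets `transform` modify them,
/// and writes them back over the same region. The rest of the file is untouched.
/// Assumes headerSize > 0 and that the header keeps its length.
void patchHeader(string path, size_t headerSize,
                 void delegate(ubyte[] header) transform)
{
    auto file = File(path, "r+b");          // read/write, binary, no truncation
    auto storage = new ubyte[](headerSize);
    ubyte[] header = file.rawRead(storage); // may be shorter than headerSize
    transform(header);
    file.seek(0);                           // back to the start before writing
    file.rawWrite(header);
}

void main()
{
    enum path = "patch_demo.bin";           // hypothetical demo file
    write(path, "HEADERpayload");
    patchHeader(path, 6, (ubyte[] h) { h[] = cast(ubyte) '#'; });
    assert(cast(string) read(path) == "######payload");
    remove(path);
}
```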
Jan 14 2021
parent reply Paul Backus <snarwin gmail.com> writes:
What code are you using to copy the bytes? If you're reading the whole file into memory at once, that will consume a lot of memory.
Jan 14 2021
parent reply dog2002 <742617000027 aaathats3as.com> writes:
void func(string inputFile, string outFile, uint chunk_size) {
    try {
        File _inputFile = File(inputFile, "r");
        File _outputFile = File(outFile, "w");

        ubyte[] tempBuffer = _inputFile.rawRead(new ubyte[](512));

        //doing some operations with the tempBuffer

        _outputFile.rawWrite(tempBuffer);

        _inputFile.seek(tempBuffer.length, SEEK_SET);

        foreach(_buffer; _inputFile.byChunk(chunk_size)) {
            _outputFile.rawWrite(_buffer);
        }
        _inputFile.close();
        _outputFile.close();
    }
    catch (Throwable) {}
}
Jan 14 2021
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
You can save a little bit of memory here by allocating tempBuffer on the stack:

    ubyte[512] tempBuffer;
    _inputFile.rawRead(tempBuffer[]); // note the explicit []
    // ...
    _outputFile.rawWrite(tempBuffer[]);

However, those allocations alone shouldn't be enough to get you to 4GB+, so the real issue is probably elsewhere.
Jan 14 2021
parent reply Paul Backus <snarwin gmail.com> writes:
I made a mistake; this should be:

    ubyte[512] tempArray;
    ubyte[] tempBuffer = _inputFile.rawRead(tempArray[]);

...with the rest the same as your original version.
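
The reason the returned slice matters is that `rawRead` hands back only the part of the buffer it actually filled, which can be shorter than the array when the file is small. A small sketch to illustrate this; `prefixLength` and the demo file name are made up for the example.

```d
import std.file : remove, write;
import std.stdio : File;

/// Reads up to 512 bytes from the start of `path` and reports how many
/// bytes were actually read.
size_t prefixLength(string path)
{
    auto input = File(path, "rb");
    ubyte[512] storage;                      // fixed-size array on the stack
    ubyte[] data = input.rawRead(storage[]); // slice of `storage` that was filled
    assert(data.ptr is storage.ptr);         // same memory, no heap allocation
    // Return only the length: a slice of `storage` must not outlive this call.
    return data.length;
}

void main()
{
    enum path = "short_demo.bin";            // hypothetical demo file
    write(path, "abcde");                    // only 5 bytes, less than 512
    assert(prefixLength(path) == 5);
    remove(path);
}
```

Writing `storage[]` to the output instead of `data` would emit all 512 bytes, including the part that was never read.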
Jan 14 2021
parent reply dog2002 <742617000027 aaathats3as.com> writes:
Thank you so much! It saves a lot of memory!

And one last question: why does the application crash if I allocate a 1 MB array?

    ubyte[1024000] tempBuffer;
Jan 14 2021
next sibling parent reply dog2002 <742617000027 aaathats3as.com> writes:
Solved: ubyte[] tempBuffer = new ubyte[1024000];
Jan 14 2021
parent Daniel Kozak <kozzi11 gmail.com> writes:
You can still use

    ubyte[1024000] tempBuffer;

but you have to place it somewhere outside the recursion, or make it static:

    static ubyte[1024000] tempBuffer;
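
Both options can be sketched roughly as follows. The function names and the recursion depth are invented for the example; the point is only that neither variant puts a new 1 MB array on the stack for each call.

```d
enum bufferSize = 1_024_000;

/// Variant 1: the buffer is allocated once outside the recursion and
/// passed down as a slice, so each call only pushes a pointer and a length.
size_t callsWithSharedBuffer(int depth, ubyte[] buffer)
{
    buffer[0] = cast(ubyte) depth;          // stand-in for real work
    return depth == 0 ? 1 : 1 + callsWithSharedBuffer(depth - 1, buffer);
}

/// Variant 2: a static local exists once per thread, not once per call,
/// and is not stored on the stack.
size_t callsWithStaticBuffer(int depth)
{
    static ubyte[bufferSize] buffer;
    buffer[0] = cast(ubyte) depth;          // stand-in for real work
    return depth == 0 ? 1 : 1 + callsWithStaticBuffer(depth - 1);
}

void main()
{
    auto heapBuffer = new ubyte[](bufferSize);
    // 101 nested calls; a per-call stack array of this size would need ~100 MB.
    assert(callsWithSharedBuffer(100, heapBuffer) == 101);
    assert(callsWithStaticBuffer(100) == 101);
}
```

Note that in both variants every level of the recursion sees the same buffer, so this only works if a call no longer needs the buffer's contents after it recurses.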
Jan 14 2021
prev sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
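Because of a stack overflow: a local array of that size does not fit on the stack (the default stack size on Windows is typically 1 MB).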
Jan 14 2021
parent Ferhat Kurtulmuş <aferust gmail.com> writes:
A linker flag, passed through the compiler (here via dub's dflags), can be used to increase the maximum stack size:

    "dflags": ["-L/STACK:1500000000"]

Alternatively, recursion can be emulated using heap memory. Here is my "fake" recursion:

    // wins is a range
    auto stack = wins.save;
    while(!stack.empty){
        immutable n = stack.length - 1;
        auto window = stack[n];

        doSomeThingforEachRecursiveElement(window);

        stack.popBack;
        if(window.children.length){
            foreach (ref child; window.children)
                stack.pushBack(child);
        }
    }
    stack.free;
Jan 14 2021
prev sibling parent reply dog2002 <742617000027 aaathats3as.com> writes:
Okay, the reason is incredibly stupid: using WinMain instead of main causes high memory usage. I don't know why, I use the same code. If I replace WinMain with main, the memory consumption is about 6 MB.
Jan 15 2021
parent reply Daniel Kozak <kozzi11 gmail.com> writes:
https://wiki.dlang.org/D_for_Win32
Jan 15 2021
parent dog2002 <742617000027 aaathats3as.com> writes:
Thank you! Now the application works properly. And sorry for the dumb questions.
Jan 15 2021