www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Why are commands executing out of order?

reply "Josh" <moonburntm gmail.com> writes:
Here's my code:

import std.datetime;
import std.file;
import std.stdio;

struct info
{
     string name;
     bool isDir;
     ulong size;
     SysTime timeCreated;
     SysTime timeLastAccessed;
     SysTime timeLastModified;

     this(DirEntry d)
     {
         this.name = d.name;
         this.isDir = d.isDir;
         this.size = d.size;
         this.timeCreated = d.timeCreated;
         this.timeLastAccessed = d.timeLastAccessed;
         this.timeLastModified = d.timeLastModified;
     }
}

void main()
{
     writeln("Scanning drives...");
     info[] results;
     for (char c = 'A'; c <= 'Z'; c++)
     {
         if (exists(c ~ ":\\"))
             results ~= info(DirEntry(c ~ ":\\")) ~ scan(c ~ 
":\\");
     }
     File f = File("driveInfo.txt", "w");
     foreach (i; results)
     {
         f.writeln(i);
     }
     f.close();
     writeln(memSize(results));
}

info[] scan(string dir)
{
     info[] results;
     try
     {
         auto de = dirEntries(dir, SpanMode.shallow);
         foreach (d; de)
         {
             if (d.isDir())
                 results ~= info(d) ~ scan(d.name);
             else
                     results ~= info(d);
         }
     }
     catch (FileException fe){}
     return results;
}

size_t memSize(T)(T[] t)
{
     return t.length * T.sizeof;
}

But main's first writeln actually outputs after f.close().

The program uses ~1GB of RAM overall, even though main.results is 
~50MB according to memSize. Any attempts to call the GC do 
nothing, but I'm probably doing it wrong though. Is it possible 
that I have a memory leak somewhere that is causing this delay, 
or is this a DMD problem, or something else I haven't even 
thought of?

DMD v2.060, Windows 7 x64

Thanks,

Josh
Feb 01 2013
parent reply FG <home fgda.pl> writes:
On 2013-02-02 07:49, Josh wrote:
 But main's first writeln actually outputs after f.close().
Maybe because of output buffering and you need to add flush?
 The program uses ~1GB of RAM overall, even though main.results is ~50MB
 according to memSize.
Wrong comparison. You should have compared to the output file's size. info.sizeof doesn't include the size of info.name. In my test results were 30 MB but the output file was 101 MB.
 Any attempts to call the GC do nothing, but I'm probably
 doing it wrong though. Is it possible that I have a memory leak somewhere that
 is causing this delay, or is this a DMD problem, or something else I haven't
 even thought of?
It's a problem with array concatenation and memory fragmentation. There are two ways out of this mess. First, there's the "Unix Way" - don't use the results array at all and print out info records immediately as they appear. Your program will use under 3 MB of RAM. :) But if you really have to do some extra processing of results, then you can reduce the memory used like 6 times by using an Appender instead of concatenating arrays and by reserving a reasonable number of records, to limit possible initial fragmentation when the array is resized. My program used 145 MB whereas the output file was 101 MB. I think it's good enough. But that's scanning just one, non-system drive. Won't even try to scan them all. Try it out: import std.datetime; import std.file; import std.stdio; import std.array; struct info { string name; bool isDir; ulong size; SysTime timeCreated; SysTime timeLastAccessed; SysTime timeLastModified; this(DirEntry d) { this.name = d.name; this.isDir = d.isDir; this.size = d.size; this.timeCreated = d.timeCreated; this.timeLastAccessed = d.timeLastAccessed; this.timeLastModified = d.timeLastModified; } } alias Appender!(info[]) InfoAppender; void main() { writeln("Scanning drives..."); info[] results; InfoAppender app = appender(results); app.reserve(512*1024); // for (char c = 'D'; c <= 'D'; c++) { if (exists(c ~ ":\\")) { app.put(info(DirEntry(c ~ ":\\"))); scan(c ~ ":\\", app); } } File f = File("driveInfo.txt", "w"); foreach (i; app.data) { f.writeln(i); } f.close(); writeln(memSize(app.data)); } void scan(string dir, ref InfoAppender app) { try { auto de = dirEntries(dir, SpanMode.shallow); foreach (d; de) { app.put(info(d)); if (d.isDir()) scan(d.name, app); } } catch (FileException fe){} } size_t memSize(T)(T[] t) { return t.length * T.sizeof; } And the preferred version to scan whole filesystem without wasting RAM: :) import std.datetime; import std.file; import std.stdio; struct info { string name; bool isDir; ulong size; SysTime timeCreated; SysTime timeLastAccessed; SysTime timeLastModified; this(DirEntry d) { this.name = d.name; this.isDir = d.isDir; this.size = d.size; this.timeCreated = d.timeCreated; this.timeLastAccessed = d.timeLastAccessed; this.timeLastModified = d.timeLastModified; } } File f; void main() { writeln("Scanning drives..."); f = File("driveInfo.txt", "w"); for (char c = 'D'; c <= 'D'; c++) { if (exists(c ~ ":\\")) { f.writeln(info(DirEntry(c ~ ":\\"))); scan(c ~ ":\\"); } } f.close(); } void scan(string dir) { try { auto de = dirEntries(dir, SpanMode.shallow); foreach (d; de) { f.writeln(info(d)); if (d.isDir()) scan(d.name); } } catch (FileException fe){} }
Feb 02 2013
next sibling parent reply "Namespace" <rswhite4 googlemail.com> writes:
I'm was quite surprised as I had heard the first time of these 
'appender'.
Therefore I'm asking me:
Why is the 'appender' method so much more efficient?
Feb 02 2013
parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 2 February 2013 at 11:33:07 UTC, Namespace wrote:
 I'm was quite surprised as I had heard the first time of these 
 'appender'.
 Therefore I'm asking me:
 Why is the 'appender' method so much more efficient?
Because concatenation likely will end up relocating the memory over and over again (possibly lots of abandoned slices too) whereas the appender will allocated & reuse the same buffer. On the other hand you might be better off (depending on the need) with just reserving a lot more memory (close to what you need or slightly larger) and then concatenation will perform much better. But if it need to reallocate then you've still got the same problem with slices where with the appender you probably won't. I tend to reserve string concatenation for assert messages, and quicker & dirty text manipulation/building like during ctfe. Outside of those I use alternatives first.
Feb 02 2013
parent reply FG <home fgda.pl> writes:
On 2013-02-02 13:50, Era Scarecrow wrote:
 On Saturday, 2 February 2013 at 11:33:07 UTC, Namespace wrote:
 I'm was quite surprised as I had heard the first time of these 'appender'.
 Therefore I'm asking me:
 Why is the 'appender' method so much more efficient?
Because concatenation likely will end up relocating the memory over and over again (possibly lots of abandoned slices too) whereas the appender will allocated & reuse the same buffer. On the other hand you might be better off (depending on the need) with just reserving a lot more memory (close to what you need or slightly larger) and then concatenation will perform much better. But if it need to reallocate then you've still got the same problem with slices where with the appender you probably won't.
Yeah but let us move reallocation out of the equation. Reserving space limits the amount of RAM used and can avoid reallocations all together but in a little test it came out that still appender is 2.5-4 times faster than tab ~= str, where tab is char[] (same when it is ubyte[]). Why is that? Let's say we have this code: char[] tab; string fill = "1234"; uint loops = 100_000_000; tab.reserve(loops * fill.length); foreach (i; 0..loops) tab ~= fill; Using Appender changes the above loop into something like this: size_t capacity = tab.capacity; foreach (i; 0..loops) { size_t flen = fill.length; size_t len = tab.length; if (len + flen > capacity) { ... // use GC.extend or GC.qalloc & memcpy // update capacity and assign tab = memory_block[0..len] } tab = tab.ptr[0..len+flen]; tab.ptr[len..len+flen] = fill[]; } Why cannot tab ~= fill achieve similar performance as this code? Am I missing something important?
Feb 02 2013
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 02 Feb 2013 10:47:39 -0500, FG <home fgda.pl> wrote:

 On 2013-02-02 13:50, Era Scarecrow wrote:
 On Saturday, 2 February 2013 at 11:33:07 UTC, Namespace wrote:
 I'm was quite surprised as I had heard the first time of these  
 'appender'.
 Therefore I'm asking me:
 Why is the 'appender' method so much more efficient?
Because concatenation likely will end up relocating the memory over and over again (possibly lots of abandoned slices too) whereas the appender will allocated & reuse the same buffer. On the other hand you might be better off (depending on the need) with just reserving a lot more memory (close to what you need or slightly larger) and then concatenation will perform much better. But if it need to reallocate then you've still got the same problem with slices where with the appender you probably won't.
Yeah but let us move reallocation out of the equation. Reserving space limits the amount of RAM used and can avoid reallocations all together but in a little test it came out that still appender is 2.5-4 times faster than tab ~= str, where tab is char[] (same when it is ubyte[]). Why is that? Let's say we have this code: char[] tab; string fill = "1234"; uint loops = 100_000_000; tab.reserve(loops * fill.length); foreach (i; 0..loops) tab ~= fill; Using Appender changes the above loop into something like this: size_t capacity = tab.capacity; foreach (i; 0..loops) { size_t flen = fill.length; size_t len = tab.length; if (len + flen > capacity) { ... // use GC.extend or GC.qalloc & memcpy // update capacity and assign tab = memory_block[0..len] } tab = tab.ptr[0..len+flen]; tab.ptr[len..len+flen] = fill[]; } Why cannot tab ~= fill achieve similar performance as this code? Am I missing something important?
Appender is built for appending and has higher up-front costs, and higher memory costs. Normal arrays can be used for appending, but also can be used for so many other things. In other words, Appender cannot have the features that standard arrays have in order to make appending as fast as possible. -Steve
Feb 02 2013
prev sibling parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 2 February 2013 at 15:47:37 UTC, FG wrote:
 Yeah but let us move reallocation out of the equation. 
 Reserving space limits the amount of RAM used and can avoid 
 reallocations all together but in a little test it came out 
 that still appender is 2.5-4 times faster than tab ~= str, 
 where tab is char[] (same when it is ubyte[]). Why is that?
Reserving doesn't mean it has to allocate any memory at all; That's up to the implementation of the runtime and OS, it may just say 'any allocations not related to this object start after address xxx', then as the reserved space needs more it requests the larger space from the OS. The limitations of how much you can use memory-wise doesn't change as that's determined by the infrastructure. (VM and harddrive space doesn't count).
 Why cannot tab ~= fill achieve similar performance as this code?
 Am I missing something important?
As for how many checks and overhead the runtime has I'm not sure. Try compiling the code again with full optimizations on and see if it makes a difference. Concatenation may perform better like that (as long as relocation/copying isn't done) or maybe not...
Feb 02 2013
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, February 03, 2013 02:27:16 Era Scarecrow wrote:
 On Saturday, 2 February 2013 at 15:47:37 UTC, FG wrote:
 Yeah but let us move reallocation out of the equation.
 Reserving space limits the amount of RAM used and can avoid
 reallocations all together but in a little test it came out
 that still appender is 2.5-4 times faster than tab ~= str,
 where tab is char[] (same when it is ubyte[]). Why is that?
Reserving doesn't mean it has to allocate any memory at all; That's up to the implementation of the runtime and OS, it may just say 'any allocations not related to this object start after address xxx', then as the reserved space needs more it requests the larger space from the OS. The limitations of how much you can use memory-wise doesn't change as that's determined by the infrastructure. (VM and harddrive space doesn't count).
reserve should guarantee that you have at least the requested amount of memory already allocated, or it's broken. Its whole purpose is to guarantee that capacity is at least as large as the amount being reserved so that you can do a single allocation up front rather than having to worry about reallocations occuring as you append. - Jonathan M Davis
Feb 02 2013
parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Sunday, 3 February 2013 at 01:41:26 UTC, Jonathan M Davis 
wrote:
 reserve should guarantee that you have at least the requested 
 amount of memory already allocated, or it's broken. Its whole 
 purpose is to guarantee that capacity is at least as large as 
 the amount being reserved so that you can do a single 
 allocation up front rather than having to worry about 
 reallocations occurring as you append.
Allocated, set aside, uninterrupted continuous block, etc. As long as the space is actually available when it's needed it's implemented doesn't matter. In the end it's up to the OS. The OS could be using Virtual Memory space by compressing data, then decompressing when it's requested/needed; In that case the data pre-allocated is empty (zeroed) it can compress instantly to almost nothing (Compressed swapfile driver in Linux for example). The most important part is making sure nothing else can claim any memory in the reserved block.
Feb 02 2013
prev sibling parent reply "Josh" <moonburntm gmail.com> writes:
On Saturday, 2 February 2013 at 11:16:50 UTC, FG wrote:
 On 2013-02-02 07:49, Josh wrote:
 But main's first writeln actually outputs after f.close().
Maybe because of output buffering and you need to add flush?
How do I do that? I've never heard of flush before.
 The program uses ~1GB of RAM overall, even though main.results 
 is ~50MB
 according to memSize.
Wrong comparison. You should have compared to the output file's size. info.sizeof doesn't include the size of info.name. In my test results were 30 MB but the output file was 101 MB.
Yeah, my output file was 126MB. I figured strings had a set amount of memory they could use, like an int can use exactly 4 bytes.
 But if you really have to do some extra processing of results, 
 then you can reduce the memory used like 6 times by using an 
 Appender instead of concatenating arrays and by reserving a 
 reasonable number of records, to limit possible initial 
 fragmentation when the array is resized. My program used 145 MB 
 whereas the output file was 101 MB. I think it's good enough.
I've never come across Appenders before. Could you please explain them a little bit, and what each call in your modified code does? Thanks, Josh
Feb 02 2013
next sibling parent reply "Namespace" <rswhite4 googlemail.com> writes:
 I've never come across Appenders before. Could you please 
 explain them a little bit, and what each call in your modified 
 code does?

 Thanks,

 Josh
And read Era's post.
Feb 02 2013
parent reply "Andrea Fontana" <nospam example.com> writes:
On Saturday, 2 February 2013 at 15:39:00 UTC, Namespace wrote:
 I've never come across Appenders before. Could you please 
 explain them a little bit, and what each call in your modified 
 code does?

 Thanks,

 Josh
And read Era's post.
Why Appender has no ~ operator itself? auto app = appender!string(); app.put("a"); // why not app ~= "a"?
Feb 13 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrea Fontana:

 Why Appender has no ~ operator itself?

 auto app = appender!string();
 app.put("a");  // why not app ~= "a"?
Now it's in GIT head. Bye, bearophile
Feb 13 2013
parent Lee Braiden <leebraid gmail.com> writes:
On Wed, 13 Feb 2013 17:00:39 +0100, bearophile wrote:
 Andrea Fontana:
 Why Appender has no ~ operator itself?

 auto app = appender!string(); app.put("a");  // why not app ~= "a"?
Now it's in GIT head.
That's a nice touch. Good idea, Andrea :) -- Lee
Feb 23 2013
prev sibling parent "Jesse Phillips" <Jessekphillips+d gmail.com> writes:
On Wednesday, 13 February 2013 at 15:53:44 UTC, Andrea Fontana 
wrote:
 Why Appender has no ~ operator itself?

 auto app = appender!string();
 app.put("a");  // why not app ~= "a"?
Because Appender is an output range, using ~= will mean your code will not work with other output ranges.
Feb 23 2013
prev sibling parent FG <home fgda.pl> writes:
On 2013-02-02 16:22, Josh wrote:
 Maybe because of output buffering and you need to add flush?
How do I do that? I've never heard of flush before.
writeln("..."); stdout.flush(); C also has flush, in the form of fflush(stdout);
 I've never come across Appenders before. Could you please explain them a little
 bit, and what each call in your modified code does?
I'm baffled myself as to why they perform better than normal array concatenation. We'll see if some interesting explanation appears. That's why I changed the topic to "Array concatenation vs. Appender".
Feb 02 2013