
digitalmars.D - Reserving/Preallocating associative array?

reply "Gordon" <me home.com> writes:
Hello,

I want to load a large text file containing two numeric fields 
into an associative array.
The file looks like:
    1   40
    4   2
    42  11
    ...

And has 11M lines.

My code looks like this:
===
void main()
{
         size_t[size_t] unions;
         auto f = File("input.txt");
         foreach ( line ; f.byLine() ) {
                 auto fields = line.split();
                 size_t i = to!size_t(fields[0]);
                 size_t j = to!size_t(fields[1]);
                 unions[i] = j; // <-- here be question
         }
}
===

This is just a test code to illustrate my question (though 
general comments are welcomed - I'm new to D).

Commenting out the highlighted line (not populating the hash), 
the program completes in 25 seconds.
Compiling with the highlighted line, the program takes ~3.5 
minutes.

Is there a way to speed the loading? perhaps reserving memory in 
the hash before populating it? Or another trick?

Many thanks,
  -gordon
Dec 24 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Gordon:

 void main()
 {
         size_t[size_t] unions;
         auto f = File("input.txt");
         foreach ( line ; f.byLine() ) {
                 auto fields = line.split();
                 size_t i = to!size_t(fields[0]);
                 size_t j = to!size_t(fields[1]);
                 unions[i] = j; // <-- here be question
         }
 }
 ...
 Is there a way to speed the loading? perhaps reserving memory 
 in the hash before populating it? Or another trick?
Built-in associative arrays don't yet have a "reserve". Some ideas to 
speed up your code:

- Try to use parse instead of split + to!size_t.

- Try to use byLineFast, in the attachment here (the code is not mine): 
https://d.puremagic.com/issues/attachment.cgi?id=1305

- Try to disable the GC before the associative array creation and 
re-enable it when it's built. (This could double your max memory usage 
or more.) To disable it, import GC from core.memory and call 
GC.disable; (and GC.enable; afterwards).

Bye,
bearophile
Dec 24 2013
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
 Some ideas to speed up your code:
 - Try to use parse instead of split + to!size_t
 - Try to use byLineFast, in the attach here (the code is not 
 mine): https://d.puremagic.com/issues/attachment.cgi?id=1305
 - Try to disable the GC before the associative array creation 
 and re-enable it when it's built. (This could double your max 
 memory usage or more). To disable it import GC from core.memory 
 and call GC.disable; and GC.enable;
This code is almost 2.5X faster on my PC (but I am using only 300_000 
lines of text):

void main() {
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast) {
        line.munch(" \t"); // skip ws
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip ws
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;
}

Bye,
bearophile
Dec 24 2013
next sibling parent reply "Gordon" <me home.com> writes:
Hello bearophile,

On Tuesday, 24 December 2013 at 23:13:59 UTC, bearophile wrote:
 Some ideas to speed up your code:
 - Try to use parse instead of split + to!size_t
 - Try to use byLineFast, in the attach here (the code is not 
 mine): https://d.puremagic.com/issues/attachment.cgi?id=1305
 - Try to disable the GC before the associative array creation 
 and re-enable it when it's built. (This could double your max 
 memory usage or more). To disable it import GC from 
 core.memory and call GC.disable; and GC.enable;
This code is almost 2.5X faster on my PC (but I am using only 300_000 lines of text):
You're just too fast for me...

After incorporating your three suggestions, the entire file (11M lines) 
loads in just 25 seconds (down from 3.5 minutes). AMAZING!

Many thanks,
  -gordon
Dec 24 2013
parent reply Philippe Sigaud <philippe.sigaud gmail.com> writes:
Out of curiosity, do you know which one of his 3 suggestions brought you
the highest speed boost? What happens if you do not disable the GC, for
example?
Dec 24 2013
parent reply "Gordon" <me home.com> writes:
On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud 
wrote:
 Out of curiosity, do you know which one of his 3 suggestions 
 brought you
 the highest speed boost? What happens if you do not disable the 
 GC, for
 example?
Good question, I did test each one incrementally:

1. Switching from "split+to!size_t" to "parse" (and commenting out the 
"union") reduced running time from ~26s to ~20s (on average).

2. Switching from "byLine" to "byLineFast" (and commenting out the 
"union") reduced running time from ~20s to ~14s (on average).

3. With "parse" and "byLineFast", with the GC still on, and populating 
the "union", the program took about 2m45s.

4. Adding "GC.disable" brought it down to 25s.

HTH,
  -gordon
Dec 25 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 25 Dec 2013 16:12:09 +0000
schrieb "Gordon" <me home.com>:

 On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud 
 wrote:
 Out of curiosity, do you know which one of his 3 suggestions 
 brought you
 the highest speed boost? What happens if you do not disable the 
 GC, for
 example?
 Good question, I did test each one incrementally:
 
 1. Switching from "split+to!size_t" to "parse" (and commenting out 
 the "union") reduced running time from ~26s to ~20s (on average).
 
 2. Switching from "byLine" to "byLineFast" (and commenting out the 
 "union") reduced running time from ~20s to ~14s (on average).
 
 3. With "parse" and "byLineFast", with the GC still on, and 
 populating the "union", the program took about 2m45s.
 
 4. Adding "GC.disable" brought it down to 25s.
 
 HTH,
   -gordon
So for the record: the garbage collector slowed down this piece of code 
by a factor of 6.6.

-- 
Marco
Dec 26 2013
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 25.12.2013 17:12, schrieb Gordon:
 Good question, I did test each one incrementally:

 1. Switching from "split+to!size_t" to "parse" (and commenting out the
 "union") reduced running time from ~26s to ~20s (on average).

 2. Switching from "byLine" to "byLineFast" (and commenting out the
 "union") reduced running time from ~20s to ~14s (on average).

 3. With "parse" and "byLineFast", and with GC still on, and populating
 the "union" the program took about 2m45s .

 4. Adding "GC.disable" brought it down to 25s.

 HTH,
   -gordon
So I was interested in how far you can take this. The version 
bearophile posted runs in 14 seconds (48420527 ticks) on my machine.

I created a new version using my own hashmap implementation and manual 
memory management (no GC present at all). This version runs in 12 
seconds (41614375 ticks).

Preallocating all the entries for the hashmap brings quite a boost: it 
goes down to 9 seconds (32926550 ticks).

If you replace the default hash function D uses with the MurMur hash 
function, it comes down even more, to 8 seconds (29425713 ticks).

I also tried not using any hash function at all, but that made the time 
skyrocket.

I stopped at that point, because now most of the time is spent looking 
for a free entry in the hashmap, which pretty much comes down to cache 
misses. At this point the profiler looked something like this:

50%  find free entry in hashmap
21%  parse uint
9%   find end of line + tab
3.5% read data from disk

The ticks in my measurements have been obtained via 
QueryPerformanceCounter.

Kind Regards
Benjamin Thaut
Dec 27 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Benjamin Thaut:

 If you replace the default hash function D uses with the MurMur 
 hash function it brings it down even more to 8 seconds 
 (29425713 ticks)
What compiler are you using? ldc2?

And is it a good idea to put MurMur in D?
 50%  find free entry in hashmap
 21%  parse uint
Perhaps the parse uint times can be improved. A lightly tested function 
to try:

uint toUint(const(char)[] txt) pure nothrow {
    auto p = txt.ptr;
    const q = p + txt.length;

    // Skip leading not-digits.
    while (p < q && (*p < '0' || *p > '9'))
        p++;

    uint result = 0;
    while (p < q && *p >= '0' && *p <= '9') {
        result = (result * 10) + (*p - '0');
        p++;
    }

    return result;
}

void main() {
    import std.stdio;
    "".toUint.writeln;
    "x".toUint.writeln;
    "-1".toUint.writeln;
    " 125 ".toUint.writeln;
    " 10000".toUint.writeln;
    " 1000000 ".toUint.writeln;
    " 10000000000000 ".toUint.writeln;
}

Its output (it's not a safe conversion):

0
0
1
125
10000
1000000
1316134912

Bye,
bearophile
Dec 27 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 27.12.2013 12:02, schrieb bearophile:
 Benjamin Thaut:

 If you replace the default hash function D uses with the MurMur hash
 function it brings it down even more to 8 seconds (29425713 ticks)
What compiler are you using? ldc2?
dmd 2.064.2
 And is it a good idea to put MurMur in D?
I don't know about the properties of the MurMur hash function. I only 
know that it is cheaper to compute than the hash function D uses. To 
decide whether it would be better than the currently chosen hash 
function, an in-depth analysis would be required.
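For reference, hashes in the MurmurHash family are cheap because they 
are only shifts and multiplies. Here is a sketch of MurmurHash3's 
64-bit finalizer, shown purely as an example of the family (not 
necessarily the exact variant used in the hashmap above):

// MurmurHash3's 64-bit finalizer ("fmix64"): three xor-shifts and two
// multiplies, so it is very cheap per key. Example of the family only;
// not necessarily the variant discussed above.
ulong fmix64(ulong k) pure nothrow
{
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdUL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53UL;
    k ^= k >> 33;
    return k;
}

void main()
{
    import std.stdio;
    writefln("%016x", fmix64(42)); // prints a well-mixed 64-bit value
}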
 50%  find free entry in hashmap
 21%  parse uint
 Perhaps the parse uint times can be improved. A lightly tested 
 function to try:
 
 uint toUint(const(char)[] txt) pure nothrow {
     auto p = txt.ptr;
     const q = p + txt.length;
 
     // Skip leading not-digits.
     while (p < q && (*p < '0' || *p > '9'))
         p++;
 
     uint result = 0;
     while (p < q && *p >= '0' && *p <= '9') {
         result = (result * 10) + (*p - '0');
         p++;
     }
 
     return result;
 }
The way I parse the uint already looks that way. I'm not using std.conv:
https://github.com/Ingrater/thBase/blob/master/src/thBase/conv.d#L22

Kind Regards
Benjamin Thaut
Dec 27 2013
prev sibling next sibling parent reply "Gordon" <me home.com> writes:
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
 Am 25.12.2013 17:12, schrieb Gordon:

 So I was interrested how far you can take this. The version 
 bearophile posted runs in 14 seconds (48420527 ticks) on my 
 machine.
<...>
 At this point the profiler looked something like this:
 50%  find free entry in hashmap
 21%  parse uint
 9%   find end of line + tab
 3.5% read data from disk
This is turning into a very interesting (and useful) discussion. Thank 
you!

I've created a generic "FileSlurp" library, to load a file (using 
"byLineFast") and use a delegate to store the data (so I can later use 
any improved methods you suggest). It's available here; comments are 
welcome:
http://code.dlang.org/packages/fileslurp

Benjamin,
Can you point me to your hashmap implementation? I could perhaps use it 
to improve the timings even more.

Bearophile,
Who is "JM", who wrote "byLineFast"? I'm using his code and would like 
to properly acknowledge him/her.

Thanks!
  -gordon
Dec 27 2013
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Gordon:

 Who is "JM" who wrote "byLineFast" ? I'm using his code and 
 would like to properly acknowledge him/her.
It should be Juan Manuel Cabo:
http://permalink.gmane.org/gmane.comp.lang.d.general/117750

Bye,
bearophile
Dec 27 2013
prev sibling next sibling parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 27.12.2013 17:49, schrieb Gordon:
 Benjamin,
 Can you point me to your Hashmap implementation? I could perhaps use it
 to improve the timings even more.
https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d

This implementation depends on my own allocator design, but it should 
be possible to remove that dependency quite easily by replacing all 
allocations / frees with malloc/free or GC.malloc / nothing. Just make 
sure that memory is not initialized beforehand (as new 
ubyte[sizeInBytes] would do), because that also has a considerable 
performance impact. Also, when allocating with GC.malloc you should 
specify the GC.BlkAttr.NO_SCAN flag, because you know that your data 
does not contain any pointers. That way the GC will not scan that huge 
memory block, and it should speed up collection a lot.

To improve the performance of the hashmap you can try:

- Specifying different hashing functions (via the hash policy). See 
https://github.com/Ingrater/thBase/blob/master/src/thBase/policies/hashing.d 
for more examples of hashing policies.

- Changing the amount of always-free entries in the hashmap (currently 
25%); for that, change lines 233, 207, and 210 (not factored into a 
constant yet). Reducing the free entries might result in fewer cache 
misses but more linear search, as this is a linear probing hashmap. 
Increasing the free entries might reduce the amount of linear search 
but increase cache misses, and it increases memory usage.

- Computing a series of prime numbers, of which each prime number is at 
least twice as big as the previous one, and using that as the possible 
sizes for the hashmap. Prime number sizes give a better distribution of 
the items within the hashmap, therefore reduce the amount of linear 
searching necessary, and thus improve the hashmap performance.

- When searching for the next free spot in the hashmap, it currently 
just adds 1 to the previous index found (line 301). It is also possible 
to rehash the previous index (just put it into the hashing function 
again), which would give a new index that is more distant in the 
hashmap from the current one. This would again improve the distribution 
of the items in the hashmap, and thus reduce linear search time, but it 
may increase the amount of cache misses as it no longer does linear 
memory access.

In case you are interested in my implementation of your problem, see 
here: http://dpaste.dzfl.pl/8d0d174e

You won't be able to compile the code unless you set up my custom D 
environment, which I would not recommend unless you really hate the D 
garbage collector ;-)

Kind Regards
Benjamin Thaut
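To make the linear probing, free-entry ratio, and prime-size ideas 
above concrete, here is a minimal self-contained sketch (this is not 
the hashmap linked above; the names, the short prime table, and the 
grow policy are invented for illustration):

struct PrimeMap
{
    static struct Slot { size_t key; size_t value; bool used; }

    // Each prime is at least twice the previous one. A real
    // implementation would use a much longer table.
    static immutable size_t[] primes = [53, 107, 223, 449, 907, 1823];

    Slot[] slots;
    size_t count;
    size_t primeIndex;

    void insert(size_t key, size_t value)
    {
        if (slots.length == 0)
            slots = new Slot[primes[primeIndex]];
        // Grow when less than 25% of the slots would remain free.
        if ((count + 1) * 4 > slots.length * 3)
            grow();
        size_t i = key % slots.length;
        while (slots[i].used && slots[i].key != key)
            i = (i + 1) % slots.length; // linear probing
        if (!slots[i].used)
            count++;
        slots[i] = Slot(key, value, true);
    }

    bool lookup(size_t key, out size_t value)
    {
        if (slots.length == 0)
            return false;
        size_t i = key % slots.length;
        while (slots[i].used)
        {
            if (slots[i].key == key) { value = slots[i].value; return true; }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    void grow()
    {
        // Move to the next prime size and rehash everything.
        auto old = slots;
        slots = new Slot[primes[++primeIndex]];
        count = 0;
        foreach (ref s; old)
            if (s.used)
                insert(s.key, s.value);
    }
}

void main()
{
    import std.stdio;
    PrimeMap m;
    foreach (k; 0 .. 100)
        m.insert(k, k * 2);
    size_t v;
    assert(m.lookup(42, v) && v == 84);
    writeln(m.count, " entries");
}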
Dec 27 2013
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Also, Gordon, what is keeping you from converting the textual 
representation to a binary one? I mean, if you need to read that file 
often, you can easily preprocess it into a binary blob. You could even 
modify my hashmap implementation to support binary serialization, by 
just writing the entire array to a binary file and loading it again.
That would be fast...

Kind Regards
Benjamin Thaut
Dec 27 2013
parent reply "Gordon" <me home.com> writes:
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
 Also, Gordon, what is keeping you from converting the textual 
 representation to a binary one? I mean, if you need to read 
 that file often, you can easily preprocess it into a binary 
 blob. You could even modify my hashmap implementation to 
 support binary serialization, by just writing the entire array 
 to a binary file and loading it again.
 That would be fast...
This is intended to be a general-purpose CLI program to handle certain 
text files (from other sources), not an in-house application where I 
can restructure my data. Text files are still the most ubiquitous way 
to share data (especially without limiting the languages/tools others 
will use to process them).

But out of curiosity - how would you serialize it?
Dec 27 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 27.12.2013 20:55, schrieb Gordon:
 On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
 Also, Gordon, what is keeping you from converting the textual
 representation to a binary one? I mean, if you need to read that file
 often, you can easily preprocess it into a binary blob. You could
 even modify my hashmap implementation to support binary serialization,
 by just writing the entire array to a binary file and loading it again.
 That would be fast...
 This is intended to be a general-purpose CLI program to handle certain 
 text files (from other sources), not an in-house application where I 
 can restructure my data. Text files are still the most ubiquitous way 
 to share data (especially without limiting the languages/tools others 
 will use to process them).
 
 But out of curiosity - how would you serialize it?
Given my hashmap implementation, I would do something like:

// method of hashmap
void serializeTo(FILE* f)
{
    fwrite(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
    ulong len = m_Data.length;
    fwrite(&len, typeof(len).sizeof, 1, f);
    fwrite(m_Data.ptr, Pair.sizeof, len, f);
}

// construct from file
this(FILE* f)
{
    fread(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
    ulong len;
    fread(&len, typeof(len).sizeof, 1, f);
    m_Data = (cast(Pair*)malloc(Pair.sizeof * len))[0..len];
    fread(m_Data.ptr, Pair.sizeof, len, f);
}

The deserialization would be just one allocation and one big binary 
read from a file, which should be quite fast.

Kind Regards
Benjamin Thaut
Dec 27 2013
prev sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
 At this point the profiler looked something like this:
 50%  find free entry in hashmap
When I'm building large immutable hash tables I do this:

1. Read in all the unordered key-value pairs into an array of 
Tuple!(Key, Value).

2. Determine the number of buckets you want (B).

3. Schwartz sort the array by hash(key) % B (a radix sort will probably 
be efficient here, but any will do).

This groups all items with the same bucket index together. You can then 
easily generate the array of bucket pointers by iterating the array.

Of course, you need your own hash table implementation to do this. You 
cannot do it using the built-in hash tables.
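A minimal sketch of that construction, using schwartzSort from 
std.algorithm (for brevity the key itself serves as its hash; the 
struct and all names are invented for the example):

import std.algorithm : schwartzSort;
import std.typecons : Tuple, tuple;
import std.stdio;

struct FrozenTable
{
    Tuple!(size_t, size_t)[] pairs; // sorted so equal bucket indices are adjacent
    size_t[] bucketStart;           // prefix offsets into pairs, length B+1

    this(Tuple!(size_t, size_t)[] unordered, size_t B)
    {
        // Schwartz sort by bucket index (hash(key) % B, hash = identity here).
        unordered.schwartzSort!(p => p[0] % B);
        pairs = unordered;

        // One counting pass plus prefix sums yields the bucket offsets.
        bucketStart = new size_t[B + 1];
        foreach (p; pairs)
            bucketStart[p[0] % B + 1]++;
        foreach (b; 1 .. B + 1)
            bucketStart[b] += bucketStart[b - 1];
    }

    bool lookup(size_t key, out size_t value) const
    {
        immutable B = bucketStart.length - 1;
        immutable b = key % B;
        // Scan only the slice of pairs belonging to this bucket.
        foreach (p; pairs[bucketStart[b] .. bucketStart[b + 1]])
            if (p[0] == key) { value = p[1]; return true; }
        return false;
    }
}

void main()
{
    auto data = [tuple(size_t(1), size_t(40)),
                 tuple(size_t(4), size_t(2)),
                 tuple(size_t(42), size_t(11))];
    auto table = FrozenTable(data, 8);
    size_t v;
    assert(table.lookup(42, v) && v == 11);
    writeln("ok");
}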
Dec 27 2013
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
     GC.disable;
     size_t[size_t] unions;
     foreach (line; "input.txt".File.byLineFast) {
         line.munch(" \t"); // skip ws
         immutable i = line.parse!size_t;
         line.munch(" \t"); // skip ws
         immutable j = line.parse!size_t;
         unions[i] = j;
     }
     GC.enable;
 }
A question for people that know something about the D garbage 
collector: can some logic be added to the GC to detect situations where 
you have to allocate many times very frequently, and disable 
collections or make them less frequent, to increase the code 
performance a lot? (I think "recently" the Python garbage collector has 
added such an optimization.) This way, enabling and disabling the GC 
would not be needed any more.

Bye,
bearophile
Dec 26 2013
parent "thedeemon" <dlang thedeemon.com> writes:
On Thursday, 26 December 2013 at 14:09:29 UTC, bearophile wrote:

 A question for people that know something about the D garbage 
 collector: can some logic be added to the GC to detect 
 situations where you have to allocate many times very 
 frequently, and disable collections or make them less frequent, 
 to increase the code performance a lot? (I think "recently" the 
 Python garbage collector has added such an optimization.) This 
 way, enabling and disabling the GC would not be needed any more.
I think it's possible. During the sweep phase, where the GC returns 
unmarked memory blocks (garbage) to the free lists, it can calculate 
the total amount of memory freed at this stage. Probably this stat is 
already being calculated. It also knows the total size of its heap, so 
after a collection it can know what percentage of the heap was freed. 
When the program is in a state where it mostly allocates, this 
percentage will be very low compared to the usual case. When the GC 
sees it is so low, it can increase the effective threshold for the next 
collection, making collections less frequent.

I think the actual parameter would be the size of the pool allocated 
from system memory. AFAIR, currently the GC is triggered when there are 
not enough free blocks in the current pool, and if after a collection 
there is still not enough free space, a new pool gets allocated. Its 
size is currently constant, making long allocating loops run in O(n^2); 
but if the pool size grew in cases where the percentage of collected 
memory is low, and shrank back when the percentage of freed space is 
bigger again, then the GC would adapt to the allocation rate and such 
scenarios would work faster.
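To make that concrete, here is a toy model of the heuristic being 
described (this is not the actual druntime GC; all names and thresholds 
are invented):

import std.stdio;

struct AdaptiveGCModel
{
    size_t heapSize = 16 * 1024 * 1024; // bytes currently managed
    size_t poolSize = 4 * 1024 * 1024;  // size of the next pool to allocate

    // Called after a sweep with the number of bytes it reclaimed.
    void afterCollection(size_t bytesFreed)
    {
        immutable freedFraction = cast(double) bytesFreed / heapSize;
        if (freedFraction < 0.10)
            poolSize *= 2;  // mostly allocating: collect less often
        else if (freedFraction > 0.50 && poolSize > 1024 * 1024)
            poolSize /= 2;  // plenty of garbage again: shrink back
    }
}

void main()
{
    AdaptiveGCModel gc;
    gc.afterCollection(512 * 1024);       // ~3% freed -> grow the pool
    writeln(gc.poolSize);                 // 8388608
    gc.afterCollection(12 * 1024 * 1024); // ~75% freed -> shrink back
    writeln(gc.poolSize);                 // 4194304
}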
Dec 26 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Gordon:

 The file looks like:
    1   40
    4   2
    42  11
    ...

 And has 11M lines.
What's the interval of the numbers in the first column?

Bye,
bearophile
Dec 24 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/24/13 2:28 PM, Gordon wrote:
 Hello,

 I want to load a large text file containing two numeric fields into an
 associative array.
 The file looks like:
     1   40
     4   2
     42  11
     ...

 And has 11M lines.

 My code looks like this:
 ===
 void main()
 {
          size_t[size_t] unions;
          auto f = File("input.txt");
          foreach ( line ; f.byLine() ) {
                  auto fields = line.split();
                  size_t i = to!size_t(fields[0]);
                  size_t j = to!size_t(fields[1]);
                  unions[i] = j; // <-- here be question
          }
 }
 ===

 This is just a test code to illustrate my question (though general
 comments are welcomed - I'm new to D).

 Commenting out the highlighted line (not populating the hash), the
 program completes in 25 seconds.
 Compiling with the highlighted line, the program takes ~3.5 minutes.

 Is there a way to speed the loading? perhaps reserving memory in the
 hash before populating it? Or another trick?
void main()
{
    size_t[size_t] unions;
    foreach (e; "input.txt"
            .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
        unions[e[0]] = e[1];
    }
}

Andrei
Dec 24 2013
next sibling parent reply "Gordon" <me home.com> writes:
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu 
wrote:
 void main()
 {
     size_t[size_t] unions;
     foreach (e; "input.txt"
             .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
         unions[e[0]] = e[1];
     }
 }
Thanks for this very elegant solution!

For completeness (since we're dealing with timing):

1. Running the above code with garbage collection enabled takes 1m45s.

2. Running it with GC disabled takes 50s.

3. Running with GC disabled and without sort+uniq (my data is already 
uniq'd) takes 40s.

This is beautiful, idiomatic D code, but for now I think I'll stick 
with the more verbose code above.

One reason is that I want to provide helpful and verbose error messages 
for invalid input (e.g. "Invalid numeric value found in field 3 line 
1000") - and with the "slurp" method, any input error will result in a 
not-so-helpful exception (e.g. "std.conv.ConvException 
/usr/include/dmd/phobos/std/conv.d(2009): Unexpected 'H' when 
converting from type char[] to type ulong").

Many thanks,
  -gordon
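For illustration, here is one way to get that kind of per-line 
diagnostic with the hand-rolled loop: track the line number yourself 
and wrap the conversion. This is only a sketch; the exact messages and 
the two-field check are invented for the example.

import std.stdio, std.conv, std.string, std.array;

void main()
{
    size_t[size_t] unions;
    size_t lineNo = 0;
    foreach (line; File("input.txt").byLine())
    {
        lineNo++;
        auto fields = line.split();
        // Report a wrong field count with the offending line number.
        if (fields.length != 2)
            throw new Exception(format(
                "Expected 2 fields but found %s in line %s",
                fields.length, lineNo));
        // Turn the terse ConvException into a message naming the line.
        try
            unions[to!size_t(fields[0])] = to!size_t(fields[1]);
        catch (ConvException)
            throw new Exception(format(
                "Invalid numeric value in line %s: '%s'", lineNo, line));
    }
    writeln(unions.length, " entries loaded");
}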
Dec 25 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/25/13 8:29 AM, Gordon wrote:
 On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
 void main()
 {
     size_t[size_t] unions;
     foreach (e; "input.txt"
             .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
         unions[e[0]] = e[1];
     }
 }
 Thanks for this very elegant solution!
 
 For completeness (since we're dealing with timing):
 
 1. Running the above code with garbage collection enabled takes 1m45s.
 
 2. Running it with GC disabled takes 50s.
 
 3. Running with GC disabled and without sort+uniq (my data is already 
 uniq'd) takes 40s.
Thanks for the numbers!
 This is a beautiful idiomatic D code, but for now I think I'll stick
 with the more verbose code above.

 One reason is that I want to provide helpful and verbose error messages
 for invalid input (e.g. "Invalid numeric value found in field 3 line
 1000") - and with the "slurp" method, any input error will result in a
 not-so-helpful exception (e.g.
 "std.conv.ConvException /usr/include/dmd/phobos/std/conv.d(2009):
 Unexpected 'H' when converting from type char[] to type ulong).
Thanks, I added https://d.puremagic.com/issues/show_bug.cgi?id=11816 to 
keep track of this.

Andrei
Dec 25 2013
parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
On Wed, Dec 25, 2013 at 5:47 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 On 12/25/13 8:29 AM, Gordon wrote:
 For completeness (since we're dealing with timing):

 1. Running the above code with Garbage-collection enabled, takes 1m45s.

 2. Running it with GC disabled takes 50s .

 3. Running with GC disabled and without sort+uniq (my data is already
 uniq'd), takes 40s.
Thanks for the numbers!
Yes, indeed, thanks. I would have thought sorting (particularly the built-in sort) + uniq would 'cost' more.
Dec 25 2013
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu 
wrote:
 On 12/24/13 2:28 PM, Gordon wrote:
 Hello,

 I want to load a large text file containing two numeric fields 
 into an
 associative array.
 The file looks like:
    1   40
    4   2
    42  11
    ...

 And has 11M lines.

 My code looks like this:
 ===
 void main()
 {
         size_t[size_t] unions;
         auto f = File("input.txt");
         foreach ( line ; f.byLine() ) {
                 auto fields = line.split();
                 size_t i = to!size_t(fields[0]);
                 size_t j = to!size_t(fields[1]);
                 unions[i] = j; // <-- here be question
         }
 }
 ===

 This is just a test code to illustrate my question (though 
 general
 comments are welcomed - I'm new to D).

 Commenting out the highlighted line (not populating the hash), 
 the
 program completes in 25 seconds.
 Compiling with the highlighted line, the program takes ~3.5 
 minutes.

 Is there a way to speed the loading? perhaps reserving memory 
 in the
 hash before populating it? Or another trick?
 void main()
 {
     size_t[size_t] unions;
     foreach (e; "input.txt"
             .slurp!(size_t, size_t)("%s %s").sort.uniq ) {
         unions[e[0]] = e[1];
     }
 }
 
 Andrei
Watch out for the parentheses on sort. As bearophile likes to point out 
frequently, without parentheses you are calling the builtin sort, not 
the std.algorithm one.

Gordon, you may find this has better performance if you add () to sort.
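To make the gotcha concrete, a minimal sketch (with compilers of that 
era both lines compile; the built-in array sort property has since been 
deprecated and removed):

import std.algorithm : sort;
import std.stdio;

void main()
{
    auto arr = [3, 1, 2];
    arr.sort;   // the old built-in array "sort" property, not std.algorithm
    arr.sort(); // std.algorithm.sort via UFCS - the one you want
    writeln(arr);
}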
Dec 25 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/25/13 10:25 AM, John Colvin wrote:
 Watch out for the parentheses on sort. As bearophile likes to point out
 frequently, without parentheses you are calling the builtin sort, not
 the std.algorithm one.
Ew, good point. Thanks.
 Gordon, you may find this has better performance if you add () to sort.
Also, can you share your data somewhere if it's not confidential?

Andrei
Dec 25 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Dec 25, 2013 at 10:51:23AM -0800, Andrei Alexandrescu wrote:
 On 12/25/13 10:25 AM, John Colvin wrote:
Watch out for the parentheses on sort. As bearophile likes to point
out frequently, without parentheses you are calling the builtin sort,
not the std.algorithm one.
Ew, good point. Thanks.
[...]

Is the built-in sort deprecated yet? If it is, when can we finally say 
bye-bye to it?


T

-- 
Be in denial for long enough, and one day you'll deny yourself of 
things you wish you hadn't.
Dec 25 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 Is the built-in sort deprecated yet? If it is, when can we 
 finally say bye-bye to it?
https://d.puremagic.com/issues/show_bug.cgi?id=10318

Bye,
bearophile
Dec 25 2013
prev sibling parent reply "Gordon" <me home.com> writes:
On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei 
Alexandrescu wrote:
 On 12/25/13 10:25 AM, John Colvin wrote:

 Gordon, you may find this has better performance if you add () 
 to sort.
Also, can you share your data somewhere if it's not confidential?
It will be related to this research: http://www.familinx.org . The data 
is available in the "download" section as an SQL dump (which can be 
easily converted to just tabular files).

I've also added verbose errors to "std.file.slurp()", which (IMHO) 
provide all the necessary information when an error occurs. Available 
here:
https://github.com/agordon/phobos/compare/file.slurp.verbose

But this has led me to detect some "formattedRead" quirks - I'll start 
a new thread about those.

Thanks,
  -gordon
Dec 25 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 25.12.2013 21:26, schrieb Gordon:
 On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei Alexandrescu wrote:
 On 12/25/13 10:25 AM, John Colvin wrote:

 Gordon, you may find this has better performance if you add () to sort.
Also, can you share your data somewhere if it's not confidential?
 It will be related to this research: http://www.familinx.org . The 
 data is available in the "download" section as an SQL dump (which can 
 be easily converted to just tabular files).
 
 I've also added verbose errors to "std.file.slurp()", which (IMHO) 
 provide all the necessary information when an error occurs. Available 
 here:
 https://github.com/agordon/phobos/compare/file.slurp.verbose
 
 But this has led me to detect some "formattedRead" quirks - I'll start 
 a new thread about those.
 
 Thanks,
   -gordon
Did you use D to convert that SQL dump to tabular data? If so, could 
you post the small snippet? If not, how did you convert the 
"relationship" table?

Kind Regards
Benjamin Thaut
Dec 26 2013
parent "Gordon" <me home.com> writes:
On Thursday, 26 December 2013 at 18:07:38 UTC, Benjamin Thaut 
wrote:
 Did you use D to convert that SQL dump to tabular data? If so, 
 could you post the small snippet? If not, how did you convert the 
 "relationship" table?
No, I use "D" for some internal experimentation. To get the data as tabular I loaded the SQL dump to a MySQL server, then used "mysqldump --tab" to export the tables I needed. HTH, -gordon
Dec 26 2013
prev sibling parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
 Hello,

 I want to load a large text file containing two numeric fields 
 into an associative array.
 The file looks like:
    1   40
    4   2
    42  11
    ...

 And has 11M lines.

 My code looks like this:
 ===
 void main()
 {
         size_t[size_t] unions;
         auto f = File("input.txt");
         foreach ( line ; f.byLine() ) {
                 auto fields = line.split();
                 size_t i = to!size_t(fields[0]);
                 size_t j = to!size_t(fields[1]);
                 unions[i] = j; // <-- here be question
         }
 }
 ===

 This is just a test code to illustrate my question (though 
 general comments are welcomed - I'm new to D).

 Commenting out the highlighted line (not populating the hash), 
 the program completes in 25 seconds.
 Compiling with the highlighted line, the program takes ~3.5 
 minutes.

 Is there a way to speed the loading? perhaps reserving memory 
 in the hash before populating it? Or another trick?

 Many thanks,
  -gordon
Using OrderedAA improves speed 3x:
https://github.com/Kozzi11/Trash/tree/master/util

import util.orderedaa;

int main(string[] args)
{
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    OrderedAA!(size_t, size_t, 1_000_007) unions;
    //size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast)
    {
        line.munch(" \t"); // skip ws
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip ws
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;

    return 0;
}
Dec 27 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 27.12.2013 19:25, schrieb Daniel Kozak:
 using OrderedAA improve speed 3x
 https://github.com/Kozzi11/Trash/tree/master/util
A possible downside of this implementation, though, is that because you 
are using a doubly linked list per index, there will be more cache 
misses during a read operation compared to a linear probing hashmap. 
Did you try using an array per index instead of a linked list, and 
measuring whether that makes any difference?

Kind Regards
Benjamin Thaut
Dec 27 2013
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Because I was always curious to do some hashmap profiling with real 
data, I did some more. Here are the results:

My implementation. Power of two size (25% free):
building hashmap: 8 seconds 28880605 ticks
looking entries up: 0 seconds 31685 ticks

My implementation. Prime number size (25% free):
building hashmap: 8 seconds 26802252 ticks
looking entries up: 0 seconds 27719 ticks

My implementation. Prime number size (10% free):
building hashmap: 8 seconds 29323952 ticks
looking entries up: 0 seconds 32768 ticks

My implementation. Prime number size (20% free):
26877474 entries
building hashmap: 8 seconds 28895211 ticks
sum 74128
looking entries up: 0 seconds 33222 ticks

My implementation. Prime number size (50% free):
building hashmap: 8 seconds 27738475 ticks
looking entries up: 0 seconds 25696 ticks

OrderedAA (implementation of Daniel Kozak):
building array: 13 seconds 45663219 ticks
lookup: 0 seconds 236323 ticks

Builtin AA:
building array: 14 seconds 47175035 ticks
lookup: 0 seconds 33692 ticks


You can see that both my implementation and Daniel Kozak's outperform 
the builtin AA when filling the AA. Both my version and Daniel Kozak's 
preallocate the internal array, although Daniel Kozak's version does one 
additional allocation per entry to build the doubly linked lists.
When not preallocating, my implementation still outperforms the builtin AA.

When looking up data, my implementation outperforms both other 
implementations. My implementation is only slightly faster than the 
builtin one. Daniel Kozak's version is 8 times slower during lookup 
compared to my implementation. I think this is mostly due to cache 
misses caused by the linked-list storage of the entries.
It is also interesting to note that using prime number sizes for the 
internal array of the AA improved the lookup performance slightly in my 
implementation. A certain portion of all entries in the internal array 
is always kept free. Reducing this amount leads to worse performance 
during lookup; increasing it gives better performance. You can gain a 
few percent speed if you are willing to waste 50% memory. My 
measurements show that 25% free entries seem to be the sweet spot.
Dec 28 2013
parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut 
wrote:
 Because I was always curious to do some hashmap profiling with 
 real data, I did some more. Here are the results:

 My implementation. Power of two size (25% free):
 building hashmap: 8 seconds 28880605 ticks
 looking entries up: 0 seconds 31685 ticks

 My implementation. Prime number size (25% free):
 building hashmap: 8 seconds 26802252 ticks
 looking entries up: 0 seconds 27719 ticks

 My implementation. Prime number size (10% free):
 building hashmap: 8 seconds 29323952 ticks
 looking entries up: 0 seconds 32768 ticks

 My implementation. Prime number size (20% free):
 26877474 entries
 building hashmap: 8 seconds 28895211 ticks
 sum 74128
 looking entries up: 0 seconds 33222 ticks

 My implementation. Prime number size (50% free):
 building hashmap: 8 seconds 27738475 ticks
 looking entries up: 0 seconds 25696 ticks

 OrderedAA (implementation of Daniel Kozak):
 building array: 13 seconds 45663219 ticks
 lookup: 0 seconds 236323 ticks

 Builtin AA:
 building array: 14 seconds 47175035 ticks
 lookup: 0 seconds 33692 ticks


 You can see that both my implementation and Daniel Kozak's 
 outperform the builtin AA when filling the AA. Both my version and 
 Daniel Kozak's preallocate the internal array, although 
 Daniel Kozak's version does one additional allocation per entry 
 to build the doubly linked lists.
 When not preallocating, my implementation still outperforms the 
 builtin AA.

 When looking up data, my implementation outperforms both other 
 implementations. My implementation is only slightly faster than 
 the builtin one. Daniel Kozak's version is 8 times slower during 
 lookup compared to my implementation. I think this is mostly 
 due to cache misses caused by the linked-list storage of the 
 entries.
 It is also interesting to note that using prime number sizes 
 for the internal array of the AA improved the lookup 
 performance slightly in my implementation. A certain portion of 
 all entries in the internal array is always kept free. 
 Reducing this amount leads to worse performance during lookup; 
 increasing it gives better performance. You can gain a few 
 percent speed if you are willing to waste 50% memory. My 
 measurements show that 25% free entries seem to be the sweet 
 spot.
This is cool!

Can you post your code and the data you used for this test somewhere? 
I'd really like to test it on my new AA implementation :)
Dec 30 2013
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 30.12.2013 15:06, schrieb Daniel Kozak:
 This is cool!

 Can you post somewhere your code and data which you use for this test? I
 really like to test it on my new AA implementation :)
Here is the post that describes how to get the data:
http://forum.dlang.org/post/yglwnmawrvbhpszdsiqs forum.dlang.org

And here is the post that describes how to convert it to tabular text 
data:
http://forum.dlang.org/post/rhyycuwlxsoqbtpzpgzd forum.dlang.org

After dumping the "relationship" table from mysql, I just removed the 
first column because it only contains the ID of the entry. I then used 
the remaining two columns for profiling. You will have to get the data 
yourself, as I can't repost it here (because it's not my data).

Here is the post that describes my implementation:
http://forum.dlang.org/post/l9kcm2$2i4i$1 digitalmars.com

For profiling the lookups, I generated 100_000 random numbers in the 
range (0, 200_000] and saved them to a file. I then profiled the lookup 
of those 100_000 numbers. Out of all these numbers, roughly 75% are 
valid indices and the rest are invalid, to profile the lookup of both 
valid and invalid keys. For profiling your implementation, I just used 
the code you posted.

Kind Regards
Benjamin Thaut
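For reference, generating such a file of lookup keys takes only a few 
lines; a sketch (the file name is arbitrary, the bounds are the ones 
described above):

import std.stdio;
import std.random;

void main()
{
    // 100_000 random lookup keys in the range (0, 200_000]
    auto f = File("lookup_keys.txt", "w");
    foreach (_; 0 .. 100_000)
        f.writeln(uniform(1, 200_001)); // uniform's upper bound is exclusive
}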
Dec 30 2013