digitalmars.D.learn - How to resume iteration from previous point in associative array
- Tarman (23/23) Feb 19 2014 Hi,
- Tarman (8/8) Feb 19 2014 I guess what might work here is to allow some kind of argument to
- bearophile (8/15) Feb 19 2014 Built-in associative arrays were never tested with so many pairs,
- Nicolas Sicard (3/19) Feb 19 2014 Are byKey and byValue guaranteed to iterate the associative array
- bearophile (4/6) Feb 19 2014 Nope, or not yet, sorry.
- =?UTF-8?B?QWxpIMOHZWhyZWxp?= (4/10) Feb 19 2014 Is there a bug report about that? I can't imagine how they can iterate
- simendsjo (5/28) Feb 19 2014 Probably not an answer to your question, but if possible, you can
- Tobias Pankrath (16/39) Feb 19 2014 You could iterate over byValue/byKey/zip(byKey,byValue) which
- Tarman (11/11) Feb 19 2014 @bearophile:
Hi, We're doing some "super computing" "big data" style stuff with D. We have a system where we're comparing associative arrays with billions of entries. However in this system we need to fairly consider possible solutions for many "units" at a time within a single thread. So we'd like to... say, first iterate over the first 10 million for each "unit" then iterate over the next 10 million for the next unit so that each unit gets a fair share of CPU time. However in this case we can't: int count = 0; foreach (item; associativeArray) { if (++count == 10_000_000) { // here we wish to somehow save the iteration state } } Then on the next iteration: foreach (resume from previous iteration point) { } We've tried copying the keys into a non-associative array and sure this works, but it is far far far less optimal than an equivalent C++ solution we wrote where we use an std::unordered_set and can simply store the iterator.
Feb 19 2014
I guess what might work here is to allow some kind of argument to "byKey" that will let us resume so maybe something like: foreach (item; massiveAssociativeArray.byKey(indexOfKeyToStartFrom)) { } If something like this isn't possible, then would one of our engineers be able to submit a patch to libphobos to provide the feature?
Feb 19 2014
Tarman:We're doing some "super computing" "big data" style stuff with D. We have a system where we're comparing associative arrays with billions of entries.Built-in associative arrays were never tested with so many pairs, so perform exhaustive performance benchmarks first, and report in Bugzilla the performance and memory problems you hit.We've tried copying the keys into a non-associative array and sure this works, but it is far far far less optimal than an equivalent C++ solution we wrote where we use an std::unordered_set and can simply store the iterator.Have you tried the simplest thing, to let std.parallelism chunk a AA.byKey.zip(AA.byValue) ? Bye, bearophile
Feb 19 2014
On Wednesday, 19 February 2014 at 09:28:27 UTC, bearophile wrote:Tarman:Are byKey and byValue guaranteed to iterate the associative array in the same order?We're doing some "super computing" "big data" style stuff with D. We have a system where we're comparing associative arrays with billions of entries.Built-in associative arrays were never tested with so many pairs, so perform exhaustive performance benchmarks first, and report in Bugzilla the performance and memory problems you hit.We've tried copying the keys into a non-associative array and sure this works, but it is far far far less optimal than an equivalent C++ solution we wrote where we use an std::unordered_set and can simply store the iterator.Have you tried the simplest thing, to let std.parallelism chunk a AA.byKey.zip(AA.byValue) ? Bye, bearophile
Feb 19 2014
Nicolas Sicard:Are byKey and byValue guaranteed to iterate the associative array in the same order?Nope, or not yet, sorry. Bye, bearophile
Feb 19 2014
On 02/19/2014 04:11 AM, bearophile wrote:Nicolas Sicard:Is there a bug report about that? I can't imagine how they can iterate in any other order. AliAre byKey and byValue guaranteed to iterate the associative array in the same order?Nope, or not yet, sorry. Bye, bearophile
Feb 19 2014
On Wednesday, 19 February 2014 at 09:21:48 UTC, Tarman wrote:Hi, We're doing some "super computing" "big data" style stuff with D. We have a system where we're comparing associative arrays with billions of entries. However in this system we need to fairly consider possible solutions for many "units" at a time within a single thread. So we'd like to... say, first iterate over the first 10 million for each "unit" then iterate over the next 10 million for the next unit so that each unit gets a fair share of CPU time. However in this case we can't: int count = 0; foreach (item; associativeArray) { if (++count == 10_000_000) { // here we wish to somehow save the iteration state } } Then on the next iteration: foreach (resume from previous iteration point) { } We've tried copying the keys into a non-associative array and sure this works, but it is far far far less optimal than an equivalent C++ solution we wrote where we use an std::unordered_set and can simply store the iterator.Probably not an answer to your question, but if possible, you can use a fiber and yield when you should give control to another fiber.
Feb 19 2014
On Wednesday, 19 February 2014 at 09:21:48 UTC, Tarman wrote:Hi, We're doing some "super computing" "big data" style stuff with D. We have a system where we're comparing associative arrays with billions of entries. However in this system we need to fairly consider possible solutions for many "units" at a time within a single thread. So we'd like to... say, first iterate over the first 10 million for each "unit" then iterate over the next 10 million for the next unit so that each unit gets a fair share of CPU time. However in this case we can't: int count = 0; foreach (item; associativeArray) { if (++count == 10_000_000) { // here we wish to somehow save the iteration state } } Then on the next iteration: foreach (resume from previous iteration point) { } We've tried copying the keys into a non-associative array and sure this works, but it is far far far less optimal than an equivalent C++ solution we wrote where we use an std::unordered_set and can simply store the iterator.You could iterate over byValue/byKey/zip(byKey,byValue) which gives you a range to start again from. You'd need to iterate by hand (with while,for) though, since your range won't get consumed by foreach, I guess. Or use RefRange. not tested: auto values = refRange(&associativeArray.byValue); foreach(item; values) { if(someday) break; } // here values should have been advanced accordingly // and you can just continue your iteration foreach(item; values) { }
Feb 19 2014
bearophile: Thanks for your suggestion! Yes we've tried this but unfortunately the performance doesn't work for us, maybe because all our CPUs are quite saturated. simendsjo: Thanks also for your suggestion, we are using tasks and yield in our code but in this case we felt that using it just to resume iteration was perhaps a misuse of the feature although we would be interested to see what the overhead is. Mr Pankrath: Thanks that looks like what we need!
Feb 19 2014