
digitalmars.D.learn - How to resume iteration from previous point in associative array

reply "Tarman" <tarman chilon.net> writes:
Hi,

We're doing some "super computing" "big data" style stuff with D. 
We have a system where we're comparing associative arrays with 
billions of entries.

However in this system we need to fairly consider possible 
solutions for many "units" at a time within a single thread.

So we'd like to... say, first iterate over the first 10 million 
for each "unit" then iterate over the next 10 million for the 
next unit so that each unit gets a fair share of CPU time.

However in this case we can't:

int count = 0;
foreach (item; associativeArray) {
   if (++count == 10_000_000) {
     // here we wish to somehow save the iteration state
   }
}

Then on the next iteration:

foreach (resume from previous iteration point) {
}

We've tried copying the keys into a non-associative array and 
sure this works, but it is far far far less optimal than an 
equivalent C++ solution we wrote where we use an 
std::unordered_set and can simply store the iterator.
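
Roughly, the key-copying workaround looks like this (only a sketch; the
double[string] type, the names and the chunk size are placeholders):

import std.algorithm : min;
import std.array : array;

// one "turn" for a unit: process the next chunkSize keys starting at `start`
// and return the new position -- the only state that has to be saved
size_t processChunk(double[string] aa, string[] keys, size_t start,
                    size_t chunkSize = 10_000_000)
{
    foreach (key; keys[start .. min(start + chunkSize, keys.length)])
    {
        auto value = aa[key];   // costs an extra hash lookup per entry
        // ... comparison work using value ...
    }
    return min(start + chunkSize, keys.length);
}

void main()
{
    double[string] aa;                  // the real one has billions of entries
    auto keys = aa.byKey.array;         // the expensive snapshot of the keys
    size_t pos = 0;
    pos = processChunk(aa, keys, pos);  // first 10 million for this unit
    // ... other units get their turn ...
    pos = processChunk(aa, keys, pos);  // resume where we left off
}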
Feb 19 2014
next sibling parent "Tarman" <tarman chilon.net> writes:
I guess what might work here is to allow some kind of argument to 
"byKey" that will let us resume so maybe something like:

foreach (item; 
massiveAssociativeArray.byKey(indexOfKeyToStartFrom)) {
}

If something like this isn't possible, then would one of our 
engineers be able to submit a patch to libphobos to provide the 
feature?
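
In the meantime, something like std.range.drop over byKey might approximate
this (an untested sketch; it still pops and discards the first N entries one
by one, and it assumes the array isn't modified between passes, so it doesn't
really fix the cost):

import std.range : drop;

void main()
{
    double[string] massiveAssociativeArray;     // placeholder type
    size_t indexOfKeyToStartFrom = 10_000_000;

    // drop() consumes the first N keys, then iteration continues from there
    foreach (item; massiveAssociativeArray.byKey.drop(indexOfKeyToStartFrom))
    {
        // ... process the next batch ...
    }
}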
Feb 19 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Tarman:

 We're doing some "super computing" "big data" style stuff with 
 D. We have a system where we're comparing associative arrays 
 with billions of entries.
Built-in associative arrays were never tested with so many pairs, so perform exhaustive performance benchmarks first, and report in Bugzilla the performance and memory problems you hit.
 We've tried copying the keys into a non-associative array and 
 sure this works, but it is far far far less optimal than an 
 equivalent C++ solution we wrote where we use an 
 std::unordered_set and can simply store the iterator.
Have you tried the simplest thing, to let std.parallelism chunk an AA.byKey.zip(AA.byValue)? Bye, bearophile
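
P.S. Something like this, untested, and assuming byKey and byValue visit the
entries in the same order (the AA type and the work unit size are just
placeholders):

import std.parallelism : parallel;
import std.range : zip;

void main()
{
    double[string] aa;      // placeholder for the real billion-entry AA

    // zip pairs each key with its value; parallel() hands out the range
    // in work units of 100_000 pairs across the task pool
    foreach (pair; parallel(zip(aa.byKey, aa.byValue), 100_000))
    {
        auto key = pair[0];
        auto value = pair[1];
        // ... comparison work ...
    }
}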
Feb 19 2014
parent reply "Nicolas Sicard" <dransic gmail.com> writes:
On Wednesday, 19 February 2014 at 09:28:27 UTC, bearophile wrote:
 Tarman:

 We're doing some "super computing" "big data" style stuff with 
 D. We have a system where we're comparing associative arrays 
 with billions of entries.
Built-in associative arrays were never tested with so many pairs, so perform exhaustive performance benchmarks first, and report in Bugzilla the performance and memory problems you hit.
 We've tried copying the keys into a non-associative array and 
 sure this works, but it is far far far less optimal than an 
 equivalent C++ solution we wrote where we use an 
 std::unordered_set and can simply store the iterator.
Have you tried the simplest thing, to let std.parallelism chunk an AA.byKey.zip(AA.byValue)? Bye, bearophile
Are byKey and byValue guaranteed to iterate the associative array in the same order?
Feb 19 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Nicolas Sicard:

 Are byKey and byValue guaranteed to iterate the associative 
 array in the same order?
Nope, or not yet, sorry. Bye, bearophile
Feb 19 2014
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/19/2014 04:11 AM, bearophile wrote:
 Nicolas Sicard:

 Are byKey and byValue guaranteed to iterate the associative array in
 the same order?
Nope, or not yet, sorry. Bye, bearophile
Is there a bug report about that? I can't imagine how they can iterate in any other order. Ali
Feb 19 2014
prev sibling next sibling parent "simendsjo" <simendsjo gmail.com> writes:
On Wednesday, 19 February 2014 at 09:21:48 UTC, Tarman wrote:
 Hi,

 We're doing some "super computing" "big data" style stuff with 
 D. We have a system where we're comparing associative arrays 
 with billions of entries.

 However in this system we need to fairly consider possible 
 solutions for many "units" at a time within a single thread.

 So we'd like to... say, first iterate over the first 10 million 
 for each "unit" then iterate over the next 10 million for the 
 next unit so that each unit gets a fair share of CPU time.

 However in this case we can't:

 int count = 0;
 foreach (item; associativeArray) {
   if (++count == 10_000_000) {
     // here we wish to somehow save the iteration state
   }
 }

 Then on the next iteration:

 foreach (resume from previous iteration point) {
 }

 We've tried copying the keys into a non-associative array and 
 sure this works, but it is far far far less optimal than an 
 equivalent C++ solution we wrote where we use an 
 std::unordered_set and can simply store the iterator.
Probably not an answer to your question, but if possible, you can use a fiber and yield when you should give control to another fiber.
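
Roughly like this (an untested sketch; the types, names and batch size are
placeholders):

import core.thread : Fiber;

// each "unit" runs in its own fiber; the iteration state lives on the
// fiber's stack, and Fiber.yield() hands control back after each batch
void processUnit(double[string] aa)
{
    size_t count;
    foreach (key, value; aa)
    {
        // ... comparison work ...
        if (++count % 10_000_000 == 0)
            Fiber.yield();
    }
}

void main()
{
    double[string] aa;              // placeholder for the real data
    auto units = [new Fiber({ processUnit(aa); }),
                  new Fiber({ processUnit(aa); })];

    // simple round-robin scheduler: call each live fiber in turn
    bool anyAlive = true;
    while (anyAlive)
    {
        anyAlive = false;
        foreach (f; units)
            if (f.state != Fiber.State.TERM)
            {
                f.call();
                anyAlive = true;
            }
    }
}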
Feb 19 2014
prev sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 19 February 2014 at 09:21:48 UTC, Tarman wrote:
 Hi,

 We're doing some "super computing" "big data" style stuff with 
 D. We have a system where we're comparing associative arrays 
 with billions of entries.

 However in this system we need to fairly consider possible 
 solutions for many "units" at a time within a single thread.

 So we'd like to... say, first iterate over the first 10 million 
 for each "unit" then iterate over the next 10 million for the 
 next unit so that each unit gets a fair share of CPU time.

 However in this case we can't:

 int count = 0;
 foreach (item; associativeArray) {
   if (++count == 10_000_000) {
     // here we wish to somehow save the iteration state
   }
 }

 Then on the next iteration:

 foreach (resume from previous iteration point) {
 }

 We've tried copying the keys into a non-associative array and 
 sure this works, but it is far far far less optimal than an 
 equivalent C++ solution we wrote where we use an 
 std::unordered_set and can simply store the iterator.
You could iterate over byValue/byKey/zip(byKey, byValue), which gives you a range to start again from. You'd need to iterate by hand (with while/for) though, since your range won't get consumed by foreach, I guess. Or use RefRange.

Not tested:

import std.range : refRange;

auto r = associativeArray.byValue;  // refRange needs an lvalue, not a temporary
auto values = refRange(&r);

foreach (item; values)
{
    if (someday)
        break;
}

// here values should have been advanced accordingly
// and you can just continue your iteration
foreach (item; values)
{
}
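
And a small complete example of the idea (note that the element you break on
is still the front of the wrapped range, so it comes around again on resume
unless you pop it yourself):

import std.range : refRange;
import std.stdio : writeln;

void main()
{
    int[string] aa = ["a" : 1, "b" : 2, "c" : 3, "d" : 4];

    auto keyRange = aa.byKey;           // the lvalue that refRange wraps
    auto saved = refRange(&keyRange);

    size_t count;
    foreach (key; saved)
    {
        if (++count == 2)
        {
            saved.popFront();           // skip past the element we stopped on
            break;
        }
    }

    // the wrapped range kept its position, so this prints only the
    // two keys that were not visited above
    foreach (key; saved)
        writeln(key);
}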
Feb 19 2014
parent "Tarman" <tarman chilon.net> writes:
 bearophile:

Thanks for your suggestion! Yes, we've tried this, but 
unfortunately the performance doesn't work out for us, maybe 
because all our CPUs are quite saturated.

 simendsjo:

Thanks also for your suggestion. We are using tasks and yield in 
our code, but in this case we felt that using them just to resume 
iteration was perhaps a misuse of the feature, although we would 
be interested to see what the overhead is.

 Mr Pankrath:

Thanks that looks like what we need!
Feb 19 2014