www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - AA Troubles

reply Jonathan Crapuchettes <jcrapuchettes gmail.com> writes:
Hey all,

I have been working with a lot of associative arrays of late and been running 
into some problems with the built-in implementation. It appears that very heavy 
use in an application can cause the garbage collector to have issues. Most of 
the time I have found ways around the problems, but the one that I ran into 
today has made me wonder if there is a less error prone/slower implementation
of 
AAs out there for D that doesn't pound the GC as hard. I have looked at the aa 
project at dsource and noticed some discussion on the forums from a little
while 
ago, but I wasn't sure how up-to-date everything was.

The problem that I ran into appears, according to gdb, to be in 
gc.gcx.Gcx.findPool(). During the execution of my program, the code halts at
one 
point with 1 core maxed (i.e. infinite loop) about 6 minutes into the execution 
with about 2.1GB of memory allocated. When I said that I used the AAs a lot, I 
mean that most of the memory is allocated to classes that are only accessible 
through the AAs. I wish that I could give you an example, but at the point of 
the halt, the program had pulled data from a 60M row database table and run 
through about 500 lines of code.

I know that I usually don't comment in this forum unless I have a problem, and
I 
appreciate that I still get replies. Just know that I am a great proponent of D 
and make very heavy use of it in a corporate environment doing a lot of 
cutting-edge economic calculations. Thank you to all who have contributed to D!

Thank you,

Jonathan
May 10 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 5/10/2011 9:54 PM, Jonathan Crapuchettes wrote:
 Hey all,

 I have been working with a lot of associative arrays of late and been
 running into some problems with the built-in implementation. It appears
 that very heavy use in an application can cause the garbage collector to
 have issues. Most of the time I have found ways around the problems, but
 the one that I ran into today has made me wonder if there is a less
 error prone/slower implementation of AAs out there for D that doesn't
 pound the GC as hard. I have looked at the aa project at dsource and
 noticed some discussion on the forums from a little while ago, but I
 wasn't sure how up-to-date everything was.

 The problem that I ran into appears, according to gdb, to be in
 gc.gcx.Gcx.findPool(). During the execution of my program, the code
 halts at one point with 1 core maxed (i.e. infinite loop) about 6
 minutes into the execution with about 2.1GB of memory allocated. When I
 said that I used the AAs a lot, I mean that most of the memory is
 allocated to classes that are only accessible through the AAs. I wish
 that I could give you an example, but at the point of the halt, the
 program had pulled data from a 60M row database table and run through
 about 500 lines of code.

 I know that I usually don't comment in this forum unless I have a
 problem, and I appreciate that I still get replies. Just know that I am
 a great proponent of D and make very heavy use of it in a corporate
 environment doing a lot of cutting-edge economic calculations. Thank you
 to all who have contributed to D!

 Thank you,

 Jonathan

Two things: 1. This is one of the main problems my RandAA associative array implementation was meant to solve. It's usable but not completely polished yet. Code: http://dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d Docs: http://cis.jhu.edu/~dsimcha/randaasealed.html 2. If you're using Linux, try compiling a 64-bit binary. Since pointers now have twice as many bits, false pointers are orders of magnitude less probable.
May 11 2011
parent reply Jonathan Crapuchettes <jcrapuchettes gmail.com> writes:
1. I will give you implementation a try.
2. I have been compiling with 64-bit as I expect most of the programs I write 
require more than 4GB of memory.

Thanks!

dsimcha wrote:
 On 5/10/2011 9:54 PM, Jonathan Crapuchettes wrote:
 Hey all,

 I have been working with a lot of associative arrays of late and been
 running into some problems with the built-in implementation. It appears
 that very heavy use in an application can cause the garbage collector to
 have issues. Most of the time I have found ways around the problems, but
 the one that I ran into today has made me wonder if there is a less
 error prone/slower implementation of AAs out there for D that doesn't
 pound the GC as hard. I have looked at the aa project at dsource and
 noticed some discussion on the forums from a little while ago, but I
 wasn't sure how up-to-date everything was.

 The problem that I ran into appears, according to gdb, to be in
 gc.gcx.Gcx.findPool(). During the execution of my program, the code
 halts at one point with 1 core maxed (i.e. infinite loop) about 6
 minutes into the execution with about 2.1GB of memory allocated. When I
 said that I used the AAs a lot, I mean that most of the memory is
 allocated to classes that are only accessible through the AAs. I wish
 that I could give you an example, but at the point of the halt, the
 program had pulled data from a 60M row database table and run through
 about 500 lines of code.

 I know that I usually don't comment in this forum unless I have a
 problem, and I appreciate that I still get replies. Just know that I am
 a great proponent of D and make very heavy use of it in a corporate
 environment doing a lot of cutting-edge economic calculations. Thank you
 to all who have contributed to D!

 Thank you,

 Jonathan

Two things: 1. This is one of the main problems my RandAA associative array implementation was meant to solve. It's usable but not completely polished yet. Code: http://dsource.org/projects/aa/browser/trunk/randAA/randaasealed.d Docs: http://cis.jhu.edu/~dsimcha/randaasealed.html 2. If you're using Linux, try compiling a 64-bit binary. Since pointers now have twice as many bits, false pointers are orders of magnitude less probable.

May 11 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Jonathan Crapuchettes (jcrapuchettes gmail.com)'s article
 1. I will give you implementation a try.
 2. I have been compiling with 64-bit as I expect most of the programs I write
 require more than 4GB of memory.
 Thanks!

Unfortunately RandAA doesn't work on 64-bit yet, though the reasons are just some minor details, not fundamental issues. I didn't bother supporting it because last time I did a major update the 64-bit backend wasn't ready yet and I needed to find good linear congruential parameters for 64-bit. You might also be pleasantly surprised when the new release (currently in beta) comes out. I've put a lot of optimization work into the garbage collector and depending on the allocation size, it's between 25% and 200x faster than the 2.052 GC. (See https://github.com/dsimcha/druntime/wiki/Druntime-GC-Optimization-Fork)
May 11 2011
parent reply Jonathan Crapuchettes <jcrapuchettes gmail.com> writes:
Wow! When is the next release planned? Is it worth my trying the beta?

dsimcha wrote:
 == Quote from Jonathan Crapuchettes (jcrapuchettes gmail.com)'s article
 1. I will give you implementation a try.
 2. I have been compiling with 64-bit as I expect most of the programs I write
 require more than 4GB of memory.
 Thanks!

Unfortunately RandAA doesn't work on 64-bit yet, though the reasons are just some minor details, not fundamental issues. I didn't bother supporting it because last time I did a major update the 64-bit backend wasn't ready yet and I needed to find good linear congruential parameters for 64-bit. You might also be pleasantly surprised when the new release (currently in beta) comes out. I've put a lot of optimization work into the garbage collector and depending on the allocation size, it's between 25% and 200x faster than the 2.052 GC. (See https://github.com/dsimcha/druntime/wiki/Druntime-GC-Optimization-Fork)

May 11 2011
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Jonathan Crapuchettes (jcrapuchettes gmail.com)'s article
 Wow! When is the next release planned? Is it worth my trying the beta?

You probably shouldn't try the betas. DMD betas are very short (usually on the order of a week or less) and are designed so that library writers and other very heavy users can test their code and file bug reports on regressions. The next release will probably be out within a few days.
May 11 2011
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
BTW, I've updated RandAA for 64-bits if you want to try it out.  The 
updates were rather trivial and I meant to do them a long time ago.

On 5/11/2011 1:49 PM, Jonathan Crapuchettes wrote:
 Wow! When is the next release planned? Is it worth my trying the beta?

 dsimcha wrote:
 == Quote from Jonathan Crapuchettes (jcrapuchettes gmail.com)'s article
 1. I will give you implementation a try.
 2. I have been compiling with 64-bit as I expect most of the programs
 I write
 require more than 4GB of memory.
 Thanks!

Unfortunately RandAA doesn't work on 64-bit yet, though the reasons are just some minor details, not fundamental issues. I didn't bother supporting it because last time I did a major update the 64-bit backend wasn't ready yet and I needed to find good linear congruential parameters for 64-bit. You might also be pleasantly surprised when the new release (currently in beta) comes out. I've put a lot of optimization work into the garbage collector and depending on the allocation size, it's between 25% and 200x faster than the 2.052 GC. (See https://github.com/dsimcha/druntime/wiki/Druntime-GC-Optimization-Fork)


May 11 2011
parent reply Jonathan Crapuchettes <jcrapuchettes gmail.com> writes:
Thank you, but I figured out a way to significantly reduce the number of AAs 
that were in use in my program and that seems to have helped it and sped it up
a 
bit.

The next problem that I just ran into really doesn't make sense and I'm hoping 
that you can help me. I'm getting segfaults in the function 
invariant._d_invariant. Does this mean anything to you?

Thanks,
Jonathan

dsimcha wrote:
 BTW, I've updated RandAA for 64-bits if you want to try it out. The updates
were
 rather trivial and I meant to do them a long time ago.

 On 5/11/2011 1:49 PM, Jonathan Crapuchettes wrote:
 Wow! When is the next release planned? Is it worth my trying the beta?

 dsimcha wrote:
 == Quote from Jonathan Crapuchettes (jcrapuchettes gmail.com)'s article
 1. I will give you implementation a try.
 2. I have been compiling with 64-bit as I expect most of the programs
 I write
 require more than 4GB of memory.
 Thanks!

Unfortunately RandAA doesn't work on 64-bit yet, though the reasons are just some minor details, not fundamental issues. I didn't bother supporting it because last time I did a major update the 64-bit backend wasn't ready yet and I needed to find good linear congruential parameters for 64-bit. You might also be pleasantly surprised when the new release (currently in beta) comes out. I've put a lot of optimization work into the garbage collector and depending on the allocation size, it's between 25% and 200x faster than the 2.052 GC. (See https://github.com/dsimcha/druntime/wiki/Druntime-GC-Optimization-Fork)



May 11 2011
parent Jonathan Crapuchettes <jcrapuchettes gmail.com> writes:
Please ignore my last question. I think I had worked on the problem too long 
last night.
Thanks,
JC
May 12 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 10 May 2011 21:54:30 -0400, Jonathan Crapuchettes  
<jcrapuchettes gmail.com> wrote:

 Hey all,

 I have been working with a lot of associative arrays of late and been  
 running into some problems with the built-in implementation. It appears  
 that very heavy use in an application can cause the garbage collector to  
 have issues. Most of the time I have found ways around the problems, but  
 the one that I ran into today has made me wonder if there is a less  
 error prone/slower implementation of AAs out there for D that doesn't  
 pound the GC as hard. I have looked at the aa project at dsource and  
 noticed some discussion on the forums from a little while ago, but I  
 wasn't sure how up-to-date everything was.

I realize this is very late response, but you can try an alternative map type in dcollections. There are both HashMap and TreeMap. Please use the latest d2 branch, as it has recently been updated to work with 64-bit linux (need to do a beta release). http://www.dsource.org/projects/dcollections In most cases, the HashMap actually is faster than the builtin AA. -Steve
May 16 2011