www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - A Lock-Free Hash Table (google video)

reply Knud Soerensen <4tuu4k002 sneakemail.com> writes:
I would like to share this interesting video 

http://video.google.com/videoplay?docid=2139967204534450862
Apr 01 2007
next sibling parent Dejan Lekic <dejan.lekic gmail.com> writes:
Mr. Soerensen, thanks - a really nice video.
Apr 01 2007
prev sibling next sibling parent Manfred Nowak <svv1999 hotmail.com> writes:
Knud Soerensen wrote

 I would like to share this interesting video 
 
 http://video.google.com/videoplay?docid=2139967204534450862

The author admits, that he has some problems when resizing. He solved them by "stalling". With 64MB hashes the stall time is short though. http://blogs.azulsystems.com/cliff/ But I wonder, how much "stalling" will be done on resizing to some milliards of hash positions. -manfred
Apr 01 2007
prev sibling next sibling parent reply "Craig Black" <cblack ara.com> writes:
Good stuff.  BTW, does anyone know when is D going to start using lock free
algorithms with their built-in/templated containers?

-Craig
Apr 01 2007
next sibling parent reply "David B. Held" <dheld codelogicconsulting.com> writes:
Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock free
 algorithms with their built-in/templated containers?

When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave
Apr 01 2007
next sibling parent "David B. Held" <dheld codelogicconsulting.com> writes:
Brad Roberts wrote:
 [...]
 STM (software transactional memory) isn't required for lock-free simple 
 data structures (not all structures can be done lock free and I'm not an 
 expert in that area).  It can be used, but it's overkill.
 [...]

Well, the point is that even if nobody rewrites any containers to use lock-free algorithms, you can get correct lock-free behavior at whatever level of granularity you feel is appropriate by writing the relevant code in an STM atomic {} block. Granted, there are cases where a lock-free container makes sense and may provide, for instance, better concurrency in the high-contention case. But much of the attraction of STM is that for the simple case, no code actually has to be rewritten, which is great for people who need to build big data structures but don't want to become experts in designing lock-free algorithms. Dave
Apr 01 2007
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
David B. Held wrote:
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock 
 free
 algorithms with their built-in/templated containers?

When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak...

Will this require the user to expose a .clone() method for classes involved in transactions?
Apr 01 2007
parent "David B. Held" <dheld codelogicconsulting.com> writes:
Sean Kelly wrote:
 David B. Held wrote:
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using 
 lock free
 algorithms with their built-in/templated containers?

When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak...

Will this require the user to expose a .clone() method for classes involved in transactions?

I don't want to overpromise and underdeliver, so I will start out with the disclaimer that nothing I say about D features is guaranteed to be implemented. ;) That being said, word-based STM is the most generic form, and requires no special work on the part of any user whatsoever. The first form of STM provided may or may not be word-based. ;) However, Walter and his "STM advisors" are aware of the other forms of STM, including object-based, and there is a possibility of having user-specifiable STM implementations. I think everyone realizes that concurrency is the hot new buzzword, and you can rest assured that D will take it very, very seriously. Dave
Apr 01 2007
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
"David B. Held" <dheld codelogicconsulting.com> wrote in message 
news:eupvmj$1au5$1 digitalmars.com...
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock 
 free
 algorithms with their built-in/templated containers?

When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave

This is very good. It seems that there are a number of things in the works for the D compiler. The const stuff, Walter has mentioned run-time reflection, and now this STM as you now mention. Any of these things would be very cool IMO. Although Walter sometimes mentions stuff currently in development, he does not say much about it. Do you have any insight on what is coming next? -Craig
Apr 02 2007
next sibling parent Dan <murpsoft hotmail.com> writes:
It seems to me that this is the only optimal form for hashtables - a massively
scaleable one.  Why?

For cases where n < 8, a linear search is typically faster than any other
search algorithm.  The reason is less overhead.

For cases where n < 1024, a level order binary search array is typically faster
than any other search algorithm I've learned about, including a hashtable.  The
reason is again less overhead.

For cases where n > 1024, I currently think that O(log(n)) in a level order
binary search array is actually more expensive than that to manage a hashtable.

For cases where n > 1048576, the data set is getting quite large.  Assuming,
through heuristics, that the number of gets and puts is also increasing, it
becomes very practical to parallelize the hashtable.

My understanding is that we tend to organize data either very small or very
large, rarely between 1024 and 1048576.  So implementing a hashtable, to me,
suggests it ought to always be parallelizeable.  Especially considering the low
overhead involved if it's done right.
Apr 02 2007
prev sibling parent "David B. Held" <dheld codelogicconsulting.com> writes:
Craig Black wrote:
 [...]
 This is very good.  It seems that there are a number of things in the works 
 for the D compiler.  The const stuff, Walter has mentioned run-time 
 reflection, and now this STM as you now mention.  Any of these things would 
 be very cool IMO.  Although Walter sometimes mentions stuff currently in 
 development, he does not say much about it.  Do you have any insight on what 
 is coming next?

Nothing more exciting than bug fixes, but that's always good, right? ;) Dave
Apr 02 2007
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
David B. Held wrote:
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock 
 free
 algorithms with their built-in/templated containers?

When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave

STM (software transactional memory) isn't required for lock-free simple data structures (not all structures can be done lock free and I'm not an expert in that area). It can be used, but it's overkill. A better answer, however, is that thread safety isn't something that you apply to every single layer of data structures. You need to really consider the appropriate level of granularity. Even lock free routines have a cost that you don't necessarily want to pay all the time. Given that, the lowest level is generally _not_ the right place to be applying synchronization logic (be it with atomic operations or locks). And to be clear, I've asked this before and been convinced of the above. You could ask the same thing of the stl for c++, and the same answer would apply. NOW, and even better answer might be to have the safety be a pluggable policy decision and then people would be able to make the choice for themselves. That'd allow the best of all worlds. The problem is that writing code that flexibly is quite a bit of work and someone would need to volunteer to do it since I'm just about 100% sure that the code in question here isn't going to change otherwise. Your question is a good one, and the answer is a lot more complex than it seems at first blush. Later, Brad
Apr 01 2007
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Knud Soerensen wrote:
 I would like to share this interesting video 
 
 http://video.google.com/videoplay?docid=2139967204534450862

Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD? I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads? Any difference between Phobos/Tango? L.
Apr 02 2007
next sibling parent reply "Craig Black" <cblack ara.com> writes:
Yes.  I've done benchmarks and power of two is better.  I wonder, does Tango 
use a power of two algorithm for its AA's?

-Craig

"Lionello Lunesu" <lio lunesu.remove.com> wrote in message 
news:euqekc$24nu$1 digitalmars.com...
 Knud Soerensen wrote:
 I would like to share this interesting video 
 http://video.google.com/videoplay?docid=2139967204534450862

Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD? I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads? Any difference between Phobos/Tango? L.

Apr 02 2007
next sibling parent reply Dan <murpsoft hotmail.com> writes:
Craig Black Wrote:

 Very interesting, thanks for that. Maybe now we can convince Walter to 
 include the patch I made about a year ago to make D's AA power-of-2/AND 
 instead of prime/MOD?


oh my! I thought this was obvious; as was the powers of two, and using atomic asm operators to complete whole transactions or not at all. I assumed this was already well known and implemented. Actually, another way to implement the atomic transaction is to use any SSE/SSE2 move instruction to move a key/value pair into place with a: struct KeyVal { char* key; union { long l_value; void* ptr_value; double d_value; } } assert(KeyVal.sizeof == 16); into a ^2 sized array. It's either there, or it's not; so it's concurrent if you follow the rest of the principles, and requires absolutely no preparation for a CAS or locking. I'm going to hope that moving a 16-byte uses SSE2...
Apr 02 2007
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Dan wrote:
 Craig Black Wrote:
 
 Very interesting, thanks for that. Maybe now we can convince Walter to 
 include the patch I made about a year ago to make D's AA power-of-2/AND 
 instead of prime/MOD?


oh my! I thought this was obvious; as was the powers of two, and using atomic asm operators to complete whole transactions or not at all. I assumed this was already well known and implemented. Actually, another way to implement the atomic transaction is to use any SSE/SSE2 move instruction to move a key/value pair into place with a: struct KeyVal { char* key; union { long l_value; void* ptr_value; double d_value; } } assert(KeyVal.sizeof == 16); into a ^2 sized array. It's either there, or it's not; so it's concurrent if you follow the rest of the principles, and requires absolutely no preparation for a CAS or locking.

Yup. It's rehashing that gets a bit messy.
 I'm going to hope that moving a 16-byte uses SSE2...

Nope. It's all plain old D code at the moment IIRC. Sean
Apr 02 2007
parent reply Dan <murpsoft hotmail.com> writes:
Sean Kelly Wrote:
 Yup.  It's rehashing that gets a bit messy.
 
 I'm going to hope that moving a 16-byte uses SSE2...

Nope. It's all plain old D code at the moment IIRC.

Rehashing, you'd probe until you've found the right location *then* move the 16-byte struct. He was suggesting using stride-1 probing for better cache coherency - but it's aweful at cascading into linear searches for an empty slot. Plain old D doesn't use SSE or SSE2??? Oh my... Hasn't SSE been out since 1997? I guess with 64-bit GPR's, SSE1 is practically irrelevant now, but SSE2 and SSE3 could definitely improve performance of certain things... like copying arrays of 16-byte Value structs in DMDScript. I assumed that was what Walter meant when he commented that using 16 byte Value structs was critical for performance. I'll be sure to use little asm { } blocks to get certain things done then. The wonderful thing about D is that Walter provides a default mechanism for prototyping that works well. If you want something that works superbly,you can cut under it without much hassle. A superb language, that D.
Apr 02 2007
parent Sean Kelly <sean f4.ca> writes:
Dan wrote:
 Sean Kelly Wrote:
 Yup.  It's rehashing that gets a bit messy.

 I'm going to hope that moving a 16-byte uses SSE2...


Rehashing, you'd probe until you've found the right location *then* move the 16-byte struct. He was suggesting using stride-1 probing for better cache coherency - but it's aweful at cascading into linear searches for an empty slot. Plain old D doesn't use SSE or SSE2???

I have no idea. I thought you were asking if there were inline asm blocks. I suppose the codegen may use SSE2 instructions where appropriate. Sean
Apr 02 2007
prev sibling parent Mark <mark-dm santaniello.net> writes:
For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores is
implementation defined.  They might be atomic on newer architectures, but older
designs implement them as two discrete 64-bit operations.
May 15 2007
prev sibling parent Sean Kelly <sean f4.ca> writes:
Craig Black wrote:
 Yes.  I've done benchmarks and power of two is better.  I wonder, does Tango 
 use a power of two algorithm for its AA's?

It's on my "to do" list, actually. So far, however, I have been avoiding making making unnecessary changes to the compiler runtime in case Walter decides to take over management of it. But there are a few things I want to resolve before Tango goes 1.0 either way. Sean
Apr 02 2007
prev sibling parent Sean Kelly <sean f4.ca> writes:
Lionello Lunesu wrote:
 Knud Soerensen wrote:
 I would like to share this interesting video
 http://video.google.com/videoplay?docid=2139967204534450862

Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD? I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads?

Yes.
 Any difference between Phobos/Tango?

No. I've considered creating a GC with per-thread memory pools and such, but haven't had the time. Sean
Apr 02 2007