digitalmars.D - A Lock-Free Hash Table (google video)

Knud Soerensen (2/2) Apr 01 2007 I would like to share this interesting video

Dejan Lekic (1/1) Apr 01 2007 Mr. Soerensen, thanks - a really nice video.
Manfred Nowak (8/11) Apr 01 2007 The author admits, that he has some problems when resizing.
Craig Black (3/3) Apr 01 2007 Good stuff. BTW, does anyone know when is D going to start using lock f...

David B. Held (5/7) Apr 01 2007 When STM is implemented. ;) And I hope I'm not giving away too much by

Brad Roberts (23/33) Apr 01 2007 STM (software transactional memory) isn't required for lock-free simple

David B. Held (11/16) Apr 01 2007 Well, the point is that even if nobody rewrites any containers to use

Sean Kelly (3/11) Apr 01 2007 Will this require the user to expose a .clone() method for classes

David B. Held (12/24) Apr 01 2007 I don't want to overpromise and underdeliver, so I will start out with

Craig Black (9/17) Apr 02 2007 This is very good. It seems that there are a number of things in the wo...

Dan (6/6) Apr 02 2007 It seems to me that this is the only optimal form for hashtables - a mas...
David B. Held (3/10) Apr 02 2007 Nothing more exciting than bug fixes, but that's always good, right? ;)

Lionello Lunesu (8/11) Apr 02 2007 Very interesting, thanks for that. Maybe now we can convince Walter to

Craig Black (5/15) Apr 02 2007 Yes. I've done benchmarks and power of two is better. I wonder, does T...

Dan (14/17) Apr 02 2007 oh my! I thought this was obvious; as was the powers of two, and using ...

Sean Kelly (4/26) Apr 02 2007 Nope. It's all plain old D code at the moment IIRC.

Dan (6/11) Apr 02 2007 Rehashing, you'd probe until you've found the right location *then* move...

Sean Kelly (4/13) Apr 02 2007 I have no idea. I thought you were asking if there were inline asm

Mark (3/3) May 15 2007 For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores...

Sean Kelly (6/8) Apr 02 2007 It's on my "to do" list, actually. So far, however, I have been

Sean Kelly (5/17) Apr 02 2007 No. I've considered creating a GC with per-thread memory pools and

Knud Soerensen <4tuu4k002 sneakemail.com> writes:

I would like to share this interesting video 

http://video.google.com/videoplay?docid=2139967204534450862

Apr 01 2007

Dejan Lekic <dejan.lekic gmail.com> writes:

Mr. Soerensen, thanks - a really nice video.

Apr 01 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Knud Soerensen wrote

 I would like to share this interesting video 
 
 http://video.google.com/videoplay?docid=2139967204534450862

The author admits, that he has some problems when resizing.
He solved them by "stalling".
With 64MB hashes the stall time is short though.

http://blogs.azulsystems.com/cliff/

But I wonder, how much "stalling" will be done on resizing to some 
milliards of hash positions.

-manfred

Apr 01 2007

"Craig Black" <cblack ara.com> writes:

Good stuff.  BTW, does anyone know when is D going to start using lock free
algorithms with their built-in/templated containers?

-Craig

Apr 01 2007

"David B. Held" <dheld codelogicconsulting.com> writes:

Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock free
 algorithms with their built-in/templated containers?

When STM is implemented. ;)  And I hope I'm not giving away too much by 
saying that some reasonably smart folks are working on it even as we 
speak...

Dave

Apr 01 2007

Brad Roberts <braddr puremagic.com> writes:

David B. Held wrote:
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock 
 free
 algorithms with their built-in/templated containers?

 
 When STM is implemented. ;)  And I hope I'm not giving away too much by 
 saying that some reasonably smart folks are working on it even as we 
 speak...
 
 Dave

STM (software transactional memory) isn't required for lock-free simple 
data structures (not all structures can be done lock free and I'm not an 
expert in that area).  It can be used, but it's overkill.

A better answer, however, is that thread safety isn't something that you 
apply to every single layer of data structures.  You need to really 
consider the appropriate level of granularity.  Even lock free routines 
have a cost that you don't necessarily want to pay all the time.  Given 
that, the lowest level is generally _not_ the right place to be applying 
synchronization logic (be it with atomic operations or locks).

And to be clear, I've asked this before and been convinced of the above. 
  You could ask the same thing of the stl for c++, and the same answer 
would apply.

NOW, and even better answer might be to have the safety be a pluggable 
policy decision and then people would be able to make the choice for 
themselves.  That'd allow the best of all worlds.  The problem is that 
writing code that flexibly is quite a bit of work and someone would need 
to volunteer to do it since I'm just about 100% sure that the code in 
question here isn't going to change otherwise.

Your question is a good one, and the answer is a lot more complex than 
it seems at first blush.

Later,
Brad

Apr 01 2007

"David B. Held" <dheld codelogicconsulting.com> writes:

Brad Roberts wrote:
 [...]
 STM (software transactional memory) isn't required for lock-free simple 
 data structures (not all structures can be done lock free and I'm not an 
 expert in that area).  It can be used, but it's overkill.
 [...]

Well, the point is that even if nobody rewrites any containers to use 
lock-free algorithms, you can get correct lock-free behavior at whatever 
level of granularity you feel is appropriate by writing the relevant 
code in an STM atomic {} block.  Granted, there are cases where a 
lock-free container makes sense and may provide, for instance, better 
concurrency in the high-contention case.  But much of the attraction of 
STM is that for the simple case, no code actually has to be rewritten, 
which is great for people who need to build big data structures but 
don't want to become experts in designing lock-free algorithms.

Dave

Apr 01 2007

Sean Kelly <sean f4.ca> writes:

David B. Held wrote:
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock 
 free
 algorithms with their built-in/templated containers?

 
 When STM is implemented. ;)  And I hope I'm not giving away too much by 
 saying that some reasonably smart folks are working on it even as we 
 speak...

Will this require the user to expose a .clone() method for classes 
involved in transactions?

Apr 01 2007

"David B. Held" <dheld codelogicconsulting.com> writes:

Sean Kelly wrote:
 David B. Held wrote:
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using 
 lock free
 algorithms with their built-in/templated containers?

 When STM is implemented. ;)  And I hope I'm not giving away too much 
 by saying that some reasonably smart folks are working on it even as 
 we speak...

 
 Will this require the user to expose a .clone() method for classes 
 involved in transactions?

I don't want to overpromise and underdeliver, so I will start out with 
the disclaimer that nothing I say about D features is guaranteed to be 
implemented. ;)  That being said, word-based STM is the most generic 
form, and requires no special work on the part of any user whatsoever. 
The first form of STM provided may or may not be word-based. ;) 
However, Walter and his "STM advisors" are aware of the other forms of 
STM, including object-based, and there is a possibility of having 
user-specifiable STM implementations.  I think everyone realizes that 
concurrency is the hot new buzzword, and you can rest assured that D 
will take it very, very seriously.

Dave

Apr 01 2007

"Craig Black" <cblack ara.com> writes:

"David B. Held" <dheld codelogicconsulting.com> wrote in message 
news:eupvmj$1au5$1 digitalmars.com...
 Craig Black wrote:
 Good stuff.  BTW, does anyone know when is D going to start using lock 
 free
 algorithms with their built-in/templated containers?

 When STM is implemented. ;)  And I hope I'm not giving away too much by 
 saying that some reasonably smart folks are working on it even as we 
 speak...

 Dave

This is very good.  It seems that there are a number of things in the works 
for the D compiler.  The const stuff, Walter has mentioned run-time 
reflection, and now this STM as you now mention.  Any of these things would 
be very cool IMO.  Although Walter sometimes mentions stuff currently in 
development, he does not say much about it.  Do you have any insight on what 
is coming next?

-Craig

Apr 02 2007

Dan <murpsoft hotmail.com> writes:

It seems to me that this is the only optimal form for hashtables - a massively
scaleable one.  Why?

For cases where n < 8, a linear search is typically faster than any other
search algorithm.  The reason is less overhead.

For cases where n < 1024, a level order binary search array is typically faster
than any other search algorithm I've learned about, including a hashtable.  The
reason is again less overhead.

For cases where n > 1024, I currently think that O(log(n)) in a level order
binary search array is actually more expensive than that to manage a hashtable.

For cases where n > 1048576, the data set is getting quite large.  Assuming,
through heuristics, that the number of gets and puts is also increasing, it
becomes very practical to parallelize the hashtable.

My understanding is that we tend to organize data either very small or very
large, rarely between 1024 and 1048576.  So implementing a hashtable, to me,
suggests it ought to always be parallelizeable.  Especially considering the low
overhead involved if it's done right.

Apr 02 2007

"David B. Held" <dheld codelogicconsulting.com> writes:

Craig Black wrote:
 [...]
 This is very good.  It seems that there are a number of things in the works 
 for the D compiler.  The const stuff, Walter has mentioned run-time 
 reflection, and now this STM as you now mention.  Any of these things would 
 be very cool IMO.  Although Walter sometimes mentions stuff currently in 
 development, he does not say much about it.  Do you have any insight on what 
 is coming next?

Nothing more exciting than bug fixes, but that's always good, right? ;)

Dave

Apr 02 2007

Lionello Lunesu <lio lunesu.remove.com> writes:

Knud Soerensen wrote:
 I would like to share this interesting video 
 
 http://video.google.com/videoplay?docid=2139967204534450862

Very interesting, thanks for that. Maybe now we can convince Walter to 
include the patch I made about a year ago to make D's AA power-of-2/AND 
instead of prime/MOD?

I'm wondering if this would be useful for D. What about memory 
allocations? If one thread is allocating a key/value, will that 
allocation block other threads? Any difference between Phobos/Tango?

L.

Apr 02 2007

"Craig Black" <cblack ara.com> writes:

Yes.  I've done benchmarks and power of two is better.  I wonder, does Tango 
use a power of two algorithm for its AA's?

-Craig

"Lionello Lunesu" <lio lunesu.remove.com> wrote in message 
news:euqekc$24nu$1 digitalmars.com...
 Knud Soerensen wrote:
 I would like to share this interesting video 
 http://video.google.com/videoplay?docid=2139967204534450862

 Very interesting, thanks for that. Maybe now we can convince Walter to 
 include the patch I made about a year ago to make D's AA power-of-2/AND 
 instead of prime/MOD?

 I'm wondering if this would be useful for D. What about memory 
 allocations? If one thread is allocating a key/value, will that allocation 
 block other threads? Any difference between Phobos/Tango?

 L.

Apr 02 2007

Dan <murpsoft hotmail.com> writes:

Craig Black Wrote:

 Very interesting, thanks for that. Maybe now we can convince Walter to 
 include the patch I made about a year ago to make D's AA power-of-2/AND 
 instead of prime/MOD?


oh my!  I thought this was obvious; as was the powers of two, and using atomic
asm operators to complete whole transactions or not at all.  I assumed this was
already well known and implemented.

Actually, another way to implement the atomic transaction is to use any
SSE/SSE2 move instruction to move a key/value pair into place with a:

struct KeyVal {
   char* key;
   union {
      long l_value;
      void* ptr_value;
     double d_value;
   }
}
assert(KeyVal.sizeof == 16);

into a ^2 sized array.  It's either there, or it's not; so it's concurrent if
you follow the rest of the principles, and requires absolutely no preparation
for a CAS or locking.

I'm going to hope that moving a 16-byte uses SSE2...

Apr 02 2007

Sean Kelly <sean f4.ca> writes:

Dan wrote:
 Craig Black Wrote:
 
 Very interesting, thanks for that. Maybe now we can convince Walter to 
 include the patch I made about a year ago to make D's AA power-of-2/AND 
 instead of prime/MOD?


 
 oh my!  I thought this was obvious; as was the powers of two, and using atomic
asm operators to complete whole transactions or not at all.  I assumed this was
already well known and implemented.
 
 Actually, another way to implement the atomic transaction is to use any
SSE/SSE2 move instruction to move a key/value pair into place with a:
 
 struct KeyVal {
    char* key;
    union {
       long l_value;
       void* ptr_value;
      double d_value;
    }
 }
 assert(KeyVal.sizeof == 16);
 
 into a ^2 sized array.  It's either there, or it's not; so it's concurrent if
you follow the rest of the principles, and requires absolutely no preparation
for a CAS or locking.

Yup.  It's rehashing that gets a bit messy.

 I'm going to hope that moving a 16-byte uses SSE2...

Nope.  It's all plain old D code at the moment IIRC.


Sean

Apr 02 2007

Dan <murpsoft hotmail.com> writes:

Sean Kelly Wrote:
 Yup.  It's rehashing that gets a bit messy.
 
 I'm going to hope that moving a 16-byte uses SSE2...

 
 Nope.  It's all plain old D code at the moment IIRC.

Rehashing, you'd probe until you've found the right location *then* move the
16-byte struct.  He was suggesting using stride-1 probing for better cache
coherency - but it's aweful at cascading into linear searches for an empty slot.

Plain old D doesn't use SSE or SSE2???  Oh my...  Hasn't SSE been out since
1997?  I guess with 64-bit GPR's, SSE1 is practically irrelevant now, but SSE2
and SSE3 could definitely improve performance of certain things... like copying
arrays of 16-byte Value structs in DMDScript.  I assumed that was what Walter
meant when he commented that using 16 byte Value structs was critical for
performance.

I'll be sure to use little asm { } blocks to get certain things done then.

The wonderful thing about D is that Walter provides a default mechanism for
prototyping that works well.  If you want something that works superbly,you can
cut under it without much hassle.

A superb language, that D.

Apr 02 2007

Sean Kelly <sean f4.ca> writes:

Dan wrote:
 Sean Kelly Wrote:
 Yup.  It's rehashing that gets a bit messy.

 I'm going to hope that moving a 16-byte uses SSE2...

 Nope.  It's all plain old D code at the moment IIRC.

 
 Rehashing, you'd probe until you've found the right location *then* move the
16-byte struct.  He was suggesting using stride-1 probing for better cache
coherency - but it's aweful at cascading into linear searches for an empty slot.
 
 Plain old D doesn't use SSE or SSE2??? 

I have no idea.  I thought you were asking if there were inline asm 
blocks.  I suppose the codegen may use SSE2 instructions where appropriate.


Sean

Apr 02 2007

Mark <mark-dm santaniello.net> writes:

For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores is
implementation defined.  They might be atomic on newer architectures, but older
designs implement them as two discrete 64-bit operations.

May 15 2007

Sean Kelly <sean f4.ca> writes:

Craig Black wrote:
 Yes.  I've done benchmarks and power of two is better.  I wonder, does Tango 
 use a power of two algorithm for its AA's?

It's on my "to do" list, actually.  So far, however, I have been 
avoiding making making unnecessary changes to the compiler runtime in 
case Walter decides to take over management of it.  But there are a few 
things I want to resolve before Tango goes 1.0 either way.


Sean

Apr 02 2007

Sean Kelly <sean f4.ca> writes:

Lionello Lunesu wrote:
 Knud Soerensen wrote:
 I would like to share this interesting video
 http://video.google.com/videoplay?docid=2139967204534450862

 
 Very interesting, thanks for that. Maybe now we can convince Walter to 
 include the patch I made about a year ago to make D's AA power-of-2/AND 
 instead of prime/MOD?
 
 I'm wondering if this would be useful for D. What about memory 
 allocations? If one thread is allocating a key/value, will that 
 allocation block other threads?

Yes.

 Any difference between Phobos/Tango?

No.  I've considered creating a GC with per-thread memory pools and 
such, but haven't had the time.


Sean

Apr 02 2007

D Programming

C/C++ Programming

Other

digitalmars.D - A Lock-Free Hash Table (google video)