digitalmars.D - A Lock-Free Hash Table (google video)
- Knud Soerensen <4tuu4k002 sneakemail.com> Apr 01 2007
- Dejan Lekic <dejan.lekic gmail.com> Apr 01 2007
- Manfred Nowak <svv1999 hotmail.com> Apr 01 2007
- "Craig Black" <cblack ara.com> Apr 01 2007
- "David B. Held" <dheld codelogicconsulting.com> Apr 01 2007
- "David B. Held" <dheld codelogicconsulting.com> Apr 01 2007
- Sean Kelly <sean f4.ca> Apr 01 2007
- "David B. Held" <dheld codelogicconsulting.com> Apr 01 2007
- "Craig Black" <cblack ara.com> Apr 02 2007
- Dan <murpsoft hotmail.com> Apr 02 2007
- "David B. Held" <dheld codelogicconsulting.com> Apr 02 2007
- Brad Roberts <braddr puremagic.com> Apr 01 2007
- Lionello Lunesu <lio lunesu.remove.com> Apr 02 2007
- "Craig Black" <cblack ara.com> Apr 02 2007
- Dan <murpsoft hotmail.com> Apr 02 2007
- Sean Kelly <sean f4.ca> Apr 02 2007
- Dan <murpsoft hotmail.com> Apr 02 2007
- Sean Kelly <sean f4.ca> Apr 02 2007
- Mark <mark-dm santaniello.net> May 15 2007
- Sean Kelly <sean f4.ca> Apr 02 2007
- Sean Kelly <sean f4.ca> Apr 02 2007
I would like to share this interesting video http://video.google.com/videoplay?docid=2139967204534450862
Apr 01 2007
Mr. Soerensen, thanks - a really nice video.
Apr 01 2007
Knud Soerensen wroteI would like to share this interesting video http://video.google.com/videoplay?docid=2139967204534450862
The author admits, that he has some problems when resizing. He solved them by "stalling". With 64MB hashes the stall time is short though. http://blogs.azulsystems.com/cliff/ But I wonder, how much "stalling" will be done on resizing to some milliards of hash positions. -manfred
Apr 01 2007
Good stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers? -Craig
Apr 01 2007
Craig Black wrote:Good stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers?
When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave
Apr 01 2007
Brad Roberts wrote:[...] STM (software transactional memory) isn't required for lock-free simple data structures (not all structures can be done lock free and I'm not an expert in that area). It can be used, but it's overkill. [...]
Well, the point is that even if nobody rewrites any containers to use lock-free algorithms, you can get correct lock-free behavior at whatever level of granularity you feel is appropriate by writing the relevant code in an STM atomic {} block. Granted, there are cases where a lock-free container makes sense and may provide, for instance, better concurrency in the high-contention case. But much of the attraction of STM is that for the simple case, no code actually has to be rewritten, which is great for people who need to build big data structures but don't want to become experts in designing lock-free algorithms. Dave
Apr 01 2007
David B. Held wrote:Craig Black wrote:Good stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers?
When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak...
Will this require the user to expose a .clone() method for classes involved in transactions?
Apr 01 2007
Sean Kelly wrote:David B. Held wrote:Craig Black wrote:Good stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers?
When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak...
Will this require the user to expose a .clone() method for classes involved in transactions?
I don't want to overpromise and underdeliver, so I will start out with the disclaimer that nothing I say about D features is guaranteed to be implemented. ;) That being said, word-based STM is the most generic form, and requires no special work on the part of any user whatsoever. The first form of STM provided may or may not be word-based. ;) However, Walter and his "STM advisors" are aware of the other forms of STM, including object-based, and there is a possibility of having user-specifiable STM implementations. I think everyone realizes that concurrency is the hot new buzzword, and you can rest assured that D will take it very, very seriously. Dave
Apr 01 2007
"David B. Held" <dheld codelogicconsulting.com> wrote in message news:eupvmj$1au5$1 digitalmars.com...Craig Black wrote:Good stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers?
When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave
This is very good. It seems that there are a number of things in the works for the D compiler. The const stuff, Walter has mentioned run-time reflection, and now this STM as you now mention. Any of these things would be very cool IMO. Although Walter sometimes mentions stuff currently in development, he does not say much about it. Do you have any insight on what is coming next? -Craig
Apr 02 2007
It seems to me that this is the only optimal form for hashtables - a massively scaleable one. Why? For cases where n < 8, a linear search is typically faster than any other search algorithm. The reason is less overhead. For cases where n < 1024, a level order binary search array is typically faster than any other search algorithm I've learned about, including a hashtable. The reason is again less overhead. For cases where n > 1024, I currently think that O(log(n)) in a level order binary search array is actually more expensive than that to manage a hashtable. For cases where n > 1048576, the data set is getting quite large. Assuming, through heuristics, that the number of gets and puts is also increasing, it becomes very practical to parallelize the hashtable. My understanding is that we tend to organize data either very small or very large, rarely between 1024 and 1048576. So implementing a hashtable, to me, suggests it ought to always be parallelizeable. Especially considering the low overhead involved if it's done right.
Apr 02 2007
Craig Black wrote:[...] This is very good. It seems that there are a number of things in the works for the D compiler. The const stuff, Walter has mentioned run-time reflection, and now this STM as you now mention. Any of these things would be very cool IMO. Although Walter sometimes mentions stuff currently in development, he does not say much about it. Do you have any insight on what is coming next?
Nothing more exciting than bug fixes, but that's always good, right? ;) Dave
Apr 02 2007
David B. Held wrote:Craig Black wrote:Good stuff. BTW, does anyone know when is D going to start using lock free algorithms with their built-in/templated containers?
When STM is implemented. ;) And I hope I'm not giving away too much by saying that some reasonably smart folks are working on it even as we speak... Dave
STM (software transactional memory) isn't required for lock-free simple data structures (not all structures can be done lock free and I'm not an expert in that area). It can be used, but it's overkill. A better answer, however, is that thread safety isn't something that you apply to every single layer of data structures. You need to really consider the appropriate level of granularity. Even lock free routines have a cost that you don't necessarily want to pay all the time. Given that, the lowest level is generally _not_ the right place to be applying synchronization logic (be it with atomic operations or locks). And to be clear, I've asked this before and been convinced of the above. You could ask the same thing of the stl for c++, and the same answer would apply. NOW, and even better answer might be to have the safety be a pluggable policy decision and then people would be able to make the choice for themselves. That'd allow the best of all worlds. The problem is that writing code that flexibly is quite a bit of work and someone would need to volunteer to do it since I'm just about 100% sure that the code in question here isn't going to change otherwise. Your question is a good one, and the answer is a lot more complex than it seems at first blush. Later, Brad
Apr 01 2007
Knud Soerensen wrote:I would like to share this interesting video http://video.google.com/videoplay?docid=2139967204534450862
Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD? I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads? Any difference between Phobos/Tango? L.
Apr 02 2007
Yes. I've done benchmarks and power of two is better. I wonder, does Tango use a power of two algorithm for its AA's? -Craig "Lionello Lunesu" <lio lunesu.remove.com> wrote in message news:euqekc$24nu$1 digitalmars.com...Knud Soerensen wrote:I would like to share this interesting video http://video.google.com/videoplay?docid=2139967204534450862
Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD? I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads? Any difference between Phobos/Tango? L.
Apr 02 2007
Craig Black Wrote:Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD?
oh my! I thought this was obvious; as was the powers of two, and using atomic asm operators to complete whole transactions or not at all. I assumed this was already well known and implemented. Actually, another way to implement the atomic transaction is to use any SSE/SSE2 move instruction to move a key/value pair into place with a: struct KeyVal { char* key; union { long l_value; void* ptr_value; double d_value; } } assert(KeyVal.sizeof == 16); into a ^2 sized array. It's either there, or it's not; so it's concurrent if you follow the rest of the principles, and requires absolutely no preparation for a CAS or locking. I'm going to hope that moving a 16-byte uses SSE2...
Apr 02 2007
Dan wrote:Craig Black Wrote:Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD?
oh my! I thought this was obvious; as was the powers of two, and using atomic asm operators to complete whole transactions or not at all. I assumed this was already well known and implemented. Actually, another way to implement the atomic transaction is to use any SSE/SSE2 move instruction to move a key/value pair into place with a: struct KeyVal { char* key; union { long l_value; void* ptr_value; double d_value; } } assert(KeyVal.sizeof == 16); into a ^2 sized array. It's either there, or it's not; so it's concurrent if you follow the rest of the principles, and requires absolutely no preparation for a CAS or locking.
Yup. It's rehashing that gets a bit messy.I'm going to hope that moving a 16-byte uses SSE2...
Nope. It's all plain old D code at the moment IIRC. Sean
Apr 02 2007
Sean Kelly Wrote:Yup. It's rehashing that gets a bit messy.I'm going to hope that moving a 16-byte uses SSE2...
Nope. It's all plain old D code at the moment IIRC.
Rehashing, you'd probe until you've found the right location *then* move the 16-byte struct. He was suggesting using stride-1 probing for better cache coherency - but it's aweful at cascading into linear searches for an empty slot. Plain old D doesn't use SSE or SSE2??? Oh my... Hasn't SSE been out since 1997? I guess with 64-bit GPR's, SSE1 is practically irrelevant now, but SSE2 and SSE3 could definitely improve performance of certain things... like copying arrays of 16-byte Value structs in DMDScript. I assumed that was what Walter meant when he commented that using 16 byte Value structs was critical for performance. I'll be sure to use little asm { } blocks to get certain things done then. The wonderful thing about D is that Walter provides a default mechanism for prototyping that works well. If you want something that works superbly,you can cut under it without much hassle. A superb language, that D.
Apr 02 2007
Dan wrote:Sean Kelly Wrote:Yup. It's rehashing that gets a bit messy.I'm going to hope that moving a 16-byte uses SSE2...
Rehashing, you'd probe until you've found the right location *then* move the 16-byte struct. He was suggesting using stride-1 probing for better cache coherency - but it's aweful at cascading into linear searches for an empty slot. Plain old D doesn't use SSE or SSE2???
I have no idea. I thought you were asking if there were inline asm blocks. I suppose the codegen may use SSE2 instructions where appropriate. Sean
Apr 02 2007
For what it's worth, I'm pretty sure the atomicity of 128-bit SSE stores is implementation defined. They might be atomic on newer architectures, but older designs implement them as two discrete 64-bit operations.
May 15 2007
Craig Black wrote:Yes. I've done benchmarks and power of two is better. I wonder, does Tango use a power of two algorithm for its AA's?
It's on my "to do" list, actually. So far, however, I have been avoiding making making unnecessary changes to the compiler runtime in case Walter decides to take over management of it. But there are a few things I want to resolve before Tango goes 1.0 either way. Sean
Apr 02 2007
Lionello Lunesu wrote:Knud Soerensen wrote:I would like to share this interesting video http://video.google.com/videoplay?docid=2139967204534450862
Very interesting, thanks for that. Maybe now we can convince Walter to include the patch I made about a year ago to make D's AA power-of-2/AND instead of prime/MOD? I'm wondering if this would be useful for D. What about memory allocations? If one thread is allocating a key/value, will that allocation block other threads?
Yes.Any difference between Phobos/Tango?
No. I've considered creating a GC with per-thread memory pools and such, but haven't had the time. Sean
Apr 02 2007









Dejan Lekic <dejan.lekic gmail.com> 