digitalmars.D - storing the hash multiplier instead of the hash value
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 22 2010
- Daniel Keep <daniel.keep.lists gmail.com> Mar 22 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 22 2010
- Walter Bright <newshound1 digitalmars.com> Mar 22 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 22 2010
- Walter Bright <newshound1 digitalmars.com> Mar 22 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 22 2010
- Walter Bright <newshound1 digitalmars.com> Mar 22 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 23 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 23 2010
- Walter Bright <newshound1 digitalmars.com> Mar 23 2010
- Walter Bright <newshound1 digitalmars.com> Mar 23 2010
- Fawzi Mohamed <fawzi gmx.ch> Mar 23 2010
- Walter Bright <newshound1 digitalmars.com> Mar 23 2010
- "Robert Jacques" <sandford jhu.edu> Mar 22 2010
- "Robert Jacques" <sandford jhu.edu> Mar 22 2010
- Michael Rynn <michaelrynn optusnet.com.au> Mar 23 2010
- Fawzi Mohamed <fawzi gmx.ch> Mar 23 2010
- Sea Kelly <sean invisibleduck.org> Mar 30 2010
- "Robert Jacques" <sandford jhu.edu> Mar 30 2010
- Sean Kelly <sean invisibleduck.org> Apr 11 2010
- Lionello Lunesu <lio lunesu.remove.com> Mar 22 2010
- Leandro Lucarella <llucax gmail.com> Mar 23 2010
- Fawzi Mohamed <fawzi gmx.ch> Mar 23 2010
- Fawzi Mohamed <fawzi gmx.ch> Mar 23 2010
Currently, each entry in a D hashtable stores the hash of the key for efficiency purposes. There is a bit of redundancy: as soon as you entered a hash bucket you know that the hashkey % tablesize is a specific number, call it k. But k is not enough information to restore the hash, so the actual hash gets stored for each entry. I'm thinking of exploiting that redundancy by storing the hash multiplier, not the hash value. Instead of storing h in each slot, store m = h / tablesize. Then you can restore the actual h by writing: restored_h = m * tablesize + k; k is known for each given bucket (it's the index of the bucket) and m is what gets stored per entry. What's the advantage of this approach? Well the advantage is that m is a small number. Any hash function will try to disperse the hash value as much as possible between the 32 available bits. But m will be a smaller number, and therefore will be more grouped and will engender fewer false pointers. Would this help? Andrei
Mar 22 2010
How about just *fixing* the hashtable so it doesn't generate false pointers in the first place? Maybe I'm just strange, but that seems like a more direct, efficacious solution... As for the redundancy, I was under the impression that the hash was cached so make resizing more efficient: if the number of buckets changes, you have to re-insert all the nodes. If you don't store the hash, you have to recompute it (which is expensive for anything other than ints (wherein D laughably 'hashes' by returning the ints value)). Given that I already think this is a silly way to go about solving the false pointer issue, I don't see any benefit to doing this other than to give the CPU something to do while it waits for memory reads. Sadly, I lack the background to make any sort of objective judgement of the hash function *itself*, so I'll just reiterate what I've heard repeated to me over and over again by numerous people: D's builtin hash function sucks like a black holes.
Mar 22 2010
On 03/22/2010 02:50 PM, Daniel Keep wrote:How about just *fixing* the hashtable so it doesn't generate false pointers in the first place? Maybe I'm just strange, but that seems like a more direct, efficacious solution...
This was of course the first thing Walter and I discussed. It turns out that that would necessitate precise GC, which we don't have yet.As for the redundancy, I was under the impression that the hash was cached so make resizing more efficient: if the number of buckets changes, you have to re-insert all the nodes. If you don't store the hash, you have to recompute it (which is expensive for anything other than ints (wherein D laughably 'hashes' by returning the ints value)).
What would be a better hashing scheme for ints?Given that I already think this is a silly way to go about solving the false pointer issue, I don't see any benefit to doing this other than to give the CPU something to do while it waits for memory reads. Sadly, I lack the background to make any sort of objective judgement of the hash function *itself*, so I'll just reiterate what I've heard repeated to me over and over again by numerous people: D's builtin hash function sucks like a black holes.
Then you might be glad to hear that Walter has recently improved the default hashtable implementation significantly (at least for heavy-duty applications). We're using Paul Hsieh's hash function for strings and general memory, which is no slouch. http://www.dsource.org/projects/druntime/browser/trunk/src/rt/util/hash.d Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.) Andrei
Mar 22 2010
Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
Unfortunately, it won't be much of a win. Memory allocation is done in buckets of size 16, 32, 64, etc. Reducing the node size for a uint[uint] from 16 to 12 saves no memory at all.
Mar 22 2010
On 03/22/2010 03:36 PM, Walter Bright wrote:Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
Unfortunately, it won't be much of a win. Memory allocation is done in buckets of size 16, 32, 64, etc. Reducing the node size for a uint[uint] from 16 to 12 saves no memory at all.
As we discussed, if nodes are allocated in bunches, you could store 5 nodes in a 64-byte block instead of 4. That's more than a 25% increase in effectiveness because the per-block bookkeeping is also slashed by 5. Andrei
Mar 22 2010
Andrei Alexandrescu wrote:As we discussed, if nodes are allocated in bunches, you could store 5 nodes in a 64-byte block instead of 4. That's more than a 25% increase in effectiveness because the per-block bookkeeping is also slashed by 5.
Right, but the downside is that the nodes will not be collected unless all its neighbors are also not used.
Mar 22 2010
On 03/22/2010 04:03 PM, Andrei Alexandrescu wrote:On 03/22/2010 03:36 PM, Walter Bright wrote:Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
Unfortunately, it won't be much of a win. Memory allocation is done in buckets of size 16, 32, 64, etc. Reducing the node size for a uint[uint] from 16 to 12 saves no memory at all.
As we discussed, if nodes are allocated in bunches, you could store 5 nodes in a 64-byte block instead of 4. That's more than a 25% increase in effectiveness because the per-block bookkeeping is also slashed by 5. Andrei
One more idea before I forget: the bunches approach requires using a freelist for nodes that are available but not used. (Freelists are a good idea whether or not bunches are used.) One good possibility is to store the head of the freelist in a static variable, such that e.g. all int[string] instances use the same freelist. And there's no thread contention issue because each thread has its own freelist. Andrei
Mar 22 2010
On 03/22/2010 04:59 PM, Andrei Alexandrescu wrote:On 03/22/2010 04:03 PM, Andrei Alexandrescu wrote:On 03/22/2010 03:36 PM, Walter Bright wrote:Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
Unfortunately, it won't be much of a win. Memory allocation is done in buckets of size 16, 32, 64, etc. Reducing the node size for a uint[uint] from 16 to 12 saves no memory at all.
As we discussed, if nodes are allocated in bunches, you could store 5 nodes in a 64-byte block instead of 4. That's more than a 25% increase in effectiveness because the per-block bookkeeping is also slashed by 5. Andrei
One more idea before I forget: the bunches approach requires using a freelist for nodes that are available but not used. (Freelists are a good idea whether or not bunches are used.) One good possibility is to store the head of the freelist in a static variable, such that e.g. all int[string] instances use the same freelist. And there's no thread contention issue because each thread has its own freelist.
And the last idea before I forget (just talked to Walter about it) is that the GC could and should offer functions: GC.preCollect(void delegate()); GC.postCollect(void delegate()); That way we can effectively implement weak pointers. Andrei
Mar 22 2010
On 03/22/2010 08:32 PM, Robert Jacques wrote:On Mon, 22 Mar 2010 19:00:57 -0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote: [snip]And the last idea before I forget (just talked to Walter about it) is that the GC could and should offer functions: GC.preCollect(void delegate()); GC.postCollect(void delegate()); That way we can effectively implement weak pointers.
Pre and post collect should be an array of delegates. Otherwise only one person's weak pointer library would work.
Those are functions that you can call several times. The gc will keep indeed a container of delegates. Andrei
Mar 22 2010
On 03/23/2010 08:11 AM, Michael Rynn wrote:On Mon, 22 Mar 2010 16:59:36 -0500, Andrei Alexandrescu wrote:On 03/22/2010 04:03 PM, Andrei Alexandrescu wrote:On 03/22/2010 03:36 PM, Walter Bright wrote:Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
Unfortunately, it won't be much of a win. Memory allocation is done in buckets of size 16, 32, 64, etc. Reducing the node size for a uint[uint] from 16 to 12 saves no memory at all.
As we discussed, if nodes are allocated in bunches, you could store 5 nodes in a 64-byte block instead of 4. That's more than a 25% increase in effectiveness because the per-block bookkeeping is also slashed by 5. Andrei
One more idea before I forget: the bunches approach requires using a freelist for nodes that are available but not used. (Freelists are a good idea whether or not bunches are used.) One good possibility is to store the head of the freelist in a static variable, such that e.g. all int[string] instances use the same freelist. And there's no thread contention issue because each thread has its own freelist. Andrei
I just committed a aaA.d version that uses some heap node memory management, although its on a per class instance basis. Also it dispences with a separate hash storage for keys<= size_t. Sharing would mean some working out of which classes share the same sized node blocks. Much easier to implement class sharing using templates. See the code here : http://www.dsource.org/projects/aa/browser/trunk/ druntime/aaA.d See the benchmarks and comments here : http://www.dsource.org/projects/aa/ wiki/DrunTimeAA. The result squeezes some extra performance for integer or less sized keys (about 20% faster for lookups). For the druntime, the generic C interface constrains the kinds of specialized AA versions that can be instantiated using run-time TypeInfo and var-arged calls. Maybe a class / interface direct calls would work better. Just imagine at runtime looking at the TypeInfo_AssociatedArray and trying to work out exactly which template is going to be instantiated. And having a low code size, one or few versions fit all approach, that performance sacrifice is unavoidable in a basic runtime libary. But in template land , options and combinations and gains abound.
Absolutely! This is solid work. It would be interesting to assess how your implementation and the old built-in hashes compare with Walter's new implementation, which uses a singly-linked list instead of a tree. Do you have what it takes to check out and build druntime and phobos? Andrei
Mar 23 2010
Andrei Alexandrescu wrote:Then you might be glad to hear that Walter has recently improved the default hashtable implementation significantly (at least for heavy-duty applications).
All I did was switch from a tree used to resolve a bucket collision with a singly linked list. The improvement comes from reduced memory consumption, which for uint[uint] changes a node from 20 bytes to 16, thus halving the memory used. If the hash node still uses over 16 bytes, I doubt there will be any improvement at all. Note that the linked list implementation is equivalent to g++'s hash_map implementation.
Mar 22 2010
On 03/22/2010 03:54 PM, Walter Bright wrote:Andrei Alexandrescu wrote:Then you might be glad to hear that Walter has recently improved the default hashtable implementation significantly (at least for heavy-duty applications).
All I did was switch from a tree used to resolve a bucket collision with a singly linked list. The improvement comes from reduced memory consumption, which for uint[uint] changes a node from 20 bytes to 16, thus halving the memory used. If the hash node still uses over 16 bytes, I doubt there will be any improvement at all. Note that the linked list implementation is equivalent to g++'s hash_map implementation.
In at least one application, the new hash performed on a par with or better than g++'s hash_map. Andrei
Mar 22 2010
On 03/23/2010 12:08 PM, Fawzi Mohamed wrote:On 22-mar-10, at 21:04, Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
I like murmurhash, that I what I put into tango, see http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/compiler/util/hash.d all that file is based on freely available code and written by me, and I give my code with whatever license would be needed.
Thanks, Fawzi, that's great. I'm trying to collect evidence of MurmurHash's effectiveness versus Hsieh's SuperFastHash. I seemed to find some: http://tanjent.livejournal.com/756623.htmlhttp://tanjent.livejournal.com/756623.html I also found what seems to be a thorough benchmark: http://www.strchr.com/hash_functions Looks pretty solid to me. The benchmarks look only at strings, not at typical workloads on void* (e.g. arbitrary objects containing integers and pointers and whatnot). Hsieh's hash is smack in the middle, whereas Murmur2 is fourth, faster by 10%.I think that the public interface should be exposed (or re-exposed) somewhere to the outside so that one can easily create efficient hashes for user defined types. For example it is nice to be able to chain hash functions (something that the default one does not allow).
What I'm pushing for as of now is to move the associative array definition from opacity into templated goodies in object_.d. Andrei
Mar 23 2010
On 03/23/2010 02:34 PM, Fawzi Mohamed wrote:On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:What I'm pushing for as of now is to move the associative array definition from opacity into templated goodies in object_.d.
that would be nice, that is one the main reasons Steven implementation is faster. It would be nice if this would be done by the compiler as rewriting the calls as call to "normal" templates, i.e. to a specially named templated struct (AArray for example) so that (syntactic sugar for the name aside) that would be the same as a library implementation. This would have two advantages: - easy to replace the library implementation, as it would be even less special - easy to replace the usage in one piece of code with another implementation (well truth to be told that is rather easy also now)
I have much loftier goals, which scare Walter sometimes :o). In the long term, my plan is to allow object.d to decide on a number of fundamental program-wide choices, such as the use of mark-sweep GC versus reference counting GC. In wake of my experience with D in heavyset programs, I reckon that there is a necessity to have deterministic memory management for a subset of applications. Walter is afraid that that's going to mean the balkanization of D - everyone will define their own object.d. That may be a risk, but I strongly believe it's a risk worth taking. Andrei
Mar 23 2010
Andrei Alexandrescu wrote:Walter is afraid that that's going to mean the balkanization of D - everyone will define their own object.d. That may be a risk, but I strongly believe it's a risk worth taking.
It would mean the end of precompiled libraries.
Mar 23 2010
Fawzi Mohamed wrote:On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:What I'm pushing for as of now is to move the associative array definition from opacity into templated goodies in object_.d.
that would be nice, that is one the main reasons Steven implementation is faster. It would be nice if this would be done by the compiler as rewriting the calls as call to "normal" templates, i.e. to a specially named templated struct (AArray for example) so that (syntactic sugar for the name aside) that would be the same as a library implementation. This would have two advantages: - easy to replace the library implementation, as it would be even less special - easy to replace the usage in one piece of code with another implementation (well truth to be told that is rather easy also now)
D2 already allows this.
Mar 23 2010
On 23-mar-10, at 23:07, Walter Bright wrote:Fawzi Mohamed wrote:On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:What I'm pushing for as of now is to move the associative array definition from opacity into templated goodies in object_.d.
implementation is faster. It would be nice if this would be done by the compiler as rewriting the calls as call to "normal" templates, i.e. to a specially named templated struct (AArray for example) so that (syntactic sugar for the name aside) that would be the same as a library implementation. This would have two advantages: - easy to replace the library implementation, as it would be even less special - easy to replace the usage in one piece of code with another implementation (well truth to be told that is rather easy also now)
D2 already allows this.
ok I did not know that was possible.
Mar 23 2010
Fawzi Mohamed wrote:I think that the public interface should be exposed (or re-exposed) somewhere to the outside so that one can easily create efficient hashes for user defined types. For example it is nice to be able to chain hash functions (something that the default one does not allow).
Just overload the toHash() function for your user-defined type to be whatever you want it to be.
Mar 23 2010
On Mon, 22 Mar 2010 19:00:57 -0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote: [snip]And the last idea before I forget (just talked to Walter about it) is that the GC could and should offer functions: GC.preCollect(void delegate()); GC.postCollect(void delegate()); That way we can effectively implement weak pointers.
Pre and post collect should be an array of delegates. Otherwise only one person's weak pointer library would work.
Mar 22 2010
On Mon, 22 Mar 2010 17:04:56 -0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 03/22/2010 02:50 PM, Daniel Keep wrote:How about just *fixing* the hashtable so it doesn't generate false pointers in the first place? Maybe I'm just strange, but that seems like a more direct, efficacious solution...
This was of course the first thing Walter and I discussed. It turns out that that would necessitate precise GC, which we don't have yet.
What about the precise GC patch? http://d.puremagic.com/issues/show_bug.cgi?id=3463
Mar 22 2010
On Mon, 22 Mar 2010 16:59:36 -0500, Andrei Alexandrescu wrote:On 03/22/2010 04:03 PM, Andrei Alexandrescu wrote:On 03/22/2010 03:36 PM, Walter Bright wrote:Andrei Alexandrescu wrote:Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
Unfortunately, it won't be much of a win. Memory allocation is done in buckets of size 16, 32, 64, etc. Reducing the node size for a uint[uint] from 16 to 12 saves no memory at all.
As we discussed, if nodes are allocated in bunches, you could store 5 nodes in a 64-byte block instead of 4. That's more than a 25% increase in effectiveness because the per-block bookkeeping is also slashed by 5. Andrei
One more idea before I forget: the bunches approach requires using a freelist for nodes that are available but not used. (Freelists are a good idea whether or not bunches are used.) One good possibility is to store the head of the freelist in a static variable, such that e.g. all int[string] instances use the same freelist. And there's no thread contention issue because each thread has its own freelist. Andrei
I just committed a aaA.d version that uses some heap node memory management, although its on a per class instance basis. Also it dispences with a separate hash storage for keys <= size_t. Sharing would mean some working out of which classes share the same sized node blocks. Much easier to implement class sharing using templates. See the code here : http://www.dsource.org/projects/aa/browser/trunk/ druntime/aaA.d See the benchmarks and comments here : http://www.dsource.org/projects/aa/ wiki/DrunTimeAA. The result squeezes some extra performance for integer or less sized keys (about 20% faster for lookups). For the druntime, the generic C interface constrains the kinds of specialized AA versions that can be instantiated using run-time TypeInfo and var-arged calls. Maybe a class / interface direct calls would work better. Just imagine at runtime looking at the TypeInfo_AssociatedArray and trying to work out exactly which template is going to be instantiated. And having a low code size, one or few versions fit all approach, that performance sacrifice is unavoidable in a basic runtime libary. But in template land , options and combinations and gains abound. --- Michael Rynn
Mar 23 2010
On 22-mar-10, at 21:04, Andrei Alexandrescu wrote:On 03/22/2010 02:50 PM, Daniel Keep wrote:Sadly, I lack the background to make any sort of objective judgement of the hash function *itself*, so I'll just reiterate what I've heard repeated to me over and over again by numerous people: D's builtin hash function sucks like a black holes.
Then you might be glad to hear that Walter has recently improved the default hashtable implementation significantly (at least for heavy- duty applications). We're using Paul Hsieh's hash function for strings and general memory, which is no slouch. http://www.dsource.org/projects/druntime/browser/trunk/src/rt/util/hash.d Better suggestions are always welcome. For integrals I'm unclear on what we could use to make things better. (Clearly we could and should get rid of the extraneous field.)
I like murmurhash, that I what I put into tango, see http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/compiler/util/hash.d all that file is based on freely available code and written by me, and I give my code with whatever license would be needed. I think that the public interface should be exposed (or re-exposed) somewhere to the outside so that one can easily create efficient hashes for user defined types. For example it is nice to be able to chain hash functions (something that the default one does not allow). Fawzi
Mar 23 2010
I like murmurhash as well, and I'd even had it in druntime briefly, but it uses a proprietary license and I couldn't get permision from the author to use it. It's also an order of magnitude slower on sparc than x86. Fawzi Mohamed <fawzi gmx.ch> wrote:On 22-mar-10, at 21:04, Andrei Alexandrescu wrote:On 03/22/2010 02:50 PM, Daniel Keep wrote:Sadly, I lack the background to make any sort of objective >> judgement of the hash function *itself*, so I'll just reiterate what I've heard repeated to me over and over again by numerous people: D's builtinhash
Then you might be glad to hear that Walter has recently improved the > default hashtable implementation significantly (at least for heavy- > duty applications). We're using Paul Hsieh's hash function for strings and general > memory, which is no slouch. http://www.dsource.org/projects/druntime/browser/trunk/src/rt/util/hash.d Better suggestions are always welcome. For integrals I'm unclear onwhat we could use to make things better. (Clearly we could and > should get rid of the extraneous field.)
I like murmurhash, that I what I put into tango, see http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/compiler/util/hash.d all that file is based on freely available code and written by me, and I give my code with whatever license would be needed. I think that the public interface should be exposed (or re-exposed) somewhere to the outside so that one can easily create efficient hashes for user defined types. For example it is nice to be able to chain hash functions (something that the default one does not allow). Fawzi
Mar 30 2010
On Tue, 30 Mar 2010 16:17:20 -0300, Sea Kelly <sean invisibleduck.org> wrote:I like murmurhash as well, and I'd even had it in druntime briefly, but it uses a proprietary license and I couldn't get permision from the author to use it. It's also an order of magnitude slower on sparc than x86.
That seems weird. The website it lists it as: "All code is released to the public domain. For business purposes, Murmurhash is under the MIT license."
Mar 30 2010
"Robert Jacques" <sandford jhu.edu> wrote:On Tue, 30 Mar 2010 16:17:20 -0300, Sea Kelly <sean invisibleduck.org> wrote:I like murmurhash as well, and I'd even had it in druntime briefly, but it uses a proprietary license and I couldn't get permision from the author to use it. It's also an order of magnitude slower on sparc than x86.
That seems weird. The website it lists it as: "All code is released to the public domain. For business purposes, Murmurhash is under the MIT license."
That must be new, I couldn't find any license at all at the time. I emailed the author about it and he never replied, jough he'd been helpful with other issues. Or perhaps I jus missed it.
Apr 11 2010
On 23-3-2010 1:51, Andrei Alexandrescu wrote:Currently, each entry in a D hashtable stores the hash of the key for efficiency purposes. There is a bit of redundancy: as soon as you entered a hash bucket you know that the hashkey % tablesize is a specific number, call it k. But k is not enough information to restore the hash, so the actual hash gets stored for each entry. I'm thinking of exploiting that redundancy by storing the hash multiplier, not the hash value. Instead of storing h in each slot, store m = h / tablesize. Then you can restore the actual h by writing: restored_h = m * tablesize + k; k is known for each given bucket (it's the index of the bucket) and m is what gets stored per entry. What's the advantage of this approach? Well the advantage is that m is a small number. Any hash function will try to disperse the hash value as much as possible between the 32 available bits. But m will be a smaller number, and therefore will be more grouped and will engender fewer false pointers. Would this help?
Probably not. hash is an 'int' and this 'm' will still use up the same space as an int, I reckon. Even if it ends up being 16-bit and can be put in a short, alignment will probably cause it to consume 32-bits anyway. What's more the 16-bit access (not to mention the multiplier) will slow the AA down. I say, leave it be. L.
Mar 22 2010
Andrei Alexandrescu, el 22 de marzo a las 15:04 me escribiste:On 03/22/2010 02:50 PM, Daniel Keep wrote:How about just *fixing* the hashtable so it doesn't generate false pointers in the first place? Maybe I'm just strange, but that seems like a more direct, efficacious solution...
This was of course the first thing Walter and I discussed. It turns out that that would necessitate precise GC, which we don't have yet.
Well... there is an implementation hanging in bugzilla... -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ----------------------------------------------------------------------
Mar 23 2010
On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:What I'm pushing for as of now is to move the associative array definition from opacity into templated goodies in object_.d.
that would be nice, that is one the main reasons Steven implementation is faster. It would be nice if this would be done by the compiler as rewriting the calls as call to "normal" templates, i.e. to a specially named templated struct (AArray for example) so that (syntactic sugar for the name aside) that would be the same as a library implementation. This would have two advantages: - easy to replace the library implementation, as it would be even less special - easy to replace the usage in one piece of code with another implementation (well truth to be told that is rather easy also now) Fawzi
Mar 23 2010
On 23-mar-10, at 23:02, Walter Bright wrote:Fawzi Mohamed wrote:I think that the public interface should be exposed (or re-exposed) somewhere to the outside so that one can easily create efficient hashes for user defined types. For example it is nice to be able to chain hash functions (something that the default one does not allow).
Just overload the toHash() function for your user-defined type to be whatever you want it to be.
I know, maybe I have not expressed myself clearly, but this overridden function has to be written. For objects combining various pieces, one has to create a unique hash from various pieces. The functions I have defined in hash.d help in doing that is such a way that changing a bit anywhere most likely changes the whole hash.
Mar 23 2010









Walter Bright <newshound1 digitalmars.com> 