www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Call for arms: associative array reimplementation

reply "Dicebot" <m.strashun gmail.com> writes:
I have had a small e-mail conversation with H. S. Teoh recently 
regarding his old attempt to reimplement built-in associative 
arrays and fix various issues they create. His experience and 
memories are written down in wiki now:

http://wiki.dlang.org/AA_Implementation_Issues

I still do want to continue this task, but this happened to be 
much more tricky and I just been walking circles around so far. 
So if anyone has some spare time and wants to show his D Kung Fu 
- this wiki article is probably worth checking out :) As far as I 
know this is kind of "pre-approved" change.
Feb 21 2013
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
21-Feb-2013 14:49, Dicebot пишет:
 I have had a small e-mail conversation with H. S. Teoh recently
 regarding his old attempt to reimplement built-in associative arrays and
 fix various issues they create. His experience and memories are written
 down in wiki now:

 http://wiki.dlang.org/AA_Implementation_Issues

 I still do want to continue this task, but this happened to be much more
 tricky and I just been walking circles around so far. So if anyone has
 some spare time and wants to show his D Kung Fu - this wiki article is
 probably worth checking out :) As far as I know this is kind of
 "pre-approved" change.

I'd love to help but I've got my hands full already. In essence the only way I see this ball rolling: a) make a hacked version of compiler that actually **cking gets rid of V[K] by replacing it with __AA!(V, K) the moment it sees it. Literals are structs of type __AALiteral!(V,K), both could be defined anywhere but are expected in object.d. Yes, that means error messages, is(...) matching and so on all see and expose this template. NO reverse conversion or special casing to have it as built-in type must be done anywhere, period. Technically advanced users should be able to replace built-in __AA template with something custom. b) The aforementioned compiler probably won't be able to compile much of anything excpet druntime. The way to go is to hack away on this__AA template in druntime object.d and compile test cases in isolation. c) Once the stuff is stable enough integrate Phobos. d) Only then when all works, and AA is plain temaplted type done via lowering we should try to make error messages look more pretty and replace __AA!(K, V) with V[K] in error logs alone. The tricky part about immutable values isn't tricky at all. At the moment you just blast your way through with casts. Since AA implementation is in sole control of memory and it's definitely on HEAP there is no risk of writing to ROM/protected stuff. It's more an act of courage then trickery ;) P.S. The new AA implementation IMHO would fix as much as it would break simply because the current one is half-magic, half-bug. That's one of reasons it should be done by reaping it off, then doing it from scratch in the clean room at the side and then re-introduced it again. -- Dmitry Olshansky
Feb 21 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Feb 21, 2013 at 09:33:53PM +0400, Dmitry Olshansky wrote:
 21-Feb-2013 14:49, Dicebot пишет:
I have had a small e-mail conversation with H. S. Teoh recently
regarding his old attempt to reimplement built-in associative arrays
and fix various issues they create. His experience and memories are
written down in wiki now:

http://wiki.dlang.org/AA_Implementation_Issues

I still do want to continue this task, but this happened to be much
more tricky and I just been walking circles around so far. So if
anyone has some spare time and wants to show his D Kung Fu - this
wiki article is probably worth checking out :) As far as I know this
is kind of "pre-approved" change.

I'd love to help but I've got my hands full already. In essence the only way I see this ball rolling: a) make a hacked version of compiler that actually **cking gets rid of V[K] by replacing it with __AA!(V, K) the moment it sees it. Literals are structs of type __AALiteral!(V,K), both could be defined anywhere but are expected in object.d. Yes, that means error messages, is(...) matching and so on all see and expose this template. NO reverse conversion or special casing to have it as built-in type must be done anywhere, period. Technically advanced users should be able to replace built-in __AA template with something custom.

Hmm that's a good idea. Hack the compiler to kill off all instances of the old code first, then work on it until things start to work again. Better than trying to sort out the current mess, which is not only split across object.d, aaA.d, and dmd internals, but within dmd is also sprinkled everywhere like powder. Better to rip the whole thing out and start over. The thing that would be needed to pull this off, though, is to have a large-enough corpus of D code that uses AA's, hopefully in non-trivial ways, so that we can cover all the use cases (i.e. ferret out hidden compiler paths that produce references to the old code). The same corpus can then be used to test the new implementation by failing to compile or work until everything is done. We can incrementally work on the new stuff, say reduce compile errors one by one, until everything works again. [...]
 The tricky part about immutable values isn't tricky at all. At the
 moment you just blast your way through with casts. Since AA
 implementation is in sole control of memory and it's definitely on
 HEAP there is no risk of writing to ROM/protected stuff. It's more
 an act of courage then trickery ;)

The problem is that if the key is an array, it's *not* in the sole control of the AA implementation. User code may have outside slices that reference the same memory. That's the problem. I just got an idea about how to solve immutable key problem. We can require that POD AA keys must be immutable, but user-defined structs/classes are allowed to be non-immutable. The rationale is that POD's hash values are computed over their entire value, so we know for sure that if their value changes, the hash will change, so this is not allowed. But user types can define their own toHash that only depends on a subset of the fields, and we need to allow that, so in that case the onus is on the user to not change things externally. D is a systems programming language, so we shouldn't need to be babysitting the users. If they can be trusted to write a sane toHash function (because the AA will malfunction badly if toHash misbehaves), then they can also be trusted not to shoot themselves in the foot by externally changing an object being used as an AA key.
 P.S. The new AA implementation IMHO would fix as much as it would
 break simply because the current one is half-magic, half-bug. That's
 one of reasons it should be done by reaping it off, then doing it from
 scratch in the clean room at the side and then re-introduced it again.

I like this approach. I'd go ahead with it, except that so far I haven't touched the DMD code yet, and I'm unfamiliar with how it works. It would be nice if somebody familiar with DMD internals can chip in too. T -- The peace of mind---from knowing that viruses which exploit Microsoft system vulnerabilities cannot touch Linux---is priceless. -- Frustrated system administrator.
Feb 26 2013