www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - hashed array?

reply "monarch_dodra" <monarchdodra gmail.com> writes:
D has a built in hashed associative container, which is great.

I've noticed though it has no built-in "hashed container" for 
when you just need to store unique elements, but without any 
specific data associated (lists of names, for example).

Has anybody requested this before? Would there be any plans for 
it?

----
The current "workaround" is to just dummy it.
alias bool Dummy;
Dummy[string] names;
names["Paul"] = true; //Ew.

But it is not very obvious what is going on. I also tried 
"void"ing it, eg:
void[string] names;
But that doesn't work either: Error: can't have associative array 
of void.

----
What would you think of introducing a built-in hashed container, 
that contains only keys?

We could call them "HA" for Hashed Array, and declare them with 
this simple syntax:

[string] names;

he interface would be mostly the already existing functions of AA 
(".remove", "in", "rehash()"), plus ".insert" to insert a key.

Thoughts?
Nov 12 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 I've noticed though it has no built-in "hashed container" for 
 when you just need to store unique elements, but without any 
 specific data associated (lists of names, for example).

They are often named "hash sets". In Python this is a built-in type. But in D I think it's better to define such sets as a library template in a module to import. This is meant to be in Phobos.
 he interface would be mostly the already existing functions of 
 AA (".remove", "in", "rehash()"), plus ".insert" to insert a 
 key.

It also needs some other methods, see: http://docs.python.org/2/library/sets.html Bye, bearophile
Nov 12 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-11-12 16:10, Andrej Mitrovic wrote:
 On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 But of course, a library solution would be good too.

dcollections has sets http://www.dsource.org/projects/dcollections (not sure if it still compiles though)

And Tango: https://github.com/SiegeLord/Tango-D2 http://dsource.org/projects/tango/docs/current -- /Jacob Carlborg
Nov 12 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 I've noticed though it has no built-in "hashed container" for
 when you just need to store unique elements, but without any
 specific data associated (lists of names, for example).

You probably mean a Set? I think we could really use some syntax support for sets. Otherwise people use workarounds like "alias void[0][string] StringSet", and wrap such a type in a struct with operator overloads to emulate the built-in hash syntax.
Nov 12 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 I think we could really use some syntax support for sets.<

I don't agree. I think in D there is no real need for built-in support for sets, despite built-ins are handy. It's better to leave the compiler changes to things that can't be implemented in library code, like a good syntax for tuple unpacking and other things. I think the only feature you can't implement in D is "full" operator support for set comparisons, because currently D operator overloading doesn't allow you to define ">=" and ">" independently. But this is a minor limitation that doesn't justify built-in sets. Bye, bearophile
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 14:50:48 UTC, bearophile wrote:
 monarch_dodra:

 I've noticed though it has no built-in "hashed container" for 
 when you just need to store unique elements, but without any 
 specific data associated (lists of names, for example).

They are often named "hash sets". In Python this is a built-in type. But in D I think it's better to define such sets as a library template in a module to import. This is meant to be in Phobos.
 he interface would be mostly the already existing functions of 
 AA (".remove", "in", "rehash()"), plus ".insert" to insert a 
 key.

It also needs some other methods, see: http://docs.python.org/2/library/sets.html Bye, bearophile

Well, I don't see why AA would be built-in, but not set. In particular, more often than not, AAs are implemented in terms of set (and not the other way around) anyways, so if we take the time to implement AA, why not hashed sets? The advantage of having it built-in would mean more convenient declaration syntax: int[string] vs AA!(string, int) // string => int [string] vs Set!(string) //set of string But of course, a library solution would be good too. Might even make it my next project and get started on it now :)
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 14:53:19 UTC, Andrej Mitrovic 
wrote:
 On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 I've noticed though it has no built-in "hashed container" for
 when you just need to store unique elements, but without any
 specific data associated (lists of names, for example).

You probably mean a Set? I think we could really use some syntax support for sets. Otherwise people use workarounds like "alias void[0][string] StringSet", and wrap such a type in a struct with operator overloads to emulate the built-in hash syntax.

Well "set" is just a name by convention. In C++, set's complexity restrictions forces the binary tree implementation on it.
Nov 12 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 But of course, a library solution would be good too.

dcollections has sets http://www.dsource.org/projects/dcollections (not sure if it still compiles though)
Nov 12 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 Well, I don't see why AA would be built-in, but not set.

It's generally wise to minimize the amount of stuff you put in the D front end or in the D runtime.
 so if we take the time to implement AA, why not hashed sets?

 The advantage of having it built-in would mean more convenient 
 declaration syntax:
 int[string] vs AA!(string, int) // string => int
 [string] vs Set!(string) //set of string

The situation is not fully symmetrical: associative arrays fully defined in Phobos have bad literals. Bye, bearophile
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:
 The situation is not fully symmetrical: associative arrays 
 fully defined in Phobos have bad literals.

I apologize, I'm not sure what you mean.
Nov 12 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 11/12/2012 03:53 PM, Andrej Mitrovic wrote:
 You probably mean a Set? I think we could really use some syntax
 support for sets. Otherwise people use workarounds like "alias
 void[0][string] StringSet"

Can you give a bit more explanation of how that workaround would work? I can't see how you would insert entries in that case ... StringSet s; s["one"] = .... ?
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 15:32:24 UTC, Joseph Rushton 
Wakeling wrote:
 On 11/12/2012 03:53 PM, Andrej Mitrovic wrote:
 You probably mean a Set? I think we could really use some 
 syntax
 support for sets. Otherwise people use workarounds like "alias
 void[0][string] StringSet"

Can you give a bit more explanation of how that workaround would work? I can't see how you would insert entries in that case ... StringSet s; s["one"] = .... ?

The set interface would mean you use "add" or "insert", so you would write it like this: Set!string s; s.add("one"); As for the implementation, this seems to work: StringSet(T) { private void[0][T] _data; void add(T value) { _data.get(value, cast(void[0]) null); } }
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:
 The situation is not fully symmetrical: associative arrays 
 fully defined in Phobos have bad literals.

 Bye,
 bearophile

Related: what is the hook for implementing "in"?
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 15:36:29 UTC, monarch_dodra wrote:
 As for the implementation, this seems to work:


 StringSet(T)
 {
     private void[0][T] _data;
     void add(T value)
     {
         _data.get(value, cast(void[0]) null);
     }
 }

Wait. That doesn't work actually. This does. auto add(T value) { return _data[value] = cast(void[0]) null; }
Nov 12 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 Related: what is the hook for implementing "in"?

"in" is supported: http://dlang.org/operatoroverloading.html#Binary Bye, bearophile
Nov 12 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 11/12/2012 04:49 PM, monarch_dodra wrote:
 Wait. That doesn't work actually. This does.
      auto add(T value)
      {
          return _data[value] = cast(void[0]) null;
      }

Only with DMD. GDC and LDC both reject this formulation -- it seems better to use byte instead of void[0]. The problem is this workaround version blows up massively with number of entries -- disproportionately to how much memory it would take to simply store all the key values. :-(
Nov 12 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/12/12, Joseph Rushton Wakeling <joseph.wakeling webdrake.net> wrote:
 Can you give a bit more explanation of how that workaround would work?  I
 can't
 see how you would insert entries in that case ...

      StringSet s;

      s["one"] = .... ?

s["one"] = [];
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 12, 2012 at 03:43:57PM +0100, monarch_dodra wrote:
 D has a built in hashed associative container, which is great.
 
 I've noticed though it has no built-in "hashed container" for when
 you just need to store unique elements, but without any specific
 data associated (lists of names, for example).
 
 Has anybody requested this before? Would there be any plans for it?

Well, I haven't *requested* this before, but I've certainly encountered the need for it. It's not too hard to implement, actually, you can basically just use the same code as the current AA implementation, minus the parts that deal with values. Then modify the lookup functions to return false when no key is found (instead of a[b] throwing an exception when b isn't in a, or returning a null value with the 'in' operator), and you're good to go. T -- "You are a very disagreeable person." "NO."
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 12, 2012 at 04:38:59PM +0100, monarch_dodra wrote:
 On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:
The situation is not fully symmetrical: associative arrays fully
defined in Phobos have bad literals.

Bye,
bearophile

Related: what is the hook for implementing "in"?

auto opBinaryRight(string op)(KeyType k) if (op=="in") { ... } T -- The early bird gets the worm. Moral: ewww...
Nov 12 2012
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/12/2012 6:43 PM, monarch_dodra пишет:
 D has a built in hashed associative container, which is great.

No. The literals are nice though.
 I've noticed though it has no built-in "hashed container" for when you
 just need to store unique elements, but without any specific data
 associated (lists of names, for example).

There is no need for built-in containers. If anything there is plan to move hash out of built-ins. The fact that it utterly failed to happen doesn't prove anything as it still has to be the library type.
 Has anybody requested this before? Would there be any plans for it?

 ----
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

Make a wrapper?
 But it is not very obvious what is going on. I also tried "void"ing it, eg:
 void[string] names;
 But that doesn't work either: Error: can't have associative array of void.

 ----
 What would you think of introducing a built-in hashed container, that
 contains only keys?

Infinitely awful.
 We could call them "HA" for Hashed Array, and declare them with this
 simple syntax:

Or rather call them sets and have them in the library. -- Dmitry Olshansky
Nov 12 2012
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/12/2012 8:52 PM, Joseph Rushton Wakeling пишет:
 On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

Make a wrapper?

The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this?

Yes, sure they can. Starting with e.g. a bitset. If the distribution is sparce then inversion lists can do wonders. They work especially well if your distribution has contiguous intervals, searching would be logN though. I use one for unicode character sets and it saves a LOT of space. Otherwise if it's sparce but doesn't have contiguous interval it seems entirely doable to combine the two in a universal datastructure for integral sets. Say a sorted array of 32/64-element bitsets (depending on architecture): //denotes a small bit set covering interval of [start..start+32/64) struct node{ size_t start; size_t bits; }; Should be quite compact & fast.
 I should say that my own interests in an implementation of set are
 twofold -- first, efficient storage; and second, being able to do an
 efficient foreach across the stored values, without any concern for order.

 Or rather call them sets and have them in the library.

The whole builtin-vs.-library thing seems a big distraction -- if something needs to be implemented, the library seems the natural place unless there's a very good reason otherwise.

-- Dmitry Olshansky
Nov 12 2012
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/12/2012 9:44 PM, H. S. Teoh пишет:
 On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:
 On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

Make a wrapper?

The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this?

Probably. For storing integer types, you could optimize for space by dividing your key space into blocks of, say, 32 entries each. Then your hash function will hash only the upper (n-5) bits of the key, and each hash slot will be a bit array of 32 bits, which are indexed by the lower 5 bits of the key. Assuming your keys are relatively densely populated, this will save quite a significant amount of space. In the sparse case, you probably don't waste that much space either, since each slot is only one 32-bit int wide.

Problem is that keys will collide so each slot will have to have upper bits of key *and* the small set. The bucket itself is still inside intrusive linked list. Otherwise the idea is great. About 3 words of overhead per word-sized set.
 You can probably also increase the size of the slots to, say, 64 bits or
 more, if you want to maximize the usage to GC allocation block overhead
 ratio. Yes, each slot is only 1 int wide, but the GC does have to
 maintain bookkeeping information that's probably a lot more than that.

16 bytes. Though you'd have to know how bucket entry looks like which is impossible with built-in hash map sadly.
 Allocating a large number of very small blocks is generally worse than a
 smaller number of medium-sized blocks.

-- Dmitry Olshansky
Nov 12 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 November 2012 at 16:27:58 UTC, Dmitry Olshansky 
wrote:
 11/12/2012 6:43 PM, monarch_dodra пишет:
 [SNIP]

Yeah... I'm implementing my lightweight wrapper now.
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 12, 2012 at 08:27:51PM +0400, Dmitry Olshansky wrote:
 11/12/2012 6:43 PM, monarch_dodra пишет:
D has a built in hashed associative container, which is great.

No. The literals are nice though.
I've noticed though it has no built-in "hashed container" for when
you just need to store unique elements, but without any specific data
associated (lists of names, for example).

There is no need for built-in containers. If anything there is plan to move hash out of built-ins. The fact that it utterly failed to happen doesn't prove anything as it still has to be the library type.

I'm still hoping to get back to moving AA into the library one day. There are a lot of open bugs with the current AA implementation that stem from the fact that it's in two different places, and the core AA code has no access to value types. T -- Klein bottle for rent ... inquire within. -- Stephen Mulraney
Nov 12 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

Make a wrapper?

The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this? I should say that my own interests in an implementation of set are twofold -- first, efficient storage; and second, being able to do an efficient foreach across the stored values, without any concern for order.
 Or rather call them sets and have them in the library.

The whole builtin-vs.-library thing seems a big distraction -- if something needs to be implemented, the library seems the natural place unless there's a very good reason otherwise.
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:
 On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
The current "workaround" is to just dummy it.
alias bool Dummy;
Dummy[string] names;
names["Paul"] = true; //Ew.

Make a wrapper?

The problem with wrapped versions of associative arrays is that they just don't scale with number of keys. If I want to store (say) a set of 3 million size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so this way. Do dedicated set implementations do better than this?

Probably. For storing integer types, you could optimize for space by dividing your key space into blocks of, say, 32 entries each. Then your hash function will hash only the upper (n-5) bits of the key, and each hash slot will be a bit array of 32 bits, which are indexed by the lower 5 bits of the key. Assuming your keys are relatively densely populated, this will save quite a significant amount of space. In the sparse case, you probably don't waste that much space either, since each slot is only one 32-bit int wide. You can probably also increase the size of the slots to, say, 64 bits or more, if you want to maximize the usage to GC allocation block overhead ratio. Yes, each slot is only 1 int wide, but the GC does have to maintain bookkeeping information that's probably a lot more than that. Allocating a large number of very small blocks is generally worse than a smaller number of medium-sized blocks. If your keys are *very* dense, then storing ranges instead of individual elements is the way to go (though the implementation will be significantly more complex). For storing sets of strings, though, you'd want something like a trie instead. It all depends on your specific application. So a per application implementation might not be out of place.
 I should say that my own interests in an implementation of set are
 twofold -- first, efficient storage; and second, being able to do an
 efficient foreach across the stored values, without any concern for
 order.

With the hash + bit-array approach above, foreach would be very simple (just walk the hash table slots and enumerate the bits in each slot).
Or rather call them sets and have them in the library.

The whole builtin-vs.-library thing seems a big distraction -- if something needs to be implemented, the library seems the natural place unless there's a very good reason otherwise.

Yeah, there's definitely an advantage to minimizing the built-in stuff and keeping them in the library. Keeps the language relatively clean and simple, but still provide convenient library types for common tasks. T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 12, 2012 at 09:58:14PM +0400, Dmitry Olshansky wrote:
 11/12/2012 9:44 PM, H. S. Teoh пишет:

For storing integer types, you could optimize for space by dividing
your key space into blocks of, say, 32 entries each. Then your hash
function will hash only the upper (n-5) bits of the key, and each
hash slot will be a bit array of 32 bits, which are indexed by the
lower 5 bits of the key. Assuming your keys are relatively densely
populated, this will save quite a significant amount of space. In the
sparse case, you probably don't waste that much space either, since
each slot is only one 32-bit int wide.

Problem is that keys will collide so each slot will have to have upper bits of key *and* the small set. The bucket itself is still inside intrusive linked list.

True.
 Otherwise the idea is great. About 3 words of overhead per word-sized
 set.
 
You can probably also increase the size of the slots to, say, 64 bits
or more, if you want to maximize the usage to GC allocation block
overhead ratio. Yes, each slot is only 1 int wide, but the GC does
have to maintain bookkeeping information that's probably a lot more
than that.

16 bytes. Though you'd have to know how bucket entry looks like which is impossible with built-in hash map sadly.

Actually, the built-in AA's bucket struct is duplicated in object_.d; it's what the range-based API uses to walk the AA. (Yeah it's extremely ugly. It must be cleaned up.) T -- There is no gravity. The earth sucks.
Nov 12 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, November 12, 2012 15:43:57 monarch_dodra wrote:
 Thoughts?

For now, if you need a set, then use RedBlackTree. Long term, a hash set should be added to std.container. As cool as it is to have AAs built into the language, they have been a disaster on a number of levels, and they buy you very little over having a library type. So, I see no reason to complicate the language further buy trying to add a hash set to it (especially considering all of the issues with AAs). - Jonathan M Davis
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 12, 2012 at 08:07:08PM +0100, Jonathan M Davis wrote:
 On Monday, November 12, 2012 15:43:57 monarch_dodra wrote:
 Thoughts?

For now, if you need a set, then use RedBlackTree. Long term, a hash set should be added to std.container. As cool as it is to have AAs built into the language, they have been a disaster on a number of levels, and they buy you very little over having a library type. So, I see no reason to complicate the language further buy trying to add a hash set to it (especially considering all of the issues with AAs).

I contend that the problem with built-in AA's is their implementation, not the fact that they're built-in. T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
Nov 12 2012
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution. At this point, AAs are part of the language and need to be fixed, but I see no reason to add anything else like them to the language. It's just not worth the trouble. - Jonathan M Davis
Nov 12 2012
parent reply Don Clugston <dac nospam.com> writes:
On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.

I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple). It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster. And for some reason, there has been a refusal to roll it back to the old method which worked. The thing that really really should change is the bizarre 'magic null' semantics of AAs.
Nov 14 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/14/12 1:25 AM, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.

I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple). It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster. And for some reason, there has been a refusal to roll it back to the old method which worked.

I think it is an absolute must that we move forward with a library implementation.
 The thing that really really should change is the bizarre 'magic null'
 semantics of AAs.

This is new! What does this mean? Andrei
Nov 14 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 On 11/14/12 1:25 AM, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.

I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple). It was the idea that they could be seamlessly moved to a library type that was an unmitigated disaster. And for some reason, there has been a refusal to roll it back to the old method which worked.

I think it is an absolute must that we move forward with a library implementation.
 The thing that really really should change is the bizarre 'magic null'
 semantics of AAs.

This is new! What does this mean?

I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior. void foo(int[int] aa){ //aa here is null aa[1] = 1; //now the local aa points to a new hash-map } void main(){ int[int] map; foo(map); //map in this scope is still null assert(1 in map); // fails } I'm in favor of implicitly allocating an AA on definition that'll make it a proper reference type. -- Dmitry Olshansky
Nov 14 2012
next sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 11/14/2012 7:20 PM, Dmitry Olshansky wrote:
 11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 This is new! What does this mean?

I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior. void foo(int[int] aa){ //aa here is null aa[1] = 1; //now the local aa points to a new hash-map } void main(){ int[int] map; foo(map); //map in this scope is still null assert(1 in map); // fails } I'm in favor of implicitly allocating an AA on definition that'll make it a proper reference type.

I don't like the "magic null behaviour", but I see issues with this straight forward implementation: how can you define a statically initialized struct value (including init) that contains an associative array? It will have to run a default constructor to allocate the AA root object, introducing something like the often rejected user-defined default constructor for structs.
Nov 14 2012
parent Don <nospam nospam.com> writes:
On 14.11.2012 20:15, Rainer Schuetze wrote:
 On 11/14/2012 7:20 PM, Dmitry Olshansky wrote:
 11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 This is new! What does this mean?

I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior. void foo(int[int] aa){ //aa here is null aa[1] = 1; //now the local aa points to a new hash-map } void main(){ int[int] map; foo(map); //map in this scope is still null assert(1 in map); // fails } I'm in favor of implicitly allocating an AA on definition that'll make it a proper reference type.

I don't like the "magic null behaviour", but I see issues with this straight forward implementation: how can you define a statically initialized struct value (including init) that contains an associative array? It will have to run a default constructor to allocate the AA root object, introducing something like the often rejected user-defined default constructor for structs.

It doesn't need to allocate any keys or values. It just needs to allocate whatever structure it needs to keep track of how many items it has. As if you added an element, and then removed it.
Nov 14 2012
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/14/12 10:20 AM, Dmitry Olshansky wrote:
 11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 On 11/14/12 1:25 AM, Don Clugston wrote:
 The thing that really really should change is the bizarre 'magic null'
 semantics of AAs.

This is new! What does this mean?

I'm sure it is nothing new. Basically AA is a reference type but it is auto-magically created on the first insertion. This is called magic null behavior.

Oh I see. Well it's not that magic because it can be done in library code. Andrei
Nov 14 2012
prev sibling parent Don <nospam nospam.com> writes:
On 14.11.2012 16:39, H. S. Teoh wrote:
 On Wed, Nov 14, 2012 at 10:25:53AM +0100, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their
 implementation, not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.

I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple).

I don't know how AA's worked in D1, but the current codebase does not handle complex value types correctly. For one thing, it uses bitwise comparison of *arbitrary types*, as opposed to using opEquals or some such built-in language capability. This is the source of a good number of AA bugs.
 It was the idea that they could be seamlessly moved to a library type
 that was an unmitigated disaster.

The problem is with making byKey and byValue return ranges, which is a Phobos concept. That is the source of the schizophrenic code duplication in object_.d, which probably hides a subtle bug or two. Basically, it's the conflict between making AA's built-in, and making them compatible with a Phobos concept.

Yeah, that's from the library point view. From the compiler side, the problem is that this is a type which the compiler knows far too much about. I cannot see any way that the coupling between compiler and library can be less than we started with. The compiler has to relinquish control over most of the semantics, but cannot get rid of it all.
 Either we make it built-in, and if ranges are required then we add new
 functions to aaA.d that return ranges, or we make it library-level code,
 which means we move it out of aaA.d and add lowerings in the compiler to
 map it to a library type. Trying to straddle the two only leads to
 disaster.


 And for some reason, there has been a refusal to roll it back to the
 old method which worked.

 The thing that really really should change is the bizarre 'magic
 null' semantics of AAs.

Agreed. T

Nov 14 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 11/12/12, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 Otherwise people use workarounds like "alias
 void[0][string] StringSet", and wrap such a type in a struct with
 operator overloads to emulate the built-in hash syntax.

Unfortunately this is also problematic with libraries which don't handle this type. For example serialization libraries can choke on such a type, and probably any routine that isn't specifically designed to work with void[0] will choke.
Nov 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Nov 14, 2012 at 10:25:53AM +0100, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
I contend that the problem with built-in AA's is their
implementation, not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the implementation of the built-in AAs properly, and I think that the indications are definitely that it's harder to get the built-in AAs right that it would be to create a library solution.

I disagree with that 100%. Built-in AAs in D1 work fine (They could use a lot of improvement, but there's nothing wrong with the basic approach, it's quite simple).

I don't know how AA's worked in D1, but the current codebase does not handle complex value types correctly. For one thing, it uses bitwise comparison of *arbitrary types*, as opposed to using opEquals or some such built-in language capability. This is the source of a good number of AA bugs.
 It was the idea that they could be seamlessly moved to a library type
 that was an unmitigated disaster.

The problem is with making byKey and byValue return ranges, which is a Phobos concept. That is the source of the schizophrenic code duplication in object_.d, which probably hides a subtle bug or two. Basically, it's the conflict between making AA's built-in, and making them compatible with a Phobos concept. Either we make it built-in, and if ranges are required then we add new functions to aaA.d that return ranges, or we make it library-level code, which means we move it out of aaA.d and add lowerings in the compiler to map it to a library type. Trying to straddle the two only leads to disaster.
 And for some reason, there has been a refusal to roll it back to the
 old method which worked.
 
 The thing that really really should change is the bizarre 'magic
 null' semantics of AAs.

Agreed. T -- May you live all the days of your life. -- Jonathan Swift
Nov 14 2012
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, November 14, 2012 20:29:03 Don wrote:
 It doesn't need to allocate any keys or values. It just needs to
 allocate whatever structure it needs to keep track of how many items it
 has. As if you added an element, and then removed it.

Except that that doesn't play nicely with init. Instead of using init, it would have to be default constructed, which goes against how every other type works and risks causing problems outside of the case where you simply declare an AA as a local variable. For instance, it wouldn't work at all when an AA is a member variable of a struct. I understand wanting to get rid of the magic initialization nonsense, but what it does is completely consistent with how dynamic arrays work, and making it work otherwise would make it inconsistent with every other type in D with regards to default-initiliazation. We _do_ need a way to indicate that an AA should be properly initialized without inserting anything into it (having to insert and then remove something to do that is atrocious), but I don't think that default constructing AAs will fly. - Jonathan M Davis
Nov 14 2012