digitalmars.D - hashed array?

monarch_dodra (25/25) Nov 12 2012 D has a built in hashed associative container, which is great.

bearophile (9/15) Nov 12 2012 They are often named "hash sets". In Python this is a built-in

monarch_dodra (11/26) Nov 12 2012 Well, I don't see why AA would be built-in, but not set.

Andrej Mitrovic (3/4) Nov 12 2012 dcollections has sets http://www.dsource.org/projects/dcollections

Jacob Carlborg (6/10) Nov 12 2012 And Tango:

bearophile (7/13) Nov 12 2012 It's generally wise to minimize the amount of stuff you put in

monarch_dodra (2/4) Nov 12 2012 I apologize, I'm not sure what you mean.
monarch_dodra (2/6) Nov 12 2012 Related: what is the hook for implementing "in"?

bearophile (5/6) Nov 12 2012 "in" is supported:
H. S. Teoh (5/13) Nov 12 2012 auto opBinaryRight(string op)(KeyType k) if (op=="in") { ... }

Andrej Mitrovic (5/8) Nov 12 2012 You probably mean a Set? I think we could really use some syntax

bearophile (13/14) Nov 12 2012 I don't agree. I think in D there is no real need for built-in
monarch_dodra (4/13) Nov 12 2012 Well "set" is just a name by convention. In C++, set's complexity

Joseph Rushton Wakeling (5/8) Nov 12 2012 Can you give a bit more explanation of how that workaround would work? ...

monarch_dodra (15/25) Nov 12 2012 The set interface would mean you use "add" or "insert", so you

monarch_dodra (6/15) Nov 12 2012 Wait. That doesn't work actually. This does.

Joseph Rushton Wakeling (6/11) Nov 12 2012 Only with DMD. GDC and LDC both reject this formulation -- it seems bet...

Andrej Mitrovic (2/7) Nov 12 2012 s["one"] = [];
H. S. Teoh (12/19) Nov 12 2012 Well, I haven't *requested* this before, but I've certainly encountered
Dmitry Olshansky (10/28) Nov 12 2012 There is no need for built-in containers. If anything there is plan to

monarch_dodra (3/5) Nov 12 2012 Yeah... I'm implementing my lightweight wrapper now.
H. S. Teoh (9/24) Nov 12 2012 [...]
Joseph Rushton Wakeling (12/18) Nov 12 2012 The problem with wrapped versions of associative arrays is that they jus...

Dmitry Olshansky (18/37) Nov 12 2012 Yes, sure they can. Starting with e.g. a bitset.

H. S. Teoh (31/53) Nov 12 2012 Probably.

Dmitry Olshansky (10/40) Nov 12 2012 Problem is that keys will collide so each slot will have to have upper

H. S. Teoh (10/35) Nov 12 2012 True.

Jonathan M Davis (8/9) Nov 12 2012 For now, if you need a set, then use RedBlackTree. Long term, a hash set...
H. S. Teoh (7/16) Nov 12 2012 [...]
Jonathan M Davis (9/11) Nov 12 2012 Oh, I agree, but we, as a group, have obviously failed to handle the

Don Clugston (10/17) Nov 14 2012 I disagree with that 100%.

H. S. Teoh (20/41) Nov 14 2012 I don't know how AA's worked in D1, but the current codebase does not

Don (6/45) Nov 14 2012 Yeah, that's from the library point view. From the compiler side, the

Andrei Alexandrescu (5/25) Nov 14 2012 I think it is an absolute must that we move forward with a library

Dmitry Olshansky (19/45) Nov 14 2012 I'm sure it is nothing new. Basically AA is a reference type but it is

Rainer Schuetze (7/27) Nov 14 2012 I don't like the "magic null behaviour", but I see issues with this

Don (4/37) Nov 14 2012 It doesn't need to allocate any keys or values. It just needs to

Jonathan M Davis (14/17) Nov 14 2012 Except that that doesn't play nicely with init. Instead of using init, i...

Andrei Alexandrescu (3/13) Nov 14 2012 Oh I see. Well it's not that magic because it can be done in library cod...

Andrej Mitrovic (5/8) Nov 12 2012 Unfortunately this is also problematic with libraries which don't

"monarch_dodra" <monarchdodra gmail.com> writes:

D has a built in hashed associative container, which is great.

I've noticed though it has no built-in "hashed container" for 
when you just need to store unique elements, but without any 
specific data associated (lists of names, for example).

Has anybody requested this before? Would there be any plans for 
it?

----
The current "workaround" is to just dummy it.
alias bool Dummy;
Dummy[string] names;
names["Paul"] = true; //Ew.

But it is not very obvious what is going on. I also tried 
"void"ing it, eg:
void[string] names;
But that doesn't work either: Error: can't have associative array 
of void.

----
What would you think of introducing a built-in hashed container, 
that contains only keys?

We could call them "HA" for Hashed Array, and declare them with 
this simple syntax:

[string] names;

he interface would be mostly the already existing functions of AA 
(".remove", "in", "rehash()"), plus ".insert" to insert a key.

Thoughts?

Nov 12 2012

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 I've noticed though it has no built-in "hashed container" for 
 when you just need to store unique elements, but without any 
 specific data associated (lists of names, for example).

They are often named "hash sets". In Python this is a built-in 
type. But in D I think it's better to define such sets as a 
library template in a module to import. This is meant to be in 
Phobos.


 he interface would be mostly the already existing functions of 
 AA (".remove", "in", "rehash()"), plus ".insert" to insert a 
 key.

It also needs some other methods, see:
http://docs.python.org/2/library/sets.html

Bye,
bearophile

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 14:50:48 UTC, bearophile wrote:
 monarch_dodra:

 I've noticed though it has no built-in "hashed container" for 
 when you just need to store unique elements, but without any 
 specific data associated (lists of names, for example).

 They are often named "hash sets". In Python this is a built-in 
 type. But in D I think it's better to define such sets as a 
 library template in a module to import. This is meant to be in 
 Phobos.


 he interface would be mostly the already existing functions of 
 AA (".remove", "in", "rehash()"), plus ".insert" to insert a 
 key.

 It also needs some other methods, see:
 http://docs.python.org/2/library/sets.html

 Bye,
 bearophile

Well, I don't see why AA would be built-in, but not set.

In particular, more often than not, AAs are implemented in terms 
of set (and not the other way around) anyways, so if we take the 
time to implement AA, why not hashed sets?

The advantage of having it built-in would mean more convenient 
declaration syntax:
int[string] vs AA!(string, int) // string => int
[string] vs Set!(string) //set of string

But of course, a library solution would be good too.

Might even make it my next project and get started on it now :)

Nov 12 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 But of course, a library solution would be good too.

dcollections has sets http://www.dsource.org/projects/dcollections
(not sure if it still compiles though)

Nov 12 2012

Jacob Carlborg <doob me.com> writes:

On 2012-11-12 16:10, Andrej Mitrovic wrote:
 On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 But of course, a library solution would be good too.

 dcollections has sets http://www.dsource.org/projects/dcollections
 (not sure if it still compiles though)

And Tango:

https://github.com/SiegeLord/Tango-D2
http://dsource.org/projects/tango/docs/current

-- 
/Jacob Carlborg

Nov 12 2012

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 Well, I don't see why AA would be built-in, but not set.

It's generally wise to minimize the amount of stuff you put in 
the D front end or in the D runtime.


 so if we take the time to implement AA, why not hashed sets?

 The advantage of having it built-in would mean more convenient 
 declaration syntax:
 int[string] vs AA!(string, int) // string => int
 [string] vs Set!(string) //set of string

The situation is not fully symmetrical: associative arrays fully 
defined in Phobos have bad literals.

Bye,
bearophile

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:
 The situation is not fully symmetrical: associative arrays 
 fully defined in Phobos have bad literals.

I apologize, I'm not sure what you mean.

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:
 The situation is not fully symmetrical: associative arrays 
 fully defined in Phobos have bad literals.

 Bye,
 bearophile

Related: what is the hook for implementing "in"?

Nov 12 2012

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 Related: what is the hook for implementing "in"?

"in" is supported:

http://dlang.org/operatoroverloading.html#Binary

Bye,
bearophile

Nov 12 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Nov 12, 2012 at 04:38:59PM +0100, monarch_dodra wrote:
 On Monday, 12 November 2012 at 15:22:00 UTC, bearophile wrote:
The situation is not fully symmetrical: associative arrays fully
defined in Phobos have bad literals.

Bye,
bearophile

 
 Related: what is the hook for implementing "in"?

auto opBinaryRight(string op)(KeyType k) if (op=="in") { ... }


T

-- 
The early bird gets the worm. Moral: ewww...

Nov 12 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 I've noticed though it has no built-in "hashed container" for
 when you just need to store unique elements, but without any
 specific data associated (lists of names, for example).

You probably mean a Set? I think we could really use some syntax
support for sets. Otherwise people use workarounds like "alias
void[0][string] StringSet", and wrap such a type in a struct with
operator overloads to emulate the built-in hash syntax.

Nov 12 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Andrej Mitrovic:

 I think we could really use some syntax support for sets.<

I don't agree. I think in D there is no real need for built-in 
support for sets, despite built-ins are handy. It's better to 
leave the compiler changes to things that can't be implemented in 
library code, like a good syntax for tuple unpacking and other 
things.

I think the only feature you can't implement in D is "full" 
operator support for set comparisons, because currently D 
operator overloading doesn't allow you to define ">=" and ">" 
independently. But this is a minor limitation that doesn't 
justify built-in sets.

Bye,
bearophile

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 14:53:19 UTC, Andrej Mitrovic 
wrote:
 On 11/12/12, monarch_dodra <monarchdodra gmail.com> wrote:
 I've noticed though it has no built-in "hashed container" for
 when you just need to store unique elements, but without any
 specific data associated (lists of names, for example).

 You probably mean a Set? I think we could really use some syntax
 support for sets. Otherwise people use workarounds like "alias
 void[0][string] StringSet", and wrap such a type in a struct 
 with
 operator overloads to emulate the built-in hash syntax.

Well "set" is just a name by convention. In C++, set's complexity 
restrictions forces the binary tree implementation on it.

Nov 12 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/12/2012 03:53 PM, Andrej Mitrovic wrote:
 You probably mean a Set? I think we could really use some syntax
 support for sets. Otherwise people use workarounds like "alias
 void[0][string] StringSet"

Can you give a bit more explanation of how that workaround would work?  I can't 
see how you would insert entries in that case ...

     StringSet s;

     s["one"] = .... ?

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 15:32:24 UTC, Joseph Rushton 
Wakeling wrote:
 On 11/12/2012 03:53 PM, Andrej Mitrovic wrote:
 You probably mean a Set? I think we could really use some 
 syntax
 support for sets. Otherwise people use workarounds like "alias
 void[0][string] StringSet"

 Can you give a bit more explanation of how that workaround 
 would work?  I can't see how you would insert entries in that 
 case ...

     StringSet s;

     s["one"] = .... ?

The set interface would mean you use "add" or "insert", so you 
would write it like this:

Set!string s;
s.add("one");

As for the implementation, this seems to work:


StringSet(T)
{
     private void[0][T] _data;
     void add(T value)
     {
         _data.get(value, cast(void[0]) null);
     }
}

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 15:36:29 UTC, monarch_dodra wrote:
 As for the implementation, this seems to work:


 StringSet(T)
 {
     private void[0][T] _data;
     void add(T value)
     {
         _data.get(value, cast(void[0]) null);
     }
 }

Wait. That doesn't work actually. This does.
     auto add(T value)
     {
         return _data[value] = cast(void[0]) null;
     }

Nov 12 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/12/2012 04:49 PM, monarch_dodra wrote:
 Wait. That doesn't work actually. This does.
      auto add(T value)
      {
          return _data[value] = cast(void[0]) null;
      }

Only with DMD.  GDC and LDC both reject this formulation -- it seems better to 
use byte instead of void[0].

The problem is this workaround version blows up massively with number of
entries 
-- disproportionately to how much memory it would take to simply store all the 
key values. :-(

Nov 12 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 11/12/12, Joseph Rushton Wakeling <joseph.wakeling webdrake.net> wrote:
 Can you give a bit more explanation of how that workaround would work?  I
 can't
 see how you would insert entries in that case ...

      StringSet s;

      s["one"] = .... ?

s["one"] = [];

Nov 12 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Nov 12, 2012 at 03:43:57PM +0100, monarch_dodra wrote:
 D has a built in hashed associative container, which is great.
 
 I've noticed though it has no built-in "hashed container" for when
 you just need to store unique elements, but without any specific
 data associated (lists of names, for example).
 
 Has anybody requested this before? Would there be any plans for it?

Well, I haven't *requested* this before, but I've certainly encountered
the need for it.

It's not too hard to implement, actually, you can basically just use the
same code as the current AA implementation, minus the parts that deal
with values. Then modify the lookup functions to return false when no
key is found (instead of a[b] throwing an exception when b isn't in a,
or returning a null value with the 'in' operator), and you're good to
go.


T

-- 
"You are a very disagreeable person." "NO."

Nov 12 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

11/12/2012 6:43 PM, monarch_dodra пишет:
 D has a built in hashed associative container, which is great.


No. The literals are nice though.

 I've noticed though it has no built-in "hashed container" for when you
 just need to store unique elements, but without any specific data
 associated (lists of names, for example).

There is no need for built-in containers. If anything there is plan to 
move hash out of built-ins. The fact that it utterly failed to happen 
doesn't prove anything as it still has to be the library type.

 Has anybody requested this before? Would there be any plans for it?

 ----
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

Make a wrapper?

 But it is not very obvious what is going on. I also tried "void"ing it, eg:
 void[string] names;
 But that doesn't work either: Error: can't have associative array of void.

 ----
 What would you think of introducing a built-in hashed container, that
 contains only keys?

Infinitely awful.

 We could call them "HA" for Hashed Array, and declare them with this
 simple syntax:

Or rather call them sets and have them in the library.


-- 
Dmitry Olshansky

Nov 12 2012

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 12 November 2012 at 16:27:58 UTC, Dmitry Olshansky 
wrote:
 11/12/2012 6:43 PM, monarch_dodra пишет:
 [SNIP]

Yeah... I'm implementing my lightweight wrapper now.

Nov 12 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Nov 12, 2012 at 08:27:51PM +0400, Dmitry Olshansky wrote:
 11/12/2012 6:43 PM, monarch_dodra пишет:
D has a built in hashed associative container, which is great.

 
 
 No. The literals are nice though.
 
I've noticed though it has no built-in "hashed container" for when
you just need to store unique elements, but without any specific data
associated (lists of names, for example).

 
 There is no need for built-in containers. If anything there is plan
 to move hash out of built-ins. The fact that it utterly failed to
 happen doesn't prove anything as it still has to be the library
 type.

[...]

I'm still hoping to get back to moving AA into the library one day.
There are a lot of open bugs with the current AA implementation that
stem from the fact that it's in two different places, and the core AA
code has no access to value types.


T

-- 
Klein bottle for rent ... inquire within. -- Stephen Mulraney

Nov 12 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

 Make a wrapper?

The problem with wrapped versions of associative arrays is that they just don't 
scale with number of keys.  If I want to store (say) a set of 3 million 
size_t's, then it costs a lot more than 3_000_000 * size_t.sizeof() to do so 
this way.

Do dedicated set implementations do better than this?

I should say that my own interests in an implementation of set are twofold -- 
first, efficient storage; and second, being able to do an efficient foreach 
across the stored values, without any concern for order.

 Or rather call them sets and have them in the library.

The whole builtin-vs.-library thing seems a big distraction -- if something 
needs to be implemented, the library seems the natural place unless there's a 
very good reason otherwise.

Nov 12 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

11/12/2012 8:52 PM, Joseph Rushton Wakeling пишет:
 On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

 Make a wrapper?

 The problem with wrapped versions of associative arrays is that they
 just don't scale with number of keys.  If I want to store (say) a set of
 3 million size_t's, then it costs a lot more than 3_000_000 *
 size_t.sizeof() to do so this way.

 Do dedicated set implementations do better than this?

Yes, sure they can. Starting with e.g. a bitset.
If the distribution is sparce then inversion lists can do wonders. They 
work especially well if your distribution has contiguous intervals, 
searching would be logN though.

I use one for unicode character sets and it saves a LOT of space.

Otherwise if it's sparce but doesn't have contiguous interval it seems 
entirely doable to combine the two in a universal datastructure for 
integral sets. Say a sorted array of 32/64-element bitsets (depending on 
architecture):

//denotes a small bit set covering interval of [start..start+32/64)
struct node{
	size_t start;
	size_t bits;
};

Should be quite compact & fast.

 I should say that my own interests in an implementation of set are
 twofold -- first, efficient storage; and second, being able to do an
 efficient foreach across the stored values, without any concern for order.

 Or rather call them sets and have them in the library.

 The whole builtin-vs.-library thing seems a big distraction -- if
 something needs to be implemented, the library seems the natural place
 unless there's a very good reason otherwise.


-- 
Dmitry Olshansky

Nov 12 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:
 On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
The current "workaround" is to just dummy it.
alias bool Dummy;
Dummy[string] names;
names["Paul"] = true; //Ew.

Make a wrapper?

 
 The problem with wrapped versions of associative arrays is that they
 just don't scale with number of keys.  If I want to store (say) a
 set of 3 million size_t's, then it costs a lot more than 3_000_000 *
 size_t.sizeof() to do so this way.

 Do dedicated set implementations do better than this?

Probably.

For storing integer types, you could optimize for space by dividing your
key space into blocks of, say, 32 entries each. Then your hash function
will hash only the upper (n-5) bits of the key, and each hash slot will
be a bit array of 32 bits, which are indexed by the lower 5 bits of the
key. Assuming your keys are relatively densely populated, this will save
quite a significant amount of space. In the sparse case, you probably
don't waste that much space either, since each slot is only one 32-bit
int wide.

You can probably also increase the size of the slots to, say, 64 bits or
more, if you want to maximize the usage to GC allocation block overhead
ratio. Yes, each slot is only 1 int wide, but the GC does have to
maintain bookkeeping information that's probably a lot more than that.
Allocating a large number of very small blocks is generally worse than a
smaller number of medium-sized blocks.

If your keys are *very* dense, then storing ranges instead of individual
elements is the way to go (though the implementation will be
significantly more complex).

For storing sets of strings, though, you'd want something like a trie
instead.

It all depends on your specific application. So a per application
implementation might not be out of place.


 I should say that my own interests in an implementation of set are
 twofold -- first, efficient storage; and second, being able to do an
 efficient foreach across the stored values, without any concern for
 order.

With the hash + bit-array approach above, foreach would be very simple
(just walk the hash table slots and enumerate the bits in each slot).


Or rather call them sets and have them in the library.

 
 The whole builtin-vs.-library thing seems a big distraction -- if
 something needs to be implemented, the library seems the natural
 place unless there's a very good reason otherwise.

Yeah, there's definitely an advantage to minimizing the built-in stuff
and keeping them in the library. Keeps the language relatively clean and
simple, but still provide convenient library types for common tasks.


T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you
wish you hadn't.

Nov 12 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

11/12/2012 9:44 PM, H. S. Teoh пишет:
 On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:
 On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
 The current "workaround" is to just dummy it.
 alias bool Dummy;
 Dummy[string] names;
 names["Paul"] = true; //Ew.

 Make a wrapper?

 The problem with wrapped versions of associative arrays is that they
 just don't scale with number of keys.  If I want to store (say) a
 set of 3 million size_t's, then it costs a lot more than 3_000_000 *
 size_t.sizeof() to do so this way.

 Do dedicated set implementations do better than this?

 Probably.

 For storing integer types, you could optimize for space by dividing your
 key space into blocks of, say, 32 entries each. Then your hash function
 will hash only the upper (n-5) bits of the key, and each hash slot will
 be a bit array of 32 bits, which are indexed by the lower 5 bits of the
 key. Assuming your keys are relatively densely populated, this will save
 quite a significant amount of space. In the sparse case, you probably
 don't waste that much space either, since each slot is only one 32-bit
 int wide.

Problem is that keys will collide so each slot will have to have upper 
bits of key *and* the small set. The bucket itself is still inside 
intrusive linked list.

Otherwise the idea is great. About 3 words of overhead per word-sized set.

 You can probably also increase the size of the slots to, say, 64 bits or
 more, if you want to maximize the usage to GC allocation block overhead
 ratio. Yes, each slot is only 1 int wide, but the GC does have to
 maintain bookkeeping information that's probably a lot more than that.

16 bytes. Though you'd have to know how bucket entry looks like which is 
impossible with built-in hash map sadly.

 Allocating a large number of very small blocks is generally worse than a
 smaller number of medium-sized blocks.

[snip other good points]


-- 
Dmitry Olshansky

Nov 12 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Nov 12, 2012 at 09:58:14PM +0400, Dmitry Olshansky wrote:
 11/12/2012 9:44 PM, H. S. Teoh пишет:

[...]
For storing integer types, you could optimize for space by dividing
your key space into blocks of, say, 32 entries each. Then your hash
function will hash only the upper (n-5) bits of the key, and each
hash slot will be a bit array of 32 bits, which are indexed by the
lower 5 bits of the key. Assuming your keys are relatively densely
populated, this will save quite a significant amount of space. In the
sparse case, you probably don't waste that much space either, since
each slot is only one 32-bit int wide.

 
 Problem is that keys will collide so each slot will have to have upper
 bits of key *and* the small set. The bucket itself is still inside
 intrusive linked list.

True.


 Otherwise the idea is great. About 3 words of overhead per word-sized
 set.
 
You can probably also increase the size of the slots to, say, 64 bits
or more, if you want to maximize the usage to GC allocation block
overhead ratio. Yes, each slot is only 1 int wide, but the GC does
have to maintain bookkeeping information that's probably a lot more
than that.

 
 16 bytes. Though you'd have to know how bucket entry looks like
 which is impossible with built-in hash map sadly.

[...]

Actually, the built-in AA's bucket struct is duplicated in object_.d;
it's what the range-based API uses to walk the AA. (Yeah it's extremely
ugly. It must be cleaned up.)


T

-- 
There is no gravity. The earth sucks.

Nov 12 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, November 12, 2012 15:43:57 monarch_dodra wrote:
 Thoughts?

For now, if you need a set, then use RedBlackTree. Long term, a hash set 
should be added to std.container. As cool as it is to have AAs built into the 
language, they have been a disaster on a number of levels, and they buy you 
very little over having a library type. So, I see no reason to complicate the 
language further buy trying to add a hash set to it (especially considering 
all of the issues with AAs).

- Jonathan M Davis

Nov 12 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Nov 12, 2012 at 08:07:08PM +0100, Jonathan M Davis wrote:
 On Monday, November 12, 2012 15:43:57 monarch_dodra wrote:
 Thoughts?

 
 For now, if you need a set, then use RedBlackTree. Long term, a hash
 set should be added to std.container. As cool as it is to have AAs
 built into the language, they have been a disaster on a number of
 levels, and they buy you very little over having a library type. So, I
 see no reason to complicate the language further buy trying to add a
 hash set to it (especially considering all of the issues with AAs).

[...]

I contend that the problem with built-in AA's is their implementation,
not the fact that they're built-in.


T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you
wish you hadn't.

Nov 12 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the 
implementation of the built-in AAs properly, and I think that the indications 
are definitely that it's harder to get the built-in AAs right that it would be 
to create a library solution.

At this point, AAs are part of the language and need to be fixed, but I see no 
reason to add anything else like them to the language. It's just not worth the 
trouble.

- Jonathan M Davis

Nov 12 2012

Don Clugston <dac nospam.com> writes:

On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

 Oh, I agree, but we, as a group, have obviously failed to handle the
 implementation of the built-in AAs properly, and I think that the indications
 are definitely that it's harder to get the built-in AAs right that it would be
 to create a library solution.

I disagree with that 100%.
Built-in AAs in D1 work fine (They could use a lot of improvement, but 
there's nothing wrong with the basic approach, it's quite simple). It 
was the idea that they could be seamlessly moved to a library type that 
was an unmitigated disaster.

And for some reason, there has been a refusal to roll it back to the old 
method which worked.

The thing that really really should change is the bizarre 'magic null' 
semantics of AAs.

Nov 14 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Nov 14, 2012 at 10:25:53AM +0100, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
I contend that the problem with built-in AA's is their
implementation, not the fact that they're built-in.

Oh, I agree, but we, as a group, have obviously failed to handle the
implementation of the built-in AAs properly, and I think that the
indications are definitely that it's harder to get the built-in AAs
right that it would be to create a library solution.

 
 I disagree with that 100%.
 Built-in AAs in D1 work fine (They could use a lot of improvement,
 but there's nothing wrong with the basic approach, it's quite
 simple).

I don't know how AA's worked in D1, but the current codebase does not
handle complex value types correctly. For one thing, it uses bitwise
comparison of *arbitrary types*, as opposed to using opEquals or some
such built-in language capability. This is the source of a good number
of AA bugs.


 It was the idea that they could be seamlessly moved to a library type
 that was an unmitigated disaster.

The problem is with making byKey and byValue return ranges, which is a
Phobos concept. That is the source of the schizophrenic code duplication
in object_.d, which probably hides a subtle bug or two. Basically, it's
the conflict between making AA's built-in, and making them compatible
with a Phobos concept.

Either we make it built-in, and if ranges are required then we add new
functions to aaA.d that return ranges, or we make it library-level code,
which means we move it out of aaA.d and add lowerings in the compiler to
map it to a library type. Trying to straddle the two only leads to
disaster.


 And for some reason, there has been a refusal to roll it back to the
 old method which worked.
 
 The thing that really really should change is the bizarre 'magic
 null' semantics of AAs.

Agreed.


T

-- 
May you live all the days of your life. -- Jonathan Swift

Nov 14 2012

Don <nospam nospam.com> writes:

On 14.11.2012 16:39, H. S. Teoh wrote:
 On Wed, Nov 14, 2012 at 10:25:53AM +0100, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their
 implementation, not the fact that they're built-in.

 Oh, I agree, but we, as a group, have obviously failed to handle the
 implementation of the built-in AAs properly, and I think that the
 indications are definitely that it's harder to get the built-in AAs
 right that it would be to create a library solution.

 I disagree with that 100%.
 Built-in AAs in D1 work fine (They could use a lot of improvement,
 but there's nothing wrong with the basic approach, it's quite
 simple).

 I don't know how AA's worked in D1, but the current codebase does not
 handle complex value types correctly. For one thing, it uses bitwise
 comparison of *arbitrary types*, as opposed to using opEquals or some
 such built-in language capability. This is the source of a good number
 of AA bugs.


 It was the idea that they could be seamlessly moved to a library type
 that was an unmitigated disaster.

 The problem is with making byKey and byValue return ranges, which is a
 Phobos concept. That is the source of the schizophrenic code duplication
 in object_.d, which probably hides a subtle bug or two. Basically, it's
 the conflict between making AA's built-in, and making them compatible
 with a Phobos concept.

Yeah, that's from the library point view. From the compiler side, the 
problem is that this is a type which the compiler knows far too much about.
I cannot see any way that the coupling between compiler and library can 
be less than we started with. The compiler has to relinquish control 
over most of the semantics, but cannot get rid of it all.

 Either we make it built-in, and if ranges are required then we add new
 functions to aaA.d that return ranges, or we make it library-level code,
 which means we move it out of aaA.d and add lowerings in the compiler to
 map it to a library type. Trying to straddle the two only leads to
 disaster.


 And for some reason, there has been a refusal to roll it back to the
 old method which worked.

 The thing that really really should change is the bizarre 'magic
 null' semantics of AAs.

 Agreed.


 T

Nov 14 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/14/12 1:25 AM, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

 Oh, I agree, but we, as a group, have obviously failed to handle the
 implementation of the built-in AAs properly, and I think that the
 indications
 are definitely that it's harder to get the built-in AAs right that it
 would be
 to create a library solution.

 I disagree with that 100%.
 Built-in AAs in D1 work fine (They could use a lot of improvement, but
 there's nothing wrong with the basic approach, it's quite simple). It
 was the idea that they could be seamlessly moved to a library type that
 was an unmitigated disaster.

 And for some reason, there has been a refusal to roll it back to the old
 method which worked.

I think it is an absolute must that we move forward with a library 
implementation.

 The thing that really really should change is the bizarre 'magic null'
 semantics of AAs.

This is new! What does this mean?


Andrei

Nov 14 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 On 11/14/12 1:25 AM, Don Clugston wrote:
 On 12/11/12 20:42, Jonathan M Davis wrote:
 On Monday, November 12, 2012 11:36:38 H. S. Teoh wrote:
 I contend that the problem with built-in AA's is their implementation,
 not the fact that they're built-in.

 Oh, I agree, but we, as a group, have obviously failed to handle the
 implementation of the built-in AAs properly, and I think that the
 indications
 are definitely that it's harder to get the built-in AAs right that it
 would be
 to create a library solution.

 I disagree with that 100%.
 Built-in AAs in D1 work fine (They could use a lot of improvement, but
 there's nothing wrong with the basic approach, it's quite simple). It
 was the idea that they could be seamlessly moved to a library type that
 was an unmitigated disaster.

 And for some reason, there has been a refusal to roll it back to the old
 method which worked.

 I think it is an absolute must that we move forward with a library
 implementation.

 The thing that really really should change is the bizarre 'magic null'
 semantics of AAs.

 This is new! What does this mean?

I'm sure it is nothing new. Basically AA is a reference type but it is 
auto-magically created on the first insertion. This is called magic null 
behavior.

void foo(int[int] aa){
//aa here is null
	aa[1] = 1;
//now the local aa points to a new hash-map
}

void main(){
	int[int] map;
	foo(map);
	//map in this scope is still null
	assert(1 in map); // fails
}


I'm in favor of implicitly allocating an AA on definition that'll make 
it a proper reference type.

-- 
Dmitry Olshansky

Nov 14 2012

Rainer Schuetze <r.sagitario gmx.de> writes:

On 11/14/2012 7:20 PM, Dmitry Olshansky wrote:
 11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 This is new! What does this mean?

 I'm sure it is nothing new. Basically AA is a reference type but it is
 auto-magically created on the first insertion. This is called magic null
 behavior.

 void foo(int[int] aa){
 //aa here is null
      aa[1] = 1;
 //now the local aa points to a new hash-map
 }

 void main(){
      int[int] map;
      foo(map);
      //map in this scope is still null
      assert(1 in map); // fails
 }


 I'm in favor of implicitly allocating an AA on definition that'll make
 it a proper reference type.

I don't like the "magic null behaviour", but I see issues with this 
straight forward implementation: how can you define a statically 
initialized struct value (including init) that contains an associative 
array? It will have to run a default constructor to allocate the AA root 
object, introducing something like the often rejected user-defined 
default constructor for structs.

Nov 14 2012

Don <nospam nospam.com> writes:

On 14.11.2012 20:15, Rainer Schuetze wrote:
 On 11/14/2012 7:20 PM, Dmitry Olshansky wrote:
 11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 This is new! What does this mean?

 I'm sure it is nothing new. Basically AA is a reference type but it is
 auto-magically created on the first insertion. This is called magic null
 behavior.

 void foo(int[int] aa){
 //aa here is null
 aa[1] = 1;
 //now the local aa points to a new hash-map
 }

 void main(){
 int[int] map;
 foo(map);
 //map in this scope is still null
 assert(1 in map); // fails
 }


 I'm in favor of implicitly allocating an AA on definition that'll make
 it a proper reference type.

 I don't like the "magic null behaviour", but I see issues with this
 straight forward implementation: how can you define a statically
 initialized struct value (including init) that contains an associative
 array? It will have to run a default constructor to allocate the AA root
 object, introducing something like the often rejected user-defined
 default constructor for structs.

It doesn't need to allocate any keys or values. It just needs to 
allocate whatever structure it needs to keep track of how many items it 
has. As if you added an element, and then removed it.

Nov 14 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, November 14, 2012 20:29:03 Don wrote:
 It doesn't need to allocate any keys or values. It just needs to
 allocate whatever structure it needs to keep track of how many items it
 has. As if you added an element, and then removed it.

Except that that doesn't play nicely with init. Instead of using init, it 
would have to be default constructed, which goes against how every other type 
works and risks causing problems outside of the case where you simply declare 
an AA as a local variable. For instance, it wouldn't work at all when an AA is 
a member variable of a struct.

I understand wanting to get rid of the magic initialization nonsense, but what 
it does is completely consistent with how dynamic arrays work, and making it 
work otherwise would make it inconsistent with every other type in D with
regards to default-initiliazation.

We _do_ need a way to indicate that an AA should be properly initialized without
inserting anything into it (having to insert and then remove something to do
that is atrocious), but I don't think that default constructing AAs will fly.

- Jonathan M Davis

Nov 14 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/14/12 10:20 AM, Dmitry Olshansky wrote:
 11/14/2012 9:44 PM, Andrei Alexandrescu пишет:
 On 11/14/12 1:25 AM, Don Clugston wrote:
 The thing that really really should change is the bizarre 'magic null'
 semantics of AAs.

 This is new! What does this mean?

 I'm sure it is nothing new. Basically AA is a reference type but it is
 auto-magically created on the first insertion. This is called magic null
 behavior.

Oh I see. Well it's not that magic because it can be done in library code.

Andrei

Nov 14 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 11/12/12, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 Otherwise people use workarounds like "alias
 void[0][string] StringSet", and wrap such a type in a struct with
 operator overloads to emulate the built-in hash syntax.

Unfortunately this is also problematic with libraries which don't
handle this type. For example serialization libraries can choke on
such a type, and probably any routine that isn't specifically designed
to work with void[0] will choke.

Nov 12 2012

D Programming

C/C++ Programming

Other

digitalmars.D - hashed array?