www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Why isn't bool allowed as AA key type?

reply David Nadlinger <see klickverbot.at> writes:
Hello all,

is there a reason bool isn't allowed as key type for associative arrays 
(mtype.c:3995)? The docs don't mention this.

bool-key AAs probably wouldn't be too useful in practice, but it seems 
like a quite arbitrary and unexpected limitation to me – e.g. I would 
have to add special cases to all my Thrift map handling code.

In case this is not on purpose, I'd be happy to put together a pull request.

David
May 28 2011
next sibling parent KennyTM~ <kennytm gmail.com> writes:
On May 29, 11 01:54, David Nadlinger wrote:
 Hello all,

 is there a reason bool isn't allowed as key type for associative arrays
 (mtype.c:3995)? The docs don't mention this.

 bool-key AAs probably wouldn't be too useful in practice, but it seems
 like a quite arbitrary and unexpected limitation to me – e.g. I would
 have to add special cases to all my Thrift map handling code.

 In case this is not on purpose, I'd be happy to put together a pull
 request.

 David

The limitation is apparently introduced in 0.68* (!), but the changelog doesn't show why this is added either. Perhaps it was because at that time the type was known as 'bit'. If that's the case, I believe such limitation is no longer applicable. *: https://github.com/D-Programming-Language/dmd/commit/42b868c31085211cbb8838325b72d7522f9164d2#L6R1544
May 28 2011
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"David Nadlinger" <see klickverbot.at> wrote in message 
news:irrd5h$hdj$1 digitalmars.com...
 Hello all,

 is there a reason bool isn't allowed as key type for associative arrays 
 (mtype.c:3995)? The docs don't mention this.

 bool-key AAs probably wouldn't be too useful in practice, but it seems 
 like a quite arbitrary and unexpected limitation to me - e.g. I would have 
 to add special cases to all my Thrift map handling code.

 In case this is not on purpose, I'd be happy to put together a pull 
 request.

It works for me. I use it all the time as a hash set. What version are you on? I've been using D2, maybe it's a D1 issue? If so, I'd file a bug report.
May 29 2011
next sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
Am 29.05.2011 22:21, schrieb Nick Sabalausky:
 "David Nadlinger" <see klickverbot.at> wrote in message 
 news:irrd5h$hdj$1 digitalmars.com...
 Hello all,

 is there a reason bool isn't allowed as key type for associative arrays 
 (mtype.c:3995)? The docs don't mention this.

 bool-key AAs probably wouldn't be too useful in practice, but it seems 
 like a quite arbitrary and unexpected limitation to me - e.g. I would have 
 to add special cases to all my Thrift map handling code.

 In case this is not on purpose, I'd be happy to put together a pull 
 request.

It works for me. I use it all the time as a hash set. What version are you on? I've been using D2, maybe it's a D1 issue? If so, I'd file a bug report.

string[bool] map; } test.d(2): Error: can't have associative array key of bool
May 29 2011
prev sibling next sibling parent reply David Nadlinger <see klickverbot.at> writes:
On 5/29/11 10:21 PM, Nick Sabalausky wrote:
 "David Nadlinger"<see klickverbot.at>  wrote in message
 news:irrd5h$hdj$1 digitalmars.com...
 […]
 is there a reason bool isn't allowed as key type for associative arrays
 (mtype.c:3995)? The docs don't mention this. […]

on? I've been using D2, maybe it's a D1 issue? If so, I'd file a bug report.

Are you sure you are using bool _keys_, not values? On a related note, https://github.com/D-Programming-Language/dmd/pull/77 seems to be all required for making them work. By the way, do you happen to have a complete hash set implementation lying around? I'd be interested in preparing and proposing one for std.container, as I need one for Thrift (currently, I have a really, REALLY preliminary and buggy one based on a void[0] AA, see https://github.com/klickverbot/thrift/blob/d-gsoc/lib/d/src/thrift/hashset.d). David
May 29 2011
next sibling parent "Nick Sabalausky" <a a.a> writes:
"David Nadlinger" <see klickverbot.at> wrote in message 
news:irubao$gpu$1 digitalmars.com...
 On 5/29/11 10:21 PM, Nick Sabalausky wrote:
 "David Nadlinger"<see klickverbot.at>  wrote in message
 news:irrd5h$hdj$1 digitalmars.com...
 [.]
 is there a reason bool isn't allowed as key type for associative arrays
 (mtype.c:3995)? The docs don't mention this. [.]

you on? I've been using D2, maybe it's a D1 issue? If so, I'd file a bug report.

Are you sure you are using bool _keys_, not values? On a related note, https://github.com/D-Programming-Language/dmd/pull/77 seems to be all required for making them work.

Shit, you're right. Not thinking here...
May 29 2011
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Steven Schveighoffer wrote:
 There is always dcollections (has both Hash- and TreeSet).  It is boost
 1.0, so feel free to steal anything to propose for std.container (in fact,
 RedBlackTree is from dcollections).  Note that I'm nowhere near an expert
 on hashing, so I'm not sure how it will perform against AAs.  I know it
 does beat them using my custom allocator, but that's not a truly fair
 comparison -- AA's could be written with a custom allocator too.

 -Steve

Well, how? Since an 'in' expression returns an internal pointer into the AA for value types, they cannot do better than to rely on GC. This rules out heavy AA use for real-time applications. I think AAs are broken. Timon
Jun 02 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
 On Thu, 02 Jun 2011 06:21:28 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 Steven Schveighoffer wrote:
 There is always dcollections (has both Hash- and TreeSet).  It is boost
 1.0, so feel free to steal anything to propose for std.container (in
 fact,
 RedBlackTree is from dcollections).  Note that I'm nowhere near an
 expert
 on hashing, so I'm not sure how it will perform against AAs.  I know it
 does beat them using my custom allocator, but that's not a truly fair
 comparison -- AA's could be written with a custom allocator too.

 -Steve

Well, how? Since an 'in' expression returns an internal pointer into the AA for value types, they cannot do better than to rely on GC. This rules out heavy AA use for real-time applications. I think AAs are broken.

What is the issue here? Are you saying you can use the pointer after destroying the element, because you can't. AA's free an element when it is removed.

Huh? import std.stdio,core.memory; int* test(){ int[int] aa=[0:0,1:1,2:2,3:3,4:4,5:5]; return 1 in aa; } void main(){ int* u=test(); GC.collect(); assert(*u==1); /*use u for fun and profit*/ }
 BTW, dcollections' custom allocator does use the GC, it just uses it in
 bulk instead of allocating for each node.

 -Steve

If AAs would do that, the whole bulk could be pinned by one pointer to one element inside it, which is still not optimal. Returning internal pointers into a data structure is always a bad idea (assuming it must be safe). Timon
Jun 02 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
Steven Schveighoffer wrote:
 That is not a removal:

So, destroying the container does not free it's elements, but removing one element does free it?
 int[int] aa=[0:0,1:1,2:2,3:3,4:4,5:5];
 auto ret = 1 in aa;
 aa.remove(1);
 return ret; // ret now points to free'd memory

Weeh! It's a dangling pointer built right into the language =)! Is the memory really free'd? Because: import std.stdio,core.memory; int *test(){ int[int] aa=[0:0,1:1,2:2,3:3,4:4,5:5]; int* ret=1 in aa; aa.remove(1);//int* tmp=ret;clear(tmp);//replace line with this to get segfault in main return ret; } void main(){ int *u=test(); assert(*u==1); } I do not get what is happening here, can you help?
 Your equivalent code using dcollections' custom allocator would work too,
 the GC backs up the custom allocator.

That implies that dcollections' hash map performs poorly in real-time constrained environments too, right?
 BTW, dcollections' custom allocator does use the GC, it just uses it in
 bulk instead of allocating for each node.

 -Steve

If AAs would do that, the whole bulk could be pinned by one pointer to one element inside it, which is still not optimal.

No, but this is a compromise -- better performance vs. smaller footprint. It would be nice if the GC just performed better.

I agree. But GC runtime has to be at least linear in number of alive and garbage objects. It is inherently quite slow.
 Returning internal pointers into a data
 structure is always a bad idea (assuming it must be safe).

Then ranges on a container are not possible (a range contains a pointer to the next element). Ranges can be invalidated using a "modification" count, but this removes some valid uses, and it also costs quite a bit (extra int to store range's copy of count, extra check on every range function to ensure counts are synchronized, possible extra pointer to store in range to point at owner container). -Steve

Well, it is possible unless you want to represent those ranges as pairs of raw pointers. Raw pointers have not enough type information to be useful as range representation. Even if your ranges can get invalidated, I would not pick a raw pointer as representation. C++ iterators do not leak any pointers to the internal representation of the data structure to the user afaik. &a[123] does on a std::vector, but Ds operator overloading explicitly allows to prevent this. Therefore there _must_ be some merit to the idea. Aren't 'in' expressions returning pointers just a hack to prevent multiple lookups of the same data? I do not think this optimization is worth all the GC overhead (and dangling pointer issues?). Timon
Jun 02 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 29 May 2011 16:41:56 -0400, David Nadlinger <see klickverbot.at>  
wrote:

 On 5/29/11 10:21 PM, Nick Sabalausky wrote:
 "David Nadlinger"<see klickverbot.at>  wrote in message
 news:irrd5h$hdj$1 digitalmars.com...
 […]
 is there a reason bool isn't allowed as key type for associative arrays
 (mtype.c:3995)? The docs don't mention this. […]

you on? I've been using D2, maybe it's a D1 issue? If so, I'd file a bug report.

Are you sure you are using bool _keys_, not values? On a related note, https://github.com/D-Programming-Language/dmd/pull/77 seems to be all required for making them work. By the way, do you happen to have a complete hash set implementation lying around? I'd be interested in preparing and proposing one for std.container, as I need one for Thrift (currently, I have a really, REALLY preliminary and buggy one based on a void[0] AA, see https://github.com/klickverbot/thrift/blob/d-gsoc/lib/d/src/thrift/hashset.d).

There is always dcollections (has both Hash- and TreeSet). It is boost 1.0, so feel free to steal anything to propose for std.container (in fact, RedBlackTree is from dcollections). Note that I'm nowhere near an expert on hashing, so I'm not sure how it will perform against AAs. I know it does beat them using my custom allocator, but that's not a truly fair comparison -- AA's could be written with a custom allocator too. -Steve
Jun 01 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 02 Jun 2011 06:21:28 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 Steven Schveighoffer wrote:
 There is always dcollections (has both Hash- and TreeSet).  It is boost
 1.0, so feel free to steal anything to propose for std.container (in  
 fact,
 RedBlackTree is from dcollections).  Note that I'm nowhere near an  
 expert
 on hashing, so I'm not sure how it will perform against AAs.  I know it
 does beat them using my custom allocator, but that's not a truly fair
 comparison -- AA's could be written with a custom allocator too.

 -Steve

Well, how? Since an 'in' expression returns an internal pointer into the AA for value types, they cannot do better than to rely on GC. This rules out heavy AA use for real-time applications. I think AAs are broken.

What is the issue here? Are you saying you can use the pointer after destroying the element, because you can't. AA's free an element when it is removed. BTW, dcollections' custom allocator does use the GC, it just uses it in bulk instead of allocating for each node. -Steve
Jun 02 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 02 Jun 2011 06:44:33 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On Thu, 02 Jun 2011 06:21:28 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 Steven Schveighoffer wrote:
 There is always dcollections (has both Hash- and TreeSet).  It is  
 boost
 1.0, so feel free to steal anything to propose for std.container (in
 fact,
 RedBlackTree is from dcollections).  Note that I'm nowhere near an
 expert
 on hashing, so I'm not sure how it will perform against AAs.  I know  
 it
 does beat them using my custom allocator, but that's not a truly fair
 comparison -- AA's could be written with a custom allocator too.

 -Steve

Well, how? Since an 'in' expression returns an internal pointer into the AA for value types, they cannot do better than to rely on GC. This rules out heavy AA use for real-time applications. I think AAs are broken.

What is the issue here? Are you saying you can use the pointer after destroying the element, because you can't. AA's free an element when it is removed.

Huh? import std.stdio,core.memory; int* test(){ int[int] aa=[0:0,1:1,2:2,3:3,4:4,5:5]; return 1 in aa; } void main(){ int* u=test(); GC.collect(); assert(*u==1); /*use u for fun and profit*/ }

That is not a removal: int[int] aa=[0:0,1:1,2:2,3:3,4:4,5:5]; auto ret = 1 in aa; aa.remove(1); return ret; // ret now points to free'd memory Your equivalent code using dcollections' custom allocator would work too, the GC backs up the custom allocator.
 BTW, dcollections' custom allocator does use the GC, it just uses it in
 bulk instead of allocating for each node.

 -Steve

If AAs would do that, the whole bulk could be pinned by one pointer to one element inside it, which is still not optimal.

No, but this is a compromise -- better performance vs. smaller footprint. It would be nice if the GC just performed better.
 Returning internal pointers into a data
 structure is always a bad idea (assuming it must be safe).

Then ranges on a container are not possible (a range contains a pointer to the next element). Ranges can be invalidated using a "modification" count, but this removes some valid uses, and it also costs quite a bit (extra int to store range's copy of count, extra check on every range function to ensure counts are synchronized, possible extra pointer to store in range to point at owner container). -Steve
Jun 02 2011