www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - Hash Tables in D

reply Walter Bright <newshound2 digitalmars.com> writes:
http://minas-mina.com/2016/01/01/associative-arrays/

https://www.reddit.com/r/programming/comments/3z03ji/hash_tables_in_the_d_programming_language/
Jan 01
next sibling parent reply Minas Mina <minas_0 hotmail.co.uk> writes:
On Friday, 1 January 2016 at 13:59:35 UTC, Walter Bright wrote:
 http://minas-mina.com/2016/01/01/associative-arrays/

 https://www.reddit.com/r/programming/comments/3z03ji/hash_tables_in_the_d_programming_language/
Thanks for sharing this. I am the author. :)
Jan 01
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/1/2016 7:27 AM, Minas Mina wrote:
 On Friday, 1 January 2016 at 13:59:35 UTC, Walter Bright wrote:
 http://minas-mina.com/2016/01/01/associative-arrays/

 https://www.reddit.com/r/programming/comments/3z03ji/hash_tables_in_the_d_programming_language/
Thanks for sharing this. I am the author. :)
You're welcome, and thanks for writing the nice article.
Jan 01
prev sibling parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 01/01/2016 04:27 PM, Minas Mina wrote:
 On Friday, 1 January 2016 at 13:59:35 UTC, Walter Bright wrote:
 http://minas-mina.com/2016/01/01/associative-arrays/

 https://www.reddit.com/r/programming/comments/3z03ji/hash_tables_in_the_d_programming_language/
Thanks for sharing this. I am the author. :)
There is a bug. You should never do this b/c of iterator/range invalidation. foreach (key; aa.keys) aa.remove(key);
Jan 03
next sibling parent reply Minas Mina <minas_0 hotmail.co.uk> writes:
On Sunday, 3 January 2016 at 19:29:05 UTC, Martin Nowak wrote:
 On 01/01/2016 04:27 PM, Minas Mina wrote:
 On Friday, 1 January 2016 at 13:59:35 UTC, Walter Bright wrote:
 http://minas-mina.com/2016/01/01/associative-arrays/

 https://www.reddit.com/r/programming/comments/3z03ji/hash_tables_in_the_d_programming_language/
Thanks for sharing this. I am the author. :)
There is a bug. You should never do this b/c of iterator/range invalidation. foreach (key; aa.keys) aa.remove(key);
The reference states that keys: "Returns dynamic array, the elements of which are the keys in the associative array". Isn't the array newly allocated?
Jan 03
parent reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Monday, 4 January 2016 at 07:09:30 UTC, Minas Mina wrote:
 On Sunday, 3 January 2016 at 19:29:05 UTC, Martin Nowak wrote:
 There is a bug.

 You should never do this b/c of iterator/range invalidation.

 foreach (key; aa.keys)
     aa.remove(key);
The reference states that keys: "Returns dynamic array, the elements of which are the keys in the associative array". Isn't the array newly allocated?
Hi, You are right:
 void main()
 {
 	import std.stdio;
 	int[int] aa;
 	foreach (i; 0..9)
 		aa[i] = i * 10;
 	writeln(aa); // [0:0, 6:60, 7:70, 2:20, 3:30, 1:10, 8:80, 
 5:50, 4:40]
 
 	foreach (key; aa.keys)
 		aa.remove(key);
 	writeln(aa); // []
 }
This would be a bug (segfault on my machine):
 	foreach (key; aa.byKey)
 		aa.remove(key);
Note that, in this example, there is no need to remove every element separately, you can also just do
 void main()
 {
 	import std.stdio;
 	int[int] aa;
 	foreach (i; 0..9)
 		aa[i] = i * 10;
 	writeln(aa); // [0:0, 6:60, 7:70, 2:20, 3:30, 1:10, 8:80, 
 5:50, 4:40]
 
 	aa = null;
 	writeln(aa); // []
 }
Bastiaan.
Jan 04
parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 01/04/2016 09:06 AM, Bastiaan Veelo wrote:
 
 This would be a bug (segfault on my machine):
 
     foreach (key; aa.byKey)
         aa.remove(key);
Note that, in this example, there is no need to remove every element separately, you can also just do
Sorry my mistake, I never use aa.keys (instead of aa.byKey) b/c it allocates. So it's still sort of a bug to recommend people allocating an array ;).
Jan 04
parent reply Minas Mina <minas_0 hotmail.co.uk> writes:
On Monday, 4 January 2016 at 19:58:03 UTC, Martin Nowak wrote:
 On 01/04/2016 09:06 AM, Bastiaan Veelo wrote:
 
 This would be a bug (segfault on my machine):
 
     foreach (key; aa.byKey)
         aa.remove(key);
Note that, in this example, there is no need to remove every element separately, you can also just do
Sorry my mistake, I never use aa.keys (instead of aa.byKey) b/c it allocates. So it's still sort of a bug to recommend people allocating an array ;).
I haven't found a way to clear an AA without allocating though. (Creating a new one doesn't count).
Jan 05
parent reply Jacob Carlborg <doob me.com> writes:
On 2016-01-05 11:19, Minas Mina wrote:

 I haven't found a way to clear an AA without allocating though.
 (Creating a new one doesn't count).
Set it to "null", or will that allocate a new one? -- /Jacob Carlborg
Jan 05
parent reply Minas Mina <minas_0 hotmail.co.uk> writes:
On Tuesday, 5 January 2016 at 13:36:48 UTC, Jacob Carlborg wrote:
 On 2016-01-05 11:19, Minas Mina wrote:

 I haven't found a way to clear an AA without allocating though.
 (Creating a new one doesn't count).
Set it to "null", or will that allocate a new one?
It won't, but to use it again you need to allocate a new one (If I'm not mistaken).
Jan 05
parent reply Jacob Carlborg <doob me.com> writes:
On 2016-01-05 15:44, Minas Mina wrote:

 It won't, but to use it again you need to allocate a new one (If I'm not
 mistaken).
Not explicitly. I don't know if the runtime allocates a new one. This works: void main() { auto foo = ["foo" : 1]; foo = null; foo["bar"] = 2; assert(foo["bar"] == 2); } -- /Jacob Carlborg
Jan 06
parent reply Minas Mina <minas_0 hotmail.co.uk> writes:
On Wednesday, 6 January 2016 at 12:19:45 UTC, Jacob Carlborg 
wrote:
 On 2016-01-05 15:44, Minas Mina wrote:

 It won't, but to use it again you need to allocate a new one 
 (If I'm not
 mistaken).
Not explicitly. I don't know if the runtime allocates a new one. This works: void main() { auto foo = ["foo" : 1]; foo = null; foo["bar"] = 2; assert(foo["bar"] == 2); }
I believe it does, check out this example: import std.stdio; class C { int[int] squares; } void main() { auto squares = [0 : 0, 1 : 1]; C c = new C(); c.squares = squares; writeln(c.squares is squares); // true squares = null; squares[10] = 100; writeln(c.squares is squares); // false } If the runtime used the same underlying memory, the second writeln() would print true, right? So if I am correct, a new AA is allocated.
Jan 06
parent Rory McGuire via Digitalmars-d-announce writes:
On 06 Jan 2016 3:25 PM, "Minas Mina via Digitalmars-d-announce" <
digitalmars-d-announce puremagic.com> wrote:
 On Wednesday, 6 January 2016 at 12:19:45 UTC, Jacob Carlborg wrote:
 On 2016-01-05 15:44, Minas Mina wrote:

 It won't, but to use it again you need to allocate a new one (If I'm not
 mistaken).
Not explicitly. I don't know if the runtime allocates a new one. This
works:
 void main()
 {
     auto foo = ["foo" : 1];
     foo = null;
     foo["bar"] = 2;
     assert(foo["bar"] == 2);
 }
I believe it does, check out this example: import std.stdio; class C { int[int] squares; } void main() { auto squares = [0 : 0, 1 : 1]; C c = new C(); c.squares = squares; writeln(c.squares is squares); // true squares = null; squares[10] = 100; writeln(c.squares is squares); // false } If the runtime used the same underlying memory, the second writeln()
would print true, right?
 So if I am correct, a new AA is allocated.
Probably depends on the current implementation. If you are using an associative array you are going to be allocating at least a little. If you used an associative array backed by two arrays you could allocate and reuse memory when null is assigned. It would also be able to keep its insertion order.
Jan 06
prev sibling parent reply Adrian Matoga <epi atari8.info> writes:
On Sunday, 3 January 2016 at 19:29:05 UTC, Martin Nowak wrote:
 There is a bug.

 You should never do this b/c of iterator/range invalidation.

 foreach (key; aa.keys)
     aa.remove(key);
I've recently hit this when trying to remove some of the elements from an AA while iterating over it. It's easy to forget that it is not allowed, especially when working on more complex algorithms. Accidentally, it worked all fine even with large data sets with DMD 2.069, but with GDC mysterious segfaults appeared - the first thought in such case was obviously a bug in GDC or in the older front-end. However, this is a very convenient, natural and intuitive syntax for something that is needed quite frequently, yet this code breaks silently in non-predictable ways. There are already registered issues concerning this (or similar with insertion) problem, including: https://issues.dlang.org/show_bug.cgi?id=4179 https://issues.dlang.org/show_bug.cgi?id=2255 https://issues.dlang.org/show_bug.cgi?id=10821 https://issues.dlang.org/show_bug.cgi?id=10876 https://issues.dlang.org/show_bug.cgi?id=13903 I've personally encountered segfaults, infinite loops, programs producing incorrect results. Some solutions to this were proposed in the bug reports - either: a) explicitly allow removing during iteration, and make sure it always works, or b) explicitly disallow removing during iteration, and always throw an Error whenever it is attempted. In some cases it could even be detectable in compile time. And I don't mean including a single sentence about this somewhere in the docs. I mean an error reported by the compiler and/or runtime. The runtime checks could be disabled for -release, but there should be at least an option to detect such errors. Probably the worst part of it is that you're free to wrap it in safe, while it's not safe at all.
Jan 07
next sibling parent Marc =?UTF-8?B?U2Now7x0eg==?= <schuetzm gmx.net> writes:
On Thursday, 7 January 2016 at 12:26:29 UTC, Adrian Matoga wrote:
 b) explicitly disallow removing during iteration, and always 
 throw an Error whenever it is attempted. In some cases it could 
 even be detectable in compile time.

 And I don't mean including a single sentence about this 
 somewhere in the docs. I mean an error reported by the compiler 
 and/or runtime. The runtime checks could be disabled for 
 -release, but there should be at least an option to detect such 
 errors.
We don't have the means to do that currently, neither at compile-time, nor at runtime. Both would require some kind of borrowing detection, either to make the AA `const` during iteration, or to set and reset a flag indicating that it's being iterated, allowing it to assert() at runtime.
 Probably the worst part of it is that you're free to wrap it in 
  safe, while it's not safe at all.
That's something we can change right now. I guess `remove` must be ` system`.
Jan 07
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 1/7/16 7:26 AM, Adrian Matoga wrote:
 On Sunday, 3 January 2016 at 19:29:05 UTC, Martin Nowak wrote:
 There is a bug.

 You should never do this b/c of iterator/range invalidation.

 foreach (key; aa.keys)
     aa.remove(key);
I've recently hit this when trying to remove some of the elements from an AA while iterating over it. It's easy to forget that it is not allowed, especially when working on more complex algorithms. Accidentally, it worked all fine even with large data sets with DMD 2.069, but with GDC mysterious segfaults appeared - the first thought in such case was obviously a bug in GDC or in the older front-end. However, this is a very convenient, natural and intuitive syntax for something that is needed quite frequently, yet this code breaks silently in non-predictable ways. There are already registered issues concerning this (or similar with insertion) problem, including: https://issues.dlang.org/show_bug.cgi?id=4179 https://issues.dlang.org/show_bug.cgi?id=2255 https://issues.dlang.org/show_bug.cgi?id=10821 https://issues.dlang.org/show_bug.cgi?id=10876 https://issues.dlang.org/show_bug.cgi?id=13903 I've personally encountered segfaults, infinite loops, programs producing incorrect results. Some solutions to this were proposed in the bug reports - either: a) explicitly allow removing during iteration, and make sure it always works, or b) explicitly disallow removing during iteration, and always throw an Error whenever it is attempted. In some cases it could even be detectable in compile time. And I don't mean including a single sentence about this somewhere in the docs. I mean an error reported by the compiler and/or runtime. The runtime checks could be disabled for -release, but there should be at least an option to detect such errors. Probably the worst part of it is that you're free to wrap it in safe, while it's not safe at all.
With dcollections [1] I had a feature called "purging" where you would iterate over a collection like this: foreach(ref bool removeIt, int val; collection) { if(someCondition) removeIt = true; } The entire reason for this is because you can do something similar in C++ containers, and I found it absolutely annoying that the existing container classes don't allow this. A very frequent use of containers is to iterate through picking which ones should be removed (think of a cache) -Steve [1] http://schveiguy.github.io/dcollections/
Jan 07
parent Jacob Carlborg <doob me.com> writes:
On 2016-01-07 15:11, Steven Schveighoffer wrote:

 With dcollections [1] I had a feature called "purging" where you would
 iterate over a collection like this:

 foreach(ref bool removeIt, int val; collection)
 {
     if(someCondition) removeIt = true;
 }

 The entire reason for this is because you can do something similar in
 C++ containers, and I found it absolutely annoying that the existing
 container classes don't allow this. A very frequent use of containers is
 to iterate through picking which ones should be removed (think of a cache)
In Ruby that's called "reject", i.e. the opposite of "filter". -- /Jacob Carlborg
Jan 07
prev sibling parent Andre Polykanine via Digitalmars-d-announce writes:
WBvDda> http://minas-mina.com/2016/01/01/associative-arrays/

This article translated into Russian:
http://habrahabr.ru/post/274723/


-- 
With best regards from Ukraine,
Andre
Skype: Francophile
Twitter:  m_elensule; Facebook: menelion
Jan 09