www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - How to use sets in D?

reply tastyminerals <tastyminerals gmail.com> writes:
https://forum.dlang.org/post/mailman.1072.1581112984.31109.digitalmars-d-learn puremagic.com

On Friday, 7 February 2020 at 22:03:00 UTC, H. S. Teoh wrote:
 On Fri, Feb 07, 2020 at 07:37:08PM +0000, mark via 
 Digitalmars-d-learn wrote:
 [...]
bool[E] works just fine. The bool does take up 1 byte, though, so if you're *really* want to optimize that away, you could do this: alias Unit = void[0]; enum unit = Unit.init; // Look, ma! A bona fide set! Unit[E] mySet; mySet[...] = unit; mySet.remove(...); ... // etc. Or you can wrap void[0][E] in a nice user-defined type that gives nice set-like syntax. But IMO, this is all overkill, and adds needless complexity. Just use bool[E] or std.container.rbtree. :-D T
Can you please explain what does `bool[E]` mean? And how does the code with aliasing void[0] and then using enum even work?
Feb 08 2022
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 08, 2022 at 09:08:47PM +0000, tastyminerals via Digitalmars-d-learn
wrote:
[...]
 bool[E] works just fine.
 
 The bool does take up 1 byte, though, so if you're *really* want to
 optimize that away, you could do this:
 
 	alias Unit = void[0];
 	enum unit = Unit.init;
 
 	// Look, ma! A bona fide set!
 	Unit[E] mySet;
 
 	mySet[...] = unit;
 	mySet.remove(...);
 	... // etc.
 
 Or you can wrap void[0][E] in a nice user-defined type that gives
 nice set-like syntax.  But IMO, this is all overkill, and adds
 needless complexity. Just use bool[E] or std.container.rbtree. :-D
[...]
 Can you please explain what does `bool[E]` mean?
E is whatever key type you want to use. String, or integer ID, or whatever you want to put in your set.
 And how does the code with aliasing void[0] and then using enum even
 work?
The alias is a workaround for a syntactic limitation in D, because you cannot write `mySet[someKey] = void[0].init;` without a syntax error. So we use the alias to give `void[0]` a name that we can refer to in assignment statements, so that we can assign its values to the AA. As for `void[0]` itself, it's just a zero-element static array of an indeterminate type. The whole point is to create something that occupies 0 bytes. The actual type doesn't actually matter; you could have used int[0] and it'd work too. (You can't use an empty struct because empty structs have size 1.) By using a zero-sized value type for the AA, we effectively turn it into a set: it only stores keys. Well, technically it stores values of zero size too, but since it's zero-sized, it's as if the value isn't there, and it incurs zero memory overhead. (Whereas an empty struct would likely occupy an entire machine word, rounded up from size 1.) But as I said, this is overkill for something so trivial. Using `bool[E]` or an RBTree works just fine. T -- Javascript is what you use to allow third party programs you don't know anything about and doing you know not what to run on your computer. -- Charles Hixson
Feb 08 2022
next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 8 February 2022 at 21:42:06 UTC, H. S. Teoh wrote:
 But as I said, this is overkill for something so trivial. Using 
 `bool[E]` or an RBTree works just fine.
Is the current implementation of associative arrays in D language resistant to Denial of Service hash collision attacks? * https://github.com/dlang/dmd/blob/master/src/dmd/root/aav.d * https://arstechnica.com/information-technology/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/ * https://lwn.net/Articles/474912/ * https://codeforces.com/blog/entry/62393 If not, then redBlackTree may be safer to use when dealing with potentially adversarially constructed input data.
Feb 09 2022
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Wednesday, 9 February 2022 at 21:05:47 UTC, Siarhei Siamashka 
wrote:
 Is the current implementation of associative arrays in D 
 language resistant to Denial of Service hash collision attacks?
Answering to myself. No, it isn't. Here's a simple example: ```D import std, core.time; const ulong n = 100000; // Count the number of unique elements using associative array ulong count_unique_values_assoc(const ulong[] a) { bool[ulong] set; foreach (x ; a) set[x] = true; return set.length; } // Count the number of unique elements using redBlackTree ulong count_unique_values_rbtree(const ulong[] a) { auto set = redBlackTree!("a<b", false, ulong)(); foreach (x ; a) set.insert(x); return set.length; } // Generate array, which triggers hash collisions in the // current D language associative array implementation ulong[] generate_bad_array(ulong n) { return iota(n).map!(x => (x << 46)).array; } // Benchmark associative array vs. rbtree void run_benchmark(ulong[] a) { auto t1 = MonoTime.currTime; auto result_assoc = a.count_unique_values_assoc; auto t2 = MonoTime.currTime; auto result_rbtree = a.count_unique_values_rbtree; auto t3 = MonoTime.currTime; assert(result_assoc == result_rbtree); writefln("n=%d, unique_elements=%d, assoc time: %d ms, rbtree time: %d ms", a.length, result_assoc, (t2 - t1).total!"msecs", (t3 - t2).total!"msecs"); } void main() { assert([1UL, 1UL, 2UL].count_unique_values_assoc == 2); assert([1UL, 1UL, 2UL].count_unique_values_rbtree == 2); writefln("== consecutive numbers =="); n.iota.array.run_benchmark; writefln("\n== random numbers between 0 and 99 =="); n.iota.map!(_ => uniform(0UL, 100UL)).array.run_benchmark; writefln("\n== adversarially constructed array =="); n.generate_bad_array.run_benchmark; } ``` Benchmark results using a 64-bit LDC compiler: ``` == consecutive numbers == n=100000, unique_elements=100000, assoc time: 11 ms, rbtree time: 20 ms == random numbers between 0 and 99 == n=100000, unique_elements=100, assoc time: 2 ms, rbtree time: 4 ms == adversarially constructed array == n=100000, unique_elements=100000, assoc time: 22626 ms, rbtree time: 15 ms ```
Feb 09 2022
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Feb 10, 2022 at 12:53:08AM +0000, Siarhei Siamashka via
Digitalmars-d-learn wrote:
 On Wednesday, 9 February 2022 at 21:05:47 UTC, Siarhei Siamashka wrote:
 Is the current implementation of associative arrays in D language
 resistant to Denial of Service hash collision attacks?
Answering to myself. No, it isn't. Here's a simple example:
[...]
 ulong[] generate_bad_array(ulong n)
 {
     return iota(n).map!(x => (x << 46)).array;
 }
[...]
 == adversarially constructed array ==
 n=100000, unique_elements=100000, assoc time: 22626 ms, rbtree time: 15 ms
 ```
Nice investigation! This should be filed as a bug. I remember browsing through various hash functions in druntime before (built-in AA's use per-type hash functions), and many of them were trivial, which would imply that it's trivial to construct adversarial input that triggers a large number of collisions for various basic types. T -- Give me some fresh salted fish, please.
Feb 09 2022
prev sibling parent tastyminerals <tastyminerals gmail.com> writes:
On Tuesday, 8 February 2022 at 21:42:06 UTC, H. S. Teoh wrote:
 On Tue, Feb 08, 2022 at 09:08:47PM +0000, tastyminerals via 
 Digitalmars-d-learn wrote: [...]
 [...]
[...]
 [...]
E is whatever key type you want to use. String, or integer ID, or whatever you want to put in your set. [...]
Thank you! That's neat.
Feb 10 2022
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 8 February 2022 at 21:08:47 UTC, tastyminerals wrote:
 https://forum.dlang.org/post/mailman.1072.1581112984.31109.digitalmars-d-learn puremagic.com

 On Friday, 7 February 2020 at 22:03:00 UTC, H. S. Teoh wrote:
 On Fri, Feb 07, 2020 at 07:37:08PM +0000, mark via 
 Digitalmars-d-learn wrote:
 [...]
bool[E] works just fine. The bool does take up 1 byte, though, so if you're *really* want to optimize that away, you could do this: alias Unit = void[0]; enum unit = Unit.init; // Look, ma! A bona fide set! Unit[E] mySet; mySet[...] = unit; mySet.remove(...); ... // etc.
[...]
 Can you please explain what does `bool[E]` mean? And how does 
 the code with aliasing void[0] and then using enum even work?
`bool[E]` means an [associative array][1] with keys of type `E` and values of type `bool`. Since a `bool` value takes up 1 byte in memory, using a `bool[E]` associative array to store a set of `E` values means that for every value in your set, you will have to allocate an extra 1 byte for the `bool` in addition to the bytes required for the `E` itself. In most cases, 1 extra byte is not going to matter very much, so that's not a big deal. But if you really, really want to avoid allocating any extra bytes, you can use `void[0]` in place of `bool`, since a `void[0]` takes up 0 bytes in memory. (The `void` part has no special significance here--you could also use `int[0]` or `char[0]`, for example.) The `alias` and the `enum` just make the code a little nicer to read by letting you write `Unit` instead of `void[0]` and `unit` instead of `void[0].init`. You could get rid of them and the code would work exactly the same way; it'd just be a little bit uglier: void[0][E] mySet; mySet[...] = void[0].init; mySet.remove(...); // etc. The name "unit" comes from the concept of a [unit type][2] in theoretical computer science. [1]: https://dlang.org/spec/hash-map.html [2]: https://en.wikipedia.org/wiki/Unit_type
Feb 08 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 08, 2022 at 09:47:13PM +0000, Paul Backus via Digitalmars-d-learn
wrote:
[...]
 The `alias` and the `enum` just make the code a little nicer to read
 by letting you write `Unit` instead of `void[0]` and `unit` instead of
 `void[0].init`. You could get rid of them and the code would work
 exactly the same way; it'd just be a little bit uglier:
 
     void[0][E] mySet;
 
     mySet[...] = void[0].init;
[...] Unfortunately, this doesn't work due to a syntax restriction (the parser isn't expecting a type name after the `=`, and will raise a syntax error). So the alias is in fact necessary. T -- Fact is stranger than fiction.
Feb 08 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 8 February 2022 at 21:58:49 UTC, H. S. Teoh wrote:
 On Tue, Feb 08, 2022 at 09:47:13PM +0000, Paul Backus via 
 Digitalmars-d-learn wrote: [...]
 The `alias` and the `enum` just make the code a little nicer 
 to read by letting you write `Unit` instead of `void[0]` and 
 `unit` instead of `void[0].init`. You could get rid of them 
 and the code would work exactly the same way; it'd just be a 
 little bit uglier:
 
     void[0][E] mySet;
 
     mySet[...] = void[0].init;
[...] Unfortunately, this doesn't work due to a syntax restriction (the parser isn't expecting a type name after the `=`, and will raise a syntax error). So the alias is in fact necessary.
Ah, yeah, I forgot about that bug. It also works if you use parentheses: mySet[...] = (void[0]).init;
Feb 08 2022
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 08, 2022 at 10:02:30PM +0000, Paul Backus via Digitalmars-d-learn
wrote:
[...]
 It also works if you use parentheses:
 
     mySet[...] = (void[0]).init;
OT1H, nice that works! ... but OTOH, ugh, it looks so ugly. :-P I think I'll just stick with the alias. T -- Who told you to swim in Crocodile Lake without life insurance??
Feb 08 2022