www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Need a collection of unique elements (like C++ std::set).

reply "Cooler" <kulkin hotbox.ru> writes:
Example in C++:
   set<int> uniqueInts;
   assert(uniqueInts.count(99) == 0);
   uniqueInts.insert(99);
   assert(uniqueInts.count(99) == 1);
   uniqueInts.erase(99);
   assert(uniqueInts.count(99) == 0);

Which it will be analogue solution in D?

I need a collection that can hold number of items, and can tell 
me weather is an item in the collection or not. I found that 
RedBlackTree can help me. But RedBlackTree uses O(log(n)) time 
for insert/remove operations. On the other hand we have built-in 
associative arrays with hashed keys. But associative arrays 
requires pairs of (key, value), which is quite wasting in my case.
May be it will be convenient to allow void[key] built-in 
collections?
Any suggestion?
Jan 01 2014
next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Wednesday, 1 January 2014 at 21:54:13 UTC, Cooler wrote:
 Example in C++:
   set<int> uniqueInts;
   assert(uniqueInts.count(99) == 0);
   uniqueInts.insert(99);
   assert(uniqueInts.count(99) == 1);
   uniqueInts.erase(99);
   assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can tell 
 me weather is an item in the collection or not. I found that 
 RedBlackTree can help me. But RedBlackTree uses O(log(n)) time 
 for insert/remove operations. On the other hand we have 
 built-in associative arrays with hashed keys. But associative 
 arrays requires pairs of (key, value), which is quite wasting 
 in my case.
 May be it will be convenient to allow void[key] built-in 
 collections?
 Any suggestion?

C++'s std::set is actually almost always implemented as a red-black tree so it's pretty much the exact same thing as D's RedBlackTree. void[key] is a commonly requested feature but so far no one has implemented it. In C++ the equivalent would be std::unordered_set.
Jan 01 2014
prev sibling next sibling parent reply =?UTF-8?B?UmFwaGHDq2wgSmFrc2U=?= <raphael.jakse gmail.com> writes:
Le 01/01/2014 22:54, Cooler a écrit :
 Example in C++:
    set<int> uniqueInts;
    assert(uniqueInts.count(99) == 0);
    uniqueInts.insert(99);
    assert(uniqueInts.count(99) == 1);
    uniqueInts.erase(99);
    assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can tell me
 weather is an item in the collection or not. I found that RedBlackTree
 can help me. But RedBlackTree uses O(log(n)) time for insert/remove
 operations. On the other hand we have built-in associative arrays with
 hashed keys. But associative arrays requires pairs of (key, value),
 which is quite wasting in my case.
 May be it will be convenient to allow void[key] built-in collections?
 Any suggestion?

Hello, I don't know if it fits with your needs, but I just pushed a library for handling sets I wrote months ago. It supports basic operations like adding, removing, testing the presence of an object in the set, intersection, union, powerset, difference, symmetric difference. Sets can be untyped (it accepts elements of any type) or typed (more efficient, accepts only elements of a given type like int). If I'm not mistaken, complexity of insertion, presence checking and removal is O(1). You can get it here: https://gitorious.org/set/set/ Feel free to ask question, make suggestions... if you have any. Example of code: import std.stdio; import std.conv; import set; void main() { auto S = new Set!(); S.add("hello"); S.add(42); writeln(S); writeln("has 42? ", S.contains(42)); writeln("has \"42\"? ", S.contains("42")); writeln("has \"hello\"? ", S.contains("hello")); S.remove("hello"); writeln(S); writeln("has \"hello\"? ", S.contains("hello")); writeln("address of 42 : ", S.addressOf(42)); auto E = new Set!int([1,2,3,4]); // it is possible to declare an empty set auto F = new Set!int([3,4,5,6]); // of int like this: auto G = new Set!int; writeln("union: ", E.Union(F)); writeln("inter: ", E.Inter(F)); writeln("symmetric difference: ", E.SymDiff(F)); writeln("minus: ", E.Minus(F)); writeln("Powerset: ", E.Powerset); S.UnionInPlace(E); writeln("Union in place: ", S); // lets iterate over elements: foreach (element ; S) { write(element); // beware, because S is an untyped set (Set!() or Set!Object), // type of element is Element. // If you want the real value, cast to Element!the_type_of_your_element // and access to the e property of the class. auto el = cast(Element!int)(element); write(" (", el.e, ") "); // el.e is an int // Note that this works only because the only remaining values in S // are integer values. If another type were present in the set, we // would experience a segmentation fault. } writeln(); } Expected output: {hello, 42} has 42? true has "42"? false has "hello"? true {42} has "hello"? false address of 42 : 7F9323979F60 union: {4, 1, 5, 2, 6, 3} inter: {4, 3} symmetric difference: {1, 5, 2, 6} minus: {1, 2} Powerset: {{}, {4}, {1, 3}, {4, 1, 3}, {1}, {4, 1}, {2, 3}, {4, 2, 3}, {2}, {4, 2}, {1, 2, 3}, {4, 1, 2, 3}, {1, 2}, {4, 1, 2}, {3}, {4, 3}} Union in place: {4, 1, 42, 2, 3} 4 (4) 1 (1) 42 (42) 2 (2) 3 (3) Should Phobos have something similar built in? Happy new year to everyone, Raphaël.
Jan 01 2014
next sibling parent reply =?UTF-8?B?UmFwaGHDq2wgSmFrc2U=?= <raphael.jakse gmail.com> writes:
Le 02/01/2014 01:44, bearophile a écrit :
 Raphaël Jakse:

         auto E = new Set!int([1,2,3,4]); // it is possible to declare
 an empty set

You can add a factory helper: auto e = set([1, 2, 3, 4]); Or even: auto e = set(1, 2, 3, 4);

I'm going to add this. Two questions: - By default, should set be typed or untyped? I would go for typed set, but what about people's expectations? - What is preferable: first version or second version of the proposed set factory function? (imho both cannot co-exist because of sets of arrays handling) * The first version allows creation of a set from a array. That is already possible with the constructor, but then you have to pass the type template, this is not beautiful. * The second version is more pretty. But then, we'll need a setFromArray function which makes a set from a list. I prefer the second option, but then, set([1, 2, 3, 4]) would make a set containing on element: an array of four ints. Could this be misleading?
         writeln("union: ", E.Union(F));
         writeln("inter: ", E.Inter(F));
         writeln("symmetric difference: ", E.SymDiff(F));
         writeln("minus: ", E.Minus(F));
         writeln("Powerset: ", E.Powerset);

In idiomatic D all the member functions start with a lowercase letter.

I prefer lowercase for first letters too, but there is the "union" member, which is a keyword, so I put them all in uppercase. Any idea?
 And a little operator overloading helps, look at the Python set() for a
 nice API.


             // beware, because S is an untyped set (Set!() or
 Set!Object),

Probably in idiomatic D you use templates, so Set!Object is a very uncommon usage, unlike Java.

What do you mean? about Set!Object: I needed a means to specify the type of elements in the Set. So I went with Set!the_type. Then, I needed a special type to make sets able to hold any value. I went for Object. void might have been nice, though it would probably have complicated things. I tried to make `auto S = new Set;` create a untyped set, but I didn't succeed. So again, what do you suggest?
 Should Phobos have something similar built in?

A set is a good thing to have in Phobos, but the API and design should be a little different from yours.

I'm eager to make my library close to "idiomatic D", so feel free to make any remark, suggestion for this or for anything else.
 Bye,
 bearophile

Jan 02 2014
parent Martin Nowak <code dawg.eu> writes:
On 01/02/2014 09:09 AM, Raphaël Jakse wrote:
 I'm going to add this.
 Two questions:
   - By default, should set be typed or untyped?
     I would go for typed set, but what about people's expectations?
   - What is preferable: first version or second version of the proposed
 set factory function? (imho both cannot co-exist because of sets of
 arrays handling)
      * The first version allows creation of a set from a array. That is
 already possible with the constructor, but then you have to pass the
 type template, this is not beautiful.
      * The second version is more pretty. But then, we'll need a
 setFromArray function which makes a set from a list.
     I prefer the second option, but then, set([1, 2, 3, 4]) would make a
 set containing on element: an array of four ints. Could this be misleading?

Use a typesafe variadic function. http://dlang.org/function.html#variadic https://d.puremagic.com/issues/show_bug.cgi?id=11657
Jan 02 2014
prev sibling parent =?UTF-8?B?UmFwaGHDq2wgSmFrc2U=?= <raphael.jakse gmail.com> writes:
Le 02/01/2014 16:00, Dejan Lekic a écrit :
 On Thursday, 2 January 2014 at 00:22:14 UTC, Raphaël Jakse wrote:
 Le 01/01/2014 22:54, Cooler a écrit :
 Example in C++:
   set<int> uniqueInts;
   assert(uniqueInts.count(99) == 0);
   uniqueInts.insert(99);
   assert(uniqueInts.count(99) == 1);
   uniqueInts.erase(99);
   assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can tell me
 weather is an item in the collection or not. I found that RedBlackTree
 can help me. But RedBlackTree uses O(log(n)) time for insert/remove
 operations. On the other hand we have built-in associative arrays with
 hashed keys. But associative arrays requires pairs of (key, value),
 which is quite wasting in my case.
 May be it will be convenient to allow void[key] built-in collections?
 Any suggestion?

Hello, I don't know if it fits with your needs, but I just pushed a library for handling sets I wrote months ago. It supports basic operations like adding, removing, testing the presence of an object in the set, intersection, union, powerset, difference, symmetric difference. Sets can be untyped (it accepts elements of any type) or typed (more efficient, accepts only elements of a given type like int). If I'm not mistaken, complexity of insertion, presence checking and removal is O(1). You can get it here: https://gitorious.org/set/set/ Feel free to ask question, make suggestions... if you have any. Example of code: import std.stdio; import std.conv; import set; void main() { auto S = new Set!(); S.add("hello"); S.add(42); writeln(S); writeln("has 42? ", S.contains(42)); writeln("has \"42\"? ", S.contains("42")); writeln("has \"hello\"? ", S.contains("hello")); S.remove("hello"); writeln(S); writeln("has \"hello\"? ", S.contains("hello")); writeln("address of 42 : ", S.addressOf(42)); auto E = new Set!int([1,2,3,4]); // it is possible to declare an empty set auto F = new Set!int([3,4,5,6]); // of int like this: auto G = new Set!int; writeln("union: ", E.Union(F)); writeln("inter: ", E.Inter(F)); writeln("symmetric difference: ", E.SymDiff(F)); writeln("minus: ", E.Minus(F)); writeln("Powerset: ", E.Powerset); S.UnionInPlace(E); writeln("Union in place: ", S); // lets iterate over elements: foreach (element ; S) { write(element); // beware, because S is an untyped set (Set!() or Set!Object), // type of element is Element. // If you want the real value, cast to Element!the_type_of_your_element // and access to the e property of the class. auto el = cast(Element!int)(element); write(" (", el.e, ") "); // el.e is an int // Note that this works only because the only remaining values in S // are integer values. If another type were present in the set, we // would experience a segmentation fault. } writeln(); } Expected output: {hello, 42} has 42? true has "42"? false has "hello"? true {42} has "hello"? false address of 42 : 7F9323979F60 union: {4, 1, 5, 2, 6, 3} inter: {4, 3} symmetric difference: {1, 5, 2, 6} minus: {1, 2} Powerset: {{}, {4}, {1, 3}, {4, 1, 3}, {1}, {4, 1}, {2, 3}, {4, 2, 3}, {2}, {4, 2}, {1, 2, 3}, {4, 1, 2, 3}, {1, 2}, {4, 1, 2}, {3}, {4, 3}} Union in place: {4, 1, 42, 2, 3} 4 (4) 1 (1) 42 (42) 2 (2) 3 (3) Should Phobos have something similar built in? Happy new year to everyone, Raphaël.

Is there any reason not to use Phobos Tuple? :)

My ignorance. Fixed, thanks ;-)
Jan 02 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Raphaël Jakse:

         auto E = new Set!int([1,2,3,4]); // it is possible to 
 declare an empty set

You can add a factory helper: auto e = set([1, 2, 3, 4]); Or even: auto e = set(1, 2, 3, 4);
         writeln("union: ", E.Union(F));
         writeln("inter: ", E.Inter(F));
         writeln("symmetric difference: ", E.SymDiff(F));
         writeln("minus: ", E.Minus(F));
         writeln("Powerset: ", E.Powerset);

In idiomatic D all the member functions start with a lowercase letter. And a little operator overloading helps, look at the Python set() for a nice API.
             // beware, because S is an untyped set (Set!() or 
 Set!Object),

Probably in idiomatic D you use templates, so Set!Object is a very uncommon usage, unlike Java.
 Should Phobos have something similar built in?

A set is a good thing to have in Phobos, but the API and design should be a little different from yours. Bye, bearophile
Jan 01 2014
prev sibling next sibling parent "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Wednesday, 1 January 2014 at 21:54:13 UTC, Cooler wrote:
 RedBlackTree can help me. But RedBlackTree uses O(log(n)) time 
 for insert/remove operations.

If that is too slow, maybe you could roll your own simple hash table with open addressing and set the entries to negative values rather than delete? If the values/access patterns/hash/table size is right you might get O(1) and no need for bounds checks on array indexing either. I assume basic associative arrays will have a more costly/generic hash function and need to rehash the table a few times as it grows.
Jan 01 2014
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/1/14, Brad Anderson <eco gnuk.net> wrote:
 void[key] is a commonly requested feature but so far no one has
 implemented it.

You can however use: void[0][int] hash; hash[1] = [];
Jan 02 2014
prev sibling next sibling parent "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 2 January 2014 at 10:21:11 UTC, Andrej Mitrovic 
wrote:
 On 1/1/14, Brad Anderson <eco gnuk.net> wrote:
 void[key] is a commonly requested feature but so far no one has
 implemented it.

You can however use: void[0][int] hash; hash[1] = [];

Interesting... What is a profit of this solution over simple bool[int]?
Jan 02 2014
prev sibling next sibling parent "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 2 January 2014 at 01:21:19 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 1 January 2014 at 21:54:13 UTC, Cooler wrote:
 RedBlackTree can help me. But RedBlackTree uses O(log(n)) time 
 for insert/remove operations.

If that is too slow, maybe you could roll your own simple hash table with open addressing and set the entries to negative values rather than delete? If the values/access patterns/hash/table size is right you might get O(1) and no need for bounds checks on array indexing either. I assume basic associative arrays will have a more costly/generic hash function and need to rehash the table a few times as it grows.

My point is to don't invent something new. Such fundamental collection must exist at basic.
Jan 02 2014
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-01 22:54, Cooler wrote:
 Example in C++:
    set<int> uniqueInts;
    assert(uniqueInts.count(99) == 0);
    uniqueInts.insert(99);
    assert(uniqueInts.count(99) == 1);
    uniqueInts.erase(99);
    assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can tell me
 weather is an item in the collection or not. I found that RedBlackTree
 can help me. But RedBlackTree uses O(log(n)) time for insert/remove
 operations. On the other hand we have built-in associative arrays with
 hashed keys. But associative arrays requires pairs of (key, value),
 which is quite wasting in my case.
 May be it will be convenient to allow void[key] built-in collections?
 Any suggestion?

Tango contains collection, among the a hash set: http://siegelord.github.io/Tango-D2/tango.util.container.HashSet.html https://github.com/SiegeLord/Tango-D2 -- /Jacob Carlborg
Jan 02 2014
prev sibling next sibling parent "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 2 January 2014 at 11:20:32 UTC, Jacob Carlborg wrote:
 On 2014-01-01 22:54, Cooler wrote:
 Example in C++:
   set<int> uniqueInts;
   assert(uniqueInts.count(99) == 0);
   uniqueInts.insert(99);
   assert(uniqueInts.count(99) == 1);
   uniqueInts.erase(99);
   assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can 
 tell me
 weather is an item in the collection or not. I found that 
 RedBlackTree
 can help me. But RedBlackTree uses O(log(n)) time for 
 insert/remove
 operations. On the other hand we have built-in associative 
 arrays with
 hashed keys. But associative arrays requires pairs of (key, 
 value),
 which is quite wasting in my case.
 May be it will be convenient to allow void[key] built-in 
 collections?
 Any suggestion?

Tango contains collection, among the a hash set: http://siegelord.github.io/Tango-D2/tango.util.container.HashSet.html https://github.com/SiegeLord/Tango-D2

Why do not include this to Phobos?
Jan 02 2014
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/2/14, Cooler <kulkin hotbox.ru> wrote:
 Interesting... What is a profit of this solution over simple
 bool[int]?

AFAIK it will make the druntime hashes avoid allocating memory for the value type, because: assert(bool.sizeof == 1); assert((void[0]).sizeof == 0); Note that void[0] is different from void[], as the latter is really two size_t types in a struct.
Jan 02 2014
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 2 January 2014 at 11:29:41 UTC, Cooler wrote:
 On Thursday, 2 January 2014 at 11:20:32 UTC, Jacob Carlborg 
 wrote:
 On 2014-01-01 22:54, Cooler wrote:
 Example in C++:
  set<int> uniqueInts;
  assert(uniqueInts.count(99) == 0);
  uniqueInts.insert(99);
  assert(uniqueInts.count(99) == 1);
  uniqueInts.erase(99);
  assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can 
 tell me
 weather is an item in the collection or not. I found that 
 RedBlackTree
 can help me. But RedBlackTree uses O(log(n)) time for 
 insert/remove
 operations. On the other hand we have built-in associative 
 arrays with
 hashed keys. But associative arrays requires pairs of (key, 
 value),
 which is quite wasting in my case.
 May be it will be convenient to allow void[key] built-in 
 collections?
 Any suggestion?

Tango contains collection, among the a hash set: http://siegelord.github.io/Tango-D2/tango.util.container.HashSet.html https://github.com/SiegeLord/Tango-D2

Why do not include this to Phobos?

Licensing problems.
Jan 02 2014
prev sibling next sibling parent "Cooler" <kulkin hotbox.ru> writes:
On Thursday, 2 January 2014 at 12:18:37 UTC, Andrej Mitrovic 
wrote:
 On 1/2/14, Cooler <kulkin hotbox.ru> wrote:
 Interesting... What is a profit of this solution over simple
 bool[int]?

AFAIK it will make the druntime hashes avoid allocating memory for the value type, because: assert(bool.sizeof == 1); assert((void[0]).sizeof == 0); Note that void[0] is different from void[], as the latter is really two size_t types in a struct.

As I understand any type can be used instead of void, because assert((bool[0]).sizeof == 0). Correct?
Jan 02 2014
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/2/14, Cooler <kulkin hotbox.ru> wrote:
 As I understand any type can be used instead of void, because
 assert((bool[0]).sizeof == 0). Correct?

Yes, the only reason void was there was to make it more documenting.
Jan 02 2014
prev sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Thursday, 2 January 2014 at 00:22:14 UTC, Raphaël Jakse wrote:
 Le 01/01/2014 22:54, Cooler a écrit :
 Example in C++:
   set<int> uniqueInts;
   assert(uniqueInts.count(99) == 0);
   uniqueInts.insert(99);
   assert(uniqueInts.count(99) == 1);
   uniqueInts.erase(99);
   assert(uniqueInts.count(99) == 0);

 Which it will be analogue solution in D?

 I need a collection that can hold number of items, and can 
 tell me
 weather is an item in the collection or not. I found that 
 RedBlackTree
 can help me. But RedBlackTree uses O(log(n)) time for 
 insert/remove
 operations. On the other hand we have built-in associative 
 arrays with
 hashed keys. But associative arrays requires pairs of (key, 
 value),
 which is quite wasting in my case.
 May be it will be convenient to allow void[key] built-in 
 collections?
 Any suggestion?

Hello, I don't know if it fits with your needs, but I just pushed a library for handling sets I wrote months ago. It supports basic operations like adding, removing, testing the presence of an object in the set, intersection, union, powerset, difference, symmetric difference. Sets can be untyped (it accepts elements of any type) or typed (more efficient, accepts only elements of a given type like int). If I'm not mistaken, complexity of insertion, presence checking and removal is O(1). You can get it here: https://gitorious.org/set/set/ Feel free to ask question, make suggestions... if you have any. Example of code: import std.stdio; import std.conv; import set; void main() { auto S = new Set!(); S.add("hello"); S.add(42); writeln(S); writeln("has 42? ", S.contains(42)); writeln("has \"42\"? ", S.contains("42")); writeln("has \"hello\"? ", S.contains("hello")); S.remove("hello"); writeln(S); writeln("has \"hello\"? ", S.contains("hello")); writeln("address of 42 : ", S.addressOf(42)); auto E = new Set!int([1,2,3,4]); // it is possible to declare an empty set auto F = new Set!int([3,4,5,6]); // of int like this: auto G = new Set!int; writeln("union: ", E.Union(F)); writeln("inter: ", E.Inter(F)); writeln("symmetric difference: ", E.SymDiff(F)); writeln("minus: ", E.Minus(F)); writeln("Powerset: ", E.Powerset); S.UnionInPlace(E); writeln("Union in place: ", S); // lets iterate over elements: foreach (element ; S) { write(element); // beware, because S is an untyped set (Set!() or Set!Object), // type of element is Element. // If you want the real value, cast to Element!the_type_of_your_element // and access to the e property of the class. auto el = cast(Element!int)(element); write(" (", el.e, ") "); // el.e is an int // Note that this works only because the only remaining values in S // are integer values. If another type were present in the set, we // would experience a segmentation fault. } writeln(); } Expected output: {hello, 42} has 42? true has "42"? false has "hello"? true {42} has "hello"? false address of 42 : 7F9323979F60 union: {4, 1, 5, 2, 6, 3} inter: {4, 3} symmetric difference: {1, 5, 2, 6} minus: {1, 2} Powerset: {{}, {4}, {1, 3}, {4, 1, 3}, {1}, {4, 1}, {2, 3}, {4, 2, 3}, {2}, {4, 2}, {1, 2, 3}, {4, 1, 2, 3}, {1, 2}, {4, 1, 2}, {3}, {4, 3}} Union in place: {4, 1, 42, 2, 3} 4 (4) 1 (1) 42 (42) 2 (2) 3 (3) Should Phobos have something similar built in? Happy new year to everyone, Raphaël.

Is there any reason not to use Phobos Tuple? :)
Jan 02 2014