www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - using a binary tree container

reply Dominic Jones <dominic.jones qmul.ac.uk> writes:
Hello,

I have a list of strings and I want to determine whether or not a particular
string in the is in that list. Assuming I should store the list of strings in
a binary tree to facilitate fast searching, I had a look at the std.container
documentation and found RedBlackTree, but the documentation for it has no
examples. May someone offer an example of how to use it?

Thank you,
Dominic Jones
Feb 11 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Dominic Jones:

 I have a list of strings and I want to determine whether or not a particular
 string in the is in that list.

What about using: size_t[sting] yourStringSet; Bye, bearophile
Feb 11 2011
next sibling parent spir <denis.spir gmail.com> writes:
On 02/11/2011 01:05 PM, bearophile wrote:
 Dominic Jones:

 I have a list of strings and I want to determine whether or not a particular
 string in the is in that list.

What about using: size_t[sting] yourStringSet; Bye, bearophile

bool[st<r>ing] yourStringSet; does the job and better matches semantics denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling next sibling parent reply Dominic Jones <dominic.jones qmul.ac.uk> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Dominic Jones:
 I have a list of strings and I want to determine whether or not a particular
 string in the is in that list.

size_t[sting] yourStringSet; Bye, bearophile

Would that not be constructing an associated array? Whilst an associated array would do the job, there is no value for the "key:value" pair, just a list of keys. In the C++ STL there are the "set" and "map" containers. I want something like "set". Dominic
Feb 11 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Dominic Jones:

 Would that not be constructing an associated array? Whilst an associated array
 would do the job, there is no value for the "key:value" pair, just a list of
keys.
 In the C++ STL there are the "set" and "map" containers. I want something like
"set".

Phobos2 is young, and it doesn't contain a HashSet yet. So you may use built-in associative arrays as a set, with a default value, or you may use the Ordered Tree Set that's in the collections module (look at the its unittests for some usage examples), or you may write a HashSet and then put it in Bugzilla so if it's good enough it will be added to Phobos. My suggestion is to use the built-in associative array. Bye, bearophile
Feb 11 2011
prev sibling next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/11/2011 10:35 AM, spir wrote:

 D's builtin AAs seem /very/ fast
 on key access. You'd probably have a hard time even approaching their
 performance using whatever kind of tree (or rather, of trie).

That is expected. D AAs are hash tables, meaning that they provide O(1) complexity for element access; compared to the O(log N) complexity of binary trees. For completeness: adding elements is still O(1) for a hash table; compared to O(N log N) of a tree. Ali
Feb 11 2011
next sibling parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/11/2011 04:55 PM, spir wrote:

 Also, trees are not always O(logN): tries () are O(1) for access,
 relative to number of elements, in fact their search is not related to
 that factor, just like hash table instead to the length of keys
 (O(log(length)).

Yep. I should know: I had written a patricia trie in the context of a networking ASIC. :)
 In D, not only there is no way (for me) to even approach builtin AA perf
 with tries (even with some search for optimisation), but those deadly
 flash-fast AAs are worth it as early as with a few items!

Thank you. It's very good to know that D's AAs are very fast. I had no idea. :) Ali
Feb 12 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 02/13/2011 01:18 AM, Ali Çehreli wrote:
 On 02/11/2011 04:55 PM, spir wrote:

  Also, trees are not always O(logN): tries () are O(1) for access,
  relative to number of elements, in fact their search is not related to
  that factor, just like hash table instead to the length of keys
  (O(log(length)).

Yep. I should know: I had written a patricia trie in the context of a networking ASIC. :)
  In D, not only there is no way (for me) to even approach builtin AA perf
  with tries (even with some search for optimisation), but those deadly
  flash-fast AAs are worth it as early as with a few items!

Thank you. It's very good to know that D's AAs are very fast. I had no idea. :)

Beware: I'm not saying this as an absolute truth out of extensive measures. Just comparing plain arrays of (key,value) pairs versus tries versus hashed AAs in the same language, done in 2 languages only (D and freepascal). Denis -- _________________ vita es estrany spir.wikidot.com
Feb 13 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 02/12/2011 12:56 AM, Ali Çehreli wrote:
 On 02/11/2011 10:35 AM, spir wrote:

  D's builtin AAs seem /very/ fast
  on key access. You'd probably have a hard time even approaching their
  performance using whatever kind of tree (or rather, of trie).

That is expected. D AAs are hash tables, meaning that they provide O(1) complexity for element access; compared to the O(log N) complexity of binary trees. For completeness: adding elements is still O(1) for a hash table; compared to O(N log N) of a tree.

You are right, but O(1) & O(log N) do not tell the whole story, --they're just proportional to a given factor that may be whatever. Also, trees are not always O(logN): tries () are O(1) for access, relative to number of elements, in fact their search is not related to that factor, just like hash table instead to the length of keys (O(log(length)). Hash tables actually have an access time firstly depending on hash computation time, which can be costly: they are like O(k), where can like for tries often depends on key size. Then statistically a linear time O(n) inside "buckets" enters the game; hard to manage & evaluate because it depends on average load, meaning numbered of elements relative to number of buckets. Then, it's a question of pondering time vs space cost. FWIW, I have implemented tries as fast as library hash tables for a single key type in freepascal (without even optimising). And both of those "sophisticated" data structures only were worth it above a threshold of about 100 items; I mean compared to plain arrays of (key,value) pairs! In D, not only there is no way (for me) to even approach builtin AA perf with tries (even with some search for optimisation), but those deadly flash-fast AAs are worth it as early as with a few items! (again, compared to arrays of (key,value) pairs). If you want numbers, search the archives of the PiLuD mailing list, I published much data in a thread there (last year): http://groups.google.com/group/pilud/ People, like me, concluded for instance that to implement namespaces it's really not worth it to use complicated data structures. We were wrong (I had not yet met D's AAs then). Denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 11/02/2011 12:30, Dominic Jones wrote:
<snip>
 Would that not be constructing an associated array? Whilst an associated array
 would do the job, there is no value for the "key:value" pair, just a list of
keys.
 In the C++ STL there are the "set" and "map" containers. I want something like
"set".
 Dominic

http://pr.stewartsplace.org.uk/d/sutil/ includes a set class template. Stewart.
Feb 18 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 02/11/2011 01:30 PM, Dominic Jones wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 Dominic Jones:
 I have a list of strings and I want to determine whether or not a particular
 string in the is in that list.

size_t[sting] yourStringSet; Bye, bearophile

Would that not be constructing an associated array? Whilst an associated array would do the job, there is no value for the "key:value" pair, just a list of keys. In the C++ STL there are the "set" and "map" containers. I want something like "set". Dominic

Precisely. D does not have a Set type yet, at least officially, it's on the way (std.container is beeing worked on). But a very common way to emulate sets in a language that has associative arrays is to build a bool[Element] AA, with true values only. Then, your keys are the elements, right? Values are just fake, but having them true bools, set[elem] returns true if present. Unfortunately, D would raise an error if absent. But there is even better in D: (key in AA) returns a pointer, which is null if absent. Thus, (elem in set) correctly simulates test membership. Note that in various languages (eg Python), builtin or library Set types are actually built like that. The reason is AAs are precisely built to make key lookup very fast, which is the main operation of a set. I guess (unsure) D's future set type will certainly also be built like that. I have one in stock, if you're interested. Denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 02/11/2011 06:55 PM, spir wrote:
 On 02/11/2011 01:30 PM, Dominic Jones wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 Dominic Jones:
 I have a list of strings and I want to determine whether or not a particular
 string in the is in that list.

size_t[sting] yourStringSet; Bye, bearophile

Would that not be constructing an associated array? Whilst an associated array would do the job, there is no value for the "key:value" pair, just a list of keys. In the C++ STL there are the "set" and "map" containers. I want something like "set". Dominic


By the way, i may be wrong on that, but D's builtin AAs seem /very/ fast on key access. You'd probably have a hard time even approaching their performance using whatever kind of tree (or rather, of trie). I tried ;-) Both with string and bit-seq keys (in the latter case, thus using a bit-trie). If ever you have any good result on this path, I am very interested. denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling next sibling parent reply Tom <no.email gmail.com> writes:
what with this?:
	auto arr = ["foo", "bar", "aaa", "zzz"];
	sort(arr);
	assert(canFind("foo"));
	assert(canFind("aaa"));
	assert(!canFind("aa"));
I had a "Error	1	Error: module algorithem is in file 'std\algorithem.d' which
cannot be read	main.d	"
when I tried to compile, so I couldn't check it.
Feb 11 2011
next sibling parent spir <denis.spir gmail.com> writes:
On 02/11/2011 08:46 PM, Tom wrote:
 what with this?:
 	auto arr = ["foo", "bar", "aaa", "zzz"];
 	sort(arr);
 	assert(canFind("foo"));
 	assert(canFind("aaa"));
 	assert(!canFind("aa"));
 I had a "Error	1	Error: module algorithem is in file 'std\algorithem.d' which
 cannot be read	main.d	"
 when I tried to compile, so I couldn't check it.

canFind is in std.algorithm: import it. (with the prefix "std.") denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 11 Feb 2011 14:46:26 -0500, Tom <no.email gmail.com> wrote:

 what with this?:
 	auto arr = ["foo", "bar", "aaa", "zzz"];
 	sort(arr);
 	assert(canFind("foo"));
 	assert(canFind("aaa"));
 	assert(!canFind("aa"));
 I had a "Error	1	Error: module algorithem is in file 'std\algorithem.d'

it's spelled "algorithm", no e in there. -Steve
Feb 11 2011
parent reply Mafi <mafi example.org> writes:
Am 11.02.2011 21:38, schrieb Steven Schveighoffer:
 On Fri, 11 Feb 2011 14:46:26 -0500, Tom <no.email gmail.com> wrote:

 what with this?:
 auto arr = ["foo", "bar", "aaa", "zzz"];
 sort(arr);
 assert(canFind("foo"));
 assert(canFind("aaa"));
 assert(!canFind("aa"));
 I had a "Error 1 Error: module algorithem is in file 'std\algorithem.d'

it's spelled "algorithm", no e in there. -Steve

I allways try to import 'algorythm' because of the german word 'Algorythmus'. When this happend for the first time, I spent about five minutes to find out what's wrong. Mafi
Feb 11 2011
parent Mafi <mafi example.org> writes:
Am 12.02.2011 00:02, schrieb spir:
 On 02/11/2011 10:33 PM, Mafi wrote:
 I allways try to import 'algorythm' because of the german word
 'Algorythmus'.
 When this happend for the first time, I spent about five minutes to
 find out
 what's wrong.

It is spelled Algorithmus in German, no ypsilon ;-) http://de.wiktionary.org/wiki/Algorithmus The word is not from greek, but from arabic/persian etymology (like many words starting in al-). From wiktionary: http://en.wiktionary.org/wiki/algorithm: From French algorithme; from the Old French algorisme ("the Arabic numeral system"), a modification likely due to a mistaken connection with Greek ἀριθμός (number); from Medieval Latin algorismus, a transliteration of the name of the Persian mathematician al-Khwārizmī (Arabic: الخوارزمي, "native of Khwarezm.") denis

I'm going to correct all my personal notes about algorithms. Mafi
Feb 13 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 02/11/2011 10:33 PM, Mafi wrote:
 Am 11.02.2011 21:38, schrieb Steven Schveighoffer:
 On Fri, 11 Feb 2011 14:46:26 -0500, Tom <no.email gmail.com> wrote:

 what with this?:
 auto arr = ["foo", "bar", "aaa", "zzz"];
 sort(arr);
 assert(canFind("foo"));
 assert(canFind("aaa"));
 assert(!canFind("aa"));
 I had a "Error 1 Error: module algorithem is in file 'std\algorithem.d'

it's spelled "algorithm", no e in there. -Steve

I allways try to import 'algorythm' because of the german word 'Algorythmus'. When this happend for the first time, I spent about five minutes to find out what's wrong.

It is spelled Algorithmus in German, no ypsilon ;-) http://de.wiktionary.org/wiki/Algorithmus The word is not from greek, but from arabic/persian etymology (like many words starting in al-). From wiktionary: http://en.wiktionary.org/wiki/algorithm: From French algorithme; from the Old French algorisme ("the Arabic numeral system"), a modification likely due to a mistaken connection with Greek ἀριθμός (number); from Medieval Latin algorismus, a transliteration of the name of the Persian mathematician al-Khwārizmī (Arabic: الخوارزمي, "native of Khwarezm.") denis -- _________________ vita es estrany spir.wikidot.com
Feb 11 2011
prev sibling parent reply Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
Dominic Jones wrote:

 Hello,
 
 I have a list of strings and I want to determine whether or not a
 particular string in the is in that list. Assuming I should store the list
 of strings in a binary tree to facilitate fast searching, I had a look at
 the std.container documentation and found RedBlackTree, but the
 documentation for it has no examples. May someone offer an example of how
 to use it?
 
 Thank you,
 Dominic Jones

I tried it out and it's simple to use but I stumbled upon some issues. (See below) As others have commented, regular AA's will work fine for this purpose too. Probably the best reason why you would choose a RedBlackTree over AA is when you also need them sorted, this you get for free. A tree is also likely to consume less memory, which may or may not matter. Example: import std.stdio; import std.container; void main() { auto stuff = ["foo", "bar"]; /* using make instead of the constructor ensures the code will work when RedBlackTree will become a class: */ auto set = make!(RedBlackTree!string)(stuff); /* iterates in order: */ foreach( value; set ) writeln(value); set.stableInsert("baz"); /* the 'in' operator returns bool, not a pointer to the element like builtin aa's do: */ assert( "baz" in set ); } Now for the issues, this doesn't work but it should: auto set = make!(RedBlackTree!string)("foo", "bar"); Error: template std.container.RedBlackTree!(string).RedBlackTree.__ctor(U) if (isImplicitlyConvertible!(U,Elem)) does not match any function template declaration Error: template std.container.RedBlackTree!(string).RedBlackTree.__ctor(U) if (isImplicitlyConvertible!(U,Elem)) cannot deduce template function from argument types !()(string,string) ... I couldn't see any function to remove a single element from the tree and std.algorithm.remove doesn't work. Removing a range also doesn't work for strings: set.remove(stuff); Error: function std.container.RedBlackTree!(string).RedBlackTree.remove (Range r) is not callable using argument types (string[]) Error: cannot implicitly convert expression (stuff) of type string[] to Range I'll file these issues soon, when I have the time.
Feb 13 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 13 Feb 2011 09:29:14 -0500, Lutger Blijdestijn  
<lutger.blijdestijn gmail.com> wrote:

 Dominic Jones wrote:

 Hello,

 I have a list of strings and I want to determine whether or not a
 particular string in the is in that list. Assuming I should store the  
 list
 of strings in a binary tree to facilitate fast searching, I had a look  
 at
 the std.container documentation and found RedBlackTree, but the
 documentation for it has no examples. May someone offer an example of  
 how
 to use it?

 Thank you,
 Dominic Jones

I tried it out and it's simple to use but I stumbled upon some issues. (See below) As others have commented, regular AA's will work fine for this purpose too. Probably the best reason why you would choose a RedBlackTree over AA is when you also need them sorted, this you get for free. A tree is also likely to consume less memory, which may or may not matter. Example: import std.stdio; import std.container; void main() { auto stuff = ["foo", "bar"]; /* using make instead of the constructor ensures the code will work when RedBlackTree will become a class: */ auto set = make!(RedBlackTree!string)(stuff); /* iterates in order: */ foreach( value; set ) writeln(value); set.stableInsert("baz"); /* the 'in' operator returns bool, not a pointer to the element like builtin aa's do: */ assert( "baz" in set ); } Now for the issues, this doesn't work but it should: auto set = make!(RedBlackTree!string)("foo", "bar"); Error: template std.container.RedBlackTree!(string).RedBlackTree.__ctor(U) if (isImplicitlyConvertible!(U,Elem)) does not match any function template declaration Error: template std.container.RedBlackTree!(string).RedBlackTree.__ctor(U) if (isImplicitlyConvertible!(U,Elem)) cannot deduce template function from argument types !()(string,string) ... I couldn't see any function to remove a single element from the tree and std.algorithm.remove doesn't work. Removing a range also doesn't work for strings: set.remove(stuff); Error: function std.container.RedBlackTree!(string).RedBlackTree.remove (Range r) is not callable using argument types (string[]) Error: cannot implicitly convert expression (stuff) of type string[] to Range I'll file these issues soon, when I have the time.

Yes, thank you. RedBlackTree is certainly not well-used, so it is bound to have lots of usability issues. -Steve
Feb 14 2011
parent Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
Sorry I almost forgot: http://d.puremagic.com/issues/show_bug.cgi?id=5640

The issue with remove is talked about in digitalmars.D and perhaps not 
really specific to RedBlackTree.
Feb 22 2011