www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - About std.container.RedBlackTree

reply bearophile <bearophileHUGS lycos.com> writes:
I've had to use a search tree, so RedBlackTree was the right data structure. It
seems to do what I need, so thank you for this useful data structure. Some of
the things I write here are questions or things that show my ignorance about
this implementation.

---------------------

Please add some usage examples to this page, this is important and helps people
reduce a lot the number of experiments to do to use this tree:
http://www.digitalmars.com/d/2.0/phobos/std_container.html#

---------------------

This doesn't seem to work, it gives a forward reference error:

import std.container: RedBlackTree;
RedBlackTree!int t;
void main() {
    t = RedBlackTree!int(1);
}



This gives a similar error:

import std.container: RedBlackTree;
struct Foo {
    RedBlackTree!int t;
    void initialize() {
        t = RedBlackTree!int(1);
    }
}
void main() {}

---------------------

I need to create an empty tree and add items to it later (where I declare a
fixed-sized array of trees I don't know the items to add). How do you do it?
This code doesn't work:


import std.container: RedBlackTree;
void main() {
    auto t = RedBlackTree!int();
    t.insert(1);
}

---------------------

Is the tree insert() method documented in the HTML docs?

---------------------

A tree is a kind of set, so instead of "insert()" I'd like a name like "add()".
(But maybe this is not standard in D).

---------------------

In theory an helper redBlackTree() function allows to write just this, with no
need to write types:

redBlackTree(1, 2, 3)

---------------------

I have tried to use printTree(), but I have failed. I don't know what to give
to it and the docs don't say that it requires -unittest

If it's a private debug function then there's no need to give it a ddoc comment.

---------------------

I've seen that the tree doesn't seem to contain a length. Using walkLength is
an option, but a possible idea is to replace:
struct RedBlackTree(T,alias less = "a < b",bool allowDuplicates = false)

With:
struct RedBlackTree(T, alias less="a < b", bool allowDuplicates=false, bool
withLength=false)

---------------------

If you need to add many value nodes quickly to the tree a memory pool may speed
up mass allocation. This has some disadvantages too.

---------------------

If I have to create a set of epsilon-approximate floating point numbers, what's
the most efficient way to use a RedBlackTree to see if it contains a value x
(or values very close to x), and othrwise add x to the tree?

I was thinking about something like this, but it doesn't look very good because
it performs three lookups (this function returns the number of items added):


int approxInsert(RedBlackTree!double set, double x, double epsilon) {
    if (x in set) {
        return 0;
    } else {
        auto nextOnes = set.upperBound(x);
        if (!nextOnes.empty() && (nextOnes.front - x) < epsilon) {
            return 0;
        }
        auto precedentOnes = set.lowerBound(x);
        if (!precedentOnes.empty() && (x - precedentOnes.back) < epsilon) {
            return 0;
        } else {
            set.insert(x);
            return 1;
        }
    }
}

---------------------

This code runs with no errors, is this correct? The two Foos have the same x
but different y:


import std.container: RedBlackTree;
struct Foo {
    int x, y;
}
void main() {
    auto t = RedBlackTree!(Foo, "a.x < b.x")(Foo(1,1));
    assert(Foo(1,2) in t);
}

(In practice this looks like a tree map instead of a tree set.)
If this is correct then I suggest to add this example to the std_container.html
page.

---------------------

Bye and thank you,
bearophile
Jan 10 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 10 Jan 2011 18:14:31 -0500, bearophile <bearophileHUGS lycos.com>  
wrote:

 I've had to use a search tree, so RedBlackTree was the right data  
 structure. It seems to do what I need, so thank you for this useful data  
 structure. Some of the things I write here are questions or things that  
 show my ignorance about this implementation.

 ---------------------

 Please add some usage examples to this page, this is important and helps  
 people reduce a lot the number of experiments to do to use this tree:
 http://www.digitalmars.com/d/2.0/phobos/std_container.html#

I will do this, when I have some free time. If you want to submit some examples, I would gladly include them.
 ---------------------

 This doesn't seem to work, it gives a forward reference error:

 import std.container: RedBlackTree;
 RedBlackTree!int t;
 void main() {
     t = RedBlackTree!int(1);
 }

Grrr... I had issues with forward references (you can see from this comment: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std container.d#L4071), I thought by reordering the functions I had fixed it, but apparently, it resurfaces under certain conditions. Please vote for that bug. http://d.puremagic.com/issues/show_bug.cgi?id=2810 I don't really know what to do about fixing it. Most likely any 'fix' I try will result in some other situation not compiling. I probably should just avoid using auto, not being able to declare a red black tree as a global variable is a huge limitation.
 ---------------------

 I need to create an empty tree and add items to it later (where I  
 declare a fixed-sized array of trees I don't know the items to add). How  
 do you do it?
 This code doesn't work:


 import std.container: RedBlackTree;
 void main() {
     auto t = RedBlackTree!int();
     t.insert(1);
 }

RedBlackTree must be initialized with a constructor. Otherwise, your root node is null. I chose this path instead of checking for null on every function. I realize the mistake -- you cannot create an empty tree, because you cannot have a default constructor. I have another function that I use to help create trees during unit tests because IFTI can be weird. I will make this function public and always present, then you can create an empty tree like this: auto t = RedBlackTree!int.create(); If Andrei decides eventually that containers should be classes, then this problem goes away. Please bugzillize this
 ---------------------

 Is the tree insert() method documented in the HTML docs?

I thought this would do it, but apparently it doesn't: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/container.d#L4457 I will try to make those docs show up.
 ---------------------

 A tree is a kind of set, so instead of "insert()" I'd like a name like  
 "add()".
 (But maybe this is not standard in D).

The function names must be consistent across containers, because the point is that complexity and semantic requirements are attached to the function name. The function names were decided long ago by Andrei, and I don't think insert is a bad name (I believe std::set and std::map in C++ STL uses insert).
 ---------------------

 In theory an helper redBlackTree() function allows to write just this,  
 with no need to write types:

 redBlackTree(1, 2, 3)

Yes, this should be done. Please make a bugzilla report. In fact, this can extend to all std.container types.
 ---------------------

 I have tried to use printTree(), but I have failed. I don't know what to  
 give to it and the docs don't say that it requires -unittest

 If it's a private debug function then there's no need to give it a ddoc  
 comment.

It is a private debug function, only enabled when version = doRBChecks is enabled. When developing the red black node, the red-black algorithms to fix the tree are very complex and error prone to write. This function basically printed the tree layout in a horribly ugly fashion when the red-black properties were not preserved. It helped me find bugs, but is mostly no-longer needed unless I try some more optimizations. Please ignore the function. I will make sure the comment is not ddoc'd.
 ---------------------

 I've seen that the tree doesn't seem to contain a length. Using  
 walkLength is an option, but a possible idea is to replace:
 struct RedBlackTree(T,alias less = "a < b",bool allowDuplicates = false)

 With:
 struct RedBlackTree(T, alias less="a < b", bool allowDuplicates=false,  
 bool withLength=false)

The reason for this is to keep it a reference-type (pImpl style), but I realize that I can easily fix this (I can just make the root node contain a length field). Please file a bugzilla to add length.
 ---------------------

 If you need to add many value nodes quickly to the tree a memory pool  
 may speed up mass allocation. This has some disadvantages too.

I have done this in dcollections, and it helps immensely in node-based containers. It is not clear to me how std.container will incorporate these types of things, so Andrei and I decided to just defer it for now. RedBlackTree is easily modified to use an allocator, all allocation and de-allocations are done via a centralized function. In fact, the same RBNode is used in dcollections. The allocator I created in dcollections is also the default allocator in Tango containers too. I anticipate that something like it will eventually make its way into phobos.
 ---------------------

 If I have to create a set of epsilon-approximate floating point numbers,  
 what's the most efficient way to use a RedBlackTree to see if it  
 contains a value x (or values very close to x), and othrwise add x to  
 the tree?

 I was thinking about something like this, but it doesn't look very good  
 because it performs three lookups (this function returns the number of  
 items added):


 int approxInsert(RedBlackTree!double set, double x, double epsilon) {
     if (x in set) {
         return 0;
     } else {
         auto nextOnes = set.upperBound(x);
         if (!nextOnes.empty() && (nextOnes.front - x) < epsilon) {
             return 0;
         }
         auto precedentOnes = set.lowerBound(x);
         if (!precedentOnes.empty() && (x - precedentOnes.back) <  
 epsilon) {
             return 0;
         } else {
             set.insert(x);
             return 1;
         }
     }
 }

What about changing the predicate? I know it will likely violate the requirements of a strict ordering, but it should probably work.
 ---------------------

 This code runs with no errors, is this correct? The two Foos have the  
 same x but different y:


 import std.container: RedBlackTree;
 struct Foo {
     int x, y;
 }
 void main() {
     auto t = RedBlackTree!(Foo, "a.x < b.x")(Foo(1,1));
     assert(Foo(1,2) in t);
 }

 (In practice this looks like a tree map instead of a tree set.)
 If this is correct then I suggest to add this example to the  
 std_container.html page.

Yes, the intention is that you can create a map type in this fashion. If you define a predicate, then it is up to you to ensure the predicate makes sense for you. I'm unsure whether a map type could actually be implemented using RedBlackTree, but if it's not possible, I will make it possible.
 ---------------------

 Bye and thank you,
 bearophile

Thank you for your detailed analysis! I will hope to fix all the problems soon. -Steve
Jan 11 2011
next sibling parent spir <denis.spir gmail.com> writes:
On 01/11/2011 02:22 PM, Steven Schveighoffer wrote:
 A tree is a kind of set, so instead of "insert()" I'd like a name like
 "add()".
 (But maybe this is not standard in D).

The function names must be consistent across containers, because the point is that complexity and semantic requirements are attached to the function name. The function names were decided long ago by Andrei, and I don't think insert is a bad name (I believe std::set and std::map in C++ STL uses insert).

I have thought at this naming issue, precisely, for a while. "add" is bad because of connotation with "addition". D does not use '+' as operator for putting new elements in a container: this is a very sensible choice imo. "insert" is bad because of "in-between" connotation: does not fit when putting an element at the end of a seq, even less for unordered containers. "put" instead seems to me the right term, obvious and general enough: one puts a new element in there. This can nicely adapt to very diverse container types such as sequences including stacks (no explicite index --> put at end), sets/AAs, trees,... Denis _________________ vita es estrany spir.wikidot.com
Jan 11 2011
prev sibling next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Tue, 11 Jan 2011 15:13:13 +0100, spir wrote:

 On 01/11/2011 02:22 PM, Steven Schveighoffer wrote:
 A tree is a kind of set, so instead of "insert()" I'd like a name like
 "add()".
 (But maybe this is not standard in D).

The function names must be consistent across containers, because the point is that complexity and semantic requirements are attached to the function name. The function names were decided long ago by Andrei, and I don't think insert is a bad name (I believe std::set and std::map in C++ STL uses insert).

I have thought at this naming issue, precisely, for a while. "add" is bad because of connotation with "addition". D does not use '+' as operator for putting new elements in a container: this is a very sensible choice imo. "insert" is bad because of "in-between" connotation: does not fit when putting an element at the end of a seq, even less for unordered containers. "put" instead seems to me the right term, obvious and general enough: one puts a new element in there. This can nicely adapt to very diverse container types such as sequences including stacks (no explicite index --> put at end), sets/AAs, trees,...

I seem to remember this being discussed before, and 'put' being rejected for fear of confusion with the output range interface. -Lars
Jan 12 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 12 Jan 2011 03:45:11 -0500, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Tue, 11 Jan 2011 15:13:13 +0100, spir wrote:

 On 01/11/2011 02:22 PM, Steven Schveighoffer wrote:
 A tree is a kind of set, so instead of "insert()" I'd like a name like
 "add()".
 (But maybe this is not standard in D).

The function names must be consistent across containers, because the point is that complexity and semantic requirements are attached to the function name. The function names were decided long ago by Andrei, and I don't think insert is a bad name (I believe std::set and std::map in C++ STL uses insert).

I have thought at this naming issue, precisely, for a while. "add" is bad because of connotation with "addition". D does not use '+' as operator for putting new elements in a container: this is a very sensible choice imo. "insert" is bad because of "in-between" connotation: does not fit when putting an element at the end of a seq, even less for unordered containers. "put" instead seems to me the right term, obvious and general enough: one puts a new element in there. This can nicely adapt to very diverse container types such as sequences including stacks (no explicite index --> put at end), sets/AAs, trees,...

I seem to remember this being discussed before, and 'put' being rejected for fear of confusion with the output range interface.

technically, it is an output range ;) -Steve
Jan 13 2011
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Sorry for being so late with my answer. I am busy for some days.

Steven Schveighoffer:

I don't really know what to do about fixing it.

It's not your fault. The forward reference errors I've found with RedBlackTree are so bad that in the end I've created a different algorithm that doesn't use a search tree.
I probably should just avoid using auto,

Good.
not being able to declare a red black tree as a global variable is a huge
limitation.<

Also as instance variable for a struct.
 RedBlackTree must be initialized with a constructor.  Otherwise, your root  
 node is null.  I chose this path instead of checking for null on every  
 function.

This is OK, given your original design choice of using PIMPL.
 auto t = RedBlackTree!int.create();
 Please bugzillize this

OK.
 The function names must be consistent across containers, because the point  
 is that complexity and semantic requirements are attached to the function  
 name.  The function names were decided long ago by Andrei, and I don't  
 think insert is a bad name (I believe std::set and std::map in C++ STL  
 uses insert).

Python set uses "add", I prefer it for a set :-) But "insert" is acceptable.
 redBlackTree(1, 2, 3)

can extend to all std.container types.

OK. I will create a single bugzilla report for RedBlackTree.
 It helped me find bugs, but is  
 mostly no-longer needed unless I try some more optimizations.
 Please ignore the function.  I will make sure the comment is not ddoc'd.

Please hide it, but don't remove it from the module :-)
 With:
 struct RedBlackTree(T, alias less="a < b", bool allowDuplicates=false,  
 bool withLength=false)

The reason for this is to keep it a reference-type (pImpl style), but I realize that I can easily fix this (I can just make the root node contain a length field). Please file a bugzilla to add length.

OK. The withLength template argument plus some static ifs allow to have zero overhead for people that don't need the O(1) length. I think withLength is better true on default, because removing the length is an optimization...
 I have done this in dcollections, and it helps immensely in node-based  
 containers.

I agree, I have seen similar things (D1 programs that become twice faster, etc), that's why I have suggested it.
 It is not clear to me how std.container will incorporate  
 these types of things, so Andrei and I decided to just defer it for now.

A template trait...
 What about changing the predicate?  I know it will likely violate the  
 requirements of a strict ordering, but it should probably work.

I am not sure it will work, but in the end I have used a very different solution. Implementing an approximate (floating point) hash with a search tree is a common enough need, so you may add something like that approxInsert() as an free templated function in the std.collections module :-)
 Yes, the intention is that you can create a map type in this fashion.  If  
 you define a predicate, then it is up to you to ensure the predicate makes  
 sense for you.

OK. Then probably this important explanation (and example too, if you want) must be added to the RedBlackTree docs.
 Thank you for your detailed analysis!

You are welcome. And thank you for implementing this important collection. Bye, bearophile
Jan 13 2011
parent bearophile <bearophileHUGS lycos.com> writes:
 OK. I will create a single bugzilla report for RedBlackTree.

http://d.puremagic.com/issues/show_bug.cgi?id=5451 Bye, bearophile
Jan 13 2011