www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - New set class

reply Michiel <nomail please.com> writes:
I have written a new class to represent sets, with several useful
features. Uploaded for your review and free use. I don't know much about
open source and licensing, but I request that if you use the class, that
you keep my name in the author section of the documentation.

http://files.michiel.eu/set.d

Tell me what you think.

There still are some problems, though. I could use some help:

* There are still problems involving static arrays as the set type. The
set literal creates a set of the exact type you specify. So if you use a
string literal, it will be a set of static char arrays. It is only a
sort of hack in the Set class that allows for the dynamic array
equivalent to also be accepted in. What I really need is to implement
the set literal function so it creates a set of the dynamic array type
directly. But I've run into some problems, sometimes strangely involving
char-encodings. (invalid UTF-8 sequence, printed to the terminal at
runtime when I try to print the toString())

* When I enable unit testing, several things can go wrong. Sometimes I
get a segmentation fault. Sometimes I get a program that doesn't
terminate and eats all of the memory until I kill it.

-- 
Michiel
Feb 24 2007
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Michiel wrote:
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.
 
 http://files.michiel.eu/set.d
 
 Tell me what you think.
If you add an opApply with the signature int opApply(int delegate(uint i, inout Type) dg) { ... } it will allow this kind of foreach too: foreach(i, e; myset) { ... } Also I would just leave out opApplyReverse. That way using it is a compile-time error. Or if you're dead set on it, at least have it throw an exception. I believe assert() is a no-op in release builds, meaning your program will fail in strange ways in release. Actually I'm surprised that compiles without a return statement. Finally Set!(T[0]) set(T...)(T elements) why not just Set!(T) set(T)(T elements...) ?? Have you looked at the HashSet implementation in Tango for comparison? --bb
Feb 24 2007
parent reply Michiel <nomail please.com> writes:
Bill Baxter wrote:

 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.
If you add an opApply with the signature int opApply(int delegate(uint i, inout Type) dg) { ... } it will allow this kind of foreach too: foreach(i, e; myset) { ... }
That's why I didn't do it. What meaning would it have for a set, since they have no index?
 Also I would just leave out opApplyReverse.  That way using it is a
 compile-time error.  Or if you're dead set on it, at least have it throw
 an exception.  I believe assert() is a no-op in release builds, meaning
 your program will fail in strange ways in release.  Actually I'm
 surprised that compiles without a return statement.
assert(0) is evaluated by D as a point in the code you can't ever reach. I got the idea from the opCmp example: -------------------------------------- For some objects, testing for less or greater makes no sense. For these, override opCmp() with: class A { int opCmp(Object o) { assert(0); // comparison makes no sense return 0; } } -------------------------------------- I see that that has a return statement, but I don't see why it would need one. assert(0) always exists the function. I think it's a feature of the compiler that it recognizes this.
 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??
You only use that syntax if T is an aggregate type. So the resulting set would actually be Set!(T[0]) anyway. I think I tried this before and had some problems, but I'm not certain now.
 Have you looked at the HashSet implementation in Tango for comparison?
I had not. Is this the one? http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html Though I haven't looked at the source code, I see it exposes many of its internal workings to the outside. When you are working with a set you don't want to know if it's using a hash table or a balanced tree. Just that it implements a set. Though I'm sure that they put more work into it than I and that it's more efficient. I just sneakily used an associative array on the inside. Which uses its own hash table, if I'm not mistaken. -- Michiel
Feb 25 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Michiel wrote:
 Bill Baxter wrote:
 
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.
If you add an opApply with the signature int opApply(int delegate(uint i, inout Type) dg) { ... } it will allow this kind of foreach too: foreach(i, e; myset) { ... }
That's why I didn't do it. What meaning would it have for a set, since they have no index?
It's useful to the user who wants to print out the items numbered or something. Or they have a set of 10 items and they want to copy them to a preallocated 10-item list. Or they want to make another associative array that maps numbers to the items in the set. Or they want to do something with only the first five items in the set then if(i>5) break;. Or they want to append a number to every item in the set. Or they want to split the set into two smaller sets arbitrarily using if (i%2). Yes the user can always declare their own damn counter, but it doesn't make your code much more complex, and it does make their lives a little bit easier.
 
 Also I would just leave out opApplyReverse.  That way using it is a
 compile-time error.  Or if you're dead set on it, at least have it throw
 an exception.  I believe assert() is a no-op in release builds, meaning
 your program will fail in strange ways in release.  Actually I'm
 surprised that compiles without a return statement.
assert(0) is evaluated by D as a point in the code you can't ever reach. I got the idea from the opCmp example:
Interesting. I didn't realize that. That's really not obvious from the docs that that's what happens: """ The expression assert(0) is a special case; it signifies that it is unreachable code. Either AssertError is thrown at runtime if it is reachable, or the execution is halted (on the x86 processor, a HLT instruction can be used to halt execution). The optimization and code generation phases of compilation may assume that it is unreachable code. """ -- http://www.digitalmars.com/d/expression.html#AssertExpression Nowhere does it say it that it is "unconditional regardless of debug/release flags". This page talks about asserts but doesn't mention debug/release behavior either. http://www.digitalmars.com/d/dbc.html Actually I can't find where disabling asserts for release builds is documented, though I'm sure I've seen it somewhere before. A quick test, however, reveals that you are indeed correct: "regular" asserts turn off in a release build, but assert(0) does not. Well that's good to know. Still, it generates a runtime error, when you really would prefer a compile-time error. So I think it's a bad idea to start putting in lots of opFoo() { assert(0,"This method doesn't exist"); } Once you start putting those in, where do you stop? I'm sure there are other operators and functions that Set does not implement. It's almost like comments along the lines of: // this is an integer that does the counting: int counter; The fact that the method is not there is all the documentation you need. But if you really want to do put such things in your code you could try this: struct Set(T) { ... void opApplyReverse()() {static assert(0,"don't be silly");} } Since foo is a template member, no attempt is made to compile it unless it's actually called from somewhere. If you do try to call it then you trip the static assert at compile-time.
 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??
You only use that syntax if T is an aggregate type. So the resulting set would actually be Set!(T[0]) anyway. I think I tried this before and had some problems, but I'm not certain now.
I don't follow what you mean about "aggregate type". The unittest uses int as the type. I don't think 'int' is considered an aggregate.
 Have you looked at the HashSet implementation in Tango for comparison?
I had not. Is this the one? http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html
Yes.
 Though I haven't looked at the source code, I see it exposes many of its
 internal workings to the outside. When you are working with a set you
 don't want to know if it's using a hash table or a balanced tree. Just
 that it implements a set.
It also doesn't include set operations like you have. (union, intersection, difference, symmetric difference). And I agree with you that it looks like it exposes too much of its innards for most users.
 Though I'm sure that they put more work into it than I and that it's
 more efficient. I just sneakily used an associative array on the inside.
 Which uses its own hash table, if I'm not mistaken.
It looks to be based on code by Doug Lea, who wrote dlmalloc, one of the fastest implementations of malloc around. So I suspect performance is pretty good. Another thing you might want to add to your set class is opCat (~) and opCatAssign (~=). Those are pretty commonly used in D for appending something to something else. They'd just be synonyms for add and union. --bb
Feb 25 2007
parent reply Michiel <nomail please.com> writes:
Bill Baxter wrote:

 If you add an opApply with the signature
    int opApply(int delegate(uint i, inout Type) dg) { ... }

 it will allow this kind of foreach too:
    foreach(i, e; myset) {
        ...
    }
That's why I didn't do it. What meaning would it have for a set, since they have no index?
It's useful to the user who wants to print out the items numbered or something. Or they have a set of 10 items and they want to copy them to a preallocated 10-item list. Or they want to make another associative array that maps numbers to the items in the set. Or they want to do something with only the first five items in the set then if(i>5) break;. Or they want to append a number to every item in the set. Or they want to split the set into two smaller sets arbitrarily using if (i%2). Yes the user can always declare their own damn counter, but it doesn't make your code much more complex, and it does make their lives a little bit easier.
I feel that if I added that syntax, it would imply that the counter has some sort of relevance to the set, which it really doesn't. A counter that increments by one each iteration has real meaning for an array. So does a key-type for an associative array. But for a set... sorry, I'd just feel dirty adding it.
 Also I would just leave out opApplyReverse.  That way using it is a
 compile-time error.  Or if you're dead set on it, at least have it throw
 an exception.  I believe assert() is a no-op in release builds, meaning
 your program will fail in strange ways in release.  Actually I'm
 surprised that compiles without a return statement.
assert(0) is evaluated by D as a point in the code you can't ever reach. I got the idea from the opCmp example:
Interesting. I didn't realize that. That's really not obvious from the docs that that's what happens: """ The expression assert(0) is a special case; it signifies that it is unreachable code. Either AssertError is thrown at runtime if it is reachable, or the execution is halted (on the x86 processor, a HLT instruction can be used to halt execution). The optimization and code generation phases of compilation may assume that it is unreachable code. """ -- http://www.digitalmars.com/d/expression.html#AssertExpression Nowhere does it say it that it is "unconditional regardless of debug/release flags". This page talks about asserts but doesn't mention debug/release behavior either. http://www.digitalmars.com/d/dbc.html Actually I can't find where disabling asserts for release builds is documented, though I'm sure I've seen it somewhere before. A quick test, however, reveals that you are indeed correct: "regular" asserts turn off in a release build, but assert(0) does not. Well that's good to know. Still, it generates a runtime error, when you really would prefer a compile-time error. So I think it's a bad idea to start putting in lots of opFoo() { assert(0,"This method doesn't exist"); } Once you start putting those in, where do you stop? I'm sure there are other operators and functions that Set does not implement. It's almost like comments along the lines of: // this is an integer that does the counting: int counter; The fact that the method is not there is all the documentation you need. But if you really want to do put such things in your code you could try this: struct Set(T) { ... void opApplyReverse()() {static assert(0,"don't be silly");} } Since foo is a template member, no attempt is made to compile it unless it's actually called from somewhere. If you do try to call it then you trip the static assert at compile-time.
No, if you use a static assert, the compilation will fail as soon as you instantiate the class somewhere. Not if you try to use foreach_reverse. If it weren't a template class, the compiler would throw the error in any case. You're right that a compiler error would be preferable to a runtime error, but simply leaving the function out doesn't give as clear an error message ("Error: cannot infer type for e"). It would be ideal if I could have both. But well, I don't understand why unittests are done at runtime either.
 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??
You only use that syntax if T is an aggregate type. So the resulting set would actually be Set!(T[0]) anyway. I think I tried this before and had some problems, but I'm not certain now.
I don't follow what you mean about "aggregate type". The unittest uses int as the type. I don't think 'int' is considered an aggregate.
Well, That's what the 'void func(T name...)' syntax is for. If T is an aggregate type (array, class, struct, etc.) you can either supply that type to the func, or all of its members separated by comma's. I suspect that you meant this one: 'void func(T name, ...)', which is something different. I haven't tried it, and I will.
 Another thing you might want to add to your set class is opCat (~) and
 opCatAssign (~=).  Those are pretty commonly used in D for appending
 something to something else.  They'd just be synonyms for add and union.
Good idea. I'll add them. -- Michiel
Feb 26 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Michiel wrote:
 Bill Baxter wrote:
 
[snip suggestion about counters in set-foreach]
 I feel that if I added that syntax, it would imply that the counter has
 some sort of relevance to the set, which it really doesn't.
 
 A counter that increments by one each iteration has real meaning for an
 array. So does a key-type for an associative array. But for a set...
 sorry, I'd just feel dirty adding it.
I agree with you here, sets don't have indexes for their elements, they are unordered.
 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }

 Since foo is a template member, no attempt is made to compile it unless
 it's actually called from somewhere.  If you do try to call it then you
 trip the static assert at compile-time.
No, if you use a static assert, the compilation will fail as soon as you instantiate the class somewhere. Not if you try to use foreach_reverse. If it weren't a template class, the compiler would throw the error in any case.
The _member_ is _also_ a template here (with 0 parameters, but a template nonetheless). So compilation will only fail if that template is instantiated, which will happen only if foreach_reverse is used (or it's explicitly called, of course). Here, let me show you (some modifications made to above code) --- urxae urxae:~/tmp$ cat test.d import std.stdio; struct Set(T) { void opApplyReverse()(int delegate(inout T)) { static assert(0,"don't be silly");} } void main() { Set!(int) x; version(ForeachReverse) foreach_reverse(int e; x) {} } urxae urxae:~/tmp$ dmd test.d && ./test gcc test.o -o test -m32 -lphobos -lpthread -lm -Xlinker -L/home/urxae/opt/dmd/lib urxae urxae:~/tmp$ dmd -version=ForeachReverse test.d test.d(4): static assert (0) is false, "don't be silly" --- The static assert is only triggered with -version=ForeachReverse.
 You're right that a compiler error would be preferable to a runtime
 error, but simply leaving the function out doesn't give as clear an
 error message ("Error: cannot infer type for e"). It would be ideal if I
 could have both.
No, "cannot infer type for e" is indeed a bit unclear, it doesn't explain *why* it happened.
 But well, I don't understand why unittests are done at runtime either.
Yeah, there's been some discussion about that. A lot of people would like them to be ran after compiling & linking, but so far no change has been made to allow this to be done.
 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??
You only use that syntax if T is an aggregate type. So the resulting set would actually be Set!(T[0]) anyway. I think I tried this before and had some problems, but I'm not certain now.
I don't follow what you mean about "aggregate type". The unittest uses int as the type. I don't think 'int' is considered an aggregate.
Well, That's what the 'void func(T name...)' syntax is for. If T is an aggregate type (array, class, struct, etc.) you can either supply that type to the func, or all of its members separated by comma's. I suspect that you meant this one: 'void func(T name, ...)', which is something different. I haven't tried it, and I will.
I suspect he may have meant the "void func(T)(T[] name...)" syntax (or in this case, "Set!(T) set(T)(T[] elements...)"). That would let the user pass as many arguments as he wants, while presenting them as a nice typesafe array to the library. By the way, "void func(int name...)" is also perfectly valid syntax but only allows one argument to be specified. This is all explained in the section "Typesafe Variadic Functions" of http://www.digitalmars.com/d/function.html.
Feb 26 2007
parent reply Michiel <nomail please.com> writes:
Frits van Bommel wrote:

 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }

 Since foo is a template member, no attempt is made to compile it unless
 it's actually called from somewhere.  If you do try to call it then you
 trip the static assert at compile-time.
No, if you use a static assert, the compilation will fail as soon as you instantiate the class somewhere. Not if you try to use foreach_reverse. If it weren't a template class, the compiler would throw the error in any case.
The _member_ is _also_ a template here (with 0 parameters, but a template nonetheless). So compilation will only fail if that template is instantiated, which will happen only if foreach_reverse is used (or it's explicitly called, of course). Here, let me show you (some modifications made to above code) <SNIP>
Interesting trick. I missed the extra (). One problem, though. If I use it like this: foreach_reverse(e ; x) { } Not specifying the element type (int). I get the "cannot infer type" error, instead of our custom error message. Why can't the compiler infer the type? Isn't it right in its face?
 I suspect he may have meant the "void func(T)(T[] name...)" syntax (or
 in this case, "Set!(T) set(T)(T[] elements...)"). That would let the
 user pass as many arguments as he wants, while presenting them as a nice
 typesafe array to the library.
I did try some variations of that. I couldn't make it work quite the way I wanted. -- Michiel
Feb 26 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Michiel wrote:
 Frits van Bommel wrote:
 
 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }
[snip Michiel's disbelief and my explanation]
 Here, let me show you (some modifications made to above code)
 <SNIP>
Interesting trick. I missed the extra (). One problem, though. If I use it like this: foreach_reverse(e ; x) { } Not specifying the element type (int). I get the "cannot infer type" error, instead of our custom error message. Why can't the compiler infer the type? Isn't it right in its face?
The code you <SNIP>ped didn't have that problem, so I presume you used the declaration Bill Baxter posted. Guess why I changed it for my example ;). The compiler can't infer the type in Bill's code because his opApplyReverse has the wrong signature. In order to get type deduction (as well as instantiation) opApplyReverse needs to have as parameter a delegate with the correct number of (inout) arguments that returns int. In my code I had --- void opApplyReverse()(int delegate(inout T)) { static assert(0,"don't be silly");} --- and it worked.
Feb 26 2007
parent reply Michiel <nomail please.com> writes:
Frits van Bommel wrote:

 Interesting trick. I missed the extra (). One problem, though. If I use
 it like this:

 foreach_reverse(e ; x) { }

 Not specifying the element type (int). I get the "cannot infer type"
 error, instead of our custom error message. Why can't the compiler infer
 the type? Isn't it right in its face?
The code you <SNIP>ped didn't have that problem, so I presume you used the declaration Bill Baxter posted. Guess why I changed it for my example ;). The compiler can't infer the type in Bill's code because his opApplyReverse has the wrong signature. In order to get type deduction (as well as instantiation) opApplyReverse needs to have as parameter a delegate with the correct number of (inout) arguments that returns int. In my code I had --- void opApplyReverse()(int delegate(inout T)) { static assert(0,"don't be silly");} --- and it worked.
No, I used your code. I just tested again to be sure. It doesn't work. If you leave out the 'int' in the foreach_reverse statement, the silly error message doesn't get displayed. -- Michiel
Feb 26 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Michiel wrote:
 No, I used your code. I just tested again to be sure. It doesn't work.
 If you leave out the 'int' in the foreach_reverse statement, the silly
 error message doesn't get displayed.
Oh, did I put an 'int' in there? Then I guess type inference doesn't work with template functions :(.
Feb 26 2007
parent Michiel <nomail please.com> writes:
Frits van Bommel wrote:

 No, I used your code. I just tested again to be sure. It doesn't work.
 If you leave out the 'int' in the foreach_reverse statement, the silly
 error message doesn't get displayed.
Oh, did I put an 'int' in there? Then I guess type inference doesn't work with template functions :(.
Sounds like room for improvement to me. Not that I'm ruling out that we just don't know how to do this yet. Any clarification would be appreciated. Until then I'm sticking with my runtime error message. I agree it's a shame, because this is exactly the sort of error the compiler should catch. I would think this should be able to expand to self-written error messages. -- Michiel
Feb 26 2007
prev sibling parent reply Stephan Diehl <stephan.diehl gmx.net> writes:
Michiel wrote:
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.
 
 http://files.michiel.eu/set.d
 
 Tell me what you think.
 
excelent. I somewhat missed some subset operations. If it were up to me, I'd use the opCmp operator for this (just like they do in python).
Feb 26 2007
parent reply Michiel <nomail please.com> writes:
Stephan Diehl wrote:

 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.
excelent. I somewhat missed some subset operations. If it were up to me, I'd use the opCmp operator for this (just like they do in python).
You're right! I missed the subset operations. :) Unfortunately the opCmp operator is taken. Implementing that correctly allows a set to be a key-type for an associative array (or the type of another set). I can't make it double as a subset operation, because sets would have to be declared equal if one is not subset of the other. Even if they are not, indeed, equal. This would break their proper definition. For a moment I was tempted to use the shifting operators, because they're only one character apart. But I don't think I like it. They already have a non-shifter use (right, Walter? :)). A third would be confusing. Besides, <<= (which would mean: non-strict subset) is recognized as an op= (or: opAssign) operator. I also considered the 'in' operator for non-strict subset, but it's already used as a membership operator, and overloading it for this would also be confusing. I will just create a function for it. -- Michiel
Feb 26 2007
parent reply Stephan Diehl <stephan.diehl gmx.net> writes:
Michiel wrote:
 Stephan Diehl wrote:
 
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.
excelent. I somewhat missed some subset operations. If it were up to me, I'd use the opCmp operator for this (just like they do in python).
You're right! I missed the subset operations. :) Unfortunately the opCmp operator is taken. Implementing that correctly allows a set to be a key-type for an associative array (or the type of another set). I can't make it double as a subset operation, because sets would have to be declared equal if one is not subset of the other. Even if they are not, indeed, equal. This would break their proper definition. For a moment I was tempted to use the shifting operators, because they're only one character apart. But I don't think I like it. They already have a non-shifter use (right, Walter? :)). A third would be confusing. Besides, <<= (which would mean: non-strict subset) is recognized as an op= (or: opAssign) operator. I also considered the 'in' operator for non-strict subset, but it's already used as a membership operator, and overloading it for this would also be confusing. I will just create a function for it.
the python people have 'set' and 'frozenset' (http://docs.python.org/lib/types-set.html). The main difference between these two is basicly, that a 'set' can't be the key to a dict (and therefore not being a member of a set. To make it short: every member of a set must be immutable, or, to be more technical, must always produce the same hash value. This really makes sense, if one thinks about it a bit (and one wants to model sets in the more mathematical sense)
Feb 26 2007
parent reply Michiel <nomail please.com> writes:
Stephan Diehl wrote:

 the python people have 'set' and 'frozenset'
 (http://docs.python.org/lib/types-set.html). The main difference between
 these two is basicly, that a 'set' can't be the key to a dict (and
 therefore not being a member of a set. To make it short: every member of
 a set must be immutable, or, to be more technical, must always produce
 the same hash value. This really makes sense, if one thinks about it a
 bit (and one wants to model sets in the more mathematical sense)
Yes, this does make sense. But my set implementation doesn't provide any write access to its members anyway, so you could say that everything you put into it automatically becomes 'frozen'. Now that I think about it, I suppose it's possible to have another reference to an object that you are going to put into a set and change it through that reference. I'm not quite sure what to do with that. How does D handle this with its associative arrays? You know, I liked C++ better in that regard. If you do an assignment or pass something as an argument it's automatically a copy. In Java and in D you have to explicitly copy something, except if it's a base type. I don't like exceptions like that. -- Michiel
Feb 26 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Michiel wrote:
 Stephan Diehl wrote:
 
 the python people have 'set' and 'frozenset'
 (http://docs.python.org/lib/types-set.html). The main difference between
 these two is basicly, that a 'set' can't be the key to a dict (and
 therefore not being a member of a set. To make it short: every member of
 a set must be immutable, or, to be more technical, must always produce
 the same hash value. This really makes sense, if one thinks about it a
 bit (and one wants to model sets in the more mathematical sense)
Yes, this does make sense. But my set implementation doesn't provide any write access to its members anyway, so you could say that everything you put into it automatically becomes 'frozen'.
Actually, the frozenset type doesn't allow things like add() and remove(). And both set and frozenset disallow changing the elements. At least, according to the page he linked.
 Now that I think about it, I suppose it's possible to have another
 reference to an object that you are going to put into a set and change
 it through that reference. I'm not quite sure what to do with that.
You could require that any user-defined type to be put in a set has a .dup property/member function with postcondition ((orig !is dup) && (orig == dup) && (orig.opCmp(dup) == 0)), i.e. that returns a copy. And that it also doesn't store a reference to it anywhere else... But then opApply would also need to .dup the value to ensure it wasn't modified... This may just be more trouble than it's worth. Perhaps you should just say that things can get really screwed up if you change elements of a set...
 How does D handle this with its associative arrays?
I'm pretty sure it messes up if keys get altered while referenced by the AA, but I can't find that statement in the section of the spec dealing with AAs. [later] Ah, here we go, from the comments page: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=12062 Walter responds: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=12085 So yeah, don't change AA keys.
 You know, I liked C++ better in that regard. If you do an assignment or
 pass something as an argument it's automatically a copy. In Java and in
 D you have to explicitly copy something, except if it's a base type. I
 don't like exceptions like that.
IIRC there have been suggestions to add .dup to base types that just returns their value, for use in generic programming (i.e. templates).
Feb 26 2007
parent Michiel <nomail please.com> writes:
Frits van Bommel wrote:

 Yes, this does make sense. But my set implementation doesn't provide any
 write access to its members anyway, so you could say that everything you
 put into it automatically becomes 'frozen'.
Actually, the frozenset type doesn't allow things like add() and remove(). And both set and frozenset disallow changing the elements. At least, according to the page he linked.
Yes, I meant the elements of the set ('everything you put into it').
 This may just be more trouble than it's worth. Perhaps you should just
 say that things can get really screwed up if you change elements of a
 set...
I think I will. Since that's how D handles this problem at the moment.
 How does D handle this with its associative arrays?
<SNIPPYSNAPSNAP> So yeah, don't change AA keys.
Hm.. This is not a good thing. I'm not saying I have the perfect solution, but this is not a good thing.
 You know, I liked C++ better in that regard. If you do an assignment or
 pass something as an argument it's automatically a copy. In Java and in
 D you have to explicitly copy something, except if it's a base type. I
 don't like exceptions like that.
IIRC there have been suggestions to add .dup to base types that just returns their value, for use in generic programming (i.e. templates).
Well, that would be a start. :) But I'd still like the C++ way better. It could still have Java-style references, but the copy behavior isn't necessary. -- Michiel
Feb 26 2007