digitalmars.D - New set class

Michiel (21/21) Feb 24 2007 I have written a new class to represent sets, with several useful

Bill Baxter (18/26) Feb 24 2007 If you add an opApply with the signature

Michiel (34/60) Feb 25 2007 That's why I didn't do it. What meaning would it have for a set, since

Bill Baxter (61/111) Feb 25 2007 It's useful to the user who wants to print out the items numbered or

Michiel (23/115) Feb 26 2007 I feel that if I added that syntax, it would imply that the counter has

Frits van Bommel (43/87) Feb 26 2007 I agree with you here, sets don't have indexes for their elements, they

Michiel (11/38) Feb 26 2007 Interesting trick. I missed the extra (). One problem, though. If I use

Frits van Bommel (15/35) Feb 26 2007 The code you ped didn't have that problem, so I presume you used

Michiel (6/28) Feb 26 2007 No, I used your code. I just tested again to be sure. It doesn't work.

Frits van Bommel (3/6) Feb 26 2007 Oh, did I put an 'int' in there?

Michiel (9/16) Feb 26 2007 Sounds like room for improvement to me.

Stephan Diehl (3/12) Feb 26 2007 excelent. I somewhat missed some subset operations. If it were up to me,...

Michiel (19/31) Feb 26 2007 You're right! I missed the subset operations. :)

Stephan Diehl (8/44) Feb 26 2007 the python people have 'set' and 'frozenset'

Michiel (14/21) Feb 26 2007 Yes, this does make sense. But my set implementation doesn't provide any

Frits van Bommel (22/43) Feb 26 2007 Actually, the frozenset type doesn't allow things like add() and

Michiel (10/32) Feb 26 2007 I think I will. Since that's how D handles this problem at the moment.

Michiel <nomail please.com> writes:

I have written a new class to represent sets, with several useful
features. Uploaded for your review and free use. I don't know much about
open source and licensing, but I request that if you use the class, that
you keep my name in the author section of the documentation.

http://files.michiel.eu/set.d

Tell me what you think.

There still are some problems, though. I could use some help:

* There are still problems involving static arrays as the set type. The
set literal creates a set of the exact type you specify. So if you use a
string literal, it will be a set of static char arrays. It is only a
sort of hack in the Set class that allows for the dynamic array
equivalent to also be accepted in. What I really need is to implement
the set literal function so it creates a set of the dynamic array type
directly. But I've run into some problems, sometimes strangely involving
char-encodings. (invalid UTF-8 sequence, printed to the terminal at
runtime when I try to print the toString())

* When I enable unit testing, several things can go wrong. Sometimes I
get a segmentation fault. Sometimes I get a program that doesn't
terminate and eats all of the memory until I kill it.

-- 
Michiel

Feb 24 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Michiel wrote:
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.
 
 http://files.michiel.eu/set.d
 
 Tell me what you think.

If you add an opApply with the signature
    int opApply(int delegate(uint i, inout Type) dg) { ... }

it will allow this kind of foreach too:
    foreach(i, e; myset) {
        ...
    }

Also I would just leave out opApplyReverse.  That way using it is a 
compile-time error.  Or if you're dead set on it, at least have it throw 
an exception.  I believe assert() is a no-op in release builds, meaning 
your program will fail in strange ways in release.  Actually I'm 
surprised that compiles without a return statement.

Finally
    Set!(T[0]) set(T...)(T elements)
why not just
    Set!(T) set(T)(T elements...) ??

Have you looked at the HashSet implementation in Tango for comparison?

--bb

Feb 24 2007

Michiel <nomail please.com> writes:

Bill Baxter wrote:

 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.

 
 If you add an opApply with the signature
    int opApply(int delegate(uint i, inout Type) dg) { ... }
 
 it will allow this kind of foreach too:
    foreach(i, e; myset) {
        ...
    }

That's why I didn't do it. What meaning would it have for a set, since
they have no index?

 Also I would just leave out opApplyReverse.  That way using it is a
 compile-time error.  Or if you're dead set on it, at least have it throw
 an exception.  I believe assert() is a no-op in release builds, meaning
 your program will fail in strange ways in release.  Actually I'm
 surprised that compiles without a return statement.

assert(0) is evaluated by D as a point in the code you can't ever reach.
I got the idea from the opCmp example:

--------------------------------------
For some objects, testing for less or greater makes no sense. For these,
override opCmp() with:

class A
{
    int opCmp(Object o)
    {
	assert(0);	// comparison makes no sense
	return 0;
    }
}
--------------------------------------

I see that that has a return statement, but I don't see why it would
need one. assert(0) always exists the function. I think it's a feature
of the compiler that it recognizes this.

 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??

You only use that syntax if T is an aggregate type. So the resulting set
would actually be Set!(T[0]) anyway. I think I tried this before and had
some problems, but I'm not certain now.

 Have you looked at the HashSet implementation in Tango for comparison?

I had not. Is this the one?

http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html

Though I haven't looked at the source code, I see it exposes many of its
internal workings to the outside. When you are working with a set you
don't want to know if it's using a hash table or a balanced tree. Just
that it implements a set.

Though I'm sure that they put more work into it than I and that it's
more efficient. I just sneakily used an associative array on the inside.
Which uses its own hash table, if I'm not mistaken.

-- 
Michiel

Feb 25 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Michiel wrote:
 Bill Baxter wrote:
 
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.

 If you add an opApply with the signature
    int opApply(int delegate(uint i, inout Type) dg) { ... }

 it will allow this kind of foreach too:
    foreach(i, e; myset) {
        ...
    }

 
 That's why I didn't do it. What meaning would it have for a set, since
 they have no index?

It's useful to the user who wants to print out the items numbered or 
something.  Or they have a set of 10 items and they want to copy them to 
a preallocated 10-item list.  Or they want to make another associative 
array that maps numbers to the items in the set.  Or they want to do 
something with only the first five items in the set then if(i>5) break;. 
  Or they want to append a number to every item in the set.  Or they 
want to split the set into two smaller sets arbitrarily using if (i%2).

Yes the user can always declare their own damn counter, but it doesn't 
make your code much more complex, and it does make their lives a little 
bit easier.

 
 Also I would just leave out opApplyReverse.  That way using it is a
 compile-time error.  Or if you're dead set on it, at least have it throw
 an exception.  I believe assert() is a no-op in release builds, meaning
 your program will fail in strange ways in release.  Actually I'm
 surprised that compiles without a return statement.

 
 assert(0) is evaluated by D as a point in the code you can't ever reach.
 I got the idea from the opCmp example:
 

Interesting.  I didn't realize that.  That's really not obvious from the 
docs that that's what happens:

"""
The expression assert(0) is a special case; it signifies that it is 
unreachable code. Either AssertError is thrown at runtime if it is 
reachable, or the execution is halted (on the x86 processor, a HLT 
instruction can be used to halt execution). The optimization and code 
generation phases of compilation may assume that it is unreachable code.
"""  -- http://www.digitalmars.com/d/expression.html#AssertExpression

Nowhere does it say it that it is "unconditional regardless of 
debug/release flags".

This page talks about asserts but doesn't mention debug/release behavior 
either.
http://www.digitalmars.com/d/dbc.html

Actually I can't find where disabling asserts for release builds is 
documented, though I'm sure I've seen it somewhere before.  A quick 
test, however, reveals that you are indeed correct: "regular" asserts 
turn off in a release build, but assert(0) does not.

Well that's good to know.  Still, it generates a runtime error, when you 
really would prefer a compile-time error.   So I think it's a bad idea 
to start putting in lots of
    opFoo() { assert(0,"This method doesn't exist"); }
Once you start putting those in, where do you stop?  I'm sure there are 
other operators and functions that Set does not implement.

It's almost like comments along the lines of:
    // this is an integer that does the counting:
    int counter;

The fact that the method is not there is all the documentation you need.

But if you really want to do put such things in your code you could try 
this:
struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
}

Since foo is a template member, no attempt is made to compile it unless 
it's actually called from somewhere.  If you do try to call it then you 
trip the static assert at compile-time.

 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??

 
 You only use that syntax if T is an aggregate type. So the resulting set
 would actually be Set!(T[0]) anyway. I think I tried this before and had
 some problems, but I'm not certain now.

I don't follow what you mean about "aggregate type".  The unittest uses 
int as the type.  I don't think 'int' is considered an aggregate.



 Have you looked at the HashSet implementation in Tango for comparison?

 
 I had not. Is this the one?
 
 http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html

Yes.

 Though I haven't looked at the source code, I see it exposes many of its
 internal workings to the outside. When you are working with a set you
 don't want to know if it's using a hash table or a balanced tree. Just
 that it implements a set.

It also doesn't include set operations like you have.  (union, 
intersection, difference, symmetric difference).  And I agree with you 
that it looks like it exposes too much of its innards for most users.

 Though I'm sure that they put more work into it than I and that it's
 more efficient. I just sneakily used an associative array on the inside.
 Which uses its own hash table, if I'm not mistaken.

It looks to be based on code by Doug Lea, who wrote dlmalloc, one of the 
fastest implementations of malloc around.  So I suspect performance is 
pretty good.


Another thing you might want to add to your set class is opCat (~) and 
opCatAssign (~=).  Those are pretty commonly used in D for appending 
something to something else.  They'd just be synonyms for add and union.

--bb

Feb 25 2007

Michiel <nomail please.com> writes:

Bill Baxter wrote:

 If you add an opApply with the signature
    int opApply(int delegate(uint i, inout Type) dg) { ... }

 it will allow this kind of foreach too:
    foreach(i, e; myset) {
        ...
    }

 That's why I didn't do it. What meaning would it have for a set, since
 they have no index?

 
 It's useful to the user who wants to print out the items numbered or
 something.  Or they have a set of 10 items and they want to copy them to
 a preallocated 10-item list.  Or they want to make another associative
 array that maps numbers to the items in the set.  Or they want to do
 something with only the first five items in the set then if(i>5) break;.
  Or they want to append a number to every item in the set.  Or they want
 to split the set into two smaller sets arbitrarily using if (i%2).
 
 Yes the user can always declare their own damn counter, but it doesn't
 make your code much more complex, and it does make their lives a little
 bit easier.

I feel that if I added that syntax, it would imply that the counter has
some sort of relevance to the set, which it really doesn't.

A counter that increments by one each iteration has real meaning for an
array. So does a key-type for an associative array. But for a set...
sorry, I'd just feel dirty adding it.

 Also I would just leave out opApplyReverse.  That way using it is a
 compile-time error.  Or if you're dead set on it, at least have it throw
 an exception.  I believe assert() is a no-op in release builds, meaning
 your program will fail in strange ways in release.  Actually I'm
 surprised that compiles without a return statement.

 assert(0) is evaluated by D as a point in the code you can't ever reach.
 I got the idea from the opCmp example:

 
 Interesting.  I didn't realize that.  That's really not obvious from the
 docs that that's what happens:
 
 """
 The expression assert(0) is a special case; it signifies that it is
 unreachable code. Either AssertError is thrown at runtime if it is
 reachable, or the execution is halted (on the x86 processor, a HLT
 instruction can be used to halt execution). The optimization and code
 generation phases of compilation may assume that it is unreachable code.
 """  -- http://www.digitalmars.com/d/expression.html#AssertExpression
 
 Nowhere does it say it that it is "unconditional regardless of
 debug/release flags".
 
 This page talks about asserts but doesn't mention debug/release behavior
 either.
 http://www.digitalmars.com/d/dbc.html
 
 Actually I can't find where disabling asserts for release builds is
 documented, though I'm sure I've seen it somewhere before.  A quick
 test, however, reveals that you are indeed correct: "regular" asserts
 turn off in a release build, but assert(0) does not.
 
 Well that's good to know.  Still, it generates a runtime error, when you
 really would prefer a compile-time error.   So I think it's a bad idea
 to start putting in lots of
    opFoo() { assert(0,"This method doesn't exist"); }
 Once you start putting those in, where do you stop?  I'm sure there are
 other operators and functions that Set does not implement.
 
 It's almost like comments along the lines of:
    // this is an integer that does the counting:
    int counter;
 
 The fact that the method is not there is all the documentation you need.
 
 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }
 
 Since foo is a template member, no attempt is made to compile it unless
 it's actually called from somewhere.  If you do try to call it then you
 trip the static assert at compile-time.

No, if you use a static assert, the compilation will fail as soon as you
instantiate the class somewhere. Not if you try to use foreach_reverse.
If it weren't a template class, the compiler would throw the error in
any case.

You're right that a compiler error would be preferable to a runtime
error, but simply leaving the function out doesn't give as clear an
error message ("Error: cannot infer type for e"). It would be ideal if I
could have both.

But well, I don't understand why unittests are done at runtime either.

 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??

 You only use that syntax if T is an aggregate type. So the resulting set
 would actually be Set!(T[0]) anyway. I think I tried this before and had
 some problems, but I'm not certain now.

 
 I don't follow what you mean about "aggregate type".  The unittest uses
 int as the type.  I don't think 'int' is considered an aggregate.

Well, That's what the 'void func(T name...)' syntax is for. If T is an
aggregate type (array, class, struct, etc.) you can either supply that
type to the func, or all of its members separated by comma's.

I suspect that you meant this one: 'void func(T name, ...)', which is
something different. I haven't tried it, and I will.

 Another thing you might want to add to your set class is opCat (~) and
 opCatAssign (~=).  Those are pretty commonly used in D for appending
 something to something else.  They'd just be synonyms for add and union.

Good idea. I'll add them.

-- 
Michiel

Feb 26 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Michiel wrote:
 Bill Baxter wrote:
 

[snip suggestion about counters in set-foreach]
 I feel that if I added that syntax, it would imply that the counter has
 some sort of relevance to the set, which it really doesn't.
 
 A counter that increments by one each iteration has real meaning for an
 array. So does a key-type for an associative array. But for a set...
 sorry, I'd just feel dirty adding it.

I agree with you here, sets don't have indexes for their elements, they 
are unordered.

 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }

 Since foo is a template member, no attempt is made to compile it unless
 it's actually called from somewhere.  If you do try to call it then you
 trip the static assert at compile-time.

 
 No, if you use a static assert, the compilation will fail as soon as you
 instantiate the class somewhere. Not if you try to use foreach_reverse.
 If it weren't a template class, the compiler would throw the error in
 any case.

The _member_ is _also_ a template here (with 0 parameters, but a 
template nonetheless). So compilation will only fail if that template is 
instantiated, which will happen only if foreach_reverse is used (or it's 
explicitly called, of course).

Here, let me show you (some modifications made to above code)
---
urxae urxae:~/tmp$ cat test.d
import std.stdio;

struct Set(T) {
     void opApplyReverse()(int delegate(inout T)) { static 
assert(0,"don't be silly");}
}

void main()
{
     Set!(int) x;
     version(ForeachReverse)
         foreach_reverse(int e; x) {}
}
urxae urxae:~/tmp$ dmd test.d && ./test
gcc test.o -o test -m32 -lphobos -lpthread -lm -Xlinker 
-L/home/urxae/opt/dmd/lib

urxae urxae:~/tmp$ dmd -version=ForeachReverse test.d
test.d(4): static assert  (0) is false, "don't be silly"
---

The static assert is only triggered with -version=ForeachReverse.

 You're right that a compiler error would be preferable to a runtime
 error, but simply leaving the function out doesn't give as clear an
 error message ("Error: cannot infer type for e"). It would be ideal if I
 could have both.

No, "cannot infer type for e" is indeed a bit unclear, it doesn't 
explain *why* it happened.

 But well, I don't understand why unittests are done at runtime either.

Yeah, there's been some discussion about that. A lot of people would 
like them to be ran after compiling & linking, but so far no change has 
been made to allow this to be done.

 Finally
    Set!(T[0]) set(T...)(T elements)
 why not just
    Set!(T) set(T)(T elements...) ??

 You only use that syntax if T is an aggregate type. So the resulting set
 would actually be Set!(T[0]) anyway. I think I tried this before and had
 some problems, but I'm not certain now.

 I don't follow what you mean about "aggregate type".  The unittest uses
 int as the type.  I don't think 'int' is considered an aggregate.

 
 Well, That's what the 'void func(T name...)' syntax is for. If T is an
 aggregate type (array, class, struct, etc.) you can either supply that
 type to the func, or all of its members separated by comma's.
 
 I suspect that you meant this one: 'void func(T name, ...)', which is
 something different. I haven't tried it, and I will.

I suspect he may have meant the "void func(T)(T[] name...)" syntax (or 
in this case, "Set!(T) set(T)(T[] elements...)"). That would let the 
user pass as many arguments as he wants, while presenting them as a nice 
typesafe array to the library.
By the way, "void func(int name...)" is also perfectly valid syntax but 
only allows one argument to be specified.
This is all explained in the section "Typesafe Variadic Functions" of 
http://www.digitalmars.com/d/function.html.

Feb 26 2007

Michiel <nomail please.com> writes:

Frits van Bommel wrote:

 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }

 Since foo is a template member, no attempt is made to compile it unless
 it's actually called from somewhere.  If you do try to call it then you
 trip the static assert at compile-time.

 No, if you use a static assert, the compilation will fail as soon as you
 instantiate the class somewhere. Not if you try to use foreach_reverse.
 If it weren't a template class, the compiler would throw the error in
 any case.

 
 The _member_ is _also_ a template here (with 0 parameters, but a
 template nonetheless). So compilation will only fail if that template is
 instantiated, which will happen only if foreach_reverse is used (or it's
 explicitly called, of course).
 
 Here, let me show you (some modifications made to above code)
 <SNIP>

Interesting trick. I missed the extra (). One problem, though. If I use
it like this:

foreach_reverse(e ; x) { }

Not specifying the element type (int). I get the "cannot infer type"
error, instead of our custom error message. Why can't the compiler infer
the type? Isn't it right in its face?

 I suspect he may have meant the "void func(T)(T[] name...)" syntax (or
 in this case, "Set!(T) set(T)(T[] elements...)"). That would let the
 user pass as many arguments as he wants, while presenting them as a nice
 typesafe array to the library.

I did try some variations of that. I couldn't make it work quite the way
I wanted.

-- 
Michiel

Feb 26 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Michiel wrote:
 Frits van Bommel wrote:
 
 But if you really want to do put such things in your code you could try
 this:
 struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
 }




[snip Michiel's disbelief and my explanation]
 Here, let me show you (some modifications made to above code)
 <SNIP>

 
 Interesting trick. I missed the extra (). One problem, though. If I use
 it like this:
 
 foreach_reverse(e ; x) { }
 
 Not specifying the element type (int). I get the "cannot infer type"
 error, instead of our custom error message. Why can't the compiler infer
 the type? Isn't it right in its face?

The code you <SNIP>ped didn't have that problem, so I presume you used 
the declaration Bill Baxter posted. Guess why I changed it for my 
example ;).
The compiler can't infer the type in Bill's code because his 
opApplyReverse has the wrong signature. In order to get type deduction 
(as well as instantiation) opApplyReverse needs to have as parameter a 
delegate with the correct number of (inout) arguments that returns int.
In my code I had
---
void opApplyReverse()(int delegate(inout T)) { static assert(0,"don't be 
silly");}
---
and it worked.

Feb 26 2007

Michiel <nomail please.com> writes:

Frits van Bommel wrote:

 Interesting trick. I missed the extra (). One problem, though. If I use
 it like this:

 foreach_reverse(e ; x) { }

 Not specifying the element type (int). I get the "cannot infer type"
 error, instead of our custom error message. Why can't the compiler infer
 the type? Isn't it right in its face?

 
 The code you <SNIP>ped didn't have that problem, so I presume you used
 the declaration Bill Baxter posted. Guess why I changed it for my
 example ;).
 The compiler can't infer the type in Bill's code because his
 opApplyReverse has the wrong signature. In order to get type deduction
 (as well as instantiation) opApplyReverse needs to have as parameter a
 delegate with the correct number of (inout) arguments that returns int.
 In my code I had
 ---
 void opApplyReverse()(int delegate(inout T)) { static assert(0,"don't be
 silly");}
 ---
 and it worked.

No, I used your code. I just tested again to be sure. It doesn't work.
If you leave out the 'int' in the foreach_reverse statement, the silly
error message doesn't get displayed.

-- 
Michiel

Feb 26 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Michiel wrote:
 No, I used your code. I just tested again to be sure. It doesn't work.
 If you leave out the 'int' in the foreach_reverse statement, the silly
 error message doesn't get displayed.

Oh, did I put an 'int' in there?

Then I guess type inference doesn't work with template functions :(.

Feb 26 2007

Michiel <nomail please.com> writes:

Frits van Bommel wrote:

 No, I used your code. I just tested again to be sure. It doesn't work.
 If you leave out the 'int' in the foreach_reverse statement, the silly
 error message doesn't get displayed.

 
 Oh, did I put an 'int' in there?
 
 Then I guess type inference doesn't work with template functions :(.

Sounds like room for improvement to me.

Not that I'm ruling out that we just don't know how to do this yet. Any
clarification would be appreciated. Until then I'm sticking with my
runtime error message. I agree it's a shame, because this is exactly the
sort of error the compiler should catch. I would think this should be
able to expand to self-written error messages.

-- 
Michiel

Feb 26 2007

Stephan Diehl <stephan.diehl gmx.net> writes:

Michiel wrote:
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.
 
 http://files.michiel.eu/set.d
 
 Tell me what you think.
 

excelent. I somewhat missed some subset operations. If it were up to me, 
I'd use the opCmp operator for this (just like they do in python).

Feb 26 2007

Michiel <nomail please.com> writes:

Stephan Diehl wrote:

 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.

 
 excelent. I somewhat missed some subset operations. If it were up to me,
 I'd use the opCmp operator for this (just like they do in python).

You're right! I missed the subset operations. :)

Unfortunately the opCmp operator is taken. Implementing that correctly
allows a set to be a key-type for an associative array (or the type of
another set).

I can't make it double as a subset operation, because sets would have to
be declared equal if one is not subset of the other. Even if they are
not, indeed, equal. This would break their proper definition.

For a moment I was tempted to use the shifting operators, because
they're only one character apart. But I don't think I like it. They
already have a non-shifter use (right, Walter? :)). A third would be
confusing. Besides, <<= (which would mean: non-strict subset) is
recognized as an op= (or: opAssign) operator.

I also considered the 'in' operator for non-strict subset, but it's
already used as a membership operator, and overloading it for this would
also be confusing.

I will just create a function for it.

-- 
Michiel

Feb 26 2007

Stephan Diehl <stephan.diehl gmx.net> writes:

Michiel wrote:
 Stephan Diehl wrote:
 
 I have written a new class to represent sets, with several useful
 features. Uploaded for your review and free use. I don't know much about
 open source and licensing, but I request that if you use the class, that
 you keep my name in the author section of the documentation.

 http://files.michiel.eu/set.d

 Tell me what you think.

 excelent. I somewhat missed some subset operations. If it were up to me,
 I'd use the opCmp operator for this (just like they do in python).

 
 You're right! I missed the subset operations. :)
 
 Unfortunately the opCmp operator is taken. Implementing that correctly
 allows a set to be a key-type for an associative array (or the type of
 another set).
 
 I can't make it double as a subset operation, because sets would have to
 be declared equal if one is not subset of the other. Even if they are
 not, indeed, equal. This would break their proper definition.
 
 For a moment I was tempted to use the shifting operators, because
 they're only one character apart. But I don't think I like it. They
 already have a non-shifter use (right, Walter? :)). A third would be
 confusing. Besides, <<= (which would mean: non-strict subset) is
 recognized as an op= (or: opAssign) operator.
 
 I also considered the 'in' operator for non-strict subset, but it's
 already used as a membership operator, and overloading it for this would
 also be confusing.
 
 I will just create a function for it.
 

the python people have 'set' and 'frozenset' 
(http://docs.python.org/lib/types-set.html). The main difference between 
these two is basicly, that a 'set' can't be the key to a dict (and 
therefore not being a member of a set. To make it short: every member of 
a set must be immutable, or, to be more technical, must always produce 
the same hash value. This really makes sense, if one thinks about it a 
bit (and one wants to model sets in the more mathematical sense)

Feb 26 2007

Michiel <nomail please.com> writes:

Stephan Diehl wrote:

 the python people have 'set' and 'frozenset'
 (http://docs.python.org/lib/types-set.html). The main difference between
 these two is basicly, that a 'set' can't be the key to a dict (and
 therefore not being a member of a set. To make it short: every member of
 a set must be immutable, or, to be more technical, must always produce
 the same hash value. This really makes sense, if one thinks about it a
 bit (and one wants to model sets in the more mathematical sense)

Yes, this does make sense. But my set implementation doesn't provide any
write access to its members anyway, so you could say that everything you
put into it automatically becomes 'frozen'.

Now that I think about it, I suppose it's possible to have another
reference to an object that you are going to put into a set and change
it through that reference. I'm not quite sure what to do with that.

How does D handle this with its associative arrays?

You know, I liked C++ better in that regard. If you do an assignment or
pass something as an argument it's automatically a copy. In Java and in
D you have to explicitly copy something, except if it's a base type. I
don't like exceptions like that.

-- 
Michiel

Feb 26 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Michiel wrote:
Stephan Diehl wrote:

the python people have 'set' and 'frozenset'
(http://docs.python.org/lib/types-set.html). The main difference between
these two is basicly, that a 'set' can't be the key to a dict (and
therefore not being a member of a set. To make it short: every member of
a set must be immutable, or, to be more technical, must always produce
the same hash value. This really makes sense, if one thinks about it a
bit (and one wants to model sets in the more mathematical sense)

Yes, this does make sense. But my set implementation doesn't provide any
write access to its members anyway, so you could say that everything you
put into it automatically becomes 'frozen'.

Actually, the frozenset type doesn't allow things like add() and
remove(). And both set and frozenset disallow changing the elements.
At least, according to the page he linked.

Now that I think about it, I suppose it's possible to have another
reference to an object that you are going to put into a set and change
it through that reference. I'm not quite sure what to do with that.

You could require that any user-defined type to be put in a set has a
.dup property/member function with postcondition ((orig !is dup) &&
(orig == dup) && (orig.opCmp(dup) == 0)), i.e. that returns a copy. And
that it also doesn't store a reference to it anywhere else...
But then opApply would also need to .dup the value to ensure it wasn't
modified...

This may just be more trouble than it's worth. Perhaps you should just
say that things can get really screwed up if you change elements of a set...

How does D handle this with its associative arrays?

I'm pretty sure it messes up if keys get altered while referenced by the
AA, but I can't find that statement in the section of the spec dealing
with AAs.
[later] Ah, here we go, from the comments page:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=12062
Walter responds:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=12085

So yeah, don't change AA keys.

You know, I liked C++ better in that regard. If you do an assignment or
pass something as an argument it's automatically a copy. In Java and in
D you have to explicitly copy something, except if it's a base type. I
don't like exceptions like that.

IIRC there have been suggestions to add .dup to base types that just
returns their value, for use in generic programming (i.e. templates).

Feb 26 2007

Michiel <nomail please.com> writes:

Frits van Bommel wrote:

 Yes, this does make sense. But my set implementation doesn't provide any
 write access to its members anyway, so you could say that everything you
 put into it automatically becomes 'frozen'.

 
 Actually, the frozenset type doesn't allow things like add() and
 remove(). And both set and frozenset disallow changing the elements.
 At least, according to the page he linked.

Yes, I meant the elements of the set ('everything you put into it').

 This may just be more trouble than it's worth. Perhaps you should just
 say that things can get really screwed up if you change elements of a
 set...

I think I will. Since that's how D handles this problem at the moment.

 How does D handle this with its associative arrays?

 
 <SNIPPYSNAPSNAP>
 
 So yeah, don't change AA keys.

Hm.. This is not a good thing. I'm not saying I have the perfect
solution, but this is not a good thing.

 You know, I liked C++ better in that regard. If you do an assignment or
 pass something as an argument it's automatically a copy. In Java and in
 D you have to explicitly copy something, except if it's a base type. I
 don't like exceptions like that.

 
 IIRC there have been suggestions to add .dup to base types that just
 returns their value, for use in generic programming (i.e. templates).

Well, that would be a start. :) But I'd still like the C++ way better.
It could still have Java-style references, but the copy behavior isn't
necessary.

-- 
Michiel

Feb 26 2007

D Programming

C/C++ Programming

Other

digitalmars.D - New set class