www.digitalmars.com         C & C++   DMDScript  

D - Ode to the set

reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Hi,

One of the features I miss the most in C is the set. Being a member of the 
Free Pascal development team, I program a lot in Pascal. Now I don't want 
to say that Pascal rules and C sucks, but there are some things that Pascal 
is superior in.

Why do you need a set?

I think it can best be illustrated with an example. The following code code 
from the AOLserver http server and is used to encode a parameter to place 
in in a URL. For example the text "This is a sentence" is changed into 
"This%20is%20a%20sentence".

char *
Ns_EncodeUrl(Ns_DString *pds, char *string)
{
    static char safechars[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    while (*string != '\0') {

        if (strchr(safechars, *string) != NULL) {
            Ns_DStringNAppend(pds, string, 1);
        } else {
            char buf[4];
            sprintf(buf, "%%%02x", (unsigned char) *string);
            Ns_DStringNAppend(pds, buf, 3);
        }
        ++string;
    }
    return pds->string;
}

That's really horrible inefficient isn't it? But it's easy to maintain, if 
the safechars are not correct, you can correct it by changing it. Now how 
would you do this into Pascal?

function ns_encodeurl(s:string):string;

const   safechars=['a'..'z','A'..'Z','0'..'9'];

var     i:longint;

begin
    ns_encodeurl:='';
    for i:=1 to length(s) do
        if s[i] in safechars then
            ns_encodeurl:=ns_encodeurl+s[i]
        else
            ns_encodeurl:=ns_encodeurl+'%'+hexstr(byte(s[i]));
end;

That looks much cleaner! And it's much more efficient; instead of a strchr 
call for each character the compiler now only needs a set lookup. And 
because this is a constant set, the compiler can optimize this into a few 
compare operations.

Set's make also good flags. In C you usually see code like this:

#define borders_enabled         1
#define shadow_enabled          2
#define lights_enabled          4

And then:

void do_nice_things(int flags) {
    if (flags & borders_enabled) {
    };
};

Now how would you do it with a set? A simple enumeration does the trick:

type    effects=(borders_enabled,shadow_enabled,lights_enabled);

And then:

procedure do_nice_things(flags:set of effects);

begin
    if borders_enabled in effects then
        begin
        end;
end;

... so no terrible long header files. Just simple enumerations, while the 
compiler will propably generate the same code.

So, how about a set in D??

Greetings,

Daniel Mantione
Aug 21 2001
next sibling parent reply "Angus Graham" <agraham_d agraham.ca> writes:
"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a strchr
 call for each character the compiler now only needs a set lookup. And
 because this is a constant set, the compiler can optimize this into a few
 compare operations.
Well, to be fair, the C version doesn't have to use strchr. You could just change if (strchr(safechars, *string) != NULL) { to char c = *string; if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) ) or even char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) || (c>=set[2][0] && set[2][1]>=c) ) or finally char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; for( int i=0; i<sizeof(set); ++i ) { found = (c>=set[i][0] && set[i][1]>=c); } if(found) { which is a little hectic, bit is what I presume the Pascal version does internally. Angus Graham
Aug 21 2001
next sibling parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Angus Graham wrote:
 
 "Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a strchr
 call for each character the compiler now only needs a set lookup. And
 because this is a constant set, the compiler can optimize this into a few
 compare operations.
Well, to be fair, the C version doesn't have to use strchr. You could just change... [options snipped]
Those alternatives aren't very pretty, IMO. strchr()'s actually pretty efficient -- only one function call involved, and I presume the typical Pascal system is going to make a function call to do a set lookup. However, strchr() isn't a general object-in-set function. For the general case, are D's associative arrays good enough, Daniel? -RB
Aug 21 2001
parent reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Russell Bornschlegel wrote:

 Those alternatives aren't very pretty, IMO.
That's indeed the point. Code like this is hard to maintain.
 strchr()'s actually pretty efficient -- only one function call involved,
 and I presume the typical Pascal system is going to make a function call
 to do a set lookup. However, strchr() isn't a general object-in-set
 function.
Our own Free Pascal compiler uses compares if the amount of compares that are needed is smaller than 9. If the amount of compares is greater than 9, it creates a bit map, and checks if bit number x is set. So: if s[i] in ['0'..'9'] .. will generate the same code as .. if i>='0' and i<='9' .. while .. .. will generate code similar to .. if bitmap[s[i]] then -- Both are much more efficient than a strchr call. Furthermore, strchr is often optmized for very long strings, look at the implementation in the GNU libc for an example how much code it is.
 For the general case, are D's associative arrays good enough, Daniel?
They are a cool feature, but how would you use them as a set replacement? Daniel
Aug 21 2001
next sibling parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Daniel Mantione wrote:
 
 Russell Bornschlegel wrote:
 For the general case, are D's associative arrays good enough, Daniel?
They are a cool feature, but how would you use them as a set replacement?
Associative array of bit; 1 = exists, 0 = unexists. bit[ char ] myset; // set of char // do we need to explicitly zero things here? myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 1; // 'y' exists in set myset[ 'j' ] = 1; // 'j' exists in set if (myset['j'] == 1) { ... } myset['y'] = 0; // take y out of the set if (myset['y'] == 1) { ... } Alternately, ignore the contents of each element, and use the existence of an element in the associative array as set membership. int[ char ] myset; // set of char myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 42; // 'y' exists in set myset[ 'j' ] = 3721; // 'j' exists in set if ('j' in myset) { ... } delete myset['y']; // take y out of the set if ('y' in myset) { ... } The D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)). The D compiler could conceivably do weird optimizations on bit-type associative arrays. -RB
Aug 22 2001
next sibling parent reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Hi,

No, this won't do as replacement; it is much to cumbersome to use. You are 
right that a set allways needs to be a bitmap in unoptimized form. The 
compiler can then decide if it optimizes the bitmap away.

Try to code my first example with such an array. It's no replacement.

Furthermore, sets also have union, difference and subset operations, which 
you don't want to hard code.

Daniel

Russell Bornschlegel wrote:

 They are a cool feature, but how would you use them as a set replacement?
Associative array of bit; 1 = exists, 0 = unexists. bit[ char ] myset; // set of char // do we need to explicitly zero things here? myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 1; // 'y' exists in set myset[ 'j' ] = 1; // 'j' exists in set if (myset['j'] == 1) { ... } myset['y'] = 0; // take y out of the set if (myset['y'] == 1) { ... } Alternately, ignore the contents of each element, and use the existence of an element in the associative array as set membership. int[ char ] myset; // set of char myset[ 'x' ] = 1; // 'x' exists in set myset[ 'y' ] = 42; // 'y' exists in set myset[ 'j' ] = 3721; // 'j' exists in set if ('j' in myset) { ... } delete myset['y']; // take y out of the set if ('y' in myset) { ... } The D spec is silent on the implementation details for associative arrays, but they'd likely be binary trees (ordered keys, lookup time normally O(log2(n))) or hashes (disordered keys, lookup time normally O(1)). The D compiler could conceivably do weird optimizations on bit-type associative arrays. -RB
Aug 22 2001
parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Daniel Mantione wrote:

 No, this won't do as replacement; it is much to cumbersome to use. You are
 right that a set allways needs to be a bitmap in unoptimized form. The
 compiler can then decide if it optimizes the bitmap away.
 
 Try to code my first example with such an array. It's no replacement.
Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons, not the least of which is C's inadequate string handling. In terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :) In terms of implementation cleanliness, Pascal wins out here. In D, I think the _usage_ of the set is just as clean; if assoc-array-of-bits is optimized specially, then performance should be basically equal, and that just leaves the instantiation of the set. So, I concede the desirability of static range-initialization in associative arrays in D: char[] hexlate = "0123456789ABCDEF"; bit[char] safechars; // we want a way of saying this in a static initializer safechars[ 'a'..'z' ] = 1; safechars[ 'A'..'Z' ] = 1; safechars[ '0'..'9' ] = 1; // all others left undefined/null/nonexistent char[] source, destination; for (i = 0; i < source.length(); i++) { char c = source[i]; if (c in safechars) { // append character destination.append( c ); // Walter: does destination += c do the Right Thing? } else { destination.append( '%%' ); destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] ); // upper 4 bits of c as hex digit destination.append( hexlate[ (c & 0x0F) ] ); // lower 4 bits of c as hex digit } } Which seems reasonably clean to me. It's not clear to me from my reading of the D spec whether Walter intended range initialization to work on associative arrays. -RB
Aug 22 2001
parent reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Hi,

Yes, this looks like a very promising implementation to use sets. What you 
are essentially saying is that an array of bits has set semantics, like the 
"in" operator.

This fits very well in the philosophy of the language. In C you can use an 
int as boolean, in Pascal not. This can apply to arrays/sets as set as well.

Still, I would propose to also have some kind of set constructor, to be 
able to code:

if (i in ['a'..'z']) {};

So the set constructor ['a'..'z'] constructs an associative array of bits. 
Now the compiler can analize this and see that the same result can be 
achieved by doing compares. In fact, Free Pascal does this when it 
encounters a constant set.

Note that a set constructor is not something to write down a constant set, 
but an expression that results in a set. The following code is completely 
legal:

var a,b:longint;
    c:set;

begin
    a:=1;
    b:=3;
    c:=[a,b,2*b+7,4,5,6];
end;

How about using [...] as set constructor in D? I don't think it makes the 
syntax ambiguous. Or do we need a different constructor?

Your example would then be written as:

   char[] hexlate = "0123456789ABCDEF";
   bit[char] safechars = ['a'..'z','A'..'Z','0'..'9'];
 
   // all others left undefined/null/nonexistent
 
   char[] source, destination;
 
   for (i = 0; i < source.length(); i++)
   {
     char c = source[i];
     if (c in safechars)
     {
         // append character
         destination.append( c );
         // Walter: does destination += c do the Right Thing?
     }
     else
     {

         destination.append( '%%' );
         destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );
         destination.append( hexlate[ (c & 0x0F) ] );
     }
   }

Russell Bornschlegel wrote:

 Daniel Mantione wrote:
 
 No, this won't do as replacement; it is much to cumbersome to use. You
 are right that a set allways needs to be a bitmap in unoptimized form.
 The compiler can then decide if it optimizes the bitmap away.
 
 Try to code my first example with such an array. It's no replacement.
Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons, not the least of which is C's inadequate string handling. In terms of performance efficiency, in C, the set lookup is going to get swamped by the sprintf. In Pascal, the set lookup may be faster but it'll be drowned out over the long haul by array bounds checking. :)
What do you mean? There are no arrays involved.
 In terms of implementation cleanliness, Pascal wins out here. In D, I
 think the _usage_ of the set is just as clean; if assoc-array-of-bits is
 optimized specially, then performance should be basically equal, and that
 just leaves the instantiation of the set.
 
 So, I concede the desirability of static range-initialization in
 associative arrays in D:
 
   char[] hexlate = "0123456789ABCDEF";
   bit[char] safechars;
 
   // we want a way of saying this in a static initializer
   safechars[ 'a'..'z' ] = 1;
   safechars[ 'A'..'Z' ] = 1;
   safechars[ '0'..'9' ] = 1;
 
   // all others left undefined/null/nonexistent
 
   char[] source, destination;
 
   for (i = 0; i < source.length(); i++)
   {
     char c = source[i];
     if (c in safechars)
     {
         // append character
         destination.append( c );
 // Walter: does destination += c do the Right Thing?
     }
     else
     {

         destination.append( '%%' );
         destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );   // upper 4
         bits of c as hex digit
         destination.append( hexlate[ (c & 0x0F) ] );            // lower 4
         bits of c as hex digit
     }
   }
 
 Which seems reasonably clean to me. It's not clear to me from my reading
 of the D spec whether Walter intended range initialization to work on
 associative arrays.
 
 -RB
Greetings, Daniel Mantione
Aug 22 2001
parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Daniel Mantione wrote:
 Yes, this looks like a very promising implementation to use sets. What you
 are essentially saying is that an array of bits has set semantics, like the
 "in" operator.
As currently specified, though, associative-array-of-bit actually has three states per key: 1, 0, or not-in-set. That's not a great fit to an underlying array-of-bit implementation.
 Still, I would propose to also have some kind of set constructor, to be
 able to code:
 
 if (i in ['a'..'z']) {};
 
 So the set constructor ['a'..'z'] constructs an associative array of bits.
 Now the compiler can analize this and see that the same result can be
 achieved by doing compares. In fact, Free Pascal does this when it
 encounters a constant set.
OK.
 In terms of performance efficiency, in C, the set lookup is going to get
 swamped by the sprintf. In Pascal, the set lookup may be faster but it'll
 be drowned out over the long haul by array bounds checking. :)
What do you mean? There are no arrays involved.
Not in that code snippet, but in the rest of the program, whatever that program might be. That's what I meant by "the long haul". I really tried to resist suggesting: static char* translate[] = { "%00", "%01", "%02" ... "0", "1", "2" ... "A", "B", "C" ... "a", "b", "c" ... "%FE", "%FF" }; // about 2K bytes while (*string) { char* x = translate[*string++]; while (*x) { *dest++ = *x++; } } -RB
Aug 22 2001
next sibling parent reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Russell Bornschlegel wrote:

 What do you mean? There are no arrays involved.
Not in that code snippet, but in the rest of the program, whatever that program might be. That's what I meant by "the long haul".
Luckily you can turn off range checking, which you usually do when your program is bug-free. Daniel
Aug 22 2001
parent "Walter" <walter digitalmars.com> writes:
Daniel Mantione wrote in message <9m19an$1a1b$1 digitaldaemon.com>...
Luckily you can turn off range checking, which you usually do when your
program is bug-free.
Range checking is a wonderful feature that goes too far when you can't turn it off for the production release.
Oct 15 2001
prev sibling parent "Sean L. Palmer" <spalmer iname.com> writes:
"Russell Bornschlegel" <kaleja estarcion.com> wrote in message
news:3B8422B8.80B09E1C estarcion.com...
 Daniel Mantione wrote:
 Yes, this looks like a very promising implementation to use sets. What
you
 are essentially saying is that an array of bits has set semantics, like
the
 "in" operator.
As currently specified, though, associative-array-of-bit actually has three states per key: 1, 0, or not-in-set. That's not a great fit to an underlying array-of-bit implementation.
Seems like a set is more of an associative array of a type that has no value and takes no storage. So I don't think associative array of any value type would be an optimal way to implement a set. Sean
Oct 28 2001
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Russell Bornschlegel" <kaleja estarcion.com> wrote in message
news:3B83EF19.D755A19 estarcion.com...
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).
They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.
Aug 23 2001
next sibling parent Russell Bornschlegel <kaleja estarcion.com> writes:
Walter wrote:
 
 "Russell Bornschlegel" <kaleja estarcion.com> wrote in message
 news:3B83EF19.D755A19 estarcion.com...
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).
They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.
So the spec won't offer promises about key ordering? -R
Aug 23 2001
prev sibling parent "Rajiv Bhagwat" <dataflow vsnl.com> writes:
Walter <walter digitalmars.com> wrote in message
news:9m4329$2vv4$1 digitaldaemon.com...
 "Russell Bornschlegel" <kaleja estarcion.com> wrote in message
 news:3B83EF19.D755A19 estarcion.com...
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).
They're a mix of hash tables and binary trees. The implementation would be free to pick its own way of implementing them.
Have you checked up 'Suneido' language, where the common 'object' data type is accesible *both* as asso. arrays and vectors? Taking their own example, this is how you can specify the object literals: In the last example, the value of element 2 is "Joe", also accesible thru [2]. More on www.suneido.com -- Rajiv
Aug 24 2001
prev sibling parent "Walter" <walter digitalmars.com> writes:
Russell Bornschlegel wrote in message <3B83EF19.D755A19 estarcion.com>...
The D spec is silent on the implementation details for
associative arrays, but they'd likely be binary trees (ordered
keys, lookup time normally O(log2(n))) or hashes (disordered keys,
lookup time normally O(1)).
It's supposed to be an implementation detail <g>, but they are implemented as a hashed array of binary trees. My experience with them is that they are very efficient.
Oct 14 2001
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote in message
news:9lvkln$a05$1 digitaldaemon.com...
 For the general case, are D's associative arrays good enough, Daniel?
They are a cool feature, but how would you use them as a set replacement?
int[char] set; char c; if (c in set) { }
Aug 23 2001
prev sibling parent reply "Sheldon Simms" <sheldon semanticedge.com> writes:
Im Artikel <9luk4s$2oqi$1 digitaldaemon.com> schrieb "Angus Graham"
<agraham_d agraham.ca>:

 "Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a
 strchr call for each character the compiler now only needs a set
 lookup. And because this is a constant set, the compiler can optimize
 this into a few compare operations.
Well, to be fair, the C version doesn't have to use strchr. You could just change if (strchr(safechars, *string) != NULL) { to char c = *string; if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) ) or even char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) || (c>=set[2][0] && set[2][1]>=c) ) or finally char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; for( int i=0; i<sizeof(set); ++i ) { found = (c>=set[i][0] && set[i][1]>=c); } if(found) { which is a little hectic, bit is what I presume the Pascal version does internally.
Which basically proves Daniel's point. Why should the programmer have to code this explicitly? -- Sheldon Simms / sheldon semanticedge.com
Aug 21 2001
next sibling parent "D Man" <clockwork austin.rr.com> writes:
I second that.  I am not a fan of Pascal, but if there are features that are
this handy...

"Sheldon Simms" <sheldon semanticedge.com> wrote in message
news:9lul5d$2p87$1 digitaldaemon.com...
 Im Artikel <9luk4s$2oqi$1 digitaldaemon.com> schrieb "Angus Graham"
 <agraham_d agraham.ca>:

 "Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a
 strchr call for each character the compiler now only needs a set
 lookup. And because this is a constant set, the compiler can optimize
 this into a few compare operations.
Well, to be fair, the C version doesn't have to use strchr. You could just change if (strchr(safechars, *string) != NULL) { to char c = *string; if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) ) or even char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) || (c>=set[2][0] && set[2][1]>=c) ) or finally char set[][] = { {'a','z'}, {'A','Z'}, {'0','9'}; char c = *string; for( int i=0; i<sizeof(set); ++i ) { found = (c>=set[i][0] && set[i][1]>=c); } if(found) { which is a little hectic, bit is what I presume the Pascal version does internally.
Which basically proves Daniel's point. Why should the programmer have to code this explicitly? -- Sheldon Simms / sheldon semanticedge.com
Aug 21 2001
prev sibling parent reply "Angus Graham" <agraham_d agraham.ca> writes:
 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?
Becuase if you have a large list of things to test membership in you can use a real container. If you have a tiny list you can use one line of code. How often would a built in set be useful? To me, seldom to never, so why cut down a tree and put it in every D manual? I'm not even too happy about having associative arrays built in. Much as I love them in Perl, it's just gonna make the runtime bigger and slower, my manual longer and heavier. A nice standard library is the place for containers. Angus Graham
Aug 21 2001
next sibling parent Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Angus Graham wrote:

 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?
Becuase if you have a large list of things to test membership in you can use a real container.
This is true for complex types. Sets of records would be example be a very bad idea, the question comes wether they need to be stored in a tree, btree, or whatever. In a practical language the programmers needs to be in control of this. In these cases, a tree object or a btree object is more apropiate. But in the case of ordinal types, a set isn't easily replaced by a library.
  If you have a tiny list you can use one line of code.
Only in the case of a constant set! And even then, people are going to do things like strchr yo make things easy to maintain. With a set, you have both: clean looking code and very efficient code.
 How often would a built in set be useful?  To me, seldom to never, so why
 cut down a tree and put it in every D manual?
You have never programmed in Pasacal or Delphi? If I program in C I *really* miss the set.
 I'm not even too happy about having associative arrays built in.  Much as
 I love them in Perl, it's just gonna make the runtime bigger and slower,
 my
 manual longer and heavier.  A nice standard library is the place for
 containers.
There is some truth in this. Associative arrays also need to be stored in internal datastructures over which the programmer has no control. An object might be a good idea too here. If you do it with properties, it will feel like an array too. Greetings, Daniel Mantione
Aug 22 2001
prev sibling parent "Anthony Steele" <asteele nospam.iafrica.com> writes:
"Angus Graham" <agraham_d agraham.ca> wrote in message
news:9lump1$2qal$1 digitaldaemon.com...
 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?
Becuase if you have a large list of things to test membership in you can
use
 a real container.
True, for instance, In Delphi (*The* pascal-derived language these days), a set may have up to 256 elements, no more. All it really is whole lot of bits side by side, 32 byets at most. Membership testing is implemented as bit testing, union as or-ing the bits, etc.
 If you have a tiny list you can use one line of code.
 How often would a built in set be useful?  To me, seldom to never, so why
 cut down a tree and put it in every D manual?
Speaking as someone who has used Delphi for several years, sets over an enum are very usefull, used often, and would be missed. The most common usage is for options to a class or proc. For instance, look at this code fragment: type TFontStyle = (fsBold, fsItalic, fsUnderline, fsStrikeOut); TFontStyles = set of TFontStyle; class TFont procedure setStyle(style: TFontStyles); // actually this would be a property, but one thing at a time ... end; MyFont.setStyle( [fsBold, fsUnderline]); .. MyFont.Style := myFont.Style + fsItalic; The implementation is effiicient, and so is the usage syntax. And it's type-safe. Object-Pascal is not perfect, but this feature is really nice
Sep 14 2001
prev sibling next sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Daniel Mantione wrote:
 Hi,
 
 One of the features I miss the most in C is the set. Being a member of the 
 Free Pascal development team, I program a lot in Pascal. Now I don't want 
 to say that Pascal rules and C sucks, but there are some things that Pascal 
 is superior in.
 
...
It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features. If not, then if it came to a choice my vote would be for the inclusion of B+Trees as a standard language feature. I would want to use them far more frequently, and they are much more work to implement. So, perhaps the question should be, what possible language (as opposed to library) features would make it easier to integrate sets into the language? And how could these be generalized to support data strudtures in general? (Presumably including associative arrays, strings, and the other built in features.) One thing that most of these seem to do is to allow one to store data by means of some non-numeric keying function (usually a string, but lets try to keep this general). Another is to test the structure for the existence of that key. Another is to retrieve data associated with that key. Another is to remove a key, and its associated data. Anything else primitive-common? Then the operations are, essentially a[key]=x, x=a[key], x in a, and a[key]=null. Can these operations be defined for arbitrary user defined data structures? The one remaining operation is initialization. If a data structure were a class, then this could be handled with a constructor. So the problem would be in specifying how the argument list of the constructor could cope with the kinds of argument that you want to use to initialize a set. Is this a fair analysis? If so, then I feel it a better basis to proceed than on a data structure by data structure basis. And what is required here is that in be made an overloadable operator. (Well, and the argument list problem. I don't know how that should be handled. I usually start with an empty data structure and add things to it.)
Aug 22 2001
next sibling parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Charles Hixson wrote:
 If not, then if it came to a choice my
 vote would be for the inclusion of B+Trees as a standard language
 feature.  I would want to use them far more frequently, and they are
 much more work to implement.
Are you interested in the primitive functionality of B+Trees, or of some particular implementation-dependent characteristic of them? In a garbage-collected language like D, where you have few or no performance guarantees to begin with, it doesn't seem necessary to make a big deal over the latter.
 One thing that most of these seem to do is to allow one to store data by
 means of some non-numeric keying function (usually a string, but lets
 try to keep this general).  Another is to test the structure for the
 existence of that key.  Another is to retrieve data associated with that
 key.  Another is to remove a key, and its associated data.  Anything
 else primitive-common?
Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec. A set can be represented by an associative array where the contents of an existing element are unused -- all you care about is the existence or absence of an element. Consider that the STL set and map containers are normally implemented as some flavor of binary tree, while Perl's associative array, which does essentially the same thing, is normally referred to and implemented as a hash. They're used the same way, but have slightly different performance characteristics -- that most people aren't worried about, because they're lazy and would otherwise be doing linear searches on arrays or linked lists, so the near-constant lookup time of a hash or the log2 lookup time of a tree is a huge win for them. Trees can also maintain the collection in order, while hashes can't.
 Then the operations are, essentially a[key]=x, x=a[key], x in a, and
 a[key]=null.
The D-spec associative array specifies "delete a[key]" instead of "a[key] = null", and I think you meant "key in a" instead of "x in a", but yeah. Note -- I do agree that the language should support adding new data structures for your particular needs; to do them reasonably requires: - operator overloading to make them pretty, including overloading operator delete, operator in, and operator []. - templates or the equivalent to make them reusable and type-safe. It looks like you aren't going to get both these features in early releases of D, unfortunately... -Russell B
Aug 22 2001
next sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Russell Bornschlegel wrote:
 Charles Hixson wrote:
 
If not, then if it came to a choice my
vote would be for the inclusion of B+Trees as a standard language
feature.  I would want to use them far more frequently, and they are
much more work to implement.
Are you interested in the primitive functionality of B+Trees, or of some particular implementation-dependent characteristic of them? In a garbage-collected language like D, where you have few or no performance guarantees to begin with, it doesn't seem necessary to make a big deal over the latter.
Yes. I want to store data on files, and access it in random order. And remember it from session to session. An external database can do this, but installing it remotely can be a large hassle, and it's often overkill for what I need.
 
One thing that most of these seem to do is to allow one to store data by
means of some non-numeric keying function (usually a string, but lets
try to keep this general).  Another is to test the structure for the
existence of that key.  Another is to retrieve data associated with that
key.  Another is to remove a key, and its associated data.  Anything
else primitive-common?
Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec. A set can be represented by an associative array where the contents of an existing element are unused -- all you care about is the existence or absence of an element. Consider that the STL set and map containers are normally implemented as some flavor of binary tree, while Perl's associative array, which does essentially the same thing, is normally referred to and implemented as a hash. They're used the same way, but have slightly different performance characteristics -- that most people aren't worried about, because they're lazy and would otherwise be doing linear searches on arrays or linked lists, so the near-constant lookup time of a hash or the log2 lookup time of a tree is a huge win for them. Trees can also maintain the collection in order, while hashes can't.
You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer. Remember, that if these structures are implemented as classes, then additional functions can be added as needed, by sub-classing. (E.g., is an each operator sensible on a set? Only the common sensible ones are primitive-common.) Part of the question here is what are the inheritance characteristics. How does one specify a generic class parameter? A set of integers and a set of strings have slightly different characteristics. Does the inheritance mechanism allow for this specialization? If I define a set of wheels, what is it's relationship to a set of bicycle wheels? (Assuming that BicycleWheel is a descendant from Wheel.) Can I compare membership between the two sets? etc. Another question is what kind of languate is D? Clearly STL/C++ and Python would have much different answers if I asked them "What kind of object can I put into a set?" and "What kind of object can I get back out of a set?" Python can look at the thing you get back and tell what it is. C++ depends on the definition of the set to tell it what kind of object it got back. C++ is faster. Python is more flexible. Now my personal needs are for a language that produces stand-alone code. Python has problems with this (though that's being worked on). To me, speed is a secondary consideration. I'm currently working on a dataset of about 1500 cases with a basically N^2 complexity. It takes about 10-15 minutes to run using Eiffel code with all of the checks on. I'd like a faster execution, but it's not a killer. (Most of that time comes from hash table processing. I'm searching for a better alternative.) But it MUST be stand alone code. People need to be able to run it from out on the server, and I may never have seen their computer. Scratch Python. Scratch all interpretive languages. Scratch all languages that require massive dll's. etc. So that's how I notice what features are optimum. Garbage collection is a real winner. A B+Tree would really help, as would any method for maintaining persistent variables... help, but not be decisive. Stand-alone code is decisive.
 
Then the operations are, essentially a[key]=x, x=a[key], x in a, and
a[key]=null.
The D-spec associative array specifies "delete a[key]" instead of "a[key] = null", and I think you meant "key in a" instead of "x in a", but yeah. Note -- I do agree that the language should support adding new data structures for your particular needs; to do them reasonably requires: - operator overloading to make them pretty, including overloading operator delete, operator in, and operator []. - templates or the equivalent to make them reusable and type-safe. It looks like you aren't going to get both these features in early releases of D, unfortunately... -Russell B
Well, Walter said that it was alpha, and the features were subject to change. So now's the time to speak up. If I don't at least mention what I feel is important, then it's only my fault if it doesn't happen. If it doesn't happen anyway ... all languages are design tradeoffs. You examine what's around, and you see what you can make do what you need done. (And I'm more interested in the Linux version than in the Windows version, so what's in the early releases isn't THAT significant. But if I don't get people thinking about it, then it probably won't be in the later one's either.)
Aug 22 2001
parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Charles Hixson wrote:
 
 Russell Bornschlegel wrote:
 Charles Hixson wrote:
One thing that most of these seem to do is to allow one to store data by
means of some non-numeric keying function (usually a string, but lets
try to keep this general).  Another is to test the structure for the
existence of that key.  Another is to retrieve data associated with that
key.  Another is to remove a key, and its associated data.  Anything
else primitive-common?
Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.
You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer.
Turns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }
 Another question is what kind of languate is D?  Clearly STL/C++ and
 Python would have much different answers if I asked them "What kind of
 object can I put into a set?" and "What kind of object can I get back
 out of a set?"  Python can look at the thing you get back and tell what
 it is.  C++ depends on the definition of the set to tell it what kind of
 object it got back.  C++ is faster.  Python is more flexible.
I know you're concerned with the builtin functionality, but it's worth mentioning that you can get the Pythonish functionality (if I understand what you're talking about -- I don't know Python) from C++ in a number of ways if you're willing to do some work. An STL container can store references to polymorphic types. C++ can do just about anything, wrap that functionality in a class, and tuck it away -- the only problem is that someone has to write that class in the first place. The code to do it, apparently, is much smaller than the Ada equivalent, but that's not saying a lot. ;)
 Now my personal needs are for a language that produces stand-alone code.
   Python has problems with this (though that's being worked on).  To me,
 speed is a secondary consideration.  I'm currently working on a dataset
 of about 1500 cases with a basically N^2 complexity. [problem description
 snipped]
All I can say is "C++ and STL maps." There's a reason C++ is widely used, and its syntactic elegance sure ain't it...
 It looks like you aren't going to get both these features in early
 releases of D, unfortunately...
Well, Walter said that it was alpha, and the features were subject to change. So now's the time to speak up. If I don't at least mention what I feel is important, then it's only my fault if it doesn't happen.
All granted. My concern right now is that a one-man, no-budget effort to develop a new language can get derailed very easily by people screaming "don't even bother unless you're going to put Feature X in!" I want Walter to get a usable compiler out, _then_ we can beat on it and complain in a more informed manner :) -Russell B
Aug 22 2001
next sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Russell Bornschlegel wrote:
 Charles Hixson wrote:
 
Russell Bornschlegel wrote:

Charles Hixson wrote:

One thing that most of these seem to do is to allow one to store data by
means of some non-numeric keying function (usually a string, but lets
try to keep this general).  Another is to test the structure for the
existence of that key.  Another is to retrieve data associated with that
key.  Another is to remove a key, and its associated data.  Anything
else primitive-common?
Yes: iterate over the whole collection, performing an arbitrary operation on each element. This is the only operation I see in your list that isn't supported by associative arrays in the D spec.
You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer.
Turns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }
That makes it possible (well, easier, it was already possible). That doesn't make it clean to use. Contrast that with: assoc_array.for_each.key(k) operate_on (k); The syntax of that example is a bit wierd, as I am mangling several different languages, but I trust the meaning is clear. The question here is how the key should be passed to the operate_on method.
 
...
 
 C++ can do just about anything, wrap that functionality in a 
 class, and tuck it away -- the only problem is that someone has
 to write that class in the first place. The code to do it, 
 apparently, is much smaller than
 the Ada equivalent, but that's not saying a lot. ;)
Actually, except in the sense of algorithmic completeness, there are a lot of things that neither C++ nor any other statically bound language can do. But there's been a big speed and portability tradeoff, so languages have tended to divide into two camps, the dynamic linkers and the static linkers. C++, Ada, Eiffel, that kind are static linkers. Smalltalk, Lisp, Python, Ruby, that kind are the dynamic linkers. There are reasons for the split, and neither side is able to really capture the features of the other (though there are a lot of wild claims made). Java is a rather interesting mix, sort of standing half way between them. It attempts to get the advantages of both sides, and ends up getting a lot of the disadvantages of each, also. As well as some unique strengths. (Which it tends to ignore in favor of being familiar to C programmers, but look at Kiev to see some of the possibilities.)
 
 
...
All I can say is "C++ and STL maps." There's a reason C++ is widely used, and its syntactic elegance sure ain't it...
Doing it by hand in Eiffel is, for me, easier and more portable than using C++ with the STL. Possibly if I had to throw a Mac into the equation, my answer would be different, but so far I haven't needed that. I guess that if someone tells me it has to run on a Mac, I may tell them "Yellow Dog Linux". It's not a good answer, but I haven't really studied that question. Or maybe by that time Python will be able to do stand-alone compiles. Or Ruby. But it won't be C++ until after they fix their inheritance rules mess.
 
 
...
 
 All granted. My concern right now is that a one-man, no-budget effort 
 to develop a new language can get derailed very easily by people screaming
 "don't even bother unless you're going to put Feature X in!" I want Walter
 to get a usable compiler out, _then_ we can beat on it and complain in 
 a more informed manner :)
 
 -Russell B
 
Which is why I try to keep my suggestions to what seems to me as simple syntactic sugar. Maybe I'm wrong. Maybe I get overenthusiastic. But I try. E.g., most of this discussion was about what minimal changes could make lots of standard data structures easy and clean to use. Most of these would be library features. Inheritance is already scheduled to be a lot more dynamic than in C++, garbage collection is already built in (eliminating a lot of work for destructors). Etc. So what is being discussed is whether or not a couple of prepositions could be added, and whether or not they would be useful enough to bother with. But with decently flexible inheritance and overloadable user defined operators, then, say, :in: and :each: could be defined within a class tree, and all data structures could just implement their own flavor of it. So it would all reduce to libraries. But if it had to be written "in( xxx )" and "each ( xxx )" then it would be much uglier and less friendly.
Aug 22 2001
parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Charles Hixson wrote:
 
 Russell Bornschlegel wrote:
 Turns out we have it in the D spec:

   for (int i = 0; i < assoc_array.keys.length; i++)
   {
       operate_on( assoc_array[ assoc_array.keys[i] ] );
   }
That makes it possible (well, easier, it was already possible). That doesn't make it clean to use. Contrast that with: assoc_array.for_each.key(k) operate_on (k); The syntax of that example is a bit wierd, as I am mangling several different languages, but I trust the meaning is clear. The question here is how the key should be passed to the operate_on method.
Hmm. How about: foreach (keytype k in assoc_array.keys) { operate_on( k ); } This overloads "in" somewhat, but the foreach should be a sufficient hint to the parser; the keytype (int, string, whatever) makes the k look like a declaration like any other.
 C++ can do just about anything, wrap that functionality in a
 class, and tuck it away...
Actually, except in the sense of algorithmic completeness, there are a lot of things that neither C++ nor any other statically bound language can do.
Frankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can do with dynamic linking?
    So what is being discussed is whether or not a couple of prepositions
 could be added, and whether or not they would be useful enough to bother
 with.  But with decently flexible inheritance and overloadable user
 defined operators, then, say, :in: and :each: could be defined within a
 class tree, and all data structures could just implement their own
 flavor of it.  So it would all reduce to libraries.  But if it had to be
 written "in( xxx )" and "each ( xxx )" then it would be much uglier and
 less friendly.
Okay. I freely admit to being unafraid of ugly constructs, and to having been thoroughly warped by using C++. I firmly believe that no matter how elegant the language can be, some damn fool is going to use two-space indents, asinine bracketing styles, uncommented state machines implemented via numeric literal cases in huge nested switches, or other constructs of equal value, and that I'll be forced to maintain their code. C++ is the least of my problems this week. :) -RB
Aug 22 2001
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Russell Bornschlegel wrote:
 Charles Hixson wrote:
 
...
Hmm. How about: foreach (keytype k in assoc_array.keys) { operate_on( k ); } This overloads "in" somewhat, but the foreach should be a sufficient hint to the parser; the keytype (int, string, whatever) makes the k look like a declaration like any other.
Nice!
 
C++ can do just about anything, wrap that functionality in a
class, and tuck it away...
Actually, except in the sense of algorithmic completeness, there are a lot of things that neither C++ nor any other statically bound language can do.
Frankly, I've never used a dynamically linked language, I don't think. Can you give me a short example of a cool thing you can do with dynamic linking?
This one's a bit strange, and you don't mind wierd syntax, so how about: var = eval(include(filename.suf).something(include(otherfile.suf))) + "qwert"; The reason that most AI work is done in the dynamic languages isn't only because Lisp is easier to parse (or Forth would be more popular) and it sure isn't speed. It's because code can copy itself, modify the copy, and then transfer execution to the copy. Even where parsing is a bit difficult (can you say Python) this ability is maintained. This normally means that you need to cart around an interpreter to do the job, but not always. Some Smalltalk dialects can export their meaning as C code, and some dialects of Lisp are directly compileable. Now truthfully, this could also be done with C (assuming decent cooperation from the OS), but it's so difficult that nobody does it. So C, etc., don't get used in the areas that need that kind of flexibility. And Lisp, Python, Ruby, etc. don't get used where speed is needed, though there's a study that "proves" that Lisp code can execute as fast as C code. (I think that what they actually end up comparing is code optimizers, but that's not what they claim to show... and what's wrong with saying "My code can be more highly optimized with automatic tools, so that for a sufficiently complex task it is faster than your code!", but that's not what they said.)
 
...
 
 Okay. I freely admit to being unafraid of ugly constructs, and to having
 been thoroughly warped by using C++. I firmly believe that no matter how
 elegant the language can be, some damn fool is going to use two-space 
 indents, asinine bracketing styles, uncommented state machines implemented
 via numeric literal cases in huge nested switches, or other constructs of
 equal value, and that I'll be forced to maintain their code. C++ is the
 least of my problems this week. :)
 
 -RB
 
I generally end up maintaining my own code. So I guess that's not a problem for me (except when stupid editors convert my tabs into spaces, and won't let me say what columns the tabs are supposed to represent).
Aug 22 2001
parent Russell Bornschlegel <kaleja estarcion.com> writes:
Charles Hixson wrote:
 
 Russell Bornschlegel wrote:
 Frankly, I've never used a dynamically linked language, I don't
 think. Can you give me a short example of a cool thing you can
 do with dynamic linking?
This one's a bit strange, and you don't mind wierd syntax, so how about: var = eval(include(filename.suf).something(include(otherfile.suf))) + "qwert"; The reason that most AI work is done in the dynamic languages isn't only because Lisp is easier to parse (or Forth would be more popular) and it sure isn't speed. It's because code can copy itself, modify the copy, and then transfer execution to the copy. Even where parsing is a bit difficult (can you say Python) this ability is maintained. This normally means that you need to cart around an interpreter to do the job, but not always. Some Smalltalk dialects can export their meaning as C code, and some dialects of Lisp are directly compileable. Now truthfully, this could also be done with C (assuming decent cooperation from the OS), but it's so difficult that nobody does it. So C, etc., don't get used in the areas that need that kind of flexibility.
Ah, okay, Perl has this also -- I've seen it, seen that it could be cool, but outside of one cookbook use (a Perl program eval'ing a startup/rc file written in Perl) I haven't tried it. Really it's dynamic code- evaluation/dynamic compilation, as much as dynamic linking. I suppose I could invoke a C compiler to build a DLL and then bind the DLL, in C/Win32, but that's probably not really very time-efficient... :) And not really supported by standard C, but with the proper OS support you can build a sequence of native machine code in a buffer and then execute that buffer. High performance 3D applications have been known to do this, for highly critical polygon-pushing paths. Also, blah blah blah security blah blah blah executing arbitrary code from an external file blah blah blah. :) -RB
Aug 22 2001
prev sibling parent Dan Hursh <hursh infonet.isl.net> writes:
Russell Bornschlegel wrote:
 
 Charles Hixson wrote:
 Yes: iterate over the whole collection, performing an arbitrary
 operation on each element. This is the only operation I see in your
 list that isn't supported by associative arrays in the D spec.
You're right. Ok, in addition to the "in" operator we need an "each" or "for_each" operator. But that could be implemented as a function, though an operator would be nicer.
Turns out we have it in the D spec: for (int i = 0; i < assoc_array.keys.length; i++) { operate_on( assoc_array[ assoc_array.keys[i] ] ); }
Yes, but can this break if you are adding or deleting certain elements for the array as you go? In that case you would have to do this: string[] keys = assoc_array.keys; for (int i = 0; i < keys.length; i++){ if( bad_element(assoc_array[keys[i]]) ) delete assoc_array[keys[i]]; } A foreach that worked on arrays with a known size and associative arrays would be nicer and less error prone. Dan
Aug 26 2001
prev sibling parent Russell Bornschlegel <kaleja estarcion.com> writes:
Please excuse reply-to-self.

Russell Bornschlegel wrote:
 Yes: iterate over the whole collection, performing an arbitrary
 operation on each element. This is the only operation I see in your
 list that isn't supported by associative arrays in the D spec.
My mistake: myarray.keys returns the keys of an associative array, over which you can iterate to your hearts content. -Russell B
Aug 22 2001
prev sibling next sibling parent reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Charles Hixson wrote:

 It's hard to argue that sets wouldn't be nice to have.  So would
 AVL-trees, and B+Trees.  And other classical data structures.  Perhaps a
 better approach would be to emphasize ways that make the language easy
 to extend with these features.  If not, then if it came to a choice my
 vote would be for the inclusion of B+Trees as a standard language
 feature.  I would want to use them far more frequently, and they are
 much more work to implement.
Nonsense. The set is an elementary type, just like the integer or pointer and no library can easily replace it, just like you cannot easily implement a float in a library. In fact, with any example you would be able to give, a the rewrite of that example with a set if almost allways superior in efficiency and/or maintainability and never has worse efficiency. It is true that mathematically, a set can consist of everything, but this is not done so in programming languages. It is true that a mathematically behaving set (that can store everything) can better be implemented as a set. With operator overloading you can make it behave as a set. Now the elementary datatype. A set is a bitmap in all programming languages that support it. In Pascal a set consits of a maximum of 256 different items, because the compiler uses a byte to enumerate them. This is a stupid limitation of Pascal, but the designers definitely got the idea. The built-in set should support types that can be enumerated. For example: enum colours {red, green, blue}; .. would do as the basetype of a set. char[]; .. would not do. You cannot enumerate all possible strings. char[2]; .. perhaps would do. That would be a bitmap of 65536 entries. Greetings, Daniel Mantione
Aug 22 2001
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Daniel Mantione wrote:
 Charles Hixson wrote:
 
 
It's hard to argue that sets wouldn't be nice to have.  So would
AVL-trees, and B+Trees.  And other classical data structures. ...
Nonsense. The set is an elementary type, just like the integer or pointer and no library can easily replace it, just like you cannot easily implement a float in a library. In fact, ... .. perhaps would do. That would be a bitmap of 65536 entries. Greetings, Daniel Mantione
Considering that bit arrays are an included part of the language, I fail to understand why a set could not be implemented as a library. Given that the compiler would have all the necessary information, it should (this might be optomistic) be able to optimize references to your "SmallSet" class as inline calls. So the problem is just syntax, which is what I was discussing. If that was answered, then I didn't understand the answer. Also, neither strings, associative arrays, nor sets are elementary types. They are composits. A reasonable way to model a string is as an array of characters. A reasonable way to model an associative array is as a hash table. And a reasonable way to model a small set is as a bit vector. But even these aren't the elementary types. If you wish evidence, consider C++ without the STL, or other complex libraries. You can build each of those types out of the elements present, yet none of them are present. And the things that you build them out of are simpler constructs that have been inherited from C. Now, perhaps what you meant was that it seems to you vastly preferable that a set type be built-in to the language. If so, then you have the right of it. I can't argue about what your opinion is. But others may legitimately have different opinions. What I am proposing is that the language support the operations needed to facilitate an easy use of sets, among other data structures. And that the implementation be parcelled out to the constructors of the libraries. This seems to me a much more efficient approach, as well as a more reasonable one. But then I don't often feel the need for sets. I even want to use bags more often than I want to use sets, and usually I prefer that my data be in some sorted form with attatched data structures. So I am also biased.
Aug 22 2001
parent reply Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:
Charles Hixson wrote:

 Considering that bit arrays are an included part of the language, I fail
 to understand why a set could not be implemented as a library.  Given
 that the compiler would have all the necessary information, it should
 (this might be optomistic) be able to optimize references to your
 "SmallSet" class as inline calls.  So the problem is just syntax, which
 is what I was discussing.  If that was answered, then I didn't
 understand the answer.
To optimize things well, you need compiler magic. A simple ... if i in [0..9,18..20] ... would never get optimized if the set was implemented in a library. So compiler would be able to see that it can optimize the bitmap away in this case. About syntax, you would like that a set that is implemented with a btree in a library for example, programs same as the built in set. To be able to do that you need operator overloading, which is being heavily dicussed. Daniel
Aug 22 2001
parent Charles Hixson <charleshixsn earthlink.net> writes:
Daniel Mantione wrote:
 Charles Hixson wrote:
 
 
Considering that bit arrays are an included part of the language, I fail
to understand why a set could not be implemented as a library.  Given
that the compiler would have all the necessary information, it should
(this might be optomistic) be able to optimize references to your
"SmallSet" class as inline calls.  So the problem is just syntax, which
is what I was discussing.  If that was answered, then I didn't
understand the answer.
To optimize things well, you need compiler magic. A simple ... if i in [0..9,18..20] ... would never get optimized if the set was implemented in a library. So compiler would be able to see that it can optimize the bitmap away in this case. About syntax, you would like that a set that is implemented with a btree in a library for example, programs same as the built in set. To be able to do that you need operator overloading, which is being heavily dicussed. Daniel
A library should provide many different varieties of data structure. Not just one. You chose the flavor that fits your problem. As to compound ranges (not to my mind the same thing as a set, but rather one thing that is sometimes handled by sets), I don't know what the best choice would be. If there were a range primitive, then one might want to say: if i in [0..9] or i in [18..20] a bit clumsier, I admit. Or one could define a class that implemented range handling in the way that one chose, as defined by the problem under consideration. OTOH, I will admit that it would be nice if complex range constants were a recognized data type (which would imply complex range variables). And that one of the more reasonable ways to implement that would be as a set. But not the only reasonable way. It's perfectly reasonable to have a complex range that can handle things like: Range r = "Alpha" .. "Blueberry" union "Peanuts" ... "Zebras" All you need for this is two vectors of string, call them lower and upper. (The rest is a lot of work, but basically obvious.) I've needed to do this a couple of times, and once I needed to deal with complex ranges of floating point numbers, and even needed to perform arithmetic on the ranges. Consider what: r -= "Spinach"; would mean. (Defining what arithmetic and comparison ops mean when applied to ranges of floating point values is a bit interesting. That is probably very domain specific. But I needed to do it the time I needed ranges of floating point numbers [well, rationals was all I really needed, but it was easier to use floats].) Here union, intersect, remove, etc. are the basic ops. But these map in a relatively straightforward way onto the general data structure operations that I was proposing. (Relatively. It's certainly a less good fit than most.) But I rarely end up dealing with simple numbers except as numbers, so *I* don't see much need for the Pascal kind of set. Except occasionally.
Aug 22 2001
prev sibling parent reply "Anthony Steele" <asteele nospam.iafrica.com> writes:
"Charles Hixson" <charleshixsn earthlink.net> wrote in message
news:3B83D667.40505 earthlink.net...

...
It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features.
Most things can be added to an OO language simply by writing the appropriate classes, or finding them in the standard library. But I have not seen sets done well this way in an efficient manner - To take an example, Java, where classes are by design the *only* mechanism for making new types, there is a set class, and no language support for enums. However using sets and option-objects is so ugly and cumbersome that everyone, sun included, just uses named integer constants. This is a giant leap backwards. Built-in enums and sets in Delphi are a great enabler.
Sep 14 2001
parent Charles Hixson <charleshixsn earthlink.net> writes:
Anthony Steele wrote:

 "Charles Hixson" <charleshixsn earthlink.net> wrote in
 message news:3B83D667.40505 earthlink.net...


 ...
It's hard to argue that sets wouldn't be nice to have. So would AVL-trees, and B+Trees. And other classical data structures. Perhaps a better approach would be to emphasize ways that make the language easy to extend with these features.
Most things can be added to an OO language simply by writing the appropriate classes, or finding them in the standard library. But I have not seen sets done well this way in an efficient manner - To take an example, Java, where classes are by design the *only* mechanism for making new types, there is a set class, and no language support for enums. However using sets and option-objects is so ugly and cumbersome that
everyone, sun included, just uses named integer constants. This is a giant
 leap backwards. Built-in enums and sets in Delphi are a great 
enabler.

The primary feature needed to allow this (efficiently) is the 
ability to overload the indexing operators.



 everyone, sun included, just uses named integer
constants. This is a giant leap backwards. Built-in enums and sets in Delphi are a great enabler.
The primary feature needed to allow this (efficiently) is the ability to overload the indexing operators. In Python, e.g., the same opeartors used to set and access the hash table directories, can also be used to access B+Tree style databases. It's true that one needs ancillary operations to access the additional features (e.g., in-order traversal), but the features of set by key and retrieve by key can be accessed via a simple: a{"peanut"} = "butter"; print a{"peanut"} --> butter In python the key step is that when on initializes the variable ("a" in this case), instead of initializing it as a hash table: a = {} or a = {"Able" : "was", "I" : "ere"} one initializes it as a database: a = db("C:\file\location.dbm") The type of database is determined by the module "anydbm" which looks at the file to figure out which of three or four kinds of database it is. The B+Tree version that I'm speaking of is the SleepyCat database. But most of this doesn't require run-time configuration. Clearly one might prefer to declare at compile time that it was a B+Tree rather than a flat-file lookup table, but as long as the ancestral class (abstract class or interface?), say, DBM, had the requisite methods defined, that would be all that was required. Then the methods could be activated at run time. But if the standard operations can't be overloaded, then this falls through. As an alternative, I have proposed using demarkations, e.g. :+:, but this doesn't work well when used as indexing syntax, consider: a:[:"peanut":}: = "butter"; I'd almost prefer: a.set("peanut").to("butter"); or: a("peanut").set_to("butter");
Sep 17 2001
prev sibling parent "Walter" <walter digitalmars.com> writes:
An array of bits in D probably comes closest to what you're looking
for. -Walter
Oct 14 2001