D - Ode to the set

Daniel Mantione (70/70) Aug 21 2001 Hi,

Angus Graham (28/32) Aug 21 2001 Well, to be fair, the C version doesn't have to use strchr. You could ju...

Russell Bornschlegel (8/18) Aug 21 2001 Those alternatives aren't very pretty, IMO.

Daniel Mantione (20/26) Aug 21 2001 Our own Free Pascal compiler uses compares if the amount of compares tha...

Russell Bornschlegel (27/32) Aug 22 2001 Associative array of bit; 1 = exists, 0 = unexists.

Daniel Mantione (9/52) Aug 22 2001 Hi,

Russell Bornschlegel (41/46) Aug 22 2001 Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons...

Daniel Mantione (54/113) Aug 22 2001 Hi,

Russell Bornschlegel (19/36) Aug 22 2001 As currently specified, though, associative-array-of-bit actually has

Daniel Mantione (4/8) Aug 22 2001 Luckily you can turn off range checking, which you usually do when your

Walter (3/5) Oct 15 2001 Range checking is a wonderful feature that goes too far when you can't t...

Sean L. Palmer (8/15) Oct 28 2001 you

Walter (4/8) Aug 23 2001 They're a mix of hash tables and binary trees. The implementation would ...

Russell Bornschlegel (3/13) Aug 23 2001 So the spec won't offer promises about key ordering?
Rajiv Bhagwat (12/20) Aug 24 2001 Have you checked up 'Suneido' language, where the common 'object' data t...

Walter (4/8) Oct 14 2001 It's supposed to be an implementation detail , but they are implement...

Walter (7/9) Aug 23 2001 int[char] set;

Sheldon Simms (6/47) Aug 21 2001 Which basically proves Daniel's point. Why should the programmer have to

D Man (4/51) Aug 21 2001 I second that. I am not a fan of Pascal, but if there are features that...
Angus Graham (9/11) Aug 21 2001 Becuase if you have a large list of things to test membership in you can...

Daniel Mantione (19/32) Aug 22 2001 This is true for complex types. Sets of records would be example be a ve...
Anthony Steele (22/29) Sep 14 2001 use

Charles Hixson (31/39) Aug 22 2001 It's hard to argue that sets wouldn't be nice to have. So would

Russell Bornschlegel (33/45) Aug 22 2001 Are you interested in the primitive functionality of B+Trees, or of

Charles Hixson (48/113) Aug 22 2001 Yes. I want to store data on files, and access it in random order. And...

Russell Bornschlegel (23/57) Aug 22 2001 Turns out we have it in the D spec:

Charles Hixson (45/101) Aug 22 2001 That makes it possible (well, easier, it was already possible). That

Russell Bornschlegel (20/51) Aug 22 2001 Hmm. How about:

Charles Hixson (26/66) Aug 22 2001 This one's a bit strange, and you don't mind wierd syntax, so how about:

Russell Bornschlegel (14/34) Aug 22 2001 Ah, okay, Perl has this also -- I've seen it, seen that it could be

Dan Hursh (12/28) Aug 26 2001 Yes, but can this break if you are adding or deleting certain elements

Russell Bornschlegel (5/8) Aug 22 2001 My mistake: myarray.keys returns the keys of an associative array,

Daniel Mantione (24/31) Aug 22 2001 Nonsense. The set is an elementary type, just like the integer or pointe...

Charles Hixson (29/46) Aug 22 2001 Considering that bit arrays are an included part of the language, I fail...

Daniel Mantione (10/17) Aug 22 2001 To optimize things well, you need compiler magic. A simple ...

Charles Hixson (34/60) Aug 22 2001 A library should provide many different varieties of data structure.

Anthony Steele (10/15) Sep 14 2001 Most things can be added to an OO language simply by writing the appropr...

Charles Hixson (40/62) Sep 17 2001 everyone, sun included, just uses named integer constants. This

Walter (2/2) Oct 14 2001 An array of bits in D probably comes closest to what you're looking

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Hi,

One of the features I miss the most in C is the set. Being a member of the 
Free Pascal development team, I program a lot in Pascal. Now I don't want 
to say that Pascal rules and C sucks, but there are some things that Pascal 
is superior in.

Why do you need a set?

I think it can best be illustrated with an example. The following code code 
from the AOLserver http server and is used to encode a parameter to place 
in in a URL. For example the text "This is a sentence" is changed into 
"This%20is%20a%20sentence".

char *
Ns_EncodeUrl(Ns_DString *pds, char *string)
{
    static char safechars[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";

    while (*string != '\0') {

        if (strchr(safechars, *string) != NULL) {
            Ns_DStringNAppend(pds, string, 1);
        } else {
            char buf[4];
            sprintf(buf, "%%%02x", (unsigned char) *string);
            Ns_DStringNAppend(pds, buf, 3);
        }
        ++string;
    }
    return pds->string;
}

That's really horrible inefficient isn't it? But it's easy to maintain, if 
the safechars are not correct, you can correct it by changing it. Now how 
would you do this into Pascal?

function ns_encodeurl(s:string):string;

const   safechars=['a'..'z','A'..'Z','0'..'9'];

var     i:longint;

begin
    ns_encodeurl:='';
    for i:=1 to length(s) do
        if s[i] in safechars then
            ns_encodeurl:=ns_encodeurl+s[i]
        else
            ns_encodeurl:=ns_encodeurl+'%'+hexstr(byte(s[i]));
end;

That looks much cleaner! And it's much more efficient; instead of a strchr 
call for each character the compiler now only needs a set lookup. And 
because this is a constant set, the compiler can optimize this into a few 
compare operations.

Set's make also good flags. In C you usually see code like this:

#define borders_enabled         1
#define shadow_enabled          2
#define lights_enabled          4

And then:

void do_nice_things(int flags) {
    if (flags & borders_enabled) {
    };
};

Now how would you do it with a set? A simple enumeration does the trick:

type    effects=(borders_enabled,shadow_enabled,lights_enabled);

And then:

procedure do_nice_things(flags:set of effects);

begin
    if borders_enabled in effects then
        begin
        end;
end;

... so no terrible long header files. Just simple enumerations, while the 
compiler will propably generate the same code.

So, how about a set in D??

Greetings,

Daniel Mantione

Aug 21 2001

"Angus Graham" <agraham_d agraham.ca> writes:

"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a strchr
 call for each character the compiler now only needs a set lookup. And
 because this is a constant set, the compiler can optimize this into a few
 compare operations.

Well, to be fair, the C version doesn't have to use strchr. You could just
change

if (strchr(safechars, *string) != NULL) {

to

char c = *string;
if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) )

or even

char set[][] = { {'a','z'},
                         {'A','Z'},
                         {'0','9'};
char c = *string;
if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) ||
(c>=set[2][0] && set[2][1]>=c) )

or finally

char set[][] = { {'a','z'},
                         {'A','Z'},
                         {'0','9'};
char c = *string;
for( int i=0; i<sizeof(set); ++i )
{
    found = (c>=set[i][0] && set[i][1]>=c);
}
if(found)
{

which is a little hectic, bit is what I presume the Pascal version does
internally.


Angus Graham

Aug 21 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Angus Graham wrote:
 
 "Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a strchr
 call for each character the compiler now only needs a set lookup. And
 because this is a constant set, the compiler can optimize this into a few
 compare operations.

 
 Well, to be fair, the C version doesn't have to use strchr. You could just
 change... [options snipped]

Those alternatives aren't very pretty, IMO.

strchr()'s actually pretty efficient -- only one function call involved, 
and I presume the typical Pascal system is going to make a function call 
to do a set lookup. However, strchr() isn't a general object-in-set 
function. 

For the general case, are D's associative arrays good enough, Daniel?

-RB

Aug 21 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Russell Bornschlegel wrote:

 Those alternatives aren't very pretty, IMO.

That's indeed the point. Code like this is hard to maintain.

 strchr()'s actually pretty efficient -- only one function call involved,
 and I presume the typical Pascal system is going to make a function call
 to do a set lookup. However, strchr() isn't a general object-in-set
 function.

Our own Free Pascal compiler uses compares if the amount of compares that 
are needed is smaller than 9. If the amount of compares is greater than 9, 
it creates a bit map, and checks if bit number x is set.

So:

if s[i] in ['0'..'9']

.. will generate the same code as ..

if i>='0' and i<='9'

.. while ..



.. will generate code similar to ..



if bitmap[s[i]] then

--

Both are much more efficient than a strchr call. Furthermore, strchr is 
often optmized for very long strings, look at the implementation in the GNU 
libc for an example how much code it is.


 For the general case, are D's associative arrays good enough, Daniel?

They are a cool feature, but how would you use them as a set replacement?

Daniel

Aug 21 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Daniel Mantione wrote:
 
 Russell Bornschlegel wrote:
 For the general case, are D's associative arrays good enough, Daniel?

 
 They are a cool feature, but how would you use them as a set replacement?

Associative array of bit; 1 = exists, 0 = unexists.

 bit[ char ] myset;     // set of char

 // do we need to explicitly zero things here?

 myset[ 'x' ] = 1;      // 'x' exists in set
 myset[ 'y' ] = 1;      // 'y' exists in set
 myset[ 'j' ] = 1;      // 'j' exists in set

 if (myset['j'] == 1)  { ... }

 myset['y'] = 0;  // take y out of the set

 if (myset['y'] == 1)  { ... }

Alternately, ignore the contents of each element, and use 
the existence of an element in the associative array as set 
membership.

 int[ char ] myset;     // set of char

 myset[ 'x' ] = 1;      // 'x' exists in set
 myset[ 'y' ] = 42;     // 'y' exists in set
 myset[ 'j' ] = 3721;   // 'j' exists in set

 if ('j' in myset)  { ... }

 delete myset['y'];  // take y out of the set

 if ('y' in myset)  { ... }

The D spec is silent on the implementation details for 
associative arrays, but they'd likely be binary trees (ordered 
keys, lookup time normally O(log2(n))) or hashes (disordered keys, 
lookup time normally O(1)).

The D compiler could conceivably do weird optimizations on 
bit-type associative arrays.

-RB

Aug 22 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Hi,

No, this won't do as replacement; it is much to cumbersome to use. You are 
right that a set allways needs to be a bitmap in unoptimized form. The 
compiler can then decide if it optimizes the bitmap away.

Try to code my first example with such an array. It's no replacement.

Furthermore, sets also have union, difference and subset operations, which 
you don't want to hard code.

Daniel

Russell Bornschlegel wrote:

 They are a cool feature, but how would you use them as a set replacement?

 
 Associative array of bit; 1 = exists, 0 = unexists.
 
  bit[ char ] myset;     // set of char
 
  // do we need to explicitly zero things here?
 
  myset[ 'x' ] = 1;      // 'x' exists in set
  myset[ 'y' ] = 1;      // 'y' exists in set
  myset[ 'j' ] = 1;      // 'j' exists in set
 
  if (myset['j'] == 1)  { ... }
 
  myset['y'] = 0;  // take y out of the set
 
  if (myset['y'] == 1)  { ... }
 
 Alternately, ignore the contents of each element, and use
 the existence of an element in the associative array as set
 membership.
 
  int[ char ] myset;     // set of char
 
  myset[ 'x' ] = 1;      // 'x' exists in set
  myset[ 'y' ] = 42;     // 'y' exists in set
  myset[ 'j' ] = 3721;   // 'j' exists in set
 
  if ('j' in myset)  { ... }
 
  delete myset['y'];  // take y out of the set
 
  if ('y' in myset)  { ... }
 
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).
 
 The D compiler could conceivably do weird optimizations on
 bit-type associative arrays.
 
 -RB

Aug 22 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Daniel Mantione wrote:

 No, this won't do as replacement; it is much to cumbersome to use. You are
 right that a set allways needs to be a bitmap in unoptimized form. The
 compiler can then decide if it optimizes the bitmap away.
 
 Try to code my first example with such an array. It's no replacement.

Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons, 
not the least of which is C's inadequate string handling. 

In terms of performance efficiency, in C, the set lookup is going to get 
swamped by the sprintf. In Pascal, the set lookup may be faster but it'll 
be drowned out over the long haul by array bounds checking. :)

In terms of implementation cleanliness, Pascal wins out here. In D, I think
the _usage_ of the set is just as clean; if assoc-array-of-bits is optimized 
specially, then performance should be basically equal, and that just leaves 
the instantiation of the set. 

So, I concede the desirability of static range-initialization in associative 
arrays in D:

  char[] hexlate = "0123456789ABCDEF";
  bit[char] safechars;

  // we want a way of saying this in a static initializer
  safechars[ 'a'..'z' ] = 1;
  safechars[ 'A'..'Z' ] = 1;
  safechars[ '0'..'9' ] = 1;

  // all others left undefined/null/nonexistent

  char[] source, destination;

  for (i = 0; i < source.length(); i++)
  {
    char c = source[i];
    if (c in safechars)
    {
        // append character
        destination.append( c );
	// Walter: does destination += c do the Right Thing?
    }
    else
    {

        destination.append( '%%' );
        destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );   // upper 4 bits
of c as hex digit
        destination.append( hexlate[ (c & 0x0F) ] );            // lower 4 bits
of c as hex digit
    }
  }

Which seems reasonably clean to me. It's not clear to me from my reading of 
the D spec whether Walter intended range initialization to work on associative 
arrays. 

-RB

Aug 22 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Hi,

Yes, this looks like a very promising implementation to use sets. What you 
are essentially saying is that an array of bits has set semantics, like the 
"in" operator.

This fits very well in the philosophy of the language. In C you can use an 
int as boolean, in Pascal not. This can apply to arrays/sets as set as well.

Still, I would propose to also have some kind of set constructor, to be 
able to code:

if (i in ['a'..'z']) {};

So the set constructor ['a'..'z'] constructs an associative array of bits. 
Now the compiler can analize this and see that the same result can be 
achieved by doing compares. In fact, Free Pascal does this when it 
encounters a constant set.

Note that a set constructor is not something to write down a constant set, 
but an expression that results in a set. The following code is completely 
legal:

var a,b:longint;
    c:set;

begin
    a:=1;
    b:=3;
    c:=[a,b,2*b+7,4,5,6];
end;

How about using [...] as set constructor in D? I don't think it makes the 
syntax ambiguous. Or do we need a different constructor?

Your example would then be written as:

   char[] hexlate = "0123456789ABCDEF";
   bit[char] safechars = ['a'..'z','A'..'Z','0'..'9'];
 
   // all others left undefined/null/nonexistent
 
   char[] source, destination;
 
   for (i = 0; i < source.length(); i++)
   {
     char c = source[i];
     if (c in safechars)
     {
         // append character
         destination.append( c );
         // Walter: does destination += c do the Right Thing?
     }
     else
     {

         destination.append( '%%' );
         destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );
         destination.append( hexlate[ (c & 0x0F) ] );
     }
   }

Russell Bornschlegel wrote:

 Daniel Mantione wrote:
 
 No, this won't do as replacement; it is much to cumbersome to use. You
 are right that a set allways needs to be a bitmap in unoptimized form.
 The compiler can then decide if it optimizes the bitmap away.
 
 Try to code my first example with such an array. It's no replacement.

 
 Your Ns_EncodeUrl() example? In C, that's a mess for a number of reasons,
 not the least of which is C's inadequate string handling.
 
 In terms of performance efficiency, in C, the set lookup is going to get
 swamped by the sprintf. In Pascal, the set lookup may be faster but it'll
 be drowned out over the long haul by array bounds checking. :)

What do you mean? There are no arrays involved.

 In terms of implementation cleanliness, Pascal wins out here. In D, I
 think the _usage_ of the set is just as clean; if assoc-array-of-bits is
 optimized specially, then performance should be basically equal, and that
 just leaves the instantiation of the set.
 
 So, I concede the desirability of static range-initialization in
 associative arrays in D:
 
   char[] hexlate = "0123456789ABCDEF";
   bit[char] safechars;
 
   // we want a way of saying this in a static initializer
   safechars[ 'a'..'z' ] = 1;
   safechars[ 'A'..'Z' ] = 1;
   safechars[ '0'..'9' ] = 1;
 
   // all others left undefined/null/nonexistent
 
   char[] source, destination;
 
   for (i = 0; i < source.length(); i++)
   {
     char c = source[i];
     if (c in safechars)
     {
         // append character
         destination.append( c );
 // Walter: does destination += c do the Right Thing?
     }
     else
     {

         destination.append( '%%' );
         destination.append( hexlate[ ( (c >> 4) & 0x0F ) ] );   // upper 4
         bits of c as hex digit
         destination.append( hexlate[ (c & 0x0F) ] );            // lower 4
         bits of c as hex digit
     }
   }
 
 Which seems reasonably clean to me. It's not clear to me from my reading
 of the D spec whether Walter intended range initialization to work on
 associative arrays.
 
 -RB

Greetings,

Daniel Mantione

Aug 22 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Daniel Mantione wrote:
 Yes, this looks like a very promising implementation to use sets. What you
 are essentially saying is that an array of bits has set semantics, like the
 "in" operator.

As currently specified, though, associative-array-of-bit actually has 
three states per key: 1, 0, or not-in-set. That's not a great fit to an
underlying array-of-bit implementation. 

 Still, I would propose to also have some kind of set constructor, to be
 able to code:
 
 if (i in ['a'..'z']) {};
 
 So the set constructor ['a'..'z'] constructs an associative array of bits.
 Now the compiler can analize this and see that the same result can be
 achieved by doing compares. In fact, Free Pascal does this when it
 encounters a constant set.

OK.

 In terms of performance efficiency, in C, the set lookup is going to get
 swamped by the sprintf. In Pascal, the set lookup may be faster but it'll
 be drowned out over the long haul by array bounds checking. :)

 
 What do you mean? There are no arrays involved.

Not in that code snippet, but in the rest of the program, whatever that 
program might be. That's what I meant by "the long haul".

I really tried to resist suggesting:

 static char* translate[] = { "%00", "%01", "%02" ... "0", "1", "2" ... "A",
"B", "C" ... "a", "b", "c" ... "%FE", "%FF"
}; // about 2K bytes

 while (*string)
 {
   char* x = translate[*string++];
   while (*x)
   {
     *dest++ = *x++;
   }
 }

-RB

Aug 22 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Russell Bornschlegel wrote:

 What do you mean? There are no arrays involved.

 
 Not in that code snippet, but in the rest of the program, whatever that
 program might be. That's what I meant by "the long haul".

Luckily you can turn off range checking, which you usually do when your 
program is bug-free.

Daniel

Aug 22 2001

"Walter" <walter digitalmars.com> writes:

Daniel Mantione wrote in message <9m19an$1a1b$1 digitaldaemon.com>...
Luckily you can turn off range checking, which you usually do when your
program is bug-free.

Range checking is a wonderful feature that goes too far when you can't turn
it off for the production release.

Oct 15 2001

"Sean L. Palmer" <spalmer iname.com> writes:

"Russell Bornschlegel" <kaleja estarcion.com> wrote in message
news:3B8422B8.80B09E1C estarcion.com...
 Daniel Mantione wrote:
 Yes, this looks like a very promising implementation to use sets. What


you
 are essentially saying is that an array of bits has set semantics, like


the
 "in" operator.

 As currently specified, though, associative-array-of-bit actually has
 three states per key: 1, 0, or not-in-set. That's not a great fit to an
 underlying array-of-bit implementation.

Seems like a set is more of an associative array of a type that has no value
and takes no storage.  So I don't think associative array of any value type
would be an optimal way to implement a set.

Sean

Oct 28 2001

"Walter" <walter digitalmars.com> writes:

"Russell Bornschlegel" <kaleja estarcion.com> wrote in message
news:3B83EF19.D755A19 estarcion.com...
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).

They're a mix of hash tables and binary trees. The implementation would be
free to pick its own way of implementing them.

Aug 23 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Walter wrote:
 
 "Russell Bornschlegel" <kaleja estarcion.com> wrote in message
 news:3B83EF19.D755A19 estarcion.com...
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).

 
 They're a mix of hash tables and binary trees. The implementation would be
 free to pick its own way of implementing them.

So the spec won't offer promises about key ordering?

-R

Aug 23 2001

"Rajiv Bhagwat" <dataflow vsnl.com> writes:

Walter <walter digitalmars.com> wrote in message
news:9m4329$2vv4$1 digitaldaemon.com...
 "Russell Bornschlegel" <kaleja estarcion.com> wrote in message
 news:3B83EF19.D755A19 estarcion.com...
 The D spec is silent on the implementation details for
 associative arrays, but they'd likely be binary trees (ordered
 keys, lookup time normally O(log2(n))) or hashes (disordered keys,
 lookup time normally O(1)).

 They're a mix of hash tables and binary trees. The implementation would be
 free to pick its own way of implementing them.

Have you checked up 'Suneido' language, where the common 'object' data type
is accesible *both* as asso. arrays and vectors? Taking their own example,
this is how you can specify the object literals:






In the last example, the value of element 2 is "Joe", also accesible thru
[2].
More on www.suneido.com

-- Rajiv

Aug 24 2001

"Walter" <walter digitalmars.com> writes:

Russell Bornschlegel wrote in message <3B83EF19.D755A19 estarcion.com>...
The D spec is silent on the implementation details for
associative arrays, but they'd likely be binary trees (ordered
keys, lookup time normally O(log2(n))) or hashes (disordered keys,
lookup time normally O(1)).


It's supposed to be an implementation detail <g>, but they are implemented
as a hashed array of binary trees. My experience with them is that they are
very efficient.

Oct 14 2001

"Walter" <walter digitalmars.com> writes:

"Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote in message
news:9lvkln$a05$1 digitaldaemon.com...
 For the general case, are D's associative arrays good enough, Daniel?

 They are a cool feature, but how would you use them as a set replacement?

int[char] set;
char c;

if (c in set)
{
}

Aug 23 2001

"Sheldon Simms" <sheldon semanticedge.com> writes:

Im Artikel <9luk4s$2oqi$1 digitaldaemon.com> schrieb "Angus Graham"
<agraham_d agraham.ca>:

 "Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a
 strchr call for each character the compiler now only needs a set
 lookup. And because this is a constant set, the compiler can optimize
 this into a few compare operations.

 
 Well, to be fair, the C version doesn't have to use strchr. You could
 just change
 
 if (strchr(safechars, *string) != NULL) {
 
 to
 
 char c = *string;
 if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) )
 
 or even
 
 char set[][] = { {'a','z'},
                          {'A','Z'},
                          {'0','9'};
 char c = *string;
 if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) ||
 (c>=set[2][0] && set[2][1]>=c) )
 
 or finally
 
 char set[][] = { {'a','z'},
                          {'A','Z'},
                          {'0','9'};
 char c = *string;
 for( int i=0; i<sizeof(set); ++i )
 {
     found = (c>=set[i][0] && set[i][1]>=c);
 }
 if(found)
 {
 
 which is a little hectic, bit is what I presume the Pascal version does
 internally.

Which basically proves Daniel's point. Why should the programmer have to
code this explicitly?

-- 
Sheldon Simms / sheldon semanticedge.com

Aug 21 2001

"D Man" <clockwork austin.rr.com> writes:

I second that.  I am not a fan of Pascal, but if there are features that are
this handy...

"Sheldon Simms" <sheldon semanticedge.com> wrote in message
news:9lul5d$2p87$1 digitaldaemon.com...
 Im Artikel <9luk4s$2oqi$1 digitaldaemon.com> schrieb "Angus Graham"
 <agraham_d agraham.ca>:

 "Daniel Mantione" <daniel deadlock.et.tudelft.nl> wrote
 That looks much cleaner! And it's much more efficient; instead of a
 strchr call for each character the compiler now only needs a set
 lookup. And because this is a constant set, the compiler can optimize
 this into a few compare operations.

 Well, to be fair, the C version doesn't have to use strchr. You could
 just change

 if (strchr(safechars, *string) != NULL) {

 to

 char c = *string;
 if( (c>='a' && 'z'>=c) || (c>='A' && 'Z'>=c) || (c>='0' && '9'>=c) )

 or even

 char set[][] = { {'a','z'},
                          {'A','Z'},
                          {'0','9'};
 char c = *string;
 if( (c>=set[0][0] && set[0][1]>=c) || (c>=set[1][0] && set[1][1]>=c) ||
 (c>=set[2][0] && set[2][1]>=c) )

 or finally

 char set[][] = { {'a','z'},
                          {'A','Z'},
                          {'0','9'};
 char c = *string;
 for( int i=0; i<sizeof(set); ++i )
 {
     found = (c>=set[i][0] && set[i][1]>=c);
 }
 if(found)
 {

 which is a little hectic, bit is what I presume the Pascal version does
 internally.

 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?

 --
 Sheldon Simms / sheldon semanticedge.com

Aug 21 2001

"Angus Graham" <agraham_d agraham.ca> writes:

 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?

Becuase if you have a large list of things to test membership in you can use
a real container.  If you have a tiny list you can use one line of code.
How often would a built in set be useful?  To me, seldom to never, so why
cut down a tree and put it in every D manual?
I'm not even too happy about having associative arrays built in.  Much as I
love them in Perl, it's just gonna make the runtime bigger and slower, my
manual longer and heavier.  A nice standard library is the place for
containers.

Angus Graham

Aug 21 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Angus Graham wrote:

 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?

 
 Becuase if you have a large list of things to test membership in you can
 use a real container.

This is true for complex types. Sets of records would be example be a very 
bad idea, the question comes wether they need to be stored in a tree, 
btree, or whatever. In a practical language the programmers needs to be in 
control of this. In these cases, a tree object or a btree object is more 
apropiate.

But in the case of ordinal types, a set isn't easily replaced by a library.

  If you have a tiny list you can use one line of code.

Only in the case of a constant set!

And even then, people are going to do things like strchr yo make things 
easy to maintain. With a set, you have both: clean looking code and very 
efficient code.

 How often would a built in set be useful?  To me, seldom to never, so why
 cut down a tree and put it in every D manual?

You have never programmed in Pasacal or Delphi? If I program in C I 
*really* miss the set.

 I'm not even too happy about having associative arrays built in.  Much as
 I love them in Perl, it's just gonna make the runtime bigger and slower,
 my
 manual longer and heavier.  A nice standard library is the place for
 containers.

There is some truth in this. Associative arrays also need to be stored in 
internal datastructures over which the programmer has no control. An object 
might be a good idea too here. If you do it with properties, it will feel 
like an array too.

Greetings,

Daniel Mantione

Aug 22 2001

"Anthony Steele" <asteele nospam.iafrica.com> writes:

"Angus Graham" <agraham_d agraham.ca> wrote in message
news:9lump1$2qal$1 digitaldaemon.com...
 Which basically proves Daniel's point. Why should the programmer have to
 code this explicitly?

 Becuase if you have a large list of things to test membership in you can

use
 a real container.

True, for instance, In Delphi (*The* pascal-derived language these days), a
set may have up to 256 elements, no more. All it really is whole lot of bits
side by side, 32 byets at most. Membership testing is implemented as bit
testing, union as or-ing the bits, etc.

 If you have a tiny list you can use one line of code.
 How often would a built in set be useful?  To me, seldom to never, so why
 cut down a tree and put it in every D manual?

Speaking as someone who has used Delphi for several years, sets over an enum
are very usefull, used often, and would be missed. The most common usage is
for options to a class or proc. For instance, look at this code fragment:

type
  TFontStyle = (fsBold, fsItalic, fsUnderline, fsStrikeOut);
  TFontStyles = set of TFontStyle;

 class TFont

   procedure setStyle(style: TFontStyles); // actually this would be a
property, but one thing at a time
 ...
end;

  MyFont.setStyle( [fsBold, fsUnderline]);
.. MyFont.Style := myFont.Style + fsItalic;

The implementation is effiicient, and so is the usage syntax. And it's
type-safe. Object-Pascal is not perfect, but this feature is really nice

Sep 14 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Daniel Mantione wrote:
 Hi,
 
 One of the features I miss the most in C is the set. Being a member of the 
 Free Pascal development team, I program a lot in Pascal. Now I don't want 
 to say that Pascal rules and C sucks, but there are some things that Pascal 
 is superior in.
 
...

It's hard to argue that sets wouldn't be nice to have.  So would 
AVL-trees, and B+Trees.  And other classical data structures.  Perhaps a 
better approach would be to emphasize ways that make the language easy 
to extend with these features.  If not, then if it came to a choice my 
vote would be for the inclusion of B+Trees as a standard language 
feature.  I would want to use them far more frequently, and they are 
much more work to implement.

So, perhaps the question should be, what possible language (as opposed 
to library) features would make it easier to integrate sets into the 
language?  And how could these be generalized to support data strudtures 
in general?  (Presumably including associative arrays, strings, and the 
other built in features.)

One thing that most of these seem to do is to allow one to store data by 
means of some non-numeric keying function (usually a string, but lets 
try to keep this general).  Another is to test the structure for the 
existence of that key.  Another is to retrieve data associated with that 
key.  Another is to remove a key, and its associated data.  Anything 
else primitive-common?


Then the operations are, essentially a[key]=x, x=a[key], x in a, and 
a[key]=null.
Can these operations be defined for arbitrary user defined data structures?

The one remaining operation is initialization.  If a data structure were 
a class, then this could be handled with a constructor.  So the problem 
would be in specifying how the argument list of the constructor could 
cope with the kinds of argument that you want to use to initialize a set.

Is this a fair analysis?  If so, then I feel it a better basis to 
proceed than on a data structure by data structure basis.  And what is 
required here is that in be made an overloadable operator.  (Well, and 
the argument list problem.  I don't know how that should be handled.  I 
usually start with an empty data structure and add things to it.)

Aug 22 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Charles Hixson wrote:
 If not, then if it came to a choice my
 vote would be for the inclusion of B+Trees as a standard language
 feature.  I would want to use them far more frequently, and they are
 much more work to implement.

Are you interested in the primitive functionality of B+Trees, or of
some particular implementation-dependent characteristic of them?

In a garbage-collected language like D, where you have few or no 
performance guarantees to begin with, it doesn't seem necessary 
to make a big deal over the latter. 

 One thing that most of these seem to do is to allow one to store data by
 means of some non-numeric keying function (usually a string, but lets
 try to keep this general).  Another is to test the structure for the
 existence of that key.  Another is to retrieve data associated with that
 key.  Another is to remove a key, and its associated data.  Anything
 else primitive-common?

Yes: iterate over the whole collection, performing an arbitrary 
operation on each element. This is the only operation I see in your 
list that isn't supported by associative arrays in the D spec. 

A set can be represented by an associative array where the contents 
of an existing element are unused -- all you care about is the 
existence or absence of an element. 

Consider that the STL set and map containers are normally implemented
as some flavor of binary tree, while Perl's associative array, which 
does essentially the same thing, is normally referred to and 
implemented as a hash. They're used the same way, but have slightly 
different performance characteristics -- that most people aren't 
worried about, because they're lazy and would otherwise be doing 
linear searches on arrays or linked lists, so the near-constant
lookup time of a hash or the log2 lookup time of a tree is a huge
win for them. Trees can also maintain the collection in order, 
while hashes can't. 

 Then the operations are, essentially a[key]=x, x=a[key], x in a, and
 a[key]=null.

The D-spec associative array specifies "delete a[key]" instead of 
"a[key] = null", and I think you meant "key in a" instead of 
"x in a", but yeah.  

Note -- I do agree that the language should support adding new data 
structures for your particular needs; to do them reasonably requires:

- operator overloading to make them pretty, including overloading 
operator delete, operator in, and operator [].

- templates or the equivalent to make them reusable and type-safe.

It looks like you aren't going to get both these features in early
releases of D, unfortunately...

-Russell B

Aug 22 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Russell Bornschlegel wrote:
 Charles Hixson wrote:
 
If not, then if it came to a choice my
vote would be for the inclusion of B+Trees as a standard language
feature.  I would want to use them far more frequently, and they are
much more work to implement.

 
 Are you interested in the primitive functionality of B+Trees, or of
 some particular implementation-dependent characteristic of them?
 
 In a garbage-collected language like D, where you have few or no 
 performance guarantees to begin with, it doesn't seem necessary 
 to make a big deal over the latter. 
 

Yes.  I want to store data on files, and access it in random order.  And 
remember it from session to session.  An external database can do this, 
but installing it remotely can be a large hassle, and it's often 
overkill for what I need.

 
One thing that most of these seem to do is to allow one to store data by
means of some non-numeric keying function (usually a string, but lets
try to keep this general).  Another is to test the structure for the
existence of that key.  Another is to retrieve data associated with that
key.  Another is to remove a key, and its associated data.  Anything
else primitive-common?

 
 Yes: iterate over the whole collection, performing an arbitrary 
 operation on each element. This is the only operation I see in your 
 list that isn't supported by associative arrays in the D spec. 
 
 A set can be represented by an associative array where the contents 
 of an existing element are unused -- all you care about is the 
 existence or absence of an element. 
 
 Consider that the STL set and map containers are normally implemented
 as some flavor of binary tree, while Perl's associative array, which 
 does essentially the same thing, is normally referred to and 
 implemented as a hash. They're used the same way, but have slightly 
 different performance characteristics -- that most people aren't 
 worried about, because they're lazy and would otherwise be doing 
 linear searches on arrays or linked lists, so the near-constant
 lookup time of a hash or the log2 lookup time of a tree is a huge
 win for them. Trees can also maintain the collection in order, 
 while hashes can't. 
 

You're right.  Ok, in addition to the "in" operator we need an "each" or 
"for_each" operator.  But that could be implemented as a function, 
though an operator would be nicer.

Remember, that if these structures are implemented as classes, then 
additional functions can be added as needed, by sub-classing.  (E.g., is 
  an each operator sensible on a set?  Only the common sensible ones are 
primitive-common.)

Part of the question here is what are the inheritance characteristics. 
How does one specify a generic class parameter?  A set of integers and a 
set of strings have slightly different characteristics.  Does the 
inheritance mechanism allow for this specialization?  If I define a set 
of wheels, what is it's relationship to a set of bicycle wheels? 
(Assuming that BicycleWheel is a descendant from Wheel.)  Can I compare 
membership between the two sets?  etc.

Another question is what kind of languate is D?  Clearly STL/C++ and 
Python would have much different answers if I asked them "What kind of 
object can I put into a set?" and "What kind of object can I get back 
out of a set?"  Python can look at the thing you get back and tell what 
it is.  C++ depends on the definition of the set to tell it what kind of 
object it got back.  C++ is faster.  Python is more flexible.

Now my personal needs are for a language that produces stand-alone code. 
  Python has problems with this (though that's being worked on).  To me, 
speed is a secondary consideration.  I'm currently working on a dataset 
of about 1500 cases with a basically N^2 complexity.  It takes about 
10-15 minutes to run using Eiffel code with all of the checks on.  I'd 
like a faster execution, but it's not a killer.  (Most of that time 
comes from hash table processing.  I'm searching for a better 
alternative.)  But it MUST be stand alone code.  People need to be able 
to run it from out on the server, and I may never have seen their 
computer.  Scratch Python.  Scratch all interpretive languages.  Scratch 
all languages that require massive dll's.  etc.  So that's how I notice 
what features are optimum.  Garbage collection is a real winner.  A 
B+Tree would really help, as would any method for maintaining persistent 
variables... help, but not be decisive.  Stand-alone code is decisive.

 
Then the operations are, essentially a[key]=x, x=a[key], x in a, and
a[key]=null.

 
 The D-spec associative array specifies "delete a[key]" instead of 
 "a[key] = null", and I think you meant "key in a" instead of 
 "x in a", but yeah.  
 
 Note -- I do agree that the language should support adding new data 
 structures for your particular needs; to do them reasonably requires:
 
 - operator overloading to make them pretty, including overloading 
 operator delete, operator in, and operator [].
 
 - templates or the equivalent to make them reusable and type-safe.
 
 It looks like you aren't going to get both these features in early
 releases of D, unfortunately...
 
 -Russell B
 

Well, Walter said that it was alpha, and the features were subject to 
change.  So now's the time to speak up.  If I don't at least mention 
what I feel is important, then it's only my fault if it doesn't happen. 
  If it doesn't happen anyway ... all languages are design tradeoffs. 
You examine what's around, and you see what you can make do what you 
need done.  (And I'm more interested in the Linux version than in the 
Windows version, so what's in the early releases isn't THAT significant. 
  But if I don't get people thinking about it, then it probably won't be 
in the later one's either.)

Aug 22 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Charles Hixson wrote:
 
 Russell Bornschlegel wrote:
 Charles Hixson wrote:
One thing that most of these seem to do is to allow one to store data by
means of some non-numeric keying function (usually a string, but lets
try to keep this general).  Another is to test the structure for the
existence of that key.  Another is to retrieve data associated with that
key.  Another is to remove a key, and its associated data.  Anything
else primitive-common?

 Yes: iterate over the whole collection, performing an arbitrary
 operation on each element. This is the only operation I see in your
 list that isn't supported by associative arrays in the D spec.

 
 You're right.  Ok, in addition to the "in" operator we need an "each" or
 "for_each" operator.  But that could be implemented as a function,
 though an operator would be nicer.

Turns out we have it in the D spec:

  for (int i = 0; i < assoc_array.keys.length; i++)
  {
      operate_on( assoc_array[ assoc_array.keys[i] ] );
  }

 Another question is what kind of languate is D?  Clearly STL/C++ and
 Python would have much different answers if I asked them "What kind of
 object can I put into a set?" and "What kind of object can I get back
 out of a set?"  Python can look at the thing you get back and tell what
 it is.  C++ depends on the definition of the set to tell it what kind of
 object it got back.  C++ is faster.  Python is more flexible.

I know you're concerned with the builtin functionality, but it's worth
mentioning that you can get the Pythonish functionality (if I understand 
what you're talking about -- I don't know Python) from C++ in a number 
of ways if you're willing to do some work. An STL container can store 
references to polymorphic types. 

C++ can do just about anything, wrap that functionality in a class, and 
tuck it away -- the only problem is that someone has to write that class
in the first place. The code to do it, apparently, is much smaller than
the Ada equivalent, but that's not saying a lot. ;)

 Now my personal needs are for a language that produces stand-alone code.
   Python has problems with this (though that's being worked on).  To me,
 speed is a secondary consideration.  I'm currently working on a dataset
 of about 1500 cases with a basically N^2 complexity. [problem description
 snipped]

All I can say is "C++ and STL maps." There's a reason C++ is widely 
used, and its syntactic elegance sure ain't it...

 It looks like you aren't going to get both these features in early
 releases of D, unfortunately...

 Well, Walter said that it was alpha, and the features were subject to
 change.  So now's the time to speak up.  If I don't at least mention
 what I feel is important, then it's only my fault if it doesn't happen.

All granted. My concern right now is that a one-man, no-budget effort 
to develop a new language can get derailed very easily by people screaming
"don't even bother unless you're going to put Feature X in!" I want Walter
to get a usable compiler out, _then_ we can beat on it and complain in 
a more informed manner :)

-Russell B

Aug 22 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Russell Bornschlegel wrote:
 Charles Hixson wrote:
 
Russell Bornschlegel wrote:

Charles Hixson wrote:

One thing that most of these seem to do is to allow one to store data by
means of some non-numeric keying function (usually a string, but lets
try to keep this general).  Another is to test the structure for the
existence of that key.  Another is to retrieve data associated with that
key.  Another is to remove a key, and its associated data.  Anything
else primitive-common?

Yes: iterate over the whole collection, performing an arbitrary
operation on each element. This is the only operation I see in your
list that isn't supported by associative arrays in the D spec.

You're right.  Ok, in addition to the "in" operator we need an "each" or
"for_each" operator.  But that could be implemented as a function,
though an operator would be nicer.

 
 Turns out we have it in the D spec:
 
   for (int i = 0; i < assoc_array.keys.length; i++)
   {
       operate_on( assoc_array[ assoc_array.keys[i] ] );
   }

That makes it possible (well, easier, it was already possible).  That 
doesn't make it clean to use.  Contrast that with:

    assoc_array.for_each.key(k)
      operate_on (k);

The syntax of that example is a bit wierd, as I am mangling several 
different languages, but I trust the meaning is clear.  The question 
here is how the key should be passed to the operate_on method.
 
...
 
 C++ can do just about anything, wrap that functionality in a 
 class, and tuck it away -- the only problem is that someone has
 to write that class in the first place. The code to do it, 
 apparently, is much smaller than
 the Ada equivalent, but that's not saying a lot. ;)

Actually, except in the sense of algorithmic completeness, there are a 
lot of things that neither C++ nor any other statically bound language 
can do.  But there's been a big speed and portability tradeoff, so 
languages have tended to divide into two camps, the dynamic linkers and 
the static linkers.  C++, Ada, Eiffel, that kind are static linkers. 
Smalltalk, Lisp, Python, Ruby, that kind are the dynamic linkers.  There 
are reasons for the split, and neither side is able to really capture 
the features of the other (though there are a lot of wild claims made). 
  Java is a rather interesting mix, sort of standing half way between 
them.  It attempts to get the advantages of both sides, and ends up 
getting a lot of the disadvantages of each, also.  As well as some 
unique strengths.  (Which it tends to ignore in favor of being familiar 
to C programmers, but look at Kiev to see some of the possibilities.)
 
 
...

 
 All I can say is "C++ and STL maps." There's a reason C++ is widely 
 used, and its syntactic elegance sure ain't it...

Doing it by hand in Eiffel is, for me, easier and more portable than 
using C++ with the STL.  Possibly if I had to throw a Mac into the 
equation, my answer would be different, but so far I haven't needed 
that.  I guess that if someone tells me it has to run on a Mac, I may 
tell them "Yellow Dog Linux".  It's not a good answer, but I haven't 
really studied that question.  Or maybe by that time Python will be able 
to do stand-alone compiles.  Or Ruby.  But it won't be C++ until after 
they fix their inheritance rules mess.
 
 
...
 
 All granted. My concern right now is that a one-man, no-budget effort 
 to develop a new language can get derailed very easily by people screaming
 "don't even bother unless you're going to put Feature X in!" I want Walter
 to get a usable compiler out, _then_ we can beat on it and complain in 
 a more informed manner :)
 
 -Russell B
 

Which is why I try to keep my suggestions to what seems to me as simple 
syntactic sugar.  Maybe I'm wrong.  Maybe I get overenthusiastic.  But I 
try.
E.g., most of this discussion was about what minimal changes could make 
lots of standard data structures easy and clean to use.  Most of these 
would be library features.  Inheritance is already scheduled to be a lot 
more dynamic than in C++, garbage collection is already built in 
(eliminating a lot of work for destructors).  Etc.
   So what is being discussed is whether or not a couple of prepositions 
could be added, and whether or not they would be useful enough to bother 
with.  But with decently flexible inheritance and overloadable user 
defined operators, then, say, :in: and :each: could be defined within a 
class tree, and all data structures could just implement their own 
flavor of it.  So it would all reduce to libraries.  But if it had to be 
written "in( xxx )" and "each ( xxx )" then it would be much uglier and 
less friendly.

Aug 22 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Charles Hixson wrote:
 
 Russell Bornschlegel wrote:
 Turns out we have it in the D spec:

   for (int i = 0; i < assoc_array.keys.length; i++)
   {
       operate_on( assoc_array[ assoc_array.keys[i] ] );
   }

 That makes it possible (well, easier, it was already possible).  That
 doesn't make it clean to use.  Contrast that with:
 
     assoc_array.for_each.key(k)
       operate_on (k);
 
 The syntax of that example is a bit wierd, as I am mangling several
 different languages, but I trust the meaning is clear.  The question
 here is how the key should be passed to the operate_on method.

Hmm. How about:

   foreach (keytype k in assoc_array.keys)
   {
       operate_on( k );
   }

This overloads "in" somewhat, but the foreach should be a sufficient 
hint to the parser; the keytype (int, string, whatever) makes the 
k look like a declaration like any other. 

 C++ can do just about anything, wrap that functionality in a
 class, and tuck it away...

 
 Actually, except in the sense of algorithmic completeness, there are a
 lot of things that neither C++ nor any other statically bound language
 can do. 

Frankly, I've never used a dynamically linked language, I don't think. 
Can you give me a short example of a cool thing you can do with dynamic 
linking? 

    So what is being discussed is whether or not a couple of prepositions
 could be added, and whether or not they would be useful enough to bother
 with.  But with decently flexible inheritance and overloadable user
 defined operators, then, say, :in: and :each: could be defined within a
 class tree, and all data structures could just implement their own
 flavor of it.  So it would all reduce to libraries.  But if it had to be
 written "in( xxx )" and "each ( xxx )" then it would be much uglier and
 less friendly.

Okay. I freely admit to being unafraid of ugly constructs, and to having
been thoroughly warped by using C++. I firmly believe that no matter how
elegant the language can be, some damn fool is going to use two-space 
indents, asinine bracketing styles, uncommented state machines implemented
via numeric literal cases in huge nested switches, or other constructs of
equal value, and that I'll be forced to maintain their code. C++ is the
least of my problems this week. :)

-RB

Aug 22 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Russell Bornschlegel wrote:
 Charles Hixson wrote:
 
...

 
 Hmm. How about:
 
    foreach (keytype k in assoc_array.keys)
    {
        operate_on( k );
    }
 
 This overloads "in" somewhat, but the foreach should be a sufficient 
 hint to the parser; the keytype (int, string, whatever) makes the 
 k look like a declaration like any other. 
 

Nice!

 
C++ can do just about anything, wrap that functionality in a
class, and tuck it away...

Actually, except in the sense of algorithmic completeness, there are a
lot of things that neither C++ nor any other statically bound language
can do. 

 
 Frankly, I've never used a dynamically linked language, I don't 
 think. Can you give me a short example of a cool thing you can
 do with dynamic linking?

This one's a bit strange, and you don't mind wierd syntax, so how about:
var = eval(include(filename.suf).something(include(otherfile.suf))) + 
"qwert";

The reason that most AI work is done in the dynamic languages isn't only 
because Lisp is easier to parse (or Forth would be more popular) and it 
sure isn't speed.  It's because code can copy itself, modify the copy, 
and then transfer execution to the copy.  Even where parsing is a bit 
difficult (can you say Python) this ability is maintained.  This 
normally means that you need to cart around an interpreter to do the 
job, but not always.  Some Smalltalk dialects can export their meaning 
as C code, and some dialects of Lisp are directly compileable.

Now truthfully, this could also be done with C (assuming decent 
cooperation from the OS), but it's so difficult that nobody does it.  So 
C, etc., don't get used in the areas that need that kind of flexibility. 
  And Lisp, Python, Ruby, etc. don't get used where speed is needed, 
though there's a study that "proves" that Lisp code can execute as fast 
as C code.  (I think that what they actually end up comparing is code 
optimizers, but that's not what they claim to show... and what's wrong 
with saying "My code can be more highly optimized with automatic tools, 
so that for a sufficiently complex task it is faster than your code!", 
but that's not what they said.)
 
...
 
 Okay. I freely admit to being unafraid of ugly constructs, and to having
 been thoroughly warped by using C++. I firmly believe that no matter how
 elegant the language can be, some damn fool is going to use two-space 
 indents, asinine bracketing styles, uncommented state machines implemented
 via numeric literal cases in huge nested switches, or other constructs of
 equal value, and that I'll be forced to maintain their code. C++ is the
 least of my problems this week. :)
 
 -RB
 

I generally end up maintaining my own code.  So I guess that's not a 
problem for me (except when stupid editors convert my tabs into spaces, 
and won't let me say what columns the tabs are supposed to represent).

Aug 22 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Charles Hixson wrote:
 
 Russell Bornschlegel wrote:
 Frankly, I've never used a dynamically linked language, I don't
 think. Can you give me a short example of a cool thing you can
 do with dynamic linking?

 This one's a bit strange, and you don't mind wierd syntax, so how about:
 var = eval(include(filename.suf).something(include(otherfile.suf))) +
 "qwert";
 
 The reason that most AI work is done in the dynamic languages isn't only
 because Lisp is easier to parse (or Forth would be more popular) and it
 sure isn't speed.  It's because code can copy itself, modify the copy,
 and then transfer execution to the copy.  Even where parsing is a bit
 difficult (can you say Python) this ability is maintained.  This
 normally means that you need to cart around an interpreter to do the
 job, but not always.  Some Smalltalk dialects can export their meaning
 as C code, and some dialects of Lisp are directly compileable.

 Now truthfully, this could also be done with C (assuming decent
 cooperation from the OS), but it's so difficult that nobody does it.  So
 C, etc., don't get used in the areas that need that kind of flexibility.

Ah, okay, Perl has this also -- I've seen it, seen that it could be 
cool, but outside of one cookbook use (a Perl program eval'ing a startup/rc 
file written in Perl) I haven't tried it. Really it's dynamic code-
evaluation/dynamic compilation, as much as dynamic linking. I suppose I 
could invoke a C compiler to build a DLL and then bind the DLL, in C/Win32, 
but that's probably not really very time-efficient... :) 

And not really supported by standard C, but with the proper OS support
you can build a sequence of native machine code in a buffer and then 
execute that buffer. High performance 3D applications have been known 
to do this, for highly critical polygon-pushing paths. 

Also, blah blah blah security blah blah blah executing arbitrary code 
from an external file blah blah blah. :)

-RB

Aug 22 2001

Dan Hursh <hursh infonet.isl.net> writes:

Russell Bornschlegel wrote:
 
 Charles Hixson wrote:
 Yes: iterate over the whole collection, performing an arbitrary
 operation on each element. This is the only operation I see in your
 list that isn't supported by associative arrays in the D spec.

 You're right.  Ok, in addition to the "in" operator we need an "each" or
 "for_each" operator.  But that could be implemented as a function,
 though an operator would be nicer.

 
 Turns out we have it in the D spec:
 
   for (int i = 0; i < assoc_array.keys.length; i++)
   {
       operate_on( assoc_array[ assoc_array.keys[i] ] );
   }

	Yes, but can this break if you are adding or deleting certain elements
for the array as you go?  In that case you would have to do this:

	string[] keys = assoc_array.keys;
	for (int i = 0; i < keys.length; i++){
		if( bad_element(assoc_array[keys[i]]) )
			delete assoc_array[keys[i]];
	}

A foreach that worked on arrays with a known size and associative arrays
would be nicer and less error prone.
 
Dan

Aug 26 2001

Russell Bornschlegel <kaleja estarcion.com> writes:

Please excuse reply-to-self.

Russell Bornschlegel wrote:
 Yes: iterate over the whole collection, performing an arbitrary
 operation on each element. This is the only operation I see in your
 list that isn't supported by associative arrays in the D spec.

My mistake: myarray.keys returns the keys of an associative array, 
over which you can iterate to your hearts content.

-Russell B

Aug 22 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Charles Hixson wrote:

 It's hard to argue that sets wouldn't be nice to have.  So would
 AVL-trees, and B+Trees.  And other classical data structures.  Perhaps a
 better approach would be to emphasize ways that make the language easy
 to extend with these features.  If not, then if it came to a choice my
 vote would be for the inclusion of B+Trees as a standard language
 feature.  I would want to use them far more frequently, and they are
 much more work to implement.

Nonsense. The set is an elementary type, just like the integer or pointer 
and no library can easily replace it, just like you cannot easily implement 
a float in a library. In fact, with any example you would be able to give, 
a the rewrite of that example with a set if almost allways superior in 
efficiency and/or maintainability and never has worse efficiency.

It is true that mathematically, a set can consist of everything, but this 
is not done so in programming languages. It is true that a mathematically 
behaving set (that can store everything) can better be implemented as a 
set. With operator overloading you can make it behave as a set.

Now the elementary datatype. A set is a bitmap in all programming languages 
that support it. In Pascal a set consits of a maximum of 256 different 
items, because the compiler uses a byte to enumerate them. This is a stupid 
limitation of Pascal, but the designers definitely got the idea. The 
built-in set should support types that can be enumerated.

For example:

enum    colours {red, green, blue};

.. would do as the basetype of a set.

char[];

.. would not do. You cannot enumerate all possible strings.

char[2];

.. perhaps would do. That would be a bitmap of 65536 entries.

Greetings,

Daniel Mantione

Aug 22 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Daniel Mantione wrote:
 Charles Hixson wrote:
 
 
It's hard to argue that sets wouldn't be nice to have.  So would
AVL-trees, and B+Trees.  And other classical data structures. ...

 
 Nonsense. The set is an elementary type, just like the integer
 or pointer and no library can easily replace it, just like you
 cannot easily implement a float in a library. In fact, ...
 
 .. perhaps would do. That would be a bitmap of 65536 entries.
 
 Greetings,
 
 Daniel Mantione
 

Considering that bit arrays are an included part of the language, I fail 
to understand why a set could not be implemented as a library.  Given 
that the compiler would have all the necessary information, it should 
(this might be optomistic) be able to optimize references to your 
"SmallSet" class as inline calls.  So the problem is just syntax, which 
is what I was discussing.  If that was answered, then I didn't 
understand the answer.

Also, neither strings, associative arrays, nor sets are elementary 
types.  They are composits.  A reasonable way to model a string is as an 
array of characters.  A reasonable way to model an associative array is 
as a hash table.  And a reasonable way to model a small set is as a bit 
vector.  But even these aren't the elementary types.  If you wish 
evidence, consider C++ without the STL, or other complex libraries.  You 
can build each of those types out of the elements present, yet none of 
them are present.  And the things that you build them out of are simpler 
constructs that have been inherited from C.

Now, perhaps what you meant was that it seems to you vastly preferable 
that a set type be built-in to the language.  If so, then you have the 
right of it.  I can't argue about what your opinion is.  But others may 
legitimately have different opinions.

What I am proposing is that the language support the operations needed 
to facilitate an easy use of sets, among other data structures.  And 
that the implementation be parcelled out to the constructors of the 
libraries.  This seems to me a much more efficient approach, as well as 
a more reasonable one.  But then I don't often feel the need for sets. 
I even want to use bags more often than I want to use sets, and usually 
I prefer that my data be in some sorted form with attatched data 
structures.  So I am also biased.

Aug 22 2001

Daniel Mantione <daniel deadlock.et.tudelft.nl> writes:

Charles Hixson wrote:

 Considering that bit arrays are an included part of the language, I fail
 to understand why a set could not be implemented as a library.  Given
 that the compiler would have all the necessary information, it should
 (this might be optomistic) be able to optimize references to your
 "SmallSet" class as inline calls.  So the problem is just syntax, which
 is what I was discussing.  If that was answered, then I didn't
 understand the answer.

To optimize things well, you need compiler magic. A simple ...

if i in [0..9,18..20]

... would never get optimized if the set was implemented in a library. So 
compiler would be able to see that it can optimize the bitmap away in this 
case.

About syntax, you would like that a set that is implemented with a btree in 
a library for example, programs same as the built in set. To be able to do 
that you need operator overloading, which is being heavily dicussed.

Daniel

Aug 22 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Daniel Mantione wrote:
 Charles Hixson wrote:
 
 
Considering that bit arrays are an included part of the language, I fail
to understand why a set could not be implemented as a library.  Given
that the compiler would have all the necessary information, it should
(this might be optomistic) be able to optimize references to your
"SmallSet" class as inline calls.  So the problem is just syntax, which
is what I was discussing.  If that was answered, then I didn't
understand the answer.

 
 To optimize things well, you need compiler magic. A simple ...
 
 if i in [0..9,18..20]
 
 ... would never get optimized if the set was implemented in a library. So 
 compiler would be able to see that it can optimize the bitmap away in this 
 case.
 
 About syntax, you would like that a set that is implemented with a btree in 
 a library for example, programs same as the built in set. To be able to do 
 that you need operator overloading, which is being heavily dicussed.
 
 Daniel
 

A library should provide many different varieties of data structure. 
Not just one.  You chose the flavor that fits your problem.  As to 
compound ranges (not to my mind the same thing as a set, but rather one 
thing that is sometimes handled by sets), I don't know what the best 
choice would be.  If there were a range primitive, then one might want 
to say:
if i in [0..9] or i in [18..20]

a bit clumsier, I admit.  Or one could define a class that implemented 
range handling in the way that one chose, as defined by the problem 
under consideration.

OTOH, I will admit that it would be nice if complex range constants were 
a recognized data type (which would imply complex range variables).  And 
that one of the more reasonable ways to implement that would be as a 
set.  But not the only reasonable way.  It's perfectly reasonable to 
have a complex range that can handle things like:
Range r = "Alpha" .. "Blueberry" union "Peanuts" ... "Zebras"

All you need for this is two vectors of string, call them lower and 
upper.  (The rest is a lot of work, but basically obvious.)  I've needed 
to do this a couple of times, and once I needed to deal with complex 
ranges of floating point numbers, and even needed to perform arithmetic 
on the ranges.  Consider what:
r -= "Spinach";
would mean.  (Defining what arithmetic and comparison ops mean when 
applied to ranges of floating point values is a bit interesting.  That 
is probably very domain specific.  But I needed to do it the time I 
needed ranges of floating point numbers [well, rationals was all I 
really needed, but it was easier to use floats].)

Here union, intersect, remove, etc. are the basic ops.  But these map in 
a relatively straightforward way onto the general data structure 
operations that I was proposing.  (Relatively.  It's certainly a less 
good fit than most.)

But I rarely end up dealing with simple numbers except as numbers, so 
*I* don't see much need for the Pascal kind of set.  Except occasionally.

Aug 22 2001

"Anthony Steele" <asteele nospam.iafrica.com> writes:

"Charles Hixson" <charleshixsn earthlink.net> wrote in message
news:3B83D667.40505 earthlink.net...

...

 It's hard to argue that sets wouldn't be nice to have.  So would
 AVL-trees, and B+Trees.  And other classical data structures.  Perhaps a
 better approach would be to emphasize ways that make the language easy
 to extend with these features.

Most things can be added to an OO language simply by writing the appropriate
classes, or finding them in the standard library.

But I have not seen sets done well this way in an efficient manner - To take
an example, Java, where classes are by design the *only* mechanism for
making new types, there is a set class, and no language support for enums.
However using sets and option-objects is so ugly and cumbersome that
everyone, sun included, just uses named integer constants. This is a giant
leap backwards. Built-in enums and sets in Delphi are a great enabler.

Sep 14 2001

Charles Hixson <charleshixsn earthlink.net> writes:

Anthony Steele wrote:

 "Charles Hixson" <charleshixsn earthlink.net> wrote in
 message news:3B83D667.40505 earthlink.net...


 ...

 It's hard to argue that sets wouldn't be nice to have.  So
  would AVL-trees, and B+Trees.  And other classical data
 structures.  Perhaps a better approach would be to
 emphasize ways that make the language easy to extend with
  these features.

 Most things can be added to an OO language simply by writing
  the appropriate classes, or finding them in the standard
 library.

 But I have not seen sets done well this way in an efficient
 manner - To take an example, Java, where classes are by
 design the *only* mechanism for making new types, there is
 a set class, and no language support for enums. However
 using sets and option-objects is so ugly and cumbersome that 

everyone, sun included, just uses named integer constants. This 
is a giant
 leap backwards. Built-in enums and sets in Delphi are a great 

enabler.

The primary feature needed to allow this (efficiently) is the 
ability to overload the indexing operators.



 everyone, sun included, just uses named integer

 constants. This is a giant leap backwards. Built-in enums
 and sets in Delphi are a great enabler.

The primary feature needed to allow this (efficiently) is the 
ability to overload the indexing operators.  In Python, e.g., 
the same opeartors used to set and access the hash table 
directories, can also be used to access B+Tree style databases. 
  It's true that one needs ancillary operations to access the 
additional features (e.g., in-order traversal), but the features 
of set by key and retrieve by key can be accessed via a simple:
a{"peanut"} = "butter";
print a{"peanut"}
--> butter

In python the key step is that when on initializes the variable 
("a" in this case), instead of initializing it as a hash table:
a = {} or a = {"Able" : "was", "I" : "ere"}
one initializes it as a database:
a = db("C:\file\location.dbm")
The type of database is determined by the module "anydbm" which 
looks at the file to figure out which of three or four kinds of 
database it is.  The B+Tree version that I'm speaking of is the 
SleepyCat database.

But most of this doesn't require run-time configuration. 
Clearly one might prefer to declare at compile time that it was 
a B+Tree rather than a flat-file lookup table, but as long as 
the ancestral class (abstract class or interface?), say, DBM, 
had the requisite methods defined, that would be all that was 
required.  Then the methods could be activated at run time.  But 
if the standard operations can't be overloaded, then this falls 
through.  As an alternative, I have proposed using demarkations, 
e.g. :+:, but this doesn't work well when used as indexing 
syntax, consider:
a:[:"peanut":}: = "butter";
I'd almost prefer:
a.set("peanut").to("butter");
or:
a("peanut").set_to("butter");

Sep 17 2001

"Walter" <walter digitalmars.com> writes:

An array of bits in D probably comes closest to what you're looking
for. -Walter

Oct 14 2001

D Programming

C/C++ Programming

Other

D - Ode to the set