www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Sets again

reply Fredrik Olsson <peylow gmail.com> writes:
Well, with so much talk of features in (regexp) and out (bit), why not 
bring up my favourite yet again: the sets!

Set are unordered arrays, that you can do union, intersect and 
difference operations on (And more).

I have implemented sets as templates, but the syntax is cluncy, and 
well, optimisation is not that easy :/.

We have all done this:
if (((ch>='0')&&(ch<='9'))||((ch>='a')&&( ... and so on...

With sets it is a much more straight forward aproach to this in reality 
simple problem:
if (ch in <'0'..9, 'a'..'z', 'A'..'Z'>) { ... }

Lets write a nice small example function:

enum Weekday { MON, TUE, WED, THU, FRI, SAT, SUN };
set Weekdays { Weekday };
struct Employee {
   char[] name;
   Weekdays availableDays;
}

Weekdays weekdaysAvailableForAll(Employees[] employees) {
   Weekdays avilableDays = <MON..SUN>; // Weekdays.all ?
   foreach(Employee employee; emplouees) {
     availableDays -= employee.availableDays;
   }
   return vailableDays;
}

bool isDayGoodForAll(Weekday weekday, Employees[] employees) {
   return weekday in weekdaysAvailableForAll(employees);
}


If someone can propose a more clean, elegant, robust, usable, and 
optimisation friendly solution to this problem domain I stand corrected. 
But until then I say it is probably more useful to more people than 
complex math.

Only problem I see is that set literals are even more needed than array 
literals. And then we have the problem, with:
set Weekdays {Weekdays}; // and
set Workdays {MON..FRI};
auto foo = <TUE, WED, FRI>; // Is this a Weekdays or a Workdays set?

regards
   Fredrik Olsson
Feb 17 2006
next sibling parent reply "Craig Black" <cblack ara.com> writes:
Sets are indeed a cool language feature.  Perhaps some of the capability 
that you are looking for could be done with a template.  The syntax could be 
something like ...

if (i == Set!<int>(1, 3, 5..6, 8)) { ... }

and with implicit function template instantiation:

if(i == Set(1, 3, 5..6, 8)) { ... }

What do you think?

-Craig 
Feb 17 2006
next sibling parent "Craig Black" <cblack ara.com> writes:
Oops I'm too use to C++ template syntax. Minor correction:

if (i == Set!(int)(1, 3, 5..6, 8)) { ... }
Feb 17 2006
prev sibling parent reply Fredrik Olsson <peylow gmail.com> writes:
Craig Black skrev:
 Sets are indeed a cool language feature.  Perhaps some of the capability 
 that you are looking for could be done with a template.  The syntax could be 
 something like ...
 
 if (i == Set!<int>(1, 3, 5..6, 8)) { ... }
 
 and with implicit function template instantiation:
 
 if(i == Set(1, 3, 5..6, 8)) { ... }
 
 What do you think?
 
 -Craig 
 
 
It works and I have some templates up and running for the purpose. But I see two major and obvious problems: 1. 5..6 is not possible as I can see it, and Set('a', 'b', 'c' ... is well, hidious at best. 2. The implementation will probably never be as good as could be. For a general purpose set I use an associative array with "bool[T] _set". For a more limited set a bitfield is way more effective. But I still stand by my point that sets are useful enough to have, that they should not be a template hack and putting them in a library. After all, sets are a primitive type of arrays (unordered arrays), and belong in the language, or not at all. // Fredrik Olsson
Feb 17 2006
parent "Kris" <fu bar.com> writes:
"Fredrik Olsson" <peylow gmail.com> wrote..
 Craig Black skrev:
 Sets are indeed a cool language feature.  Perhaps some of the capability 
 that you are looking for could be done with a template.  The syntax could 
 be something like ...

 if (i == Set!<int>(1, 3, 5..6, 8)) { ... }

 and with implicit function template instantiation:

 if(i == Set(1, 3, 5..6, 8)) { ... }

 What do you think?

 -Craig
It works and I have some templates up and running for the purpose. But I see two major and obvious problems: 1. 5..6 is not possible as I can see it, and Set('a', 'b', 'c' ... is well, hidious at best. 2. The implementation will probably never be as good as could be. For a general purpose set I use an associative array with "bool[T] _set". For a more limited set a bitfield is way more effective. But I still stand by my point that sets are useful enough to have, that they should not be a template hack and putting them in a library. After all, sets are a primitive type of arrays (unordered arrays), and belong in the language, or not at all.
Aye. Even rudimentary Set support within the grammar would be really useful. Pascal was just a tuition tool, but it did have Sets ~ that almost made up for a number of deficiencies.
Feb 17 2006
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
In article <dt54hb$alg$1 digitaldaemon.com>, Fredrik Olsson says...
Well, with so much talk of features in (regexp) and out (bit), why not 
bring up my favourite yet again: the sets!

Set are unordered arrays, that you can do union, intersect and 
difference operations on (And more).

I have implemented sets as templates, but the syntax is cluncy, and 
well, optimisation is not that easy :/.

We have all done this:
if (((ch>='0')&&(ch<='9'))||((ch>='a')&&( ... and so on...

With sets it is a much more straight forward aproach to this in reality 
simple problem:
if (ch in <'0'..9, 'a'..'z', 'A'..'Z'>) { ... }

Lets write a nice small example function:

enum Weekday { MON, TUE, WED, THU, FRI, SAT, SUN };
set Weekdays { Weekday };
struct Employee {
   char[] name;
   Weekdays availableDays;
}

Weekdays weekdaysAvailableForAll(Employees[] employees) {
   Weekdays avilableDays = <MON..SUN>; // Weekdays.all ?
   foreach(Employee employee; emplouees) {
     availableDays -= employee.availableDays;
   }
   return vailableDays;
}

bool isDayGoodForAll(Weekday weekday, Employees[] employees) {
   return weekday in weekdaysAvailableForAll(employees);
}


If someone can propose a more clean, elegant, robust, usable, and 
optimisation friendly solution to this problem domain I stand corrected. 
But until then I say it is probably more useful to more people than 
complex math.

Only problem I see is that set literals are even more needed than array 
literals. And then we have the problem, with:
set Weekdays {Weekdays}; // and
set Workdays {MON..FRI};
auto foo = <TUE, WED, FRI>; // Is this a Weekdays or a Workdays set?

regards
   Fredrik Olsson
Sets are cool. How about in for arrays for consistency with AA's: void foo(char[] ) { char[] arr = "abcdef"; //... if('d' in arr) { ... } } And perhaps when array operations are available: void main() { char[62] alphanum; alphanum[0..26] = 'a' + $index; alphanum[26..52] = alphanum[0..26] - 32; alphanum[52..62] = '0' + $index + 52; if('m' in alpha) { ... } }
Feb 17 2006
next sibling parent reply Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Dave wrote:
 In article <dt54hb$alg$1 digitaldaemon.com>, Fredrik Olsson says...
 
Well, with so much talk of features in (regexp) and out (bit), why not 
bring up my favourite yet again: the sets!

Set are unordered arrays, that you can do union, intersect and 
difference operations on (And more).

I have implemented sets as templates, but the syntax is cluncy, and 
well, optimisation is not that easy :/.

We have all done this:
if (((ch>='0')&&(ch<='9'))||((ch>='a')&&( ... and so on...

With sets it is a much more straight forward aproach to this in reality 
simple problem:
if (ch in <'0'..9, 'a'..'z', 'A'..'Z'>) { ... }

Lets write a nice small example function:

enum Weekday { MON, TUE, WED, THU, FRI, SAT, SUN };
set Weekdays { Weekday };
struct Employee {
  char[] name;
  Weekdays availableDays;
}

Weekdays weekdaysAvailableForAll(Employees[] employees) {
  Weekdays avilableDays = <MON..SUN>; // Weekdays.all ?
  foreach(Employee employee; emplouees) {
    availableDays -= employee.availableDays;
  }
  return vailableDays;
}

bool isDayGoodForAll(Weekday weekday, Employees[] employees) {
  return weekday in weekdaysAvailableForAll(employees);
}


If someone can propose a more clean, elegant, robust, usable, and 
optimisation friendly solution to this problem domain I stand corrected. 
But until then I say it is probably more useful to more people than 
complex math.

Only problem I see is that set literals are even more needed than array 
literals. And then we have the problem, with:
set Weekdays {Weekdays}; // and
set Workdays {MON..FRI};
auto foo = <TUE, WED, FRI>; // Is this a Weekdays or a Workdays set?

regards
  Fredrik Olsson
Sets are cool. How about in for arrays for consistency with AA's:
Yes please :) This is so obviously missing :)
 void foo(char[] )
 {
 char[] arr = "abcdef";
 //...
 if('d' in arr) { ... }
 }
 
Feb 17 2006
parent Steve Adams <adamss ascinet.com> writes:
Ivan Senji wrote:
 Dave wrote:
 
 In article <dt54hb$alg$1 digitaldaemon.com>, Fredrik Olsson says...

 Well, with so much talk of features in (regexp) and out (bit), why 
 not bring up my favourite yet again: the sets!

 Set are unordered arrays, that you can do union, intersect and 
 difference operations on (And more).

 I have implemented sets as templates, but the syntax is cluncy, and 
 well, optimisation is not that easy :/.

 We have all done this:
 if (((ch>='0')&&(ch<='9'))||((ch>='a')&&( ... and so on...

 With sets it is a much more straight forward aproach to this in 
 reality simple problem:
 if (ch in <'0'..9, 'a'..'z', 'A'..'Z'>) { ... }

 Lets write a nice small example function:

 enum Weekday { MON, TUE, WED, THU, FRI, SAT, SUN };
 set Weekdays { Weekday };
 struct Employee {
  char[] name;
  Weekdays availableDays;
 }

 Weekdays weekdaysAvailableForAll(Employees[] employees) {
  Weekdays avilableDays = <MON..SUN>; // Weekdays.all ?
  foreach(Employee employee; emplouees) {
    availableDays -= employee.availableDays;
  }
  return vailableDays;
 }

 bool isDayGoodForAll(Weekday weekday, Employees[] employees) {
  return weekday in weekdaysAvailableForAll(employees);
 }


 If someone can propose a more clean, elegant, robust, usable, and 
 optimisation friendly solution to this problem domain I stand 
 corrected. But until then I say it is probably more useful to more 
 people than complex math.

 Only problem I see is that set literals are even more needed than 
 array literals. And then we have the problem, with:
 set Weekdays {Weekdays}; // and
 set Workdays {MON..FRI};
 auto foo = <TUE, WED, FRI>; // Is this a Weekdays or a Workdays set?

 regards
  Fredrik Olsson
Sets are cool. How about in for arrays for consistency with AA's:
Yes please :) This is so obviously missing :)
 void foo(char[] )
 {
 char[] arr = "abcdef";
 //...
 if('d' in arr) { ... }
 }
Having sets would indeed be useful. In particular if you could have sets of integers, or even better, arbitrary types.
Feb 17 2006
prev sibling parent reply "Lionello Lunesu" <lio remove.lunesu.com> writes:
 How about in for arrays for consistency with AA's:

 void foo(char[] )
 {
 char[] arr = "abcdef";
 //...
 if('d' in arr) { ... }
 }
NO! Please don't do this. The operation is O(n). If will be useful perhaps for "if (day in Weekdays)", but the system is extremely unscalable, and using "in" on a larger array will harm performance. I don't think it's a good idea to hide such a potential performance hit behind a small word like "in". (In fact, I generally think the costly operations should get long, complicated names, so they don't get used too often) Now if the array were sorted, one could do a binary search, which is not all too bad. But this still needs special syntax, since the compiler should enforce the sorting. L.
Feb 19 2006
next sibling parent Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Lionello Lunesu wrote:
How about in for arrays for consistency with AA's:

void foo(char[] )
{
char[] arr = "abcdef";
//...
if('d' in arr) { ... }
}
NO! Please don't do this. The operation is O(n). If will be useful perhaps for "if (day in Weekdays)", but the system is extremely unscalable, and using "in" on a larger array will harm performance. I don't think it's a good idea to hide such a potential performance hit behind a small word like "in". (In fact, I generally think the costly operations should get long, complicated names, so they don't get used too often) Now if the array were sorted, one could do a binary search, which is not all too bad. But this still needs special syntax, since the compiler should enforce the sorting.
O(n) can be an expensive operation, but if used with arrays with less then 10 elements, it would not only be simpler to code, but also almost as fast as any other search algorithm. Very often i have such small arrays and having to do: bool foundit=false; foreach(....){if(...)foundit=true;} is not only as equally efficient but also much harder to write. Not having in operator for arrays because of O(n) is like saying vector<int> in C++ is bad because you can have milions of elements and linear search is slow. It is all about choosing the right tool for the job. If someone want's a container with many elements he will probably use a container implementning binary search or some other faster element, but in many cases simple linear search in a couple of elements is more than enough. One thing that would be very nice is having overloadable in operator for user defined types. PS Another positive side about having in for arrays would be consistency, we have it for AA's, having it for normal arrays and classes would just make it make more sence to have.
Feb 20 2006
prev sibling parent "Craig Black" <cblack ara.com> writes:
 How about in for arrays for consistency with AA's:

 void foo(char[] )
 {
 char[] arr = "abcdef";
 //...
 if('d' in arr) { ... }
 }
NO! Please don't do this. The operation is O(n). If will be useful perhaps for "if (day in Weekdays)", but the system is extremely unscalable, and using "in" on a larger array will harm performance. I don't think it's a good idea to hide such a potential performance hit behind a small word like "in". (In fact, I generally think the costly operations should get long, complicated names, so they don't get used too often) Now if the array were sorted, one could do a binary search, which is not all too bad. But this still needs special syntax, since the compiler should enforce the sorting.
I disagree. The linear search would be adequate for most problems. Typically "in" is merely used instead of an "if(x == a || x == y || x == z ...)", which is also a linear search. If you need a binary search then you can use a different syntax for that. -Craig
Feb 21 2006
prev sibling next sibling parent David Medlock <noone nowhere.com> writes:
Fredrik Olsson wrote:
 Well, with so much talk of features in (regexp) and out (bit), why not 
 bring up my favourite yet again: the sets!
 
 Set are unordered arrays, that you can do union, intersect and 
 difference operations on (And more).
 
 I have implemented sets as templates, but the syntax is cluncy, and 
 well, optimisation is not that easy :/.
 
 We have all done this:
 if (((ch>='0')&&(ch<='9'))||((ch>='a')&&( ... and so on...
 
 With sets it is a much more straight forward aproach to this in reality 
 simple problem:
 if (ch in <'0'..9, 'a'..'z', 'A'..'Z'>) { ... }
 
 Lets write a nice small example function:
 
 enum Weekday { MON, TUE, WED, THU, FRI, SAT, SUN };
 set Weekdays { Weekday };
 struct Employee {
   char[] name;
   Weekdays availableDays;
 }
 
 Weekdays weekdaysAvailableForAll(Employees[] employees) {
   Weekdays avilableDays = <MON..SUN>; // Weekdays.all ?
   foreach(Employee employee; emplouees) {
     availableDays -= employee.availableDays;
   }
   return vailableDays;
 }
 
 bool isDayGoodForAll(Weekday weekday, Employees[] employees) {
   return weekday in weekdaysAvailableForAll(employees);
 }
 
 
 If someone can propose a more clean, elegant, robust, usable, and 
 optimisation friendly solution to this problem domain I stand corrected. 
 But until then I say it is probably more useful to more people than 
 complex math.
 
 Only problem I see is that set literals are even more needed than array 
 literals. And then we have the problem, with:
 set Weekdays {Weekdays}; // and
 set Workdays {MON..FRI};
 auto foo = <TUE, WED, FRI>; // Is this a Weekdays or a Workdays set?
 
 regards
   Fredrik Olsson
Probably not entirely what you are after, but what the hell....here is the set class I use on occaision( hope this is the latest version..). -DavidM
Feb 17 2006
prev sibling next sibling parent reply Hasan Aljudy <hasan.aljudy gmail.com> writes:
Please guys .. let's not try to put everything in the language. It all 
can be implemented using classes (and templates, if you really want).

Building everything into D will just turn D into another C++. What I 
mean is, it'll become a kludge.

Fredrik Olsson wrote:
 Well, with so much talk of features in (regexp) and out (bit), why not 
 bring up my favourite yet again: the sets!
 
 Set are unordered arrays, that you can do union, intersect and 
 difference operations on (And more).
 
 I have implemented sets as templates, but the syntax is cluncy, and 
 well, optimisation is not that easy :/.
 
 We have all done this:
 if (((ch>='0')&&(ch<='9'))||((ch>='a')&&( ... and so on...
 
 With sets it is a much more straight forward aproach to this in reality 
 simple problem:
 if (ch in <'0'..9, 'a'..'z', 'A'..'Z'>) { ... }
Well, this really is a range problem. if( ch.inrange('0','9') ) { .... }
 
 Lets write a nice small example function:
 
 enum Weekday { MON, TUE, WED, THU, FRI, SAT, SUN };
 set Weekdays { Weekday };
 struct Employee {
   char[] name;
   Weekdays availableDays;
 }
 
 Weekdays weekdaysAvailableForAll(Employees[] employees) {
   Weekdays avilableDays = <MON..SUN>; // Weekdays.all ?
   foreach(Employee employee; emplouees) {
     availableDays -= employee.availableDays;
   }
   return vailableDays;
 }
 
 bool isDayGoodForAll(Weekday weekday, Employees[] employees) {
   return weekday in weekdaysAvailableForAll(employees);
 }
 
 
 If someone can propose a more clean, elegant, robust, usable, and 
 optimisation friendly solution to this problem domain I stand corrected. 
I believe that a proper object oriented implementation is certainly possible.
 But until then I say it is probably more useful to more people than 
 complex math.
I don't use either of them, I find the need for other things (i.e. standard Collection Classes) to be more urgent, but that's just me.
 
 Only problem I see is that set literals are even more needed than array 
 literals. And then we have the problem, with:
 set Weekdays {Weekdays}; // and
 set Workdays {MON..FRI};
 auto foo = <TUE, WED, FRI>; // Is this a Weekdays or a Workdays set?
I think auto type inference wasn't a good idea anyway.
 
 regards
   Fredrik Olsson
Feb 17 2006
parent Fredrik Olsson <peylow gmail.com> writes:
Hasan Aljudy skrev:
 Please guys .. let's not try to put everything in the language. It all 
 can be implemented using classes (and templates, if you really want).
 
 Building everything into D will just turn D into another C++. What I 
 mean is, it'll become a kludge.
 <snip>
I agree, no need to add stuff for the sake of adding. But there are many reasons not to put it is a library. First of all it a so simple and basic data type, that if put in a library you might as well throw arrays in there (An array is an ordered list of values, a set is an unordered list without dublicates). Secondly hiding a feature this good in a library will hide it from many users, just as Walter argued on regexes and added match expressions. And thirdly, as a first class sitizen in the language the compiler can do wonders with it. But letts look at the overview page of D, and the major goals (top 1): * Reduce software development costs by at least 10% by adding in proven productivity enhancing features and by adjusting language features so that common, time-consuming bugs are eliminated from the start. Check on all accounts. Far less code to write (less code==less bugs). Sets solves a problem domain that normaly requires many, many lines of code, some loops, nested ifs and a general mess. I think the reason som people do not use regex or associative arrays is that they simply have not tried them, and understood just how useful they are. I sure know that everyone I have introduced to regex, associative arrays AND sets have never understood how they could code without thm before. // Fredrik
Feb 18 2006
prev sibling parent Dawid =?UTF-8?B?Q2nEmcW8YXJraWV3aWN6?= <dawid.ciezarkiewicz gmail.com> writes:
Fredrik Olsson wrote:

 Well, with so much talk of features in (regexp) and out (bit), why not
 bring up my favourite yet again: the sets!
There are sets in D _almost_ avaiable today. We could have sets as associative arrays: void[int] set_of_ints; void[Car] set_of_cars; and use them as usual: Car ferrari = new Car("ferrari F50"); if (ferrari in set_of_cars) { writef("you've got a ferrari. wow - you're lucky.\n"); } else { set_of_cars.insert(ferrari); writef("I will give you mine ferrari.\n"); } Not much to change, lot to gain. The only real problem was inserting into such array. Just ask Walter to let this work well. I remember having some difficulties and using bit[int] to achive this in my app.
Feb 18 2006