www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Phobos2: sorting and std.typecons.Tuple

reply bearophile <bearophileHUGS lycos.com> writes:
This is code I have had to write:

auto arr = genArray();
schwartzSort!("...")(arr);
return result;

Often you know your array is small (or you don't want to sort the original
array/range), so you can add a functional-style variant that sorts a copy, 33%
of the code lines, a big saving of code:

return schwartzSorted!("...")(genArray());

---------------

schwartzSorted isn't a nice name, maybe you can find something shorter can
simpler to write, like "keySort" (key is the mapping function).

---------------

The toString of one Tuple gives me:

Tuple!(immutable(char)[],float)(hi, 1)

But I think that's a bit overkill and very long, in most situations you know
the type, so something like the following is better and much less noisy
(especially if you have to print an array of them (also notice the useful ""
around the string):

tuple("hi", 1)

I have seen than D2 writeln() prints [] around associative arrays, but not
around arrays, while D1 prints [] around both of them. This is a regression
even compared to the raw behavior of D1.

---------------

Tuple misses opCmp and toHash. I don't want to post pages of code here, but I
suggest you to take a look at:

- Record() struct in module templates
- hash() function template in module func
- structCmp() function template in module func
The code is here still:
http://www.fantascienza.net/leonardo/so/libs_d.zip

If you have questions about that please ask.

This allows people to use Tuples as keys of AA/sets and to sort them, even if
they recursively contain other Tuples (or even if they contain normal structs).
This allows to use Tuple in a *much* more flexible way in programs, increasing
their usefulness three-fold.

Bye,
bearophile
Apr 27 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Tuple misses opCmp and toHash. I don't want to post pages of code here, but I
suggest you to take a look at:
 - Record() struct in module templates
 - hash() function template in module func
 - structCmp() function template in module func
 The code is here still:
 http://www.fantascienza.net/leonardo/so/libs_d.zip
 If you have questions about that please ask.
 This allows people to use Tuples as keys of AA/sets and to sort them, even if
they recursively contain other Tuples (or even if they contain normal structs). This allows to use Tuple in a *much* more flexible way in programs, increasing their usefulness three-fold.
 Bye,
 bearophile
Vote++ for giving Tuple a decent default implementation of opCmp and toHash. Then maybe I could use Tuple to represent joint samples from a probability distribution (I often want to use hash tables to count the frequency of each observation using hash tables). Right now, I use a really ad-hoc struct to do this, and I'm sure that my scheme for taking the hashes of the elements of the struct and turning them into one hash is sub-optimal, but it works. As far as opCmp, here's a useful snippet for a default opCmp for tuples. It works well when all you need is an arbitrary transitive ordering for use in a binary tree or something. int opCmp(const ref typeof(this) other) const { foreach(ti, elem; other.tupleof) { if(this.tupleof[ti] < elem) { return -1; } else if(this.tupleof[ti] > elem) { return 1; } } return 0; }
Apr 27 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 int opCmp(const ref typeof(this) other) const {
     foreach(ti, elem; other.tupleof) {
         if(this.tupleof[ti] < elem) {
             return -1;
         } else if(this.tupleof[ti] > elem) {
             return 1;
         }
     }
     return 0;
 }
The unittest of my structCmp(s1, s2) function template: unittest { // Tests of structCmp() struct S0 { } assert(structCmp(S0(), S0()) == 0); struct S1 { int x; } assert(structCmp(S1(7), S1(7)) == 0); assert(structCmp(S1(8), S1(7)) == 1); assert(structCmp(S1(7), S1(8)) == -1); assert(structCmp(S0(), S1(7)) == -1); assert(structCmp(S1(7), S0()) == 1); struct S2 { int x, y; } assert(structCmp(S2(7, 8), S2(7, 8)) == 0); assert(structCmp(S2(7, 8), S2(7, 7)) == 1); assert(structCmp(S2(7, 7), S2(7, 8)) == -1); assert(structCmp(S0(), S2(7, 5)) == -1); assert(structCmp(S2(7, 5), S0()) == 1); assert(structCmp(S1(4), S2(7, 5)) == -1); assert(structCmp(S1(7), S2(7, 5)) == -1); assert(structCmp(S1(8), S2(7, 5)) == 1); assert(structCmp(S2(7, 5), S1(7)) == 1); assert(structCmp(S2(7, 5), S1(8)) == -1); struct S3 { int x; float y; int z; } assert(structCmp(S3(2, 7.1, 8), S3(2, 7.1, 8)) == 0); assert(structCmp(S3(2, 7.1, 8), S3(2, 7.1, 7)) == 1); assert(structCmp(S3(2, 7.1, 7), S3(2, 7.1, 8)) == -1); assert(structCmp(S0(), S3(2, 7.1, 8)) == -1); assert(structCmp(S3(2, 7.1, 8), S0()) == 1); assert(structCmp(S1(2), S3(2, 7.1, 8)) == -1); assert(structCmp(S1(3), S3(2, 7.1, 8)) == 1); assert(structCmp(S3(2, 7.1, 8), S1(2)) == 1); assert(structCmp(S3(2, 7.1, 8), S1(3)) == -1); struct S3b { int x; float y; int z; int opCmp(S3b other) { return structCmp(this, other); } } assert( (S3b(2, 7.1, 8) > S3b(2, 7.1, 8)) == false); assert( (S3b(2, 7.1, 8) > S3b(2, 7.1, 7)) == true); assert( (S3b(2, 7.1, 7) > S3b(2, 7.1, 8)) == false); struct S4 { int x1, x2; float y; int z; } assert(structCmp(S4(1, 2, 7.1, 8), S4(1, 2, 7.1, 8)) == 0); assert(structCmp(S4(1, 2, 7.1, 8), S4(1, 2, 7.1, 7)) == 1); assert(structCmp(S4(1, 2, 7.1, 7), S4(1, 2, 7.1, 8)) == -1); struct St1 { int i, j; } struct St2 { int i, j, k; } auto s1 = St1(10, 20); auto s2 = St2(10, 30, 2); assert(structCmp(s1, s2) < 0); struct St3 { St1 s1; St2 s2; } auto s3a = St3(St1(10, 20), St2(10, 30, 2)); auto s3b = St3(St1(10, 20), St2(10, 30, 2)); auto s3c = St3(St1(10, 20), St2(10, 10, 2)); auto s3d = St3(St1(10, 20), St2(10, 30, 1)); struct St4 { St2 s2; St1 s1; } auto s4a = St4(St2(10, 20, 50), St1(10, 30)); assert( structCmp(s3a, s3b) == 0); assert( structCmp(s3a, s3c) > 0); assert( structCmp(s3c, s3a) < 0); assert( structCmp(s3a, s3d) > 0); assert( structCmp(s3d, s3a) < 0); assert( structCmp(s3a, s4a) < 0); assert( structCmp(s4a, s3a) > 0); } // End tests of structCmp() Bye, bearophile
Apr 27 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 Vote++ for giving Tuple a decent default implementation of opCmp and toHash.
I have a much better idea (that I have expressed here more than one year ago): let's add good *recursive* opCmp, toHash, opEqual, toString to all structs. Similar "small" things are able to improve the flexibility of D2 a lot. Bye, bearophile
Apr 27 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 dsimcha:
 Vote++ for giving Tuple a decent default implementation of opCmp and toHash.
I have a much better idea (that I have expressed here more than one year ago): let's add good *recursive* opCmp, toHash, opEqual, toString to all structs. Similar "small" things are able to improve the flexibility of D2 a lot.
s/recursive/transitive/ Litmus test: recursive could recurse forever. Transitive never does. Andrei
Apr 27 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 s/recursive/transitive/
 Litmus test: recursive could recurse forever. Transitive never does.
I have used the word recursive because the functions I have designed (and I think they are designed correctly) call themselves until they stop calling themselves because a stopping criterion is met. That's recursivity, I think. Bye, bearophile
Apr 27 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 s/recursive/transitive/
 Litmus test: recursive could recurse forever. Transitive never does.
I have used the word recursive because the functions I have designed (and I think they are designed correctly) call themselves until they stop calling themselves because a stopping criterion is met. That's recursivity, I think.
But then what if you have a web of class objects that ultimately will close a cycle? Transitive: you'd use a worklist. Recursive: you'd recurse forever. Andrei
Apr 27 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 bearophile wrote:
 Andrei Alexandrescu:
 s/recursive/transitive/
 Litmus test: recursive could recurse forever. Transitive never does.
I have used the word recursive because the functions I have designed (and I think they are designed correctly) call themselves until they stop calling themselves because a stopping criterion is met. That's recursivity, I think.
But then what if you have a web of class objects that ultimately will close a cycle? Transitive: you'd use a worklist. Recursive: you'd recurse forever. Andrei
bearophile explicitly said "structs". I assume it would extend to primitives, treating pointers as integer types. I also assume it would call .toHash on any object. This would still allow you to recurse infinitely, but it would require added effort: struct S { C c; } class C { S s; this () { s.c = this; } auto toHash () { return s.toHash; } }
Apr 27 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Christopher Wright:
 bearophile explicitly said "structs". I assume it would extend to 
 primitives, treating pointers as integer types. I also assume it would 
 call .toHash on any object.
If you are interested this is the code: /********************************************** Return the standard hash value of the given value/object. It uses the toHash() method of the given object/struct if it's defined. If 'value' is a struct that doesn't define a toHash(), then it returns a combination of hash() mapped on each of its fields, so it computes a correct hash value of structs that contain dynamic arrays too. So it's used inside the Record struct like this: ------------- uint toHash() { return hash(*this, true); } ------------- You can use the same in your structs to avoid that problem with dynamic array fields. The 'inside_struct' run-time boolean flag is necessary to avoid infinite recursion when you define a toHash() using a hash(): set it to true in that situation, and leave its default false value in all other situations. */ hash_t hash(T)(T value, bool inside_struct=false) { if (inside_struct) { static if ( is(T == struct) ) { // adapted from [...] // This to hash (hopefully) correctly structs that contain arrays! uint mult = 1000003; uint result = 0x345678; // I haven't used a hash_t here foreach (i, el; value.tupleof) { // this is a static foreach result = (result ^ hash(el)) * mult; // recursive call mult += 82520 + i + i; } return result + 97531; } else static if ( is(typeof(T.toHash)) || is(typeof(T.init.toHash)) ) { return value.toHash(); } else { // Suggested by "John C" <johnch_atms hotmail.com> return typeid(T).getHash(&value); } } else { static if ( is(typeof(T.toHash)) || is(typeof(T.init.toHash)) ) { return value.toHash(); } else static if ( is(T == struct) ) { // adapted from [...] // This to hash (hopefully) correctly structs that contain arrays! uint mult = 1000003; uint result = 0x345678; foreach (i, el; value.tupleof) { // this is a static foreach result = (result ^ hash(el)) * mult; // recursive call mult += 82520 + i + i; } return result + 97531; } else { // Suggested by "John C" <johnch_atms hotmail.com> return typeid(T).getHash(&value); } } } I didn't find a way to use a compile-time inside_struct flag. Bye, bearophile
Apr 27 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 bearophile wrote:
 Andrei Alexandrescu:
 s/recursive/transitive/
 Litmus test: recursive could recurse forever. Transitive never does.
I have used the word recursive because the functions I have designed (and I think they are designed correctly) call themselves until they stop calling themselves because a stopping criterion is met. That's recursivity, I think.
But then what if you have a web of class objects that ultimately will close a cycle? Transitive: you'd use a worklist. Recursive: you'd recurse forever. Andrei
bearophile explicitly said "structs". I assume it would extend to primitives, treating pointers as integer types. I also assume it would call .toHash on any object.
No matter. As soon as a class is reached, the problem appears. I wanted more to fix a terminological issue than to naysay. Andrei
Apr 27 2009