www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Sorting array or AssocArrays by value.

reply "Me Here" <p9e883002 sneakemail.com> writes:
Hi all,

A little advice please. Using D v1.028.

I have (currently) an array of uints. I need to output these sorted, which
.sort does admirably,
but I need to know what index is associated with each value. How?

Springing from my own use of the word "associated", I thought that maybe I
should be using
and associative array for this. But again the problem is that I would need to
sort the keys
by their associated values.

In Perl I'd do

my  array = ...;

## Get the indexes ordered by their values
my  orderIndexes = sort{ 
     $array[ $a ] <=> $array[ $b ] 
} 0 .. $#array;

for my $i (  orderedIndexes ) {
    printf "%d : %d\n2, $i, $array[ $i ];
}

Is there anything built-in or in Phobos that will help me here?
Or do I need to write my own sort?

Cheers, b.
-- 
May 04 2008
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 4 May 2008 07:58:16 +0000 (UTC), Me Here wrote:

 Hi all,
 
 A little advice please. Using D v1.028.
 
 I have (currently) an array of uints. I need to output these sorted, which
 .sort does admirably,
 but I need to know what index is associated with each value. How?
 
 Springing from my own use of the word "associated", I thought that maybe I
 should be using
 and associative array for this. But again the problem is that I would need to
 sort the keys
 by their associated values.
 
 In Perl I'd do
 
 my  array = ...;
 
 ## Get the indexes ordered by their values
 my  orderIndexes = sort{ 
      $array[ $a ] <=> $array[ $b ] 
 } 0 .. $#array;
 
 for my $i (  orderedIndexes ) {
     printf "%d : %d\n2, $i, $array[ $i ];
 }
 
 Is there anything built-in or in Phobos that will help me here?
 Or do I need to write my own sort?
 
 Cheers, b.

Try this 'sort' of thing ... import std.stdio; void main() { string[string] theArray; theArray["one"] = "neung"; theArray["two"] = "song"; theArray["three"] = "saam"; theArray["four"] = "sii"; theArray["five"] = "hah"; foreach(string k; theArray.keys.sort) writefln("Key: %10s ==> Data: %s", k, theArray[k]); } -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
May 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Derek Parnell wrote:

 On Sun, 4 May 2008 07:58:16 +0000 (UTC), Me Here wrote:
 
 Hi all,
 
 A little advice please. Using D v1.028.
 
 I have (currently) an array of uints. I need to output these sorted, which
 .sort does admirably,
 but I need to know what index is associated with each value. How?
 
 Springing from my own use of the word "associated", I thought that maybe I
 should be using
 and associative array for this. But again the problem is that I would need
 to sort the keys
 by their associated values.
 
 In Perl I'd do
 
 my  array = ...;
 
 ## Get the indexes ordered by their values
 my  orderIndexes = sort{ 
      $array[ $a ] <=> $array[ $b ] 
 } 0 .. $#array;
 
 for my $i (  orderedIndexes ) {
     printf "%d : %d\n2, $i, $array[ $i ];
 }
 
 Is there anything built-in or in Phobos that will help me here?
 Or do I need to write my own sort?
 
 Cheers, b.

Try this 'sort' of thing ... import std.stdio; void main() { string[string] theArray; theArray["one"] = "neung"; theArray["two"] = "song"; theArray["three"] = "saam"; theArray["four"] = "sii"; theArray["five"] = "hah"; foreach(string k; theArray.keys.sort) writefln("Key: %10s ==> Data: %s", k, theArray[k]); }

I need them ordered by the /values/ not the keys. Eg. Given, (whether array or hash): //index/key 0 1 2 3 4 5 6 7 8 uint a[] = [ 3, 1, 8, 6, 4, 9, 2, 5, 7 ]; I need to output: // value - index 1 - 1 2 - 6 3 = 0 4 - 4 5 - 7 6 - 3 7 - 8 8 - 2 9 - 5 Cheers, b. --
May 04 2008
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 4 May 2008 13:11:23 +0000 (UTC), Me Here wrote:

 I need them ordered by the /values/ not the keys.
 
 Eg. Given, (whether array or hash):
 
 //index/key  0  1  2  3  4  5  6  7 8 
 uint a[] =    [ 3, 1,  8, 6, 4, 9, 2, 5, 7 ];
 
 I need to output:
 // value  - index
 1 - 1
 2 - 6
 3 = 0
 4 - 4
 5 - 7
 6 - 3
 7 - 8
 8 - 2
 9 - 5

The trick I use for this is to have two AAs. When the set is small, performance isn't really an issue. import std.stdio; void main() { int[int] theArray; theArray[0] = 3; theArray[1] = 1; theArray[2] = 8; theArray[3] = 6; theArray[4] = 4; theArray[5] = 9; theArray[6] = 2; theArray[7] = 5; theArray[8] = 7; int[int] altArray; foreach(int i; theArray.keys) { altArray[ theArray[i] ] = i; } foreach(int i; altArray.keys.sort) writefln("%s - %s", i, altArray[i]); } -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
May 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Derek Parnell wrote:

 On Sun, 4 May 2008 13:11:23 +0000 (UTC), Me Here wrote:
 
 I need them ordered by the values not the keys.
 
 Eg. Given, (whether array or hash):
 
 //index/key  0  1  2  3  4  5  6  7 8 
 uint a[] =    [ 3, 1,  8, 6, 4, 9, 2, 5, 7 ];
 
 I need to output:
 // value  - index
 1 - 1
 2 - 6
 3 = 0
 4 - 4
 5 - 7
 6 - 3
 7 - 8
 8 - 2
 9 - 5

The trick I use for this is to have two AAs. When the set is small, performance isn't really an issue. import std.stdio; void main() { int[int] theArray; theArray[0] = 3; theArray[1] = 1; theArray[2] = 8; theArray[3] = 6; theArray[4] = 4; theArray[5] = 9; theArray[6] = 2; theArray[7] = 5; theArray[8] = 7; int[int] altArray; foreach(int i; theArray.keys) { altArray[ theArray[i] ] = i; } foreach(int i; altArray.keys.sort) writefln("%s - %s", i, altArray[i]); }

Problem: That works fine if the values in the origianl array are unique. But in this case thay are not. This is frequency information. The index to the array is numeric value of words in a file. The values are counts of the occurance of that value in the file, A truncated sample showing the first of many duplicates. 0 : 5780646 1 : 219016 2 : 138623 3 : 100037 4 : 89876 5 : 61845 6 : 60890 7 : 41191 8 : 60231 9 : 36885 * ... 70 : 13069 71 : 36885 * 72 : 16121 73 : 10721 I need to out put this as either: ... 36885: 9, 70 ... or 36885: 9 36885: 70 Cheers, b. --
May 04 2008
next sibling parent Xinok <xnknet gmail.com> writes:
Me Here wrote:
 Derek Parnell wrote:
 
 On Sun, 4 May 2008 13:11:23 +0000 (UTC), Me Here wrote:

 I need them ordered by the values not the keys.

 Eg. Given, (whether array or hash):

 //index/key  0  1  2  3  4  5  6  7 8 
 uint a[] =    [ 3, 1,  8, 6, 4, 9, 2, 5, 7 ];

 I need to output:
 // value  - index
 1 - 1
 2 - 6
 3 = 0
 4 - 4
 5 - 7
 6 - 3
 7 - 8
 8 - 2
 9 - 5

performance isn't really an issue. import std.stdio; void main() { int[int] theArray; theArray[0] = 3; theArray[1] = 1; theArray[2] = 8; theArray[3] = 6; theArray[4] = 4; theArray[5] = 9; theArray[6] = 2; theArray[7] = 5; theArray[8] = 7; int[int] altArray; foreach(int i; theArray.keys) { altArray[ theArray[i] ] = i; } foreach(int i; altArray.keys.sort) writefln("%s - %s", i, altArray[i]); }

Problem: That works fine if the values in the origianl array are unique. But in this case thay are not. This is frequency information. The index to the array is numeric value of words in a file. The values are counts of the occurance of that value in the file, A truncated sample showing the first of many duplicates. 0 : 5780646 1 : 219016 2 : 138623 3 : 100037 4 : 89876 5 : 61845 6 : 60890 7 : 41191 8 : 60231 9 : 36885 * ... 70 : 13069 71 : 36885 * 72 : 16121 73 : 10721 I need to out put this as either: ... 36885: 9, 70 ... or 36885: 9 36885: 70 Cheers, b.

I suggest you write your own sort function which will sort two arrays accordingly. Shell sort would probably be the easiest to implement this for.
May 04 2008
prev sibling next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Me Here wrote:
 Problem: That works fine if the values in the origianl array are unique.
 But in this case thay are not. This is frequency information. 
 The index to the array is numeric value of words in a file. 
 The values are counts of the occurance of that value in the file,

Try putting them in an array of structs with overloaded opCmp, and then sorting that. For example: ----- module test; import tango.io.Stdout; import tango.text.convert.Format; struct Pair { int a, b; int opCmp(Pair* other) { if (a != other.a) return typeid(typeof(a)).compare(&a, &other.a); return typeid(typeof(b)).compare(&b, &other.b); } char[] toString() { return Format("{}:{}", a, b); } } void main() { int[int] aa = [43: 2, 42: 12, 10: 3, 100: 1000, 101: 200, 7: 3]; Stdout(aa).newline; auto arr = new Pair[aa.length]; size_t i = 0; foreach (key, val ; aa) arr[i++] = Pair(val, key); arr.sort; Stdout(arr).newline; } ----- output: ===== { 100=>1000, 101=>200, 7=>3, 10=>3, 42=>12, 43=>2 } [ 2:43, 3:7, 3:10, 12:42, 200:101, 1000:100 ] ===== Notes: * Tango is only used for formatting and could easily be replaced by appropriate Phobos routines. * I used typeid().compare because simple subtraction might overflow for integer data >= int.sizeof. * You may want to use an unsigned type for the frequency. Just change the types of the AA and the Pair member data. Optionally, template Pair on the types if you need to do this for multiple data types.
May 04 2008
next sibling parent "Me Here" <p9e883002 sneakemail.com> writes:
Frits van Bommel wrote:
 
 Try putting them in an array of structs with overloaded opCmp, and then
 sorting that. For example:  -----

That worked very well. Thank you. b. --
May 04 2008
prev sibling parent BCS <ao pathlink.com> writes:
Reply to Frits,

 module test;
 import tango.io.Stdout;
 import tango.text.convert.Format;
 struct Pair {
   int a, b;
   int opCmp(Pair* other) {
     if (a != other.a)
       return typeid(typeof(a)).compare(&a, &other.a);
     else
       return typeid(typeof(b)).compare(&b, &other.b);
   }
   char[] toString() {
     return Format("{}:{}", a, b);
   }
 }

maybe that should be templated. struct SortSet(K, V...) { ... }
May 04 2008
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 4 May 2008 13:52:42 +0000 (UTC), Me Here wrote:

 Derek Parnell wrote:
 The trick I use for this is to have two AAs. When the set is small,
 performance isn't really an issue.

Problem: That works fine if the values in the origianl array are unique.

Doh! Of course that's an issue. My bad ... just come off playing a couple of hours of COD4 :-) I'm still a huge fan of variable-length arrays and AAs ( I also code in the Euphoria language where V/L arrays are the lifeblood). So here is a better solution to this sort of problem using those built-in features of D. import std.stdio; void main() { uint[int] SampleData; // NB: Has non-sequential indexes and duplicate values. SampleData[ 20] = 30; SampleData[ 1] = 10; SampleData[ 2] = 30; SampleData[243] = 60; SampleData[ 4] = 90; SampleData[ 75] = 90; SampleData[ 6] = 120; SampleData[ 17] = 50; SampleData[ -8] = 20; SampleData[ 9] = 30; SampleData[ 10] = 30; SampleData[211] = 10; SampleData[-12] = 30; SampleData[ 13] = 60; SampleData[114] = 90; SampleData[315] = 90; SampleData[126] = 20; SampleData[ 17] = 50; SampleData[418] = 20; SampleData[ 19] = 30; SampleData[120] = 30; SampleData[ 21] = 10; SampleData[-22] = 30; SampleData[223] = 60; SampleData[ 24] = 90; SampleData[425] = 90; SampleData[126] = 20; SampleData[-27] = 50; SampleData[278] = 20; SampleData[292] = 30; auto SSD = SortVals(SampleData); // Publish the results writefln("%6s: %s", "Value", "Occurs at ..."); foreach(i; SSD.keys.sort) { writef("%6s: ", i); auto vals = SSD[i]; foreach( j, v; vals) { if (j == vals.length - 1) { writefln("%s", v); } else { writef("%s, ", v); } } } } Kt[][Vt] SortVals(Kt, Vt)(Vt[Kt] theArray) { // Reverse index array. // nb: The original keys are stored in a V/L array. Kt[][Vt] altArray; // Setup the reverse index array. foreach(i; theArray.keys.sort) { altArray[ theArray[i] ] ~= i; } return altArray; } This produces the output ... Value: Occurs at ... 10: 1, 21, 211 20: -8, 126, 278, 418 30: -22, -12, 2, 9, 10, 19, 20, 120, 292 50: -27, 17 60: 13, 223, 243 90: 4, 24, 75, 114, 315, 425 120: 6 -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
May 04 2008
parent pragma <eric.t.anderton gmail.com> writes:
Derek Parnell wrote:
 On Sun, 4 May 2008 13:52:42 +0000 (UTC), Me Here wrote:
 
 Derek Parnell wrote:
 The trick I use for this is to have two AAs. When the set is small,
 performance isn't really an issue.


Doh! Of course that's an issue. My bad ... just come off playing a couple of hours of COD4 :-)

COD4 + programming... Is that anything like playing a few hours of GTA, and then immediately driving to the corner store for a soda and some chips? - Pragma
May 04 2008
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Me Here:
 I have (currently) an array of uints. I need to output these sorted, which
 .sort does admirably,
 but I need to know what index is associated with each value. How?

My libs are designed to solve such problems too, take a look at sortedAA inside the func module: http://www.fantascienza.net/leonardo/so/libs_d.zip Bye, bearophile
May 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
bearophile wrote:

 Me Here:
 I have (currently) an array of uints. I need to output these sorted, which
 .sort does admirably,
 but I need to know what index is associated with each value. How?

My libs are designed to solve such problems too, take a look at sortedAA inside the func module: http://www.fantascienza.net/leonardo/so/libs_d.zip Bye, bearophile

sortAA() looks like it would do the trick, but could you expand on the example a litle for me. I'm sure the sub signature should tell me all I need to know, but I'm having a problem working out how to call it? -------- TyKey[] sortedAA(TyKey,TyVal,TyFun)(TyVal[TyKey] aa, TyFun key); void valGetter(); Return the keys of an AA sorted according to the result of the given function/delegate 'key' applied on two arguments, the key and value. 'key' can be any callable, or it can be the predefined &valGetter to sort the AA keys according to the values. Example: Sort the keys of an AA according to the values: int[string] a = ["DD":3, "aa":1, "BB":2]; sortedAA(d, &valGetter) ==> ["aa", "BB", "DD"] If you use this to sort an AA according to values, using a valGetter function, you are paying just about 5% speed penality. Throws ArgumentException if key returns void. ---------------- If I define my hash as uint freq[ ushort ]; How do I write valGetter() ? Cheers, b. --
May 04 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Me Here:
 How do I write valGetter() ?

Use the predefined one, my libs are supposed to minimize the work for the programmer ;-) Bye, bearophile
May 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
bearophile wrote:

 Me Here:
 How do I write valGetter() ?

Use the predefined one, my libs are supposed to minimize the work for the programmer ;-) Bye, bearophile

Oh. I guess I was confused by the implementation: void valGetter() { } /// ditto I couldn't for the life of me se how that would work as is, so I assumed it must be some kind of virtual functio or place holder. Actually, I /still/ don't understand how it can work, but if you say it does I will try it. What the convention of where to place thrid-party libraries? The stem (root, path; package?) of the librabry names 'd.*' seems um.. a little generic? Cheers, b. --
May 04 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Me Here:
 Oh. I guess I was confused by the implementation:
 void valGetter() { } /// ditto
 I couldn't for the life of me se how that would work as is, so I assumed it
 must be some kind of virtual functio or place holder.
 Actually, I /still/ don't understand how it can work, but if you say it does I
 will try it.

sortedAA() accepts any callable with key+val arguments, and sorts the keys of the given AA according to the result of that callable, like the key attribute of Python sort()/sorted(). If the callable is the default valGetter, then the result of that callable is meant to be right the value. So it sorts according to the values. That's quite similar to the itemgetter() and similar functions in the operator module of the Python standard library. If you try the given example, or you look inside the tens of unittests, you can see how and why it works. It's just an empty function, used as flag, to simplify the life of the user.
 What the convention of where to place thrid-party libraries?
 The stem (root, path; package?) of the librabry names 'd.*' seems um.. a little
 generic?

The package is called "d". I know no other package with that name that may collide with it. You can put that directory inside the directory of your modules, libs, etc. Then you have to tell "bud" (or similar tool, there's another more modern one, that I don't use) the path of that directory of your modules. Then at the top of your code you can use: import d.func: sortAA; And bud will find the right module and its dependent modules, and compile them. If you need more compilation speed you can compile the d libs in a library "lib" and maybe create "short" modules, all using the DMD compiler and its tools. Bye, bearophile
May 04 2008
prev sibling parent reply BCS <ao pathlink.com> writes:
Reply to Me,

 Hi all,
 
 A little advice please. Using D v1.028.
 
 I have (currently) an array of uints. I need to output these sorted,
 which
 .sort does admirably,
 but I need to know what index is associated with each value. How?
 Springing from my own use of the word "associated", I thought that
 maybe I
 should be using
 and associative array for this. But again the problem is that I would
 need to
 sort the keys
 by their associated values.
 In Perl I'd do
 
 my  array = ...;
 
 ## Get the indexes ordered by their values
 my  orderIndexes = sort{
 $array[ $a ] <=> $array[ $b ]
 } 0 .. $#array;
 for my $i (  orderedIndexes ) {
 printf "%d : %d\n2, $i, $array[ $i ];
 }
 Is there anything built-in or in Phobos that will help me here? Or do
 I need to write my own sort?
 
 Cheers, b.
 

pack it in ulongs with the index in the lower bits and the value in the upper bits. auto tmp = new ulong[](array.length) foreach(uint i, uint v; array) tmp[i] = (cast(ulong)v)<<32 | i; tmp.sort; // get value of item i cast(uint)(tmp[i]>>32); // get index of item i; cast(uint)(tmp[i] & 0xffff_ffff);
May 04 2008
parent reply BCS <ao pathlink.com> writes:
Reply to myself

 pack it in ulongs with the index in the lower bits and the value in
 the upper bits.

My solution is about the same as frits' solution but without the extra type or function calls, but with a big does of WTHuuu...
May 04 2008
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
BCS wrote:

 My solution is about the same as frits' solution but without the
 extra type or function calls, but with a big does of WTHuuu...

This isn't the first time a request for the permutation computed by the sort routine comes up. Should such be incorprated into the language? -manfred
May 04 2008
next sibling parent "Me Here" <p9e883002 sneakemail.com> writes:
Manfred Nowak wrote:

 This isn't the first time a request for the permutation computed by the 
 sort routine comes up. Should such be incorprated into the language?
 

method that was called back with both the [index, value] or [key,value] of the items being compared and returning -1, 0, +1, would allow for custom sorting. The other notion that crossed my mind was having the ability to locally (scoped) override the opCmp of built-in types, but that would require all opCmps to expect two pairs. b. --
May 04 2008
prev sibling parent BCS <BCS pathlink.com> writes:
Manfred Nowak wrote:
 BCS wrote:
 
 
My solution is about the same as frits' solution but without the
extra type or function calls, but with a big does of WTHuuu...

This isn't the first time a request for the permutation computed by the sort routine comes up. Should such be incorprated into the language? -manfred

No, it's easy enough to wrap it up in a templated struct in the library.
May 05 2008