www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Chars sorting and copies

reply bearophile <bearophileHUGS lycos.com> writes:
I have many strings and I want to use as associative array kay a sorted concat
of two strings (it's a signature of the two strings):


import std.algorithm;
void main() {
    string a = "red";
    string b = "green";
    int[string] aa;
    //aa[(a ~ b).sort] = 1;
    //aa[(a ~ b).sort.idup] = 1;
    aa[(a.dup ~ b.dup).sort.idup] = 1;
}

I think the first way used to work, the second way was recently forbidden (and
you can't use char[] as associative array key). The third way now works, but
array.sort will go away.

So do you know what's a copy-minimizing (and possibly short) way to perform
this, in future?

Bye and thank you,
bearophile
Oct 20 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, October 20, 2011 21:49:27 bearophile wrote:
 I have many strings and I want to use as associative array kay a sorted
 concat of two strings (it's a signature of the two strings):
 
 
 import std.algorithm;
 void main() {
     string a = "red";
     string b = "green";
     int[string] aa;
     //aa[(a ~ b).sort] = 1;
     //aa[(a ~ b).sort.idup] = 1;
     aa[(a.dup ~ b.dup).sort.idup] = 1;
 }
 
 I think the first way used to work, the second way was recently forbidden
 (and you can't use char[] as associative array key). The third way now
 works, but array.sort will go away.
 
 So do you know what's a copy-minimizing (and possibly short) way to perform
 this, in future?

The elements in a string are immutable. I don't see how the first twe sorts _ever_ worked, unless it was a bug. The reality of the matter is that if you want to sort an array, its elements must be mutable, but if you want to use it as a key in an AA, its elements must be immutable. So, you're either going to have to copy them or cast to immutable. Also, you're going to have to use std.algorithm.sort eventually, and isn't going to work on chars or wchars (and what you're doing now is buggy unless you're restricting yourself to ASCII). So, you're either going to have to do one of 1. Use dchar[] to sort and then convert it to dstring for the key (either by casting or copying). 2. Use dchar[] to sort and then convert it to a string for the key (which results in a copy). 3. If you're using ASCII only, you could use char[], cast it to ubyte[] to sort it, and then cast it to string to use as the key (or idup it). But sorting and and immutability are obviously at odds with one another, requiring either a copy or a cast, and sorting really isn't going to work with char[] or wchar[], unless you restrict yourself to values that fit in a UTF-8 or UTF-16 code unit. - Jonathan M Davis
Oct 20 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 The elements in a string are immutable. I don't see how the first twe sorts 
 _ever_ worked, unless it was a bug.

Sorry, I meant that it used to work in D1: http://codepad.org/HYdNktNd Later I'll try some of your solutions. Bye, bearophile
Oct 20 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 20 Oct 2011 21:49:27 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 I have many strings and I want to use as associative array kay a sorted  
 concat of two strings (it's a signature of the two strings):


 import std.algorithm;
 void main() {
     string a = "red";
     string b = "green";
     int[string] aa;
     //aa[(a ~ b).sort] = 1;
     //aa[(a ~ b).sort.idup] = 1;
     aa[(a.dup ~ b.dup).sort.idup] = 1;
 }

 I think the first way used to work, the second way was recently  
 forbidden (and you can't use char[] as associative array key). The third  
 way now works, but array.sort will go away.

 So do you know what's a copy-minimizing (and possibly short) way to  
 perform this, in future?

a ~ b should technically be assignable to char[], since it's alread new memory. We may yet get there with pure functions being able to implicit cast to immutable. You could do this for now, but it's ugly: char[] concatStr(string[] s...) pure { auto len = 0; foreach(str; s) len += str.length; auto result = new char[len]; auto slice = result; foreach(str; s) { slice[0..str.length] = str; slice = slice[str.length..$]; } return result; } ... auto str = concatStr(a, b); str.sort; // this is going to stop working, you'll have to cast to byte[] aa[assumeUnique(str)] = 1; There are also alternatives to AA (such as dcollections' HashMap) which are willing to take non-immutable keys. -Steve
Oct 20 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Thanks for all the answers.

Steven Schveighoffer:

 a ~ b should technically be assignable to char[], since it's alread new  
 memory.  We may yet get there with pure functions being able to implicit  
 cast to immutable.

Isn't that kind of the opposite? Is this already in Bugzilla? Some versions I have written, there is no very good solution, it seems: import std.algorithm, std.exception; void main() { string a = "red"; string b = "green"; int[string] aa; // Works in D1. // aa[(a ~ b).sort] = 1; // Works in D2, but will stop working. aa[(a.dup ~ b.dup).sort.idup] = 1; // Works in D2, but will stop working. char[] ab = (a.dup ~ b.dup).sort; aa[assumeUnique(ab)] = 2; // If keys don't actually need to be strings // the code gets a bit simpler. int[immutable(ubyte[])] aa2; // not good // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)))] = 3; // not good? // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)).release())] = 3; // just 1 copy, two visible casts, 3 lines auto ab4 = cast(ubyte[])(a ~ b); ab4.sort(); aa2[cast(immutable(ubyte[]))ab4] = 3; // a visible cast and an hidden cast ubyte[] ab2 = cast(ubyte[])(a ~ b); ab2.sort(); aa2[assumeUnique(ab2)] = 3; // 2 lines, more complex looking auto ab3 = sort(cast(ubyte[])(a ~ b)).release(); aa2[assumeUnique(ab3)] = 3; } Bye, bearophile
Oct 22 2011
prev sibling next sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 I have many strings and I want to use as associative array kay a sorted concat
of two strings (it's a signature of the two strings):
 
 
 import std.algorithm;
 void main() {
     string a = "red";
     string b = "green";
     int[string] aa;
     //aa[(a ~ b).sort] = 1;
     //aa[(a ~ b).sort.idup] = 1;
     aa[(a.dup ~ b.dup).sort.idup] = 1;
 }
 
 I think the first way used to work, the second way was recently forbidden (and
you can't use char[] as associative array key). The third way now works, but
array.sort will go away.
 
 So do you know what's a copy-minimizing (and possibly short) way to perform
this, in future?
 
 Bye and thank you,
 bearophile

http://www.d-programming-language.org/phobos/std_algorithm.html#setUnion ?
Oct 21 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 22 Oct 2011 23:01:17 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Thanks for all the answers.

 Steven Schveighoffer:

 a ~ b should technically be assignable to char[], since it's alread new
 memory.  We may yet get there with pure functions being able to implicit
 cast to immutable.

Isn't that kind of the opposite?

No, if a ~ b would call a pure function that returns mutable when possible, then you could use it for mutable, immutable, or const.
 Is this already in Bugzilla?

Somewhat, although I like using the pure function avenue to ensure immutable does not violate const than the crazy rules we came up with here: http://d.puremagic.com/issues/show_bug.cgi?id=1654 Note this was in the infancy of my D knowledge ;)
 Some versions I have written, there is no very good solution, it seems:


 import std.algorithm, std.exception;

 void main() {
     string a = "red";
     string b = "green";
     int[string] aa;

     // Works in D1.
     // aa[(a ~ b).sort] = 1;

     // Works in D2, but will stop working.
     aa[(a.dup ~ b.dup).sort.idup] = 1;

     // Works in D2, but will stop working.
     char[] ab = (a.dup ~ b.dup).sort;
     aa[assumeUnique(ab)] = 2;

     // If keys don't actually need to be strings
     // the code gets a bit simpler.
     int[immutable(ubyte[])] aa2;

     // not good
     // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)))] = 3;

     // not good?
     // aa2[assumeUnique(sort(cast(ubyte[])(a ~ b)).release())] = 3;

     // just 1 copy, two visible casts, 3 lines
     auto ab4 = cast(ubyte[])(a ~ b);
     ab4.sort();
     aa2[cast(immutable(ubyte[]))ab4] = 3;

     // a visible cast and an hidden cast
     ubyte[] ab2 = cast(ubyte[])(a ~ b);
     ab2.sort();
     aa2[assumeUnique(ab2)] = 3;

     // 2 lines, more complex looking
     auto ab3 = sort(cast(ubyte[])(a ~ b)).release();
     aa2[assumeUnique(ab3)] = 3;
 }

I honestly am not sure your specific problem is solvable without a lot of special cases. It may be you have to write a sub-function that hides the mess. -Steve
Oct 24 2011