www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Why std.algorithm.sort can't be applied to char[]?

reply "hane" <han.ehit.suzi.0 gmail.com> writes:
and is there any way to sort char array with algorithm.sort?
---
import std.algorithm;
import std.range;

void main()
{
   int[] arr = [5, 3, 7];
   sort(arr); // OK

   char[] arr2 = ['z', 'g', 'c'];
   sort(arr2); // error
   sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
}
---
I don't know what's difference between int[] and char[] in D, but 
it's very unnatural.
May 12 2014
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 12 May 2014 at 14:49:53 UTC, hane wrote:
 and is there any way to sort char array with algorithm.sort?
 ---
 import std.algorithm;
 import std.range;

 void main()
 {
   int[] arr = [5, 3, 7];
   sort(arr); // OK

   char[] arr2 = ['z', 'g', 'c'];
   sort(arr2); // error
   sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
 }
 ---
 I don't know what's difference between int[] and char[] in D, 
 but it's very unnatural.
char[] is a rather special type of array: the language has unicode support and iterates over it by code-point (i.e. not guaranteed to be a single char per iteration). If you want to sort chars and are assuming ASCII, you can just use std.string.representation to work with them as integer types: import std.algorithm; import std.range; import std.string; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2.representation); // error sort!q{ a[0] > b[0] }(zip(arr, arr2.representation)); // error }
May 12 2014
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 12 May 2014 at 14:56:46 UTC, John Colvin wrote:
 On Monday, 12 May 2014 at 14:49:53 UTC, hane wrote:
 and is there any way to sort char array with algorithm.sort?
 ---
 import std.algorithm;
 import std.range;

 void main()
 {
  int[] arr = [5, 3, 7];
  sort(arr); // OK

  char[] arr2 = ['z', 'g', 'c'];
  sort(arr2); // error
  sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
 }
 ---
 I don't know what's difference between int[] and char[] in D, 
 but it's very unnatural.
char[] is a rather special type of array: the language has unicode support and iterates over it by code-point (i.e. not guaranteed to be a single char per iteration). If you want to sort chars and are assuming ASCII, you can just use std.string.representation to work with them as integer types: import std.algorithm; import std.range; import std.string; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2.representation); // error sort!q{ a[0] > b[0] }(zip(arr, arr2.representation)); // error }
woops, should have deleted the // error comments
May 12 2014
prev sibling parent "hane" <han.ehit.suzi.0 gmail.com> writes:
On Monday, 12 May 2014 at 14:56:46 UTC, John Colvin wrote:
 char[] is a rather special type of array: the language has 
 unicode support and iterates over it by code-point (i.e. not 
 guaranteed to be a single char per iteration).

 If you want to sort chars and are assuming ASCII, you can just 
 use std.string.representation to work with them as integer 
 types:
On Monday, 12 May 2014 at 16:30:12 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:
 All strings in D are treated as ranges of dchar, not their 
 element type. This
 has to with the fact that a char or wchar are only part of a 
 character. If you
 want to sort arrays of characters, you need to use dchar[].
I understand the reason. Thanks!
May 13 2014
prev sibling next sibling parent Jonathan M Davis via Digitalmars-d-learn writes:
On Mon, 12 May 2014 14:49:52 +0000
hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 and is there any way to sort char array with algorithm.sort?
 ---
 import std.algorithm;
 import std.range;

 void main()
 {
    int[] arr = [5, 3, 7];
    sort(arr); // OK

    char[] arr2 = ['z', 'g', 'c'];
    sort(arr2); // error
    sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
 }
 ---
 I don't know what's difference between int[] and char[] in D, but
 it's very unnatural.
All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[]. http://stackoverflow.com/questions/12288465 http://stackoverflow.com/questions/16590650 - Jonathan M Davis
May 12 2014
prev sibling next sibling parent Charles Hixson via Digitalmars-d-learn writes:
On 05/12/2014 09:29 AM, Jonathan M Davis via Digitalmars-d-learn wrote:
 On Mon, 12 May 2014 14:49:52 +0000
 hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 and is there any way to sort char array with algorithm.sort?
 ---
 import std.algorithm;
 import std.range;

 void main()
 {
     int[] arr = [5, 3, 7];
     sort(arr); // OK

     char[] arr2 = ['z', 'g', 'c'];
     sort(arr2); // error
     sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
 }
 ---
 I don't know what's difference between int[] and char[] in D, but
 it's very unnatural.
All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[]. http://stackoverflow.com/questions/12288465 http://stackoverflow.com/questions/16590650 - Jonathan M Davis
Given that he was working with pure ASCII, he should be able to cast the array to byte[] and sort it, but I haven't tried. Also char[] isn't string. Strings are immutable, and thus cannot be sorted in place. To me this looks like an error in sort that he should be able to work around with a cast. -- Charles Hixson
May 12 2014
prev sibling parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Mon, 12 May 2014 11:08:47 -0700
Charles Hixson via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 On 05/12/2014 09:29 AM, Jonathan M Davis via Digitalmars-d-learn
 wrote:
 On Mon, 12 May 2014 14:49:52 +0000
 hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
 wrote:

 and is there any way to sort char array with algorithm.sort?
 ---
 import std.algorithm;
 import std.range;

 void main()
 {
     int[] arr = [5, 3, 7];
     sort(arr); // OK

     char[] arr2 = ['z', 'g', 'c'];
     sort(arr2); // error
     sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error
 }
 ---
 I don't know what's difference between int[] and char[] in D, but
 it's very unnatural.
All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[]. http://stackoverflow.com/questions/12288465 http://stackoverflow.com/questions/16590650 - Jonathan M Davis
Given that he was working with pure ASCII, he should be able to cast the array to byte[] and sort it, but I haven't tried.
Sure, you can cast char[] to ubyte[] and sort that if you know that the array only holds pure ASCII. In fact, you can use std.string.representation to do it - e.g. auto ascii = str.representation; and if str were mutable, then you could sort it. But that will only work if the string only contains ASCII characters. Regardless, he wanted to know why he couldn't sort char[], and I explained why - all strings are treated as ranges of dchar, making it so that if their element type is char or wchar, so they're not random access and thus can't be sorted.
 Also char[] isn't string.
Yes, string is aliased to immutable(string)[], but char[] is still a string type, and what I said applies to all string types. In particular, arrays of char or wchar are called "narrow strings," because it's not guaranteed that one of their code units is a full code point, unlike arrays of dchar. But all arrays of char, wchar, or dchar are strings.
 Strings are immutable, and thus cannot be sorted in place.
"string" is immutable and thus couldn't be sorted regardless of whether narrow strings were treated as ranges of dchar or not, but char[] is a string just as much as immutable(char)[] is. It's just not string. - Jonathan M Davis
May 12 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via 
Digitalmars-d-learn wrote:
 Sure, you can cast char[] to ubyte[] and sort that if you know 
 that the array
 only holds pure ASCII. In fact, you can use 
 std.string.representation to do it
 - e.g.

 auto ascii = str.representation;

 and if str were mutable, then you could sort it. But that will 
 only work if
 the string only contains ASCII characters. Regardless, he 
 wanted to know why
 he couldn't sort char[], and I explained why - all strings are 
 treated as
 ranges of dchar, making it so that if their element type is 
 char or wchar, so
 they're not random access and thus can't be sorted.
Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity. As a matter of fact, the built in "sort" property does it. void main() { char[] s = "éöeèûà".dup; s.sort; writeln(s); } //prints: eàèéöû
May 14 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 As a matter of fact, the built in "sort" property does it.
It's going to be deprecated "soon". Bye, bearophile
May 14 2014
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 14 May 2014 at 08:27:46 UTC, monarch_dodra wrote:
 On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via 
 Digitalmars-d-learn wrote:
 Sure, you can cast char[] to ubyte[] and sort that if you know 
 that the array
 only holds pure ASCII. In fact, you can use 
 std.string.representation to do it
 - e.g.

 auto ascii = str.representation;

 and if str were mutable, then you could sort it. But that will 
 only work if
 the string only contains ASCII characters. Regardless, he 
 wanted to know why
 he couldn't sort char[], and I explained why - all strings are 
 treated as
 ranges of dchar, making it so that if their element type is 
 char or wchar, so
 they're not random access and thus can't be sorted.
Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity. As a matter of fact, the built in "sort" property does it. void main() { char[] s = "éöeèûà".dup; s.sort; writeln(s); } //prints: eàèéöû
Why would anyone ever want to sort code-points? They might want to sort graphemes, but that's difficult to do in-place (needs O(n) memory, I think...). If out-of-place is good enough someStr.byGrapheme.array.sort();
May 14 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 14 May 2014 at 09:01:23 UTC, John Colvin wrote:
 Why would anyone ever want to sort code-points?
Why not? To remove duplicate characters?
 They might want to sort graphemes, but that's difficult to do 
 in-place (needs O(n) memory, I think...). If out-of-place is 
 good enough

 someStr.byGrapheme.array.sort();
The current "status quo" in D is that a dchar basically represents a character.
May 15 2014
prev sibling parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Wed, 14 May 2014 08:27:45 +0000
monarch_dodra via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
 Digitalmars-d-learn wrote:
 Sure, you can cast char[] to ubyte[] and sort that if you know
 that the array
 only holds pure ASCII. In fact, you can use
 std.string.representation to do it
 - e.g.

 auto ascii = str.representation;

 and if str were mutable, then you could sort it. But that will
 only work if
 the string only contains ASCII characters. Regardless, he
 wanted to know why
 he couldn't sort char[], and I explained why - all strings are
 treated as
 ranges of dchar, making it so that if their element type is
 char or wchar, so
 they're not random access and thus can't be sorted.
Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity.
I don't think that that can be done at the same algorithmic complexity though. So, I don't know if that would be acceptable or not from the standpoint of std.algorithm.sort. But even if it's a good idea, someone would have to special case sort for char[], and no one has done that.
 As a matter of fact, the built in "sort" property does it.

 void main()
 {
      char[] s = "éöeèûà".dup;
      s.sort;
      writeln(s);
 }
 //prints:
 eàèéöû
I'm surprised. I thought that one of Bearophile's favorite complaints was that it didn't sort unicode properly (and hence one of the reasons that it should be removed). Regardless, I do think that it should be removed. - Jonathan M Davis
May 14 2014
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via  =

Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 On Wed, 14 May 2014 08:27:45 +0000
 monarch_dodra via Digitalmars-d-learn
 <digitalmars-d-learn puremagic.com> wrote:

 On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via
 Digitalmars-d-learn wrote:
 Sure, you can cast char[] to ubyte[] and sort that if you know
 that the array
 only holds pure ASCII. In fact, you can use
 std.string.representation to do it
 - e.g.

 auto ascii =3D str.representation;

 and if str were mutable, then you could sort it. But that will
 only work if
 the string only contains ASCII characters. Regardless, he
 wanted to know why
 he couldn't sort char[], and I explained why - all strings are
 treated as
 ranges of dchar, making it so that if their element type is
 char or wchar, so
 they're not random access and thus can't be sorted.
Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity.
I don't think that that can be done at the same algorithmic complexity=
=
 though.
 So, I don't know if that would be acceptable or not from the standpoin=
t =
 of
 std.algorithm.sort. But even if it's a good idea, someone would have t=
o
 special case sort for char[], and no one has done that.
I think it can be done at O(nlgn) complexity, but you must allocate a = block of scratch space to do the sorting. You can't do it in-place becau= se = swapping isn't available. Might as well convert to dchar and back explicitly. Merge sort may allow more promising speedups, but I still think you will= = need to allocate extra space.
 As a matter of fact, the built in "sort" property does it.

 void main()
 {
      char[] s =3D "=C3=A9=C3=B6e=C3=A8=C3=BB=C3=A0".dup;
      s.sort;
      writeln(s);
 }
 //prints:
 e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB
I'm surprised. I thought that one of Bearophile's favorite complaints =
=
 was that
 it didn't sort unicode properly (and hence one of the reasons that it =
=
 should
 be removed). Regardless, I do think that it should be removed.
I can't believe this worked. I want to say that it's a freak accident fo= r = that set of characters. Looking in druntime, I don't see where the speci= al = case is. -Steve
May 15 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer
wrote:
 On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via 
 Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 On Wed, 14 May 2014 08:27:45 +0000
 monarch_dodra via Digitalmars-d-learn
 <digitalmars-d-learn puremagic.com> wrote:
 As a matter of fact, the built in "sort" property does it.

 void main()
 {
     char[] s = "éöeèûà".dup;
     s.sort;
     writeln(s);
 }
 //prints:
 eàèéöû
I'm surprised. I thought that one of Bearophile's favorite complaints was that it didn't sort unicode properly (and hence one of the reasons that it should be removed). Regardless, I do think that it should be removed.
I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is. -Steve
Must be a hell of a freak accident ;) auto s = "é東öe京ûタèワà".dup; writeln(s.sort); => eàèéöûタワ京東 It's in rt/adi.d extern (C) char[] _adSortChar(char[] a) It's basically: string=>dstring=>sort=>dstring=>string. BTW, the built in reverse also works with char[], and so does std.algorithm.reverse (and it does it pretty cleverly too, might I say). As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.
May 15 2014
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 15 May 2014 11:37:07 -0400, monarch_dodra <monarchdodra gmail.co=
m>  =

wrote:

 On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer
 wrote:
 On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via  =
 Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 On Wed, 14 May 2014 08:27:45 +0000
 monarch_dodra via Digitalmars-d-learn
 <digitalmars-d-learn puremagic.com> wrote:
 As a matter of fact, the built in "sort" property does it.

 void main()
 {
     char[] s =3D "=C3=A9=C3=B6e=C3=A8=C3=BB=C3=A0".dup;
     s.sort;
     writeln(s);
 }
 //prints:
 e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB
I'm surprised. I thought that one of Bearophile's favorite complaint=
s =
 was that
 it didn't sort unicode properly (and hence one of the reasons that i=
t =
 should
 be removed). Regardless, I do think that it should be removed.
I can't believe this worked. I want to say that it's a freak accident=
=
 for that set of characters. Looking in druntime, I don't see where th=
e =
 special case is.

 -Steve
Must be a hell of a freak accident ;) auto s =3D "=C3=A9=E6=9D=B1=C3=B6e=E4=BA=AC=C3=BB=E3=82=BF=C3=A8=
=E3=83=AF=C3=A0".dup;
       writeln(s.sort);

 =3D> e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB=E3=82=BF=E3=83=AF=E4=BA=AC=E6=9D=B1=
 It's in rt/adi.d
Seriously, I fucking hate the file names in druntime. I looked in qsort.d.
 extern (C) char[] _adSortChar(char[] a)

 It's basically: string=3D>dstring=3D>sort=3D>dstring=3D>string.
OK, that's what I would have expected. I don't think we should support = this.
 BTW, the built in reverse also works with char[], and so does  =
 std.algorithm.reverse (and it does it pretty cleverly too, might I say=
). This is a different algorithm. Sort requires random-access swapping. = Reverse does not.
 As far as I'm concerned, if we *can* do it in n.log(n), and somebody  =
 provides the implementation, then there is no reason to not offer dcha=
r =
 sorting for char[]/wchar.
I think there is nothing wrong with requiring the steps to be explicit. = We = should not hide such bloat behind sort. -Steve
May 15 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 15 May 2014 at 17:46:52 UTC, Steven Schveighoffer 
wrote:
 As far as I'm concerned, if we *can* do it in n.log(n), and 
 somebody provides the implementation, then there is no reason 
 to not offer dchar sorting for char[]/wchar.
I think there is nothing wrong with requiring the steps to be explicit. We should not hide such bloat behind sort. -Steve
Of course, but I meant if we could find an algoirthm O(1) (or O(log(N)) space overhead. If the only algorithm we are capable of providing just temporarily dstrings, then no.
May 15 2014