digitalmars.D.learn - Why std.algorithm.sort can't be applied to char[]?
- hane (15/15) May 12 2014 and is there any way to sort char array with algorithm.sort?
- John Colvin (17/32) May 12 2014 char[] is a rather special type of array: the language has
- John Colvin (2/37) May 12 2014 woops, should have deleted the // error comments
- hane (4/15) May 13 2014 On Monday, 12 May 2014 at 16:30:12 UTC, Jonathan M Davis via
- Jonathan M Davis via Digitalmars-d-learn (8/23) May 12 2014 All strings in D are treated as ranges of dchar, not their element type....
- Charles Hixson via Digitalmars-d-learn (8/33) May 12 2014 Given that he was working with pure ASCII, he should be able to cast the...
- Jonathan M Davis via Digitalmars-d-learn (21/58) May 12 2014 On Mon, 12 May 2014 11:08:47 -0700
- monarch_dodra (13/28) May 14 2014 Arguably, a smart enough implementation should know how to sort a
- bearophile (4/5) May 14 2014 It's going to be deprecated "soon".
- John Colvin (6/36) May 14 2014 Why would anyone ever want to sort code-points?
- monarch_dodra (4/9) May 15 2014 The current "status quo" in D is that a dchar basically
- Jonathan M Davis via Digitalmars-d-learn (11/41) May 14 2014 On Wed, 14 May 2014 08:27:45 +0000
- Steven Schveighoffer (21/67) May 15 2014 =
- monarch_dodra (15/40) May 15 2014 Must be a hell of a freak accident ;)
-
Steven Schveighoffer
(20/62)
May 15 2014
On Thu, 15 May 2014 11:37:07 -0400, monarch_dodra
- monarch_dodra (6/12) May 15 2014 Of course, but I meant if we could find an algoirthm O(1) (or
and is there any way to sort char array with algorithm.sort? --- import std.algorithm; import std.range; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2); // error sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error } --- I don't know what's difference between int[] and char[] in D, but it's very unnatural.
May 12 2014
On Monday, 12 May 2014 at 14:49:53 UTC, hane wrote:and is there any way to sort char array with algorithm.sort? --- import std.algorithm; import std.range; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2); // error sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error } --- I don't know what's difference between int[] and char[] in D, but it's very unnatural.char[] is a rather special type of array: the language has unicode support and iterates over it by code-point (i.e. not guaranteed to be a single char per iteration). If you want to sort chars and are assuming ASCII, you can just use std.string.representation to work with them as integer types: import std.algorithm; import std.range; import std.string; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2.representation); // error sort!q{ a[0] > b[0] }(zip(arr, arr2.representation)); // error }
May 12 2014
On Monday, 12 May 2014 at 14:56:46 UTC, John Colvin wrote:On Monday, 12 May 2014 at 14:49:53 UTC, hane wrote:woops, should have deleted the // error commentsand is there any way to sort char array with algorithm.sort? --- import std.algorithm; import std.range; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2); // error sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error } --- I don't know what's difference between int[] and char[] in D, but it's very unnatural.char[] is a rather special type of array: the language has unicode support and iterates over it by code-point (i.e. not guaranteed to be a single char per iteration). If you want to sort chars and are assuming ASCII, you can just use std.string.representation to work with them as integer types: import std.algorithm; import std.range; import std.string; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2.representation); // error sort!q{ a[0] > b[0] }(zip(arr, arr2.representation)); // error }
May 12 2014
On Monday, 12 May 2014 at 14:56:46 UTC, John Colvin wrote:char[] is a rather special type of array: the language has unicode support and iterates over it by code-point (i.e. not guaranteed to be a single char per iteration). If you want to sort chars and are assuming ASCII, you can just use std.string.representation to work with them as integer types:On Monday, 12 May 2014 at 16:30:12 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[].I understand the reason. Thanks!
May 13 2014
On Mon, 12 May 2014 14:49:52 +0000 hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:and is there any way to sort char array with algorithm.sort? --- import std.algorithm; import std.range; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2); // error sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error } --- I don't know what's difference between int[] and char[] in D, but it's very unnatural.All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[]. http://stackoverflow.com/questions/12288465 http://stackoverflow.com/questions/16590650 - Jonathan M Davis
May 12 2014
On 05/12/2014 09:29 AM, Jonathan M Davis via Digitalmars-d-learn wrote:On Mon, 12 May 2014 14:49:52 +0000 hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:Given that he was working with pure ASCII, he should be able to cast the array to byte[] and sort it, but I haven't tried. Also char[] isn't string. Strings are immutable, and thus cannot be sorted in place. To me this looks like an error in sort that he should be able to work around with a cast. -- Charles Hixsonand is there any way to sort char array with algorithm.sort? --- import std.algorithm; import std.range; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2); // error sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error } --- I don't know what's difference between int[] and char[] in D, but it's very unnatural.All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[]. http://stackoverflow.com/questions/12288465 http://stackoverflow.com/questions/16590650 - Jonathan M Davis
May 12 2014
On Mon, 12 May 2014 11:08:47 -0700 Charles Hixson via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:On 05/12/2014 09:29 AM, Jonathan M Davis via Digitalmars-d-learn wrote:Sure, you can cast char[] to ubyte[] and sort that if you know that the array only holds pure ASCII. In fact, you can use std.string.representation to do it - e.g. auto ascii = str.representation; and if str were mutable, then you could sort it. But that will only work if the string only contains ASCII characters. Regardless, he wanted to know why he couldn't sort char[], and I explained why - all strings are treated as ranges of dchar, making it so that if their element type is char or wchar, so they're not random access and thus can't be sorted.On Mon, 12 May 2014 14:49:52 +0000 hane via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:Given that he was working with pure ASCII, he should be able to cast the array to byte[] and sort it, but I haven't tried.and is there any way to sort char array with algorithm.sort? --- import std.algorithm; import std.range; void main() { int[] arr = [5, 3, 7]; sort(arr); // OK char[] arr2 = ['z', 'g', 'c']; sort(arr2); // error sort!q{ a[0] > b[0] }(zip(arr, arr2)); // error } --- I don't know what's difference between int[] and char[] in D, but it's very unnatural.All strings in D are treated as ranges of dchar, not their element type. This has to with the fact that a char or wchar are only part of a character. If you want to sort arrays of characters, you need to use dchar[]. http://stackoverflow.com/questions/12288465 http://stackoverflow.com/questions/16590650 - Jonathan M DavisAlso char[] isn't string.Yes, string is aliased to immutable(string)[], but char[] is still a string type, and what I said applies to all string types. In particular, arrays of char or wchar are called "narrow strings," because it's not guaranteed that one of their code units is a full code point, unlike arrays of dchar. But all arrays of char, wchar, or dchar are strings.Strings are immutable, and thus cannot be sorted in place."string" is immutable and thus couldn't be sorted regardless of whether narrow strings were treated as ranges of dchar or not, but char[] is a string just as much as immutable(char)[] is. It's just not string. - Jonathan M Davis
May 12 2014
On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:Sure, you can cast char[] to ubyte[] and sort that if you know that the array only holds pure ASCII. In fact, you can use std.string.representation to do it - e.g. auto ascii = str.representation; and if str were mutable, then you could sort it. But that will only work if the string only contains ASCII characters. Regardless, he wanted to know why he couldn't sort char[], and I explained why - all strings are treated as ranges of dchar, making it so that if their element type is char or wchar, so they're not random access and thus can't be sorted.Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity. As a matter of fact, the built in "sort" property does it. void main() { char[] s = "éöeèûà".dup; s.sort; writeln(s); } //prints: eàèéöû
May 14 2014
monarch_dodra:As a matter of fact, the built in "sort" property does it.It's going to be deprecated "soon". Bye, bearophile
May 14 2014
On Wednesday, 14 May 2014 at 08:27:46 UTC, monarch_dodra wrote:On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:Why would anyone ever want to sort code-points? They might want to sort graphemes, but that's difficult to do in-place (needs O(n) memory, I think...). If out-of-place is good enough someStr.byGrapheme.array.sort();Sure, you can cast char[] to ubyte[] and sort that if you know that the array only holds pure ASCII. In fact, you can use std.string.representation to do it - e.g. auto ascii = str.representation; and if str were mutable, then you could sort it. But that will only work if the string only contains ASCII characters. Regardless, he wanted to know why he couldn't sort char[], and I explained why - all strings are treated as ranges of dchar, making it so that if their element type is char or wchar, so they're not random access and thus can't be sorted.Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity. As a matter of fact, the built in "sort" property does it. void main() { char[] s = "éöeèûà".dup; s.sort; writeln(s); } //prints: eàèéöû
May 14 2014
On Wednesday, 14 May 2014 at 09:01:23 UTC, John Colvin wrote:Why would anyone ever want to sort code-points?Why not? To remove duplicate characters?They might want to sort graphemes, but that's difficult to do in-place (needs O(n) memory, I think...). If out-of-place is good enough someStr.byGrapheme.array.sort();The current "status quo" in D is that a dchar basically represents a character.
May 15 2014
On Wed, 14 May 2014 08:27:45 +0000 monarch_dodra via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:I don't think that that can be done at the same algorithmic complexity though. So, I don't know if that would be acceptable or not from the standpoint of std.algorithm.sort. But even if it's a good idea, someone would have to special case sort for char[], and no one has done that.Sure, you can cast char[] to ubyte[] and sort that if you know that the array only holds pure ASCII. In fact, you can use std.string.representation to do it - e.g. auto ascii = str.representation; and if str were mutable, then you could sort it. But that will only work if the string only contains ASCII characters. Regardless, he wanted to know why he couldn't sort char[], and I explained why - all strings are treated as ranges of dchar, making it so that if their element type is char or wchar, so they're not random access and thus can't be sorted.Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity.As a matter of fact, the built in "sort" property does it. void main() { char[] s = "éöeèûà".dup; s.sort; writeln(s); } //prints: eàèéöûI'm surprised. I thought that one of Bearophile's favorite complaints was that it didn't sort unicode properly (and hence one of the reasons that it should be removed). Regardless, I do think that it should be removed. - Jonathan M Davis
May 14 2014
On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via = Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:On Wed, 14 May 2014 08:27:45 +0000 monarch_dodra via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:=On Monday, 12 May 2014 at 18:44:22 UTC, Jonathan M Davis via Digitalmars-d-learn wrote:I don't think that that can be done at the same algorithmic complexity=Sure, you can cast char[] to ubyte[] and sort that if you know that the array only holds pure ASCII. In fact, you can use std.string.representation to do it - e.g. auto ascii =3D str.representation; and if str were mutable, then you could sort it. But that will only work if the string only contains ASCII characters. Regardless, he wanted to know why he couldn't sort char[], and I explained why - all strings are treated as ranges of dchar, making it so that if their element type is char or wchar, so they're not random access and thus can't be sorted.Arguably, a smart enough implementation should know how to sort a "char[]", while still preserving codepoint integrity.though. So, I don't know if that would be acceptable or not from the standpoin=t =of std.algorithm.sort. But even if it's a good idea, someone would have t=ospecial case sort for char[], and no one has done that.I think it can be done at O(nlgn) complexity, but you must allocate a = block of scratch space to do the sorting. You can't do it in-place becau= se = swapping isn't available. Might as well convert to dchar and back explicitly. Merge sort may allow more promising speedups, but I still think you will= = need to allocate extra space.=As a matter of fact, the built in "sort" property does it. void main() { char[] s =3D "=C3=A9=C3=B6e=C3=A8=C3=BB=C3=A0".dup; s.sort; writeln(s); } //prints: e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BBI'm surprised. I thought that one of Bearophile's favorite complaints =was that it didn't sort unicode properly (and hence one of the reasons that it ==should be removed). Regardless, I do think that it should be removed.I can't believe this worked. I want to say that it's a freak accident fo= r = that set of characters. Looking in druntime, I don't see where the speci= al = case is. -Steve
May 15 2014
On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer wrote:On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:Must be a hell of a freak accident ;) auto s = "é東öe京ûタèワà".dup; writeln(s.sort); => eàèéöûタワ京東 It's in rt/adi.d extern (C) char[] _adSortChar(char[] a) It's basically: string=>dstring=>sort=>dstring=>string. BTW, the built in reverse also works with char[], and so does std.algorithm.reverse (and it does it pretty cleverly too, might I say). As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.On Wed, 14 May 2014 08:27:45 +0000 monarch_dodra via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:I can't believe this worked. I want to say that it's a freak accident for that set of characters. Looking in druntime, I don't see where the special case is. -SteveAs a matter of fact, the built in "sort" property does it. void main() { char[] s = "éöeèûà".dup; s.sort; writeln(s); } //prints: eàèéöûI'm surprised. I thought that one of Bearophile's favorite complaints was that it didn't sort unicode properly (and hence one of the reasons that it should be removed). Regardless, I do think that it should be removed.
May 15 2014
On Thu, 15 May 2014 11:37:07 -0400, monarch_dodra <monarchdodra gmail.co= m> = wrote:On Thursday, 15 May 2014 at 13:26:45 UTC, Steven Schveighoffer wrote:On Wed, 14 May 2014 05:13:42 -0400, Jonathan M Davis via =s =Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:On Wed, 14 May 2014 08:27:45 +0000 monarch_dodra via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:As a matter of fact, the built in "sort" property does it. void main() { char[] s =3D "=C3=A9=C3=B6e=C3=A8=C3=BB=C3=A0".dup; s.sort; writeln(s); } //prints: e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BBI'm surprised. I thought that one of Bearophile's favorite complaint=t =was that it didn't sort unicode properly (and hence one of the reasons that i==should be removed). Regardless, I do think that it should be removed.I can't believe this worked. I want to say that it's a freak accident=e =for that set of characters. Looking in druntime, I don't see where th==E3=83=AF=C3=A0".dup;special case is. -SteveMust be a hell of a freak accident ;) auto s =3D "=C3=A9=E6=9D=B1=C3=B6e=E4=BA=AC=C3=BB=E3=82=BF=C3=A8=writeln(s.sort); =3D> e=C3=A0=C3=A8=C3=A9=C3=B6=C3=BB=E3=82=BF=E3=83=AF=E4=BA=AC=E6=9D=B1=It's in rt/adi.dSeriously, I fucking hate the file names in druntime. I looked in qsort.d.extern (C) char[] _adSortChar(char[] a) It's basically: string=3D>dstring=3D>sort=3D>dstring=3D>string.OK, that's what I would have expected. I don't think we should support = this.BTW, the built in reverse also works with char[], and so does =std.algorithm.reverse (and it does it pretty cleverly too, might I say=). This is a different algorithm. Sort requires random-access swapping. = Reverse does not.As far as I'm concerned, if we *can* do it in n.log(n), and somebody =provides the implementation, then there is no reason to not offer dcha=r =sorting for char[]/wchar.I think there is nothing wrong with requiring the steps to be explicit. = We = should not hide such bloat behind sort. -Steve
May 15 2014
On Thursday, 15 May 2014 at 17:46:52 UTC, Steven Schveighoffer wrote:Of course, but I meant if we could find an algoirthm O(1) (or O(log(N)) space overhead. If the only algorithm we are capable of providing just temporarily dstrings, then no.As far as I'm concerned, if we *can* do it in n.log(n), and somebody provides the implementation, then there is no reason to not offer dchar sorting for char[]/wchar.I think there is nothing wrong with requiring the steps to be explicit. We should not hide such bloat behind sort. -Steve
May 15 2014