digitalmars.D.learn - How to reverse char[]?


"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

Hi all,

I'm trying to reverse a character array. Why doesn't the following work?

	import std.algorithm;
	void main() {
		char[] array = ['a', 'b', 'c'];
		reverse(array);
	}

I get:

Error: template std.algorithm.reverse(Range) if (isBidirectionalRange!(Range)
&& hasSwappableElements!(Range)) does not match any function template
declaration
Error: template std.algorithm.reverse(Range) if (isBidirectionalRange!(Range)
&& hasSwappableElements!(Range)) cannot deduce template function from argument
types !()(char[])


T

-- 
Three out of two people have difficulties with fractions. -- Dirk Eddelbuettel

Feb 07 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 02/08/2012 02:29 AM, H. S. Teoh wrote:
 Hi all,

 I'm trying to reverse a character array. Why doesn't the following work?

 	import std.algorithm;
 	void main() {
 		char[] array = ['a', 'b', 'c'];
 		reverse(array);
 	}

 I get:

 Error: template std.algorithm.reverse(Range) if
(isBidirectionalRange!(Range)&&  hasSwappableElements!(Range)) does not match
any function template declaration
 Error: template std.algorithm.reverse(Range) if
(isBidirectionalRange!(Range)&&  hasSwappableElements!(Range)) cannot deduce
template function from argument types !()(char[])


 T

char[] is handled by Phobos as a range of dchar, ergo it does not have 
swappable elements. Apparently there is no template specialisation of 
'reverse' that handles narrow strings, you might want to file an 
enhancement request.

Feb 07 2012

James Miller <james aatch.net> writes:

 On 02/08/2012 02:29 AM, H. S. Teoh wrote:
 Hi all,

 I'm trying to reverse a character array. Why doesn't the following work?

 	import std.algorithm;
 	void main() {
 		char[] array = ['a', 'b', 'c'];
 		reverse(array);
 	}

 I get:

 Error: template std.algorithm.reverse(Range) if
 (isBidirectionalRange!(Range) && hasSwappableElements!(Range)) does not
 match any function template declaration
 Error: template std.algorithm.reverse(Range) if
 (isBidirectionalRange!(Range) && hasSwappableElements!(Range)) cannot deduce
 template function from argument types !()(char[])


 T

 char[] is handled by Phobos as a range of dchar, ergo it does not have
 swappable elements. Apparently there is no template specialisation of
 'reverse' that handles narrow strings, you might want to file an enhancement
 request.

That seems correct: the `reverse' function requires both
`isBidirectionalRange' and `hasSwappableElements'.

The following code shows the results

import std.range;
import std.stdio;
import std.conv;

void main() {
    char[] char_array = ['a','b','c'];
    ubyte[] ubyte_array = ['a','b','c'];

    writefln("isBidirectionalRange char_array:\t%s",
        to!string(isBidirectionalRange!(typeof(char_array))));
    writefln("isBidirectionalRange ubyte_array:\t%s",
        to!string(isBidirectionalRange!(typeof(ubyte_array))));

    writefln("hasSwappableElements char_array:\t%s",
        to!string(hasSwappableElements!(typeof(char_array))));
    writefln("hasSwappableElements ubyte_array:\t%s",
        to!string(hasSwappableElements!(typeof(ubyte_array))));

}

The output is


isBidirectionalRange char_array:	true
isBidirectionalRange ubyte_array:	true
hasSwappableElements char_array:	false
hasSwappableElements ubyte_array:	true

So if you just need an array of bytes and the `char' semantics are
unimportant, then you can use ubyte[] instead. However, Timon is
correct that there should probably be a narrow string version of
`reverse'.
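
A minimal sketch of that workaround, assuming the data is ASCII-only so
that reversing individual bytes is safe. The helper name `reverseCodeUnits'
is made up for illustration; std.string.representation gives a ubyte[] view
of the same memory, which does have swappable elements.

```d
import std.algorithm : reverse;
import std.string : representation;

// Hypothetical helper: reverses the UTF-8 code units of `s` in place.
// Only safe for ASCII-only data; multi-byte sequences would be scrambled.
void reverseCodeUnits(char[] s)
{
    ubyte[] bytes = s.representation; // same memory, viewed as ubyte[]
    reverse(bytes);
}

void main()
{
    char[] array = ['a', 'b', 'c'];
    reverseCodeUnits(array);
    assert(array == "cba");
}
```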

James Miller

Feb 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, February 08, 2012 02:36:23 Timon Gehr wrote:
 char[] is handled by Phobos as a range of dchar, ergo it does not have
 swappable elements. Apparently there is no template specialisation of
 'reverse' that handles narrow strings, you might want to file an
 enhancement request.

There already is such an overload in HEAD.
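
A short sketch of what that looks like, assuming a Phobos release that
includes the narrow-string overload. The helper name `reversedByCodePoint'
is made up for illustration. The overload reverses code points, so the
result is still valid UTF-8.

```d
import std.algorithm : reverse;

// Hypothetical helper: returns a copy of `s` reversed code point by code point.
char[] reversedByCodePoint(const(char)[] s)
{
    char[] copy = s.dup;
    reverse(copy); // narrow-string overload: keeps multi-byte sequences intact
    return copy;
}

void main()
{
    assert(reversedByCodePoint("abc") == "cba");
    // U+00E9 (e with acute) takes two UTF-8 code units but moves as one unit.
    assert(reversedByCodePoint("h\u00E9llo") == "oll\u00E9h");
}
```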

- Jonathan M Davis

Feb 07 2012

Jos van Uden <user domain.invalid> writes:

On 8-2-2012 2:36, Timon Gehr wrote:

 char[] is handled by Phobos as a range of dchar, ergo it does not have
 swappable elements.

I'm surprised that array.reverse does work (using 2.057)

Feb 08 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 08 Feb 2012 04:30:04 -0500, Jos van Uden <user domain.invalid>  
wrote:

 On 8-2-2012 2:36, Timon Gehr wrote:

 char[] is handled by Phobos as a range of dchar, ergo it does not have
 swappable elements.

 I'm surprised that array.reverse does work (using 2.057)

array.reverse is *not* the same as reverse(array).  The former is a  
compiler-builtin property for all arrays (the compiler believes that  
anything of the form T[] is an array, even if it's a narrow-width string  
type), and the latter is a range function.

D will continue to trip over itself and fall into newbies until it makes a  
decision to make strings not also be arrays.

-Steve

Feb 08 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 02/08/2012 03:56 PM, Steven Schveighoffer wrote:
 On Wed, 08 Feb 2012 04:30:04 -0500, Jos van Uden <user domain.invalid>
 wrote:

 On 8-2-2012 2:36, Timon Gehr wrote:

 char[] is handled by Phobos as a range of dchar, ergo it does not have
 swappable elements.

 I'm surprised that array.reverse does work (using 2.057)

 array.reverse is *not* the same as reverse(array). The former is a
 compiler-builtin property for all arrays (the compiler believes that
 anything of the form T[] is an array,

That is obviously the case. It is just that functions taking e.g. char[]
often have an in-contract that the array contains a valid UTF-8 string.

 even if it's a narrow-width string
 type),  and the latter is a range function.

Luckily, array.reverse is going away. Anyway, note that char[].reverse
reverses Unicode code points, not code units.

 D will continue to trip over itself and fall into newbies until it makes
 a decision to make strings not also be arrays.

 -Steve

When I was a newbie, I liked the design of D strings. Anyway it is not 
the case that strings are _also_ arrays. D strings are arrays.

Feb 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Feb 08, 2012 at 09:56:17AM -0500, Steven Schveighoffer wrote:
[...]
 D will continue to trip over itself and fall into newbies until it
 makes a decision to make strings not also be arrays.

[...]

I disagree. D will continue to trip over itself until it treats all
arrays equally, that is, if reverse() works on ubyte[], then it should
also work on char[]. There's nothing wrong with treating a string as an
array. After all, "string" means "string of characters", i.e., an array.


T

-- 
Change is inevitable, except from a vending machine.

Feb 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, February 08, 2012 07:39:44 H. S. Teoh wrote:
 On Wed, Feb 08, 2012 at 09:56:17AM -0500, Steven Schveighoffer wrote:
 [...]
 
 D will continue to trip over itself and fall into newbies until it
 makes a decision to make strings not also be arrays.

 
 [...]
 
 I disagree. D will continue to trip over itself until it treats all
 arrays equally, that is, if reverse() works on ubyte[], then it should
 also work on char[]. There's nothing wrong with treating a string as an
 array. After all, "string" means "string of characters", i.e., an array.

Except that char[] is _not_ an array of characters. It's an array of code 
units. There is a _big_ difference. Not even dchar[] is an array of characters. 
It's both an array of code units and an array of code points, but not even 
that quite gets you characters (though at this point, Phobos pretty much 
treats a code point as if it were a character). If you want a character, you 
need a grapheme (which could be multiple code points). _That_ is where the 
problem comes in.

You can definitely do array operations on strings. In fact, it can be very
desirable to do so if you want to process strings efficiently. But if you treat
them like you would ubyte[], you're in for a heap of trouble thanks to how
Unicode works.
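
The three levels can be seen side by side in a small sketch. It assumes a
Phobos version that provides std.uni.byGrapheme, which postdates this
thread; the three helper names are made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, one per level of abstraction.
size_t countCodeUnits(string s)  { return s.length; }
size_t countCodePoints(string s) { return s.walkLength; } // decodes to dchar
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(countCodeUnits(s) == 3);  // 1 byte for 'e', 2 bytes for U+0301
    assert(countCodePoints(s) == 2); // 'e' and the combining accent
    assert(countGraphemes(s) == 1);  // what a reader sees as one character
}
```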

- Jonathan M Davis

Feb 08 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Jonathan M Davis wrote:

 thanks to how unicode works

This does not mean that the data structure representing a sequence of
"letters" has to follow exactly the "workings" you cited above. That
data structure only has to support them efficiently. If a requirement for
sequences of letters is that a sequence `s' of letters, indexed by some
natural number `n', gives the letter `s[n]', and that is not efficiently
possible, then Unicode and its "workings" are as maldesigned as the
alphabet Gutenberg had to use to produce books:

take randomly an ancient book `b' and randomly a letter `c'. Then try 
to verify that `b[ 314.159] == c'. Of course you are allowed to read 
only one letter.

-manfred

Feb 08 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, February 08, 2012 17:52:17 Manfred Nowak wrote:
 Jonathan M Davis wrote:
 thanks to how unicode works

 
 This does not mean, that the data structure representing a sequence of
 "letters" has to follow exactly the "working" you cited above. That
 data structure must only enable it efficiently. If a requirement for
 sequences of letters is, that a sequence `s' of letters indexed by some
 natural number `n' gives the letter `s[n]' and that is not efficiently
 possible, than unicode and its "workings" are as maldesigned as the
 alphabet Gutenberg has to take to produce books:
 
 take randomly an ancient book `b' and randomly a letter `c'. Then try
 to verify that `b[ 314.159] == c'. Of course you are allowed to read
 only one letter.

It is impossible to have a random access range of characters with unicode 
unless you have a range of graphemes - which would require a grapheme to be a 
struct of some kind which represented a character - either that or an array of 
arrays. So, you could have

char[][]

where each char[] is a grapheme. But as long as you're dealing with an array
of code units or code points like we do now, it's impossible to have efficient
random access of characters. Phobos currently takes the tack of treating a
code point as a character, which _mostly_ works, but it's not correct.

And while Unicode could definitely have been designed better IMHO (e.g. forcing
code point order with modifying code points and _not_ having multiple ways to
generate the same character), the core problem is that you're forced to have
variable length encodings. It wouldn't be feasible to have an integral value
which represented _every_ single character, because of the combinatorial
explosion caused by code points which modify other code points (e.g.
subscript, superscript, cedilla, etc.). So, there are problems which are just
integral to the issue of designing Unicode and which cannot be avoided no
matter how good a job you do at designing Unicode. And, of course, there are
issues with the design on top of that.
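
A sketch of the "range of graphemes" idea, assuming a Phobos version with
std.uni.Grapheme and byGrapheme, which postdate this thread. It uses an
array of Grapheme structs instead of char[][], and the helper name
`reverseByGrapheme' is made up for illustration. Building the array costs
one O(n) pass; after that, indexing by character is O(1).

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme, Grapheme;

// Hypothetical helper: reverse a string grapheme by grapheme.
string reverseByGrapheme(string s)
{
    Grapheme[] graphemes = s.byGrapheme.array; // one decoding pass
    return graphemes.retro.byCodePoint.text;   // re-encode in reverse order
}

void main()
{
    string s = "noe\u0308l"; // "noel" with e + U+0308 COMBINING DIAERESIS

    Grapheme[] graphemes = s.byGrapheme.array;
    assert(graphemes.length == 4);    // n, o, e-with-diaeresis, l
    assert(graphemes[2].length == 2); // third character spans two code points
    assert(graphemes[2][0] == 'e');   // random access by character

    // The combining mark stays attached to its base letter.
    assert(reverseByGrapheme(s) == "le\u0308on");
}
```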

- Jonathan M Davis

Feb 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Feb 08, 2012 at 08:32:32AM -0800, Jonathan M Davis wrote:
[...]
 Except that char[] is _not_ an array of characters. It's an array of
 code units. There is a _big_ difference. Not even dchar[] is an array
 of characters.  It's both an array of code units and an array of code
 points, but not even that quite gets you characters (though at this
 point, Phobos pretty much treats a code point as if it were a
 character). If you want a character, you need a grapheme (which could
 be multiple code points). _That_ is where the problem comes in.
 
 You can definitely do array operations on strings. In fact, it can be
 very desirable to do so if you want to process strings efficiently.
 But if you treat them like you would ubyte[], you're in for a heap of
 trouble thanks to how unicode works.

[...]

Except that the point of my code was to fix byte-order so that they can
be correctly interpreted. I suppose I really should be using ubyte[] for
that instead, and perhaps use a union to translate it to char[] when I
call decode().


T

-- 
Ph.D. = Permanent head Damage

Feb 08 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, February 08, 2012 09:35:28 H. S. Teoh wrote:
 On Wed, Feb 08, 2012 at 08:32:32AM -0800, Jonathan M Davis wrote:
 [...]
 
 Except that char[] is _not_ an array of characters. It's an array of
 code units. There is a _big_ difference. Not even dchar[] is an array
 of characters. It's both an array of code units and an array of code
 points, but not even that quite gets you characters (though at this
 point, Phobos pretty much treats a code point as if it were a
 character). If you want a character, you need a grapheme (which could
 be multiple code points). _That_ is where the problem comes in.
 
 You can definitely do array operations on strings. In fact, it can be
 very desirable to do so if you want to process strings efficiently.
 But if you treat them like you would ubyte[], you're in for a heap of
 trouble thanks to how unicode works.

 
 [...]
 
 Except that the point of my code was to fix byte-order so that they can
 be correctly interpreted. I suppose I really should be using ubyte[] for
 that instead, and perhaps use a union to translate it to char[] when I
 call decode().

You shouldn't normally have to worry about byte order on char[] at all. So, I
don't know what you'd be doing that would result in them being in the wrong
order. But char is a UTF-8 code unit by definition, so if you're doing
something that involves char[] not being a valid array of UTF-8 code units,
you're almost certainly going to want to be using ubyte[] instead. There's a
lot of stuff in Phobos which will throw if you have invalid code points.
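
A minimal sketch of that approach, under the assumption that the raw data
simply arrives with its bytes in reversed order (the actual layout of the
data in question is not stated in this thread). The helper name
`fixOrderAndDecode' is made up for illustration. The bytes stay in ubyte[]
until they have been validated, and a cast takes the place of a union.

```d
import std.algorithm : reverse;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: fix the byte order, check that the result is valid
// UTF-8, and only then view the same memory as char[].
char[] fixOrderAndDecode(ubyte[] raw)
{
    reverse(raw);                // plain byte array: no UTF semantics involved
    auto str = cast(char[]) raw; // reinterpret the same memory, no copy
    validate(str);               // throws UTFException on invalid UTF-8
    return str;
}

void main()
{
    ubyte[] raw = [0x63, 0x62, 0x61]; // "cba" as raw bytes
    assert(fixOrderAndDecode(raw) == "abc");

    ubyte[] bad = [0xC3]; // lead byte with no continuation byte
    assertThrown!UTFException(fixOrderAndDecode(bad));
}
```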

- Jonathan M Davis

Feb 08 2012
