digitalmars.D.bugs - char[] -> wchar[] cast error

teqDruid <me teqdruid.com> writes:
If char[].length is not even, casting to a wchar[] gives an "Error: array
cast misalignment" in some cases.  Seems to happen only if the char[] is a
variable, since 'cast(wchar[])"hello"' works.  Interesting bug...

I'm running DMD 0.95 on Linux.

John

Example:
--------- dtest.d ----------

void main(char[][] args)
{
	wchar[] wstring = cast(wchar[])args[1];
}
----------------------------
$ dmd dtest.d
gcc dtest.o -o dtest -lphobos -lpthread -lm
$ ./dtest hello
Error: array cast misalignment
$ ./dtest hell
$ ./dtest fivec
Error: array cast misalignment
$ ./dtest five
$
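
The pattern in the session above (odd lengths fail, even lengths pass) matches a simple size check. The following is a minimal sketch of that arithmetic, assuming a present-day D2 compiler rather than DMD 0.95; `canRepaint` is a hypothetical helper written for illustration, not a runtime function.

```d
import std.stdio;

// Hypothetical helper: true when a buffer of byteLength bytes can be
// reinterpreted as elements of elemSize bytes with nothing left over.
bool canRepaint(size_t byteLength, size_t elemSize)
{
    return byteLength % elemSize == 0;
}

void main()
{
    // Same arguments as in the shell session above.
    foreach (arg; ["hello", "hell", "fivec", "five"])
    {
        // char is 1 byte wide, wchar is 2 bytes wide.
        bool ok = canRepaint(arg.length * char.sizeof, wchar.sizeof);
        writeln(arg, ": ", arg.length, " bytes, ",
                ok ? "fits" : "misaligned");
    }
}
```

"hello" and "fivec" are 5 bytes each, which does not divide evenly into 2-byte wchar elements; "hell" and "five" are 4 bytes and do.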
Jul 11 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <pan.2004.07.12.05.44.19.995572 teqdruid.com>, teqDruid says...
If char[].length is not even, casting to a wchar[] gives an "Error: array
cast misalignment" in some cases.

Forgive me, but what possible meaning can there be to *CAST* a char[] array to a wchar[] array? The only thing I can imagine CASTING a char[] array to is a ubyte[] array or a void[] array. Nothing else makes even the remotest conceptual sense (to me).

I'm not surprised that this gives an error. (In fact, as an advocate of type safety, I would even argue that such a cast ought to be a compile-time error, regardless of the length, but D lets you do type-unsafe things.)

If you want to CONVERT from UTF-8 to UTF-16, what you need is not a cast, but a function call: std.utf.toUTF16().

What are you trying to achieve, exactly?

Arcane Jill
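
A short sketch of the conversion route named above, assuming a current D2 compiler and Phobos (where `string` and `wstring` are the immutable forms of char[] and wchar[]); it is an illustration, not code from the thread.

```d
import std.utf : toUTF16;

void main()
{
    string s = "hello";        // 5 UTF-8 code units
    wstring w = toUTF16(s);    // converted into a new UTF-16 string
    assert(w.length == 5);
    assert(w == "hello"w);

    // Outside ASCII the code-unit counts differ: U+00E9 takes
    // 2 UTF-8 code units but only 1 UTF-16 code unit.
    assert("\u00E9".length == 2);
    assert(toUTF16("\u00E9").length == 1);
}
```

Because the function decodes and re-encodes, an odd byte length is no obstacle, unlike the cast.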
Jul 12 2004
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
way to be kind and understanding jill :|

i would imagine he's trying to convert to utf-16, in which case you're
right, he should be using that function.

but i won't lie to you - the manual is never really clear as to what
_exactly_ casting a char[] to a wchar[]/dchar[] does!  i would've thought it
would convert as well, but oh well..
Jul 12 2004
Regan Heath <regan netwin.co.nz> writes:
On Mon, 12 Jul 2004 11:05:28 -0400, Jarrett Billingsley 
<kb3ctd2 yahoo.com> wrote:
 way to be kind and understanding jill :|

 i would imagine he's trying to convert to utf-16, in which case you're
 right, he should be using that function.

 but i won't lie to you - the manual is never really clear as to what
 _exactly_ casting a char[] to a wchar[]/dchar[] does!  i would've 
 thought it
 would convert as well, but oh well..

Given that the documentation on arrays (http://www.digitalmars.com/d/arrays.html) says:

"String literals are implicitly converted between chars, wchars, and dchars as necessary."

then the compiler must know how to convert them, so it should probably convert them on a cast, unless these conversions add bloat to the compiler or to the executable produced, in which case it might be best to leave the conversion to library functions. The former has the 'it's so easy' factor however. ;)

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 12 2004
Regan Heath <regan netwin.co.nz> writes:
On Tue, 13 Jul 2004 09:24:19 +1200, Regan Heath <regan netwin.co.nz> wrote:

 On Mon, 12 Jul 2004 11:05:28 -0400, Jarrett Billingsley 
 <kb3ctd2 yahoo.com> wrote:
 way to be kind and understanding jill :|

 i would imagine he's trying to convert to utf-16, in which case you're
 right, he should be using that function.

 but i won't lie to you - the manual is never really clear as to what
 _exactly_ casting a char[] to a wchar[]/dchar[] does!  i would've 
 thought it
 would convert as well, but oh well..

Given that the documentation on arrays (http://www.digitalmars.com/d/arrays.html) says "String literals are implicitly converted between chars, wchars, and dchars as necessary", the compiler must know how to convert them, so it should probably convert them on a cast, unless these conversions add bloat to the compiler or to the executable produced, in which case it might be best to leave the conversion to library functions. The former has the 'it's so easy' factor however. ;)

I have jumped the fence at least three times thinking about this.

My current thought is that it makes sense for a cast from char[] to dchar[] to convert the data, but only because char and dchar have a specified encoding. It does not make sense to convert the data when going to or from a type with no specified encoding, i.e. ubyte[] to dchar[] should not attempt any conversion.

If you do not convert the data, as is currently the case, then the cast could cause illegal values in the resulting array... couldn't it?

Regan.

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 12 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <opsa1q9btr5a2sq9 digitalmars.com>, Regan Heath says...

If you do not convert the data, as is currently the case, then the cast 
could cause illegal values in the resulting array.. couldn't it?

Yes. Which is why I think it should be either outlawed or made to work.

It would appear that cast(dchar[])"string" converts fine - but that of course is done at compile-time, so there's no run-time overhead. It's just another way of writing a dchar[] literal.

I would be well in favor of extending this behavior to run-time. If cast(dchar[]) could be made to call std.utf.toUTF32(), things would be a lot more consistent.

Arcane Jill
Jul 13 2004
"Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cd05jl$25af$1 digitaldaemon.com...
 I would be well in favor of extending this behavior to run-time. If
 cast(dchar[]) could be made to call std.utf.toUTF32(), things would be a
 lot more consistent.

Casting on arrays is done as a type 'paint', because there are many programming tasks where an array of data is built up as one type, then interpreted as another. For example reading things off of disk.
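
A minimal sketch of such a type 'paint', assuming a current D2 compiler; the byte values stand in for data read from disk, and `asWords` is a hypothetical helper named for this illustration only.

```d
// Hypothetical helper: view a byte buffer as 32-bit words.
// The cast repaints the same memory: no copy, no conversion.
uint[] asWords(ubyte[] raw)
{
    return cast(uint[]) raw;
}

void main()
{
    // Eight raw bytes, as they might arrive from a file read.
    ubyte[] raw = [1, 0, 0, 0, 2, 0, 0, 0];
    uint[] words = asWords(raw);

    // The length is rescaled by the element size: 8 bytes -> 2 words.
    assert(words.length == raw.length / uint.sizeof);

    // Both slices refer to the same memory.
    assert(cast(void*) words.ptr is cast(void*) raw.ptr);

    // The values seen depend on byte order.
    version (LittleEndian)
        assert(words == [1u, 2u]);
}
```

The length is rescaled by the element size, which is why a byte length that is not a multiple of the target element size cannot be painted.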
Jul 14 2004
Regan Heath <regan netwin.co.nz> writes:
On Wed, 14 Jul 2004 02:44:42 -0700, Walter <newshound digitalmars.com> 
wrote:

 "Arcane Jill" <Arcane_member pathlink.com> wrote in message
 news:cd05jl$25af$1 digitaldaemon.com...
 I would be well in favor of extending this behavior to run-time. If
 cast(dchar[]) could be made to call std.utf.toUTF32(), things would be a
 lot more consistent.

Casting on arrays is done as a type 'paint', because there are many programming tasks where an array of data is built up as one type, then interpreted as another. For example reading things off of disk.

I agree that this is the way it should work for types like ubyte, ushort etc. which 'have no specified encoding', BUT types with an encoding should be treated differently, and only when casting from one type with an encoding to another type with an encoding.

Using your reading-from-disk example: you read the data into a ubyte[] (has NO encoding), then you cast to char[] (has encoding). This does NOT perform any conversion. It 'paints' the ubytes as chars. Alternatively, if you know the encoding of the file, you could read straight into char, wchar or dchar.

If you next cast from that char[] to dchar[] it SHOULD convert the data, and it can convert the data: it knows the first encoding is UTF-8 and it knows the second encoding is UTF-32. It makes NO sense whatsoever to 'paint' UTF-8 as UTF-32; all you get is an illegal UTF-32 array.

If you need to 'paint' the char[], wchar[], or dchar[] to int[], this would NOT convert the data either, as int has no encoding.

So the rule is: if both have an encoding then convert, otherwise paint.

Regan.

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
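
The difference between painting and converting can be seen in a small sketch, assuming a current D2 compiler and Phobos. It shows the behaviour as it stands (the cast paints, std.utf.toUTF32 converts), not the proposed rule.

```d
import std.utf : toUTF32;

void main()
{
    char[] utf8 = "abcd".dup;   // 4 UTF-8 code units, 4 bytes

    // Paint: the 4 bytes are reread as one 32-bit element.
    dchar[] painted = cast(dchar[]) utf8;
    assert(painted.length == 1);

    // The bytes 0x61 0x62 0x63 0x64 packed into 32 bits exceed
    // 0x10FFFF in either byte order, so this is not a valid
    // Unicode code point.
    assert(painted[0] > 0x10FFFF);

    // Convert: each character is decoded and re-encoded.
    dstring converted = toUTF32(utf8);
    assert(converted.length == 4);
    assert(converted == "abcd"d);
}
```

The painted array holds a single value outside the Unicode range, which is the 'illegal UTF-32 array' described above; the library call yields four valid code points.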
Jul 14 2004
teqDruid <me teqdruid.com> writes:
It sure would be handy if the compiler could implicitly convert from
char[] to wchar[] to dchar[].
Jul 14 2004
"Walter" <newshound digitalmars.com> writes:
"teqDruid" <me teqdruid.com> wrote in message
news:pan.2004.07.15.00.10.47.867494 teqdruid.com...
 It sure would be handy if the compiler could implicitly convert from
 char[] to wchar[] to dchar[].

There's a handy set of functions in std.utf to do just that <g>.
Aug 18 2004
teqDruid <me teqdruid.com> writes:
When you use cast(wchar[])"something here" it does a conversion, so I
thought that it also did a conversion in a case like
char[] a = "something here"; wchar[] wa = cast(wchar[])a;

I deleted the post when I realized my stupidity, and posted another one
that this behavior should be noted in the specs.

On Mon, 12 Jul 2004 08:17:34 +0000, Arcane Jill wrote:

 In article <pan.2004.07.12.05.44.19.995572 teqdruid.com>, teqDruid says...
If char[].length is not even, casting to a wchar[] gives an "Error: array
cast misalignment" in some cases.

Forgive me, but what possible meaning can there be to *CAST* a char[] array to a wchar[] array? The only thing I can imagine CASTING a char[] array to is a ubyte[] array or a void[] array. Nothing else makes even the remotest conceptual sense (to me). I'm not surprised that this gives an error. (In fact, as an advocate of type safety, I would even argue that such a cast ought to be a compile-time error, regardless of the length, but D lets you do type-unsafe things.) If you want to CONVERT from UTF-8 to UTF-16, what you need is not a cast, but a function call: std.utf.toUTF16(). What are you trying to achieve, exactly? Arcane Jill

Jul 12 2004