digitalmars.D - might be a bug in the DMD FrontEnd

Davidl <Davidl 126.com> writes:

i don't see what prevent the following from compiling:
import std.stdio;
char[] ctfe()
{
	wchar[] k=cast(wchar[])"int
i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/
int j;";
	char[] jimmy = cast(char[])(k[0..k.length-1]);
	int i;
	for(i=0;i<jimmy.length;)
	{
		if (jimmy[i]==0)
			jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
		else
			i++;
	}
	return jimmy;
}
void main()
{
	mixin(ctfe);
	char[] k=ctfe;
	printf ("%s",k.ptr);
}

and by viewing the frontend, i think there might be a bug of slicing
wchar[] in compile time.
ilwr, iupr ain't taken care of for wchar[] and dchar[] case
ilwr, iupr ain't taken care of for wchar[] and dchar[] case

Regards,
David Leon

Mar 29 2007

Davidl <Davidl 126.com> writes:

sorry i didn't read the frontend carefully enough. it handles the  
different cases.
but i still don't get why my little func couldn't be evaluated at compile
time

Mar 29 2007

Davidl <Davidl 126.com> writes:

err in constfold.c
func cat declares and assigns Type t;
but t is never used?
	t = es2->type;
	es->type = type;
either es->type = t; or an assertion might be preferred?

Mar 29 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Davidl wrote:
 i don't see what prevent the following from compiling:

I can see a few things...

 import std.stdio;
 char[] ctfe()
 {
     wchar[] k=cast(wchar[])"int

What's with the casting?
http://www.digitalmars.com/d/lex.html#StringLiteral -- see the part on
"Postfix" characters.

I'm fairly certain that cast(wchar[]) doesn't do what you *think* it's
doing.  cast(wchar[]) casts an array of chars into an array of wchars...
note that I *did not* say "converts" -- UTF-8 and UTF-16 are very
different encodings, so you can't just cast between them and expect it
to make any sense.

It would be like casting a double pointer to a ushort pointer --
meaningless.
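
To make the cast-versus-convert point concrete, here is a minimal sketch. It assumes current D2 and Phobos (std.conv.to) rather than the D1 compiler discussed in this thread, and the helper names are made up for the example. It deliberately uses a runtime array, since a cast applied directly to a string literal may be treated specially by the compiler.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: view the bytes of a UTF-8 array as 16-bit units.
size_t reinterpretedLength(char[] utf8)
{
    // Same memory, new element type: every two bytes become one wchar.
    return (cast(wchar[]) utf8).length;
}

// Hypothetical helper: transcode UTF-8 to UTF-16.
size_t convertedLength(const(char)[] utf8)
{
    return to!wstring(utf8).length;
}

void main()
{
    char[] text = "abcd".dup;           // four UTF-8 code units
    writeln(reinterpretedLength(text)); // 2
    writeln(convertedLength(text));     // 4
}
```

The cast yields two wchars whose values are pairs of ASCII bytes glued together, while the conversion yields the four characters you would expect.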

Also, I can't work out why you would want to embed a comment in a mixin
string, but it's not especially problematic :P

 i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/

cast(char)192, whilst technically valid, is really nasty.  For starters,
'192' isn't a valid character by itself in UTF-8, which means it can't
be printed.  Not to mention the potential byte-order problems.  We have
Unicode escape sequences for a reason.  Again, see the section on string
literals, but basically, we have "\x12" for ASCII characters, "\u1234"
for wide characters, and "\U12345678" for really wide characters.
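
A small sketch of the escape sequences and postfix characters mentioned above, assuming current D2 syntax (string, wstring and dstring are the D2 names for the three string types). The variable names are only for illustration.

```d
import std.stdio : writeln;

void main()
{
    string  narrow = "\x64";        // one byte, the letter 'd'
    wstring wide   = "\u1234"w;     // one UTF-16 code unit
    dstring wider  = "\U0001F600"d; // one UTF-32 code unit

    writeln(narrow == "d"); // true
    writeln(wide.length);   // 1
    writeln(wider.length);  // 1

    // The same code points stored as UTF-8 take more code units.
    writeln("\u1234".length);     // 3
    writeln("\U0001F600".length); // 4
}
```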

And, again, that cast doesn't make any sense.

 int j;";
     char[] jimmy = cast(char[])(k[0..k.length-1]);

Dear lord, why?!  You just spent half your time casting it to a wchar
array, and now you're casting it back?!  char[] is perfectly capable of
storing Unicode text, if that's what you're worried about.

Also, you're cutting off the last character of the string, which means
you're losing that last ";", which means your mixin isn't valid, and
will cause compilation to fail.

     int i;
     for(i=0;i<jimmy.length;)
     {
         if (jimmy[i]==0)
             jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
         else
             i++;
     }

The only reason I can come up with as to why you're doing the above is
because all that casting above generates a string with null characters
in it... which it wouldn't if you didn't use all the casting.
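
For what it's worth, here is roughly where null bytes come from once ASCII text ends up as UTF-16 and is then looked at byte by byte. This is a sketch in current D2 with a made-up helper name; it shows the general effect, not what the frontend does with the original code.

```d
import std.algorithm.searching : count;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: count the zero bytes in the UTF-16 form of a string.
size_t zeroBytesInUtf16(string s)
{
    wstring wide = to!wstring(s);
    // View the 16-bit code units as raw bytes.
    auto bytes = cast(const(ubyte)[]) wide;
    return bytes.count(cast(ubyte) 0);
}

void main()
{
    // Each ASCII character becomes two bytes, one of which is zero.
    writeln(zeroBytesInUtf16("int j;")); // 6
}
```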

     return jimmy;
     
 }
 void main()
 {
     mixin(ctfe);
     char[] k=ctfe;
         printf ("%s",k.ptr);

Please don't use printf, at least not without passing the string through
toStringz.  writefln works perfectly fine.  I mean, you even imported
std.stdio...
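
A sketch of why toStringz matters, assuming current D2 module names (core.stdc.stdio, std.string); cLength is a hypothetical helper. A D slice carries its own length and is not guaranteed to be followed by a NUL byte, which is what printf's "%s" looks for.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length C sees once a terminator has been added.
size_t cLength(const(char)[] s)
{
    return strlen(s.toStringz);
}

void main()
{
    string whole = "int i; int j;";
    string part = whole[0 .. 6]; // "int i;", followed in memory by more text

    // toStringz returns a NUL-terminated copy that is safe to hand to C.
    printf("%s\n", part.toStringz);

    // Passing an explicit length with "%.*s" is another common approach.
    printf("%.*s\n", cast(int) part.length, part.ptr);
}
```

Passing part.ptr straight to "%s" would keep reading past the end of the slice until it happened to hit a zero byte.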

Ok, let's try rewriting this...

import std.stdio;

wchar[] ctfe()
{
    wchar[] k = "int
i;/*asdfasf"w~("adf"w~cast(wchar)'\x64'~cast(wchar)'\u1234')~"dsafj*/
int j;";
    return k;
}

void main()
{
    mixin(ctfe());
    wchar[] k = ctfe();
    writefln("%s", k);
}

The above works perfectly.  Heck, we could get rid of those cast(wchar)s
by just using wchar strings: "\x64\u1234"w.

 }
 
 and by viewing the frontend, i think there might be a bug of slicing
 wchar[] in compile time.
 ilwr, iupr ain't taken care of for wchar[] and dchar[] case
 
 Regards,
 David Leon

It could well be there's a bug in the frontend.  But this is kind of
like setting your house on fire, and then pointing out there's a
burn-mark on the wall. :P

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Mar 29 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Daniel Keep wrote:
 Davidl wrote:
 i don't see what prevent the following from compiling:

 
 I can see a few things...

<...>

 cast(char)192, whilst technically valid, is really nasty.  For starters,
 '192' isn't a valid character by itself in UTF-8, which means it can't
 be printed.

<...>

 Please don't use printf, at least not without passing the string through
 toStringz.  writefln works perfectly fine.

Perhaps he's using printf because he wants to output the byte 192 without
getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both
Phobos and Tango, the C library is the best way of outputting a character whilst
letting the user worry about whether he can see it properly in his locale or
not.

The point about toStringz still stands, though.

Mar 30 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Deewiant wrote:
 Daniel Keep wrote:
 Davidl wrote:
 i don't see what prevent the following from compiling:

 I can see a few things...

 
 <...>
 
 cast(char)192, whilst technically valid, is really nasty.  For starters,
 '192' isn't a valid character by itself in UTF-8, which means it can't
 be printed.

 
 <...>
 
 Please don't use printf, at least not without passing the string through
 toStringz.  writefln works perfectly fine.

 
 Perhaps he's using printf because he wants to output the byte 192 without
 getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both
 Phobos and Tango, the C library is the best way of outputting a character whilst
 letting the user worry about whether he can see it properly in his locale or
 not.
 
 The point about toStringz still stands, though.

True; I hadn't considered that.  The first thing I thought was that he
didn't know about Unicode literals, and was trying to manually encode
the character in UTF-16.

That said, I personally think that if you need to use printf because
writefln is barfing on your string, then that's a bug in your program.
char[] is UTF-8: if you're not storing UTF-8, you should be using
ubyte[], not char[].
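
To illustrate, a minimal sketch assuming current D2 and Phobos' std.utf.validate (isValidUtf8 is a made-up wrapper). A lone byte 192 stored in a char[] fails validation, which is the same kind of check that makes writefln complain.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical wrapper: true if the array holds well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] lone = [cast(char) 192]; // lead byte with nothing after it
    ubyte[] raw = [192];            // plain bytes, no encoding implied

    writeln(isValidUtf8("int j;")); // true
    writeln(isValidUtf8(lone));     // false
    writeln(raw);                   // [192]
}
```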

Incidentally, since D source must be either ASCII or some variant of
UTF, cast(char)192 isn't a valid character *anyway*, unless it's part of
a multibyte code-point, at which point the argument for outputting it
literally falls apart since he's using it in a mixin :P

Also, I just realised that the "you can't cast arrays of chars around"
is something I should add to my text in D article...

	-- Daniel

Mar 30 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Daniel Keep wrote:
 That said, I personally think that if you need to use printf because
 writefln is barfing on your string, then that's a bug in your program.
 char[] is UTF-8: if you're not storing UTF-8, you should be using
 ubyte[], not char[].

I agree. However, both Phobos and Tango use char[] for all their
string-processing functions _which also work on non-UTF-8_. This means that to
call such a function you need to do, for instance,
"std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very
quickly.

I hoped that Tango would use ubyte[] in the C standard library, at least, but
no. I understand why not (standard; most people use only char[] and don't want
to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so
I don't complain, but it's still something I'd like.

Perhaps D needs a way to allow implicit conversion:

finally(ubyte[] is char[]) {
	char[] foo(ubyte[] myString) {
		return std.string.strip(myString.dup);
	}
}

<g>

Mar 30 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Deewiant wrote:
 Daniel Keep wrote:
 That said, I personally think that if you need to use printf because
 writefln is barfing on your string, then that's a bug in your program.
 char[] is UTF-8: if you're not storing UTF-8, you should be using
 ubyte[], not char[].

 
 I agree. However, both Phobos and Tango use char[] for all their
 string-processing functions _which also work on non-UTF-8_. This means that to
 call such a function you need to do, for instance,
 "std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very
 quickly.
 
 I hoped that Tango would use ubyte[] in the C standard library, at least, but
 no. I understand why not (standard; most people use only char[] and don't want
 to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so
 I don't complain, but it's still something I'd like.
 
 Perhaps D needs a way to allow implicit conversion:
 
 finally(ubyte[] is char[]) {
 	char[] foo(ubyte[] myString) {
 		return std.string.strip(myString.dup);
 	}
 }
 
 <g>

 foreach( dchar c ; some_string )
 {
     // ...
 }

Would *not* work correctly with the above if your string contains
anything outside of the ASCII range.  Yes, the functions might work with
non-UTF-8 codepages, but that's more a side-effect of how they are
implemented.

I think what Phobos really needs is a character encoding conversion
library, even if it's just a paper-thin binding to iconv or something.
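
As a rough idea of what the simplest case of such a conversion looks like, here is a sketch for ISO-8859-1 only, where every byte value is also its Unicode code point. It assumes current D2 and std.utf.encode; latin1ToUtf8 is a hypothetical name, not an existing Phobos function, and real code pages need lookup tables or something like iconv.

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical converter: ISO-8859-1 bytes to a UTF-8 string.
string latin1ToUtf8(const(ubyte)[] latin1)
{
    string result;
    foreach (b; latin1)
    {
        char[4] buf;
        // In ISO-8859-1 the byte value is the code point itself.
        immutable n = encode(buf, cast(dchar) b);
        result ~= buf[0 .. n];
    }
    return result;
}

void main()
{
    const(ubyte)[] input = [0x63, 0x61, 0x66, 0xE9]; // "caf" plus e-acute
    string text = latin1ToUtf8(input);

    size_t codePoints;
    foreach (dchar c; text) // decoding now works, the data is real UTF-8
        ++codePoints;

    writeln(text.length); // 5
    writeln(codePoints);  // 4
}
```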

	-- Daniel

[1] I hope I've got the right term; I'm liable to get my head chewed off
if I'm wrong :P

Mar 30 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Daniel Keep wrote:
 foreach( dchar c ; some_string )
 {
     // ...
 }

 
 Would *not* work correctly with the above if your string contains
 anything outside of the ASCII range.  Yes, the functions might work with
 non-UTF-8 codepages, but that's more a side-effect of how they are
 implemented.

True. But it's hard to implement (some of) them _without_ supporting non-UTF-8.
<g> And there's always the C standard library.

 I think what Phobos really needs is a character encoding conversion
 library, even if it's just a paper-thin binding to iconv or something.
 

The problem is that you often aren't told the encoding, and have to work with
just bytes. You can guess (and I'm sure some pretty smart heuristics have been
developed for this), but it's not perfect, and you still need ASCII whitespace
stripping to work, regardless of the encoding.*

* Okay, so if the 0-127 range isn't ASCII, it won't work, but that's practically
nonexistent these days (at least on the platforms DMD supports).
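
A sketch of the byte-level stripping described above, assuming current D2 and std.ascii.isWhite; stripAsciiWhitespace is a made-up name. It only ever inspects bytes in the ASCII range, so it does not care what the bytes above 127 mean.

```d
import std.ascii : isWhite;
import std.stdio : writeln;

// Hypothetical helper: trim ASCII whitespace from both ends of raw bytes.
const(ubyte)[] stripAsciiWhitespace(const(ubyte)[] bytes)
{
    while (bytes.length && isWhite(bytes[0]))
        bytes = bytes[1 .. $];
    while (bytes.length && isWhite(bytes[$ - 1]))
        bytes = bytes[0 .. $ - 1];
    return bytes;
}

void main()
{
    // space, tab, 'c', 0xE9 (meaning depends on the encoding), space, newline
    const(ubyte)[] input = [0x20, 0x09, 0x63, 0xE9, 0x20, 0x0A];
    writeln(stripAsciiWhitespace(input)); // [99, 233]
}
```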

Mar 30 2007

Dan <murpsoft hotmail.com> writes:

 int getRandomNumber()
 {
     return 4; // chosen by fair dice roll.
               // guaranteed to be random.
 }

^^^ Priceless.  : D

That reminds me of the ol' "find x" "here it is ----> x" picture going around.

Mar 30 2007

Pragma <ericanderton yahoo.removeme.com> writes:

Dan wrote:
 int getRandomNumber()
 {
     return 4; // chosen by fair dice roll.
               // guaranteed to be random.
 }

 
 ^^^ Priceless.  : D
 
 That reminds me of the ol' "find x" "here it is ----> x" picture going around.

Taken from XKCD: http://xkcd.com/c221.html

The author is one part geek, one part comedian and one part hopeless romantic.
The comic archive is *packed* with humor like this.

-- 
- EricAnderton at yahoo

Mar 30 2007

Don Clugston <dac nospam.com.au> writes:

Dan wrote:
 int getRandomNumber()
 {
     return 4; // chosen by fair dice roll.
               // guaranteed to be random.
 }

 
 ^^^ Priceless.  : D

The original IBM random number generator was a bit like that. When a bug 
report was made, the response was:
"We guarantee that each number is random individually, but we don't 
guarantee that more than one of them is random."
- Numerical Recipes, chapter 7.

Apr 02 2007
