
digitalmars.D - might be a bug in the DMD FrontEnd

Davidl <Davidl 126.com> writes:
I don't see what prevents the following from compiling:
import std.stdio;
char[] ctfe()
{
	wchar[] k=cast(wchar[])"int
i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/
int j;";
	char[] jimmy = cast(char[])(k[0..k.length-1]);
	int i;
	for(i=0;i<jimmy.length;)
	{
		if (jimmy[i]==0)
			jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
		else
			i++;
	}
	return jimmy;
}
void main()
{
	mixin(ctfe);
	char[] k=ctfe;
	printf ("%s",k.ptr);
}

And having looked at the frontend, I think there might be a bug in slicing
wchar[] at compile time: ilwr and iupr aren't taken care of for the
wchar[] and dchar[] cases.

Regards,
David Leon
Mar 29 2007
Davidl <Davidl 126.com> writes:
Sorry, I didn't read the frontend carefully enough; it does handle the
different cases.
But I still don't get why my little function can't be evaluated at compile
time.
Mar 29 2007
Davidl <Davidl 126.com> writes:
Err, in constfold.c, the function Cat declares and assigns Type t,
but t is never used?
	t = es2->type;
	es->type = type;
Either es->type = t; or an assertion might be preferable?
Mar 29 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Davidl wrote:
 I don't see what prevents the following from compiling:

I can see a few things...
 import std.stdio;
 char[] ctfe()
 {
     wchar[] k=cast(wchar[])"int

What's with the casting? http://www.digitalmars.com/d/lex.html#StringLiteral -- see the part on "Postfix" characters. I'm fairly certain that cast(wchar[]) doesn't do what you *think* it's doing. cast(wchar[]) casts an array of chars into an array of wchars... note that I *did not* say "converts" -- UTF-8 and UTF-16 are very different encodings, so you can't just cast between them and expect it to make any sense. It would be like casting a double pointer to a ushort pointer -- meaningless. Also, I can't work out why you would want to embed a comment in a mixin string, but it's not especially problematic :P
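To illustrate the cast-versus-convert distinction, here is a minimal sketch in present-day D2, where string is immutable(char)[] rather than the D1 char[] used in this thread. It casts a runtime array rather than a literal, because a cast applied directly to a string literal may be handled specially by the compiler; std.conv.to is the Phobos conversion function.

```d
import std.conv : to;

void main()
{
    // A runtime UTF-8 string: 'A' and 'B' are 1 code unit each, U+00C0 is 2.
    string utf8 = "A\u00C0B";
    assert(utf8.length == 4);

    // Array cast: the same 4 bytes are reinterpreted as 2 UTF-16 code units.
    // Nothing is decoded or re-encoded, so the result is meaningless as text.
    auto reinterpreted = cast(const(wchar)[]) utf8;
    assert(reinterpreted.length == 2);

    // Conversion: each character is decoded from UTF-8 and re-encoded as UTF-16.
    wstring converted = utf8.to!wstring;
    assert(converted.length == 3);
    assert(converted == "A\u00C0B"w);
}
```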
 i;/*asdfasf"~cast(wchar[])("adf"~cast(char)100~cast(char)192~cast(char)250)~cast(wchar[])"dsafj*/

cast(char)192, whilst technically valid, is really nasty. For starters, '192' isn't a valid character by itself in UTF-8, which means it can't be printed. Not to mention the potential byte-order problems. We have Unicode escape sequences for a reason. Again, see the section on string literals, but basically, we have "\x12" for ASCII characters, "\u1234" for wide characters, and "\U12345678" for really wide characters. And, again, that cast doesn't make any sense.
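A minimal D2 sketch of the escape sequences mentioned above. It uses std.utf.validate, which throws a UTFException on malformed input, to show that the lone byte 192 is rejected while the escaped character U+00C0 (code point 192) is encoded correctly in each string type.

```d
import std.exception : assertThrown;
import std.utf : validate, UTFException;

void main()
{
    // "\x64" is the ASCII letter 'd', the same value as cast(char)100.
    assert("\x64" == "d");

    // Byte 192 (0xC0) on its own is not valid UTF-8, so validation rejects it.
    char[] lone = [cast(char) 192];
    assertThrown!UTFException(validate(lone));

    // The character U+00C0 written as an escape is encoded properly.
    string s = "\u00C0";       // two UTF-8 code units: 0xC3 0x80
    assert(s.length == 2);
    validate(s);               // does not throw

    wstring w = "\u00C0"w;     // one UTF-16 code unit
    assert(w.length == 1);

    dstring d = "\U0001F600"d; // outside the BMP: still one UTF-32 code unit
    assert(d.length == 1);
}
```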
 int j;";
     char[] jimmy = cast(char[])(k[0..k.length-1]);

Dear lord, why?! You just spent half your time casting it to a wchar array, and now you're casting it back?! char[] is perfectly capable of storing Unicode text, if that's what you're worried about. Also, you're cutting off the last character of the string, which means you're losing that last ";", which means your mixin isn't valid, and will cause compilation to fail.
     int i;
     for(i=0;i<jimmy.length;)
     {
         if (jimmy[i]==0)
             jimmy=jimmy[0..i]~jimmy[i+1..jimmy.length];
         else
             i++;
     }

The only reason I can come up with as to why you're doing the above is because all that casting above generates a string with null characters in it... which it wouldn't if you didn't use all the casting.
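A small D2 sketch of where those null characters come from: reinterpreting UTF-16 text as a char array keeps the zero high byte of every ASCII code unit. The number of zeros is the same on little- and big-endian machines; only their positions differ.

```d
void main()
{
    // UTF-16 text made only of ASCII characters: each code unit has a zero high byte.
    wstring wide = "int j;"w;

    // Reinterpreting the array as chars keeps those bytes: 6 code units become 12 chars.
    auto bytes = cast(const(char)[]) wide;
    assert(bytes.length == 12);

    // Count the embedded '\0' bytes without any UTF-8 decoding.
    size_t zeros;
    foreach (char c; bytes)
        if (c == '\0')
            ++zeros;
    assert(zeros == 6);
}
```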
     return jimmy;
     
 }
 void main()
 {
     mixin(ctfe);
     char[] k=ctfe;
         printf ("%s",k.ptr);

Please don't use printf, at least not without passing the string through toStringz. writefln works perfectly fine. I mean, you even imported std.stdio...

Ok, let's try rewriting this...

import std.stdio;

wchar[] ctfe()
{
    wchar[] k = "int i;/*asdfasf"w~("adf"w~cast(wchar)'\x64'~cast(wchar)'\u1234')~"dsafj*/ int j;";
    return k;
}

void main()
{
    mixin(ctfe());
    wchar[] k = ctfe();
    writefln("%s", k);
}

The above works perfectly. Heck, we could get rid of those cast(wchar)s by just using wchar strings: "\x64\u1234"w.
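For comparison, a sketch of the same idea in present-day D2, where string literals are immutable and a function returning string can be evaluated at compile time inside mixin(). The name ctfe is kept from the thread; the assignments to i and j are only there to show that the mixin really declared them.

```d
import std.stdio : writeln;

// Builds the mixin text. When called inside mixin(...), it runs at compile time (CTFE).
// Escapes replace the casts: "\x64" is 'd' and "\u1234" becomes a proper UTF-8 sequence.
string ctfe()
{
    return "int i;/*asdfasf" ~ "adf\x64\u1234" ~ "dsafj*/ int j;";
}

void main()
{
    mixin(ctfe()); // declares the locals i and j; the comment is skipped by the lexer
    i = 1;
    j = 2;
    writeln(ctfe()); // writeln prints UTF-8 text directly, no printf needed
}
```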
 }
 
 and having looked at the frontend, I think there might be a bug in slicing
 wchar[] at compile time: ilwr and iupr aren't taken care of for the
 wchar[] and dchar[] cases.
 
 Regards,
 David Leon

It could well be there's a bug in the frontend. But this is kind of like setting your house on fire, and then pointing out there's a burn mark on the wall. :P

	-- Daniel

--
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}
http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Mar 29 2007
Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Daniel Keep wrote:
 Davidl wrote:
 I don't see what prevents the following from compiling:

I can see a few things...

<...>
 cast(char)192, whilst technically valid, is really nasty.  For starters,
 '192' isn't a valid character by itself in UTF-8, which means it can't
 be printed.

<...>
 Please don't use printf, at least not without passing the string through
 toStringz.  writefln works perfectly fine.

Perhaps he's using printf because he wants to output the byte 192 without getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both Phobos and Tango, the C library is the best way of outputting a character whilst letting the user worry about whether he can see it properly in his locale or not. The point about toStringz still stands, though.
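A minimal D2 sketch of the two points in this exchange: handing a D string to C's printf via std.string.toStringz (or with an explicit length through "%.*s"), and writing raw, unvalidated bytes with std.stdio's File.rawWrite. The byte values are only illustrative.

```d
import core.stdc.stdio : printf;
import std.stdio : stdout;
import std.string : toStringz;

void main()
{
    string text = "hello";

    // D arrays are not guaranteed to be NUL-terminated, so give C a terminated string.
    const(char)* cstr = toStringz(text);
    assert(cstr[text.length] == '\0');
    printf("%s\n", cstr);

    // Alternatively, pass the length explicitly and no terminator is needed.
    printf("%.*s\n", cast(int) text.length, text.ptr);

    // Write raw bytes (here 0xC0 and a newline) without any UTF-8 validation.
    ubyte[] raw = [0xC0, '\n'];
    stdout.rawWrite(raw);
}
```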
Mar 30 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Deewiant wrote:
 Daniel Keep wrote:
 Davidl wrote:
 I don't see what prevents the following from compiling:


<...>
 cast(char)192, whilst technically valid, is really nasty.  For starters,
 '192' isn't a valid character by itself in UTF-8, which means it can't
 be printed.

<...>
 Please don't use printf, at least not without passing the string through
 toStringz.  writefln works perfectly fine.

Perhaps he's using printf because he wants to output the byte 192 without getting an "Error: 4invalid UTF-8 sequence". I've found that, currently, in both Phobos and Tango, the C library is the best way of outputting a character whilst letting the user worry about whether he can see it properly in his locale or not. The point about toStringz still stands, though.

True; I hadn't considered that. The first thing I thought was that he didn't know about Unicode literals, and was trying to manually encode the character in UTF-16.

That said, I personally think that if you need to use printf because writefln is barfing on your string, then that's a bug in your program. char[] is UTF-8: if you're not storing UTF-8, you should be using ubyte[], not char[].

Incidentally, since D source must be either ASCII or some variant of UTF, cast(char)192 isn't a valid character *anyway*, unless it's part of a multibyte code point, at which point the argument for outputting it literally falls apart since he's using it in a mixin :P

Also, I just realised that the "you can't cast arrays of chars around" is something I should add to my text in D article...

	-- Daniel
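A sketch of the "non-UTF-8 belongs in ubyte[]" approach, assuming the bytes are ISO-8859-1 (Latin-1). latin1ToUtf8 is a hypothetical helper, not a Phobos function; it relies on Latin-1 byte values being identical to the Unicode code points U+0000 to U+00FF, and on D encoding a dchar as UTF-8 when it is appended to a string.

```d
import std.utf : validate;

// Hypothetical helper: decodes Latin-1 bytes into a UTF-8 string.
string latin1ToUtf8(const(ubyte)[] latin1)
{
    string result;
    foreach (b; latin1)
        result ~= cast(dchar) b; // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    // Raw bytes in a non-UTF encoding are kept in ubyte[], not char[].
    ubyte[] raw = [0xC0, 'x'];   // "Àx" in Latin-1
    string text = latin1ToUtf8(raw);

    validate(text);              // now valid UTF-8, so this does not throw
    assert(text == "\u00C0x");
    assert(text.length == 3);    // U+00C0 takes two UTF-8 code units
}
```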
Mar 30 2007
Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Daniel Keep wrote:
 That said, I personally think that if you need to use printf because
 writefln is barfing on your string, then that's a bug in your program.
 char[] is UTF-8: if you're not storing UTF-8, you should be using
 ubyte[], not char[].

I agree. However, both Phobos and Tango use char[] for all their string-processing functions _which also work on non-UTF-8_. This means that to call such a function you need to do, for instance, "std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very quickly. I hoped that Tango would use ubyte[] in the C standard library, at least, but no. I understand why not (standard; most people use only char[] and don't want to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so I don't complain, but it's still something I'd like. Perhaps D needs a way to allow implicit conversion:

finally(ubyte[] is char[]) {
	char[] foo(ubyte[] myString) {
		return std.string.strip(myString.dup);
	}
}

<g>
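On the casting part of this complaint: in D a cast can be applied to the array itself, without going through its pointer. A minimal D2 sketch using std.string.representation, which views a string's code units as bytes without copying. The data here is plain ASCII, so strip's behaviour on non-UTF-8 input is not being relied upon.

```d
import std.string : representation, strip;

void main()
{
    string s = "  hello  ";

    // View the UTF-8 code units as raw bytes, without copying.
    immutable(ubyte)[] bytes = s.representation;

    // Go back the other way with a direct array cast: no pointer juggling needed.
    string back = cast(string) bytes;
    assert(back.strip == "hello");

    // Both slices refer to the same memory.
    assert(cast(const(void)*) bytes.ptr == cast(const(void)*) back.ptr);
}
```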
Mar 30 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Deewiant wrote:
 Daniel Keep wrote:
 That said, I personally think that if you need to use printf because
 writefln is barfing on your string, then that's a bug in your program.
 char[] is UTF-8: if you're not storing UTF-8, you should be using
 ubyte[], not char[].

I agree. However, both Phobos and Tango use char[] for all their string-processing functions _which also work on non-UTF-8_. This means that to call such a function you need to do, for instance, "std.string.strip(*cast(char[])iso-8859-1-string.ptr);" which gets ugly very quickly. I hoped that Tango would use ubyte[] in the C standard library, at least, but no. I understand why not (standard; most people use only char[] and don't want to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so I don't complain, but it's still something I'd like. Perhaps D needs a way to allow implicit conversion: finally(ubyte[] is char[]) { char[] foo(ubyte[] myString) { return std.string.strip(myString.dup); } } <g>

 foreach( dchar c ; some_string )
 {
     // ...
 }

Would *not* work correctly with the above if your string contains anything outside of the ASCII range. Yes, the functions might work with non-UTF-8 codepages, but that's more a side-effect of how they are implemented.

I think what Phobos really needs is a character encoding conversion library, even if it's just a paper-thin binding to iconv or something.

	-- Daniel

[1] I hope I've got the right term; I'm liable to get my head chewed off if I'm wrong :P
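A D2 sketch of why foreach with a dchar loop variable depends on the data really being UTF-8: each iteration decodes one UTF-8 sequence, so Latin-1 bytes stored in a char array make the loop throw instead of yielding characters. countCodePoints is a hypothetical name. The sketch only asserts that some Exception is thrown, since the exact exception class comes from the runtime's decoder.

```d
import std.exception : assertThrown;

// Hypothetical helper: counts code points by decoding the array as UTF-8.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s) // each iteration decodes one UTF-8 sequence
        ++n;
    return n;
}

void main()
{
    // Valid UTF-8: 4 code units, 3 code points.
    assert(countCodePoints("\u00C0bc") == 3);

    // Latin-1 "Àbc" stored in a char[]: 0xC0 followed by 'b' is malformed UTF-8.
    const(char)[] latin1 = [cast(char) 0xC0, 'b', 'c'];
    assertThrown(countCodePoints(latin1));
}
```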
Mar 30 2007
Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Daniel Keep wrote:
 foreach( dchar c ; some_string )
 {
     // ...
 }

Would *not* work correctly with the above if your string contains anything outside of the ASCII range. Yes, the functions might work with non-UTF-8 codepages, but that's more a side-effect of how they are implemented.

True. But it's hard to implement (some of) them _without_ supporting non-UTF-8. <g> And there's always the C standard library.
 I think what Phobos really needs is a character encoding conversion
 library, even if it's just a paper-thin binding to iconv or something.
 

The problem is that you often aren't told the encoding, and have to work with just bytes. You can guess (and I'm sure some pretty smart heuristics have been developed for this), but it's not perfect, and you still need ASCII whitespace stripping to work, regardless of the encoding.*

* Okay, so if the 0-127 range isn't ASCII, it won't work, but that's practically nonexistent these days (at least on the platforms DMD supports).
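The encoding-agnostic whitespace stripping described here can be sketched in D2 on a ubyte[]. stripAsciiWhite is a hypothetical helper; it uses std.ascii.isWhite, which only returns true for the six ASCII whitespace characters, so bytes at or above 0x80 are left alone whatever encoding they belong to (given the ASCII-compatible assumption in the footnote above).

```d
import std.ascii : isWhite;

// Hypothetical helper: trims ASCII whitespace (space, \t, \n, \v, \f, \r)
// from both ends of a byte array without decoding anything.
inout(ubyte)[] stripAsciiWhite(inout(ubyte)[] s)
{
    while (s.length && isWhite(s[0]))
        s = s[1 .. $];
    while (s.length && isWhite(s[$ - 1]))
        s = s[0 .. $ - 1];
    return s;
}

void main()
{
    // " \tÀx\r\n" in Latin-1: not valid UTF-8, but the ASCII whitespace is still found.
    const(ubyte)[] raw = [' ', '\t', 0xC0, 'x', '\r', '\n'];
    const(ubyte)[] expected = [0xC0, 'x'];
    assert(stripAsciiWhite(raw) == expected);
}
```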
Mar 30 2007
Dan <murpsoft hotmail.com> writes:
 int getRandomNumber()
 {
     return 4; // chosen by fair dice roll.
               // guaranteed to be random.
 }

^^^ Priceless. :D That reminds me of the ol' "find x" / "here it is ----> x" picture going around.
Mar 30 2007
Pragma <ericanderton yahoo.removeme.com> writes:
Dan wrote:
 int getRandomNumber()
 {
     return 4; // chosen by fair dice roll.
               // guaranteed to be random.
 }

^^^ Priceless. :D That reminds me of the ol' "find x" / "here it is ----> x" picture going around.

Taken from XKCD: http://xkcd.com/c221.html

The author is one part geek, one part comedian and one part hopeless romantic. The comic archive is *packed* with humor like this.

--
- EricAnderton at yahoo
Mar 30 2007
Don Clugston <dac nospam.com.au> writes:
Dan wrote:
 int getRandomNumber()
 {
     return 4; // chosen by fair dice roll.
               // guaranteed to be random.
 }

^^^ Priceless. : D

The original IBM random number generator was a bit like that. When a bug report was made, the response was: "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random." - Numerical Recipes, chapter 7.
Apr 02 2007