digitalmars.D.bugs - [Issue 6458] New: Multibyte char literals shouldn't implicitly convert to char
- d-bugmail puremagic.com (35/35) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (24/24) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (18/27) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (11/11) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (11/11) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (7/10) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (9/9) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (16/16) Aug 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (12/20) Aug 09 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (15/18) Jan 30 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (11/11) Jan 30 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (10/10) Jan 31 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (14/14) Apr 20 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (12/12) Jul 19 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (7/7) Jul 20 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
- d-bugmail puremagic.com (9/9) Jul 20 2012 http://d.puremagic.com/issues/show_bug.cgi?id=6458
http://d.puremagic.com/issues/show_bug.cgi?id=6458

           Summary: Multibyte char literals shouldn't implicitly convert to char
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: clugdbug yahoo.com.au

--- Comment #0 from Don <clugdbug yahoo.com.au> 2011-08-08 21:43:38 PDT ---
The code below should either be rejected, or work correctly.
The particularly problematic case is:
s[0..2] = 'ä', which looks perfectly reasonable, but creates garbage.

I'm a bit confused about non-ASCII char literals, since although they are typed as 'char', they can't be stored in a char... This just seems wrong.

----
int bug6458()
{
    char[] s = "abcdef".dup;
    s[0..2] = 'ä';
    assert(s == "äcdef");
    return 34;
}

void main()
{
    bug6458();
}

Surely this has been reported before, but I can't find it.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 08 2011
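For readers following the test case: the result it asserts ("äcdef") can be produced explicitly by encoding the code point to UTF-8 and copying the resulting code units, instead of relying on the literal converting to char. A minimal sketch, assuming a D2 compiler with Phobos; `replacePrefix` is a hypothetical helper name, while `std.utf.encode` is the Phobos function.

```d
import std.utf : encode;

// Hypothetical helper: overwrite the first code units of `s` with the
// UTF-8 encoding of `c`, rather than narrowing `c` to a single char.
void replacePrefix(char[] s, dchar c)
{
    char[4] buf;
    immutable n = encode(buf, c); // number of UTF-8 code units: 2 for 'ä'
    s[0 .. n] = buf[0 .. n];      // same length on both sides: plain copy
}

void main()
{
    char[] s = "abcdef".dup;
    replacePrefix(s, 'ä');
    assert(s == "äcdef"); // "ab" (2 code units) replaced by 'ä' (2 code units)
}
```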
http://d.puremagic.com/issues/show_bug.cgi?id=6458

Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com

--- Comment #1 from Jonathan M Davis <jmdavisProg gmx.com> 2011-08-08 21:53:05 PDT ---
Personally, I think that all character literals should be typed as dchar, since it's generally a _bad_ idea to operate on individual chars or wchars. Normally, the only places where chars or wchars should be used are in ranges of chars or wchars (which would normally be arrays). But making character literals dchar by default might break too much code at this point. Though, since it should be possible to use range propagation to verify whether a particular code point will fit in a particular code unit, the breakage might be minimal.

Regardless, I actually never would have expected s[0 .. 2] = 'ä' to work, since you're assigning a character to multiple characters as far as types go, though I can see why you might think that it would work or why it arguably _should_ work. Obviously though, if the compiler is allowing you to assign a code point to multiple code units like that, it should only compile if it can verify that the code point will fit exactly in those code units, and if it does compile, it should work correctly rather than generate garbage.

So, there are several issues at work here it seems.
Aug 08 2011
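The check Jonathan describes, whether a code point fits in a code unit, can be illustrated with Phobos's `std.utf.codeLength`, which returns how many code units of a given character type a code point needs. The sketch assumes a D2 compiler with Phobos; `fitsInOneCodeUnit` is a hypothetical helper, not a compiler or library feature. Because it can be evaluated at compile time, it also works in `static assert`.

```d
import std.utf : codeLength;

// Hypothetical helper: true when code point `c` fits in a single code unit
// of type C (char = UTF-8, wchar = UTF-16, dchar = UTF-32).
bool fitsInOneCodeUnit(C)(dchar c)
{
    return codeLength!C(c) == 1;
}

// Evaluated at compile time via CTFE.
static assert(fitsInOneCodeUnit!char('a'));   // ASCII: 1 UTF-8 code unit
static assert(!fitsInOneCodeUnit!char('ä'));  // U+00E4: 2 UTF-8 code units
static assert(fitsInOneCodeUnit!wchar('ä'));  // but 1 UTF-16 code unit

void main()
{
    assert(fitsInOneCodeUnit!dchar('ä'));     // a dchar holds any code point
}
```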
http://d.puremagic.com/issues/show_bug.cgi?id=6458

Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |accepts-invalid

--- Comment #2 from Don <clugdbug yahoo.com.au> 2011-08-08 22:27:32 PDT ---
(In reply to comment #1)
> Personally, I think that all character literals should be typed as dchar, since it's generally a _bad_ idea to operate on individual chars or wchars. Normally, the only places where chars or wchars should be used are in ranges of chars or wchars (which would normally be arrays). But making character literals dchar by default might break too much code at this point. Though, since it should be possible to use range propagation to verify whether a particular code point will fit in a particular code unit, the breakage might be minimal.

Oddly, this passes:

static assert('ä'.sizeof == 2);

So there's something a bit nonsensical about the whole thing.

> Regardless, I actually never would have expected s[0 .. 2] = 'ä' to work, since you're assigning a character to multiple characters as far as types go,

It's more subtle. This is block assignment.
s[0..4] = 'a'; works, and creates "aaaa".
s[0..4] = 'ä' is expected to fill the string with ä, creating "ää".
Instead, it fills it with four copies of the first UTF-8 byte of ä, creating an invalid string.
Aug 08 2011
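Block assignment stores one value in every element of the slice, which is only meaningful for a character that fits in one code unit. The result Don expected from s[0..4] = 'ä' ("ää") can be sketched by repeating the UTF-8 encoding instead. `fillWith` is a hypothetical helper, and the sketch assumes Phobos's `std.utf.encode`.

```d
import std.utf : encode;

// Hypothetical helper: fill `s` with repeated copies of the UTF-8 encoding
// of `c`, i.e. what `s[0 .. 4] = 'ä'` was expected to produce ("ää").
// The slice length must be a whole multiple of the encoded length.
void fillWith(char[] s, dchar c)
{
    char[4] buf;
    immutable n = encode(buf, c);
    assert(s.length % n == 0, "slice length is not a multiple of the encoded length");
    for (size_t i = 0; i < s.length; i += n)
        s[i .. i + n] = buf[0 .. n];
}

void main()
{
    char[] a = "wxyz".dup;
    a[0 .. 4] = 'a';   // ordinary block assignment of a one-code-unit char
    assert(a == "aaaa");

    char[] s = "wxyz".dup;
    fillWith(s, 'ä');  // four code units hold two copies of 'ä'
    assert(s == "ää");
}
```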
http://d.puremagic.com/issues/show_bug.cgi?id=6458

--- Comment #3 from Jonathan M Davis <jmdavisProg gmx.com> 2011-08-08 22:33:20 PDT ---
Ah, yes. I forgot that you could assign a single value to every element in an array like that. That being the case, it should just fail to compile given that the code point is not going to fit in each of the elements of the array. But regardless, something odd is definitely going on here given that 'ä'.sizeof == 2. It's probably an edge case which wasn't caught, since the only types which take up multiple elements like that are char and wchar.
Aug 08 2011
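Comment #3's observation about char and wchar can be checked with `std.utf.codeLength`: in UTF-8 and UTF-16 a code point may need more than one element, whereas a dchar always holds exactly one. A small sketch, assuming a D2 compiler with Phobos; U+1F600 is just an arbitrary example of a code point outside the Basic Multilingual Plane.

```d
import std.utf : codeLength;

void main()
{
    // Only char (UTF-8) and wchar (UTF-16) can need several elements per
    // code point; a dchar (UTF-32) always holds exactly one.
    assert(codeLength!char('ä') == 2);           // U+00E4 -> C3 A4
    assert(codeLength!wchar('ä') == 1);
    assert(codeLength!wchar('\U0001F600') == 2); // surrogate pair
    assert(codeLength!dchar('\U0001F600') == 1);
}
```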
http://d.puremagic.com/issues/show_bug.cgi?id=6458

changlon <changlon gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |changlon gmail.com

--- Comment #4 from changlon <changlon gmail.com> 2011-08-08 23:13:53 PDT ---
s[0..3] = 'a';
Should this raise an exception?
Aug 08 2011
http://d.puremagic.com/issues/show_bug.cgi?id=6458

--- Comment #5 from changlon <changlon gmail.com> 2011-08-08 23:14:35 PDT ---
(In reply to comment #4)
> s[0..3] = 'a';
> Should this raise an exception?

Sorry, I mean s[0..3] = 'ä';
Aug 08 2011
http://d.puremagic.com/issues/show_bug.cgi?id=6458

--- Comment #6 from Jonathan M Davis <jmdavisProg gmx.com> 2011-08-08 23:19:15 PDT ---
It shouldn't even compile, because the types don't match. Even with range propagation, the best that you'll do with 'ä' is fit it in a wchar, so it won't fit in a char, and so you _can't_ assign it to each element of s[0 .. 3] like that.

s[0 .. 3] = "ä"[] should work, but s[0 .. 3] = 'ä' definitely shouldn't.
Aug 08 2011
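Slice-to-slice assignment copies code units one for one, so the destination must have exactly as many elements as the source. Since "ä" is two UTF-8 code units, a two-element destination is what lines the copy up with the test case in comment #0. A sketch, assuming default compiler settings (array bounds checks enabled):

```d
void main()
{
    char[] s = "abcdef".dup;

    // 'ä' occupies two UTF-8 code units, so a slice copy needs a
    // two-element destination.
    static assert("ä".length == 2);
    s[0 .. 2] = "ä";
    assert(s == "äcdef");

    // s[0 .. 3] = "ä"; would compile, but the lengths (3 and 2) differ,
    // which is reported as an error at run time when bounds checks are on.
}
```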
http://d.puremagic.com/issues/show_bug.cgi?id=6458

Jacob Carlborg <doob me.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |doob me.com

--- Comment #7 from Jacob Carlborg <doob me.com> 2011-08-08 23:44:22 PDT ---
As far as I can see, D uses the smallest type necessary to fit a character literal. So all non-ASCII character literals will either be wchar or dchar. Both of the following pass, as expected.

static assert(is(typeof('ä') == wchar));
static assert(is(typeof('a') == char));

But I don't know why the compiler allows assigning a wchar to a char array element. That doesn't seem right.
Aug 08 2011
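The literal-typing rule from comment #7 can be written as compile-time checks. A sketch, assuming a D2 compiler; the third assertion uses a `\U` escape for a code point outside the Basic Multilingual Plane, an example added here rather than one taken from the thread.

```d
// Compile-time checks of the rule described in comment #7: a character
// literal gets the narrowest of char, wchar, dchar that can hold it.
static assert(is(typeof('a') == char));           // U+0061
static assert(is(typeof('ä') == wchar));          // U+00E4
static assert(is(typeof('\U0001F600') == dchar)); // outside the BMP
static assert('ä'.sizeof == 2);                   // as observed in comment #2

void main() {}
```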
http://d.puremagic.com/issues/show_bug.cgi?id=6458

--- Comment #8 from Don <clugdbug yahoo.com.au> 2011-08-09 00:09:02 PDT ---
(In reply to comment #7)
> As far as I can see, D uses the smallest type necessary to fit a character literal. So all non-ASCII character literals will either be wchar or dchar. Both of the following pass, as expected.
>
> static assert(is(typeof('ä') == wchar));
> static assert(is(typeof('a') == char));

That's good news. Seems like it's only a few cases where it behaves stupidly.

> But I don't know why the compiler allows assigning a wchar to a char array element. That doesn't seem right.

It's more general than that:

wchar w = 'ä';
char c = w; // Error: cannot implicitly convert expression (w) of type wchar to char
char c = 'ä'; // passes!!!
Aug 09 2011
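The inconsistency in comment #8 can be expressed with `__traits(compiles, ...)`, which reports whether a snippet compiles without stopping the build. A sketch of the behaviour expected once the D2 fix referenced in comment #11 is in the compiler (an assumption about the compiler version in use); on an affected compiler, the second and fourth assertions would fail.

```d
// After the fix, a non-ASCII character literal no longer narrows to char
// implicitly, matching how a wchar variable was already treated.
static assert( __traits(compiles, { char c = 'a'; }));              // ASCII literal
static assert(!__traits(compiles, { char c = 'ä'; }));              // non-ASCII literal
static assert(!__traits(compiles, { wchar w = 'ä'; char c = w; })); // wchar variable
static assert(!__traits(compiles, { char[] s = new char[2]; s[0 .. 2] = 'ä'; })); // block assignment
static assert( __traits(compiles, { wchar w = 'ä'; }));             // fits in a wchar

void main() {}
```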
http://d.puremagic.com/issues/show_bug.cgi?id=6458

yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yebblies gmail.com
           Platform|Other                       |All
         AssignedTo|nobody puremagic.com        |yebblies gmail.com
         OS/Version|Windows                     |All

--- Comment #10 from yebblies <yebblies gmail.com> 2012-01-31 15:24:30 EST ---
(In reply to comment #9)
> The compiler complains about the code above, just as it should, because a long won't fit in an int. Don't know why character literals are treated differently.

They aren't. The problem is that 'ä' evaluates to 0x00E4, and a bug in integer range propagation thinks this is ok to convert back to a char.
Jan 30 2012
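Comment #10's explanation can be made concrete: the literal's value is the code point 0x00E4, while the UTF-8 encoding of ä is the two bytes 0xC3 0xA4. Narrowing the value to char therefore yields the byte 0xE4, and a string made of such bytes is not valid UTF-8. A sketch, assuming Phobos's `std.utf.validate`, which throws `UTFException` on malformed input:

```d
import std.utf : validate, UTFException;

void main()
{
    // The literal's value is the code point, not its UTF-8 encoding.
    static assert('ä' == 0x00E4);
    static assert("ä".length == 2);
    static assert(cast(ubyte) "ä"[0] == 0xC3 && cast(ubyte) "ä"[1] == 0xA4);

    // Narrowing 0x00E4 to char keeps the byte 0xE4, a UTF-8 lead byte for a
    // three-byte sequence; repeating it is therefore invalid UTF-8.
    char[] s = new char[4];
    s[] = cast(char) 0xE4;
    bool invalid = false;
    try
    {
        validate(s);
    }
    catch (UTFException)
    {
        invalid = true;
    }
    assert(invalid);
}
```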
http://d.puremagic.com/issues/show_bug.cgi?id=6458

yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch

--- Comment #11 from yebblies <yebblies gmail.com> 2012-01-31 15:48:56 EST ---
Actually, this doesn't involve integer range propagation.

https://github.com/D-Programming-Language/dmd/pull/663
Jan 30 2012
http://d.puremagic.com/issues/show_bug.cgi?id=6458

yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com

--- Comment #12 from yebblies <yebblies gmail.com> 2012-02-01 14:48:05 EST ---
*** Issue 6988 has been marked as a duplicate of this issue. ***
Jan 31 2012
http://d.puremagic.com/issues/show_bug.cgi?id=6458

SomeDude <lovelydear mailmetrash.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lovelydear mailmetrash.com

--- Comment #13 from SomeDude <lovelydear mailmetrash.com> 2012-04-20 16:01:13 PDT ---
This doesn't compile on 2.059 Win32.

PS E:\DigitalMars\dmd2\samples> rdmd -w bug.d
bug.d(4): invalid UTF-8 sequence
bug.d(5): invalid UTF-8 sequence
PS E:\DigitalMars\dmd2\samples>
Apr 20 2012
http://d.puremagic.com/issues/show_bug.cgi?id=6458

Kenji Hara <k.hara.pg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|D2                          |D1

--- Comment #14 from Kenji Hara <k.hara.pg gmail.com> 2012-07-19 09:12:46 PDT ---
D2 is fixed, but D1 also has the same issue.

Pull request for D1:
https://github.com/D-Programming-Language/dmd/pull/1056
Jul 19 2012
http://d.puremagic.com/issues/show_bug.cgi?id=6458

--- Comment #15 from Kenji Hara <k.hara.pg gmail.com> 2012-07-20 23:33:01 PDT ---
Fixed for D1:
https://github.com/9rnsr/dmd/commit/6f5ae56f52c1f2a8af921905926a3ea4752ee388
Jul 20 2012
http://d.puremagic.com/issues/show_bug.cgi?id=6458

Kenji Hara <k.hara.pg gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
Jul 20 2012