digitalmars.D.bugs - [Issue 10125] New: readln!dchar misdecodes Unicode non-BMP

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10125

           Summary: readln!dchar misdecodes Unicode non-BMP
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: fw..vdijk gmail.com



readln!dchar decodes a Unicode code point above U+FFFF into 2 surrogate code
units instead of 1 dchar containing the code point.

e.g. U+10001 becomes [0xD800, 0xDC01] instead of [0x10001]
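
The two values in the example follow from the standard UTF-16 surrogate
arithmetic, which can be checked with a minimal sketch. The helper names
toSurrogates and fromSurrogates are hypothetical and exist only for this
illustration; they are not Phobos functions.

```d
import std.stdio : writefln;

/// Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair.
wchar[2] toSurrogates(dchar c)
{
    assert(c > 0xFFFF && c <= 0x10FFFF);
    immutable uint v = c - 0x10000;               // 20-bit offset past the BMP
    wchar[2] pair;
    pair[0] = cast(wchar)(0xD800 + (v >> 10));    // high surrogate: top 10 bits
    pair[1] = cast(wchar)(0xDC00 + (v & 0x3FF));  // low surrogate: bottom 10 bits
    return pair;
}

/// Hypothetical helper: recombine a surrogate pair into one code point.
dchar fromSurrogates(wchar hi, wchar lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return cast(dchar)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}

void main()
{
    dchar cp = '\U00010001';
    auto pair = toSurrogates(cp);

    // The pair from the report: what readln!dchar wrongly returned as 2 dchars.
    assert(pair[0] == 0xD800 && pair[1] == 0xDC01);

    // Recombining the pair gives back the single expected dchar.
    assert(fromSurrogates(pair[0], pair[1]) == cp);

    writefln("U+%05X <-> [0x%04X, 0x%04X]",
             cast(uint) cp, cast(uint) pair[0], cast(uint) pair[1]);
}
```

A correct readln!dchar should yield the single code point on the left-hand
side of that mapping, never the pair on the right.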

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 20 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10125






May 20 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10125


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com



Concurrently fixed in:
https://github.com/D-Programming-Language/phobos/pull/1381

Jul 06 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10125




Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/8f401a9199b441f941717cda5ab551c4e1a86a40
fix Issue 10125 Unicode non-BMP decoding to dchar in stdio readln

strings were first decoded to wchars, each wchar was then separately
decoded to dchar, resulting in 2 dchars in the surrogate block instead
of 1 correct dchar.

added unit test to verify readln decoding of non-ASCII characters
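
The failure mode described in this commit message can be reproduced outside
of readln. What follows is a minimal sketch under stated assumptions, not the
actual Phobos code: widenEachUnit is a hypothetical stand-in for the old
per-wchar conversion, while toUTF16 and toUTF32 are the std.utf functions.

```d
import std.utf : toUTF16, toUTF32;

/// Hypothetical stand-in for the old behaviour: treat every UTF-16 code
/// unit as if it were a complete code point.
dchar[] widenEachUnit(const(wchar)[] units)
{
    dchar[] result;
    foreach (wchar u; units)
        result ~= cast(dchar) u;   // a lone surrogate is copied through as-is
    return result;
}

void main()
{
    string input = "a\U00010001";  // one ASCII char, then one non-BMP char

    // Two-step path: UTF-8 -> UTF-16, then widen each code unit separately.
    wstring utf16 = input.toUTF16;
    dchar[] wrong = widenEachUnit(utf16);
    assert(wrong.length == 3);                        // 'a' plus 2 surrogates
    assert(wrong[1] == 0xD800 && wrong[2] == 0xDC01);

    // Direct path: decode the UTF-8 input straight to code points.
    dstring right = input.toUTF32;
    assert(right.length == 2);                        // 'a' plus 1 code point
    assert(right[1] == 0x10001);
}
```

Only input containing code points above U+FFFF shows the difference, which
is presumably why the bug went unnoticed with BMP-only test data.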

https://github.com/D-Programming-Language/phobos/commit/1086f2955418a4effd7e815d906460e1b137eb2d


fix Issue 10125 Unicode non-BMP decoding to dchar in stdio readln

Merged.

Aug 04 2013