digitalmars.D.bugs - [Issue 5904] New: std.json parseString doesn't handle chars outside the BMP
- d-bugmail puremagic.com (28/28) Apr 28 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5904
http://d.puremagic.com/issues/show_bug.cgi?id=5904

Summary: std.json parseString doesn't handle chars outside the BMP
Product: D
Version: D2
Platform: Other
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody puremagic.com
ReportedBy: sean invisibleduck.org

--- Comment #0 from Sean Kelly <sean invisibleduck.org> 2011-04-28 12:24:48 PDT ---
According to RFC 4627, characters outside the Basic Multilingual Plane (i.e. those that require more than two bytes to represent in UTF-16) are encoded as a surrogate pair in JSON strings.

In effect, what you have to do is test whether a "\uXXXX" value is >= 0xD800 and <= 0xDBFF, which marks it as a high surrogate. If so, the next value should be another "\uXXXX" escape representing the low surrogate. To verify this, check that the second value is >= 0xDC00 and <= 0xDFFF. If it isn't, skip the preceding "\uXXXX" value (the high surrogate) as invalid and decode the following "\uXXXX" value as a standalone Unicode code point (the RFC is actually unclear on this point, but this seems the most reasonable failure mode).

Assuming that you have a valid high and low surrogate, stick them into a wchar[2] and convert to UTF-8.
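
The procedure described in the report could be sketched roughly as follows. This is only an illustration, not the actual std.json code: `decodeEscapedUnits`, `isHighSurrogate` and `isLowSurrogate` are hypothetical helper names, and the input is assumed to be the 16-bit values already parsed out of consecutive "\uXXXX" escapes. Skipping a lone low surrogate is an extra assumption that the report does not cover.

```d
import std.utf : toUTF8;

/// True if `u` lies in the UTF-16 high (leading) surrogate range.
bool isHighSurrogate(uint u) { return u >= 0xD800 && u <= 0xDBFF; }

/// True if `u` lies in the UTF-16 low (trailing) surrogate range.
bool isLowSurrogate(uint u) { return u >= 0xDC00 && u <= 0xDFFF; }

/// Hypothetical helper: converts the values of consecutive "\uXXXX"
/// escapes into a UTF-8 string, pairing surrogates where possible.
string decodeEscapedUnits(const(wchar)[] units)
{
    string result;
    for (size_t i = 0; i < units.length; ++i)
    {
        wchar u = units[i];
        if (isHighSurrogate(u))
        {
            if (i + 1 < units.length && isLowSurrogate(units[i + 1]))
            {
                // Valid pair: put both halves in a wchar[2] and convert.
                wchar[2] pair;
                pair[0] = u;
                pair[1] = units[i + 1];
                result ~= toUTF8(pair[]);
                ++i; // the low surrogate has been consumed as well
            }
            // Otherwise the unpaired high surrogate is skipped as invalid
            // and the next value is decoded on its own in the next pass.
            continue;
        }
        if (isLowSurrogate(u))
            continue; // assumption: a lone low surrogate is also skipped

        // Ordinary BMP character: a single code unit is a code point.
        wchar[1] single;
        single[0] = u;
        result ~= toUTF8(single[]);
    }
    return result;
}
```

With this sketch, the escape sequence "\uD83D\uDE00" would come out as the single code point U+1F600, while "\uD83D\u0041" would drop the unpaired high surrogate and yield just "A".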