
digitalmars.D.bugs - [Issue 9173] New: std.string.wrap should conform to Unicode line-breaking algorithm

http://d.puremagic.com/issues/show_bug.cgi?id=9173

           Summary: std.string.wrap should conform to Unicode
                    line-breaking algorithm
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: hsteoh quickfur.ath.cx



Currently, there are some issues with std.string.wrap:

1) It uses std.uni.isWhite as the criterion for line-breaking opportunities, but
isWhite also matches characters such as the non-breaking space, at which a line
should *not* be broken. It also matches things like vowel mark separators, which
shouldn't be break points, either.

2) It does not take zero-width characters and combining diacritics into account
when counting columns, which means that it will sometimes wrap the line at the
wrong place.

3) It does not wrap CJK text or Thai text correctly.
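
Points (1) and (2) can be illustrated with a minimal sketch. It is written in
Python rather than D so that it does not imply any particular Phobos API. The
names is_break_space and display_width are hypothetical, the set of
non-breaking characters is a deliberately incomplete subset, and treating the
general categories Mn, Me and Cf as zero-width is an approximation, not the
TR14 rules. Wide CJK characters are still counted as one column here.

```python
import unicodedata

# Whitespace-like code points that must not become break opportunities.
# Assumption: an illustrative subset, not a complete list.
NON_BREAKING = {
    "\u00A0",  # NO-BREAK SPACE
    "\u2007",  # FIGURE SPACE
    "\u202F",  # NARROW NO-BREAK SPACE
}

def is_break_space(ch):
    """Return True if ch is whitespace at which a line may be broken."""
    return ch.isspace() and ch not in NON_BREAKING

def display_width(text):
    """Count columns, skipping combining marks and format characters."""
    zero_width = ("Mn", "Me", "Cf")  # nonspacing, enclosing, format
    return sum(1 for ch in text if unicodedata.category(ch) not in zero_width)
```

With these, is_break_space rejects U+00A0 even though a plain whitespace test
accepts it, and display_width counts "e" followed by U+0301 COMBINING ACUTE
ACCENT as a single column.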

For reference, here's the Unicode technical reference that describes proper
line-breaking of Unicode text:

http://www.unicode.org/reports/tr14/

(After having read through TR14, I was in awe at how insanely complicated
line-wrapping in Unicode is. So I'd propose that, if nothing else, we should
fix items (1) and (2) above, which should be within the reach of a relatively
simple-to-implement European-centric line wrapping algorithm. People who want
CJK wrapping or other complicated stuff probably want to be writing their own
algo anyway.)
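
As a rough idea of what such a simple algorithm might look like, here is a
greedy wrapper sketched in Python (not D, and not the existing std.string.wrap
implementation). All names are hypothetical, the non-breaking set is an
illustrative subset, and the width rule (skip categories Mn, Me, Cf) is an
assumption that ignores CJK widths and everything else TR14 covers.

```python
import unicodedata

NO_BREAK = {"\u00A0", "\u2007", "\u202F"}  # assumption: illustrative subset

def width(text):
    """Columns occupied by text, ignoring marks and format characters."""
    return sum(1 for ch in text
               if unicodedata.category(ch) not in ("Mn", "Me", "Cf"))

def split_words(text):
    """Split on breakable whitespace only; non-breaking spaces stay in words."""
    words, current = [], []
    for ch in text:
        if ch.isspace() and ch not in NO_BREAK:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

def simple_wrap(text, columns):
    """Greedy wrap: start a new line when the next word would overflow."""
    lines, line = [], ""
    for word in split_words(text):
        candidate = word if not line else line + " " + word
        if line and width(candidate) > columns:
            lines.append(line)
            line = word
        else:
            line = candidate
    if line:
        lines.append(line)
    return "\n".join(lines)
```

For example, simple_wrap("10\u00A0km away", 5) keeps "10" and "km" together on
the first line and puts "away" on the second. A word wider than the column
limit is left on a line of its own rather than split.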

Dec 17 2012