digitalmars.D - Re: Making all strings UTF ranges has some risk of WTF
Andrei Alexandrescu Wrote:

It's no secret that string et al. are not a magic recipe for writing correct Unicode code. However, things are pretty good and could be further improved by operating the following changes in std.array and std.range:

These changes effectively make UTF-8 and UTF-16 bidirectional ranges, with the quirk that you still have a sort of a random-access operator.

I'm very strongly in favor of this change. Bidirectional strings allow beautiful correct algorithms to be written that handle encoded strings without any additional effort; with these changes, everything applicable of std.algorithm works out of the box (with the appropriate fixes here and there), which is really remarkable.

The remaining WTF is the length property. Traditionally, a range offering length also implies the expectation that a range of length n allows you to call popFront n times and then assert that the range is empty. However, if you check e.g. hasLength!string it will yield false, although the string does have an accessible member by that name and of the appropriate type.

Although Phobos always checks its assumptions, people might occasionally write code that just uses .length without checking hasLength. Then, they'll be annoyed when the code fails with UTF-8 and UTF-16 strings. (The "real" length of the range is not stored, but can be computed by using str.walkLength() in std.range.)

What can be done about that? I see a number of solutions:

Feb 04 2010
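For anyone who wants to see the length mismatch from the quoted post concretely, here is a minimal sketch. It assumes a Phobos in which narrow strings are decoded to code points when iterated as ranges, as the post proposes; the helper name countPops and the sample string are made up for illustration and are not part of Phobos.

```d
import std.range : empty, popFront, hasLength, walkLength;

// Hypothetical helper: counts how many popFront calls it takes
// to exhaust a string when it is treated as a range.
size_t countPops(string s)
{
    size_t pops = 0;
    for (auto r = s; !r.empty; r.popFront())
        ++pops;
    return pops;
}

void main()
{
    // \u00E9 is a single code point that takes two UTF-8 code units.
    string s = "h\u00E9llo";

    // .length reports code units, not the number of range elements.
    assert(s.length == 6);

    // hasLength is false for string, even though .length exists.
    static assert(!hasLength!string);

    // walkLength iterates the range and counts its elements.
    assert(s.walkLength == 5);

    // The range is empty after 5 pops, not after s.length pops.
    assert(countPops(s) == 5);
}
```

Generic code that guards on hasLength!R will fall back to walkLength for strings, while code that reads .length unconditionally would expect six elements here.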