
digitalmars.D - proposal string std.utf:sanitizeUTF(string) which returns an always

I keep running into issues caused by auto-decoding (arguably a significant
design flaw of Phobos) when using strings from external sources, which may
not be 100% valid UTF-8. For example, see the stack trace [1] produced by
getSomeExternalString().splitLines.

Could we have something like `sanitizeUTF` in std.utf, to allow for a
simple fix when running into such UTF-8 issues? See the proposed
implementation [2]. The fix would then be:
```
getSomeExternalString().splitLines,
=>
getSomeExternalString().sanitizeUTF.splitLines,
```


[1]
core.exception.AssertError@std/utf.d(2254): Assertion failure
----------------
??:? _d_assert [0x4f4e63]
??:? void std.utf.__assert(int) [0x53a304]
??:? pure nothrow @nogc @safe ubyte std.utf.codeLength!(char).codeLength(dchar) [0xa5d78191]
??:? pure nothrow @nogc @safe int std.string.stripRight!(immutable(char)[]).stripRight(immutable(char)[]).__foreachbody2(ref ulong, ref dchar) [0xa5c42bd9]
??:? _aApplyRcd2 [0x4f9bd1]
??:? pure @nogc @safe immutable(char)[] std.string.stripRight!(immutable(char)[]).stripRight(immutable(char)[]) [0xa5c42b5c]
??:? pure @property @nogc @safe immutable(char)[] std.algorithm.iteration.stripRight.MapResult.front() [0xa5cda053]
??:? pure @safe immutable(char)[] std.array.join!(std.algorithm.iteration.stripRight.MapResult, immutable(char)[]).join(std.algorithm.iteration.stripRight.MapResult, immutable(char)[]) [0xa5cda39a]


[2] sanitizeUTF proposal:
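// TODO: rangeify to make it work in more situations
string sanitizeUTF(string a){
  import std.array : Appender;  // Appender is defined in std.array
  import std.typecons : Yes;    // for the Yes.useReplacementDchar flag
  import std.utf : decodeFront;
  Appender!string b;
  while(a.length){
    // decodeFront pops the decoded code units off `a`; an invalid
    // sequence yields U+FFFD instead of throwing
    b~=decodeFront!(Yes.useReplacementDchar)(a);
  }
  return b.data;
}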
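
One way the "rangeify" TODO might be approached is sketched below. It is not
part of the proposal: it assumes std.utf.byDchar (which, as far as I know,
substitutes U+FFFD for ill-formed sequences by default rather than throwing)
and collects the result with std.array.appender. The template constraint and
the sample bytes in main are my own choices for illustration.

```d
import std.array : appender;
import std.range.primitives : ElementEncodingType, isInputRange;
import std.traits : isSomeChar;
import std.utf : byDchar;

/// Hypothetical range-accepting variant of the proposed sanitizeUTF.
/// Relies on byDchar replacing ill-formed input with U+FFFD (assumed default).
string sanitizeUTF(R)(R input)
if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
{
    auto result = appender!string();
    foreach (dchar c; input.byDchar)
        result.put(c); // the appender re-encodes each dchar as UTF-8
    return result.data;
}

void main()
{
    import std.exception : assertThrown;
    import std.string : splitLines;
    import std.utf : UTFException, validate;

    // "ok\nx" followed by 0xFF, a byte that never occurs in valid UTF-8.
    immutable(ubyte)[] raw = [0x6F, 0x6B, 0x0A, 0x78, 0xFF];
    auto external = cast(string) raw;

    // The raw string is rejected by validate ...
    assertThrown!UTFException(validate(external));

    // ... while the sanitized copy passes and can be split into lines.
    auto clean = external.sanitizeUTF;
    validate(clean);
    assert(clean.splitLines.length == 2);
}
```

Because the function only needs an input range of characters, it should also
accept wstring, dstring, or a lazily produced character range, at the cost of
always allocating a new string even when the input was already valid.
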
Dec 18 2016