digitalmars.D - proposal string std.utf:sanitizeUTF(string) which returns an always
- Timothee Cour via Digitalmars-d (42/42) Dec 18 2016
I keep running into issues due to auto-decoding (arguably a significant design flaw of Phobos) when using strings from external sources, which may not be 100% valid UTF-8. For example, see the stack trace [1] from getSomeExternalString().splitLines.

Could we have something like `sanitizeUTF` in std.utf, to allow for a simple fix when running into such UTF-8 issues? See the proposed implementation [2]. The fix would then be:

```
getSomeExternalString().splitLines,
=>
getSomeExternalString().sanitizeUTF.splitLines,
```

[1]
core.exception.AssertError std/utf.d(2254): Assertion failure
----------------
??:? _d_assert [0x4f4e63]
??:? void std.utf.__assert(int) [0x53a304]
??:? pure nothrow nogc safe ubyte std.utf.codeLength!(char).codeLength(dchar) [0xa5d78191]
??:? pure nothrow nogc safe int std.string.stripRight!(immutable(char)[]).stripRight(immutable(char)[]).__foreachbody2(ref ulong, ref dchar) [0xa5c42bd9]
??:? _aApplyRcd2 [0x4f9bd1]
??:? pure nogc safe immutable(char)[] std.string.stripRight!(immutable(char)[]).stripRight(immutable(char)[]) [0xa5c42b5c]
??:? pure property nogc safe immutable(char)[] std.algorithm.iteration.stripRight.MapResult.front() [0xa5cda053]
??:? pure safe immutable(char)[] std.array.join!(std.algorithm.iteration.stripRight.MapResult, immutable(char)[]).join(std.algorithm.iteration.stripRight.MapResult, immutable(char)[]) [0xa5cda39a]

[2] sanitizeUTF proposal:
// TODO: rangeify to make it work in more situations
string sanitizeUTF(string a){
  import std.array : Appender;  // Appender lives in std.array, not std.utf
  import std.typecons : Yes;    // for Yes.useReplacementDchar
  import std.utf;
  Appender!string b;
  // decodeFront pops the decoded code units off `a`; invalid sequences
  // come back as the replacement character instead of throwing.
  while(a.length){
    b~=decodeFront!(Yes.useReplacementDchar)(a);
  }
  return b.data;
}
Dec 18 2016
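To make the intended behaviour of [2] concrete, here is a minimal, self-contained sketch of the proposal together with a small demo. The input bytes and the name `external` are made up for illustration; `sanitizeUTF` is the proposed function from the post, not something that exists in std.utf. The demo only checks that the result is valid UTF-8 and contains U+FFFD; it deliberately does not assume how many code units the decoder consumes around an invalid byte.

```d
import std.algorithm.searching : canFind;
import std.array : Appender;
import std.string : splitLines;
import std.typecons : Yes;
import std.utf : decodeFront, replacementDchar, UTFException, validate;

// Proposed helper (hypothetical, not part of Phobos): re-encode the input,
// substituting U+FFFD for anything that does not decode as UTF-8.
string sanitizeUTF(string a)
{
    Appender!string b;
    while (a.length)
        b ~= decodeFront!(Yes.useReplacementDchar)(a);
    return b.data;
}

void main()
{
    // Made-up external input: "ok line\nbad <0xFF> byte\nlast".
    // The byte 0xFF can never occur in well-formed UTF-8.
    immutable(ubyte)[] raw = [
        0x6F, 0x6B, 0x20, 0x6C, 0x69, 0x6E, 0x65, 0x0A,
        0x62, 0x61, 0x64, 0x20, 0xFF, 0x20, 0x62, 0x79, 0x74, 0x65, 0x0A,
        0x6C, 0x61, 0x73, 0x74];
    string external = cast(string) raw;

    // The raw input is rejected by std.utf.validate.
    bool rejected = false;
    try
    {
        validate(external);
    }
    catch (UTFException)
    {
        rejected = true;
    }
    assert(rejected);

    // After sanitizing, the string validates and carries U+FFFD
    // where the bad byte used to be.
    string clean = external.sanitizeUTF;
    validate(clean);
    assert(clean.canFind(replacementDchar));

    // Auto-decoding algorithms can now walk the string safely.
    auto lines = clean.splitLines;
    assert(lines.length == 3);
    assert(lines[0] == "ok line");
}
```

This version is eager and allocates a new string even when the input is already valid; a rangeified variant, as the TODO suggests, could avoid that cost.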