digitalmars.D.learn - Always std.utf.validate, or rely on exceptions?

SimonN <eiderdaus gmail.com> writes:

Many functions in std.utf throw UTFException when we pass them 
malformed UTF, and many functions in std.string throw 
StringException. From this, I developed a habit of reading user 
files like so, hoping that it traps all malformed UTF:

     try {
         // call D standard lib on string from file
     }
     catch (Exception e) {
         // treat file as bogus
         // log e.msg
     }

But std.string.stripRight!string calls std.utf.codeLength, which 
never throws on malformed UTF; instead, it asserts false on errors:

     ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
         if (isSomeChar!C)
     {
         static if (C.sizeof == 1)
         {
             if (c <= 0x7F) return 1;
             if (c <= 0x7FF) return 2;
             if (c <= 0xFFFF) return 3;
             if (c <= 0x10FFFF) return 4;
             assert(false);
         }
         // ...
     }

Apparently, once my code calls stripRight, I should be sure that 
this string contains only well-formed UTF. Right now, my code 
doesn't guarantee that.

Should I always validate text from files manually with 
std.utf.validate?

Or should I memorize which functions throw, then validate 
manually whenever I call the non-throwing UTF functions? What is 
the pattern behind what throws and what asserts false?

-- Simon

Mar 02 2017

ketmar <ketmar ketmar.no-ip.org> writes:

SimonN wrote:

 Should I always validate text from files manually with std.utf.validate?

 Or should I memorize which functions throw, then validate manually 
 whenever I call the non-throwing UTF functions? What is the pattern 
 behind what throws and what asserts false?

i'd say: "ALWAYS validate before ANY further processing".

Mar 02 2017

Kagamin <spam here.lot> writes:

On Thursday, 2 March 2017 at 16:20:30 UTC, SimonN wrote:
 Should I always validate text from files manually with 
 std.utf.validate?

 Or should I memorize which functions throw, then validate 
 manually whenever I call the non-throwing UTF functions? What 
 is the pattern behind what throws and what asserts false?

If you expect files with malformed UTF that can cause you trouble 
and want to handle them gracefully, pass their content through the 
validator and catch the validator's exception. Functions that work 
with strings usually assume valid UTF and can behave incorrectly 
on malformed UTF.

Mar 02 2017
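
A minimal sketch of the "validate first, catch the validator's exception" advice above. The helper name isWellFormedUtf8 is hypothetical and not part of Phobos; only std.utf.validate and std.utf.UTFException are taken from the standard library.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `raw` is well-formed UTF-8.
bool isWellFormedUtf8(const(ubyte)[] raw)
{
    try
    {
        // validate throws UTFException on malformed input.
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        // The caller can treat the data as bogus instead of crashing later.
        return false;
    }
}

unittest
{
    ubyte[] good = [0x68, 0x69];   // "hi"
    ubyte[] bad  = [0xC3, 0x28];   // lead byte without a continuation byte
    assert(isWellFormedUtf8(good));
    assert(!isWellFormedUtf8(bad));
}
```

Once this check passes, the string can be handed to functions such as stripRight that assume valid UTF.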

Kagamin <spam here.lot> writes:

On Thursday, 2 March 2017 at 17:03:01 UTC, Kagamin wrote:
 Functions that work with strings usually assume valid UTF and can 
 behave incorrectly on malformed UTF.

Or rather, they report an unrecoverable error that terminates the 
process.

Mar 02 2017
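
To illustrate why the catch (Exception) pattern from the first post cannot trap the assert(false) in codeLength: a failed assert throws core.exception.AssertError, which derives from Error rather than Exception. The compile-time checks below sketch that hierarchy. As far as I know, in -release builds assert(false) is kept and compiled to a halt instruction, so there is nothing to catch at all.

```d
import core.exception : AssertError;
import std.utf : UTFException;

// These checks are evaluated at compile time.
static assert( is(UTFException : Exception)); // trapped by catch (Exception)
static assert( is(AssertError  : Error));     // an Error, meant to be unrecoverable
static assert(!is(AssertError  : Exception)); // so catch (Exception) never sees it
```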

SimonN <eiderdaus gmail.com> writes:

ketmar wrote:
 i'd say: "ALWAYS validate before ANY further processing".

On Thursday, 2 March 2017 at 17:03:01 UTC, Kagamin wrote:
 If you expect files with malformed UTF that can cause you 
 trouble and want to handle them gracefully, pass their content 
 through the validator and catch the validator's exception.

Thanks. I still call std.stdio.byLine or std.stdio.lines on the 
raw data; this seems robust against random binary blobs. Then I 
validate each line before calling anything else.

-- Simon

Mar 02 2017
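
The per-line approach described in the last post might look roughly like this. It is only a sketch: readValidLines and the demo file name are hypothetical, and skipping bad lines is one possible policy (rejecting the whole file is another).

```d
import std.stdio : File;
import std.utf : validate, UTFException;

/// Hypothetical helper: returns the well-formed lines of a file, skipping the rest.
string[] readValidLines(string path)
{
    string[] good;
    foreach (line; File(path).byLine)
    {
        try
        {
            validate(line);    // throws UTFException on malformed UTF-8
            good ~= line.idup; // byLine reuses its buffer, so copy the line
        }
        catch (UTFException)
        {
            // Skip the malformed line; a real program might log it here.
        }
    }
    return good;
}

unittest
{
    import std.file : write, append, remove, tempDir;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "utf_lines_demo.txt"); // hypothetical name
    scope (exit) remove(path);

    ubyte[] bad = [0xC3, 0x28, 0x0A]; // malformed sequence plus newline
    write(path, "good\n");
    append(path, bad);
    append(path, "ok\n");

    assert(readValidLines(path) == ["good", "ok"]);
}
```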
