digitalmars.D.learn - Error: 4invalid UTF-8 sequence :: How can I catch this?? (or otherwise

Charles Hixson <charleshixsn earthlink.net> writes:
I want to read a bunch of files, and if they aren't UTF, then I want to
list their names for conversion or other processing.  How should this
be handled?

try..catch..finally blocks just ignore this error.
Oct 21 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
 type stuff.d

    import std.stdio;
    import std.utf;

    void main()
    {
        try
        {
            writefln("A B \xfe C");
        }
        catch( UtfException e )
        {
            writefln("I caught a %s!", e);
        }
    }

 dmd stuff && stuff

    A B I caught a 4invalid UTF-8 sequence!

Works for me.
Oct 21 2009
Charles Hixson <charleshixsn earthlink.net> writes:
Sorry, the error is on the read. The code I tried to use was:

    try
    {
        lin = fil.readLine;
    }
    catch
    {
        writefln("File <<" ~ filIter [curFilNdx] ~ ">> is not a valid UTF file.");
        fil.close;
        getLine;
        return;
    }
    finally { }
    debug (9) writefln ("lin = <<" ~ lin ~ ">>");
    try
    {
        validate (lin);
    }
    catch (UtfException ue)
    {
        writefln ("File <<" ~ filIter [curFilNdx] ~ ">> is not a valid UTF file.");
        fil.close;
        getLine;
        return;
    }

where fil is a File and getLine is one of my routines that automatically switches to the next file if the current file has been closed.
Oct 21 2009
Charles Hixson <charleshixsn earthlink.net> writes:
For some reason, when I explicitly put (UtfException ue) on the catch statement that I'd been trying to use to catch everything (i.e., just a blank catch), it works. I'm not sure whether I misunderstand how the unlabeled catch works in D, or whether something really strange is going on. The documentation seems to say that an unlabeled catch statement catches everything, but it doesn't catch the UtfException. When the UtfException is explicitly listed, it works. (Admittedly, I altered the code a lot, trying lots of different things, before I tried just using an explicit catch (UtfException ue).)

What I finally ended up with that worked was:

    while (!curFile.eof)
    {
        ...
        try
        {
            s = curFile.readLine;
            std.utf.validate (s);
        }
        catch (UtfException ue)
        {
            writef ("\n err at <<" ~ fileName ~ ">>line " ~ std.string.toString (line));
            if (++errs > 3)
            {
                writefln ("\ntoo many errs");
                break;
            }
        }
    }

with curFile a std.File. I don't know whether a BufferedFile would have worked.
Nov 01 2009
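For readers on a current compiler, the per-line loop from the post above might look roughly like the sketch below. It rests on several assumptions that are not in the thread: present-day Phobos, where the exception class is spelled UTFException rather than UtfException; File.byLine from std.stdio in place of the older stream classes; and a hypothetical helper name, countBadLines. As far as I know, byLine hands back the raw bytes of each line without checking them, so the check happens in validate. The catch names the exception type explicitly, as the post found necessary.

```d
import std.stdio : File, writefln;
import std.utf : validate, UTFException;

/// Hypothetical helper: reports lines of fileName that are not valid UTF-8
/// and returns how many were found, stopping once more than maxErrs are seen.
size_t countBadLines(string fileName, size_t maxErrs = 3)
{
    size_t errs = 0;
    size_t lineNo = 0;
    auto file = File(fileName, "rb");

    // Assumption: byLine yields each line as raw char[] without validating it.
    foreach (line; file.byLine)
    {
        ++lineNo;
        try
        {
            validate(line); // throws UTFException on a malformed sequence
        }
        catch (UTFException ue)
        {
            writefln("err at <<%s>> line %s", fileName, lineNo);
            if (++errs > maxErrs)
            {
                writefln("too many errs");
                break;
            }
        }
    }
    return errs;
}
```

Calling countBadLines on each file name and listing those with a nonzero result would give the list of files that need conversion.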
Charles Hixson <charleshixsn earthlink.net> writes:
OK. One approach that occurs to me is to read the data in as a byte stream, break it into lines, and validate the lines. But validate requires an array of chars, so this seems to put me right back where I was. Unless, perhaps, I can cast an array of bytes into an array of chars without having it throw an "Error: 4invalid UTF-8 sequence", then validate the entire array. But if I do that on a partial read, I won't know where the break should be, so I might only get half of a legitimate UTF-8 character, and so it would legitimately throw UtfException, even though the file was good. I'm sure there are ways around that, but it really seems a roundabout way to proceed for something that should be easy.

P.S.: As before, the actual code that throws the error is:

    try
    {
        lin = fil.readLine;
    }
    catch
    {
        writefln("File <<" ~ filIter [curFilNdx] ~ ">> is not a valid UTF file.");
        fil.close;
        getLine;
        return;
    }
    finally { }
    debug (9) writefln ("lin = <<" ~ lin ~ ">>");
    try
    {
        validate (lin);
    }
    catch (UtfException ue)
    {
        writefln ("File <<" ~ filIter [curFilNdx] ~ ">> is not a valid UTF file.");
        fil.close;
        getLine;
        return;
    }

where fil is a File and getLine is one of my routines that automatically switches to the next file if the current file has been closed.
Oct 22 2009
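The byte-stream idea in the post above can be sketched as follows. This is an illustration only, assuming present-day Phobos (std.file.read, and std.utf.UTFException rather than the UtfException spelling used in the thread); the helper names isValidUtf8 and isValidUtf8File are hypothetical. Reading the whole file in one go sidesteps the concern about cutting a multi-byte character in half, and a cast from bytes to chars only reinterprets the data, so nothing is checked until validate runs.

```d
import std.file : read;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if the bytes form a valid UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // The cast is a plain reinterpretation; validate does the checking.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

/// Hypothetical helper: reads the whole file and validates it in one pass,
/// so no multi-byte character can be split across two reads.
bool isValidUtf8File(string fileName)
{
    return isValidUtf8(cast(const(ubyte)[]) read(fileName));
}
```

Looping over the file names and printing those for which isValidUtf8File returns false gives the list wanted in the original question, at the cost of holding each file in memory while it is checked.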