digitalmars.D.learn - Character recognition and output

Tyro <ridimz yahoo.com> writes:

Wondering if someone can point me in the right direction on a small
problem.

I'm attempting to parse(?) a file with the following
string "�������������" embedded somewhere in it. When I try to
output the information, however, writef() chokes if it comes across
one of these characters. I thought that this was simply a writef
[doFormat] problem, so I tried to read the file using Christopher
Miller's sample richtext viewer that accompanies DFL, and the same
thing happens (Error: 4invalid UTF-8 sequence). I tried different
combinations of wchar[], dchar[], and byte[], but to no avail. How
do I fix this?

import std.stdio: emitln = writefln, emit = writef;
import std.file: exists, read;

void main (char[][] args)
{
  if (args.length == 2 && args[1].exists())
  {
    char[] file = cast(char[])args[1].read();
    foreach(sizendx, char ch; file)
    {
      try { emit(ch); }             // terminates on �
      catch { emit(" ");continue; }
    }
  }
  else
    emit ("usage is: ids filename");
}

Andrew Edwards

Nov 06 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Seems to me to be an encoding problem.
Even my Mozilla Thunderbird client doesn't recognize the characters; it
prints little diamonds with a question mark inside (the encoding is set
to UTF-8).

I think the standard library is written to deal mainly with Unicode text
only.

If it's just one file (or a couple of them), the easiest way to
transcode it is probably to open it with Notepad and then save it
again with UTF-8 encoding.

Nov 06 2006
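
For more than a couple of files, the transcoding suggested above could also be done in code. What follows is only a minimal sketch, written for current D (D2) and Phobos rather than the 2006 library used in the thread. It assumes the offending bytes are Latin-1 (ISO-8859-1), which the thread never confirms; the helper name toUtf8 is hypothetical. The whole buffer is validated first, because printing a multi-byte sequence one char at a time, as the original loop does, cannot succeed.

```d
import std.file : read;
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Returns `raw` as UTF-8 text (hypothetical helper, not from the thread).
/// Assumption: input that is not valid UTF-8 is Latin-1 (ISO-8859-1),
/// where each byte value equals its Unicode code point.
string toUtf8(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw;
    try
    {
        validate(text);      // throws UTFException on an invalid sequence
        return text.idup;    // already valid UTF-8, copy unchanged
    }
    catch (UTFException e)
    {
        string result;
        foreach (b; raw)
            result ~= cast(dchar) b;   // appending a dchar encodes it as UTF-8
        return result;
    }
}

void main(string[] args)
{
    if (args.length != 2)
    {
        writeln("usage is: ids filename");
        return;
    }
    // Read the file as raw bytes so no encoding is assumed up front.
    auto raw = cast(const(ubyte)[]) read(args[1]);
    writeln(toUtf8(raw));
}
```

If the file turns out to use a different 8-bit code page, the byte-to-code-point mapping in the catch branch would need a lookup table instead of a plain cast.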
