digitalmars.D.learn - ANSI - output with phobos

reply me <me nospamusa.com> writes:
for(char c = 0; c < c.max; c++)
    writefln(c);

In a not too distant past the above code could produce the entire ANSI table,
however this is not the case today. Today it peters out at 127 and any code
beyond that cannot be displayed. The error message produced is:

  Error: 4invalid UTF-8 sequence

Please provide some guidance on how to accomplish this in present D.

Thanks,
Drew
Apr 03 2007
next sibling parent reply me <me nospamusa.com> writes:

First let me apologize for the double post. I am aware that printf() can still be used to achieve the desired result. However, I'm interested in accomplishing this through writef()/writefln().

Thanks again,
Drew
Apr 03 2007
next sibling parent Derek Parnell <derek nomail.afraid.org> writes:
On Tue, 03 Apr 2007 20:26:49 -0400, me wrote:

 me Wrote:
 
 for(char c = 0; c < c.max; c++)
     writefln(c);
 
 In a not too distant past the above code could produce the entire ANSI table,
however this is not the case today. Today it peters out at 127 and any code
beyond that cannot be displayed. The error message produced is:
 
   Error: 4invalid UTF-8 sequence
 
 Please provide some guidance on how to accomplish this in present D.
 
 Thanks,
 Drew


You seem to want to display the characters of the console's current code-page.
 I am aware that printf() can still be used to achieve the desired result.

So I guess the issue you are trying to resolve is how to convert code-page characters into UTF-8 form. Character values 128-255 are displayed on the Windows console using the console's current code-page to select the appropriate glyph. To get the same glyph to display using Unicode (which is the only character set that D supports), you would have to set the console to a Unicode "code-page" and manually convert the character values from the code-page you were assuming to the equivalent Unicode values. Not a trivial task at all.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
4/04/2007 11:05:14 AM
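
As a rough illustration of the manual conversion described above, here is a minimal sketch. It assumes a current D2 Phobos (not the 2007 library this thread is about), assumes the bytes are Windows-1252 (the code page is a guess; a console may well use another one), and uses `std.encoding.transcode` with `Windows1252String`. The helper name `cp1252ToUtf8` is made up for this example. The output only shows the right glyphs if the console itself is set to UTF-8.

```d
import std.encoding : transcode, Windows1252String;
import std.stdio : writeln;

/// Treats raw bytes as Windows-1252 text and converts them to UTF-8.
/// cp1252ToUtf8 is a hypothetical helper name, not a Phobos function.
/// Bytes that Windows-1252 leaves undefined (such as 0x81) are not
/// handled here.
string cp1252ToUtf8(immutable(ubyte)[] bytes)
{
    // Same element size, so the cast only relabels the bytes.
    auto source = cast(Windows1252String) bytes;
    string result;
    transcode(source, result);
    return result;
}

void main()
{
    // Assumed sample input: 'H', then 0xE4 and 0x80, which Windows-1252
    // defines as a-umlaut and the euro sign.
    immutable(ubyte)[] raw = [0x48, 0xE4, 0x80];
    writeln(cp1252ToUtf8(raw));
}
```

A different code page would need a different source type or a hand-written mapping table, which is where the "not trivial" part comes in.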
Apr 03 2007
prev sibling parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
me wrote:
 for(char c = 0; c < c.max; c++)
     writefln(c);

 In a not too distant past the above code could produce the entire ANSI table,
however this is not the case today. Today it peters out at 127 and any code
beyond that cannot be displayed. The error message produced is:

   Error: 4invalid UTF-8 sequence

 Please provide some guidance on how to accomplish this in present D.

First let me apologize for the double post. I am aware that printf() can still be used to achieve the desired result. However, I'm interested in accomplishing this through writef()/writefln();

Not possible. Just use the C library, writing a wrapper around it if you don't want to worry about whether strings are zero-terminated all the time.

-- 
Remove ".doesnotlike.spam" from the mail address.
Apr 04 2007
prev sibling next sibling parent Derek Parnell <derek nomail.afraid.org> writes:
On Tue, 03 Apr 2007 20:16:06 -0400, me wrote:

 for(char c = 0; c < c.max; c++)
     writefln(c);
 
 In a not too distant past the above code could produce the entire ANSI table,
however this is not the case today. Today it peters out at 127 and any code
beyond that cannot be displayed. The error message produced is:
 
   Error: 4invalid UTF-8 sequence
 
 Please provide some guidance on how to accomplish this in present D.
 

A 'char' whose numeric value is above 127 and less than 256 is not a valid UTF-8 character on its own, and the function 'writefln' expects 'char' values to be UTF-8. So, to do what you want, you must either not use writefln or not use 'char' types.

import std.stdio;

void main()
{
    // ubyte holds the raw value with no UTF-8 interpretation.
    // Note that c < c.max stops at 254, so 255 is never printed.
    for(ubyte c = 0; c < c.max; c++)
    {
        // Only values up to 127 are shown as characters.
        if (c <= 127)
            writef("'%s' ", cast(char)c);
        writefln(c);   // the numeric value
    }
}

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
4/04/2007 10:22:49 AM
Apr 03 2007
prev sibling parent reply Juan Jose Comellas <jcomellas gmail.com> writes:
The problem is that the 'char' type can only contain valid UTF-8
*characters*. A character in UTF-8 can be composed of 1 to 4 *bytes*, and
not all of the values a byte can take are valid in UTF-8. In fact, most of
the byte values above 127 are not valid. You have two options: 1) use the
wchar type (the Latin 1/ISO8859-1 character set is very similar to ANSI and
all of its characters are 2 byte-wide when mapped to the UTF-16 character
set); 2) manually convert the 'ANSI' value into UTF-8.
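
Both options can be sketched briefly. The following assumes a current D2 Phobos rather than the 2007 library, and assumes the 'ANSI' values are really ISO 8859-1, where each byte value equals its Unicode code point (strictly, a byte above 127 is never a complete UTF-8 sequence by itself, so it has to be re-encoded as two bytes). The helper name `latin1ToUtf8` is made up for this example; `std.utf.encode` is the Phobos function that does the encoding.

```d
import std.stdio : write, writeln;
import std.utf : encode;

/// Converts ISO 8859-1 bytes to a UTF-8 string.
/// latin1ToUtf8 is a hypothetical helper name, not a Phobos function.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        char[4] buf;
        // In Latin-1 the byte value is the Unicode code point.
        immutable len = encode(buf, cast(dchar) b);
        result ~= buf[0 .. len];
    }
    return result;
}

void main()
{
    // Option 1: each Latin-1 value fits in a single wchar, and the
    // output functions encode it properly when writing.
    foreach (int i; 0xA0 .. 0x100)
    {
        wchar c = cast(wchar) i;
        write(c);
    }
    writeln();

    // Option 2: convert the raw bytes to UTF-8 by hand, then print.
    const(ubyte)[] raw = [0x48, 0xE4, 0xFF];
    writeln(latin1ToUtf8(raw));
}
```

The loop starts at 0xA0 because the values 0x80 to 0x9F are control codes in ISO 8859-1 and have no glyphs to show.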

For more information I suggest reading this:

http://en.wikipedia.org/wiki/Utf-8
http://en.wikipedia.org/wiki/Utf-16


Apr 04 2007
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Juan Jose Comellas wrote:
 The problem is that the 'char' type can only contain valid UTF-8
 *characters*. A character in UTF-8 can be composed of 1 to 4 *bytes*, and
 not all of the values a byte can take are valid in UTF-8. In fact, most of
 the byte values above 127 are not valid. You have two options: 1) use the
 wchar type (the Latin 1/ISO8859-1 character set is very similar to ANSI and
 all of its characters are 2 byte-wide when mapped to the UTF-16 character
 set); 2) manually convert the 'ANSI' value into UTF-8.
 
 For more information I suggest reading this:
 
 http://en.wikipedia.org/wiki/Utf-8
 http://en.wikipedia.org/wiki/Utf-16

Here's another one (shameless plug):

http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
Apr 04 2007
prev sibling parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Juan Jose Comellas wrote:
 The problem is that the 'char' type can only contain valid UTF-8
 *characters*. A character in UTF-8 can be composed of 1 to 4 *bytes*, and
 not all of the values a byte can take are valid in UTF-8. In fact, most of
 the byte values above 127 are not valid. You have two options: 1) use the
 wchar type (the Latin 1/ISO8859-1 character set is very similar to ANSI and
 all of its characters are 2 byte-wide when mapped to the UTF-16 character
 set); 2) manually convert the 'ANSI' value into UTF-8.

3) Use ubyte (or use char, but be careful about what functions you pass non-UTF-8 chars to), and print using the C standard library.

One might have to output a string without knowing its encoding, thus making it impossible to convert it to a UTF encoding reliably.

-- 
Remove ".doesnotlike.spam" from the mail address.
Apr 04 2007
parent reply Don Clugston <dac nospam.com.au> writes:
Deewiant wrote:
 One might have to output a string without knowing its encoding, thus making it
 impossible to convert it to a UTF encoding reliably.

Then how can you know how to output it?
Apr 05 2007
parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Don Clugston wrote:
 Deewiant wrote:
 One might have to output a string without knowing its encoding, thus making it
 impossible to convert it to a UTF encoding reliably.

Then how can you know how to output it?

Just pass the bytes to the console, and let the user worry about how it's displayed. If you write "\xe4" to a file, you expect the file to contain the byte 0xE4. If you write it to a console, the console should display whichever character 0xE4 maps to in the current character set.

-- 
Remove ".doesnotlike.spam" from the mail address.
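
The "just pass the bytes" approach might look like the sketch below. It assumes a current D2 Phobos, where `File.rawWrite` writes a buffer without any UTF-8 checking, which may or may not match what was available in 2007. The helper name `writeRaw` and the file name are made up for this example.

```d
import std.file : read, remove;
import std.stdio : File, stdout;

/// Writes the bytes unchanged, with no UTF-8 validation or conversion.
/// writeRaw is a hypothetical helper name, not a Phobos function.
void writeRaw(File f, const(ubyte)[] bytes)
{
    f.rawWrite(bytes);
}

void main()
{
    const(ubyte)[] data = [0xE4];
    const(ubyte)[] newline = [0x0A];

    // The console decides which glyph, if any, 0xE4 maps to.
    writeRaw(stdout, data);
    writeRaw(stdout, newline);

    // Written to a file, the same byte is stored exactly as given.
    auto path = "raw_bytes_demo.bin"; // hypothetical file name
    auto f = File(path, "wb");
    writeRaw(f, data);
    f.close();

    auto stored = cast(const(ubyte)[]) read(path);
    assert(stored == data);
    remove(path);
}
```

The trade-off is the one raised in the question above: the program no longer knows or controls what the user sees, only which bytes were sent.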
Apr 05 2007