digitalmars.D.learn - Displaying non UTF-8 8 bit character codes with writefln()

Graham <GC <grahamc001uk nospam-yahoo.co.uk>> writes:
Is there an easy way of displaying non-UTF-8 8-bit character codes with writefln()?

E.g. code like:

writefln("elapsed time %.9f \&micro;S", elapsed_time);

on a Windows system displays output like:

elapsed time 2.598202392 µS

(displayed when running in a cmd.exe window)

The µ is output as the two bytes 0xC2 0xB5, which is the UTF-8
encoding of µ.
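
A quick way to confirm those two bytes is to dump the code units of the string. This is only a sketch for a current D compiler (std.format and writeln postdate this thread), and hexBytes is a made-up helper name, not something from Phobos. It also hints at why a lone \xB5 is rejected: 0xB5 on its own is a UTF-8 continuation byte with no lead byte in front of it.

```d
import std.stdio;
import std.format : format;

// Hypothetical helper: lists the UTF-8 code units of s as hex.
string hexBytes(string s)
{
    string result;
    foreach (i, char c; s)          // iterating as char visits each byte, no decoding
    {
        if (i > 0) result ~= " ";
        result ~= format("%02X", cast(ubyte) c);
    }
    return result;
}

void main()
{
    writeln(hexBytes("\u00B5"));    // prints C2 B5
    writeln("\u00B5".length);       // prints 2: length counts code units, not characters
}
```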

Code like:
writefln("elapsed time %.9f \u00B5S", elapsed_time);

displays the same

and code like:
writefln("elapsed time %.9f \xB5S", elapsed_time);

understandably displays the run-time error:
Error: 4invalid UTF-8 sequence

Trying a WYSIWYG string like:
writefln("elapsed time %.9f " r"µ" "S", elapsed_time);

displays a compiler error: invalid UTF-8 sequence

Is there any simple way to output a non-UTF-8 string containing
the B5 character code without the C2 prefix?
Oct 04 2007
Regan Heath <regan netmail.co.nz> writes:
Try printf and saving the file as a UTF-8 encoded text file...

--[b5.d]--
import std.stdio;

void main()
{
	printf("\&micro;\n");
	printf("\u00B5\n");
	printf("\xB5\n");  //doesn't output anything
	writefln("µ");
}

With this source saved as b5.d in a UTF-8 encoded text file (IMPORTANT),
I can set my command prompt font to "Lucida Console" and execute the
following commands:

E:\D\src\tmp>chcp 65001
Active code page: 65001

E:\D\src\tmp>dmd -run b5.d
µ
µ
µ

The 3rd printf doesn't output anything (not sure why); the others all
output the same character.

chcp 65001 changes the console to the UTF-8 code page :)
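
The same switch can probably be made from inside the program, so the user does not have to type chcp first. The sketch below assumes a current D compiler and druntime's Windows bindings (core.sys.windows.windows), neither of which is what this thread was written against; message is a made-up helper. The console font still has to contain the character.

```d
import std.stdio;
import std.format : format;

version (Windows) import core.sys.windows.windows;

// Builds the line from the original question; a D string literal is UTF-8.
string message(double elapsed)
{
    return format("elapsed time %.9f \u00B5S", elapsed);
}

void main()
{
    version (Windows)
    {
        // Remember the user's code page and restore it on exit.
        immutable oldCp = GetConsoleOutputCP();
        SetConsoleOutputCP(65001);              // 65001 = UTF-8, same as "chcp 65001"
        scope (exit) SetConsoleOutputCP(oldCp);
    }
    writeln(message(2.598202392));
}
```

On other platforms the version block compiles away and the line is written unchanged.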

Regan
Oct 04 2007
Graham <GC <grahamc001uk nospam-yahoo.co.uk>> writes:
Regan Heath Wrote:

 Try printf and saving the file as a UTF-8 encoded text file...
 
Thanks. I was hoping for something more elegant, but if all char variables in Phobos have to be UTF-8, I guess this is the only way.
Oct 04 2007
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
"Regan Heath" <regan netmail.co.nz> wrote in message 
news:fe2uf5$2gsa$1 digitalmars.com...
 Try printf and saving the file as a UTF-8 encoded text file...
Why, exactly, are you advocating going back to the printf abomination?

<snip>
 Using this source saved as b5.d as a UTF-8 encoded text file (IMPORTANT) I 
 can set my command prompt font to "Lucida Console" and execute the 
 following commands:

 E:\D\src\tmp>chcp 65001
 Active code page: 65001
<snip>

This misses the point slightly.  The user shouldn't have to change the codepage just to get someone else's application to work properly.

What you want is my utility library:
http://pr.stewartsplace.org.uk/d/sutil/

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies on the 'group where everybody may benefit.
Oct 05 2007
Graham <GC <grahamc001uk nospam-yahoo.co.uk>> writes:
Stewart Gordon Wrote:
 
 What you want is my utility library:
 http://pr.stewartsplace.org.uk/d/sutil/
 
 Stewart.
 
 -- 
Thanks, that's nice.

By the way, I spotted some minor errors on a couple of your documentation pages:

ConsoleOutput referring to ConsoleInput in second column on
http://pr.stewartsplace.org.uk/d/sutil/ref/annotated.html

and the subtitle on
http://pr.stewartsplace.org.uk/d/sutil/ref/classsmjg_1_1libs_1_1util_1_1console_1_1ConsoleOutput.html
is ConsoleInput instead of ConsoleOutput
Oct 05 2007
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
"Graham >" <GC <grahamc001uk nospam-yahoo.co.uk> wrote in message 
news:fe5cp5$bp$1 digitalmars.com...
<snip>
 By the way, I spotted some minor errors on a couple of your documentation 
 pages:

 ConsoleOutput referring to ConsoleInput in second column on
 http://pr.stewartsplace.org.uk/d/sutil/ref/annotated.html

 and the subtitle on
 http://pr.stewartsplace.org.uk/d/sutil/ref/classsmjg_1_1libs_1_1util_1_1console_1_1ConsoleOutput.html
 is ConsoleInput instead of ConsoleOutput
Good catch.  Also noticed quite a few cases where the automatic removal of words like "The ConsoleInput class" in the brief description hasn't worked.

Stewart.
Oct 05 2007
Regan Heath <regan netmail.co.nz> writes:
Stewart Gordon wrote:
 "Regan Heath" <regan netmail.co.nz> wrote in message 
 news:fe2uf5$2gsa$1 digitalmars.com...
 Try printf and saving the file as a UTF-8 encoded text file...
 Why, exactly, are you advocating going back to the printf abomination?
Well.. there were 2 ways to solve his problem:

1. avoid the valid UTF-8 character check.
2. make the console display UTF-8 correctly.

For the first:

  printf("%c\n", 230);

For the second:

  writefln("\u00B5");

or save the file as UTF-8 and use

  writefln("µ");
 <snip>
 Using this source saved as b5.d as a UTF-8 encoded text file 
 (IMPORTANT) I can set my command prompt font to "Lucida Console" and 
 execute the following commands:

 E:\D\src\tmp>chcp 65001
 Active code page: 65001
 <snip>
 This misses the point slightly.  The user shouldn't have to change the codepage just to get someone else's application to work properly.
Sadly, if the application is outputting UTF-8 you don't have a choice.
 What you want is my utility library:
 http://pr.stewartsplace.org.uk/d/sutil/
Cool.  You're converting UTF-8 to the console code page, I assume.

Regan
Oct 05 2007
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
"Regan Heath" <regan netmail.co.nz> wrote in message 
news:fe5d88$15l$1 digitalmars.com...
<snip>
 1. avoid the valid utf-8 character check.
 2. make the console display utf-8 correctly.


   printf("%c\n", 230);
No I gottan't.  I could use putchar, puts or OutputStream.writeString for example.

<snip>
 This misses the point slightly.  The user shouldn't have to change the 
 codepage just to get someone else's application to work properly.
Sadly, if the application is outputting UTF-8 you don't have a choice.
But how many DOS or Windows console apps in the real world output UTF-8? Presumably not many, considering that no versions of DOS and only a few versions of Windows support it. There's also a causal loop in that even modern Windows versions don't come with the console code page set to 65001 by default. I don't know what is likely to break this loop, but I doubt that the restrictiveness of one language's standard library is going to do it.
 What you want is my utility library:
 http://pr.stewartsplace.org.uk/d/sutil/
 Cool. You're converting UTF-8 to the console code page I assume.
Exactly.  (Well, as exactly as is possible under the constraints.)

Stewart.
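
To make the idea concrete, here is a rough sketch of such a conversion for code page 850. It is not taken from sutil: toCp850 and encodeCp850 are invented names, the table covers only three non-ASCII characters, and it assumes a current D compiler. A real converter would cover the whole code page or ask Windows to do the mapping.

```d
import std.stdio;

// Hypothetical helper: maps a handful of code points to their CP850 bytes.
ubyte toCp850(dchar c)
{
    if (c < 0x80)
        return cast(ubyte) c;             // ASCII is identical in CP850
    switch (c)
    {
        case '\u00B5': return 0xE6;       // micro sign (the 230 used with printf above)
        case '\u00E9': return 0x82;       // e with acute
        case '\u00FC': return 0x81;       // u with diaeresis
        default:       return 0x3F;       // '?' for anything this sketch doesn't know
    }
}

// Hypothetical helper: decodes a UTF-8 string and re-encodes it as CP850 bytes.
ubyte[] encodeCp850(string s)
{
    ubyte[] result;
    foreach (dchar c; s)                  // foreach over dchar decodes the UTF-8
        result ~= toCp850(c);
    return result;
}

void main()
{
    auto bytes = encodeCp850("elapsed time 2.598202392 \u00B5S\n");
    stdout.rawWrite(bytes);               // raw bytes, so no UTF-8 validity check applies
}
```

On a console that is not using code page 850 the micro sign byte will show as something else, which is the "constraints" part.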
Oct 05 2007
Regan Heath <regan netmail.co.nz> writes:
Stewart Gordon wrote:
 "Regan Heath" <regan netmail.co.nz> wrote in message 
 news:fe5d88$15l$1 digitalmars.com...
 <snip>
 1. avoid the valid utf-8 character check.
 2. make the console display utf-8 correctly.


   printf("%c\n", 230);
 No I gottan't.  I could use putchar, puts or OutputStream.writeString for example.
Sure, except the OP wanted formatting.  At the end of the day, as long as you know what you're doing, using printf isn't going to kill you.
 <snip>
 This misses the point slightly.  The user shouldn't have to change 
 the codepage just to get someone else's application to work properly.
Sadly, if the application is outputting UTF-8 you don't have a choice.
But how many DOS or Windows console apps in the real world output UTF-8?
Everything written in D using writefln from Phobos ;) Even if you're only outputting ASCII characters (a subset of UTF-8, as I'm sure you know), you have the ability to output the full range of Unicode code points, and really we need a console which can handle that.
 Presumably not many, considering that no versions of DOS and only a few 
 versions of Windows support it.  There's also a causal loop in that even 
 modern Windows versions don't come with the console code page set to 
 65001 by default.  I don't know what is likely to break this loop, but I 
 doubt that the restrictiveness of one language's standard library is 
 going to do it.
True.  I wonder what the Vista console defaults to?  Are they still using local code pages, or are they using UTF-8 or UTF-16 (perhaps more likely)?
 What you want is my utility library:
 http://pr.stewartsplace.org.uk/d/sutil/
Cool. You're converting UTF-8 to the console code page I assume.
Exactly. (Well, as exactly as is possible under the constraints.)
:) Regan
Oct 05 2007
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
"Regan Heath" <regan netmail.co.nz> wrote in message 
news:fe5g9k$5i6$1 digitalmars.com...
<snip>
 True.  I wonder what the vista console defaults to?  Are they still using 
 local code pages or are they using UTF-8 or UTF-16 (perhaps more likely)
<snip>

Mine defaults to 850.  (Strange - British installations of MS-DOS back in the day always defaulted to 437 as far as my experience goes.  Sometimes under Win9x, you would get the anomaly of 437 in full screen mode, but a console font in windowed mode that's set up for 850.)

But having it use UTF-16 would break far too many programs.  There is, however, a function ReadConsoleW, which reads characters in UTF-16 regardless of the active code page.  But it doesn't work if stdin is redirected.  I also found that ReadFile doesn't handle UTF-8 console input properly.  Look at the way my library uses the two functions, each to get around the problems with the other depending on circumstance.

Stewart.
Oct 05 2007
Graham <GC <grahamc001uk nospam-yahoo.co.uk>> writes:
After searching back a bit further than before I see this was discussed
in April and the answer was to use printf for the 8 bit string.

something like:

writef("elapsed time %.9f", elapsed_time);
printf(" \xB5S\n");

does work, but if anybody has a more elegant solution please let me know.
Oct 04 2007
"Aziz K." <aziz.kerim gmail.com> writes:
Graham wrote:
 After searching back a bit further than before I see this was discussed
 in April and the answer was to use printf for the 8 bit string.

 something like:

 writef("elapsed time %.9f", elapsed_time);
 printf(" \xB5S\n");

 does work, but if anybody has a more elegant solution please let me know.
Hi,

There's a better solution.  You could switch to the Tango library, which uses WriteConsoleW() internally to correctly write Unicode characters on the Windows console.

Regards,
Aziz
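
For anyone who wants to see the approach without switching libraries, the sketch below calls WriteConsoleW directly. It is not Tango's code: writeConsole is an invented name, and it assumes a current D compiler with druntime's Windows bindings. The string is converted to UTF-16 first, and plain write is used when the call fails (for example when stdout is redirected to a file) or when not on Windows.

```d
import std.stdio;
import std.utf : toUTF16;

version (Windows) import core.sys.windows.windows;

// Hypothetical helper: writes s to the console as UTF-16 where possible.
void writeConsole(string s)
{
    version (Windows)
    {
        wstring w = toUTF16(s);           // WriteConsoleW expects UTF-16 code units
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written;
        if (WriteConsoleW(h, w.ptr, cast(DWORD) w.length, &written, null))
            return;
        // WriteConsoleW fails on a redirected handle, so fall back below.
    }
    write(s);
}

void main()
{
    writeConsole("elapsed time 2.598202392 \u00B5S\n");
}
```

Formatting can still be done beforehand with the usual string formatting functions, so the output call only ever sees a finished string.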
Oct 04 2007