digitalmars.D - Activating UTF-8 in Windows Console: CHCP

"Simon Buchan" <currently no.where> writes:
The console command chcp changes the current console's code page, so
chcp 65001 tells the console to use UTF-8. Make sure you aren't using
raster fonts!
(/me makes "whee!" noises while running around in circles)
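
For anyone who would rather switch the code page from inside a program than type chcp first, here is a minimal sketch in current D. It assumes druntime's core.sys.windows.wincon bindings for the Win32 calls GetConsoleOutputCP and SetConsoleOutputCP; the helper names useUtf8Console and restoreConsole are made up for this example. It only changes the code page, so whether the text renders correctly still depends on the console font.

```d
import std.stdio : writeln;

version (Windows)
    import core.sys.windows.wincon : GetConsoleOutputCP, SetConsoleOutputCP;

/// Windows identifier for the UTF-8 code page (the 65001 passed to chcp).
enum uint utf8CodePage = 65001;

/// Switches console output to UTF-8 and returns the previous code page.
/// Hypothetical helper; on non-Windows systems it does nothing and returns 0.
uint useUtf8Console()
{
    version (Windows)
    {
        immutable previous = GetConsoleOutputCP();
        SetConsoleOutputCP(utf8CodePage);
        return previous;
    }
    else
    {
        return 0;
    }
}

/// Restores a code page saved by useUtf8Console (no-op when it is 0).
void restoreConsole(uint codePage)
{
    version (Windows)
    {
        if (codePage != 0)
            SetConsoleOutputCP(codePage);
    }
}

void main()
{
    immutable previous = useUtf8Console();
    scope (exit) restoreConsole(previous);

    // D string literals are UTF-8, so these bytes are written unchanged.
    writeln("\u00E0\u00E8\u00EC\u00F2\u00F9"); // a, e, i, o, u with grave accents
}
```

Restoring the saved code page on exit keeps the change from leaking into whatever runs next in the same console window.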

-- 
"Unhappy Microsoft customers have a funny way of becoming Linux,
Salesforce.com and Oracle customers." - www.microsoft-watch.com:
"The Year in Review: Microsoft Opens Up"
--
"I plan on at least one critical patch every month, and I haven't been  
disappointed."
- Adam Hansen, manager of security at Sonnenschein Nath & Rosenthal LLP
(Quote from http://www.eweek.com/article2/0,1759,1736104,00.asp)
--
"It's been a challenge to "reteach or retrain" Web users to pay for  
content, said Pizey"
-Wired website: "The Incredible Shrinking Comic"
Dec 21 2004
"Simon Buchan" <currently no.where> writes:
On Tue, 21 Dec 2004 22:41:40 +1300, Simon Buchan <currently no.where>  
wrote:

 The console command chcp can change the current console's codepage, meaning
 chcp 65001 will tell the console to use UTF-8. Make sure you aren't using
 raster fonts!
 (/me makes "whee!" noises while running around in circles)

Although it seems to be an error to write while not using raster fonts? That is, output written under raster fonts is fine, and it stays fine (and displays correctly!) when you then switch the font to Lucida Console, but writing while Lucida Console is already selected fails. WTF? It gives "Unable to write to stream" after writing and simply ignores the special characters. Anyone know what's going on?
Dec 21 2004
Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <opsjcqjqqhjccy7t simon.mshome.net>, Simon Buchan says...
 The console command chcp can change the current console's codepage, meaning
 chcp 65001 will tell the console to use UTF-8. Make sure you aren't using
 raster fonts!
 (/me makes "whee!" noises while running around in circles)

This gives even stranger results: trying to writef a UTF-8 string terminates the program.
See the following transcript for details:

--------------------------------------------------------------------------
C:\Down\dlang>ver

Microsoft Windows XP [Versione 5.1.2600]

C:\Down\dlang>chcp 850
Tabella codici attiva: 850

C:\Down\dlang>type testUTF.d
�╗┐import std.stdio;
import std.c.stdio;
import std.c.windows.windows;

extern (Windows)
{
    export BOOL CharToOemW(
        LPCWSTR lpszSrc, // string to translate
        LPSTR lpszDst    // translated string
    );
}

int main()
{
    puts("-- untranslated --");
    puts("├�├�├╝├�├�├�├�");
    writef("├�├�├╝├�├�├�├�\n");

    puts("-- translated --");
    wchar[] mess = "├�├�├╝├�├�├�├�";
    char[] OEMmess = new char[mess.length];
    CharToOemW(mess, OEMmess);
    puts(OEMmess);
    writef(OEMmess);

    return 0;
}

C:\Down\dlang>testUTF.exe
-- untranslated --
├�├�├╝├�├�├�├�
├�├�├╝├�├�├�├�
-- translated --
�������
Error: invalid UTF-8 sequence

C:\Down\dlang>chcp 65001
Tabella codici attiva: 65001

C:\Down\dlang>type testUTF.d
import std.stdio;
import std.c.stdio;
import std.c.windows.windows;

extern (Windows)
{
    export BOOL CharToOemW(
        LPCWSTR lpszSrc, // string to translate
        LPSTR lpszDst    // translated string
    );
}

int main()
{
    puts("-- untranslated --");
    puts("�������");
    writef("�������\n");

    puts("-- translated --");
    wchar[] mess = "�������";
    char[] OEMmess = new char[mess.length];
    CharToOemW(mess, OEMmess);
    puts(OEMmess);
    writef(OEMmess);

    return 0;
}

C:\Down\dlang>testUTF.exe
-- untranslated --
�������

C:\Down\dlang>
--------------------------------------------------------------------------

Ciao
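
One likely reading of the "invalid UTF-8 sequence" error in the code page 850 run: CharToOemW fills OEMmess with single-byte OEM characters, while writef treats a char[] as UTF-8 and rejects bytes that do not decode. The sketch below, in current D rather than the 2004 Phobos used in the transcript, shows that difference for one character. It uses std.utf.validate; isValidUtf8 is a hypothetical helper name, and the sample letter is an assumption chosen for illustration.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Returns true when `bytes` form a well-formed UTF-8 sequence.
/// Hypothetical helper wrapping std.utf.validate.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // The letter u with diaeresis: two bytes in UTF-8, one byte in code page 850.
    immutable ubyte[] asUtf8 = [0xC3, 0xBC];
    immutable ubyte[] asOem850 = [0x81];

    writeln(isValidUtf8(asUtf8));   // true
    writeln(isValidUtf8(asOem850)); // false: 0x81 cannot start a UTF-8 sequence
}
```

If that reading is right, OEM-translated bytes are only safe to pass to byte-oriented calls such as puts, not to functions that expect UTF-8 text.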
Dec 22 2004
"Simon Buchan" <currently no.where> writes:
On Wed, 22 Dec 2004 13:46:27 +0000 (UTC), Roberto Mariottini  
<Roberto_member pathlink.com> wrote:

 In article <opsjcqjqqhjccy7t simon.mshome.net>, Simon Buchan says...
 The console command chcp can change the current console's codepage, meaning
 chcp 65001 will tell the console to use UTF-8. Make sure you aren't using
 raster fonts!
 (/me makes "whee!" noises while running around in circles)

See my reply above: I may have been too hasty... This may have something to do with surrogates, etc... Putting cmd.exe in raster fonts, running the program, then changing the font to Lucida Console displays the UTF-8 correctly, but changing it back results in corruption. WTF?

Does anyone know of a command shell that is completely separate from the Windows one? I suppose one could use MSYS (or MinGW?) or something...
Dec 22 2004
Geoff Hickey <ardri comcast.net> writes:
 
 Does anyone know of a command shell that is completely separate from the Windows one?
 I suppose one could use MSYS (or MinGW?) or something...
 

4NT might work. You can download it here: ftp://jpsoft.com/4nt/. It's not free, but there's a trial download. I'm not a 4NT user myself, but quite a few programmers I know are. I have no idea if it supports UTF-8. It does support Unicode, but that might just mean UTF-16.

- Geoff Hickey
Dec 22 2004