digitalmars.D - Character encoding problem
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
How can I print German characters? I've tried the following simple program:
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
edition) I tried Mozilla to save the source code file with different
character encodings but none worked as expected. Here's what I tried using
the current DMD version:
MS-DOS encoding as performed by Microsoft's EDIT editor:
(5) "invalid UTF-sequence"
Western (ISO-8859-1):
(5) "invalid UTF-sequence"
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
Unicode (UTF-16 and UTF-8):
both compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
Anders F Björklund <afb algonet.se> writes:
Mathias Bierschenk wrote:
How can I print German characters? I've tried the following simple program:
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
D only supports Unicode, so *both* your editor and
your terminal must be set to this. (UTF-8, usually)
Does the Windows 98 SE command prompt support Unicode ?
If not, you need to convert before outputting...
--anders
"Simon Buchan" <currently no.where> writes:
On Fri, 19 Nov 2004 12:49:01 +0100, Mathias Bierschenk
<Mathias.Bierschenk web.de> wrote:
How can I print German characters? I've tried the following simple
program:
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
edition) I tried Mozilla to save the source code file with different
character encodings but none worked as expected. Here's what I tried
using the current DMD version:
MS-DOS encoding as performed by Microsoft's EDIT editor:
(5) "invalid UTF-sequence"
Western (ISO-8859-1):
(5) "invalid UTF-sequence"
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
Unicode (UTF-16 and UTF-8):
both compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
The C functions don't like non-Latin chars very much. I had this problem
displaying a file to the console.
Currently, you are best off using either writef or (if you don't want it
formatted) std.stream's stdout.writeString and stdout.writeLine. (You
could of course use writef("%s", yourstring), but I don't like that very
much.)
Be careful: std.stdio and std.stream.stdout aren't synced. (I use
std.stream exclusively.)
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
On Sat, 20 Nov 2004 01:03:14 +1300, Simon Buchan
<currently no.where> wrote:
The C functions don't like non-Latin chars very much. I had this problem
displaying a file to the console.
Currently, you are best off using either writef or (if you don't want it
formatted) std.stream's stdout.writeString and stdout.writeLine. (You
could of course use writef("%s", yourstring), but I don't like that very
much.)
Be careful: std.stdio and std.stream.stdout aren't synced. (I use
std.stream exclusively.)
Could you provide an example? I can't get it to work here. The following
program, saved with several unicode encodings, still yields garbage:
import std.stream;
int main()
{
stdout.writeString("äöüßÄÖÜ\n");
return 0;
}
Ben Hinkle <Ben_member pathlink.com> writes:
Could you provide an example? I can't get it to work here. The following
program, saved with several unicode encodings, still yields garbage:
import std.stream;
int main()
{
stdout.writeString("äöüßÄÖÜ\n");
return 0;
}
Are you sure your command window is set to use UTF-8? On Windows I think you
change it by going to the "Regional Settings" control panel.
Ilya Minkov <minkov cs.tum.edu> writes:
Ben Hinkle wrote:
Are you sure your command window is set to use UTF-8? On Windows I think you
change it by going to the "Regional Settings" control panel.
That doesn't matter, or rather I think there is nothing to configure.
The problem is that he is using Mozilla for something it isn't meant for.
He should rather use a programmer's editor which supports UTF-8, for
example SciTE. In SciTE, also go to File -> Encoding -> UTF-8.
The output will be another problem - either multi-character garbage (C
functions) or automatically converted to local codepage (D native
Unicode functions)
-eye
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
On Fri, 19 Nov 2004 17:03:36 +0100, Ilya Minkov <minkov cs.tum.edu> wrote:
Are you sure your command window is set to use UTF-8? On Windows I
think you
change it by going to the "Regional Settings" control panel.
That doesn't matter, or rather I think there is nothing to configure.
The problem is that he is using Mozilla for something it isn't meant for.
He should rather use a programmer's editor which supports UTF-8, for
example SciTE. In SciTE, also go to File -> Encoding -> UTF-8.
I've just downloaded SciTE and have done what you suggested. I admit that
using Mozilla for encoding issues is not very elegant. SciTE doesn't
change anything, though. I still get garbage.
By the way, is there a D plugin for SciTE?
The output will be another problem - either multi-character garbage (C
functions) or automatically converted to local codepage (D native
Unicode functions)
"Valéry Croizier" <valery freesurf.fr> writes:
"Mathias Bierschenk" <Mathias.Bierschenk web.de> wrote in message
news: opshp0d1h29gaiaw dialin-145-254-035-176.arcor-ip.net...
By the way, is there a D plugin for SciTE?
You'll find it there
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport#SciTE
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
On Fri, 19 Nov 2004 22:08:56 +0100, Valéry Croizier
<valery freesurf.fr> wrote:
By the way, is there a D plugin for SciTE?
You'll find it there
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport#SciTE
Thanks!
Ilya Minkov <minkov cs.tum.edu> writes:
Mathias Bierschenk wrote:
I've just downloaded SciTE and have done what you suggested. I admit
that using Mozilla for encoding issues is not very elegant. SciTE
doesn't change anything, though. I still get garbage.
Ah, I missed that you had already got as far as getting garbage. :) Well, I'll
see what could be wrong. In general, non-NT Windows has not been given much
consideration in the Phobos implementation, because these Windows versions
are not very Unicode compatible.
-eye
Stewart Gordon <smjg_1998 yahoo.com> writes:
Ben Hinkle wrote:
<snip>
Are you sure your command window is set to use UTF-8? On Windows I think you
change it by going to the "Regional Settings" control panel.
In Windows 98, a command prompt is still a plain old MS-DOS window. As
such, it can't possibly use UTF-8, as this would break the essential
one-to-one mapping between bytes and on-screen character positions.
I don't know how different this really is in Windows 2000/XP....
Stewart.
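A minimal sketch of why a byte-per-cell console cannot show UTF-8 directly: each of the seven letters from the original post takes two bytes in UTF-8, so the byte count and the character count disagree. It is written against current D (std.utf.count), not the 2004 Phobos used in this thread, and the names Widths and measure are made up for illustration.

```d
import std.stdio : writeln;
import std.utf : count;

struct Widths
{
    size_t bytes;      // UTF-8 code units, which a byte-per-cell console counts
    size_t codePoints; // characters the user expects to see
}

/// Measure a UTF-8 string both ways.
Widths measure(string s)
{
    return Widths(s.length, count(s));
}

void main()
{
    auto w = measure("äöüßÄÖÜ");
    // prints "14 bytes for 7 characters"
    writeln(w.bytes, " bytes for ", w.codePoints, " characters");
}
```

A console that maps one byte to one screen cell would therefore draw fourteen cells for this string, which matches the doubled garbage reported in the thread.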
Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Let's try to track down the real problem.
Change the string to "\u00E4\u00F6\u00FC\u00DF" (ae)(oe)(ue)(ss).
If the output is still garbage try printf instead of puts.
If the problem still exists it's an output/shell problem.
Thomas
Mathias Bierschenk wrote on Fri, 19 Nov 2004 12:49:01 +0100:
How can I print German characters? I've tried the following simple program:
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
edition) I tried Mozilla to save the source code file with different
character encodings but none worked as expected. Here's what I tried using
the current DMD version:
MS-DOS encoding as performed by Microsoft's EDIT editor:
(5) "invalid UTF-sequence"
Western (ISO-8859-1):
(5) "invalid UTF-sequence"
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
Unicode (UTF-16 and UTF-8):
both compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
On Fri, 19 Nov 2004 13:09:06 +0100, Thomas Kuehne
<thomas-dloop kuehne.thisisspam.cn> wrote:
Let's try to track down the real problem.
Change the string to "\u00E4\u00F6\u00FC\u00DF" (ae)(oe)(ue)(ss).
If the output is still garbage try printf instead of puts.
I've tested the above string. The result for both puts and printf is that
either it doesn't compile or it outputs garbage:
MS-DOS/Western (ISO-8859-1), UTF-16, UTF-8
compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
If the problem still exists it's an output/shell problem.
Thomas Kuehne <thomas-dloop kuehne.cn> writes:
Mathias Bierschenk wrote:
Let's try to track down the real problem.
Change the string to "\u00E4\u00F6\u00FC\u00DF" (ae)(oe)(ue)(ss).
If the output is still garbage try printf instead of puts.
I've tested the above string. The result for both puts and printf is that
either it doesn't compile or it outputs garbage:
MS-DOS/Western (ISO-8859-1), UTF-16, UTF-8
compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
Clearly seems to be a shell problem.
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
This is a known problem. If you use UTF-16/32 without a BOM (byte order mark), the
current dmd assumes UTF-8 and subsequently fails.
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_16be
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_16le
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_32be
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_32le
Thomas
Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Here is a patch that enables GDC-0.8 and DMD-0.106 to handle
UTF-8/16/32 with and without a BOM.
Thomas
--- gdc-0.8/d/dmd/module.c 2004-10-02 19:19:31.000000000 +0200
+++ gdc-0.8d/d/dmd/module.c 2004-11-19 19:19:09.522419400 +0100
@@ -241,6 +241,7 @@
* EF BB BF UTF-8
*/
+ int haveNoBom=0;
if (buf[0] == 0xFF && buf[1] == 0xFE)
{
if (buflen >= 4 && buf[2] == 0 && buf[3] == 0)
@@ -257,6 +258,7 @@
fatal();
}
+ pu-=haveNoBom;
dbuf.reserve(buflen / 4);
while (++pu < pumax)
{ unsigned u;
@@ -292,6 +294,7 @@
fatal();
}
+ pu-=haveNoBom;
dbuf.reserve(buflen / 2);
while (++pu < pumax)
{ unsigned u;
@@ -354,6 +357,8 @@
* figure out the encoding.
*/
+ haveNoBom=1;
+
if (buflen >= 4)
{ if (buf[1] == 0 && buf[2] == 0 && buf[3] == 0)
{ // UTF-32LE
Thomas Kuehne wrote on Fri, 19 Nov 2004 14:19:33 +0000 (UTC):
Let's try to track down the real problem.
Change the string to "\u00E4\u00F6\u00FC\u00DF" (ae)(oe)(ue)(ss).
If the output is still garbage try printf instead of puts.
I've tested the above string. The result for both puts and printf is that
either it doesn't compile or it outputs garbage:
MS-DOS/Western (ISO-8859-1), UTF-16, UTF-8
compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
Clearly seems to be a shell problem.
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
This is a known problem. If you use UTF-16/32 without a BOM (byte order mark), the
current dmd assumes UTF-8 and subsequently fails.
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_16be
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_16le
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_32be
http://svn.kuehne.cn/dstress/www/dstress.html#encoding_utf_32le
Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Thomas Kuehne wrote on Fri, 19 Nov 2004 19:26:25 +0100:
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
This is a known problem. If you use UTF-16/32 without a BOM (byte order mark), the
current dmd assumes UTF-8 and subsequently fails.
The real problem was that it removed the bytes of the nonexistent BOM.
Thomas
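The point of the correction above, that bytes should be skipped only when a BOM is really present, can be sketched as follows. This is an illustration in current D, not the compiler's code: the names Encoding, Detected and detectBom are hypothetical, and the fallback to UTF-8 when no BOM is found is a simplification (the patch goes on to guess the encoding from the position of zero bytes).

```d
import std.stdio : writeln;

enum Encoding { utf8, utf16le, utf16be, utf32le, utf32be }

struct Detected
{
    Encoding enc;
    size_t bomLength; // number of leading bytes to skip, 0 when there is no BOM
}

/// Inspect the leading bytes of a source buffer for a byte order mark.
Detected detectBom(const(ubyte)[] buf)
{
    // Test the 4-byte UTF-32 marks first: FF FE 00 00 begins with the
    // UTF-16LE mark FF FE.
    if (buf.length >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0 && buf[3] == 0)
        return Detected(Encoding.utf32le, 4);
    if (buf.length >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0xFE && buf[3] == 0xFF)
        return Detected(Encoding.utf32be, 4);
    if (buf.length >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return Detected(Encoding.utf16le, 2);
    if (buf.length >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return Detected(Encoding.utf16be, 2);
    if (buf.length >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return Detected(Encoding.utf8, 3);
    // No BOM: nothing may be stripped (simplified fallback, see lead-in).
    return Detected(Encoding.utf8, 0);
}

void main()
{
    ubyte[] withBom = [0xFF, 0xFE, 0x69, 0x00]; // UTF-16LE BOM, then 'i'
    ubyte[] noBom = [0x69, 0x6E, 0x74];         // plain "int", no BOM
    foreach (buf; [withBom, noBom])
    {
        auto d = detectBom(buf);
        writeln(d.enc, ", skip ", d.bomLength, " bytes");
    }
}
```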
Stewart Gordon <smjg_1998 yahoo.com> writes:
Mathias Bierschenk wrote:
How can I print German characters? I've tried the following simple program:
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
Unicode (UTF-16 and UTF-8):
both compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
You can include MS-DOS characters in a string, but only as escape codes.
In your case (assuming your code page is 437, 850, 852, 853 or 857):
puts("\x84\x94\x81\xE1\x8E\x99\x9A");
Since the whole point of this is for outputting to MS-DOS, you could
argue that this is appropriate use of non-Unicode characters in a string.
Stewart.
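Rather than writing the escape codes by hand, the same bytes can be produced from a normal Unicode string. A minimal sketch in current D, assuming a CP437-compatible console code page as in the post above; it handles only ASCII and the seven German letters, and the names oemByte and toOemGerman are hypothetical.

```d
import core.stdc.stdio : fwrite, stdout;

/// Map one code point to its CP437 byte; anything unknown becomes '?'.
ubyte oemByte(dchar c)
{
    switch (c)
    {
        case 'ä': return 0x84;
        case 'ö': return 0x94;
        case 'ü': return 0x81;
        case 'ß': return 0xE1;
        case 'Ä': return 0x8E;
        case 'Ö': return 0x99;
        case 'Ü': return 0x9A;
        default:
            if (c < 0x80)
                return cast(ubyte) c; // ASCII is the same in CP437
            return 0x3F;              // '?' for everything not in the table
    }
}

/// Convert a UTF-8 string to console bytes.
ubyte[] toOemGerman(string text)
{
    ubyte[] result;
    foreach (dchar c; text) // iterating by dchar decodes the UTF-8 input
        result ~= oemByte(c);
    return result;
}

void main()
{
    auto bytes = toOemGerman("äöüßÄÖÜ\n");
    fwrite(bytes.ptr, 1, bytes.length, stdout); // raw bytes, no re-encoding
}
```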
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
On Fri, 19 Nov 2004 16:02:17 +0000, Stewart Gordon
<smjg_1998 yahoo.com> wrote:
You can include MS-DOS characters in a string, but only as escape codes.
In your case (assuming your code page is 437, 850, 852, 853 or 857):
puts("\x84\x94\x81\xE1\x8E\x99\x9A");
Since the whole point of this is for outputting to MS-DOS, you could
argue that this is appropriate use of non-Unicode characters in a string.
Yep, that works. Maybe this is more portable (encoded as UTF-8):
import std.c.stdio;
int main()
{
version(Win32)
puts("\x84\x94\x81\xE1\x8E\x99\x9A");
else
puts("äöüßÄÖÜ");
return 0;
}
What do you think?!
"Walter" <newshound digitalmars.com> writes:
"Mathias Bierschenk" <Mathias.Bierschenk web.de> wrote in message
news:opshpm3zlo9gaiaw dialin-212-144-051-051.arcor-ip.net...
How can I print German characters? I've tried the following simple
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
edition) I tried Mozilla to save the source code file with different
character encodings but none worked as expected. Here's what I tried using
the current DMD version:
MS-DOS encoding as performed by Microsoft's EDIT editor:
Using Microsoft Notepad, click on "Save As" and under encoding, select
"UTF-8". Then, use std.stdio.writef() instead of std.c.stdio.puts(), and it
should work.
"Mathias Bierschenk" <Mathias.Bierschenk web.de> writes:
On Fri, 19 Nov 2004 14:13:32 -0800, Walter
<newshound digitalmars.com> wrote:
Using Microsoft Notepad, click on "Save As" and under encoding, select
"UTF-8". Then, use std.stdio.writef() instead of std.c.stdio.puts(), and
it should work.
No, that doesn't work.
Some others here have tracked down the main problem: The Win9x console
doesn't support Unicode. Instead one can only make use of some DOS escape
sequences. The only thing that works so far (thanks to Stewart Gordon):
puts("\x84\x94\x81\xE1\x8E\x99\x9A"); // äöüßÄÖÜ
or, more portable(?), written by myself:
import std.c.stdio;
int main()
{
version(Win32)
puts("\x84\x94\x81\xE1\x8E\x99\x9A");
else
puts("äöüßÄÖÜ");
return 0;
}
Carlos Santander B. suggested another solution, based on Y. Tomino's Win32
headers, that seems to convert characters at run-time. I can't get it to
print anything at the moment, so I can't yet tell if it is better than
what I have got so far.
Maybe someone should write a tutorial about input/output basics in D? ;-)
Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <opshrgt5ci9gaiaw dialin-212-144-051-198.arcor-ip.net>, Mathias
Bierschenk says...
[...]
Some others here have tracked down the main problem: The Win9x console
doesn't support Unicode.
This problem also exists on Windows NT/2000/XP.
Consoles use OEM character set.
D doesn't support this.
Instead one can only make use of some DOS escape
sequences. The only thing that works so far (thanks to Stewart Gordon):
puts("\x84\x94\x81\xE1\x8E\x99\x9A"); // äöüßÄÖÜ
These are binary encodings of OEM characters.
or, more portable(?), written by myself:
import std.c.stdio;
int main()
{
version(Win32)
puts("\x84\x94\x81\xE1\x8E\x99\x9A");
else
puts("äöüßÄÖÜ");
return 0;
}
This is not portable at all. It works only if the OEM codepage in use is compatible
with CP437 for those code points.
The solution is to use CharToOemW, a function that translates a string from
UTF-16 to OEM character set (when possible, of course).
See an example:
<code>
import std.stdio;
import std.c.stdio;
import std.c.windows.windows;
extern (Windows)
{
export BOOL CharToOemW(
LPCWSTR lpszSrc, // string to translate
LPSTR lpszDst // translated string
);
}
int main()
{
puts("-- untranslated --");
puts("äöüßÄÖÜ");
writef("äöüßÄÖÜ\n");
puts("-- translated --");
wchar[] mess = "äöüßÄÖÜ";
char[] OEMmess = new char[mess.length];
CharToOemW(mess, OEMmess);
puts(OEMmess);
writef(OEMmess);
return 0;
}
</code>
This outputs:
-- untranslated --
├ñ├Â├╝├ƒ├ä├û├£
├ñ├Â├╝├ƒ├ä├û├£
-- translated --
äöüßÄÖÜ
Error: invalid UTF-8 sequence
Here you can see that puts() works, but writef() does not. That's because writef
expects OEMmess to be UTF-8.
The result is that writef doesn't work, in any case, under Windows.
Note also that on Windows 95/98/Me this works only if the Microsoft Layer for
Unicode is installed.
The only alternative is to use CharToOemA, which converts from the current ANSI
codepage (for most western countries: Windows-1252) to the current OEM codepage.
I don't know how to translate UTF-8 to ANSI.
Carlos Santander B. suggested another solution, based on Y. Tomino's Win32
headers, that seems to convert characters at run-time. I can't get it to
print anything at the moment, so I can't yet tell if it is better than
what I have got so far.
I haven't tested it either.
Maybe someone should write a tutorial about input/output basics in D? ;-)
Yes, please do it.
Ciao
Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Roberto Mariottini wrote on Mon, 22 Nov 2004 09:52:27 +0000 (UTC):
Here you can see that puts() works, but writef() does not. That's because writef
expects OEMmess to be UTF-8.
The result is that writef doesn't work, in any case, under Windows.
Note also that on Windows 95/98/Me this works only if the Microsoft Layer for
Unicode is installed.
The only alternative is to use CharToOemA, which converts from the current ANSI
codepage (for most western countries: Windows-1252) to the current OEM codepage.
I don't know how to translate UTF-8 to ANSI.
Maybe you could take a look at dmd/src/phobos/std/c/stdio.d?
You should be able to change it in a way that - if "FILE*" equals
stdout, stderr or stdlog and the hosting environment is Windows -
CharToOemA is called before C's "fputs", "fputc", "puts" or "putw" is
called.
The consequence would be that all writef/*put* calls should produce reasonable
output. To do the same with "printf" you'd have to modify
dmd/src/phobos/internal/object.d and dmd/src/phobos/object.d .
I'm currently not running Windows, but it would be interesting to know whether
"fputws" works correctly for non-ASCII chars.
Thomas
Anders F Björklund <afb algonet.se> writes:
Roberto Mariottini wrote:
Some others here have tracked down the main problem: The Win9x console
doesn't support Unicode.
This problem also exists on Windows NT/2000/XP.
Consoles use OEM character set.
D doesn't support this.
Mac OS X has a similar issue (uses MacRoman/ISO-8859-1 by default),
but fortunately you can choose UTF-8 from the Terminal settings...
This is not portable at all. It works only if the OEM codepage in use is
compatible with CP437 for those code points.
The solution is to use CharToOemW, a function that translates a string from
UTF-16 to OEM character set (when possible, of course).
Or supply similar functions in D, which could be an alternative ?
Carlos Santander B. suggested another solution, based on Y. Tomino's Win32
headers, that seems to convert characters at run-time. I can't get it to
print anything at the moment, so I can't yet tell if it is better than
what I have got so far.
I have written some basic lookups (i.e. "wchar mapping[256];")
using the tables that are all available on the Unicode site:
ISO Latin-1 (simple!)
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
DOS Latin Console
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
Windows "Latin-1"
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Mac OS Roman
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT
(there are a few dozen others, but I think these are the most common ?)
But it needs a more thought-through API to be really useful...
And some optimization to do the reverse lookup, I suppose ?
I'm thinking one array of char[256], and one char[] of exceptions.
(where 0x00-0xFF would use the lookup, and 0x0100-0xFFFF the hash)
--anders
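The encoding direction described above (a direct array for 0x00-0xFF, a hash for the exceptions above that) might look like this for Windows-1252. It is a sketch in current D under stated assumptions: only four of the exceptions from CP1252.TXT are listed, unmapped characters become '?', and the name toWindows1252 is hypothetical.

```d
import std.stdio : writefln;

/// Encode a UTF-8 string as Windows-1252 bytes; unmapped characters become '?'.
ubyte[] toWindows1252(string text)
{
    // Direct table for U+0000-U+00FF. Windows-1252 agrees with Unicode there
    // except in 0x80-0x9F, which this sketch treats as unmapped.
    ubyte[256] direct;
    foreach (i; 0 .. 256)
        direct[i] = (i < 0x80 || i >= 0xA0) ? cast(ubyte) i : cast(ubyte) '?';

    // Hash of exceptions above U+00FF (partial; CP1252.TXT lists them all).
    ubyte[dchar] exceptions;
    exceptions['\u20AC'] = 0x80; // euro sign
    exceptions['\u201E'] = 0x84; // double low-9 quotation mark
    exceptions['\u201C'] = 0x93; // left double quotation mark
    exceptions['\u201D'] = 0x94; // right double quotation mark

    ubyte[] result;
    foreach (dchar c; text) // iterating by dchar decodes the UTF-8 input
    {
        if (c <= 0xFF)
            result ~= direct[cast(size_t) c];
        else if (auto p = c in exceptions)
            result ~= *p;
        else
            result ~= cast(ubyte) '?';
    }
    return result;
}

void main()
{
    // Print the encoded bytes in hex.
    writefln("%(%02X %)", toWindows1252("Tür 5 €"));
}
```

A real implementation would build both tables once rather than on every call; they are local here only to keep the sketch self-contained.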
Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <cnlrlp$14b6$1 digitaldaemon.com>, Walter says...
Using Microsoft Notepad, click on "Save As" and under encoding, select
"UTF-8". Then, use std.stdio.writef() instead of std.c.stdio.puts(), and it
should work.
The code doesn't work anyway, see my other post for details.
The biggest problem is that writef() doesn't work on Windows, neither 9x/Me nor
NT/2000/XP.
Ciao
"Carlos Santander B." <csantander619 gmail.com> writes:
"Mathias Bierschenk" <Mathias.Bierschenk web.de> wrote in message
news:opshpm3zlo9gaiaw dialin-212-144-051-051.arcor-ip.net...
| How can I print German characters? I've tried the following simple program:
|
| import std.c.stdio;
|
| int main()
| {
| puts("äöüßÄÖÜ"); // German characters
|
| return 0;
| }
|
| As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
| edition) I tried Mozilla to save the source code file with different
| character encodings but none worked as expected. Here's what I tried using
| the current DMD version:
|
| MS-DOS encoding as performed by Microsoft's EDIT editor:
| (5) "invalid UTF-sequence"
|
| Western (ISO-8859-1):
| (5) "invalid UTF-sequence"
|
| Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
| (1) "semicolon expected, not '.'"
| (1) no identifier for declarator
|
| Unicode (UTF-16 and UTF-8):
| both compile fine but output garbage under MS-DOS
| (Windows 98 SE, German edition)
I was investigating the same thing recently. What I really wanted was a Windows
console that did Unicode, but I couldn't find it.
But I came across a C++ program which allows you to output UTF-16 strings
(wchar * in C++ on Windows). Translated to D, the program looks like this:
import std.file;
import std.string;
import std.utf;
import win32.winbase;
import win32.wincon;
import win32.winnls;
void main ()
{
wchar [] tmp_w = toUTF16(cast(char[])"carlos andrés");
wchar * szwOut = tmp_w;
DWORD dwBytesWritten;
DWORD fdwMode;
HANDLE outHandle = GetStdHandle(STD_OUTPUT_HANDLE);
if( (GetFileType(outHandle) & FILE_TYPE_CHAR) && GetConsoleMode( outHandle, &fdwMode) )
WriteConsoleW( outHandle, szwOut, wcslen(szwOut), &dwBytesWritten, null);
else
{
int nOutputCP = GetConsoleOutputCP();
//int charCount = WideCharToMultiByte(nOutputCP, 0, szwOut, -1, null, 0, null, null);
//char* szaStr = new char[charCount];
//WideCharToMultiByte( nOutputCP, 0, szwOut, -1, szaStr, charCount, null, null);
char [] tmp = toUTF8(tmp_w);
char * szaStr = toMBSz(tmp);
int charCount = tmp.length;
WriteFile(outHandle, szaStr, charCount-1, &dwBytesWritten, null);
}
}
}
It uses Y Tomino's Win32 headers. The encoding in which the file is saved
doesn't seem to matter.
I really don't remember where I found the original, so you can use this code as
you want since it's not mine.
For Linux, I don't think there's any problem since it uses UTF-8 by default (at
least with RedHat based distros, in my experience).
BTW, if someone knows about a Unicode console for Windows, please let me know.
-----------------------
Carlos Santander Bernal
Manfred Hansen <manfred toppoint.de> writes:
Hello,
I have the same problem on Linux Debian (sarge) and SUSE 9.1.
"invalid UTF-8 sequence"
Editor is vim .
Manfred
Mathias Bierschenk wrote:
How can I print German characters? I've tried the following simple
program:
import std.c.stdio;
int main()
{
puts("äöüßÄÖÜ"); // German characters
return 0;
}
As the normal MS-DOS EDIT encoding didn't work (Windows 98 SE, German
edition) I tried Mozilla to save the source code file with different
character encodings but none worked as expected. Here's what I tried using
the current DMD version:
MS-DOS encoding as performed by Microsoft's EDIT editor:
(5) "invalid UTF-sequence"
Western (ISO-8859-1):
(5) "invalid UTF-sequence"
Unicode (UTF-16 and UTF-32, each with Big Endian and Little Endian):
(1) "semicolon expected, not '.'"
(1) no identifier for declarator
Unicode (UTF-16 and UTF-8):
both compile fine but output garbage under MS-DOS
(Windows 98 SE, German edition)
Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:
Manfred Hansen wrote on Sat, 20 Nov 2004 08:53:41 +0100:
Hello,
I have the same problem on Linux Debian (sarge) and SUSE 9.1.
"invalid UTF-8 sequence"
Editor is vim .
Vim 6.2 works for me.
Are you sure your locale is set to use UTF-8?
# > locale
# LANG=de_DE.UTF-8
# LC_CTYPE=de_DE.UTF-8
# LC_NUMERIC=de_DE.UTF-8
# LC_TIME=de_DE.UTF-8
# LC_COLLATE=de_DE.UTF-8
# LC_MONETARY=de_DE.UTF-8
# LC_MESSAGES=de_DE.UTF-8
# LC_PAPER=de_DE.UTF-8
# LC_NAME=de_DE.UTF-8
# LC_ADDRESS=de_DE.UTF-8
# LC_TELEPHONE=de_DE.UTF-8
# LC_MEASUREMENT=de_DE.UTF-8
# LC_IDENTIFICATION=de_DE.UTF-8
# LC_ALL=
Please send me a sample, if this problem persists.
Thomas
Manfred Hansen <manfred toppoint.de> writes:
Thomas Kuehne wrote:
Manfred Hansen wrote on Sat, 20 Nov 2004 08:53:41 +0100:
Hello,
I have the same problem on Linux Debian (sarge) and SUSE 9.1.
"invalid UTF-8 sequence"
Editor is vim .
Vim 6.2 works for me.
Are you sure your locale is set to use UTF-8?
# > locale
# LANG=de_DE.UTF-8
# LC_CTYPE=de_DE.UTF-8
# LC_NUMERIC=de_DE.UTF-8
# LC_TIME=de_DE.UTF-8
# LC_COLLATE=de_DE.UTF-8
# LC_MONETARY=de_DE.UTF-8
# LC_MESSAGES=de_DE.UTF-8
# LC_PAPER=de_DE.UTF-8
# LC_NAME=de_DE.UTF-8
# LC_ADDRESS=de_DE.UTF-8
# LC_TELEPHONE=de_DE.UTF-8
# LC_MEASUREMENT=de_DE.UTF-8
# LC_IDENTIFICATION=de_DE.UTF-8
# LC_ALL=
Please send me a sample, if this problem persists.
Thomas
My locale
hansen@hansen-lx:~/d$ locale
LANG=de_DE@euro
LC_CTYPE="de_DE@euro"
LC_NUMERIC="de_DE@euro"
LC_TIME="de_DE@euro"
LC_COLLATE="de_DE@euro"
LC_MONETARY="de_DE@euro"
LC_MESSAGES="de_DE@euro"
LC_PAPER="de_DE@euro"
LC_NAME="de_DE@euro"
LC_ADDRESS="de_DE@euro"
LC_TELEPHONE="de_DE@euro"
LC_MEASUREMENT="de_DE@euro"
LC_IDENTIFICATION="de_DE@euro"
LC_ALL=
Thank you for the advice, I'll try switching to UTF-8.
Regards, Manfred
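The locale check suggested in this exchange can also be done from a program. A minimal sketch in current D, assuming the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG) and that a UTF-8 locale carries "UTF-8" or "utf8" in its name; the function name isUtf8Locale is hypothetical.

```d
import std.algorithm.searching : canFind;
import std.process : environment;
import std.stdio : writeln;
import std.uni : toLower;

/// True when the effective character-type locale names UTF-8.
/// Empty strings stand for unset variables.
bool isUtf8Locale(string lcAll, string lcCtype, string lang)
{
    // POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    string effective = lcAll.length ? lcAll : (lcCtype.length ? lcCtype : lang);
    string lower = effective.toLower;
    return lower.canFind("utf-8") || lower.canFind("utf8");
}

void main()
{
    bool ok = isUtf8Locale(environment.get("LC_ALL", ""),
                           environment.get("LC_CTYPE", ""),
                           environment.get("LANG", ""));
    writeln(ok ? "locale uses UTF-8" : "locale does not use UTF-8");
}
```

With a locale like the second listing above, which carries no UTF-8 suffix, the check returns false, so a source file saved from that editor session is likely not UTF-8 unless the editor is told otherwise.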