digitalmars.D.bugs - [Issue 1448] New: UTF-8 output to console is seriously broken
- d-bugmail puremagic.com (35/35) Aug 28 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (6/6) Aug 28 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (10/10) Aug 29 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (9/9) Sep 28 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (9/9) Oct 29 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (8/8) Oct 29 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (18/18) Sep 03 2008 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (20/20) Feb 07 2012 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (39/39) Mar 19 2013 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (23/23) Aug 07 2013 http://d.puremagic.com/issues/show_bug.cgi?id=1448
- d-bugmail puremagic.com (7/7) Aug 07 2013 http://d.puremagic.com/issues/show_bug.cgi?id=1448
http://d.puremagic.com/issues/show_bug.cgi?id=1448

           Summary: UTF-8 output to console is seriously broken
           Product: D
           Version: 1.020
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: a.solovey gmail.com

If the Windows console code page is set to 65001 (UTF-8) and a program outputs non-ASCII characters in UTF-8 encoding, there will be no more output after the first newline following an accented character. I believe the problem is in the underlying DMC stdio, but it is more disturbing with D, which has good Unicode support and is very convenient for working with international texts.

This problem has been reported in the newsgroup several times before; see for example
http://www.digitalmars.com/d/archives/digitalmars/D/announce/openquran_v0.21_8492.html

Here is the code to illustrate the problem:

////////
import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead

    // Codepoint 00e9 is "Latin small letter e with acute"
    puts( "Output utf-8 accented char \u00e9\n... and the rest is cut off!\n" );
}
/////////

If you run it, "... and the rest is cut off!" won't be displayed. Do not forget to set the console font to Lucida Console before trying this.
Aug 28 2007
Created an attachment (id=172)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=172&action=view)
Small test case for the same problem in DMC
Aug 28 2007
smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com

The problem doesn't show if I use the Windows API (either WriteConsole or WriteFile) to output. So the bug must be somewhere in DM's stdio implementation.
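
For readers who want to try the workaround described above, here is a minimal sketch in D of writing UTF-8 bytes with WriteFile instead of stdio. It is not from the original report: the helper name writeRaw is hypothetical, the code assumes a recent druntime that ships core.sys.windows.windows (the thread itself uses the older std.c.windows.windows), and the non-Windows branch exists only so the sketch compiles elsewhere.

```d
// Sketch: write UTF-8 bytes straight to the console handle, skipping C stdio.
// Assumption: a druntime that provides core.sys.windows.windows.
import core.stdc.stdio : fwrite, stdout;

version (Windows) import core.sys.windows.windows;

/// Hypothetical helper (not part of Phobos): returns the number of bytes written.
size_t writeRaw(const(char)[] text)
{
    version (Windows)
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written = 0;
        // WriteFile passes the bytes through untouched; the console decodes
        // them using the active output code page.
        WriteFile(h, text.ptr, cast(DWORD) text.length, &written, null);
        return written;
    }
    else
    {
        // Non-Windows fallback so the sketch still compiles elsewhere.
        return fwrite(text.ptr, 1, text.length, stdout);
    }
}

void main()
{
    version (Windows) SetConsoleOutputCP(65001); // same effect as "chcp 65001"

    // U+00E9 occupies two bytes in UTF-8.
    writeRaw("Output utf-8 accented char \u00e9\n");
    writeRaw("... and the rest should still appear\n");
}
```

Passing STD_ERROR_HANDLE to GetStdHandle instead would target stderr in the same way.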
Aug 29 2007
bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

Fixed dmd 1.021 and 2.004
Sep 28 2007
mk krej.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

The problem was NOT fixed for stderr (DMD 1.022)
Oct 29 2007
mk krej.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mk krej.cz

*** Bug 1608 has been marked as a duplicate of this bug. ***
Oct 29 2007
I hope this gets fixed one day. Here is an updated example, where it still doesn't work (for stderr; stdout is OK) as of DMD 1.035:

import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead

    // Codepoint 00e9 is "Latin small letter e with acute"
    fputs("Output utf-8 accented char \u00e9\n... and the rest is OK\n", stdout);
    fputs("Output utf-8 accented char \u00e9\n... and the rest is cut off!\n", stderr);
    fputs("STDOUT.\n", stdout);
    fputs("STDERR.\n", stderr);
}
Sep 03 2008
Kevin <kevin brogan.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kevin brogan.ca

Sort of works for me. The text doesn't get cut off, but the unicode characters don't get displayed either.

C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>ConsoleApp1.exe
Output utf-8 accented char é
... and the rest is OK
Output utf-8 accented char ��
... and the rest is cut off!
STDOUT.
STDERR.

C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>
Feb 07 2012
Martin Krejcirik <mk krej.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.020                       |D1 & D2

Status update as of DMD 2.062 (Win XP 32 bit):

Still the same error for the above-mentioned example. However, when it is modified to use write instead of fputs:

import std.stdio;
import std.c.windows.windows;

extern(Windows) BOOL SetConsoleOutputCP( UINT );

void main()
{
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead

    stderr.write("STDERR:Output utf-8 accented char \u00e9\n... and the rest is cut off!\n");
    stderr.write("end_STDERR.\n");
}

I get this error:

STDERR:Output utf-8 accented char é
... and the rest is cut off!
std.exception.ErrnoException D:\PROGRAMS\DMD2\WINDOWS\BIN\..\..\src\phobos\std\stdio.d(1264): (No error)
----------------
0x0040D874
0x0040D6FF
0x00402218
0x00402189
0x00402121
0x00402030
0x0040354E
0x00403151
0x00402388
0x7C81776F in RegisterWaitForInputIdle
----------------

So if anybody has a clue what's going on there...
Mar 19 2013
Axel Bender <ben world-of-ben.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ben world-of-ben.de
           Platform|x86                         |x86_64
            Version|D1 & D2                     |D2

I can confirm this issue. When enumerating a directory (via dirEntries()) containing a file with a character in the CP850/CP1252 space (e.g. "säb"), depending on the codepage settings, the output is as follows:

chcp 1252  => output is "säb" (Unicode encoding for "ä")
chcp 65001 => output is "säbstd.exception.ErrnoException D:\tools\d\bin\..\src\phobos\std\stdio.d(1352): (No error)"

In both cases e.g. cmd's dir shows the correct results. The correct results are also shown when using - not really comparable - C with printf().

Tried the case in cmd, console2, and conemu. All show the same results.

It'd really be nice if this bug would get fixed...
Aug 07 2013
Addendum: Windows 7 64-bit, dmd v2.063.2. Sorry.
Aug 07 2013