digitalmars.D.bugs - [Issue 1448] New: UTF-8 output to console is seriously broken

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448

           Summary: UTF-8 output to console is seriously broken
           Product: D
           Version: 1.020
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: a.solovey gmail.com


If the Windows console code page is set to 65001 (UTF-8) and a program outputs
non-ASCII characters in UTF-8 encoding, no further output appears after the
first newline that follows an accented character. I believe the problem is in
the underlying DMC stdio, but it is more disturbing in D, which has good Unicode
support and makes it very convenient to work with international text.
This problem has been reported in the newsgroup several times before; see, for
example,
http://www.digitalmars.com/d/archives/digitalmars/D/announce/openquran_v0.21_8492.html
Here is code that illustrates the problem:
////////
import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main() {
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    // Codepoint 00e9 is "Latin small letter e with acute"
    puts( "Output utf-8 accented char \u00e9\n... and the rest is cut off!\n" );
}
/////////
If you run it, "... and the rest is cut off!" won't be displayed. Do not forget
to set the console font to Lucida Console before trying this.
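
For reference, here is a minimal D sketch of what the accented character in the
test string looks like at the byte level. It only shows that U+00E9 occupies two
UTF-8 code units; it makes no claim about where the stdio bug lies. The helper
name utf8Bytes is made up for this illustration.

```d
import std.stdio : writefln;
import std.utf : count;

// Hypothetical helper: view a D string as its raw UTF-8 code units.
immutable(ubyte)[] utf8Bytes(string s)
{
    return cast(immutable(ubyte)[]) s;
}

void main()
{
    string s = "\u00e9"; // Latin small letter e with acute

    // One code point, but two bytes once encoded as UTF-8.
    assert(count(s) == 1);
    assert(s.length == 2);

    auto bytes = utf8Bytes(s);
    assert(bytes[0] == 0xC3 && bytes[1] == 0xA9);

    // Print the bytes in hex, separated by spaces.
    writefln("%(%02X %)", bytes);
}
```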


-- 
Aug 28 2007
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448





------- Comment #1 from a.solovey gmail.com  2007-08-28 22:52 -------
Created an attachment (id=172)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=172&action=view)
Small test case for the same problem in DMC


-- 
Aug 28 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com




------- Comment #2 from smjg iname.com  2007-08-29 13:03 -------
The problem doesn't show if I use the Windows API (either WriteConsole or
WriteFile) to output.  So the bug must be somewhere in DM's stdio
implementation.
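
A rough sketch of the kind of workaround described here, bypassing stdio and
handing the UTF-8 bytes to WriteFile directly. It assumes a current druntime
where the bindings live in core.sys.windows.windows (the examples in this report
use the older std.c.windows.windows); the helper name writeRaw is hypothetical,
and this has not been checked against the DMD versions discussed in this report.

```d
version (Windows)
{
    import core.sys.windows.windows;

    // Hypothetical helper: write raw UTF-8 bytes to a standard handle,
    // without going through the C runtime's stdio buffers.
    bool writeRaw(DWORD stdHandleId, const(char)[] text)
    {
        HANDLE h = GetStdHandle(stdHandleId);
        if (h is null || h == INVALID_HANDLE_VALUE)
            return false;

        DWORD written;
        return WriteFile(h, text.ptr, cast(DWORD) text.length, &written, null) != 0
            && written == text.length;
    }

    void main()
    {
        SetConsoleOutputCP(65001); // or use "chcp 65001" instead
        writeRaw(STD_OUTPUT_HANDLE, "Output utf-8 accented char \u00e9\n... via stdout handle\n");
        writeRaw(STD_ERROR_HANDLE, "Output utf-8 accented char \u00e9\n... via stderr handle\n");
    }
}
else
{
    // The issue is specific to the Windows console; nothing to do elsewhere.
    void main() {}
}
```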


-- 
Aug 29 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Comment #3 from bugzilla digitalmars.com  2007-09-28 22:15 -------
Fixed in dmd 1.021 and 2.004


-- 
Sep 28 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


mk krej.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




------- Comment #4 from mk krej.cz  2007-10-29 11:02 -------
The problem was NOT fixed for stderr (DMD 1.022)


-- 
Oct 29 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


mk krej.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mk krej.cz




------- Comment #5 from mk krej.cz  2007-10-29 11:04 -------
*** Bug 1608 has been marked as a duplicate of this bug. ***


-- 
Oct 29 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448





------- Comment #6 from mk krej.cz  2008-09-03 10:57 -------
I hope this gets fixed one day. Here is an updated example that still fails as
of DMD 1.035 (for stderr; stdout is OK):

import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main() {
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    // Codepoint 00e9 is "Latin small letter e with acute"
    fputs("Output utf-8 accented char \u00e9\n... and the rest is OK\n", stdout);
    fputs("Output utf-8 accented char \u00e9\n... and the rest is cut off!\n", stderr);
    fputs("STDOUT.\n", stdout);
    fputs("STDERR.\n", stderr);
}


-- 
Sep 03 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


Kevin <kevin brogan.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kevin brogan.ca


--- Comment #7 from Kevin <kevin brogan.ca> 2012-02-07 22:48:48 PST ---
Sort of works for me.

The text doesn't get cut off, but the unicode characters don't get displayed
either.

C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>ConsoleApp1.exe
Output utf-8 accented char é
... and the rest is OK
Output utf-8 accented char ��
... and the rest is cut off!
STDOUT.
STDERR.

C:\Users\Kevin\Documents\D Projects\ConsoleApp1\ConsoleApp1\bin>

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 07 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


Martin Krejcirik <mk krej.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.020                       |D1 & D2


--- Comment #8 from Martin Krejcirik <mk krej.cz> 2013-03-19 18:21:18 PDT ---
Status update as of DMD 2.062 (Win XP 32 bit)

The above-mentioned example still fails in the same way. However, when it is
modified to use write instead of fputs:

import std.stdio;
import std.c.windows.windows;

extern(Windows) BOOL SetConsoleOutputCP( UINT );

void main() {
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    stderr.write("STDERR:Output utf-8 accented char \u00e9\n... and the rest is cut off!\n");
    stderr.write("end_STDERR.\n");
}

I get this error:

STDERR:Output utf-8 accented char 
... and the rest is cut off!
std.exception.ErrnoException D:\PROGRAMS\DMD2\WINDOWS\BIN\..\..\src\phobos\std\stdio.d(1264):
 (No error)
----------------
0x0040D874
0x0040D6FF
0x00402218
0x00402189
0x00402121
0x00402030
0x0040354E
0x00403151
0x00402388
0x7C81776F in RegisterWaitForInputIdle
----------------

So if anybody has a clue what's going on there...

-- 
Mar 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448


Axel Bender <ben world-of-ben.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ben world-of-ben.de
           Platform|x86                         |x86_64
            Version|D1 & D2                     |D2


--- Comment #9 from Axel Bender <ben world-of-ben.de> 2013-08-07 00:55:43 PDT ---
I can confirm this issue. When enumerating a directory (via dirEntries())
containing a file with a character in the CP850/CP1252 space (e.g. "säb"),
depending on the codepage settings, the output is as follows:

chcp 1252  => output is "säb" (Unicode encoding for "ä")
chcp 65001 => output is
"sbstd.exception.ErrnoException D:\tools\d\bin\..\src\phobos\std\stdio.d(1352):
 (No error)"
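
A small D sketch of how a UTF-8 encoded name looks when each of its bytes is
shown as a separate single-byte character, which is one plausible reading of the
chcp 1252 result above. The helper bytesAsLatin1 is hypothetical; it relies on
CP1252 matching Latin-1 for bytes 0xA0 to 0xFF, which covers both bytes of "ä"
(C3 A4), and is not a general CP1252 decoder.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: reinterpret every UTF-8 code unit of the input as
// one Latin-1 character, then re-encode the result as a normal D string.
string bytesAsLatin1(string utf8)
{
    dstring wide;
    foreach (char c; utf8)              // iterate code units, not code points
        wide ~= cast(dchar) cast(ubyte) c;
    return wide.to!string;
}

void main()
{
    // "ä" is C3 A4 in UTF-8; read byte by byte it becomes U+00C3 U+00A4.
    assert(bytesAsLatin1("s\u00e4b") == "s\u00c3\u00a4b");

    // Pure ASCII is unaffected.
    assert(bytesAsLatin1("abc") == "abc");

    writeln(bytesAsLatin1("s\u00e4b"));
}
```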

In both cases e.g. cmd's dir shows the correct results.
The correct results are also shown when using - not really comparable - C with
printf().

Tried the case in cmd, console2, and conemu. All show the same results.

It'd really be nice if this bug would get fixed...

-- 
Aug 07 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1448



--- Comment #10 from Axel Bender <ben world-of-ben.de> 2013-08-07 00:58:06 PDT ---
Addendum: Windows 7 64-bit, dmd v2.063.2.

Sorry.

-- 
Aug 07 2013