
digitalmars.D - TDPL: Foreach over Unicode string

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On page 123 there's an example of what happens when traversing a Unicode string
with a char, and on the next page the string is traversed with a dchar, which
should fix the output. But I'm getting different results. Here's the code and
output of the two samples:

import std.stdio;

void main() {
    string str = "Hall\u00E5, V\u00E4rld!";
    foreach (c; str) {
        write('[', c, ']');
    }
    writeln();
}

Prints:
[H][a][l][l][][][,][ ][V][][][r][l][d][!]

Second example:

import std.stdio;

void main() {
    string str = "Hall\u00E5, V\u00E4rld!";
    foreach (dchar c; str) {
        write('[', c, ']');
    }
    writeln();
}

Prints:
[H][a][l][l][å][,][ ][V][ä][r][l][d][!]


The second example should print out:
[H][a][l][l][][,][ ][V][][r][l][d][!] 

This is on DMD 2.047 on Windows.
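For reference, the two loops differ in what they visit. `foreach (c; str)` visits UTF-8 code units (`char`). `foreach (dchar c; str)` decodes each multi-byte sequence into a single code point. Below is a minimal C sketch of that decoding step, for illustration only: it is not how Phobos implements it, the helper names `utf8_next` and `count_code_points` are hypothetical, and it assumes well-formed UTF-8 input. For the string in the samples, it gives 15 code units and 13 code points, which matches the 15 and 13 bracket pairs in the two outputs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one code point from well-formed UTF-8 at *p and advance *p past it.
   Only asserts that *p is not a continuation byte; other malformed input
   (truncated or overlong sequences) is not detected. */
uint32_t utf8_next(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t cp;
    size_t len;

    assert((s[0] & 0xC0) != 0x80);      /* must start on a lead byte */
    if (s[0] < 0x80) {                  /* 0xxxxxxx: ASCII, 1 byte */
        cp = s[0];
        len = 1;
    } else if (s[0] < 0xE0) {           /* 110xxxxx: 2-byte sequence */
        cp = s[0] & 0x1F;
        len = 2;
    } else if (s[0] < 0xF0) {           /* 1110xxxx: 3-byte sequence */
        cp = s[0] & 0x0F;
        len = 3;
    } else {                            /* 11110xxx: 4-byte sequence */
        cp = s[0] & 0x07;
        len = 4;
    }
    for (size_t i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F); /* each continuation byte adds 6 bits */
    *p = s + len;
    return cp;
}

/* Number of code points, i.e. what foreach (dchar c; str) visits.
   strlen(str) is the number of code units that foreach (c; str) visits. */
size_t count_code_points(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    const unsigned char *end = p + strlen(str);
    size_t n = 0;

    while (p < end) {
        utf8_next(&p);
        n++;
    }
    return n;
}
```

Written with explicit bytes, the sample string is `"Hall\xC3\xA5, V\xC3\xA4rld!"`. Here å and ä each take two code units.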
Jul 27 2010
Sean Kelly <sean invisibleduck.org> writes:
Andrej Mitrovic Wrote:

 This is on DMD 2.047 on Windows.

I think it's Windows integration that's the problem, on OSX I get:

[H][a][l][l][?][?][,][ ][V][?][?][r][l][d][!]
[H][a][l][l][å][,][ ][V][ä][r][l][d][!]

which is essentially correct.  The only difference between this and doing the same thing in C and using printf() in place of write() is that both lines display correctly in C.  I think printf() must be detecting partial UTF-8 characters and buffering until the complete chunk has arrived.  Interestingly, the C output can't even be broken by badly timed calls to fflush(), so the buffering is happening at a fairly high level.  I'd be interested in seeing the same thing in write() at some point.
Jul 27 2010
Sean Kelly <sean invisibleduck.org> writes:
Sean Kelly Wrote:
 
 I think it's Windows integration that's the problem, on OSX I get:
 
 [H][a][l][l][?][?][,][ ][V][?][?][r][l][d][!]
 [H][a][l][l][å][,][ ][V][ä][r][l][d][!]
 
 which is essentially correct.  The only difference between this and doing the
same thing in C and using printf() in place of write() is that both lines
display correctly in C.  I think printf() must be detecting partial UTF-8
characters and buffering until the complete chunk has arrived.  Interestingly,
the C output can't even be broken by badly timed calls to fflush(), so the
buffering is happening at a fairly high level.  I'd be interested in seeing the
same thing in write() at some point.

Ah, write() already works that way. It was the brackets that were screwing things up.
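Why the brackets matter: with `foreach (c; str)`, each `char` is a single UTF-8 code unit. `write('[', c, ']')` therefore puts `][` between the lead byte and the continuation byte of å and ä. The byte stream itself becomes invalid UTF-8, so no buffering further downstream can reassemble those characters. The same bytes written without separators remain valid. The minimal C check below shows the difference. It is written for illustration: the name `is_valid_utf8` is hypothetical, and the check covers only lead and continuation structure. It does not reject overlong forms, surrogates, or values above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Structural UTF-8 check: every lead byte must be followed by the right number
   of continuation bytes (10xxxxxx). Overlong encodings, surrogates and values
   above U+10FFFF are NOT rejected; this is a sketch, not a full validator. */
bool is_valid_utf8(const char *str)
{
    assert(str != NULL);
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);
    size_t i = 0;

    while (i < n) {
        size_t len;
        if (s[i] < 0x80)                 len = 1;
        else if ((s[i] & 0xE0) == 0xC0)  len = 2;
        else if ((s[i] & 0xF0) == 0xE0)  len = 3;
        else if ((s[i] & 0xF8) == 0xF0)  len = 4;
        else return false;               /* stray continuation byte or invalid lead */

        if (i + len > n)
            return false;                /* sequence cut off by end of string */
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;            /* expected a continuation byte */
        i += len;
    }
    return true;
}
```

`"[\xC3][\xA5]"` is what the char loop produces for å, and it is rejected. `"[\xC3\xA5]"` is one whole code point in brackets, and it is accepted.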
Jul 27 2010
Sean Kelly <sean invisibleduck.org> writes:
Andrej Mitrovic Wrote:

 On Wed, Jul 28, 2010 at 12:34 AM, Sean Kelly <sean invisibleduck.org> wrote:

 Ah, write() already works that way.  It was the brackets that were screwing
 things up.

 You are right about printf(), I'm getting the correct output with this code:

 import std.stdio, std.stream;

 void main() {
     string str = "Hall\u00E5, V\u00E4rld!";
     foreach (dchar c; str) {
         printf("%c", c);
     }
     writeln();
 }

 Hallå, Värld!

 Should I file this as a Windows bug for DMD?

Yes. I looked into this briefly, and after a bit of googling, it looks like fwide() isn't implemented on Windows (unless Walter had done this himself in the DMC libraries). See here: http://blogs.msdn.com/b/michkap/archive/2009/06/23/9797156.aspx If I change std.stdio.LockingTextWriter.put(C)(C c) to always use the version(Windows) code for a 32-bit argument it *almost* works correctly. Instead of garbage, the Unicode characters are a lowercase o with an accent above (U+01A1 I believe) and an uppercase sigma (U+01A9). I'll have to spend some more time later trying to figure out why it's these characters and not the intended ones. I wouldn't think that endian issues should be relevant, but that's the only thing I've come up with so far.
Jul 27 2010
Sean Kelly <sean invisibleduck.org> writes:
After a bit more research, the situation is a bit more complicated than I
realized.  First, if I compile this C app using DMC:

#include <stdio.h>

int main()
{
    printf( "Hall\u00E5, V\u00E4rld!" );
    return 0;
}

The output is:

Hallσ, VΣrld!

This is what I was seeing once I started messing with std.stdio.  An
improvement I suppose, since it's not garbage, but the output is still
incorrect if you're expecting Unicode.  After a bit of experimenting, it looks
like there are two ways to output non-ASCII correctly in Windows: convert to a
multi-byte string (toMBSz) or call WriteConsoleW.  Here's a test app and the
associated output.  Notice how writeln() has the same output as
printf(unicodeString).

import std.stdio;
import std.string;
import std.utf;
import std.windows.charset;
import core.sys.windows.windows;

void main()
{
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD ignore;
    wchar[] buf = ("\u00E5 \u00E4"w).dup;

    writeln(buf);
    printf("%s\n", toStringz(toUTF8(buf)));
    printf("%s\n", toMBSz(toUTF8(buf), 1));
    WriteConsoleW(h, buf.ptr, buf.length, &ignore, null);
}

prints:

├Ñ ├ñ
├Ñ ├ñ
å ä
å ä

I'd think it should be enough to have std.stdio call the wide char output
routine to have things display correctly, but I tried that and that's when I
got the sigma.  Figuring out what's going on there will take some more work,
and the ultimate fix may end up being in the DMC libraries... I really don't
know.
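This sketch explains where the box-drawing character ├ (U+251C) in the test app's output likely comes from, assuming Sean's console uses CP437 (the usual US OEM code page). writeln() and printf() pass UTF-8 bytes through unchanged, and such a console shows each byte as a separate character. å is encoded as 0xC3 0xA5 and ä as 0xC3 0xA4. In CP437, 0xC3 is ├, 0xA5 is Ñ and 0xA4 is ñ. The C sketch below shows the UTF-8 encoding step. `utf8_encode` is a hypothetical helper, not a Phobos or DMC function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a Unicode scalar value (<= U+10FFFF, not a surrogate) as UTF-8.
   Writes 1 to 4 bytes to out and returns how many were written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {                     /* ASCII: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                  /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* 11110xxx + 3 continuation bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Calling it with U+00E5 yields the two bytes 0xC3 0xA5. A CP437 console renders those bytes as two characters, not as å. The toMBSz and WriteConsoleW routes avoid this because they never send the raw UTF-8 bytes to the console.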
Jul 27 2010
Walter Bright <newshound2 digitalmars.com> writes:
Sean Kelly wrote:
 Yes.  I looked into this briefly, and after a bit of googling, it looks like
 fwide() isn't implemented on Windows (unless Walter had done this himself in
 the DMC libraries).

fwide() has nothing to do with Windows. Yes, it is implemented in dmc, upon which dmd for Windows depends. When writing characters out to Windows, though, you have to be careful what "code page" Windows thinks your app is running in.
Jul 29 2010
Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 When writing characters out to Windows, though, you have to be careful what 
 "code page" Windows thinks your app is running in.

It's valid for char functions. Is it valid that wide functions don't work either?
Jul 29 2010
Walter Bright <newshound2 digitalmars.com> writes:
Kagamin wrote:
 Walter Bright Wrote:
 
 When writing characters out to Windows, though, you have to be careful what 
 "code page" Windows thinks your app is running in.

 It's valid for char functions. Is it valid that wide functions don't work either?

The wide functions are supposed to be utf16, and those should work.
Jul 29 2010
Sean Kelly <sean invisibleduck.org> writes:
Walter Bright <newshound2 digitalmars.com> wrote:
 Kagamin wrote:
 Walter Bright Wrote:
 When writing characters out to Windows, though, you have to be
 careful what "code page" Windows thinks your app is running
 in.

 The wide functions are supposed to be utf16, and those should work.

Surprisingly, they don't appear to work properly. The locale used for the UTF16 to multibyte conversion is the currently set locale, and that prints garbage on my Windows install. I had to use the OEM locale for it to work. I was going to fix this but wasn't sure if std.stdio should be setting the codepage it requires, or if the DMC code is broken (which doesn't seem likely).
Jul 30 2010
Kagamin <spam here.lot> writes:
Sean Kelly Wrote:

 The wide functions are supposed to be utf16, and those should work.

 Surprisingly, they don't appear to work properly. The locale used for the UTF16 to multibyte conversion is the currently set locale, and that prints garbage on my Windows install.

For me it just didn't print non-ASCII characters. Maybe it supports just a small subset of Unicode?
Jul 30 2010
Sean Kelly <sean invisibleduck.org> writes:
Kagamin Wrote:

 Sean Kelly Wrote:
 
 The wide functions are supposed to be utf16, and those should work.

 Surprisingly, they don't appear to work properly. The locale used for the UTF16 to multibyte conversion is the currently set locale, and that prints garbage on my Windows install.

 For me it just didn't print non-ASCII characters. Maybe it supports just a small subset of Unicode?

I think it depends on the default codepage. My guess is that it does just as you described and only passes through ASCII.
Jul 30 2010
Walter Bright <newshound2 digitalmars.com> writes:
Sean Kelly wrote:
 Walter Bright <newshound2 digitalmars.com> wrote:
 Kagamin wrote:
 Walter Bright Wrote:
 When writing characters out to Windows, though, you have to be
 careful what "code page" Windows thinks your app is running
 in.

 Surprisingly, they don't appear to work properly. The locale used for the UTF16 to multibyte conversion is the currently set locale, and that prints garbage on my Windows install. I had to use the OEM locale for it to work. I was going to fix this but wasn't sure if std.stdio should be setting the codepage it requires, or if the DMC code is broken (which doesn't seem likely).

The D functions are supposed to send UTF16 to Windows via the "W" interface. What Windows does with it is up to Windows. The functions are NOT supposed to do a multibyte conversion and send it to the Windows "A" interface, except for the Win9x versions.
Jul 30 2010
Sean Kelly <sean invisibleduck.org> writes:
Walter Bright Wrote:

 Sean Kelly wrote:
 
 Surprisingly, they don't appear to work properly. The locale used for
 the UTF16 to multibyte conversion is the currently set locale, and that
 prints garbage on my Windows install. I had to use the OEM locale for it
 to work. I was going to fix this but wasn't sure if std.stdio should be
 setting the codepage it requires, or if the DMC code is broken (which
 doesn't seem likely).

 The D functions are supposed to send UTF16 to Windows via the "W" interface. What Windows does with it is up to Windows. The functions are NOT supposed to do a multibyte conversion and send it to the Windows "A" interface, except for the Win9x versions.

So the relevant code for printing the described string is essentially as follows:

module std.stdio;

alias _fputc_nlock FPUTC;
alias _fputwc_nlock FPUTWC;

void put(C)(C c) if (is(C : const(dchar)))
{
    int orientation = fwide(fps, 0);
    if (orientation <= 0)
    {
        auto b = std.utf.toUTF8(buf, c);
        foreach (i ; 0 .. b.length)
            FPUTC(b[i], handle);
    }
    else
    {
        if (c <= 0xFFFF)
            FPUTWC(c, handle);
    }
}

Assuming the orientation is wide and the file is open in text mode:

wint_t _fputwc_nlock(wint_t wch, FILE *fp)
{
    char mbc[3];
    int size = wctomb(mbc, wch);
    _fputc_nlock(mbc[0], fp);
    _fputc_nlock(mbc[1], fp);
}

int wctomb(char *s, wchar_t wch)
{
    len = WideCharToMultiByte(__locale_codepage, ...);
}

I found the C code via grep so I may not be looking at the correct implementation of each function, but it matches the behavior I'm seeing. I think the standard C routines were used in D to make sure IO buffers were shared with C, etc. Are you saying this should be changed to use the Windows routines instead? Alternately, is fputwc() really doing the right thing by using the default locale? I'd imagine so except that this approach doesn't work in my tests on Windows.
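One detail in the put() excerpt above concerns the wide branch. `if (c <= 0xFFFF) FPUTWC(c, handle);` does nothing for code points above U+FFFF, so as excerpted, those characters would be dropped rather than written. Sending UTF-16 through the Windows "W" interface, as Walter describes, means splitting such code points into surrogate pairs. Below is a minimal C sketch of that step. It assumes the input is a valid Unicode scalar value, and `utf16_encode` is a hypothetical name, not an existing Phobos, DMC or Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16. Writes 1 or 2 code units to out
   and returns how many were written. Assumes cp is not a surrogate
   (U+D800..U+DFFF) and cp <= U+10FFFF. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;                      /* BMP: one code unit */
        return 1;
    }
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));       /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));     /* low surrogate: bottom 10 bits */
    return 2;
}
```

å (U+00E5) stays a single unit. U+1F600 becomes the pair 0xD83D 0xDE00, which the `c <= 0xFFFF` test in the excerpt would skip.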
Jul 30 2010
Walter Bright <newshound2 digitalmars.com> writes:
Sean Kelly wrote:
 I found the C code via grep so I may not be looking at the correct
 implementation of each function, but it matches the behavior I'm seeing.  I
 think the standard C routines were used in D to make sure IO buffers were
 shared with C, etc.  Are you saying this should be changed to use the Windows
 routines instead?  Alternately, is fputwc() really doing the right thing by
 using the default locale?  I'd imagine so except that this approach doesn't
 work in my tests on Windows.

I don't know, it's been years since I worked on that code. The idea is that D and C writes to stdio can be interleaved.
Jul 31 2010
Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 The D functions are supposed to send UTF16 to Windows via the "W" interface.
 What Windows does with it is up to Windows. The functions are NOT supposed to do
 a multibyte conversion and send it to the Windows "A" interface, except for the
 Win9x versions.

They can't just blindly call WriteConsoleW because, according to MSDN, it fails if stdout is not a console. Shin Fujishiro's code is the correct one.
Jul 31 2010
Kagamin <spam here.lot> writes:
Shin Fujishiro Wrote:

 Now I'm thinking on how to integrate conversion facility to the stdio
 File framework.

I think creating a low-level Unicode console interface will help. Like this:

void putchar(char c) @disable { assert(false); }
void putchar(wchar c) @disable { assert(false); }
void putchar(dchar c) {...}
Jul 29 2010
Kagamin <spam here.lot> writes:
Shin Fujishiro Wrote:

   http://www.dsource.org/projects/phobos/browser/branches/devel/stdio-native-codeset/

I don't quite get what the difference is between GetConsoleCP and CP_OEMCP for Japanese and Korean Windows.
Jul 29 2010
Kagamin <spam here.lot> writes:
Shin Fujishiro Wrote:

 By the way, which CP should be used for redirected stdio: ANSI or OEM?
 I thought ANSI was preferred, but OEM seems to be more commonly used
 for console apps.

I think they just don't care and write text as usual. The C standard was written on the implicit assumption that strings are in the system code page, and transcoding is never mentioned; it's a language for ASCII text. There is even a problem when program code is edited in a GUI editor and saved in the ANSI code page: after compilation, the hard-coded strings are not transcoded and remain in the ANSI code page, printf just writes the text blindly, and so the output is broken.
Jul 30 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On Wed, Jul 28, 2010 at 12:34 AM, Sean Kelly <sean invisibleduck.org> wrote:

 Ah, write() already works that way.  It was the brackets that were screwing
 things up.

You are right about printf(), I'm getting the correct output with this code:

import std.stdio, std.stream;

void main() {
    string str = "Hall\u00E5, V\u00E4rld!";
    foreach (dchar c; str) {
        printf("%c", c);
    }
    writeln();
}

Hallå, Värld!

Should I file this as a Windows bug for DMD?
Jul 27 2010
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Black unicode magic.

It's not a big issue for me, but it probably will be for people who deal
with Unicode all the time. Personally, ASCII is good enough for me. :)

Thanks for your efforts!

Jul 28 2010
Shin Fujishiro <rsinfu gmail.com> writes:
Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 You are right about printf(), I'm getting the correct output with this code:
 
 import std.stdio, std.stream;
 
 void main() {
     string str = "Hall\u00E5, V\u00E4rld!";
     foreach (dchar c; str) {
         printf("%c", c);
     }
     writeln();
 }
 
 Hallå, Värld!

The reason why printf printed the correct characters is probably that the console was working in Windows-1252 (a variant of ISO-8859-1). The ISO-8859-1 (aka Latin-1) coded character set is compatible with Unicode: for example, Latin-1 0xE5 corresponds to U+00E5, and both represent the character å. Because of this, your console could _occasionally_ print Latin-1-compatible Unicode characters.

The reason that Sean saw õ and Õ was that his console worked in CP850, I believe. In the CP850 coded character set, 0xE4 = õ and 0xE5 = Õ.

D/Phobos works in Unicode, but the system (console) works in a different codeset. As Kagamin pointed out, Phobos must transcode Unicode to the system's native codeset to print characters correctly (even on Linux).

By the way, I'm working on this problem in a devel branch:

  http://www.dsource.org/projects/phobos/browser/branches/devel/stdio-native-codeset/

The native codeset transcoder (std/internal/stdio/nativechar.d) is done. Now I'm thinking about how to integrate the conversion facility into the stdio File framework.

Shin
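Shin's explanation can be checked byte by byte. `printf("%c", c)` converts its argument to `unsigned char`. Andrej's loop therefore emits U+00E5 and U+00E4 as the single bytes 0xE5 and 0xE4, and the console's code page alone decides what appears on screen. The C sketch below tabulates just those two bytes for three code pages, using values from the published Windows-1252, CP850 and CP437 charts. The names `percent_c_byte` and `shown_as` are illustrative, not from any library. For comparison, CP437 is the usual US OEM code page. It maps 0xE5 and 0xE4 to σ and Σ, the characters in Sean's DMC output earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code points that three code pages assign to bytes 0xE4 and 0xE5.
   Only these two bytes are tabulated: an illustration, not a converter. */
struct code_page {
    const char *name;
    uint32_t byte_e4;   /* character displayed for byte 0xE4 */
    uint32_t byte_e5;   /* character displayed for byte 0xE5 */
};

static const struct code_page pages[] = {
    { "Windows-1252", 0x00E4 /* ä */, 0x00E5 /* å */ },
    { "CP850",        0x00F5 /* õ */, 0x00D5 /* Õ */ },
    { "CP437",        0x03A3 /* Σ */, 0x03C3 /* σ */ },
};

/* printf("%c", c) converts its int argument to unsigned char, so a code point
   below 0x100 comes out as the Latin-1 byte with the same value (larger
   values are truncated to their low 8 bits). */
unsigned char percent_c_byte(uint32_t code_point)
{
    return (unsigned char)code_point;
}

/* Code point that a console using pages[page] displays for `byte`,
   or 0 if the byte is not one of the two tabulated here. */
uint32_t shown_as(size_t page, unsigned char byte)
{
    assert(page < sizeof pages / sizeof pages[0]);
    if (byte == 0xE4)
        return pages[page].byte_e4;
    if (byte == 0xE5)
        return pages[page].byte_e5;
    return 0;
}
```

The same byte 0xE5 comes out as å, Õ or σ depending only on which table the console uses. That is why transcoding to the native code page, as Shin proposes, is needed for portable output.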
Jul 29 2010
Shin Fujishiro <rsinfu gmail.com> writes:
Kagamin <spam here.lot> wrote:
 Shin Fujishiro Wrote:
 
   http://www.dsource.org/projects/phobos/browser/branches/devel/stdio-native-codeset/

 I don't quite get what the difference is between GetConsoleCP and CP_OEMCP for Japanese and Korean Windows.

The user might change the console code page with the chcp command, or the program might change it. CP_OEMCP does not track such changes.

By the way, which CP should be used for redirected stdio: ANSI or OEM? I thought ANSI was preferred, but OEM seems to be more commonly used for console apps.

Shin
Jul 29 2010