
digitalmars.D.learn - unicode characters are not printed correctly on the windows command

reply moth <postmaster gmail.com> writes:
hi all.

been learning d for the last few years but suddenly realised...

when i use this code:

writeln('♥');

the output displayed on the windows command line is "ÔÖÑ" [it 
works fine when piped directly into a text file, however].

i've looked about in this forum, but all that i could find was 
people in 2016[!] saying the codepage had to be altered - clearly 
nonsense, since Rust [which i am also learning] has no problem 
whatsoever displaying "♥".

is there any function i can call or setting i can adjust to get D 
to do the same, or do i have to wait for something to be fixed in 
the language / compiler itself?

best regards

moth [su.angel-island.zone]
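
The "ÔÖÑ" in the question is consistent with the UTF-8 bytes of '♥' being displayed through a legacy OEM code page. A minimal sketch, assuming the console is on code page 850 (an assumption; the post does not say which code page is active): '♥' is U+2665, UTF-8 encodes it as the bytes E2 99 A5, and code page 850 maps those three bytes to Ô, Ö and Ñ. The helper name utf8Hex is made up for illustration.

```d
import std.format : format;
import std.stdio : writeln;
import std.string : representation;

/// Hypothetical helper: renders the UTF-8 code units of `text` as hex.
string utf8Hex(string text)
{
    // representation() exposes the string as immutable(ubyte)[]
    return format("%(%02X %)", text.representation);
}

void main()
{
    writeln(utf8Hex("♥")); // E2 99 A5
}
```

This also explains why piping to a file works: the bytes themselves are valid UTF-8, only the console's interpretation of them is off.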
Dec 21 2019
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 22/12/2019 7:11 PM, moth wrote:
 hi all.
 
 been learning d for the last few years but suddenly realised...
 
 when i use this code:
 
 writeln('♥');
 
 the output displayed on the windows command line is "ÔÖÑ" [it works fine 
 when piped directly into a text file, however].
 
 i've looked about in this forum, but all that i could find was people in 
 2016[!] saying the codepage had to be altered - clearly nonsense, since 
 Rust [which i am also learning] has no problem whatsoever displaying "♥".
This is not nonsense. This is the correct solution if that is what you intend for your program to do. Not everybody will want this. They may have set the code page themselves in some way. It may not have even occurred within a D application! It's best we leave it as the default to play nice with other applications and libraries.
 is there any function i can call or setting i can adjust to get D to do 
 the same, or do i have to wait for something to be fixed in the language 
 / compiler itself?
 
 best regards
 
 moth [su.angel-island.zone]
 
Not a bug. This is a known issue on the Windows side for people new to developing natively for it. I just checked the terminal emulator I use, ConEmu, and yeah, it doesn't have to do anything settings-wise to make Unicode "just work". It's conhost, with its legacy, that you are facing.
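
For readers who do want the code page route described above, here is a minimal sketch, assuming the druntime Windows bindings (core.sys.windows.windows) and a console font that has the glyph. The helper name withUtf8Console is hypothetical. It switches the console output code page to UTF-8 for the duration of a call and restores the previous one afterwards, so other programs sharing the console are left as they were.

```d
import std.stdio : stdout, writeln;

version (Windows)
{
    import core.sys.windows.windows;
}

/// Hypothetical helper: runs `action` with the console output code page
/// set to UTF-8, then restores whatever was active before.
void withUtf8Console(scope void delegate() action)
{
    version (Windows)
    {
        enum uint utf8CodePage = 65001; // the value Windows calls CP_UTF8
        immutable previous = GetConsoleOutputCP();
        SetConsoleOutputCP(utf8CodePage);
        scope (exit)
        {
            stdout.flush();               // emit buffered text while UTF-8 is active
            SetConsoleOutputCP(previous); // leave the console as we found it
        }
    }
    action();
}

void main()
{
    withUtf8Console({ writeln('♥'); });
}
```

On other platforms the helper just calls the delegate, so the same program builds everywhere.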
Dec 21 2019
next sibling parent Mike Parker <aldacron gmail.com> writes:
On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole 
wrote:
 On 22/12/2019 7:11 PM, moth wrote:
 is there any function i can call or setting i can adjust to 
 get D to do the same, or do i have to wait for something to be 
 fixed in the language / compiler itself?
 
Not a bug. This is a known issue on the Windows side for people new to developing natively for it.
Yes, and it's not just D programs. And setting the code page isn't always perfect, as it matters which font cmd is configured to use. Google for "windows command prompt unicode output".

MS has updated the command prompt to support Unicode, but I don't know how to use it:

https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/

If you're on Windows 10, there's also Windows Terminal, which was released on the app store in June:

https://devblogs.microsoft.com/commandline/windows-terminal-preview-v0-7-release/
Dec 22 2019
prev sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole 
wrote:
 Not a bug.
No, Phobos is *clearly* in the wrong here. There is a proper fix.

http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#unicode

Use the correct WriteConsoleW api instead of the ancient ascii api. WriteConsoleW works without changing any settings. (on old versions of Windows, you may have to install fonts to display it, but new ones come with it all preinstalled).
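
A minimal sketch of what calling WriteConsoleW from D can look like, assuming the druntime Windows bindings. This is not the code from the linked blog post; the function name consoleWrite and its bool return value are made up for illustration. It converts UTF-8 to UTF-16, tries the console API, and falls back to the ordinary std.stdio path when the call fails (for example when output is redirected to a file or pipe).

```d
import std.stdio : stdout, write;

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;
}

/// Hypothetical helper, not part of Phobos.
/// Returns true if the text went through the Windows console API.
bool consoleWrite(const(char)[] text)
{
    version (Windows)
    {
        stdout.flush(); // keep ordering with earlier buffered output
        const wide = toUTF16(text);
        DWORD written;
        // WriteConsoleW only succeeds on a real console handle.
        if (WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), wide.ptr,
                cast(DWORD) wide.length, &written, null))
            return true;
    }
    write(text); // non-Windows, or output redirected to a file or pipe
    return false;
}

void main()
{
    consoleWrite("♥\n");
}
```

The sketch ignores partial writes; a fuller version would loop on the `written` count.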
Dec 22 2019
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/22/19 8:40 AM, Adam D. Ruppe wrote:
 On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole wrote:
 Not a bug.
No, Phobos is *clearly* in the wrong here. There is a proper fix.
Phobos doesn't call the wrong function, libc does. Phobos uses fwrite for output.
 http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#unicode
You need to address that in DMC. I wonder, does MSVCRT have the same problem? -Steve
Dec 22 2019
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 22 December 2019 at 18:41:16 UTC, Steven Schveighoffer 
wrote:
 Phobos doesn't call the wrong function, libc does. Phobos uses 
 fwrite for output.
There is allegedly a way to set fwrite to do the translations on MSVCRT:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=vs-2019

but trying it here it throws invalid parameter exception so idk.

Regardless, I'm pretty well of the opinion that fwrite is the wrong thing to do anyway. fwrite writes bytes to a file, but we want to write strings to the console. There's other functions that do that.

There is the worry of mixing stuff from C and keeping the buffer consistent, but it could always just flush() before doing its thing too. Or maybe even merge the buffers, idk what the MS runtime supports for that.

or maybe i'm missing something and _setmode is a viable solution.

But whatever we do, passing the buck isn't solving anything. Windows has supported Unicode console output since NT 4.0 in 1996.. just have to call the right function, and whether it is Phobos calling it or druntime or the CRT, someone just needs to do it!
Dec 22 2019
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/22/19 5:04 PM, Adam D. Ruppe wrote:
 On Sunday, 22 December 2019 at 18:41:16 UTC, Steven Schveighoffer wrote:
 Phobos doesn't call the wrong function, libc does. Phobos uses fwrite 
 for output.
There is allegedly a way to set fwrite to do the translations on MSVCRT: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=vs-2019
Looks like you need to switch to "wprintf". I'm not sure, but I think we rely only on fwrite, for which there is no "w" equivalent.
 but trying it here it throws invalid parameter exception so idk.
Not surprised ;)

Here's a cool feature of Windows:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fwide?view=vs-2019

Basically does nothing, all parameters ignored (and yes, we use this function in Phobos, assuming it does something).

But let me just say, the fact that there is some "mode" you have to set, like binary mode, that makes unicode work is unsettling. I hate libc streams...
 
 Regardless, I'm pretty well of the opinion that fwrite is the wrong 
 thing to do anyway. fwrite writes bytes to a file, but we want to write 
 strings to the console. There's other functions that do that.
Preaching to the choir here. I wanted to rip out libc reliance a decade ago.
 There is the worry of mixing stuff from C and keeping the buffer 
 consistent, but it could always just flush() before doing its thing too. 
 Or maybe even merge the buffers, idk what the MS runtime supports for that.
This is the crux. Some people gotta have their printf. And if you do different types of buffered streams, the result even from single-threaded output looks like garbage. The only solution is to wrap FILE *. And I do mean only. I looked into trying to hook the buffers. There's no reliable way without knowing all the implementation details.
 or maybe i'm missing something and _setmode is a viable solution.
_setmode is on a file descriptor. That already is a red flag to me, as there are no file descriptors in the OS. Windows uses handles. So this has some weird library "translation" happening underneath. Ugh.
 But whatever we do, passing the buck isn't solving anything. Windows has 
 supported Unicode console output since NT 4.0 in 1996.. just have to 
 call the right function, and whether it is Phobos calling it or druntime 
 or the CRT, someone just needs to do it!
Hey, you can always just call the function yourself! Just make an output stream that writes with the right function, and then you can use formattedWrite instead of writef.

To fix Phobos, we just(!) need to remove libc as the underlying stream implementation.

I had at one point agreement from Walter to make a "backwards-compatible-ish" mechanism for file/streams. But it's not pretty, and was convoluted. At the time, I was struggling getting what would become iopipe to be usable on its own, and I eventually quit worrying about that aspect of it.

We have the basic building blocks with https://github.com/MartinNowak/io and https://github.com/schveiguy/iopipe. It would be cool to get this into Phobos, but it's a lot of work.

I bet Rust just skips libc altogether.

-Steve
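
A rough sketch of the "make an output stream and use formattedWrite" idea, under stated assumptions: ConsoleSink and its emit field are hypothetical names, not Phobos or iopipe types. The sink is an output range that collects formatted text and hands the complete buffer to whatever function does the actual writing. Here stdout.rawWrite stands in for that function; on Windows a WriteConsoleW-based writer would be plugged in instead.

```d
import std.format : formattedWrite;
import std.stdio : stdout;

/// Hypothetical output range: collects formatted text, then hands the
/// whole buffer to `emit`, which is where the "right function" goes.
struct ConsoleSink
{
    void delegate(const(char)[]) emit;
    char[] pending;

    /// Called by formattedWrite, possibly many times per format call.
    void put(const(char)[] chunk)
    {
        pending ~= chunk;
    }

    /// Pass everything collected so far to `emit` in one piece.
    void flush()
    {
        if (pending.length == 0)
            return;
        emit(pending);
        pending.length = 0;
    }
}

void main()
{
    auto sink = ConsoleSink(delegate (const(char)[] text) {
        stdout.rawWrite(text); // stand-in for a console-specific writer
    });
    sink.formattedWrite("%s x %d\n", "♥", 3);
    sink.flush();
}
```

Buffering until flush() means the writer never sees half of a multi-byte UTF-8 sequence, which matters if it converts to UTF-16 before writing.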
Dec 22 2019
parent reply Symphony <a a.a> writes:
On Sunday, 22 December 2019 at 22:47:43 UTC, Steven Schveighoffer 
wrote:
 To fix Phobos, we just(!) need to remove libc as the underlying 
 stream implementation.

 I had at one point agreement from Walter to make a 
 "backwards-compatible-ish" mechanism for file/streams. But it's 
 not pretty, and was convoluted. At the time, I was struggling 
 getting what would become iopipe to be usable on its own, and I 
 eventually quit worrying about that aspect of it.

 We have the basic building blocks with 
 https://github.com/MartinNowak/io and 
 https://github.com/schveiguy/iopipe. It would be cool to get 
 this into Phobos, but it's a lot of work.

 I bet Rust just skips libc altogether.

 -Steve
I don't have the ingenuity, intelligence, nor experience that many of you possess, but I have *a lot* of time on my hands for something like this. I assume I should start with std.stdio's source code and the aforementioned projects' source code, but some guidance on this would be very helpful, if not needed. D has been quite useful to me since I stumbled upon it, and I think it's time to give back in some way. (I'd do it financially, but I'm poor, haha) Anyway, if anybody wants to take me up on this offer, just let me know!
Dec 22 2019
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/22/19 11:53 PM, Symphony wrote:
 On Sunday, 22 December 2019 at 22:47:43 UTC, Steven Schveighoffer wrote:
 To fix Phobos, we just(!) need to remove libc as the underlying stream 
 implementation.

 I had at one point agreement from Walter to make a 
 "backwards-compatible-ish" mechanism for file/streams. But it's not 
 pretty, and was convoluted. At the time, I was struggling getting what 
 would become iopipe to be usable on its own, and I eventually quit 
 worrying about that aspect of it.

 We have the basic building blocks with 
 https://github.com/MartinNowak/io and 
 https://github.com/schveiguy/iopipe. It would be cool to get this into 
 Phobos, but it's a lot of work.

 I bet Rust just skips libc altogether.
I don't have the ingenuity, intelligence, nor experience that many of you possess, but I have *a lot* of time on my hands for something like this. I assume I should start with std.stdio's source code and the aforementioned projects' source code, but some guidance on this would be very helpful, if not needed. D has been quite useful to me since I stumbled upon it, and I think it's time to give back in some way. (I'd do it financially, but I'm poor, haha) Anyway, if anybody wants to take me up on this offer, just let me know!
I really appreciate the enthusiasm here, but at the risk of being cynical, I see little chance that this gets accepted. Before you spend any time on actual code, a DIP is going to be required, as this would be a huge change to the language. I'm sure you have a lot of time, but I don't want you to waste it on something that is likely to be rejected.

If you still want to proceed, even at the risk of doing a lot of work for nothing (or at least, a lot of work that ends up being just on code.dlang.org instead of Phobos), I can tell you what my plan was:

1. std.stdio.File was going to be set up to source from either an iopipe-based io subsystem, or a FILE *.

2. The standard handles would be open with the default C FILE * standard handles as the source/target.

3. Upon using any "d-like" features on a File that is sourced from a FILE * (i.e. byline), the File would be switched to a newly-created iopipe-based source. The theory is here, that once you do something like this, you commit to using D on that, and I'd much rather use a higher performing subsystem (iopipe beats Phobos right now by 2x performance). This only counts for things that make the File unusable on its own anyway. So writefln and writeln would NOT switch the source, neither would lockingTextReader/Writer.

4. Any new File that is opened using any constructor other than passing in a FILE * will be opened with an iopipe source.

5. The iopipe and io subsystems can be used directly instead of with File, as a lot of times you don't need that overhead.

Let me know if you decide to do this, I can guide you.

-Steve
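
To make the plan above a little more concrete, here is a toy sketch of the source-switching idea. Everything in it is hypothetical: DualFile, PipeSource and startLineIteration are invented names, and PipeSource is only a stand-in that collects text in memory; it does not resemble the real iopipe API or the real std.stdio.File.

```d
import core.stdc.stdio : FILE, fflush, fwrite, stdout;

/// Hypothetical stand-in for an iopipe-based stream (not iopipe's API).
final class PipeSource
{
    char[] written;
    void write(const(char)[] text) { written ~= text; }
}

/// Hypothetical File-like wrapper with two possible sources (step 1).
struct DualFile
{
    private FILE* cStream;   // libc-backed source
    private PipeSource pipe; // D-native source

    /// Steps 2 and 4: wrapping a FILE* keeps libc as the source.
    this(FILE* stream) { cStream = stream; }

    /// Step 4: any other way of opening starts on the pipe source.
    static DualFile open()
    {
        DualFile f;
        f.pipe = new PipeSource;
        return f;
    }

    bool usesPipe() const { return pipe !is null; }

    /// Step 3: plain writes never switch the source.
    void write(const(char)[] text)
    {
        if (usesPipe)
            pipe.write(text);
        else
            fwrite(text.ptr, 1, text.length, cStream);
    }

    /// Step 3: a "D-like" feature commits the file to the pipe source.
    void startLineIteration()
    {
        if (!usesPipe)
        {
            fflush(cStream); // hand over from libc with nothing left buffered
            pipe = new PipeSource;
        }
    }
}

void main()
{
    auto f = DualFile(stdout);
    f.write("still going through libc\n");
    assert(!f.usesPipe);
    f.startLineIteration();
    assert(f.usesPipe);
}
```

The switch only ever goes from the FILE* source to the pipe source, which matches the "once you do something like this, you commit to using D" reasoning in the plan.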
Dec 23 2019
next sibling parent reply bachmeier <no spam.net> writes:
On Monday, 23 December 2019 at 15:34:13 UTC, Steven Schveighoffer 
wrote:

 I really appreciate the enthusiasm here, but at the risk of 
 being cynical, I see little chance that this gets accepted. 
 Before you spend any time on actual code, a DIP is going to be 
 required, as this would be a huge change to the language. I'm 
 sure you have a lot of time, but I don't want you to waste it 
 on something that is likely to be rejected.

 If you still want to proceed, even at the risk of doing a lot 
 of work for nothing (or at least, a lot of work that ends up 
 being just on code.dlang.org instead of Phobos)
Just out of curiosity, what would be the advantage of having something like this in Phobos rather than as a separate package?
Dec 23 2019
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/23/19 10:48 AM, bachmeier wrote:
 On Monday, 23 December 2019 at 15:34:13 UTC, Steven Schveighoffer wrote:
 
 I really appreciate the enthusiasm here, but at the risk of being 
 cynical, I see little chance that this gets accepted. Before you spend 
 any time on actual code, a DIP is going to be required, as this would 
 be a huge change to the language. I'm sure you have a lot of time, but 
 I don't want you to waste it on something that is likely to be rejected.

 If you still want to proceed, even at the risk of doing a lot of work 
 for nothing (or at least, a lot of work that ends up being just on 
 code.dlang.org instead of Phobos)
Just out of curiosity, what would be the advantage of having something like this in Phobos rather than as a separate package?
It means that all of Phobos can take advantage of the better performance and other benefits. For instance, std.process uses File (and therefore FILE *) as its streams for the pipes to the child process. This has huge limitations. -Steve
Dec 23 2019
prev sibling parent reply Symphony <a a.a> writes:
On Monday, 23 December 2019 at 15:34:13 UTC, Steven Schveighoffer 
wrote:
 I really appreciate the enthusiasm here, but at the risk of 
 being cynical, I see little chance that this gets accepted. 
 Before you spend any time on actual code, a DIP is going to be 
 required, as this would be a huge change to the language. I'm 
 sure you have a lot of time, but I don't want you to waste it 
 on something that is likely to be rejected.

 If you still want to proceed, even at the risk of doing a lot 
 of work for nothing (or at least, a lot of work that ends up 
 being just on code.dlang.org instead of Phobos), I can tell you 
 what my plan was:

 1. std.stdio.File was going to be set up to source from either 
 an iopipe-based io subsystem, or a FILE *.

 2. The standard handles would be open with the default C FILE * 
 standard handles as the source/target.

 3. Upon using any "d-like" features on a File that is sourced 
 from a FILE * (i.e. byline), the File would be switched to a 
 newly-created iopipe-based source. The theory is here, that 
 once you do something like this, you commit to using D on that, 
 and I'd much rather use a higher performing subsystem (iopipe 
 beats Phobos right now by 2x performance). This only counts for 
 things that make the File unusable on its own anyway. So 
 writefln and writeln would NOT switch the source, neither would 
 lockingTextReader/Writer.

 4. Any new File that is opened using any constructor other than 
 passing in a FILE * will be opened with an iopipe source.

 5. The iopipe and io subsystems can be used directly instead of 
 with File, as a lot of times you don't need that overhead.

 Let me know if you decide to do this, I can guide you.

 -Steve
Pardon my ignorance, but wouldn't the inclusion of a std.io (e.g. Martin Nowak's io library) into Phobos be an easier and cleaner move? Other Phobos modules that require std.stdio could be gradually changed so that they use std.io instead. There would be the issue of two coexisting IO libraries in std, but issuing some warnings whenever std.stdio is imported wouldn't be too bad in my view; that is unless Mr. Bright's opposition is the main blocker.
Dec 23 2019
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/23/19 2:52 PM, Symphony wrote:

 Pardon my ignorance, but wouldn't the inclusion of a std.io (e.g. Martin 
 Nowak's io library) into Phobos be an easier and cleaner move? Other 
 Phobos modules that require std.stdio could be gradually changed so that 
 they use std.io instead.
Well, that's certainly a lot easier project. But one might question whether we should do it unless we have a reason to have Phobos start using it. As bachmeier mentioned, it can happily exist in its own location. The "gradual change" thing, I don't know how that works. Also note that std.io has no buffering. You need something like iopipe on top of it for it to be reasonably usable.
 There would be the issue of two coexisting IO 
 libraries in std, but issuing some warnings whenever std.stdio is 
 imported wouldn't be too bad in my view; that is unless Mr. Bright's 
 opposition is the main blocker.
It's not without precedent though. There actually was an alternate stream system in Phobos, now in undead: https://github.com/dlang/undeaD/blob/master/src/undead/stream.d

But I think before we think about making the attempt to get this accepted, we really need to flesh out the end goal. The maintainers have soured a bit, I think, on the std.experimental location, especially since we do have code.dlang.org. The bar for entry is high for Phobos.

My recommendation is to focus on getting the std.io project and the iopipe project to be usable and fully featured. Then it may be a much easier task to convince leadership that they should be in Phobos.

-Steve
Dec 23 2019
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Dec 22, 2019 at 10:04:20PM +0000, Adam D. Ruppe via Digitalmars-d-learn
wrote:
[...]
 Regardless, I'm pretty well of the opinion that fwrite is the wrong
 thing to do anyway. fwrite writes bytes to a file, but we want to
 write strings to the console. There's other functions that do that.
[...]

Would it make sense for std.stdio.write* (the package global functions, as opposed to File.write*) to use the Windows console output functions instead of proxying to libc?

Alternatively, we could change std.stdio.File to check if the current file descriptor is the console (fd == stdout && stdout == console, however you figure that out in Windows), and silently switch to the Windows console output functions instead of libc. We *are* already wrapping libc's FILE*, why not wrap the Windows console output functions as well.

Mixing raw libc printf with std.stdio.write* is a bad idea anyway; do we really need to support that?? Though calling fflush(stdout) may not be amiss, just to alleviate sudden breakage and ensuing complaints.

And of course, this only applies to Windows. On Posix libc is pretty much still the standard way of working with console output.

T

--
VI = Visual Irritation
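
On the "however you figure that out in Windows" part, one possible sketch, assuming the druntime bindings: GetConsoleMode succeeds only when the handle refers to a console, so its return value can serve as the check. The function name stdoutIsConsole is hypothetical; the Posix branch uses isatty for comparison.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: true when standard output is a console window.
    bool stdoutIsConsole()
    {
        HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode;
        // Fails (returns 0) for files, pipes and invalid handles.
        return GetConsoleMode(handle, &mode) != 0;
    }
}
else
{
    import core.sys.posix.unistd : isatty;

    /// Hypothetical helper: true when standard output is a terminal.
    bool stdoutIsConsole()
    {
        return isatty(1) != 0; // 1 is the standard output descriptor
    }
}

void main()
{
    writeln(stdoutIsConsole() ? "console" : "redirected");
}
```

A File wrapper could run this check once per handle and pick the console functions or the libc path accordingly.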
Dec 23 2019
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/23/19 10:25 AM, H. S. Teoh wrote:
 On Sun, Dec 22, 2019 at 10:04:20PM +0000, Adam D. Ruppe via
Digitalmars-d-learn wrote:
 [...]
 Regardless, I'm pretty well of the opinion that fwrite is the wrong
 thing to do anyway. fwrite writes bytes to a file, but we want to
 write strings to the console. There's other functions that do that.
[...] Would it make sense for std.stdio.write* (the package global functions, as opposed to File.write*) to use the Windows console output functions instead of proxying to libc?
That means we have to buffer separately, which means we have a problem interleaving printf with writef. It would be awful.
 Alternatively, we could change std.stdio.File to check if the current
 file descriptor is the console (fd == stdout && stdout == console,
 however you figure that out in Windows), and silently switch to the
 Windows console output functions instead of libc.  We *are* already
 wrapping libc's FILE*, why not wrap the Windows console output functions
 as well.
Again, the docs say you have to use wprintf, not fwrite. We would have to switch to using wprintf, and I'm not sure it's a very easy thing to do. It might be possible though.
 
 Mixing raw libc printf with std.stdio.write* is a bad idea anyway; do we
 really need to support that??  Though calling fflush(stdout) may not be
 amiss, just to alleviate sudden breakage and ensuing complaints.
There's this guy, his name is Walter. He likes printf. I'm pretty sure when he's buried, his cold dead fingers will be tightly and inextricably wrapped around printf.
 And of course, this only applies to Windows. On Posix libc is pretty
 much still the standard way of working with console output.
The source of this thread is for valid unicode to come out on the screen, which I'm pretty sure Posix systems support just fine. Other than that, there are good reasons NOT to use libc, but this is disruptive and difficult to get right as a "drop in" -Steve
Dec 23 2019
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 23 December 2019 at 15:41:33 UTC, Steven Schveighoffer 
wrote:
 That means we have to buffer separately, which means we have a 
 problem interleaving printf with writef. It would be awful.
Or simply don't buffer. Any call you get, flush the C buffer and write the D stuff immediately. Remember, this code branch is only called if we already know it is an interactive console. They're usually flushed frequently (at least at every line) anyway... so especially with writeln / writefln those are virtually guaranteed and certainly expected to flush at the end anyway. I really don't think any performance concern would be significant.
Dec 23 2019
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/23/19 11:02 AM, Adam D. Ruppe wrote:
 On Monday, 23 December 2019 at 15:41:33 UTC, Steven Schveighoffer wrote:
 That means we have to buffer separately, which means we have a problem 
 interleaving printf with writef. It would be awful.
Or simply don't buffer. Any call you get, flush the C buffer and write the D stuff immediately.
Unbuffered output would perform badly, especially if you are writing characters at a time (which is what formattedWrite does). But I think this would solve the interleaving problem.
 Remember, this code branch is only called if we already know it is an 
 interactive console. They're usually flushed frequently (at least at 
 every line) anyway... so especially with writeln / writefln those are 
 virtually guaranteed and certainly expected to flush at the end anyway. 
 I really don't think any performance concern would be significant.
Honestly, I think it sounds horrible to have yet another special case for this specific situation. But also, I almost never use Windows for D work, so I'm fine if you want to duct tape some more cruft onto that branch. std.stdio is already a pretty big mess. -Steve
Dec 23 2019
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Dec 23, 2019 at 10:41:33AM -0500, Steven Schveighoffer via
Digitalmars-d-learn wrote:
[...]
 There's this guy, his name is Walter. He likes printf. I'm pretty sure
 when he's buried, his cold dead fingers will be tightly and
 inextricably wrapped around printf.
[...]

But that's not a problem; since he loves printf so much, he'd never use std.stdio.write* in the first place. No conflict there. :-D

T

--
INTEL = Only half of "intelligence".
Dec 23 2019
prev sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 22 December 2019 at 06:11:13 UTC, moth wrote:
 is there any function i can call or setting i can adjust to get 
 D to do the same, or do i have to wait for something to be 
 fixed in the language / compiler itself?
It isn't the language/compiler per se, it is the library calling the wrong function. See the code in the link in my last email - if you call the Windows WriteConsoleW function directly it will do what you want. The rest of the surrounding code in the link is to handle conversions and pipes to files.
Dec 22 2019