digitalmars.D - Windows console is broken

Sergey Gromov <snake.scaly gmail.com> writes:
Sorry to mention it again, but it is.

If a command-line argument to a D program contains a non-ASCII character, that
argument doesn't get into main().  This happens even if the console code page is
65001.  This is most annoying because it cannot be worked around.

The standard output does not work with non-UTF-8 consoles.  But the console code
page is national/traditional by default.  To make writeln() work, you must
switch to the 65001 code page AND change to a console font which supports Unicode,
which means you're stuck with Lucida Console.  Not a perfect solution,
especially if a tool is developed for use by other people.

All in all, when it comes to simple utilities, I just put D aside and switch to
batch/C/perl/whatever.  As it says in the Phobos philosophy, "Simple Operations
should be Simple."  I don't need a standard output that doesn't work.

SnakE
Jan 30 2008
Sean Kelly <sean f4.ca> writes:
Sergey Gromov wrote:
 Sorry to mention it again, but it is.

 If a command-line argument to a D program contains a non-ASCII character, that
 argument doesn't get into main().  This happens even if the console code page is
 65001.  This is most annoying because it cannot be worked around.

This has worked properly in Tango since its release over a year ago.

 All in all, when it comes to simple utilities, I just put D aside and switch
 to batch/C/perl/whatever.  As it says in the Phobos philosophy, "Simple
 Operations should be Simple."  I don't need a standard output that doesn't work.

Personally, I feel that for scripting-type applications, the best solution may be to build a custom wrapper around the standard library which provides the utmost in convenience with no concern for efficiency. We've actually talked about this in relation to Tango, but it would mean yet another API for people to learn and even more code to maintain.

Perhaps this would be a good third-party project for someone so inclined?

Sean
Jan 30 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sean Kelly Wrote:
 Sergey Gromov wrote:
  If a command-line argument to a D program contains a non-ASCII
  character, that argument doesn't get into main().  This happens even
  if the console code page is 65001.  This is most annoying because it
  cannot be worked around.

 This has worked properly in Tango since its release over a year ago.

Thanks for mentioning Tango. I've tried it, and it really worked for both input and output. Still, it would be nice to have correct command-line handling in the official distribution. The command line is parsed in the application startup code, the OS is known, the API is there, and there's no performance hit.

  All in all, when it comes to simple utilities, I just put D aside and
  switch to batch/C/perl/whatever.  As it says in the Phobos philosophy,
  "Simple Operations should be Simple."  I don't need a standard
  output that doesn't work.

 Personally, I feel that for scripting-type applications, the best solution may be to build a custom wrapper around the standard library which provides the utmost in convenience with no concern for efficiency. We've actually talked about this in relation to Tango, but it would mean yet another API for people to learn and even more code to maintain.

If stdout were a bit more than an _iobuf, it would be possible to replace it with a more sophisticated stream aware of the nature of the console. But it's essentially only a POSIX file handle, and Windows doesn't allow for custom streams.

 Perhaps this would be a good third-party project for someone so inclined?

It feels somewhat wrong to use a custom library for a one-file utility. But I'll probably try.
Jan 30 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sean Kelly Wrote:
 Perhaps this would be a good third-party project for someone so inclined?

Here's what I came to.

// Codepage-enabled console output functions for D and Phobos
// Copyright 2008 Sergey Gromov

module snake.cout;

import std.conv;
import std.stdio;
import std.format;
import std.c.windows.windows;

version (Windows)
{
    void cwrite(T, R...)(T t, R r)
    {
        foreach (ch; to!(dchar[])(to!(string)(t)))
            cout(ch);
        static if (r.length)
            cwrite(r);
    }

    void cwritef(...)
    {
        doFormat((dchar c){cout(c);}, _arguments, _argptr);
    }

    void cwriteln(T...)(T args)
    {
        cwrite(args, '\n');
    }

    void cwritefln(T...)(T args)
    {
        cwritef(args, '\n');
    }

    void cout(dchar c)
    {
        wchar src = c;
        char[4] buf; // buffer for a converted char
        auto used = WideCharToMultiByte(GetOEMCP(), 0, &src, 1,
            buf.ptr, buf.length, "?", null);
        fwrite(buf.ptr, 1, used, stdout);
    }
}
else
{
    alias write cwrite;
    alias writef cwritef;
    alias writeln cwriteln;
    alias writefln cwritefln;
}

import std.file;
import std.contracts;

unittest
{
    auto testFileName = "cout-test-file.tmp";
    {
        FILE* save_stdout = stdout;
        scope(exit) stdout = save_stdout;
        stdout = enforce(fopen(testFileName, "wb"));
        scope(failure) remove(testFileName);
        scope(exit) fclose(stdout);
        cwriteln("This is %dth day", 14);
        cwritef("This is %dth day", 14);
    }
    scope(exit) remove(testFileName);
    string result = cast(string) read(testFileName);
    assert(result == "This is %dth day14\nThis is 14th day");
}

unittest
{
    auto testFileName = "cout-test-file.tmp";
    {
        FILE* save_stdout = stdout;
        scope(exit) stdout = save_stdout;
        stdout = enforce(fopen(testFileName, "wb"));
        scope(failure) remove(testFileName);
        scope(exit) fclose(stdout);
        cwrite("Это русския буквы");
    }
    scope(exit) remove(testFileName);
    invariant wchar[] mustBe = "Это русския буквы"w;
    auto required = WideCharToMultiByte(GetOEMCP(), 0, mustBe.ptr, mustBe.length,
        null, 0, "?", null);
    assert(required);
    auto mustBeConv = new char[required];
    WideCharToMultiByte(GetOEMCP(), 0, mustBe.ptr, mustBe.length,
        mustBeConv.ptr, mustBeConv.length, "?", null);
    string result = cast(string) read(testFileName);
    assert(result == mustBeConv);
}
Jan 31 2008
bearophile <bearophileHUGS lycos.com> writes:
Sergey Gromov:
 import std.conv;
 import std.stdio;
 import std.format;
 import std.c.windows.windows;

I think everyone has to start qualifying all the imports, so can you replace all those with:

import std.conv: name1, name2, ...;
import std.stdio: name3, name4, ...;
import std.format: name5, name6, ...;
import std.c.windows.windows: name7, name8, ...;

Bye,
bearophile
Jan 31 2008
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 31 Jan 2008 01:30:48 +0200, Sergey Gromov <snake.scaly gmail.com> wrote:

 Sorry to mention it again, but it is.

 If a command-line argument to a D program contains a non-ASCII character, that
 argument doesn't get into main().  This happens even if the console code page is
 65001.  This is most annoying because it cannot be worked around.

If I understood your problem correctly, here's a workaround:

import std.windows.charset, std.string;
import std.file;

void main(char[][] args)
{
    // convert from MBS (Windows ANSI encoding) to UTF-8
    foreach(ref arg;args)
        arg = fromMBSz(toStringz(arg));
    write(args[1], "Hello international world!");
}

C:\Temp\d\encoding> dmd example.d
C:\Soft\dmd\bin\..\..\dm\bin\link.exe example,,,user32+kernel32/noi;

C:\Temp\d\encoding> example "Привет, мультиязыковый мир!.txt"

C:\Temp\d\encoding> dir
 Volume in drive C is SYSTEM
 Volume Serial Number is C801-8D10

 Directory of C:\Temp\d\encoding

31.01.2008  05:01    <DIR>          .
31.01.2008  05:01    <DIR>          ..
31.01.2008  04:57               257 example.d
31.01.2008  04:58           100 892 example.exe
31.01.2008  04:58             2 390 example.map
31.01.2008  04:58               983 example.obj
31.01.2008  04:58                26 Привет, мультиязыковый мир!.txt
               5 File(s)        104 548 bytes
               2 Dir(s)   3 325 255 680 bytes free

-- 
Best regards,
 Vladimir mailto:thecybershadow gmail.com
Jan 30 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Vladimir Panteleev Wrote:
 On Thu, 31 Jan 2008 01:30:48 +0200, Sergey Gromov <snake.scaly gmail.com> wrote:
  If a command-line argument to a D program contains a non-ASCII
  character, that argument doesn't get into main().

 import std.windows.charset, std.string;
 import std.file;

 void main(char[][] args)
 {
     // convert from MBS (Windows ANSI encoding) to UTF-8
     foreach(ref arg;args)
         arg = fromMBSz(toStringz(arg));
     write(args[1], "Hello international world!");
 }

Unfortunately, there is no workaround:

import std.stdio;

void main(string[] args)
{
    writeln("Number of arguments to main: ", args.length);
}
example  

example mother father

You can do nothing about arguments which are not there.

SnakE
Jan 31 2008
Sean Kelly <sean f4.ca> writes:
Janice Caron wrote:
 On 1/31/08, Sergey Gromov <snake.scaly gmail.com> wrote:
  example  

  example mother father

  You can do nothing about arguments which are not there.

 Wow! That's an interesting problem. The way I see it is this. main's argument has type string[], and string is /by definition/ UTF-8, so D is not wrong to reject non-UTF-8 input. The problem is that the console is feeding it with non-UTF data.

 So there would be two possible fixes. Either (1), allow main to have a signature

     main(ubyte[][] args)

 thereby allowing any encoding, or (2) have the D runtime convert the shell arguments from the console's local encoding to UTF-8 before passing to main.

Tango does (2) on Windows. This doesn't appear to be a problem on other OSes however, because the consoles there can typically be set to use UTF-8. As far as I know it's just Windows that's in the stone age.

Sean
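For illustration, fix (2) could look roughly like the sketch below in current D. It is not Tango's actual code: `argToUtf8` is a made-up helper, and Windows-1251 is hard-coded as an assumption, where a real runtime would ask the OS for the active code page.

```d
import std.encoding : Windows1251String, transcode;

/// Hypothetical helper: reinterprets raw argument bytes as Windows-1251
/// text and transcodes them to UTF-8. The code page is an assumption made
/// for this sketch; a real runtime would query the OS for it.
string argToUtf8(immutable(ubyte)[] raw)
{
    auto legacy = cast(Windows1251String) raw;
    string utf8;
    transcode(legacy, utf8);
    return utf8;
}

void main()
{
    import std.stdio : writeln;

    // The word "мир" as a Windows-1251 console would deliver it.
    immutable(ubyte)[] raw = [0xEC, 0xE8, 0xF0];
    writeln(argToUtf8(raw));
}
```

The point of the sketch is only that the conversion happens once, before main sees the arguments, so user code can keep treating string as UTF-8.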
Jan 31 2008
Lars Noschinski <lars-2006-1 usenet.noschinski.de> writes:
* Sean Kelly <sean f4.ca> [08-01-31 21:42]:
 Janice Caron wrote:
  On 1/31/08, Sergey Gromov <snake.scaly gmail.com> wrote:
  So there would be two possible fixes. Either (1), allow main to have a
  signature

      main(ubyte[][] args)

  thereby allowing any encoding, or (2) have the D runtime convert the
  shell arguments from the console's local encoding to UTF-8 before
  passing to main.

 Tango does (2) on Windows. This doesn't appear to be a problem on other OSes however, because the consoles there can typically be set to use UTF-8. As far as I know it's just Windows that's in the stone age.

I'd think something like (1) is needed, too. Unix paths, for example, are encoding-agnostic (just a stream of bytes excluding '\0'). So to implement, say, a POSIX-conforming cp command, you need a way to get the raw, binary command-line arguments.
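A minimal sketch of what working with raw arguments might look like in current D, assuming the bytes are somehow available. Both `isValidUtf8` and `displayName` are hypothetical helpers invented for this example; the idea is that the untouched bytes go to the file system calls, and text conversion happens only for messages.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: reports whether raw argument bytes form valid UTF-8.
bool isValidUtf8(const(ubyte)[] raw)
{
    try
    {
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper for messages only: valid UTF-8 is returned as text,
/// anything else as hex-escaped bytes. The caller keeps the raw bytes for
/// the actual file operations.
string displayName(const(ubyte)[] raw)
{
    import std.format : format;

    if (isValidUtf8(raw))
        return cast(string) raw.idup;

    string escaped;
    foreach (b; raw)
        escaped ~= format("\\x%02X", b);
    return escaped;
}

void main()
{
    import std.stdio : writeln;

    const(ubyte)[] plain = [0x66, 0x6F, 0x6F];   // "foo"
    const(ubyte)[] legacy = [0xEC, 0xE8, 0xF0];  // not valid UTF-8
    writeln(displayName(plain));
    writeln(displayName(legacy));
}
```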
Feb 01 2008
"Janice Caron" <caron800 googlemail.com> writes:
On 1/31/08, Sergey Gromov <snake.scaly gmail.com> wrote:
 example  

 example mother father

 You can do nothing about arguments which are not there.

Wow! That's an interesting problem. The way I see it is this. main's argument has type string[], and string is /by definition/ UTF-8, so D is not wrong to reject non-UTF-8 input. The problem is that the console is feeding it with non-UTF data.

So there would be two possible fixes. Either (1), allow main to have a signature

    main(ubyte[][] args)

thereby allowing any encoding, or (2) have the D runtime convert the shell arguments from the console's local encoding to UTF-8 before passing to main.

I think I would prefer a combination of both. That is, if main(ubyte[][]) exists, call that; else transcode the input then call main(string[]). That gets you the best of both worlds (but you still have the same problem with output).
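The combination could be sketched like this in current D. Everything here is hypothetical: `callMain` stands in for whatever the runtime would do, `rawMain` and `textMain` stand in for a user's main, the raw signature uses immutable(ubyte)[][] rather than ubyte[][], and the Latin-1 fallback is an assumption chosen only to keep the example short.

```d
/// Assumed fallback for this sketch only: treat each byte as a Latin-1
/// code point. A real runtime would use the console's actual encoding.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    string result;
    foreach (b; raw)
        result ~= cast(dchar) b;   // appending a dchar encodes it as UTF-8
    return result;
}

/// Hypothetical runtime hook: calls userMain with the raw bytes if it
/// accepts them, otherwise transcodes the arguments and passes strings.
int callMain(alias userMain)(immutable(ubyte)[][] rawArgs)
{
    static if (is(typeof(userMain(rawArgs))))
    {
        return userMain(rawArgs);
    }
    else
    {
        string[] args;
        foreach (raw; rawArgs)
            args ~= latin1ToUtf8(raw);
        return userMain(args);
    }
}

// Two stand-ins for a user's main, one per signature.
int rawMain(immutable(ubyte)[][] args) { return cast(int) args[0].length; }
int textMain(string[] args) { return cast(int) args[0].length; }

void main()
{
    immutable(ubyte)[] first = [0xE9];          // "é" in Latin-1, one byte
    auto rawArgs = [first];
    assert(callMain!rawMain(rawArgs) == 1);     // raw bytes passed through
    assert(callMain!textMain(rawArgs) == 2);    // "é" takes two bytes in UTF-8
}
```

The choice is made at compile time from the signature alone, so a program that only declares main(string[]) never sees the raw bytes.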
Jan 31 2008