www.digitalmars.com         C & C++   DMDScript  

D - other languages for output.writeLine

"Y.Tomino" <demoonlit inter7.jp> writes:
Hello.

DMD accepts Unicode identifiers when the source file is written in UTF-8.
But we can't output non-ASCII letters (Japanese, etc.).

This code fixes that, so non-ASCII letters can be written to the console from
UTF-8 source code.

YT

When DMD 0.74 was released, Walter wrote:
 That is a problem, I'm not sure what to do about it. One thing I have been
 looking for is a mapping from Shift-JIS to unicode. Do you have such a
 table?
----

typedef char[] mstring; //multi-byte encoding string

wchar[] toUTF16(mstring s)
{
 wchar[] result;
 result.length = MultiByteToWideChar(0, 0, s, s.length, null, 0);
 MultiByteToWideChar(0, 0, s, s.length, result, result.length);
 return result;
}

class Console : File
{
 this(HANDLE _handle, FileMode _mode){
  super(_handle, _mode);
 }

 override void writeString(char[] s)
 {
  if(s.length > 0){
   DWORD written;
   wchar[] w = toUTF16(s);
   if(WriteConsoleW(handle, &w[0], w.length, &written, null) == FALSE){
    mstring m = toMBCS(w);
    if(WriteConsoleA(handle, &m[0], m.length, &written, null) == FALSE){
     super.writeString(m); // for redirect
    }
   }
  }
 }

 override void write(char[] s)
 {
  super.write(s.length);
  writeExact(&s[0], s.length * char.size); // for binary
 }

 static this()
 {
  std.stream.stdout = new Console(std.stream.stdout.handle(), FileMode.Out);
  std.stream.stderr = new Console(std.stream.stderr.handle(), FileMode.Out);
 }
}
Nov 22 2003
"Y.Tomino" <demoonlit inter7.jp> writes:
Sorry, I made an editing mistake.
Here is toMBCS.

----

mstring toMBCS(wchar[] s)
{
 mstring result;
 result.length = WideCharToMultiByte(0, 0, s, s.length, null, 0, null, null);
 WideCharToMultiByte(0, 0, s, s.length, result, result.length, null, null);
 return result;
}
Nov 22 2003
"Walter" <walter digitalmars.com> writes:
I'm puzzled why it's necessary to convert to wide char and then back to
multi byte?
Nov 22 2003
"Y.Tomino" <demoonlit inter7.jp> writes:
Because WriteFile can't output Unicode letters to the console, while
WriteConsoleW works correctly.
String literals in the source code are UTF-8, so they are first converted to
UTF-16 for WriteConsoleW.

But WriteConsoleW doesn't work on Windows 95/98/Me.
The Microsoft Platform SDK says:
 Implemented as Unicode and ANSI versions on Windows NT/2000/XP. Also
 supported by Microsoft Layer for Unicode.

So it calls WriteConsoleA if WriteConsoleW failed.
WriteConsoleA's argument must be a multi-byte string. A multi-byte string is
not UTF-8, so it's necessary to convert with WideCharToMultiByte.
Since Unicode has many more characters than MBCS (Shift-JIS), it tries
WriteConsoleW first.

And when output is redirected ( C:\>myexe > output.txt ), the console API may
fail, so it has to call super.writeString.
But for redirected output, a multi-byte encoded text file is natural, as with
other programs.
Therefore it passes "m" instead of "s" to super.writeString.

Thanks.
YT

"Walter" <walter digitalmars.com> wrote in message
news:bppk03$28l6$1 digitaldaemon.com...
 I'm puzzled why it's necessary to convert to wide char and then back to
 multi byte?
Nov 23 2003
"Y.Tomino" <demoonlit inter7.jp> writes:
Sorry, WriteConsoleA behaves the same as WriteFile in this case, so it's unnecessary.

mstring m = toMBCS(w);
if(WriteConsoleA(handle, &m[0], m.length, &written, null) == FALSE){
  super.writeString(m); // for redirect
}

can be replaced with:

mstring m = toMBCS(w);
super.writeString(m); // for 95/98/Me and redirect
Nov 23 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Y.Tomino wrote:
 wchar[] toUTF16(mstring s)
 {
  wchar[] result;
  result.length = MultiByteToWideChar(0, 0, s, s.length, null, 0);
  MultiByteToWideChar(0, 0, s, s.length, result, result.length);
  return result;
 }
<snip>
  override void writeString(char[] s)
  {
   if(s.length > 0){
    DWORD written;
    wchar[] w = toUTF16(s);
This will only work if the current system codepage is UTF-8, since MultiByteToWideChar assumes that the input string is in the current code page. Passing CP_UTF8 to MultiByteToWideChar won't help either, because that is only supported on Win98 and up. Seems to me that the only way to do this is to manually convert the string from UTF-8 to UTF-16 (not that much of a deal). The Win32 functions won't help you much because there's absolutely no Unicode support on Win95.
    if(WriteConsoleW(handle, &w[0], w.length, &written, null) == FALSE){
This call might be a little dangerous. WriteConsoleW is not supported on Win9x, so there's no guarantee that it won't cause a crash on some systems or return an undefined result (or is there some explicit guarantee somewhere in the docs?). It would probably be better to check whether the OS is an NT variant and call the W and A versions accordingly. Something like:

OSVERSIONINFO osVersion;
GetVersionEx(&osVersion);

if(osVersion.dwPlatformId==VER_PLATFORM_WIN32_NT)
 WriteConsoleW(...);
else
{
 mstring m = toMBCS(w);
 WriteConsoleA(...)
}

Hauke
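
The manual UTF-8 to UTF-16 conversion suggested in this post could look roughly like the sketch below. It is written in present-day D rather than the 0.7x dialect used in this thread, the name utf8ToUtf16 is hypothetical, and it assumes well-formed input (malformed or overlong sequences are not detected). It only illustrates the bit manipulation involved; it makes no Win32 calls at all.

```d
/// Hypothetical helper (not from the thread): converts UTF-8 to UTF-16 by hand.
/// Assumes well-formed input; malformed sequences are not detected.
wchar[] utf8ToUtf16(const(char)[] s)
{
    wchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        uint c = s[i++];
        uint extra = 0;                 // continuation bytes still to read
        if (c >= 0xF0)      { extra = 3; c &= 0x07; }
        else if (c >= 0xE0) { extra = 2; c &= 0x0F; }
        else if (c >= 0xC0) { extra = 1; c &= 0x1F; }

        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < s.length; --extra)
            c = (c << 6) | (s[i++] & 0x3F);

        if (c < 0x10000)
        {
            result ~= cast(wchar) c;    // fits in one UTF-16 code unit
        }
        else
        {
            c -= 0x10000;               // encode as a surrogate pair
            result ~= cast(wchar)(0xD800 + (c >> 10));
            result ~= cast(wchar)(0xDC00 + (c & 0x3FF));
        }
    }
    return result;
}

unittest
{
    assert(utf8ToUtf16("abc") == "abc"w);
    assert(utf8ToUtf16("日本語") == "日本語"w);
}
```

The result could then be handed to a wide-character API such as WriteConsoleW without depending on the system code page.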
Nov 23 2003
"Y.Tomino" <demoonlit inter7.jp> writes:
 wchar[] toUTF16(mstring s)
 {
  wchar[] result;
  result.length = MultiByteToWideChar(0, 0, s, s.length, null, 0);
  MultiByteToWideChar(0, 0, s, s.length, result, result.length);
  return result;
 }
<snip>
  override void writeString(char[] s)
  {
   if(s.length > 0){
    DWORD written;
    wchar[] w = toUTF16(s);
 This will only work if the current system codepage is UTF-8, since MultiByteToWideChar assumes that the input string is in the current code page. Passing CP_UTF8 to MultiByteToWideChar won't help either, because that is only supported on Win98 and up.
Sorry, it's my editing mistake. My toUTF16(mstring) is not used. The toUTF16 called from writeString is std.utf.toUTF16(char[]), because D's typedef is strong. (I mistakenly copied my wrong toUTF16 instead of toMBCS :-)
 This call might be a little dangerous. WriteConsoleW is not supported on
 Win9x, so there's no guarantee that it won't cause a crash on some
 systems or return an undefined result (or is there some explicit
 guarantee somewhere in the docs?).
I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED on Win9x...
Will it crash or undefined result ?

YT
Nov 23 2003
"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:
 This call might be a little dangerous. WriteConsoleW is not supported on
 Win9x, so there's no guarantee that it won't cause a crash on some
 systems or return an undefined result (or is there some explicit
 guarantee somewhere in the docs?).
 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
That would be my expectation of any unimplemented API function on Win9x (as long as it actually exists, of course)
Nov 23 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Y.Tomino wrote:
 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
 Will it crash or undefined result ?
Maybe, maybe not. In my experience, when you're dealing with the Windows API you should better not rely on anything that is not explicitly stated in the documentation. Otherwise there will quite often be some obscure combination of Windows version, system language and system DLL versions that will violate your assumption.

So, since testing on all possible Windows configurations is close to impossible I usually stick to the documented stuff.

Hauke
Nov 23 2003
"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bprg1c$1s8k$1 digitaldaemon.com...
 Y.Tomino wrote:
 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
 Will it crash or undefined result ?
 Maybe, maybe not. In my experience, when you're dealing with the Windows API you should better not rely on anything that is not explicitly stated in the documentation. Otherwise there will quite often be some obscure combination of Windows version, system language and system DLL versions that will violate your assumption.
It is my understanding that all unimplemented functions in the Win32 for a given operating system cause the thread error to be set to ERROR_CALL_NOT_IMPLEMENTED.
 So, since testing on all possible Windows configurations is close to
 impossible I usually stick to the documented stuff.
Your caution is worthy, and I agree in most cases. However, I think in this case it is safe to go with GetLastError.

Cheers

Matthew
Nov 23 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Matthew Wilson wrote:
 It is my understanding that all unimplemented functions in the Win32 for a
 given operating system cause the thread error to be set to
 ERROR_CALL_NOT_IMPLEMENTED.
Well, that can only be true for functions that already existed when the operating system was shipped, right?

But I agree, if the "Ansi" version is supported, then the missing Unicode function will probably return NOT_IMPLEMENTED (or some other error - you can never be sure!).

However, the Unicode function might fail for other reasons as well and maybe the ANSI version doesn't. Could be a simple case of not having enough free memory for the Unicode strings, but just enough for the ANSI version. An automated fallback might cause inconsistency within the program and its data (e.g. mixed ANSI and Unicode data in a file or something similar). If you go the fallback route you'd have to at least check the error code. If you want to be on the safe side, that is.

I find it easier to just check for NTness. Since this boolean doesn't change, it can be checked once at startup and then stored, so you won't have to call GetVersionEx every time you have to decide between Ansi and Unicode versions. This might be something that could be done by Phobos - something like std.os.windows.isWinNT().

Hauke
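
A minimal sketch of the "check once at startup and store it" idea, in present-day D rather than the dialect used in this thread. The names queryIsWinNT and isWinNT are hypothetical (no such Phobos function is implied), the Windows branch assumes the druntime bindings in core.sys.windows.windows, and the non-Windows branch is only a placeholder so the sketch compiles elsewhere.

```d
version (Windows)
{
    import core.sys.windows.windows;

    // Asks the OS once. The A variant is used because it is the one
    // that is implemented on every Win32 platform.
    private bool queryIsWinNT()
    {
        OSVERSIONINFOA info;
        info.dwOSVersionInfoSize = OSVERSIONINFOA.sizeof;
        return GetVersionExA(&info) != 0
            && info.dwPlatformId == VER_PLATFORM_WIN32_NT;
    }
}
else
{
    // Placeholder so the sketch also compiles on non-Windows systems.
    private bool queryIsWinNT() { return false; }
}

/// Hypothetical flag in the spirit of the proposed std.os.windows.isWinNT().
immutable bool isWinNT;

shared static this()
{
    isWinNT = queryIsWinNT();   // runs once, before main()
}
```

Code that has to choose between the W and A variants can then test isWinNT instead of calling GetVersionEx each time.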
Nov 24 2003
"Y.Tomino" <demoonlit inter7.jp> writes:
 But I agree, if the "Ansi" version is supported, then the missing
 Unicode function will probably return NOT_IMPLEMENTED (or some other
 error - you can never be sure!).
Even on NT/2000/XP, WriteConsoleW may fail if the standard output handle has been redirected (GetLastError() = ERROR_INVALID_HANDLE). WriteConsoleA may fail, too.
A simple way is: if WriteConsoleW fails, pass the ANSI (MBCS)-converted string to WriteFile. WriteFile can write an ANSI string to both the console and a redirected file.

Thanks.
YT
Nov 24 2003
"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:
 It is my understanding that all unimplemented functions in the Win32 for a
 given operating system cause the thread error to be set to
 ERROR_CALL_NOT_IMPLEMENTED.
 Well, that can only be true for functions that already existed when the operating system was shipped, right?
Sure. I don't think anyone's suggesting otherwise. I don't understand your point.
 But I agree, if the "Ansi" version is supported, then the missing
 Unicode function will probably return NOT_IMPLEMENTED (or some other
 error - you can never be sure!).
Naturally a particular function may be incorrectly written. What I'm saying is that it is a design feature of Win9x that a stubbed (as opposed to entirely missing) function will set the NOT_IMPL value to the thread error.
 However, the Unicode function might fail for other reasons as well and
 maybe the ANSI version doesn't. Could be a simple case of not having
 enough free memory for the Unicode strings, but just enough for the ANSI
 version. An automated fallback might cause inconsistency within the
 program and its data (e.g. mixed ANSI and Unicode data in a file or
 something similar). If you go the fallback route you'd have to at least
 check the error code. If you want to be on the safe side, that is.
This doesn't make any kind of sense to me. Why would anyone call a function without allocating the appropriate amount of memory, other than through their own incompetence? And why would such incompetence only manifest when doing Unicode programming, and not ANSI?
 I find it easier to just check for NTness. Since this boolean doesn't
 change, it can be checked once at startup and then stored, so you won't
 have to call GetVersionEx every time you have to decide between Ansi and
 Unicode versions. This might be something that could be done by Phobos -
 something like std.os.windows.isWinNT().
That's entirely true. In fact, this would be more appropriate as a robust and consistent implementation. But, given that, why not simply use MSLU, and take all the hassles from we poor overworked D people and utilise the industry, late in the day though it may be, of Microsoft. The ng for MSLU is well serviced, the library is free and redistributable, it is easy to use, and works well.

Matthew
Nov 24 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Matthew Wilson wrote:
 given operating system cause the thread error to be set to
 ERROR_CALL_NOT_IMPLEMENTED.
 Well, that can only be true for functions that already existed when the operating system was shipped, right?
 Sure. I don't think anyone's suggesting otherwise. I don't understand your point.
The point I was trying to make is that you cannot generally assume that all unimplemented functions return ERROR_CALL_NOT_IMPLEMENTED. This is not really that much of an issue for Unicode functions that have an implemented Ansi version, but there are other functions that only exist on NT that may not have a stub on Win9x. I guess I'm just saying that a consistent way to handle these issues would be preferable to trying to deduce which functions are "stub-unimplemented" as opposed to non-existent.
 However, the Unicode function might fail for other reasons as well and
 maybe the ANSI version doesn't. Could be a simple case of not having
 enough free memory for the Unicode strings, but just enough for the ANSI
 version. An automated fallback might cause inconsistency within the
 program and its data (e.g. mixed ANSI and Unicode data in a file or
 something similar). If you go the fallback route you'd have to at least
 check the error code. If you want to be on the safe side, that is.
 This doesn't make any kind of sense to me. Why would anyone call a function without allocating the appropriate amount of memory, other than through their own incompetence? And why would such incompetence only manifest when doing Unicode programming, and not ANSI?
Simple example: you have 5000 bytes of free disk space and want to write a 4000 character string to a file, using an imaginary WriteStringToFileA/W function. Your system is Win2000, so WriteStringToFileW exists. However, the call to WriteStringToFileW will fail because this implementation needs 8000 bytes of disc space. The Ansi version will succeed, though, since it only needs 4000 bytes. If you automatically fall back to the Ansi version without checking the error code, then you end up writing Ansi data into a file that was supposed to hold Unicode data.
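
The point about checking the error code before falling back can be sketched as follows, in present-day D. Everything except the two numeric Win32 error codes is hypothetical: WriteResult, Writer and writeWithFallback are made-up names, and the writers are plain delegates standing in for the imaginary WriteStringToFileA/W pair, so nothing here calls the real API.

```d
// Real Win32 error-code values; everything else here is hypothetical.
enum uint ERROR_DISK_FULL = 112;
enum uint ERROR_CALL_NOT_IMPLEMENTED = 120;

struct WriteResult
{
    bool ok;
    uint error;   // meaningful only when ok is false
}

alias Writer = WriteResult delegate(const(char)[] text);

/// Falls back to the ANSI writer only when the wide writer reports that it
/// is an unimplemented stub; any other failure is returned to the caller.
WriteResult writeWithFallback(Writer wide, Writer ansi, const(char)[] text)
{
    auto r = wide(text);
    if (r.ok || r.error != ERROR_CALL_NOT_IMPLEMENTED)
        return r;
    return ansi(text);
}

unittest
{
    int ansiCalls = 0;
    Writer ansi = (const(char)[] t) { ++ansiCalls; return WriteResult(true, 0); };
    Writer stub = (const(char)[] t) => WriteResult(false, ERROR_CALL_NOT_IMPLEMENTED);
    Writer full = (const(char)[] t) => WriteResult(false, ERROR_DISK_FULL);

    // Win9x-style stub: the fallback is taken.
    assert(writeWithFallback(stub, ansi, "x").ok && ansiCalls == 1);

    // Disk full: the error is reported and the ANSI writer is not called.
    auto r = writeWithFallback(full, ansi, "x");
    assert(!r.ok && r.error == ERROR_DISK_FULL && ansiCalls == 1);
}
```

With this shape, the disk-full case from the example surfaces as an error instead of silently producing ANSI data in a file that was meant to hold Unicode.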
 I find it easier to just check for NTness. Since this boolean doesn't
 change, it can be checked once at startup and then stored, so you won't
 have to call GetVersionEx every time you have to decide between Ansi and
 Unicode versions. This might be something that could be done by Phobos -
 something like std.os.windows.isWinNT().
 That's entirely true. In fact, this would be more appropriate as a robust and consistent implementation. But, given that, why not simply use MSLU, and take all the hassles from we poor overworked D people and utilise the industry, late in the day though it may be, of Microsoft. The ng for MSLU is well serviced, the library is free and redistributable, it is easy to use, and works well.
Because AFAIK the MSLU is not installed on any Win9x system by default. Certainly not on Win95. So you'd have to ship it with every application. For some applications that may be acceptable, but for others it might not. For example, it wouldn't be possible to write a ZIP self-extractor in D, because the .exe file would need an additional DLL to extract its contents.

Hauke
Nov 24 2003
"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:
 Because AFAIK the MSLU is not installed on any Win9x system by default.
Correct
 Certainly not on Win95. So you'd have to ship it with every application.
True. And I certainly acknowledge the problems this causes.
 For some applications that may be acceptable, but for others it might
 not. For example, it wouldn't be possible to write a ZIP self-extractor
 in D, because the .exe file would need an additional DLL to extract its
 contents.
The alternative is to have the requisite amount of equivalent code built into the library. This is an approach I've taken often. It's a swings & roundabouts deal. I would certainly prefer the statically bound approach, but I'm aware of what a huge job it would be to make this work.

Matthew
Nov 24 2003
Raiko <phantom2023 hotmail.com> writes:
Hauke Duden wrote:
 Y.Tomino wrote:
 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
 Will it crash or undefined result ?
 Maybe, maybe not. In my experience, when you're dealing with the Windows API you should better not rely on anything that is not explicitly stated in the documentation. Otherwise there will quite often be some obscure combination of Windows version, system language and system DLL versions that will violate your assumption. So, since testing on all possible Windows configurations is close to impossible I usually stick to the documented stuff.
 Hauke
Just to jump in for a second. A lot of Unicode APIs are supported in Win9x if you have the Unicode layer, i.e. WriteConsoleW (from the Platform SDK docs):

 Windows Me/98/95: WriteConsoleW is supported by the Microsoft Layer for Unicode. To use this, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows Me/98/95 Systems.

Sorry for being out of place :)
Nov 23 2003
"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:
 I think ~W API return FALSE and GetLastError() =
 ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
 Will it crash or undefined result ?
 Maybe, maybe not. In my experience, when you're dealing with the Windows API you should better not rely on anything that is not explicitly stated in the documentation. Otherwise there will quite often be some obscure combination of Windows version, system language and system DLL versions that will violate your assumption. So, since testing on all possible Windows configurations is close to impossible I usually stick to the documented stuff.
 Hauke
 Just to jump in for a second. A lot of Unicode APIs are supported in Win9x if you have the Unicode layer, i.e. WriteConsoleW (from the Platform SDK docs): Windows Me/98/95: WriteConsoleW is supported by the Microsoft Layer for Unicode. To use this, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows Me/98/95 Systems.
 Sorry for being out of place :)
You're not out of place! :-)

Using MSLU might be an option. It's redistributable, and pretty reliable. (In fact, the December issue of Windows Developer Network contains an interesting article on the issue, by one of our foremost authors ...)

Cheers

Matthew
Nov 23 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Raiko wrote:
 A lot of Unicode APIs are supported in Win9x if you have the Unicode layer
Unfortunately, that would require every D application to ship with the MSLU DLL. It's pretty small by today's standards, granted, but I don't think that it should be required.

Besides, the MSLU does have some quirks. There are quite a lot of bugs in there when it comes to error handling or rarely used functions. And Microsoft doesn't really support it well either. And, of course, much of the GUI stuff is not included in the MSLU (Common Controls!).

Hauke
Nov 24 2003
"Julio César Carrascal Urquijo" <adnoctum phreaker.net> writes:
 Unicode. To use this, you must add certain files to your application, as
 outlined in Microsoft Layer for Unicode on Windows Me/98/95 Systems.
There's always ICU, which is included in Parrot (Perl 6's engine).
http://oss.software.ibm.com/icu/userguide/index.html
Nov 24 2003