digitalmars.D - phobos/tango on win32: please drop ANSI "support"
- Lionello Lunesu <lio lunesu.remove.com> Feb 14 2007
- BCS <BCS pathlink.com> Feb 14 2007
- "Lionello Lunesu" <lionello lunesu.remove.com> Feb 14 2007
- BCS <BCS pathlink.com> Feb 14 2007
- Kirk McDonald <kirklin.mcdonald gmail.com> Feb 14 2007
- Walter Bright <newshound digitalmars.com> Feb 14 2007
- Lionello Lunesu <lio lunesu.remove.com> Feb 14 2007
- kris <foo bar.com> Feb 14 2007
- Lionello Lunesu <lio lunesu.remove.com> Feb 15 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 15 2007
- Lionello Lunesu <lio lunesu.remove.com> Feb 15 2007
- Lars Ivar Igesund <larsivar igesund.net> Feb 15 2007
- Lionello Lunesu <lio lunesu.remove.com> Feb 15 2007
- Lars Ivar Igesund <larsivar igesund.net> Feb 15 2007
- Walter Bright <newshound digitalmars.com> Feb 15 2007
- Lionello Lunesu <lio lunesu.remove.com> Feb 15 2007
- Walter Bright <newshound digitalmars.com> Feb 15 2007
- Sean Kelly <sean f4.ca> Feb 15 2007
- "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> Feb 15 2007
- Walter Bright <newshound digitalmars.com> Feb 15 2007
- Sean Kelly <sean f4.ca> Feb 15 2007
- "Lionello Lunesu" <lionello lunesu.remove.com> Feb 15 2007
- Don Clugston <dac nospam.com.au> Feb 20 2007
- "Todor Totev" <umbra.tenebris list.ru> Feb 15 2007
- Thomas Kuehne <thomas-dloop kuehne.cn> Feb 15 2007
Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert it to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode has been supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator. There, I've said it. L.
Feb 14 2007
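A minimal sketch of the mismatch described in the post above, assuming D2 with current Phobos (the thread predates it). The helper readAsLatin1 is hypothetical and only imitates what a single-byte code page does with raw UTF-8 bytes; it is not what any "A" function literally runs. The second half shows the utf8 to utf16 conversion with a 0 terminator, here via std.utf rather than MultiByteToWideChar.

```d
import std.stdio : writeln;
import std.string : representation;
import std.utf : toUTF16, toUTF16z;

/// Hypothetical helper: reads each byte as one character, the way a
/// single-byte code page would. Assumption: only ASCII and bytes 0xA0-0xFF
/// occur, where Windows-1252 coincides with the first 256 Unicode code points.
dstring readAsLatin1(string s)
{
    dstring result;
    foreach (ubyte b; s.representation)
        result ~= cast(dchar) b;
    return result;
}

void main()
{
    string utf8 = "caf\u00E9";      // 5 bytes: the last letter is 0xC3 0xA9 in UTF-8

    // Raw UTF-8 handed to a code-page API: the two bytes become two characters.
    writeln(readAsLatin1(utf8));    // prints "caf" followed by U+00C3 U+00A9

    // Converting to UTF-16 instead: 4 code units, then a 0 terminator.
    wstring utf16 = utf8.toUTF16;
    const(wchar)* z = utf8.toUTF16z;
    writeln(utf16.length, " ", z[4] == 0);
}
```

Pure ASCII input survives either path unchanged, which is why the problem tends to go unnoticed until a non-ASCII file name or registry value shows up.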
Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator. There, I've said it. L.
Do you mean ASCII?
Feb 14 2007
"BCS" <BCS pathlink.com> wrote in message news:eqvgkt$ubi$1 digitalmars.com...Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator. There, I've said it. L.
Do you mean ASCII?
No, definitely not ASCII.. What does the A stand for in RegisterClassA, CreateWindowA, CreateFileA, etc. in the Windows API? W = Wide, 'wchar', but what's A? From MSDN: ...with the specific "A" (ANSI) or "W" (wide, Unicode)... L.
Feb 14 2007
Lionello Lunesu wrote:"BCS" <BCS pathlink.com> wrote in messageDo you mean ASCII?
No, definitely not ASCII.. What does the A stand for in RegisterClassA, CreateWindowA, CreateFileA, etc. in the Windows API? W = Wide, 'wchar', but what's A? From MSDN: ....with the specific "A" (ANSI) or "W" (wide, Unicode)... L.
Feb 14 2007
BCS wrote:Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator. There, I've said it. L.
Do you mean ASCII?
Perhaps these would be edifying: http://en.wikipedia.org/wiki/Windows_code_page http://en.wikipedia.org/wiki/Windows-1252 In short, when someone says "ANSI" in reference to Windows or the Windows API, they mean the stuff in the above articles (which isn't actually an ANSI standard at all). Those are flat 8-bit encodings, and storing them in a UTF-8 datatype will only cause grief. As Lionello points out, modern versions of Windows use UTF-16 internally. (Although originally it was just UCS-2, and most Windows fonts don't know about anything beyond those two bytes.) I agree with Lionello: UTF-8 is a terrible thing to call the Windows API with. When dealing with the Windows API in D, it is best to stick with wchar[]. -- Kirk McDonald http://kirkmcdonald.blogspot.com Pyd: Connecting D and Python http://pyd.dsource.org
Feb 14 2007
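The UCS-2 versus UTF-16 distinction mentioned above can be illustrated with a short sketch, again assuming D2 with current Phobos. The helper utf16Units is a hypothetical name introduced here. A character inside the Basic Multilingual Plane takes one wchar; anything beyond it takes a surrogate pair, which is the part UCS-2 never had.

```d
import std.stdio : writefln, writeln;
import std.utf : toUTF16;

/// Hypothetical helper: number of UTF-16 code units needed for a UTF-8 string.
size_t utf16Units(string s)
{
    return s.toUTF16.length;
}

void main()
{
    string euro = "\u20AC";      // U+20AC, inside the BMP
    string clef = "\U0001D11E";  // U+1D11E, outside the BMP

    writeln(utf16Units(euro));   // one code unit
    writeln(utf16Units(clef));   // two code units (a surrogate pair)

    // The pair for U+1D11E is D834 DD1E.
    wstring w = clef.toUTF16;
    writefln("%04X %04X", cast(ushort) w[0], cast(ushort) w[1]);
}
```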
Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
Feb 14 2007
Walter Bright wrote:Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows functions, which is not correct. You should either convert the utf8 to the correct code-page for passing to WhatEverA(..), or convert it to utf16 and pass it to WhatEverW(..). The last one is much easier: a fixed, straightforward conversion (no need to know about code-pages) that also happens to be efficient for Windows 2000 and up. As for UseWFuncs: I don't like it because the check is done at run-time. It's all over the place, practically doubles all Win32 code, not to mention the imports / obj-size. More importantly, for the reasons mentioned above, I don't think it's necessary. L.
Feb 14 2007
Lionello Lunesu wrote:Walter Bright wrote:Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct.
Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so. For code-page support, we currently suggest using a library such as ICU to do the appropriate conversions.
Feb 14 2007
kris wrote:Lionello Lunesu wrote:Walter Bright wrote:Lionello Lunesu wrote:Both Phobos and Tango pretend utf8 is valid for calling ANSI methods from the Windows' API. Obviously, it's not. The correct way is to convert the utf8 string to the code-page expected by the call, or convert them to unicode. I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct.
Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.
Then it's not actually supporting those older environments at all.
kris wrote:For code-page support, we currently suggest using a library such as ICU to do the appropriate conversions.
On Windows, just convert to wchar[] (as you would on W2K and up) and then use WideCharToMultiByte. L.
Feb 15 2007
Lionello Lunesu wrote:kris wrote:Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.
Then it's not actually supporting those older environments at all.
Well, that's support, just not *full* support. Needing to stick to ASCII is still better than no support at all...
Feb 15 2007
Frits van Bommel wrote:Lionello Lunesu wrote:kris wrote:Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.
Then it's not actually supporting those older environments at all.
Well, that's support, just not *full* support. Needing to stick to ASCII is still better than no support at all...
OK, then all it needs is a "ThrowIfContainsUpperAscii(str);" and we're set :) L.
Feb 15 2007
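The "ThrowIfContainsUpperAscii(str)" guard joked about above could look roughly like this. It is a sketch only, assuming D2 with current Phobos; the name throwIfContainsNonAscii is hypothetical and neither Phobos nor Tango is claimed to contain such a function. Any code unit of 0x80 or above means the string is not pure ASCII and cannot be passed unchanged to an "A" function.

```d
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical guard: throws if str contains anything outside 7-bit ASCII.
void throwIfContainsNonAscii(const(char)[] str)
{
    foreach (i, char c; str)
    {
        if (c >= 0x80)
            throw new Exception("non-ASCII byte at index " ~ i.to!string);
    }
}

void main()
{
    throwIfContainsNonAscii("plain.txt");          // passes silently

    try
    {
        throwIfContainsNonAscii("caf\u00E9.txt");  // first non-ASCII byte is at index 3
    }
    catch (Exception e)
    {
        writeln(e.msg);
    }
}
```

The check iterates over UTF-8 code units, not decoded characters, so it is a single pass with no allocation in the success case.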
Lionello Lunesu wrote:kris wrote:Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.
Then it's not actually supporting those older environments at all.
Would depend on the nature of said environments, no? -- Lars Ivar Igesund blog at http://larsivi.net DSource & #D: larsivi Dancing the Tango
Feb 15 2007
Lars Ivar Igesund wrote:Lionello Lunesu wrote:kris wrote:Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.
Would depend on the nature of said environments, no?
?? That would mean that a char[] in Tango is not always utf8 and could in fact be code-page specific encoding. This is quite nasty for somebody writing library functions in Tango. L.
Feb 15 2007
Lionello Lunesu wrote:Lars Ivar Igesund wrote:Lionello Lunesu wrote:kris wrote:Regarding Tango, it uses the WindowsA functions only if -version=Win32SansUnicode is configured. This switch is for supporting certain older environments, but does /not/ imply that code-pages are supported in Tango. There has never been an intent to do so.
Would depend on the nature of said environments, no?
?? That would mean that a char[] in Tango is not always utf8 and could in fact be code-page specific encoding. This is quite nasty for somebody writing library functions in Tango. L.
No, as Kris mentions, code pages are not currently supported in Tango (they are possible to support via the ICU bindings in Mango), but environments where ASCII is the only subset in use (like your typical old PC in the US of A) would be supported by the functionality in question. This is not compiled in by default in Tango, so you use it only if you are aware that you are not using standard, Unicode-compliant Tango. -- Lars Ivar Igesund blog at http://larsivi.net DSource & #D: larsivi Dancing the Tango
Feb 15 2007
Lionello Lunesu wrote:Walter Bright wrote:The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct. You should either convert the utf8 to the correct code-page for passing to WhatEverA(..),
It does convert to the correct code-page. See std.windows.charset.toMBSz().
Lionello Lunesu wrote:or convert it to utf16 and pass it to WhatEverW(..). The last one is much easier: a fixed, straightforward conversion (no need to know about code-pages)
This just does not work under Win9x, because most of the 'W' functions are not supported. (Also, Win9x internally converts the few 'W' functions it does support right back to 'A'.)
Lionello Lunesu wrote:that also happens to be efficient for Windows 2000 and up.
Under Windows NT, 2000, and up, the 'W' functions *are* called.
Lionello Lunesu wrote:As for UseWFuncs: I don't like it because the check is done at run-time.
It has to be done at runtime, because that's the only way to make it work between different Windows versions.
Lionello Lunesu wrote:It's all over the place, practically doubles all Win32 code, not to mention the imports / obj-size. More importantly, for the reasons mentioned above, I don't think it's necessary.
There's no hope for it unless all support for Win9x is dropped.
Feb 15 2007
Walter Bright wrote:Lionello Lunesu wrote:Walter Bright wrote:The "useWfuncs" only happens for Windows 9x (including Me). All Windows 9x systems are 8 bit internally, and even if you use the W interface, they are internally converted to 8 bits anyway.
Yes, they will be converted to "8 bits", but not to utf8. They will be converted to whatever code-page the thread's currently using, which is what's supposed to be done. That's my point: both Phobos and Tango pass utf8 to ANSI (..A) versions of Windows' functions, which is not correct. You should either convert the utf8 to the correct code-page for passing to WhatEverA(..),
It does convert to the correct code-page. See std.windows.charset.toMBSz().
The problem is that this function is not always called. And because, by default, the A-functions are the ones that get aliased to the 'normal form', many times the utf8 char[] is passed as if it were 'ansi'. A quick grep reveals: std\loader.d [5] std\windows\registry.d [35] I know these are easily solvable, but I was just wondering if it was worth the trouble.
Lionello Lunesu wrote:or convert it to utf16 and pass it to WhatEverW(..). The last one is much easier: a fixed, straightforward conversion (no need to know about code-pages)
Walter Bright wrote:This just does not work under Win9x, because most of the 'W' functions are not supported. (Also, Win9x internally converts the few 'W' functions it does support right back to 'A'.)
Yes, but it would be done by Windows. Instead of: if (UseWFuncs) WhatEverW( str.toUTF16z ); else WhatEverA( str.toMBSz ); You'd do only: WhatEverW( str.toUTF16z ); and Windows' unicode layer for Win9x would convert the string back to the proper code-page. Hey, which is exactly what's going on in std.windows.charset! But at least I don't have to worry about "UseWFuncs" in my own code anymore...
Lionello Lunesu wrote:that also happens to be efficient for Windows 2000 and up.
Walter Bright wrote:Under Windows NT, 2000, and up, the 'W' functions *are* called.
Only if you'd bother to check UseWFuncs. You probably would, but many don't.
Lionello Lunesu wrote:As for UseWFuncs: I don't like it because the check is done at run-time.
Walter Bright wrote:It has to be done at runtime, because that's the only way to make it work between different Windows versions.
You could provide link-time support only, using version blocks?
Lionello Lunesu wrote:It's all over the place, practically doubles all Win32 code, not to mention the imports / obj-size. More importantly, for the reasons mentioned above, I don't think it's necessary.
Walter Bright wrote:There's no hope for it unless all support for Win9x is dropped.
See previous question. L.
Feb 15 2007
Lionello Lunesu wrote:Walter Bright wrote:It does convert to the correct code-page. See std.windows.charset.toMBSz().
Lionello Lunesu wrote:The problem is that this function is not always called. And because, by default, the A-functions are the ones that get aliased to the 'normal form', many times the utf8 char[] is passed as if it were 'ansi'. A quick grep reveals: std\loader.d [5] std\windows\registry.d [35]
Those would be bugs. All the ones using useWfuncs are correctly done (see std.file).
Walter Bright wrote:This just does not work under Win9x, because most of the 'W' functions are not supported. (Also, Win9x internally converts the few 'W' functions it does support right back to 'A'.)
Lionello Lunesu wrote:Yes, but it would be done by Windows. Instead of: if (UseWFuncs) WhatEverW( str.toUTF16z ); else WhatEverA( str.toMBSz ); You'd do only: WhatEverW( str.toUTF16z ); and Windows' unicode layer for Win9x would convert the string back to the proper code-page. Hey, which is exactly what's going on in std.windows.charset! But at least I don't have to worry about "UseWFuncs" in my own code anymore...
unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
Lionello Lunesu wrote:that also happens to be efficient for Windows 2000 and up.
Walter Bright wrote:Under Windows NT, 2000, and up, the 'W' functions *are* called.
Lionello Lunesu wrote:Only if you'd bother to check UseWFuncs. You probably would, but many don't.
Lionello Lunesu wrote:As for UseWFuncs: I don't like it because the check is done at run-time.
Walter Bright wrote:It has to be done at runtime, because that's the only way to make it work between different Windows versions.
Lionello Lunesu wrote:You could provide link-time support only, using version blocks?
Then there'd be two Phobos libraries, and the D programmer would have to ship two different executables. This is not worth it.
Feb 15 2007
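The run-time dispatch being argued over in the posts above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in std.file: it assumes D2 with current Phobos and druntime's core.sys.windows bindings, and the names isNtFamily and fileAttributes are hypothetical. The version test relies on GetVersion() setting the high bit of its result on Windows 9x/Me and clearing it on the NT family.

```d
/// Hypothetical helper: true when a GetVersion() result denotes NT-family
/// Windows (high bit clear), false for Windows 9x/Me (high bit set).
bool isNtFamily(uint ver)
{
    return (ver & 0x8000_0000) == 0;
}

version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16z;
    import std.windows.charset : toMBSz;

    /// Hypothetical wrapper: picks the W or A entry point at run time.
    uint fileAttributes(string path)
    {
        if (isNtFamily(GetVersion()))
            return GetFileAttributesW(path.toUTF16z); // UTF-8 -> UTF-16
        return GetFileAttributesA(path.toMBSz);       // UTF-8 -> current code page
    }
}

void main()
{
    version (Windows)
    {
        import std.stdio : writeln;
        writeln(fileAttributes("C:\\"));
    }
}
```

Every wrapped call needs both branches and both conversions, which is the duplication objected to earlier in the thread; the single-binary requirement is why the check cannot simply move into a version block.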
Walter Bright wrote:unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
Supporting 9x in general is a huge pain. There are a lot of important library features that it doesn't provide. Sean
Feb 15 2007
Sean Kelly wrote:Walter Bright wrote:unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
Supporting 9x in general is a huge pain. There are a lot of important library features that it doesn't provide.
Couldn't that be just dropped? MS itself dropped support for them six months ago: http://support.microsoft.com/gp/lifean18 Andrei
Feb 15 2007
Sean Kelly wrote:Walter Bright wrote:unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
Supporting 9x in general is a huge pain. There are a lot of important library features that it doesn't provide.
The basic stuff, like file I/O, does work, and must work.
Feb 15 2007
Walter Bright wrote:Sean Kelly wrote:Walter Bright wrote:unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
Supporting 9x in general is a huge pain. There are a lot of important library features that it doesn't provide.
The basic stuff, like file I/O, does work, and must work.
Well sure, but their socket library and threading support are somewhat weak. I'll admit that my opinion is skewed towards my own particular areas of interest. Sean
Feb 15 2007
"Sean Kelly" <sean f4.ca> wrote in message news:er24ru$171o$1 digitalmars.com...Walter Bright wrote:Sean Kelly wrote:Walter Bright wrote:unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
Supporting 9x in general is a huge pain. There are a lot of important library features that it doesn't provide.
The basic stuff, like file I/O, does work, and must work.
Well sure, but their socket library and threading support are somewhat weak. I'll admit that my opinion is skewed towards my own particular areas of interest.
You make a great point: isn't Phobos using winsock2? This is also an add-on for the older windows systems. L
Feb 15 2007
Lionello Lunesu wrote:"Sean Kelly" <sean f4.ca> wrote in message news:er24ru$171o$1 digitalmars.com...Walter Bright wrote:Sean Kelly wrote:Walter Bright wrote:unicode layer for Windows is not part of Win9x, it's a separate add-on. This means that in order to use a D executable, the user would have to find and install MSLU. This is unacceptable - I don't want to deal with the constant "bug reports" about this.
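Supporting 9x in general is a huge pain. There are a lot of important library features that it doesn't provide.
Well sure, but their socket library and threading support are somewhat weak. I'll admit that my opinion is skewed towards my own particular areas of interest.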
You make a great point: isn't Phobos using winsock2? This is also an add-on for the older windows systems.
It shipped with the second revision of Win95. There must be very few Win9x systems that don't have it, and those that don't would not be using sockets. It's pretty safe to assume it's installed.
Feb 20 2007
I'd like to suggest the latter. Let's drop the ANSI support for Win32 altogether. Unicode is supported since Windows 95 OSR-2 (if I'm not mistaken) and converting utf8 to ANSI is more expensive than converting it utf8 to utf16 (which is what Windows 2000 and up convert to internally anyway). No more "bool UseWFuncs". And converting utf8 to utf16 using MultiByteToWideChar would also take care of the 0-terminator.
Actually Microsoft are heading this way themselves. See this blog post: http://blogs.msdn.com/michkap/archive/2005/10/02/476213.aspx In short, Microsoft are not developing W/A APIs anymore. Also, if you look at their latest software you'll notice that they are using MSLU (Microsoft Layer for Unicode) so that their Unicode programs can run on 9x. My personal experience is that our customers don't even use Windows 2000. Everyone is using XP for desktops and x64 for servers. So what is your opinion? Do you need to support a 9x version of a program for a living? Todor
Feb 15 2007
Todor Totev wrote on 2007-02-15: [snip]My personal experience is that our customers don't even use Windows 2000. Everyone is using XP for desktops and x64 for servers. So what is your opinion? Do you need to support a 9x version of a program for living?
Yes. 9x is still used because the communication software for engineering hardware is still 16-bit (closed source and undocumented protocols ...). Thomas
Feb 15 2007