
digitalmars.D - D and Unicode(UTF16) strings

Vincent Richomme <forumer smartmobili.com> writes:
Hi,

Would it be possible to add a wstring type to represent UTF-16 strings? On
the Windows platform you can compile in ANSI or UNICODE mode, and you have
the standard char* as well as wchar_t*.

I saw that in D, string is an alias for char[]. Would it be possible to do
the same for wchar[] and define wstring in the core language?

That would allow declaring an alias like this:

version (Unicode)
{
    alias wstring tstring;
}
else
{
    alias string tstring;
}
Jul 24 2008
"Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Vincent Richomme" <forumer smartmobili.com> wrote in message 
news:g6b5ne$1anf$1 digitalmars.com...
> Hi,
>
> Would it be possible to add a wstring type to represent UTF-16 strings? On
> the Windows platform you can compile in ANSI or UNICODE mode, and you have
> the standard char* as well as wchar_t*.
>
> I saw that in D, string is an alias for char[]. Would it be possible to do
> the same for wchar[] and define wstring in the core language?
That's already the case; check object.d in the DMD distribution.
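As a minimal sketch in current D2 syntax (not code from the thread): because object.d already declares string, wstring and dstring, the tstring alias from the question compiles without any language change. The helper toTString is a hypothetical name; it uses std.conv.to to transcode UTF-8 input into whichever width the user-defined Unicode version identifier selects. In 2008-era D1 the same aliases were plain char[], wchar[] and dchar[] rather than immutable arrays.

```d
import std.conv : to;
import std.stdio : writeln;

// wstring is predeclared in object.d (immutable(wchar)[] in D2),
// so no new core-language type is needed for this alias.
version (Unicode)
    alias tstring = wstring;   // UTF-16, e.g. for Win32 "W" functions
else
    alias tstring = string;    // UTF-8 (default build)

// Hypothetical helper: transcode UTF-8 text into the selected width.
tstring toTString(string s)
{
    return to!tstring(s);
}

void main()
{
    static assert(is(wstring == immutable(wchar)[]));

    // "héllo": 6 code units as UTF-8, 5 as UTF-16 (build with -version=Unicode)
    tstring t = toTString("h\u00E9llo");
    writeln(t.length);
}
```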
> That would allow declaring an alias like this:
>
> version (Unicode)
> {
>     alias wstring tstring;
> }
> else
> {
>     alias string tstring;
> }
Although, coming from C++, that might seem a good idea at first, note that Windows doesn't quite know about UTF-8. It can convert UTF-8 to Unicode (UTF-16) and back, but apart from the MultiByteToWideChar-like functions you cannot pass UTF-8 (i.e. string, char[]) to any ANSI Windows API. The ANSI functions all use the current thread code page for conversion, which cannot be set to UTF-8. (God knows I've tried. If anybody has managed to do just this, please let me know how.)

I'd suggest sticking to wstring/Unicode. Most Unicode APIs are also available on Win95, so there should be little reason to use the ANSI functions in any Windows application.

Trying to use UTF-8 on Windows means that you'll either have to constantly convert the UTF-8 strings to Unicode yourself, or use byte[] instead of string to avoid errors from Phobos/Tango APIs that assume a char[]/string contains UTF-8.

Anyway, that's what I've found out while messing with Unicode/ANSI stuff on Windows. It might even be outdated at this point.

L.
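A minimal sketch of the "convert it yourself" route, assuming D2 Phobos: std.conv.to transcodes a UTF-8 string into a UTF-16 wstring, and std.utf.toUTF16z produces the null-terminated wchar buffer that Win32 "W" functions take. The actual Windows call (for example MessageBoxW) is deliberately left out so the sketch runs on any platform.

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : toUTF16z;

void main()
{
    // Written with escapes so the source file's encoding doesn't matter.
    string utf8 = "na\u00EFve caf\u00E9";   // the two accented letters take 2 bytes each in UTF-8

    wstring utf16 = to!wstring(utf8);        // one UTF-16 code unit per character here
    writeln(utf8.length, " UTF-8 code units, ", utf16.length, " UTF-16 code units");
    // prints "12 UTF-8 code units, 10 UTF-16 code units"

    // Null-terminated UTF-16 pointer, the shape an LPCWSTR parameter expects.
    const(wchar)* wz = toUTF16z(utf8);
    assert(wz[utf16.length] == 0);
}
```

Converting once at the API boundary keeps the rest of the program on ordinary D strings.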
Jul 24 2008
"Stewart Gordon" <smjg_1998 yahoo.com> writes:
"Lionello Lunesu" <lionello lunesu.remove.com> wrote in message 
news:g6bga9$2dmh$1 digitalmars.com...
<snip>
> Although, coming from C++, that might seem a good idea at first, note that
> Windows doesn't quite know about UTF-8. It can convert UTF-8 to Unicode
> (UTF-16) and back, but apart from the MultiByteToWideChar-like functions you
> cannot pass UTF-8 (i.e. string, char[]) to any ANSI Windows API.
Check out std.windows.charset.
> The ANSI functions all use the current thread code page for conversion,
> which cannot be set to UTF-8. (God knows I've tried. If anybody has managed
> to do just this, please let me know how.)

> I'd suggest sticking to wstring/Unicode. Most Unicode APIs are also
> available on Win95, so there should be little reason to use the ANSI
> functions in any Windows application.
I've never established which Unicode APIs are implemented on Win9x. There ought to be documentation on this. There's also a thing called Microsoft Layer for Unicode, but annoyingly, there seems to be no convenient way for apps to use it if and only if it's installed.
> Trying to use UTF-8 on Windows means that you'll either have to constantly
> convert the UTF-8 strings to Unicode yourself, or use byte[] instead of
> string to avoid errors from Phobos/Tango APIs that assume a char[]/string
> contains UTF-8.
Just not using the Phobos/Tango string functions would do this, whether you store your strings as byte[], ubyte[] or char[].
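A small sketch of that point, assuming D2 Phobos (the byte values are an assumed Windows-1252 encoding of "café", and isUtf8 is a hypothetical helper name): ANSI bytes are harmless while nothing decodes them, and they only cause trouble once a char[] view of them reaches UTF-8-aware code such as std.utf.validate.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical helper: does this buffer hold well-formed UTF-8?
bool isUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
        return false;
}

void main()
{
    // "café" in Windows-1252: 'é' is the single byte 0xE9, not valid UTF-8 on its own.
    immutable ubyte[] ansi = [0x63, 0x61, 0x66, 0xE9];

    // Stored as ubyte[], it is just data; nothing tries to decode it.
    writeln(ansi.length, " bytes");

    // The same bytes viewed as char[] are rejected by UTF-8-aware Phobos code.
    auto asChars = cast(const(char)[]) ansi;
    writeln("ANSI bytes valid UTF-8: ", isUtf8(asChars));
    writeln("UTF-8 literal valid UTF-8: ", isUtf8("caf\u00E9"));
}
```

Whether the bytes live in byte[], ubyte[] or char[] matters less than whether they ever reach a decoding function.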
> Anyway, that's what I've found out while messing with Unicode/ANSI stuff
> on Windows. It might even be outdated at this point.
I guess it depends on which Windows versions you're targeting...

Stewart.

-- 
My e-mail address is valid but not my primary mailbox. Please keep replies on the 'group where everybody may benefit.
Jul 25 2008