digitalmars.D - Re: Wide characters support in D

Ruslan Nikolaev <nruslan_devel yahoo.com> writes:
> Is this what you want?
>
>     version (utf16)
>         alias wchar tchar;
>     else
>         alias char tchar;
>
>     alias immutable(tchar)[] tstring;
>
>     import std.utf;
>
>     unittest {
>         tstring tstr = "hello";
>         dstring dstr = toUTF32(tstr);
>     }

Yes, I think something like this, but standardized by the language. For interoperability (as I mentioned at the beginning), it would also be nice to have toUTF16, toUTF8, fromUTF16, fromUTF8 and fromUTF32, since tchar can be anything. If tchar is already UTF-16 and you call toUTF16, it wouldn't do an actual conversion; it would just return the input string. Something like this.

The other point of contention is whether to use this kind of type as the main character type. My point was that using this kind of type in dynamic libraries would be nice, since you don't need to provide instances for every other character type, and at the same time you use the native character encoding available on the system. Of course, that does not mean you should be deprived of the other types. If you need a specific type to do something specific, you can always use it.
Jun 08 2010
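A minimal sketch of the conversion helpers requested above, assuming the same `version (utf16)` switch as the quoted code. The function names follow the post, but they are hypothetical helpers, not an existing language or Phobos feature. They wrap Phobos' `std.utf.toUTF8`/`toUTF16` and return the input unchanged when tchar already has the target encoding, which is the "no actual conversion" case described above.

```d
// Hypothetical helpers: toUTF8/toUTF16/fromUTF8/fromUTF16 below are
// illustrative names, not a Phobos API. std.utf is imported with
// "static import" so these names do not clash with std.utf's own functions.
static import std.utf;

version (utf16)
    alias tchar = wchar;
else
    alias tchar = char;

alias tstring = immutable(tchar)[];

/// tstring -> UTF-16. When tchar is wchar, the input is returned as-is
/// (no transcoding, no copy).
wstring toUTF16(tstring s)
{
    static if (is(tchar == wchar))
        return s;
    else
        return std.utf.toUTF16(s);
}

/// tstring -> UTF-8. When tchar is char, the input is returned as-is.
string toUTF8(tstring s)
{
    static if (is(tchar == char))
        return s;
    else
        return std.utf.toUTF8(s);
}

/// UTF-16 -> tstring.
tstring fromUTF16(wstring s)
{
    static if (is(tchar == wchar))
        return s;
    else
        return std.utf.toUTF8(s);
}

/// UTF-8 -> tstring.
tstring fromUTF8(string s)
{
    static if (is(tchar == char))
        return s;
    else
        return std.utf.toUTF16(s);
}
```

Since `static if` is resolved at compile time, the identity branch costs nothing; only the branch matching the build's tchar is compiled.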
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-06-08 09:22:02 -0400, Ruslan Nikolaev <nruslan_devel yahoo.com> said:

> you don't need to provide instances for every other character type, and
> at the same time - use native character encoding available on system.

In my opinion, thinking this will work is a fallacy. Here's why...

Linux systems generally use UTF-8, so I guess the "system encoding" there will be UTF-8. But if you start using Qt, you have to use UTF-16, and you might have to intermix UTF-8 to work with other libraries in the backend (libraries which are not necessarily D libraries, nor system libraries). So you may have a UTF-8 backend (such as the MySQL library), UTF-8 "system encoding" glue code, and UTF-16 GUI code (Qt). That might be a good or a bad choice, depending on various factors, such as whether the glue code sends more strings to the backend or to the GUI.

Now try to port the thing to Windows, where you define the "system encoding" as UTF-16. You still have the same UTF-8 backend and the same UTF-16 GUI code, but for some reason you're changing the glue code in the middle to UTF-16? Sure, it can be made to work, but all the string conversions will start to happen elsewhere, which may change the performance characteristics and add some potential for bugs, and all this for no real reason.

The problem is that what you call "system encoding" is only the encoding used by the system frameworks. It is relevant when working with the system frameworks, but when you're working with any other API, you'll probably want to use the same character type as that API, not necessarily the "system encoding". Not all programs make extensive use of the system frameworks. In some situations you'll want to use UTF-16 on Linux, or UTF-8 on Windows, because you're dealing with libraries that expect it (Qt, MySQL).

A compiler switch is a poor choice there, because you can't mix libraries compiled with different compiler switches when that switch changes the default character type. In most cases, it's much better in my opinion if the programmer just uses the same character type as one of the libraries he uses, sticks to that, and is aware of what he's doing.
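To make the boundary argument concrete, here is a rough sketch of glue code that stays in UTF-8 (the encoding of the backend it talks to most) and converts once, explicitly, where it hands text to a UTF-16 GUI. `backendLookup`, `guiShowText` and `showEntry` are hypothetical stand-ins for a UTF-8 library such as MySQL and a UTF-16 toolkit such as Qt; they are not real bindings.

```d
import std.utf : toUTF16;

// Hypothetical stand-in for a UTF-8 backend library call.
string backendLookup(string key)
{
    return "value for " ~ key;
}

// Records what the pretend GUI received, so the result can be inspected.
wstring lastShown;

// Hypothetical stand-in for a UTF-16 GUI toolkit call.
void guiShowText(wstring text)
{
    lastShown = text;
}

// Glue code: kept in UTF-8; the single transcoding step happens
// explicitly at the GUI boundary, independent of the platform.
void showEntry(string key)
{
    string value = backendLookup(key);
    guiShowText(toUTF16(value));
}
```

Because the glue code's encoding is chosen from the libraries it talks to rather than from the OS, porting it between Linux and Windows leaves the conversion point where it was.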
If someone really wants to deal with the complexity of supporting both character types depending on the environment the program runs in, it's easy to create "tchar" and "tstring" aliases that depend on whether it's Windows or Linux, or on a custom version flag from a compiler switch, but that'll be his choice and his responsibility to make everything work. In that case, though, I think a better option might be to abstract all those strings under a single type that works with all UTF encodings (something like [mtext]).

[mtext]: http://www.dprogramming.com/mtext.php

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Jun 08 2010
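As a rough illustration of the "single type that works with all UTF encodings" idea suggested above, the sketch below keeps text in whatever encoding it was created with and converts to the others on first request, caching each result. `AnyText` and its methods are hypothetical names; this is not mtext's actual API, only the general technique.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

/// Holds text in its original UTF encoding and converts lazily,
/// caching each converted form after the first request.
struct AnyText
{
    private string  u8;
    private wstring u16;
    private dstring u32;
    private bool has8, has16, has32;

    // Pass a typed value or a suffixed literal ("..."c, "..."w, "..."d)
    // so that the constructor overload is unambiguous.
    this(string s)  { u8 = s;  has8 = true; }
    this(wstring s) { u16 = s; has16 = true; }
    this(dstring s) { u32 = s; has32 = true; }

    string utf8()
    {
        if (!has8)
        {
            u8 = has16 ? toUTF8(u16) : toUTF8(u32);
            has8 = true;
        }
        return u8;
    }

    wstring utf16()
    {
        if (!has16)
        {
            u16 = has8 ? toUTF16(u8) : toUTF16(u32);
            has16 = true;
        }
        return u16;
    }

    dstring utf32()
    {
        if (!has32)
        {
            u32 = has8 ? toUTF32(u8) : toUTF32(u16);
            has32 = true;
        }
        return u32;
    }
}
```

Each API boundary asks for the encoding it needs, and the original string is returned without copying when it already matches; caching trades some memory for not transcoding the same text repeatedly.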