digitalmars.D - Re: Wide characters support in D

Ruslan Nikolaev <nruslan_devel yahoo.com> writes:
 You only need to do that where you are shipping closed
 source and for that, it should be trivial to get the
 compiler to generate all three versions. 

You will also need to do it in open source projects if you want to put the generated template code into a dynamic library rather than into the user's program (otherwise the same code is repeated over and over across user programs, an unnecessary space "burden"). But yes, closed-source programs are a good example. True, you can compile all 3 versions. But the whole argument was about the additional generated code, which someone claimed would not happen.
 
 You're right: it depends. In the few cases I can think of
 where more of the D code will be interacting with non-D code
 than just processing the text, you could almost use void[]
 as your type. Where would you care about the encoding but
 not do much with it?
 
 Also unless you have large amounts of text, you are going
 to have to work hard to get perf problems. If you do have
 large amounts of text, you are going to be I/O bound (cache
 misses etc.) and at that point, the cost of any operation
 is its I/O. From that, reading in some data, doing a single
 pass of processing on it and writing it back out would only
 take 2/3 longer with translations on both sides.
 

True. But even simple string handling is faster for UTF-16. The time required to read 2 bytes from a UTF-16 string is the same as to read 1 byte from a UTF-8 string. Generally, we have to read one code point after another (not more than this), since data is guaranteed to be aligned on a 2-byte boundary for wchar and a 1-byte boundary for char. Not to mention that converting 2 code points takes less time in UTF-16. And why not use this opportunity if the system already supports it natively? In addition, I want to mention that reading/writing a file in text mode is very transparent. For instance, on Windows, the conversion from multibyte to Unicode happens automatically for open, fopen, etc. when text mode is specified. In general, this is good practice, since 1-byte char text is not necessarily UTF-8 anyway and can be ANSI as well. Also, some other OSes use 2-byte UTF-16 natively, so it's not just Windows. If I am not wrong, Symbian is one such example.
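To make the size side of this argument concrete, here is a minimal Python sketch (not part of the original post; `code_units` is a hypothetical helper written for illustration). It only counts how many code units each encoding needs for the same text. It does not measure speed, so it neither confirms nor refutes the performance claim.

```python
def code_units(text: str) -> dict:
    """Count code points and code units of `text` in UTF-8 and UTF-16."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # explicit byte order, so no BOM is added
    return {
        "code_points": len(text),         # Python str length is in code points
        "utf8_units": len(utf8),          # one UTF-8 code unit is 1 byte
        "utf16_units": len(utf16) // 2,   # one UTF-16 code unit is 2 bytes
    }


# ASCII, Cyrillic, CJK, and one character outside the Basic Multilingual Plane
for sample in ["hello", "привет", "日本語", "\U0001D11E"]:
    print(repr(sample), code_units(sample))
```

For ASCII text both encodings use one unit per character, for Cyrillic and CJK text UTF-16 uses fewer units, and a character outside the BMP still needs a surrogate pair (two units) in UTF-16, so neither encoding is fixed-width.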
Jun 07 2010
"Nick Sabalausky" <a a.a> writes:
"Ruslan Nikolaev" <nruslan_devel yahoo.com> wrote in message 
news:mailman.127.1275974825.24349.digitalmars-d puremagic.com...
 True. But even simple string handling is faster for UTF-16. The time 
 required to read 2 bytes from a UTF-16 string is the same as to read 1 byte 
 from a UTF-8 string. Generally, we have to read one code point after another 
 (not more than this), since data is guaranteed to be aligned on a 2-byte 
 boundary for wchar and a 1-byte boundary for char. Not to mention that 
 converting 2 code points takes less time in UTF-16. And why not use this 
 opportunity if the system already supports it natively?

Why do you say that UTF-16 is faster than UTF-8?

 In general, this is good practice, since 1-byte char text is not necessarily 
 UTF-8 anyway and can be ANSI as well.

That's what the BOM is for.
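
As an illustration of the BOM idea, here is a small Python sketch (not from the original thread; `sniff_bom` and `_BOMS` are hypothetical names). It checks the first bytes of a buffer against the standard byte order marks exposed by Python's `codecs` module. Note the limitation that matters for this discussion: when no BOM is present, the function cannot tell UTF-8 from a legacy "ANSI" code page, so it simply reports that nothing was found.

```python
import codecs

# The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
# because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


encoding, skip = sniff_bom(codecs.BOM_UTF8 + "text".encode("utf-8"))
print(encoding, skip)
```

A caller would decode `data[skip:]` with the reported encoding, and fall back to some default or heuristic when the result is `(None, 0)`.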
Jun 07 2010