digitalmars.D - Re: Questions about Unicode, particularly Japanese

Ruslan Nikolaev <nruslan_devel yahoo.com> writes:
Sorry if this is a top post again in your mail clients. I'll try to figure out
what's going on later today.


 
> 1. Am I correct in all of that?

Yes. That's the reason I was saying that UTF-16 is *NOT* a lousy encoding. It really depends on the situation. The advantage is not only space but also faster processing (even for letters that take 2 bytes in either encoding: Greek, Cyrillic, etc.), since those 2 bytes form a single code unit that can be read in one memory access, as opposed to UTF-8. Also, consider another thing: it's easier (and cheaper) to convert from an ANSI code page to UTF-16, since a direct lookup table can be created. For UTF-8, on the other hand, you have to do some shifts and masks to build a multi-byte sequence for every non-ASCII letter (even Latin ones). Which encoding is better depends on your taste, language, applications, etc. I was simply pointing out that it's quite nice to have a universal 'tchar' type. My argument was never about which encoding is better; that's hard to tell in general. Besides, many people still use ANSI and not UTF-8.
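
A minimal sketch of the two conversion paths described above, in Python. It assumes Windows-1251 (Cyrillic) as the example ANSI code page, and the helper names (build_table, ansi_to_utf16_units, code_point_to_utf8) are hypothetical, chosen only for illustration:

```python
def build_table(codepage):
    """Build a 256-entry table mapping each byte of a single-byte
    code page to one UTF-16 code unit (assumes all characters are in the BMP)."""
    table = []
    for b in range(256):
        try:
            ch = bytes([b]).decode(codepage)
        except UnicodeDecodeError:
            ch = "\ufffd"  # byte not assigned in this code page
        table.append(ord(ch))
    return table


def ansi_to_utf16_units(data, table):
    """ANSI -> UTF-16: one table lookup per input byte."""
    return [table[b] for b in data]


def code_point_to_utf8(cp):
    """Code point -> UTF-8 bytes, built with shifts and masks.
    Surrogate code points are not rejected in this sketch."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


table = build_table("cp1251")              # assumed example code page
ansi_bytes = "Привет".encode("cp1251")     # 6 bytes of "ANSI" input
utf16_units = ansi_to_utf16_units(ansi_bytes, table)
utf8_bytes = b"".join(code_point_to_utf8(u) for u in utf16_units)
```

The table path is one array lookup per input byte, while the UTF-8 path has to branch on the code point's range and then shift and mask.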
Jun 08 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 08 Jun 2010 16:18:54 -0400, Ruslan Nikolaev  
<nruslan_devel yahoo.com> wrote:

> Sorry if this is a top post again in your mail clients. I'll try to figure
> out what's going on later today.

It appears as a top-post in my newsreader too.

>> 1. Am I correct in all of that?
>
> Yes. That's the reason I was saying that UTF-16 is *NOT* a lousy encoding. It really depends on the situation. The advantage is not only space but also faster processing (even for letters that take 2 bytes in either encoding: Greek, Cyrillic, etc.), since those 2 bytes form a single code unit that can be read in one memory access, as opposed to UTF-8. Also, consider another thing: it's easier (and cheaper) to convert from an ANSI code page to UTF-16, since a direct lookup table can be created. For UTF-8, on the other hand, you have to do some shifts and masks to build a multi-byte sequence for every non-ASCII letter (even Latin ones). Which encoding is better depends on your taste, language, applications, etc. I was simply pointing out that it's quite nice to have a universal 'tchar' type. My argument was never about which encoding is better; that's hard to tell in general. Besides, many people still use ANSI and not UTF-8.

Wouldn't this suggest that the choice of character type should depend more on what language you speak than on what OS you are running?

-Steve
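
One rough way to see how much the language matters is to compare encoded sizes for a few sample strings. The strings below are arbitrary assumptions, and encoded_sizes is a hypothetical helper:

```python
# Arbitrary sample phrases; any text in these scripts would show the same trend.
SAMPLES = {
    "English": "Hello, world",
    "Russian": "Привет, мир",
    "Japanese": "こんにちは世界",
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


sizes = {lang: encoded_sizes(text) for lang, text in SAMPLES.items()}
for lang, result in sizes.items():
    print(lang, result)
```

With these samples, the ASCII text is smaller in UTF-8, the Cyrillic text comes out about even, and the Japanese text is smaller in UTF-16.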
Jun 08 2010
"Nick Sabalausky" <a a.a> writes:
"Ruslan Nikolaev" <nruslan_devel yahoo.com> wrote in message 
news:mailman.138.1276028343.24349.digitalmars-d puremagic.com...
> Sorry if this is a top post again in your mail clients. I'll try to figure out
> what's going on later today.


>> 1. Am I correct in all of that?
>
> Yes. That's the reason I was saying that UTF-16 is *NOT* a lousy encoding. It really depends on the situation. The advantage is not only space but also faster processing (even for letters that take 2 bytes in either encoding: Greek, Cyrillic, etc.), since those 2 bytes form a single code unit that can be read in one memory access, as opposed to UTF-8. Also, consider another thing: it's easier (and cheaper) to convert from an ANSI code page to UTF-16, since a direct lookup table can be created. For UTF-8, on the other hand, you have to do some shifts and masks to build a multi-byte sequence for every non-ASCII letter (even Latin ones).

Yea, I need to remember not to try to post late at night ;)
Jun 08 2010