digitalmars.D - Re: Questions about Unicode, particularly Japanese

Ruslan Nikolaev <nruslan_devel yahoo.com> Jun 08 2010

"Steven Schveighoffer" <schveiguy yahoo.com> Jun 08 2010
"Nick Sabalausky" <a a.a> Jun 08 2010

Ruslan Nikolaev <nruslan_devel yahoo.com> writes:

Sorry if this shows up as a top-post again in your mail clients. I'll try to
figure out what's going on later today.

> 1. Am I correct in all of that?


Yes. That's why I was saying that UTF-16 is *NOT* a lousy encoding; it
really depends on the situation. The advantage is not only space (most
Japanese characters take 3 bytes in UTF-8 but 2 in UTF-16) but also
processing speed, even for letters that take 2 bytes in UTF-8 (Greek,
Cyrillic, etc.): in UTF-16 each of them is a single 16-bit code unit that
can be read in one memory access, whereas in UTF-8 it is a two-byte sequence
that has to be decoded. Also, consider another thing: it's easier (and
cheaper) to convert from ANSI to UTF-16, since a direct lookup table can be
built. For UTF-8, on the other hand, you have to do some shifting to build a
multi-byte sequence for every non-ASCII letter (even Latin ones such as é).
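To make the ANSI-conversion argument concrete, here is a minimal Python sketch. It assumes Latin-1 (ISO-8859-1) as the "ANSI" code page because each byte value equals its Unicode code point, so the table is simply the identity; a Windows code page such as 1252 would need different entries for 0x80-0x9F. The names `ANSI_TO_UTF16`, `ansi_to_utf16` and `ansi_to_utf8` are hypothetical, not from any library. Converting to UTF-16 is one table lookup per byte, while converting to UTF-8 needs a branch plus shifts and masks for every byte at or above 0x80.

```python
# Hypothetical table for a single-byte "ANSI" code page. Latin-1 is assumed
# because byte value == Unicode code point; code page 1252 would differ in
# 0x80-0x9F and need its own entries.
ANSI_TO_UTF16 = list(range(256))


def ansi_to_utf16(data: bytes) -> list[int]:
    """One table lookup per input byte, producing one 16-bit code unit."""
    return [ANSI_TO_UTF16[b] for b in data]


def ansi_to_utf8(data: bytes) -> bytes:
    """Same lookup, then shifts and masks for every code point >= 0x80.

    Every entry of a single-byte code page lies in the BMP, so at most
    three UTF-8 bytes are needed per input byte.
    """
    out = bytearray()
    for b in data:
        cp = ANSI_TO_UTF16[b]
        if cp < 0x80:
            out.append(cp)                           # ASCII: copied as-is
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))             # lead byte, top 5 bits
            out.append(0x80 | (cp & 0x3F))           # continuation, low 6 bits
        else:
            out.append(0xE0 | (cp >> 12))            # lead byte, top 4 bits
            out.append(0x80 | ((cp >> 6) & 0x3F))    # middle 6 bits
            out.append(0x80 | (cp & 0x3F))           # low 6 bits
    return bytes(out)


text = b"caf\xe9"  # "café" in Latin-1
print(ansi_to_utf16(text))       # [99, 97, 102, 233]
print(ansi_to_utf8(text).hex())  # 636166c3a9
```

A converter could also precompute the UTF-8 byte sequence for each of the 256 entries, which removes the shifting at run time; what remains is the variable output length (1 to 3 bytes per input byte) rather than the bit manipulation.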

Which encoding is better depends on your taste, language, application,
etc. I was simply pointing out that it's quite nice to have a universal
'tchar' type. My argument was never about which encoding is better; that's
hard to say in general. Besides, many people still use ANSI code pages
rather than UTF-8.

Jun 08 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 08 Jun 2010 16:18:54 -0400, Ruslan Nikolaev  
<nruslan_devel yahoo.com> wrote:

> Sorry if this shows up as a top-post again in your mail clients. I'll try
> to figure out what's going on later today.

It appears as a top-post in my newsreader too.

>> 1. Am I correct in all of that?


> Yes. That's why I was saying that UTF-16 is *NOT* a lousy encoding; it
> really depends on the situation. The advantage is not only space (most
> Japanese characters take 3 bytes in UTF-8 but 2 in UTF-16) but also
> processing speed, even for letters that take 2 bytes in UTF-8 (Greek,
> Cyrillic, etc.): in UTF-16 each of them is a single 16-bit code unit that
> can be read in one memory access, whereas in UTF-8 it is a two-byte
> sequence that has to be decoded. Also, consider another thing: it's easier
> (and cheaper) to convert from ANSI to UTF-16, since a direct lookup table
> can be built. For UTF-8, on the other hand, you have to do some shifting to
> build a multi-byte sequence for every non-ASCII letter (even Latin ones
> such as é).
>
> Which encoding is better depends on your taste, language, application,
> etc. I was simply pointing out that it's quite nice to have a universal
> 'tchar' type. My argument was never about which encoding is better; that's
> hard to say in general. Besides, many people still use ANSI code pages
> rather than UTF-8.


Wouldn't this suggest that the choice of character type depends more on
what language you speak than on what OS you are running?
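Steve's point can be checked by comparing encoded sizes per script. The minimal Python sketch below uses sample strings chosen only for illustration (one each for English, Greek, Russian and Japanese, not taken from the thread) and the standard `str.encode` codecs; `utf-16-le` is used so that no byte-order mark is counted. The helper name `encoded_sizes` is hypothetical.

```python
# Illustrative sample strings, one per script (not taken from the thread).
SAMPLES = {
    "English": "hello",
    "Greek": "αβγδ",
    "Russian": "привет",
    "Japanese": "こんにちは",
}


def encoded_sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for s; utf-16-le adds no BOM."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


for lang, s in SAMPLES.items():
    u8, u16 = encoded_sizes(s)
    print(f"{lang:<8} {len(s)} chars: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

For these samples the English string is 5 bytes in UTF-8 versus 10 in UTF-16, Greek and Russian come out the same size in both, and the Japanese string is 15 bytes in UTF-8 versus 10 in UTF-16, so the more compact encoding does depend on the language of the text.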

-Steve

Jun 08 2010

"Nick Sabalausky" <a a.a> writes:

"Ruslan Nikolaev" <nruslan_devel yahoo.com> wrote in message 
news:mailman.138.1276028343.24349.digitalmars-d puremagic.com...
> Sorry if this shows up as a top-post again in your mail clients. I'll try
> to figure out what's going on later today.
>
>> 1. Am I correct in all of that?


> Yes. That's why I was saying that UTF-16 is *NOT* a lousy encoding; it
> really depends on the situation. The advantage is not only space (most
> Japanese characters take 3 bytes in UTF-8 but 2 in UTF-16) but also
> processing speed, even for letters that take 2 bytes in UTF-8 (Greek,
> Cyrillic, etc.): in UTF-16 each of them is a single 16-bit code unit that
> can be read in one memory access, whereas in UTF-8 it is a two-byte
> sequence that has to be decoded. Also, consider another thing: it's easier
> (and cheaper) to convert from ANSI to UTF-16, since a direct lookup table
> can be built. For UTF-8, on the other hand, you have to do some shifting to
> build a multi-byte sequence for every non-ASCII letter (even Latin ones
> such as é).


Yeah, I need to remember not to try to post late at night ;)

Jun 08 2010
