
digitalmars.D - Encodings

reply "Nathan M. Swan" <nathanmswan gmail.com> writes:
For most of the string processing I do, I read/write text in 
UTF-8 and convert it to UTF-32 for processing (with std.utf), so 
I don't have to worry about encoding. Is this a good or bad 
paradigm? Is there a better way to do this? What method do all of 
you use?

Just curious, NMS
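
[Editor's note: a minimal sketch of the paradigm described above, assuming the input file holds UTF-8 text and using std.utf.toUTF32/toUTF8 for the conversions; the file names are just placeholders.]

import std.file : readText, write;
import std.stdio : writeln;
import std.utf : toUTF32, toUTF8;

void main()
{
    string utf8 = readText("input.txt");    // file assumed to hold UTF-8 text
    dstring text = toUTF32(utf8);           // one dchar per code point

    // With UTF-32, length and indexing count code points, not UTF-8 code units:
    writeln("code points: ", text.length);
    if (text.length)
        writeln("first/last: ", text[0], " / ", text[$ - 1]);

    write("output.txt", toUTF8(text));      // back to UTF-8 on disk
}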
Apr 08 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, April 08, 2012 23:36:23 Nathan M. Swan wrote:
> For most of the string processing I do, I read/write text in
> UTF-8 and convert it to UTF-32 for processing (with std.utf), so
> I don't have to worry about encoding. Is this a good or bad
> paradigm? Is there a better way to do this? What method do all of
> you use?
>
> Just curious, NMS
It depends on what you're doing. Depending on the functions that you use and 
your memory requirements, UTF-8 may be faster or UTF-32 may be faster.

UTF-32 has the advantage of being a random-access range, which will make it 
work with a number of functions that UTF-8 won't work with. But UTF-32 also 
takes considerably more memory (especially if most of your characters are 
ASCII characters), which can be a problem.

I think that the most common thing is to just operate on UTF-8 unless another 
encoding is needed (e.g. UTF-32 is required because random access is needed), 
and in plenty of cases, you end up operating on generic ranges anyway if you 
use range-based functions on strings and don't use std.array.array on them.

You're going to have to profile your code to see whether using UTF-8 or 
UTF-32 primarily in your string processing is more efficient.

- Jonathan M Davis
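
[Editor's note: a small sketch of the trade-offs described above, using only Phobos (isRandomAccessRange, filter, std.uni.isAlpha, std.array.array); the string literals are placeholders and the byte counts in the comments refer to the character data only.]

import std.algorithm : filter;
import std.array : array;
import std.range : isRandomAccessRange;
import std.stdio : writeln;
import std.uni : isAlpha;

void main()
{
    // A string (UTF-8) is iterated by decoded dchar, so the range primitives
    // see it as a bidirectional range, not a random-access one; a dstring
    // (UTF-32) really is random access.
    static assert(!isRandomAccessRange!string);
    static assert( isRandomAccessRange!dstring);

    // The memory trade-off: every code point costs 4 bytes in UTF-32, even ASCII.
    string  ascii8  = "hello";   //  5 bytes of character data
    dstring ascii32 = "hello"d;  // 20 bytes of character data

    // Range-based functions accept either encoding and hand back lazy, generic
    // ranges of dchar; std.array.array is what turns the result into an array.
    dstring text = "héllo, wörld!"d;
    auto lazyLetters = text.filter!isAlpha;       // lazy, nothing allocated yet
    auto letters     = text.filter!isAlpha.array; // dchar[] holding "héllowörld"

    writeln(letters);
}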
Apr 08 2012