digitalmars.D - std.string.toupper/tolower failed with mixture of English and Chinese characters

reply Shawn Liu <Shawn_member pathlink.com> writes:
std.string.toupper() and std.string.tolower() give wrong results when dealing with
a mixture of upper/lower-case English and Chinese characters, e.g.
char[] a = "AbCdÖŠeFgH";
char[] b = std.string.toupper(a);
char[] c = std.string.tolower(a);

The length of a is 11, but the length of both b and c is now 18.
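
For readers wondering where 11 comes from: .length on a char[] counts UTF-8 code units (bytes), not characters. The sketch below assumes current D (D2) rather than the 2005 compiler, and uses U+4E2D as a hypothetical stand-in for the Chinese character, which is garbled in the archived post.

```d
import std.utf : count;

void main()
{
    // Hypothetical sample: U+4E2D stands in for the original character.
    string a = "AbCd\u4E2DeFgH";

    assert(a.length == 11); // UTF-8 code units: 4 + 3 + 4
    assert(count(a) == 9);  // code points: 4 + 1 + 4
}
```

A case conversion that only touches letters of the same encoded size should leave a.length unchanged, which is why a result of 18 points to corrupted data.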
Nov 21 2005
next sibling parent reply "Kris" <fu bar.com> writes:
"Shawn Liu" <Shawn_member pathlink.com> wrote...
 std.string.toupper() and std.string.tolower() give a wrong result when 
 deal with
 a mixture of upper/lower English and Chinese characters. e.g.
 char[] a = "AbCdÖŠeFgH";
 char[] b = std.string.toupper(a);
 char[] c = std.string.tolower(a);

 The length of a is 11, but the length of b,c is 18 now.

Phobos doesn't support non-ASCII conversions/comparisons at this time?
Nov 21 2005
parent Thomas Kuehne <thomas-dloop kuehne.cn> writes:
[follow up set to: digitalmars.D.bugs]

Kris wrote on 2005-11-22:
 "Shawn Liu" <Shawn_member pathlink.com> wrote...
 std.string.toupper() and std.string.tolower() give a wrong result when 
 deal with
 a mixture of upper/lower English and Chinese characters. e.g.
 char[] a = "AbCdÖŠeFgH";
 char[] b = std.string.toupper(a);
 char[] c = std.string.tolower(a);

 The length of a is 11, but the length of b,c is 18 now.

Phobos doesn't supports non-ascii conversions/comparisons at this time?

Phobos does, at least for the simple conversions. Whichever cases are handled, the data that isn't handled shouldn't get corrupted.

The attached zipped string.d fixes toupper/tolower and extends the unittests. (Yes, I know, it isn't the fastest possible algorithm ...)

Thomas
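
The point that unhandled data must pass through intact can be illustrated with a minimal sketch. This is not the attached string.d, which is not reproduced in this thread; it assumes current D (D2), the helper name asciiToUpper is hypothetical, and U+4E2D is a stand-in for the garbled Chinese character. It converts only ASCII letters and copies every other byte unchanged. That is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never fall in the range 'a'..'z'.

```d
// Hypothetical helper: upper-case ASCII letters only, leave all other
// UTF-8 code units untouched so the result has the same length.
char[] asciiToUpper(const(char)[] s)
{
    auto r = s.dup;              // mutable copy, same length as the input
    foreach (ref c; r)           // iterates over code units, no decoding
    {
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
    }
    return r;
}

void main()
{
    // Hypothetical sample: U+4E2D stands in for the original character.
    auto a = "AbCd\u4E2DeFgH";
    auto b = asciiToUpper(a);

    assert(a.length == 11);
    assert(b.length == a.length);   // no growth, nothing corrupted
    assert(b == "ABCD\u4E2DEFGH");  // the non-ASCII bytes are preserved
}
```

A full fix would also map non-ASCII letters that have case, which requires decoding code points and allowing the result length to change; the sketch deliberately skips that.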
Nov 26 2005
prev sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 22 Nov 2005 02:19:50 +0000 (UTC), Shawn Liu wrote:

 std.string.toupper() and std.string.tolower() give a wrong result when deal
with
 a mixture of upper/lower English and Chinese characters. e.g.
 char[] a = "AbCdÖŠeFgH";
 char[] b = std.string.toupper(a);
 char[] c = std.string.tolower(a);
 
 The length of a is 11, but the length of b,c is 18 now.

If it isn't ASCII then DMD doesn't want to know about it. Try the Mango library for its ICU bindings; I think that might have it.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
22/11/2005 1:33:24 PM
Nov 21 2005