digitalmars.D.bugs - doFormat counts bytes, not characters

When calculating padding, doFormat counts the number of bytes in the string, not
the number of characters. UTF-8 strings containing multibyte characters are
therefore given too little padding.

A quick and dirty fix is to apply the following to format.d:

141c141
<           int padding = field_width - (strlen(prefix) + s.length);
---
>           int padding = field_width - (strlen(prefix) + toUTF32(s).length);

A better solution is to add functions to std.utf for counting the number of
characters in a string. This is slightly faster and avoids unnecessary memory
allocation. One possible way to do this (for UTF-8) follows below. It's
basically a stripped down version of decode().

Nick
Feb 09 2005
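
For illustration, a counting function along the lines described above might look like the following. This is only a sketch, not the code from the original post: the name countChars is hypothetical, it is not part of std.utf, and it is written for a current D compiler. It assumes the input is mostly valid UTF-8, and counts any stray or invalid byte as one character instead of throwing the way decode() would.

```d
import std.stdio : writefln;

// Hypothetical helper, not part of std.utf.
// Counts characters (code points) in a UTF-8 string by reading each lead
// byte and skipping the continuation bytes that follow it. No allocation
// is needed, unlike toUTF32(s).length.
size_t countChars(const(char)[] s)
{
    size_t n = 0;
    size_t i = 0;
    while (i < s.length)
    {
        const uint b = s[i];
        if (b < 0x80)
            i += 1;                     // 0xxxxxxx: ASCII
        else if ((b & 0xE0) == 0xC0)
            i += 2;                     // 110xxxxx: two-byte sequence
        else if ((b & 0xF0) == 0xE0)
            i += 3;                     // 1110xxxx: three-byte sequence
        else if ((b & 0xF8) == 0xF0)
            i += 4;                     // 11110xxx: four-byte sequence
        else
            i += 1;                     // stray or invalid byte: count as one
        ++n;
    }
    return n;
}

void main()
{
    string s = "h\u00E9llo";            // U+00E9 takes two bytes in UTF-8
    writefln("%s bytes, %s characters", s.length, countChars(s));
    // prints "6 bytes, 5 characters"
}
```

With a function like this, the padding line in format.d could subtract countChars(s) instead of s.length or toUTF32(s).length.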