digitalmars.D - string and utf aliases

Would it be yet another "blasphemy" to
add a string *alias* to the language?
(No, not a string typedef. Just an alias.)

I think that, together with some char type
aliases similar to stdint.d, could do *wonders*
for readability and understandability.


alias char  utf8_t;
alias wchar utf16_t;
alias dchar utf32_t;

alias utf8_t[]   string; // ASCII-optimized
alias utf16_t[] ustring; // Unicode-optimized


They could be used as in the following example D program,
which prints every argument as UTF-8 code units and as UTF-32 code points:

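import core.stdc.stdio : printf;

void main(string[] args)
{
   foreach (size_t a, string arg; args) {
     // %.*s takes an explicit length and pointer (D arrays carry no NUL terminator)
     printf("%d: %.*s\n", cast(int) a, cast(int) arg.length, arg.ptr);
     printf("    ");
     foreach (utf8_t b; arg) {      // each UTF-8 code unit, in hex
       printf("%02x ", cast(uint) b);
     }
     printf("\n");
     foreach (utf32_t c; arg) {     // each decoded code point
       printf("\t\\U%08x\n", cast(uint) c);
     }
   }
}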

For simple ASCII, the output looks something like:

0: ./unichar
     2e 2f 75 6e 69 63 68 61 72
         \U0000002e
         \U0000002f
         \U00000075
         \U0000006e
         \U00000069
         \U00000063
         \U00000068
         \U00000061
         \U00000072

With Unicode arguments, it looks ... different.
(since each non-ASCII character takes several UTF-8 code units)
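
To make that difference concrete, here is a minimal sketch that counts
code units and code points for one non-ASCII string. The helper names
countUnits and countPoints and the sample word are made up for this
illustration, and the import assumes a current D runtime (core.stdc.stdio):

```d
import core.stdc.stdio : printf;

// The aliases proposed above; an alias is only another name for the type.
alias char  utf8_t;
alias dchar utf32_t;
static assert(is(utf8_t == char) && is(utf32_t == dchar));

// Hypothetical helper: number of UTF-8 code units (bytes) in s.
size_t countUnits(const(char)[] s)
{
    size_t n = 0;
    foreach (utf8_t b; s)   // visits the raw code units
        ++n;
    return n;
}

// Hypothetical helper: number of code points in s.
size_t countPoints(const(char)[] s)
{
    size_t n = 0;
    foreach (utf32_t c; s)  // foreach decodes UTF-8 into code points
        ++n;
    return n;
}

void main()
{
    const(char)[] s = "\u00e5ngstr\u00f6m";   // "ångström"
    printf("%d code units, %d code points\n",
           cast(int) countUnits(s), cast(int) countPoints(s));
    // prints "10 code units, 8 code points"
}
```

For pure ASCII input the two counts are equal, which is why the
./unichar output above shows one hex byte per \U line.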

--anders


PS:
I think this string alias and UTF-8 chars are way
better than Java's String class and UTF-16 chars!
(pretty much the same way that the compiled D code
vastly outperforms the Java code with JVM startup)
Oct 14 2004