
digitalmars.D.learn - dchar counting in a char[]

Derek Parnell <derek psych.ward> writes:
I've seen this sort of code ...

  int FooFind(char[] X, dchar D)
  {
    foreach(int i, dchar C; X)
    {
      if (C == D) return i;
    }
    return -1;
  }

Now I understand that the foreach correctly packages up the utf-8 codepoint
fragments to form a valid utf32 character, but when the value of 'i' is
returned, is it an index into the original utf-8 string or an index into
the equivalent utf32 string? I'm pretty sure it's a utf-8 index, and that is
a useful thing, as it tells you where in the original string the set of
code fragments that make up the character begins. However, it doesn't tell
you how many characters into the utf-8 string the searched-for
character was found.

I wrote this routine below, but I'm not sure if I needed to.

  int FooFind(dchar[] X, dchar D)
  {
    foreach(int i, dchar C; X)
    {
      if (C == D) return i;
    }
    return -1;
  }



-- 
Derek
Melbourne, Australia
22/03/2005 4:24:50 PM
Mar 21 2005
"Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:1w6s40so7p838.8yz58w5g6l4q.dlg 40tude.net...
 I've seen this sort of code ...

   int FooFind(char[] X, dchar D)
   {
     foreach(int i, dchar C; X)
     {
       if (C == D) return i;
     }
     return -1;
   }

The Phobos library routine std.string.find() does the same thing.

 Now I understand that the foreach correctly packages up the utf-8 codepoint
 fragments to form a valid utf32 character, but when the value of 'i' is
 returned, is it an index into the original utf-8 string or an index into
 the equivalent utf32 string?

The former.

 I'm pretty sure it's a utf-8 index, and that is
 a useful thing, as it tells you where in the original string the set of
 code fragments that make up the character begins. However, it doesn't tell
 you how many characters into the utf-8 string the searched-for
 character was found.

That's right. You can feed the result into std.utf.toUCSindex() to get the other index.
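
A minimal sketch of that suggestion, assuming a current D2 compiler and Phobos, where std.utf.toUCSindex takes the string and a code-unit offset. The helper name fooFindChar and the sample strings are made up for illustration; only foreach decoding and toUCSindex come from the discussion above.

```d
import std.utf : toUCSindex;

// Hypothetical helper: returns the character (code point) index of d in x,
// or -1 if d does not occur.
ptrdiff_t fooFindChar(const(char)[] x, dchar d)
{
    // i is the UTF-8 code-unit offset at which the decoded character c begins.
    foreach (size_t i, dchar c; x)
    {
        if (c == d)
            return cast(ptrdiff_t) toUCSindex(x, i); // offset -> character index
    }
    return -1;
}

unittest
{
    // 'é' occupies two code units, so 'l' starts at offset 3 but is character 2.
    assert(fooFindChar("héllo", 'l') == 2);
    assert(fooFindChar("abc", 'z') == -1);
}
```

The source file has to be saved as UTF-8 for the literal to hold the intended bytes.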

 I wrote this routine below, but I'm not sure if I needed to.

   int FooFind(dchar[] X, dchar D)
   {
     foreach(int i, dchar C; X)
     {
       if (C == D) return i;
     }
     return -1;
   }

I think this will do what you wish as well (return UCS index):

   int FooFind(dchar[] X, dchar D)
   {
     int i;
     foreach(dchar C; X)
     {
       if (C == D) return i;
       i++;
     }
     return -1;
   }
Mar 22 2005
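
For comparison, here is a sketch of the same counting idea applied directly to a UTF-8 string, assuming a D2 compiler. The name fooFindCounted is hypothetical, and taking a char[] parameter instead of dchar[] is an assumption about how the counter version would be used. Because foreach decodes one character per iteration, the manual counter ends up as the character index with no conversion step.

```d
// Hypothetical helper: counts decoded characters rather than code units.
ptrdiff_t fooFindCounted(const(char)[] x, dchar d)
{
    ptrdiff_t n = 0;          // number of characters seen so far
    foreach (dchar c; x)      // foreach decodes UTF-8 into one dchar per step
    {
        if (c == d)
            return n;
        n++;
    }
    return -1;
}

unittest
{
    // 'o' begins at code-unit offset 5 in UTF-8 but is character 4.
    assert(fooFindCounted("héllo", 'o') == 4);
    assert(fooFindCounted("héllo", 'z') == -1);
}
```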