digitalmars.D.learn - Get Character At?

okibi <okibi ratedo.com> writes:

Is there a getCharAt() function for D?

Thanks!

Apr 24 2007

Derek Parnell <derek psych.ward> writes:

On Tue, 24 Apr 2007 10:30:16 -0400, okibi wrote:

 Is there a getCharAt() function for D?

Get a character from what? A string, a file, a console screen, ... ?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Apr 24 2007

okibi <okibi ratedo.com> writes:

Derek Parnell Wrote:

 On Tue, 24 Apr 2007 10:30:16 -0400, okibi wrote:
 
 Is there a getCharAt() function for D?

 
 Get a character from what? A string, a file, a console screen, ... ?
 

Such as this:

char[] text = "This is a test sentence.";

int loc = 5;

char num5 = text.getCharAt(loc);

Something along those lines.

Apr 24 2007

Tomas Lindquist Olsen <tomas famolsen.dk> writes:

okibi wrote:

 Such as this:
 
 char[] text = "This is a test sentence.";
 
 int loc = 5;
 
 char num5 = text.getCharAt(loc);
 
 Something along those lines.

Why not just do:

char[] text = "some text";
char num5 = text[5];

Apr 24 2007

okibi <okibi ratedo.com> writes:

Tomas Lindquist Olsen Wrote:

 Why not just do:
 
 char[] text = "some text";
 char num5 = text[5];
 
 

Because it isn't working for me. That was what I was trying to do seeing as
char[] is simply an array of characters. However, it's returning an int and not
a char.

Apr 24 2007

BCS <BCS pathlink.com> writes:

okibi wrote:
 
 
 Because it isn't working for me. That was what I was trying to do seeing as
 char[] is simply an array of characters. However, it's returning an int and not
 a char.

How about a little more code. What I've seen so far should work.

Apr 24 2007

Tomas Lindquist Olsen <tomas famolsen.dk> writes:

okibi wrote:
 
 Why not just do:
 
 char[] text = "some text";
 char num5 = text[5];
 
 

 
 Because it isn't working for me. That was what I was trying to do seeing
 as char[] is simply an array of characters. However, it's returning an int
 and not a char.

import std.stdio;

void main()
{
    char[] text = "this is a sentence";
    int loc = 5;
    writefln("%s", typeid(typeof(text[loc])));
}

this prints 'char' as expected...
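
For readers on a current D2 compiler, the same check might look like the sketch below. It assumes D2 semantics, where a string literal is immutable and no longer converts implicitly to char[], so the .dup call is an addition of this sketch and not part of the post above.

```d
import std.stdio : writeln;

void main()
{
    // Assumption: D2, where a literal is immutable(char)[]; .dup makes a mutable copy.
    char[] text = "this is a sentence".dup;
    size_t loc = 5;

    // Indexing a char[] yields a char; this is verified at compile time.
    static assert(is(typeof(text[loc]) == char));

    // Index 5 is the sixth code unit, the 'i' of "is".
    assert(text[loc] == 'i');

    // Prints the name of the element type.
    writeln(typeof(text[loc]).stringof);
}
```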

Apr 24 2007

okibi <okibi ratedo.com> writes:

Tomas Lindquist Olsen Wrote:

 
 import std.stdio;
 
 void main()
 {
     char[] text = "this is a sentence";
     int loc = 5;
     writefln("%s", typeid(typeof(text[loc])));
 }
 
 this prints 'char' as expected...

That fixed the problem, thanks!

Apr 24 2007

Clay Smith <clayasaurus gmail.com> writes:

Tomas Lindquist Olsen wrote:
 Why not just do:
 
 char[] text = "some text";
 char num5 = text[5];
 

text[5] will return the sixth element in the array.

Apr 24 2007

Tomas Lindquist Olsen <tomas famolsen.dk> writes:

Clay Smith wrote:
 
 text[5] will return the sixth element in the array.

He never said anything about getCharAt starting at one...

Apr 24 2007

Clay Smith <clayasaurus gmail.com> writes:

okibi wrote:
 Such as this:
 
 char[] text = "This is a test sentence.";
 
 int loc = 5;
 
 char num5 = text.getCharAt(loc);
 
 Something along those lines.
 

Just use

char num5 = text[loc-1];

?
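
The zero-based versus one-based question can be made concrete with a small sketch. The helper name codeUnitAt is hypothetical, the code assumes a D2 compiler, and it only handles single bytes (code units), not multi-byte characters.

```d
// Hypothetical helper: 1-based lookup of a single code unit (byte).
char codeUnitAt(const(char)[] text, size_t pos)
{
    assert(pos >= 1 && pos <= text.length, "position out of range");
    return text[pos - 1];
}

void main()
{
    string text = "This is a test sentence.";

    // Zero-based: index 5 is the sixth code unit.
    assert(text[5] == 'i');

    // One-based: position 5 is the fifth code unit, the space after "This".
    assert(codeUnitAt(text, 5) == ' ');
}
```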

Apr 24 2007

Derek Parnell <derek psych.ward> writes:

On Tue, 24 Apr 2007 11:56:19 -0400, okibi wrote:

 Such as this:
 
 char[] text = "This is a test sentence.";
 
 int loc = 5;
 
 char num5 = text.getCharAt(loc);
 
 Something along those lines.

Because char[] represents a UTF-8 encoded unicode string, to get the Nth
character (first character is at position 1), try this ...

   import std.stdio;
   import std.utf;

   T getCharAt(T)(T pText, uint pPos)
   {
       size_t lUTF_Index;
       uint   lStride;

       // Firstly, find out where the character starts in the string.
       lUTF_Index = std.utf.toUTFindex(pText, pPos-1);

       // Then find out its width (in bytes)
       lStride = std.utf.stride(pText, lUTF_Index);

       // Return the character encoded in UTF format.
       return pText[lUTF_Index .. lUTF_Index + lStride];
  }

  void main()
  {
    char[] text = "a\ua034bcdef";
    uint loc = 4;
    writefln("%s", getCharAt(text, loc)); // shows "c"
    writefln("%s", text[loc-1]); // correctly fails
  }


If you just use 'text[loc]', you may not get the correct character, and you
actually only get a UTF code point fragment anyway.

Remember that char[] is not an array of characters. It is an array of UTF-8
code point fragments (each 1-byte wide) and a UTF-8 encoded character (code
point) can have from 1 to 4 fragments.
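
A rough D2 rendering of the same idea follows, as a sketch only. It assumes a current Phobos, where std.utf.toUTFindex and std.utf.stride are still available; the name charAt and the use of string instead of a template parameter are choices of this sketch.

```d
import std.utf : stride, toUTFindex;

// Sketch: 1-based lookup that returns the character's whole UTF-8 sequence as a slice.
string charAt(string text, size_t pos)
{
    immutable start = text.toUTFindex(pos - 1); // byte offset where the character starts
    immutable width = text.stride(start);       // its length in bytes
    return text[start .. start + width];
}

void main()
{
    string text = "a\ua034bcdef";

    assert(text.length == 9);            // 9 bytes, although there are 7 characters
    assert(charAt(text, 4) == "c");      // fourth character
    assert(charAt(text, 2) == "\ua034"); // the 3-byte character comes back whole
    assert(text[3] == 0xB4);             // plain indexing lands on its last byte
}
```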
 
-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Apr 24 2007

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Derek Parnell wrote:
 Remember that char[] is not an array of characters. It is an array of UTF-8
 code point fragments (each 1-byte wide) and a UTF-8 encoded character (code
 point) can have from 1 to 4 fragments.
  

Which is why I tend to try and bite the bullet and just use dchar[] for general
purpose things.  I only use char[] in cases where I know it's "safe" to do so
(that is, cases where I know what the input will be, and know it will be within
the single-byte character range).  That said, it's a darn good thing Phobos has
std.utf and Tango has tango.utils.Utf, otherwise we'd often be in a pickle.
(Avoiding potential tango.io joke.)
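
As a sketch of the dchar approach on a current D2 compiler: converting once to UTF-32 gives one array element per code point, so plain indexing finds characters directly. The conversion through std.conv.to is one possible way to do it, assumed here for brevity.

```d
import std.conv : to;

void main()
{
    string utf8 = "a\ua034bcdef";
    dstring utf32 = utf8.to!dstring; // one element per code point

    assert(utf8.length == 9);        // bytes
    assert(utf32.length == 7);       // code points
    assert(utf32[3] == 'c');         // zero-based index 3 is the fourth character
}
```

The trade-off is memory: every character takes four bytes, including plain ASCII.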

-- Chris Nicholson-Sauls

Apr 24 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Derek Parnell wrote:
 Because char[] represents a UTF-8 encoded unicode string, to get the Nth
 character (first character is at position 1), try this ...
 
    import std.stdio;
    import std.utf;
 
    T getCharAt(T)(T pText, uint pPos)
    {
        size_t lUTF_Index;
        uint   lStride;
 
        // Firstly, find out where the character starts in the string.
        lUTF_Index = std.utf.toUTFindex(pText, pPos-1);
 
        // Then find out its width (in bytes)
        lStride = std.utf.stride(pText, lUTF_Index);
 
        // Return the character encoded in UTF format.
        return pText[lUTF_Index .. lUTF_Index + lStride];
   }
 
   void main()
   {
     char[] text = "a\ua034bcdef";
     uint loc = 4;
     writefln("%s", getCharAt(text, loc)); // shows "c"
     writefln("%s", text[loc-1]); // correctly fails
   }
 
 
 If you just use 'text[loc]', you may not get the correct character, and you
 actually only get a UTF code point fragment anyway.
 
 Remember that char[] is not an array of characters. It is an array of UTF-8
 code point fragments (each 1-byte wide) and a UTF-8 encoded character (code
 point) can have from 1 to 4 fragments.

I was going to post a link to my old Text In D article[1], but I guess
that'd be redundant now :P

Incidentally, I don't suppose you know anything about the relative
performance of your method up there ^^ and the one in my article down
here vv:

 dchar nthCharacter(char[] string, int n)
 {
     int curChar = 0;
     foreach( dchar cp ; string )
         if( curChar++ == n )
             return cp;
     return dchar.init;
 }

I'm curious since I don't want to recommend a slow solution if I can
help it :)
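
On a current D2 compiler the same zero-based lookup can also be written with range primitives. This is a sketch under the assumption that Phobos still treats a string as a range of dchar (auto-decoding); std.range.drop then skips whole code points.

```d
import std.range;

// Sketch: zero-based n-th code point, or dchar.init when n is past the end.
dchar nthCharacter(string text, size_t n)
{
    auto rest = text.drop(n); // skips up to n code points, not n bytes
    return rest.empty ? dchar.init : rest.front;
}

void main()
{
    string text = "a\ua034bcdef";
    assert(nthCharacter(text, 1) == '\ua034');
    assert(nthCharacter(text, 3) == 'c');
    assert(nthCharacter(text, 7) == dchar.init); // only 7 characters, so index 7 is out of range
}
```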

	-- Daniel

[1] http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Apr 24 2007

Derek Parnell <derek psych.ward> writes:

On Wed, 25 Apr 2007 13:41:25 +1000, Daniel Keep wrote:

 Incidentally, I don't suppose you know anything about the relative
 performance of your method up there ^^ and the one in my article down
 here vv:

It seems that your routine is about 3 times slower than the one I had
shown. Here is my test program ... I modified your routine slightly because
the idiom "if (x++ == n)" is a dangerous one as it is unclear if 'x' gets
incremented before or after the comparison. I changed it to be more clear.
I also changed my routine to output a dchar rather than a char[] and to
test for invalid position input.

//-----------------------------
import std.perf;
import std.stdio;
import std.utf;


 dchar getCharAt(T)(T pText, int pPos)
 {
       size_t lUTF_Index;
       uint   lStride;

       if (pPos < 0 || pPos >= pText.length)
        return dchar.init;
       // Firstly, find out where the character starts in the string.
       lUTF_Index = std.utf.toUTFindex(pText, pPos);

       // Then find out its width (in bytes)
       lStride = std.utf.stride(pText, lUTF_Index);

       // Return the character encoded in UTF format.
       return std.utf.toUTF32(
                pText[lUTF_Index .. lUTF_Index + lStride])[0];
}

dchar nthCharacter(T)(T string, int n)
{
    int curChar = 0;
    foreach( dchar cp ; string )
    {
        if( curChar == n )
            return cp;
        curChar++;
    }
    return dchar.init;
}

void main()
{
    char[] text = "a\ua034bcdefa\ua034bcdefa\ua034bcdefa\ua034bcdefg1"
                  "a\ua034bcdefa\ua034bcdefa\ua034bcdefa\ua034bcdefg2"
                  "a\ua034bcdefa\ua034bcdefa\ua034bcdefa\ua034bcdefg3"
                  "a\ua034bcdefa\ua034bcdefa\ua034bcdefa\ua034bcdefg4"
                  "a\ua034bcdefa\ua034bcdefa\ua034bcdefa\ua034bcdefg5"
                  "a\ua034bcdefa\ua034bcdefa\ua034bcdefa\ua034bcdefg6"
                  ;
    // Test must locate the last character.
    int loc = std.utf.toUTF32(text).length-1;

    assert(getCharAt(text, loc) == '6');
    assert(nthCharacter(text, loc) == '6');

    PerformanceCounter    counter = new PerformanceCounter();

    counter.start();
    volatile for(int i = 0; i < 10_000_000; ++i)
    {  getCharAt(text, loc); }
    counter.stop();

    writefln("Derek Parnell: %10d", counter.microseconds());

    counter.start();
    volatile for(int i = 0; i < 10_000_000; ++i)
    {  nthCharacter(text, loc); }
    counter.stop();

    writefln("  Daniel Keep: %10d", counter.microseconds());
}
//-----------------------------

On my machine (Intel Core 2 6600   2.40GHz, 2GB RAM) I got this result ...

c:\temp>test
Derek Parnell:    7939664
  Daniel Keep:   26683373
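
The test program above relies on D1 features (std.perf, the volatile statement). A comparable measurement on a current D2 compiler might be sketched as follows, assuming std.datetime.stopwatch.benchmark is available. The iteration count, the test string and the local function names are arbitrary choices of this sketch, no timings are claimed for it, and an optimizing compiler may still remove part of the work.

```d
import std.array : replicate;
import std.datetime.stopwatch : benchmark;
import std.range : walkLength;
import std.stdio : writefln;
import std.utf : stride, toUTF32, toUTFindex;

// Stride-skipping lookup (0-based). The caller must pass a valid character position.
dchar getCharAt(string text, size_t pos)
{
    immutable start = text.toUTFindex(pos);
    immutable width = text.stride(start);
    return toUTF32(text[start .. start + width])[0];
}

// Decoding lookup (0-based) in the style of the foreach version.
dchar nthCharacter(string text, size_t n)
{
    size_t cur = 0;
    foreach (dchar cp; text)
    {
        if (cur == n)
            return cp;
        ++cur;
    }
    return dchar.init;
}

void main()
{
    string text = "a\ua034bcdef".replicate(24);
    immutable last = text.walkLength - 1; // position of the final character

    assert(getCharAt(text, last) == 'f');
    assert(nthCharacter(text, last) == 'f');

    dchar sink; // stores each result so the calls are not trivially dead
    void viaStride()  { sink = getCharAt(text, last); }
    void viaForeach() { sink = nthCharacter(text, last); }

    auto results = benchmark!(viaStride, viaForeach)(100_000);
    writefln(" stride: %10d usecs", results[0].total!"usecs");
    writefln("foreach: %10d usecs", results[1].total!"usecs");
}
```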

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Apr 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Derek Parnell wrote:
 On Wed, 25 Apr 2007 13:41:25 +1000, Daniel Keep wrote:
 
 Incidentally, I don't suppose you know anything about the relative
 performance of your method up there ^^ and the one in my article down
 here vv:

 
 It seems that your routine is about 3 times slower than the one I had
 shown. Here is my test program ... I modified your routine slightly because
 the idiom "if (x++ == n)" is a dangerous one as it is unclear if 'x' gets
 incremented before or after the comparison. I changed it to be more clear.

How is it unclear? Postfix-increment clearly means that the value before 
incrementation is returned (and thus compared to n in that expression).

 I also changed my routine to output a dchar rather than a char[] and to
 test for invalid position input.
 
 //-----------------------------
 import std.perf;
 import std.stdio;
 import std.utf;
 
 
  dchar getCharAt(T)(T pText, int pPos)
  {
        size_t lUTF_Index;
        uint   lStride;
 
        if (pPos < 0 || pPos >= pText.length)
         return dchar.init;
        // Firstly, find out where the character starts in the string.
        lUTF_Index = std.utf.toUTFindex(pText, pPos);
 


        // Then find out its width (in bytes)
        lStride = std.utf.stride(pText, lUTF_Index);
 
        // Return the character encoded in UTF format.
        return std.utf.toUTF32(
                 pText[lUTF_Index .. lUTF_Index + lStride])[0];

I think you can change these last two statements to just:
---
	return pText.decode(lUTF_Index);
---
(that's std.utf.decode, just to be clear)
That changes the index variable passed, but that doesn't matter here.

 }

[snip]
 //-----------------------------
 
 On my machine (Intel Core 2 6600   2.40GHz, 2GB RAM) I got this result ...
 
 c:\temp>test
 Derek Parnell:    7939664
   Daniel Keep:   26683373

With mine added: (and obviously on _my_ machine)
---
urxae urxae:~/tmp$ dmd -O -release -inline -run test.d
    Derek Parnell:   17693368
      Daniel Keep:   54037341
Frits van Bommel:   12045495
urxae urxae:~/tmp$ gdc -O3 -finline -frelease -o test test.d && ./test
    Derek Parnell:   19567337
      Daniel Keep:   26750383
Frits van Bommel:   14332419
---
(My machine & compilers: AMD Sempron 3200+, 1GB RAM, 64-bit Ubuntu 6.10, 
running DMD 1.013 and GDC 0.23/x86_64)

So my version is even faster (about 30%), at least on my machine. And 
IMHO it's also more readable. No need to know what "stride" is, for example.
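
The decode-based variant suggested above can be sketched for a current D2 compiler as follows. It assumes std.utf.decode keeps its (string, ref index) form, and it omits the range check from the earlier version, so the caller must pass a valid zero-based position.

```d
import std.utf : decode, toUTFindex;

// Sketch: zero-based lookup using decode instead of stride plus a slice.
dchar getCharAt(string text, size_t pos)
{
    size_t index = text.toUTFindex(pos); // byte offset of the character
    return decode(text, index);          // decodes it and advances index past it
}

void main()
{
    string text = "a\ua034bcdef";

    // decode moves the index forward by the width of what it decoded.
    size_t index = 1;
    assert(decode(text, index) == '\ua034');
    assert(index == 4);

    assert(getCharAt(text, 3) == 'c');
}
```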

Apr 25 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Frits van Bommel wrote:
 So my version is even faster (about 30%), at least on my machine. And
 IMHO it's also more readable. No need to know what "stride" is, for
 example.

Yoikes!  I'm rather amazed that the "simple" foreach method is that much
slower.  I'll add the faster version to the article as soon as I get the
chance.

Thanks, guys.

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Apr 25 2007

Derek Parnell <derek psych.ward> writes:

On Wed, 25 Apr 2007 15:52:45 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Wed, 25 Apr 2007 13:41:25 +1000, Daniel Keep wrote:
 
 Incidentally, I don't suppose you know anything about the relative
 performance of your method up there ^^ and the one in my article down
 here vv:

 
 It seems that your routine is about 3 times slower than the one I had
 shown. Here is my test program ... I modified your routine slightly because
 the idiom "if (x++ == n)" is a dangerous one as it is unclear if 'x' gets
 incremented before or after the comparison. I changed it to be more clear.

 
 How is it unclear? Postfix-increment clearly means that the value before 
 incrementation is returned (and thus compared to n in that expression).

Yes, I know what it is supposed to do, but written as it is, a reader can
either mistakenly think that the variable gets incremented before the
comparison or need that extra bit of thinking to 'see' the process
flow. For that reason, I prefer to either have ++ written as its own
statement or write it so the casual reader can explicitly see the process
flow.

For example, in the original code by Daniel, I was unsure as to whether he
was using a 0-based index or a 1-based index, as I had done in my example.
The code he supplied assumed a 0-based index if the ++ worked as you describe
but it assumed a 1-based index if it worked the other way. As my example was
1-based, and I assumed that Daniel knew how to use ++ correctly, I figured
he had thus changed my definition of the Position parameter. But the point
is, because it was not absolutely clear what Daniel's *intention* was, I
decided to code it so the intention was more clear.
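
For reference, the two readings of the increment can be checked directly. This is a minimal sketch with made-up variable names, not code from either post.

```d
void main()
{
    // Postfix: the comparison sees the old value, then x is incremented.
    int x = 0;
    bool postfixHit = (x++ == 0);
    assert(postfixHit);
    assert(x == 1);

    // Prefix: y is incremented first, so the comparison sees the new value.
    int y = 0;
    bool prefixHit = (++y == 0);
    assert(!prefixHit);
    assert(y == 1);
}
```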


 I think you can change these last two statements to just:

 ... 
 So my version is even faster (about 30%), at least on my machine. And 
 IMHO it's also more readable. No need to know what "stride" is, for example.

Well, if we were really into a pissing contest, we'd both remove the calls
to library routines and code it inline, in assembler etc ... but that was
not the point. Daniel's code is another example of 'foreach' not producing
the best machine code to solve the problem at hand.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Apr 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Derek Parnell wrote:
 On Wed, 25 Apr 2007 15:52:45 +0200, Frits van Bommel wrote:
 
 I think you can change these last two statements to just:

  ... 
 So my version is even faster (about 30%), at least on my machine. And 
 IMHO it's also more readable. No need to know what "stride" is, for example.

 
 Well, if we were really into a pissing contest, we'd both remove the calls
 to library routines and code it inline, in assembler etc ... but that was

I was just mentioning that you seemed to be over-complicating the code, 
and as a side-benefit the simpler code was faster as well.

 not the point. Daniel's code is another example of 'foreach' not producing
 the best machine code to solve the problem at hand.

Well to be fair, I don't think that's purely the fault of 'foreach' 
implementation problems in this case.
'foreach' is doing genuinely more work in this case. Specifically, the 
foreach loop is decoding all characters up to the one it returns while 
the getCharAt() variants only actually decode the character asked for, 
using no more than the stride of the preceding ones.

What the foreach version does is therefore more like the following:
-----
dchar nthCharacter2(T)(T string, int n)
{
     int curChar = 0;
     for(size_t index = 0 ; index < string.length ; string.decode(index))
     {
         if( curChar == n )
             return string.decode(index);	// return _next_ char
         curChar++;
     }
     return dchar.init;
}
-----
Which is also on the slow side. (Though on DMD this version is still 
faster than the 'foreach' version :( )
The results with this added as well:
=====
urxae urxae:~/tmp$ dmd -O -release -inline -run test.d
    Derek Parnell:   14416041
Frits van Bommel:    9803830
      Daniel Keep:   37386228
       for-decode:   33767606
urxae urxae:~/tmp$ gdc -O3 -finline -frelease -o test test.d && ./test
    Derek Parnell:   17267995
Frits van Bommel:   11836242
      Daniel Keep:   21390295
       for-decode:   25339226
=====
("for-decode" is the code above)
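
The difference in work described above can be sketched as two ways of finding the same byte offset. Both helper names are hypothetical, and the code assumes a current D2 Phobos, where stride and decode live in std.utf.

```d
import std.utf : decode, stride, toUTFindex;

// Hypothetical helper: offset of the n-th code point, found by stride alone.
size_t skipByStride(string text, size_t n)
{
    size_t index = 0;
    while (n--)
        index += stride(text, index); // only needs the width of each character
    return index;
}

// Hypothetical helper: same offset, found by decoding every preceding code point.
size_t skipByDecode(string text, size_t n)
{
    size_t index = 0;
    while (n--)
        decode(text, index); // builds each dchar, then advances index
    return index;
}

void main()
{
    string text = "a\ua034bcdef";

    // 'c' is code point 3 and starts at byte 5 (1 + 3 + 1 bytes precede it).
    assert(skipByStride(text, 3) == 5);
    assert(skipByDecode(text, 3) == 5);
    assert(text.toUTFindex(3) == 5);
}
```

Both loops arrive at the same offset; the second one also computes every intermediate character, which is the extra work attributed to the foreach version.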

Apr 25 2007
