
digitalmars.D - "foreach(i, dchar c; s)" vs "decode"

reply "monarch_dodra" <monarchdodra gmail.com> writes:
I spent *all* week benchmarking a string processing function. And 
now, at the end of the week, I can safely say that the compiler's 
"foreach" is slower than a Phobos decode-based while loop.

Basically, given a
----
foreach(i, dchar c; s)
{codeCodeCode;}
----
  loop, I replaced it with:
----
{
     size_t i;
     size_t j;
     immutable k = s.length;
     dchar c;
     for ( ; i < k ; i = j )
     {
         c = decode(s, j); // reads the code point at j, advances j past it
         codeCodeCode;
     }
}
----
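In full, with the import it needs (the helper name is just for illustration):
----
import std.utf : decode;

// Iterate a string by code point, tracking the index where each
// code point starts, without the compiler's built-in foreach decoding.
void eachDchar(alias fun)(string s)
{
    size_t i, j;
    immutable k = s.length;
    for ( ; i < k; i = j)
    {
        immutable dchar c = decode(s, j); // reads at j, advances j
        fun(i, c);
    }
}
----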

And my algorithms instantly gained a 10-25% performance 
improvement(!). I benched using varied sources of data, in 
particular, both ASCII only strings, as well as unicode heavy 
text.

Unicode-heavy text shows bigger gains, but raw ASCII text *also* 
sees gains :/
This holds true for both UTF-8 and UTF-16.

UTF-32 is different, because foreach has the "unfair" advantage 
of not validating the code points...

I got these results on the 2.061 alpha release, with Phobos built 
in release mode, both with and without -inline.

So if any of the compiler guys are reading this... I have no idea 
how the unicode foreach is actually implemented, but there 
*should* be substantial gains to be had...
Nov 25 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, November 25, 2012 22:37:24 monarch_dodra wrote:
 I got these results on 2.061 alpha release, with phobos in
 release and both -inline and without inline.

You should also be testing with -O if you're benchmarking, but I still would 
have thought that the compiler would be faster. Apparently not.

I believe that definite work has been put into improving decode, stride, 
popFront, etc. in Phobos over the past year or two, so they've definitely been 
improving. I suspect that whatever the compiler is doing hasn't been touched in 
ages, and I have no idea what improvements could or couldn't be done. It _is_ 
the sort of thing that I'd kind of expect to be sitting somewhere in druntime 
though. If it is, maybe foreach and Phobos' implementations can be made to 
share in some way. I don't know (though IMHO speed should be more important 
here than reducing code duplication).

The speed of foreach's decoding definitely matters, but in the code that I've 
really been trying to make fast, I don't generally use it, because it's often 
the case that some portion of what I'm doing can be made faster by skipping 
decoding for some portion of the characters (like explicitly handling the code 
units for paraSep and lineSep in code that cares about the end of lines).

Making string processing fast should definitely be one of our performance 
priorities though IMHO, given how big an impact that can have on many programs 
and how unfriendly ranges generally are to efficient string processing.

- Jonathan M Davis
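As a rough sketch of that kind of fast path (the line-counting helper here is 
just an illustration I made up, not actual Phobos code):
----
import std.utf : decode;

// Count line endings, decoding only when a multi-byte sequence starts.
size_t countLines(string s)
{
    size_t lines;
    for (size_t i = 0; i < s.length; )
    {
        immutable ubyte b = s[i];
        if (b < 0x80)                 // ASCII code unit: no decode needed
        {
            if (b == '\n') ++lines;
            ++i;
        }
        else                          // multi-byte: check lineSep/paraSep
        {
            immutable dchar c = decode(s, i); // advances i
            if (c == '\u2028' || c == '\u2029') ++lines;
        }
    }
    return lines;
}
----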
Nov 25 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 25 November 2012 at 21:51:42 UTC, Jonathan M Davis 
wrote:
 On Sunday, November 25, 2012 22:37:24 monarch_dodra wrote:
 I got these results on 2.061 alpha release, with phobos in
 release and both -inline and without inline.

 You should also be testing with -O if you're benchmarking, but I 
 still would have thought that the compiler would be faster. 
 [...]

Well, "-release -O" went without saying, but you are right to mention it, you 
never know.

Looking at 2.060 to 2.061, std.utf has changed a lot. I'll bench my algo using 
the old implementation from 2.060 to see if the change in performance could be 
related to that.

As you said, I found a "rt.util.utf" module in druntime; I was looking in the 
dmd tree. However, it is pretty much an old version of std.utf, verbatim...

Also, druntime has a *radically* different approach to striding UTF-8. I'll 
try to see which approach is faster.

I'd have suggested we try some sort of code sharing, but now that "std.utf" 
supports ranges, the code has "forked" and I'm not sure it's shareable 
anymore... Not without duplicating code inside std.utf, or adding range 
support (or at least code) for decoding ranges in druntime.

Well, I'll see what I can uncover, and update dmd's utf in the meantime...
Nov 25 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, November 26, 2012 08:43:29 monarch_dodra wrote:
 Looking at 2.060 to 2.061, std.utf has changed a lot. I'll bench
 my algo using the old implementation of 2.060 to see if the
 change of performance could be related to that.

Some improvements were made to std.array.popFront for 2.060. I'm not sure that 
much in the way of changes to std.utf improved performance. I did most (all?) 
of them, but I don't remember all of the details now, so I'm not sure how many 
of them related to performance.

Mostly what they did was make it so that stride, strideBack, and decode work 
with ranges of char and wchar, so that stuff like the D lexer can operate on 
ranges of code units rather than ranges of code points.
 As you said, I found a "rt.util.utf" module in druntime;
  I was looking in the dmd tree. However, it is pretty much an old
 version of std.utf, verbatim...
 
 Also, druntime has a *radically* different approach to striding
 UTF-8. I'll try to see which approach is faster.
 
 I'd have suggested we try some sort of code sharing, but now that
 "std.utf" supports range, the code has "forked" and I'm not sure
 is shareable anymore... Not without duplicating code inside
 std.utf, or adding range support (or at least code) for decoding
 ranges in druntime.
 
 Well, I'll see what I can uncover, and update dmd utf in the
 meantime...

Code sharing would be nice, but performance is far more critical in this case 
IMHO. The main question is how to speed up what druntime is doing. If that can 
be done while sharing code, then great. If not, then oh well.

- Jonathan M Davis
Nov 26 2012
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 11/26/2012 1:37 AM, monarch_dodra wrote:
 I spent *all* week benchmarking a string processing function. And now,
 at the end of the week, I can safely say that the compiler's "foreach"
 is slower than a phobos decode based while loop.

It was inevitable that one day the UTF decoding implementation in Phobos would 
outmatch the one buried in the compiler/runtime. The latter wasn't scrutinized 
nearly as much as decode in std.utf.
 Basically, given a
 ----
 foreach(i, dchar c; s)
 {codeCodeCode;}
 ----
   loop, I replaced it with:
 ----
 {
      size_t i;
      size_t j;
      immutable k = s.length;
      dchar c;
      for ( ; i < k ; i = j )
      {
          c = decode(s, j);
          codeCodeCode;
      }
 }
 ----

 And my algorithms instantly gained a 10-25% performance improvement(!).
 I benched using varied sources of data, in particular, both ASCII only
 strings, as well as unicode heavy text.

Nothing better than a dump of the Arabic wiki? ;)
 Unicode has better gains, but raw ASCII text *also* has gains :/
 this holds true for both UTF-8 and UTF-16.

 UTF-32 is different, because foreach has the "unfair" advantage of not
 validating the code points...

 I got these results on 2.061 alpha release, with phobos in release and
 both -inline and without inline.

Don't forget -O and -noboundscheck, as some things are @safe and thus always 
have bounds checks.
 So if any of the compiler guys are reading this... I have no idea how
 the unicode foreach is actually implemented, but there *should* be
 substantial gains to be had...

And how could the compiler-generated loop be better? Fundamentally it has the 
same amount of knowledge as the "user-space" code has.

-- 
Dmitry Olshansky
Nov 27 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 11/27/2012 3:47 PM, monarch_dodra wrote:
 On Tuesday, 27 November 2012 at 09:15:02 UTC, Dmitry Olshansky wrote:
 I benched using varied sources of data, in particular, both ASCII only
 strings, as well as unicode heavy text.

Nothing better than a dump of the Arabic wiki? ;)

I benched with a dump of the Japanese wiki, actually ^^

 Unicode has better gains, but raw ASCII text *also* has gains :/
 this holds true for both UTF-8 and UTF-16.

 UTF-32 is different, because foreach has the "unfair" advantage of not
 validating the code points...

 I got these results on 2.061 alpha release, with phobos in release and
 both -inline and without inline.

Don't forget -O and -noboundscheck, as some things are @safe and thus always 
have bounds checks.

I thought "-noboundscheck" only had an effect on code marked @system...?

That would be the -release switch. It has the effect of removing asserts and 
bounds checks from @system code. -noboundscheck will kill all of them 
everywhere, I believe.

The TDPL table dedicated to the -release switch goes as follows:

              @safe   @system
Non-release:    +        +
Release:        +        -

Except that the '-' in the book is marked as a skull-and-crossbones :)

-- 
Dmitry Olshansky
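So for a fully optimized benchmark build, the flags discussed here would be 
combined along the lines of (assuming dmd and a file named bench.d):
----
dmd -O -release -inline -noboundscheck bench.d
----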
Nov 27 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 27 November 2012 at 09:15:02 UTC, Dmitry Olshansky 
wrote:
 I benched using varied sources of data, in particular, both 
 ASCII only
 strings, as well as unicode heavy text.

Nothing better than a dump of the Arabic wiki? ;)

I benched with a dump of the Japanese wiki, actually ^^
 Unicode has better gains, but raw ASCII text *also* has 
 gains :/
 this holds true for both UTF-8 and UTF-16.

 UTF-32 is different, because foreach has the "unfair" 
 advantage of not
 validating the code points...

 I got these results on 2.061 alpha release, with phobos in 
 release and
 both -inline and without inline.

Don't forget -O and -noboundscheck, as some things are @safe and thus always 
have bounds checks.

I thought "-noboundscheck" only had an effect on code marked @system...?
Nov 27 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 27, 2012 12:47:19 monarch_dodra wrote:
 On Tuesday, 27 November 2012 at 09:15:02 UTC, Dmitry Olshansky
 Don't forget -O and -noboundscheck, as some things are @safe and
 thus always have bounds checks.

I thought "-noboundscheck" only had an effect on code marked @system...?

Nope. It's for @safe code. -release removes bounds checking from @system code 
but not from @safe code. -noboundscheck removes it completely.

- Jonathan M Davis
Nov 27 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 27 November 2012 at 12:06:13 UTC, Dmitry Olshansky 
wrote:

 Except that '-' in a the book is marked as a 
 skull-and-crossbones :)

You mean U+2620? ☠ :D In all seriousness though, thanks for the tip.
Nov 27 2012
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 26 November 2012 at 07:43:30 UTC, monarch_dodra wrote:
 Also, druntime has a *radically* different approach to striding 
 UTF-8. I'll try to see which approach is faster.

Well, my (quick) benches showed that unless the input is something like 90% 
multibyte Unicode, std.utf's stride implementation beats the crap out of 
rt.util.utf's. Depending on compile options (-inline/-noboundscheck):

Given an ASCII string, Phobos' implementation is anywhere from 100% to 500% 
faster. Given a Unicode-only string, druntime's is actually 30% faster.

If anybody else wants to have a go at it... I'll go ahead and update 
rt.util.utf's implementation to match Phobos' (minus jmdavis' support for 
ranges).
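For the curious, here's one branch-light way to compute the stride from the 
first code unit. This is just a sketch of the general technique, not 
necessarily what either implementation actually does:
----
import core.bitop : bsr;

// Length in code units of the UTF-8 sequence whose lead byte is c.
// ASCII is 1; otherwise the count of leading 1 bits in c gives the
// length. Assumes c is a valid lead byte (not 0x80-0xBF or 0xFF).
size_t utf8Stride(ubyte c)
{
    return c < 0x80 ? 1 : 7 - bsr(~uint(c) & 0xFF);
}

// e.g. utf8Stride(0xE3) == 3 (lead byte of a 3-byte sequence)
----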
Nov 27 2012