www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Performance penalty for using ranges

reply "Paul Jurczak" <pauljurczak yahoo.com> writes:
I wrote a few versions of a palindrome testing function and 
noticed that versions using ranges (isPalindrome1..isPalindrome3) 
are 3 times slower than array indexing one (isPalindrome0). Is 
there a more efficient way of using ranges? Timing was done on 
Windows with DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline 
-release.

// 6.9ms
bool isPalindrome0(T)(T[] s) {
    auto i = 0;
    for (; i < s.length/2 && s[i] == s[$-i-1]; ++i) {}
    return i == s.length/2;
}

// 21ms
bool isPalindrome1(T)(T[] s) {
    while (s.length > 1 && s.front == s.back) {
       s.popBack();
       s.popFront();
    }

    return s.length <= 1;
}

// 21ms
bool isPalindrome2(T)(T[] s) {
    for (; s.length > 1 && s.front == s.back; s.popBack(), 
s.popFront()) {}
    return s.length <= 1;
}

// 21.4ms
bool isPalindrome3(T)(T s) {
    foreach (i; 0..s.length/2)
       if (s[i] != s[$-i-1])
          return false;

    return true;
}


int test() {
    int a[10];
    int n = 0;

    foreach (i; 0..1_000_000) {
       a[4] = !a[4];
       n += isPalindrome0(a);
    }

    return n;
}


void main()
{
    enum N = 16;
    auto t = benchmark!(test)(N);

    writeln(cast(float)t[0].msecs/N);
    writeln(test);
}
Aug 25 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Paul Jurczak:

 I wrote a few versions of a palindrome testing function and 
 noticed that versions using ranges 
 (isPalindrome1..isPalindrome3) are 3 times slower than array 
 indexing one (isPalindrome0). Is there a more efficient way of 
 using ranges?

I have modified and improved and fixed your code a little: import std.stdio, std.datetime, std.array, std.typetuple, std.range, std.algorithm; bool isPalindrome0(T)(in T[] s) pure nothrow { size_t i = 0; for (; i < s.length / 2 && s[i] == s[$ - i - 1]; ++i) {} return i == s.length / 2; } bool isPalindrome1(T)(T[] s) pure nothrow { while (s.length > 1 && s.front == s.back) { s.popBack; s.popFront; } return s.length <= 1; } bool isPalindrome2(T)(T[] s) pure nothrow { for (; s.length > 1 && s.front == s.back; s.popBack, s.popFront) {} return s.length <= 1; } bool isPalindrome3(T)(in T[] s) pure nothrow { foreach (immutable i; 0 .. s.length / 2) if (s[i] != s[$ - i - 1]) return false; return true; } /// A high-level version. bool isPalindrome4(T)(in T[] s) /*pure nothrow*/ { return s.retro.equal(s); } int test(alias F)(in int nLoops) /*pure nothrow*/ { int[10] a; typeof(return) n = 0; foreach (immutable _; 0 .. nLoops) { a[4] = !a[4]; n += F(a); } return n; } void main() { enum size_t nLoops = 60_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { sw.reset; sw.start; immutable n = test!alg(nLoops); sw.stop; writeln(n, " ", sw.peek.msecs); } } Using a compiler with a better optimizing back-end (especially with a better inlining), the timings show that the faster versions are the second and third, but all of them have a very similar performance (the last one reads the whole array twice, so it's a little slower, as expected): ...>ldmd2 -O -release -inline -noboundscheck -run test2.d 30000000 1150 30000000 1073 30000000 1038 30000000 1215 30000000 1621 Someday a function like isPalindrome4 will be pure & nothrow. Bye, bearophile
Aug 25 2013
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/25/2013 10:42 PM, Joseph Rushton Wakeling wrote:
      foreach (immutable i; iota(0, n, 2)) { ... }

 ... but again, that will be slow compared to the for loop or a foreach
 across an interval.

Why would that be the case?
Aug 25 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/25/2013 11:45 PM, bearophile wrote:
 Timon Gehr:

 ... but again, that will be slow compared to the for loop or a foreach
 across an interval.

Why would that be the case?

In the real world, where we live, ...

This is uncalled for.
Aug 25 2013
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/26/2013 12:46 AM, Joseph Rushton Wakeling wrote:
 On 26/08/13 00:36, Timon Gehr wrote:
 This is uncalled for.

I'm sure you didn't mean it, but your original remark, "Why would that be the case?", came across as flip and dismissive about a serious practical problem

It was a genuine question. I assumed that the inliner actually works. But you are right, DMD does not even seem to inline iota, and gdc appears to fail to vectorize some iota-loops even though it vectorizes the range-foreach equivalent.
Aug 25 2013
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/26/2013 01:18 AM, bearophile wrote:
 Timon Gehr:

 This is uncalled for.

I didn't mean to offend,

No offence taken.
 sorry for the unneeded (light) sarcasm.

Well, this kind of thing does not propagate too well over the Internet.
Aug 25 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/25/13 12:46 PM, Joseph Rushton Wakeling wrote:
 On 25/08/13 21:10, monarch_dodra wrote:
 It never clashes untill you nest two foreach, and then you have to use
 __ ...

 foreach( _ ; 0 .. M)
      foreach( __ ; 0 .. N)
          ...

 I have an enhancement request to simply allow anonymous iteration:
 foreach( ; 0 .. N)

 http://d.puremagic.com/issues/show_bug.cgi?id=9009

Good call. :-)

I don't see why the excitement around anonymous bindings. It's a rare case and it's not like we're running out of symbols, particularly given they're by definition written only once :o). Andrei
Aug 25 2013
prev sibling next sibling parent "Paul Jurczak" <pauljurczak yahoo.com> writes:
On Sunday, 25 August 2013 at 14:54:04 UTC, bearophile wrote:
 I have modified and improved and fixed your code a little:

Thank you for your analysis. I've run it on Linux with DMD64 2.063.2: dmd -O -noboundscheck -inline -release and relative timings are different than on Windows: 500000 7160 500000 18588 500000 18592 500000 14466 500000 28368 I'm also running it with LDC, but it reports timings too good to be true - something meaningful is getting optimized away and I'm trying to find out why. I have a few newbie questions below:
 int test(alias F)(in int nLoops) /*pure nothrow*/ {
     int[10] a;
     typeof(return) n = 0;

     foreach (immutable _; 0 .. nLoops) {
         a[4] = !a[4];
         n += F(a);
     }

     return n;
 }

What is the purpose of "immutable _;" above? Why not "i;"?
 void main() {
     enum size_t nLoops = 60_000_000;
     StopWatch sw;

     foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1,
                              isPalindrome2, isPalindrome3,
                              isPalindrome4)) {
         sw.reset;
         sw.start;
         immutable n = test!alg(nLoops);
         sw.stop;
         writeln(n, " ", sw.peek.msecs);
     }
 }

Same question here: why immutable? I wanted to get more stable run-to-run results, so I'm looking for minimums: void main() { enum N = 100; enum M = 1_000_000; StopWatch sw; foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1, isPalindrome2, isPalindrome3, isPalindrome4)) { int [N] results; long[N] timings; foreach (i; 0..N) { sw.reset; sw.start; results[i] = test!alg(M); sw.stop; timings[i] = sw.peek.usecs; } writeln(results.reduce!min, " ", timings.reduce!min); } }
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/08/13 20:07, Paul Jurczak wrote:
 What is the purpose of "immutable _;" above? Why not "i;"?

It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i). Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...
Aug 25 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 25 August 2013 at 18:13:39 UTC, Joseph Rushton 
Wakeling wrote:
 On 25/08/13 20:07, Paul Jurczak wrote:
 What is the purpose of "immutable _;" above? Why not "i;"?

It's never used, so notating it like this means you'll never clash with another variable name by accident (e.g. if somewhere else in the function you declare an int i). Had never thought of that before but it's a handy trick, I'll have to make use of it in my own code ...

It never clashes untill you nest two foreach, and then you have to use __ ... foreach( _ ; 0 .. M) foreach( __ ; 0 .. N) ... I have an enhancement request to simply allow anonymous iteration: foreach( ; 0 .. N) http://d.puremagic.com/issues/show_bug.cgi?id=9009
Aug 25 2013
prev sibling next sibling parent "Paul Jurczak" <pauljurczak yahoo.com> writes:
On Sunday, 25 August 2013 at 18:07:33 UTC, Paul Jurczak wrote:
[..]
 I'm also running it with LDC, but it reports timings too good 
 to be true - something meaningful is getting optimized away and 
 I'm trying to find out why.

It seems that LDC optimizer is smarter than I expected and it was able to optimize away comparison on constant elements of array a in my test: int test(alias F)(int N) { int a[10]; int n = 0; foreach (immutable _; 0..N) { a[4] = !a[4]; n += F(a); } return n; } I modified it, so there is no constant elements and all cases of pairwise comparison are exercised: int test(alias F)(int N) { enum L = 10; int a[L]; int n = 0; foreach (int i; 0..N) { auto j = (i%L + L/2)%L; a[j] = !a[j]; n += F(a); } return n; } Here are the results: Xubuntu 13.04 64-bit Core i5 3450S 2.8GHz (3.5GHz turbo): LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release 100000 5284 100000 4265 100000 4268 100000 5717 100000 4265 GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3 100000 4612 100000 5717 100000 5718 100000 4984 100000 11546 DMD64 2.063.2: dmd -O -noboundscheck -inline -release 100000 8350 100000 14615 100000 14597 100000 14270 100000 18509 Windows 7 64-bit Core i5 2500 3.4GHz turbo: DMD 2.063.2 (32-bit) dmd -O -noboundscheck -inline -release 100000 9803 100000 17050 100000 17031 100000 20851 100000 20283 I'm stunned by LDC ability to optimize isPalindrome4 to the best performance level of the whole group. Other compilers placed it at the bottom of performance rank.
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/08/13 21:10, monarch_dodra wrote:
 It never clashes untill you nest two foreach, and then you have to use __ ...

 foreach( _ ; 0 .. M)
      foreach( __ ; 0 .. N)
          ...

 I have an enhancement request to simply allow anonymous iteration:
 foreach( ; 0 .. N)

 http://d.puremagic.com/issues/show_bug.cgi?id=9009

Good call. :-)
Aug 25 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Paul Jurczak:

 int test(alias F)(in int nLoops) /*pure nothrow*/ {
    int[10] a;
    typeof(return) n = 0;

    foreach (immutable _; 0 .. nLoops) {
        a[4] = !a[4];
        n += F(a);
    }

    return n;
 }

What is the purpose of "immutable _;" above? Why not "i;"?

I have used _ as variable iteration name to remind the person that is reading the code that loop variable name is not used inside the loop itself. When you have more of those variables you can number them as _1, _2, or use no more than two underscores __. And there the usage of immutable is to disallow the mutation the iteration variable inside the loop. In my opinion, to keep code clean and less bug-prone D should on make foreach indexes immutable on default. This saves you from bugs when you iterate on struct items: struct Foo { int x; } auto foos = [Foo(10), Foo(20)]; foreach (f; foos) f.x++; It's not uncommon in those cases to forget a "ref" and think foos are changed. If iteration items are immutable as in C# this saves you from this class of bugs. I have asked this in past, but D lacks a "mutable" keyword for the situations when you want to actually mutate the foreach variable, and generally Walter didn't sound very interested on this. So as workaround for this language design problem, it's a good habit to always put a const/immutable on foreach variables, unless you don't wait it, and unless something forces you to not use a const.
 void main() {
    enum size_t nLoops = 60_000_000;
    StopWatch sw;

    foreach (alg; TypeTuple!(isPalindrome0, isPalindrome1,
                             isPalindrome2, isPalindrome3,
                             isPalindrome4)) {
        sw.reset;
        sw.start;
        immutable n = test!alg(nLoops);
        sw.stop;
        writeln(n, " ", sw.peek.msecs);
    }
 }

Same question here: why immutable?

It's explained here: http://blog.knatten.org/2011/11/11/disempower-every-variable/ In short all your variables/function arguments should be tagged as const/immutable unless the semantics of your algorithm needs them to mutate, or unless some limit in D/Phobos forbids you to (like when you have created a lazy range, currently you can't assign it to const and then use it). Bye, bearophile
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/08/13 22:22, bearophile wrote:
 In short all your variables/function arguments should be tagged as
 const/immutable unless the semantics of your algorithm needs them to mutate, or
 unless some limit in D/Phobos forbids you to (like when you have created a lazy
 range, currently you can't assign it to const and then use it).

It's slightly annoying that one can't readily get immutability to play nice with more general iterations than i .. j. For example if you consider the loop, for (i = 10; i > 0; --i) { ... } You can't make i immutable for obvious reasons but it's not clear to me how you can get a performant version of that with immutable i. You could do something like [*], foreach (immutable i; iota(1, 11).retro) { ... } ... but that will be slow and inefficient. Ditto for cases like for (i = 0; i < n; i += 2) { ... } which you could write as foreach (immutable i; iota(0, n, 2)) { ... } ... but again, that will be slow compared to the for loop or a foreach across an interval. [* Hijacking of discussion: a while back I think I floated the idea of generalizing iota() with closed/open boundary conditions similar to those found in std.random.uniform; so e.g. you could do iota!"[]"(0, 10) and the upper bound would be included in the values returned. Would be useful for cases like these.]
Aug 25 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 It's slightly annoying that one can't readily get immutability 
 to play nice with more general iterations than i .. j.

 For example if you consider the loop,

     for (i = 10; i > 0; --i) { ... }

Thankfully foreach_reverse was not deprecated: void main() { import std.stdio; for (auto i = 10; i > 0; --i) write(i, " "); writeln; foreach_reverse (immutable i; 1 .. 11) write(i, " "); writeln; }
 [* Hijacking of discussion: a while back I think I floated the 
 idea of generalizing iota() with closed/open boundary 
 conditions similar to those found in std.random.uniform; so 
 e.g. you could do iota!"[]"(0, 10) and the upper bound would be 
 included in the values returned.  Would be useful for cases 
 like these.]

Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile
Aug 25 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 25 August 2013 at 21:01:54 UTC, bearophile wrote:
 [* Hijacking of discussion: a while back I think I floated the 
 idea of generalizing iota() with closed/open boundary 
 conditions similar to those found in std.random.uniform; so 
 e.g. you could do iota!"[]"(0, 10) and the upper bound would 
 be included in the values returned.  Would be useful for cases 
 like these.]

Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile

If somebody were to implement this, I would *love* to review it.
Aug 25 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Aug 25, 2013 at 11:10:39PM +0200, monarch_dodra wrote:
 On Sunday, 25 August 2013 at 21:01:54 UTC, bearophile wrote:
[* Hijacking of discussion: a while back I think I floated the
idea of generalizing iota() with closed/open boundary conditions
similar to those found in std.random.uniform; so e.g. you could
do iota!"[]"(0, 10) and the upper bound would be included in the
values returned.  Would be useful for cases like these.]

Yes, it's a kind of necessary enhancement: http://d.puremagic.com/issues/show_bug.cgi?id=10466 Bye, bearophile

If somebody were to implement this, I would *love* to review it.

If somebody were to attempt to implement this, I'd recommend taking this issue into account as well: http://d.puremagic.com/issues/show_bug.cgi?id=10762 (which is a generalization of: http://d.puremagic.com/issues/show_bug.cgi?id=6447) I don't know if the total amount of changes would warrant rewriting iota entirely in order to keep the complexity of the implementation minimal. (The nice thing about D unittests is that rewriting code is much less scary because if the unittests are complete enough, they will catch any regressions right off the bat.) T -- WINDOWS = Will Install Needless Data On Whole System -- CompuMan
Aug 25 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I don't see why the excitement around anonymous bindings. It's 
 a rare case and it's not like we're running out of symbols, 
 particularly given they're by definition written only once :o).

It's not a rare case, I can test this searching in my code for "foreach (_;". And while not essential, anonymous loops are useful because adding an useless variable to your code increases its complexity a little without giving something back. It's just like adding an useless variable inside one of your functions. In C/C++ I look for the "unused variable" warnings (a warning that is currently missing in D). When you define a function, and you don't want to use its second argument, D allows you to not give it a name, despite you are not running out of symbols: void foo(int x, int) {} When I see code with a loop with a variable, to understand the loop I first of all search how such variable is used inside the loop. If the loop variable is anonymous this tells me something important about the loop, and reduces the noise. Bye, bearophile
Aug 25 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Timon Gehr:

 ... but again, that will be slow compared to the for loop or a 
 foreach
 across an interval.

Why would that be the case?

In the real world, where we live, there is a thing named "abstraction penalty", it's something that asks you to pay something when there is an interface between separated subsystems. To avoid paying that penalty at run-time in your programs you need a smarter compiler, that usually requires more time to be developed, and usually requires more time to do its work (this means slower compilations, see the LDC2 compiler compared to DMD. LDC2 was able to remove most of the run-time penalty, but when it compiles it's slower than DMD). So in the real life we have to take compromises all the time. Bye, bearophile
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/08/13 23:11, Timon Gehr wrote:
 Why would that be the case?

I don't know why it _need_ be the case, but it _is_ the case in practice right now. The precise example I cited of a for() loop with decreasing value, I tried replacing using retro and iota -- and it was significantly slower. Ditto foreach's over iota() -- even though in principle there is no conceptual difference between foreach(i; m .. n) and foreach(i; iota(m, n)), and it ought to be possible for the two to reduce to the same machine code, there is right now a performance hit.
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/08/13 23:17, Andrei Alexandrescu wrote:
 I don't see why the excitement around anonymous bindings. It's a rare case and
 it's not like we're running out of symbols, particularly given they're by
 definition written only once :o).

Every little helps. :-) (Don't know if you get the cultural reference on that, it probably helps to be British...:-)
Aug 25 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
 It's not a rare case, I can test this searching in my code for 
 "foreach (_;".

Beside my code, if you want I can show 16 URLs to use cases of "anonymous iteration" in D code. Bye, bearophile
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/08/13 23:01, bearophile wrote:
 Thankfully foreach_reverse was not deprecated:

Really? I thought it was viewed these days as some kind of awful abomination that should never have been in the language. (I'm not sure I understand why, but equally I'm not sure I care to open the can of worms of finding out why...)
Aug 25 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 26/08/13 00:36, Timon Gehr wrote:
 This is uncalled for.

I'm sure you didn't mean it, but your original remark, "Why would that be the case?", came across as flip and dismissive about a serious practical problem of working with D, that anyone caring about performance has run into. I don't mean to defend or attack anyone, but I'm not surprised you got a sarcastic response.
Aug 25 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Timon Gehr:

 This is uncalled for.

I didn't mean to offend, sorry for the unneeded (light) sarcasm. While generally you seem expert in computer science and many other things, in several of your post you seem to dismiss a little too much practical considerations. Bye, bearophile
Aug 25 2013
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 25 August 2013 at 21:17:43 UTC, Andrei Alexandrescu 
wrote:
 On 8/25/13 12:46 PM, Joseph Rushton Wakeling wrote:
 On 25/08/13 21:10, monarch_dodra wrote:
 It never clashes untill you nest two foreach, and then you 
 have to use
 __ ...

 foreach( _ ; 0 .. M)
     foreach( __ ; 0 .. N)
         ...

 I have an enhancement request to simply allow anonymous 
 iteration:
 foreach( ; 0 .. N)

 http://d.puremagic.com/issues/show_bug.cgi?id=9009

Good call. :-)

I don't see why the excitement around anonymous bindings. It's a rare case and it's not like we're running out of symbols, particularly given they're by definition written only once :o). Andrei

I think "the excitement" is an overstatement. The subject comes up every now and then. The "foreach(_)" semantic is getting a bit more common, but it tends to confuse newbies, that ask about it. At that point, I simply point out that there is this ER, and people tend to agree it would be a nice addition. I agree it's not like its a real problem or anything, but it would be nice to have. Is all.
Aug 26 2013