digitalmars.D.learn - Re: Re: class templates and static if

Tyler Jameson Little <beatgammit gmail.com> Feb 27 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> Feb 27 2012

Dmitry Olshansky <dmitry.olsh gmail.com> Feb 27 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> Feb 27 2012

Dmitry Olshansky <dmitry.olsh gmail.com> Feb 28 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> Feb 29 2012

"Jonathan M Davis" <jmdavisProg gmx.com> Feb 27 2012
"Jonathan M Davis" <jmdavisProg gmx.com> Feb 27 2012

Tyler Jameson Little <beatgammit gmail.com> writes:

--90e6ba211f1b4454c504b9f49e72
Content-Type: text/plain; charset=ISO-8859-1

 Parser!(Type.request) h = new Parser!(Type.request)("Hello world")


Wow, I was so close!  I had tried this:

Parser!Type.request h = new Parser!Type.request("Hello world");

but that didn't work. I didn't think about enclosing it in parens!

I didn't want to do subclassing, because my parser is a state-machine style
parser, so it's in a big switch. Pretty gross, but I would like it to be as
fast as possible. That's why I thought this model would be so cool, because
I could remove conditions from the generated code, and get rid of a lot of
the conditionals.

Thanks so much!

--90e6ba211f1b4454c504b9f49e72
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

&gt; Parser!(Type.request) h =3D new Parser!(Type.request)(&quot;Hello worl=
d&quot;)<div><br></div><div>Wow, I was so close! =A0I had tried this:</div>=
<div><br></div><div>Parser!Type.request h =3D new Parser!Type.request(&quot=
;Hello world&quot;);</div>

<div><br></div><div>but that didn&#39;t work. I didn&#39;t think about encl=
osing it in parens!</div><div><br></div><div>I didn&#39;t want to do subcla=
ssing, because my parser is a state-machine style parser, so it&#39;s in a =
big switch. Pretty gross, but I would like it to be as fast as possible. Th=
at&#39;s why I thought this model would be so cool, because I could remove =
conditions from the generated code, and get rid of a lot of the conditional=
s.</div>

<div><br></div><div>Thanks so much!</div>

--90e6ba211f1b4454c504b9f49e72--

Feb 27 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:

 I didn't want to do subclassing, because my parser is a state-machine 


 parser, so it's in a big switch. Pretty gross, but I would like it to 


 fast as possible. That's why I thought this model would be so cool, 


 I could remove conditions from the generated code, and get rid of a 


 the conditionals.


I am pretty sure switch statements boil down to a sequence conditionals 
consisting of equality comparisons.

I know that some compilers use optimizations where the comparisons are 
converted to a single lookup, but last I checked, dmd does not apply 
that optimization. Perhaps because it's not implemented yet, or because 
using a table lookup might be slower because of reaching outside of the 
cpu cache. (Or another reason that I am not aware of. :))

Ali

Feb 27 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 27.02.2012 23:18, Jonathan M Davis wrote:
 On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
 On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
 I didn't want to do subclassing, because my parser is a state-machine


 style

 parser, so it's in a big switch. Pretty gross, but I would like it to


 be as

 fast as possible. That's why I thought this model would be so cool,


 because

 I could remove conditions from the generated code, and get rid of a


 lot of

 the conditionals.


 I am pretty sure switch statements boil down to a sequence conditionals
 consisting of equality comparisons.

 I know that some compilers use optimizations where the comparisons are
 converted to a single lookup, but last I checked, dmd does not apply
 that optimization. Perhaps because it's not implemented yet, or because
 using a table lookup might be slower because of reaching outside of the
 cpu cache. (Or another reason that I am not aware of. :))


 Yeah. In theory, switch statements can be optimized better than if-else
 chains, and eventually I'd fully expect that dmd would do that, but right now,
 I don't think that they really are. You'd have to look at the generated
 assembly though to be sure though.


Please stop spreading the misconception that they are going to be 
optimized "probably and eventually". Switch statements over integers are 
much better then if/else chains and were for something like 20 years. 
They do produce nice improvements over a sequence of if/else because 
values are known in advance and are bounded.
So it goes like this:
1. If all numbers are tightly packed (e.g. almost all of M..N range is 
covered by a case) then it does a jump table. So there is almost no 
comparisons at all (just check it fits the range) and it's just one 
indirect jump.

2.Even failing that it does a binary-search like comparisons to speed 
thing up.

3. Sometimes linear searching is faster then binary (on small number of 
case) and it can estimate this threshold. Same goes for table vs linear 
search and table vs binary search.

4. In fact it even does combination of all of the above, e.g. separate 
range in two by one comparison then use jump tables in both etc.

For instance:

switch(x){
	case 0:
	...
	case 1:
  	...
	case 4:
	...
	case 9:
	... //all cases used up to 40
	case 40:

It might do something like (depends on compiler vendor ;) ):

if( x < 9)
{
	if(x == 0)
		...

	else if(x == 1)
		...

	else if(x == 4)
		...
}
else {
	if(x <=40)
		goto table[x]; //pseudocode
	else
		goto default;
}

And for the record I seen all of that in assembly listings produced by 
dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes 
layout a bit so that dmd  outputs jump table and not a binary-searching 
like stuff in a tight loop.


-- 
Dmitry Olshansky

Feb 27 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/27/2012 02:06 PM, Jonathan M Davis wrote:

 I'm not saying that dmd doesn't ever optimize switch statements. I'm just
 saying that as I understand it, it doesn't always do so (its 


 ranged case statements isn't great for instance). Odds are that it _does_
 optimize straight integer case statements fairly well, because that's the
 classic C stuff (and if you've verified that, all the better - I 



I have played with this optimization recently. (Could be dmd 2.057.) No, 
dmd did not optimize a straightforward switch statement over a ubyte 
expression with about two hundred ubyte cases.

The assembly contained conditional jump statements, not a table. And 
yes, I did try with -O.

But I am not sure that a lookup table really is an optimization with 
modern CPUs. A series of conditional jumps that fit the CPU's cache 
could be faster than a table that's outside of the cache. I think 
accessing the cache is hundreds of times faster than accessing memory 
outside of the cache.

Ali

Feb 27 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 28.02.2012 2:17, Ali Çehreli wrote:
 On 02/27/2012 02:06 PM, Jonathan M Davis wrote:

  > I'm not saying that dmd doesn't ever optimize switch statements. I'm
 just
  > saying that as I understand it, it doesn't always do so (its
 performance with
  > ranged case statements isn't great for instance). Odds are that it
 _does_
  > optimize straight integer case statements fairly well, because that's
 the
  > classic C stuff (and if you've verified that, all the better - I
 haven't).

I would say use switch when you have multiple choises over one variable, 
because the compiler *can* do much better job then with if/else chain 
(even for strings, e.g. hashing, same binary-search approach). Because 
the if/else chain fixes the order of comparisons it can't become faster 
at all (or it would be an aggressive optimization).
It's just wrong idea to lose this opportunity with switch.

 I have played with this optimization recently. (Could be dmd 2.057.) No,
 dmd did not optimize a straightforward switch statement over a ubyte
 expression with about two hundred ubyte cases.

doubled. Mm care to share you experiments? Alternatively just dissemble 
any regex sample, seek out mangled trash of _xxx_eval_xxxx function.

 The assembly contained conditional jump statements, not a table. And
 yes, I did try with -O.

 But I am not sure that a lookup table really is an optimization with
 modern CPUs. A series of conditional jumps that fit the CPU's cache
 could be faster than a table that's outside of the cache. I think
 accessing the cache is hundreds of times faster than accessing memory
 outside of the cache.

It's all about branch prediction really - hundreds of cmp xxx je xxx 
stall pipelines pretty darn well. Though the effect might depend on the 
code size inside these case branches.
BTW tables should be somewhere close, though I haven't checked this.

-- 
Dmitry Olshansky

Feb 28 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/28/2012 02:12 AM, Dmitry Olshansky wrote:
 On 28.02.2012 2:17, Ali Çehreli wrote:


 I have played with this optimization recently. (Could be dmd 2.057.) No,
 dmd did not optimize a straightforward switch statement over a ubyte
 expression with about two hundred ubyte cases.


 doubled.


That's really good to hear. Perhaps my experiments were either naive, or 
a series of "cmp xxx je xxx" would be faster, say for the ubyte type 
that I've used.

 Mm care to share you experiments?


Here is a program that mixes-in a lookup function that contains a single 
switch statement that covers the entire 256 values of ubyte:

import std.stdio;
import std.conv;

string oneCase(FromT, ToT)(FromT fromExpr, ToT toExpr)
{
     return
         " case " ~ to!string(fromExpr)
         ~ ": return " ~ to!string(ToT.init) ~ "; ";
}

string manyCases(FromT, ToT)(size_t count)
{
     string result;

     ToT fromExpr;
     foreach (i; 0 .. count) {
         result ~= oneCase(fromExpr, 42);
         ++fromExpr;
     }

     return result;
}

string lookUpFunc(FromT, ToT)(size_t count)
{
     immutable header = ToT.stringof ~ " lookUp(" ~ FromT.stringof
                        ~ " expr) { switch (expr) { ";
     immutable cases = manyCases!(FromT, ToT)(count);
     immutable footer = " default: return " ~ to!string(ToT.init) ~ "; } }";

     return header ~ cases ~ footer;
}

void main()
{
     enum funcWithManyCases = lookUpFunc!(ubyte, ubyte)(256);

     writeln("Mixing-in: ", funcWithManyCases);
     mixin(funcWithManyCases);

     assert(lookUp(42) == 0);
}

I compiled with -O and then used obj2asm to see that there are 256 cmp 
xxx jmp xxx instructions.

Ali

Feb 29 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
 On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
 I didn't want to do subclassing, because my parser is a state-machine


 style
 
 parser, so it's in a big switch. Pretty gross, but I would like it to


 be as
 
 fast as possible. That's why I thought this model would be so cool,


 because
 
 I could remove conditions from the generated code, and get rid of a


 lot of
 
 the conditionals.


 I am pretty sure switch statements boil down to a sequence conditionals
 consisting of equality comparisons.
 
 I know that some compilers use optimizations where the comparisons are
 converted to a single lookup, but last I checked, dmd does not apply
 that optimization. Perhaps because it's not implemented yet, or because
 using a table lookup might be slower because of reaching outside of the
 cpu cache. (Or another reason that I am not aware of. :))


Yeah. In theory, switch statements can be optimized better than if-else 
chains, and eventually I'd fully expect that dmd would do that, but right now, 
I don't think that they really are. You'd have to look at the generated 
assembly though to be sure though.

- Jonathan M Davis

Feb 27 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, February 27, 2012 23:55:39 Dmitry Olshansky wrote:
 On 27.02.2012 23:18, Jonathan M Davis wrote:
 On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
 On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
 I didn't want to do subclassing, because my parser is a state-machine


 style
 
 parser, so it's in a big switch. Pretty gross, but I would like it to


 be as
 
 fast as possible. That's why I thought this model would be so cool,


 because
 
 I could remove conditions from the generated code, and get rid of a


 lot of
 
 the conditionals.


 I am pretty sure switch statements boil down to a sequence conditionals
 consisting of equality comparisons.
 
 I know that some compilers use optimizations where the comparisons are
 converted to a single lookup, but last I checked, dmd does not apply
 that optimization. Perhaps because it's not implemented yet, or because
 using a table lookup might be slower because of reaching outside of the
 cpu cache. (Or another reason that I am not aware of. :))


 Yeah. In theory, switch statements can be optimized better than if-else
 chains, and eventually I'd fully expect that dmd would do that, but right
 now, I don't think that they really are. You'd have to look at the
 generated assembly though to be sure though.


 Please stop spreading the misconception that they are going to be
 optimized "probably and eventually". Switch statements over integers are
 much better then if/else chains and were for something like 20 years.
 They do produce nice improvements over a sequence of if/else because
 values are known in advance and are bounded.
 So it goes like this:


 And for the record I seen all of that in assembly listings produced by
 dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes
 layout a bit so that dmd outputs jump table and not a binary-searching
 like stuff in a tight loop.


I'm not saying that dmd doesn't ever optimize switch statements. I'm just 
saying that as I understand it, it doesn't always do so (its performance with 
ranged case statements isn't great for instance). Odds are that it _does_ 
optimize straight integer case statements fairly well, because that's the 
classic C stuff (and if you've verified that, all the better - I haven't). It's 
the new stuff that's less likely to be well-optimized (like ranged case 
statements or case statements with strings). And those probably _will_ be 
better optimized eventually, but as I understand it, they aren't now.

So, if the reason for using a switch statement over and an if-else chain is 
purely for performance (particularly if doing something else than just using 
plain case statements with integers like in C), then you should probably 
profile your code and/or look at the assembly to be sure that you're getting 
the performance gain that you think that you're getting.

- Jonathan M Davis

Feb 27 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Re: Re: class templates and static if