www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Re: Re: class templates and static if

reply Tyler Jameson Little <beatgammit gmail.com> writes:
--90e6ba211f1b4454c504b9f49e72
Content-Type: text/plain; charset=ISO-8859-1

 Parser!(Type.request) h = new Parser!(Type.request)("Hello world")

Wow, I was so close! I had tried this: Parser!Type.request h = new Parser!Type.request("Hello world"); but that didn't work. I didn't think about enclosing it in parens! I didn't want to do subclassing, because my parser is a state-machine style parser, so it's in a big switch. Pretty gross, but I would like it to be as fast as possible. That's why I thought this model would be so cool, because I could remove conditions from the generated code, and get rid of a lot of the conditionals. Thanks so much! --90e6ba211f1b4454c504b9f49e72 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable &gt; Parser!(Type.request) h =3D new Parser!(Type.request)(&quot;Hello worl= d&quot;)<div><br></div><div>Wow, I was so close! =A0I had tried this:</div>= <div><br></div><div>Parser!Type.request h =3D new Parser!Type.request(&quot= ;Hello world&quot;);</div> <div><br></div><div>but that didn&#39;t work. I didn&#39;t think about encl= osing it in parens!</div><div><br></div><div>I didn&#39;t want to do subcla= ssing, because my parser is a state-machine style parser, so it&#39;s in a = big switch. Pretty gross, but I would like it to be as fast as possible. Th= at&#39;s why I thought this model would be so cool, because I could remove = conditions from the generated code, and get rid of a lot of the conditional= s.</div> <div><br></div><div>Thanks so much!</div> --90e6ba211f1b4454c504b9f49e72--
Feb 27 2012
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:

 I didn't want to do subclassing, because my parser is a state-machine 

 parser, so it's in a big switch. Pretty gross, but I would like it to 

 fast as possible. That's why I thought this model would be so cool, 

 I could remove conditions from the generated code, and get rid of a 

 the conditionals.

I am pretty sure switch statements boil down to a sequence conditionals consisting of equality comparisons. I know that some compilers use optimizations where the comparisons are converted to a single lookup, but last I checked, dmd does not apply that optimization. Perhaps because it's not implemented yet, or because using a table lookup might be slower because of reaching outside of the cpu cache. (Or another reason that I am not aware of. :)) Ali
Feb 27 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 27.02.2012 23:18, Jonathan M Davis wrote:
 On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
 On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
 I didn't want to do subclassing, because my parser is a state-machine

style
 parser, so it's in a big switch. Pretty gross, but I would like it to

be as
 fast as possible. That's why I thought this model would be so cool,

because
 I could remove conditions from the generated code, and get rid of a

lot of
 the conditionals.

I am pretty sure switch statements boil down to a sequence conditionals consisting of equality comparisons. I know that some compilers use optimizations where the comparisons are converted to a single lookup, but last I checked, dmd does not apply that optimization. Perhaps because it's not implemented yet, or because using a table lookup might be slower because of reaching outside of the cpu cache. (Or another reason that I am not aware of. :))

Yeah. In theory, switch statements can be optimized better than if-else chains, and eventually I'd fully expect that dmd would do that, but right now, I don't think that they really are. You'd have to look at the generated assembly though to be sure though.

Please stop spreading the misconception that they are going to be optimized "probably and eventually". Switch statements over integers are much better then if/else chains and were for something like 20 years. They do produce nice improvements over a sequence of if/else because values are known in advance and are bounded. So it goes like this: 1. If all numbers are tightly packed (e.g. almost all of M..N range is covered by a case) then it does a jump table. So there is almost no comparisons at all (just check it fits the range) and it's just one indirect jump. 2.Even failing that it does a binary-search like comparisons to speed thing up. 3. Sometimes linear searching is faster then binary (on small number of case) and it can estimate this threshold. Same goes for table vs linear search and table vs binary search. 4. In fact it even does combination of all of the above, e.g. separate range in two by one comparison then use jump tables in both etc. For instance: switch(x){ case 0: ... case 1: ... case 4: ... case 9: ... //all cases used up to 40 case 40: It might do something like (depends on compiler vendor ;) ): if( x < 9) { if(x == 0) ... else if(x == 1) ... else if(x == 4) ... } else { if(x <=40) goto table[x]; //pseudocode else goto default; } And for the record I seen all of that in assembly listings produced by dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes layout a bit so that dmd outputs jump table and not a binary-searching like stuff in a tight loop. -- Dmitry Olshansky
Feb 27 2012
parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/27/2012 02:06 PM, Jonathan M Davis wrote:

 I'm not saying that dmd doesn't ever optimize switch statements. I'm just
 saying that as I understand it, it doesn't always do so (its 

 ranged case statements isn't great for instance). Odds are that it _does_
 optimize straight integer case statements fairly well, because that's the
 classic C stuff (and if you've verified that, all the better - I 

I have played with this optimization recently. (Could be dmd 2.057.) No, dmd did not optimize a straightforward switch statement over a ubyte expression with about two hundred ubyte cases. The assembly contained conditional jump statements, not a table. And yes, I did try with -O. But I am not sure that a lookup table really is an optimization with modern CPUs. A series of conditional jumps that fit the CPU's cache could be faster than a table that's outside of the cache. I think accessing the cache is hundreds of times faster than accessing memory outside of the cache. Ali
Feb 27 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 28.02.2012 2:17, Ali Çehreli wrote:
 On 02/27/2012 02:06 PM, Jonathan M Davis wrote:

  > I'm not saying that dmd doesn't ever optimize switch statements. I'm
 just
  > saying that as I understand it, it doesn't always do so (its
 performance with
  > ranged case statements isn't great for instance). Odds are that it
 _does_
  > optimize straight integer case statements fairly well, because that's
 the
  > classic C stuff (and if you've verified that, all the better - I
 haven't).

I would say use switch when you have multiple choises over one variable, because the compiler *can* do much better job then with if/else chain (even for strings, e.g. hashing, same binary-search approach). Because the if/else chain fixes the order of comparisons it can't become faster at all (or it would be an aggressive optimization). It's just wrong idea to lose this opportunity with switch.
 I have played with this optimization recently. (Could be dmd 2.057.) No,
 dmd did not optimize a straightforward switch statement over a ubyte
 expression with about two hundred ubyte cases.

doubled. Mm care to share you experiments? Alternatively just dissemble any regex sample, seek out mangled trash of _xxx_eval_xxxx function.
 The assembly contained conditional jump statements, not a table. And
 yes, I did try with -O.

 But I am not sure that a lookup table really is an optimization with
 modern CPUs. A series of conditional jumps that fit the CPU's cache
 could be faster than a table that's outside of the cache. I think
 accessing the cache is hundreds of times faster than accessing memory
 outside of the cache.

It's all about branch prediction really - hundreds of cmp xxx je xxx stall pipelines pretty darn well. Though the effect might depend on the code size inside these case branches. BTW tables should be somewhere close, though I haven't checked this. -- Dmitry Olshansky
Feb 28 2012
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/28/2012 02:12 AM, Dmitry Olshansky wrote:
 On 28.02.2012 2:17, Ali Çehreli wrote:

 I have played with this optimization recently. (Could be dmd 2.057.) No,
 dmd did not optimize a straightforward switch statement over a ubyte
 expression with about two hundred ubyte cases.

doubled.

That's really good to hear. Perhaps my experiments were either naive, or a series of "cmp xxx je xxx" would be faster, say for the ubyte type that I've used.
 Mm care to share you experiments?

Here is a program that mixes-in a lookup function that contains a single switch statement that covers the entire 256 values of ubyte: import std.stdio; import std.conv; string oneCase(FromT, ToT)(FromT fromExpr, ToT toExpr) { return " case " ~ to!string(fromExpr) ~ ": return " ~ to!string(ToT.init) ~ "; "; } string manyCases(FromT, ToT)(size_t count) { string result; ToT fromExpr; foreach (i; 0 .. count) { result ~= oneCase(fromExpr, 42); ++fromExpr; } return result; } string lookUpFunc(FromT, ToT)(size_t count) { immutable header = ToT.stringof ~ " lookUp(" ~ FromT.stringof ~ " expr) { switch (expr) { "; immutable cases = manyCases!(FromT, ToT)(count); immutable footer = " default: return " ~ to!string(ToT.init) ~ "; } }"; return header ~ cases ~ footer; } void main() { enum funcWithManyCases = lookUpFunc!(ubyte, ubyte)(256); writeln("Mixing-in: ", funcWithManyCases); mixin(funcWithManyCases); assert(lookUp(42) == 0); } I compiled with -O and then used obj2asm to see that there are 256 cmp xxx jmp xxx instructions. Ali
Feb 29 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
 On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
 I didn't want to do subclassing, because my parser is a state-machine

style
 parser, so it's in a big switch. Pretty gross, but I would like it to

be as
 fast as possible. That's why I thought this model would be so cool,

because
 I could remove conditions from the generated code, and get rid of a

lot of
 the conditionals.

I am pretty sure switch statements boil down to a sequence conditionals consisting of equality comparisons. I know that some compilers use optimizations where the comparisons are converted to a single lookup, but last I checked, dmd does not apply that optimization. Perhaps because it's not implemented yet, or because using a table lookup might be slower because of reaching outside of the cpu cache. (Or another reason that I am not aware of. :))

Yeah. In theory, switch statements can be optimized better than if-else chains, and eventually I'd fully expect that dmd would do that, but right now, I don't think that they really are. You'd have to look at the generated assembly though to be sure though. - Jonathan M Davis
Feb 27 2012
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, February 27, 2012 23:55:39 Dmitry Olshansky wrote:
 On 27.02.2012 23:18, Jonathan M Davis wrote:
 On Monday, February 27, 2012 10:55:12 Ali Çehreli wrote:
 On 02/27/2012 08:29 AM, Tyler Jameson Little wrote:
 I didn't want to do subclassing, because my parser is a state-machine

style
 parser, so it's in a big switch. Pretty gross, but I would like it to

be as
 fast as possible. That's why I thought this model would be so cool,

because
 I could remove conditions from the generated code, and get rid of a

lot of
 the conditionals.

I am pretty sure switch statements boil down to a sequence conditionals consisting of equality comparisons. I know that some compilers use optimizations where the comparisons are converted to a single lookup, but last I checked, dmd does not apply that optimization. Perhaps because it's not implemented yet, or because using a table lookup might be slower because of reaching outside of the cpu cache. (Or another reason that I am not aware of. :))

Yeah. In theory, switch statements can be optimized better than if-else chains, and eventually I'd fully expect that dmd would do that, but right now, I don't think that they really are. You'd have to look at the generated assembly though to be sure though.

Please stop spreading the misconception that they are going to be optimized "probably and eventually". Switch statements over integers are much better then if/else chains and were for something like 20 years. They do produce nice improvements over a sequence of if/else because values are known in advance and are bounded. So it goes like this:

 And for the record I seen all of that in assembly listings produced by
 dmd, when optimizing VM dispatch in std.regex. I had to revisit opcodes
 layout a bit so that dmd outputs jump table and not a binary-searching
 like stuff in a tight loop.

I'm not saying that dmd doesn't ever optimize switch statements. I'm just saying that as I understand it, it doesn't always do so (its performance with ranged case statements isn't great for instance). Odds are that it _does_ optimize straight integer case statements fairly well, because that's the classic C stuff (and if you've verified that, all the better - I haven't). It's the new stuff that's less likely to be well-optimized (like ranged case statements or case statements with strings). And those probably _will_ be better optimized eventually, but as I understand it, they aren't now. So, if the reason for using a switch statement over and an if-else chain is purely for performance (particularly if doing something else than just using plain case statements with integers like in C), then you should probably profile your code and/or look at the assembly to be sure that you're getting the performance gain that you think that you're getting. - Jonathan M Davis
Feb 27 2012