www.digitalmars.com         C & C++   DMDScript  

D - Switch Statement Optimisation

reply J Anderson <REMOVEanderson badmama.com.au> writes:
What type of optimisations does the compiler do for switch statements?

For example if I had something like:
switch (val)
{
case 1:
…
break;
case 2:
…
break;
case 3:
…
break;
case 4:
…
break;
case 5:
…
break;
}

Could the compiler change this to something like.

If (val >= 1 && val <= 5)
(caseposfunc + val - 1)(); //Ie val is the offset to the function
else
default;


Also if I had:

switch (val)
{
case “Hello”:
…
break;
case “Goodbye”:
...
break;
case “Something else”
...
break;
case “Another thing”;
...
break;
ect…
}


Could the compiler sort the definitions and use a binary lookup (ie 
log(n)) to find a matching entry (if any).

Obviously if it did there would be some kinda threshold where a load of 
comparisons would be more efficient. Also, I guess that in some rare 
cases, this would be less efficient when users sort their cases 
statements by-most-likely first.

-Anderson
Dec 11 2003
parent reply Benji Smith <dlanguage xxagg.com> writes:
On Fri, 12 Dec 2003 06:45:28 +0800, J Anderson
<REMOVEanderson badmama.com.au> wrote:

Obviously if it did there would be some kinda threshold where a load of 
comparisons would be more efficient. Also, I guess that in some rare 
cases, this would be less efficient when users sort their cases 
statements by-most-likely first.

-Anderson

I always sort my switch cases by most-likely-first (to the best of my design-time knowledge). If I think the switch cases will be a bottleneck and I'm not entirely sure of which case is the most likely, I'll benchmark a few different case orderings. --Benji
Dec 11 2003
next sibling parent J Anderson <REMOVEanderson badmama.com.au> writes:
Benji Smith wrote:

On Fri, 12 Dec 2003 06:45:28 +0800, J Anderson
<REMOVEanderson badmama.com.au> wrote:

  

Obviously if it did there would be some kinda threshold where a load of 
comparisons would be more efficient. Also, I guess that in some rare 
cases, this would be less efficient when users sort their cases 
statements by-most-likely first.

-Anderson
    

I always sort my switch cases by most-likely-first (to the best of my design-time knowledge). If I think the switch cases will be a bottleneck and I'm not entirely sure of which case is the most likely, I'll benchmark a few different case orderings. --Benji

If you knew it was likely to be binary you could put the numbers for the most-likely first at the middle points. Also, if the compiler had something like: switch (val) { case "Something X": ... case "Something Y": ... case "Something Z": case "Other X": case "Other Y": case "Other Z": } A dictionary-binary tree could be used, so that each letter (4 letters) only needs to be looked at once. ie if (val[0..4] == "Some") { if (val[4..10] == "thing ") { if (val[10] == "X") ... else if (val[10] == "Y") ... else if (val[10] == "Z") ... else default; } else default; } else if (val[0..6] == "Other ") { if (val[6] == "X") ... else if (val[6] == "Y") ... else if (val[6] == "Z") ... else default; } else default; Of course the asm version would be even more efficient.
Dec 11 2003
prev sibling next sibling parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Benji Smith wrote:
 I always sort my switch cases by most-likely-first (to the best of my
 design-time knowledge).

I think this is wrong. As far as my experience with Delphi goes, this leads to the worst performance. You should generally sort your labels so, that the values are in their ordinal sorting order. This allows the compiler to write no more than 2 IFs and one jump adress table for each nearly consecutive region of values. If you have something like 1, 2, 3, 4, 5, then 1000, 1001, 1002, 1003, you have 2 distinct groups of values. Within the group, values must be sorted in their order for the best performance, but you can decide which group you want to process first, since this might actually done with IFs. Though this probably doesn't matter for 2 regions, it might starting with 3... but as you can guess the impact is minimal. Of course, with enums you can not always have the order in your memory, so i would expect the compiler to reorder the cases automatically. -eye EDIT. If there is an optimal bit-matcher, order of the groups should not mater as well. -eye
Dec 12 2003
parent reply "Sean L. Palmer" <palmer.sean verizon.net> writes:
The compiler should be able to sort your cases for you.  Although this is
made a bit more difficult by fallthrough.  But the compiler probably won't
be able to tell which cases will be hit most frequently.

Sean

"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:brcrdv$2t9g$2 digitaldaemon.com...
 Benji Smith wrote:
 I always sort my switch cases by most-likely-first (to the best of my
 design-time knowledge).

I think this is wrong. As far as my experience with Delphi goes, this leads to the worst performance. You should generally sort your labels so, that the values are in their ordinal sorting order. This allows the compiler to write no more than 2 IFs and one jump adress table for each nearly consecutive region of values. If you have something like 1, 2, 3, 4, 5, then 1000, 1001, 1002, 1003, you have 2 distinct groups of values. Within the group, values must be sorted in their order for the best performance, but you can decide which group you want to process first, since this might actually done with IFs. Though this probably doesn't matter for 2 regions, it might starting with 3... but as you can guess the impact is minimal. Of course, with enums you can not always have the order in your memory, so i would expect the compiler to reorder the cases automatically. -eye EDIT. If there is an optimal bit-matcher, order of the groups should not mater as well. -eye

Dec 12 2003
next sibling parent J Anderson <REMOVEanderson badmama.com.au> writes:
Sean L. Palmer wrote:

The compiler should be able to sort your cases for you.  Although this is
made a bit more difficult by fallthrough.  But the compiler probably won't
be able to tell which cases will be hit most frequently.

Sean
  

There was this software product I saw advertised for C++ (I've forgotten the name) that could benchmark your code and then optimise functions, but most-used-first. I guess the same technique could apply to switch statements. Although you can't predict every run, it could help a bit.
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:brcrdv$2t9g$2 digitaldaemon.com...
  

Benji Smith wrote:
    

I always sort my switch cases by most-likely-first (to the best of my
design-time knowledge).
      

leads to the worst performance. You should generally sort your labels so, that the values are in their ordinal sorting order. This allows the compiler to write no more than 2 IFs and one jump adress table for each nearly consecutive region of values. If you have something like 1, 2, 3, 4, 5, then 1000, 1001, 1002, 1003, you have 2 distinct groups of values. Within the group, values must be sorted in their order for the best performance, but you can decide which group you want to process first, since this might actually done with IFs. Though this probably doesn't matter for 2 regions, it might starting with 3... but as you can guess the impact is minimal. Of course, with enums you can not always have the order in your memory, so i would expect the compiler to reorder the cases automatically. -eye EDIT. If there is an optimal bit-matcher, order of the groups should not mater as well. -eye


Dec 12 2003
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:brd07g$3l1$1 digitaldaemon.com...
 The compiler should be able to sort your cases for you.  Although this is
 made a bit more difficult by fallthrough.  But the compiler probably won't
 be able to tell which cases will be hit most frequently.

Yes. There is no reason for the user to do an ordinal sort, that kind of thing compilers are good at. And since the compiler cannot know which are the most frequent case, having the user sort them by most-used-appears-first, works pretty good in practice.
Dec 12 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:brcqlo$2s89$1 digitaldaemon.com...
 Benji Smith wrote:
 I always sort my switch cases by most-likely-first (to the best of my
 design-time knowledge).

I think this is wrong. As far as my experience with Delphi goes, this leads to the worst performance. You should generally sort your labels so, that the values are in their ordinal sorting order. This allows the compiler to write no more than 2 IFs and one jump adress table for each nearly consecutive region of values.

With Digital Mars compilers, you're better off putting sorting by most used first. 3 kinds of code are generated for switches, depending on the population of case values: 1) sequence of cmp-je 2) index case value into a jump table 3) search for case value in table, use resulting index into jump table As always, you can verify what is being done by using obj2asm.
Dec 12 2003
parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Walter wrote:

 As always, you can verify what is being done by using obj2asm.

Yup, first look is quite interesting. I'll have to take time some other day, i've got an importand math test tomorrow. I want to propose yet another strategy. When matching on string hashes or pointers, or other nearly-random, loosely populated data, you can almost always find and cut out a few bits which are always different, and use them directly as indexes into a jump table. The table would only have to be somewhat larget than minimal, up to 50% (worst case), generally about the half of that. The next power of 2 in size, that is. If it doesn't quite work in some certain case, you can stack up 2 cases or fallback to the old strategies. -eye
Dec 12 2003
parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Ilya Minkov wrote:

 Walter wrote:

 As always, you can verify what is being done by using obj2asm.

Yup, first look is quite interesting. I'll have to take time some other day, i've got an importand math test tomorrow. I want to propose yet another strategy. When matching on string hashes or pointers, or other nearly-random, loosely populated data, you can almost always find and cut out a few bits which are always different, and use them directly as indexes into a jump table. The table would only have to be somewhat larget than minimal, up to 50% (worst case), generally about the half of that. The next power of 2 in size, that is. If it doesn't quite work in some certain case, you can stack up 2 cases or fallback to the old strategies. -eye

Walter would be using the best technique known. -Anderson
Dec 12 2003
parent "Walter" <walter digitalmars.com> writes:
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:brdp7l$18gf$2 digitaldaemon.com...
 I guess, case-statement optimisation is a well-threshed-out field, and
 Walter would be using the best technique known.

I could do better code generation for some special cases, but the 3 strategies used cover the territory effectively enough that it just never has been a performance issue.
Dec 12 2003