www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - C++ static analysis

reply bearophile <bearophileHUGS lycos.com> writes:
Through Reddit I've seen another document that shows common little bugs in
C/C++ code found by a static analysis tool. The bugs shown seem to be real,
from real software:
http://www.slideshare.net/Andrey_Karpov/add2011-en

From the slides I have selected seven of the bugs, here translated to D2:


enum uint FLAG1 = 0b001;
void main() {
    int x = 1000;
    int y = 1;
    if (!y & x) {} //#1

    uint flags;
    if (flags && FLAG1) {} //#2

    if (y == 0 || 10 || y) {} //#3

    int i;
    for (i = 0; i < 10; i++)
        for (i = 1; i < 5; i++) {} //#4

    if (x || x) {} //#5

    if (x) y = 1; else y = 1; //#6

    auto z = x * y ? 10 : 20; //#7
}


Notes about the bugs:
- In #1 the programmer meant to write if(!(y & x)). This is often a bug.
- In #2 the programmer probably meant to use & instead of && to intersect the
flags.
- In #3 the boolean condition is always true, and y is ignored.
- In #4 there are two nested loops with the same index variable.
- In #5 the two conditions in or are the exactly the same. When this happens,
it's probably a bug.
- In #6 the two branches of the if are exactly the same. When this happens,
it's probably a bug.
- In #7 the programmer may have misunderstood the precedence levels between ?:
operator and the * (mult) operator.

Notes about D2:
- Bug #1 is common, easy to catch statically, so I think it's worth catching by
D2 language (we have discussed this bug in past too, but the catching code is
not present yet). I think D2 may require parentheses here.
- Regarding #2, that static analysis tool catches it too, it seems, but I don't
know how to catch it, or if it's worth catching. But it's probably a not so
uncommon bug.
- #3 is an example of always true or always false boolean expression (that's
not 0/1 or true/false). It's a case that asks for considerations similar of #5
ones.
- Bug #4 is probably easy to find, but in D is less common, because you usually
use foreach, or for with local index variable (that D doesn't allow you to
redefine). Once in a while that kind of code is correct, maybe.
- Bug #5 and #6 are similar. Simular redundancies often are bugs. But I don't
know if D2 wants to actually turn this correct code into a real compile-time
error. But it's worth thinking more about it.
- Bug #7 is easy to catch, and the fix is to ask for more parentheses. But I
think it's not a so common bug.

So:
- I suggest to change D2 to catch #1 statically.
- I encourage you to think more about what to do with bugs #3, #5, #6 (there is
another related bug, with class/struct members assignments,
http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).
- Maybe think about fixing #7 too. ?: operators are a rich source of bugs
anyway.

Bye,
bearophile
May 05 2011
next sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 Through Reddit I've seen another document that shows common little bugs in
C/C++ code found by a static analysis tool. The bugs shown seem to be real,
from real software:
 http://www.slideshare.net/Andrey_Karpov/add2011-en
 
 From the slides I have selected seven of the bugs, here translated to D2:
 
 
 enum uint FLAG1 = 0b001;
 void main() {
     int x = 1000;
     int y = 1;
     if (!y & x) {} //#1
 
     uint flags;
     if (flags && FLAG1) {} //#2
 
     if (y == 0 || 10 || y) {} //#3
 
     int i;
     for (i = 0; i < 10; i++)
         for (i = 1; i < 5; i++) {} //#4
 
     if (x || x) {} //#5
 
     if (x) y = 1; else y = 1; //#6
 
     auto z = x * y ? 10 : 20; //#7
 }
 
 
 Notes about the bugs:

 - In #1 the programmer meant to write if(!(y & x)). This is often a bug.

should always be rewritten as: if (!y && x) // wasn't a bug, but was unclear OR if (!(y & x)) // was a bug OR in the one-in-a-million case where the existing code is correct: if ( (!y) & x ) // wasn't a bug, but even this case is clearer: x MUST be evaluated Making #7 an error would break a lot of my code, unless it were done in a really complicated way (allowing boolean expressions, but not integral ones other than UnaryExpressions). And that seems quite hard to justify. But I don't find the other cases convincing. They don't seem any more likely to be bugs than not. For example, a variation of case #3: if (1 || y) {} is common and completely legitimate. Disallowing that would be very annoying.
May 06 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Don:

 I would say that disallowing it would _always_ lead to clearer code.

OK. It's in Bugzilla since some time: http://d.puremagic.com/issues/show_bug.cgi?id=5409 (But if we want to disallow something then we have to list exactly what syntax are not allowed.)
 Making #7 an error would break a lot of my code, unless it were done in
 a really complicated way (allowing boolean expressions, but not integral
 ones other than UnaryExpressions). And that seems quite hard to justify.

I think in this case introducing complex rules is not good. auto z = x * y ? 10 : 20; //#7 auto z = x * (y ? 10 : 20);
 But I don't find the other cases convincing. They don't seem any more
 likely to be bugs than not. For example, a variation of case #3:
 if (1 || y) {}
 is common and completely legitimate. Disallowing that would be very
 annoying.

I agree that sometimes it's not a bug (example: the 1 is the __ctfe pseudo-constant). Always true or always false boolean conditions (unless they are just a single "true", "false", "0" or "1" literal) are sometimes associated with bugs. Good C lints give just a warning in this case, no error. I agree that statically disallowing this in all cases is too much harsh. ------------------ The midway cases are #5 and #6. If I write manually code like that, it's probably a bug. A similar case (auto-assign): x = x; Or even (auto-assign, through alias): alias x y; x = y; Or even (repeated assign): x = 10; x = 10; (repeated assign, this happens often enough, the second variable is not x): x = Foo.barx; x = Foo.bary; On the other hand automatic code generation (in CTFE) sometimes produces some redundancy in code, this may cause some false positives. But similar redundancies that often hide bugs. So I think this class of troubles deserves more thinking. A common bug in my code is this one, I have had something like this six times or more: http://d.puremagic.com/issues/show_bug.cgi?id=3878 Bye, bearophile
May 06 2011
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
 Through Reddit I've seen another document that shows common little bugs in
C/C++

software:
 http://www.slideshare.net/Andrey_Karpov/add2011-en

 From the slides I have selected seven of the bugs, here translated to D2:


 enum uint FLAG1 = 0b001;
 void main() {
     int x = 1000;
     int y = 1;
     if (!y & x) {} //#1

     uint flags;
     if (flags && FLAG1) {} //#2

     if (y == 0 || 10 || y) {} //#3

     int i;
     for (i = 0; i < 10; i++)
         for (i = 1; i < 5; i++) {} //#4

     if (x || x) {} //#5

     if (x) y = 1; else y = 1; //#6

     auto z = x * y ? 10 : 20; //#7
 }


 Notes about the bugs:
 - In #1 the programmer meant to write if(!(y & x)). This is often a bug.

 - In #2 the programmer probably meant to use & instead of && to intersect the
flags.
 - In #3 the boolean condition is always true, and y is ignored.
 - In #4 there are two nested loops with the same index variable.
 - In #5 the two conditions in or are the exactly the same. When this happens,

 - In #6 the two branches of the if are exactly the same. When this happens,
it's

 - In #7 the programmer may have misunderstood the precedence levels between ?:

 Notes about D2:
 - Bug #1 is common, easy to catch statically, so I think it's worth catching by

present yet). I think D2 may require parentheses here.
 - Regarding #2, that static analysis tool catches it too, it seems, but I don't

uncommon bug. This is actually #3 in disguise.
 - #3 is an example of always true or always false boolean expression (that's
not

The problem is that generated/CTFE'd code might produce that kind of redundancy. I think it would be possible to just catch plain user-written literals, but then, where exactly to make the cut, considering cases like #2?
 - Bug #4 is probably easy to find, but in D is less common, because you usually

redefine). Once in > a while that kind of code is correct, maybe. This is already disallowed (deprecated).
 - Bug #5 and #6 are similar. Simular redundancies often are bugs. But I don't

error. But it's worth thinking more about it. Proving that two sub-expressions are equivalent is a hard task. Has anyone ever experienced a bug that was hard to find similar to those toy examples? Well, maybe if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?
 - Bug #7 is easy to catch, and the fix is to ask for more parentheses. But I

You can write (most) code without ?:. But if a programmer wants to use it, he has to be familiar with its precedence rules (very simple, lower precedence than anything but = and ,). You cannot "fix" it, because any code relying on the precedence (about any code that uses ?: without using many excessive parens) will get broken, which includes most of my programs. It would be possible to disallow ?: outside a (), [], ',' (comma), or ?: expression by just a small change in grammar, so it would be feasible. I as an excessive user of ?: am very much against it though. A little anecdote: Apparently, ?: are not only a source of bugs in user code, but in compilers as well: int a,b,c,d,e; int*p=&(a?b?c:d:e); See http://d.puremagic.com/issues/show_bug.cgi?id=5799.
 So:
 - I suggest to change D2 to catch #1 statically.

is ugly. D has done it before and it seems reasonable to do so, but still ugly. It also implies that the precedence rules would be a design mistake if not for compatibility with C. What is the great catch about having & defined on (bool, int) anyways?
 - I encourage you to think more about what to do with bugs #3, #5, #6 (there is

http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).
 - Maybe think about fixing #7 too. ?: operators are a rich source of bugs
anyway.

 Bye,
 bearophile

Timon
May 06 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Timon Gehr wrote:
 - Bug #4 is probably easy to find, but in D is less common, because you >
usually

redefine). Once in > a while that kind of code is correct, maybe. This is already disallowed (deprecated).

Oh, it is not. I thought it was because I always use loop local counter variables. But again, where to make the cut? More complex nested loops may legitimately want to access the same variable.
May 06 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Timon Gehr:

 The problem is that generated/CTFE'd code might produce that kind of
redundancy.

Right.
 Proving that two sub-expressions are equivalent is a hard task.

A simple equivalence is enough here: the code with normalized whitespace. Those tools are probably doing something not much complex than this.
 Has anyone ever
 experienced a bug that was hard to find similar to those toy examples? Well,
maybe
 if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?

The examples I've shown are present in real code. See the results of the Coccinelle project too.
 But if a programmer wants to use it, he has
 to be familiar with its precedence rules (very simple, lower precedence than
 anything but = and ,).

In computer languages there are plenty simple things that are bug-prone.
 I as an excessive user of ?: am very much against it though.

It's not advisable to use the ?: operator a *lot* :-)
 I think requiring parentheses (providing undefined or ambiguous precedence
rules)
 is ugly. D has done it before and it seems reasonable to do so, but still ugly.

I don't see better solutions given the D Zen contraints. And in this case adding parentheses may be better than doing nothing. If you have some comments about this bug #1, a good place to add them is here: http://d.puremagic.com/issues/show_bug.cgi?id=5409
 It also implies that the precedence rules would be a design mistake if not for
compatibility with C.

I don't fully understand.
 What property makes them so bug-prone?

I am not sure, but I think it encourages a too much compact code style (and you have to be a bit careful with operator precedence too). Probably there is some optimum for compactness of C-like code. The sources of the first J interprer is surely too much compact, and some C# code I've seen seems not enough compact to the point of not allowing an easy read.
 I think they are quite easy to use.

Right, the ?: operator is easy to use. Thank you for your comments, bye, bearophile
May 06 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
 Timon Gehr:

 The problem is that generated/CTFE'd code might produce that kind of
redundancy.

Right.
 Proving that two sub-expressions are equivalent is a hard task.

A simple equivalence is enough here: the code with normalized whitespace. Those

 Has anyone ever
 experienced a bug that was hard to find similar to those toy examples? Well,
maybe
 if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?

The examples I've shown are present in real code. See the results of the

Okay, I see. Lexical equivalence can catch the infamous copy-paste bugs. An alternative is to never ever copy-paste code. But unfortunately the compiler cannot enforce this, catching only the most trivial cases. I think it might be a good idea though. However, having this feature means requiring one AST compare for every boolean operator.
 But if a programmer wants to use it, he has
 to be familiar with its precedence rules (very simple, lower precedence than
 anything but = and ,).

In computer languages there are plenty simple things that are bug-prone.
 I as an excessive user of ?: am very much against it though.

It's not advisable to use the ?: operator a *lot* :-)

I use it whenever I would have to write either if(c) x=y;else x=z; // => x=c?y:z; or if(c) x=z;else y=z; // => c?x:y=z; This reduces code duplication, mostly when x or z are complicated expressions. I do not use it often in larger arithmetical expressions though. Unfortunately, D's conditional operator is not quite as mighty as C++'s (Eg, it cannot throw an exception). On the plus side, it is simple (0.5 p in TDPL, where C++ allocates 1.5 p in the standard to it). I also like g++ style x?:y;
 I think requiring parentheses (providing undefined or ambiguous precedence
rules)
 is ugly. D has done it before and it seems reasonable to do so, but still ugly.

I don't see better solutions given the D Zen contraints. And in this case adding

I agree. You usually don't want to do this anyways.
 If you have some comments about this bug #1, a good place to add them is here:
 http://d.puremagic.com/issues/show_bug.cgi?id=5409


 It also implies that the precedence rules would be a design mistake if not for


 I don't fully understand.

precedence rules are counter-intuitive to many programmers. Eg if a&b/a<<b would behave more like a*b than a&&b/wtf, I think there would be less precedence related bugs. But of course, this breaks compatibility with C. If however C never existed, the precedence rules would be suboptimal.
 What property makes them so bug-prone?

I am not sure, but I think it encourages a too much compact code style (and you

optimum for
 compactness of C-like code. The sources of the first J interprer is surely too

not allowing
 an easy read.

Afaik the average programmer introduces a new bug every 20 lines of code or so. Therefore, you better get your work done in less than that. ;) C# comment: I totally agree. And the usual convention to have { waste an entire line of horizontal space does not make this any better. Unfortunately, balancing the compactness in an optimal way is not possible, because the optimum varies between programmers. I appreciate the fact, that in D most likely "very compact" is the optimal choice most of the time.
 I think they are quite easy to use.

Right, the ?: operator is easy to use. Thank you for your comments, bye, bearophile

The best feature to crush bugs would still be automated model checking, let's hope good program provers will be available soon =). Timon
May 06 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Timon Gehr:

 I think it might be a
 good idea though. However, having this feature means requiring one AST compare
for
 every boolean operator.

You are worried about compilation time. I think the feature we're talking about just tests the equivalence of the then/else clauses. Clang has a --analyze switch that runs the (potentially slow) static analyser, otherwise it performs a normal amount of static test on the code. This gives you choice between a faster compilation and a slower analysis able to find some other bugs. Bye, bearophile
May 06 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
 Timon Gehr:

 I think it might be a
 good idea though. However, having this feature means requiring one AST compare
for
 every boolean operator.

You are worried about compilation time. I think the feature we're talking about

 Clang has a --analyze switch that runs the (potentially slow) static analyser,

choice between a faster compilation and a slower analysis able to find some other bugs.
 Bye,
 bearophile

I think Walter does not like introducing new switches, because it doubles the number of compilers to test and maintain. Timon
May 07 2011