www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - C++ static analysis

reply bearophile <bearophileHUGS lycos.com> writes:
Through Reddit I've seen another document that shows common little bugs in
C/C++ code found by a static analysis tool. The bugs shown seem to be real,
from real software:
http://www.slideshare.net/Andrey_Karpov/add2011-en

From the slides I have selected seven of the bugs, here translated to D2:


enum uint FLAG1 = 0b001;
void main() {
    int x = 1000;
    int y = 1;


    uint flags;




    int i;
    for (i = 0; i < 10; i++)







}


Notes about the bugs:


flags.



it's probably a bug.

it's probably a bug.

operator and the * (mult) operator.

Notes about D2:

D2 language (we have discussed this bug in past too, but the catching code is
not present yet). I think D2 may require parentheses here.

know how to catch it, or if it's worth catching. But it's probably a not so
uncommon bug.


ones.

use foreach, or for with local index variable (that D doesn't allow you to
redefine). Once in a while that kind of code is correct, maybe.

know if D2 wants to actually turn this correct code into a real compile-time
error. But it's worth thinking more about it.

think it's not a so common bug.

So:


another related bug, with class/struct members assignments,
http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).

anyway.

Bye,
bearophile
May 05 2011
next sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 Through Reddit I've seen another document that shows common little bugs in
C/C++ code found by a static analysis tool. The bugs shown seem to be real,
from real software:
 http://www.slideshare.net/Andrey_Karpov/add2011-en
 
 From the slides I have selected seven of the bugs, here translated to D2:
 
 
 enum uint FLAG1 = 0b001;
 void main() {
     int x = 1000;
     int y = 1;

 
     uint flags;

 

 
     int i;
     for (i = 0; i < 10; i++)

 

 

 

 }
 
 
 Notes about the bugs:

I would say that disallowing it would _always_ lead to clearer code. It should always be rewritten as: if (!y && x) // wasn't a bug, but was unclear OR if (!(y & x)) // was a bug OR in the one-in-a-million case where the existing code is correct: if ( (!y) & x ) // wasn't a bug, but even this case is clearer: x MUST be evaluated a really complicated way (allowing boolean expressions, but not integral ones other than UnaryExpressions). And that seems quite hard to justify. But I don't find the other cases convincing. They don't seem any more if (1 || y) {} is common and completely legitimate. Disallowing that would be very annoying.
May 06 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Don:

 I would say that disallowing it would _always_ lead to clearer code.
OK. It's in Bugzilla since some time: http://d.puremagic.com/issues/show_bug.cgi?id=5409 (But if we want to disallow something then we have to list exactly what syntax are not allowed.)

 a really complicated way (allowing boolean expressions, but not integral
 ones other than UnaryExpressions). And that seems quite hard to justify.
I think in this case introducing complex rules is not good. auto z = x * (y ? 10 : 20);
 But I don't find the other cases convincing. They don't seem any more

 if (1 || y) {}
 is common and completely legitimate. Disallowing that would be very
 annoying.
I agree that sometimes it's not a bug (example: the 1 is the __ctfe pseudo-constant). Always true or always false boolean conditions (unless they are just a single "true", "false", "0" or "1" literal) are sometimes associated with bugs. Good C lints give just a warning in this case, no error. I agree that statically disallowing this in all cases is too much harsh. ------------------ probably a bug. A similar case (auto-assign): x = x; Or even (auto-assign, through alias): alias x y; x = y; Or even (repeated assign): x = 10; x = 10; (repeated assign, this happens often enough, the second variable is not x): x = Foo.barx; x = Foo.bary; On the other hand automatic code generation (in CTFE) sometimes produces some redundancy in code, this may cause some false positives. But similar redundancies that often hide bugs. So I think this class of troubles deserves more thinking. A common bug in my code is this one, I have had something like this six times or more: http://d.puremagic.com/issues/show_bug.cgi?id=3878 Bye, bearophile
May 06 2011
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
 Through Reddit I've seen another document that shows common little bugs in
C/C++
code found by a static analysis tool. The bugs shown seem to be real, from real software:
 http://www.slideshare.net/Andrey_Karpov/add2011-en

 From the slides I have selected seven of the bugs, here translated to D2:


 enum uint FLAG1 = 0b001;
 void main() {
     int x = 1000;
     int y = 1;


     uint flags;




     int i;
     for (i = 0; i < 10; i++)







 }


 Notes about the bugs:

Actually

flags.



it's probably a bug.

it's
probably a bug.

operator and the * (mult) operator.
 Notes about D2:

D2 language (we have discussed this bug in past too, but the catching code is not present yet). I think D2 may require parentheses here.

know how to catch it, or if it's worth catching. But it's probably a not so uncommon bug.

not
The problem is that generated/CTFE'd code might produce that kind of redundancy. I think it would be possible to just catch plain user-written literals, but then,

use foreach, or for with local index variable (that D doesn't allow you to redefine). Once in > a while that kind of code is correct, maybe. This is already disallowed (deprecated).

know if D2 wants to actually turn this correct code into a real compile-time error. But it's worth thinking more about it. Proving that two sub-expressions are equivalent is a hard task. Has anyone ever experienced a bug that was hard to find similar to those toy examples? Well, maybe if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?

think it's not a so common bug. You can write (most) code without ?:. But if a programmer wants to use it, he has to be familiar with its precedence rules (very simple, lower precedence than anything but = and ,). You cannot "fix" it, because any code relying on the precedence (about any code that uses ?: without using many excessive parens) will get broken, which includes most of my programs. It would be possible to disallow ?: outside a (), [], ',' (comma), or ?: expression by just a small change in grammar, so it would be feasible. I as an excessive user of ?: am very much against it though. A little anecdote: Apparently, ?: are not only a source of bugs in user code, but in compilers as well: int a,b,c,d,e; int*p=&(a?b?c:d:e); See http://d.puremagic.com/issues/show_bug.cgi?id=5799.
 So:

I think requiring parentheses (providing undefined or ambiguous precedence rules) is ugly. D has done it before and it seems reasonable to do so, but still ugly. It also implies that the precedence rules would be a design mistake if not for compatibility with C. What is the great catch about having & defined on (bool, int) anyways?

another related bug, with class/struct members assignments, http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).

anyway.
What property makes them so bug-prone? I think they are quite easy to use.
 Bye,
 bearophile
Timon
May 06 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Timon Gehr wrote:

usually
use foreach, or for with local index variable (that D doesn't allow you to redefine). Once in > a while that kind of code is correct, maybe. This is already disallowed (deprecated).
Oh, it is not. I thought it was because I always use loop local counter variables. But again, where to make the cut? More complex nested loops may legitimately want to access the same variable.
May 06 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Timon Gehr:

 The problem is that generated/CTFE'd code might produce that kind of
redundancy.
Right.
 Proving that two sub-expressions are equivalent is a hard task.
A simple equivalence is enough here: the code with normalized whitespace. Those tools are probably doing something not much complex than this.
 Has anyone ever
 experienced a bug that was hard to find similar to those toy examples? Well,
maybe
 if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?
The examples I've shown are present in real code. See the results of the Coccinelle project too.
 But if a programmer wants to use it, he has
 to be familiar with its precedence rules (very simple, lower precedence than
 anything but = and ,).
In computer languages there are plenty simple things that are bug-prone.
 I as an excessive user of ?: am very much against it though.
It's not advisable to use the ?: operator a *lot* :-)
 I think requiring parentheses (providing undefined or ambiguous precedence
rules)
 is ugly. D has done it before and it seems reasonable to do so, but still ugly.
I don't see better solutions given the D Zen contraints. And in this case adding parentheses may be better than doing nothing. http://d.puremagic.com/issues/show_bug.cgi?id=5409
 It also implies that the precedence rules would be a design mistake if not for
compatibility with C.
I don't fully understand.
 What property makes them so bug-prone?
I am not sure, but I think it encourages a too much compact code style (and you have to be a bit careful with operator precedence too). Probably there is some optimum for compactness of C-like code. The sources of the first J interprer is the point of not allowing an easy read.
 I think they are quite easy to use.
Right, the ?: operator is easy to use. Thank you for your comments, bye, bearophile
May 06 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
 Timon Gehr:

 The problem is that generated/CTFE'd code might produce that kind of
redundancy.
Right.
 Proving that two sub-expressions are equivalent is a hard task.
A simple equivalence is enough here: the code with normalized whitespace. Those
tools are probably doing something not much complex than this.
 Has anyone ever
 experienced a bug that was hard to find similar to those toy examples? Well,
maybe
 if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?
The examples I've shown are present in real code. See the results of the
Coccinelle project too. Okay, I see. Lexical equivalence can catch the infamous copy-paste bugs. An alternative is to never ever copy-paste code. But unfortunately the compiler cannot enforce this, catching only the most trivial cases. I think it might be a good idea though. However, having this feature means requiring one AST compare for every boolean operator.
 But if a programmer wants to use it, he has
 to be familiar with its precedence rules (very simple, lower precedence than
 anything but = and ,).
In computer languages there are plenty simple things that are bug-prone.
 I as an excessive user of ?: am very much against it though.
It's not advisable to use the ?: operator a *lot* :-)
I use it whenever I would have to write either if(c) x=y;else x=z; // => x=c?y:z; or if(c) x=z;else y=z; // => c?x:y=z; This reduces code duplication, mostly when x or z are complicated expressions. I do not use it often in larger arithmetical expressions though. Unfortunately, D's conditional operator is not quite as mighty as C++'s (Eg, it cannot throw an exception). On the plus side, it is simple (0.5 p in TDPL, where C++ allocates 1.5 p in the standard to it). I also like g++ style x?:y;
 I think requiring parentheses (providing undefined or ambiguous precedence
rules)
 is ugly. D has done it before and it seems reasonable to do so, but still ugly.
I don't see better solutions given the D Zen contraints. And in this case adding
parentheses may be better than doing nothing. I agree. You usually don't want to do this anyways.

 http://d.puremagic.com/issues/show_bug.cgi?id=5409


 It also implies that the precedence rules would be a design mistake if not for
compatibility with C.
 I don't fully understand.
The fact that that kind of manual disambiguation is necessary implies that the precedence rules are counter-intuitive to many programmers. Eg if a&b/a<<b would behave more like a*b than a&&b/wtf, I think there would be less precedence related bugs. But of course, this breaks compatibility with C. If however C never existed, the precedence rules would be suboptimal.
 What property makes them so bug-prone?
I am not sure, but I think it encourages a too much compact code style (and you
have to be a bit careful with operator precedence too). Probably there is some optimum for
 compactness of C-like code. The sources of the first J interprer is surely too
of not allowing
 an easy read.
Afaik the average programmer introduces a new bug every 20 lines of code or so. Therefore, you better get your work done in less than that. ;) line of horizontal space does not make this any better. Unfortunately, balancing the compactness in an optimal way is not possible, because the optimum varies between programmers. I appreciate the fact, that in D most likely "very compact" is the optimal choice most of the time.
 I think they are quite easy to use.
Right, the ?: operator is easy to use. Thank you for your comments, bye, bearophile
The best feature to crush bugs would still be automated model checking, let's hope good program provers will be available soon =). Timon
May 06 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Timon Gehr:

 I think it might be a
 good idea though. However, having this feature means requiring one AST compare
for
 every boolean operator.
You are worried about compilation time. I think the feature we're talking about just tests the equivalence of the then/else clauses. Clang has a --analyze switch that runs the (potentially slow) static analyser, otherwise it performs a normal amount of static test on the code. This gives you choice between a faster compilation and a slower analysis able to find some other bugs. Bye, bearophile
May 06 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
 Timon Gehr:

 I think it might be a
 good idea though. However, having this feature means requiring one AST compare
for
 every boolean operator.
You are worried about compilation time. I think the feature we're talking about
just tests the equivalence of the then/else clauses.
 Clang has a --analyze switch that runs the (potentially slow) static analyser,
otherwise it performs a normal amount of static test on the code. This gives you choice between a faster compilation and a slower analysis able to find some other bugs.
 Bye,
 bearophile
I think Walter does not like introducing new switches, because it doubles the number of compilers to test and maintain. Timon
May 07 2011