digitalmars.D - C++ static analysis

bearophile (39/39) May 05 2011 Through Reddit I've seen another document that shows common little bugs ...

Don (17/48) May 06 2011 I would say that disallowing it would _always_ lead to clearer code. It

bearophile (27/36) May 06 2011 OK. It's in Bugzilla since some time:

Timon Gehr (46/85) May 06 2011 Actually

Timon Gehr (4/8) May 06 2011 Oh, it is not. I thought it was because I always use loop local counter ...

bearophile (14/28) May 06 2011 A simple equivalence is enough here: the code with normalized whitespace...

Timon Gehr (39/69) May 06 2011 Coccinelle project too.

bearophile (5/8) May 06 2011 You are worried about compilation time. I think the feature we're talkin...

Timon Gehr (7/15) May 07 2011 otherwise it performs a normal amount of static test on the code. This g...

bearophile <bearophileHUGS lycos.com> writes:

Through Reddit I've seen another document that shows common little bugs in
C/C++ code found by a static analysis tool. The bugs shown seem to be real,
from real software:
http://www.slideshare.net/Andrey_Karpov/add2011-en

From the slides I have selected seven of the bugs, here translated to D2:


enum uint FLAG1 = 0b001;
void main() {
    int x = 1000;
    int y = 1;


    uint flags;




    int i;
    for (i = 0; i < 10; i++)







}


Notes about the bugs:


flags.



it's probably a bug.

it's probably a bug.

operator and the * (mult) operator.

Notes about D2:

D2 language (we have discussed this bug in past too, but the catching code is
not present yet). I think D2 may require parentheses here.

know how to catch it, or if it's worth catching. But it's probably a not so
uncommon bug.


ones.

use foreach, or for with local index variable (that D doesn't allow you to
redefine). Once in a while that kind of code is correct, maybe.

know if D2 wants to actually turn this correct code into a real compile-time
error. But it's worth thinking more about it.

think it's not a so common bug.

So:


another related bug, with class/struct members assignments,
http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).

anyway.

Bye,
bearophile

May 05 2011

Don <nospam nospam.com> writes:

bearophile wrote:
 Through Reddit I've seen another document that shows common little bugs in
C/C++ code found by a static analysis tool. The bugs shown seem to be real,
from real software:
 http://www.slideshare.net/Andrey_Karpov/add2011-en
 
 From the slides I have selected seven of the bugs, here translated to D2:
 
 
 enum uint FLAG1 = 0b001;
 void main() {
     int x = 1000;
     int y = 1;

 
     uint flags;

 

 
     int i;
     for (i = 0; i < 10; i++)

 

 

 

 }
 
 
 Notes about the bugs:



I would say that disallowing it would _always_ lead to clearer code. It 
should always be rewritten as:
if (!y && x)  // wasn't a bug, but was unclear
OR
if (!(y & x)) // was a bug
OR in the one-in-a-million case where the existing code is correct:
if ( (!y) & x )  // wasn't a bug, but even this case is clearer: x MUST 
be evaluated


a really complicated way (allowing boolean expressions, but not integral 
ones other than UnaryExpressions). And that seems quite hard to justify.

But I don't find the other cases convincing. They don't seem any more 

if (1 || y) {}
is common and completely legitimate. Disallowing that would be very 
annoying.

May 06 2011

bearophile <bearophileHUGS lycos.com> writes:

Don:

 I would say that disallowing it would _always_ lead to clearer code.

OK. It's in Bugzilla since some time:
http://d.puremagic.com/issues/show_bug.cgi?id=5409
(But if we want to disallow something then we have to list exactly what syntax
are not allowed.)



 a really complicated way (allowing boolean expressions, but not integral
 ones other than UnaryExpressions). And that seems quite hard to justify.

I think in this case introducing complex rules is not good.



auto z = x * (y ? 10 : 20);


 But I don't find the other cases convincing. They don't seem any more

 if (1 || y) {}
 is common and completely legitimate. Disallowing that would be very
 annoying.

I agree that sometimes it's not a bug (example: the 1 is the __ctfe
pseudo-constant). Always true or always false boolean conditions (unless they
are just a single "true", "false", "0" or "1" literal) are sometimes associated
with bugs. Good C lints give just a warning in this case, no error.
I agree that statically disallowing this in all cases is too much harsh.

------------------


probably a bug.

A similar case (auto-assign):
x = x;

Or even (auto-assign, through alias):
alias x y;
x = y;

Or even (repeated assign):
x = 10;
x = 10;

(repeated assign, this happens often enough, the second variable is not x):
x = Foo.barx;
x = Foo.bary;

On the other hand automatic code generation (in CTFE) sometimes produces some
redundancy in code, this may cause some false positives. But similar
redundancies that often hide bugs. So I think this class of troubles deserves
more thinking.


A common bug in my code is this one, I have had something like this six times
or more:
http://d.puremagic.com/issues/show_bug.cgi?id=3878

Bye,
bearophile

May 06 2011

Timon Gehr <timon.gehr gmx.ch> writes:

 Through Reddit I've seen another document that shows common little bugs in
C/C++

code found by a static analysis tool. The bugs shown seem to be real, from real
software:
 http://www.slideshare.net/Andrey_Karpov/add2011-en

 From the slides I have selected seven of the bugs, here translated to D2:


 enum uint FLAG1 = 0b001;
 void main() {
     int x = 1000;
     int y = 1;


     uint flags;




     int i;
     for (i = 0; i < 10; i++)







 }


 Notes about the bugs:


Actually


flags.




it's probably a bug.

it's

probably a bug.


operator and the * (mult) operator.
 Notes about D2:


D2 language (we have discussed this bug in past too, but the catching code is
not
present yet). I think D2 may require parentheses here.


know how to catch it, or if it's worth catching. But it's probably a not so
uncommon bug.




not



The problem is that generated/CTFE'd code might produce that kind of
redundancy. I
think it would be possible to just catch plain user-written literals, but then,




use foreach, or for with local index variable (that D doesn't allow you to
redefine). Once in > a while that kind of code is correct, maybe.

This is already disallowed (deprecated).



know if D2 wants to actually turn this correct code into a real compile-time
error. But it's worth thinking more about it.

Proving that two sub-expressions are equivalent is a hard task. Has anyone ever
experienced a bug that was hard to find similar to those toy examples? Well,
maybe
if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?



think it's not a so common bug.

You can write (most) code without ?:. But if a programmer wants to use it, he
has
to be familiar with its precedence rules (very simple, lower precedence than
anything but = and ,). You cannot "fix" it, because any code relying on the
precedence (about any code that uses ?: without using many excessive parens)
will
get broken, which includes most of my programs. It would be possible to disallow
?: outside a (), [], ',' (comma), or ?: expression by just a small change in
grammar, so it would be feasible. I as an excessive user of ?: am very much
against it though.

A little anecdote: Apparently, ?: are not only a source of bugs in user code,
but
in compilers as well:

int a,b,c,d,e;
int*p=&(a?b?c:d:e);

See http://d.puremagic.com/issues/show_bug.cgi?id=5799.

 So:


I think requiring parentheses (providing undefined or ambiguous precedence
rules)
is ugly. D has done it before and it seems reasonable to do so, but still ugly.
It
also implies that the precedence rules would be a design mistake if not for
compatibility with C. What is the great catch about having & defined on (bool,
int) anyways?



another related bug, with class/struct members assignments,
http://d.puremagic.com/issues/show_bug.cgi?id=3878 ).

anyway.

What property makes them so bug-prone? I think they are quite easy to use.

 Bye,
 bearophile

Timon

May 06 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Timon Gehr wrote:

usually

 use foreach, or for with local index variable (that D doesn't allow you to
 redefine). Once in > a while that kind of code is correct, maybe.

 This is already disallowed (deprecated).

Oh, it is not. I thought it was because I always use loop local counter
variables.
But again, where to make the cut? More complex nested loops may legitimately
want
to access the same variable.

May 06 2011

bearophile <bearophileHUGS lycos.com> writes:

Timon Gehr:

 The problem is that generated/CTFE'd code might produce that kind of
redundancy.

Right.


 Proving that two sub-expressions are equivalent is a hard task.

A simple equivalence is enough here: the code with normalized whitespace. Those
tools are probably doing something not much complex than this.


 Has anyone ever
 experienced a bug that was hard to find similar to those toy examples? Well,
maybe
 if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?

The examples I've shown are present in real code. See the results of the
Coccinelle project too.


 But if a programmer wants to use it, he has
 to be familiar with its precedence rules (very simple, lower precedence than
 anything but = and ,).

In computer languages there are plenty simple things that are bug-prone.


 I as an excessive user of ?: am very much against it though.

It's not advisable to use the ?: operator a *lot* :-)


 I think requiring parentheses (providing undefined or ambiguous precedence
rules)
 is ugly. D has done it before and it seems reasonable to do so, but still ugly.

I don't see better solutions given the D Zen contraints. And in this case
adding parentheses may be better than doing nothing.

http://d.puremagic.com/issues/show_bug.cgi?id=5409


 It also implies that the precedence rules would be a design mistake if not for
compatibility with C.

I don't fully understand.


 What property makes them so bug-prone?

I am not sure, but I think it encourages a too much compact code style (and you
have to be a bit careful with operator precedence too). Probably there is some
optimum for compactness of C-like code. The sources of the first J interprer is

the point of not allowing an easy read.


 I think they are quite easy to use.

Right, the ?: operator is easy to use.

Thank you for your comments, bye,
bearophile

May 06 2011

Timon Gehr <timon.gehr gmx.ch> writes:

 Timon Gehr:

 The problem is that generated/CTFE'd code might produce that kind of
redundancy.

 Right.


 Proving that two sub-expressions are equivalent is a hard task.

 A simple equivalence is enough here: the code with normalized whitespace. Those

tools are probably doing something not much complex than this.
 Has anyone ever
 experienced a bug that was hard to find similar to those toy examples? Well,
maybe
 if(x||y||z||a||b||x||d){/*do stuff*/}, but who writes such code?

 The examples I've shown are present in real code. See the results of the

Coccinelle project too.


Okay, I see. Lexical equivalence can catch the infamous copy-paste bugs. An
alternative is to never ever copy-paste code. But unfortunately the compiler
cannot enforce this, catching only the most trivial cases. I think it might be a
good idea though. However, having this feature means requiring one AST compare
for
every boolean operator.

 But if a programmer wants to use it, he has
 to be familiar with its precedence rules (very simple, lower precedence than
 anything but = and ,).

 In computer languages there are plenty simple things that are bug-prone.


 I as an excessive user of ?: am very much against it though.

 It's not advisable to use the ?: operator a *lot* :-)

I use it whenever I would have to write either
if(c) x=y;else x=z; // => x=c?y:z;
or
if(c) x=z;else y=z; // => c?x:y=z;

This reduces code duplication, mostly when x or z are complicated expressions. I
do not use it often in larger arithmetical expressions though.

Unfortunately, D's conditional operator is not quite as mighty as C++'s (Eg, it
cannot throw an exception). On the plus side, it is simple (0.5 p in TDPL, where
C++ allocates 1.5 p in the standard to it).

I also like g++ style x?:y;

 I think requiring parentheses (providing undefined or ambiguous precedence
rules)
 is ugly. D has done it before and it seems reasonable to do so, but still ugly.

 I don't see better solutions given the D Zen contraints. And in this case
adding

parentheses may be better than doing nothing.

I agree. You usually don't want to do this anyways.


 http://d.puremagic.com/issues/show_bug.cgi?id=5409


 It also implies that the precedence rules would be a design mistake if not for


compatibility with C.
 I don't fully understand.

The fact that that kind of manual disambiguation is necessary implies that the
precedence rules are counter-intuitive to many programmers. Eg if a&b/a<<b would
behave more like a*b than a&&b/wtf, I think there would be less precedence
related
bugs. But of course, this breaks compatibility with C. If however C never
existed,
the precedence rules would be suboptimal.

 What property makes them so bug-prone?

 I am not sure, but I think it encourages a too much compact code style (and you

have to be a bit careful with operator precedence too). Probably there is some
optimum for
 compactness of C-like code. The sources of the first J interprer is surely too


of
not allowing
 an easy read.

Afaik the average programmer introduces a new bug every 20 lines of code or so.
Therefore, you better get your work done in less than that. ;)

line of horizontal space does not make this any better.

Unfortunately, balancing the compactness in an optimal way is not possible,
because the optimum varies between programmers. I appreciate the fact, that in D
most likely "very compact" is the optimal choice most of the time.

 I think they are quite easy to use.

 Right, the ?: operator is easy to use.

 Thank you for your comments, bye,
 bearophile

The best feature to crush bugs would still be automated model checking, let's
hope
good program provers will be available soon =).


Timon

May 06 2011

bearophile <bearophileHUGS lycos.com> writes:

Timon Gehr:

 I think it might be a
 good idea though. However, having this feature means requiring one AST compare
for
 every boolean operator.

You are worried about compilation time. I think the feature we're talking about
just tests the equivalence of the then/else clauses.

Clang has a --analyze switch that runs the (potentially slow) static analyser,
otherwise it performs a normal amount of static test on the code. This gives
you choice between a faster compilation and a slower analysis able to find some
other bugs.

Bye,
bearophile

May 06 2011

Timon Gehr <timon.gehr gmx.ch> writes:

 Timon Gehr:

 I think it might be a
 good idea though. However, having this feature means requiring one AST compare
for
 every boolean operator.

 You are worried about compilation time. I think the feature we're talking about

just tests the equivalence of the then/else clauses.
 Clang has a --analyze switch that runs the (potentially slow) static analyser,

otherwise it performs a normal amount of static test on the code. This gives you
choice between a faster compilation and a slower analysis able to find some
other
bugs.
 Bye,
 bearophile

I think Walter does not like introducing new switches, because it doubles the
number of compilers to test and maintain.

Timon

May 07 2011

D Programming

C/C++ Programming

Other

digitalmars.D - C++ static analysis