www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Clang static analysis results for dmd

reply Robert Clipsham <robert octarineparrot.com> writes:
Hi all,

I took the liberty of running dmd through clang's static analysis, and 
it's turned up quite a few (486) potential bugs in dmd:

http://octarineparrot.com/assets/dmd/

There's bound to be a few false positives in there (please feel free to 
report them to the nice folk over at llvm), and the dead code won't 
cause any problems (and most of the "dead code" in the backend is 
probably in use with dmc or something), but there are still a few 
hundred potential crashes in there.

The link above is rather slow - if anyone wants a local copy to work 
with let me know and I'll see about getting one to you... The compressed 
source to those pages is ~206MB, it's about 2.8GB without compression!

-- 
Robert
http://octarineparrot.com/
Jul 24 2011
next sibling parent Robert Clipsham <robert octarineparrot.com> writes:
On 24/07/2011 22:06, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

 The link above is rather slow - if anyone wants a local copy to work
 with let me know and I'll see about getting one to you... The compressed
 source to those pages is ~206MB, it's about 2.8GB without compression!

Question: Is it worth making bug reports out of any of these? If so, which, and should I group them? -- Robert http://octarineparrot.com/
Jul 24 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/24/11 4:06 PM, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

Very interesting. I selected randomly one of each category and found only false positives though, mostly because the tool does not understand that assert(p) will subsequently guarantee p is non-null, and assert(0) terminates the program. Andrei
Jul 24 2011
parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 24/07/2011 22:29, Andrei Alexandrescu wrote:
 On 7/24/11 4:06 PM, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

Very interesting. I selected randomly one of each category and found only false positives though, mostly because the tool does not understand that assert(p) will subsequently guarantee p is non-null, and assert(0) terminates the program. Andrei

That's true for quite a few of the null pointer dereferences, there are some where there are no assertions though. It might be worth reporting these false positives to the folk working on it. In the other categories there are far fewer where they're false positives due to the assertion. It tends to be the ones with shorter path lengths that exhibit this problem. -- Robert http://octarineparrot.com/
Jul 24 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/24/2011 2:43 PM, Robert Clipsham wrote:
 That's true for quite a few of the null pointer dereferences, there are some
 where there are no assertions though. It might be worth reporting these false
 positives to the folk working on it. In the other categories there are far
fewer
 where they're false positives due to the assertion. It tends to be the ones
with
 shorter path lengths that exhibit this problem.

dmd uses its own definition of assert(). It is marked as: #pragma noreturn(util_assert) in tassert.h. If there is an equivalent for clang, I suggest adding that in and rerunning the analysis. Getting rid of hundreds of false positives will save a lot of time.
Jul 24 2011
parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 24/07/2011 22:59, Walter Bright wrote:
 On 7/24/2011 2:43 PM, Robert Clipsham wrote:
 That's true for quite a few of the null pointer dereferences, there
 are some
 where there are no assertions though. It might be worth reporting
 these false
 positives to the folk working on it. In the other categories there are
 far fewer
 where they're false positives due to the assertion. It tends to be the
 ones with
 shorter path lengths that exhibit this problem.

dmd uses its own definition of assert(). It is marked as: #pragma noreturn(util_assert) in tassert.h. If there is an equivalent for clang, I suggest adding that in and rerunning the analysis. Getting rid of hundreds of false positives will save a lot of time.

http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_unreachable If you'd do the honours of adding this in I'll rerun the analysis next time I get a chance. -- Robert http://octarineparrot.com/
Jul 24 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/24/2011 3:11 PM, Robert Clipsham wrote:
 http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_unreachable

 If you'd do the honours of adding this in I'll rerun the analysis next time I
 get a chance.

Done.
Jul 24 2011
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham:

 I took the liberty of running dmd through clang's static analysis,

Thank you, this will be useful, and hopefully it will have some good consequences :-) (I'm reading a random sample of the bugs.) Bye, bearophile
Jul 24 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/24/2011 2:06 PM, Robert Clipsham wrote:
 I took the liberty of running dmd through clang's static analysis, and it's
 turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to report
 them to the nice folk over at llvm), and the dead code won't cause any problems
 (and most of the "dead code" in the backend is probably in use with dmc or
 something), but there are still a few hundred potential crashes in there.

Looks like clang has some bugs itself. http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath if ((unsigned short)value != value) 6 Both operands to '!=' always have the same value No, that's not true. value is a uint64_t, and casting it to unsigned short and then back to uint64_t does not yield the same value.
Jul 24 2011
parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 25/07/2011 03:10, Walter Bright wrote:
 Looks like clang has some bugs itself.

 http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath


 if ((unsigned short)value != value)

 6 Both operands to '!=' always have the same value

 No, that's not true. value is a uint64_t, and casting it to unsigned
 short and then back to uint64_t does not yield the same value.

It does if value is always within the range of an unsigned short. I'm not sure whether it is in this case or not, if not we need to narrow down a test case and report a bug. -- Robert http://octarineparrot.com/
Jul 26 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/26/2011 11:01 AM, Robert Clipsham wrote:
 On 25/07/2011 03:10, Walter Bright wrote:
 Looks like clang has some bugs itself.

 http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath


 if ((unsigned short)value != value)

 6 Both operands to '!=' always have the same value

 No, that's not true. value is a uint64_t, and casting it to unsigned
 short and then back to uint64_t does not yield the same value.

It does if value is always within the range of an unsigned short. I'm not sure whether it is in this case or not, if not we need to narrow down a test case and report a bug.

It's not a bug in dmc. The code is quite deliberate and necessary. The bug is in clang.
Jul 26 2011
parent Robert Clipsham <robert octarineparrot.com> writes:
On 26/07/2011 19:08, Walter Bright wrote:
 It's not a bug in dmc. The code is quite deliberate and necessary. The
 bug is in clang.

$ cat test.c #include <stdint.h> #include <stdlib.h> int main(int argc, char *argv[]) { uint64_t value = atoi(argv[0]); if ((unsigned short)value != value) { return 0; } return 1; } $ scan-build clang -g test.c Note that this leads to no errors from the analyzer - it will only flag it if it knows statically whether value fits within an unsigned short or not. We'll see if it's still flagged when I re-run it. -- Robert http://octarineparrot.com/
Jul 26 2011
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Robert Clipsham wrote:
 Hi all,
 
 I took the liberty of running dmd through clang's static analysis, and 
 it's turned up quite a few (486) potential bugs in dmd:
 
 http://octarineparrot.com/assets/dmd/
 
 There's bound to be a few false positives in there (please feel free to 
 report them to the nice folk over at llvm), and the dead code won't 
 cause any problems (and most of the "dead code" in the backend is 
 probably in use with dmc or something), but there are still a few 
 hundred potential crashes in there.
 
 The link above is rather slow - if anyone wants a local copy to work 
 with let me know and I'll see about getting one to you... The compressed 
 source to those pages is ~206MB, it's about 2.8GB without compression!
 

early build steps optabgen.c, etc. Either that, or else it doesn't realize that the arrays in those files (eg, optab1[] in optab.c) are constant values. Every backend report that I looked at, was a false positive caused by this.
Jul 25 2011
parent Robert Clipsham <robert octarineparrot.com> writes:
On 25/07/2011 13:05, Don wrote:
 It seems to have ignored the backend source files which are generated by
 early build steps optabgen.c, etc. Either that, or else it doesn't
 realize that the arrays in those files (eg, optab1[] in optab.c) are
 constant values.
 Every backend report that I looked at, was a false positive caused by this.

It has indeed ignored them. This is a consequence of the way I was running the build, I'll see if I can sort this out when I run it again. -- Robert http://octarineparrot.com/
Jul 26 2011
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Robert Clipsham wrote:
 Hi all,
 
 I took the liberty of running dmd through clang's static analysis, and 
 it's turned up quite a few (486) potential bugs in dmd:
 
 http://octarineparrot.com/assets/dmd/
 
 There's bound to be a few false positives in there (please feel free to 
 report them to the nice folk over at llvm), and the dead code won't 
 cause any problems (and most of the "dead code" in the backend is 
 probably in use with dmc or something), but there are still a few 
 hundred potential crashes in there.
 
 The link above is rather slow - if anyone wants a local copy to work 
 with let me know and I'll see about getting one to you... The compressed 
 source to those pages is ~206MB, it's about 2.8GB without compression!
 

Here's a clang bug: http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath 3436 if (!exp) <9> Taking false branch 3437 fd->nrvo_can = 0; 3438 3439 if (exp) <10> Taking false branch 3440 { That is, (!exp) is false, and (exp) is also false.
Jul 27 2011
next sibling parent Don <nospam nospam.com> writes:
Don wrote:
 Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and 
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free 
 to report them to the nice folk over at llvm), and the dead code won't 
 cause any problems (and most of the "dead code" in the backend is 
 probably in use with dmc or something), but there are still a few 
 hundred potential crashes in there.

 The link above is rather slow - if anyone wants a local copy to work 
 with let me know and I'll see about getting one to you... The 
 compressed source to those pages is ~206MB, it's about 2.8GB without 
 compression!

Here's a clang bug: http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath 3436 if (!exp) <9> Taking false branch 3437 fd->nrvo_can = 0; 3438 3439 if (exp) <10> Taking false branch 3440 { That is, (!exp) is false, and (exp) is also false.

Another one: http://octarineparrot.com/assets/dmd/report-bksOGf.html#EndPath Quite bizarre -- it seems to think the static member array Type::sizeTy[TMAX] is a null pointer. I finally found one genuine DMD bug report: http://d.puremagic.com/issues/show_bug.cgi?id=6389 But it looks to me as though the reports show more bugs in clang, than in DMD.
Jul 27 2011
prev sibling parent Trass3r <un known.com> writes:
Am 27.07.2011, 10:22 Uhr, schrieb Don <nospam nospam.com>:
 Here's a clang bug:
 http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath

 3436 if (!exp)
 	
       <9> Taking false branch
 3437	fd->nrvo_can = 0;
 3438	
 3439	if (exp)
 	
       <10> Taking false branch
 3440	{


 That is, (!exp) is false, and (exp) is also false.

Yeah, though that code should really use else instead of another check.
Jul 27 2011
prev sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 24/07/2011 22:06, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

 The link above is rather slow - if anyone wants a local copy to work
 with let me know and I'll see about getting one to you... The compressed
 source to those pages is ~206MB, it's about 2.8GB without compression!

I've posted updated results: http://octarineparrot.com/assets/dmd/ You'll probably need to hit refresh before the new results come up. Note that in addition to Walter's global.h changes, __attribute__((noreturn)) needed to be added to local_assert and util_assert in tassert.h too (could you add these too Walter?) -- Robert http://octarineparrot.com/
Jul 27 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham:

 I've posted updated results:

I'd like all those tests done by the D compiler too (Clang doesn't perform those tests on default because they take time, you have to add the --analyze compiler switch). Some of them like Dead store look better as warnings, other ones seem better as errors. Bye, bearophile
Jul 27 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/27/2011 3:50 PM, bearophile wrote:
 I'd like all those tests done by the D compiler too (Clang doesn't perform
 those tests on default because they take time, you have to add the --analyze
 compiler switch). Some of them like Dead store look better as warnings, other
 ones seem better as errors.

The signal to noise ratio of this kind of flow analysis is rather poor. While it does find some legitimate bugs, the rate of false positives is far too high to be a standard part of the language. I would agree that adding extra conditionals to the source code will both eliminate the false positives and make the code more readable, but those extra conditionals exact a performance penalty and would not be something a high performance coder would want. I originally did have stuff like this in the optimizer, but removed it because the false positive rate was untenable.
Jul 28 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

The signal to noise ratio of this kind of flow analysis is rather poor. While
it does find some legitimate bugs, the rate of false positives is far too high
to be a standard part of the language.<

I don't fully understand your answers on this. I am a bit confused, but it's not your fault. In Clang those tests aren't a standard part of the C or C++ languages. They are extra tests, like a lint tool built in the compiler, and they aren't a part of the normal compilation (if you use --analyze it doesn't produce a compiled binary, but an HTML of the test results). I take a look at a random sample of the first groups of the results: --------------------- Some Dead code, Idempotent operation: uRegmask3 = ASM_GET_uRegmask(popnd3->usFlags); Value stored to 'uRegmask3' is never read ty = ta->ty; Value stored to 'ty' is never read --------------------- Some Dead code, dead assignment: uSizemask3 = ASM_GET_uSizemask(popnd3->usFlags); Value stored to 'uSizemask3' is never read s = retregs & mES; Value stored to 's' is never read --------------------- Some Dead code, Dead increment: offset += vtblInterfaces->dim * (4 * PTRSIZE); Value stored to 'offset' is never read flags |= 1; // already deduced, so don't to toHeadMutable() Value stored to 'flags' is never read --------------------- Dead store Dead initialization: TY tyto = t->toBasetype()->ty; Value stored to 'tyto' during its initialization is never read int aimports_dim = aimports.dim; Value stored to 'aimports_dim' during its initialization is never read --------------------- Logic error Assigned value is garbage or undefined: Parameters *FuncDeclaration::getParameters(int *pvarargs) 2918 { Parameters *fparameters; 2919 int fvarargs; 2920 2921 if (type) 1 Taking false branch 2922 { 2923 assert(type->ty == Tfunction); 2924 TypeFunction *fdtype = (TypeFunction *)type; 2925 fparameters = fdtype->parameters; 2926 fvarargs = fdtype->varargs; 2927 } 2928 if (pvarargs) 2 Taking true branch 2929 *pvarargs = fvarargs; 3 Assigned value is garbage or undefined --------------------- Logic error Dereference of undefined pointer value: STATIC void ivfamelems(register Iv *biv,register elem **pn) 2447 { register unsigned op; 2448 register tym_t ty,c2ty; 2449 register famlist *f; 2450 register elem *n,*n1,*n2; 2451 2452 assert(pn); 2453 n = *pn; 2454 assert(biv && n); 2455 op = n->Eoper; 2456 if (OTunary(op)) 1 Taking true branch 2457 { ivfamelems(biv,&n->E1); 2458 n1 = n->E1; 2459 } 2460 else if (OTbinary(op)) 2461 { ivfamelems(biv,&n->E1); 2462 ivfamelems(biv,&n->E2); /* LTOR or RTOL order is unimportant */ 2463 n1 = n->E1; 2464 n2 = n->E2; 2465 } 2466 else /* else leaf elem */ 2467 return; /* which can't be in the family */ 2468 2469 if (op == OPmul || op == OPadd || op == OPmin || 2 Taking true branch 2470 op == OPneg || op == OPshl) 2471 { /* Note that we are wimping out and not considering */ 2472 /* LI variables as part of c1 and c2, but only constants. */ 2473 2474 ty = n->Ety; 2485 2486 /* If we have (li + var), swap the leaves. */ 2487 if (op == OPadd && isLI(n1) && n1->Eoper == OPvar && n2->Eoper == OPvar) 3 Dereference of undefined pointer value -------------------- 2819 targ_ldouble el_toldouble(elem *e) 2820 { targ_ldouble result; 2821 2822 elem_debug(e); 2823 assert(cnst(e)); 2824 #if TX86 2825 switch (tybasic(typemask(e))) 1 'Default' branch taken. Execution continues on line 2860 2860 return result; 2 Undefined or garbage value returned to caller 2861 } ----------------------- Is Clang correct there, or are those false positives? If it's correct then I'd like the D compiler to tell me 100% of those I have listed here, even if not even one of those is a real bug. In some cases you store a value in a variable even if you know you will not use it (example: last iteration of a loop, to code simpler and shoter. But I'd like to know every time I do this. I like to write tidy code. Returning values that can be undefined is less easy to catch in D, because the language initializes variables, to their initial default value is sometimes what the programmer wants.
I would agree that adding extra conditionals to the source code will both
eliminate the false positives and make the code more readable, but those extra
conditionals exact a performance penalty and would not be something a high
performance coder would want. I originally did have stuff like this in the
optimizer, but removed it because the false positive rate was untenable.<

I don't fully understand what you are saying, but is __assume useful here? http://msdn.microsoft.com/en-us/library/1b3fsfxw%28VS.80%29.aspx Bye, bearophile
Jul 28 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2011 3:27 PM, bearophile wrote:
 In Clang those tests aren't a standard part of the C or C++ languages. They
 are extra tests, like a lint tool built in the compiler, and they aren't a
 part of the normal compilation (if you use --analyze it doesn't produce a
 compiled binary, but an HTML of the test results).

If they are part of the compiler, I guarantee you that people will regard it as a standard part of the language, and the complaints about false positives will cause problems.
 Some Dead code, Idempotent operation:

Dead code is not a bug. It's more of a stylistic issue, and sometimes that "dead" code is needed in other code that has been #if'd out. The compiler complaining about dead code is also a nuisance when turning on and off sections of code that is a normal part of the dev process.
 Is Clang correct there, or are those false positives? If it's correct then
 I'd like the D compiler to tell me 100% of those I have listed here, even if
 not even one of those is a real bug.

I've been slowly going through the reports, and so far all of them have been false positives. Don found one that's a real bug, but I haven't gotten to it yet. Here's an example of a false positive - clang complains the comparison is idempotent: size_t e2factor; ... if (e2factor == (int)e2factor) For a 32 bit compile, yes, it's idempotent. But not for a 64 bit compile! I *want* it to be a no-op for a 32 bit compile and to become active in a 64 bit compile. Of course, I could also use an ugly #ifdef, but I like that little idiom, it works, and it is correct. I know why clang is doing what it is doing, but that shows a weakness in its static analysis. There are other false positives for things like assigning an uninitialized value to a field in a data structure that will never be used in the cases where it is uninitialized. I could add a conditional, but that's slower than just assigning it anyway. Trying to figure these things out with static analysis is impossible - it would be solving the halting problem - hence you're stuck with false positives.
Jul 28 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Take a look at all the complaints about evalu8.c. They're all false positives.

Consider that this code has been in use for 25 years now. It's been used by 
several hundred thousand developers. If it was so chock full of scores of 
undefined values and null pointer dereferences, I probably would have heard 
about it by now!
Jul 28 2011
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 I've been slowly going through the reports, and so far all of them have been 
 false positives.

Somewhere I have read that the errors found by Clang are so often true bugs they have a tool that submits bug reports automatically. I now presume they were wrong. But keep in mind that Clang static analysis is a *very young* sub-tool. It's something like a year or so old. There are commercial C/C++ lints that are probably about or more than 20 years old that are probably less buggy and more precise than the Clang compiler, and they find far more kinds of problems in the code. It often finds issues in my code (usually stylistic issues, not true bugs, but I usually agree with its advice and change my code).
 There are other false positives for things like assigning an uninitialized
value 
 to a field in a data structure that will never be used in the cases where it
is 
 uninitialized. I could add a conditional, but that's slower than just
assigning 
 it anyway. Trying to figure these things out with static analysis is
impossible 
 - it would be solving the halting problem - hence you're stuck with false
positives.

Probably there are ways for a programmer+language to tell such simple semantics to a compiler, but C language is not good enough for this. Thank you for your answers. It's always interesting when "theory" (of practical-purposed software tools) meets the road of the experiment reality. I hope Clang has not wasted too much of your time. Bye, bearophile
Jul 28 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/28/2011 5:19 PM, bearophile wrote:
 Thank you for your answers. It's always interesting when "theory" (of
 practical-purposed software tools) meets the road of the experiment reality.
 I hope Clang has not wasted too much of your time.

The thing is, I've tried the things Clang tries way back in the 80's. It's attempting the impossible. C is just not amenable to static proofs (if I were a better theoretician I might be able to show that this is partially related to C being memory unsafe). Even so, I still plan to go through them all just to be sure. Clang does put out a nice report, though, far better than I attempted. A kickass bug finding tool that really *does* work on C/C++ code is valgrind.
Jul 28 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Walter:

 The thing is, I've tried the things Clang tries way back in the 80's. It's 
 attempting the impossible.

"Unused variables", "dead assignments", "dead last assignments" aren't bugs and they are sometimes acceptable while you debug some code, but you too agree they are better removed from code meant to be good and tidy (also because they sometimes associated with bugs). You have appreciated some of those "dead code" (useless variables) notes from Clang: https://github.com/D-Programming-Language/dmd/compare/c50a936...08aa57a Bye, bearophile
Jul 29 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Here's another one:

    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

B is true if and only if A is true. B can even be the same expression as A,
such 
as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I 
rewrite as:

    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced 
with some significant trial-and-error refactoring to get rid of the message.

At what point does this cease to be fixing "bugs" and become "contort to fit 
clang's quirks"?
Jul 29 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 1:30 PM, Brad Roberts wrote:
 2) if the tool has trouble analyzing the code, there's a not unreasonable
 chance a person also has trouble.  The above case is a good one where
 depending on how close those two if's are in the code and how obvious it
 is that B is a super set of A, it's the kind of thing someone's going to
 have trouble with too.

In general I agree with this, which is why I've made some changes to the source code to 'fix' some of the non-bugs identified by clang. I felt the changes made the code more readable.
 By and large though, this isn't the way I'd spend my time, unless you goal
 is to reduce test cases to feed into clang to improve it.  The
 cost/benefit ratio just doesn't meet the bar.

So far, two real bugs have been identified. This makes it worth one pass through the clang results, but as you say, the rate of false positives is so high it is not worth continuing to use it.
Jul 29 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 2:49 PM, Brad Roberts wrote:
 Two real, hitable, bugs?

Nobody has run into them yet.
Jul 29 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 1:35 PM, Trass3r wrote:
 Here's another one:

 T* p;
 ...
 if (A)
 p = &t;
 ...
 if (B)
 ... *p ...

 B is true if and only if A is true. B can even be the same expression as A,
 such as a->b->c. Clang complains on the *p that p is "uninitialized".

 At what point does this cease to be fixing "bugs" and become "contort to fit
 clang's quirks"?

Clang's static analysis isn't very mature yet.

It's not about maturity, it's simply impossible to correctly do the above case.
Jul 29 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 2:53 PM, Brad Roberts wrote:
 You keep using the term impossible.

It's the halting problem. How are you going to prove that a->b->c yields the same result after you've done pretty much any function call?
 It's impossible to be 100% correct in
 100% of the code bases.

When it is built into the compiler, I think it should be 100%. And frankly, the marketing of these false positive tools bugs me. They'll run their tool over some code base, then publish the results and make claims about all these hundreds of "bugs" they've found. Even worse, others will uncritically read those pages and assume those code bases are "buggy".
 Sure, but that's obviously not the goal.  The
 goal is to be correct enough on enough code to be useful.  A far more
 achievable goal.  Clang isn't there yet, but with time and effort it can
 improve towards that goal.  That's where producing test cases on things it
 fails on is useful.. standard bug reporting (or feature gap reporting in
 this case).

 If B can be shown to be a super set of A, which is very doable for a
 reasonable set of cases, then that false positive can be eliminated.

 Continuing to think of these tools as either perfect or useless and useful
 discussions about them are kinda hard to have.

There are two things one can try with static analysis: 1. Flag the code if one cannot prove it is good. 2. Flag the code if one can prove it is bad. clang is going for (1). I think (2) is more appropriate. BTW, a while back Spec# was touted here as having this great theorem contract prover, and what a great tool that was. Reading the documentation on it, it looked great. So I tried it out in pastebin. It pretty much only works on the examples given in the marketing literature. It couldn't even handle bitwise ops. I know enough about data flow analysis to infer what clang is doing from its pattern of reports, and it hasn't even begun to solve this problem. As for breaking new ground, I really like what D is doing with value range propagation in implicit conversions. So far it's been a solid base hit.
Jul 29 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 BTW, a while back Spec# was touted here as having this great theorem contract 
 prover, and what a great tool that was. Reading the documentation on it, it 
 looked great. So I tried it out in pastebin. It pretty much only works on the 
 examples given in the marketing literature. It couldn't even handle bitwise
ops.

You are missing something important. The purpose of Spec# (and newer things that are coming out now) is to create a small kernel (for an operating system) that is 100% proved correct (correct according to its specs, etc). If they don't use an automatic theorem prover, then they have to do 100% of the proofs by hand. Even if the tool is able to do only 30% of the proofs automatically (and another 40% with a bit of help) it's a lot of manual work saved. The papers report the percentage of the theorems done by the tool, the percentage of the theorems done partially by hand, and the remaining percentage done by hand. The source code is all available, to verify the numbers inside the papers are not made up.
 I know enough about data flow analysis to infer what clang is doing from its 
 pattern of reports, and it hasn't even begun to solve this problem.

Clang static analysis is a very new sub-tool. On the market there are commercial lint tools that are probably 15-18 times older than Clang. Give Clang two or three years are we'll see. Its creators have created a good C++ compiler in a very short time, and its back-end is already rivalling GCC in optimizations (it's not there yet, but the distance is not so much, auto-vectorization is one of the most important differences between the two back-ends). Bye, bearophile
Jul 29 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 4:51 PM, bearophile wrote:
 Walter:

 BTW, a while back Spec# was touted here as having this great theorem
 contract prover, and what a great tool that was. Reading the documentation
 on it, it looked great. So I tried it out in pastebin. It pretty much only
 works on the examples given in the marketing literature. It couldn't even
 handle bitwise ops.

You are missing something important. The purpose of Spec# (and newer things that are coming out now) is to create a small kernel (for an operating system) that is 100% proved correct (correct according to its specs, etc). If they don't use an automatic theorem prover, then they have to do 100% of the proofs by hand. Even if the tool is able to do only 30% of the proofs automatically (and another 40% with a bit of help) it's a lot of manual work saved. The papers report the percentage of the theorems done by the tool, the percentage of the theorems done partially by hand, and the remaining percentage done by hand. The source code is all available, to verify the numbers inside the papers are not made up.

From the way you originally positioned it, I had higher expectations.
 I know enough about data flow analysis to infer what clang is doing from
 its pattern of reports, and it hasn't even begun to solve this problem.

Clang static analysis is a very new sub-tool. On the market there are commercial lint tools that are probably 15-18 times older than Clang. Give Clang two or three years are we'll see.

I think you misunderstand the scope of this problem. They've done data flow analysis, an off-the-shelf technology for the last 30 years at least. They haven't done any of the hard stuff that I can see. It's like doing natural language translation using a dictionary. It looks like it'll work, and you can have promising early success with it, but then you hit a wall.
Jul 29 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/29/11 5:48 PM, Walter Bright wrote:
 As for breaking new ground, I really like what D is doing with value
 range propagation in implicit conversions. So far it's been a solid base
 hit.

But that's also an approximate thing :o). Andrei
Jul 29 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 5:48 PM, Andrei Alexandrescu wrote:
 On 7/29/11 5:48 PM, Walter Bright wrote:
 As for breaking new ground, I really like what D is doing with value
 range propagation in implicit conversions. So far it's been a solid base
 hit.

But that's also an approximate thing :o).

True, but it's done in a way that enables things rather than disallowing them. This is much more palatable.
Jul 29 2011
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 29.07.2011 22:02, schrieb Walter Bright:
 Here's another one:
 
    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 B is true if and only if A is true. B can even be the same expression as
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".
 Ok, so I rewrite as:
 
    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 but now clang says I'm dereferencing a NULL pointer. At this point, I'm
 faced with some significant trial-and-error refactoring to get rid of
 the message.
 
 At what point does this cease to be fixing "bugs" and become "contort to
 fit clang's quirks"?

but your code does not reflect your "B is true if and only if A is true." statement T* p = NULL; if (A) { p = &t; if (B) ... *p ... }
Jul 30 2011
parent reply Don <nospam nospam.com> writes:
dennis luehring wrote:
 Am 29.07.2011 22:02, schrieb Walter Bright:
 Here's another one:

    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 B is true if and only if A is true. B can even be the same expression as
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".
 Ok, so I rewrite as:

    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 but now clang says I'm dereferencing a NULL pointer. At this point, I'm
 faced with some significant trial-and-error refactoring to get rid of
 the message.

 At what point does this cease to be fixing "bugs" and become "contort to
 fit clang's quirks"?

but your code does not reflect your "B is true if and only if A is true." statement

The example I posted was clearer. Where p is just a variable: if (!p) { ... } if (p) { ... } it thinks that NEITHER of the code paths is taken. I would rate that as a show-stopper bug. Don't waste any more time on clang until that's fixed.
Jul 31 2011
parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 31.07.2011 11:47, schrieb Don:
 dennis luehring wrote:
 Am 29.07.2011 22:02, schrieb Walter Bright:
 Here's another one:

    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 B is true if and only if A is true. B can even be the same expression as
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".
 Ok, so I rewrite as:

    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 but now clang says I'm dereferencing a NULL pointer. At this point, I'm
 faced with some significant trial-and-error refactoring to get rid of
 the message.

 At what point does this cease to be fixing "bugs" and become "contort to
 fit clang's quirks"?

but your code does not reflect your "B is true if and only if A is true." statement

The example I posted was clearer.

but i think clang was right with Walters code - i do not understand why he splits up the A and B conditional block into spereated ones when B does not work if A wasn't true before... makes it sense (in exactly this case) to write the code like he does?
Jul 31 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/31/2011 3:28 AM, dennis luehring wrote:
 but i think clang was right with Walters code - i do not understand
 why he splits up the A and B conditional block into spereated ones when
 B does not work if A wasn't true before... makes it sense (in exactly
 this case) to write the code like he does?

What I meant was: "A" represents an expression "B" represents a different expression than "A", but is true if and only if "A" is true. For example: "A" might be: x=0;x=a->b->c?1:0;a->b->c "B" might be: x
Jul 31 2011
prev sibling next sibling parent Brad Roberts <braddr slice-2.puremagic.com> writes:
On Fri, 29 Jul 2011, Walter Bright wrote:

 Here's another one:
 
    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 B is true if and only if A is true. B can even be the same expression as A,
 such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I
 rewrite as:
 
    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced
 with some significant trial-and-error refactoring to get rid of the message.
 
 At what point does this cease to be fixing "bugs" and become "contort to fit
 clang's quirks"?

I look at it in (at last) two ways which are are contradictory.. 1) static analysis reported issues are of materialy less value than a bug report that comes with a repro case. reason: the analysis doesn't have a test case that can prove it stays fixed short of integrating the tool as part of the test process -- an almost pointless exercise unless there's additional tooling around it to mask out false positives. 2) if the tool has trouble analyzing the code, there's a not unreasonable chance a person also has trouble. The above case is a good one where depending on how close those two if's are in the code and how obvious it is that B is a super set of A, it's the kind of thing someone's going to have trouble with too. By and large though, this isn't the way I'd spend my time, unless you goal is to reduce test cases to feed into clang to improve it. The cost/benefit ratio just doesn't meet the bar. My 2 cents, Brad
Jul 29 2011
prev sibling next sibling parent Brad Roberts <braddr slice-2.puremagic.com> writes:
On Fri, 29 Jul 2011, Walter Bright wrote:

 On 7/29/2011 1:30 PM, Brad Roberts wrote:
 2) if the tool has trouble analyzing the code, there's a not unreasonable
 chance a person also has trouble.  The above case is a good one where
 depending on how close those two if's are in the code and how obvious it
 is that B is a super set of A, it's the kind of thing someone's going to
 have trouble with too.

In general I agree with this, which is why I've made some changes to the source code to 'fix' some of the non-bugs identified by clang. I felt the changes made the code more readable.
 By and large though, this isn't the way I'd spend my time, unless you goal
 is to reduce test cases to feed into clang to improve it.  The
 cost/benefit ratio just doesn't meet the bar.

So far, two real bugs have been identified. This makes it worth one pass through the clang results, but as you say, the rate of false positives is so high it is not worth continuing to use it.

Two real, hitable, bugs? I still look at cost/benefit.. in that same time a number of other things could be done that had at least as much direct benefit. Don't get me wrong, I really love static analysis tools, but ones that are mature and have mechanisms for managing the false positives. Later, Brad
Jul 29 2011
prev sibling parent Brad Roberts <braddr slice-2.puremagic.com> writes:
On Fri, 29 Jul 2011, Walter Bright wrote:

 On 7/29/2011 1:35 PM, Trass3r wrote:
 Here's another one:
 
 T* p;
 ...
 if (A)
 p = &t;
 ...
 if (B)
 ... *p ...
 
 B is true if and only if A is true. B can even be the same expression as
 A,
 such as a->b->c. Clang complains on the *p that p is "uninitialized".

 At what point does this cease to be fixing "bugs" and become "contort to
 fit
 clang's quirks"?

Clang's static analysis isn't very mature yet.

It's not about maturity, it's simply impossible to correctly do the above case.

You keep using the term impossible. It's impossible to be 100% correct in 100% of the code bases. Sure, but that's obviously not the goal. The goal is to be correct enough on enough code to be useful. A far more achievable goal. Clang isn't there yet, but with time and effort it can improve towards that goal. That's where producing test cases on things it fails on is useful.. standard bug reporting (or feature gap reporting in this case). If B can be shown to be a super set of A, which is very doable for a reasonable set of cases, then that false positive can be eliminated. Continuing to think of these tools as either perfect or useless and useful discussions about them are kinda hard to have. Later, Brad
Jul 29 2011
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/27/2011 3:08 PM, Robert Clipsham wrote:
 Note that in
 addition to Walter's global.h changes, __attribute__((noreturn)) needed to be
 added to local_assert and util_assert in tassert.h too (could you add these too
 Walter?)

I thought I did. Will check.
Jul 27 2011
prev sibling next sibling parent Trass3r <un known.com> writes:
 Here's another one:

     T* p;
     ...
     if (A)
         p = &t;
     ...
     if (B)
         ... *p ...

 B is true if and only if A is true. B can even be the same expression as  
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".

 At what point does this cease to be fixing "bugs" and become "contort to  
 fit clang's quirks"?

Clang's static analysis isn't very mature yet. Also Apple primarily pushes the Objective-C analysis development so they can integrate it into XCode.
Jul 29 2011
prev sibling parent reply Andrew Wiley <wiley.andrew.j gmail.com> writes:
--0016e68ee3edb2e5e804a93ec2e1
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jul 29, 2011 at 4:51 PM, bearophile <bearophileHUGS lycos.com>wrote:

 Walter:

 BTW, a while back Spec# was touted here as having this great theorem

 prover, and what a great tool that was. Reading the documentation on it,

 looked great. So I tried it out in pastebin. It pretty much only works on

 examples given in the marketing literature. It couldn't even handle

You are missing something important. The purpose of Spec# (and newer things that are coming out now) is to create a small kernel (for an operating system)

Woah Woah Woah If that's the purpose, why the hell would they be using C#? Have they proven the entire CLR as well?
 that is 100% proved correct (correct according to its specs, etc). If they
 don't use an automatic theorem prover, then they have to do 100% of the
 proofs by hand. Even if the tool is able to do only 30% of the proofs
 automatically (and another 40% with a bit of help) it's a lot of manual work
 saved. The papers report the percentage of the theorems done by the tool,
 the percentage of the theorems done partially by hand, and the remaining
 percentage done by hand. The source code is all available, to verify the
 numbers inside the papers are not made up.


 I know enough about data flow analysis to infer what clang is doing from

 pattern of reports, and it hasn't even begun to solve this problem.

Clang static analysis is a very new sub-tool. On the market there are commercial lint tools that are probably 15-18 times older than Clang. Give Clang two or three years are we'll see. Its creators have created a good C++ compiler in a very short time, and its back-end is already rivalling GCC in optimizations (it's not there yet, but the distance is not so much, auto-vectorization is one of the most important differences between the two back-ends).

I've seen several times that a major feature of a lint tool was that you could disable warnings of certain types, which is necessary because there are just too many false positives. Clang will doubtlessly get better, but the fact will still remain that C and C++ are terrible languages to try to run static analyses on. --0016e68ee3edb2e5e804a93ec2e1 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Fri, Jul 29, 2011 at 4:51 PM, bearophile <spa= n dir=3D"ltr">&lt;<a href=3D"mailto:bearophileHUGS lycos.com">bearophileHUG= S lycos.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" styl= e=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> Walter:<br> <div class=3D"im"><br> &gt; BTW, a while back Spec# was touted here as having this great theorem c= ontract<br> &gt; prover, and what a great tool that was. Reading the documentation on i= t, it<br> &gt; looked great. So I tried it out in pastebin. It pretty much only works= on the<br> &gt; examples given in the marketing literature. It couldn&#39;t even handl= e bitwise ops.<br> <br> </div>You are missing something important. The purpose of Spec# (and newer = things that are coming out now) is to create a small kernel (for an operati= ng system)</blockquote><div><br>Woah Woah Woah<br>If that&#39;s the purpose= , why the hell would they be using C#? Have they proven the entire CLR as w= ell?<br> =A0</div><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8= ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">that is = 100% proved correct (correct according to its specs, etc). If they don&#39;= t use an automatic theorem prover, then they have to do 100% of the proofs = by hand. Even if the tool is able to do only 30% of the proofs automaticall= y (and another 40% with a bit of help) it&#39;s a lot of manual work saved.= The papers report the percentage of the theorems done by the tool, the per= centage of the theorems done partially by hand, and the remaining percentag= e done by hand. The source code is all available, to verify the numbers ins= ide the papers are not made up.<br> <div class=3D"im"><br> <br> &gt; I know enough about data flow analysis to infer what clang is doing fr= om its<br> &gt; pattern of reports, and it hasn&#39;t even begun to solve this problem= .<br> <br> </div>Clang static analysis is a very new sub-tool. On the market there are= commercial lint tools that are probably 15-18 times older than Clang. Give= Clang two or three years are we&#39;ll see. Its creators have created a go= od C++ compiler in a very short time, and its back-end is already rivalling= GCC in optimizations (it&#39;s not there yet, but the distance is not so m= uch, auto-vectorization is one of the most important differences between th= e two back-ends).<font color=3D"#888888"><br> </font></blockquote><div><br>I&#39;ve seen several times that a major featu= re of a lint tool was that you could disable warnings of certain types, whi= ch is necessary because there are just too many false positives. Clang will= doubtlessly get better, but the fact will still remain that C and C++ are = terrible languages to try to run static analyses on. <br> </div></div><br> --0016e68ee3edb2e5e804a93ec2e1--
Jul 29 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Andrew Wiley:

 If that's the purpose, why the hell would they be using C#?

They are now trying again using F* (derived by F#, that is a ML-derived functional language), that has a Coq-proved type system that seems a tour the force: http://lambda-the-ultimate.org/node/4318
 Have they proven the entire CLR as well?

The project is unrelated to the CLR. But the people working on the Contracts for C# are (they say) slowly adding contracts to the whole CLR. Bye, bearophile
Jul 29 2011