digitalmars.D - Clang static analysis results for dmd

Robert Clipsham (15/15) Jul 24 2011 Hi all,

Robert Clipsham (6/18) Jul 24 2011 Question: Is it worth making bug reports out of any of these? If so,
Andrei Alexandrescu (6/15) Jul 24 2011 Very interesting. I selected randomly one of each category and found

Robert Clipsham (9/27) Jul 24 2011 That's true for quite a few of the null pointer dereferences, there are

Walter Bright (6/11) Jul 24 2011 dmd uses its own definition of assert(). It is marked as:

Robert Clipsham (7/23) Jul 24 2011 http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_unreachable

Walter Bright (2/5) Jul 24 2011 Done.

bearophile (5/6) Jul 24 2011 Thank you, this will be useful, and hopefully it will have some good con...
Walter Bright (7/14) Jul 24 2011 Looks like clang has some bugs itself.

Robert Clipsham (7/13) Jul 26 2011 It does if value is always within the range of an unsigned short. I'm

Walter Bright (3/18) Jul 26 2011 It's not a bug in dmc. The code is quite deliberate and necessary. The b...

Robert Clipsham (20/22) Jul 26 2011 $ cat test.c

Don (6/23) Jul 25 2011 It seems to have ignored the backend source files which are generated by...

Robert Clipsham (6/11) Jul 26 2011 It has indeed ignored them. This is a consequence of the way I was

Don (13/30) Jul 27 2011 Here's a clang bug:

Don (9/44) Jul 27 2011 Another one:
Trass3r (2/14) Jul 27 2011 Yeah, though that code should really use else instead of another check.

Robert Clipsham (10/22) Jul 27 2011 I've posted updated results:

bearophile (4/5) Jul 27 2011 I'd like all those tests done by the D compiler too (Clang doesn't perfo...

Walter Bright (10/14) Jul 28 2011 The signal to noise ratio of this kind of flow analysis is rather poor. ...

bearophile (119/121) Jul 28 2011 I don't fully understand your answers on this. I am a bit confused, but ...

Walter Bright (26/34) Jul 28 2011 If they are part of the compiler, I guarantee you that people will regar...

Walter Bright (5/5) Jul 28 2011 Take a look at all the complaints about evalu8.c. They're all false posi...
bearophile (7/14) Jul 28 2011 Somewhere I have read that the errors found by Clang are so often true b...

Walter Bright (8/11) Jul 28 2011 The thing is, I've tried the things Clang tries way back in the 80's. It...

bearophile (5/7) Jul 29 2011 "Unused variables", "dead assignments", "dead last assignments" aren't b...
Walter Bright (22/22) Jul 29 2011 Here's another one:

Brad Roberts (17/44) Jul 29 2011 I look at it in (at last) two ways which are are contradictory..

Walter Bright (7/15) Jul 29 2011 In general I agree with this, which is why I've made some changes to the...

Brad Roberts (8/27) Jul 29 2011 Two real, hitable, bugs?

Walter Bright (2/3) Jul 29 2011 Nobody has run into them yet.

Trass3r (4/16) Jul 29 2011 Clang's static analysis isn't very mature yet.

Walter Bright (2/18) Jul 29 2011 It's not about maturity, it's simply impossible to correctly do the abov...

Brad Roberts (14/37) Jul 29 2011 You keep using the term impossible. It's impossible to be 100% correct ...

Walter Bright (20/33) Jul 29 2011 It's the halting problem. How are you going to prove that a->b->c yields...

bearophile (5/11) Jul 29 2011 Clang static analysis is a very new sub-tool. On the market there are co...

Andrew Wiley (9/39) Jul 29 2011 Woah Woah Woah

bearophile (7/9) Jul 29 2011 They are now trying again using F* (derived by F#, that is a ML-derived ...

Walter Bright (8/29) Jul 29 2011 I think you misunderstand the scope of this problem. They've done data f...

Andrei Alexandrescu (3/6) Jul 29 2011 But that's also an approximate thing :o).

Walter Bright (3/8) Jul 29 2011 True, but it's done in a way that enables things rather than disallowing...

dennis luehring (10/38) Jul 30 2011 but your code does not reflect your "B is true if and only if A is

Don (8/40) Jul 31 2011 The example I posted was clearer.

dennis luehring (5/40) Jul 31 2011 but i think clang was right with Walters code - i do not understand

Walter Bright (8/12) Jul 31 2011 What I meant was:

Walter Bright (2/6) Jul 27 2011 I thought I did. Will check.

Robert Clipsham <robert octarineparrot.com> writes:

Hi all,

I took the liberty of running dmd through clang's static analysis, and 
it's turned up quite a few (486) potential bugs in dmd:

http://octarineparrot.com/assets/dmd/

There's bound to be a few false positives in there (please feel free to 
report them to the nice folk over at llvm), and the dead code won't 
cause any problems (and most of the "dead code" in the backend is 
probably in use with dmc or something), but there are still a few 
hundred potential crashes in there.

The link above is rather slow - if anyone wants a local copy to work 
with let me know and I'll see about getting one to you... The compressed 
source to those pages is ~206MB, it's about 2.8GB without compression!

-- 
Robert
http://octarineparrot.com/

Jul 24 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 24/07/2011 22:06, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

 The link above is rather slow - if anyone wants a local copy to work
 with let me know and I'll see about getting one to you... The compressed
 source to those pages is ~206MB, it's about 2.8GB without compression!

Question: Is it worth making bug reports out of any of these? If so, 
which, and should I group them?

-- 
Robert
http://octarineparrot.com/

Jul 24 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 7/24/11 4:06 PM, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

Very interesting. I selected randomly one of each category and found 
only false positives though, mostly because the tool does not understand 
that assert(p) will subsequently guarantee p is non-null, and assert(0) 
terminates the program.

Andrei

Jul 24 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 24/07/2011 22:29, Andrei Alexandrescu wrote:
 On 7/24/11 4:06 PM, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

 Very interesting. I selected randomly one of each category and found
 only false positives though, mostly because the tool does not understand
 that assert(p) will subsequently guarantee p is non-null, and assert(0)
 terminates the program.

 Andrei

That's true for quite a few of the null pointer dereferences, there are 
some where there are no assertions though. It might be worth reporting 
these false positives to the folk working on it. In the other categories 
there are far fewer where they're false positives due to the assertion. 
It tends to be the ones with shorter path lengths that exhibit this problem.

-- 
Robert
http://octarineparrot.com/

Jul 24 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/24/2011 2:43 PM, Robert Clipsham wrote:
 That's true for quite a few of the null pointer dereferences, there are some
 where there are no assertions though. It might be worth reporting these false
 positives to the folk working on it. In the other categories there are far
fewer
 where they're false positives due to the assertion. It tends to be the ones
with
 shorter path lengths that exhibit this problem.

dmd uses its own definition of assert(). It is marked as:

    #pragma noreturn(util_assert)

in tassert.h.

If there is an equivalent for clang, I suggest adding that in and rerunning the 
analysis. Getting rid of hundreds of false positives will save a lot of time.

Jul 24 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 24/07/2011 22:59, Walter Bright wrote:
 On 7/24/2011 2:43 PM, Robert Clipsham wrote:
 That's true for quite a few of the null pointer dereferences, there
 are some
 where there are no assertions though. It might be worth reporting
 these false
 positives to the folk working on it. In the other categories there are
 far fewer
 where they're false positives due to the assertion. It tends to be the
 ones with
 shorter path lengths that exhibit this problem.

 dmd uses its own definition of assert(). It is marked as:

 #pragma noreturn(util_assert)

 in tassert.h.

 If there is an equivalent for clang, I suggest adding that in and
 rerunning the analysis. Getting rid of hundreds of false positives will
 save a lot of time.



If you'd do the honours of adding this in I'll rerun the analysis next 
time I get a chance.

-- 
Robert
http://octarineparrot.com/

Jul 24 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/24/2011 3:11 PM, Robert Clipsham wrote:


 If you'd do the honours of adding this in I'll rerun the analysis next time I
 get a chance.

Done.

Jul 24 2011

bearophile <bearophileHUGS lycos.com> writes:

Robert Clipsham:

 I took the liberty of running dmd through clang's static analysis,

Thank you, this will be useful, and hopefully it will have some good
consequences :-)

(I'm reading a random sample of the bugs.)

Bye,
bearophile

Jul 24 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/24/2011 2:06 PM, Robert Clipsham wrote:
 I took the liberty of running dmd through clang's static analysis, and it's
 turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to report
 them to the nice folk over at llvm), and the dead code won't cause any problems
 (and most of the "dead code" in the backend is probably in use with dmc or
 something), but there are still a few hundred potential crashes in there.


Looks like clang has some bugs itself.

http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath


    if ((unsigned short)value != value)

    6 Both operands to '!=' always have the same value

No, that's not true. value is a uint64_t, and casting it to unsigned short and 
then back to uint64_t does not yield the same value.

Jul 24 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 25/07/2011 03:10, Walter Bright wrote:
 Looks like clang has some bugs itself.

 http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath


 if ((unsigned short)value != value)

 6 Both operands to '!=' always have the same value

 No, that's not true. value is a uint64_t, and casting it to unsigned
 short and then back to uint64_t does not yield the same value.

It does if value is always within the range of an unsigned short. I'm 
not sure whether it is in this case or not, if not we need to narrow 
down a test case and report a bug.

-- 
Robert
http://octarineparrot.com/

Jul 26 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/26/2011 11:01 AM, Robert Clipsham wrote:
 On 25/07/2011 03:10, Walter Bright wrote:
 Looks like clang has some bugs itself.

 http://octarineparrot.com/assets/dmd/report-ZGvM0T.html#EndPath


 if ((unsigned short)value != value)

 6 Both operands to '!=' always have the same value

 No, that's not true. value is a uint64_t, and casting it to unsigned
 short and then back to uint64_t does not yield the same value.

 It does if value is always within the range of an unsigned short. I'm not sure
 whether it is in this case or not, if not we need to narrow down a test case
and
 report a bug.

It's not a bug in dmc. The code is quite deliberate and necessary. The bug is
in 
clang.

Jul 26 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 26/07/2011 19:08, Walter Bright wrote:
 It's not a bug in dmc. The code is quite deliberate and necessary. The
 bug is in clang.

$ cat test.c
#include <stdint.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
     uint64_t value = atoi(argv[0]);
     if ((unsigned short)value != value)
     {
         return 0;
     }
     return 1;
}
$ scan-build clang -g test.c

Note that this leads to no errors from the analyzer - it will only flag 
it if it knows statically whether value fits within an unsigned short or 
not. We'll see if it's still flagged when I re-run it.

-- 
Robert
http://octarineparrot.com/

Jul 26 2011

Don <nospam nospam.com> writes:

Robert Clipsham wrote:
 Hi all,
 
 I took the liberty of running dmd through clang's static analysis, and 
 it's turned up quite a few (486) potential bugs in dmd:
 
 http://octarineparrot.com/assets/dmd/
 
 There's bound to be a few false positives in there (please feel free to 
 report them to the nice folk over at llvm), and the dead code won't 
 cause any problems (and most of the "dead code" in the backend is 
 probably in use with dmc or something), but there are still a few 
 hundred potential crashes in there.
 
 The link above is rather slow - if anyone wants a local copy to work 
 with let me know and I'll see about getting one to you... The compressed 
 source to those pages is ~206MB, it's about 2.8GB without compression!
 

It seems to have ignored the backend source files which are generated by 
early build steps optabgen.c, etc. Either that, or else it doesn't 
realize that the arrays in those files (eg, optab1[] in optab.c) are 
constant values.
Every backend report that I looked at, was a false positive caused by this.

Jul 25 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 25/07/2011 13:05, Don wrote:
 It seems to have ignored the backend source files which are generated by
 early build steps optabgen.c, etc. Either that, or else it doesn't
 realize that the arrays in those files (eg, optab1[] in optab.c) are
 constant values.
 Every backend report that I looked at, was a false positive caused by this.

It has indeed ignored them. This is a consequence of the way I was 
running the build, I'll see if I can sort this out when I run it again.

-- 
Robert
http://octarineparrot.com/

Jul 26 2011

Don <nospam nospam.com> writes:

Robert Clipsham wrote:
 Hi all,
 
 I took the liberty of running dmd through clang's static analysis, and 
 it's turned up quite a few (486) potential bugs in dmd:
 
 http://octarineparrot.com/assets/dmd/
 
 There's bound to be a few false positives in there (please feel free to 
 report them to the nice folk over at llvm), and the dead code won't 
 cause any problems (and most of the "dead code" in the backend is 
 probably in use with dmc or something), but there are still a few 
 hundred potential crashes in there.
 
 The link above is rather slow - if anyone wants a local copy to work 
 with let me know and I'll see about getting one to you... The compressed 
 source to those pages is ~206MB, it's about 2.8GB without compression!
 

Here's a clang bug:
http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath

3436 if (!exp)
	
      <9> Taking false branch
3437	fd->nrvo_can = 0;
3438	
3439	if (exp)
	
      <10> Taking false branch
3440	{


That is, (!exp) is false, and (exp) is also false.

Jul 27 2011

Don <nospam nospam.com> writes:

Don wrote:
 Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and 
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free 
 to report them to the nice folk over at llvm), and the dead code won't 
 cause any problems (and most of the "dead code" in the backend is 
 probably in use with dmc or something), but there are still a few 
 hundred potential crashes in there.

 The link above is rather slow - if anyone wants a local copy to work 
 with let me know and I'll see about getting one to you... The 
 compressed source to those pages is ~206MB, it's about 2.8GB without 
 compression!

 
 Here's a clang bug:
 http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath
 
 3436 if (!exp)
     
      <9> Taking false branch
 3437    fd->nrvo_can = 0;
 3438   
 3439    if (exp)
     
      <10> Taking false branch
 3440    {
 
 
 That is, (!exp) is false, and (exp) is also false.

Another one:
http://octarineparrot.com/assets/dmd/report-bksOGf.html#EndPath

Quite bizarre -- it seems to think the static member array 
Type::sizeTy[TMAX] is a null pointer.

I finally found one genuine DMD bug report: 
http://d.puremagic.com/issues/show_bug.cgi?id=6389

But it looks to me as though the reports show more bugs in clang, than 
in DMD.

Jul 27 2011

Trass3r <un known.com> writes:

Am 27.07.2011, 10:22 Uhr, schrieb Don <nospam nospam.com>:
 Here's a clang bug:
 http://octarineparrot.com/assets/dmd/report-vtxpYt.html#EndPath

 3436 if (!exp)
 	
       <9> Taking false branch
 3437	fd->nrvo_can = 0;
 3438	
 3439	if (exp)
 	
       <10> Taking false branch
 3440	{


 That is, (!exp) is false, and (exp) is also false.

Yeah, though that code should really use else instead of another check.

Jul 27 2011

Robert Clipsham <robert octarineparrot.com> writes:

On 24/07/2011 22:06, Robert Clipsham wrote:
 Hi all,

 I took the liberty of running dmd through clang's static analysis, and
 it's turned up quite a few (486) potential bugs in dmd:

 http://octarineparrot.com/assets/dmd/

 There's bound to be a few false positives in there (please feel free to
 report them to the nice folk over at llvm), and the dead code won't
 cause any problems (and most of the "dead code" in the backend is
 probably in use with dmc or something), but there are still a few
 hundred potential crashes in there.

 The link above is rather slow - if anyone wants a local copy to work
 with let me know and I'll see about getting one to you... The compressed
 source to those pages is ~206MB, it's about 2.8GB without compression!

I've posted updated results:

http://octarineparrot.com/assets/dmd/

You'll probably need to hit refresh before the new results come up. Note 
that in addition to Walter's global.h changes, __attribute__((noreturn)) 
needed to be added to local_assert and util_assert in tassert.h too 
(could you add these too Walter?)

-- 
Robert
http://octarineparrot.com/

Jul 27 2011

bearophile <bearophileHUGS lycos.com> writes:

Robert Clipsham:

 I've posted updated results:

I'd like all those tests done by the D compiler too (Clang doesn't perform
those tests on default because they take time, you have to add the --analyze
compiler switch). Some of them like Dead store look better as warnings, other
ones seem better as errors.

Bye,
bearophile

Jul 27 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/27/2011 3:50 PM, bearophile wrote:
 I'd like all those tests done by the D compiler too (Clang doesn't perform
 those tests on default because they take time, you have to add the --analyze
 compiler switch). Some of them like Dead store look better as warnings, other
 ones seem better as errors.


The signal to noise ratio of this kind of flow analysis is rather poor. While
it 
does find some legitimate bugs, the rate of false positives is far too high to 
be a standard part of the language.

I would agree that adding extra conditionals to the source code will both 
eliminate the false positives and make the code more readable, but those extra 
conditionals exact a performance penalty and would not be something a high 
performance coder would want.

I originally did have stuff like this in the optimizer, but removed it because 
the false positive rate was untenable.

Jul 28 2011

bearophile <bearophileHUGS lycos.com> writes:

Walter:

The signal to noise ratio of this kind of flow analysis is rather poor. While
it does find some legitimate bugs, the rate of false positives is far too high
to be a standard part of the language.<

I don't fully understand your answers on this. I am a bit confused, but it's
not your fault.

In Clang those tests aren't a standard part of the C or C++ languages. They are
extra tests, like a lint tool built in the compiler, and they aren't a part of
the normal compilation (if you use --analyze it doesn't produce a compiled
binary, but an HTML of the test results).

I take a look at a random sample of the first groups of the results:

---------------------

Some Dead code, Idempotent operation:

uRegmask3 = ASM_GET_uRegmask(popnd3->usFlags);
Value stored to 'uRegmask3' is never read

ty = ta->ty;
Value stored to 'ty' is never read

---------------------

Some Dead code, dead assignment:

uSizemask3 = ASM_GET_uSizemask(popnd3->usFlags);	
Value stored to 'uSizemask3' is never read

s = retregs & mES;	
Value stored to 's' is never read

---------------------

Some Dead code, Dead increment:

offset += vtblInterfaces->dim * (4 * PTRSIZE);
Value stored to 'offset' is never read

flags |= 1; // already deduced, so don't to toHeadMutable()
Value stored to 'flags' is never read

---------------------

Dead store	Dead initialization:

TY tyto = t->toBasetype()->ty;
Value stored to 'tyto' during its initialization is never read

int aimports_dim = aimports.dim;	
Value stored to 'aimports_dim' during its initialization is never read

---------------------

Logic error	Assigned value is garbage or undefined:


Parameters *FuncDeclaration::getParameters(int *pvarargs)
2918	{ Parameters *fparameters;
2919	int fvarargs;
2920	
2921	if (type)
	
1
	Taking false branch
2922	{
2923	assert(type->ty == Tfunction);
2924	TypeFunction *fdtype = (TypeFunction *)type;
2925	fparameters = fdtype->parameters;
2926	fvarargs = fdtype->varargs;
2927	}
2928	if (pvarargs)
	
2
	Taking true branch
2929	*pvarargs = fvarargs;
	
3
	Assigned value is garbage or undefined

---------------------

Logic error	Dereference of undefined pointer value:

STATIC void ivfamelems(register Iv *biv,register elem **pn)
2447	{ register unsigned op;
2448	register tym_t ty,c2ty;
2449	register famlist *f;
2450	register elem *n,*n1,*n2;
2451	
2452	assert(pn);
2453	n = *pn;
2454	assert(biv && n);
2455	op = n->Eoper;
2456	if (OTunary(op))
	
1
	Taking true branch
2457	{ ivfamelems(biv,&n->E1);
2458	n1 = n->E1;
2459	}
2460	else if (OTbinary(op))
2461	{ ivfamelems(biv,&n->E1);
2462	ivfamelems(biv,&n->E2); /* LTOR or RTOL order is unimportant */
2463	n1 = n->E1;
2464	n2 = n->E2;
2465	}
2466	else /* else leaf elem */
2467	return; /* which can't be in the family */
2468	
2469	if (op == OPmul || op == OPadd || op == OPmin ||
	
2
	Taking true branch
2470	op == OPneg || op == OPshl)
2471	{ /* Note that we are wimping out and not considering */
2472	/* LI variables as part of c1 and c2, but only constants. */
2473	
2474	ty = n->Ety;

2485	
2486	/* If we have (li + var), swap the leaves. */
2487	if (op == OPadd && isLI(n1) && n1->Eoper == OPvar && n2->Eoper == OPvar)
	
3
	Dereference of undefined pointer value
	
--------------------

2819	targ_ldouble el_toldouble(elem *e)
2820	{ targ_ldouble result;
2821	
2822	elem_debug(e);
2823	assert(cnst(e));
2824	#if TX86
2825	switch (tybasic(typemask(e)))
	
1
	'Default' branch taken. Execution continues on line 2860

2860	return result;
	
2
	Undefined or garbage value returned to caller
2861	}

-----------------------

Is Clang correct there, or are those false positives? If it's correct then I'd
like the D compiler to tell me 100% of those I have listed here, even if not
even one of those is a real bug. In some cases you store a value in a variable
even if you know you will not use it (example: last iteration of a loop, to
code simpler and shoter. But I'd like to know every time I do this. I like to
write tidy code.

Returning values that can be undefined is less easy to catch in D, because the
language initializes variables, to their initial default value is sometimes
what the programmer wants.


I would agree that adding extra conditionals to the source code will both
eliminate the false positives and make the code more readable, but those extra
conditionals exact a performance penalty and would not be something a high
performance coder would want. I originally did have stuff like this in the
optimizer, but removed it because the false positive rate was untenable.<

I don't fully understand what you are saying, but is __assume useful here?
http://msdn.microsoft.com/en-us/library/1b3fsfxw%28VS.80%29.aspx

Bye,
bearophile

Jul 28 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/28/2011 3:27 PM, bearophile wrote:
 In Clang those tests aren't a standard part of the C or C++ languages. They
 are extra tests, like a lint tool built in the compiler, and they aren't a
 part of the normal compilation (if you use --analyze it doesn't produce a
 compiled binary, but an HTML of the test results).

If they are part of the compiler, I guarantee you that people will regard it as 
a standard part of the language, and the complaints about false positives will 
cause problems.


 Some Dead code, Idempotent operation:

Dead code is not a bug. It's more of a stylistic issue, and sometimes that 
"dead" code is needed in other code that has been #if'd out. The compiler 
complaining about dead code is also a nuisance when turning on and off sections 
of code that is a normal part of the dev process.


 Is Clang correct there, or are those false positives? If it's correct then
 I'd like the D compiler to tell me 100% of those I have listed here, even if
 not even one of those is a real bug.

I've been slowly going through the reports, and so far all of them have been 
false positives. Don found one that's a real bug, but I haven't gotten to it
yet.

Here's an example of a false positive - clang complains the comparison is 
idempotent:

     size_t e2factor;
     ...
     if (e2factor == (int)e2factor)

For a 32 bit compile, yes, it's idempotent. But not for a 64 bit compile! I 
*want* it to be a no-op for a 32 bit compile and to become active in a 64 bit 
compile.

Of course, I could also use an ugly #ifdef, but I like that little idiom, it 
works, and it is correct. I know why clang is doing what it is doing, but that 
shows a weakness in its static analysis.

There are other false positives for things like assigning an uninitialized
value 
to a field in a data structure that will never be used in the cases where it is 
uninitialized. I could add a conditional, but that's slower than just assigning 
it anyway. Trying to figure these things out with static analysis is impossible 
- it would be solving the halting problem - hence you're stuck with false
positives.

Jul 28 2011

Walter Bright <newshound2 digitalmars.com> writes:

Take a look at all the complaints about evalu8.c. They're all false positives.

Consider that this code has been in use for 25 years now. It's been used by 
several hundred thousand developers. If it was so chock full of scores of 
undefined values and null pointer dereferences, I probably would have heard 
about it by now!

Jul 28 2011

bearophile <bearophileHUGS lycos.com> writes:

Walter:

 I've been slowly going through the reports, and so far all of them have been 
 false positives.

Somewhere I have read that the errors found by Clang are so often true bugs
they have a tool that submits bug reports automatically. I now presume they
were wrong.

But keep in mind that Clang static analysis is a *very young* sub-tool. It's
something like a year or so old. There are commercial C/C++ lints that are
probably about or more than 20 years old that are probably less buggy and more
precise than the Clang compiler, and they find far more kinds of problems in
the code. It often finds issues in my code (usually stylistic issues, not true
bugs, but I usually agree with its advice and change my code).


 There are other false positives for things like assigning an uninitialized
value 
 to a field in a data structure that will never be used in the cases where it
is 
 uninitialized. I could add a conditional, but that's slower than just
assigning 
 it anyway. Trying to figure these things out with static analysis is
impossible 
 - it would be solving the halting problem - hence you're stuck with false
positives.

Probably there are ways for a programmer+language to tell such simple semantics
to a compiler, but C language is not good enough for this.

Thank you for your answers. It's always interesting when "theory" (of
practical-purposed software tools) meets the road of the experiment reality. I
hope Clang has not wasted too much of your time.

Bye,
bearophile

Jul 28 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/28/2011 5:19 PM, bearophile wrote:
 Thank you for your answers. It's always interesting when "theory" (of
 practical-purposed software tools) meets the road of the experiment reality.
 I hope Clang has not wasted too much of your time.

The thing is, I've tried the things Clang tries way back in the 80's. It's 
attempting the impossible. C is just not amenable to static proofs (if I were a 
better theoretician I might be able to show that this is partially related to C 
being memory unsafe).

Even so, I still plan to go through them all just to be sure.

Clang does put out a nice report, though, far better than I attempted.

A kickass bug finding tool that really *does* work on C/C++ code is valgrind.

Jul 28 2011

bearophile <bearophileHUGS lycos.com> writes:

Walter:

 The thing is, I've tried the things Clang tries way back in the 80's. It's 
 attempting the impossible.

"Unused variables", "dead assignments", "dead last assignments" aren't bugs and
they are sometimes acceptable while you debug some code, but you too agree they
are better removed from code meant to be good and tidy (also because they
sometimes associated with bugs). You have appreciated some of those "dead code"
(useless variables) notes from Clang:
https://github.com/D-Programming-Language/dmd/compare/c50a936...08aa57a

Bye,
bearophile

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

Here's another one:

    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

B is true if and only if A is true. B can even be the same expression as A,
such 
as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I 
rewrite as:

    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced 
with some significant trial-and-error refactoring to get rid of the message.

At what point does this cease to be fixing "bugs" and become "contort to fit 
clang's quirks"?

Jul 29 2011

Brad Roberts <braddr slice-2.puremagic.com> writes:

On Fri, 29 Jul 2011, Walter Bright wrote:

 Here's another one:
 
    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 B is true if and only if A is true. B can even be the same expression as A,
 such as a->b->c. Clang complains on the *p that p is "uninitialized". Ok, so I
 rewrite as:
 
    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 but now clang says I'm dereferencing a NULL pointer. At this point, I'm faced
 with some significant trial-and-error refactoring to get rid of the message.
 
 At what point does this cease to be fixing "bugs" and become "contort to fit
 clang's quirks"?

I look at it in (at last) two ways which are are contradictory..

1) static analysis reported issues are of materialy less value than a bug 
report that comes with a repro case.  reason: the analysis doesn't have a 
test case that can prove it stays fixed short of integrating the tool as 
part of the test process -- an almost pointless exercise unless there's 
additional tooling around it to mask out false positives.

2) if the tool has trouble analyzing the code, there's a not unreasonable 
chance a person also has trouble.  The above case is a good one where 
depending on how close those two if's are in the code and how obvious it 
is that B is a super set of A, it's the kind of thing someone's going to 
have trouble with too.

By and large though, this isn't the way I'd spend my time, unless you goal 
is to reduce test cases to feed into clang to improve it.  The 
cost/benefit ratio just doesn't meet the bar.

My 2 cents,
Brad

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/29/2011 1:30 PM, Brad Roberts wrote:
 2) if the tool has trouble analyzing the code, there's a not unreasonable
 chance a person also has trouble.  The above case is a good one where
 depending on how close those two if's are in the code and how obvious it
 is that B is a super set of A, it's the kind of thing someone's going to
 have trouble with too.

In general I agree with this, which is why I've made some changes to the source 
code to 'fix' some of the non-bugs identified by clang. I felt the changes made 
the code more readable.


 By and large though, this isn't the way I'd spend my time, unless you goal
 is to reduce test cases to feed into clang to improve it.  The
 cost/benefit ratio just doesn't meet the bar.

So far, two real bugs have been identified. This makes it worth one pass
through 
the clang results, but as you say, the rate of false positives is so high it is 
not worth continuing to use it.

Jul 29 2011

Brad Roberts <braddr slice-2.puremagic.com> writes:

On Fri, 29 Jul 2011, Walter Bright wrote:

 On 7/29/2011 1:30 PM, Brad Roberts wrote:
 2) if the tool has trouble analyzing the code, there's a not unreasonable
 chance a person also has trouble.  The above case is a good one where
 depending on how close those two if's are in the code and how obvious it
 is that B is a super set of A, it's the kind of thing someone's going to
 have trouble with too.

 
 In general I agree with this, which is why I've made some changes to the
 source code to 'fix' some of the non-bugs identified by clang. I felt the
 changes made the code more readable.
 
 
 By and large though, this isn't the way I'd spend my time, unless you goal
 is to reduce test cases to feed into clang to improve it.  The
 cost/benefit ratio just doesn't meet the bar.

 
 So far, two real bugs have been identified. This makes it worth one pass
 through the clang results, but as you say, the rate of false positives is so
 high it is not worth continuing to use it.

Two real, hitable, bugs?

I still look at cost/benefit.. in that same time a number of other things 
could be done that had at least as much direct benefit.

Don't get me wrong, I really love static analysis tools, but ones that are 
mature and have mechanisms for managing the false positives.

Later,
Brad

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/29/2011 2:49 PM, Brad Roberts wrote:
 Two real, hitable, bugs?

Nobody has run into them yet.

Jul 29 2011

Trass3r <un known.com> writes:

 Here's another one:

     T* p;
     ...
     if (A)
         p = &t;
     ...
     if (B)
         ... *p ...

 B is true if and only if A is true. B can even be the same expression as  
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".

...
 At what point does this cease to be fixing "bugs" and become "contort to  
 fit clang's quirks"?

Clang's static analysis isn't very mature yet.
Also Apple primarily pushes the Objective-C analysis development so they  
can integrate it into XCode.

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/29/2011 1:35 PM, Trass3r wrote:
 Here's another one:

 T* p;
 ...
 if (A)
 p = &t;
 ...
 if (B)
 ... *p ...

 B is true if and only if A is true. B can even be the same expression as A,
 such as a->b->c. Clang complains on the *p that p is "uninitialized".

 ...
 At what point does this cease to be fixing "bugs" and become "contort to fit
 clang's quirks"?

 Clang's static analysis isn't very mature yet.

It's not about maturity, it's simply impossible to correctly do the above case.

Jul 29 2011

Brad Roberts <braddr slice-2.puremagic.com> writes:

On Fri, 29 Jul 2011, Walter Bright wrote:

 On 7/29/2011 1:35 PM, Trass3r wrote:
 Here's another one:
 
 T* p;
 ...
 if (A)
 p = &t;
 ...
 if (B)
 ... *p ...
 
 B is true if and only if A is true. B can even be the same expression as
 A,
 such as a->b->c. Clang complains on the *p that p is "uninitialized".

 ...
 At what point does this cease to be fixing "bugs" and become "contort to
 fit
 clang's quirks"?

 
 Clang's static analysis isn't very mature yet.

 
 It's not about maturity, it's simply impossible to correctly do the above
 case.

You keep using the term impossible.  It's impossible to be 100% correct in 
100% of the code bases.  Sure, but that's obviously not the goal.  The 
goal is to be correct enough on enough code to be useful.  A far more 
achievable goal.  Clang isn't there yet, but with time and effort it can 
improve towards that goal.  That's where producing test cases on things it 
fails on is useful.. standard bug reporting (or feature gap reporting in 
this case).

If B can be shown to be a super set of A, which is very doable for a 
reasonable set of cases, then that false positive can be eliminated.

Continuing to think of these tools as either perfect or useless and useful 
discussions about them are kinda hard to have.

Later,
Brad

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/29/2011 2:53 PM, Brad Roberts wrote:
 You keep using the term impossible.

It's the halting problem. How are you going to prove that a->b->c yields the 
same result after you've done pretty much any function call?


 It's impossible to be 100% correct in
 100% of the code bases.

When it is built into the compiler, I think it should be 100%. And frankly, the 
marketing of these false positive tools bugs me. They'll run their tool over 
some code base, then publish the results and make claims about all these 
hundreds of "bugs" they've found. Even worse, others will uncritically read 
those pages and assume those code bases are "buggy".


 Sure, but that's obviously not the goal.  The
 goal is to be correct enough on enough code to be useful.  A far more
 achievable goal.  Clang isn't there yet, but with time and effort it can
 improve towards that goal.  That's where producing test cases on things it
 fails on is useful.. standard bug reporting (or feature gap reporting in
 this case).

 If B can be shown to be a super set of A, which is very doable for a
 reasonable set of cases, then that false positive can be eliminated.

 Continuing to think of these tools as either perfect or useless and useful
 discussions about them are kinda hard to have.

There are two things one can try with static analysis:

1. Flag the code if one cannot prove it is good.

2. Flag the code if one can prove it is bad.

clang is going for (1). I think (2) is more appropriate.


prover, and what a great tool that was. Reading the documentation on it, it 
looked great. So I tried it out in pastebin. It pretty much only works on the 
examples given in the marketing literature. It couldn't even handle bitwise ops.

I know enough about data flow analysis to infer what clang is doing from its 
pattern of reports, and it hasn't even begun to solve this problem.

As for breaking new ground, I really like what D is doing with value range 
propagation in implicit conversions. So far it's been a solid base hit.

Jul 29 2011

bearophile <bearophileHUGS lycos.com> writes:

Walter:


 prover, and what a great tool that was. Reading the documentation on it, it 
 looked great. So I tried it out in pastebin. It pretty much only works on the 
 examples given in the marketing literature. It couldn't even handle bitwise
ops.


that are coming out now) is to create a small kernel (for an operating system)
that is 100% proved correct (correct according to its specs, etc). If they
don't use an automatic theorem prover, then they have to do 100% of the proofs
by hand. Even if the tool is able to do only 30% of the proofs automatically
(and another 40% with a bit of help) it's a lot of manual work saved. The
papers report the percentage of the theorems done by the tool, the percentage
of the theorems done partially by hand, and the remaining percentage done by
hand. The source code is all available, to verify the numbers inside the papers
are not made up.


 I know enough about data flow analysis to infer what clang is doing from its 
 pattern of reports, and it hasn't even begun to solve this problem.

Clang static analysis is a very new sub-tool. On the market there are
commercial lint tools that are probably 15-18 times older than Clang. Give
Clang two or three years are we'll see. Its creators have created a good C++
compiler in a very short time, and its back-end is already rivalling GCC in
optimizations (it's not there yet, but the distance is not so much,
auto-vectorization is one of the most important differences between the two
back-ends).

Bye,
bearophile

Jul 29 2011

Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Fri, Jul 29, 2011 at 4:51 PM, bearophile <bearophileHUGS lycos.com>wrote:

 Walter:



 contract
 prover, and what a great tool that was. Reading the documentation on it,

 it
 looked great. So I tried it out in pastebin. It pretty much only works on

 the
 examples given in the marketing literature. It couldn't even handle

 bitwise ops.


 that are coming out now) is to create a small kernel (for an operating
 system)


Woah Woah Woah

the entire CLR as well?


 that is 100% proved correct (correct according to its specs, etc). If they
 don't use an automatic theorem prover, then they have to do 100% of the
 proofs by hand. Even if the tool is able to do only 30% of the proofs
 automatically (and another 40% with a bit of help) it's a lot of manual work
 saved. The papers report the percentage of the theorems done by the tool,
 the percentage of the theorems done partially by hand, and the remaining
 percentage done by hand. The source code is all available, to verify the
 numbers inside the papers are not made up.


 I know enough about data flow analysis to infer what clang is doing from

 its
 pattern of reports, and it hasn't even begun to solve this problem.

 Clang static analysis is a very new sub-tool. On the market there are
 commercial lint tools that are probably 15-18 times older than Clang. Give
 Clang two or three years are we'll see. Its creators have created a good C++
 compiler in a very short time, and its back-end is already rivalling GCC in
 optimizations (it's not there yet, but the distance is not so much,
 auto-vectorization is one of the most important differences between the two
 back-ends).

I've seen several times that a major feature of a lint tool was that you
could disable warnings of certain types, which is necessary because there
are just too many false positives. Clang will doubtlessly get better, but
the fact will still remain that C and C++ are terrible languages to try to
run static analyses on.

Jul 29 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrew Wiley:




functional language), that has a Coq-proved type system that seems a tour the
force:
http://lambda-the-ultimate.org/node/4318


 Have they proven the entire CLR as well?

The project is unrelated to the CLR.

contracts to the whole CLR.

Bye,
bearophile

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/29/2011 4:51 PM, bearophile wrote:
 Walter:


 contract prover, and what a great tool that was. Reading the documentation
 on it, it looked great. So I tried it out in pastebin. It pretty much only
 works on the examples given in the marketing literature. It couldn't even
 handle bitwise ops.


 that are coming out now) is to create a small kernel (for an operating
 system) that is 100% proved correct (correct according to its specs, etc). If
 they don't use an automatic theorem prover, then they have to do 100% of the
 proofs by hand. Even if the tool is able to do only 30% of the proofs
 automatically (and another 40% with a bit of help) it's a lot of manual work
 saved. The papers report the percentage of the theorems done by the tool, the
 percentage of the theorems done partially by hand, and the remaining
 percentage done by hand. The source code is all available, to verify the
 numbers inside the papers are not made up.

 From the way you originally positioned it, I had higher expectations.


 I know enough about data flow analysis to infer what clang is doing from
 its pattern of reports, and it hasn't even begun to solve this problem.

 Clang static analysis is a very new sub-tool. On the market there are
 commercial lint tools that are probably 15-18 times older than Clang. Give
 Clang two or three years are we'll see.

I think you misunderstand the scope of this problem. They've done data flow 
analysis, an off-the-shelf technology for the last 30 years at least. They 
haven't done any of the hard stuff that I can see.

It's like doing natural language translation using a dictionary. It looks like 
it'll work, and you can have promising early success with it, but then you hit
a 
wall.

Jul 29 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 7/29/11 5:48 PM, Walter Bright wrote:
 As for breaking new ground, I really like what D is doing with value
 range propagation in implicit conversions. So far it's been a solid base
 hit.

But that's also an approximate thing :o).

Andrei

Jul 29 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/29/2011 5:48 PM, Andrei Alexandrescu wrote:
 On 7/29/11 5:48 PM, Walter Bright wrote:
 As for breaking new ground, I really like what D is doing with value
 range propagation in implicit conversions. So far it's been a solid base
 hit.

 But that's also an approximate thing :o).

True, but it's done in a way that enables things rather than disallowing them. 
This is much more palatable.

Jul 29 2011

dennis luehring <dl.soluz gmx.net> writes:

Am 29.07.2011 22:02, schrieb Walter Bright:
 Here's another one:
 
    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 B is true if and only if A is true. B can even be the same expression as
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".
 Ok, so I rewrite as:
 
    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...
 
 but now clang says I'm dereferencing a NULL pointer. At this point, I'm
 faced with some significant trial-and-error refactoring to get rid of
 the message.
 
 At what point does this cease to be fixing "bugs" and become "contort to
 fit clang's quirks"?

but your code does not reflect your "B is true if and only if A is
true." statement

T* p = NULL;

if (A)
{
  p = &t;

  if (B)
     ... *p ...
}

Jul 30 2011

Don <nospam nospam.com> writes:

dennis luehring wrote:
 Am 29.07.2011 22:02, schrieb Walter Bright:
 Here's another one:

    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 B is true if and only if A is true. B can even be the same expression as
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".
 Ok, so I rewrite as:

    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 but now clang says I'm dereferencing a NULL pointer. At this point, I'm
 faced with some significant trial-and-error refactoring to get rid of
 the message.

 At what point does this cease to be fixing "bugs" and become "contort to
 fit clang's quirks"?

 
 but your code does not reflect your "B is true if and only if A is
 true." statement

The example I posted was clearer.

Where p is just a variable:

if (!p) {  ... }
if (p) { ... }

it thinks that NEITHER of the code paths is taken.
I would rate that as a show-stopper bug. Don't waste any more time on 
clang until that's fixed.

Jul 31 2011

dennis luehring <dl.soluz gmx.net> writes:

Am 31.07.2011 11:47, schrieb Don:
 dennis luehring wrote:
 Am 29.07.2011 22:02, schrieb Walter Bright:
 Here's another one:

    T* p;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 B is true if and only if A is true. B can even be the same expression as
 A, such as a->b->c. Clang complains on the *p that p is "uninitialized".
 Ok, so I rewrite as:

    T* p = NULL;
    ...
    if (A)
        p = &t;
    ...
    if (B)
        ... *p ...

 but now clang says I'm dereferencing a NULL pointer. At this point, I'm
 faced with some significant trial-and-error refactoring to get rid of
 the message.

 At what point does this cease to be fixing "bugs" and become "contort to
 fit clang's quirks"?

 but your code does not reflect your "B is true if and only if A is
 true." statement

 
 The example I posted was clearer.

but i think clang was right with Walters code - i do not understand
why he splits up the A and B conditional block into spereated ones when
B does not work if A wasn't true before... makes it sense (in exactly
this case) to write the code like he does?

Jul 31 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/31/2011 3:28 AM, dennis luehring wrote:
 but i think clang was right with Walters code - i do not understand
 why he splits up the A and B conditional block into spereated ones when
 B does not work if A wasn't true before... makes it sense (in exactly
 this case) to write the code like he does?

What I meant was:

"A" represents an expression

"B" represents a different expression than "A", but is true if and only if "A" 
is true.

For example:

"A" might be: x=0;x=a->b->c?1:0;a->b->c

"B" might be: x

Jul 31 2011

Walter Bright <newshound2 digitalmars.com> writes:

On 7/27/2011 3:08 PM, Robert Clipsham wrote:
 Note that in
 addition to Walter's global.h changes, __attribute__((noreturn)) needed to be
 added to local_assert and util_assert in tassert.h too (could you add these too
 Walter?)


I thought I did. Will check.

Jul 27 2011

D Programming

C/C++ Programming

Other

digitalmars.D - Clang static analysis results for dmd