
digitalmars.D - Redundancies often reveal bugs

reply bearophile <bearophileHUGS lycos.com> writes:
Here (pdf alert) I have found a very simple but interesting paper that has
confirmed an hypothesis of mine.

This is a page that contains a pdf that shows a short introduction to the paper:
http://www.ganssle.com/tem/tem80.htm

This is the paper, "Using Redundancies to Find Errors", by Yichen Xie and
Dawson Engler, 2002:
www.stanford.edu/~engler/p401-xie.pdf


A trimmed down quote from the tem80 page:

 Researchers at Stanford have just released a paper detailing their use of
 automated tools to look for redundant code in 1.6 million lines of Linux.
 "Redundant" is defined as:
 - Idempotent operations (like assigning a variable to itself)
 - Values assigned to variables that are not subsequently used
 - Dead code
 - Redundant conditionals

 They found that redundancies, even when harmless, strongly correlate with
 bugs. Even when the extra code causes no problems, odds are high that other,
 real, errors will be found within a few lines of the redundant operations.
 Block-copied code is often suspect, as the developer neglects to change
 things needed for the code's new use. Another common problem area: error
 handlers, which are tough to test, and are, in data I've gathered, a huge
 source of problems in deployed systems.

 The authors note that their use of lint has long produced warnings about
 unused variables and return codes, which they've always treated as harmless
 stylistic issues. Now it's clear that lint is indeed signalling something
 that may be critically important. The study makes me wonder if compilers
 that optimize out dead code to reduce memory needs aren't in fact doing us a
 disservice. Perhaps they should error and exit instead.

This study confirms that situations like:

x = x;

often hide bugs, that unused variables are often enough (as I have suspected,
despite what Walter said about it) a sign of possible real bugs, and that
assigned but later unused variables may hide bugs too.

This paper has confirmed that some of my enhancement requests need more
attention:
http://d.puremagic.com/issues/show_bug.cgi?id=3878
http://d.puremagic.com/issues/show_bug.cgi?id=3960
http://d.puremagic.com/issues/show_bug.cgi?id=4407

Situations like x=x; reveal true bugs like:

class Foo {
    int x, y;
    this(int x_, int y_) {
        this.x = x;
        y = y;
    }
}
void main() {}

Now I think that such redundancies and similar things often enough hide true
bugs. But what to do? To turn x=x; into a true error?

In a comment to bug 3878 Don gives a situation where DMD may raise a true
compile-time error. But in other cases a true error looks like too much to me.

Bye,
bearophile
Sep 30 2010
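
For readers who want to try the constructor example from the post above, here is a minimal sketch. It assumes a compiler that accepts the self-assignments silently, as DMD did at the time of the thread; the class name FixedFoo is hypothetical and only shows what the author presumably meant to write.

```d
import std.stdio : writeln;

// Buggy version from the post: the parameters are named x_ and y_, so the
// bare names x and y on the right-hand side refer to the fields themselves.
class Foo {
    int x, y;
    this(int x_, int y_) {
        this.x = x;  // self-assignment: the field keeps its default value
        y = y;       // same problem, without the "this." prefix
    }
}

// Presumably intended version (FixedFoo is a hypothetical name).
class FixedFoo {
    int x, y;
    this(int x_, int y_) {
        this.x = x_;
        this.y = y_;
    }
}

void main() {
    auto a = new Foo(3, 4);
    auto b = new FixedFoo(3, 4);
    writeln("buggy: ", a.x, " ", a.y);
    writeln("fixed: ", b.x, " ", b.y);
}
```

In the buggy class the arguments are never read, so both fields stay at int.init. An unused-parameter or self-assignment warning would point at exactly these two lines.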
next sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 errors will be found
 often hide bugs
 situations like x=x; reveal true bugs like:
 
 class Foo {
     int x, y;
     this(int x_, int y_) {
         this.x = x;
         y = y;
         
     }
 }
 void main() {}
Yes, fields and locals in camelCase is a bug.
Sep 30 2010
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday 30 September 2010 23:33:26 Kagamin wrote:
 bearophile Wrote:
 errors will be found
 often hide bugs
 
 situations like x=x; reveal true bugs like:
 
 class Foo {
 
     int x, y;
     this(int x_, int y_) {
     
         this.x = x;
         y = y;
     
     }
 
 }
 void main() {}
 Yes, fields and locals in camelCase is a bug.

???

Why on earth would it be a bug to have variable names in camelcase? Camelcase is purely a stylistic issue - and one which most people adhere to.

- Jonathan M Davis
Oct 01 2010
prev sibling next sibling parent reply "JimBob" <jim bob.com> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:i83cil$2o02$1 digitalmars.com...
 situations like x=x; reveal true bugs like:

 class Foo {
    int x, y;
    this(int x_, int y_) {
        this.x = x;
        y = y;

    }
 }
I get hit much more often by something like this....

class Foo {
    int m_x, m_y;
    this(int x, int y) {
        int m_x = x;
        int m_y = y;
    }
}

I don't know if it is, but IMO it really should be an error to declare local variables that hide member variables.
Oct 01 2010
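
A minimal sketch of the shadowing problem described in the post above, assuming (as the post implies) that the compiler accepts a local variable with the same name as a member. FixedFoo is a hypothetical name for the corrected class.

```d
// Buggy version: the stray "int" turns each assignment into the declaration
// of a new local variable that hides the member of the same name.
class Foo {
    int m_x, m_y;
    this(int x, int y) {
        int m_x = x;  // local m_x; the member m_x is never written
        int m_y = y;  // local m_y; the member m_y is never written
    }
}

// Corrected version (FixedFoo is a hypothetical name): no type in front,
// so the members themselves are assigned.
class FixedFoo {
    int m_x, m_y;
    this(int x, int y) {
        m_x = x;
        m_y = y;
    }
}

void main() {
    auto a = new Foo(1, 2);
    assert(a.m_x == 0 && a.m_y == 0);  // members kept their default value

    auto b = new FixedFoo(1, 2);
    assert(b.m_x == 1 && b.m_y == 2);
}
```

The two locals are also assigned but never used afterwards, which is one of the redundancy categories listed in the quoted article.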
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
 I dont know if it is, but IMO it really should be an error to declare local
 variables that hide member variables.
I disagree. I always do that in constructors:

int x, y;
this(int x, int y)
{
    this.x = x;
    this.y = y;
}

I think you would annoy a lot of people if it was forbidden.
Oct 01 2010
next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
On Fri, Oct 1, 2010 at 9:50 AM, Peter Alexander
<peter.alexander.au gmail.com> wrote:
 I dont know if it is, but IMO it really should be an error to declare local
 variables that hide member variables.
 I disagree. I always do that in constructors:

 int x, y;
 this(int x, int y)
 {
     this.x = x;
     this.y = y;
 }

 I think you would annoy a lot of people if it was forbidden.

I do the same, but got a nasty bug that took me hours to find because in one condition later down the constructor I forgot the "this." prefix. This kind of bug is hard to spot by just reading the code.

IMHO it's quite tedious to do all these assignments in a constructor anyway - it'd be cool to have some possibility to say "this constructor argument should be assigned to the class's field of the same name", like

int x, y, z;
this(class int x, class int y, int a)
{
    // this.x and this.y are set implicitly
    this.z = (x+y)/a;
}

or something like that. Dunno if "class" is an appropriate keyword for that (probably not), but it should suffice to illustrate the idea. Well, maybe "this(int this.x, int this.y, int a)" would be better.

And maybe this wouldn't need addition to the language at all but could be done with some template/string-mixin magic. I haven't really thought this through, but *some* possibility to do this (assign constructor- or even function-arguments to class field of same name) would be cool :-)
Oct 01 2010
parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Daniel Gibson <metalcaedes gmail.com> wrote:

 this(int this.x, int this.y, int a)
Me likes. -- Simen
Oct 01 2010
parent retard <re tard.com.invalid> writes:
Fri, 01 Oct 2010 12:38:26 +0200, Simen kjaeraas wrote:

 Daniel Gibson <metalcaedes gmail.com> wrote:
 
 this(int this.x, int this.y, int a)
Me likes.
Looks almost like Scala: class MyClass(var x: Int, var y: Int, a: Int) { ... }
Oct 02 2010
prev sibling parent "JimBob" <jim bob.com> writes:
"Peter Alexander" <peter.alexander.au gmail.com> wrote in message 
news:i843rl$1gr9$1 digitalmars.com...
 I dont know if it is, but IMO it really should be an error to declare 
 local
 variables that hide member variables.
 I disagree. I always do that in constructors:

 int x, y;
 this(int x, int y)
 {
     this.x = x;
     this.y = y;
 }

 I think you would annoy a lot of people if it was forbidden.

I'm sure it would. But I think the benefit would outweigh the cost. I mean the cost is coding style, personal preference; the benefit is fewer bugs. And people would get used to it.
Oct 01 2010
prev sibling next sibling parent reply Justin Johansson <no spam.com> writes:
On 1/10/2010 11:12 AM, bearophile wrote:
 Here (pdf alert) I have found a very simple but interesting paper that has
confirmed an hypothesis of mine.
So far most respondents have gone completely off-subject here.

In hardware systems redundancy is critical for safety. In software systems redundancy is bad because, as you and the paper suggest, redundancy makes for bugs.

The principle for software is both normalization, DRY (do not repeat yourself) and ZIP (zero intolerance for plagiarism).

As always, I enjoy your interesting posts.

Regards
Justin Johansson
Oct 01 2010
parent Justin Johansson <no spam.com> writes:
On 2/10/2010 1:52 AM, Justin Johansson wrote:
Whoops, bug in my reply.
ZIP as "zero intolerance for plagiarism" is obviously
what I did not mean.  I meant "zero tolerance"
rather than "zero intolerance" but then the acronym
ZTP does not sound so good, :-(
Oct 01 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Thank you for all the answers.

Daniel Gibson:	

 Well, maybe "this(int this.x, int this.y, int a)" would be better.
This reduces useless code in the constructor and keeps the code more DRY, and it looks able to avoid part of the problems I was talking about (but not all of them). So this struct:

// Code #1
struct Something {
    int x, y, aa;
    this(int x_, int y_, int a_) {
        this.x = x_;
        this.y = y_;
        this.aa = a_ * a_ + x_;
    }
    void update(int x_, int b) {
        this.x = x_;
        this.aa += b;
    }
}

May be written (it's just syntax sugar):

// Code #2
struct Something {
    int x, y, aa;
    this(this.x, this.y, int a_) {
        this.aa = a_ * a_ + x;
    }
    void update(this.x) {
        this.aa += b;
    }
}

In some situations you need constructor arguments to be of a type different from the instance attributes. In such situations you may use the normal old syntax. Or instance argument types may be optional, so this code:

// Code #3
class Foo {}
class Bar : Foo {}
class Something {
    Foo c;
    this(Bar c_) {
        this.c = c_;
    }
}
void main() {
    auto s = new Something(new Bar);
}

May be written:

// Code #4
class Foo {}
class Bar : Foo {}
class Something {
    Foo c;
    this(Bar this.c) {}
}
void main() {
    auto s = new Something(new Bar);
}

That syntax idea is nice to avoid some code duplication, and I'd like to have it if it has no bad side effects (besides making the language a bit more complex), but it can't avoid bugs like the following inc(), so I think it's not enough to solve the problems I was talking about:

// Code #5
class Foo {
    int x;
    void inc(int x) { x += x; }
}
void main() {}

Although Python is seen by some people as a scripting language unfit for larger programs, it contains many design decisions able to avoid several kinds of bugs (that are often enough present in D programs too). Regarding the bugs discussed in this post, Python is able to avoid some of them because inside methods all instance attributes must be prefixed by a name typically like "self." (and class instance attributes, that are similar to static class attributes in D, must be prefixed by the class name).

So some of the troubles in D code I am talking about may be avoided by requiring the "this." prefix where the code may be ambiguous for the programmer (I am not talking about code ambiguous for the compiler). This can't avoid troubles like in Code #5. The only good way I see to avoid troubles like in Code #5 is to forbid method arguments that have the same name as class/struct/union attributes (this is what bug 3878 is about).

For the problems we are talking about in this thread, probably more than one solution at the same time is needed. The method "this" arguments seem a nice idea to improve the DRY-ness of the code and avoid some bugs, the obligatory usage of the "this." prefix when the code is ambiguous for the programmer helps avoid other bugs, and maybe a warning for x=x; lines of code is useful, and warnings for unused variables and unused last assigned values are useful too to avoid other bugs.

Bye,
bearophile
Oct 01 2010
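
Code #5 from the post above can be tried directly. The sketch below keeps the original inc() and adds incFixed(), a hypothetical name for the variant that uses the "this." prefix.

```d
class Foo {
    int x;

    // From Code #5: inside the method the bare name x is the parameter, so
    // this doubles the local copy of the argument and then discards it.
    void inc(int x) { x += x; }

    // Hypothetical fix: the "this." prefix selects the field explicitly.
    void incFixed(int x) { this.x += x; }
}

void main() {
    auto f = new Foo;
    f.x = 1;

    f.inc(5);
    assert(f.x == 1);  // the field was never touched

    f.incFixed(5);
    assert(f.x == 6);  // 1 + 5
}
```

Both methods compile without complaint, which is why the post argues that a mandatory "this." prefix alone would not catch the bug: the compiler sees nothing ambiguous in inc().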
next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
bearophile <bearophileHUGS lycos.com> wrote:

 but it can't avoid bugs like the following inc(), so I think it's not  
 enough to solve the problems I was talking about:


 // Code #5
 class Foo {
     int x;
     void inc(int x) { x += x; }
 }
 void main() {}
Oh, but it can (sort of). By allowing this syntax, there is *very* little reason to allow for shadowing of members by parameters or local variables, and those may thus more readily be disallowed. -- Simen
Oct 01 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
 // Code #2
 struct Something {
     int x, y, aa;
     this(this.x, this.y, int a_) {
         this.aa = a_ * a_ + x;
     }
     void update(this.x) {
         this.aa += b;
     }
 }
Sorry, that's wrong. The correct part:

void update(this.x, int b) {
    this.aa += b;
}

Bye,
bearophile
Oct 01 2010
prev sibling next sibling parent reply retard <re tard.com.invalid> writes:
Thu, 30 Sep 2010 21:12:53 -0400, bearophile wrote:

 Here (pdf alert) I have found a very simple but interesting paper that
 has confirmed an hypothesis of mine.
 
 This is a page that contains a pdf that shows a short introduction to
 the paper: http://www.ganssle.com/tem/tem80.htm
 
 This is the paper, "Using Redundancies to Find Errors", by Yichen Xie
 and Dawson Engler, 2002: www.stanford.edu/~engler/p401-xie.pdf
 
 
 A trimmed down quote from the tem80 page:
 
 Researchers at Stanford have just released a paper detailing their use
 of automated tools to look for redundant code in 1.6 million lines of
 Linux. "Redundant" is defined as:
 - Idempotent operations (like assigning a variable to itself)
 - Values assigned to variables that are not subsequently used
 - Dead code
 - Redundant conditionals

 They found that redundancies, even when harmless, strongly correlate with
 bugs. Even when the extra code causes no problems, odds are high that
 other, real, errors will be found within a few lines of the redundant
 operations. Block-copied code is often suspect, as the developer neglects
 to change things needed for the code’s new use. Another common problem
 area: error handlers, which are tough to test, and are, in data I’ve
 gathered, a huge source of problems in deployed systems.

 The authors note that their use of lint has long produced warnings about
 unused variables and return codes, which they've always treated as
 harmless stylistic issues. Now it's clear that lint is indeed signalling
 something that may be critically important. The study makes me wonder if
 compilers that optimize out dead code to reduce memory needs aren't in
 fact doing us a disservice. Perhaps they should error and exit instead.
If you've ever compiled open source code, you probably have noticed that some developers take software quality seriously. Their programs show no warnings/errors at compile time. That's not very impressive when the code is below 5000 LOC, but if you apply the same principle when the codebase grows to 500000 LOC, it's a big win.

OTOH, there are lots of projects with lazy bastards developing them. Something ALWAYS breaks. A minor update from gcc ?.?.0 to ?.?.1 seems to be enough to break something. The developers were too lazy to study even the basic functionality of C and seem rather surprised when the compiler prevents data corruption or segfaults or other nondeterministic states. I always treat code with lots of these bugs as something completely rotten. In distros like Gentoo these bugs prevent people from actually installing and using the program.
 class Foo {
     int x, y;
     this(int x_, int y_) {
         this.x = x;
         y = y;
         
     }
 }
 void main() {}
Some languages prevent this bug by making the parameters immutable in some sense (at least shallow immutability). It's even possible in Java, and in one place I worked previously "final params by default" was one of the rules in code review and style guides.
Oct 02 2010
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
retard wrote:
 Some languages prevent this bug by making the parameters immutable in 
 some sense (at least shallow immutability). It's even possible in Java, 
 and in one place I worked previously "final params by default" was one of 
 the rules in code review and style guides.
this(const int x, const int y) { ... }
Oct 03 2010
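
A minimal sketch of what the const-parameter signature in the reply above buys you, assuming the parameters deliberately share their names with the fields. The static assert lines are only there to show that the compiler rejects the assignment; they are not part of the original reply.

```d
class Foo {
    int x, y;

    this(const int x, const int y) {
        // Inside the constructor the bare names x and y are the const
        // parameters, so a forgotten "this." becomes a compile-time error
        // instead of a silent no-op.
        static assert(!__traits(compiles, x = x));
        static assert(!__traits(compiles, y = y));

        this.x = x;  // the fields are still assignable through "this."
        this.y = y;
    }
}

void main() {
    auto f = new Foo(3, 4);
    assert(f.x == 3 && f.y == 4);
}
```

This only helps when the parameter shadows the field. In the original example the parameters are named x_ and y_, so "y = y;" still refers to the field on both sides and const parameters would not catch it.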
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 10/2/10, retard <re tard.com.invalid> wrote:
 Thu, 30 Sep 2010 21:12:53 -0400, bearophile wrote:

 Here (pdf alert) I have found a very simple but interesting paper that
 has confirmed an hypothesis of mine.

 This is a page that contains a pdf that shows a short introduction to
 the paper: http://www.ganssle.com/tem/tem80.htm

 This is the paper, "Using Redundancies to Find Errors", by Yichen Xie
 and Dawson Engler, 2002: www.stanford.edu/~engler/p401-xie.pdf


 A trimmed down quote from the tem80 page:

 Researchers at Stanford have just released a paper detailing their use
 of automated tools to look for redundant code in 1.6 million lines of
 Linux. "Redundant" is defined as:
 - Idempotent operations (like assigning a variable to itself)
 - Values assigned to variables that are not subsequently used
 - Dead code
 - Redundant conditionals

 They found that redundancies, even when harmless, strongly correlate with
 bugs. Even when the extra code causes no problems, odds are high that
 other, real, errors will be found within a few lines of the redundant
 operations. Block-copied code is often suspect, as the developer neglects
 to change things needed for the code's new use. Another common problem
 area: error handlers, which are tough to test, and are, in data I've
 gathered, a huge source of problems in deployed systems.

 The authors note that their use of lint has long produced warnings about
 unused variables and return codes, which they've always treated as
 harmless stylistic issues. Now it's clear that lint is indeed signalling
 something that may be critically important. The study makes me wonder if
 compilers that optimize out dead code to reduce memory needs aren't in
 fact doing us a disservice. Perhaps they should error and exit instead.

 If you've ever compiled open source code, you probably have noticed that
 some developers take software quality seriously. Their programs show no
 warnings/errors at compile time. That's not very impressive when the code
 is below 5000 LOC, but if you apply the same principle when the codebase
 grows to 500000 LOC, it's a big win.

 OTOH, there are lots of projects with lazy bastards developing them.
 Something ALWAYS breaks. A minor update from gcc ?.?.0 to ?.?.1 seems to
 be enough to break something. The developers were too lazy to study even
 the basic functionality of C and seem rather surprised when the compiler
 prevents data corruption or segfaults or other nondeterministic states. I
 always treat code with lots of these bugs as something completely rotten.
 In distros like Gentoo these bugs prevent people from actually installing
 and using the program.
Don't forget pragma abuse! I don't have the exact source, but I've seen code like this in several medium-big sized projects:

// Shut up stupid compiler warnings
#pragma (DISABLE, 5596)
#pragma (DISABLE, 5597)
#pragma (DISABLE, 5598)

So not only do people neglect warnings, they get annoyed with them but then decide the best solution is to silence the compiler.

OTOH in some cases the warnings are caused by 3rd party libraries and the warnings are re-enabled for user-code again (I've seen this latter case used in Scintilla or Scite).
Oct 14 2010
parent retard <re tard.com.invalid> writes:
Thu, 14 Oct 2010 17:21:39 +0200, Andrej Mitrovic wrote:

 Don't forget pragma abuse! I don't have the exact source, but I've seen
 code like this in several medium-big sized projects:
 
 // Shut up stupid compiler warnings
 #pragma (DISABLE, 5596)
 #pragma (DISABLE, 5597)
 #pragma (DISABLE, 5598)
 
 So not only do people neglect warnings, they get annoyed with them but
 then decide the best solution is to silence the compiler.
 
 OTOH in some cases the warnings are caused by 3rd party libraries and
 the warnings are re-enabled for user-code again (I've seen this latter
 case used in Scintilla or Scite).
Ah, true. Makes one wonder, if C/C++ as systems programming languages are not limiting the programmer unlike impractical high level languages, why do you need to hack the simple warning/error system..
Oct 15 2010
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 01/10/2010 02:12, bearophile wrote:
<snip>
 Researchers at Stanford have just released a paper detailing their use of
 automated tools to look for redundant code in 1.6 million lines of Linux.
 "Redundant" is defined as:
 - Idempotent operations (like assigning a variable to itself)
<snip>

Idempotent operations are not necessarily redundant. For example,

x = y;

is idempotent, but not redundant. But performing the same idempotent operation multiple times in succession is an example of redundancy.

Really, section 2 of that paper isn't about idempotence at all.

For those who aren't sure what idempotent means, put simply it means that performing the operation multiple times in succession has the same effect as performing it only once.

But assigning a variable to itself is indeed redundant, because it has no effect.

Stewart.
Oct 02 2010
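
The distinction drawn in the post above can be sketched in a few lines of D. The three function names are hypothetical; each returns the value of x after the operation so the cases can be compared.

```d
// Idempotent and useful: x takes the value of y.
int assignOnce(int x, int y) { x = y; return x; }

// The same idempotent operation twice in succession: the second statement
// leaves the state exactly as the first one did, so it is redundant.
int assignTwice(int x, int y) { x = y; x = y; return x; }

// Self-assignment: redundant on its own, because it has no effect at all.
int selfAssign(int x) { x = x; return x; }

void main() {
    assert(assignOnce(1, 5) == 5);                  // the assignment did something
    assert(assignTwice(1, 5) == assignOnce(1, 5));  // repeating it added nothing
    assert(selfAssign(1) == 1);                     // nothing changed
}
```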