digitalmars.D - [OT] Walter about compilers

eles (11/11) Jan 22 2013 Hi everybody,

Era Scarecrow (39/46) Jan 22 2013 It's been quoted that for every 10 lines of code there's a bug.

Peter Alexander (4/8) Jan 22 2013 I love how >1kloc is "large" :D

deadalnix (3/12) Jan 22 2013 It really depends if we are talking about java or not.

Peter Alexander (5/19) Jan 22 2013 Not just Java. According to Wikipedia Debian 5 has over 300

jerro (1/3) Jan 22 2013 It also consists of over 20000 packages. It is not one program.

eles (3/7) Jan 22 2013 That means (at least) 100k bugs. Happy fixing!
Simen Kjaeraas (6/13) Jan 22 2013 It's context dependent, of course. Finding all the bugs in 1kloc is doab...

bearophile (6/8) Jan 22 2013 In debug mode that's the job of a modern well designed language,

Era Scarecrow (15/21) Jan 22 2013 Agreed. However D (compilers) doesn't have an option to check

Thiez (6/11) Jan 22 2013 Since D aims to emulate C in this aspect, overflow with uints is

Era Scarecrow (38/50) Jan 22 2013 That merely shortens the size of the check, not where you need

Walter Bright (4/5) Jan 22 2013 I've been doing some refactoring in dmd now and then. Every time I do, t...
deadalnix (4/5) Jan 22 2013 It is said a lot. I'd like to see hard data on that one. I'd bet

Philippe Sigaud (2/8) Jan 22 2013 With D, we aim for one bug every 14 lines of code :)

eles (4/8) Jan 22 2013 Add to that the fact that programs in D tend to be shorter than

Era Scarecrow (11/16) Jan 22 2013 Less boiler plate code, fewer direct pointers, no preprocessor

eles (3/8) Jan 23 2013 Sigh... Only if it would go into that gcc suite... faster.

Simen Kjaeraas (5/16) Jan 22 2013 Can do. Who wants to patch the compiler to automatically insert those bug...
Don (6/12) Jan 23 2013 It definitely does.

Era Scarecrow (21/25) Jan 23 2013 Bugs in code don't always live on one line per bug; They can

"eles" <eles eles.com> writes:

Hi everybody,

I was just reading this:

http://www.laputan.org/metamorphosis/metamorphosis.html#SoftwareTectonics

(a thing about software architectures).

The text opens with...:

"We like it when people always want more! Otherwise, we'd be out 
of the upgrade business. Sometimes, people ask me what I will do 
when the compiler is done. Done? No software program that is 
selling is ever done!
-- Walter Bright, C++ compiler architect"

So... the question is: does that quote also apply to dmd? :)

Jan 22 2013

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 22 January 2013 at 13:54:08 UTC, eles wrote:

 The text opens with...:

 "We like it when people always want more! Otherwise, we'd be 
 out of the upgrade business. Sometimes, people ask me what I 
 will do when the compiler is done. Done? No software program 
 that is selling is ever done!
 -- Walter Bright, C++ compiler architect"

 So... the question is: does that quote also apply to dmd? :)

  It's been quoted that for every 10 lines of code there's a bug. 
There are programs with tens of thousands of lines of code, so 
finding every bug is probably impossible for large programs 
(above 1000 lines). But that doesn't mean they can't run very, 
very well. A number of bugs come from unchecked arithmetic. Take 
addition, perhaps the simplest of operations: Are you going to 
check after every little + that you didn't have an overflow? 
Are you going to add checks that ensure it can't break, given 
how much extra work that takes?

C example:

   #include <assert.h>
   #include <limits.h>
   #include <stdlib.h>

   //code looks okay
   void* getMemory(int a, int b) {
     return malloc(a + b);
   }

   //overflow free version?
   void* getMemoryChecked(unsigned int a, unsigned int b) {
     //widen before adding so the sum itself cannot wrap
     assert((unsigned long long) a + (unsigned long long) b <= UINT_MAX);
     return malloc(a + b);
   }

   int main(void) {
     //a + b overflows int (undefined behavior in C). In practice it
     //usually wraps negative, so malloc probably returns NULL.
     void* ptr = getMemory(0x7fffffff, 0x7fffffff);

     //should assert now
     void* ptr2 = getMemoryChecked(UINT_MAX, UINT_MAX);

     free(ptr);
     free(ptr2);
     return 0;
   }
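
For signed ints the check can also be done without widening, by testing against the limits before adding. A minimal sketch, assuming the hypothetical helper names add_would_fit and get_memory_checked (they are not from any library):

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: stores a + b in *sum and returns true, or returns
   false (leaving *sum untouched) if the sum would not fit in an int. */
static bool add_would_fit(int a, int b, int *sum) {
    if (b > 0 && a > INT_MAX - b) return false; /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false; /* would drop below INT_MIN */
    *sum = a + b;
    return true;
}

/* Variant of getMemory that refuses bad sizes instead of overflowing. */
static void *get_memory_checked(int a, int b) {
    int total;
    if (a < 0 || b < 0 || !add_would_fit(a, b, &total)) return NULL;
    return malloc((size_t)total);
}
```

The comparison is rearranged so that the test itself never overflows, which matters because signed overflow is undefined behavior in C.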


  Part of the process is not only fixing bugs and improving 
the compiler; there are also new features that get requested, 
ones you find necessary once you think about them even though 
you never needed them before.

  Consider: A recent project of mine that I hadn't updated in over 
a year and a half turned out to have a bug in how it handled a 
certain feature. When it was brought up, I needed to add about 10 
lines of code to handle it; Then I found a bug within those 10 
lines (after it was working).

  With that in mind, it's likely no program will ever be 'done', but 
if it does the job well enough then it's probably good enough. So 
the answer is probably yes, it applies to dmd.

Jan 22 2013

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
 It's been quoted that for every 10 lines of code there's a bug. 
 There are programs with tens of thousands of lines of code, so 
 finding every bug is probably impossible for large programs 
 (above 1000 lines).

I love how >1kloc is "large" :D

I'd say anything under 100kloc is a small program. 100kloc-1mloc 
medium, and >1mloc large.

Jan 22 2013

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 22 January 2013 at 14:59:48 UTC, Peter Alexander 
wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:
 It's been quoted that for every 10 lines of code there's a 
 bug. There are programs with tens of thousands of lines of 
 code, so finding every bug is probably impossible for large 
 programs (above 1000 lines).

 I love how >1kloc is "large" :D

 I'd say anything under 100kloc is a small program. 
 100kloc-1mloc medium, and >1mloc large.

It really depends if we are talking about java or not.

Jan 22 2013

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Tuesday, 22 January 2013 at 15:26:28 UTC, deadalnix wrote:
 On Tuesday, 22 January 2013 at 14:59:48 UTC, Peter Alexander 
 wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:
 It's been quoted that for every 10 lines of code there's a 
 bug. There are programs with tens of thousands of lines of 
 code, so finding every bug is probably impossible for large 
 programs (above 1000 lines).

 I love how >1kloc is "large" :D

 I'd say anything under 100kloc is a small program. 
 100kloc-1mloc medium, and >1mloc large.

 It really depends if we are talking about java or not.

Not just Java. According to Wikipedia Debian 5 has over 300 
million lines of code.

http://en.wikipedia.org/wiki/Source_lines_of_code

Last time I counted, Phobos has ~200kloc.

Jan 22 2013

"jerro" <a a.com> writes:

 Not just Java. According to Wikipedia Debian 5 has over 300 
 million lines of code.

It also consists of over 20000 packages. It is not one program.

Jan 22 2013

"eles" <eles eles.com> writes:

On Tuesday, 22 January 2013 at 14:59:48 UTC, Peter Alexander 
wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:

 I'd say anything under 100kloc is a small program. 
 100kloc-1mloc medium, and >1mloc large.

That means (at least) 100k bugs. Happy fixing!

Jan 22 2013

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On 2013-01-22, 15:59, Peter Alexander wrote:

 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
 It's been quoted that for every 10 lines of code there's a bug. There  
 are programs with tens of thousands of lines of code, so finding every  
 bug is probably impossible for large programs (above 1000 lines).

 I love how >1kloc is "large" :D

 I'd say anything under 100kloc is a small program. 100kloc-1mloc medium,  
 and >1mloc large.

It's context dependent, of course. Finding all the bugs in 1kloc is doable,
but lots of work. Finding all the bugs in 10kloc, conceivably doable, but
unlikely to be worth it. >= 100kloc? ouch.

-- 
Simen

Jan 22 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Era Scarecrow:

 Are you going to check after every little + that you
 didn't have an overflow?

In debug mode that's the job of a modern well designed language, 
just like checking an index is inside the bounds of an array 
every time you perform an array access.

Bye,
bearophile

Jan 22 2013

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 22 January 2013 at 15:11:41 UTC, bearophile wrote:
 Era Scarecrow:

 Are you going to check after every little + that you didn't 
 have an overflow?

 In debug mode that's the job of a modern well designed 
 language, just like checking an index is inside the bounds of 
 an array every time you perform an array access.

  Agreed. However, D compilers don't have an option to check 
those. I think it was requested but Walter said no (due to the 
slowdown, I think). So if the compiler won't do it for you, 
you have to do it yourself. I really wouldn't want to have to use 
BigInt for everything that can't overflow and then check to make 
sure I can fit it in my smaller variables afterwards along with 
the extra move. I wouldn't want to use BigInts everywhere, and 
longs aren't needed everywhere either.

  Of course, an attribute that turned on overflow checks for just 
the functions where they matter could help, but in truth it kinda 
clutters the signatures with something that isn't an important 
attribute. I guess a 'CheckedInt' type could work in those cases, 
but that's more for runtime and release builds than for 
debugging.

Jan 22 2013

"Thiez" <thiezz gmail.com> writes:

On Tuesday, 22 January 2013 at 16:31:20 UTC, Era Scarecrow wrote:
 I really wouldn't want to have to use BigInt for everything 
 that can't overflow and then check to make sure I can fit it in 
 my smaller variables afterwards along with the extra move. I 
 wouldn't want to use BigInts everywhere, and longs aren't 
 needed everywhere either.

Since D aims to emulate C in this aspect, overflow with uints is 
probably defined as a wrap-around (like C). In this case it seems 
to me the check for overflow would simply be '(a+b)<a', no need 
to cast to longs and BigInts and all that. Of course this may not 
apply to signed ints...
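
The '(a+b)<a' test can be sketched in C as follows. The helper name uadd_wraps is hypothetical; the check relies only on the fact that unsigned arithmetic in C is defined to wrap around:

```c
#include <stdbool.h>

/* Hypothetical helper: true if a + b wraps around for unsigned int.
   Unsigned arithmetic wraps modulo UINT_MAX + 1, so a wrapped sum is
   always smaller than either operand. */
static bool uadd_wraps(unsigned int a, unsigned int b) {
    return (a + b) < a;
}
```

This does not carry over to signed ints, where overflow is undefined behavior in C and has to be ruled out before the addition is performed.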

Jan 22 2013

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 22 January 2013 at 17:10:35 UTC, Thiez wrote:
 On Tuesday, 22 January 2013 at 16:31:20 UTC, Era Scarecrow 
 wrote:
 I really wouldn't want to have to use BigInt for everything 
 that can't overflow and then check to make sure I can fit it 
 in my smaller variables afterwards along with the extra move. 
 I wouldn't want to use BigInts everywhere, and longs aren't 
 needed everywhere either.

 Since D aims to emulate C in this aspect, overflow with uints 
 is probably defined as a wrap-around (like C). In this case it 
 seems to me the check for overflow would simply be '(a+b)<a', 
 no need to cast to longs and BigInts and all that. Of course 
 this may not apply to signed ints...

  That merely shortens the size of the check, not where you need 
to place the checks or how often. Truthfully, in almost all cases 
the wrap-around or overflow/underflow is an error, usually 
unchecked. If 1 million were the max, then 1,000,000 + 1 should 
equal 1,000,001 and not <=0, and if 0 is the minimum, 0 - 1 
should not equal >=0.

  The only time I can see overflow being wanted is when making 
something that watches for it explicitly to make use of it. Say 
we emulate or write the 'ucent' types. That could be done as:

   //addition example obviously
   uint[4] add(const uint[4] lhs, const uint[4] rhs) {
     uint[4] val;
     bool carry = false;
     foreach(i, ref v; val) {
       uint sum = lhs[i] + rhs[i];      //may wrap around
       bool wrapped = sum < lhs[i];

       v = sum + (carry ? 1 : 0);       //may wrap as well
       carry = wrapped || v < sum;
     }

     assert(!carry); //could fail. How to handle this? Ignore?
     return val;
   }

  Now let's say there's a for loop where someone decides to be 
clever and use a ubyte (unsigned char) as the index or 
counter.

  for(ubyte i = 0; i < 1000; i++) {
    writeln(i);
  }

  The overflow is an error because the wrong type was selected, 
even though the logic of the loop is obvious. You can hide the type 
behind an alias or similar, but that doesn't change the fact that 
it's a bug. It is easier to detect if we are told the overflow is 
happening at all, rather than having the program get stuck and 
having to kill the process manually or step through it in a 
debugger. If the loop weren't producing output you could 
recognize, it would be much harder to find.
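
To see the stuck loop without actually hanging, here is a minimal C sketch. The function loop_terminates and its max_steps cutoff are hypothetical additions for the demonstration, and uint8_t stands in for D's ubyte:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical demo: runs `for (i = 0; i < limit; i++)` with an 8-bit
   counter, but gives up after max_steps iterations so it can return.
   Returns true if the loop finished on its own, false if it was cut off. */
static bool loop_terminates(unsigned int limit, unsigned int max_steps) {
    unsigned int steps = 0;
    for (uint8_t i = 0; i < limit; i++) {
        /* i wraps from 255 back to 0, so it can never reach a limit > 255 */
        if (++steps > max_steps) return false;
    }
    return true;
}
```

With a limit of 200 the loop ends normally; with a limit of 1000 the counter keeps wrapping and only the cutoff stops it.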

  Encryption may make use of the overflow/wrap around, but far 
more likely they use xor or binary operations which don't have 
those problems.

Jan 22 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2013 6:44 AM, Era Scarecrow wrote:
   It's been quoted that for every 10 lines of code there's a bug.

I've been doing some refactoring in dmd now and then. Every time I do, the 
process exposes latent bugs. On the one hand, that's discouraging, on the other 
hand, I think it shows the value in refactoring into a better design.

Jan 22 2013

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
  It's been quoted that for every 10 lines of code there's a bug.

It is said a lot. I'd like to see hard data on that one. I'd bet 
that it varies greatly from one programmer to another, and probably 
from one language to another.

Jan 22 2013

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Wed, Jan 23, 2013 at 5:56 AM, deadalnix <deadalnix gmail.com> wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
  It's been quoted that for every 10 lines of code there's a bug.


 It is said a lot. I'd like to see hard data on that one. I'd bet that it
 varies greatly from one programmer to another, and probably from one language
 to another.

With D, we aim for one bug every 14 lines of code :)

Jan 22 2013

"eles" <eles eles.com> writes:

On Wednesday, 23 January 2013 at 06:22:55 UTC, Philippe Sigaud 
wrote:
 On Wed, Jan 23, 2013 at 5:56 AM, deadalnix 
 <deadalnix gmail.com> wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow

 With D, we aim for one bug every 14 lines of code :)

Add to that the fact that programs in D tend to be shorter than 
their C or even C++ equivalents!

Jan 22 2013

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Wednesday, 23 January 2013 at 07:33:22 UTC, eles wrote:
 On Wednesday, 23 January 2013 at 06:22:55 UTC, Philippe Sigaud 
 wrote:
 With D, we aim for one bug every 14 lines of code :)

 Add to that the fact that programs in D tend to be shorter than 
 their C or even C++ equivalents!

  Less boilerplate code, fewer direct pointers, no preprocessor 
macros. Code that might be ambiguous because of operator 
precedence forces (or sternly warns) you to use parentheses for 
what you intend rather than relying on a set of long, complex 
rules. Templates are easier to make and use (and you need fewer of 
them). No header file(s) (and all the duplication or annoying 
separation that comes with them). Assignment in certain locations 
is illegal. Oh yes, no ugly STL, and a lot more.

  Plenty of stuff that simplifies a whole lot of stuff. D is 
indeed the language I always wanted :)

Jan 22 2013

"eles" <eles eles.com> writes:

On Wednesday, 23 January 2013 at 07:57:38 UTC, Era Scarecrow 
wrote:
 On Wednesday, 23 January 2013 at 07:33:22 UTC, eles wrote:
 On Wednesday, 23 January 2013 at 06:22:55 UTC, Philippe Sigaud 
 wrote:

  Plenty of stuff that simplifies a whole lot of stuff. D is 
 indeed the language I always wanted :)

Sigh... If only it would get into the GCC suite... faster.

Jan 23 2013

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

On 2013-16-23 07:01, Philippe Sigaud <philippe.sigaud gmail.com> wrote:

 On Wed, Jan 23, 2013 at 5:56 AM, deadalnix <deadalnix gmail.com> wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
  It's been quoted that for every 10 lines of code there's a bug.


 It is said a lot. I'd like to see hard data on that one. I'd bet that it
 varies greatly from one programmer to another, and probably from one language
 to another.

 With D, we aim for one bug every 14 lines of code :)

Can do. Who wants to patch the compiler to automatically insert those bugs? :p

-- 
Simen

Jan 22 2013

"Don" <don nospam.com> writes:

On Wednesday, 23 January 2013 at 04:56:11 UTC, deadalnix wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:
 It's been quoted that for every 10 lines of code there's a bug.

 It is said a lot. I'd like to see hard data on that one. I'd 
 bet that it varies greatly from one programmer to another, and 
 probably from one language to another.


It definitely does.

"There has been no error reported in TeX since 1994 or 1995"  -- 
Knuth, 2002.
There were 7 bugs in TeX reported between 1982 and 1995.
TeX has a lot more than 70 lines of code :-)

Jan 23 2013

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Wednesday, 23 January 2013 at 09:46:47 UTC, Don wrote:
 "There has been no error reported in TeX since 1994 or 1995"  
 -- Knuth, 2002.
 There were 7 bugs in TeX reported between 1982 and 1995.
 TeX has a lot more than 70 lines of code :-)

  Bugs in code don't always live on one line per bug; They can 
span multiple lines very easily. Some bugs are simply missing 
logic, untested cases, or variables with no default values. If you 
have a while loop and you modify the index at the wrong spot, the 
fix is to move it, so that bug spans at least two lines.

  Some bugs are known but for the most part ignored, like memory 
management in very tiny programs. Many error values returned by 
the OS and errno are ignored, but that doesn't usually have any 
catastrophic effects.
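
As an illustration of what checking errno looks like when it isn't ignored, here is a minimal C sketch built on the standard strtol. The wrapper name parse_long is hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal long, reporting failure instead
   of silently returning a clamped or partial result. */
static bool parse_long(const char *text, long *out) {
    char *end;
    errno = 0;                          /* strtol never clears errno itself */
    long value = strtol(text, &end, 10);
    if (errno == ERANGE) return false;  /* value out of range for long */
    if (end == text || *end != '\0') return false; /* no digits, or trailing junk */
    *out = value;
    return true;
}
```

Without the errno test, an out-of-range input would quietly come back as LONG_MAX or LONG_MIN, which is exactly the kind of ignored error that usually goes unnoticed.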

  Some bugs are the effect of using a macro which expands. 
Logically the code makes sense, but the macro makes it unstable at 
best, while an actual function wouldn't have the bug.

   #define min(a,b) ((a)>(b) ? (b) : (a))

   int a=1,b=2,c;
   c = min(a++, b++); //minimum of both a or b, and increase each once

   //will any of these pass?
   assert(c == 1);
   assert(a == 2);
   assert(b == 3);
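
To answer that question concretely, here is a small C sketch comparing the macro with a real function. The names MIN_MACRO, min_int, demo_macro and demo_function are hypothetical and only serve the comparison:

```c
/* The macro from above: each argument is pasted in textually, so the
   operand that gets selected is evaluated (and incremented) twice. */
#define MIN_MACRO(a, b) ((a) > (b) ? (b) : (a))

/* A real function evaluates each argument exactly once. */
static int min_int(int a, int b) {
    return a > b ? b : a;
}

/* Starts from a=1, b=2 and applies the macro to a++ and b++. */
static void demo_macro(int *a, int *b, int *c) {
    *a = 1; *b = 2;
    *c = MIN_MACRO((*a)++, (*b)++);
}

/* Same starting values, but going through the function instead. */
static void demo_function(int *a, int *b, int *c) {
    *a = 1; *b = 2;
    *c = min_int((*a)++, (*b)++);
}
```

With the macro, the comparison increments both variables and then the selected branch increments a again, giving c == 2, a == 3, b == 3, so only the last of the three asserts passes. With the function the result is c == 1, a == 2, b == 3 and all three pass.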

Jan 23 2013
