digitalmars.D - [OT] Walter about compilers

"eles" <eles eles.com> writes:
Hi everybody,

I was just reading this:

http://www.laputan.org/metamorphosis/metamorphosis.html#SoftwareTectonics

(a thing about software architectures).

The text opens with...:

"We like it when people always want more! Otherwise, we'd be out 
of the upgrade business. Sometimes, people ask me what I will do 
when the compiler is done. Done? No software program that is 
selling is ever done!
-- Walter Bright, C++ compiler architect"

So... the question is: does that quote also apply to dmd? :)
Jan 22 2013
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Tuesday, 22 January 2013 at 13:54:08 UTC, eles wrote:

 The text opens with...:

 "We like it when people always want more! Otherwise, we'd be 
 out of the upgrade business. Sometimes, people ask me what I 
 will do when the compiler is done. Done? No software program 
 that is selling is ever done!
 -- Walter Bright, C++ compiler architect"

 So... the question is: does that quote also apply to dmd? :)

It's been quoted that for every 10 lines of code there's a bug. There are programs with tens of thousands of lines of code, so finding every bug is probably impossible for large programs (above 1000 lines). But that doesn't mean they can't run very, very well.

A number of bugs come from unchecked work; addition, for example, is perhaps the simplest of operations. Are you going to check after every little + that you didn't have an overflow? Without a lot of extra work, are you going to include checks that ensure they can't break? C example:

    //code looks okay
    void* getMemory(int a, int b) {
        return malloc(a + b);
    }

    //becomes negative due to overflow. it can happen
    //probably returns NULL. I don't know..
    void* ptr = getMemory(0x7fffffff, 0x7fffffff);

    //overflow free version?
    void* getMemory(unsigned int a, unsigned int b) {
        //UINT_MAX is from <limits.h>; widen before adding so the sum can't wrap
        //(assumes long long is wider than unsigned int, so no third cast is needed)
        assert(((long long) a) + ((long long) b) <= UINT_MAX);
        return malloc(a + b);
    }

    //should assert now
    void* ptr = getMemory(UINT_MAX, UINT_MAX);

Part of the process is not only fixing bugs and improving the compiler; there are also new features that get requested, ones you find necessary yet never needed before you thought about them. Consider: a recent project of mine that I hadn't updated in over a year and a half seemed to have a bug in how it handled a certain feature, which had only just been brought up. I needed to add about 10 lines of code to handle it; then I found a bug within those 10 lines (after it was working).

With that in mind, it's likely no program will ever be 'done', but if it does the job well enough then it's probably good enough. So the answer is probably yes, it applies to dmd.
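
A minimal sketch of the same idea using size_t, the type malloc actually takes. The names add_sizes, get_memory_checked and get_memory_demo are hypothetical and not from the post; the check is done before the addition so nothing ever wraps, and the demo assumes a 32-byte allocation succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores a + b in *sum and returns true,
   or returns false if the sum would not fit in size_t. */
bool add_sizes(size_t a, size_t b, size_t *sum)
{
    if (a > SIZE_MAX - b)   /* tested before adding, so nothing wraps */
        return false;
    *sum = a + b;
    return true;
}

/* Hypothetical overflow-aware variant of getMemory: returns NULL when
   the total size overflows instead of allocating a too-small block. */
void *get_memory_checked(size_t a, size_t b)
{
    size_t total;
    if (!add_sizes(a, b, &total))
        return NULL;
    return malloc(total);
}

/* Usage sketch. */
void get_memory_demo(void)
{
    void *p = get_memory_checked(16, 16);
    assert(p != NULL);      /* assumes a 32-byte allocation succeeds */
    free(p);
    assert(get_memory_checked(SIZE_MAX, 1) == NULL);
}
```

Returning NULL rather than asserting lets the caller treat an overflowing request like any other failed allocation, which may or may not be what a given program wants.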
Jan 22 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2013 6:44 AM, Era Scarecrow wrote:
   It's been quoted that for every 10 lines of code there's a bug.

I've been doing some refactoring in dmd now and then. Every time I do, the process exposes latent bugs. On the one hand, that's discouraging, on the other hand, I think it shows the value in refactoring into a better design.
Jan 22 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
 It's been quoted that for every 10 lines of code there's a bug. 
 There are programs with tens of thousands of lines of code, so 
 finding every bug is probably impossible for large programs 
 (above 1000 lines).

I love how >1kloc is "large" :D I'd say anything under 100kloc is a small program. 100kloc-1mloc medium, and >1mloc large.
Jan 22 2013
"bearophile" <bearophileHUGS lycos.com> writes:
Era Scarecrow:

 Are you going to check after every little + that you
 didn't have an overflow?

In debug mode that's the job of a modern well designed language, just like checking an index is inside the bounds of an array every time you perform an array access. Bye, bearophile
Jan 22 2013
"deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 22 January 2013 at 14:59:48 UTC, Peter Alexander 
wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:
 It's been quoted that for every 10 lines of code there's a 
 bug. There are programs with tens of thousands of lines of 
 code, so finding every bug is probably impossible for large 
 programs (above 1000 lines).

I love how >1kloc is "large" :D I'd say anything under 100kloc is a small program. 100kloc-1mloc medium, and >1mloc large.

It really depends if we are talking about java or not.
Jan 22 2013
"eles" <eles eles.com> writes:
On Tuesday, 22 January 2013 at 14:59:48 UTC, Peter Alexander 
wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:

 I'd say anything under 100kloc is a small program. 
 100kloc-1mloc medium, and >1mloc large.

That means (at least) 100k bugs. Happy fixing!
Jan 22 2013
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On 2013-01-22, 15:59, Peter Alexander wrote:

 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
 It's been quoted that for every 10 lines of code there's a bug. There  
 are programs with tens of thousands of lines of code, so finding every  
 bug is probably impossible for large programs (above 1000 lines).

I love how >1kloc is "large" :D I'd say anything under 100kloc is a small program. 100kloc-1mloc medium, and >1mloc large.

It's context dependent, of course. Finding all the bugs in 1kloc is doable, but lots of work. Finding all the bugs in 10kloc, conceivably doable, but unlikely to be worth it. >= 100kloc? ouch. -- Simen
Jan 22 2013
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 22 January 2013 at 15:26:28 UTC, deadalnix wrote:
 On Tuesday, 22 January 2013 at 14:59:48 UTC, Peter Alexander 
 wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:
 It's been quoted that for every 10 lines of code there's a 
 bug. There are programs with tens of thousands of lines of 
 code, so finding every bug is probably impossible for large 
 programs (above 1000 lines).

I love how >1kloc is "large" :D I'd say anything under 100kloc is a small program. 100kloc-1mloc medium, and >1mloc large.

It really depends if we are talking about java or not.

Not just Java. According to Wikipedia Debian 5 has over 300 million lines of code.

http://en.wikipedia.org/wiki/Source_lines_of_code

Last time I counted, Phobos has ~200kloc.
Jan 22 2013
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Tuesday, 22 January 2013 at 15:11:41 UTC, bearophile wrote:
 Era Scarecrow:

 Are you going to check after every little + that you didn't 
 have an overflow?

In debug mode that's the job of a modern well designed language, just like checking an index is inside the bounds of an array every time you perform an array access.

Agreed. However, D compilers don't have an option to check those. I think it was requested but Walter said no (due to slower speed, I think); therefore, if the compiler won't do it for you, you have to do it yourself.

I really wouldn't want to have to use BigInt for everything that can't overflow and then check to make sure I can fit it in my smaller variables afterwards along with the extra move. I wouldn't want to use BigInts everywhere, and long's aren't needed everywhere either.

Of course, if an attribute were added that checked just those functions for important overflows then it could help, but in truth it kinda clutters the signatures with something that isn't an important attribute. Guess 'CheckedInt' could work in those cases, but that's more for runtime and release than for debugging.
Jan 22 2013
"jerro" <a a.com> writes:
 Not just Java. According to Wikipedia Debian 5 has over 300 
 million lines of code.

It also consists of over 20000 packages. It is not one program.
Jan 22 2013
"Thiez" <thiezz gmail.com> writes:
On Tuesday, 22 January 2013 at 16:31:20 UTC, Era Scarecrow wrote:
 I really wouldn't want to have to use BigInt for everything 
 that can't overflow and then check to make sure I can fit it in 
 my smaller variables afterwards along with the extra move. I 
 wouldn't want to use BigInts everywhere, and long's aren't 
 needed everywhere either.

Since D aims to emulate C in this aspect, overflow with uints is probably defined as a wrap-around (like C). In this case it seems to me the check for overflow would simply be '(a+b)<a', no need to cast to longs and BigInts and all that. Of course this may not apply to signed ints...
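
As a rough C sketch of that check: unsigned arithmetic wraps modulo 2^N, so comparing the sum with one operand detects the wrap after the fact. Signed overflow is undefined behavior in C, so the signed case has to be tested before adding. The function names uadd_wraps and sadd_overflows are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned addition wraps modulo 2^N in C, so the sum is smaller than
   an operand exactly when the addition wrapped. */
bool uadd_wraps(unsigned int a, unsigned int b)
{
    return a + b < a;
}

/* Signed overflow is undefined behavior in C, so the test must run
   before the addition and must not overflow itself. */
bool sadd_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    return a < INT_MIN - b;       /* a + b would fall below INT_MIN */
}
```

The unsigned comparison relies on the operands being at least as wide as int; narrower unsigned types are promoted to int before the addition on typical platforms, so the sum would not wrap and the test would never fire.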
Jan 22 2013
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Tuesday, 22 January 2013 at 17:10:35 UTC, Thiez wrote:
 On Tuesday, 22 January 2013 at 16:31:20 UTC, Era Scarecrow 
 wrote:
 I really wouldn't want to have to use BigInt for everything 
 that can't overflow and then check to make sure I can fit it 
 in my smaller variables afterwards along with the extra move. 
 I wouldn't want to use BigInts everywhere, and long's aren't 
 needed everywhere either.

Since D aims to emulate C in this aspect, overflow with uints is probably defined as a wrap-around (like C). In this case it seems to me the check for overflow would simply be '(a+b)<a', no need to cast to longs and BigInts and all that. Of course this may not apply to signed ints...

That merely shortens the size of the check, not where you need to place the checks or how often. Truthfully, in almost all cases the wrap-around or overflow/underflow is an error, usually unchecked. If 1 million were the max, then 1,000,000 + 1 should equal 1,000,001 and not <=0, and if 0 is the minimum, 0 - 1 should not equal >=0.

The only real time I can find overflow wanted is while making something that watches for it explicitly to make use of it. Say we emulate or write the 'ucent' types. That could be done as:

    //addition example obviously
    void add(const uint[4] lhs, const uint[4] rhs) {
        uint[4] val;
        bool carry = false;

        foreach(i, ref v; val) {
            uint tmp = lhs[i];
            v = lhs[i] + rhs[i] + (carry ? 1 : 0);
            //wrapped if the sum shrank, or stayed equal with a carry coming in
            carry = v < tmp || (carry && v == tmp);
        }

        assert(!carry); //could fail. How to handle this? Ignore?
    }

Now let's say there's a for loop where someone decides to be clever and use a ubyte (unsigned char) as an index or counter.

    for(ubyte i = 0; i < 1000; i++) {
        writeln(i);
    }

The overflow is an error because the wrong type was selected, but that doesn't change the obvious logic behind it. You can hide the type behind an alias or similar, but that doesn't change the fact that it's a bug. It would be easier to detect if we were made aware the overflow is happening at all, rather than having the loop get stuck and needing to kill the process manually or step through it in a debugger. If the loop weren't producing output you could recognize, the bug would be much harder to find.

Encryption may make use of the overflow/wrap-around, but far more likely it uses xor or other binary operations, which don't have those problems.
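
For comparison, a C sketch of the same multi-word addition that computes the carry in a 64-bit accumulator instead of inferring it from a comparison. The name add128 and the limb layout (least significant word first) are assumptions for illustration; the carry out of the top word is returned to the caller rather than asserted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Adds two 128-bit values stored as four 32-bit limbs, least
   significant limb first. Returns the carry out of the top limb. */
bool add128(const uint32_t lhs[4], const uint32_t rhs[4], uint32_t out[4])
{
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        /* The 64-bit sum cannot wrap, so the carry is simply bit 32. */
        uint64_t sum = (uint64_t)lhs[i] + rhs[i] + carry;
        out[i] = (uint32_t)sum;          /* low 32 bits of the limb sum */
        carry = (uint32_t)(sum >> 32);   /* 0 or 1 */
    }
    return carry != 0;
}
```

Handing the final carry back leaves the "how to handle this?" decision with the caller, who can wrap, saturate, or report an error.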
Jan 22 2013
"deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
  It's been quoted that for every 10 lines of code there's a bug.

It is said a lot. I'd like to see hard data on that one. I'd bet that it varies greatly from one programmer to another, and probably from one language to another.
Jan 22 2013
Philippe Sigaud <philippe.sigaud gmail.com> writes:
On Wed, Jan 23, 2013 at 5:56 AM, deadalnix <deadalnix gmail.com> wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
  It's been quoted that for every 10 lines of code there's a bug.

It is said a lot. I'd like to see hard data on that one. I'd bet that it varies greatly from one programmer to another, and probably from one language to another.

With D, we aim for one bug every 14 lines of code :)
Jan 22 2013
"eles" <eles eles.com> writes:
On Wednesday, 23 January 2013 at 06:22:55 UTC, Philippe Sigaud 
wrote:
 On Wed, Jan 23, 2013 at 5:56 AM, deadalnix 
 <deadalnix gmail.com> wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow


Add to that the fact that programs in D tend to be shorter than their C or even C++ equivalents!
Jan 22 2013
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On 2013-16-23 07:01, Philippe Sigaud <philippe.sigaud gmail.com> wrote:

 On Wed, Jan 23, 2013 at 5:56 AM, deadalnix <deadalnix gmail.com> wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow wrote:
  It's been quoted that for every 10 lines of code there's a bug.

It is said a lot. I'd like to see hard data on that one. I'd bet that it varies greatly from one programmer to another, and probably from one language to another.

With D, we aim for one bug every 14 lines of code :)

Can do. Who wants to patch the compiler to automatically insert those bugs? :p -- Simen
Jan 22 2013
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Wednesday, 23 January 2013 at 07:33:22 UTC, eles wrote:
 On Wednesday, 23 January 2013 at 06:22:55 UTC, Philippe Sigaud 
 wrote:
 With D, we aim for one bug every 14 lines of code :)

Add to that the fact that programs in D tend to be shorter than their C or even C++ equivalents!

Less boilerplate code, fewer direct pointers, no preprocessor macros. Code that might be ambiguous because of operator precedence forces you (or sternly warns you) to use parentheses for what you intend, rather than relying on a long set of complex rules. Templates are easier to make and use (and you need fewer of them). No header files (and all the duplication or annoying separation that comes with them). Assignment in certain locations is illegal. Oh yes, no ugly STL, and a lot more. Plenty of stuff that simplifies a whole lot of stuff. D is indeed the language I always wanted :)
Jan 22 2013
"eles" <eles eles.com> writes:
On Wednesday, 23 January 2013 at 07:57:38 UTC, Era Scarecrow 
wrote:
 On Wednesday, 23 January 2013 at 07:33:22 UTC, eles wrote:
 On Wednesday, 23 January 2013 at 06:22:55 UTC, Philippe Sigaud 
 wrote:

indeed the language I always wanted :)

Sigh... If only it would go into the gcc suite... faster.
Jan 23 2013
"Don" <don nospam.com> writes:
On Wednesday, 23 January 2013 at 04:56:11 UTC, deadalnix wrote:
 On Tuesday, 22 January 2013 at 14:44:26 UTC, Era Scarecrow 
 wrote:
 It's been quoted that for every 10 lines of code there's a bug.

It is said a lot. I'd like to see hard data on that one. I'd bet that it varies greatly from one programmer to another, and probably from one language to another.

It definitely does.

"There has been no error reported in TeX since 1994 or 1995" -- Knuth, 2002.
There were 7 bugs in TeX reported between 1982 and 1995.
TeX has a lot more than 70 lines of code :-)
Jan 23 2013
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Wednesday, 23 January 2013 at 09:46:47 UTC, Don wrote:
 "There has been no error reported in TeX since 1994 or 1995"  
 -- Knuth, 2002.
 There were 7 bugs in TeX reported between 1982 and 1995.
 TeX has a lot more than 70 lines of code :-)

Bugs in code don't always live on one line per bug; they can easily span several. Some bugs are simply missing logic, untested cases, or variables with no default values. If a while loop modifies its index at the wrong spot, you need to move that modification, making it a bug that spans at least two lines.

Some bugs are known but for the most part ignored, like memory management in very tiny programs. Many error values returned by the OS and errno are ignored, but that doesn't usually have any catastrophic effects.

Some bugs are the effect of using a macro which expands. Logically it makes sense, but the macro makes it unstable at best, while an actual function wouldn't have the bug.

    #define min(a,b) ((a)>(b) ? (b) : (a))

    int a=1,b=2,c;
    c = min(a++, b++); //minimum of both a or b, and increase each once

    //will any of these pass?
    assert(c == 1);
    assert(a == 2);
    assert(b == 3);
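
To make the effect concrete, here is a small C sketch that puts the macro next to an ordinary function. MIN_MACRO is the same macro under a different name; min_int and min_demo are hypothetical names added for the comparison. With the macro, the selected operand is evaluated a second time, so only the last of the three assertions above would hold.

```c
#include <assert.h>

/* Same expansion as the min macro above, under a different name. */
#define MIN_MACRO(a, b) ((a) > (b) ? (b) : (a))

/* A real function evaluates each argument exactly once. */
int min_int(int a, int b)
{
    return a > b ? b : a;
}

void min_demo(void)
{
    int a = 1, b = 2, c;

    /* Expands to ((a++) > (b++) ? (b++) : (a++)): the comparison
       increments both, then the chosen branch increments a again. */
    c = MIN_MACRO(a++, b++);
    assert(c == 2 && a == 3 && b == 3);

    a = 1;
    b = 2;
    c = min_int(a++, b++);   /* each argument is evaluated once */
    assert(c == 1 && a == 2 && b == 3);
}
```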
Jan 23 2013