digitalmars.D - "Expressive vs. permissive languages" and bugs

reply bearophile <bearophileHUGS lycos.com> writes:
I think I have not shown this article yet, "Expressive vs. permissive
languages: Is that the question?" by Yannick Moy:

First page, with reader comments:

http://www.eetimes.com/design/eda-design/4008921/Expressive-vs-permissive-languages--Is-that-the-question-

Single page, without reader comments:

http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4008921


I think this article doesn't say anything particularly new (and I think it's a
bit biased toward Ada), but it says it in a nice and compact way, and it
discusses a topic that interests D designers, because D is designed to
avoid some of the typical bugs of C code.


The section "A simple example in C/Java/Ada": a D version of that function may
look just like the Java code. But it's probably better to add a precondition
too, one that tests that conf is not null and that num_proc is within bounds,
and raises exceptions otherwise.

The Ada version of the code takes something that can't be null, and the number
of items in the array can't be too large. So the article says:

>this makes a total of five possible errors in C, three in Java, and two in Ada.<

Ranged integers (whose checks are a special case of integer overflow checks) are a good idea, as probably is a not-null attribute for pointers/references. See also the comments about the different kinds of pointers, which have different capabilities, to avoid bugs.
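
A rough sketch of what a ranged integer looks like when emulated by hand in C. The names proc_count, make_proc_count and the bound MAX_PROC are hypothetical, not taken from the article; the point is only that the range is checked once, at construction, instead of being hoped for at every use:

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_PROC = 64 };  /* hypothetical upper bound, chosen for illustration */

/* Hand-rolled "ranged integer": values are meant to be created only
   through make_proc_count, which enforces 1..MAX_PROC. */
typedef struct {
    int value;
} proc_count;

/* Returns true and fills *out if raw is in range; returns false otherwise
   (including when out is a null pointer). */
bool make_proc_count(int raw, proc_count *out) {
    if (out == NULL || raw < 1 || raw > MAX_PROC) {
        return false;
    }
    out->value = raw;
    return true;
}
```

Unlike an Ada range type, nothing stops C code from writing the struct field directly, so this is a convention the compiler only partly enforces.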
>According to a study reported in 2003 by Andy German on military systems
varying in size from 3,000 lines of code to 300,000 lines of code, these
languages are also those in which programmers make less errors, four per
thousand lines on average for SPARK, between 4.8 and 50 per thousand lines for
Ada, between 12.5 and 500 (sic) per thousand lines for C.<

I presume the bug rate of well written D code may lie somewhere between the C and Ada ones, because D is able to avoid some of the bugs of C programs, but it's not as strict as Ada (see for example bug http://d.puremagic.com/issues/show_bug.cgi?id=3999 ).

-------------------

The comments after the article look even more interesting than the article :-)

>Plus it compiles despite a crucial bug: your parameter res should be a Proc**
and you should be assigning the result of the allocation to *res. <

To try to make the C code safer, a commenter added stuff to the C program and in doing so introduced another bug, uncaught by the compiler. Ada isn't a succinct language, but all the extra fluff you add to an Ada program is useful: it actually increases the consistency of the code. So it's not the same thing.

One of the answers is very nice and argues for a strong typedef, stronger enums, and ranged types:

>In your revised version of the C code, the types Result_t and uint8_t are
compatible with each other in expressions, despite their different purposes.
Indeed, they are compatible with every other enum and integer type and floats
under most circumstances, under various confusing and inconsistent silent
promotion rules. If you are lucky then you will sometimes get a warning, but
you can't rely on it. Even the MISRA checker allows a Result_t to be assigned
to a uint8_t, even though this almost certainly makes no sense. And in any
C-derived language (MISRA or not) you have to use one of a small fixed set of
integer types that almost never have the appropriate range for the quantity in
question. And as well as having inappropriate ranges, quantities that should
never be assignment compatible or mixable in expressions (without explicit
conversions) can be silently confused with each other.<
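
One common C workaround for the missing strong typedef is to wrap each quantity in its own single-member struct. The sketch below uses hypothetical names (sensor_id, retry_count); it is not from the article or its comments:

```c
#include <stdint.h>

/* Two quantities with the same representation but different meanings.
   Because they are distinct struct types, C refuses to mix them. */
typedef struct { uint8_t raw; } sensor_id;    /* hypothetical */
typedef struct { uint8_t raw; } retry_count;  /* hypothetical */

/* Explicit conversions are the only way in and out. */
sensor_id sensor_id_from(uint8_t raw) {
    sensor_id s = { raw };
    return s;
}

uint8_t sensor_id_raw(sensor_id s) {
    return s.raw;
}

/* retry_count r = sensor_id_from(3);   <- rejected: incompatible types */
```

The cost is the explicit wrapping and unwrapping at every boundary, which is part of why such code is rarely written this way.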
>This compiles, passes MISRA checking, and makes no sense. The if test is never
true (it should say "ActiveState[n] == INACTIVE)"). There isn't a real type
tState, just a bunch of constants. INACTIVE, being the first one, has the value
0. ActiveState, used in an expression, is merely a pointer. Pointers can be
compared with 0. This is all fundamentally bad. <
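
The code that quote refers to did not survive in this archive. A guessed reconstruction of the pattern it describes, in C, might look like this. Only tState, INACTIVE and ActiveState come from the quote; the other enum members, the array size and the function names are assumptions:

```c
/* INACTIVE is the first enumerator, so its value is 0. */
typedef enum { INACTIVE, STARTING, ACTIVE } tState;

enum { NUM_UNITS = 4 };                /* hypothetical size */
static tState ActiveState[NUM_UNITS];  /* zero-initialized: all INACTIVE */

/* Buggy form: the array name decays to a pointer, and INACTIVE (a constant
   0) is accepted as a null pointer constant. A real array is never at a
   null address, so the test is never true. Compilers may warn here. */
int is_inactive_buggy(int n) {
    (void)n;  /* the index is never used, which is the bug */
    if (ActiveState == INACTIVE) {
        return 1;
    }
    return 0;
}

/* Intended form: compare the element, not the array. */
int is_inactive_fixed(int n) {
    return ActiveState[n] == INACTIVE;
}
```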

The D compiler is able to catch that bug, yeah :-)

>Also, you cannot just dismiss returning a pointer to a local variable as a
beginner's mistake. It can be done in less obvious ways, for one thing. But the
main point is that it is obviously dangerous and should simply be forbidden.
Ada has rules that are designed to prevent such a mistake from even compiling.
They make it less permissive than C in this respect. This is a good thing.<
I agree. See also bug: http://d.puremagic.com/issues/show_bug.cgi?id=3925

 if (getuid() != 0 && getuid == 0) {
   ErrorF("only root!");
   exit(1);
 }

 for (int i=0; i != MAX_ELEMENTS; i++);
 {
   floatValues_l[i] = 0.0f;
 }

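For readers who did not spot them: in the first snippet the second operand is the bare name getuid, which is the function's address and never compares equal to 0, so the whole condition is always false. In the second snippet the stray ';' after the for header is the entire loop body, and the braces form an unrelated block. With i declared in the for header a conforming C99 or C++ compiler rejects the later use of i, but with i declared earlier (or with old for-scope rules) the code compiles and writes one element past the end. A corrected sketch of the loop, with a hypothetical function name and signature:

```c
#include <stddef.h>

/* Corrected loop: no ';' after the for header, so the braces really are
   the loop body and every element in [0, count) is cleared. */
void clear_values(float *values, size_t count) {
    for (size_t i = 0; i != count; i++) {
        values[i] = 0.0f;
    }
}
```
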
The D compiler is able to catch both bugs, yeah! :-)

----------------

Now I'd like the D language to become a bit more strict, so the compiler can catch more bugs: integral-related bugs, enum bugs, some pointer bugs, and so on.

An example: in C the order of evaluation of foo() and bar() here is not specified. In D it's better to specify it, to define the semantics of D code and to make porting D code across different compilers a bit safer:

 auto z = foo(x) + bar(y);

On the other hand, if both foo() and bar() are strongly pure functions, then the D compiler must be free to act as in C, choosing the most efficient order of evaluation of foo() and bar().

----------------

Several sources I have read seem to show that programs written in Ada contain fewer bugs than programs written in almost all other languages (except Ada subsets like SPARK, etc). And it's generally known that the time needed to debug programs is often a significant percentage of the whole programming time. Then why isn't Ada used more?

- Maybe programmers who learnt a C-like language as their first language dislike the Pascal-like syntax of Ada.
- Maybe because Ada programs are a little "logorrhoeic": you need to write a lot of code.
- Maybe because Ada isn't widespread, and professional programmers don't want to spend years of their life studying and using a language that offers low hopes of being hired elsewhere.
- Maybe because Ada is a pernickety language: every detail needs to be correct if you want to see your program compile.

But in my opinion that list misses an important point that I have not seen in those articles: to me Ada doesn't look very good for explorative programming. This means I think it's not well suited either to inventing new coding ideas or just to inventing a working solution to a programming problem.
When I have an algorithmic problem, I want to use most of my mind to think about the problem, not to care for and cuddle the compiler; otherwise it's less likely that I'll actually find a solution. So a less fussy language, such as Python, may be a good thing for the first phase of exploration (think about using MatPlotLib from the Python shell to plot data and invent ideas) and for the invention of a solution algorithm. Later, a good language has to offer ways to make the code less buggy and more rigorous, like Ada code (for example using an attribute that switches the code from dynamic typing to static typing, etc). I do this a bit when I program in D2: first I write the D2 code without const/pure/immutable, then when the code works I add those attributes.

------------------

To avoid bugs in C code I have found a tool that I didn't know about, "mygcc"; a variant of it may be written to find bugs in D code too:

http://mygcc.free.fr/overview.html

It's a kind of metalanguage: it lets you define rules, using an ugly but compact syntax, that are then applied to C code to catch bugs. From the tests it seems to work well enough, and the rules are compact. But their syntax doesn't look very good yet.

Bye,
bearophile
Oct 23 2010
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
At a meta level, I never really understood all these calls for more strictness
in D to prevent bugs.  We already have a garbage collector and rules that
prevent the most common C bugs.  I generally find that the vast majority of
time I spend debugging D code (which is more relevant than the number of bugs)
comes from one of two sources:

1.  Cases where I intentionally bypass safety features (for example, using
manual memory management, unsafe casting where the type system is just getting
in the way, or unchecked shared-memory multithreading) for performance or
convenience reasons.  In these cases what's really needed is to make existing
safe features more usable and efficient so I'm not as tempted to bypass them.

2.  High-level/domain-specific logic errors.  About the only language features
that will help here are good abstraction capabilities (so I only need to get
complicated algorithmic code right once, and can then reuse it) and good
contracts/asserts (which D mostly has, though I'd love an alwaysAssert() or
something that throws an AssertError instead of Exception, like assert(), but
isn't compiled out in release mode, like enforce()).  One other thing that
might help here is finer grained control over bounds checking.  I'd love to
leave bounds checking on for most of a program, but turn it off just for
certain performance-critical classes/structs/functions.
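
The alwaysAssert() wish can be sketched in C terms (not D, and not an existing library facility): a check macro that, unlike assert() from <assert.h>, ignores NDEBUG and so stays active in release builds. The names ALWAYS_ASSERT and checked_index are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Always-on check: reports the failed condition and aborts, regardless of
   whether NDEBUG is defined. */
#define ALWAYS_ASSERT(cond)                                      \
    do {                                                         \
        if (!(cond)) {                                           \
            fprintf(stderr, "%s:%d: check failed: %s\n",         \
                    __FILE__, __LINE__, #cond);                  \
            abort();                                             \
        }                                                        \
    } while (0)

/* Example use: a bounds-checked read that keeps its check in every build. */
int checked_index(const int *data, int length, int i) {
    ALWAYS_ASSERT(data != NULL);
    ALWAYS_ASSERT(i >= 0 && i < length);
    return data[i];
}
```

Hot loops could then index directly while the rest of the program goes through checked_index, which is a manual version of the finer-grained bounds checking asked for above.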
Oct 23 2010
parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 At a meta level, I never really understood all these calls for more strictness in
 D to prevent bugs.

Every time I suggest something I try to give a rationale for it, and you are able to shoot down my arguments. If you want a general answer I can't help you much: it's experience from coding and debugging, and from reading about how things are done in languages where bugs are much less common than in C (such languages do exist, such as Ada, SPARK and a few others).

 1.  Cases where I intentionally bypass safety features (for example, using manual
 memory management, unsafe casting where the type system is just getting in the
 way, or unchecked shared-memory multithreading) for performance or convenience
 reasons.

Originally, about three years ago, my instinct for solving this kind of problem pushed toward a solution like the one used by the Cyclone language. Later I understood why Cyclone failed (making C safer is hopeless if you also want a handy and terse language) and why the D solution was better (avoid the C solution and use language features designed to be safer from the start, like D dynamic arrays).

Still, even if Cyclone is a failure, I think it's possible to make C-style code a bit safer (not as safe as Cyclone, but safer than normal C) and keep it almost as handy as before. An example of this effort is the idea of the tagged attribute for enums. In the end the very good Don made me understand that it was a bad idea, and that enhancement request is now closed. But the tagged attribute is syntactically clean: you just add it where you need it, and in theory you don't need to change other parts of the D code.

 In these cases what's really needed is to make existing safe features
 more usable and efficient so I'm not as tempted to bypass them.
This is a good thing, right.
 2.  High-level/domain-specific logic errors.

In the last months I have been developing more and more interest in sophisticated type systems. From what I've seen, typestates and dependent types are able to catch a growing number of those higher-level bugs. But D doesn't have that kind of type system, so this is not a true option. (Once D has user-defined attributes plus more static introspection, it may be possible to define linear types with an attribute like linear. This idea is quite similar to the uniqueness ideas of Bartosz. But linear types are far simpler than dependent types, both in implementation and usage. I have recently shown a post here about dependent types.)

A problem is of course that D must be usable within a few years or right now, while it may take too many years to design dependent types that aren't too hard to use. They are still mostly an interesting research topic (although you can use them today in the ATS language), while D is a practical language designed from parts that are well understood and known to work now.

 and good contracts/asserts (which D mostly has,

A few months ago, in a post, I discussed some possible improvements to the contract system of D. Introducing "old" is probably one of the most important missing features. Some older threads about it:

http://www.digitalmars.com/d/archives/digitalmars/D/why_no_old_operator_in_function_postconditions_as_in_Eiffel_54654.html
http://www.digitalmars.com/d/archives/digitalmars/D/Communicating_between_in_and_out_contracts_98252.html

When I discussed D contract system improvements, I said something about "loop invariants" and "loop variants" too. Someone showed me that calling a pure function inside an assert positioned at the top of the loop was probably a good enough loop invariant. So maybe there is no need for explicit loop invariant syntax in D (although it would be a light syntax, since it may reuse the invariant{} syntax).

In that post about D contract system improvements I also said that loop variants aren't that important. In the meantime I have found that they may be more important than I thought. This is one item from a list of ten rules for writing higher-reliability C code:

http://spinroot.com/p10/rule2.html

The lack of a loop variant caused this famous Zune bug:

http://www.zuneboards.com/forums/zune-news/38143-cause-zune-30-leapyear-problem-isolated.html

If you don't remember what a loop variant is:

http://en.wikipedia.org/wiki/Loop_variant

I have not yet written an enhancement request that lists all the (few) improvements I've suggested for the D contract system. I was lazy.
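
To make the loop-variant point concrete, here is a C sketch in which the loop body has the same shape as the widely circulated Zune snippet (reproduced from memory, so treat the details as an assumption). The function name, signature and error convention are hypothetical. The variant is `days`: it must strictly decrease on every iteration, and the check turns the hang into a reported error:

```c
#include <stdbool.h>

static bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Converts a 1-based day count, counted from 1 January of origin_year,
   into the year it falls in. Returns 0 on success and stores the year in
   *year_out; returns -1 if the loop variant fails to decrease. */
int year_from_days_checked(int origin_year, int days, int *year_out) {
    int year = origin_year;
    while (days > 365) {
        int variant_before = days;  /* loop variant: value of days on entry */
        if (is_leap_year(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
            /* days == 366 in a leap year: nothing changes (the bug) */
        } else {
            days -= 365;
            year += 1;
        }
        if (!(days < variant_before)) {
            return -1;  /* variant violated: the loop would never end */
        }
    }
    *year_out = year;
    return 0;
}
```

With the check in place, the last day of a leap year produces an error code instead of an endless loop; the real fix is of course to handle days == 366 in the leap-year branch.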
 though I'd love an alwaysAssert() or
 something that throws an AssertError instead of Exception, like assert(), but
 isn't compiled out in release mode, like enforce(),).
This looks a bit messy.
 One other thing that might
 help here is finer grained control over bounds checking.  I'd love to leave bounds
 checking on for most of a program, but turn it off just for certain
 performance-critical classes/structs/functions.

I think a pragma may be used for this. But a better solution is of course to introduce a bit more static analysis into the D compiler, allowing it to remove some bounds checks where it infers the bounds can't be crossed. This is done by the latest version of the Oracle Java virtual machine. On the other hand, the D language currently looks designed not to require a smart back-end, so this may not be something to rely on (of course future, better D compilers are free to perform such optimizations in non-release mode).

A disadvantage of those compiler optimizations is that you can't rely on them, while you are able to rely on the performance kick gained from a pragma that locally disables bounds checking :-) Predictability is something I'm appreciating more and more in compilers/languages.

Thank you for your comments.

Bye,
bearophile
Oct 23 2010
prev sibling parent reply Roman Ivanov <isroman.del ete.km.ru> writes:
I haven't worked with Ada, but I have worked with an Ada-derived hardware
description language called VHDL and found it easier to use than
C++. Maybe that's due to the ways I used the language. Maybe not. The
ability to define different integer types seemed nice. I like clarity.

At the same time, I can easily see how that would get out of hand and
sabotage readability.


Java. Why are all the reference types nullable by default? Most of the
time when an object is assigned a null value it is wrong and should
immediately generate an exception. In the 5% of cases when I want nulls,
I can ask for them explicitly.

On 10/23/2010 9:12 AM, bearophile wrote:
 I think I have not shown this article yet, "Expressive vs. permissive
languages: Is that the question?" by Yannick Moy:
 
 First page, with reader comments:
 
 http://www.eetimes.com/design/eda-design/4008921/Expressive-vs-permissive-languages--Is-that-the-question-
 
 Single page, without reader comments:
 
 http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4008921
 
 
 I think this article doesn't say particularly new things (and I think it's a
bit biased toward Ada), but it says them in a nice and compact way, it
discusses about a topic that interests D designers, because D is designed to
avoid some of the typical bugs of C code.
Oct 23 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Roman Ivanov:

 The ability to define different integer types seemed nice. I like clarity.
It _is_ nice, if the compiler is able to enforce those bounds.

In theory I agree a lot with you that signed sizes are wrong. In practice, given that:
- D uses the silly signed/unsigned promotion rules of C
- D has no integral overflow checks yet
- D even lacks warnings about mixing signed and unsigned variables in expressions, which even GCC has for C

the result of those three facts is that using unsigned integers in D is unsafe. In the end I use signed lengths and indexes everywhere it's safe and sane to do so (but not everywhere). So I have asked for the opposite of what you and I like:

http://d.puremagic.com/issues/show_bug.cgi?id=3843

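The promotion rule in question is easy to demonstrate in C, whose behaviour D inherits here. In the sketch below (function names are hypothetical) the naive comparison converts the signed operand to unsigned first, so -1 compares as a huge positive number:

```c
/* Direct comparison: under the usual arithmetic conversions `a` is
   converted to unsigned before comparing, so -1 becomes UINT_MAX. */
int less_naive(int a, unsigned b) {
    return a < b;
}

/* Handles the sign first, then compares two values known to be
   non-negative, which gives the mathematically expected answer. */
int less_checked(int a, unsigned b) {
    return a < 0 || (unsigned)a < b;
}
```

So less_naive(-1, 1u) is 0 even though -1 is less than 1, and without an explicit warning option the compiler accepts it silently.
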
 At the same time, I can easily see how that would get out of hand and
 sabotage readability.
This is a real risk, so you need to design the syntax carefully.

 Java. Why are all the reference types nullable by default? Most of the
 time when an object is assigned a null value it is wrong and should
 immediately generate an exception. In the 5% of cases when I want nulls,
 I can ask for them explicitly.

This is a topic that was hotly discussed during the last stages of the finalization of D2 (I haven't understood why that idea was rejected in the end). Anyway, even if pointers/references in D are nullable by default now and forever, an annotation plus some semantics may still be added to tag the pointers/references you don't want to be nullable. I started an enhancement request here some time ago:

http://d.puremagic.com/issues/show_bug.cgi?id=4571

Thank you for your comments,
bye,
bearophile
Oct 23 2010