digitalmars.D - "Expressive vs. permissive languages" and bugs


bearophile <bearophileHUGS lycos.com> writes:

I think I have not shown this article yet, "Expressive vs. permissive
languages: Is that the question?" by Yannick Moy:

First page, with reader comments:

http://www.eetimes.com/design/eda-design/4008921/Expressive-vs-permissive-languages--Is-that-the-question-

Single page, without reader comments:

http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4008921

I think this article doesn't say anything particularly new (and I think it's a
bit biased toward Ada), but it says it in a nice and compact way, and it
discusses a topic that interests D designers, because D is designed to
avoid some of the typical bugs of C code.

The section "A simple example in C/Java/Ada": a D version of that function may
look just like the Java code. But it's probably better to add a precondition
too, one that tests that conf is not null and that num_proc is within bounds,
and raises exceptions otherwise.
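
A minimal C sketch of such precondition checks. The article's real declarations
are not reproduced in this post, so the Conf and Proc layouts, the MAX_PROC
bound and the status codes below are all assumptions made for illustration. C
has no exceptions, so status codes stand in for them here:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types: stand-ins for the article's declarations. */
typedef struct { int id; } Proc;
typedef struct { int num_proc; } Conf;

enum { MAX_PROC = 64 };  /* assumed upper bound, chosen for illustration */

typedef enum {
    ALLOC_OK,
    ALLOC_NULL_ARG,
    ALLOC_BAD_COUNT,
    ALLOC_NO_MEMORY
} AllocStatus;

/* Checks its preconditions before allocating; *res is written only on
   success. res is a Proc** so the caller really receives the array. */
AllocStatus alloc_procs(const Conf *conf, Proc **res)
{
    if (conf == NULL || res == NULL)
        return ALLOC_NULL_ARG;
    if (conf->num_proc < 1 || conf->num_proc > MAX_PROC)
        return ALLOC_BAD_COUNT;

    Proc *procs = calloc((size_t)conf->num_proc, sizeof *procs);
    if (procs == NULL)
        return ALLOC_NO_MEMORY;

    *res = procs;
    return ALLOC_OK;
}
```

In Ada the null check and the range check would be part of the parameter types;
here they are ordinary run-time tests that every caller has to remember to
handle.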

The Ada version of the code takes something that can't be null, and the number
of items in the array can't be too large. So the article says:

>this makes a total of five possible errors in C, three in Java, and two in Ada.<

Ranged integers (whose checks are a special case of integer overflow checks)
are a good idea (as is, probably, a not-null attribute for pointers/references).

See also the comments about the different kinds of pointers, that have
different capabilities, to avoid bugs.

>According to a study reported in 2003 by Andy German on military systems
varying in size from 3,000 lines of code to 300,000 lines of code, these
languages are also those in which programmers make less errors, four per
thousand lines on average for SPARK, between 4.8 and 50 per thousand lines for
Ada, between 12.5 and 500 (sic) per thousand lines for C.<

I presume the bug rate of well-written D code may lie somewhere between those
of C and Ada, because D is able to avoid some of the bugs of C programs, but
it's not as strict as Ada (see for example bug
http://d.puremagic.com/issues/show_bug.cgi?id=3999 ).

-------------------

The comments after the article look even more interesting than the article :-)

>Plus it compiles despite a crucial bug: your parameter res should be a Proc**
and you should be assigning the result of the allocation to *res.<

To try to make the C code safer, that commenter has added stuff to the C
program, and has introduced another bug, uncaught by the compiler. Ada isn't a
succinct language, but all the extra fluff you add to an Ada program is
useful: it actually increases the consistency of the code. So it's not the
same thing.

One of the answers is very nice and speaks for a strong typedef, stronger enum,
ranged types:

>In your revised version of the C code, the types Result_t and uint8_t are
compatible with each other in expressions, despite their different purposes.
Indeed, they are compatible with every other enum and integer type and floats
under most circumstances, under various confusing and inconsistent silent
promotion rules. If you are lucky then you will sometimes get a warning, but
you can't rely on it. Even the MISRA checker allows a Result_t to be assigned
to a uint8_t, even though this almost certainly makes no sense. And in any
C-derived language (MISRA or not) you have to use one of a small fixed set of
integer types that almost never have the appropriate range for the quantity in
question. And as well as having inappropriate ranges, quantities that should
never be assignment compatible or mixable in expressions (without explicit
conversions) can be silently confused with each other.<
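
One way to approximate a strong typedef in plain C is to wrap each quantity in
its own struct. This is only a sketch: the Channel and Result types, the 0..15
range and the function names are assumptions for illustration, not the
commenter's code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t value; } Channel;  /* hypothetical: a channel number */
typedef struct { int code; } Result;        /* hypothetical: a status code */

enum { CHANNEL_MAX = 15 };  /* assumed valid range is 0..CHANNEL_MAX */

/* The only intended way to build a Channel: the range is checked in one
   place, which loosely imitates an Ada ranged type at run time. */
bool channel_make(int n, Channel *out)
{
    if (n < 0 || n > CHANNEL_MAX)
        return false;
    out->value = (uint8_t)n;
    return true;
}

Result result_ok(void)
{
    Result r = { 0 };
    return r;
}

/* Channel c = result_ok();   is rejected by the compiler: the two struct
   types are incompatible. With a plain enum and a plain uint8_t typedef the
   equivalent assignment is accepted silently. */
```

The cost is some verbosity at every use, which is part of the trade-off the
commenter is pointing at.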

>This compiles, passes MISRA checking, and makes no sense. The if test is never
true (it should say "ActiveState[n] == INACTIVE"). There isn't a real type
tState, just a bunch of constants. INACTIVE, being the first one, has the value
0. ActiveState, used in an expression, is merely a pointer. Pointers can be
compared with 0. This is all fundamentally bad.<

The D compiler is able to catch that bug, yeah :-)
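
The snippet that comment refers to is not reproduced in this post, so here is
a small C sketch of the same situation. The names ActiveState, INACTIVE and
tState come from the quoted comment; the other enumerators and the function
name are assumptions.

```c
#include <stddef.h>

typedef enum { INACTIVE, STARTING, ACTIVE } tState;  /* INACTIVE == 0 */

/* Counts the inactive entries, indexing the array as intended. */
size_t count_inactive(const tState *ActiveState, size_t len)
{
    size_t count = 0;
    for (size_t n = 0; n < len; n++) {
        /* The quoted bug wrote "ActiveState == INACTIVE" instead. An array
           used in an expression decays to a pointer and INACTIVE is the
           constant 0, so C accepts that as a null-pointer test, which is
           never true for a real array. */
        if (ActiveState[n] == INACTIVE)
            count++;
    }
    return count;
}
```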

>Also, you cannot just dismiss returning a pointer to a local variable as a
beginner's mistake. It can be done in less obvious ways, for one thing. But the
main point is that it is obviously dangerous and should simply be forbidden.
Ada has rules that are designed to prevent such a mistake from even compiling.
They make it less permissive than C in this respect. This is a good thing.<

I agree. See also bug: http://d.puremagic.com/issues/show_bug.cgi?id=3925
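
A C sketch of the mistake and of one common way around it. The function names
and the "proc-%d" label format are hypothetical; the buggy shape is kept in a
comment so that the code below stays well defined.

```c
#include <stddef.h>
#include <stdio.h>

/* Buggy shape:
 *
 *     char *label(int id) {
 *         char buf[32];
 *         snprintf(buf, sizeof buf, "proc-%d", id);
 *         return buf;        // buf no longer exists once label() returns
 *     }
 */

/* Safer shape: the caller owns the storage and passes its size.
   Returns 0 on success, -1 if the arguments are unusable or the text
   did not fit. */
int format_label(char *out, size_t out_size, int id)
{
    if (out == NULL || out_size == 0)
        return -1;
    int n = snprintf(out, out_size, "proc-%d", id);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Some C compilers warn about the simplest form of the buggy shape, but as the
comment says, less obvious variants exist, which is why a language-level rule
is attractive.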

if (getuid() != 0 && getuid == 0) {
    ErrorF("only root!");
    exit(1);
}

for (int i=0; i != MAX_ELEMENTS; i++);
{
    floatValues_l[i] = 0.0f;
}

The D compiler is able to catch both bugs, yeah! :-)

----------------

Now I'd like the D language to become a bit more strict, so that the compiler
can catch more bugs: integral-related bugs, enum bugs, some pointer bugs, and
so on.

An example:

Here in C the order of evaluation of foo() and bar() is not specified. In D
it's better to specify it, to define the semantics of D code and to make
porting D code across different compilers a bit safer:

auto z = foo(x) + bar(y);

On the other hand if both foo() and bar() are strongly pure functions, then the
D compiler must be free and able to act as in C, choosing the most efficient
order of evaluation of foo() and bar().
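
A C sketch of why the order matters once foo() and bar() have side effects.
The bodies of foo and bar and the trace buffer are made up for illustration;
only the expression foo(x) + bar(y) comes from the text above.

```c
#include <stddef.h>

/* Tiny trace buffer so that the call order is observable. */
static char trace[8];
static size_t trace_len = 0;

static void record(char c)
{
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

int foo(int x) { record('f'); return x + 1; }
int bar(int y) { record('b'); return y * 2; }

/* "foo(x) + bar(y)" lets a C compiler call either function first.
   Separate statements are sequenced, so here foo always runs before bar. */
int sum_in_order(int x, int y)
{
    int a = foo(x);
    int b = bar(y);
    return a + b;
}

const char *call_trace(void) { return trace; }
```

If foo and bar had no side effects at all, as with strongly pure functions,
the trace would not exist and either order would give the same result, which
is the case where the compiler can safely be left free to choose.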

----------------

Several sources I have read seem to show that programs written in Ada contain
fewer bugs than programs written in almost all other languages (except Ada
subsets like SPARK, etc). And it's generally known that the time needed
to debug programs is often a significant percentage of the whole programming
time.

Then why isn't Ada used more?
- Maybe programmers who have learnt a C-like language as their first language
dislike the Pascal-like syntax of Ada.
- Maybe because Ada programs are a little "logorrhoeic": you need to write a
lot of code.
- Maybe because Ada isn't widespread, and professional programmers don't want
to spend years of their life studying and using a language that offers low
hopes of being hired elsewhere.
- Maybe because Ada is a pernickety language: every detail needs to be correct
if you want to see your program compile.

But in my opinion that list misses an important point that I have not seen in
those articles: to me Ada doesn't look very good for exploratory programming.
I mean that I think it's not well suited either to inventing new coding ideas
or just to inventing a working solution to a programming problem. When I have
an algorithmic problem, I want to use most of my mind to think about the
problem, and not to coddle the compiler, otherwise it's less likely that I am
actually able to find a solution.

So a less fussy language, like Python, may be a good fit for the first phase of
exploration (think about using MatPlotLib from the Python shell to plot data
and invent ideas) and for the invention of a solution algorithm. Later, a good
language has to offer ways to make the code less buggy and more rigorous, like
Ada code (for example using an attribute that switches the code from dynamic
typing to static typing, etc).

I do this a bit when I program in D2: first I write the D2 code without
const/pure/immutable, then when the code works I add those attributes.

------------------

To avoid bugs in C code I have found a tool that I didn't know about, "mygcc";
a variant of it could be written to find bugs in D code too:

http://mygcc.free.fr/overview.html

It's a kind of metalanguage: it lets you define rules, using an ugly but
compact syntax, that are then applied to C code to catch bugs.

From the tests it seems to work well enough, and the rules are compact. But
their syntax doesn't look very good yet.

Bye,
bearophile

Oct 23 2010

dsimcha <dsimcha yahoo.com> writes:

At a meta level, I never really understood all these calls for more strictness
in D to prevent bugs.  We already have a garbage collector and rules that
prevent the most common C bugs.  I generally find that the vast majority of
time I spend debugging D code (which is more relevant than the number of bugs)
comes from one of two sources:

1.  Cases where I intentionally bypass safety features (for example, using
manual memory management, unsafe casting where the type system is just getting
in the way, or unchecked shared-memory multithreading) for performance or
convenience reasons.  In these cases what's really needed is to make existing
safe features more usable and efficient so I'm not as tempted to bypass them.

2.  High-level/domain-specific logic errors.  About the only language features
that will help here are good abstraction capabilities (so I only need to get
complicated algorithmic code right once, and can then reuse it) and good
contracts/asserts (which D mostly has, though I'd love an alwaysAssert() or
something that throws an AssertError instead of Exception, like assert(), but
isn't compiled out in release mode, like enforce()).  One other thing that
might help here is finer grained control over bounds checking.  I'd love to
leave bounds checking on for most of a program, but turn it off just for
certain performance-critical classes/structs/functions.
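
The alwaysAssert() idea can be illustrated with a rough C analogue. This is
not a D or Phobos API: the ALWAYS_ASSERT macro, the failure counter and the
abort switch are all hypothetical names. The only point is a check that does
not disappear when NDEBUG is defined, as a plain C assert() does.

```c
#include <stdio.h>
#include <stdlib.h>

static int abort_on_failure = 1;        /* default: stop like assert() does */
static unsigned long failure_count = 0;

void set_abort_on_failure(int enabled) { abort_on_failure = enabled; }
unsigned long check_failure_count(void) { return failure_count; }

/* Called when a check fails: counts it, reports it, optionally aborts. */
void check_failed(const char *expr, const char *file, int line)
{
    failure_count++;
    fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    if (abort_on_failure)
        abort();
}

/* This macro never looks at NDEBUG, so it stays active in release builds. */
#define ALWAYS_ASSERT(cond) \
    ((cond) ? (void)0 : check_failed(#cond, __FILE__, __LINE__))
```

The "count and continue" switch exists mainly so that the behaviour can be
exercised in tests; a production build would presumably leave aborting on.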

Oct 23 2010

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

> At a meta level, I never really understood all these calls for more strictness
> in D to prevent bugs.

Every time I suggest something I try to give a rationale for it, and you are
able to shoot down my arguments. If you want a general answer, I can't help
you much: it comes from experience coding and debugging, and from reading
about how things are done in languages where bugs are much less common than in
C (such languages do exist, like Ada and SPARK and a few others).


> 1.  Cases where I intentionally bypass safety features (for example, using
> manual memory management, unsafe casting where the type system is just getting
> in the way, or unchecked shared-memory multithreading) for performance or
> convenience reasons.

Originally, about three years ago, my instinct for solving this kind of problem
was pushing toward a solution like the one used by the Cyclone language. Later
I understood why Cyclone failed (making C safer is hopeless if you also want a
handy and terse language) and why the D solution was better (avoid the C
solutions and use language features designed to be safer from the start, like
D dynamic arrays).

Still, even if Cyclone is a failure, I think that it's possible to make
C-style code a bit safer (not as safe as Cyclone, but safer than normal C)
and keep it almost as handy as before. An example of this effort is the idea of
the @tagged attribute for enums. In the end the very good Don made me
understand that it was a bad idea, and that enhancement request is now closed,
but the @tagged attribute is syntactically clean: you just add it where you
need it, and in theory you don't need to change other parts of the D code.


> In these cases what's really needed is to make existing safe features
> more usable and efficient so I'm not as tempted to bypass them.

This is a good thing, right.


> 2.  High-level/domain-specific logic errors.

In the last months I have been developing more and more interest in
sophisticated type systems. From what I've seen, typestates and dependent types
are able to catch a growing number of those higher-level bugs. But D doesn't
have that kind of type system, so this is not a true option. (Once D has
user-defined attributes plus more static introspection, it may be possible to
define linear types with an attribute like @linear. This idea is quite similar
to the uniqueness ideas of Bartosz. Linear types are far simpler than dependent
types, both in implementation and in usage. I have recently shown a post here
about dependent types.) A problem is of course that D must be usable within a
few years or right now, while it may take too many years to design dependent
types that aren't too hard to use. They are still mostly an interesting
research topic (although you can use them today in the ATS language), while D
is a practical language designed with parts that are well understood and known
to work now.


> and good contracts/asserts (which D mostly has,

A few months ago, in a post, I discussed some possible improvements to the
contract system of D. Introducing "old" is probably one of the most
important missing features. Some older threads about it:
http://www.digitalmars.com/d/archives/digitalmars/D/why_no_old_operator_in_function_postconditions_as_in_Eiffel_54654.html
http://www.digitalmars.com/d/archives/digitalmars/D/Communicating_between_in_and_out_contracts_98252.html

When I discussed the D contract system improvements, I said something about
"loop invariants" and "loop variants" too. Someone showed me that calling a
pure function inside an assert positioned at the top of the loop is probably a
good enough loop invariant. So maybe there is no need for an explicit syntax
for loop invariants in D (even though the syntax would be light, since it
could reuse the invariant{} syntax).
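
The "assert at the top of the loop" idea, sketched in C rather than D. The
insertion sort and the helper name are picked only for illustration; the
helper has no side effects, which plays the role of the pure function
mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Side-effect-free helper: true if a[0..n) is in ascending order. */
bool is_sorted_prefix(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return false;
    return true;
}

/* Insertion sort. The assert at the top of the loop is its invariant:
   before iteration i, the first i elements are already sorted. */
void insertion_sort(int *a, size_t len)
{
    for (size_t i = 1; i < len; i++) {
        assert(is_sorted_prefix(a, i));   /* loop invariant */
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```

The check makes each iteration linear in i, so it is only suitable for debug
builds, where assert() is still compiled in.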

In that post about D contract system improvements I also said that loop
variants aren't that important. In the meantime I have found that they may be
more important than I thought. This is one item from a list of ten rules for
writing higher-reliability C code:
http://spinroot.com/p10/rule2.html
The lack of a loop variant caused this famous Zune bug:
http://www.zuneboards.com/forums/zune-news/38143-cause-zune-30-leapyear-problem-isolated.html

If you don't remember what a loop variant is:
http://en.wikipedia.org/wiki/Loop_variant
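
A C sketch of a loop variant on the same kind of day-counting loop. This is
not the Zune source: the function names and the 1-based day convention are
assumptions. As reported, the faulty loop left its day counter unchanged on
the last day of a leap year, so it never terminated; a variant check turns
that kind of silent hang into an immediate assertion failure.

```c
#include <assert.h>
#include <stdbool.h>

static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

static int days_in_year(int year)
{
    return is_leap_year(year) ? 366 : 365;
}

/* Converts a 1-based day count that starts at January 1 of start_year
   (days >= 1 is assumed) into the year that contains that day.
   The loop variant is "days": it must stay positive and strictly decrease
   on every iteration, which is what proves that the loop terminates. */
int year_of_day(int days, int start_year)
{
    int year = start_year;
    while (days > days_in_year(year)) {
        int variant_before = days;
        days -= days_in_year(year);
        year++;
        assert(days > 0 && days < variant_before);   /* loop variant */
    }
    return year;
}
```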

I have not yet written an enhancement request that lists all the (few)
improvements I've suggested for the D contract system. I was lazy.


> though I'd love an alwaysAssert() or
> something that throws an AssertError instead of Exception, like assert(), but
> isn't compiled out in release mode, like enforce()).

This looks a bit messy.


> One other thing that might
> help here is finer grained control over bounds checking.  I'd love to leave
> bounds checking on for most of a program, but turn it off just for certain
> performance-critical classes/structs/functions.

I think a pragma may be used for this.
But a better solution is of course to introduce a bit more static analysis
into the D compiler, which would allow it to remove some bounds checks where
it infers that the bounds can't be crossed. This is done by the latest version
of the Oracle Java VM. On the other hand the D language currently looks
designed not to require a smart back-end, so this may not be something to rely
on (of course future, better D compilers are free to perform such
optimizations in non-release mode). A disadvantage of those compiler
optimizations is that you can't rely on them, while you are able to rely on
the performance kick gained from a pragma that locally disables bounds
checking :-) Predictability is something I'm appreciating more and more in
compilers/languages.
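
C has no built-in bounds checking, so the following is only an analogy for the
"checked by default, unchecked where it matters" idea. The Slice type and both
function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { const double *data; size_t len; } Slice;

/* Checked access: the default for most of a program. */
bool slice_get(Slice s, size_t i, double *out)
{
    if (i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}

/* Hot loop: the loop condition already establishes i < s.len, so the
   per-element check is skipped on purpose. */
double slice_sum(Slice s)
{
    double total = 0.0;
    for (size_t i = 0; i < s.len; i++)
        total += s.data[i];   /* unchecked: i < s.len holds here */
    return total;
}
```

In slice_sum the programmer has done by hand the inference that a smarter
compiler could do automatically; a local pragma would make the same choice
explicit and predictable.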

Thank you for your comments.

Bye,
bearophile

Oct 23 2010

Roman Ivanov <isroman.del ete.km.ru> writes:

I haven't worked with Ada, but I have worked with an Ada-derived hardware
description language called VHDL and found it easier to use than
C++. Maybe that's due to the ways I used the language. Maybe not. The
ability to define different integer types seemed nice. I like clarity.

At the same time, I can easily see how that would get out of hand and
sabotage readability.


Java. Why are all the reference types nullable by default? Most of the
time when an object is assigned a null value it is wrong and should
immediately generate an exception. In the 5% of cases when I want nulls,
I can ask for them explicitly.


Oct 23 2010

bearophile <bearophileHUGS lycos.com> writes:

Roman Ivanov:

> The ability to define different integer types seemed nice. I like clarity.

It _is_ nice, if the compiler is able to enforce those bounds.




In theory I agree a lot with you that signed sizes are wrong. In practice,
given that:
- D uses the silly signed/unsigned promotion rules of C
- D has no integral overflow checks yet
- D even lacks a warning about mixing signed and unsigned variables in
expressions, which even GCC has for C

The result of those three facts is that signed integers are the safer choice
in D. In the end I use signed lengths and indexes everywhere it's safe and sane
to do so (but not everywhere).
So I have asked for the opposite of what you and I would like:
http://d.puremagic.com/issues/show_bug.cgi?id=3843
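
Since D inherits these promotion rules from C, the problem can be shown with a
short C sketch. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain C comparison: the int is converted to unsigned first, so -1
   becomes a huge positive value and the result is surprising. */
bool naive_less(int a, unsigned b)
{
    return a < b;
}

/* Checks the sign before comparing, so negative values sort first. */
bool safe_less(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}

/* Index of the last element. With an unsigned length, "len - 1" wraps
   around to a huge value when len == 0; a signed result can report
   "no such element" as -1 instead. */
ptrdiff_t last_index(size_t len)
{
    return len == 0 ? -1 : (ptrdiff_t)(len - 1);
}
```

Here naive_less(-1, 1u) is false, while safe_less(-1, 1u) is true. Without
overflow checks or a mixed-sign warning nothing points at the first function,
which is the practical argument for signed lengths made above.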


> At the same time, I can easily see how that would get out of hand and
> sabotage readability.

This is a real risk, so you need to design the syntax carefully.



> Java. Why are all the reference types nullable by default? Most of the
> time when an object is assigned a null value it is wrong and should
> immediately generate an exception. In the 5% of cases when I want nulls,
> I can ask for them explicitly.

This is a topic that was hotly discussed during the last stages of the
finalization of D2 (I haven't understood why that idea was rejected in the end).

Anyway, even if now and forever pointers/references in D are nullable by
default, an annotation plus some semantics may still be added to tag the
pointers/references you don't want to be nullable. I started an enhancement
request here some time ago:
http://d.puremagic.com/issues/show_bug.cgi?id=4571

Thank you for your comments,
bye,
bearophile

Oct 23 2010
