digitalmars.D - Built-in unsafety in D

bearophile <bearophileHUGS lycos.com> writes:
This is a follow-up to this thread, and to other older threads on this topic:

This is a nice article written in 2005 by Thomas Guest, "Built-in Type Safety?":

It shows some bugs common in C++ code that I really, really hope D will help
avoid. It's 2010, so it's about time. (Note: C# 4 gives ways to avoid all of
them.)

For me having a way to avoid most of those bugs is more important than:
- Having a good operator overload system;
- Having a way to break/ignore circular imports;
- Having actors;
- Having transitive immutability;
- Having true closures;
- Having good data structures in the standard library;
- Having efficient literal arrays;
- Having fast associative arrays, built-in or in a library;
- Having an efficient dynamic array append;
- Changing the semantics of fixed-sized arrays so that they are returned by value;
- etc.

This is a compressed version of the function shown near the top of that article:

void signalUpdate(Signal update, Signal & stored) {
    int const tolerance = 10;
    if ((update > stored + tolerance) || (update < stored - tolerance)) {
        stored = update;
    }
}

The bug was caused by:
typedef unsigned Signal;

Quoting the article:

So, the expression in signalsDifferent(10, 10, 20) evaluates:
10u > 10u + 20 || 10u < 10u - 20
Now, when you subtract an int from an unsigned, both values are promoted to
unsigned and the result is unsigned. So, 10u - 20 is a very big unsigned
number. Our expression is therefore equivalent to:
false || true
which is of course true.
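A minimal C++ sketch of the quoted failure and one possible fix. The helper names differsBuggy and differsFixed are hypothetical (modeled on the article's signalsDifferent, whose exact signature isn't shown here), and the argument order (update, stored, tolerance) is an assumption. The fix does the arithmetic in a wider signed type; it is one option among several, not necessarily what the article recommends.

```cpp
#include <cstdint>

typedef unsigned Signal;

// Hypothetical helper modeled on the quoted expression (argument order is
// an assumption). Buggy: in `stored - tolerance` the int is converted to
// unsigned, so 10u - 20 wraps around to a huge value.
bool differsBuggy(Signal update, Signal stored, int tolerance) {
    return (update > stored + tolerance) || (update < stored - tolerance);
}

// One possible fix: compute the difference in a signed 64-bit type, which
// can hold the difference of any two 32-bit unsigned values without
// wrapping (assumes unsigned is at most 32 bits, as on common platforms).
bool differsFixed(Signal update, Signal stored, int tolerance) {
    std::int64_t diff = static_cast<std::int64_t>(update) -
                        static_cast<std::int64_t>(stored);
    std::int64_t tol = tolerance;
    return diff > tol || diff < -tol;
}
```

With these definitions, differsBuggy(10, 10, 20) returns true, while differsFixed(10, 10, 20) returns false.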

They had originally written very poor unit tests, so that's part of the cause of
their problem. But C++ is flawed too: this part of C's design was maybe OK in
1970, but in 2010 it is unacceptable. This is one of the few cases where
breaking compatibility with C can be acceptable (and I think that breaking C
compatibility for this purpose is more important than breaking it to improve
the semantics of fixed-sized arrays, as was recently done).

Common Lisp has taught us that many functions in a program don't need maximum
performance, so using efficient (usually not heap-allocated) multi-precision
integers in them is not going to slow down a program significantly, but can
avoid many bugs related to integral values. In Lisp, fixnums are usually a
performance optimization you can use in selected performance-critical
functions; using fixnums everywhere in a program is (correctly) seen as
premature optimization.

Even if D doesn't want to go the Common Lisp way, and wants to keep using
C-style fixed-size bit fields to represent integral values, I feel that
optional runtime overflow errors for integral values can help locate many of
those bugs while a program is being written. There could be two compilation
switches: one that enables those runtime errors only for signed integral
values, and one that enables them for both signed and unsigned integral values.
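To make the idea concrete, here is a hedged C++ sketch of the checks such a switch might insert around each operation; checkedAdd and checkedSub are hypothetical names, not part of any D or C++ library. C#'s checked context does something similar, throwing System.OverflowException. For C and C++, Clang's UndefinedBehaviorSanitizer offers separate -fsanitize=signed-integer-overflow and -fsanitize=unsigned-integer-overflow checks, which is close to the two switches proposed above.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical checked addition, a sketch of what an optional overflow
// switch could generate for signed values.
int checkedAdd(int a, int b) {
    // Test before adding, because signed overflow is undefined behavior in C++.
    if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int>::min() - b)) {
        throw std::overflow_error("signed addition overflow");
    }
    return a + b;
}

// Hypothetical checked subtraction for unsigned values: plain unsigned
// subtraction silently wraps modulo 2^N, so treat wraparound as an error.
unsigned checkedSub(unsigned a, unsigned b) {
    if (b > a) {
        throw std::overflow_error("unsigned subtraction wrapped around");
    }
    return a - b;
}
```

Here checkedSub(10u, 20u) throws std::overflow_error instead of returning a huge value, which is exactly the signalUpdate failure described above.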

If you don't believe me, take a C# compiler, switch on the overflow errors, and
write a medium-sized program: you will see the compiler and runtime happily
catching several of your integral-related bugs.

- In D, saner and stricter promotion rules between signed and unsigned values
will help too, but they can't replace overflow errors.
- Adding a Sint (safe integral value) struct to the standard library is not a
solution, because generally no one will use it.
- Avoiding unsigned values wherever possible in the language and standard
library helps too. I don't understand why the length attribute of arrays and
array indexes are unsigned in D (in C# they are signed, even though C# lets the
user use unsigned types); so far I think it's a bad design choice that I'd like
to see changed as soon as possible.
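A typical way unsigned lengths bite, shown in C++, where std::vector::size() also returns an unsigned type (std::size_t): a countdown loop like for (std::size_t i = v.size() - 1; i >= 0; --i) never terminates, because i >= 0 is always true, and for an empty vector v.size() - 1 wraps around to a huge index. This sketch uses a signed index instead; the function name reversedCopy is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: copy the elements from last to first.
// The index is signed (std::ptrdiff_t), so `i >= 0` can become false, and an
// empty vector yields a starting index of -1 instead of wrapping around.
std::vector<int> reversedCopy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(v.size()) - 1; i >= 0; --i) {
        out.push_back(v[static_cast<std::size_t>(i)]);
    }
    return out;
}
```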


The second problem shown in that article has a simpler and less disruptive
solution: named arguments would be useful to have in D. But this is an
additive change, so I think there is no need to rush; it can wait.


The third problem shown in that article is related to using booleans to
represent input values of a function. Such use of a boolean is indeed unclear
at the call site:

void textRender(std::string const & text,
                Rectangle const & region,
                bool wrap = false,
                bool bold = false,
                bool justify = false,
                int first_line_indent = 0);

textRender(text, region,
           true); // wrap text

In Python I have seen that named arguments help a lot with this problem,
because the name at the call site tells you the purpose of the boolean.
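C++ has no named arguments either; a common workaround is to pass an options struct whose fields are set by name, so each boolean is labeled at the call site. RenderOptions and describe are hypothetical stand-ins for the textRender parameters, sketched here for illustration only; the text and region parameters are left out to keep it small.

```cpp
#include <string>

// Hypothetical options struct for the textRender example: each flag is set
// by name, so the call site says what `true` means.
struct RenderOptions {
    bool wrap = false;
    bool bold = false;
    bool justify = false;
    int firstLineIndent = 0;
};

// Hypothetical stand-in for textRender that only reports the chosen options.
std::string describe(const RenderOptions& opts) {
    std::string s;
    if (opts.wrap) s += "wrap ";
    if (opts.bold) s += "bold ";
    if (opts.justify) s += "justify ";
    s += "indent=" + std::to_string(opts.firstLineIndent);
    return s;
}
```

A caller writes RenderOptions opts; opts.wrap = true; describe(opts); instead of passing a bare true. In C++20, designated initializers shorten this to describe({.wrap = true}), which comes close to named arguments.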

Alternatively, another possible solution is shown in this wish from the D Wish
List, "Inline enum declaration":

That page contains:
void ShowWindow( enum{Show,Hide} sw ) { ... }
* self-documenting
* better than using "bool" (what's true/false?)
* no dummy types (otherwise enum showwindow_t {...})

On the surface it looks cute, but I don't like that solution much because it's
a locally defined type: you can't store it elsewhere, you can't store the
argument of this function somewhere before passing it to the function, etc.

So I think named arguments are enough to solve most of this third problem too.
But named arguments can be added later, for example in D2.5 or D3. D2 contains
enough bugs now; I think it's better to remove some of them before adding other
_additive_ features. Changing the way integral values are managed, on the other
hand, is a breaking change, so it is not something to postpone to D3.

Mar 12 2010
Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 03/12/2010 07:46 AM, bearophile wrote:
> The bug was caused by:
> typedef unsigned Signal;
It would be very nice to be able to know where unsafe comparisons are happening in your program without having to modify and recompile dmd.

Another thing that bit me is

foreach(k; arr){ k = something; }

when k should have a ref attribute. I think both of these are good candidates for warnings: neither is necessarily incorrect, but they often are.

Otherwise, I am very well convinced by now that fixnums are a severe impediment to writing good code. I would very much like to see arbitrary-precision integers supported at the language level.
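The same pitfall exists in C++ range-based for loops, where the loop variable is a copy unless it is declared as a reference. In this sketch Point, resetBuggy and resetFixed are hypothetical names; the fixed version corresponds to writing foreach (ref k; arr) in D.

```cpp
#include <vector>

struct Point { int x; int y; };

// Buggy: `p` is a copy of each element, so the assignment is lost.
void resetBuggy(std::vector<Point>& pts) {
    for (Point p : pts) {
        p.x = 0;
    }
}

// Fixed: binding by reference modifies the elements in place.
void resetFixed(std::vector<Point>& pts) {
    for (Point& p : pts) {
        p.x = 0;
    }
}
```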
Mar 12 2010
bearophile <bearophileHUGS lycos.com> writes:
Ellery Newcomer:
> foreach(k; arr){
>     k = something;
> }
> when k should have a ref attribute.
I have found 2 bugs like that one in my code, where 'arr' was an array of structs. Another related bug, which I made 2 times in my code:

void foo(string s) { s.length += 1; }

Here, outside foo, the length of the string s is seen as unchanged (and allowing the length of a string that's supposed to be immutable to change is a bit silly; the D docs have to explain very clearly that D strings are not immutable).

The number of times I have misspelled "length" (as "lenght") is now uncountable. I really think that "len" or "size" would be 100 times better for this purpose.

Bye,
bearophile
Mar 12 2010