written by Walter Bright
May 8, 2008
Eliminating redundancy is often viewed as a way to streamline productivity and reduce waste. But the right kind of redundancy can dramatically improve quality. For example, double entry bookkeeping, invented in the middle ages, was a great advance in accounting because it introduced a redundancy that nearly eliminated common arithmetic errors.
How can good redundancy be distinguished from bad redundancy?
A good redundancy detects faults by having two independent paths to the same result. A failure in one path does not affect the other, so the discrepancy in result indicates an error.
A bad redundancy is distinguished by either not providing a checkable result, or by having the redundant paths dependent on each other in such a way that an error in one path propagates to the other, so the wrong results agree.
Programming languages extensively use this to reduce coding errors. If there was no redundancy at all in a language, then any random sequence of characters would be a valid program. Every error message generated by the compiler is an example of redundancy in the language that finds user errors. An example of a language with next to no redundancy is machine code — nearly any bit pattern is a valid opcode. Anyone who’s written a disassembler for machine code knows that it’s nearly impossible to detect errors, one just soldiers on. (This difficulty in detecting errors is one reason why we don’t normally program in machine code.)
What are some examples of good redundancy in a programming language?
Variable declarations are one. But since the compiler can figure the need for declarations from the context, declarations seem like prime redundancies that can be jettisoned. This is called implicit variable declaration. It sounds like a great idea, and it gets regularly enshrined into new languages. The problem is, the compiler cannot tell the difference between an intended new declaration and a typo — and the poor maintenance programmer can’t tell, either. After a while, though, the lesson about redundancy is learned anew, and a special switch will be added to require explicit declaration.
Statement terminators, such as the popular semicolon, are sometimes considered useless redundancies. After all,
x = 3 + y if (x) *p++
why would semicolons be needed? The compiler is perfectly capable of figuring that out. But what about:
x = 3 + y *p++
that could be (x = 3 + y * p++) where the * is a multiply rather than an indirection. Lack of semicolons also makes error recovery poor, as the compiler cannot figure out where the current wrong statement ends and the next one starts. Semicolons make for nice “anchors” in the text that help both the compiler and the reader synchronize with the code structure.
Probably the best example of redundancy in a programming language is unit tests. Unit tests are an alternative specification for what a function does, but rather than specify the method, they specify the results. The odds of the same error in the method being reproduced in the results are very, very low. This means that even only moderate use of unit tests can dramatically improve program quality, which is why unit test support is built in to the D programming language.
What are some examples of bad redundancy in a programming language?
Redundancy isn’t always good. In C, for example, source code is divided up into header files and implementation files. The declarations are reproduced in both. The problem with this redundancy is that the copies are literally copied text — an error in one is simply copied to the other, giving no value to the redundancy. A useful redundancy needs to be independent.
Another example is declaring temporaries:
int a, b; ... int tmp = a + b;
At first glance, this looks like a good redundancy. It’s checking that the type of tmp is int. But no, it isn’t checking the type, it’s forcing the type to be int, and converting the initializer (a+b) to be int. This can result in problems if a and b are redeclared as some other type, like unsigned or float. To solve this problem, the D programming language has introduced auto to have tmp declared as the type of (a+b):
auto tmp = a + b;
The auto storage class is also slated for inclusion into the future C++0x Standard.
What are some examples of lack of redundancy in a programming language?
A common case is the syntactic ambiguity between declarations and expressions in C++. For example:
T * P;
Is that a declaration of P as a pointer to T, or is it multiplying T and P? A little redundancy would go a long way here.
There’s the overloaded use of < and > to denote template argument lists in C++. What does:
mean? The consequence of this lack of redundancy is notoriously obtuse error messages as the compiler has too little to guide it in guessing at what the user’s intent is.
C++ has added some keywords, like typename, and some productions, like ->template, to add redundancy and so improve error checking and diagnostics.
You can get a feel for where redundancy in a language is lacking by looking at warning diagnostics the compiler implementors tend to add over time. For the designer of a new language, common warnings are a rich source of inspiration for improvements.
Many programming language designers mistakenly assume that removing redundancy makes programmers more productive. This is exhibited with small benchmarks showing how terse they are, with the implication that terseness makes a language powerful and expressive. And this does work great for small programs, such as scripts that are only a few lines long. But it doesn’t scale well, and this rapidly becomes apparent as the source code gets larger and larger.
Well—designed redundancy can, however, dramatically improve productivity with its ability to uncover errors early in the development process. Redundancy not only helps find errors, it helps produce better error messages, and helps the maintenance programmer in understanding the intent of the code.
The trick, of course, is figuring out which redundancies are good, and which aren’t.