written by Walter Bright
July 31, 2008
Invariant data in a program is data that, once initialized, never changes. A new term was coined for it after discovering that the existing jargon const, final, readonly and immutable had different meanings for different programmers and different programming languages (and even had shifting meanings within programming languages!). A term was needed that enabled a precise definition.
The existing terminology confusion led to the discovery that invariance doesn’t get much respect in the programming language business (*), it’s the Rodney Dangerfield of concepts. C++ only has invariance for literals and top level const. There is, for example, no way to declare a pointer to invariant data. Java has similar problems with its final storage class. There is no way to declare an entire data structure as invariant, or even the leaves of a data structure as invariant. Perl has invariance only for strings.
(Note: pointer to const in C++ does not declare a pointer to invariant data, as another alias to that same data can still change it.)
The D programming language, however, brings invariant to the fore as a transitive, first class, pervasive language feature. Whole data structures can be invariant, as well as leaves and substructures of it. It adds some measure of complexity to the language, and since other languages have such severely limited invariant support, and yet large and successful programs are developed with them, the obvious question is why bother? Invariance doesn’t seem to be a property anyone misses.
So here’s a list, in no particular order of importance, of what invariance is good for and why it is worth the extra complexity:
- By marking data as invariant, the self–documenting nature of the type is improved. There is no wondering if any part of a function’s implementation modifies any part of an invariant argument’s data structure. There is no wondering if any subsequent user of an invariant variable changes it or any part of what it refers to.
- Invariant data enables many optimization opportunities. If an optimizer can figure out what an invariant variable is initialized with, it knows that any use of that variable will use the same value, and so generic code can be replaced with special case code for that particular value.
- Data computed using invariant operands is also invariant, and so the result can be cached instead of recomputed.
- Invariant function arguments make pure functions possible, and with pure functions one can use functional programming techniques. Pure functions are inherently thread safe and parallelizable, which is important in the coming ubiquitous world of multiple cores.
- Invariant data can be treated as having the semantics of value types, but with the overhead of a reference type. Strings (discussed in one of my previous columns) are a canonical example of this.
- If a deep copy of a data structure is required, such as for implementing a copy–modify–swap lock–free algorithm, the invariant portions of the structure need not be copied. For example, a symbol table using invariant strings for its leaves need not copy those strings when copying the symbol table.
- Invariant data can be shared among multiple threads without need for locking, synchronization, etc. Data races and sequential consistency problems with mutable data are just not possible with invariant data.
- Invariant data can be automatically partitioned out by the development tools into ROM for use in embedded systems.
- In virtual memory systems, invariant data can be placed into pages marked with the read–only hardware bit, to provide protection against memory corruption and detection of such bugs.
I believe these add up to a compelling case for invariant support, what do you think?
(*) Functional programming (FP) languages inherently require invariance, but these languages have not really entered mainstream use. Perhaps this is because FP languages take a good thing too far and make everything invariant. FP has sometimes crept in the back door, for example C++ templates were discovered to actually be a functional programming language.