www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - YACP -- Yet Another Const Proposal

reply Reiner Pope <reiner.pope gmail.com> writes:
Ugh, much as I hate more and more proposals over the same thing, if 
enough people have enough different ideas, we might eventually get 
somewhere. If anyone is interested in these ideas, please feel free to 
elaborate into an *even* better system, or just say that you like it, or 
whatever.

This proposal mostly involves introducing Javari's ideas to the 
discussion, as well as how they integrate into D. The paper goes into 
much more detail, and you can get it at 
http://pag.csail.mit.edu/pubs/ref-immutability-oopsla2005-abstract.html 
(or just google 'Javari'). I highly recommend reading it, because it is 
very informative and I refer to it in this post.

I give some more code examples as well as reasons why it satisfies 
Walter's objections in subsequent posts.

The important aspects of my proposal are:
1. using Javari's distinction between mutability and assignability
2. enforcing const by default for method 'in' parameters.
3. introducing an in-between type known as rocheck (readonly, checked), 
which keeps track of const-ness at runtime, for things like copy on 
write, as I discussed in other posts
4. allow readonly templating, a la Javari (which avoids code duplication 
for some const methods)
5. allow type inference using 'auto' to also detect const-ness.

In slightly more depth:

*1. Mutability vs assignability*
_Assignability_: the property which determines whether a variable can be 
used as an Lvalue.

_Mutability_: this determines whether alterations are permissible to the 
logical and /abstract/ state of an object (from the Javari paper, 'the 
abstract state is (part of) the transitively reachable state: that is, 
the state of the object and all state reachable from it by following 
references').

This clears up the need for C++'s things like:
   const char const * foo;
which would instead turn into the more understandable:
   readonly final char[] foo;
using the syntax from Javari (syntax is not a major concern of mine, 
though).

Obviously, since Javari is based on Java, pointers are not an issue. 
However, it turns out that this fits perfectly with the 
assignability/mutability distinction. The following law is added: a 
pointer to an unassignable and/or immutable object must be immutable.

*2. Const by default*
'Const by default', as Walter discussed, seems like a very useful 
detail, and I suggest it would be included. This would mean that all 
unmarked parameters in functions are truly 'in', which means immutable 
for reference types (pass-by-value types are already fine). This is 
already done to an extreme in functional languages, where the 
fundamental idiom is that *everything* is const. However, that means 
in-place modification is impossible, so I see const by default as a 
happy medium. Const by default in function parameters means:
-less typing, saving time, and also making it more likely that libraries 
are const-aware.
-it probably leads to the least broken code, because most functions 
don't modify their parameters anyway.

In code, it means that this C++ code:
   void foo(const char const * string) {...}
would look like this in D:
   void foo(char[] string) {...} // Since string is an 'in' parameter, 
it can't be modified

*3. rocheck*
A detail which Javari introduces is casting to mutable from immutable. 
This works around the static const-checking system, but safety is 
'ensured' by inserting dynamic checking everywhere. I wondered how this 
was implemented efficiently. Another paper indicates that this requies 
inserting a const-violation check at _every modification_ of _every 
mutable variable_. Despite the claims that this only leaves a 10% (and 
with optimizations, less) overhead in Java, I see this overhead as way 
too much for a systems language like D, especially considering that it 
probably couldn't be disabled for release builds, because that would 
cause ABI incompatibilities between release and debug builds.

The two main reasons that the paper presents for casting to mutable are 
interfacing with const-unware code and the situations in which static 
checking is limited (see my other posts to find about these limitations: 
basically, they mean that extra duplications will be required, 
especially with copy-on-write things like string functions, toupper, etc).

As I discussed earlier, legacy code problems are ameliorated by 
const-by-default. The compiler limitations can be worked around by 
introducing a new type: rochecked (possibly readonly, but only known at 
runtime). Both mutable and const types can be cast to rochecked, and 
mutability is determined at runtime (by adding a field to the 
_reference_) but rocheck is in fact statically-verifiable as const-safe: 
it presents the same interface as an immutable type, but with two added 
methods:

isMutable() and

/*mutable*/ Type ensureWritable()
{   if (isMutable()) return cast(mutable) this; // cast(mutable) is only 
accessible by the compiler
     return this.dup;
}

Since the only access to the mutating methods are through the 
ensureWritable() method (which is inserted by the compiler), this is 
guaranteed not to modify the original object, and runtime /checking/ can 
be avoided to some extent.

*4. rotemplate/romaybe*
Javari introduces this as romaybe, but I think the keyword rotemplate is 
more informative. Anyway, let me describe. Consider the following C++ code:

class MyVector<Value>
{   Value* array;
     const Value getAtIndex(int index) const
     {   return array[index];
     }

     Value getAtIndex(int index)
     {   return array[index];
     }
}

The code for both functions is the same, but they are both required. 
Readonly templating turns it into the following equivalent D code:

class MyVector(Value)
{   Value[] array;
     romaybe Value opIndex(int index) romaybe
     {   return array[index];
     }
}

*5. const type inference*
This is just a mechanism for avoiding even more const attributes 
scattered throughout the code. Since const is just an alteration to the 
typing mechanism (at least, in Javari it is), type inference could be 
used to change this code:
   readonly char[] c = getSomeReadonlyFoo();
into
   auto c = getSomeReadonlyFoo();

*Everything else*
Static const-checking is done just as Javari, which means through the 
type system, in which every immutable type is a superclass of a 
corresponding mutable type, and casting is then allowed /to/ const, but 
not /out of/ const (this operation must be done via duplication, or 
dirty assembler hacks -- however, there should be enough flexibility 
that workarounds are not required often).

The rest of the details are identical to Javari, including:
- the concept of this-assignability and this-mutability
- the assignability rules
- functions may be overloaded by const-ness
- romaybe as a const-overloaded template. I think that this name would 
be more intuitive if it were rotemplate, though
- dealing with templates/arrays: specifications such as readonly 
List(readonly Date); are required

*Side note*
One more (slightly unrelated) possibility that this allows is declaring 
utility functions as const, like toupper, etc. I haven't thought much 
about the details, but this could allow, given some planning, a system 
to ensure that a given function has no side-effects, which means it will 
always give the same result given the same parameters. This can lead to 
some interesting optimizations, but since this would seem not to require 
any breaking changes, it can be left until later. This is just another 
step which could allow D to have more of the features of a functional 
language, but without the slowness.
Jul 27 2006
next sibling parent reply Reiner Pope <reiner.pope gmail.com> writes:
This is a safe static checking system, so the only way to work around it 
is via some form of dirty hacks. However, it is much safer than C++, 
because it provides a form of deep constness, and the extra flexibility 
of this system should further avoid the need for const_cast.

The rocheck type also works with copy-on-write functions, which solves 
an objection which I raised earlier about most static checking methods 
requiring extra dups on these functions.

To quote Walter from earlier:
 Const has a number of issues whether or not it is default:
 
 1) function parameters that are reference types (what is talked about most
here) being const
 2) member functions being able to modify the object instance (const functions)
 3) const in the return value

the assignability/mutability distinction covers those problems.
 4) what happens if both const and non-const references to the same data are
simultaneous, and one modifies through the non-const one?

is ensuring that the functions you call only have a readonly view of the data you give them.
 5) assignment of a const reference to a non-const one, either explicitly or
through a function parameter

cause a const-violation, and any programmer problems it causes are probably actually bugs caught at compile-time instead of just annoyances.
 6) what happens if one returns a const reference

 One way to do it is to have const-as-type-modifier like C++, something I've
tried to avoid as being excessively complex (compiler and programmer) and ugly.

programmer, however, const-checking seems to be an essential feature for safety and speed. I think much of the ugliness of having many const’s all over the place can be largely avoided by using const-by-default and const inference as part of type inference. The rotemplate/romaybe mechanism also avoids most of the need for duplicate code which C++ requires.
 (An indirect quote, since I can't find the original) The objection about ‘too
much water under the bridge’ with regards to const by default:

language feature. I understand that Walter doesn’t value const as much as others, but so many people see const as important that such a dismissal is not enough. As discussed earlier, const by default should actually break the /least/ code and although introducing another form of const may not _break_ the code, it will probably just hide the problems until someone wants to make it const-aware. Furthermore, now is the best time to make breaking changes, since breaking changes after 1.0 are a big no-no.
 The objection about making const-unaware code const-aware:

practically forces people to write const-correct code anyway.
Jul 27 2006
parent BCS <BCS pathlink.com> writes:
Reiner Pope wrote:
 Furthermore, now is the best 
 time to make breaking changes, since breaking changes after 1.0 are a 
 big no-no.

Why? I known the obvious an answer, but why not allow major breaking changes with major version changed? If the ABI is backwards compatible, then inter-version linking could be handled like D-to-C linking. However this would require continued updating of the 1.0 compilers and some way to tell versions. I'm not saying this should be done, I'm just sort of thinking what would happen if it is.
Jul 27 2006
prev sibling next sibling parent Reiner Pope <reiner.pope gmail.com> writes:
class Cell
{   int _value;
     assignable int _hashCode;

     public int hashCode() readonly
     {   if (hashCode == 0)
             hashCode = ...
         return hasCode;
     }

     public int value() readonly
     {   return _value;
     }

     public int value(int newValue)
     {   return (_value = newValue);
     }

     final char[] _filename;
     readonly char[] _filename2;
     final readonly char[] _filename3;

     public void foo()
     {   _filename = something; // Error, _filename is final
         _filename[0] = 'c'; // Legal
         _filename2 = something; // Legal
         _filename2[0] = 'c'; // Error, _filename2 is readonly
         _filename3 = something; // Error
         _filename3[0] = something; // Error
     }

     public void foo() readonly // We can overload by immutability
     {   _filename = something; // Error, _filename is final
         _filename[0] = 'c'; // Error, _filename is this-mutable and 
enclosing function, foo() is readonly
         _filename2 = something; // Error, enclosing function is 
readonly and _filename2 isn't mutable, but this-mutable
         _filename2[0] = 'c'; // Error, _filename2 is readonly
         _filename3 = something; // Error
         _filename3[0] = something; // Error
     }

     public void bar() rotemplate
     {   foo(); // If this is readonly, call the readonly version, else 
call the non-readonly version
     }

     public void baz() readonly
     {
     }

     public void bam()
     {   baz(); // Error, baz is readonly
     }
}

rocheck char[] toupper(rocheck char[] string)
{   foreach (char c; string)
     {   if (needsToModify())
         {   char[] modifiable = string.ensureWritable();
             modifyIt(modifiable);
             return modifiable;
         }
         return string;
     }
}

void main()
{   readonly char[] h = "hello world";
     char[] h2 = "hello world";
     readonly char[] h3 = "HELLO WORLD";
     char[] h4 = "HELLO WORLD";
     readonly something = toupper(h); // A copy is made, since h is readonly
     readonly something = toupper(h2); // No copy is made, since h2 is 
modifiable
     readonly something = toupper(cast(readonly)h2); // A copy is made, 
since we requested it
     readonly something = toupper(h3); // No copy is made, and the 
return value can't be modified
     char[] h5 = touppper(h2); // Illegal: rocheck can't be converted to 
mutable
     char[] h6 = toupper(h2).ensureWritable(); // Legal, and all dups 
are avoided. Great!
     readonly something = toupper(h4); // Legal, and no copy is made
}


That's enough examples. It only covers a few cases, but it's all I'm 
doing for now.
Jul 27 2006
prev sibling parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
This might have not been the best time to deal with this. This is a 
complex issue, people are very busy and the dust is just settling with 
regards to the import issues.
I myself have only read part of the Javari paper, not all yet. I hope to 
finish it at a later time, and eventually get back at this const issue.
Still, I've read your post and I'll give some initial comments.

Reiner Pope wrote:
 
 *1. Mutability vs assignability*

I think that as a simplification, immutable variables should also be unassignable. That is, all readonly vars would be final as well. I have a feeling this would be beneficial.
 *2. Const by default*
 

OK to me.
 *3. rocheck*
 

I haven't read that part of the paper yet, but I think I get a good enough picture from your description. And I think rocheck wouldn't fit well with D, due to the runtime checks and performance penalties, which, like you said, could not be disabled in release builds (as they are not contracts). Also, it would only work with classes, and not any kind of type. (You have an example where you have a "rocheck char[]" variable, how would that work internally?)
 *4. rotemplate/romaybe*

This fits nicely with D, since D already has the notion of templates and templated functions, which is what rotemplate is (although the runtime function code is the same).
 *5. const type inference*
 This is just a mechanism for avoiding even more const attributes 
 scattered throughout the code. Since const is just an alteration to the 
 typing mechanism (at least, in Javari it is), type inference could be 
 used to change this code:
   readonly char[] c = getSomeReadonlyFoo();
 into
   auto c = getSomeReadonlyFoo();
 

 *Everything else*
 Static const-checking is done just as Javari, which means through the 
 type system, in which every immutable type is a superclass of a 
 corresponding mutable type, and casting is then allowed /to/ const, but 
 not /out of/ const (this operation must be done via duplication, or 
 dirty assembler hacks -- however, there should be enough flexibility 
 that workarounds are not required often).
 

I don't like to think of a immutable type as a supertype of a mutable, although the semantics are the same.
 
 *Side note*
 One more (slightly unrelated) possibility that this allows is declaring 
 utility functions as const, like toupper, etc. I haven't thought much 
 about the details, but this could allow, given some planning, a system 
 to ensure that a given function has no side-effects, which means it will 
 always give the same result given the same parameters. This can lead to 

No it wouldn't guarantee any of that (no side-effects). The function could access global variables, static variables, other functions, etc. (there are many things other than just the functions parameters) ---- *Overall Comments* This seems like a good starting point for a const proposal. But we have to think of all aspects of how this would work with D. D is more complex than Java, and there are more things to consider when implementing immutability. In particular D is way more advanced in features that work with types, like templates, template specializations, typeof expressions, is expressions, etc., and we have to think how const would work with those, and see if it is all correct, solid and feasible, (and also if there is a better way). For example, how about this: Would the delegate literal syntax allow an immutable-this? Would there be syntax conflicts? (I think there wouldn't be syntax problems) How to use type inference to create a readonly view of some mutable data? immutable auto bar = somefoo.somebar; ? auto bar = cast(immutable) somefoo.somebar; ? Would "typeof(foo)" include the constness of foo? (Hum...) Should another is-expression metatype check be added? How would it work? is(constof(foo) == immutable) ? is(typeof(foo) == immutable) ? (Hum...) How would template specialization work with constness? (T : Object) -> specializes on any Objects or just mutable Objects? (T : T*) -> specializes on any pointers or just mutable ones? How to specialize on just mutable or immutables types? (T : mutable Object) ? (T : immutable Object)? (as we can we probably we also need would a "mutable" keyword) As you see there are still many details that must be considered. I think this is by far the most complicated proposal D will ever face. -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 30 2006
parent reply Reiner Pope <reiner.pope gmail.com> writes:
Thank you for your comments. Certainly, the proposal is only started, 
but much of it, I believe, can also be gleaned from how Javari manages 
it. You also raised concerns which I was not aware of, which is why it 
is so important that many people discuss it.

Bruno Medeiros wrote:
 This might have not been the best time to deal with this. This is a 
 complex issue, people are very busy and the dust is just settling with 
 regards to the import issues.

pushing for anything yet. Let's work out the details first, and then start convincing people it's right. :-)
 Reiner Pope wrote:
 *1. Mutability vs assignability*

I think that as a simplification, immutable variables should also be unassignable. That is, all readonly vars would be final as well. I have a feeling this would be beneficial.

void stepThrough() { readonly /*assignable*/ currentNode = foo(); while (currentNode.next) currentNode = next; // currentNode is now the last node } We don't want to modify currentNode, because we want to keep the list structure the same. However, we do want to step through it. For cases like this, I think it is important
 *3. rocheck*

I haven't read that part of the paper yet, but I think I get a good enough picture from your description. And I think rocheck wouldn't fit well with D, due to the runtime checks and performance penalties, which, like you said, could not be disabled in release builds (as they are not contracts). Also, it would only work with classes, and not any kind of type. (You have an example where you have a "rocheck char[]" variable, how would that work internally?)

with getting CoW to work properly. Consider: char[] /*someCOWFunction*/ foo(char[] input) {} char[] /*Also COW*/ bar(char[] input) {} char[] /*Also COW*/ bam(char[] input) {} foo(bar(bam("hello world"))); Each of them would ordinarily duplicate the string if a modification is required. However, a maximum of 1 is needed in real life, because no-one owns the intermediate copies. So there needs to be some way to know at runtime whether the array reference is readonly or not, because if it is mutable, then the function would be best advised to modify it in-place. That's what rocheck is for. However, in my proposal, it is both: -statically verifiable as correct -devoid of runtime checking this is because the mutating methods can only be accessed via the ensureWritable() method generated by the compiler, so a correct COW function would look like this: rocheck char[] capitalizeAllIfFirstLetterIsLowerThenRunBar(rocheck char[] input) { if (isLowerCase(input[0])) { /*mutable*/ char[] modified = input.ensureWritable(); foreach (inout c; modified) { c = makeUpper(c); } input = modified; } bar(input); } So in this function, it is clearly const-safe as well as only doing one check in the entire function (the call to ensureWritable). This means that, although there is a memory cost added to the array reference, there is actually a runtime saving because of the avoidance of duplications. Additionally, since using this requires a special keyword, it will only be used where the memory/speed tradeoff is deemed worthwhile by the programmer (however, I recommend that it is used for any COW function).
 *Everything else*
 Static const-checking is done just as Javari, which means through the 
 type system, in which every immutable type is a superclass of a 
 corresponding mutable type, and casting is then allowed /to/ const, 
 but not /out of/ const (this operation must be done via duplication, 
 or dirty assembler hacks -- however, there should be enough 
 flexibility that workarounds are not required often).

I don't like to think of a immutable type as a supertype of a mutable, although the semantics are the same.

Obviously, we can choose to think about things as we wish, but if the semantics are simplest to describe this way, then what's wrong with that? Even if it is 'conceptually' wrong, the simplest explanation has its merits for just that reason.
 
 *Side note*
 One more (slightly unrelated) possibility that this allows is 
 declaring utility functions as const, like toupper, etc. I haven't 
 thought much about the details, but this could allow, given some 
 planning, a system to ensure that a given function has no 
 side-effects, which means it will always give the same result given 
 the same parameters. This can lead to 

No it wouldn't guarantee any of that (no side-effects). The function could access global variables, static variables, other functions, etc. (there are many things other than just the functions parameters)

Ok, I didn't mean to use the exact same meaning as the meaning which exists in classes. I thought that utility functions tend not be contained within classes, but rather as global functions. I meant overloading the const function declaration to mean the following for global functions: they don't access any global/static variables other than some sort of mutable exception (just like const functions in classes being able to access mutable fields). However, this restriction would be even greater, because they shouldn't even read them, so the result can't be affected by an external function. However, I believe this won't break any code, so further thought about this can afford to be delayed. However, the potential for optimizations is promising, so I think we should consider it.
 ----
 
 *Overall Comments*
 
 This seems like a good starting point for a const proposal. But we have 
 to think of all aspects of how this would work with D. D is more complex 
 than Java, and there are more things to consider when implementing 
 immutability. In particular D is way more advanced in features that work 
 with types, like templates, template specializations, typeof 
 expressions, is expressions, etc., and we have to think how const would 
 work with those, and see if it is all correct, solid and feasible, (and 
 also if there is a better way).

 How to use type inference to create a readonly view of some mutable data?
   immutable auto bar = somefoo.somebar; ?
   auto bar = cast(immutable) somefoo.somebar; ?

somefoo.immutable will always return the reference cast to immutable
 
 Would "typeof(foo)" include the constness of foo?
 (Hum...)

use cases right now.
 
 Should another is-expression metatype check be added? How would it work?
   is(constof(foo) == immutable) ?
   is(typeof(foo) == immutable) ?
 (Hum...)

it on a preliminary search of the documentation. However, I think the choice isn't that difficult, and it would be a simple matter of deciding what is most consistent with D (which other people are better qualified to do).
 
 How would template specialization work with constness?
  (T : Object)  -> specializes on any Objects or just mutable Objects?
  (T : T*)      -> specializes on any pointers or just mutable ones?
 
 How to specialize on just mutable or immutables types?
  (T : mutable Object) ?
  (T : immutable Object)?
 (as we can we probably we also need would a "mutable" keyword)
 

in specializing just to immutable Objects, since we've already said that mutable objects could be implicitly cast to immutable. However, there should be a form of specialization the other way. Javari suggests is this: (T : Object) -> The object must derive from Object, which is mutable. Therefore, T must be mutable (T: readonly Object) -> The object must derive from readonly Object. Therefore T can be either immutable or mutable. I don't like this, however, because it implicitly restricts T: Object to mutable Objects, whereas I think that restrictions should be explicit. Const by default should do the trick: we would get the following: (T: Object) -> Both mutable and immutable allowed (T: mutable Object) -> only mutable allowed
 
 As you see there are still many details that must be considered. I think 
 this is by far the most complicated proposal D will ever face.
 

other people's input so much. I don't feel possessive of this: I just want a good solution to emerge. Cheers, Reiner
Jul 31 2006
parent Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Reiner Pope wrote:
 *3. rocheck*

I haven't read that part of the paper yet, but I think I get a good enough picture from your description. And I think rocheck wouldn't fit well with D, due to the runtime checks and performance penalties, which, like you said, could not be disabled in release builds (as they are not contracts). Also, it would only work with classes, and not any kind of type. (You have an example where you have a "rocheck char[]" variable, how would that work internally?)

with getting CoW to work properly. Consider: char[] /*someCOWFunction*/ foo(char[] input) {} char[] /*Also COW*/ bar(char[] input) {} char[] /*Also COW*/ bam(char[] input) {} foo(bar(bam("hello world"))); Each of them would ordinarily duplicate the string if a modification is required. However, a maximum of 1 is needed in real life, because no-one owns the intermediate copies. So there needs to be some way to know at runtime whether the array reference is readonly or not, because if it is mutable, then the function would be best advised to modify it in-place. That's what rocheck is for. However, in my proposal, it is both: -statically verifiable as correct -devoid of runtime checking this is because the mutating methods can only be accessed via the ensureWritable() method generated by the compiler, so a correct COW

It is not entirely devoid of runtime checking, since it envolves calling .ensureWritable() .
 function would look like this:
 
 rocheck char[] capitalizeAllIfFirstLetterIsLowerThenRunBar(rocheck 
 char[] input)
 {   if (isLowerCase(input[0]))
     {   /*mutable*/ char[] modified = input.ensureWritable();
         foreach (inout c; modified)
         {   c = makeUpper(c);
         }
         input = modified;
     }
     bar(input);
 }
 
 So in this function, it is clearly const-safe as well as only doing one 
 check in the entire function (the call to ensureWritable). This means 
 that, although there is a memory cost added to the array reference, 
 there is actually a runtime saving because of the avoidance of 
 duplications. Additionally, since using this requires a special keyword, 
 it will only be used where the memory/speed tradeoff is deemed 
 worthwhile by the programmer (however, I recommend that it is used for 
 any COW function).
 

Hum. Well, a template version of rocheck can be done already (with or without const), and it isn't that much worse from the in-language version: (the code may have some errors) struct rocheck!(T) { immutable T elem; bool writable; /*mutable*/ Type ensureWritable() { if (!writable) return cast(mutable) elem; else return elem.dup; } } // A rocheck caster/constructor rocheck!(T) asrocheck!(T) (T refval) { rocheck!(T) rochk; rochk.elem = refval; return rochk; } rocheck!(char[]) capitalizeEtc(rocheck!(char[]) input) { if (isLowerCase(input.elem[0])) { /*mutable*/ char[] modified = input.ensureWritable(); foreach (inout c; modified) { c = makeUpper(c); } input.elem = modified; } bar(input); return input; } ... char[] mystr = "abc"; auto str = capitalizeEtc(filter1(mystr.asrocheck)); The differences are that it is a bit more verbose to declare rocheck vars, and also you have to explicitly convert from non-rocheck to rocheck (what asrocheck() does). Other than that I think it works ok. -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Aug 09 2006