www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - pure functions without invariant (a get-rid-of-const proposal)

reply hasen <hasan.aljudy gmail.com> writes:
The line of thinking for introducing const/invariant seems to go like this:

- D needs to be suitable for parallelism
- let's introduce pure functions
- pure functions need to accept immutable data only
- let's introduce const!! (invariant)

OK, I can agree with most of this, except for the last part! It's not 
necessary that invariant is needed to introduce pure functions into D. 
There must be better ways.

First of all, one could argue, why? what's wrong with invariant?

Well, there are a number of things that are wrong with it:

First of all, immutable data is a concept, implementing it as 
"invariant" seems kind of hackish. By definition, immutable data doesn't 
change. Forcing invariant on top of a mutable type is like saying "This 
data would usually change, but please, for this instance, make it not 
un-changeable!"
This is a HACK. Immutable types should have immutability be a 
fundamental part of the type, not some kind of a glue on top of it; a 
flag that's set on or off!

For instance, it doesn't make sense to have an invariant pointer to 
invariant pointer to invariant data. If the data is invariant, then why 
access it through pointers at all?

When data is immutable, then all you need is the data! You shouldn't 
need to know where is it stored.

Invariant pointers just don't make sense!

Another thing to consider. When you have an object (in the oop sense of 
the term), and you mark it as immutable, what's the point? The whole 
point of objects is to encapsulate state that could change. If the 
object's state doesn't change at all, then it becomes much less 
interesting as an object. All you need then is the data in that object, 
and not the object itself.

Invariant (oop) objects don't make sense.

You're trying to glue completely different things together and hope that 
they stick.

I see this as similar to C++ trying to be "Object Oriented" without 
automatic memory management, and it just doesn't make sense! They put 
high level concepts in a low level language with no consideration to how 
people actually need to use OOP. (They can argue all day long about how 
they have inheritance and polymorphism!)

Similarly for D2, are we considering how people actually need to use 
immutable data in their code?

Wouldn't it make much more sense, to have immutable data-types (such as 
python tuples, and python strings), and remove the "invariant" concept 
all together?

Let's step back and look at what we're trying to achieve again, and 
think of a different way to implement it.

We want to have pure functions, but those have some restrictions!
A pure function's result can only depend on its input parameters (so 
cannot have side effects)
- doesn't read global state
- doesn't perform I/O operations
- doesn't change input parameters (if they are passed by reference, that is)

It's clear that we need immutable types, so, just introduce immutable 
types, but would they be like?
I suggest looking at python.

Consider for instance, python strings (let's call them python-style 
strings from here on):
     -immutable type
     -no `a[b] = c` operation defined on string a
     -`a + b` returns a completely new string
     -not implemented in terms of an underlying `char` type

We can use it as an inspiration for the string type in D2 (excluding the 
"no underlying char type" part)

We can also extend the concept and introduce immutable lists
     -similar to python-style strings (see above)
     -python-style string can be implemented as an immutable list
         - while we're at it, string slice and index operations should 
operate on characters, not bytes!
             - I actually prefer that string is an actual type, not just 
an alias
     -the elements of the list must also be immutable.
         - immutable list to mutable data doesn't make sense (wouldn't 
be used or needed in practice)

Immutable lists don't need an `invariant` keyword, they can be defined 
by i[ ] instead of [ ] (where `i[` is a proper token, like `!(` and 
`r"`). This ties strongly to the idea that immutability is a fundamental 
part of the type, not just some kind of annotation.

The other immutable type we need is a clustering of data, similar to 
tuples in python.
I'm proposing here "immutable struct objects"
   - based on struct
   - internal state is initialized in constructor, never changed anywhere
   - have reference semantics (no value semantics at all, similar to 
python strings)
   - all operations simply return a new object
   - fields can only be of immutable types or native types with value 
semantics (no pointers)

for a practical example, consider a 3d vector:

istruct vector //istruce is an immutable struct type
{
     real x = 0; //ok, initialization behavior
     real z = 0;
     real y = 0;
     this( real px, real py, real pz )
     {
         //constructor can initialize fields
         x = px;
         y = px;
         z = pz;
     }

     real length() { return sqrt( x*x + y*y + z*z ); } //doesn't change 
any state (doesn't need to, anyway)

     vector normalized()
     {
         real len = length();
         return vector( x/len, y/len, z/len ); //returns a new object
         //no need for "new", as immutable structs have no value 
semantics, only reference semantics
         //so "new" is implicit
     }

     vecor opAdd( vector other )
     {
         return vector( x + other.x, y + other.y, z + other.z ); 
//returns a new object
     }

     //..etc..
}
//usage example
auto a = vector(1,2,3);
auto b = vector(5,4,3);
auto c = a.cross(b); //neither a nor b is changed

Why should you consider this proposal?

Well, for one thing, this is how things work in functional programming 
(AFAIK), so this approach sticks to the spirit of functional languages. 
What I mean is, this how data types are like! (I can only speak about 
Haskell here)

Also, This approach is more natural.
Immutable types don't have a state to change by their nature, where as 
if you annotate some type as invariant, you're saying that this object 
could potentially change, but shouldn't for this instance.
This creates confusion, consider when a type that's mutable by 
definition, suddenly needs to be passed to pure functions, and you're in 
for trouble! Welcome to the invariant virus!
If you need this object to be immutable, then why base it on a mutable 
type at all?

The suggested approach presented here removes much complication from the 
type system!
I mean, just what the hell is `invariant(char*)** p`??? why would anyone 
pass such a thing to a pure function?

There's an issue I want to brush on, that is:

Reading global state

Pure function cannot depend on global state, but probably one exception 
can be allowed:
     - if the global state is of a native or immutable type
     - if it's a constant
     - it can't be initialized by an impure function
         Consider if the variable is initialized to `now()`? If a pure 
function reads this variable,
         then it's getting another input, which varies depending on when 
the program was run.
      - corollary: pure functions cannot use:
         - global pointers
         - global (normal/oop) objects
Essentially, this global data is not a state at all, it's simply an 
alias for a value.
e.g. mathematical PI

Now, it's easier to define (and check) pure functions:

A pure function needs to satisfy the following:
- All parameters are either of native types passed by value, or of 
immutable type (passed by reference)
- doesn't read global state unless const (as stated above)
- doesn't perform IO

There are things that pure functions can do (I think, but someone 
correct me if I'm wrong please!)

Pure functions can create and modify local state (as long as it's local 
to the function)
This entails that pure function can call impure functions if:
     - the impure function doesn't read or change global state in an 
impure-way (again, as stated above)
     - the impure function doesn't perform I/O

Let's call such functions "semi-pure" (def: impure functions that are 
callable from within pure functions)


This applies to all kinds of expressions, as they can all be reduced to 
functions. For instance, newing objects can be reduced to calling their 
constructors, which are functions that can be checked for 
pureness/semi-pureness.


When the coder marks a function as "pure", the compiler must check that 
it actually is pure.
I think the compiler can do this, in a way similar to how it can detect 
which functions can have compile time function execution (CTFE)

Probably the procedure can be something like this:
   - functions can be marked (by the compiler) as one of four: pure, 
semi-pure, impure, unknown
   - At first, all functions are marked unknown
   - Three passes:
     Pass1: detect impure functions:
       - functions that read global state in an impure way
       - functions that perform I/O
       - functions that call other impure functions (I'm not sure how 
simple or hard this can be)
     Pass2: detect semi-pure functions:
       - unknown functions that accept parameters of mutable types
         - please note: only unknown functions are checked at this pass
     Pass3: check pure functions:
       - check functions marked by the programmer as "pure"
         - are they already marked impure or semi-pure? issue an error
         - do they call impure functions? issue an error
         - if they call semi-pure functions, it's ok.
       - if you're here, you passed the pureness test!

There are probably some quirks here and there about what constitutes a 
proper semi-pure function,
but I'm sure it can essentially be worked out, without resorting to 
`invariant`.
Mar 08 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
hasen wrote:
 The line of thinking for introducing const/invariant seems to go like this:
 
 - D needs to be suitable for parallelism
 - let's introduce pure functions
 - pure functions need to accept immutable data only
 - let's introduce const!! (invariant)
That's not the line of reasoning. Invariant data has other uses - it is data that can be shared between threads without prejudice. Pure functions themselves are ok with any inputs, and put "const" in the parameter types to clarify to the caller they won't change the arguments.
 OK, I can agree with most of this, except for the last part! It's not 
 necessary that invariant is needed to introduce pure functions into D. 
 There must be better ways.
This is a non-problem. Invariant is not needed to introduce pure functions into D at all.
 First of all, one could argue, why? what's wrong with invariant?
 
 Well, there are a number of things that are wrong with it:
 
 First of all, immutable data is a concept, implementing it as 
 "invariant" seems kind of hackish. By definition, immutable data doesn't 
 change. Forcing invariant on top of a mutable type is like saying "This 
 data would usually change, but please, for this instance, make it not 
 un-changeable!"
 This is a HACK.
On the contrary, it's a very principled approach. Const types are proper supertypes of their mutable counterparts. Immutable types are proper subtypes of const types. It's an established approach. Please google "A theory of type qualifiers" by Jeff Foster.
 Immutable types should have immutability be a 
 fundamental part of the type, not some kind of a glue on top of it; a 
 flag that's set on or off!
That is quite what's happening - the flag is kept during compilation though.
 For instance, it doesn't make sense to have an invariant pointer to 
 invariant pointer to invariant data. If the data is invariant, then why 
 access it through pointers at all?
All interesting data has indirection. You couldn't have an immutable list without indirection.
 When data is immutable, then all you need is the data! You shouldn't 
 need to know where is it stored.
 
 Invariant pointers just don't make sense!
Of course they do. You may want to rethink your argument. It rests on invalid premises and inevitably can't go interesting places. Andrei
Mar 08 2009
prev sibling next sibling parent BCS <none anon.com> writes:
Hello hasen,

 The line of thinking for introducing const/invariant seems to go like
 this:
 
 - D needs to be suitable for parallelism
 - let's introduce pure functions
 - pure functions need to accept immutable data only
 - let's introduce const!! (invariant)
 OK, I can agree with most of this, except for the last part! It's not
 necessary that invariant is needed to introduce pure functions into D.
 There must be better ways.
 
 First of all, one could argue, why? what's wrong with invariant?
 
 Well, there are a number of things that are wrong with it:
 
 First of all, immutable data is a concept, implementing it as
 "invariant" seems kind of hackish. By definition, immutable data
 doesn't
 change. Forcing invariant on top of a mutable type is like saying
 "This
 data would usually change, but please, for this instance, make it not
 un-changeable!"
That is, in fact, exactly what I think people want.
 This is a HACK. Immutable types should have immutability be a
 fundamental part of the type, not some kind of a glue on top of it; a
 flag that's set on or off!
what about an "immutable int" are you proposing that it be a totally separate type?
 For instance, it doesn't make sense to have an invariant pointer to
 invariant pointer to invariant data. If the data is invariant, then
 why access it through pointers at all?
Because arrays /are/ pointers because objects are pointers, because ref parameters are pointers, because passing large structs by value is to expensive and the alternative is a pointer. OK, I'll admit that all but the last case are not pointer at the code level but I'm not seeing any way your argument would work work that doesn't also apply to those cases.
 
 When data is immutable, then all you need is the data! You shouldn't
 need to know where is it stored.
 
pointers are a way to access data, not a way to tell you where it is.
 Invariant pointers just don't make sense!
they make a lot of sence to me.
 
 Another thing to consider. When you have an object (in the oop sense
 of the term), and you mark it as immutable, what's the point? The
 whole point of objects is to encapsulate state that could change. If
 the object's state doesn't change at all, then it becomes much less
 interesting as an object. All you need then is the data in that
 object, and not the object itself.
IIRC, the point of objects is to encapsulate implementation that can change.
 Invariant (oop) objects don't make sense.
 
they make a lot of sence to me.
 You're trying to glue completely different things together and hope
 that they stick.
 
 I see this as similar to C++ trying to be "Object Oriented" without
 automatic memory management, and it just doesn't make sense! They put
 high level concepts in a low level language with no consideration to
 how people actually need to use OOP. (They can argue all day long
 about how they have inheritance and polymorphism!)
 
 Similarly for D2, are we considering how people actually need to use
 immutable data in their code?
I'm reasonably sure they have.
 
 Wouldn't it make much more sense, to have immutable data-types (such
 as python tuples, and python strings), and remove the "invariant"
 concept all together?
 
There are to make cases where the data that will be wanted immutable is of a type (that already exists) that is also wanted in a mutable form.
 Let's step back and look at what we're trying to achieve again, and
 think of a different way to implement it.
 
 We want to have pure functions, but those have some restrictions!
 A pure function's result can only depend on its input parameters (so
 cannot have side effects)
 - doesn't read global state
wrong, it can access anything (including globals) that can be proven won't change (that is they are immutable)
 - doesn't perform I/O operations
 - doesn't change input parameters (if they are passed by reference,
 that is)
 It's clear that we need immutable types, so, just introduce immutable
 types, but would they be like?
Based on your comment about globals it is not clear that immutables are needed. Based on the above, all that would be needed is const. [...]
 - while we're at it, string slice and index operations should
 operate on characters, not bytes!
That's a side issue, but indexing char's is an O(n) operation.
 - I actually prefer that string is an actual type, not
 just
 an alias
 -the elements of the list must also be immutable.
 - immutable list to mutable data doesn't make sense (wouldn't
 be used or needed in practice)
You, me and walter aggree on that. [...] After reading this far and skimming the rest my only question is "WHY?". I see exactly zero benefit to your proposal. You have defined a system that has all the problems of the current one and more. If we were to switch to it I expect that we would end up with a common idiom of defining two almost identical types where one is immutable and the other is not. *What does your system give me that the current system does not?*
Mar 08 2009
prev sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
I'm pretty much in agreement with Andrei and BCS here, so I'll just 
comment on this bit:

hasen wrote:
 When the coder marks a function as "pure", the compiler must check that 
 it actually is pure.
 I think the compiler can do this, in a way similar to how it can detect 
 which functions can have compile time function execution (CTFE)
The DMD[1] doesn't actually check whether it can execute a function at compile time, it just tries to do it and errors out if it comes across something it can't handle. For example, this compiles and prints '100': ----- module test; uint y = 100; uint* x = &y; uint foo(bool readptr) { if (readptr) return *x; return 100; } // Doesn't compile if changed to foo(true): // test.d(14): Error: cannot evaluate foo(true) at compile time auto bar = foo(false); void main() { printf("%d\n", bar); } ----- But as the comment states, changing the parameter to true makes it error out because it can't do CTFE on foo(true), even though it managed fine with foo(false). [1]: and LDC/GDC, which use the same frontend.
Mar 08 2009