www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - pure or not pure?

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
I realize that I don't fully understand all the rules of what D pure 
functions will be.

I think all agree on these rules:

- pure functions can only call pure functions
- allow access to invariant data
- allow mutable access to simple stack variables (variables that are all on 
the stack, no references)

In Andrei's accu-functional document, instead of the last rule, there is:

- allow local automatic mutable state

I'm not sure what this means :)  But here are some questions about what is 
pure and what is not pure:

1. Can pure functions use mutable heap data?  If so, what are the 
restrictions for this?

i.e.:
pure char[] f(int x, int y)
{
   char[] c = new char[y];
   c[] = (char)x;
}

pure char[] g(int x)
{
   char[] c = f(x, 5);
}

2. Can pure functions use stack references to objects that are partially 
invariant, partially mutable?

i.e.:
class C
{
   invariant int x;
   int y;
}

pure int f(C c) {return c.x} // allowed?

This of course, assumes that C can assign x through a constructor, not just 
through a static initializer (this is not implemented today).

as an example of a pure function that takes a pointer to mutable data, but 
doesn't use the data, i.e. this doesn't ever read or write heap data:

pure char *add(char * c, int n) { return c + n;}

3. If the answer to question 1 is 'yes', can you pass mutable heap data to 
pure functions?

pure char[] substr(char[] src, int beg, int end) { return src[beg..end];}

Would substr be allowed to be called from a pure function?
Would substr be allowed to be called from a non-pure function?
Will the compiler tag heap data somehow during execution of a pure function 
to determine whether it is unique or not?

I'm just trying to get a handle on what is and isn't considered pure, as 
everyone seems to have a different opinion...

-Steve 
Apr 09 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 09/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 I realize that I don't fully understand all the rules of what D pure
  functions will be.

  I think all agree on these rules:

  - pure functions can only call pure functions
  - allow access to invariant data
  - allow mutable access to simple stack variables (variables that are all on
  the stack, no references)

  In Andrei's accu-functional document, instead of the last rule, there is:

  - allow local automatic mutable state
I thought that was pretty clear. It means local variables are allowed to be mutable. (I realise that "automatic" isn't a synonym for "local", but in the examples in accu-functional, Andrei does seem to use it that way).
  But here are some questions about what is
  pure and what is not pure:

  1. Can pure functions use mutable heap data?
I don't see why not.
  If so, what are the
  restrictions for this?

  i.e.:
  pure char[] f(int x, int y)
I'm confident enough to say the return value will have to be invariant. However, you that doesn't stop you from returning heap data. You should be able to do pure string spaces(int x) { auto s = new char[x]; s[] = ' '; return assumeUnique!(s); } but you can't omit the assumeUnique!().
  2. Can pure functions use stack references to objects that are partially
  invariant, partially mutable?
My guess would be no. There's just no need for it.
  i.e.:
By the way, that should be e.g., not i.e.
  This of course, assumes that C can assign x through a constructor, not just
  through a static initializer (this is not implemented today).
I don't think it will be possible to initialize invariant data in a non-invariant constructor, and I think an invariant constructor can only make a fully invariant class. I think this is a /necessary/ restriction, because otherwise the following could not be typechecked: class C { invariant int * p; this(int * q) { p = q; // Big no no! } }
  as an example of a pure function that takes a pointer to mutable data, but
  doesn't use the data, i.e. this doesn't ever read or write heap data:

  pure char *add(char * c, int n) { return c + n;}
Oooh - now that's an interesting one. That one's got me foxed. My instinct says it should be disallowed, but I can't quite put my finger on why. I'm gonna have to think about that one some more.
  3. If the answer to question 1 is 'yes', can you pass mutable heap data to
  pure functions?

  pure char[] substr(char[] src, int beg, int end) { return src[beg..end];}
Same answer.
  Would substr be allowed to be called from a pure function?
If and only if substr() is itself a pure function.
  Would substr be allowed to be called from a non-pure function?
Yes
  Will the compiler tag heap data somehow during execution of a pure function
  to determine whether it is unique or not?
My guess would be no. (But that's just a guess). I reckon you'll have to worry about that yourself and use the assumeUnique!() template to return the data.
Apr 09 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 I realize that I don't fully understand all the rules of what D pure
  functions will be.

  I think all agree on these rules:

  - pure functions can only call pure functions
  - allow access to invariant data
  - allow mutable access to simple stack variables (variables that are all 
 on
  the stack, no references)

  In Andrei's accu-functional document, instead of the last rule, there 
 is:

  - allow local automatic mutable state
I thought that was pretty clear. It means local variables are allowed to be mutable. (I realise that "automatic" isn't a synonym for "local", but in the examples in accu-functional, Andrei does seem to use it that way).
So 'local automatic' means 'local local'?
  But here are some questions about what is
  pure and what is not pure:

  1. Can pure functions use mutable heap data?
I don't see why not.
  If so, what are the
  restrictions for this?

  i.e.:
  pure char[] f(int x, int y)
I'm confident enough to say the return value will have to be invariant. However, you that doesn't stop you from returning heap data. You should be able to do pure string spaces(int x) { auto s = new char[x]; s[] = ' '; return assumeUnique!(s); } but you can't omit the assumeUnique!().
So how is new char[] a pure function? And why can't I 'wrap' new? For instance, if I have a function: int[] createIncreasingSequence(int s, int e, int step){...} which creates an array of ints that start at s and end at e, increasing by step each time, why can't I make that pure? i.e. I want to specify my own way to initialize an array. If I do that in 10 different pure functions, you're saying I have to re-implement that code in each one?
  2. Can pure functions use stack references to objects that are partially
  invariant, partially mutable?
My guess would be no. There's just no need for it.
Surely there is some possible need for having run-time initialized invariant variables...
  This of course, assumes that C can assign x through a constructor, not 
 just
  through a static initializer (this is not implemented today).
I don't think it will be possible to initialize invariant data in a non-invariant constructor, and I think an invariant constructor can only make a fully invariant class. I think this is a /necessary/ restriction, because otherwise the following could not be typechecked: class C { invariant int * p; this(int * q) { p = q; // Big no no! } }
Why would that compile? If it did, then so would this: this(int *q) invariant { p = q; } I think invariant members that are run-time initialized can be done, as during the constructor, nothing has access to the 'this' pointer.
  3. If the answer to question 1 is 'yes', can you pass mutable heap data 
 to
  pure functions?

  pure char[] substr(char[] src, int beg, int end) { return 
 src[beg..end];}
Same answer.
  Would substr be allowed to be called from a pure function?
If and only if substr() is itself a pure function.
  Would substr be allowed to be called from a non-pure function?
Yes
This was misleading on my part. I should have asked instead of these two questions, if substr is allowed to be pure, can it be called from a non-pure function? In the context of being called from a pure function, src is unique, because pure has to have unique data. But in the context of calling from a non-pure function, it might not be... -Steve
Apr 09 2008
next sibling parent reply Georg Wrede <georg nospam.org> writes:
Steven Schveighoffer wrote:
 "Janice Caron" wrote
 
 So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
 instance, if I have a function:
 
 int[] createIncreasingSequence(int s, int e, int step){...}
 
 which creates an array of ints that start at s and end at e, increasing by 
 step each time, why can't I make that pure?  i.e. I want to specify my own 
 way to initialize an array.  If I do that in 10 different pure functions, 
 you're saying I have to re-implement that code in each one?
The return value must be immutable if you're going to use it several times. And if you want to cache the return values outside of the function (as in only returning a new array when new parameters are used, and otherwise just a reference to a previously returned one), then I guess I don't have an opinion. :-?
 This was misleading on my part.  I should have asked instead of these two 
 questions, if substr is allowed to be pure, can it be called from a non-pure 
 function?  In the context of being called from a pure function, src is 
 unique, because pure has to have unique data.  But in the context of calling 
 from a non-pure function, it might not be...
Ultimately, any function is called from main(). Which is a non-pure function.
Apr 09 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Georg Wrede" wrote
 Steven Schveighoffer wrote:
 "Janice Caron" wrote

 So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
 instance, if I have a function:

 int[] createIncreasingSequence(int s, int e, int step){...}

 which creates an array of ints that start at s and end at e, increasing 
 by step each time, why can't I make that pure?  i.e. I want to specify my 
 own way to initialize an array.  If I do that in 10 different pure 
 functions, you're saying I have to re-implement that code in each one?
The return value must be immutable if you're going to use it several times.
Agreed. But I don't want to use it several times, I want to use it once :) I want it to act like new does. For example, if I have char[] c = new char[5]; in two different pure functions, new better not return the same exact memory space for both! It should be called every time. But I can't do this with a custom function because pure functions can only call other pure functions. Basically, I'm asking if there will be a way to move a common piece of a function that allocates and initializes data out of a pure function and into another. It would be cool if the compiler did not perform special optimizations (such as memoization) on pure functions that return mutable heap data, but still enforced the rules of purity.
 And if you want to cache the return values outside of the function (as in 
 only returning a new array when new parameters are used, and otherwise 
 just a reference to a previously returned one), then I guess I don't have 
 an opinion. :-?

 This was misleading on my part.  I should have asked instead of these two 
 questions, if substr is allowed to be pure, can it be called from a 
 non-pure function?  In the context of being called from a pure function, 
 src is unique, because pure has to have unique data.  But in the context 
 of calling from a non-pure function, it might not be...
Ultimately, any function is called from main(). Which is a non-pure function.
This is a special case of pure function. Let's say I have a pure function: pure invariant(char)[] f() { char[] c = new char[15]; g(c); return assumeUnique!(c); } Now, if g is a pure function that takes a char[] argument and accesses the data referenced by the argument, it could be OK to call it from another pure function because we could assume that the pure function can only use heap data that is 'local' to the function. But it might be invalid to call such a pure function from a non-pure function because data held by a non-pure function is not always considered 'local'. On the other hand, if a pure function can hold pointers to heap data that is shared, but just can't use it, then a pure function that takes mutable heap data might not be able to use it, so a function like g() isn't possible. If pure functions only allow invariant heap parameters and invariant heap returns, then this whole discussion becomes moot, but if they accept non-invariant heap parameters, or allow non-invariant heap returns, I want to know what the rules for those are. It seems to me that only allowing invariant heap data is a severe limitation that will leave D functional programming crippled in the face of an actual FP language. -Steve
Apr 10 2008
next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
  I want it to act like new does.  For example, if I have

  char[] c = new char[5];

  in two different pure functions, new better not return the same exact memory
  space for both!
You're overcomplicating this. The rule is same input => same output And I would put good money on the notion that "same" means "compares as equal with the == operator". (And that almost certainly means that mutable char[]s won't be allowed as parameters to pure functions - because strings and char[]s with the same content compare as equal)
Apr 10 2008
prev sibling parent Georg Wrede <georg nospam.org> writes:
Steven Schveighoffer wrote:
 "Georg Wrede" wrote
Steven Schveighoffer wrote:
"Janice Caron" wrote

So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
instance, if I have a function:

int[] createIncreasingSequence(int s, int e, int step){...}

which creates an array of ints that start at s and end at e, increasing 
by step each time, why can't I make that pure?  i.e. I want to specify my 
own way to initialize an array.  If I do that in 10 different pure 
functions, you're saying I have to re-implement that code in each one?
The return value must be immutable if you're going to use it several times.
Agreed. But I don't want to use it several times, I want to use it once :) I want it to act like new does. For example, if I have char[] c = new char[5]; in two different pure functions, new better not return the same exact memory space for both! It should be called every time. But I can't do this with a custom function because pure functions can only call other pure functions. Basically, I'm asking if there will be a way to move a common piece of a function that allocates and initializes data out of a pure function and into another. It would be cool if the compiler did not perform special optimizations (such as memoization) on pure functions that return mutable heap data, but still enforced the rules of purity.
And if you want to cache the return values outside of the function (as in 
only returning a new array when new parameters are used, and otherwise 
just a reference to a previously returned one), then I guess I don't have 
an opinion. :-?

This was misleading on my part.  I should have asked instead of these two 
questions, if substr is allowed to be pure, can it be called from a 
non-pure function?  In the context of being called from a pure function, 
src is unique, because pure has to have unique data.  But in the context 
of calling from a non-pure function, it might not be...
Ultimately, any function is called from main(). Which is a non-pure function.
I'll have to take that back. Of course main can be a pure function. Take the unix sort function for an example. It receives data (which by the way, is /assumed/ immutable, that is, the input file is assumed not to change while sort is running) and _without_ otherwise changing its environment (read, files on the file system), returns a value (the sorted input). Actually a lot of unix functionality is based on this. (Almost (just so I'm not caught lying :-) )) all the unix commands that take stdin as input and stdout as output, can be considered pure functions. (That most of them also output to stderr doesn't count in this discussion. :-) Nor the fact that they can change files (like the tee command) if the user wants. Also, they're excellent examples of functions that can be pure or not pure, depending on the invocation.)
 This is a special case of pure function.  Let's say I have a pure function:
 
 pure invariant(char)[] f()
 {
     char[] c = new char[15];
     g(c);
     return assumeUnique!(c);
 }
 
 Now, if g is a pure function that takes a char[] argument and accesses the 
 data referenced by the argument, it could be OK to call it from another pure 
 function because we could assume that the pure function can only use heap 
 data that is 'local' to the function.  But it might be invalid to call such 
 a pure function from a non-pure function because data held by a non-pure 
 function is not always considered 'local'.  On the other hand, if a pure 
 function can hold pointers to heap data that is shared, but just can't use 
 it, then a pure function that takes mutable heap data might not be able to 
 use it, so a function like g() isn't possible.
I'd say that - you can call a pure function from a non-pure function, anytime - you can call a non-pure function from a pure function, BUT THAT makes your pure function lose its purity. It'll still work, however. (And of course, the compiler could be polite enough to remind you of losing purity -- if the compiler can see that you're actually trying to have a pure function, as opposed to the function being pure just by happenstance.) For a program none of this matters. I'd say that you can mix freely whatever you like. BUT, if you wish the compiler has the possibility to do serious optimizing, the you better not soil your pure functions. I'd say, that is your only punishment for frivolous mixing. The data doesn't get corrupted, and the sky doesn't turn checkered above you.
 If pure functions only allow invariant heap parameters and invariant heap 
 returns, then this whole discussion becomes moot, but if they accept 
 non-invariant heap parameters, or allow non-invariant heap returns, I want 
 to know what the rules for those are.  It seems to me that only allowing 
 invariant heap data is a severe limitation that will leave D functional 
 programming crippled in the face of an actual FP language.
One of the dreamed benefits of deciding that _D_ pure functions can only access invariant data is to ensure that the compiler can do memoisation when it wants to. That is, then the programmer doesn't have to do memoisation by hand. This lessens bugs, eases coding, and gives the compiler potentially big possibilities of very serious optimizing. (Not that DMD probably will achieve such in some time, but still. After all, we're designing a laguage here, too, which should make it _possible_ to write such compilers.) If we define - pure functions can only receive immutable data as arguments - pure functions can only access immutable other data - pure functions can only return immutable data - pure functions can't change or output other than the return "value" - pure functions can only use other pure functions and then we define - "return value" can also be a reference to immutable heap data - pure functions can use the heap as their local storage and we decide that the last one means that pure functions can create and manipulate mutable data on the heap, but only so that nobody else has access to this data (including previous, later, or concurrent instances of the same function), then we should be half-way to the goals A/W are trying to reach. --- Now, why all this heap stuff? Simply because even Real Functional Languages, that make it look like the stack is the only place in the world, can actually use the heap all they want, only they do it behind the back of the programmer. It's not about *literally* using the stack only, it's about the metaphor. This metaphor is dictated by the fact that pure functionality means that stuff is passed via return values only. Now, in D (which is a multi-paradigm, practical language, for Systems Programming), we don't want to restrict ourselves any more by purity than we'd want to do that with OO, for example. Before C++ was invented, we had languages that were OO-only, and languages that were non-OO. (Even Tiobe still uses this distinction, and I always wonder to which category they've put D.) The reason for this either-or was simply that people were new to OO, and hadn't yet figured out how (or whether) they can mix OO and procedural. To take another example, when I got a VIC-20 in the early eighties, it only had a Microsoft Basic interpreter. You couldn't do structured programming with it simply because the facilities were missing in the language. This made the programming effort more than exponential into the number of lines written. Today, I only do structural programming (as in versus unstructural), and admire Walter, who still mixes structured and unstructured, quite fluently. (Just see Phobos source, especially the C files.) Now, with D we are charting new ground here. We are (arguably) the first to make a serious effort of mixing functional (as in pure) with procedural programming. And this is where the heap stuff comes along. It would be impractical for D to pass everything via the stack. First of all, non-trivial progams would potentially need huge stack spaces, and in many environments the stack space allocated for a program is given at startup time. So, either we need to allocate stack just-in-case, or risk running out of stack in mid-run. Wasteful, and impolite, especially in a multiuser environment. But to achieve this, we, at this point, have to decide that both input to pure functions and output (i.e. the return "value") have to be invariant. Only this lets us get away with only "pretending" to use the stack, while in essence we pass references to heap stuff via the stack. --- Later, when we're more at ease with mixing functional with non-functional, we might well find ways to ease these restrictions. But that might be several years after we've actually started using this in real projects. PS, oh Bob, I wish Walter would comment on this post, in any way. Suppose my understanding on this differs from that of A/W, then I'd only be misleading everybody here.
Apr 10 2008
prev sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 09/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 So how is new char[] a pure function?
new is special.
  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e, increasing by
  step each time, why can't I make that pure?
I reckon you can, but you got the return type wrong. invariant(int)[] createIncreasingSequence(int s, int e, int step){...} { auto array = new int[(e-s)/step]; foreach(ref a;array) { a = e; e += step; } return assumeUnique!(array); }
 i.e. I want to specify my own
  way to initialize an array.  If I do that in 10 different pure functions,
  you're saying I have to re-implement that code in each one?
To be honest, I don't understand the question. The rule is : same input => same output What happens in the middle, so long as it is side-effect free, doesn't matter.
 Surely there is some possible need for having run-time initialized invariant
  variables...
You can do that. Walter confirmed that in response to a previous example I gave. import std.date; void main() { invariant s = toString(getUTCtime()); pure string f() { return s; } writefln(s); } Walter says f is pure. Ironically, the program will give a different output each time it's run, but aparently purity only has to be global, not universal.
  > I think this is a /necessary/ restriction, because otherwise the
  > following could not be typechecked:
  >
  >    class C
  >    {
  >        invariant int * p;
  >
  >        this(int * q)
  >        {
  >            p = q; // Big no no!
  >        }
  >    }


 Why would that compile?
Thankfully, it doesn't. But you were suggesting it should be possible to initialize member variables which were declared invariant inside a constructor, so I was demonstrating why this would not be a good idea.
  If it did, then so would this:

  this(int *q) invariant
  {
     p = q;
  }
Not so. Invariant constructors have different typechecking rules. Member variable p initially has type __raw(int *), which means it can only be assigned with an invariant(int *), not with an int *.
  I think invariant members that are run-time initialized can be done, as
  during the constructor, nothing has access to the 'this' pointer.
"this" is certainly available during a constructor.
  This was misleading on my part.  I should have asked instead of these two
  questions, if substr is allowed to be pure,
which is a big "if", and I'm still hedging my bets on the answer being no. But let's hypothetically assume it's yes for sake of argument.
 can it be called from a non-pure
 function?
There is nothing to stop you calling a pure function from a non-pure function. So that would be a yes.
 In the context of being called from a pure function, src is
 unique, because pure has to have unique data.
No it doesn't. It has to have invariant data, not unique data. The whole point about invariant data is it's OK to share it.
Apr 09 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 So how is new char[] a pure function?
new is special.
  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e, increasing 
 by
  step each time, why can't I make that pure?
I reckon you can, but you got the return type wrong. invariant(int)[] createIncreasingSequence(int s, int e, int step){...} { auto array = new int[(e-s)/step]; foreach(ref a;array) { a = e; e += step; } return assumeUnique!(array); }
This doesn't work for what I'm asking. I'll add more detail to the example. You have two functions f, and g: pure int f() { int[] x = new int[5]; for(int i = 0; i < x.length; i++) x[i] = 1 + i; /* do stuff */ return result; } pure int g() { int[] y = new int[15]; for(int i = 0; i < y.length; y++) y[i] = 10 + (i * 3); /* do stuff */ return result; } Now, I realize I have very similar code that initializes the int arrays, so I want to put that into a function that creates and initializes an array with the given parameters: int[] createIncreasingSequence(int length, int start, int step) { int[] result = new int[length]; for(int i = 0; i < result.length; i++) result[i] = start + (i * step); return result; } Now, I think we all agree that createIncreasingSequence is as pure as calling new, right? That is, it doesn't affect any global state (except for allocating heap data) and will return an equivalent array every time the same arguments are given. But if I can't declare it as 'pure', then I can't call it from f and g: pure int f() { int[] x = createIncreasingSequence(5, 1, 1); /* do stuff */ return result; } pure int g() { int[] y = createIncreasingSequence(15, 10, 3); /* do stuff */ return result; } So should there be a way to define createIncreasingSequence such that I can call it from a pure function? Or do I have to re-implement it in every pure function I use? Not having the ability to pass around mutable heap data to and from pure functions is going to limit severely the usefulness of pure functions.
 Surely there is some possible need for having run-time initialized 
 invariant
  variables...
You can do that. Walter confirmed that in response to a previous example I gave. import std.date; void main() { invariant s = toString(getUTCtime()); pure string f() { return s; } writefln(s); } Walter says f is pure. Ironically, the program will give a different output each time it's run, but aparently purity only has to be global, not universal.
This is exactly what I want to do, But I want to do it inside a class scope, not in a function scope. I think someone in another thread already confirmed that this is possible in DMD 2.012, so my question is already answered.
  > I think this is a /necessary/ restriction, because otherwise the
  > following could not be typechecked:
  >
  >    class C
  >    {
  >        invariant int * p;
  >
  >        this(int * q)
  >        {
  >            p = q; // Big no no!
  >        }
  >    }


 Why would that compile?
Thankfully, it doesn't. But you were suggesting it should be possible to initialize member variables which were declared invariant inside a constructor, so I was demonstrating why this would not be a good idea.
Like I said, this is already possible.
  If it did, then so would this:

  this(int *q) invariant
  {
     p = q;
  }
Not so. Invariant constructors have different typechecking rules. Member variable p initially has type __raw(int *), which means it can only be assigned with an invariant(int *), not with an int *.
I would guess that this is a requirement for invariant class members as well.
  I think invariant members that are run-time initialized can be done, as
  during the constructor, nothing has access to the 'this' pointer.
"this" is certainly available during a constructor.
I would guess that you cannot pass 'this' to an external entity before initializing the invariant variables (or before initializing mutable variables in an invariant constructor). This should be statically verifiable. -Steve
Apr 10 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
  But if I can't declare it as 'pure', then I can't
  call it from f and g:
No, you can't.
  So should there be a way to define createIncreasingSequence such that I can
  call it from a pure function?
I would say no.
  Or do I have to re-implement it in every pure
  function I use?
You can make a pure function to return an immutable array, and then duplicate it within f and g.
  Not having the ability to pass around mutable heap data to and from pure
  functions is going to limit severely the usefulness of pure functions.
I don't see why. Functional programming languages such as Haskell seem not to mind. In many functional programming languages, there is no mutable data at all.
Apr 10 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
  Or do I have to re-implement it in every pure
  function I use?
You can make a pure function to return an immutable array, and then duplicate it within f and g.
So your solution is, create a duplicate array that is never used again, thereby increasing the runtime (and memory allocation) of the pure function, and lowering the performance of the application. I can't wait to use functional programming, it sounds awesome :)
  Not having the ability to pass around mutable heap data to and from pure
  functions is going to limit severely the usefulness of pure functions.
I don't see why. Functional programming languages such as Haskell seem not to mind. In many functional programming languages, there is no mutable data at all.
Of course functional programming languages have mutable data. They just don't have SHARED data. We need immutable data in D because D is not a functional language and can share data, so that data needs to be immutable. But data that isn't shared shouldn't need to be immutable. So let me restate: not having the ability to pass around UNIQUE mutable heap data to and from pure functions is going to limit severely the usefulness of pure functions. -Steve
Apr 10 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 So your solution is, create a duplicate array that is never used again,
If you need to modify it, then it needs to be mutable, so a heap allocation is unavoidable in any case.
  thereby increasing the runtime
No, /decreasing/ the runtime, because the construction of the invariant array only has to happen ONCE. (Conceptually, it happens every time you call the function, but that's the beauty of functional programming. The compiler is able to cache things and reuse them).
 Of course functional programming languages have mutable data.
Haskell doesn't.
  They just
  don't have SHARED data.
Haskell has no mutable data at all, only constants. In Haskell, you cannot do x = x + 1 because x, once assigned, keeps its value until it goes out of scope. Now, I'm not saying that D needs to be that extreme. I'm only saying that to get the benefits of FP, you have to hand over a certain amount of control to the compiler, to the language. You no longer get to choose the order of evaluation. You cannot be certain what will and what won't be caller. You are no longer in /any/ position to estimate execution time on the basis of what will or will not be called, or when, or in what order. Assuming the compiler is smart enough to make all the optimizations that FP permits it to make, this is a plus, not a minus.
  We need immutable data in D because
No one is arguing against that position.
  But data that isn't shared shouldn't need to be immutable.
Or that one.
  So let me restate: not having the ability to pass around UNIQUE mutable heap
 data to and from pure functions is going to limit severely the usefulness of
  pure functions.
I disagree - partly, because the compiler cannot statically verify uniqueness, so you would immediately be compromising the ability of FP to strongly typecheck everything, thereby robbing the compiler of the guarantees that it needs to ensure fast and crash-free multiprocessing. But mostly because FP requires a different way of thinking about things, and trying to force it to accomodate non-FP paradigms because they sit well with the way you think about problems just isn't playing the game. You have to trust that the compiler a little more. It can optimise better - you just have to keep your fingers crossed that it actually will!
Apr 10 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
 So your solution is, create a duplicate array that is never used again,
If you need to modify it, then it needs to be mutable, so a heap allocation is unavoidable in any case.
Yes, but this is 2x heap allocation. For no reason.
  thereby increasing the runtime
No, /decreasing/ the runtime, because the construction of the invariant array only has to happen ONCE. (Conceptually, it happens every time you call the function, but that's the beauty of functional programming. The compiler is able to cache things and reuse them).
Please re-read the example. If I only ever call createIncreasingSequence with a set of arguments once, then it's memoizing on data that's never used again. My supposition is that I have several functions that have similar code that I parameterize into a pure function. Yes, f, and g can be cached, but that doesn't mean createIncreasingSequence needs to be cached. What I'm trying to do (which is a common programming practice) is take common pieces of functions and use a separate function to do that piece. My common piece happens to allocate data. In other words, if I have pure functions: f(x) { statement1; statement2; } g(x) { statement1; statement2; } Then I should be able to do h(x) { statement1; statement2; } f(x) { h(x); } g(x) { h(x); } No matter what statement 1 or 2 is. If it happens to include memory allocation, and that is allowed in a pure function, then I should be able to wrap that memory allocation in another function. If the compiler can't optimize on it inside the pure function, then it can't optimize it in the sub function, but who cares at that point?
 Of course functional programming languages have mutable data.
Haskell doesn't.
  They just
  don't have SHARED data.
Haskell has no mutable data at all, only constants. In Haskell, you cannot do x = x + 1 because x, once assigned, keeps its value until it goes out of scope. Now, I'm not saying that D needs to be that extreme. I'm only saying that to get the benefits of FP, you have to hand over a certain amount of control to the compiler, to the language. You no longer get to choose the order of evaluation. You cannot be certain what will and what won't be caller. You are no longer in /any/ position to estimate execution time on the basis of what will or will not be called, or when, or in what order.
I've been reading a bit about FP, and I see now that you are correct that mutable data isn't allowed. But let's not forget that D is not a functional language, so a loop like: for(int i = 0; i < n; i++) statement; Is legal inside a D pure function, whereas the same code would be achieved through tail-recursion in a functional language. Having mutable return values and arguments may not prevent the compiler from considering the function pure (then again, it may, that is my question). But certainly, comparing D to Haskell doesn't prove either case. I don't think the intent of bolting functional programming optimization opportunities on top of D is going to mean that during pure functions we will have to use a fully FP style.
  So let me restate: not having the ability to pass around UNIQUE mutable 
 heap
 data to and from pure functions is going to limit severely the usefulness 
 of
  pure functions.
I disagree - partly, because the compiler cannot statically verify uniqueness, so you would immediately be compromising the ability of FP to strongly typecheck everything, thereby robbing the compiler of the guarantees that it needs to ensure fast and crash-free multiprocessing. But mostly because FP requires a different way of thinking about things, and trying to force it to accomodate non-FP paradigms because they sit well with the way you think about problems just isn't playing the game. You have to trust that the compiler a little more. It can optimise better - you just have to keep your fingers crossed that it actually will!
OK, sure, so that means you believe: char[] c = new char[5]; is invalid inside a pure function because the compiler cannot verify uniqueness of mutable heap data, right? But the data returned IS unique. Couldn't there be a way to specify that the data is unique, or at least local? I consider having to dup data just so I can use it to be wasteful, no matter what paradigm I'm using. -Steve
Apr 10 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 OK, sure, so that means you believe:
Please don't make claims concerning what I may or may not believe. Only I get to make those claims. For the record, I do /not/ believe that which you claim I believe. Feel free to rephrase as "assertion X leads to conclusion Y" or some such, but please don't get personal, or I'm outa here.
Apr 10 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
 OK, sure, so that means you believe:
Please don't make claims concerning what I may or may not believe. Only I get to make those claims. For the record, I do /not/ believe that which you claim I believe. Feel free to rephrase as "assertion X leads to conclusion Y" or some such, but please don't get personal, or I'm outa here.
If you read the post, you would see that it was a question, and assuming that it was true, a further question about why you think that. "so that means you believe: .... right?" First, off, this is not an attack. Second, even if it was an attack, it's not about anything personal. It's about your beliefs on a language design. Please don't get so defensive, nobody is attacking you personally. We're all trying to figure this pure function thing out, and I happen to think your statements are inconsistent. This has nothing to do with what I think of you personally. -Steve
Apr 10 2008
prev sibling next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
  Then I should be able to do

  h(x)
  {
     statement1;
     statement2;
  }

  f(x)
  {
    h(x);
  }

  g(x)
  {
    h(x);
  }
What you're asking for is what someone else called "amoral" (slightly pure). You're suggesting that h be pure-/ish/. It doesn't have to be completely pure per se, but it must be pure /enough/ such that when embedded within f or g, f and g remain pure. I have no idea whether or not D will be able to accomodate this concept. Strictly functional programming doesn't have it. D might or might not. If we do, I suspect it will require another keyword.
  char[] c = new char[5];

 is invalid inside a pure function because the compiler cannot verify
  uniqueness of mutable heap data, right?
No. As I said, new is special. It's integral to the language.
Apr 10 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
  Then I should be able to do

  h(x)
  {
     statement1;
     statement2;
  }

  f(x)
  {
    h(x);
  }

  g(x)
  {
    h(x);
  }
What you're asking for is what someone else called "amoral" (slightly pure). You're suggesting that h be pure-/ish/. It doesn't have to be completely pure per se, but it must be pure /enough/ such that when embedded within f or g, f and g remain pure. I have no idea whether or not D will be able to accomodate this concept. Strictly functional programming doesn't have it. D might or might not. If we do, I suspect it will require another keyword.
We may be able to do it without a keyword. For example, if a pure function returns an invariant(char)[], it means that the return value can be memoized, if not, it means that the return value is well-defined, but another call to the function will return another copy of the same data. Reordering is OK, because it always returns the same data. You could even wrap this in another version that returns invariant data, which then allows for memoization, if you know you're going to call it a lot, and don't care about mutating the data afterwards.
  char[] c = new char[5];

 is invalid inside a pure function because the compiler cannot verify
  uniqueness of mutable heap data, right?
No. As I said, new is special. It's integral to the language.
Of course, but it is like a building block. If I have a function that is pure, and that function calls nothing but other pure functions AND 'new', then why can't it also be proven to be pure? -Steve
Apr 10 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 Of course, but it is like a building block.  If I have a function that is
  pure, and that function calls nothing but other pure functions AND 'new',
  then why can't it also be proven to be pure?
It's not the calling of new that's a problem, it's the returning of mutable data. If we allow "pureish" functions, then maybe a "pureish" function could return mutable data, but not a truly function. The reason is that the compiler is allowed to cache the result (the return value from any pure function), and to re-use that result, instead of calling the function, whenever the input compares equal with last time. If any of that data is mutable, it all goes haywire. Seems to me that what's really needed here is a macro, because a macro is basically just a function that's always inlined, so the function body will be parsed /inside/ the pure specifier. Maybe macros could solve that problem completely.
Apr 10 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
 Of course, but it is like a building block.  If I have a function that is
  pure, and that function calls nothing but other pure functions AND 
 'new',
  then why can't it also be proven to be pure?
It's not the calling of new that's a problem, it's the returning of mutable data. If we allow "pureish" functions, then maybe a "pureish" function could return mutable data, but not a truly (a pure) function. The reason is that the compiler is allowed to cache the result (the return value from any pure function), and to re-use that result, instead of calling the function, whenever the input compares equal with last time. If any of that data is mutable, it all goes haywire.
I look at it differently. To me, a pure function is a function that has no side effects and has promised to return the same thing every time it is called. Allocating memory is the only function that is considered pure, but alters global state (i.e. the heap), and so we assume it's pure. A function can still allocate data and be considered pure. A function whose result can be memoized is to me a function that is pure, and also one that returns either stack or invariant data. So a function has to be pure to be memoized, but not all pure functions can be memoized. If you separate out the memoization optimization, then it's possible to have both worlds, without any extra keywords or terminology. We are breaking new ground here, so this is the time to explore options that are not part of the precedent. You might not get the compiler optimizations with a pure function that returns mutable heap data that you would with one that returns invariant heap data, but sometimes that is desirable. The compiler cannot know all things about all optimizations that have been or ever will be. At some point, it has to be nudged in the right direction.
 Seems to me that what's really needed here is a macro, because a macro
 is basically just a function that's always inlined, so the function
 body will be parsed /inside/ the pure specifier. Maybe macros could
 solve that problem completely.
Maybe, but this results in code bloat :) There are always tradeoffs, it's a question of what is important to you. I used to work with a device that had 256 bytes of RAM and 8K of code space. When working on that device, I learned how important it was to split common functionality into functions. -Steve
Apr 10 2008
prev sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Janice Caron <caron800 googlemail.com> wrote:
  If we allow "pureish" functions, then maybe a "pureish" function could
  return mutable data, but not a truly function.
I meant: ...but not a truly pure function.
Apr 10 2008
prev sibling next sibling parent reply "Koroskin Denis" <2korden+dmd gmail.com> writes:
On Thu, 10 Apr 2008 20:34:29 +0400, Janice Caron <caron800 googlemail.co=
m>  =

wrote:

 On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:

  char[] c =3D new char[5];

 is invalid inside a pure function because the compiler cannot verify
  uniqueness of mutable heap data, right?
No. As I said, new is special. It's integral to the language.
OK, new is 'special', but what about malloc or my own allocator? (I happ= en = to work in a gamedev and we never use new nor malloc). I don't see how special extern(C) void* malloc(size_t size); differs from: void* my_malloc(size_t size) { return malloc(size); } and why can't it be 'special' too.
Apr 10 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
  OK, new is 'special', but what about malloc or my own allocator? (I happen
 to work in a gamedev and we never use new nor malloc).
  I don't see how

  special extern(C) void* malloc(size_t size);

  differs from:

  void* my_malloc(size_t size)
  {
    return malloc(size);
  }

  and why can't it be 'special' too.
Because you're modifying global state. One of the reasons why new is special is because you never have to delete (because the garbage collector takes care of it). This is directly comparable to, for example, Haskell, where new "things" are created all the time, and are never freed. It's built into the system. More - it /is/ the system. That's the status of new. It's the de facto heap allocation method in D. If you want to use custom allocators, there's nothing to stop you, but that doesn't change the fact that the minute you modify global state, you have a function that isn't pure. I think you're worrying too much. There should never be a need to call malloc and free inside a pure function. If you need a function that does impure stuff, just write the function anyway, and don't call it pure.
Apr 10 2008
parent reply "Koroskin Denis" <2korden+dmd gmail.com> writes:
On Thu, 10 Apr 2008 21:41:15 +0400, Janice Caron <caron800 googlemail.co=
m>  =

wrote:

 On 10/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
  OK, new is 'special', but what about malloc or my own allocator? (I =
=
 happen
 to work in a gamedev and we never use new nor malloc).
  I don't see how

  special extern(C) void* malloc(size_t size);

  differs from:

  void* my_malloc(size_t size)
  {
    return malloc(size);
  }

  and why can't it be 'special' too.
Because you're modifying global state. One of the reasons why new is special is because you never have to delete (because the garbage collector takes care of it). This is directly comparable to, for example, Haskell, where new "things" are created all the time, and are never freed. It's built into the system.=
 More - it /is/ the system. That's the status of new. It's the de facto=
 heap allocation method in D.

 If you want to use custom allocators, there's nothing to stop you, but=
 that doesn't change the fact that the minute you modify global state,
 you have a function that isn't pure.

 I think you're worrying too much. There should never be a need to call=
 malloc and free inside a pure function. If you need a function that
 does impure stuff, just write the function anyway, and don't call it
 pure.
Well, I don't distinguish between new and malloc. They both are impure a= nd /shouldn't/ be callable from pure functions. In fact, there is a solutio= n: inew (for short, or new invariant(T) from accu-functional). Obviously enough, new can't be pure: T t1 =3D new T(); T t2 =3D new T(); Since it can't be rewritten to: T t1 =3D new T(); T t2 =3D t1; However, inew /is/ pure: T t1 =3D inew T(); T t2 =3D t1; Is this the key!? Yet, there is one more exception: rand(). I believe Haskell /has/ random= = numbers, therefore it should be callable from pure functions, isn't it? Other solution could be to introduce some 'uncachable' keyword, so that = we = could declare: uncachable int rand(); uncachable void* malloc(size_t size);
 [snip]
 I should have added: /neither/ of those functions is pure.

 In fact, I strongly doubt that any function declared extern(C) will be=
 pure, for the simple and obvious reason that C doesn't have the "pure"=
 keyword. :-)
Agreed, C doesn't but pure and impure follow the same calling convention= (I hope), so there is no difference between two from linker's point of = view. Can't see why isalpha/isdigit/tolower/etc. can't be declared pure: extern "C" { pure int isalpha(int c); pure int isdigit(int c); pure int tolower(int c); }
Apr 11 2008
parent "Janice Caron" <caron800 googlemail.com> writes:
On 11/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
 I don't distinguish between new and malloc.
malloc() needs free(). new needs nothing, since we have a garbage collector. That is a /big/ difference.
Apr 13 2008
prev sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 10/04/2008, Janice Caron <caron800 googlemail.com> wrote:
 On 10/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
  >  OK, new is 'special', but what about malloc or my own allocator? (I happen
  > to work in a gamedev and we never use new nor malloc).
  >  I don't see how
  >
  >  special extern(C) void* malloc(size_t size);
  >
  >  differs from:
  >
  >  void* my_malloc(size_t size)
  >  {
  >    return malloc(size);
  >  }
Sorry, I should have added: /neither/ of those functions is pure. In fact, I strongly doubt that any function declared extern(C) will be pure, for the simple and obvious reason that C doesn't have the "pure" keyword. :-)
Apr 10 2008
prev sibling parent reply Georg Wrede <georg nospam.org> writes:
Janice Caron wrote:
 On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 
 char[] c = new char[5];

 is invalid inside a pure function because the compiler cannot verify
 uniqueness of mutable heap data, right?
No. As I said, new is special. It's integral to the language.
First of all, I think new is special anyhow. Second, come to think of it, even if new weren't special, whatever it returns IS UNIQUE and as good as IMMUTABLE by others -- as seen FROM THE PERSPECTIVE OF THE CALLER. Which is actually all we care about, when we're inside a pure function. In other words, if you new something, nobody else gets to change it since nobody else has a reference to this new data. --- Ehrm. On second thought, it actually IS possible to end up in a situation where a just newed object may suddenly change in your hands, and also be referenced from other parts of the program. (I'll leave constructing an example as an exercise.) Now, the simplest thing to guarantee that not happening would be to deny newing of objects that have constructors. When the compiler gets more advanced, it could examine such constructors, and only warn (or deny) newing objects where it can't verify such is not the case. Until that, I guess we'll see what A/W have decided on this one. If, that is, they've come up with the above problem.
Apr 10 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Georg Wrede" wrote
 Ehrm. On second thought, it actually IS possible to end up in a situation 
 where a just newed object may suddenly change in your hands, and also be 
 referenced from other parts of the program.

 (I'll leave constructing an example as an exercise.)

 Now, the simplest thing to guarantee that not happening would be to deny 
 newing of objects that have constructors.
In fact, I think the solution to this is to make a constructor that is pure :) The problem with this is that the 'this' pointer is mutable when passed to the constructor, so you have mutable data being passed to a pure function. However, it is 'unique' data that has not been initialized anywhere, so this might be an exception to the rule. Having the constructor pure solves any problems because you can't pass the 'this' pointer out to some global function that might store it somewhere. -Steve
Apr 10 2008
parent Georg Wrede <georg nospam.org> writes:
Steven Schveighoffer wrote:
 "Georg Wrede" wrote
 
Ehrm. On second thought, it actually IS possible to end up in a situation 
where a just newed object may suddenly change in your hands, and also be 
referenced from other parts of the program.

(I'll leave constructing an example as an exercise.)

Now, the simplest thing to guarantee that not happening would be to deny 
newing of objects that have constructors.
In fact, I think the solution to this is to make a constructor that is pure :)
Yes. And thus, a pure constructor cannot write to class variables, only instance variables.
 The problem with this is that the 'this' pointer is mutable when passed to 
 the constructor, so you have mutable data being passed to a pure function. 
 However, it is 'unique' data that has not been initialized anywhere, so this 
 might be an exception to the rule.
I agree. One might consider the "this" varable as either actually pointing to this particular instance or else consider it invalid. If we define it so, then "this" can be considered immutable. Besides, since we don't have a reallocating GC, and we can't (at least I don't now remember otherwise) store object instances in (potentially relocating) arrays (only references to them), the "this" pointer could be made immutable everywhere.
 Having the constructor pure solves any problems because you can't pass the 
 'this' pointer out to some global function that might store it somewhere.
Yes.
Apr 10 2008
prev sibling parent Georg Wrede <georg nospam.org> writes:
Steven Schveighoffer wrote:
 I've been reading a bit about FP, and I see now that you are correct that 
 mutable data isn't allowed.  But let's not forget that D is not a functional 
 language, so a loop like:
 
 for(int i = 0; i < n; i++)
    statement;
 
 Is legal inside a D pure function, whereas the same code would be achieved 
 through tail-recursion in a functional language.  Having mutable return 
 values and arguments may not prevent the compiler from considering the 
 function pure (then again, it may, that is my question).  But certainly, 
 comparing D to Haskell doesn't prove either case.
It doesn't. And whether we can have mutable arguments and/or mutable return values, is simply up to Walter. Nothing else. However, I don't see any other way than both being immutable. If you want to change the the value returned by a pure function, you have to make a copy of it. Just plain old COW. And most of the time, that feels natural anyway.
 I don't think the intent of bolting functional programming optimization 
 opportunities on top of D is going to mean that during pure functions we 
 will have to use a fully FP style.
That would be disaster. But that's just my opinion. :-)
Apr 10 2008
prev sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Steven Schveighoffer, el 10 de abril a las 10:42 me escribiste:
 So let me restate: not having the ability to pass around UNIQUE mutable heap 
 data to and from pure functions is going to limit severely the usefulness of 
 pure functions.
This is the exact same problem of the mutable methods of a class used in a stack allocated temporary of a pure function, talked in another thread. class A { int x; void f() { x = 1; } } pure void g() { scope a = new A; a.f(); } g() is clearly pure (it has no side effects, besides allocating memory in the stack, which shouldn't be considered a side effect, I guess allocating in the heap shouldn't be considered a side effect either, since new is thread safe). Even when A.f() is not pure, nor const. There is an implementation problem, about how hard is to detect that by the compiler, but in theory there is no reason for g() not being pure. -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- Parece McGuevara's o CheDonald's
Apr 10 2008
parent Georg Wrede <georg nospam.org> writes:
Leandro Lucarella wrote:
 Steven Schveighoffer, el 10 de abril a las 10:42 me escribiste:
 
So let me restate: not having the ability to pass around UNIQUE mutable heap 
data to and from pure functions is going to limit severely the usefulness of 
pure functions.
This is the exact same problem of the mutable methods of a class used in a stack allocated temporary of a pure function, talked in another thread. class A { int x; void f() { x = 1; } } pure void g() { scope a = new A; a.f(); } g() is clearly pure (it has no side effects, besides allocating memory in the stack, which shouldn't be considered a side effect, I guess allocating in the heap shouldn't be considered a side effect either, since new is thread safe). Even when A.f() is not pure, nor const. There is an implementation problem, about how hard is to detect that by the compiler, but in theory there is no reason for g() not being pure.
g doesn't return a value.
Apr 10 2008
prev sibling parent Russell Lewis <webmaster villagersonline.com> writes:
Janice Caron wrote:
  Not having the ability to pass around mutable heap data to and from pure
  functions is going to limit severely the usefulness of pure functions.
I don't see why. Functional programming languages such as Haskell seem not to mind. In many functional programming languages, there is no mutable data at all.
That's not quite true, and I think that the distinction is important. Haskell and the rest are constantly doing memory allocation, and that memory allocation has the potential to fail when you run out of memory. It's simply that they hide it all behind their syntax. Thus, I would argue that it is perfectly reasonable for a "pure" D function to allocate memory. The question arises, then, what happens when you run out? A pure function probably shouldn't be allowed to throw an exception (that would violate the pureness guarantees), so what do you do? Or do you create a "PureFunctionFailedException" which is the only thing that a pure function might throw???
Apr 10 2008
prev sibling next sibling parent Georg Wrede <georg nospam.org> writes:
Steven Schveighoffer wrote:
 Now, I realize I have very similar code that initializes the int arrays, so 
 I want to put that into a function that creates and initializes an array 
 with the given parameters:
 
 int[] createIncreasingSequence(int length, int start, int step)
 {
     int[] result = new int[length];
     for(int i = 0; i < result.length; i++)
         result[i] = start + (i * step);
     return result;
 }
 
 Now, I think we all agree that createIncreasingSequence is as pure as 
 calling new, right?  That is, it doesn't affect any global state (except for 
 allocating heap data) and will return an equivalent array every time the 
 same arguments are given.  But if I can't declare it as 'pure', then I can't 
 call it from f and g:
Yes. I think your function is pure. Except for the fact that the returned heap data has to be pure.
 pure int f()
 {
    int[] x = createIncreasingSequence(5, 1, 1);
 
    /* do stuff */
    return result;
 }
 
 pure int g()
 {
    int[] y = createIncreasingSequence(15, 10, 3);
 
    /* do stuff */
    return result;
 }
 
 So should there be a way to define createIncreasingSequence such that I can 
 call it from a pure function?  Or do I have to re-implement it in every pure 
 function I use?
You should be able to call it from a pure function.
 Not having the ability to pass around mutable heap data to and from pure 
 functions is going to limit severely the usefulness of pure functions.
This example does not need mutable data. And it is an example of how you can get along just fine with immutable data. (It is also an example of CTFE. That is, already existing DMD should convert f and g into compile time constants. :-) But that's obviously besides the point here!)
Apr 10 2008
prev sibling parent reply "Koroskin Denis" <2korden+dmd gmail.com> writes:
On Thu, 10 Apr 2008 17:26:20 +0400, Steven Schveighoffer  =

<schveiguy yahoo.com> wrote:

 "Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 So how is new char[] a pure function?
new is special.
  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e,  =
 increasing
 by
  step each time, why can't I make that pure?
I reckon you can, but you got the return type wrong. invariant(int)[] createIncreasingSequence(int s, int e, int =
 step){...}
    {
        auto array =3D new int[(e-s)/step];
        foreach(ref a;array)
        {
            a =3D e;
            e +=3D step;
        }
        return assumeUnique!(array);
    }
This doesn't work for what I'm asking. I'll add more detail to the =
 example.
 You have two functions f, and g:

 pure int f()
 {
     int[] x =3D new int[5];
     for(int i =3D 0; i < x.length; i++)
        x[i] =3D 1 + i;

     /* do stuff */
     return result;
 }

 pure int g()
 {
     int[] y =3D new int[15];
     for(int i =3D 0; i < y.length; y++)
         y[i] =3D 10 + (i * 3);

     /* do stuff */
     return result;
 }

 Now, I realize I have very similar code that initializes the int array=
s, =
 so
 I want to put that into a function that creates and initializes an arr=
ay
 with the given parameters:

 int[] createIncreasingSequence(int length, int start, int step)
 {
     int[] result =3D new int[length];
     for(int i =3D 0; i < result.length; i++)
         result[i] =3D start + (i * step);
     return result;
 }

 Now, I think we all agree that createIncreasingSequence is as pure as
 calling new, right?  That is, it doesn't affect any global state (exce=
pt =
 for
 allocating heap data) and will return an equivalent array every time t=
he
 same arguments are given.  But if I can't declare it as 'pure', then I=
=
 can't
 call it from f and g:

 pure int f()
 {
    int[] x =3D createIncreasingSequence(5, 1, 1);

    /* do stuff */
    return result;
 }

 pure int g()
 {
    int[] y =3D createIncreasingSequence(15, 10, 3);

    /* do stuff */
    return result;
 }

 So should there be a way to define createIncreasingSequence such that =
I =
 can
 call it from a pure function?  Or do I have to re-implement it in ever=
y =
 pure
 function I use?
There is a way with inew (invariant new): class IncreasingSequence(int length) { public: this(int start, int step) { for(int i =3D 0; i < length; i++) elements[i] =3D start + (i * step); } int[length] elements; } pure int f() { auto x =3D inew IncreasingSequence!(5)(1, 1); /* do stuff */ return result; } pure int g() { auto y =3D inew IncreasingSequence!(15)(10, 3); /* do stuff */ return result; } Isn't it beautiful? Yet, it's /very/ close to wait you wanted.
Apr 11 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
Nope, not close :)  In fact, I want a mutable array because I intend to play 
with the values later in the function (I didn't make that clear I guess).

I agree that in the case that I don't need a mutable array, returning an 
invariant array is an acceptable solution, I don't even think you need a 
class for it, a pure function will do just fine.

But returning a mutable array IMO does not make the function impure, it just 
means you cannot cache the return value.  The compiler could be made to 
assume this when seeing that the return value has a mutable heap pointer.

-Steve

"Koroskin Denis" wrote
On Thu, 10 Apr 2008 17:26:20 +0400, Steven Schveighoffer
 wrote:
 "Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 So how is new char[] a pure function?
new is special.
  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e, 
 increasing
 by
  step each time, why can't I make that pure?
I reckon you can, but you got the return type wrong. invariant(int)[] createIncreasingSequence(int s, int e, int step){...} { auto array = new int[(e-s)/step]; foreach(ref a;array) { a = e; e += step; } return assumeUnique!(array); }
This doesn't work for what I'm asking. I'll add more detail to the example. You have two functions f, and g: pure int f() { int[] x = new int[5]; for(int i = 0; i < x.length; i++) x[i] = 1 + i; /* do stuff */ return result; } pure int g() { int[] y = new int[15]; for(int i = 0; i < y.length; y++) y[i] = 10 + (i * 3); /* do stuff */ return result; } Now, I realize I have very similar code that initializes the int arrays, so I want to put that into a function that creates and initializes an array with the given parameters: int[] createIncreasingSequence(int length, int start, int step) { int[] result = new int[length]; for(int i = 0; i < result.length; i++) result[i] = start + (i * step); return result; } Now, I think we all agree that createIncreasingSequence is as pure as calling new, right? That is, it doesn't affect any global state (except for allocating heap data) and will return an equivalent array every time the same arguments are given. But if I can't declare it as 'pure', then I can't call it from f and g: pure int f() { int[] x = createIncreasingSequence(5, 1, 1); /* do stuff */ return result; } pure int g() { int[] y = createIncreasingSequence(15, 10, 3); /* do stuff */ return result; } So should there be a way to define createIncreasingSequence such that I can call it from a pure function? Or do I have to re-implement it in every pure function I use?
There is a way with inew (invariant new): class IncreasingSequence(int length) { public: this(int start, int step) { for(int i = 0; i < length; i++) elements[i] = start + (i * step); } int[length] elements; } pure int f() { auto x = inew IncreasingSequence!(5)(1, 1); /* do stuff */ return result; } pure int g() { auto y = inew IncreasingSequence!(15)(10, 3); /* do stuff */ return result; } Isn't it beautiful? Yet, it's /very/ close to wait you wanted.
Apr 11 2008
prev sibling parent guslay <guslay gmail.com> writes:
Janice Caron Wrote:

 On 09/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 
  as an example of a pure function that takes a pointer to mutable data, but
  doesn't use the data, i.e. this doesn't ever read or write heap data:

  pure char *add(char * c, int n) { return c + n;}
Oooh - now that's an interesting one. That one's got me foxed. My instinct says it should be disallowed, but I can't quite put my finger on why. I'm gonna have to think about that one some more.
As long as you stick to pointer arithmetic and don't dereference c, it is pure. Will it be allowed, that's pure speculation ;)
Apr 09 2008
prev sibling next sibling parent reply Georg Wrede <georg nospam.org> writes:
Steven Schveighoffer wrote:
 I realize that I don't fully understand all the rules of what D pure 
 functions will be.
 
 I think all agree on these rules:
 
 - pure functions can only call pure functions
 - allow access to invariant data
 - allow mutable access to simple stack variables (variables that are all on 
 the stack, no references)
 
 In Andrei's accu-functional document, instead of the last rule, there is:
 
 - allow local automatic mutable state
Still not pretending to know too much, I'd say that if the A/W goal is "pure functions as far as MP optimizations is needed" only, then: A function might need to have local state. Take a function that counts something in a list. It'd need a local counter -- unless we want to force the programmer to write it as a recursive function. (Which I wholeheartedly hope not. I'd be happy with D being multiparadigm.) Such a variable would be the "local automatic mutable state". Local, of course. Automatic, as destructed at end of scope. Mutable and state, of course. :-)
 I'm not sure what this means :)  But here are some questions about what is 
 pure and what is not pure:
 
 1. Can pure functions use mutable heap data?  If so, what are the 
 restrictions for this?
 
 i.e.:
 pure char[] f(int x, int y)
 {
    char[] c = new char[y];
    c[] = (char)x;
return c; // I presume.
 }
 
 pure char[] g(int x)
 {
    char[] c = f(x, 5);
 }
f is pure. g is pure itself, and only uses f. And a new c is created each time you access the function, so it's free of other references to it (that could change it). For example, function pure char[] concatenate(char[] a, char[] b) { // return new string consisting of a ~ b } The return value is of course a reference to a new string. In a purely functional language returning a reference to a heap object might not be ok (I'm no pro on that.), but as D otherwise pretends strings to behave somewhat value like, I assume this simply has to be ok. Of course, a and b would have to be immutable. It's really a shame that A/W don't elaborate on these things. Now we are only getting increasingly unsure about stuff, and at the same time the combined brain-cell-hours we invest here might simply be a wild goose chase. And after all, most of this is just what they eventually *decide*. We could use some help and guidelines, or even opinions on these things. And we could even be of help to them, since right now there seem to be a lot of very smart, motivated and able people here. Maybe actually more than ever before.
Apr 09 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Georg Wrede" wrote
 Steven Schveighoffer wrote:
 pure char[] f(int x, int y)
 {
    char[] c = new char[y];
    c[] = (char)x;
return c; // I presume.
 }
Yes, I forgot that statement, thanks :) For some reason, Outlook Express didn't complain that this was an error :P
 It's really a shame that A/W don't elaborate on these things. Now we are 
 only getting increasingly unsure about stuff, and at the same time the 
 combined brain-cell-hours we invest here might simply be a wild goose 
 chase. And after all, most of this is just what they eventually *decide*. 
 We could use some help and guidelines, or even opinions on these things.

 And we could even be of help to them, since right now there seem to be a 
 lot of very smart, motivated and able people here. Maybe actually more 
 than ever before.
This is exactly why I asked these questions :) -Steve
Apr 09 2008
prev sibling parent reply guslay <guslay gmail.com> writes:
Steven Schveighoffer Wrote:
 
 1. Can pure functions use mutable heap data?  If so, what are the 
 restrictions for this?
 
 i.e.:
 pure char[] f(int x, int y)
 {
    char[] c = new char[y];
    c[] = (char)x;
 }
Isn't the dynamic allocator a global state?
Apr 09 2008
parent reply Georg Wrede <georg nospam.org> writes:
guslay wrote:
 Steven Schveighoffer Wrote:
 
1. Can pure functions use mutable heap data?  If so, what are the 
restrictions for this?

i.e.:
pure char[] f(int x, int y)
{
   char[] c = new char[y];
   c[] = (char)x;
}
Isn't the dynamic allocator a global state?
IMHO, especially in D, we'll have to make an exception of /new/. If we didn't, then a function can't return anything that can't be stored on the stack. So, if we pretend that a string is returned via the stack, and that any local strings a pure function needs temporarily, are in its own stack, then we're okay. I don't really see any alternative here. Also, I don't see the language/compiler enforcing this as all that hard.
Apr 09 2008
parent guslay <guslay gmail.com> writes:
Georg Wrede Wrote:

 guslay wrote:
 Steven Schveighoffer Wrote:
 
1. Can pure functions use mutable heap data?  If so, what are the 
restrictions for this?

i.e.:
pure char[] f(int x, int y)
{
   char[] c = new char[y];
   c[] = (char)x;
}
Isn't the dynamic allocator a global state?
IMHO, especially in D, we'll have to make an exception of /new/. If we didn't, then a function can't return anything that can't be stored on the stack. So, if we pretend that a string is returned via the stack, and that any local strings a pure function needs temporarily, are in its own stack, then we're okay. I don't really see any alternative here. Also, I don't see the language/compiler enforcing this as all that hard.
Not that I know what I'm talking about, but my understanding is: - You should be able to allocate with "scope" a local mutable object (automatic local mutable state). - You should not return a mutable reference type on the heap. A a1 = f(); A a2 = f(); Let's suppose f() is pure. Mr. compiler does: A __a = f(); A a1 = __a; A a2 = __a; But if A is a reference type, you don't know the concrete type of object a1,a2. How can you do the pure stuff, like skipping the second evaluation of f(), if you can't make distinct copy. There is no copy/cloning constructor. You don't want to share the same mutable reference between a1 and a2, otherwise you would have written A a1 = f(); A a2 = a1; - However, allocating and returning invariant memory should be fine. If A is invariant (like a string), you should be happy for a1 and a2 to share the same reference.
Apr 09 2008