digitalmars.D - pure or not pure?

Steven Schveighoffer (47/47) Apr 09 2008 I realize that I don't fully understand all the rules of what D pure

Janice Caron (40/71) Apr 09 2008 I thought that was pretty clear. It means local variables are allowed

Steven Schveighoffer (24/94) Apr 09 2008 So how is new char[] a pure function? And why can't I 'wrap' new? For

Georg Wrede (8/24) Apr 09 2008 The return value must be immutable if you're going to use it several tim...

Steven Schveighoffer (36/61) Apr 10 2008 Agreed. But I don't want to use it several times, I want to use it once...

Janice Caron (8/12) Apr 10 2008 You're overcomplicating this. The rule is
Georg Wrede (95/166) Apr 10 2008 I'll have to take that back. Of course main can be a pure function. Take...

Janice Caron (43/80) Apr 09 2008 I reckon you can, but you got the return type wrong.

Steven Schveighoffer (63/133) Apr 10 2008 This doesn't work for what I'm asking. I'll add more detail to the exam...

Janice Caron (8/16) Apr 10 2008 I would say no.

Steven Schveighoffer (13/23) Apr 10 2008 So your solution is, create a duplicate array that is never used again,

Janice Caron (32/42) Apr 10 2008 If you need to modify it, then it needs to be mutable, so a heap

Steven Schveighoffer (62/101) Apr 10 2008 Please re-read the example. If I only ever call createIncreasingSequenc...

Janice Caron (6/7) Apr 10 2008 Please don't make claims concerning what I may or may not believe.

Steven Schveighoffer (11/18) Apr 10 2008 If you read the post, you would see that it was a question, and assuming...

Janice Caron (9/26) Apr 10 2008 What you're asking for is what someone else called "amoral" (slightly

Steven Schveighoffer (13/43) Apr 10 2008 We may be able to do it without a keyword. For example, if a pure funct...

Janice Caron (13/16) Apr 10 2008 It's not the calling of new that's a problem, it's the returning of

Steven Schveighoffer (24/41) Apr 10 2008 I look at it differently.

Janice Caron (3/5) Apr 10 2008 I meant:

Koroskin Denis (14/20) Apr 10 2008 On Thu, 10 Apr 2008 20:34:29 +0400, Janice Caron

Janice Caron (15/25) Apr 10 2008 Because you're modifying global state.

Koroskin Denis (37/72) Apr 11 2008 On Thu, 10 Apr 2008 21:41:15 +0400, Janice Caron

Janice Caron (4/5) Apr 13 2008 malloc() needs free().

Janice Caron (5/18) Apr 10 2008 Sorry, I should have added: /neither/ of those functions is pure.

Georg Wrede (20/28) Apr 10 2008 First of all, I think new is special anyhow.

Steven Schveighoffer (10/16) Apr 10 2008 In fact, I think the solution to this is to make a constructor that is p...

Georg Wrede (11/31) Apr 10 2008 Yes. And thus, a pure constructor cannot write to class variables, only

Georg Wrede (8/23) Apr 10 2008 It doesn't. And whether we can have mutable arguments and/or mutable

Leandro Lucarella (24/27) Apr 10 2008 This is the exact same problem of the mutable methods of a class used in...

Georg Wrede (2/30) Apr 10 2008 g doesn't return a value.

Russell Lewis (11/17) Apr 10 2008 That's not quite true, and I think that the distinction is important.

Georg Wrede (9/47) Apr 10 2008 Yes. I think your function is pure. Except for the fact that the

Koroskin Denis (33/116) Apr 11 2008 s, =

Steven Schveighoffer (36/119) Apr 11 2008 Nope, not close :) In fact, I want a mutable array because I intend to ...

guslay (3/14) Apr 09 2008 As long as you stick to pointer arithmetic and don't dereference c, it i...

Georg Wrede (33/63) Apr 09 2008 Still not pretending to know too much, I'd say that if the A/W goal is

Steven Schveighoffer (5/20) Apr 09 2008 Yes, I forgot that statement, thanks :) For some reason, Outlook Expres...

guslay (2/12) Apr 09 2008 Isn't the dynamic allocator a global state?

Georg Wrede (9/22) Apr 09 2008 IMHO, especially in D, we'll have to make an exception of /new/.

guslay (14/40) Apr 09 2008 Not that I know what I'm talking about, but my understanding is:

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

I realize that I don't fully understand all the rules of what D pure 
functions will be.

I think all agree on these rules:

- pure functions can only call pure functions
- allow access to invariant data
- allow mutable access to simple stack variables (variables that are all on 
the stack, no references)

In Andrei's accu-functional document, instead of the last rule, there is:

- allow local automatic mutable state

I'm not sure what this means :)  But here are some questions about what is 
pure and what is not pure:

1. Can pure functions use mutable heap data?  If so, what are the 
restrictions for this?

i.e.:
pure char[] f(int x, int y)
{
   char[] c = new char[y];
   c[] = (char)x;
}

pure char[] g(int x)
{
   char[] c = f(x, 5);
}

2. Can pure functions use stack references to objects that are partially 
invariant, partially mutable?

i.e.:
class C
{
   invariant int x;
   int y;
}

pure int f(C c) {return c.x} // allowed?

This of course, assumes that C can assign x through a constructor, not just 
through a static initializer (this is not implemented today).

as an example of a pure function that takes a pointer to mutable data, but 
doesn't use the data, i.e. this doesn't ever read or write heap data:

pure char *add(char * c, int n) { return c + n;}

3. If the answer to question 1 is 'yes', can you pass mutable heap data to 
pure functions?

pure char[] substr(char[] src, int beg, int end) { return src[beg..end];}

Would substr be allowed to be called from a pure function?
Would substr be allowed to be called from a non-pure function?
Will the compiler tag heap data somehow during execution of a pure function 
to determine whether it is unique or not?

I'm just trying to get a handle on what is and isn't considered pure, as 
everyone seems to have a different opinion...

-Steve

Apr 09 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 09/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 I realize that I don't fully understand all the rules of what D pure
  functions will be.

  I think all agree on these rules:

  - pure functions can only call pure functions
  - allow access to invariant data
  - allow mutable access to simple stack variables (variables that are all on
  the stack, no references)

  In Andrei's accu-functional document, instead of the last rule, there is:

  - allow local automatic mutable state

I thought that was pretty clear. It means local variables are allowed
to be mutable. (I realise that "automatic" isn't a synonym for
"local", but in the examples in accu-functional, Andrei does seem to
use it that way).



  But here are some questions about what is
  pure and what is not pure:

  1. Can pure functions use mutable heap data?

I don't see why not.


  If so, what are the
  restrictions for this?

  i.e.:
  pure char[] f(int x, int y)

I'm confident enough to say the return value will have to be
invariant. However, you that doesn't stop you from returning heap
data. You should be able to do

    pure string spaces(int x)
    {
        auto s = new char[x];
        s[] = ' ';
        return assumeUnique!(s);
    }

but you can't omit the assumeUnique!().



  2. Can pure functions use stack references to objects that are partially
  invariant, partially mutable?

My guess would be no. There's just no need for it.


  i.e.:

By the way, that should be e.g., not i.e.


  This of course, assumes that C can assign x through a constructor, not just
  through a static initializer (this is not implemented today).

I don't think it will be possible to initialize invariant data in a
non-invariant constructor, and I think an invariant constructor can
only make a fully invariant class.

I think this is a /necessary/ restriction, because otherwise the
following could not be typechecked:

    class C
    {
        invariant int * p;

        this(int * q)
        {
            p = q; // Big no no!
        }
    }




  as an example of a pure function that takes a pointer to mutable data, but
  doesn't use the data, i.e. this doesn't ever read or write heap data:

  pure char *add(char * c, int n) { return c + n;}

Oooh - now that's an interesting one. That one's got me foxed. My
instinct says it should be disallowed, but I can't quite put my finger
on why. I'm gonna have to think about that one some more.


  3. If the answer to question 1 is 'yes', can you pass mutable heap data to
  pure functions?

  pure char[] substr(char[] src, int beg, int end) { return src[beg..end];}

Same answer.



  Would substr be allowed to be called from a pure function?

If and only if substr() is itself a pure function.

  Would substr be allowed to be called from a non-pure function?

Yes


  Will the compiler tag heap data somehow during execution of a pure function
  to determine whether it is unique or not?

My guess would be no. (But that's just a guess). I reckon you'll have
to worry about that yourself and use the assumeUnique!() template to
return the data.

Apr 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 I realize that I don't fully understand all the rules of what D pure
  functions will be.

  I think all agree on these rules:

  - pure functions can only call pure functions
  - allow access to invariant data
  - allow mutable access to simple stack variables (variables that are all 
 on
  the stack, no references)

  In Andrei's accu-functional document, instead of the last rule, there 
 is:

  - allow local automatic mutable state

 I thought that was pretty clear. It means local variables are allowed
 to be mutable. (I realise that "automatic" isn't a synonym for
 "local", but in the examples in accu-functional, Andrei does seem to
 use it that way).

So 'local automatic' means 'local local'?

  But here are some questions about what is
  pure and what is not pure:

  1. Can pure functions use mutable heap data?

 I don't see why not.


  If so, what are the
  restrictions for this?

  i.e.:
  pure char[] f(int x, int y)

 I'm confident enough to say the return value will have to be
 invariant. However, you that doesn't stop you from returning heap
 data.  You should be able to do

    pure string spaces(int x)
    {
        auto s = new char[x];
        s[] = ' ';
        return assumeUnique!(s);
    }

 but you can't omit the assumeUnique!().

So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
instance, if I have a function:

int[] createIncreasingSequence(int s, int e, int step){...}

which creates an array of ints that start at s and end at e, increasing by 
step each time, why can't I make that pure?  i.e. I want to specify my own 
way to initialize an array.  If I do that in 10 different pure functions, 
you're saying I have to re-implement that code in each one?

  2. Can pure functions use stack references to objects that are partially
  invariant, partially mutable?

 My guess would be no. There's just no need for it.

Surely there is some possible need for having run-time initialized invariant 
variables...

  This of course, assumes that C can assign x through a constructor, not 
 just
  through a static initializer (this is not implemented today).

 I don't think it will be possible to initialize invariant data in a
 non-invariant constructor, and I think an invariant constructor can
 only make a fully invariant class.

 I think this is a /necessary/ restriction, because otherwise the
 following could not be typechecked:

    class C
    {
        invariant int * p;

        this(int * q)
        {
            p = q; // Big no no!
        }
    }

Why would that compile?  If it did, then so would this:

this(int *q) invariant
{
    p = q;
}

I think invariant members that are run-time initialized can be done, as 
during the constructor, nothing has access to the 'this' pointer.

  3. If the answer to question 1 is 'yes', can you pass mutable heap data 
 to
  pure functions?

  pure char[] substr(char[] src, int beg, int end) { return 
 src[beg..end];}

 Same answer.



  Would substr be allowed to be called from a pure function?

 If and only if substr() is itself a pure function.

  Would substr be allowed to be called from a non-pure function?

 Yes

This was misleading on my part.  I should have asked instead of these two 
questions, if substr is allowed to be pure, can it be called from a non-pure 
function?  In the context of being called from a pure function, src is 
unique, because pure has to have unique data.  But in the context of calling 
from a non-pure function, it might not be...

-Steve

Apr 09 2008

Georg Wrede <georg nospam.org> writes:

Steven Schveighoffer wrote:
 "Janice Caron" wrote
 
 So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
 instance, if I have a function:
 
 int[] createIncreasingSequence(int s, int e, int step){...}
 
 which creates an array of ints that start at s and end at e, increasing by 
 step each time, why can't I make that pure?  i.e. I want to specify my own 
 way to initialize an array.  If I do that in 10 different pure functions, 
 you're saying I have to re-implement that code in each one?

The return value must be immutable if you're going to use it several times.

And if you want to cache the return values outside of the function (as 
in only returning a new array when new parameters are used, and 
otherwise just a reference to a previously returned one), then I guess I 
don't have an opinion. :-?

 This was misleading on my part.  I should have asked instead of these two 
 questions, if substr is allowed to be pure, can it be called from a non-pure 
 function?  In the context of being called from a pure function, src is 
 unique, because pure has to have unique data.  But in the context of calling 
 from a non-pure function, it might not be...

Ultimately, any function is called from main(). Which is a non-pure 
function.

Apr 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Georg Wrede" wrote
 Steven Schveighoffer wrote:
 "Janice Caron" wrote

 So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
 instance, if I have a function:

 int[] createIncreasingSequence(int s, int e, int step){...}

 which creates an array of ints that start at s and end at e, increasing 
 by step each time, why can't I make that pure?  i.e. I want to specify my 
 own way to initialize an array.  If I do that in 10 different pure 
 functions, you're saying I have to re-implement that code in each one?

 The return value must be immutable if you're going to use it several 
 times.

Agreed.  But I don't want to use it several times, I want to use it once :)

I want it to act like new does.  For example, if I have

char[] c = new char[5];

in two different pure functions, new better not return the same exact memory 
space for both!  It should be called every time.  But I can't do this with a 
custom function because pure functions can only call other pure functions. 
Basically, I'm asking if there will be a way to move a common piece of a 
function that allocates and initializes data out of a pure function and into 
another.

It would be cool if the compiler did not perform special optimizations (such 
as memoization) on pure functions that return mutable heap data, but still 
enforced the rules of purity.

 And if you want to cache the return values outside of the function (as in 
 only returning a new array when new parameters are used, and otherwise 
 just a reference to a previously returned one), then I guess I don't have 
 an opinion. :-?

 This was misleading on my part.  I should have asked instead of these two 
 questions, if substr is allowed to be pure, can it be called from a 
 non-pure function?  In the context of being called from a pure function, 
 src is unique, because pure has to have unique data.  But in the context 
 of calling from a non-pure function, it might not be...

 Ultimately, any function is called from main(). Which is a non-pure 
 function.

This is a special case of pure function.  Let's say I have a pure function:

pure invariant(char)[] f()
{
    char[] c = new char[15];
    g(c);
    return assumeUnique!(c);
}

Now, if g is a pure function that takes a char[] argument and accesses the 
data referenced by the argument, it could be OK to call it from another pure 
function because we could assume that the pure function can only use heap 
data that is 'local' to the function.  But it might be invalid to call such 
a pure function from a non-pure function because data held by a non-pure 
function is not always considered 'local'.  On the other hand, if a pure 
function can hold pointers to heap data that is shared, but just can't use 
it, then a pure function that takes mutable heap data might not be able to 
use it, so a function like g() isn't possible.

If pure functions only allow invariant heap parameters and invariant heap 
returns, then this whole discussion becomes moot, but if they accept 
non-invariant heap parameters, or allow non-invariant heap returns, I want 
to know what the rules for those are.  It seems to me that only allowing 
invariant heap data is a severe limitation that will leave D functional 
programming crippled in the face of an actual FP language.

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
  I want it to act like new does.  For example, if I have

  char[] c = new char[5];

  in two different pure functions, new better not return the same exact memory
  space for both!

You're overcomplicating this. The rule is

    same input => same output

And I would put good money on the notion that "same" means "compares
as equal with the == operator".

(And that almost certainly means that mutable char[]s won't be allowed
as parameters to pure functions - because strings and char[]s with the
same content compare as equal)

Apr 10 2008

Georg Wrede <georg nospam.org> writes:

Steven Schveighoffer wrote:
 "Georg Wrede" wrote
Steven Schveighoffer wrote:
"Janice Caron" wrote

So how is new char[] a pure function?  And why can't I 'wrap' new?  For 
instance, if I have a function:

int[] createIncreasingSequence(int s, int e, int step){...}

which creates an array of ints that start at s and end at e, increasing 
by step each time, why can't I make that pure?  i.e. I want to specify my 
own way to initialize an array.  If I do that in 10 different pure 
functions, you're saying I have to re-implement that code in each one?

The return value must be immutable if you're going to use it several 
times.

 
 Agreed.  But I don't want to use it several times, I want to use it once :)
 
 I want it to act like new does.  For example, if I have
 
 char[] c = new char[5];
 
 in two different pure functions, new better not return the same exact memory 
 space for both!  It should be called every time.  But I can't do this with a 
 custom function because pure functions can only call other pure functions. 
 Basically, I'm asking if there will be a way to move a common piece of a 
 function that allocates and initializes data out of a pure function and into 
 another.
 
 It would be cool if the compiler did not perform special optimizations (such 
 as memoization) on pure functions that return mutable heap data, but still 
 enforced the rules of purity.
 
And if you want to cache the return values outside of the function (as in 
only returning a new array when new parameters are used, and otherwise 
just a reference to a previously returned one), then I guess I don't have 
an opinion. :-?

This was misleading on my part.  I should have asked instead of these two 
questions, if substr is allowed to be pure, can it be called from a 
non-pure function?  In the context of being called from a pure function, 
src is unique, because pure has to have unique data.  But in the context 
of calling from a non-pure function, it might not be...

Ultimately, any function is called from main(). Which is a non-pure 
function.


I'll have to take that back. Of course main can be a pure function. Take 
the unix sort function for an example. It receives data (which by the 
way, is /assumed/ immutable, that is, the input file is assumed not to 
change while sort is running) and _without_ otherwise changing its 
environment (read, files on the file system), returns a value (the 
sorted input).

Actually a lot of unix functionality is based on this. (Almost (just so 
I'm not caught lying :-) )) all the unix commands that take stdin as 
input and stdout as output, can be considered pure functions. (That most 
of them also output to stderr doesn't count in this discussion. :-) Nor 
the fact that they can change files (like the tee command) if the user 
wants. Also, they're excellent examples of functions that can be pure or 
not pure, depending on the invocation.)

 This is a special case of pure function.  Let's say I have a pure function:
 
 pure invariant(char)[] f()
 {
     char[] c = new char[15];
     g(c);
     return assumeUnique!(c);
 }
 
 Now, if g is a pure function that takes a char[] argument and accesses the 
 data referenced by the argument, it could be OK to call it from another pure 
 function because we could assume that the pure function can only use heap 
 data that is 'local' to the function.  But it might be invalid to call such 
 a pure function from a non-pure function because data held by a non-pure 
 function is not always considered 'local'.  On the other hand, if a pure 
 function can hold pointers to heap data that is shared, but just can't use 
 it, then a pure function that takes mutable heap data might not be able to 
 use it, so a function like g() isn't possible.

I'd say that

  - you can call a pure function from a non-pure function, anytime

  - you can call a non-pure function from a pure function, BUT THAT 
makes your pure function lose its purity. It'll still work, however. 
(And of course, the compiler could be polite enough to remind you of 
losing purity -- if the compiler can see that you're actually trying to 
have a pure function, as opposed to the function being pure just by 
happenstance.)

For a program none of this matters. I'd say that you can mix freely 
whatever you like. BUT, if you wish the compiler has the possibility to 
do serious optimizing, the you better not soil your pure functions.

I'd say, that is your only punishment for frivolous mixing. The data 
doesn't get corrupted, and the sky doesn't turn checkered above you.

 If pure functions only allow invariant heap parameters and invariant heap 
 returns, then this whole discussion becomes moot, but if they accept 
 non-invariant heap parameters, or allow non-invariant heap returns, I want 
 to know what the rules for those are.  It seems to me that only allowing 
 invariant heap data is a severe limitation that will leave D functional 
 programming crippled in the face of an actual FP language.

One of the dreamed benefits of deciding that _D_ pure functions can only 
access invariant data is to ensure that the compiler can do memoisation 
when it wants to. That is, then the programmer doesn't have to do 
memoisation by hand. This lessens bugs, eases coding, and gives the 
compiler potentially big possibilities of very serious optimizing. (Not 
that DMD probably will achieve such in some time, but still. After all, 
we're designing a laguage here, too, which should make it _possible_ to 
write such compilers.)

If we define

  - pure functions can only receive immutable data as arguments
  - pure functions can only access immutable other data
  - pure functions can only return immutable data
  - pure functions can't change or output other than the return "value"
  - pure functions can only use other pure functions

and then we define

  - "return value" can also be a reference to immutable heap data
  - pure functions can use the heap as their local storage

and we decide that the last one means that pure functions can create and 
manipulate mutable data on the heap, but only so that nobody else has 
access to this data (including previous, later, or concurrent instances 
of the same function),

then we should be half-way to the goals A/W are trying to reach.

---

Now, why all this heap stuff?

Simply because even Real Functional Languages, that make it look like 
the stack is the only place in the world, can actually use the heap all 
they want, only they do it behind the back of the programmer. It's not 
about *literally* using the stack only, it's about the metaphor.

This metaphor is dictated by the fact that pure functionality means that 
stuff is passed via return values only.

Now, in D (which is a multi-paradigm, practical language, for Systems 
Programming), we don't want to restrict ourselves any more by purity 
than we'd want to do that with OO, for example.

Before C++ was invented, we had languages that were OO-only, and 
languages that were non-OO. (Even Tiobe still uses this distinction, and 
I always wonder to which category they've put D.) The reason for this 
either-or was simply that people were new to OO, and hadn't yet figured 
out how (or whether) they can mix OO and procedural.

To take another example, when I got a VIC-20 in the early eighties, it 
only had a Microsoft Basic interpreter. You couldn't do structured 
programming with it simply because the facilities were missing in the 
language. This made the programming effort more than exponential into 
the number of lines written. Today, I only do structural programming (as 
in versus unstructural), and admire Walter, who still mixes structured 
and unstructured, quite fluently. (Just see Phobos source, especially 
the C files.)

Now, with D we are charting new ground here. We are (arguably) the first 
to make a serious effort of mixing functional (as in pure) with 
procedural programming.


And this is where the heap stuff comes along. It would be impractical 
for D to pass everything via the stack. First of all, non-trivial 
progams would potentially need huge stack spaces, and in many 
environments the stack space allocated for a program is given at startup 
time. So, either we need to allocate stack just-in-case, or risk running 
out of stack in mid-run. Wasteful, and impolite, especially in a 
multiuser environment.

But to achieve this, we, at this point, have to decide that both input 
to pure functions and output (i.e. the return "value") have to be 
invariant. Only this lets us get away with only "pretending" to use the 
stack, while in essence we pass references to heap stuff via the stack.

---

Later, when we're more at ease with mixing functional with 
non-functional, we might well find ways to ease these restrictions. But 
that might be several years after we've actually started using this in 
real projects.


PS, oh Bob, I wish Walter would comment on this post, in any way. 
Suppose my understanding on this differs from that of A/W, then I'd only 
be misleading everybody here.

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 09/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 So how is new char[] a pure function?

new is special.

  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e, increasing by
  step each time, why can't I make that pure?

I reckon you can, but you got the return type wrong.

    invariant(int)[] createIncreasingSequence(int s, int e, int step){...}
    {
        auto array = new int[(e-s)/step];
        foreach(ref a;array)
        {
            a = e;
            e += step;
        }
        return assumeUnique!(array);
    }

 i.e. I want to specify my own
  way to initialize an array.  If I do that in 10 different pure functions,
  you're saying I have to re-implement that code in each one?

To be honest, I don't understand the question.

The rule is : same input => same output

What happens in the middle, so long as it is side-effect free, doesn't matter.

 Surely there is some possible need for having run-time initialized invariant
  variables...

You can do that. Walter confirmed that in response to a previous example I gave.

   import std.date;

   void main()
   {
       invariant s = toString(getUTCtime());

       pure string f()
       {
           return s;
       }

       writefln(s);
   }

Walter says f is pure. Ironically, the program will give a different
output each time it's run, but aparently purity only has to be global,
not universal.

  > I think this is a /necessary/ restriction, because otherwise the
  > following could not be typechecked:
  >
  >    class C
  >    {
  >        invariant int * p;
  >
  >        this(int * q)
  >        {
  >            p = q; // Big no no!
  >        }
  >    }

 Why would that compile?

Thankfully, it doesn't. But you were suggesting it should be possible
to initialize member variables which were declared invariant inside a
constructor, so I was demonstrating why this would not be a good idea.

  If it did, then so would this:

  this(int *q) invariant
  {
     p = q;
  }

Not so. Invariant constructors have different typechecking rules.

Member variable p initially has type __raw(int *), which means it can
only be assigned with an invariant(int *), not with an int *.

  I think invariant members that are run-time initialized can be done, as
  during the constructor, nothing has access to the 'this' pointer.

"this" is certainly available during a constructor.

  This was misleading on my part.  I should have asked instead of these two
  questions, if substr is allowed to be pure,

which is a big "if", and I'm still hedging my bets on the answer being
no. But let's hypothetically assume it's yes for sake of argument.

 can it be called from a non-pure
 function?

There is nothing to stop you calling a pure function from a non-pure
function. So that would be a yes.

 In the context of being called from a pure function, src is
 unique, because pure has to have unique data.

No it doesn't. It has to have invariant data, not unique data. The
whole point about invariant data is it's OK to share it.

Apr 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 So how is new char[] a pure function?

 new is special.

  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e, increasing 
 by
  step each time, why can't I make that pure?

 I reckon you can, but you got the return type wrong.

    invariant(int)[] createIncreasingSequence(int s, int e, int step){...}
    {
        auto array = new int[(e-s)/step];
        foreach(ref a;array)
        {
            a = e;
            e += step;
        }
        return assumeUnique!(array);
    }

This doesn't work for what I'm asking.  I'll add more detail to the example. 
You have two functions f, and g:

pure int f()
{
    int[] x = new int[5];
    for(int i = 0; i < x.length; i++)
       x[i] = 1 + i;

    /* do stuff */
    return result;
}

pure int g()
{
    int[] y = new int[15];
    for(int i = 0; i < y.length; y++)
        y[i] = 10 + (i * 3);

    /* do stuff */
    return result;
}

Now, I realize I have very similar code that initializes the int arrays, so 
I want to put that into a function that creates and initializes an array 
with the given parameters:

int[] createIncreasingSequence(int length, int start, int step)
{
    int[] result = new int[length];
    for(int i = 0; i < result.length; i++)
        result[i] = start + (i * step);
    return result;
}

Now, I think we all agree that createIncreasingSequence is as pure as 
calling new, right?  That is, it doesn't affect any global state (except for 
allocating heap data) and will return an equivalent array every time the 
same arguments are given.  But if I can't declare it as 'pure', then I can't 
call it from f and g:

pure int f()
{
   int[] x = createIncreasingSequence(5, 1, 1);

   /* do stuff */
   return result;
}

pure int g()
{
   int[] y = createIncreasingSequence(15, 10, 3);

   /* do stuff */
   return result;
}

So should there be a way to define createIncreasingSequence such that I can 
call it from a pure function?  Or do I have to re-implement it in every pure 
function I use?

Not having the ability to pass around mutable heap data to and from pure 
functions is going to limit severely the usefulness of pure functions.

 Surely there is some possible need for having run-time initialized 
 invariant
  variables...

 You can do that. Walter confirmed that in response to a previous example I 
 gave.

   import std.date;

   void main()
   {
       invariant s = toString(getUTCtime());

       pure string f()
       {
           return s;
       }

       writefln(s);
   }

 Walter says f is pure. Ironically, the program will give a different
 output each time it's run, but aparently purity only has to be global,
 not universal.

This is exactly what I want to do, But I want to do it inside a class scope, 
not in a function scope.  I think someone in another thread already 
confirmed that this is possible in DMD 2.012, so my question is already 
answered.

  > I think this is a /necessary/ restriction, because otherwise the
  > following could not be typechecked:
  >
  >    class C
  >    {
  >        invariant int * p;
  >
  >        this(int * q)
  >        {
  >            p = q; // Big no no!
  >        }
  >    }

 Why would that compile?

 Thankfully, it doesn't. But you were suggesting it should be possible
 to initialize member variables which were declared invariant inside a
 constructor, so I was demonstrating why this would not be a good idea.

Like I said, this is already possible.

  If it did, then so would this:

  this(int *q) invariant
  {
     p = q;
  }

 Not so. Invariant constructors have different typechecking rules.

 Member variable p initially has type __raw(int *), which means it can
 only be assigned with an invariant(int *), not with an int *.

I would guess that this is a requirement for invariant class members as 
well.

  I think invariant members that are run-time initialized can be done, as
  during the constructor, nothing has access to the 'this' pointer.

 "this" is certainly available during a constructor.

I would guess that you cannot pass 'this' to an external entity before 
initializing the invariant variables (or before initializing mutable 
variables in an invariant constructor).  This should be statically 
verifiable.

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
  But if I can't declare it as 'pure', then I can't
  call it from f and g:

No, you can't.


  So should there be a way to define createIncreasingSequence such that I can
  call it from a pure function?

I would say no.


  Or do I have to re-implement it in every pure
  function I use?

You can make a pure function to return an immutable array, and then
duplicate it within f and g.


  Not having the ability to pass around mutable heap data to and from pure
  functions is going to limit severely the usefulness of pure functions.

I don't see why. Functional programming languages such as Haskell seem
not to mind. In many functional programming languages, there is no
mutable data at all.

Apr 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
  Or do I have to re-implement it in every pure
  function I use?

 You can make a pure function to return an immutable array, and then
 duplicate it within f and g.

So your solution is, create a duplicate array that is never used again, 
thereby increasing the runtime (and memory allocation) of the pure function, 
and lowering the performance of the application.  I can't wait to use 
functional programming, it sounds awesome :)

  Not having the ability to pass around mutable heap data to and from pure
  functions is going to limit severely the usefulness of pure functions.

 I don't see why. Functional programming languages such as Haskell seem
 not to mind. In many functional programming languages, there is no
 mutable data at all.

Of course functional programming languages have mutable data.  They just 
don't have SHARED data.  We need immutable data in D because D is not a 
functional language and can share data, so that data needs to be immutable. 
But data that isn't shared shouldn't need to be immutable.

So let me restate: not having the ability to pass around UNIQUE mutable heap 
data to and from pure functions is going to limit severely the usefulness of 
pure functions.

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 So your solution is, create a duplicate array that is never used again,

If you need to modify it, then it needs to be mutable, so a heap
allocation is unavoidable in any case.

  thereby increasing the runtime

No, /decreasing/ the runtime, because the construction of the
invariant array only has to happen ONCE. (Conceptually, it happens
every time you call the function, but that's the beauty of functional
programming. The compiler is able to cache things and reuse them).


 Of course functional programming languages have mutable data.

Haskell doesn't.

  They just
  don't have SHARED data.

Haskell has no mutable data at all, only constants. In Haskell, you cannot do

    x = x + 1

because x, once assigned, keeps its value until it goes out of scope.
Now, I'm not saying that D needs to be that extreme. I'm only saying
that to get the benefits of FP, you have to hand over a certain amount
of control to the compiler, to the language. You no longer get to
choose the order of evaluation. You cannot be certain what will and
what won't be caller. You are no longer in /any/ position to estimate
execution time on the basis of what will or will not be called, or
when, or in what order.

Assuming the compiler is smart enough to make all the optimizations
that FP permits it to make, this is a plus, not a minus.


  We need immutable data in D because

No one is arguing against that position.


  But data that isn't shared shouldn't need to be immutable.

Or that one.


  So let me restate: not having the ability to pass around UNIQUE mutable heap
 data to and from pure functions is going to limit severely the usefulness of
  pure functions.

I disagree - partly, because the compiler cannot statically verify
uniqueness, so you would immediately be compromising the ability of FP
to strongly typecheck everything, thereby robbing the compiler of the
guarantees that it needs to ensure fast and crash-free
multiprocessing. But mostly because FP requires a different way of
thinking about things, and trying to force it to accomodate non-FP
paradigms because they sit well with the way you think about problems
just isn't playing the game. You have to trust that the compiler a
little more. It can optimise better - you just have to keep your
fingers crossed that it actually will!

Apr 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
 So your solution is, create a duplicate array that is never used again,

 If you need to modify it, then it needs to be mutable, so a heap
 allocation is unavoidable in any case.

Yes, but this is 2x heap allocation.  For no reason.

  thereby increasing the runtime

 No, /decreasing/ the runtime, because the construction of the
 invariant array only has to happen ONCE. (Conceptually, it happens
 every time you call the function, but that's the beauty of functional
 programming. The compiler is able to cache things and reuse them).

Please re-read the example.  If I only ever call createIncreasingSequence 
with a set of arguments once, then it's memoizing on data that's never used 
again.  My supposition is that I have several functions that have similar 
code that I parameterize into a pure function.

Yes, f, and g can be cached, but that doesn't mean createIncreasingSequence 
needs to be cached.  What I'm trying to do (which is a common programming 
practice) is take common pieces of functions and use a separate function to 
do that piece.  My common piece happens to allocate data.

In other words, if I have pure functions:

f(x)
{
   statement1;
   statement2;
}

g(x)
{
   statement1;
   statement2;
}

Then I should be able to do

h(x)
{
    statement1;
    statement2;
}

f(x)
{
   h(x);
}

g(x)
{
   h(x);
}

No matter what statement 1 or 2 is.  If it happens to include memory 
allocation, and that is allowed in a pure function, then I should be able to 
wrap that memory allocation in another function.  If the compiler can't 
optimize on it inside the pure function, then it can't optimize it in the 
sub function, but who cares at that point?

 Of course functional programming languages have mutable data.

 Haskell doesn't.

  They just
  don't have SHARED data.

 Haskell has no mutable data at all, only constants. In Haskell, you cannot 
 do

    x = x + 1

 because x, once assigned, keeps its value until it goes out of scope.
 Now, I'm not saying that D needs to be that extreme. I'm only saying
 that to get the benefits of FP, you have to hand over a certain amount
 of control to the compiler, to the language. You no longer get to
 choose the order of evaluation. You cannot be certain what will and
 what won't be caller. You are no longer in /any/ position to estimate
 execution time on the basis of what will or will not be called, or
 when, or in what order.

I've been reading a bit about FP, and I see now that you are correct that 
mutable data isn't allowed.  But let's not forget that D is not a functional 
language, so a loop like:

for(int i = 0; i < n; i++)
   statement;

Is legal inside a D pure function, whereas the same code would be achieved 
through tail-recursion in a functional language.  Having mutable return 
values and arguments may not prevent the compiler from considering the 
function pure (then again, it may, that is my question).  But certainly, 
comparing D to Haskell doesn't prove either case.

I don't think the intent of bolting functional programming optimization 
opportunities on top of D is going to mean that during pure functions we 
will have to use a fully FP style.

  So let me restate: not having the ability to pass around UNIQUE mutable 
 heap
 data to and from pure functions is going to limit severely the usefulness 
 of
  pure functions.

 I disagree - partly, because the compiler cannot statically verify
 uniqueness, so you would immediately be compromising the ability of FP
 to strongly typecheck everything, thereby robbing the compiler of the
 guarantees that it needs to ensure fast and crash-free
 multiprocessing. But mostly because FP requires a different way of
 thinking about things, and trying to force it to accomodate non-FP
 paradigms because they sit well with the way you think about problems
 just isn't playing the game. You have to trust that the compiler a
 little more. It can optimise better - you just have to keep your
 fingers crossed that it actually will!

OK, sure, so that means you believe:

char[] c = new char[5];

is invalid inside a pure function because the compiler cannot verify 
uniqueness of mutable heap data, right?  But the data returned IS unique. 
Couldn't there be a way to specify that the data is unique, or at least 
local?

I consider having to dup data just so I can use it to be wasteful, no matter 
what paradigm I'm using.

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 OK, sure, so that means you believe:

Please don't make claims concerning what I may or may not believe.
Only I get to make those claims.

For the record, I do /not/ believe that which you claim I believe.

Feel free to rephrase as "assertion X leads to conclusion Y" or some
such, but please don't get personal, or I'm outa here.

Apr 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
 OK, sure, so that means you believe:

 Please don't make claims concerning what I may or may not believe.
 Only I get to make those claims.

 For the record, I do /not/ believe that which you claim I believe.

 Feel free to rephrase as "assertion X leads to conclusion Y" or some
 such, but please don't get personal, or I'm outa here.

If you read the post, you would see that it was a question, and assuming 
that it was true, a further question about why you think that.

"so that means you believe: .... right?"

First, off, this is not an attack.  Second, even if it was an attack, it's 
not about anything personal.  It's about your beliefs on a language design. 
Please don't get so defensive, nobody is attacking you personally.  We're 
all trying to figure this pure function thing out, and I happen to think 
your statements are inconsistent.  This has nothing to do with what I think 
of you personally.

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
  Then I should be able to do

  h(x)
  {
     statement1;
     statement2;
  }

  f(x)
  {
    h(x);
  }

  g(x)
  {
    h(x);
  }

What you're asking for is what someone else called "amoral" (slightly
pure). You're suggesting that h be pure-/ish/. It doesn't have to be
completely pure per se, but it must be pure /enough/ such that when
embedded within f or g, f and g remain pure.

I have no idea whether or not D will be able to accomodate this
concept. Strictly functional programming doesn't have it. D might or
might not. If we do, I suspect it will require another keyword.



  char[] c = new char[5];

 is invalid inside a pure function because the compiler cannot verify
  uniqueness of mutable heap data, right?

No. As I said, new is special. It's integral to the language.

Apr 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
  Then I should be able to do

  h(x)
  {
     statement1;
     statement2;
  }

  f(x)
  {
    h(x);
  }

  g(x)
  {
    h(x);
  }

 What you're asking for is what someone else called "amoral" (slightly
 pure). You're suggesting that h be pure-/ish/. It doesn't have to be
 completely pure per se, but it must be pure /enough/ such that when
 embedded within f or g, f and g remain pure.

 I have no idea whether or not D will be able to accomodate this
 concept. Strictly functional programming doesn't have it. D might or
 might not. If we do, I suspect it will require another keyword.

We may be able to do it without a keyword.  For example, if a pure function 
returns an invariant(char)[], it means that the return value can be 
memoized, if not, it means that the return value is well-defined, but 
another call to the function will return another copy of the same data. 
Reordering is OK, because it always returns the same data.  You could even 
wrap this in another version that returns invariant data, which then allows 
for memoization, if you know you're going to call it a lot, and don't care 
about mutating the data afterwards.

  char[] c = new char[5];

 is invalid inside a pure function because the compiler cannot verify
  uniqueness of mutable heap data, right?

 No. As I said, new is special. It's integral to the language.

Of course, but it is like a building block.  If I have a function that is 
pure, and that function calls nothing but other pure functions AND 'new', 
then why can't it also be proven to be pure?

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 Of course, but it is like a building block.  If I have a function that is
  pure, and that function calls nothing but other pure functions AND 'new',
  then why can't it also be proven to be pure?

It's not the calling of new that's a problem, it's the returning of
mutable data.

If we allow "pureish" functions, then maybe a "pureish" function could
return mutable data, but not a truly function.

The reason is that the compiler is allowed to cache the result (the
return value from any pure function), and to re-use that result,
instead of calling the function, whenever the input compares equal
with last time. If any of that data is mutable, it all goes haywire.

Seems to me that what's really needed here is a macro, because a macro
is basically just a function that's always inlined, so the function
body will be parsed /inside/ the pure specifier. Maybe macros could
solve that problem completely.

Apr 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Janice Caron" wrote
 On 10/04/2008, Steven Schveighoffer wrote:
 Of course, but it is like a building block.  If I have a function that is
  pure, and that function calls nothing but other pure functions AND 
 'new',
  then why can't it also be proven to be pure?

 It's not the calling of new that's a problem, it's the returning of
 mutable data.

 If we allow "pureish" functions, then maybe a "pureish" function could
 return mutable data, but not a truly (a pure) function.

 The reason is that the compiler is allowed to cache the result (the
 return value from any pure function), and to re-use that result,
 instead of calling the function, whenever the input compares equal
 with last time. If any of that data is mutable, it all goes haywire.

I look at it differently.

To me, a pure function is a function that has no side effects and has 
promised to return the same thing every time it is called.  Allocating 
memory is the only function that is considered pure, but alters global state 
(i.e. the heap), and so we assume it's pure.  A function can still allocate 
data and be considered pure.

A function whose result can be memoized is to me a function that is pure, 
and also one that returns either stack or invariant data.  So a function has 
to be pure to be memoized, but not all pure functions can be memoized.

If you separate out the memoization optimization, then it's possible to have 
both worlds, without any extra keywords or terminology.  We are breaking new 
ground here, so this is the time to explore options that are not part of the 
precedent.

You might not get the compiler optimizations with a pure function that 
returns mutable heap data that you would with one that returns invariant 
heap data, but sometimes that is desirable.  The compiler cannot know all 
things about all optimizations that have been or ever will be.  At some 
point, it has to be nudged in the right direction.

 Seems to me that what's really needed here is a macro, because a macro
 is basically just a function that's always inlined, so the function
 body will be parsed /inside/ the pure specifier. Maybe macros could
 solve that problem completely.

Maybe, but this results in code bloat :)  There are always tradeoffs, it's a 
question of what is important to you.  I used to work with a device that had 
256 bytes of RAM and 8K of code space.  When working on that device, I 
learned how important it was to split common functionality into functions.

-Steve

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Janice Caron <caron800 googlemail.com> wrote:
  If we allow "pureish" functions, then maybe a "pureish" function could
  return mutable data, but not a truly function.

I meant:
...but not a truly pure function.

Apr 10 2008

"Koroskin Denis" <2korden+dmd gmail.com> writes:

On Thu, 10 Apr 2008 20:34:29 +0400, Janice Caron <caron800 googlemail.co=
m>  =

wrote:

 On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:

  char[] c =3D new char[5];

 is invalid inside a pure function because the compiler cannot verify
  uniqueness of mutable heap data, right?

 No. As I said, new is special. It's integral to the language.

OK, new is 'special', but what about malloc or my own allocator? (I happ=
en  =

to work in a gamedev and we never use new nor malloc).
I don't see how

special extern(C) void* malloc(size_t size);

differs from:

void* my_malloc(size_t size)
{
    return malloc(size);
}

and why can't it be 'special' too.

Apr 10 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
  OK, new is 'special', but what about malloc or my own allocator? (I happen
 to work in a gamedev and we never use new nor malloc).
  I don't see how

  special extern(C) void* malloc(size_t size);

  differs from:

  void* my_malloc(size_t size)
  {
    return malloc(size);
  }

  and why can't it be 'special' too.

Because you're modifying global state.

One of the reasons why new is special is because you never have to
delete (because the garbage collector takes care of it). This is
directly comparable to, for example, Haskell, where new "things" are
created all the time, and are never freed. It's built into the system.
More - it /is/ the system. That's the status of new. It's the de facto
heap allocation method in D.

If you want to use custom allocators, there's nothing to stop you, but
that doesn't change the fact that the minute you modify global state,
you have a function that isn't pure.

I think you're worrying too much. There should never be a need to call
malloc and free inside a pure function. If you need a function that
does impure stuff, just write the function anyway, and don't call it
pure.

Apr 10 2008

"Koroskin Denis" <2korden+dmd gmail.com> writes:

On Thu, 10 Apr 2008 21:41:15 +0400, Janice Caron <caron800 googlemail.co=
m>  =

wrote:

 On 10/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
  OK, new is 'special', but what about malloc or my own allocator? (I =


 =

 happen
 to work in a gamedev and we never use new nor malloc).
  I don't see how

  special extern(C) void* malloc(size_t size);

  differs from:

  void* my_malloc(size_t size)
  {
    return malloc(size);
  }

  and why can't it be 'special' too.

 Because you're modifying global state.

 One of the reasons why new is special is because you never have to
 delete (because the garbage collector takes care of it). This is
 directly comparable to, for example, Haskell, where new "things" are
 created all the time, and are never freed. It's built into the system.=

 More - it /is/ the system. That's the status of new. It's the de facto=

 heap allocation method in D.

 If you want to use custom allocators, there's nothing to stop you, but=

 that doesn't change the fact that the minute you modify global state,
 you have a function that isn't pure.

 I think you're worrying too much. There should never be a need to call=

 malloc and free inside a pure function. If you need a function that
 does impure stuff, just write the function anyway, and don't call it
 pure.

Well, I don't distinguish between new and malloc. They both are impure a=
nd
/shouldn't/ be callable from pure functions. In fact, there is a solutio=
n:
inew (for short, or new invariant(T) from accu-functional).

Obviously enough, new can't be pure:

T t1 =3D new T();
T t2 =3D new T();

Since it can't be rewritten to:

T t1 =3D new T();
T t2 =3D t1;

However, inew /is/ pure:

T t1 =3D inew T();
T t2 =3D t1;

Is this the key!?

Yet, there is one more exception: rand(). I believe Haskell /has/ random=
  =

numbers,
therefore it should be callable from pure functions, isn't it?

Other solution could be to introduce some 'uncachable' keyword, so that =
we  =

could declare:

uncachable int rand();
uncachable void* malloc(size_t size);

 [snip]
 I should have added: /neither/ of those functions is pure.

 In fact, I strongly doubt that any function declared extern(C) will be=

 pure, for the simple and obvious reason that C doesn't have the "pure"=

 keyword. :-)

Agreed, C doesn't but pure and impure follow the same calling convention=

(I hope), so there is no difference between two from linker's point of  =

view.

Can't see why isalpha/isdigit/tolower/etc. can't be declared pure:

extern "C" {
    pure int isalpha(int c);
    pure int isdigit(int c);
    pure int tolower(int c);
}

Apr 11 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 11/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
 I don't distinguish between new and malloc.

malloc() needs free().
new needs nothing, since we have a garbage collector.

That is a /big/ difference.

Apr 13 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 10/04/2008, Janice Caron <caron800 googlemail.com> wrote:
 On 10/04/2008, Koroskin Denis <2korden+dmd gmail.com> wrote:
  >  OK, new is 'special', but what about malloc or my own allocator? (I happen
  > to work in a gamedev and we never use new nor malloc).
  >  I don't see how
  >
  >  special extern(C) void* malloc(size_t size);
  >
  >  differs from:
  >
  >  void* my_malloc(size_t size)
  >  {
  >    return malloc(size);
  >  }

Sorry, I should have added: /neither/ of those functions is pure.

In fact, I strongly doubt that any function declared extern(C) will be
pure, for the simple and obvious reason that C doesn't have the "pure"
keyword. :-)

Apr 10 2008

Georg Wrede <georg nospam.org> writes:

Janice Caron wrote:
 On 10/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 
 char[] c = new char[5];

 is invalid inside a pure function because the compiler cannot verify
 uniqueness of mutable heap data, right?

 
 No. As I said, new is special. It's integral to the language.

First of all, I think new is special anyhow.

Second, come to think of it, even if new weren't special, whatever it 
returns IS UNIQUE and as good as IMMUTABLE by others -- as seen FROM THE 
PERSPECTIVE OF THE  CALLER.

Which is actually all we care about, when we're inside a pure function.


In other words, if you new something, nobody else gets to change it 
since nobody else has a reference to this new data.

---

Ehrm. On second thought, it actually IS possible to end up in a 
situation where a just newed object may suddenly change in your hands, 
and also be referenced from other parts of the program.

(I'll leave constructing an example as an exercise.)

Now, the simplest thing to guarantee that not happening would be to deny 
newing of objects that have constructors.

When the compiler gets more advanced, it could examine such 
constructors, and only warn (or deny) newing objects where it can't 
verify such is not the case.

Until that, I guess we'll see what A/W have decided on this one. If, 
that is, they've come up with the above problem.

Apr 10 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Georg Wrede" wrote
 Ehrm. On second thought, it actually IS possible to end up in a situation 
 where a just newed object may suddenly change in your hands, and also be 
 referenced from other parts of the program.

 (I'll leave constructing an example as an exercise.)

 Now, the simplest thing to guarantee that not happening would be to deny 
 newing of objects that have constructors.

In fact, I think the solution to this is to make a constructor that is pure 
:)

The problem with this is that the 'this' pointer is mutable when passed to 
the constructor, so you have mutable data being passed to a pure function. 
However, it is 'unique' data that has not been initialized anywhere, so this 
might be an exception to the rule.

Having the constructor pure solves any problems because you can't pass the 
'this' pointer out to some global function that might store it somewhere.

-Steve

Apr 10 2008

Georg Wrede <georg nospam.org> writes:

Steven Schveighoffer wrote:
 "Georg Wrede" wrote
 
Ehrm. On second thought, it actually IS possible to end up in a situation 
where a just newed object may suddenly change in your hands, and also be 
referenced from other parts of the program.

(I'll leave constructing an example as an exercise.)

Now, the simplest thing to guarantee that not happening would be to deny 
newing of objects that have constructors.

 
 
 In fact, I think the solution to this is to make a constructor that is pure 
 :)

Yes. And thus, a pure constructor cannot write to class variables, only 
instance variables.

 The problem with this is that the 'this' pointer is mutable when passed to 
 the constructor, so you have mutable data being passed to a pure function. 
 However, it is 'unique' data that has not been initialized anywhere, so this 
 might be an exception to the rule.

I agree. One might consider the "this" varable as either actually 
pointing to this particular instance or else consider it invalid. If we 
define it so, then "this" can be considered immutable.

Besides, since we don't have a reallocating GC, and we can't (at least I 
don't now remember otherwise) store object instances in (potentially 
relocating) arrays (only references to them), the "this" pointer could 
be made immutable everywhere.

 Having the constructor pure solves any problems because you can't pass the 
 'this' pointer out to some global function that might store it somewhere.

Yes.

Apr 10 2008

Georg Wrede <georg nospam.org> writes:

Steven Schveighoffer wrote:
 I've been reading a bit about FP, and I see now that you are correct that 
 mutable data isn't allowed.  But let's not forget that D is not a functional 
 language, so a loop like:
 
 for(int i = 0; i < n; i++)
    statement;
 
 Is legal inside a D pure function, whereas the same code would be achieved 
 through tail-recursion in a functional language.  Having mutable return 
 values and arguments may not prevent the compiler from considering the 
 function pure (then again, it may, that is my question).  But certainly, 
 comparing D to Haskell doesn't prove either case.

It doesn't. And whether we can have mutable arguments and/or mutable 
return values, is simply up to Walter. Nothing else.

However, I don't see any other way than both being immutable.

If you want to change the the value returned by a pure function, you 
have to make a copy of it. Just plain old COW. And most of the time, 
that feels natural anyway.

 I don't think the intent of bolting functional programming optimization 
 opportunities on top of D is going to mean that during pure functions we 
 will have to use a fully FP style.

That would be disaster. But that's just my opinion. :-)

Apr 10 2008

Leandro Lucarella <llucax gmail.com> writes:

Steven Schveighoffer, el 10 de abril a las 10:42 me escribiste:
 So let me restate: not having the ability to pass around UNIQUE mutable heap 
 data to and from pure functions is going to limit severely the usefulness of 
 pure functions.

This is the exact same problem of the mutable methods of a class used in a
stack allocated temporary of a pure function, talked in another thread.

class A {
	int x;
	void f() { x = 1; }
}

pure void g()
{
	scope a = new A;
	a.f();
}

g() is clearly pure (it has no side effects, besides allocating memory in
the stack, which shouldn't be considered a side effect, I guess allocating
in the heap shouldn't be considered a side effect either, since new is
thread safe). Even when A.f() is not pure, nor const.

There is an implementation problem, about how hard is to detect that by
the compiler, but in theory there is no reason for g() not being pure.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Parece McGuevara's o CheDonald's

Apr 10 2008

Georg Wrede <georg nospam.org> writes:

Leandro Lucarella wrote:
 Steven Schveighoffer, el 10 de abril a las 10:42 me escribiste:
 
So let me restate: not having the ability to pass around UNIQUE mutable heap 
data to and from pure functions is going to limit severely the usefulness of 
pure functions.

 
 
 This is the exact same problem of the mutable methods of a class used in a
 stack allocated temporary of a pure function, talked in another thread.
 
 class A {
 	int x;
 	void f() { x = 1; }
 }
 
 pure void g()
 {
 	scope a = new A;
 	a.f();
 }
 
 g() is clearly pure (it has no side effects, besides allocating memory in
 the stack, which shouldn't be considered a side effect, I guess allocating
 in the heap shouldn't be considered a side effect either, since new is
 thread safe). Even when A.f() is not pure, nor const.
 
 There is an implementation problem, about how hard is to detect that by
 the compiler, but in theory there is no reason for g() not being pure.

g doesn't return a value.

Apr 10 2008

Russell Lewis <webmaster villagersonline.com> writes:

Janice Caron wrote:
  Not having the ability to pass around mutable heap data to and from pure
  functions is going to limit severely the usefulness of pure functions.

 
 I don't see why. Functional programming languages such as Haskell seem
 not to mind. In many functional programming languages, there is no
 mutable data at all.

That's not quite true, and I think that the distinction is important. 
Haskell and the rest are constantly doing memory allocation, and that 
memory allocation has the potential to fail when you run out of memory. 
  It's simply that they hide it all behind their syntax.

Thus, I would argue that it is perfectly reasonable for a "pure" D 
function to allocate memory.  The question arises, then, what happens 
when you run out?  A pure function probably shouldn't be allowed to 
throw an exception (that would violate the pureness guarantees), so what 
do you do?  Or do you create a "PureFunctionFailedException" which is 
the only thing that a pure function might throw???

Apr 10 2008

Georg Wrede <georg nospam.org> writes:

Steven Schveighoffer wrote:
 Now, I realize I have very similar code that initializes the int arrays, so 
 I want to put that into a function that creates and initializes an array 
 with the given parameters:
 
 int[] createIncreasingSequence(int length, int start, int step)
 {
     int[] result = new int[length];
     for(int i = 0; i < result.length; i++)
         result[i] = start + (i * step);
     return result;
 }
 
 Now, I think we all agree that createIncreasingSequence is as pure as 
 calling new, right?  That is, it doesn't affect any global state (except for 
 allocating heap data) and will return an equivalent array every time the 
 same arguments are given.  But if I can't declare it as 'pure', then I can't 
 call it from f and g:

Yes. I think your function is pure. Except for the fact that the 
returned heap data has to be pure.

 pure int f()
 {
    int[] x = createIncreasingSequence(5, 1, 1);
 
    /* do stuff */
    return result;
 }
 
 pure int g()
 {
    int[] y = createIncreasingSequence(15, 10, 3);
 
    /* do stuff */
    return result;
 }
 
 So should there be a way to define createIncreasingSequence such that I can 
 call it from a pure function?  Or do I have to re-implement it in every pure 
 function I use?

You should be able to call it from a pure function.

 Not having the ability to pass around mutable heap data to and from pure 
 functions is going to limit severely the usefulness of pure functions.

This example does not need mutable data. And it is an example of how you 
can get along just fine with immutable data.

(It is also an example of CTFE. That is, already existing DMD should 
convert f and g into compile time constants. :-) But that's obviously 
besides the point here!)

Apr 10 2008

"Koroskin Denis" <2korden+dmd gmail.com> writes:

On Thu, 10 Apr 2008 17:26:20 +0400, Steven Schveighoffer  =

<schveiguy yahoo.com> wrote:

 "Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 So how is new char[] a pure function?

 new is special.

  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e,  =



 increasing
 by
  step each time, why can't I make that pure?

 I reckon you can, but you got the return type wrong.

    invariant(int)[] createIncreasingSequence(int s, int e, int  =


 step){...}
    {
        auto array =3D new int[(e-s)/step];
        foreach(ref a;array)
        {
            a =3D e;
            e +=3D step;
        }
        return assumeUnique!(array);
    }

 This doesn't work for what I'm asking.  I'll add more detail to the  =

 example.
 You have two functions f, and g:

 pure int f()
 {
     int[] x =3D new int[5];
     for(int i =3D 0; i < x.length; i++)
        x[i] =3D 1 + i;

     /* do stuff */
     return result;
 }

 pure int g()
 {
     int[] y =3D new int[15];
     for(int i =3D 0; i < y.length; y++)
         y[i] =3D 10 + (i * 3);

     /* do stuff */
     return result;
 }

 Now, I realize I have very similar code that initializes the int array=

s,  =

 so
 I want to put that into a function that creates and initializes an arr=

ay
 with the given parameters:

 int[] createIncreasingSequence(int length, int start, int step)
 {
     int[] result =3D new int[length];
     for(int i =3D 0; i < result.length; i++)
         result[i] =3D start + (i * step);
     return result;
 }

 Now, I think we all agree that createIncreasingSequence is as pure as
 calling new, right?  That is, it doesn't affect any global state (exce=

pt  =

 for
 allocating heap data) and will return an equivalent array every time t=

he
 same arguments are given.  But if I can't declare it as 'pure', then I=

  =

 can't
 call it from f and g:

 pure int f()
 {
    int[] x =3D createIncreasingSequence(5, 1, 1);

    /* do stuff */
    return result;
 }

 pure int g()
 {
    int[] y =3D createIncreasingSequence(15, 10, 3);

    /* do stuff */
    return result;
 }

 So should there be a way to define createIncreasingSequence such that =

I  =

 can
 call it from a pure function?  Or do I have to re-implement it in ever=

y  =

 pure
 function I use?

There is a way with inew (invariant new):

class IncreasingSequence(int length)
{
public:
    this(int start, int step)
    {
       for(int i =3D 0; i < length; i++)
          elements[i] =3D start + (i * step);
    }

    int[length] elements;
}

pure int f()
{
    auto x =3D inew IncreasingSequence!(5)(1, 1);

    /* do stuff */
    return result;
}

pure int g()
{
    auto y =3D inew IncreasingSequence!(15)(10, 3);

    /* do stuff */
    return result;
}

Isn't it beautiful? Yet, it's /very/ close to wait you wanted.

Apr 11 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

Nope, not close :)  In fact, I want a mutable array because I intend to play 
with the values later in the function (I didn't make that clear I guess).

I agree that in the case that I don't need a mutable array, returning an 
invariant array is an acceptable solution, I don't even think you need a 
class for it, a pure function will do just fine.

But returning a mutable array IMO does not make the function impure, it just 
means you cannot cache the return value.  The compiler could be made to 
assume this when seeing that the return value has a mutable heap pointer.

-Steve

"Koroskin Denis" wrote
On Thu, 10 Apr 2008 17:26:20 +0400, Steven Schveighoffer
 wrote:
 "Janice Caron" wrote
 On 09/04/2008, Steven Schveighoffer wrote:
 So how is new char[] a pure function?

 new is special.

  And why can't I 'wrap' new?  For
  instance, if I have a function:

  int[] createIncreasingSequence(int s, int e, int step){...}

  which creates an array of ints that start at s and end at e, 
 increasing
 by
  step each time, why can't I make that pure?

 I reckon you can, but you got the return type wrong.

    invariant(int)[] createIncreasingSequence(int s, int e, int 
 step){...}
    {
        auto array = new int[(e-s)/step];
        foreach(ref a;array)
        {
            a = e;
            e += step;
        }
        return assumeUnique!(array);
    }

 This doesn't work for what I'm asking.  I'll add more detail to the 
 example.
 You have two functions f, and g:

 pure int f()
 {
     int[] x = new int[5];
     for(int i = 0; i < x.length; i++)
        x[i] = 1 + i;

     /* do stuff */
     return result;
 }

 pure int g()
 {
     int[] y = new int[15];
     for(int i = 0; i < y.length; y++)
         y[i] = 10 + (i * 3);

     /* do stuff */
     return result;
 }

 Now, I realize I have very similar code that initializes the int arrays, 
 so
 I want to put that into a function that creates and initializes an array
 with the given parameters:

 int[] createIncreasingSequence(int length, int start, int step)
 {
     int[] result = new int[length];
     for(int i = 0; i < result.length; i++)
         result[i] = start + (i * step);
     return result;
 }

 Now, I think we all agree that createIncreasingSequence is as pure as
 calling new, right?  That is, it doesn't affect any global state (except 
 for
 allocating heap data) and will return an equivalent array every time the
 same arguments are given.  But if I can't declare it as 'pure', then I 
 can't
 call it from f and g:

 pure int f()
 {
    int[] x = createIncreasingSequence(5, 1, 1);

    /* do stuff */
    return result;
 }

 pure int g()
 {
    int[] y = createIncreasingSequence(15, 10, 3);

    /* do stuff */
    return result;
 }

 So should there be a way to define createIncreasingSequence such that I 
 can
 call it from a pure function?  Or do I have to re-implement it in every 
 pure
 function I use?

There is a way with inew (invariant new):

class IncreasingSequence(int length)
{
public:
    this(int start, int step)
    {
       for(int i = 0; i < length; i++)
          elements[i] = start + (i * step);
    }

    int[length] elements;
}

pure int f()
{
    auto x = inew IncreasingSequence!(5)(1, 1);

    /* do stuff */
    return result;
}

pure int g()
{
    auto y = inew IncreasingSequence!(15)(10, 3);

    /* do stuff */
    return result;
}

Isn't it beautiful? Yet, it's /very/ close to wait you wanted.

Apr 11 2008

guslay <guslay gmail.com> writes:

Janice Caron Wrote:

 On 09/04/2008, Steven Schveighoffer <schveiguy yahoo.com> wrote:

 
  as an example of a pure function that takes a pointer to mutable data, but
  doesn't use the data, i.e. this doesn't ever read or write heap data:

  pure char *add(char * c, int n) { return c + n;}

 
 Oooh - now that's an interesting one. That one's got me foxed. My
 instinct says it should be disallowed, but I can't quite put my finger
 on why. I'm gonna have to think about that one some more.
 

As long as you stick to pointer arithmetic and don't dereference c, it is pure.

Will it be allowed, that's pure speculation ;)

Apr 09 2008

Georg Wrede <georg nospam.org> writes:

Steven Schveighoffer wrote:
 I realize that I don't fully understand all the rules of what D pure 
 functions will be.
 
 I think all agree on these rules:
 
 - pure functions can only call pure functions
 - allow access to invariant data
 - allow mutable access to simple stack variables (variables that are all on 
 the stack, no references)
 
 In Andrei's accu-functional document, instead of the last rule, there is:
 
 - allow local automatic mutable state

Still not pretending to know too much, I'd say that if the A/W goal is 
"pure functions as far as MP optimizations is needed" only, then:

A function might need to have local state. Take a function that counts 
something in a list. It'd need a local counter -- unless we want to 
force the programmer to write it as a recursive function. (Which I 
wholeheartedly hope not. I'd be happy with D being multiparadigm.)

Such a variable would be the "local automatic mutable state". Local, of 
course. Automatic, as destructed at end of scope. Mutable and state, of 
course. :-)

 I'm not sure what this means :)  But here are some questions about what is 
 pure and what is not pure:
 
 1. Can pure functions use mutable heap data?  If so, what are the 
 restrictions for this?
 
 i.e.:
 pure char[] f(int x, int y)
 {
    char[] c = new char[y];
    c[] = (char)x;

return c; // I presume.

 }
 
 pure char[] g(int x)
 {
    char[] c = f(x, 5);
 }

f is pure. g is pure itself, and only uses f. And a new c is created 
each time you access the function, so it's free of other references to 
it (that could change it).

For example, function

     pure char[] concatenate(char[] a, char[] b)
     {
         // return new string consisting of a ~ b
     }

The return value is of course a reference to a new string. In a purely 
functional language returning a reference to a heap object might not be 
ok (I'm no pro on that.), but as D otherwise pretends strings to behave 
somewhat value like, I assume this simply has to be ok.

Of course, a and b would have to be immutable.

It's really a shame that A/W don't elaborate on these things. Now we are 
only getting increasingly unsure about stuff, and at the same time the 
combined brain-cell-hours we invest here might simply be a wild goose 
chase. And after all, most of this is just what they eventually 
*decide*. We could use some help and guidelines, or even opinions on 
these things.

And we could even be of help to them, since right now there seem to be a 
lot of very smart, motivated and able people here. Maybe actually more 
than ever before.

Apr 09 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Georg Wrede" wrote
 Steven Schveighoffer wrote:
 pure char[] f(int x, int y)
 {
    char[] c = new char[y];
    c[] = (char)x;

 return c; // I presume.

 }


Yes, I forgot that statement, thanks :)  For some reason, Outlook Express 
didn't complain that this was an error :P

 It's really a shame that A/W don't elaborate on these things. Now we are 
 only getting increasingly unsure about stuff, and at the same time the 
 combined brain-cell-hours we invest here might simply be a wild goose 
 chase. And after all, most of this is just what they eventually *decide*. 
 We could use some help and guidelines, or even opinions on these things.

 And we could even be of help to them, since right now there seem to be a 
 lot of very smart, motivated and able people here. Maybe actually more 
 than ever before.

This is exactly why I asked these questions :)

-Steve

Apr 09 2008

guslay <guslay gmail.com> writes:

Steven Schveighoffer Wrote:
 
 1. Can pure functions use mutable heap data?  If so, what are the 
 restrictions for this?
 
 i.e.:
 pure char[] f(int x, int y)
 {
    char[] c = new char[y];
    c[] = (char)x;
 }

Isn't the dynamic allocator a global state?

Apr 09 2008

Georg Wrede <georg nospam.org> writes:

guslay wrote:
 Steven Schveighoffer Wrote:
 
1. Can pure functions use mutable heap data?  If so, what are the 
restrictions for this?

i.e.:
pure char[] f(int x, int y)
{
   char[] c = new char[y];
   c[] = (char)x;
}

 
 Isn't the dynamic allocator a global state?

IMHO, especially in D, we'll have to make an exception of /new/.

If we didn't, then a function can't return anything that can't be stored 
on the stack.

So, if we pretend that a string is returned via the stack, and that any 
local strings a pure function needs temporarily, are in its own stack, 
then we're okay.

I don't really see any alternative here. Also, I don't see the 
language/compiler enforcing this as all that hard.

Apr 09 2008

guslay <guslay gmail.com> writes:

Georg Wrede Wrote:

 guslay wrote:
 Steven Schveighoffer Wrote:
 
1. Can pure functions use mutable heap data?  If so, what are the 
restrictions for this?

i.e.:
pure char[] f(int x, int y)
{
   char[] c = new char[y];
   c[] = (char)x;
}

 
 Isn't the dynamic allocator a global state?

 
 IMHO, especially in D, we'll have to make an exception of /new/.
 
 If we didn't, then a function can't return anything that can't be stored 
 on the stack.
 
 So, if we pretend that a string is returned via the stack, and that any 
 local strings a pure function needs temporarily, are in its own stack, 
 then we're okay.
 
 I don't really see any alternative here. Also, I don't see the 
 language/compiler enforcing this as all that hard.


Not that I know what I'm talking about, but my understanding is:

- You should be able to allocate with "scope" a local mutable object (automatic
local mutable state).

- You should not return a mutable reference type on the heap.

A a1 = f();
A a2 = f();

Let's suppose f() is pure. Mr. compiler does:

A __a = f();
A a1 = __a;
A a2 =  __a;

But if A is a reference type, you don't know the concrete type of object a1,a2.
How can you do the pure stuff, like skipping the second evaluation of f(), if
you can't make distinct copy. There is no copy/cloning constructor. You don't
want to share the same mutable reference between a1 and a2, otherwise you would
have written

A a1 = f();
A a2 = a1;

- However, allocating and returning invariant memory should be fine. If A is
invariant (like a string), you should be happy for a1 and a2 to share the same
reference.

Apr 09 2008

D Programming

C/C++ Programming

Other

digitalmars.D - pure or not pure?