digitalmars.D.bugs - [Issue 8185] New: Pure functions and pointers

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185

           Summary: Pure functions and pointers
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: spec
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: verylonglogin.reg gmail.com


--- Comment #0 from Denis Shelomovskij <verylonglogin.reg gmail.com> 2012-06-02
12:10:50 MSD ---
Looks like there is a big problem with pure functions and pointers.

Consider these functions:
---
int*   f1(in int*   i) pure;
int**  f2(in int**  i) pure;
void*  g1(in void*  p) pure;
void** g2(in void** p) pure;

struct MyArray { int* p; size_t len; }
void** h(in MyArray arg) pure;
---
The Question: What exactly do these pure functions consider as `argument
value` and as `returned value`? Looks like this is neither documented nor
obvious.

I see only three ways to document it properly (yes, the main problem is with
`h` function):
 * disallow pure functions to accept pointers or types with pointers;
 * once pure function accepts a pointer it is considered depending on all
process memory;
 * state with BIG RED LETTERS that pure function depends on the address only
and restrict dereferencing of the pointer on a compiler level.

The second way obviously just means the function isn't pure any more.
The third way means the pointer isn't a pointer any more so I'd prefer to
replace it with "The first way" + "f(cast(size_t) ptr)".

More than that, the situation is very dangerous now. E.g. one can consider
`strlen` to be pure. It should be clearly stated that purity is compiler
checkable, not user checkable with examples like `strlen`. See discussion in
Issue 3057.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 02 2012


klickverbot <code klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code klickverbot.at
           Severity|major                       |enhancement


--- Comment #1 from klickverbot <code klickverbot.at> 2012-06-02 01:44:18 PDT
---
The current behavior is by design, and perfectly fine – note that `pure` in D
just means that a function doesn't access global (mutable) state. A pointer
somewhere isn't a problem either, since the caller must have obtained the
address from somewhere, and if it was indeed from global state, the calling
code couldn't be pure.

Do you have any suggestions on how to make this clearer in the spec? I admit
that the design can take some time to wrap one's head around, but I'm not sure
what's the best way to make the concept easier to grasp.
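
A minimal sketch of the rule described in this comment. The names `counter`, `limit`, `bump` and `capAtLimit` are hypothetical and chosen only for illustration: a pure function may not name a mutable global, but it may read an immutable one and may work through a pointer handed to it by the caller.

```d
int counter;               // mutable global state
immutable int limit = 10;  // immutable global: readable from pure code

// Touches memory only through its parameter.
void bump(int* p) pure
{
    ++*p;
}

// Reading an immutable global is allowed inside a pure function.
int capAtLimit(int x) pure
{
    return x > limit ? limit : x;
}

void main()
{
    // Naming the mutable global inside a pure function is rejected.
    static assert(!__traits(compiles, () pure { return counter; }));

    int local = 1;
    bump(&local);    // the address comes from the caller
    bump(&counter);  // allowed: main is not pure, so it may name the global
    assert(local == 2 && counter == 1);
    assert(capAtLimit(42) == 10);
}
```

Note that `bump(&counter)` compiles only because `main` itself is impure; a pure caller could not have obtained that address.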

Jun 02 2012



--- Comment #2 from klickverbot <code klickverbot.at> 2012-06-02 03:12:04 PDT
---
Also, please note that issue 3057 is really old – I think at that point we
didn't even have the relaxed purity rules yet.

Jun 02 2012



--- Comment #3 from Denis Shelomovskij <verylonglogin.reg gmail.com> 2012-06-02
14:29:01 MSD ---
(In reply to comment #1)
 The current behavior is by design, and perfectly fine – note that `pure` in D
 just means that a function doesn't access global (mutable) state. A pointer
 somewhere isn't a problem either, since the caller must have obtained the
 address from somewhere, and if it was indeed from global state, the calling
 code couldn't be pure.
OK. Looks like everything works, but I don't understand how. So could you
please answer the question (read this to the end)? According to
http://dlang.org/function.html#pure-functions
 Pure functions are functions that produce the same result for the same
 arguments.
And my original question is
 The Question: What exactly do these pure functions consider as `argument
 value` and as `returned value`?

Illustration:
---
int f(in int* p) pure;

void g()
{
    auto arr = new int[5];
    auto res = f(arr.ptr);

    assert(res == f(arr.ptr));

    assert(res == f(arr.ptr + 1)); // *p isn't changed

    arr[1] = 7;
    assert(res == f(arr.ptr)); // neither p nor *p is changed

    arr[0] = 7;
    assert(res == f(arr.ptr)); // p isn't changed
}
---
Which asserts must pass?

The second assert is here according to
http://klickverbot.at/blog/2012/05/purity-in-d/ (yes, it's the "Indirections in
the Return Type?" section, but the sentences look general and I think they can
be treated this way):
 The first essential point are addresses, respectively the definition of
 equality applied when considering referential transparency. In functional
 languages, the actual memory address that some value resides at is usually of
 little to no importance. D being a system programming language, however,
 exposes this concept.
Jun 02 2012


klickverbot <code klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal


--- Comment #4 from klickverbot <code klickverbot.at> 2012-06-02 07:50:05 PDT
---
(In reply to comment #3)
 And my original question is
 The Question: What exactly do these pure functions consider as `argument
 value` and as `returned value`?

 Illustration:
 ---
 int f(in int* p) pure;
Thanks for the example, this certainly makes your concerns easier to see. You
are right, the spec is really not clear in this regard, but in my opinion only
a single interpretation makes sense, in that it is actually enforceable by the
compiler:
---
     auto res = f(arr.ptr);
     assert(res == f(arr.ptr));
This one obviously has to pass.
     assert(res == f(arr.ptr + 1)); // *p isn't changed
Might fail, f is allowed to return cast(int)p.
     arr[1] = 7;
     assert(res == f(arr.ptr)); // neither p nor *p is changed
Must pass, reading/modifying random bits of memory inside pure functions is
obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in
@safe code anyway, and in @system code, you as the programmer are responsible
for not violating the type system guarantees; for example, you can just call
any impure function in a pure context using a cast. This also means that e.g. C
string functions cannot be pure in D.
     arr[0] = 7;
     assert(res == f(arr.ptr)); // p isn't changed
Might fail, as discussed in the »What about Referential Transparency« section
of the article: only if the parameters are _transitively_ equal (as defined by
their type) are pure functions guaranteed to return the same value.
 The second assert is here according to
 http://klickverbot.at/blog/2012/05/purity-in-d/.
Then this aspect of the article is apparently not as clear as it could be.
Thanks for the feedback, I'll incorporate it in the next revision.
---
Do you disagree with any of these points? If so, I'd be happy to provide a more
in-depth explanation of my view, so we can clarify the spec afterwards.
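
To make the "transitively equal" reading concrete, here is a sketch that assumes one particular body for `f`. The thread only gives its signature, so the body that reads `*p` and nothing else is an assumption made for illustration.

```d
// Assumed body: reads only the int that p points at.
int f(in int* p) pure
{
    return *p;
}

void main()
{
    auto arr = new int[5];      // all elements start at 0
    auto res = f(arr.ptr);

    assert(res == f(arr.ptr));  // same pointer, same pointee

    arr[1] = 7;                 // not reachable through *p
    assert(res == f(arr.ptr));  // still equal for this body

    arr[0] = 7;                 // the pointee changed
    assert(res != f(arr.ptr));  // arguments are no longer transitively equal
}
```

With a different body (for instance one that returns `cast(int) p`) the outcomes would differ, which is the ambiguity the bug report is about.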
Jun 02 2012


art.08.09 gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |art.08.09 gmail.com


--- Comment #5 from art.08.09 gmail.com 2012-06-02 08:22:14 PDT ---
(In reply to comment #0)

 I see only three ways to document it properly (yes, the main problem is with
 `h` function):
  * once pure function accepts a pointer it is considered depending on all
 process memory;
That would work, but would probably be too limiting.

* Allow only dereferencing the pointer, disallow any kind of indexing.

Note that it's not trivial, as pointer arithmetic should still work. But it is
probably doable, by disallowing dereferencing at all and making a special
exception for accessing via an unmodified argument. This would also have to
work recursively, so it basically comes down to introducing a special kind of
pointer that behaves a bit more like a reference.

The alternatives are the ones you listed, either banning pointers or assuming
the function depends on everything; neither is really acceptable.

A pure function shouldn't deal with unbounded arrays, so this kind of
restriction should be fine (the alternative is to have to slice everything,
which is not a sane solution, e.g. when working with pointers to structs).
Jun 02 2012



--- Comment #6 from Denis Shelomovskij <verylonglogin.reg gmail.com> 2012-06-02
19:59:12 MSD ---
(In reply to comment #4)
 (In reply to comment #3)
      assert(res == f(arr.ptr + 1)); // *p isn't changed
 Might fail, f is allowed to return cast(int)p.
Do I understand correctly that:
---
int[] f() pure;
int g(in int[] a) pure;
int gs(in int[] a) @safe pure;

void h()
{
    assert(g(f()) == g(f())); // May or may not pass
    assert(gs(f()) == gs(f())); // Should pass
}
---
?
      arr[1] = 7;
      assert(res == f(arr.ptr)); // neither p nor *p is changed
 Must pass,...
So this code is invalid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{
    ++i[1];
}
---
and this is invalid too:
---
struct MyArray
{
    int* p;
    size_t len;

    ...

    int opIndex(size_t i) pure @safe // or unsafe, doesn't matter
    in { assert(i < len); }
    body { return p[i]; }
}
---
? And this is valid:
---
void f(int* i) pure @safe // or unsafe, doesn't matter
{
    ++*i;
}
---
?
 reading/modifying random bits of memory inside pure functions is
 obviously a bad idea. Bad idea meaning that pointer arithmetic is disallowed in
 @safe code anyway, and in @system code, you as the programmer are responsible
 for not violating the type system guarantees – for example, you can just call
 any impure function in a pure context using a cast. This also means that e.g. C
 string functions cannot be pure in D.
I'm a bit confused because I didn't mention the @safe attribute. If you have
time, I'd like to see something about the differences between @safe and unsafe
pure functions in your article, because it looks like these things are really
different.
 The second assert is here according to
 http://klickverbot.at/blog/2012/05/purity-in-d/.
 Then this aspect of the article is apparently not as clear as it could be –
 thanks for the feedback, I'll incorporate it in the next revision.
Not sure, my English is rather bad so I could just have misunderstood
something.
 Do you disagree with any of these points? If so, I'd be happy to provide a more
 in-depth explanation of my view, so we can clarify the spec afterwards.
`void f(void*) pure;` is still unclear to me. What can it do? What can it do if
it's @safe?

And I completely misunderstand why pure functions can't be optimized out, as
Steven Schveighoffer said in a druntime pull 198 comment:
 The fact that it returns mutable makes it weak pure (the optimizer cannot
 remove any calls to gc_malloc)
(yes, this is a general question, not pointers only)
Jun 02 2012


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com


--- Comment #7 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-02
17:48:23 PDT ---
(In reply to comment #3)
 
 According to http://dlang.org/function.html#pure-functions
 Pure functions are functions that produce the same result for the same
 arguments.
This is certainly true. However, it's neither practical nor always possible for
the compiler to determine whether a call can be optimized out. Consider that on
any call to a pure function that takes mutable data, the function could modify
the data, so even calling with the exact same pointer again may result in a new
effective parameter.

However, if a function has only immutable or implicitly convertible to
immutable parameters and return values, the function *can* be optimized out,
because it's guaranteed nothing ever changes. This situation is what has been
called "strong pure". It's the equivalent of functional language purity.

It's possible in certain situations for a "weak pure" function to be considered
strong pure. For example, consider a function which takes a const parameter
and returns a const. Pass an immutable into it, and nothing could possibly have
changed before the next call, so it can be optimized out. The compiler does not
take advantage of these yet.
 And my original question is
 The Question: What exactly do these pure functions consider as `argument
 value` and as `returned value`?
The argument value is all the data reachable via the parameters. The returned
value is all the data reachable via the result.

For pointers, you are under the same rules as normal functions: @safe functions
cannot use pointers, unsafe ones can. If an unsafe pure function is called, a
certain degree of freedom to screw up is available, just like any other unsafe
function.
 int f(in int* p) pure;
 
 void g()
 {
     auto arr = new int[5];
     auto res = f(arr.ptr);
 
     assert(res == f(arr.ptr));
Obviously this passes: all the parameters are identical, and nothing could have
changed between the two calls. The call will not currently be optimized out,
because the compiler isn't smart enough yet.
 
     assert(res == f(arr.ptr + 1)); // *p isn't changed
May or may not pass, the parameter is different.
 
     arr[1] = 7;
     assert(res == f(arr.ptr)); // neither p nor *p is changed
May or may not pass. f is not @safe, so it could possibly access arr[1].
 
     arr[0] = 7;
     assert(res == f(arr.ptr)); // p isn't changed
May or may not pass, the parameter is different.
 And I completely misunderstand why pure functions can't be optimized out as
 Steven Schveighoffer said in druntime pull 198 comment:
I hope I have helped to further your understanding with this post. Don just
looked up the original thread which outlined the weak-pure proposal, which was
submitted to digitalmars.D in August 2010. You may want to read that entire
thread.

In general response to this bug, I'm unsure how pointers should be treated by
the optimizer. My gut feeling is that the compiler/optimizer should trust that
the code "knows what it's doing", and so should expect that the code implicitly
knows how much data it can access after the pointer.

Consider an interesting case, using BSD sockets:

int f(immutable sockaddr *addr) pure;

sockaddr is a specific size, yet it's a "base class" of different types of
address structures. Typically, one casts the sockaddr into the correct struct
based on the sa_family member. But this may technically mean f accesses more
data than it is given, based on a rigid interpretation of the type system.
Should the compiler enforce this, given that it makes this kind of function
practically useless? I think not.
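
The difference between the two kinds of signature can be sketched as follows. The function names `sumImm` and `takeFirst` are hypothetical; the point is only that repeated calls with the same argument agree when nothing reachable can change, and may disagree when the callee is allowed to mutate what it is given.

```d
// Only immutable data is reachable through the parameter, so two calls with
// the same argument must agree ("strong pure" in the thread's terminology).
int sumImm(immutable(int)[] xs) pure
{
    int s;
    foreach (x; xs)
        s += x;
    return s;
}

// Mutable data is reachable: the call itself may change the effective
// argument ("weak pure").
int takeFirst(int[] xs) pure
{
    int v = xs[0];
    xs[0] = 0;
    return v;
}

void main()
{
    immutable(int)[] data = [1, 2, 3];
    assert(sumImm(data) == sumImm(data));

    int[] buf = [5, 6];
    assert(takeFirst(buf) == 5);
    assert(takeFirst(buf) == 0);  // same slice, but buf[0] was modified
}
```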
Jun 02 2012


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com


--- Comment #8 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-02 21:29:24
PDT ---
This isn't true:

 @safe functions cannot use pointers, unsafe ones can.
@safe functions can use pointers just fine. Pointers themselves are considered
@safe (e.g. the AA's in operator works just fine in @safe code). It's unsafe
pointer operations such as pointer arithmetic which are not @safe.
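
A small sketch of this distinction, using a hypothetical helper `readThrough` and the associative-array `in` operator mentioned above: holding and dereferencing a pointer compiles in @safe code, while pointer arithmetic and pointer indexing do not.

```d
// Plain dereference is accepted in @safe code.
int readThrough(const(int)* p) @safe pure
{
    return *p;
}

void main() @safe
{
    int[string] ages = ["ann": 30];
    const(int)* p = "ann" in ages;  // `in` yields a pointer to the value, or null
    assert(p !is null && readThrough(p) == 30);

    // Pointer arithmetic and pointer indexing are rejected in @safe code.
    static assert(!__traits(compiles, () @safe { int* q; return q + 1; }));
    static assert(!__traits(compiles, () @safe { int* q; return q[1]; }));
}
```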
Jun 02 2012



--- Comment #9 from Denis Shelomovskij <verylonglogin.reg gmail.com> 2012-06-03
10:23:09 MSD ---
Such a mess! The more people write here the more different opinions I see.
IMHO, Walter and Andrei must also participate here to help with conclusion (or
to finally mix everything up).

Jun 02 2012



--- Comment #10 from art.08.09 gmail.com 2012-06-03 06:28:09 PDT ---
(In reply to comment #7)

 argument value is all the data reachable via the parameters.  Argument result
 is all the data reachable via the result.
[...]
 the optimizer.  My gut feeling is the compiler/optimizer should trust the code
 "knows what it's doing." and so should expect that the code implicitly knows
 how much data it can access after the pointer.
Having "pure" as a user-provided attribute, with the compiler completely
trusting the programmer and only checking/enforcing certain assumptions when it
is easy to do, is a reasonable solution. Anybody who understands the purity
concept will have no problem determining whether some function is "pure" or
not; this is how it is in C, in dialects supporting pure.

Unfortunately, D has purity inference.

   uint f()(immutable ubyte* p) {
      uint r;
      foreach (i; 0..size_t.max)
         r += p[i];
      return r;
   }

Can this still be considered pure? What about
"uint f2()(Struct* p) {/*same body*/}"? Or

   uint f3()(ubyte* p) {
      uint r;
      foreach (i; 0..size_t.max)
         r += p[i]++;
      return r;
   }

? All three functions are tagged as pure by the compiler...
Jun 03 2012


timon.gehr gmx.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timon.gehr gmx.ch


--- Comment #11 from timon.gehr gmx.ch 2012-06-03 12:18:33 PDT ---
(In reply to comment #0)
 The Question: What exactly does these pure functions consider as `argument
 value` and as `returned value`? Looks like this is neither documented nor
 obvious.
 
Pointers may only access their own memory blocks, therefore exactly those
blocks participate in argument value and return value.

But why does it even matter? Isn't this discussion mostly philosophical?
Jun 03 2012



--- Comment #12 from art.08.09 gmail.com 2012-06-03 12:46:28 PDT ---
(In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.
What does 'their own memory block' mean? The problem is that a pointer is
basically an unbounded array and, if the access isn't restricted somehow, it
makes the function dependent on global memory state.
 But why does it even matter? Isn't this discussion mostly philosophical?
The compiler will happily assume that template functions are pure even when
they clearly are not, and there isn't even a way to mark such functions as
"impure" (without using hacks like calling dummy functions etc).

Example: a function that is designed to operate on arrays, will always be
called with a pointer to inside an array, and can assume that the previous and
next elements are always valid:

   void f4(T)(T* p) {
      p[-1] += p[0];
   }

The compiler thinks f4() is pure, when it clearly is not; optimizations based
on that assumption are likely to result in corrupted data.
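
The inference can be observed with a short sketch. The wrapper `caller` is hypothetical and exists only to show that an explicitly pure function may call `f4!int`, which is possible only if the template instance was inferred pure; a `void` return type is assumed for `f4`.

```d
void f4(T)(T* p)
{
    p[-1] += p[0];  // writes to the element before the one pointed at
}

// Compiles only because f4!int is inferred to be pure.
void caller(int* p) pure
{
    f4(p);
}

void main()
{
    int[3] a = [1, 2, 3];
    caller(&a[1]);           // a[0] += a[1]
    assert(a == [3, 2, 3]);
}
```

Whether that inference is a problem is exactly what the following comments disagree on: the function never names a global, but it does touch memory next to what `p` points at.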
Jun 03 2012



--- Comment #13 from timon.gehr gmx.ch 2012-06-03 12:53:36 PDT ---
(In reply to comment #12)
 (In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.
 What does 'their own memory block' mean?
The allocated memory block it points into.
 The problem is a pointer is basically an unbounded array,
That is wrong. The pointer is bounded, but it is generally impossible to derive
the exact bounds from the pointer alone. This is why D has dynamic arrays.
 and, if the access isn't restricted somehow, makes the
 function dependent on global memory state.
? A function independent of memory state is useless.
 
 But why does it even matter? Isn't this discussion mostly philosophical?
 The compiler will happily assume that template functions are pure even when
 they clearly are not, and there isn't even a way to mark such functions as
 "impure" (w/o using hacks like calling dummy functions etc).
 Example - a function that is designed to operate on arrays, will always be
 called with a pointer to inside an array, and can assume that the previous and
 next element is always valid:
    void f4(T)(T* p) { p[-1] += p[0]; }
 The compiler thinks f4() is pure, when it clearly is not; optimizations based
 on that assumption are likely to result in corrupted data.
f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
is not allowed to perform optimizations that change defined program behavior.
Jun 03 2012



--- Comment #14 from art.08.09 gmail.com 2012-06-03 13:52:53 PDT ---
(In reply to comment #13)
 (In reply to comment #12)
 (In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.
 What does 'their own memory block' mean?
 The allocated memory block it points into.
But, as the bounds are unknown to the compiler, it does not have this
information; it has to assume everything is reachable via the pointer. This is
why I suggested above that only dereferencing a pointer should be allowed in
pure functions.
 The problem is a pointer is basically an unbounded array,
 That is wrong. The pointer is bounded, but it is generally impossible to derive
 the exact bounds from the pointer alone. This is why D has dynamic arrays.
And one way to make it work is to forbid dereferencing pointers and require fat
ones. Then the bounds would be known. But I don't think anybody would want to
write "f(pointer_to_some_struct[0..1])"...
 and, if the access isn't restricted somehow, makes the
 function dependent on global memory state.
 ? A function independent of memory state is useless.
int n(int i) {return i+42;}
 
 But why does it even matter? Isn't this discussion mostly philosophical?
 The compiler will happily assume that template functions are pure even when
 they clearly are not, and there isn't even a way to mark such functions as
 "impure" (w/o using hacks like calling dummy functions etc).
 Example - a function that is designed to operate on arrays, will always be
 called with a pointer to inside an array, and can assume that the previous and
 next element is always valid:
    void f4(T)(T* p) { p[-1] += p[0]; }
 The compiler thinks f4() is pure, when it clearly is not; optimizations based
 on that assumption are likely to result in corrupted data.
 f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
 is not allowed to perform optimizations that change defined program behavior.
f4 isn't pure, by any definition: it depends on (or in this example modifies)
state which the caller may not even consider reachable.

The compiler can assume that a pure function does not access any mutable state
other than what can be directly or indirectly reached via the arguments; that
is what function purity is all about. If the compiler has to assume that a pure
function that takes a pointer argument can read or modify everything, the
"pure" tag becomes worthless. And what's worse, it allows other "truly" pure
functions to call our immoral one.

Hmm, another way out of this could be to require all pointer args in a pure
function to target 'immutable', but that, again, seems too limiting;
"bool f(in Struct* s)" could not be pure.
Jun 03 2012



--- Comment #15 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-03 14:40:12
PDT ---
The _only_ thing that the pure attribute means by itself is that that function
cannot directly access any mutable global or static variables. That is _all_.
It means _nothing_ else. It can mess with pointers. It can mess with in, ref,
out, and lazy parameters. It can mess with the elements in a slice (thereby
altering external state). It can mess with mutable global or static variables
_indirectly_ via the arguments that it's passed (e.g if a pointer or ref is
passed to a global variable). It just cannot _directly_ access any mutable
global or static variables.

pure by itself indicates a weakly pure function. That function enables _zero_
optimizations. It is _not_ pure in the sense that the functional or
mathematical community would consider pure. It is not even _trying_ to be pure
in that sense. What weak purity does is enable _strong_ purity to actually be
useful.

When the compiler can guarantee that all of a pure function's arguments
_cannot_ be altered by that function, _then_ it is strongly pure. Currently,
that guarantee is in effect only when all of the parameters of the function are
immutable or implicitly convertible to immutable. It could be extended to const
parameters in the case when they're passed immutable arguments, but that isn't
currently done.

A strongly pure function cannot alter its arguments at all, but it _can_
allocate memory, and it _can_ mutate any of its local state. _weakly_ pure
functions can therefore be called from within a strongly pure function, because
the only state that they can alter is the state of what's passed to them
(because the fact that they're marked with pure means that they cannot access
mutable global or mutable static state except via their arguments), and the
only state that the strongly pure function _can_ pass to them is local to it,
because it can't access global or static mutable state any more than they can,
and it can't even access it via its arguments, because it's strongly pure.
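
A sketch of the arrangement described in the paragraph above, with hypothetical names `fill` and `sumOfFilled`: a function taking only value parameters allocates, hands its own local buffer to a weakly pure helper that mutates it, and returns a plain value.

```d
// Weakly pure helper: mutates only what it is handed.
void fill(int[] buf, int value) pure
{
    foreach (ref x; buf)
        x = value;
}

// Value parameters only. Everything it mutates is local to the call.
int sumOfFilled(size_t n, int value) pure
{
    auto buf = new int[n];  // allocation is allowed in pure code
    fill(buf, value);       // weakly pure call on local state
    int s;
    foreach (x; buf)
        s += x;
    return s;
}

void main()
{
    assert(sumOfFilled(4, 3) == 12);
    assert(sumOfFilled(4, 3) == sumOfFilled(4, 3));  // same arguments, same result
}
```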

This is all very clear and well-defined.

Having pointers sent off into la-la land doing unsafe @system stuff is a
_completely_ separate issue. You can break pretty much _anything_ with @system
code. You could even cast a function which called writeln so that the
signature was pure and then call it from a pure function. All bets are off when
you're in @system land. It's _your_ job to make sure that your code isn't doing
something completely screwy at that point. Any function or operation which the
compiler doesn't consider pure would still make a templated function be
considered impure in such cases, but because it's @system, you can trick it if
you want to (e.g. by casting a function's signature). But it's @system code -
unsafe code - so it's your fault at that point, not the compiler's.

I really don't know how the documentation could be much clearer. ref and
pointer arguments aren't "returned." Only the return value is returned. And
arguments are clearly the arguments to the function. And as long as the
compiler can determine that nothing has been done to an argument to alter it,
it's going to consider it to be the same value (and it's going to be
_extremely_ conservative about that - even altering a reference or pointer of
the same type would make its value be considered different, because they both
might point to the same thing).

As for stuff like strlen, in that case, you're doing the @system thing of
saying that yes, I know what I'm doing. I know that this function isn't marked
as pure, because it's a C function, but I also know that it _is_ actually pure.
I know that it won't access global mutable state. So, I will mark it as pure so
that it can be used in pure code. I'm telling the compiler that I know better
than it does. And in this case, I do. If I didn't, then I'd have a bug, and
it would be my fault, because I told the compiler what was best, and I was
wrong. At that point, it's up to me to make sure that the compiler's
guarantees aren't being violated. That's @system for you. D is a systems
programming language. You can do that sort of thing.
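
The strlen case can be sketched like this. It assumes we write our own prototype for the C function rather than importing an existing binding, and `cLength` is a hypothetical wrapper; the compiler does not check the `pure` on the prototype, it simply takes our word for it.

```d
// Our own declaration of the C library function. The `pure` here is a promise
// made by the programmer, not something the compiler verifies.
extern (C) size_t strlen(const(char)* s) pure nothrow @nogc;

// Because of that promise, pure D code may now call it.
size_t cLength(const(char)* s) pure
{
    return strlen(s);
}

void main()
{
    // D string literals are zero-terminated and convert to const(char)*.
    assert(cLength("hello") == 5);
}
```

If the pointer does not lead to a zero-terminated string, the promise is broken and the responsibility lies with whoever wrote the declaration.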

Jun 03 2012



--- Comment #16 from art.08.09 gmail.com 2012-06-03 15:50:29 PDT ---
(In reply to comment #15)
 pure by itself indicates a weakly pure function. That function enables _zero_
Inventing terminology doesn't help, especially when the result is so confusing.
 optimizations. It is _not_ pure in the sense that the functional or
 mathematical community would consider pure. It is not even _trying_ to be pure
 in that sense. What weak purity does is enable _strong_ purity to actually be
 useful.
 
 When the compiler can guarantee that all of a pure function's arguments
 _cannot_ be altered by that function, _then_ it is strongly pure. Currently,
 that guarantee is in effect only when all of the parameters of the function are
 immutable or implicitly convertible to immutable. It could be extended to const
 parameters in the case when they're passed immutable arguments, but that isn't
 currently done.
[...]

tl;dr. The bugtracker is probably not the right place for this discussion; we
could move it to the ML, but talking about it only makes sense if D can be
fixed; otherwise we would be wasting our time...

Limiting "pure" to just immutable data would work indeed, but it's much too
limiting.

   struct S {
      int a, b;
      int[64] c;
      bool f() const pure { return a || b; }
   }

   int g(S* p) {
      int r;
      foreach (i; 0..64)
         if (p.f())
            r |= p.c[i];
      return r;
   }

Using your "weak pure" definition, f's "pure" would be a NOOP - that is not
what most people would expect, and is not a sane purity implementation. It's
not a problem for trivial examples such as this one because inlining should
take care of it, but it would make "pure" almost useless in real code, as it
would almost never be, to use your terminology again, "strongly" pure (and
couldn't be moved out of the loop).

Note that, even when using your "strong purity" definition, the compiler still
does the wrong thing - some of the examples I gave previously in this bug are
(and others can be trivially modified to be) inferred as "strongly" pure
functions, when they are not pure at all.
Jun 03 2012

--- Comment #17 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-03 16:02:19
PDT ---
They aren't _my_ definitions. They're official. They've been discussed in the
newsgroup. They've even been used by folks like Walter Bright in talks at
conferences. How purity is implemented in D has been discussed and was decided
a while ago. It works well and is not going to change. Weak purity solved a
real need. All we had before was strong purity, and it was almost useless,
because it was so limited. It is _far_ more useful now than it was before.

A pure function is clearly defined as a function which cannot access global or
static state which is mutable. It doesn't matter how other languages use the
term pure. That's how D uses it. And in cases where a function is strongly
pure, you _do_ get the optimizations based on passing the same arguments to the
same pure function multiple times that you'd expect from a more functional
language.
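
As a rough illustration of the distinction drawn above, here is a minimal sketch. The names `appendSquare` and `sumOfSquares` are hypothetical and chosen only for this example. A weakly pure helper may mutate what it is handed, and a function whose parameters are all implicitly convertible to immutable can still be built on top of it:

```d
// Weakly pure: may mutate data reachable through its mutable parameter,
// but cannot touch mutable global or static state.
void appendSquare(ref int[] acc, int x) pure
{
    acc ~= x * x;
}

// All parameters are implicitly convertible to immutable, so this is the
// kind of signature the comment above calls strongly pure.
int sumOfSquares(int n) pure
{
    int[] acc;
    foreach (i; 1 .. n + 1)
        appendSquare(acc, i); // a pure function may call weakly pure helpers
    int total = 0;
    foreach (v; acc)
        total += v;
    return total;
}

void main()
{
    assert(sumOfSquares(3) == 14); // 1 + 4 + 9
}
```

Whether a compiler actually elides a repeated `sumOfSquares(3)` call is up to its optimizer; the sketch only shows what the type system accepts.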

If you don't like how D's pure works, that's fine - you're free to have your
own opinion, be it dissenting or otherwise - but how pure works in D is _not_
going to change. If bugs are found in the compiler's implementation of it, they
will be addressed, but at this point, the design is what it is.

Jun 03 2012

--- Comment #18 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 09:38:21 MSD ---
(In reply to comment #15)
 I really don't know how the documentation could be much clearer.
Once it has examples showing which asserts must/may/shouldn't pass and/or (I prefer "and") what optimizations can be done. Even the "Setting Dynamic Array Length" section has such examples, though its topic is far simpler.
 As for stuff like strlen, in that case, you're doing the @system thing of
 saying that yes, I know what I'm doing.
And what is still missing is an answer to "What exactly does these pure functions consider as `argument value` and as `returned value`" from my original question, because some read it as "only pointer dereferencing" and others as "access to any logically accessible address".

Again, all misunderstanding of pure functions in D can be easily solved by just adding (lots of) examples with difficult cases to the docs. IMHO, Jonathan M Davis, for example, would save a lot of his own time (and ours too) by just adding such examples with minimal comments to the docs instead of writing such long answers.
Jun 03 2012

--- Comment #19 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-03 22:58:33
PDT ---
I honestly don't understand why much in the way of examples are needed. The
documentation explains what pure is. When the compiler is able to optimize out
calls to pure functions is an implementation detail - just like optimizations
with const or immutable are. You use pure wherever you can, and the compiler
will optimize where it can.

The documentation could go into more detail on weakly pure vs strongly pure
(since it doesn't mention either), but that's pretty much the only relevant
improvement that I can think of, and I know that Don would be annoyed by that,
since he wants the terms strongly pure and weakly pure to die and just leave
them as implementation details (though I think that he's the only one who
really feels that way).

I think that there's a lot of overthinking of this going on here. The
documentation quite clearly states what a pure function is and what it can and
can't do. I don't see how more examples would really help much with that. But
if anyone has an idea that they think will improve the documentation, then feel
free to create a pull request with the changes.

Jun 03 2012

--- Comment #20 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 11:54:40 MSD ---
(In reply to comment #19)
 I honestly don't understand why much in the way of examples are needed.
OK. I have written some examples. Are they too obvious to be in the docs? Honestly, I'll be amazed if most D programmers have thought about most of these cases.

Examples:

pure functions (not sure if @system only or @safe too) in D are guaranteed to be pure only if used according to their documentation. There are no guarantees otherwise.
---
/// b argument has to be true or result will depend on global state
size_t f(size_t i, bool b) pure; // strongly pure

void main()
{
    size_t i1 = f(1, false); // can depend on global state
    size_t i2 = f(1, false); // f is free to produce different result here
    // And if second f call is optimized out using i2 = i1,
    // (because f is strongly pure) a program will behave
    // differently in release mode so be careful.
}
---
For @system pure functions, it's your responsibility to pass correct arguments to functions. These functions (even strongly pure ones) can be impure for "incorrect" arguments and can even result in "undefined behavior".
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

/// cstr must be zero-ended
size_t myStrlen(in char[] cstr) pure // strongly pure
{
    return strlen(cstr.ptr);
}

void main()
{
    char[3] str = "abc";
    // str isn't zero-ended so myStrlen call
    // results in undefined behavior.
    size_t l1 = myStrlen(str);
    size_t l2 = myStrlen(str); // can give different result
}
---
@system strongly pure functions often can't be optimized out:
---
extern (C) size_t strlen(in char* s) nothrow pure; // strongly pure

void f(in char* cstr, int* n) pure
{
    // strlen has to be executed every iteration,
    // because compiler doesn't know if n is
    // connected with cstr someway
    for(size_t i = 0; i < strlen(cstr); ++i)
    {
        *n += cstr[i];
    }
}
---
The same applies even if these functions have no pointers/arrays in their signatures:
---
size_t f(size_t) nothrow pure; // strongly pure

void g(size_t i1, ref size_t i2) pure
{
    // f has to be executed every iteration,
    // because compiler doesn't know if i1 is
    // connected with i2 someway (f can expect
    // that its argument is an address of i2)
    for(size_t i = 0; i < f(i1); ++i)
    {
        i2 *= 3;
    }
}
---
One has to carefully watch whether a function is strongly pure by its signature (the compiler is guaranteed to determine function purity type by its signature only to prevent different behavior between cases with/without a signature):
---
void f(size_t x) pure // strongly pure, can't have side effects
{
    *cast(int*) x = 5; // undefined behavior
}

__gshared int tmp;
void g(size_t x, ref int dummy = tmp) pure // weakly pure, can have side effects
{
    *cast(int*) x = 5; // correct
}
---
Jun 04 2012

--- Comment #21 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-04 01:22:53
PDT ---
???

Why would you be marking a function as pure if it can access global state? The
compiler would flag that unless you cheated through casts or the use of
extern(C) functions where you marked the declaration as pure but not the
definition (since pure isn't part of the name mangling for extern(C)
functions).

Also, none of your examples using in are strongly pure. At present, the
parameters must be _immutable_ or implicitly convertible to immutable for the
function to be strongly pure. The only way that const or in would work is if
they were passed immutable arguments, but the compiler doesn't treat that as
strongly pure right now.

@system has _nothing_ to do with purity. There's no need to bring it up. It's
just that @system will let you do dirty tricks (such as casting) to get around
pure. Certainly, an @system pure function isn't pure based on its arguments
unless it's doing something very wrong. The function would have to be
specifically trying to break purity to do that, and then it's the same as when
you're dealing with const and the like. There's no need to even bring it up.
It's a given with _anything_ where you can cast to do nasty @system stuff.

Adding a description of weakly pure vs strongly pure to the documentation may
be valuable, but adding any examples like these would be pointless without it.
Also, if you'll notice, the documentation in general is very light on
unnecessary examples. It explains exactly what the feature does and gives
minimal examples on it. Any that are added should add real value.

pure functions cannot access global mutable state or call any other functions
which aren't pure. The compiler will give an error if a function marked as pure
does either of those things. What the compiler does in terms of optimizations
is up to its implementation. I don't see how going into great detail on whether
this particular function signature or that particular function signature can be
optimized is going to help much.
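
The compile-time check described here can be observed directly. The following is a minimal sketch (the names `counter`, `limit`, `readLimit` and `bump` are made up for illustration) that uses `__traits(compiles, ...)` to show which function bodies the compiler accepts as pure:

```d
int counter;              // mutable module-level (thread-local) variable
immutable int limit = 10; // immutable module-level data

// Reading immutable global data is allowed in a pure function.
int readLimit() pure { return limit; }

// Not pure: it modifies mutable global state.
int bump() { return ++counter; }

void main()
{
    // Reading a mutable global from a pure function literal is rejected.
    static assert(!__traits(compiles, () pure { return counter; }));
    // Calling an impure function from a pure one is rejected as well.
    static assert(!__traits(compiles, () pure { return bump(); }));
    // Reading immutable data is accepted.
    static assert( __traits(compiles, () pure { return limit; }));

    assert(readLimit() == 10);
}
```

The checks say nothing about which calls an optimizer may later elide; they only show the two restrictions the paragraph above names.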

It seems to me that the core problem is that many programmers are having a hard
time understanding that all that pure means is that pure functions cannot
access global mutable state or call any other functions which aren't pure. They
keep thinking that it means more than that, and it doesn't. The compiler will
use that information to do optimizations where it can (which aren't even always
related to strongly pure - e.g. combining const and weakly pure enable
optimizations, just not the kind which elide function calls). If programmers
would just believe what the description says about what pure means and stop
trying to insist that it must mean more than that, I think that they would be a
lot less confused. In some respects, discussing stuff like weakly pure and
strongly pure just confuses matters. They're effectively implementation details
of how some pure-related optimizations are triggered.

It's so very simple and understandable if you leave it at something like "pure
functions cannot access global or static variables which are at all mutable -
either by the pure function or anything else - and they cannot call other
functions which are not pure." That tells you all that you really need to know,
and is quite valuable even if _zero_ optimizations were done based on pure,
because it helps immensely in being able to think about and understand your
program, because you know that a pure function cannot mutate anything which
isn't passed to it. I think that you're just overthinking this and
overcomplicating things.

Jun 04 2012

--- Comment #22 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 13:07:21 MSD ---
(In reply to comment #21)
 Why would you be marking a function as pure if it can access global state? The
 compiler would flag that unless you cheated through casts or the use of
 extern(C) functions where you marked the declaration as pure but not the
 definition (since pure isn't part of the name mangling for extern(C)
 functions).
From your comment before:
 As for stuff like strlen, in that case, you're doing the @system thing of
 saying that yes, I know what I'm doing. I know that this function isn't marked
 as pure, because it's a C function, but I also know that it _is_ actually pure.
`strlen` is now pure (marked by Andrei Alexandrescu) and it can access global state once it is used with a non-zero-ended string. I just made the situation more evident.
 Also, none of your examples using in are strongly pure. At present, the
 parameters must be _immutable_ or implicitly convertible to immutable for the
 function to be strongly pure. The only way that const or in would work is if
 they were passed immutable arguments, but the compiler doesn't treat that as
 strongly pure right now.
From your comment before:
 When the compiler can guarantee that all of a pure function's arguments
 _cannot_ be altered by that function, _then_ it is strongly pure.
So I just don't know how strlen can change its argument...
 @system has _nothing_ to do with purity. There's no need to bring it up.
IMHO, yes it does, because @safe and @system pure functions look very different to me. And yes, I can be wrong.
 It's just that @system will let you do dirty tricks (such as casting) to get
 around pure. Certainly, an @system pure function isn't pure based on its
 arguments unless it's doing something very wrong. The function would have to be
 specifically trying to break purity to do that, and then it's the same as when
 you're dealing with const and the like. There's no need to even bring it up.
 It's a given with _anything_ where you can cast to do nasty @system stuff.
Is strlen doing something very wrong or specifically trying to break purity when it accesses random memory?
 Adding a description of weakly pure vs strongly pure to the documentation may
 be valuable, but adding any examples like these would be pointless without it.
 Also, if you'll notice, the documentation in general is very light on
 unnecessary examples. It explains exactly what the feature does and gives
 minimal examples on it. Any that are added should add real value.
 
 pure functions cannot access global mutable state or call any other functions
 which aren't pure. The compiler will give an error if a function marked as pure
 does either of those things. What the compiler does in terms of optimizations
 is up to its implementation. I don't see how going into great detail on whether
 this particular function signature or that particular function signature can be
 optimized is going to help much.
Yes, it will, because as I wrote:
 Once it will have examples showing what asserts have to/may/shouldn't pass
 and/or (I prefer and) what optimizations can be done.
optimizations = which asserts pure functions should confirm = what a pure function is
 It seems to me that the core problem is that many programmers are having a hard
 time understanding that all that pure means is that pure functions cannot
 access global mutable state or call any other functions which aren't pure. They
 keep thinking that it means more than that, and it doesn't. The compiler will
 use that information to do optimizations where it can (which aren't even always
 related to strongly pure - e.g. combining const and weakly pure enable
 optimizations, just not the kind which elide function calls). If programmers
 would just believe what the description says about what pure means and stop
 trying to insist that it must mean more than that, I think that they would be a
 lot less confused. In some respects, discussing stuff like weakly pure and
 strongly pure just confuses matters. They're effectively implementation details
 of how some pure-related optimizations are triggered.
strlen and other @system functions do access global state in some cases. And they are pure. And I'm confused because there is no explanation of _how exactly pure functions can access global state_.
 It's so very simple and understandable if you leave it at something like "pure
 functions cannot access global or static variables which are at all mutable -
 either by the pure function or anything else - and they cannot call other
 functions which are not pure."
No. They call everything they want and do everything they want (see druntime pull 198). They just should behave like pure functions for the user. And I don't clearly understand what it means "to behave like a pure function". That's why this issue was created. That's why I want to see which asserts pure functions should confirm.
 That tells you all that you really need to know,
 and is quite valuable even if _zero_ optimizations were done based on pure,
Again, I'm not interested in optimizations for their own sake now. They can just explain what a pure function is.
 because it helps immensely in being able to think about and understand your
 program, because you know that a pure function cannot mutate anything which
 isn't passed to it.
It gives me nothing because I still don't know what is passed to it, as I wrote:
 What exactly does these pure functions consider as `argument
 value` and as `returned value`?
 I think that you're just overthinking this and
 overcomplicating things.
Maybe. Just like the contrary case.
Jun 04 2012

--- Comment #23 from timon.gehr gmx.ch 2012-06-04 02:22:54 PDT ---
(In reply to comment #14)
 (In reply to comment #13)
 (In reply to comment #12)
 (In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.
What does 'their own memory block' mean?
The allocated memory block it points into.
But, as the bounds are unknown to the compiler, it does not have this information, it has to assume everything is reachable via the pointer.
1. It does not need the information. Dereferencing a pointer outside the valid bounds results in undefined behavior. Therefore the compiler can just ignore the possibility.

2. It can gain some information at the call site. Eg:

int foo(const(int)* y)pure;

void main(){
    int* x = new int;
    int* y = new int;
    auto a = foo(x);
    auto b = foo(y);
    auto c = foo(x);
    assert(a == c);
}

3. Aliasing is the classic optimization killer even without 'pure'.

4. Invalid use of pointers can break every other aspect of the type system. Why single out 'pure' ?
 This is
 why i suggested above that only dereferencing a pointer should be allowed in
 pure functions.
 
This is too restrictive.
 And one way to make it work is to forbid dereferencing pointers and require fat
 ones. Then the bounds would be known.
The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.
 and, if the access isn't restricted somehow, makes the
 function dependent on global memory state.
? A function independent of memory state is useless.
int n(int i) {return i+42;}
Where do you store the parameter 'i' if not in some memory location?
 f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
 is not allowed to perform optimizations that change defined program behavior.
f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
Then it is the caller's fault. What is considered reachable is well-defined, and f4 must document its valid inputs.
 The compiler can
 assume that a pure function does not access any mutable state other than what
 can be directly or indirectly reached via the arguments -- that is what
 function purity is all about. If the compiler has to assume that a pure
 function that takes a pointer argument can read or modify everything, the
 "pure" tag becomes worthless.
No pointer _argument_ necessary.

int foo()pure{
    enum int* everything = cast(int*)...;
    return *everything;
}

As I already pointed out, unsafe language features can be used to subvert the type system. If pure functions should be restricted to the safe subset, they can be marked @safe, or compiled with the -safe compiler switch.
 And what's worse, it allows other "truly" pure
 function to call our immoral one. 
 
Nothing wrong with that.
 Hmm, another way out of this could be to require all pointers args in a pure
 function to target 'immutable' - but that, again, seems too limiting; "bool
 f(in Struct* s)" could not be pure.
This is why the restriction was dropped.
Jun 04 2012

--- Comment #24 from timon.gehr gmx.ch 2012-06-04 02:41:16 PDT ---
(In reply to comment #22)
 
 `strlen` is now pure (marked by Andrei Alexandrescu) and it can access global
 state once used with non-zero-ended string. I just made situation more evident.
 
It may not be used with a non-zero-ended string. See eg. http://www.cplusplus.com/reference/clibrary/cstring/strlen/
Jun 04 2012

--- Comment #25 from klickverbot <code klickverbot.at> 2012-06-04 03:07:52 PDT
---
I am partly playing Devil's advocate here, but:

(In reply to comment #23)
 This is
 why i suggested above that only dereferencing a pointer should be allowed in
 pure functions.
 
This is too restrictive.
Why?
 And one way to make it work is to forbid dereferencing pointers and require fat
 ones. Then the bounds would be known.
The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.
There is an important semantic difference between these two – a slice is a bounded region of memory, whereas a pointer per se just represents a reference to a single value.
---
int foo(int* p) pure {
  return *(p - 1); // Is this legal?
}

auto a = new int[10];
foo(a.ptr + 1);
---
 ? A function independent of memory state is useless.
int n(int i) {return i+42;}
Where do you store the parameter 'i' if not in some memory location?
In a register, but that's besides the point – which is that the type of i, int, makes it clear that n depends on exactly four bytes of memory. In »struct Node { Node* next; } void foo(Node* n) pure;«, on the other hand, following your interpretation foo() might depend on an almost arbitrarily large amount of memory (consider e.g. uninitialized memory in the area between a heap-allocated Node instance and the end of the block where it resides, which, if interpreted as Node instance(s), might have »false pointers« to other memory blocks, etc.).
 f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
 is not allowed to perform optimizations that change defined program behavior.
f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
Then it is the caller's fault. What is considered reachable is well-defined […]
Is it? Could you please repeat the definition then, and point out how this is clear from the definition of purity according to the spec, »Pure functions are functions that produce the same result for the same arguments«.
 and f4 must document its valid inputs.
---
/// Passing anything other than `false` is illegal.
int g_state;
void foo(bool neverTrue) pure {
  if (neverTrue) g_state = 42;
}
---
Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but isn't this too permissive an interpretation, as the type system can't actually guarantee it? Shouldn't a cast to pure at the _call site_ rather be required if called with known good values, just as in other cases where the type system can't prove a certain invariant, but the programmer can? Purity by convention works just fine without the pure keyword as well…
Jun 04 2012

--- Comment #26 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-04 03:19:22
PDT ---
I'd actually argue that the line "Pure functions are functions that produce the
same result for the same arguments" should be removed from the spec.
Ostensibly, yes. The same arguments will result in the same result, but that
doesn't really have anything to do with how pure is defined. It's more like
it's a side effect of the fact that you can't access global mutable state. It's
true that the compiler will elide additional function calls within an
expression in cases where the same function is called multiple times with the
same arguments and the compiler can guarantee that the result will be the same,
but that's arguably an implementation detail of the optimizer.

While the origin and original motivation for pure in D was to enable
optimizations based on functional purity (multiple calls to the same function
with the same arguments are guaranteed to have the same results), that's not
really what pure in D does now, and talking about that clouds the issue
something awful, as this bug report demonstrates.

Pure means solely that the function cannot access any global or static
variables which can be mutated either directly or indirectly once instantiated
and that the function cannot call any other functions which are not pure. That
enables the whole "same result for the same arguments" thing, but it does _not_
mean that in and of itself. The simple fact that an argument could have a
function on it which returns the value of a mutable global variable without
that variable being part of its state at all negates that.

Jun 04 2012

--- Comment #27 from klickverbot <code klickverbot.at> 2012-06-04 03:38:12 PDT
---
(In reply to comment #26)
 While the origin and original motivation for pure in D was to enable
 optimizations based on functional purity (multiple calls to the same function
 with the same arguments are guaranteed to have the same results), that's not
 really what pure in D does now, and talking about that clouds the issue
 something awful, as this bug report demonstrates.
I think you've provided a good explanation of the high-level design of the pure keyword, more than once, but it seems that you are missing that this issue, at least as stated in comment 3, is actually about a very specific detail: The extent to which memory reachable by manipulating passed-in pointers is still considered local, i.e. accessible by pure functions.
Jun 04 2012

--- Comment #28 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-04 03:39:33
PDT ---
https://github.com/D-Programming-Language/d-programming-language.org/pull/128

Jun 04 2012

--- Comment #29 from timon.gehr gmx.ch 2012-06-04 03:39:52 PDT ---
(In reply to comment #25)
 I am partly playing Devil's advocate here, but:
 
 (In reply to comment #23)
 This is
 why i suggested above that only dereferencing a pointer should be allowed in
 pure functions.
 
This is too restrictive.
Why?
Because safety is an orthogonal concern. eg. strlen is a pure function. By the same way of reasoning, all unsafe features could be banned in all parts of the code, not just in pure functions.
 
 And one way to make it work is to forbid dereferencing pointers and require fat
 ones. Then the bounds would be known.
The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.
There is an important semantic difference between these two – a slice is a bounded region of memory, whereas a pointer per se just represents a reference to a single value.
Yes, 'per se'. Effectively, it references all memory in the same allocated memory block. (This is also the view taken by the GC.)
 ---
 int foo(int* p) pure {
   return *(p - 1); // Is this legal?
 }
 
Whether it is legal depends on whether or not *(p-1) is part of the same memory block. A conservative analysis (as is done in @safe code) would have to flag the access as illegal.
 auto a = new int[10];
 foo(a.ptr + 1);
 ---
a.ptr is a pointer. The arithmetic is flagged as illegal in @safe code even though it is safe. What do the examples show?
 
 ? A function independent of memory state is useless.
int n(int i) {return i+42;}
Where do you store the parameter 'i' if not in some memory location?
In a register, but that's besides the point
Indeed, because a register is just memory after all.
 – which is that the type of i, int,
 makes it clear that n depends on exactly four bytes of memory. In »struct Node
 { Node* next; } void foo(Node* n) pure;«, on the other hand, following your
 interpretation foo() might depend on an almost arbitrarily large amount of
 memory (consider e.g. uninitialized memory in the area between a heap-allocated
 Node instance and the end of the block where it resides,
 which, if interpreted as Node instance(s), might have »false pointers« to
 other memory blocks, etc.).
 
The language does not define such a thing. Accessing this area therefore results in undefined behavior.
 f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
 is not allowed to perform optimizations that change defined program behavior.
f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
Then it is the caller's fault. What is considered reachable is well-defined […]
Is it? Could you please repeat the definition then,
It is written down in the C standard. There is no formal specification for D.
 and point out how this is
 clear from the definition of purity according to the spec,
This would not be defined in the pages about purity, but rather in the pages about pointer arithmetic, which are missing, presumably because they would be the same as in C.
 »Pure functions are
 functions that produce the same result for the same arguments«.
 
This is not a definition of the 'pure' keyword. It relies on informal terms such as 'the same' and does not require annotation of a function. Therefore the sentence should be dropped from the documentation. If a function is marked with 'pure', then it may not reference mutable free variables.
 and f4 must document its valid inputs.
 ---
 /// Passing anything other than `false` is illegal.
 int g_state;
 void foo(bool neverTrue) pure {
   if (neverTrue) g_state = 42;
 }
 ---
 Should this be allowed to be pure? Well, if strlen is, then ostensibly yes, but
No, because it is trivial to devise an equivalent implementation that does not require the compiler to read documentation comments:

int g_state;
void foo(bool neverTrue) pure in{assert(!neverTrue);} body { }

The same does not hold for 'strlen', therefore the analogy immediately breaks down.
 isn't this too permissive of an interpretation, as the type system can't
 actually guarantee it? Shouldn't rather a cast to pure at the _call site_ be
 required if called with known good values, just as in other cases where the type
 system can't prove a certain invariant, but the programmer can?
The type system of an unsafe language cannot prove _any_ invariants, because unsafe operations may result in undefined behavior. This does not imply we'd better have to drop the entire type system.
 Purity by convention works just fine without the pure keyword as well…
This is not only about purity by convention, it is about memory safety by convention. In safe code, all the concerns raised immediately disappear.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #30 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-04 03:42:40
PDT ---
 I think you've provided a good explanation of the high-level design of the
pure keyword, more than once, but it seems that you are missing that this
issue, at least as stated in comment 3, is actually about a very specific
detail: The extent to which memory reachable by manipulating passed in pointers
is still considered local, i.e. accessible by pure functions.
pure doesn't restrict pointers in any way shape or form. That's an @safe/@trusted/@system issue, and is completely orthogonal to pure.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #31 from klickverbot <code klickverbot.at> 2012-06-04 06:18:02 PDT
---
(In reply to comment #30)
 pure doesn't restrict pointers in any way shape or form. That's an
 @safe/@trusted/@system issue, and is completely orthogonal to pure.
I guess I _might_ have understood what purity entails and what it doesn't… To quote myself, the question here is the extent to which memory reachable by manipulating passed in pointers is still considered local, i.e. accessible by pure functions. This, conceptually, has nothing to do with @safe/@trusted/@system, even though @safe code cannot manipulate pointers for other reasons.

There are two options: Either, allow pure functions taking pointers to read other memory locations in the same block of allocated values, or restrict access to just the data directly pointed at (which incidentally is also what @safe does, but, again, that's not relevant). Both options are equally valid, and I think the current »spec« is not clear on which one should apply.

The first option, which is currently implemented in DMD, allows functions like strlen() to be pure. On the other hand, it also makes the semantics/implications of `pure` a lot more complex, because it links it to something which is fundamentally not expressible by the type system, namely that for any level of indirection, surrounding parts of the memory might be accessible or not, depending on how it was originally allocated. This is assuming C semantics, because, as Timon mentioned as well, OTOH the D docs don't have a formal definition for this at all.

For example, consider »struct Node { int val; Node* next; } int foo(in Node* head) pure;«. Using the first rule, it is almost impossible to figure out statically what parts of the program state »foo(someHead)« depends on, because if any of the Node instances in the chain was allocated as part of a contiguous block (i.e. array), it would be legal for foo() to read them as well, even though the function calling foo() might not even have been involved in the construction of the list. Thus, the compiler is forced to always assume the worst case in terms of optimization (at least without elaborate DFA), which, in most D programs, is needlessly conservative.
The second option avoids such complications, and allows function calls with parameters on the heap (and thus pointers) to receive the same kind of optimizations as if the parameters were passed on the stack, which might be impractical. It is also the expected behavior if you are thinking of a pointer literally just as an indirection to a single value stored somewhere else.

Personally, I am not sure what is the better choice; the second option seems like the cleaner design, but I can see the merits of the first one as well. But that's not my point – I am just trying to convince you that the »spec« (or whatever it should really be called) needs improvement in this area, because it frequently confuses people. Your revised version (#128) doesn't define »through their arguments« either, yet this is the crucial point.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au


--- Comment #32 from Don <clugdbug yahoo.com.au> 2012-06-04 06:31:41 PDT ---
(In reply to comment #31)
 (In reply to comment #30)
 pure doesn't restrict pointers in any way shape or form. That's an
 @safe/@trusted/@system issue, and is completely orthogonal to pure.
I guess I _might_ have understood what purity entails and what it doesn't… To quote myself, the question here is the extent to which memory reachable by manipulating passed in pointers is still considered local, i.e. accessible by pure functions. This, conceptually, has nothing to do with @safe/@trusted/@system, even though @safe code cannot manipulate pointers for other reasons.
 
 There are two options: Either, allow pure functions taking pointers to read
 other memory locations in the same block of allocated values, or restrict
 access to just the data directly pointed at (which incidentally is also what
 @safe does, but, again, that's not relevant). Both options are equally valid,
 and I think the current »spec« is not clear on which one should apply.
 
 The first option, which is currently implemented in DMD, allows functions like
 strlen() to be pure. On the other hand, it also makes the
 semantics/implications of `pure` a lot more complex, because it links it to
 something which is fundamentally not expressible by the type system, namely
 that for any level of indirection, surrounding parts of the memory might be
 accessible or not, depending on how it was originally allocated. This is
 assuming C semantics, because, as Timon mentioned as well, OTOH the D docs
 don't have a formal definition for this at all.
 
 For example, consider »struct Node { int val; Node* next; } int foo(in Node*
 head) pure;«. Using the first rule, it is almost impossible to figure out
 statically what parts of the program state »foo(someHead)« depends on, because
 if any of the Node instances in the chain was allocated as part of a contiguous
 block (i.e. array), it would be legal for foo() to read them as well, even
 though the function calling foo() might not even have been involved in the
 construction of the list. Thus, the compiler is forced to always assume the
 worst case in terms of optimization (at least without elaborate DFA), which, in
 most D programs, is needlessly conservative.
That's correct. You should not expect *any* optimizations from weakly pure functions. The ONLY purpose of weakly pure functions is to increase the number of strongly pure functions. In all other respects, they are no different from an impure function.
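
Editor's illustration (not part of the original comment): a minimal sketch of the weak/strong distinction described above. The names `bump` and `countUp` are hypothetical. It assumes the usual reading of the terms in this thread: a weakly pure function may mutate what it reaches through its parameters, while a strongly pure one has no mutable indirections in its parameters.

```d
// Weakly pure: touches no mutable globals, but may mutate whatever is
// reachable through its mutable pointer parameter.
void bump(int* counter) pure nothrow @nogc
{
    ++*counter;
}

// Strongly pure: the parameter carries no mutable indirection, so the
// result depends only on the value of `n`. It can still use the weakly
// pure helper on its own local state.
int countUp(int n) pure nothrow @nogc
{
    int local = 0;
    foreach (_; 0 .. n)
        bump(&local);
    return local;
}
```

Without weak purity, `countUp` could not call `bump` and would have to inline the mutation by hand, which is the "increase the number of strongly pure functions" point made above.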
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #33 from klickverbot <code klickverbot.at> 2012-06-04 06:43:32 PDT
---
(In reply to comment #32)
 That's correct. You should not expect *any* optimizations from weakly pure
 functions. The ONLY purpose of weakly pure functions is to increase the number
 of strongly pure functions. In all other respects, they are no different from
 an impure function.
Const-pure functions invoked with immutable _arguments_ (even though parameters might only be const) can receive exactly the same amount of optimizations. Even if not implemented in DMD today (as are many other possible purity-related optimizations), this is very useful, because otherwise functions would have to accept immutable values just for the sake of optimization even though they could work with const values just as well otherwise.
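
Editor's illustration (not part of the original comment): a small sketch of the situation described above, with hypothetical names `total` and `twice`. The comment about what an optimizer could do is an assumption about what is permitted in principle, not a claim about what any compiler actually does.

```d
// Pure function that only needs a const view of its data, so it accepts
// mutable, const and immutable arrays alike.
int total(const(int)[] values) pure nothrow @nogc @safe
{
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

int twice(immutable(int)[] data) pure nothrow @nogc @safe
{
    // `data` is immutable, so nothing can change it between the two calls.
    // In principle an optimizer could evaluate `total(data)` only once here.
    return total(data) + total(data);
}
```

The same `total` called with a mutable `int[]` would not allow that reasoning, because another reference could modify the array between calls.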
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #34 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 19:08:08 MSD ---
(In reply to comment #33)
 (In reply to comment #32)
 That's correct. You should not expect *any* optimizations from weakly pure
 functions. The ONLY purpose of weakly pure functions is to increase the number
 of strongly pure functions. In all other respects, they are no different from
 an impure function.
Const-pure functions invoked with immutable _arguments_ (even though parameters might only be const) can receive exactly the same amount of optimizations. Even if not implemented in DMD today (as are many other possible purity-related optimizations), this is very useful, because otherwise functions would have to accept immutable values just for the sake of optimization even though they could work with const values just as well otherwise.
Have you noticed that, as I wrote in comment 20, strong unsafe pure functions like
---
size_t f(size_t) nothrow pure;
---
also almost always can't be optimized out?
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #35 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 19:18:33 MSD ---
For Jonathan M Davis: here (as before) when I say "optimization" I mean
"doesn't behave such way that can be optimized" which means "doesn't behave
such way that is expected/desired (IMHO)/etc.".

Example (for everybody):
---
int f(size_t) pure;

__gshared int tmp;
void g(size_t, ref int dummy = tmp) pure;

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b);
    assert(res == f(a)); // may fail, no guarantees by language!
}
---

So pure looks to me more than just useless. It looks dangerous because it
confuses people and forces them to think that the second `assert` will pass. At
least, with existing docs (or with pull 128).

Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #36 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-04 08:45:00
PDT ---
 int f(size_t) pure;
 __gshared int tmp;
 void g(size_t, ref int dummy = tmp) pure;
 void h(size_t a, size_t b) pure
 {
    int res = f(a);
    g(b);
    assert(res == f(a)); // may fail, no guarantees by language!
}
Your g(b) causes h to be impure, because it accesses tmp, which is __gshared.

Also, as far as eliding additional calls to pure functions, at present, they only occur within the same line, and I think that may only ever occur within the same expression (it's either expression or statement, I'm not sure which). So, the eliding of additional pure function calls is going to be quite rare. The _primary_ benefit of pure is how it enables you to reason about your code. You _know_ that f doesn't mess with anything other than the argument that you passed to it without having to look at its body at all.

Oh, and the assertion _is_ guaranteed to pass. a and res are both value types. Neither res nor a are passed to anything or accessed in any way other than in the lines with the calls to f, and even if g were impure, and it screwed with whatever argument was passed as the first argument to the h call, it wouldn't be able to mess with the value of a, because it was already copied.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #37 from klickverbot <code klickverbot.at> 2012-06-04 09:03:18 PDT
---
(In reply to comment #34)
 […] strong unsafe pure functions […]
Please note that @safe-ty of a function has nothing to do with purity. Yes, in a @system/@trusted pure function, it's easy to do impure things, but if you do, it's your fault, not that of the language/type system.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #38 from art.08.09 gmail.com 2012-06-04 09:08:38 PDT ---
(In reply to comment #23)
 (In reply to comment #14)
 (In reply to comment #13)
 (In reply to comment #12)
 (In reply to comment #11)
 Pointers may only access their own memory blocks, therefore exactly those
 blocks participate in argument value and return value.
What does 'their own memory block' mean?
The allocated memory block it points into.
But, as the bounds are unknown to the compiler, it does not have this information; it has to assume everything is reachable via the pointer.
1. It does not need the information. Dereferencing a pointer outside the valid bounds results in undefined behavior. Therefore the compiler can just ignore the possibility.
The problem is there are no "valid bounds". Unless you'd like to declare (char* p) {return p[1];} as invalid, which as you yourself say is restrictive (but IMO acceptable for pure functions, at least the ones that are automatically inferred as pure).
 2. It can gain some information at the call site. Eg:
 
 int foo(const(int)* y)pure;
 
 void main(){
     int* x = new int;
     int* y = new int;
     auto a = foo(x);
     auto b = foo(y);
     auto c = foo(x);
 
     assert(a == c);
 }
According to certain replies in this report, that assertion could fail. :) But I get what you're saying - now consider this foo() definition instead:
int foo()(const(int)* y) { int r; foreach (i; 0..size_t.max) r += y[i]; return r; }
/* same main () */
The compiler will treat foo() as pure, so if it would be able to act on the a==c assumption above, it could also do the same here. And now it would be completely wrong - the function doesn't even try to pretend that it's pure, yet it will be inferred as if it were and there's no (clean) way to prevent that. If the compiler optimizes based on a==c, it will miscompile the program. This is why restrictions on what is accessed via a pointer in a pure function are necessary. Note it only matters for templates/literals/lambdas, i.e. the cases where purity is inferred; the programmer can always add the purity tag when he knows it is (logically) safe (e.g. most C string functions). And yes, my example code doesn't make sense as-is, but it only serves to illustrate the problem; there are sane implementations of foo(T*p) which under the right conditions will have the same issues.

BTW, is my foo() above @safe? According to the compiler here - it is.
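
Editor's illustration (not part of the original comment): a bounded variant of the template above, showing that purity is inferred even though the body indexes a raw pointer. `sumFirst` and `caller` are hypothetical names. The sketch relies on D's attribute inference for template functions and on the rule that a `pure` function may only call `pure` functions.

```d
// Template function: no `pure` annotation, attributes are inferred from
// the body. Indexing the raw pointer does not prevent inferring `pure`.
int sumFirst()(const(int)* p, size_t n)
{
    int r = 0;
    foreach (i; 0 .. n)
        r += p[i];
    return r;
}

// Explicitly pure caller: this compiles only if `sumFirst!()` was
// inferred `pure`, since pure functions cannot call impure ones.
int caller(const(int)* p) pure
{
    return sumFirst(p, 3);
}
```

How many elements `p` really points at is known only to the programmer, which is exactly the information the type system lacks in this discussion.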
 3. Aliasing is the classic optimization killer even without 'pure'.
Yes. Maybe it's a good thing that D doesn't attempt to define it, given the amount of confusion something like "pure" causes...
 4. Invalid use of pointers can break every other aspect of the type system.
    Why single out 'pure' ?
It has nothing to do with "invalid use of pointers", unless, again, p[1] is deemed invalid.
 This is
 why I suggested above that only dereferencing a pointer should be allowed in
 pure functions.
 
This is too restrictive.
What else do you want to be able to do with a pointer in a pure function? Dereferencing it and working with the value itself should work, anything else? Note that you should be able to explicitly tell the compiler to assume something is pure even when the code accesses more than just the pointed-to element.
 And one way to make it work is to forbid dereferencing pointers and require fat
 ones. Then the bounds would be known.
The bounds are usually known only at runtime. The compiler does not have more to work with. From the compiler's point of view, an array access out of bounds and an invalid pointer dereference are very similar.
Having well defined aliasing rules would help, yes, but I think that's beyond the scope of this bug.
 and, if the access isn't restricted somehow, makes the
 function dependent on global memory state.
? A function independent of memory state is useless.
int n(int i) {return i+42;}
Where do you store the parameter 'i' if not in some memory location?
I said "global memory state". The parameters are *local* state, just like variables - they can not escape (you can't return their address) and the values depend only on function inputs. Arguments containing references can be seen as part of the global state, but those are explicitly defined as inputs that the function depends on. And that definition with respect to pointers is exactly what this bug is about.
 f4 _is_ 'pure' (it does not access non-immutable free variables). The compiler
 is not allowed to perform optimizations that change defined program behavior.
f4 isn't pure, by any definition - it depends on (or in this example modifies) state, which the caller may not even consider reachable.
Then it is the caller's fault. What is considered reachable is well-defined, and f4 must document its valid inputs.
f4() takes a pointer; AFAICT you've said above that it should be able to do more than just dereference it. So what exactly is considered reachable?
 The compiler can
 assume that a pure function does not access any mutable state other than what
 can be directly or indirectly reached via the arguments -- that is what
 function purity is all about. If the compiler has to assume that a pure
 function that takes a pointer argument can read or modify everything, the
 "pure" tag becomes worthless.
No pointer _argument_ necessary.
int foo()pure{ enum int* everything = cast(int*)...; return *everything; }
As I already pointed out, unsafe language features can be used to subvert the
p[i] can be just as dangerous as the cast. The question is - can the compiler treat a function containing these constructs as still pure? If the programmer says so, it's fine - purity by convention works.
 type system. If pure functions should be restricted to the safe subset, they
 can be marked @safe, or compiled with the -safe compiler switch.
int foo()(int* y) @safe { int r; foreach (i; 0..size_t.max) r += y[i]++; return r; }
But it's not related to this bug.
 And what's worse, it allows other "truly" pure
 function to call our immoral one. 
 
Nothing wrong with that.
It is wrong - if a pure function can be optimized out and it calls another one that has side effects. Again, the case when a human incorrectly tags a function is not really the problem, it's when the compiler does that behind the programmer's back.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #39 from klickverbot <code klickverbot.at> 2012-06-04 09:13:14 PDT
---
(In reply to comment #38)
 BTW, is my foo() above @safe? According to the compiler here - it is.
If so, please open a new issue – this is clearly a bug.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #40 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 20:27:24 MSD ---
(In reply to comment #36)
 int f(size_t) pure;
 __gshared int tmp;
 void g(size_t, ref int dummy = tmp) pure;
 void h(size_t a, size_t b) pure
 {
    int res = f(a);
    g(b);
    assert(res == f(a)); // may fail, no guarantees by language!
}
Your g(b) causes h to be impure, because it accesses tmp, which is __gshared.
Yes, my mistake. Let's call "g(b, b)".
 Also, as far as eliding additional calls to pure functions, at present, they
 only occur within the same line, and I think that may only ever occur within
 the same expression (it's either expression or statement, I'm not sure which).
 So, the eliding of additional pure function calls is going to be quite rare.
 The _primary_ benefit of pure is how it enables you to reason about your code.
 You _know_ that f doesn't mess with anything other than the argument that you
 passed to it without having to look at its body at all.
No, because the assert may not pass. See below.
 Oh, and the assertion _is_ guaranteed to pass. a and res are both value types.
 Neither res nor a are passed to anything or accessed in any way other than in
 the lines with the calls to f, and even if g were impure, and it screwed
 with whatever argument was passed as the first argument to the h call, it
 wouldn't be able to mess with the value of a, because it was already copied.
Again, the assert may not pass. If it were guaranteed to pass, I would not have asked this question. Example:
---
int f(size_t p) pure
{
    return *cast(int*) p;
}

void g(size_t p, ref size_t) pure
{
    ++*cast(int*) p;
}

void h(size_t a, size_t b) pure
{
    int res = f(a);
    g(b, b);
    assert(res == f(a)); // may fail, no guarantees by language!
}

void main()
{
    int a;
    h(cast(size_t) &a, cast(size_t) &a);
}
---
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #41 from Jonathan M Davis <jmdavisProg gmx.com> 2012-06-04 09:35:33
PDT ---
 void g(size_t p, ref size_t) pure
{
    ++*cast(int*) p;
}
You're casting a size_t to a pointer. That's breaking the type system. The assertion is guaranteed to pass as long as you don't break the type system. That's exactly the same as occurs when casting away const. When you subvert the type system, the compiler can't guarantee anything. It's the _programmer's_ job at that point to maintain the compiler's guarantees. The compiler is free to assume that the programmer did not violate those guarantees. If you do, you've created a bug. This is precisely the sort of thing that comes up when someone is crazy enough to cast away const on something and try and mutate it. Such an example is ultimately irrelevant, precisely because it violates the type system.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #42 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 20:52:56 MSD ---
(In reply to comment #41)
 void g(size_t p, ref size_t) pure
{
    ++*cast(int*) p;
}
You're casting a size_t to a pointer. That's breaking the type system. The assertion is guaranteed to pass as long as you don't break the type system. That's exactly the same as occurs when casting away const.
It isn't, and here is the point! It's explicitly stated that when I cast away const and then modify the data, the result is undefined. I will be happy if I'm missing that this cast results in undefined behavior too.
 When you subvert the
 type system, the compiler can't guarantee anything. It's the _programmer's_ job
 at that point to maintain the compiler's guarantees. The compiler is free to
 assume that the programmer did not violate those guarantees.
No it's not. Otherwise every such break of the rules will result in undefined behavior. E.g. C++ has strict aliasing and can shrink what function arguments can refer to, and if a C++ program has the `strlen` source it can inline it and move it out of a loop if, e.g., in the loop we only modify an `int*`, but in D it can't be done because every `int*` can refer to every `char*`. So C++ supports pure functions better than D. :)
 If you do, you've
 created a bug. This is precisely the sort of thing that comes up when someone
 is crazy enough to cast away const on something and try and mutate it. Such an
 example is ultimately irrelevant, precisely because it violates the type
 system.
Every @system function can do it. It can even be written in assembly language. I'm just saying here that it doesn't violate the definition of a `pure` function, and here is the problem. I will be happy once it does violate the definition.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #43 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
10:30:19 PDT ---
(In reply to comment #42)
 It isn't and here is the point! It's explicitly stated that when I'm casting
 away const and then modify data the result is undefined. I will be happy if I'm
 missing that this casting results in undefined result too.
I believe it is undefined to cast a size_t to a pointer and use it as a pointer. But I could be wrong. In any case, pure function optimizations do not conservatively assume you will be doing that -- the compiler will optimize assuming you do *not* use it as a pointer. Whenever you cast, you are telling the compiler "I know what I'm doing." At that point, you are on your own as far as guaranteeing type safety and pure functions are actually pure.
 No it's not. Otherwise every such break of the rules will result in undefined
 behavior. E.g. C++ has strict aliasing and can shrink what function arguments
 can refer to and if C++ program has `strlen` source it can inline and move it
 out of loop if, e.g. in loop we only modify an `int*`, but in D it can't be
 done because every `int*` can refer to every `char*`. So C++ supports pure
 functions better than D. :)
If you don't want the compiler to make bad optimization decisions, then don't use casting. At best, this will be implementation defined.

I think you are way overthinking this. D's compiler and optimizer are based on a C++ compiler, written by the same person. Most of the same rules from C++ apply to D.

The compiler does not "assume the worst," it "assumes the reasonable," until you tell it otherwise. In other words, no reasonable developer will write code like you have, so the compiler assumes you are reasonable. Using toy examples to show how the compiler *must* behave does not work.

Yes, maybe this isn't spelled out fully in the spec, and it should be. But you are coming at this problem from the wrong end, start with what the compiler actually *does*, not what you *think it should do* based on the spec. The spec, like most software products, is usually the last to be updated when it comes to additional features, and the new pure rules are quite recent.

The priority of "who is right" goes like this:
1. TDPL (the book)
2. The reference implementation (DMD)
3. dlang.org
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #44 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
10:45:33 PDT ---
(In reply to comment #7)
 In general response to this bug, I'm unsure how pointers should be treated by
 the optimizer.  My gut feeling is the compiler/optimizer should trust the code
 "knows what it's doing." and so should expect that the code implicitly knows
 how much data it can access after the pointer.
After thinking about this for a couple days (and watching the emails pour in with differing opinions), here is what I think pure functions with pointers should mean:

For @system or @trusted functions, the definition of what data the pointer has access to is defined by the programmer, and not expressed in any possible way to the type system or the compiler. In other words, if I have a pointer to something, the actual data referenced includes any number of bytes before or after the memory pointed at. The scope of that data is defined by the programmer of the function/type, and should be clearly documented to the user of the function.

For @safe functions, the compiler should allow access only to the specific item pointed to as defined by the pointed-at type, and nothing else (pointer math is disallowed, pointer indexing is disallowed, and casting is disallowed).

For pure functions, no conservative assumptions should be made or acted upon during optimizations that expect the function has access to global data. In other words, for a @system pure function that accepts a pointer, the compiler should rightly assume that the function does *not* access global data, and that whatever data the function accesses via its pointer was passed via its parameter as expected by the caller. If the function incorrectly accesses global data via its pointer, then it results in undefined behavior.

These expectations and behaviors should be spelled out in the spec.
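
Editor's illustration (not part of the original comment): a sketch of the two regimes proposed above. `sumBlock` and its precondition are hypothetical. The `static assert` lines assume that the compiler in use rejects pointer arithmetic and non-zero pointer indexing in `@safe` code, as current D compilers are understood to do.

```d
// With explicit @safe, dereferencing the pointee itself is accepted...
static assert( __traits(compiles, (int* p) @safe { return *p; }));
// ...but pointer arithmetic and indexing past the pointee are rejected.
static assert(!__traits(compiles, (int* p) @safe { return *(p + 1); }));
static assert(!__traits(compiles, (int* p) @safe { return p[1]; }));

/// Hypothetical @system pure function. Documented contract, not checkable
/// by the compiler: `p` must point at the first of `n` readable ints.
int sumBlock(const(int)* p, size_t n) pure nothrow @nogc @system
{
    int sum = 0;
    foreach (i; 0 .. n)
        sum += p[i];
    return sum;
}
```

Under the proposal, a caller of `sumBlock` is responsible for honoring the documented extent, and an optimizer would be entitled to assume that nothing outside it is read.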
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #45 from klickverbot <code klickverbot.at> 2012-06-04 10:51:45 PDT
---
(In reply to comment #44)
Still thinking about the rest of the proposal, but:

 […] or @trusted functions […]
If a @trusted function accepts a pointer, it must _under no circumstances_ access anything except for the pointer target, because it can be called from @safe code.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #46 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
10:59:49 PDT ---
(In reply to comment #45)
 (In reply to comment #44)
 Still thinking about the rest of the proposal, but:
 
 […] or @trusted functions […]
If a @trusted function accepts a pointer, it must _under no circumstances_ access anything except for the pointer target, because it can be called from @safe code.
The point of @trusted is that it is treated as @safe, but can do unsafe things. At that point, you are telling the compiler that you know better than it does that the code is safe. The compiler is going to assume you did not access anything else beyond the target, so you have to keep that in mind when writing a @trusted function that accepts a pointer parameter.

Off the top of my head, I can't think of any valid usage of this, but it doesn't mean we should necessarily put a restriction on @trusted functions. This is a systems language, and @trusted is a tool used to circumvent @safe-ty when you know it is actually safe.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #47 from Denis Shelomovskij <verylonglogin.reg gmail.com>
2012-06-04 22:13:05 MSD ---
(In reply to comment #43)
 The compiler does not "assume the worst," it "assumes the reasonable," until
 you tell it otherwise.  In other words, no reasonable developer will write code
 like you have, so the compiler assumes you are reasonable.  Using toy examples
 to show how the compiler *must* behave does not work.
Come on! A systems language must have strict rules. You have just said that D is JavaScript.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #48 from klickverbot <code klickverbot.at> 2012-06-04 11:24:10 PDT
---
(In reply to comment #46)
 (In reply to comment #45)
 If a @trusted function accepts a pointer, it must _under no circumstances_
 access anything except for the pointer target, because it can be called from
 @safe code.
 The point of @trusted is that it is treated as @safe, but can do unsafe things. At that point, you are telling the compiler that you know better than it does that the code is safe. The compiler is going to assume you did not access anything else beyond the target, so you have to keep that in mind when writing a @trusted function that accepts a pointer parameter. Off the top of my head, I can't think of any valid usage of this, but it doesn't mean we should necessarily put a restriction on @trusted functions. This is a systems language, and @trusted is a tool used to circumvent @safe-ty when you know it is actually safe.
Sorry, but I think you got this wrong. Consider this example:
---
void gun(int* a) @trusted;

int fun() @safe {
    auto val = new int;
    gun(val);
    return *val;
}
---
Here, calling gun needs to be safe under _any_ circumstances. Thus, the only memory location which gun is allowed to access is val. If it does so by evaluating *(a + k), where k = (catalanNumber(5) - meaningOfLife()), that's fine, it's @trusted, but ultimately k must always be zero. Otherwise, it might violate the memory safety guarantees that need to hold for fun(). This is definitely not defined by the programmer, and not expressed in any possible way to the type system or the compiler.

Makes sense?
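To make the constraint concrete, here is a minimal sketch in which gun is given a body. The body is an assumption for illustration (the example above only declares gun); it touches nothing except its pointer target, which is what lets the @safe caller keep its guarantees.

```d
// Assumed body: writes only to the int that `a` points at.
void gun(int* a) @trusted
{
    *a = 42;
}

int fun() @safe
{
    auto val = new int; // GC-allocated int, initialized to 0
    gun(val);           // @safe code may call @trusted functions
    return *val;
}
```

Any access through `a + k` with a nonzero `k` would compile just as well inside gun, which is why the obligation falls on the author of the @trusted function rather than on the compiler.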
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #49 from art.08.09 gmail.com 2012-06-04 11:29:39 PDT ---
As this discussion was mostly about what *should* be happening, I decided to
see what actually *is* happening right now.
It seems that the compiler will only optimize based on "pureness" if a function
takes an 'immutable T*' argument, even 'immutable(T)*' is enough to turn the
optimization off.
So, right now, it is extremely conservative - and there is no bug in the
implementation. (accessing mutable data via an immutable pointer can be done,
but would be clearly illegal, just as using a cast)

But that also means that a lot of valid optimizations aren't done, making
purity significantly less useful than it could be. Basically, only functions
that don't take any (non-immutable) references as arguments can benefit from
"pure". But it also means D can still be incrementally fixed, as long as a sane
definition of function purity is used.

But this bug is a spec issue, hence probably INVALID, as there is no
specification. Sorry for the noise.

Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #50 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
11:35:27 PDT ---
(In reply to comment #47)
 Come on! A systems language must have strict rules. You have just said that D is
 JavaScript.
A systems language is very strict as long as you play within the type system. Once you use casts, all bets are off. The compiler can make *wrong assumptions* and your code may not do what you think it should.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #51 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
11:48:22 PDT ---
(In reply to comment #48)
 (In reply to comment #46)
 (In reply to comment #45)
 If a @trusted function accepts a pointer, it must _under no circumstances_
 access anything except for the pointer target, because it can be called from
 @safe code.
 The point of @trusted is that it is treated as @safe, but can do unsafe things. At that point, you are telling the compiler that you know better than it does that the code is safe. The compiler is going to assume you did not access anything else beyond the target, so you have to keep that in mind when writing a @trusted function that accepts a pointer parameter. Off the top of my head, I can't think of any valid usage of this, but it doesn't mean we should necessarily put a restriction on @trusted functions. This is a systems language, and @trusted is a tool used to circumvent @safe-ty when you know it is actually safe.
 Sorry, but I think you got this wrong. Consider this example:
 ---
 void gun(int* a) @trusted;
 int fun() @safe {
     auto val = new int;
     gun(val);
     return *val;
 }
 ---
 Here, calling gun needs to be safe under _any_ circumstances.
No, it does not. Once you use @trusted, the compiler stops checking that it's safe.
 Thus, the only
 memory location which gun is allowed to access is val. If it does so by
 evaluating *(a + k), where k = (catalanNumber(5) - meaningOfLife()), that's
 fine, it's @trusted, but ultimately k must always be zero. Otherwise, it might
 violate the memory safety guarantees that need to hold for fun(). This is
 definitely not defined by the programmer, and not expressed in any possible way to
 the type system or the compiler.
Yeah, that's a hard one to spell out in docs. I'd recommend not writing that function :) But there's no way to specify this to the compiler; it must assume you have communicated it properly.

Here is an interesting example (I pointed it out before in terms of sockaddr):

struct PacketHeader
{
   int nBytes;
   int packetType;
}

struct DataPacket
{
   PacketHeader header = {packetType:5};
   ubyte[1] data; // extends through length of packet
}

How to specify to the compiler that a PacketHeader * with packetType of 5 is really a DataPacket, and its data member has nBytes bytes in it? Such a well-described data structure system can be perfectly safe, as long as you follow the rules of construction.

Now, in order to ensure any function that receives a PacketHeader * is @trusted, you will have to control construction of the PacketHeader somehow. Perhaps you make PacketHeader an opaque type, and @safe functions can therefore never muck with the header information, or maybe you mark nBytes and packetType as private, so it can never be changed outside the module that knows how to build PacketHeaders.

In any case, it is wrong to assume that there isn't a valid way to make a @trusted call that is free to go beyond the target.
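A rough sketch of how such a function might look. The `payload` function is hypothetical and not from the thread, and the header is initialized with a constructor call instead of the struct-initializer syntax used above. It is only memory safe under the assumption stated in the text: every PacketHeader with packetType 5 was constructed as the front of a DataPacket with at least nBytes of data behind it.

```d
struct PacketHeader
{
    int nBytes;
    int packetType;
}

struct DataPacket
{
    PacketHeader header = PacketHeader(0, 5);
    ubyte[1] data; // extends through length of packet
}

// Hypothetical accessor: reaches beyond the PacketHeader itself, relying on
// the construction rules rather than on anything the compiler can verify.
const(ubyte)[] payload(const(PacketHeader)* h) pure @trusted
{
    if (h.packetType != 5 || h.nBytes < 0)
        return null; // not a DataPacket, or a nonsensical length
    auto pkt = cast(const(DataPacket)*) h;
    return pkt.data.ptr[0 .. cast(size_t) h.nBytes];
}
```

If @safe code could forge a PacketHeader with an arbitrary nBytes, this function would no longer deserve its @trusted marking, which is why the text above insists on controlling construction.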
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #52 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
11:51:14 PDT ---
(In reply to comment #49)
 It seems that the compiler will only optimize based on "pureness" if a function
 takes an 'immutable T*' argument, even 'immutable(T)*' is enough to turn the
 optimization off.
This is a bug, both should be optimized equally:

void foo(immutable int * _param) pure
{
   immutable(int)* param = _param; // legal
   ... // same code as if you had written void foo(immutable(int)* param)
}
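A small sketch of the equivalence claimed here, using two hypothetical functions (`viaFull` and `viaTail` are illustrative names). It shows only that both signatures accept the same argument and compute the same result; it says nothing about which calls DMD actually optimizes.

```d
// Parameter is a fully immutable pointer: neither the pointer nor the
// pointee can change inside the function.
int viaFull(immutable int* p) pure
{
    return *p + 1;
}

// Parameter is a mutable pointer to immutable data: the local copy of the
// pointer may be reassigned, but the caller never observes that.
int viaTail(immutable(int)* p) pure
{
    return *p + 1;
}
```

Because the pointer itself is passed by value, the caller cannot tell the two apart, which is the basis for arguing that they deserve identical treatment.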
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #53 from klickverbot <code klickverbot.at> 2012-06-04 12:12:16 PDT
---
(In reply to comment #51)
 (In reply to comment #48)
 Here, calling gun needs to be safe under _any_ circumstances.
 No, it does not. Once you use @trusted, the compiler stops checking that it's safe.
Yes, it does. As you noted correctly, you as the one implementing gun() must take care of that, the compiler doesn't help you here. But still, you must ensure that gun() never violates memory safety, regardless of what is passed in, because otherwise it might cause @safe code to be no longer memory safe.
 Now, in order to ensure any function that receives a PacketHeader * is
 @trusted, you will have to control construction of the PacketHeader somehow.
 […]
Okay, iff you are using a pointer more or less exclusively as an opaque handle, then I guess you are right – I thought only about pointers that are directly obtainable in @safe code.

But then, please be careful with including something along the lines of »For @safe functions, the compiler should allow access only to the specific item pointed to as defined by the pointed-at type, and nothing else« in the docs, because it is quite misleading (or even technically wrong, although I know what you are trying to say): A @safe function _can_ in effect access other memory, if only with the help from a @trusted function.

On a related note, the distinction between @safe and @trusted (especially the difference in mangling) is a horrible abomination and should die in a fire. @safe and @system are contracts, @trusted is an implementation detail – mixing them makes no sense.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #54 from klickverbot <code klickverbot.at> 2012-06-04 12:14:40 PDT
---
(In reply to comment #52)
 This is a bug, both should be optimized equally:
 
 void foo(immutable int * _param) pure
 {
    immutable(int)* param = _param; // legal
    ... // same code as if you had written void foo(immutable(int)* param)
 }
Yep, both should be recognized as PUREstrong in DMD – if not, please open a new bug report for that.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #55 from Steven Schveighoffer <schveiguy yahoo.com> 2012-06-04
13:34:50 PDT ---
(In reply to comment #53)
 (In reply to comment #51)
 (In reply to comment #48)
 Here, calling gun needs to be safe under _any_ circumstances.
 No, it does not. Once you use @trusted, the compiler stops checking that it's safe.
 Yes, it does. As you noted correctly, you as the one implementing gun() must take care of that, the compiler doesn't help you here. But still, you must ensure that gun() never violates memory safety, regardless of what is passed in, because otherwise it might cause @safe code to be no longer memory safe.
I think I misunderstood your original point. I thought you were saying that gun must be *prevented from* modifying other memory relative to its parameter. Were you simply saying that gun is not stopped by the compiler, but must avoid it in order to maintain safety? If so, I agree, for your example.

I can also see that my response was misleading. I did not mean it should not be safe, I meant it's not enforced as safe. Obviously something that is @trusted needs to maintain safety.
 On a related note, the distinction between @safe and @trusted (especially the
 difference in mangling) is a horrible abomination and should die in a fire.
 @safe and @system are contracts, @trusted is an implementation detail – mixing
 them makes no sense.
I'm not sure what you're saying here, but @trusted is *definitely* needed.
Jun 04 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185



--- Comment #56 from github-bugzilla puremagic.com 2012-07-01 23:12:33 PDT ---
Commits pushed to master at
https://github.com/D-Programming-Language/d-programming-language.org

https://github.com/D-Programming-Language/d-programming-language.org/commit/59670a7823d066f5146e276bdf5aac7bd93a3f45
Fix for issue# 8185.

This clarifies the definition of pure, since so many people seem to have
a hard time understanding that _all_ that pure means is that the
function cannot access global or static, mutable state or call impure
functions. Everything else with regards to pure is a matter of
implementation-specific optimizations - which does in some cases relate
to full, functional purity, but pure itself does not indicate anything
of the sort.
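A minimal sketch of the definition given in this commit message. The names `counter`, `bump`, and `twice` are hypothetical; the commented-out function is an example of what a D compiler is expected to reject.

```d
int counter; // module-level mutable state (thread-local by default in D)

// Accepted as pure: it mutates data, but only data reachable through its
// parameter, never global or static mutable state.
void bump(int* p) pure
{
    ++*p;
}

// Also pure, and additionally free of any mutable indirection in its
// parameters, so the result depends on the argument value alone.
int twice(int x) pure
{
    return 2 * x;
}

// Expected to be rejected if uncommented: a pure function may not read
// mutable module-level state.
// int readCounter() pure { return counter; }
```

Here `bump` shows that pure by itself does not imply functional purity: calling it twice with the same pointer yields different observable state each time.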

https://github.com/D-Programming-Language/d-programming-language.org/commit/8cc3ba694bc07ec684f2d1c5a088728aa18e7d93
Merge pull request #128 from jmdavis/pure

Fix for issue# 8185.

Jul 01 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8185


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla digitalmars.com
         Resolution|                            |FIXED


Jul 02 2012