www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Is this function pure?

reply "Janice Caron" <caron800 googlemail.com> writes:
Is this function pure?

int probablyNotPure(int x)
{
	try
	{
		char[] a = new char[x];
		return a.length;
	}
	catch
	{
		return 0;
	}
}

I must assume not, since it's not deterministic. But which part of it
wrong? Should "new" be forbidden? Or is "try/catch"? Or both?
Sep 18 2007
next sibling parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Tue, 18 Sep 2007 15:36:50 +0300, Janice Caron <caron800 googlemail.co=
m> wrote:

 Is this function pure?

 int probablyNotPure(int x)
 {
 	try
 	{
 		char[] a =3D new char[x];
 		return a.length;
 	}
 	catch
 	{
 		return 0;
 	}
 }

 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

Forbidding operations that may cause an exception would imply forbidding= division, since that may raise a division by zero statement. I don't se= e what's wrong with handling your own exceptions. And, "determinism" is = a stretchable notion - if you can determine the number of concurrent thr= eads which will call this function, the points where switches occur and = the amount of available memory... in D's context, though, I don't think = determinism should be a necessity for pure functions - just the limitati= on of not accessing external data or "non-pure" code. -- = Best regards, Vladimir mailto:thecybershadow gmail.com
Sep 18 2007
prev sibling next sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Janice Caron wrote:
 Is this function pure?
 
 int probablyNotPure(int x)
 {
 	try
 	{
 		char[] a = new char[x];
 		return a.length;
 	}
 	catch
 	{
 		return 0;
 	}
 }
 
 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

Putting what the compiler can check aside, isn't it the return statement in the catch block what prevents this function from being pure, or referentially transparent?
Sep 18 2007
parent Lutger <lutger.blijdestijn gmail.com> writes:
Janice Caron wrote:
 On 9/18/07, Lutger <lutger.blijdestijn gmail.com> wrote:
 Janice Caron wrote:
 Is this function pure?

in the catch block what prevents this function from being pure, or referentially transparent?

I don't think so, since I could easily rewrite it as int probablyNotPure(int x) { int n; try { char[] a = new char[x]; n = a.length; } catch { n = 0; } return n; }

What I meant is that the return value is depending on whether new throws or not. This is the case in the above function as well as the one in the original post, they amount to the same thing. From the standpoint of referential transparency I don't think this function is really different from: int surelyNotPure(int x) { int n = rand(); if (n > 100) { return x; } else { return 0; } } But if you write it as this I don't see why it would not be pure: int probablyPure(int x) { int n; try { char[] a = new char[x]; n = a.length; } catch { throw new FooException; } return n; }
Sep 18 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/18/07, Lutger <lutger.blijdestijn gmail.com> wrote:
 Janice Caron wrote:
 Is this function pure?

Putting what the compiler can check aside, isn't it the return statement in the catch block what prevents this function from being pure, or referentially transparent?

I don't think so, since I could easily rewrite it as int probablyNotPure(int x) { int n; try { char[] a = new char[x]; n = a.length; } catch { n = 0; } return n; }
Sep 18 2007
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Janice Caron" wrote
 Is this function pure?

 int probablyNotPure(int x)
 {
 try
 {
 char[] a = new char[x];
 return a.length;
 }
 catch
 {
 return 0;
 }
 }

 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

I would guess that it is not pure, as it is having side effects. The new call is modifying global state (that of the garbage collector), so I don't see how that 'new' call can be pure. From what I understand, a pure function can only call other pure functions. So I think to answer your question, new should be forbidden. -Steve
Sep 18 2007
parent Lutger <lutger.blijdestijn gmail.com> writes:
Steven Schveighoffer wrote:
 "Janice Caron" wrote
 Is this function pure?

 int probablyNotPure(int x)
 {
 try
 {
 char[] a = new char[x];
 return a.length;
 }
 catch
 {
 return 0;
 }
 }

 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

I would guess that it is not pure, as it is having side effects. The new call is modifying global state (that of the garbage collector), so I don't see how that 'new' call can be pure. From what I understand, a pure function can only call other pure functions. So I think to answer your question, new should be forbidden. -Steve

That's true, using 'new' does make a function impure. That doesn't mean it cannot be referentially transparent though. Even if some 'impure' operations were to be allowed, in the case of new you would also have to take custom allocators into account.
Sep 18 2007
prev sibling next sibling parent Don Clugston <dac nospam.com.au> writes:
Janice Caron wrote:
 Is this function pure?
 
 int probablyNotPure(int x)
 {
 	try
 	{
 		char[] a = new char[x];
 		return a.length;
 	}
 	catch
 	{
 		return 0;
 	}
 }
 
 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

Interesting question. This is exactly the same: int probablyToo(char [] x) { try { return (x ~ "a").length; } catch { return 0; } } If 'new' was forbidden for reasons of non-determinism, then string concatenation would have to be, as well. Which I think would make pure functions pretty limited. But, if you assume that anything with is CTFE-able is also pure, then string concatenation is OK.
Sep 18 2007
prev sibling next sibling parent reply OF <nospam nospammington.com> writes:
Janice Caron Wrote:

 Is this function pure?
 
 int probablyNotPure(int x)
 {
 	try
 	{
 		char[] a = new char[x];
 		return a.length;
 	}
 	catch
 	{
 		return 0;
 	}
 }
 
 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

As long as new char[x] can fail without consistency this function is not pure. That is, the return value depends on something else than x. However, allocating memory is in itself not necessarily 'unpure', as long as it's predictible (say: I always get memory, else make this computation fail, which could be handled by a caller on a lower and not pure level). Problems arise as long as you allow new to fail like this or if you use the address you get (which of course is not deterministic to the degree we want). Probably it's best to leave memory allocations to implicit allocations in pure functions, but this is tricky if you want to return a new object as a modified version of an input object. I don't know how it's intended for D to solve this nicely, but I'm very interested in hearing it.
Sep 18 2007
next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
Janice Caron wrote:
 On 9/18/07, OF <nospam nospammington.com> wrote:
 Problems arise as long as you allow new to fail like this or if you use the
address you get (which of course is not deterministic to the degree we want).

I think the "address of" operator would have to banned completely from pure functions, otherwise you could do int definitelyNotPure() { int x; return cast(int)&x; }
 Probably it's best to leave memory allocations to implicit allocations in pure
functions, but this is tricky if you want to return a new object as a modified
version of an input object. I don't know how it's intended for D to solve this
nicely, but I'm very interested in hearing it.

The only way I can think of is the old-fashioned C way, of passing a buffer and a length (or in D, an array) into the function, and using that and that alone as your memory. ...Ooooh, but wait - don't all inputs have to be (transitively) const!? Yikes.

Purity typically means that the result of the function is exclusively determined by it's inputs. IE, that the results are entirely repeatable. So, this would be pure: void foo(in int x, out int y) { y = x+1; } No matter how often you call it, for any given value of x, what it sets y to is exactly the same. Obviously, this is a simple example and it's essentially isomorphic with the simpler: int foo(in int x) { return x+1; } Whether use of the memory subsystem or other subtle side effects affects purity likely becomes a matter of definition. I personally hope that it's included such that your original example is pure. Later, Brad
Sep 18 2007
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Brad Roberts" wrote
 Whether use of the memory subsystem or other subtle side effects affects 
 purity likely becomes a matter of definition.  I personally hope that it's 
 included such that your original example is pure.

You mean 'logically' pure? (cackles manically) -Steve
Sep 18 2007
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Janice Caron wrote:
 On 9/18/07, OF <nospam nospammington.com> wrote:
 Problems arise as long as you allow new to fail like this or if you use the
address you get (which of course is not deterministic to the degree we want).

I think the "address of" operator would have to banned completely from pure functions, otherwise you could do

Can you give another example? The one you gave:
 int definitelyNotPure()
 {
     int x;
     return cast(int)&x;
 }
 
 

is invalid even for non-pure functions. -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Sep 18 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/18/07, OF <nospam nospammington.com> wrote:
 Problems arise as long as you allow new to fail like this or if you use the
address you get (which of course is not deterministic to the degree we want).

I think the "address of" operator would have to banned completely from pure functions, otherwise you could do int definitelyNotPure() { int x; return cast(int)&x; }
 Probably it's best to leave memory allocations to implicit allocations in pure
functions, but this is tricky if you want to return a new object as a modified
version of an input object. I don't know how it's intended for D to solve this
nicely, but I'm very interested in hearing it.

The only way I can think of is the old-fashioned C way, of passing a buffer and a length (or in D, an array) into the function, and using that and that alone as your memory. ...Ooooh, but wait - don't all inputs have to be (transitively) const!? Yikes.
Sep 18 2007
prev sibling next sibling parent reply Nathan Reed <nathaniel.reed gmail.com> writes:
Janice Caron wrote:
 Is this function pure?
 
 int probablyNotPure(int x)
 {
 	try
 	{
 		char[] a = new char[x];
 		return a.length;
 	}
 	catch
 	{
 		return 0;
 	}
 }
 
 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

Pure is intended to mimic some of the capabilities of functional languages, yes? Well, at the implementation level functional languages have to allocate memory just like anyone else, and those allocations can potentially fail just like always. That doesn't keep functional language functions from being pure, so it shouldn't keep D functions from being pure either. The function you've written is nondeterministic, but only due to the nondeterminism inherent in memory allocation. It would be interesting to see if it's possible to write a function that acheives nondeterminism *without* either memory allocation or reading global state (e.g. rand()). I believe it's not. If that is the case, then the only potential source of nondeterminism in pure functions is memory allocation, and I don't think we need to really worry about that too much. After all, if we've run out of memory, we've likely got bigger problems than our pure functions becoming nondeterministic. :) Thanks, Nathan Reed
Sep 18 2007
next sibling parent reply Nathan Reed <nathaniel.reed gmail.com> writes:
Nathan Reed wrote:
 It would be interesting 
 to see if it's possible to write a function that acheives nondeterminism 
  *without* either memory allocation or reading global state (e.g. 
 rand()).  I believe it's not.

I just realized that (as someone else pointed out) taking the address of a local variable would also be nondeterministic. Pointer shenanigans in general can lead to nondeterminism, so maybe pointers should be forbidden entirely in pure functions. Given that pointers are rarely used in D except for interfacing with C functions and suchlike, this seems like it would be not entirely unreasonable. Thanks, Nathan Reed
Sep 18 2007
parent reply BCS <ao pathlink.com> writes:
Reply to Nathan,

 Nathan Reed wrote:
 
 It would be interesting to see if it's possible to write a function
 that acheives nondeterminism *without* either memory allocation or
 reading global state (e.g. rand()).  I believe it's not.
 

of a local variable would also be nondeterministic. Pointer shenanigans in general can lead to nondeterminism, so maybe pointers should be forbidden entirely in pure functions. Given that pointers are rarely used in D except for interfacing with C functions and suchlike, this seems like it would be not entirely unreasonable. Thanks, Nathan Reed

It would be a better approach (IMHO) for pure function to just be prevented from accessing "other things". This could be done with some sort of VM. It is interesting to note that given such a VM, pure function can call some non pure function. The requirement would be that a pure function must not call anything that accesses something that the pure function can't. In a VM this is easy to enforce, just don't put anything into it that the function shouldn't access.
Sep 18 2007
parent BCS <ao pathlink.com> writes:
Reply to Benjamin,

 Reply to Nathan,
 
 Nathan Reed wrote:
 
 It would be interesting to see if it's possible to write a function
 that acheives nondeterminism *without* either memory allocation or
 reading global state (e.g. rand()).  I believe it's not.
 

of a local variable would also be nondeterministic. Pointer shenanigans in general can lead to nondeterminism, so maybe pointers should be forbidden entirely in pure functions. Given that pointers are rarely used in D except for interfacing with C functions and suchlike, this seems like it would be not entirely unreasonable. Thanks, Nathan Reed

prevented from accessing "other things". This could be done with some sort of VM. It is interesting to note that given such a VM, pure function can call some non pure function. The requirement would be that a pure function must not call anything that accesses something that the pure function can't. In a VM this is easy to enforce, just don't put anything into it that the function shouldn't access.

(BTW: all that is with regards to pure running at compile time)
Sep 18 2007
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Nathan Reed" wrote
 Pure is intended to mimic some of the capabilities of functional 
 languages, yes?  Well, at the implementation level functional languages 
 have to allocate memory just like anyone else, and those allocations can 
 potentially fail just like always.  That doesn't keep functional language 
 functions from being pure, so it shouldn't keep D functions from being 
 pure either.

I tend to agree with you. But Walter is touting pure functions as being able to run simultaneously on two cores. What happens when both functions allocate memory without normal thread locking? i.e.: char(invariant)[] x = y() ~ z(); // both y and z are pure Since y and z are pure, they should be able to run simultaneously on two cores. What happens when they both try to allocate memory?
 The function you've written is nondeterministic, but only due to the 
 nondeterminism inherent in memory allocation.  It would be interesting to 
 see if it's possible to write a function that acheives nondeterminism 
 *without* either memory allocation or reading global state (e.g. rand()). 
 I believe it's not.

I don't follow... Isn't this a pure function? no global state read/memory allocation: int add(int x, int y) { return x + y; } Basically, you can do pure functions only on the stack, but the issue becomes returning a variable length data structure, such as a string. I think you can't do it unless memory allocation is given a pass. However, then all new operator overrides must be allowed to be pure. Perhaps you give a single memory allocation routine a free pass, and code that deep in the bowels of D, making sure it is thread-safe. Then all other memory allocation routines can declare themselves pure if they use that memory allocation routine as a base. I'm sure Walter has a solution in mind here...
 If that is the case, then the only potential source of nondeterminism in 
 pure functions is memory allocation, and I don't think we need to really 
 worry about that too much.  After all, if we've run out of memory, we've 
 likely got bigger problems than our pure functions becoming 
 nondeterministic. :)

I think exception catching should be outlawed in pure functions. This makes the pure function deterministic iff it does not throw an exception. You can handle cases like divide by zero just by checking if the divisor is 0. -Steve
Sep 18 2007
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Steven Schveighoffer" wrote
 "Nathan Reed" wrote
 It would be interesting to see if it's possible to write a function that 
 acheives nondeterminism *without* either memory allocation or reading 
 global state (e.g. rand()). I believe it's not.

I don't follow... Isn't this a pure function? no global state read/memory allocation: int add(int x, int y) { return x + y; }

Totally misread your challenge. I thought you said impossible to write a function that achieves *determinism*. Sorry :) I think you are right. I think the challenge can be reduced to just reading global state, as allocating memory is a subset of that. -Steve
Sep 18 2007
prev sibling parent reply Nathan Reed <nathaniel.reed gmail.com> writes:
Steven Schveighoffer wrote:
 Since y and z are pure, they should be able to run simultaneously on two 
 cores.  What happens when they both try to allocate memory?

The allocator has to deal with allocation requests coming from multiple threads/cores anyway. This case presents no additional problem.
 I think exception catching should be outlawed in pure functions.  This makes 
 the pure function deterministic iff it does not throw an exception.  You can 
 handle cases like divide by zero just by checking if the divisor is 0.

Exception throwing and catching is deterministic too as long as the condition that generates the exception is (which dividing by zero would be). I think it is perfectly reasonable to allow pure functions to throw and catch exceptions.
Sep 18 2007
parent reply 0ffh <spam frankhirsch.net> writes:
Nathan Reed wrote:
 Exception throwing and catching is deterministic too as long as the 
 condition that generates the exception is (which dividing by zero would 
 be).  I think it is perfectly reasonable to allow pure functions to 
 throw and catch exceptions.

We might even want go all the way, and *require* a pure function to throw an exception if it detects that functional purity has been compromised like in Janice's out-of-memory example. Regards, frank
Sep 18 2007
parent reply Bruce Adams <tortoise_74 yeah.who.co.uk> writes:
0ffh Wrote:

 Nathan Reed wrote:
 Exception throwing and catching is deterministic too as long as the 
 condition that generates the exception is (which dividing by zero would 
 be).  I think it is perfectly reasonable to allow pure functions to 
 throw and catch exceptions.

We might even want go all the way, and *require* a pure function to throw an exception if it detects that functional purity has been compromised like in Janice's out-of-memory example. Regards, frank

My understanding is that pure is a compile time concept. If your function can violate it its not pure, unless you have a scheme for compile time exception handling - yeauch. Bruce.
Sep 18 2007
parent reply Nathan Reed <nathaniel.reed gmail.com> writes:
Bruce Adams wrote:
 My understanding is that pure is a compile time concept. If your function can
violate it its not pure, unless you have a scheme for compile time exception
handling - yeauch.

Pure functions aren't necessarily /evaluated/ at compile time. They can throw exceptions when evaluated at run-time, just like any function. However, even for functions evaluated at compile time I can't see that throwing and catching exceptions would be that big a deal. CTFE is basically like running an interpreter, right? Thanks, Nathan Reed
Sep 18 2007
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Someone probably already mentioned this, but you can't (easily, i.e. maybe some
sort of heap partitioning could do it but you'd need OS support for that)
automatically parallelize functions that allocate memory (since the heap is a
shared resource), which was one of the big selling points of "pure". So I'm
assuming that allocations are out of the equation. And since exceptions are
most often paired up with an allocation (although not necessarily), those might
be out, too. Not totally sure about that one.

There's no reason stack-allocated/scope classes couldn't be used, though.

Janice Caron Wrote:

 On 9/19/07, Nathan Reed <nathaniel.reed gmail.com> wrote:
 CTFE is
 basically like running an interpreter, right?

I presume that stands for Compile Time Function Execution and not chlorotrifluoroethylene, right? :-) My feeling is that Ingo is basically right, in that Walter needs to assume that new can never fail - that it can successfully allocate infinite amounts of memory without causing trouble. If that assumption holds (...obviously it doesn't, but bear with me...) then new is allowed. Otherwise the simplest things would be banned: a function which does toString (e.g. that turns 42 into "42") would be banned; a function which adds a passed-in element to a passed-in associative array would be banned, and so on. I guess it just has to be allowed. But I still have a problem with it, because such a program is only "logically pure", not "physically pure". And I have to wonder, will be be allowed to write our own "logically pure" functions and declare them pure? How will custom allocators for "new" fit into the picture? A custom allocator must surely access global state? A custom allocator could be written to be (logically) pure, or it could be decidedly impure (for example by initialising the created variables with rand()). Will purity be transitive and as strictly enforced as const? Will we be able to cast away purity when we need to? I guess we'll wait and see, but I would surely like to know.

Sep 19 2007
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Heh, never knew that; that's good news. Thanks for brightening up my day!

Brad Roberts Wrote:

 Yes you can.  The malloc subsystem out of practical necessity is 
 internally thread safe and can be used concurrently without concern.
 
 Robert Fraser wrote:
 Someone probably already mentioned this, but you can't (easily, i.e. maybe
some sort of heap partitioning could do it but you'd need OS support for that)
automatically parallelize functions that allocate memory (since the heap is a
shared resource), which was one of the big selling points of "pure". So I'm
assuming that allocations are out of the equation. And since exceptions are
most often paired up with an allocation (although not necessarily), those might
be out, too. Not totally sure about that one.
 
 There's no reason stack-allocated/scope classes couldn't be used, though.
 
 Janice Caron Wrote:
 
 On 9/19/07, Nathan Reed <nathaniel.reed gmail.com> wrote:
 CTFE is
 basically like running an interpreter, right?

chlorotrifluoroethylene, right? :-) My feeling is that Ingo is basically right, in that Walter needs to assume that new can never fail - that it can successfully allocate infinite amounts of memory without causing trouble. If that assumption holds (...obviously it doesn't, but bear with me...) then new is allowed. Otherwise the simplest things would be banned: a function which does toString (e.g. that turns 42 into "42") would be banned; a function which adds a passed-in element to a passed-in associative array would be banned, and so on. I guess it just has to be allowed. But I still have a problem with it, because such a program is only "logically pure", not "physically pure". And I have to wonder, will be be allowed to write our own "logically pure" functions and declare them pure? How will custom allocators for "new" fit into the picture? A custom allocator must surely access global state? A custom allocator could be written to be (logically) pure, or it could be decidedly impure (for example by initialising the created variables with rand()). Will purity be transitive and as strictly enforced as const? Will we be able to cast away purity when we need to? I guess we'll wait and see, but I would surely like to know.



Sep 19 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
 Robert Fraser wrote:

maybe some sort of heap partitioning could do it but you'd need OS support for that) automatically parallelize functions that allocate memory (since the heap is a shared resource), which was one of the big selling points of "pure". So I'm assuming that allocations are out of the equation. And since exceptions are most often paired up with an allocation (although not necessarily), those might be out, too. Not totally sure about that one. There's no reason stack-allocated/scope classes couldn't be used, though.

 Brad Roberts Wrote:
 
 Yes you can.  The malloc subsystem out of practical necessity is 
 internally thread safe and can be used concurrently without concern.


 Robert Fraser wrote:
 Heh, never knew that; that's good news. Thanks for brightening up my day!

On the other hand the support for concurrency may consist of little more than a global mutex. That means that the performance benefits of launching multiple threads might not be so great since they'll all just end up in a queued waiting for their turn to malloc. I have no idea what they do in practice though. Do typical malloc systems actually allow simultaneous allocations in multiple threads? --bb
Sep 19 2007
parent reply Nathan Reed <nathaniel.reed gmail.com> writes:
Bill Baxter wrote:
 On the other hand the support for concurrency may consist of little more 
 than a global mutex.  That means that the performance benefits of 
 launching multiple threads might not be so great since they'll all just 
 end up in a queued waiting for their turn to malloc.
 
 I have no idea what they do in practice though.  Do typical malloc 
 systems actually allow simultaneous allocations in multiple threads?
 
 --bb

On a multicore system the allocator might conceivably have a heap for each core or something like that. However, even if this is not the case, allocation in a garbage-collected language is very quick and the mutex would only need to be held for the duration of the allocation. With a pure function that does some allocation and then a bunch of computation, all the computations could proceed in parallel once each thread's memory had been allocated. Synchronized allocations certainly need not negate the benefits of parallelism. Thanks, Nathan Reed
Sep 19 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Nathan Reed wrote:
 Bill Baxter wrote:
 On the other hand the support for concurrency may consist of little 
 more than a global mutex.  That means that the performance benefits of 
 launching multiple threads might not be so great since they'll all 
 just end up in a queued waiting for their turn to malloc.

 I have no idea what they do in practice though.  Do typical malloc 
 systems actually allow simultaneous allocations in multiple threads?

 --bb

On a multicore system the allocator might conceivably have a heap for each core or something like that. However, even if this is not the case, allocation in a garbage-collected language is very quick and the mutex would only need to be held for the duration of the allocation.

I've heard this claim a couple of times recently (allocation is fast in GC languages). I don't think it's true, though, at least not of D. Maybe it's a distortion of the the claim that *overall*, memory management in a GC language can be faster than explicit management. But that's not because allocations are cheaper. It's because you don't have to keep deleting things one by one as they go out of scope. Instead, a bunch of out-of-scope objects all get reclaimed together during a GC sweep. But the GC sweep takes place when you do an allocation, so in addition to the normal work necessary to do an allocation (finding a contiguous block of free memory that's big enough) you also have to do a full mark-sweep of all reachable memory. I don't see how mark+sweep+allocate can be faster than just allocate. [Rereading that now, though, it doesn't make much sense to me -- if there's a big gain to be had just by not explicitly freeing memory until the next allocation attempt, then regular malloc implementations could do that too, queuing up the free() requests until the next malloc call.] The 'fast allocation' claim may be true of generational GCs or other fancy GCs, but D currently has a mark&sweep type. Sean and other folks more in the know about GC -- please feel free to correct me on any of this. --bb
Sep 19 2007
next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Bill Baxter wrote:
 The 'fast allocation' claim may be true of generational GCs or other 
 fancy GCs, but D currently has a mark&sweep type.

Yes, AFAIK the 'fast allocation' is mostly true of garbage collectors that move objects that aren't collected, since they have all free memory in a contiguous block and can just increment a pointer. However, I can see /deallocation/ being faster with a GC since the cost of explicit deallocation isn't just limited to the free/delete call; it also often incurs the cost of tracking *when* to call them (which often means using things like reference counts, which in a multithreaded app need a mutex...). Also, a GC gets to postpone memory deallocation until it's actually needed, which can sometimes mean they don't have to do it at all (if your program doesn't allocate all that much). Note: I'm not saying using a GC is necessarily faster (since I don't have the numbers), just that I can see how it *could* be. Of course, there's also the programmer convenience factor :).
Sep 19 2007
prev sibling parent reply Nathan Reed <nathaniel.reed gmail.com> writes:
Bill Baxter wrote:
 The 'fast allocation' claim may be true of generational GCs or other 
 fancy GCs, but D currently has a mark&sweep type.

I didn't know that; I assumed D was using at least a copying collector. You're right that allocation wouldn't be any faster with a mark-and-sweep collector than in a language with explicit memory management. D /could/ theoretically use a copying collector or generational collector though; as far as I know there's no barrier except writing the code for it. Thanks, Nathan Reed
Sep 19 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Nathan Reed wrote:
 Bill Baxter wrote:
 The 'fast allocation' claim may be true of generational GCs or other 
 fancy GCs, but D currently has a mark&sweep type.

I didn't know that; I assumed D was using at least a copying collector. You're right that allocation wouldn't be any faster with a mark-and-sweep collector than in a language with explicit memory management. D /could/ theoretically use a copying collector or generational collector though; as far as I know there's no barrier except writing the code for it.

The big blocker to any sort of copying/moving collector is that it needs to know which bytes represent pointers and which don't. This would require the compiler to emit extra type information it currently doesn't. In particular: * ClassInfo.offTi[] would need to actually be filled (IIRC it's always an empty array currently) * Similar data is needed for stack frames. AFAIK there's currently no way to communicate this. * Pointer data for structs, delegates, array types, AAs etc. would also need to be provided (but could be "inlined" into the containing class/stackframe information) * Heap objects would need some kind of tag to indicate how to find pointers in them. For classes it would be easy, just get .classinfo.offTi through the vtable ptr, but for example structs and array data can also be heap-allocated and don't have a vtable ptr. In fact, for arrays it isn't even constant (though a 'repeat x times to visit all allocated data' marker could be used). The problem that requires all that is the fact that when you move a heap cell, you need to update all (reachable) pointers/references to it (but not non-pointer data that happens to *look* like a pointer). So the GC can only do this when it is sure it can update all pointers to it, and *only* pointers to it. A GC could of course refrain from moving objects if it's not sure everything that looks like a pointer to it actually is one, but that would negate at least some of the benefit of a moving collector. (An interesting idea might be to try and use debugging information as the source of type information. However, even though that might work for stack data it could be harder for heap data since all you've got to go on is the declared types of the pointers to it; and void*s could point to anything. Besides, requiring debug info for the GC to work well is a bit of a hack) Another issue is that sometimes the compiler simply doesn't *know* whether something is a pointer or not. Consider: * You can allocate void[]s, which explicitly *may* contain pointers but don't necessarily consist of only pointers. * Unions of pointer and non-pointer types * Code that's linked in could be compiled by another compiler entirely (for instance, a library written in C). Currently using these just requires that you keep a pointer where the GC can find it, but if you need to find all pointers to heap objects you'd need non-D code to declare where exactly they keep their pointers... (Assuming it can't be inferred from sources such as debug info)
Sep 19 2007
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
Yes you can.  The malloc subsystem out of practical necessity is 
internally thread safe and can be used concurrently without concern.

Robert Fraser wrote:
 Someone probably already mentioned this, but you can't (easily, i.e. maybe
some sort of heap partitioning could do it but you'd need OS support for that)
automatically parallelize functions that allocate memory (since the heap is a
shared resource), which was one of the big selling points of "pure". So I'm
assuming that allocations are out of the equation. And since exceptions are
most often paired up with an allocation (although not necessarily), those might
be out, too. Not totally sure about that one.
 
 There's no reason stack-allocated/scope classes couldn't be used, though.
 
 Janice Caron Wrote:
 
 On 9/19/07, Nathan Reed <nathaniel.reed gmail.com> wrote:
 CTFE is
 basically like running an interpreter, right?

chlorotrifluoroethylene, right? :-) My feeling is that Ingo is basically right, in that Walter needs to assume that new can never fail - that it can successfully allocate infinite amounts of memory without causing trouble. If that assumption holds (...obviously it doesn't, but bear with me...) then new is allowed. Otherwise the simplest things would be banned: a function which does toString (e.g. that turns 42 into "42") would be banned; a function which adds a passed-in element to a passed-in associative array would be banned, and so on. I guess it just has to be allowed. But I still have a problem with it, because such a program is only "logically pure", not "physically pure". And I have to wonder, will be be allowed to write our own "logically pure" functions and declare them pure? How will custom allocators for "new" fit into the picture? A custom allocator must surely access global state? A custom allocator could be written to be (logically) pure, or it could be decidedly impure (for example by initialising the created variables with rand()). Will purity be transitive and as strictly enforced as const? Will we be able to cast away purity when we need to? I guess we'll wait and see, but I would surely like to know.


Sep 19 2007
prev sibling next sibling parent reply 0ffh <spam frankhirsch.net> writes:
Janice Caron wrote:
 Is this function pure?
 
 int probablyNotPure(int x)
 {
 	try
 	{
 		char[] a = new char[x];
 		return a.length;
 	}
 	catch
 	{
 		return 0;
 	}
 }

Actually, in spite of all the "Not quite sure, probably not." replies you had so far, I want to state boldly: Yes, sure, why not? :-) Scared by memory allocation changing the state of the GC? You may give a damn: When the function leaves, the dynamic array is dead, and thus not part of the state any more. Functional purity has only these conditions: * The result is a function of the parameters, and only of the parameters. * Variables can be created and duplicated, but not changed. I don't see your example violating these conditions. Regards, frank
Sep 18 2007
parent 0ffh <spam frankhirsch.net> writes:
Janice Caron wrote:
 You don't? The way I see it, if the input is 5, then output might
 sometimes be 5 and sometimes be 0. That would seem to violate your
 first bullet-point, would it not?

That's surely a point which can be discussed: I could, for example, say: A result of 0 happens only in case of an error condition. Sure, you loose functional purity then, but there's just nothing you can do about that. (Actually, in *this* special case you could do something about it, but not in the general case.) Your function could return 23, because a lump of radiation hit the stack memory and flipped two bits. That's just to bad, but that's it. Therefore: In case of errors, there's an exception! (And, in this case, a double entendre. :-) Regards, frank
Sep 18 2007
prev sibling next sibling parent reply Ingo Oeser <ioe-news rameria.de> writes:
Janice Caron wrote:

 Is this function pure?

Yes.
 int probablyNotPure(int x)
 {
       try
       {
               char[] a = new char[x];
               return a.length;
       }
       catch
       {
               return 0;
       }
 }

A valid optimisation is to assume that x -> a.length, since a.length is directly derived from x. If you don't use (e.g. reference) the memory you are allocating, the compiler should be free to optimise the allocation away. The catch will be dead code, but the function is also valid, with the catch being dead code, since that would happen with infinite storage or a GC deleting unreachable reference (like D has :-) ).
 I must assume not, since it's not deterministic. But which part of it
 wrong? Should "new" be forbidden? Or is "try/catch"? Or both?

It can be optimised into a deterministic function without any algorithmic side effect. So that function actually IS deterministic and thus pure :-) Best Regards Ingo Oeser
Sep 18 2007
parent reply Ingo Oeser <ioe-news rameria.de> writes:
Janice Caron wrote:

 OK, so change it to
 
 char[] probablyNotPure(int x)
 {
       try
       {
               char[] a = new char[x];
               return a;
       }
       catch
       {
               return null;
       }
 }
 
 and consider that the input might be 0x7FFFFFFF.
 
 I think it looks less easy to optimise now.

Ok, now that's impure, since now you are using the memory by returning it. This is really not side effect free. You are draining a global resource, which is a no-no for pure functions. All the discussions and points raised by the other people now apply. Best Regards Ingo Oeser
Sep 18 2007
next sibling parent reply 0ffh <spam frankhirsch.net> writes:
Ingo Oeser wrote:
 Janice Caron wrote:
 
 OK, so change it to

 char[] probablyNotPure(int x)
 [...]
 I think it looks less easy to optimise now.

Ok, now that's impure, since now you are using the memory by returning it. This is really not side effect free. You are draining a global resource, which is a no-no for pure functions. All the discussions and points raised by the other people now apply.

I am afraid I must disagree. We're talking functional purity in a programming context, not maths. It is perfectly allowable in a purely functional language to return a pointer to a heap allocated data structure. Regards, frank
Sep 18 2007
parent Ingo Oeser <ioe-news rameria.de> writes:
0ffh wrote:
 I am afraid I must disagree.
 We're talking functional purity in a programming context, not maths.
 It is perfectly allowable in a purely functional language to return
 a pointer to a heap allocated data structure.

Sorry, mixed that up with const functions. Those (const functions) only depend on their parameters and have no effect besides their return value. At least GCC defines them this way. Sorry again. Best Regards Ingo Oeser
Sep 18 2007
prev sibling parent Nathan Reed <nathaniel.reed gmail.com> writes:
Ingo Oeser wrote:
 Ok, now that's impure, since now you are using the memory by returning it.
 This is really not side effect free. You are draining a global resource,
 which is a no-no for pure functions. All the discussions and points
 raised by the other people now apply.

I think if you take this point of view, you have to conclude that even a function that allocates memory that becomes garbage by the end of the function is strictly not side-effect-free, since now some other code that is executing simultaneously with our "pure" function could run out of memory, where it wouldn't have otherwise. In other words, memory allocation in a pure function would have to be totally disallowed, drastically reducing the usefulness of pure functions. In fact, following the same line of reasoning, the space consumed by the *stack frame* of a pure function would make it strictly not side-effect-free. Memory allocation shouldn't be an impediment to making a function pure. If we actually run out of memory, we've got bigger problems than breaking the purity of the functions. Thanks, Nathan Reed
Sep 18 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/18/07, Ingo Oeser <ioe-news rameria.de> wrote:
 A valid optimisation is to assume that x -> a.length, since a.length
 is directly derived from x. If you don't use (e.g. reference) the memory
 you are allocating, the compiler should be free to optimise the allocation
 away.

OK, so change it to char[] probablyNotPure(int x) { try { char[] a = new char[x]; return a; } catch { return null; } } and consider that the input might be 0x7FFFFFFF. I think it looks less easy to optimise now.
Sep 18 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/18/07, 0ffh <spam frankhirsch.net> wrote:
 Janice Caron wrote:
 Is this function pure?

 int probablyNotPure(int x)
 {
       try
       {
               char[] a = new char[x];
               return a.length;
       }
       catch
       {
               return 0;
       }
 }

Functional purity has only these conditions: * The result is a function of the parameters, and only of the parameters. * Variables can be created and duplicated, but not changed. I don't see your example violating these conditions.

You don't? The way I see it, if the input is 5, then output might sometimes be 5 and sometimes be 0. That would seem to violate your first bullet-point, would it not?
Sep 18 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/18/07, 0ffh <spam frankhirsch.net> wrote:
 Therefore: In case of errors, there's an exception!
 (And, in this case, a double entendre. :-)

I don't think it's that simple. Exceptions should be allowed as flow control structures even in pure code, and don't necessarily represent an error. For example, a (pure) divide function might legitimately throw a divide-by-zero exception, and then a (also pure) fixed-point saturating divide function might legitimately call the first divide function, catch the exception, and return the saturation value. None of that violates purity. I don't think you can ban try/catch in pure code, nor assume that exception==error. Of course, only Walter can tell us the /real/ (as in, will-be-implemented) answer!
Sep 18 2007
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/19/07, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:
 Can you give another example? The one you gave:

 int definitelyNotPure()
 {
     int x;
     return cast(int)&x;
 }

is invalid even for non-pure functions.

If you insist... int definitelyNotPure() { auto x = new int[1]; return cast(int)&x; }
Sep 18 2007
prev sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 9/19/07, Nathan Reed <nathaniel.reed gmail.com> wrote:
 CTFE is
 basically like running an interpreter, right?

I presume that stands for Compile Time Function Execution and not chlorotrifluoroethylene, right? :-) My feeling is that Ingo is basically right, in that Walter needs to assume that new can never fail - that it can successfully allocate infinite amounts of memory without causing trouble. If that assumption holds (...obviously it doesn't, but bear with me...) then new is allowed. Otherwise the simplest things would be banned: a function which does toString (e.g. that turns 42 into "42") would be banned; a function which adds a passed-in element to a passed-in associative array would be banned, and so on. I guess it just has to be allowed. But I still have a problem with it, because such a program is only "logically pure", not "physically pure". And I have to wonder, will be be allowed to write our own "logically pure" functions and declare them pure? How will custom allocators for "new" fit into the picture? A custom allocator must surely access global state? A custom allocator could be written to be (logically) pure, or it could be decidedly impure (for example by initialising the created variables with rand()). Will purity be transitive and as strictly enforced as const? Will we be able to cast away purity when we need to? I guess we'll wait and see, but I would surely like to know.
Sep 19 2007