
digitalmars.D - Purity (D2 standard libraries / object.d)

reply Jason House <jason.james.house gmail.com> writes:
In the latest D2, both the frontend and backend support purity and optimize
accordingly when user code declares functions pure. This isn't particularly
useful while pure functions can't call standard library code.

When considering the standard library changes, I realized that object.d could
change. I believe a pure opCmp or toString could break user code with impure
versions of those functions. Would that kind of a change to object.d cause any
real problems for D2 users?

Are there other ambiguous purity options when converting the standard
libraries? Which parts of Phobos/druntime would you expect to be pure/impure?
Jan 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 When considering the standard library changes, I realized that
 object.d could change. I believe a pure opCmp or toString could break
 user code with impure versions of those functions. Would that kind of
 a change to object.d cause any real problems for D2 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.
Jan 09 2009
next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 When considering the standard library changes, I realized that
 object.d could change. I believe a pure opCmp or toString could break
 user code with impure versions of those functions. Would that kind of
 a change to object.d cause any real problems for D2 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.

If you have:

    class A
    {
        invariant final pure bool opEquals(invariant(Object) other) { ... }
    }

    int main()
    {
        auto a = cast(invariant) new A();
        auto b = cast(invariant) new A();
        if (a == b)
            return 0;
        else
            return 1;
    }

... will the compiler optimize it? Since A.opEquals() is final, there's no polymorphism, so it should be able to, but I don't know about the implementation.
Jan 09 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 ... will the compiler optimize it? Since A.opEquals() is final,
 there's no polymorphism, so it should be able to, but I don't know
 about the implementation.

It should optimize it.
Jan 09 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 When considering the standard library changes, I realized that
 object.d could change. I believe a pure opCmp or toString could break
 user code with impure versions of those functions. Would that kind of
 a change to object.d cause any real problems for D2 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.

One could easily generalize that to mean non-final virtual functions should never be pure. Memoization is a pretty basic need. I thought that was in D's future. Is that not true?
Jan 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 When considering the standard library changes, I realized that 
 object.d could change. I believe a pure opCmp or toString could
 break user code with impure versions of those functions. Would
 that kind of a change to object.d cause any real problems for D2
 users?

 As Andrei pointed out, the trouble with making the Object functions
 pure is if you want to do an override that caches its value.

One could easily generalize that to mean non-final virtual functions should never be pure. Memoization is a pretty basic need. I thought that was in D's future. Is that not true?

Memoization and purity don't mix. For each function, you'll have to pick which way you want to go. I don't see what that has to do with D's future, though, the language allows either.
Jan 09 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright wrote:

 Memoization and purity don't mix. 

True, but...

All pure functions can be memoized. They can be memoized by the caller of the pure function, or the pure function could be rewritten to automatically memoize.

When writing a pure but potentially costly function, the coder may consider memoization. They must predict how frequently the function would get called with the same input. They must also decide if the time/space trade-off is worth it for the program as a whole and how inconvenient it would be to call the function. If it's called from another pure function, well... they don't mix. All in all, memoization is a tough call to make.

For library writers, it's even tougher. They may not know how their code would get used and may be unable to make the call about memoization. It's possible to provide a pure function and a memoized alternative. Of course, there are many memoization options available, especially when memory space is restricted or various thread safety/locking methods may be desired. A library writer may just declare that if the user wants memoization, they should use a memoization library to do it.

Anyone who writes inheritable classes also has to predict whether the writers of derived classes will want to memoize or not. Some may forbid it because then they can declare the function pure. Others will feel like they can't declare any virtual functions as pure (and therefore allow memoization).
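The "provide a pure function and a memoized alternative" option can be sketched in present-day D. This is only an illustration: `costly` and `memoizedCostly` are made-up names, and note that the wrapper itself cannot be marked pure, since it mutates its cache.

```d
import std.stdio;

// The pure function itself, as a library might export it.
pure int costly(int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
        sum += i;
    return sum;
}

// A memoized alternative maintained alongside it. It cannot itself
// be pure, because it mutates the cache.
int memoizedCostly(int n)
{
    static int[int] cache; // static data is thread-local in D2
    if (auto p = n in cache)
        return *p;
    const r = costly(n);
    cache[n] = r;
    return r;
}

void main()
{
    writeln(memoizedCostly(10)); // computed: 45
    writeln(memoizedCostly(10)); // served from the cache: 45
}
```

The caller picks whichever entry point suits the usage pattern; only the unmemoized one remains callable from other pure code.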
 For each function, you'll have to pick
 which way you want to go. I don't see what that has to do with D's
 future, though, the language allows either.

Somehow, I had the impression that D might use memoization as some kind of optimization once pure functions are embraced. That was probably a misunderstanding on my part. I do think that writers of generic/library/inheritable code will continuously revisit the question of memoization. That may affect the future of D usage, but not the underlying language specification.
Jan 09 2009
next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Somehow, I had the impression that D might use memoization as some
 kind of optimization once pure functions are embraced.  That was
 probably a misunderstanding on my part.

Another word for that is "common subexpressions". And yes, the compiler has the option to do that, and in fact does do that (in a small way) in the latest release. But what I was talking about is the user himself doing memoization.
Jan 09 2009
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-09 20:45:57 -0500, Jason House <jason.james.house gmail.com> said:

 For each function, you'll have to pick
 which way you want to go. I don't see what that has to do with D's
 future, though, the language allows either.

Somehow, I had the impression that D might use memoization as some kind of optimization once pure functions are embraced. That was probably a misunderstanding on my part.

Hum, could the compiler be trusted to add the memoization code to pure functions so they can stay pure? Something like this:

    pure memoize int costlyCalculus(int i, int j)
    {
        return i + j;
    }

being transformed into this by the compiler?

    int[Tuple!(int, int)] _costlyCalculus_cache;

    pure int costlyCalculus(int i, int j)
    {
        {
            int* _found_result = Tuple!(i, j) in _costlyCalculus_cache;
            if (_found_result)
                return *_found_result;
        }
        {
            int _calculated_result = i + j;
            _costlyCalculus_cache[Tuple!(i, j)] = _calculated_result;
            return _calculated_result;
        }
    }

Surely this way the compiler would be able to bypass purity checks for accessing the cache while still ensuring that the function has no side effects and always returns the same result for the same arguments. The downside being that a hash table may not always be the best memoization technique.

Perhaps then it belongs more in the standard library:

    pure int costlyCalculus(int i, int j);
    Memoizer!(costlyCalculus) memoizedCostlyCalculus;

where Memoizer is a struct template defining a pure opCall and containing the cache, which it can access by casting away purity in a few places. But it makes it more difficult to add memoization while overriding a function.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
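A minimal compilable sketch of such a Memoizer, in present-day D. The `Memoizer` template here is hypothetical, not an existing Phobos facility, and this version sidesteps the purity question entirely: its opCall is an ordinary impure method rather than a pure one that casts away purity.

```d
import std.stdio;
import std.typecons : Tuple, tuple;

pure int costlyCalculus(int i, int j) { return i + j; }

// Hypothetical Memoizer wrapper: holds the cache and forwards to fn.
struct Memoizer(alias fn)
{
    private int[Tuple!(int, int)] cache; // keyed on the argument pair

    int opCall(int i, int j)
    {
        auto key = tuple(i, j);
        if (auto p = key in cache)
            return *p;
        const r = fn(i, j);
        cache[key] = r;
        return r;
    }
}

void main()
{
    Memoizer!costlyCalculus memoized;
    writeln(memoized(2, 3)); // 5, computed then cached
    writeln(memoized(2, 3)); // 5, straight from the cache
}
```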
Jan 09 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values. The problem is identifying if this would be faster than recomputing the return value.
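Walter's scheme can be sketched by hand: pack the raw argument bits into a single key and use it to index an associative array of return values. This is an illustration only; `expensive` is a made-up function, and the bit-packing is only sound when the arguments carry no indirection.

```d
import std.stdio;

pure long expensive(int a, int b) { return cast(long) a * b; }

// What compiler-inserted memoization might amount to: the two
// 32-bit arguments become one 64-bit AA key.
long memoized(int a, int b)
{
    static long[ulong] cache; // thread-local
    const ulong key = (cast(ulong) cast(uint) a << 32) | cast(uint) b;
    if (auto p = key in cache)
        return *p;
    const r = expensive(a, b);
    cache[key] = r;
    return r;
}

void main()
{
    writeln(memoized(6, 7)); // 42, computed
    writeln(memoized(6, 7)); // 42, looked up by argument bits
}
```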
Jan 09 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values. The problem is identifying if this would be faster than recomputing the return value.

What about a compiler hint similar to inline or register?
Jan 09 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

What about a compiler hint similar to inline or register?

Maybe making a conscious effort to leave a language feature as a last-resort option would be better. Creating a memoization feature is giving someone a fish; creating a language that allows implementing memoization is teaching them to fish. :o/

Andrei
Jan 09 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Jason House wrote:
 Walter Bright Wrote:
 
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

What about a compiler hint similar to inline or register?

Maybe making a conscious effort to leave the language feature as a last resort action would be better. Creating a memoization feature is a fish, creating a language that allows implementing memoization is fishing. :o/ Andrei

I chose nearly dead keywords for a reason ;) If I got Walter to think about how to implement "pure" memoizers for an extra half hour, I will consider myself highly successful. I must leave the fishing discussion up to the two of you in a coffee shop! (I'd love to sit in and participate with that discussion, but online forums make the conversation far too difficult)
Jan 10 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 I chose nearly dead keywords for a reason ;) If I got Walter to think
 about how to implement "pure" memoizers for an extra half hour, I
 will consider myself highly successful. I must leave the fishing
 discussion up to the two of you in a coffee shop! (I'd love to sit in
 and participate with that discussion, but online forums make the
 conversation far too difficult)

Coffee shop discussions are certainly the best.
Jan 10 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure
 functions so they can stay pure?

 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

Wouldn't this also cause threading issues? The obvious solution would be to use TLS, but this requires duplicating the cache across threads. Also, using AAs internally like this would lead to very deeply hidden memory allocations, and therefore possibly more frequent GC, etc.
Jan 09 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-09 23:14:07 -0500, dsimcha <dsimcha yahoo.com> said:

 Wouldn't this also cause threading issues?

Well, I'd trust the compiler-generated code would not cause threading issues. :-)
 The obvious solution would be to use TLS, but this requires duplicating 
 the cache across threads.

Well, there aren't many choices. Either it's TLS, or the cache needs to be synchronized (using either locks or lock-free algorithms; both will add some overhead anyway).

That said, if you memoize a member function of a class, the memoization data could be added to the class and not be global. If the object is not shared across threads, then the memoization data isn't either. If the object is shared, then you'd need either TLS or synchronization.
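The per-object variant can be sketched like this (all names are illustrative; a single cached value plus a validity flag stands in for a real memo table):

```d
// Memoization data stored in the object itself rather than globally.
// If the object isn't shared across threads, neither is its cache,
// so no synchronization is needed.
class Polygon
{
    private double[] sides;
    private double perimeterCache;
    private bool cacheValid;

    this(double[] s) { sides = s.dup; }

    double perimeter()
    {
        if (!cacheValid)
        {
            perimeterCache = 0;
            foreach (s; sides)
                perimeterCache += s;
            cacheValid = true;
        }
        return perimeterCache;
    }
}

void main()
{
    auto p = new Polygon([3.0, 4.0, 5.0]);
    assert(p.perimeter() == 12.0); // computed once
    assert(p.perimeter() == 12.0); // second call hits the cache
}
```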
 Also, using AAs internally like this would lead to very deeply hidden 
 memory allocations, and
 therefore possibly more frequent GC, etc.

Perhaps the cache needs to be a little smarter than a regular AA. You may not want to keep each and every value that was computed. Depending on the situation, keeping only the 100 last results may be enough, in which case you can dedicate a fixed amount of memory for caching.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
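One possible shape for such a smarter cache, assuming a simple oldest-first eviction policy (the `BoundedCache` type and its API are invented for illustration):

```d
// A cache that keeps at most `limit` results, evicting the oldest
// entry first, so memory use stays fixed regardless of how many
// distinct inputs the function sees.
struct BoundedCache(K, V)
{
    private V[K] map;
    private K[] order; // insertion order, oldest first
    private size_t limit;

    this(size_t limit) { this.limit = limit; }

    V* lookup(K key) { return key in map; }

    void store(K key, V value)
    {
        if (order.length >= limit)
        {
            map.remove(order[0]); // evict the oldest entry
            order = order[1 .. $];
        }
        map[key] = value;
        order ~= key;
    }
}

void main()
{
    auto cache = BoundedCache!(int, int)(2);
    cache.store(1, 10);
    cache.store(2, 20);
    cache.store(3, 30);            // evicts key 1
    assert(cache.lookup(1) is null);
    assert(*cache.lookup(3) == 30);
}
```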
Jan 10 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

Wouldn't this also cause threading issues? The obvious solution would be to use TLS, but this requires duplicating the cache across threads. Also, using AAs internally like this would lead to very deeply hidden memory allocations, and therefore possibly more frequent GC, etc.

Immutable data doesn't have threading issues, which is the beauty of them.
Jan 10 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

 Wouldn't this also cause threading issues? The obvious solution would be
 to use TLS, but this requires duplicating the cache across threads. Also,
 using AAs internally like this would lead to very deeply hidden memory
 allocations, and therefore possibly more frequent GC, etc.

 Immutable data doesn't have threading issues, which is the beauty of them.

You misunderstood the question. I mean, at the implementation level, even though it wouldn't be visible to the programmer, the AA that did the memoization would have to be mutable so it could be modified when a new value was to be memoized. This is where the threading issues would come in. What I was saying is that synchronizing on this would be bad for obvious reasons, and TLS would still not be great because you wouldn't be able to share memoized values across threads.
Jan 10 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 You misunderstood the question. I mean, at the implementation level, even
 though it wouldn't be visible to the programmer, the AA that did the
 memoization would have to be mutable so it could be modified when a new
 value was to be memoized. This is where the threading issues would come
 in. What I was saying is that synchronizing on this would be bad for
 obvious reasons, and TLS would still not be great because you wouldn't
 be able to share memoized values across threads.

You're right, I hadn't thought of that. But you could do a per-thread memoization using thread local storage. No sync problems.
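The per-thread approach can be sketched as follows. In D2, static and module-level data are thread-local by default, so no synchronization is needed; the `misses` counter below is only there to make the per-thread duplication of the cache visible.

```d
import core.thread;

int misses; // thread-local: counts cache misses in this thread only

int memoizedSquare(int n)
{
    static int[int] cache; // one cache per thread, no sync needed
    if (auto p = n in cache)
        return *p;
    ++misses;
    const r = n * n;
    cache[n] = r;
    return r;
}

void main()
{
    memoizedSquare(3);
    memoizedSquare(3);
    assert(misses == 1); // second call hit this thread's cache

    auto t = new Thread({
        memoizedSquare(3);
        // A fresh cache (and counter) in the new thread: the value
        // memoized in the main thread is not shared.
        assert(misses == 1);
    });
    t.start();
    t.join();
}
```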
Jan 10 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!) And it's not only a simple associative array. It could also be a dense array, a little interpolation engine, etc.
 The problem is identifying if this would be faster than recomputing the 
 return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

Andrei
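Andrei doesn't show his memoizer, but an interpolation-based one for exp might look roughly like this. The range, table size, and tolerance are arbitrary choices for the sketch: precompute the function once over a fixed interval, then answer any query in that interval by linear interpolation between the two nearest table entries.

```d
import std.math : abs, exp;

enum double lo = 0.0, hi = 1.0;
enum size_t steps = 1024;
double[steps + 1] table;

static this()
{
    // Fill the table once per thread at startup.
    foreach (i, ref v; table)
        v = exp(lo + (hi - lo) * i / steps);
}

double fastExp(double x)
{
    assert(x >= lo && x <= hi);
    const pos = (x - lo) / (hi - lo) * steps;
    const i = cast(size_t) pos;
    if (i >= steps)
        return table[steps];
    const frac = pos - i;
    // Linear interpolation between adjacent table entries.
    return table[i] * (1 - frac) + table[i + 1] * frac;
}

void main()
{
    assert(abs(fastExp(0.5) - exp(0.5)) < 1e-5);
}
```

Unlike an exact-input cache, this never misses inside its range, which is why it can pay off even when the exact same input rarely recurs.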
Jan 09 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing the 
 return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely uses the same inputs, your memoizing functions will just waste time and memory.

So to determine if a function is worth memoizing, you have to say yes to these two things:

1. Is it faster than recomputing the return value?
2. Is this function called often with the same inputs?

While the compiler could take an educated guess at answering 1 given the code of a function, question 2 can only be answered by knowing the usage pattern of various functions.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Jan 10 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:

 So to determine if a function is worth memoizing, you have to say yes 
 to those two things:
 1.	is it faster than recomputing the return value?
 2.	is this function called often with the same inputs?
 While the compiler could take an educated guess at answering 1 given 
 the code of a function, question 2 can only be answered by knowing the 
 usage pattern of various functions.

GCC has profile-driven optimization. I presume it may help in such problems too.
 Perhaps the cache needs to be a little smarter than a regular AA. You
 may not want to keep each and every value that was computed. Depending
 on the situation, keeping only the 100 last results may be enough, in
 which case you can dedicate a fixed amount of memory for caching.

Right. There are time-limited memoization strategies, and other ones as well. In Python I have seen plenty of them: http://code.activestate.com/recipes/325905/ http://code.activestate.com/recipes/498110/ Bye, bearophile
Jan 10 2009
prev sibling next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

 That's only true because, in your specific context, those functions
 were called often with the same input. In a program that rarely uses
 the same inputs, your memoizing functions will just waste time and
 memory.

 So to determine if a function is worth memoizing, you have to say yes
 to these two things:
 1. Is it faster than recomputing the return value?
 2. Is this function called often with the same inputs?

 While the compiler could take an educated guess at answering 1 given
 the code of a function, question 2 can only be answered by knowing the
 usage pattern of various functions.

... so let's tell it what the usage patterns are!

1. Compile with profiling.

   $ dmd -profile pure_program.d

2. Feed the profiled executable a big gob of test data. Om nom nom.

   $ for I in {0..99}; do ./pure_program < test_data_$I.dat; done

3. Recompile, giving the compiler the trace log, telling it which functions got called frequently, and where time was spent.

   $ dmd -trace=trace.log pure_program.d

Voilà; now the compiler can make a well-informed guess at compile time. All it really lacks is information on how many distinct argument sets any given pure function gets, but I imagine that could be added.

I think feeding the program test data with profiling enabled and then re-compiling would be a very acceptable trade-off.

-- Daniel
Jan 10 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation. Andrei
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.

That's way beyond the ability of a compiler to do automatically. The compiler would have to understand that the pure function produces continuous results.
Jan 10 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.

That's way beyond the ability of a compiler to do automatically. The compiler would have to understand that the pure function produces continuous results.

You're replying to the wrong guy. I'm saying: the compiler shouldn't have to do so, but it should allow functions to do it.

Lately it looks like a lot of the problems discussed here inevitably lead to a built-in feature. We want properties, let's have a property thingy. We want memoization, let's have a memoize thingy. We want optimally aligned structures, let's have an optimal align thingy. I'm holding my breath for a request for the kitchen sink thingy.

Andrei
Jan 10 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 That's way beyond the ability of a compiler to do automatically. The
 compiler would have to understand that the pure function produces
 continuous results.

You're replying to the wrong guy. I'm saying: the compiler shouldn't have to do so, but it should allow functions to do it.

Here's the simplest memoization example I could come up with. Maybe we should try to figure out how to get this to work and then expand to other cases?

/** Fibonacci number generator based on
 * http://en.wikipedia.org/wiki/Fibonacci_number
 **/
pure int fibonacci(int n)
in { assert( n >= 0 && n < 10 ); }
out(result) { assert( result >= 0 ); }
body {
    static __thread int[10] cache = [0,1,-1,-1,-1,-1,-1,-1,-1,-1];
    if (cache[n] < 0)
        cache[n] = fibonacci(n-1) + fibonacci(n-2);
    return cache[n];
}

unittest {
    assert(fibonacci(4) == 3);
    assert(fibonacci(9) == 34);
}

Note the thread-local static variable. That's guaranteed to be thread safe, and it's easy to prove that it has no side effects outside of the fibonacci function.

Inevitably, I've embarrassed myself by not actually passing what I thought was simple code through a compiler...
 Lately it looks like a lot of the problems discussed here inevitably
 lead to a built-in feature. We want properties, let's have a property
 thingy. We want memoization, let's have a memoize thingy. We want
 optimally aligned structures, let's have an optimal align thingy. I'm
 holding my breath for a request for the kitchen sink thingy.

That kitchen sink thingy sounds useful. What does it do? ;)
Jan 11 2009
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Jason House wrote:
 Andrei Alexandrescu wrote:
 
 That's way beyond the ability of a compiler to do automatically.  
 The compiler would have to understand that the pure function 
 produces continuous results.

shouldn't have to do so, but it should allow functions to do it.


So effectively, the compiler could just trust pure functions to be actually pure, with undefined behaviour for any that isn't?
 Here's the simplest memoization example I could come up with.

You mean the simplest example beyond just remembering a single property value?
 Maybe we should try to figure out how to get this to work and then
 expand to other cases?

 Note the thread-local static variable.  That's guaranteed to be
 thread safe and is easy to prove that it has no side effects outside
 of the fibonacci function.
 
 Inevitably, I've embarrassed myself by not actually passing what I
 thought was simple code through a compiler...

Indeed. Stewart.
Jan 11 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Bill Baxter wrote:
 But the call for a memoization thingy I just don't get.  No way that
 should be a compiler feature, just far too many ways to memoize with
 too many different tradeoffs.  And a memoizing wrapper function
 basically loses nothing in terms of expressiveness.
 
 --bb

The ability for you to memoize a pure function and get a pure function as a result. Without that, your pure functions can't use the memoized version of the pure function, which means you might have to use a lot more memoization wrappers.
Jan 11 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to 
 pure functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

You're right, the "just the bits on the stack" is if the arguments are immutable. Const arguments would have to include whatever was indirectly referenced, which I argue is impractical for the runtime to handle automatically.
Jan 10 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to
 pure functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

 You're right, the "just the bits on the stack" is if the arguments are
 immutable. Const arguments would have to include whatever was indirectly
 referenced, which I argue is impractical for the runtime to handle
 automatically.

Even if the bits are immutable, I can't see how this could work. What if something is GC'd and then that memory location gets overwritten? For example:

import std.contracts, std.stdio;

enum N = 100;

void main() {
    foo();
    bar();
}

uint pureFun(invariant uint[] data) pure {
    // Do stuff.
    return someUint;
}

void foo() {
    uint[] myArray = new uint[N];
    // Fill in myArray.
    invariant uint[] myInvariantArray = assumeUnique(myArray);
    writeln(pureFun(myInvariantArray));
}

void bar() {
    // Same as foo().
}

In this case, foo.myInvariantArray goes out of scope when foo() returns. When bar() allocates an array, it's entirely possible that the GC will give bar.myArray the same memory address as foo.myArray had. In this case, if you're only using stack bits to handle memoization, you're screwed.
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 In this case, foo.myInvariantArray goes out of scope when foo() returns.
 When bar() allocates an array, it's entirely possible that the GC will
 give bar.myArray the same memory address as foo.myArray had. In this
 case, if you're only using stack bits to handle memoization, you're
 screwed.

That's the beauty of a garbage collector <g>. The bits will include a reference to the immutable data, which the gc will see, and so the gc will not reclaim that data.
Jan 10 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 In this case, foo.myInvariantArray goes out of scope when foo() returns.
 When bar() allocates an array, it's entirely possible that the GC will
 give bar.myArray the same memory address as foo.myArray had. In this
 case, if you're only using stack bits to handle memoization, you're
 screwed.

 That's the beauty of a garbage collector <g>. The bits will include a
 reference to the immutable data, which the gc will see, and so the gc
 will not reclaim that data.

Good point, the bits in the internal memoization AA will contain a reference. I forgot about that. This leads to another problem, though: Wouldn't this cause a lot of soft memory leaks, since you're holding references to stuff that's otherwise no longer needed?
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 In this case, foo.myInvariantArray goes out of scope when foo() returns.
 When bar() allocates an array, it's entirely possible that the GC will
 give bar.myArray the same memory address as foo.myArray had. In this
 case, if you're only using stack bits to handle memoization, you're
 screwed.

 That's the beauty of a garbage collector <g>. The bits will include a
 reference to the immutable data, which the gc will see, and so the gc
 will not reclaim that data.

Good point, the bits in the internal memoization AA will contain a reference. I forgot about that. This leads to another problem, though: Wouldn't this cause a lot of soft memory leaks, since you're holding references to stuff that's otherwise no longer needed?

The job of managing those kinds of design tradeoffs is something that the programmer should handle.
Jan 10 2009
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Walter Bright wrote:
 dsimcha wrote:
 Good point, the bits in the internal memoization AA will contain a 
 reference.  I
 forgot about that.  This leads to another problem, though:  Wouldn't 
 this cause a
 lot of soft memory leaks, since you're holding references to stuff that's
 otherwise no longer needed?

The job of managing those kinds of design tradeoffs is something that the programmer should handle.

*cough* And with weak pointer/reference types we could handle it quite nicely. *cough* ;)
Jan 12 2009
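The weak-reference approach suggested above can be sketched quickly. This is an illustrative Python sketch (standing in for a hypothetical weak pointer type in D): the cache holds its results weakly, so once no one else references an entry, the collector is free to reclaim it and the cache no longer pins otherwise-dead data.

```python
import gc
import weakref

# Hypothetical cached-result type; the name is illustrative only.
class Result:
    def __init__(self, value):
        self.value = value

# A cache whose values are held weakly: entries disappear once the last
# strong reference to the stored result is gone.
cache = weakref.WeakValueDictionary()

r = Result(42)
cache["key"] = r
assert cache["key"].value == 42   # entry is alive while a strong ref exists

del r                             # drop the last strong reference
gc.collect()                      # the cache entry is now eligible to vanish
```

This trades cache hit rate for bounded memory: a hot entry that something still references stays cached, while abandoned results stop leaking.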
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

You're right, the "just the bits on the stack" is if the arguments are immutable. Const arguments would have to include whatever was indirectly referenced, which I argue is impractical for the runtime to handle automatically.

Bits on the stack also do not work for immutable reference arguments. Imagine two immutable strings that are exactly the same data, but have different pointers. You wouldn't want to memoize only on the pointer values (or you'd lose cache hits for equal data at different addresses). I think bits on the stack only work for non-reference arguments.

It sounds like memoization is a non-automatic feature in any case. Which is a shame, because I thought it was a common pure-function optimization, by the way it was discussed in the past.

-Steve
Jan 12 2009
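The pitfall Steve describes can be made concrete. A hedged sketch in Python (names are illustrative): a cache keyed on the object's identity, analogous to keying on "the bits on the stack" for a reference argument, recomputes for equal-but-distinct inputs that a value-keyed cache would serve from cache.

```python
calls = []          # records every actual computation, for demonstration
cache_by_id = {}

def length_by_identity(s):
    # Analogous to memoizing on the pointer bits: the key is the object's
    # address (id), not its contents.
    key = id(s)
    if key not in cache_by_id:
        calls.append(s)
        cache_by_id[key] = len(s)
    return cache_by_id[key]

a = "immutable data"
b = "".join(["immutable", " ", "data"])  # equal contents, distinct object
length_by_identity(a)
length_by_identity(b)
# Two computations happened even though the inputs compare equal: an
# identity-keyed cache misses hits that a value-keyed cache would get.
```

Keying on values instead fixes the misses but requires a deep equality/hash over the referenced data, which is exactly why automatic memoization of reference arguments is hard.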
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-09 22:41:01 -0500, Walter Bright <newshound1 digitalmars.com> said:

 If the compiler does general memoization on pure functions, all it has 
 to do is use the bits of the arguments passed on the stack to the 
 function as an index into an associative array of the return values.

True, but only in the absence of pointers to mutable data (pointers to immutable data should be fine). (Hum, and beware of moving GCs, and ignore any memoization data which may be present inside the arguments... isn't determining equality a job for opEquals?)
 The problem is identifying if this would be faster than recomputing the 
 return value.

That's why I was using a "memoized" attribute in the function declaration, so the programmer decides whether this function needs memoization or not. I know it's often a tough decision even for the designer of a function, but if adding memoization becomes easy, as a library designer you could just not worry about it and let users create a memoization wrapper when necessary.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Jan 10 2009
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Walter, before you go and implement any of this, I must point out that it's spelt "memorization" and "memorize". (Or "memorisation" and "memorise" if you're British, but that's an aside.) Stewart.
Jan 10 2009
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Stewart Gordon wrote:
<snip>
 Walter, before you go and implement any of this, I must point out that 
 it's spelt "memorization" and "memorize".  (Or "memorisation" and 
 "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.
Jan 10 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Stewart Gordon wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out that 
 it's spelt "memorization" and "memorize".  (Or "memorisation" and 
 "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

I thought you were joking. What's confusing you?
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Christopher Wright wrote:
 Stewart Gordon wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out 
 that it's spelt "memorization" and "memorize".  (Or "memorisation" 
 and "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

I thought you were joking. What's confusing you?

I forgot.
Jan 10 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 Christopher Wright wrote:
 Stewart Gordon wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out 
 that it's spelt "memorization" and "memorize".  (Or "memorisation" 
 and "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

I thought you were joking. What's confusing you?

I forgot.

Walter Bright is the new Stewart Gordon!
Jan 11 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-01-09 20:45:57 -0500, Jason House <jason.james.house gmail.com> 
 said:
 
 For each function, you'll have to pick
 which way you want to go. I don't see what that has to do with D's
 future, though, the language allows either.

Somehow, I had the impression that D might use memoization as some kind of optimization once pure functions are embraced. That was probably a misunderstanding on my part.

Hum, could the compiler be trusted to add the memoization code to pure functions so they can stay pure? Something like this:

pure memoize int costlyCalculus(int i, int j) {
    return i + j;
}

being transformed into this by the compiler?

int[Tuple!(int, int)] _costlyCalculus_cache;

pure int costlyCalculus(int i, int j) {
    {
        // The "in" lookup yields a pointer that must be dereferenced.
        int* _found_result = Tuple!(i, j) in _costlyCalculus_cache;
        if (_found_result)
            return *_found_result;
    }
    {
        int _calculated_result = i + j;
        _costlyCalculus_cache[Tuple!(i, j)] = _calculated_result;
        return _calculated_result;
    }
}

Surely this way the compiler would be able to bypass purity checks for accessing the cache while still ensuring that the function has no side effects and always returns the same result for the same arguments. The downside being that a hash table may not always be the best memoization technique.

Perhaps then it belongs more in the standard library:

pure int costlyCalculus(int i, int j);
Memoizer!(costlyCalculus) memoizedCostlyCalculus;

where Memoizer is a struct template defining a pure opCall and containing the cache it can access by casting away purity in a few places. But it makes it more difficult to add memoization while overriding a function.

Yes! Memoization is One Great Litmus Test of compile-time reflection. I implemented a couple of simple memoizers along the lines you mention, and I plan to make memoization examples part of TDPL. The Even Greater Litmus Test would be to expose the memoizer itself as a pure function (which it essentially is). Andrei
Jan 09 2009
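The library-side Memoizer idea from this subthread can be sketched minimally. This is a hedged Python sketch, not the D implementation under discussion; the names (Memoizer, costly_calculus) are illustrative. The wrapper holds a private cache keyed on the full argument values, so equal calls share one entry.

```python
class Memoizer:
    """Wrap a (conceptually pure) function in an object with a private cache."""

    def __init__(self, fn):
        self._fn = fn
        self._cache = {}
        self.computations = 0    # instrumentation, for demonstration only

    def __call__(self, *args):
        # Key on argument *values*, so equal arguments hit the same entry.
        if args not in self._cache:
            self.computations += 1
            self._cache[args] = self._fn(*args)
        return self._cache[args]

def costly_calculus(i, j):
    return i + j

memoized = Memoizer(costly_calculus)
```

The purity question the thread raises is exactly visible here: the wrapper mutates its cache on a miss, so it is only "pure" from the caller's observable point of view.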
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Jan 11, 2009 at 11:43 PM, Christopher Wright <dhasenan gmail.com> wrote:
 Bill Baxter wrote:
 But the call for a memoization thingy I just don't get.  No way that
 should be a compiler feature, just far too many ways to memoize with
 too many different tradeoffs.  And a memoizing wrapper function
 basically loses nothing in terms of expressiveness.

 --bb

What a wrapper loses is the ability to memoize a pure function and get a pure function as a result. As things stand, your pure functions can't use the memoized version of the pure function. This means you might have to use a lot more memoization wrappers.

Isn't this the same old "logically const" vs "const" argument we went through ages ago? --bb
Jan 11 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Jason House wrote:
 Walter Bright Wrote:

 Jason House wrote:
 When considering the standard library changes, I realized that 
 object.d could change. I believe a pure opCmp or toString could
 break user code with impure versions of those functions. Would
 that kind of a change to object.d cause any real problems for D2
 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.

One could easily generalize that to mean non-final virtual functions should never be pure. Memoization is a pretty basic need. I thought that was in D's future. Is that not true?

Memoization and purity don't mix. For each function, you'll have to pick which way you want to go. I don't see what that has to do with D's future, though, the language allows either.

Memoization and purity can and should go together. We need to sit down and discuss that someday. Andrei
Jan 09 2009
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
<snip>
 Memoization and purity don't mix. For each function, you'll have to pick 
 which way you want to go. I don't see what that has to do with D's 
 future, though, the language allows either.

Actually, what they don't mix with is the requirement that it still be callable on a mutable object. There's
http://d.puremagic.com/issues/show_bug.cgi?id=1824
and, while it's a nice idea, it precludes caching of the result. Unless you want to keep the cache separate from the object.

A pure toString may be a nice idea; however, this precludes using any mutable members of the class to generate it. But it would also be nice to be able to call such methods on all objects of the class, mutable or not. One possibility is to have separate functions

    string toString()
    string toString() invariant

and let the programmer implement memo(r)ization on the mutable version if desired, whereas the invariant version would have it generated by the compiler. But there are still problems:
- nobody would know which to call if the reference is const, rather than mutable or invariant
- it would be necessary to maintain two versions of toString in parallel

Maybe there's a solution somewhere out there....

Stewart.
Jan 10 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Jan 11, 2009 at 8:51 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.

That's way beyond the ability of a compiler to do automatically. The compiler would have to understand that the pure function produces continuous results.

You're replying to the wrong guy. I'm saying: the compiler shouldn't have to do so, but it should allow functions to do it.

Lately it looks like a lot of the problems discussed here inevitably lead to a built-in feature. We want properties, let's have a property thingy. We want memoization, let's have a memoize thingy. We want optimally aligned structures, let's have an optimal align thingy. I'm holding my breath for a request for the kitchen sink thingy.

Just get Walter to implement AST macros and it should stop. :-) Until then the library solutions are often unpalatable. Like using string mixins to implement properties. Ick. But the call for a memoization thingy I just don't get. No way that should be a compiler feature, just far too many ways to memoize with too many different tradeoffs. And a memoizing wrapper function basically loses nothing in terms of expressiveness. --bb
Jan 10 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 10 Jan 2009 07:14:07 +0300, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values. The problem is identifying if this would be faster than recomputing the return value.

Wouldn't this also cause threading issues? The obvious solution would be to use TLS, but this requires duplicating the cache across threads. Also, using AAs internally like this would lead to very deeply hidden memory allocations, and therefore possibly more frequent GC, etc.

A thread-safe (lock-free?) container would do the trick just fine.
Jan 09 2009
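The shared-container alternative to TLS can be sketched as follows. This is a hedged Python sketch (a lock-guarded cache rather than a true lock-free container, and all names are illustrative): one cache is shared by every thread, avoiding the per-thread duplication that TLS would entail.

```python
import threading

class SharedMemoizer:
    """One memoization cache shared across threads, guarded by a lock."""

    def __init__(self, fn):
        self._fn = fn
        self._cache = {}
        self._lock = threading.Lock()

    def __call__(self, *args):
        with self._lock:            # serialize cache access across threads
            if args not in self._cache:
                self._cache[args] = self._fn(*args)
            return self._cache[args]

square = SharedMemoizer(lambda x: x * x)

results = []
workers = [threading.Thread(target=lambda: results.append(square(7)))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# All four threads observe the same cached value, and only one cache
# entry exists afterward.
```

A lock-free map would remove the serialization cost on hits, which is what the "(lock-free?)" aside is getting at.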
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Jan 11, 2009 at 4:31 AM, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out that
 it's spelt "memorization" and "memorize".  (Or "memorisation" and "memorise"
 if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

Heh heh. Yeh, I remember I thought I'd found a major typo in the CLR algorithms tome when I first ran across that. But it's not a typo. It's "memo-ization" as in writing down a memo for yourself to remind you of the answer later. --bb
Jan 10 2009