
digitalmars.D - Purity (D2 standard libraries / object.d)

reply Jason House <jason.james.house gmail.com> writes:
In the latest D2, both the frontend and backend support purity and optimize
accordingly when user code declares functions pure. This isn't particularly
useful while pure functions can't call standard library code.

When considering the standard library changes, I realized that object.d could
change. I believe a pure opCmp or toString could break user code with impure
versions of those functions. Would that kind of a change to object.d cause any
real problems for D2 users?

Are there other ambiguous purity options when converting the standard
libraries? Which parts of Phobos/druntime would you expect to be pure/impure?
Jan 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 When considering the standard library changes, I realized that
 object.d could change. I believe a pure opCmp or toString could break
 user code with impure versions of those functions. Would that kind of
 a change to object.d cause any real problems for D2 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.
Jan 09 2009
next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 When considering the standard library changes, I realized that
 object.d could change. I believe a pure opCmp or toString could break
 user code with impure versions of those functions. Would that kind of
 a change to object.d cause any real problems for D2 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.

If you have:

    class A
    {
        invariant final pure bool opEquals(invariant(Object) other) { ... }
    }

    int main()
    {
        auto a = cast(invariant) new A();
        auto b = cast(invariant) new A();
        if (a == b)
            return 0;
        else
            return 1;
    }

... will the compiler optimize it? Since A.opEquals() is final, there's no polymorphism, so it should be able to, but I don't know about the implementation.
Jan 09 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 ... will the compiler optimize it? Since A.opEquals() is final,
 there's no polymorphism, so it should be able to, but I don't know
 about the implementation.

It should optimize it.
Jan 09 2009
prev sibling next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 When considering the standard library changes, I realized that
 object.d could change. I believe a pure opCmp or toString could break
 user code with impure versions of those functions. Would that kind of
 a change to object.d cause any real problems for D2 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.

One could easily generalize that to mean non-final virtual functions should never be pure. Memoization is a pretty basic need. I thought that was in D's future. Is that not true?
Jan 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 When considering the standard library changes, I realized that 
 object.d could change. I believe a pure opCmp or toString could
 break user code with impure versions of those functions. Would
 that kind of a change to object.d cause any real problems for D2
 users?

 As Andrei pointed out, the trouble with making the Object functions
 pure is if you want to do an override that caches its value.

One could easily generalize that to mean non-final virtual functions should never be pure. Memoization is a pretty basic need. I thought that was in D's future. Is that not true?

Memoization and purity don't mix. For each function, you'll have to pick which way you want to go. I don't see what that has to do with D's future, though, the language allows either.
Jan 09 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright wrote:

 Memoization and purity don't mix. 

True, but...

All pure functions can be memoized. They can be memoized by the caller of the pure function, or the pure function could be rewritten to automatically memoize.

When writing a pure but potentially costly function, the coder may consider memoization. They must predict how frequently the function would get called with the same input. They must also decide if the time/space trade-off is worth it for the program as a whole and how inconvenient it would be to call the function. If it's called from another pure function, well... they don't mix. All in all, memoization is a tough call to make.

For library writers, it's even tougher. They may not know how their code would get used and may be unable to make the call about memoization. It's possible to provide a pure function and a memoized alternative. Of course, there are many memoization options available, especially when memory space is restricted or various thread safety/locking methods may be desired. A library writer may just declare that if the user wants memoization, they should use a memoization library to do it.

Anyone who writes inheritable classes also has to predict whether the writers of derived classes will want to memoize or not. Some may forbid it because then they can declare the function pure. Others will feel like they can't declare any virtual functions as pure (and therefore allow memoization).
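The "provide a pure function and a memoized alternative" option can be sketched in present-day D. This is only an illustration: `costly` and `memoizedCostly` are made-up names, and note that the wrapper itself cannot be marked pure, since it mutates its cache.

```d
import std.stdio;

// The pure function itself, as a library might export it.
pure int costly(int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
        sum += i;
    return sum;
}

// A memoized alternative maintained alongside it. It cannot itself
// be pure, because it mutates the cache.
int memoizedCostly(int n)
{
    static int[int] cache; // static data is thread-local in D2
    if (auto p = n in cache)
        return *p;
    const r = costly(n);
    cache[n] = r;
    return r;
}

void main()
{
    writeln(memoizedCostly(10)); // computed: 45
    writeln(memoizedCostly(10)); // served from the cache: 45
}
```

The caller picks whichever entry point suits the usage pattern; only the unmemoized one remains callable from other pure code.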
 For each function, you'll have to pick
 which way you want to go. I don't see what that has to do with D's
 future, though, the language allows either.

Somehow, I had the impression that D might use memoization as some kind of optimization once pure functions are embraced. That was probably a misunderstanding on my part. I do think that writers of generic/library/inheritable code will continuously revisit the question of memoization. That may affect the future of D usage, but not the underlying language specification.
Jan 09 2009
next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Somehow, I had the impression that D might use memoization as some
 kind of optimization once pure functions are embraced.  That was
 probably a misunderstanding on my part.

Another word for that is "common subexpressions". And yes, the compiler has the option to do that, and in fact does do that (in a small way) in the latest release. But what I was talking about is the user himself doing memoization.
Jan 09 2009
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-09 20:45:57 -0500, Jason House <jason.james.house gmail.com> said:

 For each function, you'll have to pick
 which way you want to go. I don't see what that has to do with D's
 future, though, the language allows either.

Somehow, I had the impression that D might use memoization as some kind of optimization once pure functions are embraced. That was probably a misunderstanding on my part.

Hum, could the compiler be trusted to add the memoization code to pure functions so they can stay pure? Something like this:

    pure memoize int costlyCalculus(int i, int j)
    {
        return i + j;
    }

being transformed into this by the compiler?

    int[Tuple!(int, int)] _costlyCalculus_cache;

    pure int costlyCalculus(int i, int j)
    {
        {
            int* _found_result = Tuple!(i, j) in _costlyCalculus_cache;
            if (_found_result)
                return *_found_result;
        }
        {
            int _calculated_result = i + j;
            _costlyCalculus_cache[Tuple!(i, j)] = _calculated_result;
            return _calculated_result;
        }
    }

Surely this way the compiler would be able to bypass purity checks for accessing the cache while still ensuring that the function has no side effects and always returns the same result for the same arguments. The downside being that a hash table may not always be the best memoization technique.

Perhaps then it belongs more in the standard library:

    pure int costlyCalculus(int i, int j);
    Memoizer!(costlyCalculus) memoizedCostlyCalculus;

where Memoizer is a struct template defining a pure opCall and containing the cache, which it can access by casting away purity in a few places. But it makes it more difficult to add memoization while overriding a function.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
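A minimal compilable sketch of such a Memoizer, in present-day D. The `Memoizer` template here is hypothetical, not an existing Phobos facility, and this version sidesteps the purity question entirely: its opCall is an ordinary impure method rather than a pure one that casts away purity.

```d
import std.stdio;
import std.typecons : Tuple, tuple;

pure int costlyCalculus(int i, int j) { return i + j; }

// Hypothetical Memoizer wrapper: holds the cache and forwards to fn.
struct Memoizer(alias fn)
{
    private int[Tuple!(int, int)] cache; // keyed on the argument pair

    int opCall(int i, int j)
    {
        auto key = tuple(i, j);
        if (auto p = key in cache)
            return *p;
        const r = fn(i, j);
        cache[key] = r;
        return r;
    }
}

void main()
{
    Memoizer!costlyCalculus memoized;
    writeln(memoized(2, 3)); // 5, computed then cached
    writeln(memoized(2, 3)); // 5, straight from the cache
}
```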
Jan 09 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values. The problem is identifying if this would be faster than recomputing the return value.
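Walter's scheme can be sketched by hand: pack the raw argument bits into a single key and use it to index an associative array of return values. This is an illustration only; `expensive` is a made-up function, and the bit-packing is only sound when the arguments carry no indirection.

```d
import std.stdio;

pure long expensive(int a, int b) { return cast(long) a * b; }

// What compiler-inserted memoization might amount to: the two
// 32-bit arguments become one 64-bit AA key.
long memoized(int a, int b)
{
    static long[ulong] cache; // thread-local
    const ulong key = (cast(ulong) cast(uint) a << 32) | cast(uint) b;
    if (auto p = key in cache)
        return *p;
    const r = expensive(a, b);
    cache[key] = r;
    return r;
}

void main()
{
    writeln(memoized(6, 7)); // 42, computed
    writeln(memoized(6, 7)); // 42, looked up by argument bits
}
```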
Jan 09 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values. The problem is identifying if this would be faster than recomputing the return value.

What about a compiler hint similar to inline or register?
Jan 09 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

What about a compiler hint similar to inline or register?

Maybe making a conscious effort to leave a language feature as a last-resort option would be better. Creating a memoization feature is giving someone a fish; creating a language that allows implementing memoization is teaching them to fish. :o/

Andrei
Jan 09 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Jason House wrote:
 Walter Bright Wrote:
 
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

What about a compiler hint similar to inline or register?

Maybe making a conscious effort to leave the language feature as a last resort action would be better. Creating a memoization feature is a fish, creating a language that allows implementing memoization is fishing. :o/ Andrei

I chose nearly dead keywords for a reason ;) If I got Walter to think about how to implement "pure" memoizers for an extra half hour, I will consider myself highly successful. I must leave the fishing discussion up to the two of you in a coffee shop! (I'd love to sit in and participate with that discussion, but online forums make the conversation far too difficult)
Jan 10 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 I chose nearly dead keywords for a reason ;) If I got Walter to think
 about how to implement "pure" memoizers for an extra half hour, I
 will consider myself highly successful. I must leave the fishing
 discussion up to the two of you in a coffee shop! (I'd love to sit in
 and participate with that discussion, but online forums make the
 conversation far too difficult)

Coffee shop discussions are certainly the best.
Jan 10 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure
 functions so they can stay pure?

 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

Wouldn't this also cause threading issues? The obvious solution would be to use TLS, but this requires duplicating the cache across threads. Also, using AAs internally like this would lead to very deeply hidden memory allocations, and therefore possibly more frequent GC, etc.
Jan 09 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-09 23:14:07 -0500, dsimcha <dsimcha yahoo.com> said:

 Wouldn't this also cause threading issues?

Well, I'd trust the compiler-generated code would not cause threading issues. :-)
 The obvious solution would be to use TLS, but this requires duplicating 
 the cache across threads.

Well, there aren't many choices. Either it's TLS, or the cache needs to be synchronized (using either locks or lock-free algorithms; both will add some overhead anyway).

That said, if you memoize a member function of a class, the memoization data could be added to the class and not be global. If the object is not shared across threads, then the memoization data isn't either. If the object is shared, then you'd need either TLS or synchronization.
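The per-object variant can be sketched like this (all names are illustrative; a single cached value plus a validity flag stands in for a real memo table):

```d
// Memoization data stored in the object itself rather than globally.
// If the object isn't shared across threads, neither is its cache,
// so no synchronization is needed.
class Polygon
{
    private double[] sides;
    private double perimeterCache;
    private bool cacheValid;

    this(double[] s) { sides = s.dup; }

    double perimeter()
    {
        if (!cacheValid)
        {
            perimeterCache = 0;
            foreach (s; sides)
                perimeterCache += s;
            cacheValid = true;
        }
        return perimeterCache;
    }
}

void main()
{
    auto p = new Polygon([3.0, 4.0, 5.0]);
    assert(p.perimeter() == 12.0); // computed once
    assert(p.perimeter() == 12.0); // second call hits the cache
}
```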
 Also, using AAs internally like this would lead to very deeply hidden 
 memory allocations, and
 therefore possibly more frequent GC, etc.

Perhaps the cache needs to be a little smarter than a regular AA. You may not want to keep each and every value that was computed. Depending on the situation, keeping only the 100 last results may be enough, in which case you can dedicate a fixed amount of memory for caching.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
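One possible shape for such a smarter cache, assuming a simple oldest-first eviction policy (the `BoundedCache` type and its API are invented for illustration):

```d
// A cache that keeps at most `limit` results, evicting the oldest
// entry first, so memory use stays fixed regardless of how many
// distinct inputs the function sees.
struct BoundedCache(K, V)
{
    private V[K] map;
    private K[] order; // insertion order, oldest first
    private size_t limit;

    this(size_t limit) { this.limit = limit; }

    V* lookup(K key) { return key in map; }

    void store(K key, V value)
    {
        if (order.length >= limit)
        {
            map.remove(order[0]); // evict the oldest entry
            order = order[1 .. $];
        }
        map[key] = value;
        order ~= key;
    }
}

void main()
{
    auto cache = BoundedCache!(int, int)(2);
    cache.store(1, 10);
    cache.store(2, 20);
    cache.store(3, 30);            // evicts key 1
    assert(cache.lookup(1) is null);
    assert(*cache.lookup(3) == 30);
}
```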
Jan 10 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

Wouldn't this also cause threading issues? The obvious solution would be to use TLS, but this requires duplicating the cache across threads. Also, using AAs internally like this would lead to very deeply hidden memory allocations, and therefore possibly more frequent GC, etc.

Immutable data doesn't have threading issues, which is the beauty of them.
Jan 10 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 If the compiler does general memoization on pure functions, all it has
 to do is use the bits of the arguments passed on the stack to the
 function as an index into an associative array of the return values.
 The problem is identifying if this would be faster than recomputing the
 return value.

 Wouldn't this also cause threading issues? The obvious solution would be
 to use TLS, but this requires duplicating the cache across threads. Also,
 using AAs internally like this would lead to very deeply hidden memory
 allocations, and therefore possibly more frequent GC, etc.

 Immutable data doesn't have threading issues, which is the beauty of them.

You misunderstood the question. I mean, at the implementation level, even though it wouldn't be visible to the programmer, the AA that did the memoization would have to be mutable so it could be modified when a new value was to be memoized. This is where the threading issues would come in. What I was saying is that synchronizing on this would be bad for obvious reasons, and TLS would still not be great because you wouldn't be able to share memoized values across threads.
Jan 10 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 You misunderstood the question. I mean, at the implementation level, even
 though it wouldn't be visible to the programmer, the AA that did the
 memoization would have to be mutable so it could be modified when a new
 value was to be memoized. This is where the threading issues would come
 in. What I was saying is that synchronizing on this would be bad for
 obvious reasons, and TLS would still not be great because you wouldn't
 be able to share memoized values across threads.

You're right, I hadn't thought of that. But you could do a per-thread memoization using thread local storage. No sync problems.
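The per-thread approach can be sketched as follows. In D2, static and module-level data are thread-local by default, so no synchronization is needed; the `misses` counter below is only there to make the per-thread duplication of the cache visible.

```d
import core.thread;

int misses; // thread-local: counts cache misses in this thread only

int memoizedSquare(int n)
{
    static int[int] cache; // one cache per thread, no sync needed
    if (auto p = n in cache)
        return *p;
    ++misses;
    const r = n * n;
    cache[n] = r;
    return r;
}

void main()
{
    memoizedSquare(3);
    memoizedSquare(3);
    assert(misses == 1); // second call hit this thread's cache

    auto t = new Thread({
        memoizedSquare(3);
        // A fresh cache (and counter) in the new thread: the value
        // memoized in the main thread is not shared.
        assert(misses == 1);
    });
    t.start();
    t.join();
}
```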
Jan 10 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!) And it's not only a simple associative array. It could also be a dense array, a little interpolation engine, etc.
 The problem is identifying if this would be faster than recomputing the 
 return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

Andrei
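Andrei doesn't show his memoizer, but an interpolation-based one for exp might look roughly like this. The range, table size, and tolerance are arbitrary choices for the sketch: precompute the function once over a fixed interval, then answer any query in that interval by linear interpolation between the two nearest table entries.

```d
import std.math : abs, exp;

enum double lo = 0.0, hi = 1.0;
enum size_t steps = 1024;
double[steps + 1] table;

static this()
{
    // Fill the table once per thread at startup.
    foreach (i, ref v; table)
        v = exp(lo + (hi - lo) * i / steps);
}

double fastExp(double x)
{
    assert(x >= lo && x <= hi);
    const pos = (x - lo) / (hi - lo) * steps;
    const i = cast(size_t) pos;
    if (i >= steps)
        return table[steps];
    const frac = pos - i;
    // Linear interpolation between adjacent table entries.
    return table[i] * (1 - frac) + table[i + 1] * frac;
}

void main()
{
    assert(abs(fastExp(0.5) - exp(0.5)) < 1e-5);
}
```

Unlike an exact-input cache, this never misses inside its range, which is why it can pay off even when the exact same input rarely recurs.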
Jan 09 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing the 
 return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely uses the same inputs, your memoizing functions will just waste time and memory.

So to determine if a function is worth memoizing, you have to say yes to these two things:

1. Is it faster than recomputing the return value?
2. Is this function called often with the same inputs?

While the compiler could take an educated guess at answering 1 given the code of a function, question 2 can only be answered by knowing the usage pattern of various functions.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Jan 10 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:

 So to determine if a function is worth memoizing, you have to say yes 
 to those two things:
 1.	is it faster than recomputing the return value?
 2.	is this function called often with the same inputs?
 While the compiler could take an educated guess at answering 1 given 
 the code of a function, question 2 can only be answered by knowing the 
 usage pattern of various functions.

GCC has profile-driven optimization. I presume it may help in such problems too.
 Perhaps the cache needs to be a little smarter than a regular AA. You
 may not want to keep each and every value that was computed. Depending
 on the situation, keeping only the 100 last results may be enough, in
 which case you can dedicate a fixed amount of memory for caching.

Right. There are time-limited memoization strategies, and other ones as well. In Python I have seen plenty of them: http://code.activestate.com/recipes/325905/ http://code.activestate.com/recipes/498110/ Bye, bearophile
Jan 10 2009
prev sibling next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

 That's only true because, in your specific context, those functions
 were called often with the same input. In a program that rarely uses
 the same inputs, your memoizing functions will just waste time and
 memory.

 So to determine if a function is worth memoizing, you have to say yes
 to these two things:
 1. Is it faster than recomputing the return value?
 2. Is this function called often with the same inputs?

 While the compiler could take an educated guess at answering 1 given
 the code of a function, question 2 can only be answered by knowing the
 usage pattern of various functions.

... so let's tell it what the usage patterns are!

1. Compile with profiling.

   $ dmd -profile pure_program.d

2. Feed the profiled executable a big gob of test data. Om nom nom.

   $ for I in {0..99}; do ./pure_program < test_data_$I.dat; done

3. Recompile, giving the compiler the trace log, telling it which functions got called frequently, and where time was spent.

   $ dmd -trace=trace.log pure_program.d

Voilà; now the compiler can make a well-informed guess at compile time. All it really lacks is information on how many distinct argument sets any given pure function gets, but I imagine that could be added.

I think feeding the program test data with profiling enabled and then re-compiling would be a very acceptable trade-off.

-- Daniel
Jan 10 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation. Andrei
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.

That's way beyond the ability of a compiler to do automatically. The compiler would have to understand that the pure function produces continuous results.
Jan 10 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing 
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.

That's way beyond the ability of a compiler to do automatically. The compiler would have to understand that the pure function produces continuous results.

You're replying to the wrong guy. I'm saying: the compiler shouldn't have to do so, but it should allow functions to do it.

Lately it looks like a lot of the problems discussed here inevitably lead to a built-in feature. We want properties, let's have a property thingy. We want memoization, let's have a memoize thingy. We want optimally aligned structures, let's have an optimal align thingy. I'm holding my breath for a request for the kitchen sink thingy.

Andrei
Jan 10 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 That's way beyond the ability of a compiler to do automatically. The
 compiler would have to understand that the pure function produces
 continuous results.

You're replying to the wrong guy. I'm saying: the compiler shouldn't have to do so, but it should allow functions to do it.

Here's the simplest memoization example I could come up with. Maybe we should try to figure out how to get this to work and then expand to other cases?

/** Fibonacci number generator based on
 * http://en.wikipedia.org/wiki/Fibonacci_number
 **/
pure int fibonacci(int n)
in { assert( n >= 0 && n < 10 ); }
out(result) { assert( result >= 0 ); }
body {
    static __thread int[10] cache = [0,1,-1,-1,-1,-1,-1,-1,-1,-1];
    if (cache[n] < 0)
        cache[n] = fibonacci(n-1) + fibonacci(n-2);
    return cache[n];
}

unittest {
    assert(fibonacci(4) == 3);
    assert(fibonacci(9) == 34);
}

Note the thread-local static variable. That's guaranteed to be thread safe, and it's easy to prove that it has no side effects outside of the fibonacci function.

Inevitably, I've embarrassed myself by not actually passing what I thought was simple code through a compiler...
 Lately it looks like a lot of the problems discussed here inevitably
 lead to a built-in feature. We want properties, let's have a property
 thingy. We want memoization, let's have a memoize thingy. We want
 optimally aligned structures, let's have an optimal align thingy. I'm
 holding my breath for a request for the kitchen sink thingy.

That kitchen sink thingy sounds useful. What does it do? ;)
Jan 11 2009
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Jason House wrote:
 Andrei Alexandrescu wrote:
 
 That's way beyond the ability of a compiler to do automatically.  
 The compiler would have to understand that the pure function 
 produces continuous results.

shouldn't have to do so, but it should allow functions to do it.


So effectively, the compiler could just trust pure functions to be actually pure, with undefined behaviour for any that isn't?
 Here's the simplest memoization example I could come up with.

You mean the simplest example beyond just remembering a single property value?
 Maybe we should try to figure out how to get this to work and then
 expand to other cases?

 Note the thread-local static variable.  That's guaranteed to be
 thread safe and is easy to prove that it has no side effects outside
 of the fibonacci function.
 
 Inevitably, I've embarrassed myself by not actually passing what I
 thought was simple code through a compiler...

Indeed. Stewart.
Jan 11 2009
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Bill Baxter wrote:
 But the call for a memoization thingy I just don't get.  No way that
 should be a compiler feature, just far too many ways to memoize with
 too many different tradeoffs.  And a memoizing wrapper function
 basically loses nothing in terms of expressiveness.
 
 --bb

The ability for you to memoize a pure function and get a pure function as a result. Without that, your pure functions can't use the memoized version of the pure function, which means you might have to use a lot more memoization wrappers.
Jan 11 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to 
 pure functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

You're right, the "just the bits on the stack" is if the arguments are immutable. Const arguments would have to include whatever was indirectly referenced, which I argue is impractical for the runtime to handle automatically.
Jan 10 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to
 pure functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

 You're right, the "just the bits on the stack" is if the arguments are
 immutable. Const arguments would have to include whatever was indirectly
 referenced, which I argue is impractical for the runtime to handle
 automatically.

Even if the bits are immutable, I can't see how this could work. What if something is GC'd and then that memory location gets overwritten? For example:

import std.contracts, std.stdio;

enum N = 100;

void main() {
    foo();
    bar();
}

uint pureFun(invariant uint[] data) pure {
    // Do stuff.
    return someUint;
}

void foo() {
    uint[] myArray = new uint[N];
    // Fill in myArray.
    invariant uint[] myInvariantArray = assumeUnique(myArray);
    writeln(pureFun(myInvariantArray));
}

void bar() {
    // Same as foo().
}

In this case, foo.myInvariantArray goes out of scope when foo() returns. When bar() allocates an array, it's entirely possible that the GC will give bar.myArray the same memory address as foo.myArray had. In this case, if you're only using stack bits to handle memoization, you're screwed.
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 In this case, foo.myInvariantArray goes out of scope when foo() returns.
 When bar() allocates an array, it's entirely possible that the GC will
 give bar.myArray the same memory address as foo.myArray had. In this
 case, if you're only using stack bits to handle memoization, you're
 screwed.

That's the beauty of a garbage collector <g>. The bits will include a reference to the immutable data, which the gc will see, and so the gc will not reclaim that data.
Jan 10 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 In this case, foo.myInvariantArray goes out of scope when foo() returns.
 When bar() allocates an array, it's entirely possible that the GC will
 give bar.myArray the same memory address as foo.myArray had. In this
 case, if you're only using stack bits to handle memoization, you're
 screwed.

 That's the beauty of a garbage collector <g>. The bits will include a
 reference to the immutable data, which the gc will see, and so the gc
 will not reclaim that data.

Good point, the bits in the internal memoization AA will contain a reference. I forgot about that. This leads to another problem, though: Wouldn't this cause a lot of soft memory leaks, since you're holding references to stuff that's otherwise no longer needed?
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 dsimcha wrote:
 In this case, foo.myInvariantArray goes out of scope when foo() returns.
 When bar() allocates an array, it's entirely possible that the GC will
 give bar.myArray the same memory address as foo.myArray had. In this
 case, if you're only using stack bits to handle memoization, you're
 screwed.

 That's the beauty of a garbage collector <g>. The bits will include a
 reference to the immutable data, which the gc will see, and so the gc
 will not reclaim that data.

Good point, the bits in the internal memoization AA will contain a reference. I forgot about that. This leads to another problem, though: Wouldn't this cause a lot of soft memory leaks, since you're holding references to stuff that's otherwise no longer needed?

The job of managing those kinds of design tradeoffs is something that the programmer should handle.
Jan 10 2009
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Walter Bright wrote:
 dsimcha wrote:
 Good point, the bits in the internal memoization AA will contain a 
 reference.  I
 forgot about that.  This leads to another problem, though:  Wouldn't 
 this cause a
 lot of soft memory leaks, since you're holding references to stuff that's
 otherwise no longer needed?

The job of managing those kinds of design tradeoffs is something that the programmer should handle.

*cough* And with weak pointer/reference types we could handle it quite nicely. *cough* ;)
Jan 12 2009
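The weak-reference approach suggested above can be sketched quickly. This is an illustrative Python sketch (standing in for a hypothetical weak pointer type in D): the cache holds its results weakly, so once no one else references an entry, the collector is free to reclaim it and the cache no longer pins otherwise-dead data.

```python
import gc
import weakref

# Hypothetical cached-result type; the name is illustrative only.
class Result:
    def __init__(self, value):
        self.value = value

# A cache whose values are held weakly: entries disappear once the last
# strong reference to the stored result is gone.
cache = weakref.WeakValueDictionary()

r = Result(42)
cache["key"] = r
assert cache["key"].value == 42   # entry is alive while a strong ref exists

del r                             # drop the last strong reference
gc.collect()                      # the cache entry is now eligible to vanish
```

This trades cache hit rate for bounded memory: a hot entry that something still references stays cached, while abandoned results stop leaking.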
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Not the bits, the full objects if they have indirection. (Pure functions should accept const parameters!)

You're right, the "just the bits on the stack" is if the arguments are immutable. Const arguments would have to include whatever was indirectly referenced, which I argue is impractical for the runtime to handle automatically.

Bits on the stack also do not work for immutable reference arguments. Imagine two immutable strings that are exactly the same data, but have different pointers. You wouldn't want to memoize only on the pointer values (or you'd lose cache hits for equal data at different addresses). I think bits on the stack only work for non-reference arguments.

It sounds like memoization is a non-automatic feature in any case. Which is a shame, because I thought it was a common pure-function optimization, by the way it was discussed in the past.

-Steve
Jan 12 2009
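The pitfall Steve describes can be made concrete. A hedged sketch in Python (names are illustrative): a cache keyed on the object's identity, analogous to keying on "the bits on the stack" for a reference argument, recomputes for equal-but-distinct inputs that a value-keyed cache would serve from cache.

```python
calls = []          # records every actual computation, for demonstration
cache_by_id = {}

def length_by_identity(s):
    # Analogous to memoizing on the pointer bits: the key is the object's
    # address (id), not its contents.
    key = id(s)
    if key not in cache_by_id:
        calls.append(s)
        cache_by_id[key] = len(s)
    return cache_by_id[key]

a = "immutable data"
b = "".join(["immutable", " ", "data"])  # equal contents, distinct object
length_by_identity(a)
length_by_identity(b)
# Two computations happened even though the inputs compare equal: an
# identity-keyed cache misses hits that a value-keyed cache would get.
```

Keying on values instead fixes the misses but requires a deep equality/hash over the referenced data, which is exactly why automatic memoization of reference arguments is hard.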
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-01-09 22:41:01 -0500, Walter Bright <newshound1 digitalmars.com> said:

 If the compiler does general memoization on pure functions, all it has 
 to do is use the bits of the arguments passed on the stack to the 
 function as an index into an associative array of the return values.

True, but only in the absence of pointers to mutable data (pointers to immutable data should be fine). (Hum, and beware of moving GCs, and ignore any memoization data which may be present inside the arguments... isn't determining equality a job for opEquals?)
 The problem is identifying if this would be faster than recomputing the 
 return value.

That's why I was using a "memoized" attribute in the function declaration, so the programmer decides whether this function needs memoization or not. I know it's often a tough decision even for the designer of a function, but if adding memoization becomes easy, as a library designer you could just not worry about it and let users create a memoization wrapper when necessary.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Jan 10 2009
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure 
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values.

Walter, before you go and implement any of this, I must point out that it's spelt "memorization" and "memorize". (Or "memorisation" and "memorise" if you're British, but that's an aside.) Stewart.
Jan 10 2009
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Stewart Gordon wrote:
<snip>
 Walter, before you go and implement any of this, I must point out that 
 it's spelt "memorization" and "memorize".  (Or "memorisation" and 
 "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.
Jan 10 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Stewart Gordon wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out that 
 it's spelt "memorization" and "memorize".  (Or "memorisation" and 
 "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

I thought you were joking. What's confusing you?
Jan 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Christopher Wright wrote:
 Stewart Gordon wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out 
 that it's spelt "memorization" and "memorize".  (Or "memorisation" 
 and "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

I thought you were joking. What's confusing you?

I forgot.
Jan 10 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 Christopher Wright wrote:
 Stewart Gordon wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out 
 that it's spelt "memorization" and "memorize".  (Or "memorisation" 
 and "memorise" if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

I thought you were joking. What's confusing you?

I forgot.

Walter Bright is the new Stewart Gordon!
Jan 11 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-01-09 20:45:57 -0500, Jason House <jason.james.house gmail.com> 
 said:
 
 For each function, you'll have to pick
 which way you want to go. I don't see what that has to do with D's
 future, though, the language allows either.

Somehow, I had the impression that D might use memoization as some kind of optimization once pure functions are embraced. That was probably a misunderstanding on my part.

Hum, could the compiler be trusted to add the memoization code to pure functions so they can stay pure? Something like this:

pure memoize int costlyCalculus(int i, int j) {
    return i + j;
}

being transformed into this by the compiler?

int[Tuple!(int, int)] _costlyCalculus_cache;

pure int costlyCalculus(int i, int j) {
    {
        // The "in" lookup yields a pointer that must be dereferenced.
        int* _found_result = Tuple!(i, j) in _costlyCalculus_cache;
        if (_found_result)
            return *_found_result;
    }
    {
        int _calculated_result = i + j;
        _costlyCalculus_cache[Tuple!(i, j)] = _calculated_result;
        return _calculated_result;
    }
}

Surely this way the compiler would be able to bypass purity checks for accessing the cache while still ensuring that the function has no side effects and always returns the same result for the same arguments. The downside being that a hash table may not always be the best memoization technique.

Perhaps then it belongs more in the standard library:

pure int costlyCalculus(int i, int j);
Memoizer!(costlyCalculus) memoizedCostlyCalculus;

where Memoizer is a struct template defining a pure opCall and containing the cache it can access by casting away purity in a few places. But it makes it more difficult to add memoization while overriding a function.

Yes! Memoization is One Great Litmus Test of compile-time reflection. I implemented a couple of simple memoizers along the lines you mention, and I plan to make memoization examples part of TDPL. The Even Greater Litmus Test would be to expose the memoizer itself as a pure function (which it essentially is). Andrei
Jan 09 2009
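The library-side Memoizer idea from this subthread can be sketched minimally. This is a hedged Python sketch, not the D implementation under discussion; the names (Memoizer, costly_calculus) are illustrative. The wrapper holds a private cache keyed on the full argument values, so equal calls share one entry.

```python
class Memoizer:
    """Wrap a (conceptually pure) function in an object with a private cache."""

    def __init__(self, fn):
        self._fn = fn
        self._cache = {}
        self.computations = 0    # instrumentation, for demonstration only

    def __call__(self, *args):
        # Key on argument *values*, so equal arguments hit the same entry.
        if args not in self._cache:
            self.computations += 1
            self._cache[args] = self._fn(*args)
        return self._cache[args]

def costly_calculus(i, j):
    return i + j

memoized = Memoizer(costly_calculus)
```

The purity question the thread raises is exactly visible here: the wrapper mutates its cache on a miss, so it is only "pure" from the caller's observable point of view.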
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Jan 11, 2009 at 11:43 PM, Christopher Wright <dhasenan gmail.com> wrote:
 Bill Baxter wrote:
 But the call for a memoization thingy I just don't get.  No way that
 should be a compiler feature, just far too many ways to memoize with
 too many different tradeoffs.  And a memoizing wrapper function
 basically loses nothing in terms of expressiveness.

 --bb

What a wrapper loses is the ability to memoize a pure function and get a pure function as a result. As things stand, your pure functions can't use the memoized version of the pure function. This means you might have to use a lot more memoization wrappers.

Isn't this the same old "logically const" vs "const" argument we went through ages ago? --bb
Jan 11 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Jason House wrote:
 Walter Bright Wrote:

 Jason House wrote:
 When considering the standard library changes, I realized that 
 object.d could change. I believe a pure opCmp or toString could
 break user code with impure versions of those functions. Would
 that kind of a change to object.d cause any real problems for D2
 users?

As Andrei pointed out, the trouble with making the Object functions pure is if you want to do an override that caches its value.

One could easily generalize that to mean non-final virtual functions should never be pure. Memoization is a pretty basic need. I thought that was in D's future. Is that not true?

Memoization and purity don't mix. For each function, you'll have to pick which way you want to go. I don't see what that has to do with D's future, though, the language allows either.

Memoization and purity can and should go together. We need to sit down and discuss that someday. Andrei
Jan 09 2009
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
<snip>
 Memoization and purity don't mix. For each function, you'll have to pick 
 which way you want to go. I don't see what that has to do with D's 
 future, though, the language allows either.

Actually, what they don't mix with is the requirement that it still be callable on a mutable object. There's
http://d.puremagic.com/issues/show_bug.cgi?id=1824
and, while it's a nice idea, it precludes caching of the result. Unless you want to keep the cache separate from the object.

A pure toString may be a nice idea; however, this precludes using any mutable members of the class to generate it. But it would also be nice to be able to call such methods on all objects of the class, mutable or not. One possibility is to have separate functions

    string toString()
    string toString() invariant

and let the programmer implement memo(r)ization on the mutable version if desired, whereas the invariant version would have it generated by the compiler. But there are still problems:
- nobody would know which to call if the reference is const, rather than mutable or invariant
- it would be necessary to maintain two versions of toString in parallel

Maybe there's a solution somewhere out there....

Stewart.
Jan 10 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Jan 11, 2009 at 8:51 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Michel Fortin wrote:
 On 2009-01-10 00:10:11 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 The problem is identifying if this would be faster than recomputing
 the return value.

I used memoizers for exp and log (the most called functions in some code I wrote) and it made the original version feel like it was driving in reverse.

That's only true because, in your specific context, those functions were called often with the same input. In a program that rarely use the same inputs, your memoizing functions will just waste time and memory.

No, because I use interpolation.

That's way beyond the ability of a compiler to do automatically. The compiler would have to understand that the pure function produces continuous results.

You're replying to the wrong guy. I'm saying: the compiler shouldn't have to do so, but it should allow functions to do it.

Lately it looks like a lot of the problems discussed here inevitably lead to a built-in feature. We want properties, let's have a property thingy. We want memoization, let's have a memoize thingy. We want optimally aligned structures, let's have an optimal align thingy. I'm holding my breath for a request for the kitchen sink thingy.

Just get Walter to implement AST macros and it should stop. :-) Until then the library solutions are often unpalatable. Like using string mixins to implement properties. Ick. But the call for a memoization thingy I just don't get. No way that should be a compiler feature, just far too many ways to memoize with too many different tradeoffs. And a memoizing wrapper function basically loses nothing in terms of expressiveness. --bb
Jan 10 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 10 Jan 2009 07:14:07 +0300, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Michel Fortin wrote:
 Hum, could the compiler be trusted to add the memoization code to pure
 functions so they can stay pure?

If the compiler does general memoization on pure functions, all it has to do is use the bits of the arguments passed on the stack to the function as an index into an associative array of the return values. The problem is identifying if this would be faster than recomputing the return value.

Wouldn't this also cause threading issues? The obvious solution would be to use TLS, but this requires duplicating the cache across threads. Also, using AAs internally like this would lead to very deeply hidden memory allocations, and therefore possibly more frequent GC, etc.

A thread-safe (lock-free?) container would do the trick just fine.
Jan 09 2009
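The shared-container alternative to TLS can be sketched as follows. This is a hedged Python sketch (a lock-guarded cache rather than a true lock-free container, and all names are illustrative): one cache is shared by every thread, avoiding the per-thread duplication that TLS would entail.

```python
import threading

class SharedMemoizer:
    """One memoization cache shared across threads, guarded by a lock."""

    def __init__(self, fn):
        self._fn = fn
        self._cache = {}
        self._lock = threading.Lock()

    def __call__(self, *args):
        with self._lock:            # serialize cache access across threads
            if args not in self._cache:
                self._cache[args] = self._fn(*args)
            return self._cache[args]

square = SharedMemoizer(lambda x: x * x)

results = []
workers = [threading.Thread(target=lambda: results.append(square(7)))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# All four threads observe the same cached value, and only one cache
# entry exists afterward.
```

A lock-free map would remove the serialization cost on hits, which is what the "(lock-free?)" aside is getting at.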
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Sun, Jan 11, 2009 at 4:31 AM, Stewart Gordon <smjg_1998 yahoo.com> wrote:
 Stewart Gordon wrote:
 <snip>
 Walter, before you go and implement any of this, I must point out that
 it's spelt "memorization" and "memorize".  (Or "memorisation" and "memorise"
 if you're British, but that's an aside.)

Or maybe not... http://en.wikipedia.org/wiki/Memoization Still, I'm confused.... Stewart.

Heh heh. Yeh, I remember I thought I'd found a major typo in the CLR algorithms tome when I first ran across that. But it's not a typo. It's "memo-ization" as in writing down a memo for yourself to remind you of the answer later. --bb
Jan 10 2009