www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Range returning an array

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hello all,

I have a situation in a program I'm writing where it's convenient to define a
range whose return value is an array.  Something like:

struct MySimulation(T)
{
	T[] var;
	T diff, convergence;

	auto front()  property
	{
		return var;
	}

	bool empty()  property
	{
		return (diff < convergence);
	}

	void popFront()
	{
		// update values in var
		// and calculate diff
	}
}

Now, the problem with a design like this is that it's unsafe, in the sense that,
let's say I do something like,

	T[] output;
	foreach(opvar; mySim)
		output = opvar;

... then, at the end of this loop, output will not be the same as it was set to
with the last assignment, because popFront() will have been called one last time
prior to the "empty" condition being set to true.

Now, there are a number of ways I can think of to avoid this.  One is to return
var.dup rather than var from front(), but this is undesirable because it will
hit performance in a big way.  Another might be to calculate the updated value
of var in a temporary array, and update var itself only if the diff is greater
than the convergence criterion.

However, I'm curious enough about this problem that I thought I'd throw it open
to everyone else to see if anyone has a better idea than one of these.  The
challenge here is to ensure really good performance while not risking bad or
incorrect results leaking outside of the range.

Any thoughts? :-)

Thanks & best wishes,

    -- Joe
Apr 09 2013
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 04/09/2013 01:03 PM, Joseph Rushton Wakeling wrote:

 to return var.dup rather than var from front(), but this is
 undesirable because it will hit performance in a big way.
Your problem is the exact situation that stdio.ByLine has faced. There, the choice has been to use an internal buffer just like your 'var' and leave the .dup call to the user for when it is needed. That choice is faster but sometimes causes bugs. Ali
Apr 09 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Ali Çehreli:

 Your problem is the exact situation that stdio.ByLine has 
 faced. There, the choice has been to use an internal buffer 
 just like your 'var' and leave the .dup call to the user for 
 when it is needed.

 That choice is faster but sometimes causes bugs.
In similar situations I add a boolean template argument that switches on or off the dupping of the given result. On default it's on. So it's safe on default and faster on request, and this goes well with the D zen. Bye, bearophile
Apr 09 2013
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Apr 2013 16:03:01 -0400, Joseph Rushton Wakeling  
<joseph.wakeling webdrake.net> wrote:


 Any thoughts? :-)
1. documentation. Make sure the user of the range knows that this data is going to be re-used. 2. Make front() return const if possible. It is another signal that you aren't supposed to keep this data. 3. For your specific situation, add lastFront(): struct MySimulation(T) { T[] var; T[] lastVar; T diff, convergence; auto front() property { return var; } bool empty() property { return (diff < convergence); } void popFront() { lastVar[] = var[]; // update values in var // and calculate diff } auto lastFront() property { return lastVar; } } -Steve
Apr 09 2013
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 04/09/2013 11:02 PM, Steven Schveighoffer wrote:
 1. documentation.  Make sure the user of the range knows that this data is
going
 to be re-used.
In this case it's quite unlikely anyone will ever want to use the code apart from me, but yes, good point. :-)
 2. Make front() return const if possible.  It is another signal that you aren't
 supposed to keep this data.
 3. For your specific situation, add lastFront():
It's an interesting thought. I don't think it's ultimately the right way to go -- yes, my application rests strongly on finding the last value, but the problem is very simply that popFront kills the value _before_ finding out if the range is now empty. The only logical option that I can see is to tweak things so that doesn't happen, which is possible but probably a little bit more finnicky than the current implementation.
Apr 09 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Apr 2013 18:09:07 -0400, Joseph Rushton Wakeling  
<joseph.wakeling webdrake.net> wrote:

 On 04/09/2013 11:02 PM, Steven Schveighoffer wrote:
 3. For your specific situation, add lastFront():
It's an interesting thought. I don't think it's ultimately the right way to go -- yes, my application rests strongly on finding the last value, but the problem is very simply that popFront kills the value _before_ finding out if the range is now empty.
Well here is another solution: struct MySimulation(T) { T[] var; T[] tmpvar; T diff, convergence; auto front() property { return var; } bool empty() property { return (diff < convergence); } void popFront() { tmpvar[] = var[]; // update values in var // and calculate diff if(empty) { var[] = tmpvar[]; // revert var = null; // optional, detach from original array } } } You could also use alloca to avoid storing tmpvar as a struct member and also avoid allocation. -Steve
Apr 09 2013
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 04/10/2013 12:33 AM, Steven Schveighoffer wrote:
 Well here is another solution
I did consider something like that. The actual one I have come to after thinking about it is like this: struct MySimulation(T) { T[] var; T diff = T.max; T convergence; _bool empty = false; auto front() property { return var; } bool empty() property { return _empty; } void popFront() { if(diff < convergence) _empty = true; else // update var and calculate new diff } } How does this look to people?
Apr 09 2013
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
 I did consider something like that.
By the way: the reason that I rejected the temporary-variable choice was that I couldn't really see the difference cost-wise between doing that, versus returning var.dup from front(). Especially as it's not necessarily guaranteed that front will be called frequently (I might just popFront() until the range is empty and then take the final front value).
Apr 09 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling  
<joseph.wakeling webdrake.net> wrote:

 On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
 I did consider something like that.
By the way: the reason that I rejected the temporary-variable choice was that I couldn't really see the difference cost-wise between doing that, versus returning var.dup from front(). Especially as it's not necessarily guaranteed that front will be called frequently (I might just popFront() until the range is empty and then take the final front value).
Calling front after empty is not good range policy, once empty, front is possibly invalid or points at invalid memory. -Steve
Apr 09 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 09, 2013 at 07:18:25PM -0400, Steven Schveighoffer wrote:
 On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling
 <joseph.wakeling webdrake.net> wrote:
 
On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
I did consider something like that.
By the way: the reason that I rejected the temporary-variable choice was that I couldn't really see the difference cost-wise between doing that, versus returning var.dup from front(). Especially as it's not necessarily guaranteed that front will be called frequently (I might just popFront() until the range is empty and then take the final front value).
Calling front after empty is not good range policy, once empty, front is possibly invalid or points at invalid memory.
[...] I believe it was proposed that .front should assert if .empty returns true. Personally I think that's a bit too extreme, but nevertheleses, I agree that calling .front after .empty returns true is sloppy coding, and can easily lead to bugs or runtime errors. T -- The two rules of success: 1. Don't tell everything you know. -- YHL
Apr 09 2013
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Apr 2013 19:31:59 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 On Tue, Apr 09, 2013 at 07:18:25PM -0400, Steven Schveighoffer wrote:
 On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling
 <joseph.wakeling webdrake.net> wrote:

On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
I did consider something like that.
By the way: the reason that I rejected the temporary-variable choice was that I couldn't really see the difference cost-wise between doing that, versus returning var.dup from front(). Especially as it's not necessarily guaranteed that front will be called frequently (I might just popFront() until the range is empty and then take the final front value).
Calling front after empty is not good range policy, once empty, front is possibly invalid or points at invalid memory.
[...] I believe it was proposed that .front should assert if .empty returns true. Personally I think that's a bit too extreme, but nevertheleses, I agree that calling .front after .empty returns true is sloppy coding, and can easily lead to bugs or runtime errors.
That is not necessary. A range does not just have to be a range, it just has to support a range interface. It can be perfectly valid for front to not assert. It can be also valid for it to assert. It just means that functions that generically operate on ranges cannot count on front working after empty is true (and that WOULD be a bug). For example, if the OP replaced his range with someone else's, he may run into that situation. But as long as he is the author, it is not illegal for an object that is also a range to be empty and have front still work. -Steve
Apr 09 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 09, 2013 at 09:59:42PM -0400, Steven Schveighoffer wrote:
 On Tue, 09 Apr 2013 19:31:59 -0400, H. S. Teoh
 <hsteoh quickfur.ath.cx> wrote:
 
On Tue, Apr 09, 2013 at 07:18:25PM -0400, Steven Schveighoffer wrote:
On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling
<joseph.wakeling webdrake.net> wrote:

On 04/10/2013 12:50 AM, Joseph Rushton Wakeling wrote:
I did consider something like that.
By the way: the reason that I rejected the temporary-variable choice was that I couldn't really see the difference cost-wise between doing that, versus returning var.dup from front(). Especially as it's not necessarily guaranteed that front will be called frequently (I might just popFront() until the range is empty and then take the final front value).
Calling front after empty is not good range policy, once empty, front is possibly invalid or points at invalid memory.
[...] I believe it was proposed that .front should assert if .empty returns true. Personally I think that's a bit too extreme, but nevertheleses, I agree that calling .front after .empty returns true is sloppy coding, and can easily lead to bugs or runtime errors.
That is not necessary. A range does not just have to be a range, it just has to support a range interface. It can be perfectly valid for front to not assert. It can be also valid for it to assert. It just means that functions that generically operate on ranges cannot count on front working after empty is true (and that WOULD be a bug).
[...] Yes, that was my point. I didn't mean to say that front *should* assert, just that it *might* for some ranges. (In fact, I've written my own ranges that do support calling .front even when .empty is true.) In generic code, though, one should not assume one way or another. T -- Study gravitation, it's a field with a lot of potential.
Apr 09 2013
prev sibling next sibling parent reply "Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Tuesday, 9 April 2013 at 23:18:26 UTC, Steven Schveighoffer 
wrote:
 On Tue, 09 Apr 2013 18:53:56 -0400, Joseph Rushton Wakeling 
 <joseph.wakeling webdrake.net> wrote:
 By the way: the reason that I rejected the temporary-variable 
 choice was that I
 couldn't really see the difference cost-wise between doing 
 that, versus
 returning var.dup from front().  Especially as it's not 
 necessarily guaranteed
 that front will be called frequently (I might just popFront() 
 until the range is
 empty and then take the final front value).
Calling front after empty is not good range policy, once empty, front is possibly invalid or points at invalid memory. -Steve
I'm pretty sure he realizes this, his original code shows how he is doing this. He expects 'output' to hold the final front value, but instead it holds the empty "value." Joseph, I think you will have to profile to decide which is the fastest for your use case.
Apr 10 2013
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 04/10/2013 08:22 PM, Jesse Phillips wrote:
 I'm pretty sure he realizes this, his original code shows how he is doing this.
 He expects 'output' to hold the final front value, but instead it holds the
 empty "value."
 
 Joseph, I think you will have to profile to decide which is the fastest for
your
 use case.
Agree. I do also see the risks in these assumptions with respect to front -- they can be safely worked around within my code but it's not a very friendly construction to put in front of other people.
Apr 11 2013
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 04/10/2013 01:18 AM, Steven Schveighoffer wrote:
 Calling front after empty is not good range policy, once empty, front is
 possibly invalid or points at invalid memory.
It's not necessarily required to actually call front with the range empty: one could do something like, for(; !sim.empty; sim.popFront()) v = sim.front; ... and the way that popFront() operates means that, while the output of front() may be transient in general, the last value will remain. Though I agree that, even in this case, it's is a potentially problematic situation (there's something weird about the idea of a range that is transient except for the last entry). I will have to profile with .dup enabled and see what changes.
Apr 11 2013