digitalmars.D.learn - Array slices and copy on write

simendsjo (8/8) Apr 03 2011 D will copy the original content if a slice expands, but is the followin...

spir (14/22) Apr 03 2011 As I understand it (there is no real spec for D), this is consistent wit...

simendsjo (12/32) Apr 03 2011 I know the dangers of slices and copy on write. I was thinking of a very...

spir (11/42) Apr 03 2011 I was trying to explain my views on D's slice semantics, thus showing ho...
Jonathan M Davis (19/55) Apr 03 2011 What the GC does and doesn't choose to collect is implementation-specifi...

Steven Schveighoffer (28/37) Apr 04 2011 There are two incorrect assumptions here:

simendsjo (17/56) Apr 04 2011 Given the above, I guessed so.

Steven Schveighoffer (25/49) Apr 04 2011 -or-

simendsjo <simen.endsjo pandavre.com> writes:

D will copy the original content if a slice expands, but is the following
behavior
implementation specific, or part of the specification?

	auto a = [0,1,2];
	auto b = a[0..2];
	a.length = 2; // is a[3] already marked for collection by the gc?
	assert(a.ptr == b.ptr);
	b.length += 1; // as b cannot reuse it even though noone is using it
	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

Apr 03 2011

spir <denis.spir gmail.com> writes:

On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the following
behavior
 implementation specific, or part of the specification?

 	auto a = [0,1,2];
 	auto b = a[0..2];
 	a.length = 2; // is a[3] already marked for collection by the gc?
 	assert(a.ptr == b.ptr);
 	b.length += 1; // as b cannot reuse it even though noone is using it
 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the 
often commented design of D slices.
People often explain that slices are like views on (implicite or explicit) 
ordinary arrays. Then, change on either is seen by both, just like references 
--*except* when change require resizing.
Another point of view is that slices somewhat work like copy-on-write: there is 
no actual copy until one variable value is changed, then only copy happens 
--expect when change can happen on place.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Apr 03 2011

simendsjo <simen.endsjo pandavre.com> writes:

On 03.04.2011 14:55, spir wrote:
 On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

 As I understand it (there is no real spec for D), this is consistent
 with the often commented design of D slices.
 People often explain that slices are like views on (implicite or
 explicit) ordinary arrays. Then, change on either is seen by both, just
 like references --*except* when change require resizing.
 Another point of view is that slices somewhat work like copy-on-write:
 there is no actual copy until one variable value is changed, then only
 copy happens --expect when change can happen on place.

 Denis

I know the dangers of slices and copy on write. I was thinking of a very 
specific example. I'll try to explain better.

auto a = [0,1,2];
auto b = a[0..2];
a.length = 2;
// a[3] should now not be referenced, but my guess is it's not collected 
by the gc
assert(a.ptr == b.ptr);
b.length += 1;
// b could potentionally use the last part of a instead of reallocating
// Could another implementation reuse a's old memory without reallocating?

Apr 03 2011

spir <denis.spir gmail.com> writes:

On 04/03/2011 04:29 PM, simendsjo wrote:
 On 03.04.2011 14:55, spir wrote:
 On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

 As I understand it (there is no real spec for D), this is consistent
 with the often commented design of D slices.
 People often explain that slices are like views on (implicite or
 explicit) ordinary arrays. Then, change on either is seen by both, just
 like references --*except* when change require resizing.
 Another point of view is that slices somewhat work like copy-on-write:
 there is no actual copy until one variable value is changed, then only
 copy happens --expect when change can happen on place.


I was trying to explain my views on D's slice semantics, thus showing how/why 
the above behaviour matches it.

 I know the dangers of slices and copy on write. I was thinking of a very
 specific example. I'll try to explain better.

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2;
 // a[3] should now not be referenced, but my guess is it's not collected by
the gc
 assert(a.ptr == b.ptr);
 b.length += 1;
 // b could potentionally use the last part of a instead of reallocating
 // Could another implementation reuse a's old memory without reallocating?

I think this is how Go does slices (their cap(acity) extends to the end of the 
original array). For me, this would not match D's semantics, but since there is 
no spec AFAIK...

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Apr 03 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On 2011-04-03 07:29, simendsjo wrote:
 On 03.04.2011 14:55, spir wrote:
 On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?
 
 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

 
 As I understand it (there is no real spec for D), this is consistent
 with the often commented design of D slices.
 People often explain that slices are like views on (implicite or
 explicit) ordinary arrays. Then, change on either is seen by both, just
 like references --*except* when change require resizing.
 Another point of view is that slices somewhat work like copy-on-write:
 there is no actual copy until one variable value is changed, then only
 copy happens --expect when change can happen on place.
 
 Denis

 
 I know the dangers of slices and copy on write. I was thinking of a very
 specific example. I'll try to explain better.
 
 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2;
 // a[3] should now not be referenced, but my guess is it's not collected
 by the gc
 assert(a.ptr == b.ptr);
 b.length += 1;
 // b could potentionally use the last part of a instead of reallocating
 // Could another implementation reuse a's old memory without reallocating?

What the GC does and doesn't choose to collect is implementation-specific and 
doesn't necessarily have anything to do with the compiler. Also, remember that 
memory allocation is done in blocks, so there's pretty much no way that the 
last element of an array would be collected simply because you shrunk the 
length of the array by one. And you may not want it to anyway. There is at 
least a chance that b.length += 1 would cause b to use the now unreferenced 
3rd element of a, so giving that one element's worth of memory to the OS could 
be very ineffecient. Regardless, whether b.length +=1 manages to avoid a 
reallocation is implementation-dependent.

You should _not_ generally rely on when you are or aren't going to get a 
memory reallocation with an array. If you want to guarantee that a 
reallocation takes place, use dup or idup. If you want to guarantee that one 
doesn't take place, then don't increase the size of any array/slice which 
points to a particular block of memory. Beyond that, you don't really have any 
guarantees about reallocation. druntime will guarantee that you don't end up 
with your arrays stomping on each other, and it will try and minimize 
reallocations, but how it does all that is implementation-dependent.

- Jonathan M  Davis

Apr 03 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 D will copy the original content if a slice expands, but is the  
 following behavior
 implementation specific, or part of the specification?

 	auto a = [0,1,2];
 	auto b = a[0..2];
 	a.length = 2; // is a[3] already marked for collection by the gc?
 	assert(a.ptr == b.ptr);
 	b.length += 1; // as b cannot reuse it even though noone is using it
 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

There are two incorrect assumptions here:

1. The D array runtime code cannot possibly know how many references there  
are to a certain memory block, so your assertion that a[3] is unreferenced  
is not something that the compiler or runtime can know without horrendous  
performance implications.
2. The D GC collects entire memory blocks, not portions of them.  There is  
no way for the GC to "collect" a[3].  It can only collect the entire  
memory block which contains a.

The array runtime code does it's best effort to expand an array into its  
own memory block.  Because slicing an array or copying an array reference  
is a very fast, efficient event, it is not hooked by the array runtime  
code.  Therefore, the array runtime code cannot know all the array slices  
that point to the memory block.  The assumption it MUST make is that  
somebody somewhere has a pointer to the data it considers valid.  If you  
think about it, the bookkeeping necessary to track how many  
pointers/arrays point to each byte of an array would dwarf the cost of  
storing and using the array in the first place.

You must remember, copy-on-expansion (note, copy-on-write is not the right  
term, that implies changing any data in the array makes a copy), or rather  
the lack of copying, is an optimization.  In most cases, you should expect  
expansion can make a copy, but you shouldn't rely on that behavior.

For those cases where you want to *force* the runtime to consider data not  
referenced by an array as being 'free', you can use assumeSafeAppend:

b.assumeSafeAppend(); // marks b[3] as being unused

b.length += 1; // no copy made.

-Steve

Apr 04 2011

simendsjo <simen.endsjo pandavre.com> writes:

On 04.04.2011 15:53, Steven Schveighoffer wrote:
 On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo
 <simen.endsjo pandavre.com> wrote:

 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

 There are two incorrect assumptions here:

 1. The D array runtime code cannot possibly know how many references
 there are to a certain memory block, so your assertion that a[3] is
 unreferenced is not something that the compiler or runtime can know
 without horrendous performance implications.

Ah, of course. Makes sense.

 2. The D GC collects entire memory blocks, not portions of them. There
 is no way for the GC to "collect" a[3]. It can only collect the entire
 memory block which contains a.

Given the above, I guessed so.
Then you must copy data to a new array manually in case you want to free 
excess memory..?

int[] a;
a.length = 10_000_000; // my initial guess
// fill a
a.length = 100; // only 100 used. GC won't ever free the memory not used
auto b = a[0..100].dup; // copy
a = null; // and destroy

Or is it possible to mark part this for the GC as unused?


 The array runtime code does it's best effort to expand an array into its
 own memory block. Because slicing an array or copying an array reference
 is a very fast, efficient event, it is not hooked by the array runtime
 code. Therefore, the array runtime code cannot know all the array slices
 that point to the memory block. The assumption it MUST make is that
 somebody somewhere has a pointer to the data it considers valid. If you
 think about it, the bookkeeping necessary to track how many
 pointers/arrays point to each byte of an array would dwarf the cost of
 storing and using the array in the first place.

Makes sense.

 You must remember, copy-on-expansion (note, copy-on-write is not the
 right term, that implies changing any data in the array makes a copy),
 or rather the lack of copying, is an optimization. In most cases, you
 should expect expansion can make a copy, but you shouldn't rely on that
 behavior.

Copy on expansion is a much better term, yes.

 For those cases where you want to *force* the runtime to consider data
 not referenced by an array as being 'free', you can use assumeSafeAppend:

 b.assumeSafeAppend(); // marks b[3] as being unused

 b.length += 1; // no copy made.

 -Steve

I cannot find assumeSafeAppend.

Thanks a lot for your very thorough and detailed answer. Cleared up a 
couple of blind spots.

Apr 04 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 04 Apr 2011 14:30:34 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 On 04.04.2011 15:53, Steven Schveighoffer wrote:
 2. The D GC collects entire memory blocks, not portions of them. There
 is no way for the GC to "collect" a[3]. It can only collect the entire
 memory block which contains a.

 Given the above, I guessed so.
 Then you must copy data to a new array manually in case you want to free  
 excess memory..?

 int[] a;
 a.length = 10_000_000; // my initial guess
 // fill a
 a.length = 100; // only 100 used. GC won't ever free the memory not used
 auto b = a[0..100].dup; // copy
 a = null; // and destroy\

-or-

a = a[0..100].dup;

Yes, the GC/array code does not move the array to a smaller block when it  
could (note in the example above, the array code again cannot know that  
nothing has a reference to the full 10,000,000 elements).  Simply because  
the array runtime cannot know what you intend to do with that array.  It  
has that block allocated, it's not going to spend extra cycles trying to  
optimize storage for it just in case that's what you wanted.

People who shrink an array down to 100 then build it back up would  
(rightfully) be complaining if the runtime reallocated to smaller blocks  
on shrinking.

 Or is it possible to mark part this for the GC as unused?

The way the GC works, it *could* do this for arrays that span multiple  
pages (i.e. give back some of the pages to the GC) without moving the  
data, but there is no mechanism that I know of to do that.  Re-allocating  
is a good enough solution, I guess nobody has yet had a very good need to  
do anything else.

 For those cases where you want to *force* the runtime to consider data
 not referenced by an array as being 'free', you can use  
 assumeSafeAppend:

 b.assumeSafeAppend(); // marks b[3] as being unused

 b.length += 1; // no copy made.

 I cannot find assumeSafeAppend.

It's in object.di (along with the documentation for it):

http://www.digitalmars.com/d/2.0/phobos/object.html#assumeSafeAppend

 Thanks a lot for your very thorough and detailed answer. Cleared up a  
 couple of blind spots.

No problem.  Arrays are a very tricky business in D, and I'm still finding  
new ways of looking at them :)  Having modified most of the array append  
code, I can say they are certainly one of the more deceptively complex  
parts of the runtime.

-Steve

Apr 04 2011

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Array slices and copy on write