www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Array slices and copy on write

reply simendsjo <simen.endsjo pandavre.com> writes:
D will copy the original content if a slice expands, but is the following
behavior
implementation specific, or part of the specification?

	auto a = [0,1,2];
	auto b = a[0..2];
	a.length = 2; // is a[3] already marked for collection by the gc?
	assert(a.ptr == b.ptr);
	b.length += 1; // as b cannot reuse it even though noone is using it
	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated
Apr 03 2011
next sibling parent reply spir <denis.spir gmail.com> writes:
On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the following
behavior
 implementation specific, or part of the specification?

 	auto a = [0,1,2];
 	auto b = a[0..2];
 	a.length = 2; // is a[3] already marked for collection by the gc?
 	assert(a.ptr == b.ptr);
 	b.length += 1; // as b cannot reuse it even though noone is using it
 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the often commented design of D slices. People often explain that slices are like views on (implicite or explicit) ordinary arrays. Then, change on either is seen by both, just like references --*except* when change require resizing. Another point of view is that slices somewhat work like copy-on-write: there is no actual copy until one variable value is changed, then only copy happens --expect when change can happen on place. Denis -- _________________ vita es estrany spir.wikidot.com
Apr 03 2011
next sibling parent simendsjo <simen.endsjo pandavre.com> writes:
On 03.04.2011 14:55, spir wrote:
 On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the often commented design of D slices. People often explain that slices are like views on (implicite or explicit) ordinary arrays. Then, change on either is seen by both, just like references --*except* when change require resizing. Another point of view is that slices somewhat work like copy-on-write: there is no actual copy until one variable value is changed, then only copy happens --expect when change can happen on place. Denis

I know the dangers of slices and copy on write. I was thinking of a very specific example. I'll try to explain better. auto a = [0,1,2]; auto b = a[0..2]; a.length = 2; // a[3] should now not be referenced, but my guess is it's not collected by the gc assert(a.ptr == b.ptr); b.length += 1; // b could potentionally use the last part of a instead of reallocating // Could another implementation reuse a's old memory without reallocating?
Apr 03 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 04/03/2011 04:29 PM, simendsjo wrote:
 On 03.04.2011 14:55, spir wrote:
 On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the often commented design of D slices. People often explain that slices are like views on (implicite or explicit) ordinary arrays. Then, change on either is seen by both, just like references --*except* when change require resizing. Another point of view is that slices somewhat work like copy-on-write: there is no actual copy until one variable value is changed, then only copy happens --expect when change can happen on place.


I was trying to explain my views on D's slice semantics, thus showing how/why the above behaviour matches it.
 I know the dangers of slices and copy on write. I was thinking of a very
 specific example. I'll try to explain better.

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2;
 // a[3] should now not be referenced, but my guess is it's not collected by
the gc
 assert(a.ptr == b.ptr);
 b.length += 1;
 // b could potentionally use the last part of a instead of reallocating
 // Could another implementation reuse a's old memory without reallocating?

I think this is how Go does slices (their cap(acity) extends to the end of the original array). For me, this would not match D's semantics, but since there is no spec AFAIK... Denis -- _________________ vita es estrany spir.wikidot.com
Apr 03 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-04-03 07:29, simendsjo wrote:
 On 03.04.2011 14:55, spir wrote:
 On 04/03/2011 02:29 PM, simendsjo wrote:
 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?
 
 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

As I understand it (there is no real spec for D), this is consistent with the often commented design of D slices. People often explain that slices are like views on (implicite or explicit) ordinary arrays. Then, change on either is seen by both, just like references --*except* when change require resizing. Another point of view is that slices somewhat work like copy-on-write: there is no actual copy until one variable value is changed, then only copy happens --expect when change can happen on place. Denis

I know the dangers of slices and copy on write. I was thinking of a very specific example. I'll try to explain better. auto a = [0,1,2]; auto b = a[0..2]; a.length = 2; // a[3] should now not be referenced, but my guess is it's not collected by the gc assert(a.ptr == b.ptr); b.length += 1; // b could potentionally use the last part of a instead of reallocating // Could another implementation reuse a's old memory without reallocating?

What the GC does and doesn't choose to collect is implementation-specific and doesn't necessarily have anything to do with the compiler. Also, remember that memory allocation is done in blocks, so there's pretty much no way that the last element of an array would be collected simply because you shrunk the length of the array by one. And you may not want it to anyway. There is at least a chance that b.length += 1 would cause b to use the now unreferenced 3rd element of a, so giving that one element's worth of memory to the OS could be very ineffecient. Regardless, whether b.length +=1 manages to avoid a reallocation is implementation-dependent. You should _not_ generally rely on when you are or aren't going to get a memory reallocation with an array. If you want to guarantee that a reallocation takes place, use dup or idup. If you want to guarantee that one doesn't take place, then don't increase the size of any array/slice which points to a particular block of memory. Beyond that, you don't really have any guarantees about reallocation. druntime will guarantee that you don't end up with your arrays stomping on each other, and it will try and minimize reallocations, but how it does all that is implementation-dependent. - Jonathan M Davis
Apr 03 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 D will copy the original content if a slice expands, but is the  
 following behavior
 implementation specific, or part of the specification?

 	auto a = [0,1,2];
 	auto b = a[0..2];
 	a.length = 2; // is a[3] already marked for collection by the gc?
 	assert(a.ptr == b.ptr);
 	b.length += 1; // as b cannot reuse it even though noone is using it
 	assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

There are two incorrect assumptions here: 1. The D array runtime code cannot possibly know how many references there are to a certain memory block, so your assertion that a[3] is unreferenced is not something that the compiler or runtime can know without horrendous performance implications. 2. The D GC collects entire memory blocks, not portions of them. There is no way for the GC to "collect" a[3]. It can only collect the entire memory block which contains a. The array runtime code does it's best effort to expand an array into its own memory block. Because slicing an array or copying an array reference is a very fast, efficient event, it is not hooked by the array runtime code. Therefore, the array runtime code cannot know all the array slices that point to the memory block. The assumption it MUST make is that somebody somewhere has a pointer to the data it considers valid. If you think about it, the bookkeeping necessary to track how many pointers/arrays point to each byte of an array would dwarf the cost of storing and using the array in the first place. You must remember, copy-on-expansion (note, copy-on-write is not the right term, that implies changing any data in the array makes a copy), or rather the lack of copying, is an optimization. In most cases, you should expect expansion can make a copy, but you shouldn't rely on that behavior. For those cases where you want to *force* the runtime to consider data not referenced by an array as being 'free', you can use assumeSafeAppend: b.assumeSafeAppend(); // marks b[3] as being unused b.length += 1; // no copy made. -Steve
Apr 04 2011
next sibling parent simendsjo <simen.endsjo pandavre.com> writes:
On 04.04.2011 15:53, Steven Schveighoffer wrote:
 On Sun, 03 Apr 2011 08:29:47 -0400, simendsjo
 <simen.endsjo pandavre.com> wrote:

 D will copy the original content if a slice expands, but is the
 following behavior
 implementation specific, or part of the specification?

 auto a = [0,1,2];
 auto b = a[0..2];
 a.length = 2; // is a[3] already marked for collection by the gc?
 assert(a.ptr == b.ptr);
 b.length += 1; // as b cannot reuse it even though noone is using it
 assert(a.ptr == b.ptr); // ooops, fails.. b is reallocated

There are two incorrect assumptions here: 1. The D array runtime code cannot possibly know how many references there are to a certain memory block, so your assertion that a[3] is unreferenced is not something that the compiler or runtime can know without horrendous performance implications.

Ah, of course. Makes sense.
 2. The D GC collects entire memory blocks, not portions of them. There
 is no way for the GC to "collect" a[3]. It can only collect the entire
 memory block which contains a.

Given the above, I guessed so. Then you must copy data to a new array manually in case you want to free excess memory..? int[] a; a.length = 10_000_000; // my initial guess // fill a a.length = 100; // only 100 used. GC won't ever free the memory not used auto b = a[0..100].dup; // copy a = null; // and destroy Or is it possible to mark part this for the GC as unused?
 The array runtime code does it's best effort to expand an array into its
 own memory block. Because slicing an array or copying an array reference
 is a very fast, efficient event, it is not hooked by the array runtime
 code. Therefore, the array runtime code cannot know all the array slices
 that point to the memory block. The assumption it MUST make is that
 somebody somewhere has a pointer to the data it considers valid. If you
 think about it, the bookkeeping necessary to track how many
 pointers/arrays point to each byte of an array would dwarf the cost of
 storing and using the array in the first place.

Makes sense.
 You must remember, copy-on-expansion (note, copy-on-write is not the
 right term, that implies changing any data in the array makes a copy),
 or rather the lack of copying, is an optimization. In most cases, you
 should expect expansion can make a copy, but you shouldn't rely on that
 behavior.

Copy on expansion is a much better term, yes.
 For those cases where you want to *force* the runtime to consider data
 not referenced by an array as being 'free', you can use assumeSafeAppend:

 b.assumeSafeAppend(); // marks b[3] as being unused

 b.length += 1; // no copy made.

 -Steve

I cannot find assumeSafeAppend. Thanks a lot for your very thorough and detailed answer. Cleared up a couple of blind spots.
Apr 04 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 04 Apr 2011 14:30:34 -0400, simendsjo <simen.endsjo pandavre.com>  
wrote:

 On 04.04.2011 15:53, Steven Schveighoffer wrote:
 2. The D GC collects entire memory blocks, not portions of them. There
 is no way for the GC to "collect" a[3]. It can only collect the entire
 memory block which contains a.

Given the above, I guessed so. Then you must copy data to a new array manually in case you want to free excess memory..? int[] a; a.length = 10_000_000; // my initial guess // fill a a.length = 100; // only 100 used. GC won't ever free the memory not used auto b = a[0..100].dup; // copy a = null; // and destroy\

-or- a = a[0..100].dup; Yes, the GC/array code does not move the array to a smaller block when it could (note in the example above, the array code again cannot know that nothing has a reference to the full 10,000,000 elements). Simply because the array runtime cannot know what you intend to do with that array. It has that block allocated, it's not going to spend extra cycles trying to optimize storage for it just in case that's what you wanted. People who shrink an array down to 100 then build it back up would (rightfully) be complaining if the runtime reallocated to smaller blocks on shrinking.
 Or is it possible to mark part this for the GC as unused?

The way the GC works, it *could* do this for arrays that span multiple pages (i.e. give back some of the pages to the GC) without moving the data, but there is no mechanism that I know of to do that. Re-allocating is a good enough solution, I guess nobody has yet had a very good need to do anything else.
 For those cases where you want to *force* the runtime to consider data
 not referenced by an array as being 'free', you can use  
 assumeSafeAppend:

 b.assumeSafeAppend(); // marks b[3] as being unused

 b.length += 1; // no copy made.

I cannot find assumeSafeAppend.

It's in object.di (along with the documentation for it): http://www.digitalmars.com/d/2.0/phobos/object.html#assumeSafeAppend
 Thanks a lot for your very thorough and detailed answer. Cleared up a  
 couple of blind spots.

No problem. Arrays are a very tricky business in D, and I'm still finding new ways of looking at them :) Having modified most of the array append code, I can say they are certainly one of the more deceptively complex parts of the runtime. -Steve
Apr 04 2011