www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Strange behavior on shrinking a Dynamic Array -- Capacity reduces to 0

reply d coder <dlang.coder gmail.com> writes:
Hello List

As per TDPL and D documentation, shrinking a dynamic array should not
relocate it. But that is exactly what is happening.
On shrinking an array, its capacity is getting reduced to 0. Surprisingly
the capacity after a shrink is even less than the length
of the array. As a result the array gets relocated as soon as another
element is added to it.

I have to call reserve again after shrinking it to make sure that the array
is not relocated.

Is this the expected behavior?

Regards
- Puneet

P.S. Please compile the following code to see yourself.

import std.stdio;

void main() {
  uint [] arr;
  reserve(arr, 2048);
  writeln("After reservation:");
  writefln("arr is at: %10s, length: %6s, capacity %6s",
   arr.ptr, arr.length, capacity(arr));
  // Grow the array
  for (size_t i = 0; i < 64; ++i) arr ~= i;

  writeln("After appending:");
  writefln("arr is at: %10s, length: %6s, capacity %6s",
   arr.ptr, arr.length, capacity(arr));

  arr.length = 32;
  // reserve(arr, 2048); // does not relocate if uncommented

  writeln("After setting length:");
  writefln("arr is at: %10s, length: %6s, capacity %6s",
   arr.ptr, arr.length, capacity(arr));

  // grow the array again
  for (size_t i = 0; i < 32; ++i) arr ~= i;
  writeln("After appending again:");
  writefln("arr is at: %10s, length: %6s, capacity %6s",
   arr.ptr, arr.length, capacity(arr));
}
Jun 22 2011
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 22 Jun 2011 23:38:30 -0400, d coder <dlang.coder gmail.com> wrote:

 Hello List

 As per TDPL and D documentation, shrinking a dynamic array should not
 relocate it. But that is exactly what is happening.
 On shrinking an array, its capacity is getting reduced to 0. Surprisingly
 the capacity after a shrink is even less than the length
 of the array. As a result the array gets relocated as soon as another
 element is added to it.

 I have to call reserve again after shrinking it to make sure that the  
 array
 is not relocated.

 Is this the expected behavior?
Yes. Capacity is only non-zero for arrays which can safely be extended. If you shrink an array from length 10 to length 5, the GC is smart enough to know that that there may be a dangling reference to elements [5..10] and thus if the length 5 array was appended to, it might stomp on someone else's data. Long ago, the GC was dumber and it caused a massive hole in the const/immutable system. If you know that no other aliases exist (i.e. you're buffering, etc) you can use the function assumeSafeAppend to reset the capacity. However, I'd strongly recommend switching to Appender whenever you're tempted to use assumeSafeAppend.
Jun 22 2011
next sibling parent Ali =?iso-8859-1?q?=C7ehreli?= <acehreli yahoo.com> writes:
On Wed, 22 Jun 2011 23:55:23 -0400, Robert Jacques wrote:

 On Wed, 22 Jun 2011 23:38:30 -0400, d coder <dlang.coder gmail.com>
 wrote:
 
 Hello List

 As per TDPL and D documentation, shrinking a dynamic array should not
 relocate it. But that is exactly what is happening. On shrinking an
 array, its capacity is getting reduced to 0. Surprisingly the capacity
 after a shrink is even less than the length of the array. As a result
 the array gets relocated as soon as another element is added to it.

 I have to call reserve again after shrinking it to make sure that the
 array
 is not relocated.

 Is this the expected behavior?
Yes. Capacity is only non-zero for arrays which can safely be extended. If you shrink an array from length 10 to length 5, the GC is smart enough to know that that there may be a dangling reference to elements [5..10] and thus if the length 5 array was appended to, it might stomp on someone else's data. Long ago, the GC was dumber and it caused a massive hole in the const/immutable system. If you know that no other aliases exist (i.e. you're buffering, etc) you can use the function assumeSafeAppend to reset the capacity. However, I'd strongly recommend switching to Appender whenever you're tempted to use assumeSafeAppend.
And an article to go with that: :) http://www.dsource.org/projects/dcollections/wiki/ArrayArticle Ali
Jun 22 2011
prev sibling parent reply d coder <dlang.coder gmail.com> writes:
 Yes. Capacity is only non-zero for arrays which can safely be extended. If
 you shrink an array from length 10 to length 5, the GC is smart enough to
 know that that there may be a dangling reference to elements [5..10] and
 thus if the length 5 array was appended to, it might stomp on someone else's
 data. Long ago, the GC was dumber and it caused a massive hole in the
 const/immutable system. If you know that no other aliases exist (i.e. you're
 buffering, etc) you can use the function assumeSafeAppend to reset the
 capacity. However, I'd strongly recommend switching to Appender whenever
 you're tempted to use assumeSafeAppend.
Thanks All I would give Appender a try. Any idea if it is safe to use this struct in multithreaded environment. Would it be OK to add elements to it from multiple threads -- of course I would be using synchronized code to make sure that only one element gets added at a time. Any other precautions? Regards - Puneet
Jun 22 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 23 Jun 2011 00:15:12 -0400, d coder <dlang.coder gmail.com> wrote:

 Yes. Capacity is only non-zero for arrays which can safely be extended.  
 If
 you shrink an array from length 10 to length 5, the GC is smart enough  
 to
 know that that there may be a dangling reference to elements [5..10] and
 thus if the length 5 array was appended to, it might stomp on someone  
 else's
 data. Long ago, the GC was dumber and it caused a massive hole in the
 const/immutable system. If you know that no other aliases exist (i.e.  
 you're
 buffering, etc) you can use the function assumeSafeAppend to reset the
 capacity. However, I'd strongly recommend switching to Appender whenever
 you're tempted to use assumeSafeAppend.
Thanks All I would give Appender a try. Any idea if it is safe to use this struct in multithreaded environment. Would it be OK to add elements to it from multiple threads -- of course I would be using synchronized code to make sure that only one element gets added at a time. Any other precautions?
David Simcha has removed Appender from std.parallelism, due to a perceived failure during multithreaded appends (even with locking). I don't see why it would happen, but you may want to be cautious. See this commit: https://github.com/D-Programming-Language/phobos/commit/5a3761baac147ff43f4dc2ce264a18ad5e4330bd#std/parallelism.d -Steve
Jun 23 2011
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Thu, 23 Jun 2011 00:15:12 -0400, d coder <dlang.coder gmail.com> wrote:
 Yes. Capacity is only non-zero for arrays which can safely be extended.
 If
 you shrink an array from length 10 to length 5, the GC is smart enough
 to
 know that that there may be a dangling reference to elements [5..10] and
 thus if the length 5 array was appended to, it might stomp on someone
 else's
 data. Long ago, the GC was dumber and it caused a massive hole in the
 const/immutable system. If you know that no other aliases exist (i.e.
 you're
 buffering, etc) you can use the function assumeSafeAppend to reset the
 capacity. However, I'd strongly recommend switching to Appender whenever
 you're tempted to use assumeSafeAppend.
Thanks All I would give Appender a try. Any idea if it is safe to use this struct in multithreaded environment. Would it be OK to add elements to it from multiple threads -- of course I would be using synchronized code to make sure that only one element gets added at a time. Any other precautions?
David Simcha has removed Appender from std.parallelism, due to a perceived failure during multithreaded appends (even with locking). I don't see why it would happen, but you may want to be cautious. See this commit:
https://github.com/D-Programming-Language/phobos/commit/5a3761baac147ff43f4dc2ce264a18ad5e4330bd#std/parallelism.d
 -Steve
I don't either. It was one of those things where I just didn't know what else to try. I'm not sure if removing Appender did anything. It seems to have removed one manifestation of the issues with std.parallelism, but occasionally it still segfaults on FreeBSD64. The bug appears to be in initialization code. Running the unittest in a loop 10,000 times by putting a for loop around the unittest code passes. Running the unittest several times by restarting the executable eventually fails. IIRC, though, running in a for loop used to fail before I took out the Appender. I can't reproduce any of this stuff anywhere except the FreeBSD64 auto tester.
Jun 23 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 22 Jun 2011 23:38:30 -0400, d coder <dlang.coder gmail.com> wrote:

 Hello List

 As per TDPL and D documentation, shrinking a dynamic array should not
 relocate it. But that is exactly what is happening.
 On shrinking an array, its capacity is getting reduced to 0. Surprisingly
 the capacity after a shrink is even less than the length
 of the array. As a result the array gets relocated as soon as another
 element is added to it.
A slice capacity of 0 does *not* indicate that 0 bytes are stored in the slice, it indicates that appending to the slice will reallocate into a new block. It's a special token meaning "any append to this slice will reallocate". There have been a few people that suggested capacity should return *at least* the number of bytes in the slice. However, the zero indicator can be useful. It's easier to check for, generates more efficient code when checking, plus the distinction between "not appendable" and "appendable, but the current block is full" can be very important. We would lose this distinction if the default return was the length of the slice.
 I have to call reserve again after shrinking it to make sure that the  
 array
 is not relocated.
In fact, it is relocated. Reserve will not overwrite data that is already valid, so it will reallocate the slice into a new array that can hold the requested size. There is a function that allows you to reestablish appendability without reallocating: assumeSafeAppend(). Call this on a slice, and it will become appendable again (non-zero capacity) at its current location. However, be cautious as this will *overwrite* any data that was valid after the slice. This function has no effect on slices of non-heap data.
 import std.stdio;

 void main() {
   uint [] arr;
   reserve(arr, 2048);
The above line will allocate an array, and set arr to the first 2048 elements.
   writeln("After reservation:");
   writefln("arr is at: %10s, length: %6s, capacity %6s",
    arr.ptr, arr.length, capacity(arr));
   // Grow the array
   for (size_t i = 0; i < 64; ++i) arr ~= i;

   writeln("After appending:");
   writefln("arr is at: %10s, length: %6s, capacity %6s",
    arr.ptr, arr.length, capacity(arr));
I expect arr.ptr to be unchanged, arr.length to be 64, and capacity to be unchanged.
   arr.length = 32;
Note that even though you have effectively cut off 32 elements from your slice that latter data is *still valid*. Another slice could be referencing that data, which is why arr is no longer appendable.
   // reserve(arr, 2048); // does not relocate if uncommented

   writeln("After setting length:");
   writefln("arr is at: %10s, length: %6s, capacity %6s",
    arr.ptr, arr.length, capacity(arr));
If you leave reserve commented out, I'd expect arr.ptr to remain unchanged, length to be 32, and capacity to be 0 (not appendable) If you uncomment reserve, then arr.ptr will have moved, length will be 32, and capacity will be >= 2048.
   // grow the array again
   for (size_t i = 0; i < 32; ++i) arr ~= i;
   writeln("After appending again:");
   writefln("arr is at: %10s, length: %6s, capacity %6s",
    arr.ptr, arr.length, capacity(arr));
Regardless of reserve, arr.ptr will have changed from it's original allocation, arr.length == 64 and capacity will vary depending on whether reserve was called or not. -Steve
Jun 22 2011