www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Does the GC consider slice lengths?

reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
void main() {
   auto a = new int[1_000_000];

   auto b = a[0..1];  // Hold on to the first element

   a = a[$-1..$];     // Drop all but the last element

   // Are the middle elements gone, or is b keeping
   // them alive just because it still holds the very
   // first pointer?

   // Attempt to access original elements through
   // a pointer to the first element. Bug?
   auto c = b.ptr[0..1_000_000];
}

One assumption might be that the GC remembers the single large 
allocation as one and all of the elements are still available? But if it 
considers slices and their lengths specially, then it would see that the 
mid section is not referenced any more.

I would not do that but one wonders... :)

A related question is whether the memory for earlier elements are freed 
as they are popped off from the front. I know by experience that the 
memory for earlier elements are indeed freed. So one can use D's arrays 
as queues without memory issues: Add to the end, pop from the front...

Ali
Apr 01 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/22 12:24 PM, Ali Çehreli wrote:
 void main() {
    auto a = new int[1_000_000];
 
    auto b = a[0..1];  // Hold on to the first element
 
    a = a[$-1..$];     // Drop all but the last element
 
    // Are the middle elements gone, or is b keeping
    // them alive just because it still holds the very
    // first pointer?
 
    // Attempt to access original elements through
    // a pointer to the first element. Bug?
    auto c = b.ptr[0..1_000_000];
Not a bug, the array still exists and is allocated (the pointer is keeping it alive). Remember, the GC manages the memory, not the slice. e.g. if this were an array of structs with destructors, those destructors would not be called just because they aren't referenced by some slice.
 }
 
 One assumption might be that the GC remembers the single large 
 allocation as one and all of the elements are still available? But if it 
 considers slices and their lengths specially, then it would see that the 
 mid section is not referenced any more.
It's not a consideration for the GC, but possibly the compiler. The GC has no visibility into slices that hold references to its memory block. It only has knowledge of pointers to its memory block (and then only indirectly via scanning). Only the compiler could consider this distinction. As of yet, I don't think D has stated one way or another whether this could be exploited. But thinking about how a non-native array type might be implemented, I don't think there's any possible way D could consider a reference to one element to not be a reference to *all* elements.
 A related question is whether the memory for earlier elements are freed 
 as they are popped off from the front. I know by experience that the 
 memory for earlier elements are indeed freed. So one can use D's arrays 
 as queues without memory issues: Add to the end, pop from the front...
No, because removing the reference does not mean there are no other references. -Steve
Apr 01 2022
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/22 10:46, Steven Schveighoffer wrote:

 But if
 it considers slices and their lengths specially, then it would see
 that the mid section is not referenced any more.
It's not a consideration for the GC, but possibly the compiler. The GC has no visibility into slices that hold references to its memory block. It only has knowledge of pointers to its memory block (and then only indirectly via scanning).
Yes but if it recognized slices in addition to pointers, it could consider their lengths to reuse the middle section, no? I understand the role of the compiler but as GC is part of the D runtime, it feels like it could own the concept of slices.
 A related question is whether the memory for earlier elements are
 freed as they are popped off from the front. I know by experience that
 the memory for earlier elements are indeed freed. So one can use D's
 arrays as queues without memory issues: Add to the end, pop from the
 front...
No, because removing the reference does not mean there are no other references.
Agreed but if the array is a private queue of a struct with just one reference, then the earlier memory would be freed. However, my mind was not working well in the morning: Combining with what you said above, of course it is never the "previous parts" of the array that are freed, rather the old allocations of the array that are freed. (Provided there are no references.) Ali
Apr 01 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/22 4:15 PM, Ali Çehreli wrote:
 On 4/1/22 10:46, Steven Schveighoffer wrote:
 
  >> But if
  >> it considers slices and their lengths specially, then it would see
  >> that the mid section is not referenced any more.
  >
  > It's not a consideration for the GC, but possibly the compiler. The GC
  > has no visibility into slices that hold references to its memory block.
  > It only has knowledge of pointers to its memory block (and then only
  > indirectly via scanning).
 
 Yes but if it recognized slices in addition to pointers, it could 
 consider their lengths to reuse the middle section, no? I understand the 
 role of the compiler but as GC is part of the D runtime, it feels like 
 it could own the concept of slices.
While *possible*, there are several problems: 1. The GC is performant (!) because it only has to check whether a block is live or not. If it had to also mark every single element (which is based on the type) of the block as live or dead, the performance takes a steep nosedive, along with space usage (1 bit per element?). 2. The GC somehow would have to manage that "middle" block of memory. How do you free part of a block? 3. By default, memory is untyped. I think even with precise scanning, the stack is untyped. The GC only scans pointers because it has no way of interpreting the length that's contained somewhere nearby. A lot of machinery would have to be added to make this happen.
 
  >> A related question is whether the memory for earlier elements are
  >> freed as they are popped off from the front. I know by experience that
  >> the memory for earlier elements are indeed freed. So one can use D's
  >> arrays as queues without memory issues: Add to the end, pop from the
  >> front...
  >
  > No, because removing the reference does not mean there are no other
  > references.
 
 Agreed but if the array is a private queue of a struct with just one 
 reference, then the earlier memory would be freed.
 
 However, my mind was not working well in the morning: Combining with 
 what you said above, of course it is never the "previous parts" of the 
 array that are freed, rather the old allocations of the array that are 
 freed. (Provided there are no references.)
Yes, I didn't read fully your question. The "front" elements that you see freed are the old copies that are discarded when a reallocation occurs. It's still an entire block that's managed, not portions of it. -Steve
Apr 01 2022
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/22 13:55, Steven Schveighoffer wrote:

 The GC only scans pointers because it has no way
 of interpreting the length that's contained somewhere nearby.
Makes sense. (I figured that out myself by thinking after sending my message. :/) Ali
Apr 01 2022
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 01, 2022 at 01:15:01PM -0700, Ali Çehreli via Digitalmars-d-learn
wrote:
 On 4/1/22 10:46, Steven Schveighoffer wrote:
 
 But if it considers slices and their lengths specially, then it
 would see that the mid section is not referenced any more.
It's not a consideration for the GC, but possibly the compiler. The GC has no visibility into slices that hold references to its memory block. It only has knowledge of pointers to its memory block (and then only indirectly via scanning).
Yes but if it recognized slices in addition to pointers, it could consider their lengths to reuse the middle section, no? I understand the role of the compiler but as GC is part of the D runtime, it feels like it could own the concept of slices.
[...] The GC trades in blocks of memory. Being conservative, it makes no assumptions about the semantics of pointers that it encounters when it scans for references, other than the fact that it's referencing an allocated block of memory. It doesn't know what a slice is -- which, in D, is stored as a struct containing a pointer and a length: the GC makes no assumptions about the meaning of the length value that just happens to occur adjacently to the pointer. It could be part of a slice, or it could be a completely unrelated variable that has no relation to the pointer; such values are disregarded. In fact, unless precise scanning is enabled, the GC doesn't even know which values are pointers and which are integers; it just conservatively assumes that every integer-like value is a potential pointer. Only if they don't look like a valid address, or point to something outside the GC heap, or to an unallocated block, then the GC will ignore them. This is why in 32-bit programs these "false pointers" can sometimes cause the GC not to collect blocks that are actually already unreferenced (you happen to have an integer variable whose value coincidentally looks like a pointer into the block). So as long as one reference to the allocated block still exists, the GC will not collect it. It cannot tell whether the program might have stored an index into an array inside this block in an integer variable that will later be added to the remaining reference; the GC would have no way of telling which integer variable this might be, so it must conservatively assume that as long as a reference to the block still exists, the entire block is still alive (other parts of the block may still be accessed by adding an index value, etc.). If you really only intend on keeping a small chunk of an existing large allocated block, one solution is to .dup or .idup the slice you want to keep, and null all other references to the old block (or let them go out of scope). // Of course, one could argue that a more precise scanning could help the GC minimize blocks whose sole remaining reference is a slice that covers only a small subset of the entire block. This would involve significant complications to the scanning algorithm, though, which may have performance implications. E.g., what constitutes a slice? Only built-in slices? So pointer + separately-kept index is not allowed? Then we have a potential buffer overrun bug in code that's actually not buggy, if some code holds a pointer to the block with some separate index value, then the GC comes along and thinks it can free the part of the block that the index indirectly refers to. I'm sure someone can come up with a way to solve this problem, but the question is whether the payoff is worth the likely performance hit to GC collection times. T -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Apr 01 2022
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/22 14:11, H. S. Teoh wrote:
 the
 question is whether the payoff is worth the likely performance hit to GC
 collection times.
No! :) Thank you for the insights. Ali
Apr 01 2022
prev sibling parent Salih Dincer <salihdb hotmail.com> writes:
On Friday, 1 April 2022 at 16:24:33 UTC, Ali Çehreli wrote:
 void main() {
   auto a = new int[1_000_000];

   auto b = a[0..1];  // Hold on to the first element

   a = a[$-1..$];     // Drop all but the last element

   // Are the middle elements gone, or is b keeping
   // them alive just because it still holds the very
   // first pointer?

   // Attempt to access original elements through
   // a pointer to the first element. Bug?
   auto c = b.ptr[0..1_000_000];
 }

 One assumption might be that the GC remembers the single large 
 allocation as one and all of the elements are still available? 
 But if it considers slices and their lengths specially, then it 
 would see that the mid section is not referenced any more.
 [...]
```d import std.stdio; // -- I couldn't type 32 here --v enum testLimit = 1024 * 1024 * 31; void main() { char[] first ="a-123".dup; char* oneHalf = &first[2]; // =>[1] first.length = testLimit; first[$-1] = 'z'; // legal? Ok. auto sliceLeft = first[0..2]; // "a-" auto sliceRight = first[$-2..$];// "�z" // legal? Ok. assert(first.length == testLimit); first = sliceLeft ~ sliceRight; assert(first.length == 4); char[] second = "b-321".dup; second.length = testLimit/2; first.writeln; // "a-�z" first.length = testLimit/2; first[0..2].writeln; // legal? Ok. auto numsMid = oneHalf[0..3]; // "123" numsMid.writeln; second[0..2].writeln; // "b-" } ``` In this 2nd test, I couldn't get the memory up to 62 MBytes. It throws the following error: **(not a compiler error!)**
 core.exception.OutOfMemoryError src/core/exception.d(702)
In the same system configuration and in my previous attempt, I had allocated all of the memory to an array. Let's get to the conclusion here: We need to do something to get the used memory back. But what? SDB 79
Apr 01 2022