www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Question to GC experts: NO_SCAN for a part of the block

reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
Suppose I need to allocate several dynamic arrays of different 
types and/or sizes. An obvious choice is to do just that: perform 
N allocations.
But given that I know all the sizes, and also am pretty sure that 
each of the arrays will have the same lifespan, why would I do N 
allocations when I can do just one? After all, I can ask the GC 
for a sufficiently-sized memory block and put my arrays in it.

But I don't necessarily want the GC to scan *all* my arrays for 
pointers. The problem is, with current GC, it seems that it's 
impossible to mark only a part of the block as NO_SCAN. Note that 
I'm not really talking about precise GC here with full type 
information, but about addRange() pointing within a GC-allocated 
block. A short example, two arrays, one holding class references, 
the other - bytes:

---

import core.memory : GC;
import std.stdio;
import std.typecons;

class C {
     int i;
     this(int i) {
         this.i = i;
     }

     ~this() {
         writeln("C(", i, ") dtor");
     }
}

auto obvious(int numCs, int numBytes) {
     // obvious solution: two arrays, two allocations
     return tuple!("Cs", "bytes")(new C[numCs], new 
byte[numBytes]);
}

auto scanned(int numCs, int numBytes) {

     // one allocation for both arrays,
     // for simplicity not dealing with alignment/out of memory 
here.
     // allocate with default attributes, scanned block
     auto memory = GC.malloc(numCs*C.sizeof + 
numBytes*byte.sizeof);
     auto cs = (cast(C*)memory)[0..numCs];
     cs[] = C.init;
     auto bytes = (cast(byte*)(memory + 
numCs*C.sizeof))[0..numBytes];
     bytes[] = byte.init;
     return tuple!("Cs", "bytes")(cs, bytes);
}

auto selective(int numCs, int numBytes) {

     // one allocation for both arrays,
     // for simplicity not dealing with alignment/out of memory 
here.
     // explicitly ask for NO_SCAN block,
     // to not scan bytes for pointers
     auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof,
             GC.BlkAttr.NO_SCAN);
     auto cs = (cast(C*)memory)[0..numCs];
     cs[] = C.init;
     // add scanning range for references
     GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C));
     auto bytes = (cast(byte*)(memory + 
numCs*C.sizeof))[0..numBytes];
     bytes[] = byte.init;
     return tuple!("Cs", "bytes")(cs, bytes);
}

void main() {

     int numCs = 4; // comes at runtime from elsewhere
     int numBytes = 32; // comes at runtime from elsewhere

     auto arrays1 = obvious(numCs, numBytes);

     int counter;
     foreach (ref e; arrays1.Cs)
         e = new C(counter++); // dtors will be called

     auto arrays2 = scanned(numCs, numBytes);
     foreach (ref e; arrays2.Cs)
         e = new C(counter++); // dtors will be called

     auto arrays3 = selective(numCs, numBytes);
     foreach (ref e; arrays3.Cs)
         e = new C(counter++); // dtors will not be called
}

---

Should this work, and if not, why?
Jun 03
parent reply ag0aep6g <anonymous example.com> writes:
On 06/04/2017 03:08 AM, Stanislav Blinov wrote:
 ---
 
 import core.memory : GC;
 import std.stdio;
 import std.typecons;
 
 class C {
      int i;
      this(int i) {
          this.i = i;
      }
 
      ~this() {
          writeln("C(", i, ") dtor");
      }
 }
[...]
 auto selective(int numCs, int numBytes) {
[...]
      auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof,
              GC.BlkAttr.NO_SCAN);
      auto cs = (cast(C*)memory)[0..numCs];
      cs[] = C.init;
      // add scanning range for references
      GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C));
      auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes];
      bytes[] = byte.init;
      return tuple!("Cs", "bytes")(cs, bytes);
 }
 
 void main() {
 
      int numCs = 4; // comes at runtime from elsewhere
      int numBytes = 32; // comes at runtime from elsewhere
[...]
      int counter;
[...]
      auto arrays3 = selective(numCs, numBytes);
      foreach (ref e; arrays3.Cs)
          e = new C(counter++); // dtors will not be called
 }
 
 ---
 
 Should this work, and if not, why?
As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end. If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors show up. Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be? But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends? Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that.
Jun 04
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:

 Should this work, and if not, why?
As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end.
Yeah, after playing around with the code a bit, shuffling the calls, making new allocations, etc., I saw those dtors indeed being run. I was just expecting the behavior to be the same as for normal 'new'ed arrays, but I guess there are nuances.
 If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, 
 the dtors show up.
Yeah, well, calling it manually sort of defeats the purpose :)
 Overwriting `arrays3.Cs`'s elements with `null`s also works. My 
 guess is that the `removeRange` call isn't done automatically 
 before the final run of the GC. Maybe it should be?
If at all possible, I think that'd be good, if only for consistency.
 But I have a vague memory that the GC isn't required to call 
 destructors on everything in the final run. Or maybe it's not 
 guaranteed that there is a final run when the program ends?
That's what puzzled me, seeing as dtors from the other two arrays ran at the end. I understand how particular dtors may not be run due to circular or self-references, but there are none in this case.
 Anyway, that would mean everything's working as intended, and 
 you just can't rely on destructors like that.
Seems it is, and that is great. Now all that's left is some benchmarking to see if saving on allocations actually yields any fruit. Thanks!
Jun 04
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/4/17 5:57 AM, Stanislav Blinov wrote:
 On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:

 Should this work, and if not, why?
As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end.
Yeah, after playing around with the code a bit, shuffling the calls, making new allocations, etc., I saw those dtors indeed being run. I was just expecting the behavior to be the same as for normal 'new'ed arrays, but I guess there are nuances.
 If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the
 dtors show up.
Yeah, well, calling it manually sort of defeats the purpose :)
adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it. I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays. In my dcollections library, the allocator I use uses malloc for non-pointer containing arrays, and GC for everything that could contain pointers, and then divvy up the array into the exact sizes I want. Works quite well, but the dtor of the container needs to deallocate the malloc'd one. -Steve
Jun 05
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer 
wrote:

 adding a range marks it as having pointers to scan, AND stores 
 a reference to that array, so it won't be GC collected (nor 
 will anything it points to). The intention is for it to be used 
 on non-GC memory, like C malloc'd memory, where it doesn't 
 matter that the GC is pointing at it.
Huh? https://dlang.org/phobos/core_memory.html#.GC.addRange :
 Note that p[0 .. sz] is treated as an opaque range of memory 
 assumed to be suitably managed by the caller. In particular, if 
 p points into a GC-managed memory block, addRange does not mark 
 this block as live.
Is that paragraph wrong?
 I would say that you are better off allocating 2 arrays -- one 
 with NO_SCAN where you put your non-pointer-containing data, 
 and one without the flag to put your other data. This is 
 similar to your "selective" function, but instead of allocating 
 1 array, with a tuple of slices into it, just allocate 2 arrays 
 and return the tuple of those 2 arrays.
Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.
Jun 05
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On Monday, 5 June 2017 at 23:30:00 UTC, Stanislav Blinov wrote:
 On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer 
 wrote:

 adding a range marks it as having pointers to scan, AND stores 
 a reference to that array, so it won't be GC collected (nor 
 will anything it points to). The intention is for it to be 
 used on non-GC memory, like C malloc'd memory, where it 
 doesn't matter that the GC is pointing at it.
Huh? https://dlang.org/phobos/core_memory.html#.GC.addRange :
 Note that p[0 .. sz] is treated as an opaque range of memory 
 assumed to be suitably managed by the caller. In particular, 
 if p points into a GC-managed memory block, addRange does not 
 mark this block as live.
Is that paragraph wrong?
No I am wrong. I assumed that because the gc is part of the static space it's metadata would also be scanned. But the data is allocated using c malloc, so it's not scanned. This makes any partial insertion using addrange more dangerous actually because you have to take care to keep a pointer to that block somewhere
 I would say that you are better off allocating 2 arrays -- one 
 with NO_SCAN where you put your non-pointer-containing data, 
 and one without the flag to put your other data. This is 
 similar to your "selective" function, but instead of 
 allocating 1 array, with a tuple of slices into it, just 
 allocate 2 arrays and return the tuple of those 2 arrays.
Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.
No, 2 allocations instead of N. -Steve
Jun 06
next sibling parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer 
wrote:

 Note that p[0 .. sz] is treated as an opaque range of memory 
 assumed to be suitably managed by the caller. In particular, 
 if p points into a GC-managed memory block, addRange does not 
 mark this block as live.
Is that paragraph wrong?
No I am wrong. I assumed that because the gc is part of the static space it's metadata would also be scanned. But the data is allocated using c malloc, so it's not scanned. This makes any partial insertion using addrange more dangerous actually because you have to take care to keep a pointer to that block somewhere
You mean explicitly pointer to the block itself? It is being kept with the first array. Or you mean the pointer passed to addRange? That one would not necessarily belong to the first array. But I thought the GC does track pointers that point to block interiors. After all, that's how slices work.
 Then it's just the obvious() function. The whole point of the 
 exercise is to make one GC allocation instead of N :) But 
 still GC, so as not to put additional responsibility on the 
 caller.
No, 2 allocations instead of N.
In this example. But obviously I'd like to make this generic, i.e.: struct S { /* ... */ } auto arrays = allocateArrays!(int, "ints", numInts, S, "Ss", numSs, void*, "pointers", numPtrs); So it won't necessarily be 2. And of course the function would calculate the alignment, initialize the contents (or not, e.g. via passing a dummy type Uninitialized!T), etc. I do have a use case for allocating up to 4 arrays at once, for example.
Jun 06
prev sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer 
wrote:

 No, 2 allocations instead of N.
Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
Jun 06
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/6/17 6:36 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:

 No, 2 allocations instead of N.
Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
True. But O(2) is better than O(N) :) -Steve
Jun 06
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer 
wrote:
 On 6/6/17 6:36 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer 
 wrote:

 No, 2 allocations instead of N.
Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
True. But O(2) is better than O(N) :) -Steve
Heh, this isn't malloc. We could end up in a situation when even one allocation is slower than N, depending on when they happen :) Actually, separate blocks might be a decent choice: there's another problem with sharing blocks for different data in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. Don't know, need to actually implement and measure if it's even worth the trouble.
Jun 06
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/6/17 8:34 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer wrote:
 On 6/6/17 6:36 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:

 No, 2 allocations instead of N.
Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
True. But O(2) is better than O(N) :)
Heh, this isn't malloc. We could end up in a situation when even one allocation is slower than N, depending on when they happen :) Actually, separate blocks might be a decent choice: there's another problem with sharing blocks for different data in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. Don't know, need to actually implement and measure if it's even worth the trouble.
Looking at your code, you have allocated separate blocks for the objects and the bytes, but your objects still are just pointers (e.g. C.sizeof == (void *).sizeof). You are allocating a new block for each class instance anyway in the loop inside main. I don't think there is any support in the GC to run the finalizers of an array of class instances (read: an array of class instance storage that you point at), so this whole exercise may be moot if you are using classes :) -Steve
Jun 06
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 6 June 2017 at 13:05:51 UTC, Steven Schveighoffer 
wrote:

 Actually, separate blocks might be a decent choice: there's 
 another
 problem with sharing blocks for different data in that 
 finalizers can't
 be run. So such a function would either not support types with
 finalizers at all, or would have to allocate separate blocks 
 for such
 types anyway. Don't know, need to actually implement and 
 measure if it's
 even worth the trouble.
Looking at your code, you have allocated separate blocks for the objects and the bytes, but your objects still are just pointers (e.g. C.sizeof == (void *).sizeof). You are allocating a new block for each class instance anyway in the loop inside main.
Which is exactly why I used them. The code is a short example of single-block array allocation, not a generic demonstration of intent. Classes were just an easy way to stuff pointers into an array while being able to see if they're being looked at by the GC or not. Perhaps I should've used pointers to structs, but oh well.
 I don't think there is any support in the GC to run the 
 finalizers of an array of class instances (read: an array of 
 class instance storage that you point at), so this whole 
 exercise may be moot if you are using classes :)
Heh, well, there is no support in the GC to *create* an array of class instances in the first place ;) We can, of course, do it manually, but that's a different story. Structs, on the other hand, are another matter. Although, even creating a block with BlkAttr.FINALIZE *and* passing the struct typeinfo to GC.malloc() doesn't seem to compel it to run the struct dtors on collect, which is weird, considering it does so for a new'ed array of structs :\
Jun 06
parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 6 June 2017 at 13:29:32 UTC, Stanislav Blinov wrote:

 Structs, on the other hand, are another matter. Although, even 
 creating a block with BlkAttr.FINALIZE *and* passing the struct 
 typeinfo to GC.malloc() doesn't seem to compel it to run the 
 struct dtors on collect, which is weird, considering it does so 
 for a new'ed array of structs :\
Ah, but of course I forgot to .init, never mind.
Jun 06