digitalmars.D - Question to GC experts: NO_SCAN for a part of the block

Stanislav Blinov (79/79) Jun 03 2017 Suppose I need to allocate several dynamic arrays of different

ag0aep6g (18/59) Jun 04 2017 [...]

Stanislav Blinov (16/31) Jun 04 2017 Yeah, after playing around with the code a bit, shuffling the

Steven Schveighoffer (16/30) Jun 05 2017 adding a range marks it as having pointers to scan, AND stores a

Stanislav Blinov (8/23) Jun 05 2017 Huh?

Steven Schveighoffer (8/32) Jun 06 2017 No I am wrong. I assumed that because the gc is part of the

Stanislav Blinov (18/35) Jun 06 2017 You mean explicitly pointer to the block itself? It is being kept
Stanislav Blinov (5/6) Jun 06 2017 Oh, I misunderstood you. You mean two blocks total, for scanned

Steven Schveighoffer (3/8) Jun 06 2017 True. But O(2) is better than O(N) :)

Stanislav Blinov (10/24) Jun 06 2017 Heh, this isn't malloc. We could end up in a situation when even

Steven Schveighoffer (10/31) Jun 06 2017 Looking at your code, you have allocated separate blocks for the objects...

Stanislav Blinov (16/35) Jun 06 2017 Which is exactly why I used them. The code is a short example of

Stanislav Blinov (2/7) Jun 06 2017 Ah, but of course I forgot to .init, never mind.

Stanislav Blinov <stanislav.blinov gmail.com> writes:

Suppose I need to allocate several dynamic arrays of different 
types and/or sizes. An obvious choice is to do just that: perform 
N allocations.
But given that I know all the sizes, and also am pretty sure that 
each of the arrays will have the same lifespan, why would I do N 
allocations when I can do just one? After all, I can ask the GC 
for a sufficiently-sized memory block and put my arrays in it.

But I don't necessarily want the GC to scan *all* my arrays for 
pointers. The problem is, with current GC, it seems that it's 
impossible to mark only a part of the block as NO_SCAN. Note that 
I'm not really talking about precise GC here with full type 
information, but about addRange() pointing within a GC-allocated 
block. A short example, two arrays, one holding class references, 
the other - bytes:

---

import core.memory : GC;
import std.stdio;
import std.typecons;

class C {
     int i;
     this(int i) {
         this.i = i;
     }

     ~this() {
         writeln("C(", i, ") dtor");
     }
}

auto obvious(int numCs, int numBytes) {
     // obvious solution: two arrays, two allocations
     return tuple!("Cs", "bytes")(new C[numCs], new 
byte[numBytes]);
}

auto scanned(int numCs, int numBytes) {

     // one allocation for both arrays,
     // for simplicity not dealing with alignment/out of memory 
here.
     // allocate with default attributes, scanned block
     auto memory = GC.malloc(numCs*C.sizeof + 
numBytes*byte.sizeof);
     auto cs = (cast(C*)memory)[0..numCs];
     cs[] = C.init;
     auto bytes = (cast(byte*)(memory + 
numCs*C.sizeof))[0..numBytes];
     bytes[] = byte.init;
     return tuple!("Cs", "bytes")(cs, bytes);
}

auto selective(int numCs, int numBytes) {

     // one allocation for both arrays,
     // for simplicity not dealing with alignment/out of memory 
here.
     // explicitly ask for NO_SCAN block,
     // to not scan bytes for pointers
     auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof,
             GC.BlkAttr.NO_SCAN);
     auto cs = (cast(C*)memory)[0..numCs];
     cs[] = C.init;
     // add scanning range for references
     GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C));
     auto bytes = (cast(byte*)(memory + 
numCs*C.sizeof))[0..numBytes];
     bytes[] = byte.init;
     return tuple!("Cs", "bytes")(cs, bytes);
}

void main() {

     int numCs = 4; // comes at runtime from elsewhere
     int numBytes = 32; // comes at runtime from elsewhere

     auto arrays1 = obvious(numCs, numBytes);

     int counter;
     foreach (ref e; arrays1.Cs)
         e = new C(counter++); // dtors will be called

     auto arrays2 = scanned(numCs, numBytes);
     foreach (ref e; arrays2.Cs)
         e = new C(counter++); // dtors will be called

     auto arrays3 = selective(numCs, numBytes);
     foreach (ref e; arrays3.Cs)
         e = new C(counter++); // dtors will not be called
}

---

Should this work, and if not, why?

Jun 03 2017

ag0aep6g <anonymous example.com> writes:

On 06/04/2017 03:08 AM, Stanislav Blinov wrote:
 ---
 
 import core.memory : GC;
 import std.stdio;
 import std.typecons;
 
 class C {
      int i;
      this(int i) {
          this.i = i;
      }
 
      ~this() {
          writeln("C(", i, ") dtor");
      }
 }

[...]
 auto selective(int numCs, int numBytes) {

[...]
      auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof,
              GC.BlkAttr.NO_SCAN);
      auto cs = (cast(C*)memory)[0..numCs];
      cs[] = C.init;
      // add scanning range for references
      GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C));
      auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes];
      bytes[] = byte.init;
      return tuple!("Cs", "bytes")(cs, bytes);
 }
 
 void main() {
 
      int numCs = 4; // comes at runtime from elsewhere
      int numBytes = 32; // comes at runtime from elsewhere

[...]
      int counter;

[...]
      auto arrays3 = selective(numCs, numBytes);
      foreach (ref e; arrays3.Cs)
          e = new C(counter++); // dtors will not be called
 }
 
 ---
 
 Should this work, and if not, why?

As far as I can tell, the `addRange` call works correctly, but maybe too 
well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` 
has pointers to them. And `arrays3.Cs` has those pointers until the very 
end.

If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors 
show up. Overwriting `arrays3.Cs`'s elements with `null`s also works. My 
guess is that the `removeRange` call isn't done automatically before the 
final run of the GC. Maybe it should be?

But I have a vague memory that the GC isn't required to call destructors 
on everything in the final run. Or maybe it's not guaranteed that there 
is a final run when the program ends? Anyway, that would mean 
everything's working as intended, and you just can't rely on destructors 
like that.

Jun 04 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:

 Should this work, and if not, why?

 As far as I can tell, the `addRange` call works correctly, but 
 maybe too well in a sense. It keeps the `new`ed `C`s alive as 
 long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has 
 those pointers until the very end.

Yeah, after playing around with the code a bit, shuffling the 
calls, making new allocations, etc., I saw those dtors indeed 
being run. I was just expecting the behavior to be the same as 
for normal 'new'ed arrays, but I guess there are nuances.

 If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, 
 the dtors show up.

Yeah, well, calling it manually sort of defeats the purpose :)

 Overwriting `arrays3.Cs`'s elements with `null`s also works. My 
 guess is that the `removeRange` call isn't done automatically 
 before the final run of the GC. Maybe it should be?

If at all possible, I think that'd be good, if only for 
consistency.

 But I have a vague memory that the GC isn't required to call 
 destructors on everything in the final run. Or maybe it's not 
 guaranteed that there is a final run when the program ends?

That's what puzzled me, seeing as dtors from the other two arrays 
ran at the end. I understand how particular dtors may not be run 
due to circular or self-references, but there are none in this 
case.

 Anyway, that would mean everything's working as intended, and 
 you just can't rely on destructors like that.

Seems it is, and that is great. Now all that's left is some 
benchmarking to see if saving on allocations actually yields any 
fruit.

Thanks!

Jun 04 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/4/17 5:57 AM, Stanislav Blinov wrote:
 On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:

 Should this work, and if not, why?

 As far as I can tell, the `addRange` call works correctly, but maybe
 too well in a sense. It keeps the `new`ed `C`s alive as long as
 `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers
 until the very end.

 Yeah, after playing around with the code a bit, shuffling the calls,
 making new allocations, etc., I saw those dtors indeed being run. I was
 just expecting the behavior to be the same as for normal 'new'ed arrays,
 but I guess there are nuances.

 If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the
 dtors show up.

 Yeah, well, calling it manually sort of defeats the purpose :)

adding a range marks it as having pointers to scan, AND stores a 
reference to that array, so it won't be GC collected (nor will anything 
it points to). The intention is for it to be used on non-GC memory, like 
C malloc'd memory, where it doesn't matter that the GC is pointing at it.

I would say that you are better off allocating 2 arrays -- one with 
NO_SCAN where you put your non-pointer-containing data, and one without 
the flag to put your other data. This is similar to your "selective" 
function, but instead of allocating 1 array, with a tuple of slices into 
it, just allocate 2 arrays and return the tuple of those 2 arrays.

In my dcollections library, the allocator I use uses malloc for 
non-pointer containing arrays, and GC for everything that could contain 
pointers, and then divvy up the array into the exact sizes I want. Works 
quite well, but the dtor of the container needs to deallocate the 
malloc'd one.

-Steve

Jun 05 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer 
wrote:

 adding a range marks it as having pointers to scan, AND stores 
 a reference to that array, so it won't be GC collected (nor 
 will anything it points to). The intention is for it to be used 
 on non-GC memory, like C malloc'd memory, where it doesn't 
 matter that the GC is pointing at it.

Huh?

https://dlang.org/phobos/core_memory.html#.GC.addRange :

 Note that p[0 .. sz] is treated as an opaque range of memory 
 assumed to be suitably managed by the caller. In particular, if 
 p points into a GC-managed memory block, addRange does not mark 
 this block as live.

Is that paragraph wrong?

 I would say that you are better off allocating 2 arrays -- one 
 with NO_SCAN where you put your non-pointer-containing data, 
 and one without the flag to put your other data. This is 
 similar to your "selective" function, but instead of allocating 
 1 array, with a tuple of slices into it, just allocate 2 arrays 
 and return the tuple of those 2 arrays.

Then it's just the obvious() function. The whole point of the 
exercise is to make one GC allocation instead of N :) But still 
GC, so as not to put additional responsibility on the caller.

Jun 05 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On Monday, 5 June 2017 at 23:30:00 UTC, Stanislav Blinov wrote:
 On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer 
 wrote:

 adding a range marks it as having pointers to scan, AND stores 
 a reference to that array, so it won't be GC collected (nor 
 will anything it points to). The intention is for it to be 
 used on non-GC memory, like C malloc'd memory, where it 
 doesn't matter that the GC is pointing at it.

 Huh?

 https://dlang.org/phobos/core_memory.html#.GC.addRange :

 Note that p[0 .. sz] is treated as an opaque range of memory 
 assumed to be suitably managed by the caller. In particular, 
 if p points into a GC-managed memory block, addRange does not 
 mark this block as live.

 Is that paragraph wrong?

No I am wrong. I assumed that because the gc is part of the 
static space it's metadata would also be scanned. But the data is 
allocated using c malloc, so it's not scanned. This makes any 
partial insertion using addrange more dangerous actually because 
you have to take care to keep a pointer to that block somewhere

 I would say that you are better off allocating 2 arrays -- one 
 with NO_SCAN where you put your non-pointer-containing data, 
 and one without the flag to put your other data. This is 
 similar to your "selective" function, but instead of 
 allocating 1 array, with a tuple of slices into it, just 
 allocate 2 arrays and return the tuple of those 2 arrays.

 Then it's just the obvious() function. The whole point of the 
 exercise is to make one GC allocation instead of N :) But still 
 GC, so as not to put additional responsibility on the caller.

No, 2 allocations instead of N.

-Steve

Jun 06 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer 
wrote:

 Note that p[0 .. sz] is treated as an opaque range of memory 
 assumed to be suitably managed by the caller. In particular, 
 if p points into a GC-managed memory block, addRange does not 
 mark this block as live.

 Is that paragraph wrong?

 No I am wrong. I assumed that because the gc is part of the 
 static space it's metadata would also be scanned. But the data 
 is allocated using c malloc, so it's not scanned. This makes 
 any partial insertion using addrange more dangerous actually 
 because you have to take care to keep a pointer to that block 
 somewhere

You mean explicitly pointer to the block itself? It is being kept 
with the first array.
Or you mean the pointer passed to addRange? That one would not 
necessarily belong to the first array. But I thought the GC does 
track pointers that point to block interiors. After all, that's 
how slices work.

 Then it's just the obvious() function. The whole point of the 
 exercise is to make one GC allocation instead of N :) But 
 still GC, so as not to put additional responsibility on the 
 caller.

 No, 2 allocations instead of N.

In this example. But obviously I'd like to make this generic, 
i.e.:

struct S { /* ... */ }

auto arrays = allocateArrays!(int, "ints", numInts, S, "Ss", 
numSs, void*, "pointers", numPtrs);

So it won't necessarily be 2. And of course the function would 
calculate the alignment, initialize the contents (or not, e.g. 
via passing a dummy type Uninitialized!T), etc.

I do have a use case for allocating up to 4 arrays at once, for 
example.

Jun 06 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer 
wrote:

 No, 2 allocations instead of N.

Oh, I misunderstood you. You mean two blocks total, for scanned 
and non-scanned data. This could indeed work for cases when more 
than two arrays are needed. Still, it's one extra allocation.

Jun 06 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/6/17 6:36 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:

 No, 2 allocations instead of N.

 Oh, I misunderstood you. You mean two blocks total, for scanned and
 non-scanned data. This could indeed work for cases when more than two
 arrays are needed. Still, it's one extra allocation.

True. But O(2) is better than O(N) :)

-Steve

Jun 06 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer 
wrote:
 On 6/6/17 6:36 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer 
 wrote:

 No, 2 allocations instead of N.

 Oh, I misunderstood you. You mean two blocks total, for 
 scanned and
 non-scanned data. This could indeed work for cases when more 
 than two
 arrays are needed. Still, it's one extra allocation.

 True. But O(2) is better than O(N) :)

 -Steve

Heh, this isn't malloc. We could end up in a situation when even 
one allocation is slower than N, depending on when they happen :)

Actually, separate blocks might be a decent choice: there's 
another problem with sharing blocks for different data in that 
finalizers can't be run. So such a function would either not 
support types with finalizers at all, or would have to allocate 
separate blocks for such types anyway. Don't know, need to 
actually implement and measure if it's even worth the trouble.

Jun 06 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/6/17 8:34 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer wrote:
 On 6/6/17 6:36 AM, Stanislav Blinov wrote:
 On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:

 No, 2 allocations instead of N.

 Oh, I misunderstood you. You mean two blocks total, for scanned and
 non-scanned data. This could indeed work for cases when more than two
 arrays are needed. Still, it's one extra allocation.

 True. But O(2) is better than O(N) :)

 Heh, this isn't malloc. We could end up in a situation when even one
 allocation is slower than N, depending on when they happen :)

 Actually, separate blocks might be a decent choice: there's another
 problem with sharing blocks for different data in that finalizers can't
 be run. So such a function would either not support types with
 finalizers at all, or would have to allocate separate blocks for such
 types anyway. Don't know, need to actually implement and measure if it's
 even worth the trouble.

Looking at your code, you have allocated separate blocks for the objects 
and the bytes, but your objects still are just pointers (e.g. C.sizeof 
== (void *).sizeof). You are allocating a new block for each class 
instance anyway in the loop inside main.

I don't think there is any support in the GC to run the finalizers of an 
array of class instances (read: an array of class instance storage that 
you point at), so this whole exercise may be moot if you are using 
classes :)

-Steve

Jun 06 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 6 June 2017 at 13:05:51 UTC, Steven Schveighoffer 
wrote:

 Actually, separate blocks might be a decent choice: there's 
 another
 problem with sharing blocks for different data in that 
 finalizers can't
 be run. So such a function would either not support types with
 finalizers at all, or would have to allocate separate blocks 
 for such
 types anyway. Don't know, need to actually implement and 
 measure if it's
 even worth the trouble.

 Looking at your code, you have allocated separate blocks for 
 the objects and the bytes, but your objects still are just 
 pointers (e.g. C.sizeof == (void *).sizeof). You are allocating 
 a new block for each class instance anyway in the loop inside 
 main.

Which is exactly why I used them. The code is a short example of 
single-block array allocation, not a generic demonstration of 
intent. Classes were just an easy way to stuff pointers into an 
array while being able to see if they're being looked at by the 
GC or not. Perhaps I should've used pointers to structs, but oh 
well.

 I don't think there is any support in the GC to run the 
 finalizers of an array of class instances (read: an array of 
 class instance storage that you point at), so this whole 
 exercise may be moot if you are using classes :)

Heh, well, there is no support in the GC to *create* an array of 
class instances in the first place ;) We can, of course, do it 
manually, but that's a different story.

Structs, on the other hand, are another matter. Although, even 
creating a block with BlkAttr.FINALIZE *and* passing the struct 
typeinfo to GC.malloc() doesn't seem to compel it to run the 
struct dtors on collect, which is weird, considering it does so 
for a new'ed array of structs :\

Jun 06 2017

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 6 June 2017 at 13:29:32 UTC, Stanislav Blinov wrote:

 Structs, on the other hand, are another matter. Although, even 
 creating a block with BlkAttr.FINALIZE *and* passing the struct 
 typeinfo to GC.malloc() doesn't seem to compel it to run the 
 struct dtors on collect, which is weird, considering it does so 
 for a new'ed array of structs :\

Ah, but of course I forgot to .init, never mind.

Jun 06 2017

D Programming

C/C++ Programming

Other

digitalmars.D - Question to GC experts: NO_SCAN for a part of the block