digitalmars.D - Question to GC experts: NO_SCAN for a part of the block
- Stanislav Blinov (79/79) Jun 03 2017 Suppose I need to allocate several dynamic arrays of different
- ag0aep6g (18/59) Jun 04 2017 [...]
- Stanislav Blinov (16/31) Jun 04 2017 Yeah, after playing around with the code a bit, shuffling the
- Steven Schveighoffer (16/30) Jun 05 2017 adding a range marks it as having pointers to scan, AND stores a
- Stanislav Blinov (8/23) Jun 05 2017 Huh?
- Steven Schveighoffer (8/32) Jun 06 2017 No I am wrong. I assumed that because the gc is part of the
- Stanislav Blinov (18/35) Jun 06 2017 You mean explicitly pointer to the block itself? It is being kept
- Stanislav Blinov (5/6) Jun 06 2017 Oh, I misunderstood you. You mean two blocks total, for scanned
- Steven Schveighoffer (3/8) Jun 06 2017 True. But O(2) is better than O(N) :)
- Stanislav Blinov (10/24) Jun 06 2017 Heh, this isn't malloc. We could end up in a situation when even
- Steven Schveighoffer (10/31) Jun 06 2017 Looking at your code, you have allocated separate blocks for the objects...
- Stanislav Blinov (16/35) Jun 06 2017 Which is exactly why I used them. The code is a short example of
- Stanislav Blinov (2/7) Jun 06 2017 Ah, but of course I forgot to .init, never mind.
Suppose I need to allocate several dynamic arrays of different types and/or sizes. An obvious choice is to do just that: perform N allocations. But given that I know all the sizes, and also am pretty sure that each of the arrays will have the same lifespan, why would I do N allocations when I can do just one? After all, I can ask the GC for a sufficiently-sized memory block and put my arrays in it. But I don't necessarily want the GC to scan *all* my arrays for pointers. The problem is, with current GC, it seems that it's impossible to mark only a part of the block as NO_SCAN. Note that I'm not really talking about precise GC here with full type information, but about addRange() pointing within a GC-allocated block. A short example, two arrays, one holding class references, the other - bytes: --- import core.memory : GC; import std.stdio; import std.typecons; class C { int i; this(int i) { this.i = i; } ~this() { writeln("C(", i, ") dtor"); } } auto obvious(int numCs, int numBytes) { // obvious solution: two arrays, two allocations return tuple!("Cs", "bytes")(new C[numCs], new byte[numBytes]); } auto scanned(int numCs, int numBytes) { // one allocation for both arrays, // for simplicity not dealing with alignment/out of memory here. // allocate with default attributes, scanned block auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof); auto cs = (cast(C*)memory)[0..numCs]; cs[] = C.init; auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes]; bytes[] = byte.init; return tuple!("Cs", "bytes")(cs, bytes); } auto selective(int numCs, int numBytes) { // one allocation for both arrays, // for simplicity not dealing with alignment/out of memory here. // explicitly ask for NO_SCAN block, // to not scan bytes for pointers auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof, GC.BlkAttr.NO_SCAN); auto cs = (cast(C*)memory)[0..numCs]; cs[] = C.init; // add scanning range for references GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C)); auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes]; bytes[] = byte.init; return tuple!("Cs", "bytes")(cs, bytes); } void main() { int numCs = 4; // comes at runtime from elsewhere int numBytes = 32; // comes at runtime from elsewhere auto arrays1 = obvious(numCs, numBytes); int counter; foreach (ref e; arrays1.Cs) e = new C(counter++); // dtors will be called auto arrays2 = scanned(numCs, numBytes); foreach (ref e; arrays2.Cs) e = new C(counter++); // dtors will be called auto arrays3 = selective(numCs, numBytes); foreach (ref e; arrays3.Cs) e = new C(counter++); // dtors will not be called } --- Should this work, and if not, why?
Jun 03 2017
On 06/04/2017 03:08 AM, Stanislav Blinov wrote:--- import core.memory : GC; import std.stdio; import std.typecons; class C { int i; this(int i) { this.i = i; } ~this() { writeln("C(", i, ") dtor"); } }[...]auto selective(int numCs, int numBytes) {[...]auto memory = GC.malloc(numCs*C.sizeof + numBytes*byte.sizeof, GC.BlkAttr.NO_SCAN); auto cs = (cast(C*)memory)[0..numCs]; cs[] = C.init; // add scanning range for references GC.addRange(cs.ptr, cs.length*C.sizeof, typeid(C)); auto bytes = (cast(byte*)(memory + numCs*C.sizeof))[0..numBytes]; bytes[] = byte.init; return tuple!("Cs", "bytes")(cs, bytes); } void main() { int numCs = 4; // comes at runtime from elsewhere int numBytes = 32; // comes at runtime from elsewhere[...]int counter;[...]auto arrays3 = selective(numCs, numBytes); foreach (ref e; arrays3.Cs) e = new C(counter++); // dtors will not be called } --- Should this work, and if not, why?As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end. If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors show up. Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be? But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends? Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that.
Jun 04 2017
On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:Yeah, after playing around with the code a bit, shuffling the calls, making new allocations, etc., I saw those dtors indeed being run. I was just expecting the behavior to be the same as for normal 'new'ed arrays, but I guess there are nuances.Should this work, and if not, why?As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end.If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors show up.Yeah, well, calling it manually sort of defeats the purpose :)Overwriting `arrays3.Cs`'s elements with `null`s also works. My guess is that the `removeRange` call isn't done automatically before the final run of the GC. Maybe it should be?If at all possible, I think that'd be good, if only for consistency.But I have a vague memory that the GC isn't required to call destructors on everything in the final run. Or maybe it's not guaranteed that there is a final run when the program ends?That's what puzzled me, seeing as dtors from the other two arrays ran at the end. I understand how particular dtors may not be run due to circular or self-references, but there are none in this case.Anyway, that would mean everything's working as intended, and you just can't rely on destructors like that.Seems it is, and that is great. Now all that's left is some benchmarking to see if saving on allocations actually yields any fruit. Thanks!
Jun 04 2017
On 6/4/17 5:57 AM, Stanislav Blinov wrote:On Sunday, 4 June 2017 at 09:38:45 UTC, ag0aep6g wrote:adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it. I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays. In my dcollections library, the allocator I use uses malloc for non-pointer containing arrays, and GC for everything that could contain pointers, and then divvy up the array into the exact sizes I want. Works quite well, but the dtor of the container needs to deallocate the malloc'd one. -SteveYeah, after playing around with the code a bit, shuffling the calls, making new allocations, etc., I saw those dtors indeed being run. I was just expecting the behavior to be the same as for normal 'new'ed arrays, but I guess there are nuances.Should this work, and if not, why?As far as I can tell, the `addRange` call works correctly, but maybe too well in a sense. It keeps the `new`ed `C`s alive as long as `arrays3.Cs` has pointers to them. And `arrays3.Cs` has those pointers until the very end.If you add `GC.removeRange(arrays3.Cs.ptr);` at the of `main`, the dtors show up.Yeah, well, calling it manually sort of defeats the purpose :)
Jun 05 2017
On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer wrote:adding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it.Huh? https://dlang.org/phobos/core_memory.html#.GC.addRange :Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live.Is that paragraph wrong?I would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays.Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.
Jun 05 2017
On Monday, 5 June 2017 at 23:30:00 UTC, Stanislav Blinov wrote:On Monday, 5 June 2017 at 13:11:25 UTC, Steven Schveighoffer wrote:No I am wrong. I assumed that because the gc is part of the static space it's metadata would also be scanned. But the data is allocated using c malloc, so it's not scanned. This makes any partial insertion using addrange more dangerous actually because you have to take care to keep a pointer to that block somewhereadding a range marks it as having pointers to scan, AND stores a reference to that array, so it won't be GC collected (nor will anything it points to). The intention is for it to be used on non-GC memory, like C malloc'd memory, where it doesn't matter that the GC is pointing at it.Huh? https://dlang.org/phobos/core_memory.html#.GC.addRange :Note that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live.Is that paragraph wrong?No, 2 allocations instead of N. -SteveI would say that you are better off allocating 2 arrays -- one with NO_SCAN where you put your non-pointer-containing data, and one without the flag to put your other data. This is similar to your "selective" function, but instead of allocating 1 array, with a tuple of slices into it, just allocate 2 arrays and return the tuple of those 2 arrays.Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.
Jun 06 2017
On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:You mean explicitly pointer to the block itself? It is being kept with the first array. Or you mean the pointer passed to addRange? That one would not necessarily belong to the first array. But I thought the GC does track pointers that point to block interiors. After all, that's how slices work.No I am wrong. I assumed that because the gc is part of the static space it's metadata would also be scanned. But the data is allocated using c malloc, so it's not scanned. This makes any partial insertion using addrange more dangerous actually because you have to take care to keep a pointer to that block somewhereNote that p[0 .. sz] is treated as an opaque range of memory assumed to be suitably managed by the caller. In particular, if p points into a GC-managed memory block, addRange does not mark this block as live.Is that paragraph wrong?In this example. But obviously I'd like to make this generic, i.e.: struct S { /* ... */ } auto arrays = allocateArrays!(int, "ints", numInts, S, "Ss", numSs, void*, "pointers", numPtrs); So it won't necessarily be 2. And of course the function would calculate the alignment, initialize the contents (or not, e.g. via passing a dummy type Uninitialized!T), etc. I do have a use case for allocating up to 4 arrays at once, for example.Then it's just the obvious() function. The whole point of the exercise is to make one GC allocation instead of N :) But still GC, so as not to put additional responsibility on the caller.No, 2 allocations instead of N.
Jun 06 2017
On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:No, 2 allocations instead of N.Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
Jun 06 2017
On 6/6/17 6:36 AM, Stanislav Blinov wrote:On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:True. But O(2) is better than O(N) :) -SteveNo, 2 allocations instead of N.Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
Jun 06 2017
On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer wrote:On 6/6/17 6:36 AM, Stanislav Blinov wrote:Heh, this isn't malloc. We could end up in a situation when even one allocation is slower than N, depending on when they happen :) Actually, separate blocks might be a decent choice: there's another problem with sharing blocks for different data in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. Don't know, need to actually implement and measure if it's even worth the trouble.On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:True. But O(2) is better than O(N) :) -SteveNo, 2 allocations instead of N.Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
Jun 06 2017
On 6/6/17 8:34 AM, Stanislav Blinov wrote:On Tuesday, 6 June 2017 at 12:07:28 UTC, Steven Schveighoffer wrote:Looking at your code, you have allocated separate blocks for the objects and the bytes, but your objects still are just pointers (e.g. C.sizeof == (void *).sizeof). You are allocating a new block for each class instance anyway in the loop inside main. I don't think there is any support in the GC to run the finalizers of an array of class instances (read: an array of class instance storage that you point at), so this whole exercise may be moot if you are using classes :) -SteveOn 6/6/17 6:36 AM, Stanislav Blinov wrote:Heh, this isn't malloc. We could end up in a situation when even one allocation is slower than N, depending on when they happen :) Actually, separate blocks might be a decent choice: there's another problem with sharing blocks for different data in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. Don't know, need to actually implement and measure if it's even worth the trouble.On Tuesday, 6 June 2017 at 09:57:31 UTC, Steven Schveighoffer wrote:True. But O(2) is better than O(N) :)No, 2 allocations instead of N.Oh, I misunderstood you. You mean two blocks total, for scanned and non-scanned data. This could indeed work for cases when more than two arrays are needed. Still, it's one extra allocation.
Jun 06 2017
On Tuesday, 6 June 2017 at 13:05:51 UTC, Steven Schveighoffer wrote:Which is exactly why I used them. The code is a short example of single-block array allocation, not a generic demonstration of intent. Classes were just an easy way to stuff pointers into an array while being able to see if they're being looked at by the GC or not. Perhaps I should've used pointers to structs, but oh well.Actually, separate blocks might be a decent choice: there's another problem with sharing blocks for different data in that finalizers can't be run. So such a function would either not support types with finalizers at all, or would have to allocate separate blocks for such types anyway. Don't know, need to actually implement and measure if it's even worth the trouble.Looking at your code, you have allocated separate blocks for the objects and the bytes, but your objects still are just pointers (e.g. C.sizeof == (void *).sizeof). You are allocating a new block for each class instance anyway in the loop inside main.I don't think there is any support in the GC to run the finalizers of an array of class instances (read: an array of class instance storage that you point at), so this whole exercise may be moot if you are using classes :)Heh, well, there is no support in the GC to *create* an array of class instances in the first place ;) We can, of course, do it manually, but that's a different story. Structs, on the other hand, are another matter. Although, even creating a block with BlkAttr.FINALIZE *and* passing the struct typeinfo to GC.malloc() doesn't seem to compel it to run the struct dtors on collect, which is weird, considering it does so for a new'ed array of structs :\
Jun 06 2017
On Tuesday, 6 June 2017 at 13:29:32 UTC, Stanislav Blinov wrote:Structs, on the other hand, are another matter. Although, even creating a block with BlkAttr.FINALIZE *and* passing the struct typeinfo to GC.malloc() doesn't seem to compel it to run the struct dtors on collect, which is weird, considering it does so for a new'ed array of structs :\Ah, but of course I forgot to .init, never mind.
Jun 06 2017