
digitalmars.D - New pointer type for GC

reply Etienne Cimon <etcimon gmail.com> writes:
I've been looking at the GC and found that the main problem is that 
there's no clear information about the pointers. At least smart pointers 
have some info inside them but GC pointers are completely plain 4-8 
bytes and nothing else.

The GC is very fast even though it needs to look up this info, but I 
believe it wouldn't stay low-CPU on a server with 128 GB of RAM and 
3 GB/s of memory traffic across a wide range of memory segment sizes.

I think a decent proposal would be to
1- Introduce a new GC pointer type, e.g. a void' (it's an apostrophe) 
used also in classes which implicitly converts to void* by removing the 
last bytes (which contain the info). This pointer contains the Pool ID of 
the underlying memory
2- For reference pointers to a GC pointer, &void' would add a thread ID 
and magic number to better identify them during collection, and to avoid 
stopping the whole world to dereference them
3- The space (4 bytes?) added by the new pointer size could be saved 
with tighter storage bins. E.g. No more storing 65 bytes in a 128 byte 
bin, but the bucket would go from array to AVL Tree, which is a decent 
trade-off for all the O(1) searches during collection.
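
To make points 1 and 3 concrete, here is a minimal sketch in present-day D. It is not the proposed `void'` syntax: `GcPtr`, `poolId`, `binSizeFor` and `slack` are hypothetical names, the 4-byte pool ID is the size floated in point 3, and the power-of-two size classes starting at 16 bytes are an assumption about the allocator.

```d
// Hypothetical fat GC pointer: the raw address plus the ID of the owning pool.
struct GcPtr
{
    void* addr;   // what a plain void* would hold
    uint poolId;  // assumed 4-byte pool ID

    // Implicit conversion to void* simply drops the extra info.
    alias addr this;
}

// Assumed size classes: powers of two starting at 16 bytes.
size_t binSizeFor(size_t size) pure nothrow @nogc @safe
{
    size_t bin = 16;
    while (bin < size)
        bin *= 2;
    return bin;
}

// Bytes left unused when `size` bytes land in their power-of-two bin.
size_t slack(size_t size) pure nothrow @nogc @safe
{
    return binSizeFor(size) - size;
}
```

Under these assumptions a 65-byte allocation occupies a 128-byte bin and leaves 63 bytes unused, which is the space that tighter bins would reclaim to pay for the larger pointer.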

The downside is that adding roots would force falling back on the 
previous/slower searches, so it's either GC or no GC. Also, everything 
in D would become a ' pointer rather than * (which would then be legacy).

I think everything everywhere would have to change for this to be possible.
May 26 2014
next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
 void' (it's an apostrophe)

You mean the beginning of a character literal?
 I think everything everywhere would have to change for this to 
 be possible.

I don't think we want to do that.
May 26 2014
parent Etienne Cimon <etcimon gmail.com> writes:
On 2014-05-26 23:15, Brian Schott wrote:
 On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
 void' (it's an apostrophe)

You mean the beginning of a character literal?
 I think everything everywhere would have to change for this to be
 possible.

I don't think we want to do that.

Ah, maybe I wasn't clear, but this meant that if void' is not used everywhere, the GC falls back on the old algorithms. This isn't a breaking change; it's a fully backwards-compatible idea.
May 26 2014
prev sibling next sibling parent Etienne Cimon <etcimon gmail.com> writes:
On 2014-05-26 22:52, Etienne Cimon wrote:
 2- For reference pointers to a GC pointer, &void' would add a thread ID
 and magic number to better identify them during collection, and to avoid
 stopping the whole world to dereference them

Forgot to mention, but this not only avoids stopping the whole world, but also allows parallel collection (multi-threading).
May 26 2014
prev sibling next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Etienne Cimon"  wrote in message news:lm0um0$tgh$1 digitalmars.com...

 I think everything everywhere would have to change for this to be 
 possible.

Sounds like never gonna happen.
May 26 2014
next sibling parent reply Etienne Cimon <etcimon gmail.com> writes:
On 2014-05-26 23:19, Daniel Murphy wrote:
 "Etienne Cimon"  wrote in message news:lm0um0$tgh$1 digitalmars.com...

 I think everything everywhere would have to change for this to be
 possible.

Sounds like never gonna happen.

In terms of logic it's not that complicated; I could change DMD, druntime and phobos myself for it. The main problem is that the apostrophe is really a major cultural change.
May 26 2014
parent Etienne <etcimon gmail.com> writes:
On 2014-05-27 9:52 AM, Idan Arye wrote:
 Please, no apostrophe. It'll mess up syntax highlighters, and possibly
 indenters.

How about #?

void# ptr;
void## ptr2 = &ptr;
assert(sizeof(ptr) == size_t + 3);
assert(szptr_t == sizeof(ptr));
szptr_t ptr2Val = cast(szptr_t) &ptr;
char magicNum = ptr2.magic;
dchar threadId = ptr2.thread;
char[3] poolId = ptr.pool;
May 27 2014
prev sibling parent "Idan Arye" <GenericNPC gmail.com> writes:
On Tuesday, 27 May 2014 at 03:26:33 UTC, Etienne Cimon wrote:
 On 2014-05-26 23:19, Daniel Murphy wrote:
 "Etienne Cimon"  wrote in message 
 news:lm0um0$tgh$1 digitalmars.com...

 I think everything everywhere would have to change for this 
 to be
 possible.

Sounds like never gonna happen.

In terms of logic it's not that complicated; I could change DMD, druntime and phobos myself for it. The main problem is that the apostrophe is really a major cultural change.

Please, no apostrophe. It'll mess up syntax highlighters, and possibly indenters.
May 27 2014
prev sibling next sibling parent reply "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
 I've been looking at the GC and found that the main problem is 
 that there's no clear information about the pointers. At least 
 smart pointers have some info inside them but GC pointers are 
 completely plain 4-8 bytes and nothing else.

On a 64-bit platform 8 bytes is sufficient if you control the allocator:

1. Avoid allocating non-GC memory from a specific address range.
2. Set a max-size for GC allocated objects.

Then the test becomes this:

if ((ptr & NONGCMASK)==0){
    heapinfo_ptr = ptr&MASK;

    // process ptr
}
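
A runnable sketch of that test in D, under made-up numbers: the GC is assumed to own every address below 2^40, and every GC object is assumed to live inside a 1 MiB aligned block whose start holds the heap info. Only `NONGCMASK` and `MASK` come from the post; their values and the helper names `isGcAddress` and `heapInfoAddress` are hypothetical.

```d
// Bits that are set only in addresses outside the assumed GC range (below 2^40).
enum ulong NONGCMASK = ~((1UL << 40) - 1);

// Clears the offset inside an assumed 1 MiB (2^20 byte) aligned heap block.
enum ulong MASK = ~((1UL << 20) - 1);

// True when the address lies in the range reserved for GC memory.
bool isGcAddress(ulong ptr) pure nothrow @nogc @safe
{
    return (ptr & NONGCMASK) == 0;
}

// Address of the block header that would hold the heap info for `ptr`.
ulong heapInfoAddress(ulong ptr) pure nothrow @nogc @safe
{
    return ptr & MASK;
}
```

The max-size rule is what makes the second mask work: if no object can span more than one aligned block, rounding any interior pointer down always lands on the header of the block that owns it.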
May 27 2014
parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-27 3:56 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On 64 bit platform 8 bytes is sufficient if you control the allocator:

 1. Avoid allocating non-GC memory from specific address range.
 2. Set a max-size for GC allocated objects.

 Then the test becomes this:

 if ((ptr & NONGCMASK)==0){
     heapinfo_ptr = ptr&MASK;

    // process ptr
 }

That's true, though you still need the thread ID for references to pointers and you need to be able to pass those pointers to C. If that's done manually, you end up sanitizing the pointers too often, it becomes boilerplate.
May 27 2014
parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-27 10:18 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Tuesday, 27 May 2014 at 13:58:26 UTC, Etienne wrote:
 That's true, though you still need the thread ID for references to
 pointers and you need to be able to pass those pointers to C.

I am not really sure how useful references to gc-pointers are. I certainly would trade them in for multiple return values. I also think it is reasonable to ban transfer of GC mem to C code if all GC mem is accounted for with gc-typed pointers...

I think the GC is the future of D considering it's embedded in the very core of the language, and compatibility with C code is ... elementary.

Also, thread IDs in pointer references are to the GC what ref counts are to smart pointers. If you remove the refCount from smart pointers, you end up scanning the whole memory to count them, don't you? So why remove the thread ID from GC references, only to have to look for them in each thread? Without the thread ID in GC pointer references, you slow the GC down by the ratio of the total memory in all threads to the average memory in one thread, AND you lose parallel collection.

That's exactly why the GC has to stop the world, and no gaming platform will ever turn to the default behavior of a language if it stops its world. As a matter of fact, I can't see any other way of fixing the GC than adding the thread ID in there :/
May 27 2014
parent Etienne <etcimon gmail.com> writes:
On 2014-05-27 10:54 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Tuesday, 27 May 2014 at 14:42:34 UTC, Etienne wrote:
 I think the GC is the future of D considering it's embedded to the
 very core of the language, and compatibility with C code is ...
 elementary.

Well, but then I think you should be required to do manual tracking while it is being retained by C code. Basically a ref counter that keeps it marked reachable by the gc until released.
 You slow the GC down by as much total memory there is in all threads
 vs the avg in a thread, AND you remove parallel collection - by not
 having the Thread ID in gc ptr references

Not if you restrict the gc heap to a set of blocks. You can also keep thread info in the heap memory block.
 behavior of a language if it stops its world. As a matter of fact, I
 can't see any other way of fixing the GC than adding the Thread ID in
 there :/

By having multiple local GCs?

You're right, it's obviously easier to keep the same pointer syntax but hijack the stdlib malloc functions to forcibly go through the GC.

If the GC controls everything, you can keep the info in 8 byte pointers.

- The GC always returns a libc-incompatible pointer value
- Dereferencing should call the resolver from the GC to process it with the libc-compatible value (possibly just removing the last couple of bytes)
- Sending a pointer through extern(C) should call the sanitizer, which resolves the real pointer through the GC and sends that
- The last bytes of a pointer would contain the thread ID for void** and the pool ID for void*
- This would only work on x64 platforms
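
A minimal sketch of what such a tagged 8-byte value could look like in D. Everything here is an assumption for illustration: the ID is placed in the upper 16 bits (the post says "last bytes" without fixing which end), user-space addresses are assumed to fit in the low 48 bits as on common x86-64 systems, and `tag`, `resolve`, `tagOf`, `TAG_SHIFT` and `ADDR_MASK` are hypothetical names, not druntime APIs.

```d
enum uint  TAG_SHIFT = 48;                     // assumed: addresses use only the low 48 bits
enum ulong ADDR_MASK = (1UL << TAG_SHIFT) - 1; // keeps the address, drops the ID

// What the GC would hand out: the address with a 16-bit ID packed on top.
// The result is not a valid pointer for libc until it has been resolved.
ulong tag(void* p, ushort id) pure nothrow @nogc
{
    return (cast(ulong) p & ADDR_MASK) | (cast(ulong) id << TAG_SHIFT);
}

// The "resolver"/"sanitizer" step: strip the ID to get a libc-compatible pointer.
void* resolve(ulong tagged) pure nothrow @nogc
{
    return cast(void*)(tagged & ADDR_MASK);
}

// Read the pool ID (for void*) or thread ID (for void**) back out.
ushort tagOf(ulong tagged) pure nothrow @nogc @safe
{
    return cast(ushort)(tagged >> TAG_SHIFT);
}
```

In this sketch, every dereference and every extern(C) call would go through `resolve`, which is a single mask, while the collector would read `tagOf` instead of searching for the owning pool.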
May 27 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Tuesday, 27 May 2014 at 13:58:26 UTC, Etienne wrote:
 That's true, though you still need the thread ID for references 
 to pointers and you need to be able to pass those pointers to C.

I am not really sure how useful references to gc-pointers are. I certainly would trade them in for multiple return values. I also think it is reasonable to ban transfer of GC mem to C code if all GC mem is accounted for with gc-typed pointers...
May 27 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Tuesday, 27 May 2014 at 14:42:34 UTC, Etienne wrote:
 I think the GC is the future of D considering it's embedded to 
 the very core of the language, and compatibility with C code is 
 ... elementary.

Well, but then I think you should be required to do manual tracking while it is being retained by C code. Basically a ref counter that keeps it marked reachable by the gc until released.
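
The manual tracking described above could look roughly like this with today's druntime. `GC.addRoot` and `GC.removeRoot` are real `core.memory` functions; the pin table and the names `pinForC`, `unpinFromC` and `pinCounts` are hypothetical, and the table is thread-local, so this sketch assumes pin and unpin happen on the same thread.

```d
import core.memory : GC;

// Hypothetical pin table: block address -> number of outstanding C-side holders.
private uint[void*] pinCounts;

// Call before handing `p` to C code that will retain it.
void* pinForC(void* p)
{
    if (auto count = p in pinCounts)
    {
        ++*count;           // already pinned, just count the new holder
    }
    else
    {
        GC.addRoot(p);      // register the block as a root so it stays reachable
        pinCounts[p] = 1;
    }
    return p;
}

// Call once the C side has released its copy of `p`.
void unpinFromC(void* p)
{
    auto count = p in pinCounts;
    assert(count !is null, "unpinFromC without a matching pinForC");
    if (--*count == 0)
    {
        pinCounts.remove(p);
        GC.removeRoot(p);   // the block is collectable again once unreferenced
    }
}
```

The cost is that every hand-off to C becomes an explicit pin/unpin pair, which is the boilerplate concern raised earlier in the thread.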
 You slow the GC down by as much total memory there is in all 
 threads vs the avg in a thread, AND you remove parallel 
 collection - by not having the Thread ID in gc ptr references

Not if you restrict the gc heap to a set of blocks. You can also keep thread info in the heap memory block.
 behavior of a language if it stops its world. As a matter of 
 fact, I can't see any other way of fixing the GC than adding 
 the Thread ID in there :/

By having multiple local GCs?
May 27 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Tuesday, 27 May 2014 at 16:47:38 UTC, Etienne wrote:
 You're right, it's obviously easier to keep it as the same 
 pointer syntax but hijack the stdlib malloc functions to 
 forcibly go through the GC.

That is an option, and having a hijacked malloc would probably also make it possible to optimize out unnecessary allocations as well as inline allocations. If you have a GC-pointer type and ban transitions to regular pointers unless they are borrowed pointers, then that would also be ok (you don't need to hijack malloc then).
May 27 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
Big language change which does not fix any fundamental issue. I 
think at this stage of language development it is better to not even 
discuss those ;)
May 28 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Wednesday, 28 May 2014 at 14:16:56 UTC, Dicebot wrote:
 Big language change which does not fix any fundamental issue.

Having a GC pointer type + several other mechanisms could reduce the amount of scanned memory to a level where it slips below the "pain threshold" for interactive apps. That's a fundamental issue for anything that is not batch.
May 28 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Wednesday, 28 May 2014 at 17:21:18 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 28 May 2014 at 14:16:56 UTC, Dicebot wrote:
 Big language change which does not fix any fundamental issue.

Having a GC pointer type + several other mechanisms could reduce the amount of scanned memory to a level where it slips below the "pain threshold" for interactive apps. That's a fundamental issue for anything that is not batch.

No, this is simply an annoying problem. We are unlikely to break anything to fix annoyances (even huge ones). Fundamental issues in my opinion are those that result in type system holes or make certain common/desired code impossible without resorting to a lot of assembly magic. Or areas that are complicated beyond explanation. Adding a GC pointer type does not enable anything that you can't do right now for high-level applications and does not help low-level applications at all. It is a niche solution.
May 28 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Wednesday, 28 May 2014 at 17:27:20 UTC, Dicebot wrote:
 Adding GC pointer type does not enable anything that you can't 
 do right now for high-level applications and does not help at 
 all low-level applications. It is a niche solution.

A niche solution is fine by me. Etienne has expressed interest in creating a D to asm.js converter. Now, maybe the current D is not suitable for that, but perhaps a dialect of D would be. I could back that. That makes two who are interested. Add 2-3 more people and we could have a train going to a station that is niche… but productive.
May 28 2014
prev sibling next sibling parent "Bastiaan Veelo" <Bastiaan Veelo.net> writes:
On Tuesday, 27 May 2014 at 16:47:38 UTC, Etienne wrote:
 You're right, it's obviously easier to keep it as the same 
 pointer syntax but hijack the stdlib malloc functions to 
 forcibly go through the GC.

 If the GC controls everything, you can keep the info in 8 byte 
 pointers.

 - The GC always returns a libc-incompatible pointer value
 - Dereferencing should call the resolver from the GC to process 
 it with the libc-compatible value (possibly just removing the 
 last couple bytes)
 - Sending a pointer through extern(C) should call the sanitizer 
 which resolves the real pointer through the GC and sends that
 - The last bytes of a pointer would contain thread ID for 
 void** and poolID for void*
 - This would only work on x64 platforms

This would also help in implementing weak references, am I right? Those would then come in handy when improving std.signals?

Bastiaan.
May 28 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Wednesday, 28 May 2014 at 17:35:18 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 28 May 2014 at 17:27:20 UTC, Dicebot wrote:
 Adding GC pointer type does not enable anything that you can't 
 do right now for high-level applications and does not help at 
 all low-level applications. It is a niche solution.

A niche solution is fine by me. Etienne has expressed interest in creating a D to asm.js converter. Now, maybe the current D is not suitable for that, but perhaps a dialect of D would be. I could back that. That makes two who are interested. Add 2-3 more people and we could have a train going to a station that is niche… but productive.

Get this into upstream and you will have dozens of people unhappy about updating their code. It is never that simple.
May 28 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Wednesday, 28 May 2014 at 17:39:21 UTC, Dicebot wrote:
 Get this into upstream and you will have dozens of people unhappy 
 about updating their code. It is never that simple.

It doesn't have to be in upstream. It could be an experimental compiler implemented in pure D. I would back that, if Etienne is willing. There is a need for a strongly typed language that can compile to asm.js, PNACL and machine language. I doubt that the current incarnation of D is suitable, so a reduced set of D with better GC/malloc support and tight codegen for small downloads would be welcome for interactive apps. I agree with you that D2 won't get this, but that does not prevent someone from creating D-.
May 28 2014
prev sibling next sibling parent Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 29 May 2014 03:35, via Digitalmars-d <digitalmars-d puremagic.com> wrote:
 On Wednesday, 28 May 2014 at 17:27:20 UTC, Dicebot wrote:
 Adding GC pointer type does not enable anything that you can't do right
 now for high-level applications and does not help at all low-level
 applications. It is niche solution.

A niche solution is fine by me. Etienne has expressed interest in creating a D to asm.js converter. Now, maybe the current D is not suitable for that, but perhaps a dialect of D would be. I could back that. That makes two who are interested. Add 2-3 more people and we could have a train going to a station that is niche… but productive.

I would switch to a D fork in an instant if it satisfied my requirements...
May 31 2014
prev sibling next sibling parent "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:
On Saturday, 31 May 2014 at 13:44:22 UTC, Manu via Digitalmars-d 
wrote:
 I would switch to a D fork in an instant if it satisfied my 
 requirements...

Maybe the best approach is to

1. Start with the basic requirements and acceptable restrictions and shrink the semantics to fit them. Then use the DScanner source code as a starting point and emit LLVM asm, and make sure the runtime fits with the restrictions of PNACL and asm.js.

2. Then accept new semantics/syntax and rather provide source2source converters with warnings from D, Swift etc.

Safari is currently working on improving their JS compiler with LLVM tech so asm.js might perform well on all platforms eventually. Here is one benchmark based on Box2D:

http://www.j15r.com/blog/2014/05/23/Box2d_2014_Update

(What are your "absolute" requirements?)
Jun 04 2014
prev sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
 I've been looking at the GC and found that the main problem is 
 that there's no clear information about the pointers. At least 
 smart pointers have some info inside them but GC pointers are 
 completely plain 4-8 bytes and nothing else.

 The GC is very fast even though it needs to look up this info, but I 
 believe it wouldn't stay low-CPU on a server with 128 GB of RAM and 
 3 GB/s of memory traffic across a wide range of memory segment 
 sizes.

 I think a decent proposal would be to
 1- Introduce a new GC pointer type, e.g. a void' (it's an 
 apostrophe) used also in classes which implicitly converts to 
 void* by removing the last bytes (which contain the info). This 
 pointer contains the Pool ID of the underlying memory

I think in SafeD it might be possible to just make this automatic. I don't see it ever happening in D proper though. What happens if I pass a dynamic array of pointers to memset?
Jun 04 2014