
digitalmars.D - Precise GC

reply Walter Bright <newshound2 digitalmars.com> writes:
Of course, many of us have been thinking about this for a looong time, and about what the best way to go about it is. The usual technique is for the compiler to emit some sort of table for each TypeInfo giving the layout of the object, i.e. where the pointers are.

The general problem with these is that the table is non-trivial: it will require things like iterated data blocks, etc. It has to be compressed to save space, and the GC then has to execute a fair amount of code to decode it.

It also requires some significant work on the compiler end, leading of course to complexity, rigidity, development bottlenecks, and the usual bugs.

An alternative Andrei and I have been talking about is to put in the TypeInfo a 
pointer to a function. That function will contain customized code to mark the 
pointers in an instance of that type. That custom code will be generated by a 
template defined by the library. All the compiler has to do is stupidly 
instantiate the template for the type, and insert an address to the generated 
function.

The compiler need know NOTHING about how the marking works.

Even better, as ctRegex has demonstrated, the custom generated code can be very, very fast compared with a runtime table-driven approach. (The slow part will be calling the function indirectly.)

And best of all, the design is pushed out of the compiler into the library, so 
various schemes can be tried out without needing compiler work.

I think this is an exciting idea: it will enable us to get a precise GC by enabling people to work on it in parallel rather than serially waiting for me.
Apr 07 2012
next sibling parent "Froglegs" <lugtug gmail.com> writes:
  That sounds cool, perhaps people can have customizable GC for
specific applications?

Looking forward to D having a precise GC
Apr 07 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/7/12 8:56 PM, Walter Bright wrote:
[snip]
 I think this is an exciting idea, it will enable us to get a precise gc
 by enabling people to work on it in parallel rather than serially
 waiting for me.

I'm also very excited about this design, and will make time to help with the library part of the implementation. Maybe we can get a GSoC project on that. We already have a related proposal (lock-free GC). Andrei
Apr 07 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/7/12 9:49 PM, Andrei Alexandrescu wrote:
 On 4/7/12 8:56 PM, Walter Bright wrote:
 [snip]
 I think this is an exciting idea, it will enable us to get a precise gc
 by enabling people to work on it in parallel rather than serially
 waiting for me.

I'm also very excited about this design, and will make time to help with the library part of the implementation. Maybe we can get a GSoC project on that. We already have a related proposal (lock-free GC).

BTW, one exciting thing about both this and the nascent attributes design is that they integrate wonderfully with the language's generic and generative capabilities. Andrei
Apr 07 2012
prev sibling next sibling parent reply "Antti-Ville Tuunainen" <avtuunainen gmail.com> writes:
On Sunday, 8 April 2012 at 02:49:34 UTC, Andrei Alexandrescu 
wrote:
 Maybe we can get a GSoC project on that. We already have a 
 related proposal (lock-free GC).

That would be me. Just thought I should wave and say hello. Are there other GSoC proposals in the GC area?
Apr 14 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-14 19:46, Antti-Ville Tuunainen wrote:
 On Sunday, 8 April 2012 at 02:49:34 UTC, Andrei Alexandrescu wrote:
 Maybe we can get a GSoC project on that. We already have a related
 proposal (lock-free GC).

That would be me. Just thought I should wave and say hello. Are there other GSoC proposals in the GC area?

There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas -- /Jacob Carlborg
Apr 14 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-04-15 04:11, Antti-Ville Tuunainen wrote:
 On Saturday, 14 April 2012 at 19:17:02 UTC, Jacob Carlborg wrote:

 There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas

That's the ideas list for proposals. I was asking if anyone else applied for GSoC using one of those.

Aha, I see. The chosen projects have not been announced yet. -- /Jacob Carlborg
Apr 15 2012
prev sibling parent "Antti-Ville Tuunainen" <avtuunainen gmail.com> writes:
On Saturday, 14 April 2012 at 19:17:02 UTC, Jacob Carlborg wrote:

 There is: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas

That's the ideas list for proposals. I was asking if anyone else applied for GSoC using one of those.
Apr 14 2012
prev sibling next sibling parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
Hey, that sounds awesome.  I think I geeked out a bit.

Would this make it any easier to reference count types that can be 
statically proven to have no cyclical references?
Apr 07 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/7/2012 7:58 PM, Chad J wrote:
 Hey, that sounds awesome. I think I geeked out a bit.

 Would this make it any easier to reference count types that can be statically
 proven to have no cyclical references?

It has nothing to do with reference counting that I can think of.
Apr 07 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/7/12 9:59 PM, Walter Bright wrote:
 On 4/7/2012 7:58 PM, Chad J wrote:
 Hey, that sounds awesome. I think I geeked out a bit.

 Would this make it any easier to reference count types that can be
 statically
 proven to have no cyclical references?

It has nothing to do with reference counting that I can think of.

Nevertheless good food for thought. This is all very cool. Andrei
Apr 07 2012
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 04/08/2012 03:56 AM, Walter Bright wrote:
 Of course, many of us have been thinking about this for a looong time,
 and what is the best way to go about it. The usual technique is for the
 compiler to emit some sort of table for each TypeInfo giving the layout
 of the object, i.e. where the pointers are.

 The general problem with these is the table is non-trivial, as it will
 require things like iterated data blocks, etc. It has to be compressed
 to save space, and the gc then has to execute a fair amount of code to
 decode it.

 It also requires some significant work on the compiler end, leading of
 course to complexity, rigidity, development bottlenecks, and the usual
 bugs.

 An alternative Andrei and I have been talking about is to put in the
 TypeInfo a pointer to a function. That function will contain customized
 code to mark the pointers in an instance of that type. That custom code
 will be generated by a template defined by the library. All the compiler
 has to do is stupidly instantiate the template for the type, and insert
 an address to the generated function.

 The compiler need know NOTHING about how the marking works.

 Even better, as ctRegex has demonstrated, the custom generated code can
 be very, very fast compared with a runtime table-driven approach. (The
 slow part will be calling the function indirectly.)

 And best of all, the design is pushed out of the compiler into the
 library, so various schemes can be tried out without needing compiler work.

 I think this is an exciting idea, it will enable us to get a precise gc
 by enabling people to work on it in parallel rather than serially
 waiting for me.

That actually sounds like a pretty awesome idea.
Apr 08 2012
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 04/08/2012 10:45 AM, Timon Gehr wrote:
 That actually sounds like a pretty awesome idea.

Make sure that the compiler does not actually rely on the fact that the template generates a function. The design should include the possibility of just generating tables. It all should be completely transparent to the compiler, if that is possible.
Apr 08 2012
parent reply Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 08-04-2012 11:42, Manu wrote:
 On 8 April 2012 11:56, Timon Gehr <timon.gehr gmx.ch
 <mailto:timon.gehr gmx.ch>> wrote:

     On 04/08/2012 10:45 AM, Timon Gehr wrote:

         That actually sounds like a pretty awesome idea.


     Make sure that the compiler does not actually rely on the fact that
     the template generates a function. The design should include the
     possibility of just generating tables. It all should be completely
     transparent to the compiler, if that is possible.


 This sounds important to me. If it is also possible to do the work with
 generated tables, and not calling thousands of indirect functions in
 someone's implementation, it would be nice to reserve that possibility.
 Indirect function calls in hot loops make me very nervous for non-x86
 machines.

Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC. -- - Alex
Apr 08 2012
parent reply deadalnix <deadalnix gmail.com> writes:
On 08/04/2012 14:02, Alex Rønne Petersen wrote:
 On 08-04-2012 11:42, Manu wrote:
 On 8 April 2012 11:56, Timon Gehr <timon.gehr gmx.ch
 <mailto:timon.gehr gmx.ch>> wrote:

 On 04/08/2012 10:45 AM, Timon Gehr wrote:

 That actually sounds like a pretty awesome idea.


 Make sure that the compiler does not actually rely on the fact that
 the template generates a function. The design should include the
 possibility of just generating tables. It all should be completely
 transparent to the compiler, if that is possible.


 This sounds important to me. If it is also possible to do the work with
 generated tables, and not calling thousands of indirect functions in
 someone's implementation, it would be nice to reserve that possibility.
 Indirect function calls in hot loops make me very nervous for non-x86
 machines.

Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

Nothing prevents the generated function from itself calling other generated functions when things are predictable. That avoids many indirect calls, purely in the library, which is good (it can be tuned per application/platform).
Apr 09 2012
parent reply deadalnix <deadalnix gmail.com> writes:
On 09/04/2012 20:33, Manu wrote:
 Eh?
 Not sure what you mean. The idea is the template would produce a
 struct/table of data instead of being a pointer to a function, this way
 the GC could work without calling anything. If the GC was written to
 assume GC info in a particular format/structure, it could be written
 without any calls.
 I'm just saying to leave that as a possibility, and not REQUIRE an
 indirect function call for every single allocation in the system. Some
 GC might be able to make better use of that sort of setup.

If you have references to objects, you can't avoid a function call. If you have something you know at compile time, the generated function can directly call the other function that marks the pointed-to data (or can even do it itself, if you don't fear code bloat) without going back to the GC and its indirect call.

So it makes no difference in the number of indirect calls you have, but the struct proposal is a stronger constraint on the GC than the function one.

BTW, starting your answer with « Not sure what you mean. » should have been a red flag.
Apr 09 2012
parent reply deadalnix <deadalnix gmail.com> writes:
On 10/04/2012 00:39, Manu wrote:
 It is, and I still don't follow. I can't imagine there are any indirect
 function calls, except for the ones introduced by this proposal, where
 you may register a function to mark the pointers in complex structs.
 You seem to be suggesting that another one already exists anyway? Where
 is it? Why is it there?

OK, back to basics.

For every type, a function template (let's call it GCscan) will be instantiated to scan it. This function can be ANY code. ANY code includes the possibility of GCscan!A calling GCscan!B directly, without going back to the GC main loop and an indirect call. If inlined, you can forget about the function call altogether (and you can force that using a mixin template, for example, but it is likely to generate massive code bloat).

This can't be done for references/pointers to polymorphic types, but for any other type it is an available option, and it can reduce the number of indirect calls dramatically.
Apr 10 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/10/12 3:03 AM, deadalnix wrote:
 For every type, a function template (let's call it GCscan) will be
 instantiated to scan it. This function can be ANY code. ANY code include
 the possibility for GCscan!A to call GCscan!B directly, without going
 back to GC main loop and indirect call. If inlined, you can forget about
 function call at all (and you can force that using mixin template for
 example, but it is likely to massively generate code bloat).

That is correct. The code bloat is equal to what would be generated if the end user sat down and wrote appropriate collection routines for all types by hand.
 This can't be done for reference/pointer to polymorphic types, but for
 any other it is an available option, and it can reduce dramatically the
 number of indirect calls.

Indeed. For non-final class member variables, the template will fetch their TypeInfo, from that the pointer to the mark function, and will issue the call through the pointer. Andrei
Apr 10 2012
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 04/08/2012 10:45 AM, Timon Gehr wrote:
 That actually sounds like a pretty awesome idea.

I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?
Apr 08 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/8/2012 2:21 AM, Timon Gehr wrote:
 On 04/08/2012 10:45 AM, Timon Gehr wrote:
 That actually sounds like a pretty awesome idea.

I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?

For now, just treat them conservatively.
Apr 08 2012
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 4/8/2012 11:21 AM, Timon Gehr wrote:
 On 04/08/2012 10:45 AM, Timon Gehr wrote:
 That actually sounds like a pretty awesome idea.

I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?

I guess the compiler should generate an (anonymous) struct type corresponding to the closure data layout. There probably has to be a template for compiler generated structs or classes anyway. This new type could also be used as the type of the context pointer, so a debugger could display the closure variables.
Apr 08 2012
parent Alex Rønne Petersen <xtzgzorex gmail.com> writes:
On 08-04-2012 12:07, Rainer Schuetze wrote:
 On 4/8/2012 11:21 AM, Timon Gehr wrote:
 On 04/08/2012 10:45 AM, Timon Gehr wrote:
 That actually sounds like a pretty awesome idea.

I understand that the stack will still have to be scanned conservatively, but how does the scheme deal with closures?

I guess the compiler should generate an (anonymous) struct type corresponding to the closure data layout. There probably has to be a template for compiler generated structs or classes anyway. This new type could also be used as the type of the context pointer, so a debugger could display the closure variables.

This sounds sensible to me. No reason closure marking can't be precise if the compiler just emits the relevant type info (pretty much any other compiler with closures does this; see C#, F#, etc). -- - Alex
Apr 08 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 8 April 2012 11:56, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 04/08/2012 10:45 AM, Timon Gehr wrote:

 That actually sounds like a pretty awesome idea.

Make sure that the compiler does not actually rely on the fact that the template generates a function. The design should include the possibility of just generating tables. It all should be completely transparent to the compiler, if that is possible.

This sounds important to me. If it is also possible to do the work with generated tables, and not calling thousands of indirect functions in someone's implementation, it would be nice to reserve that possibility.

Indirect function calls in hot loops make me very nervous for non-x86 machines.
Apr 08 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 9 April 2012 21:20, deadalnix <deadalnix gmail.com> wrote:

 On 08/04/2012 14:02, Alex Rønne Petersen wrote:

  On 08-04-2012 11:42, Manu wrote:
 On 8 April 2012 11:56, Timon Gehr <timon.gehr gmx.ch
 <mailto:timon.gehr gmx.ch>> wrote:

 On 04/08/2012 10:45 AM, Timon Gehr wrote:

 That actually sounds like a pretty awesome idea.


 Make sure that the compiler does not actually rely on the fact that
 the template generates a function. The design should include the
 possibility of just generating tables. It all should be completely
 transparent to the compiler, if that is possible.


 This sounds important to me. If it is also possible to do the work with
 generated tables, and not calling thousands of indirect functions in
 someone's implementation, it would be nice to reserve that possibility.
 Indirect function calls in hot loops make me very nervous for non-x86
 machines.

Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

Nothing prevents the generated function from itself calling other generated functions when things are predictable. That avoids many indirect calls, purely in the library, which is good (it can be tuned per application/platform).

Eh?

Not sure what you mean. The idea is the template would produce a struct/table of data instead of being a pointer to a function, this way the GC could work without calling anything. If the GC was written to assume GC info in a particular format/structure, it could be written without any calls.

I'm just saying to leave that as a possibility, and not REQUIRE an indirect function call for every single allocation in the system. Some GC might be able to make better use of that sort of setup.
Apr 09 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 10 April 2012 00:06, deadalnix <deadalnix gmail.com> wrote:

 On 09/04/2012 20:33, Manu wrote:

  Eh?
 Not sure what you mean. The idea is the template would produce a
 struct/table of data instead of being a pointer to a function, this way
 the GC could work without calling anything. If the GC was written to
 assume GC info in a particular format/structure, it could be written
 without any calls.
 I'm just saying to leave that as a possibility, and not REQUIRE an
 indirect function call for every single allocation in the system. Some
 GC might be able to make better use of that sort of setup.

 If you have references to objects, you can't avoid a function call. If you
 have something you know at compile time, the generated function can directly
 call the other function that marks the pointed-to data (or can even do it
 itself, if you don't fear code bloat) without going back to the GC and its
 indirect call.

 So it makes no difference in the number of indirect calls you have, but the
 struct proposal is a stronger constraint on the GC than the function one.

 BTW, starting your answer with « Not sure what you mean. » should have been
 a red flag.

It is, and I still don't follow. I can't imagine there are any indirect function calls, except for the ones introduced by this proposal, where you may register a function to mark the pointers in complex structs.

You seem to be suggesting that another one already exists anyway? Where is it? Why is it there?
Apr 09 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Sunday, 8 April 2012 at 12:02:10 UTC, Alex Rønne Petersen 
wrote:
 This sounds important to me. If it is also possible to do the 
 work with
 generated tables, and not calling thousands of indirect 
 functions in
 someone's implementation, it would be nice to reserve that 
 possibility.
 Indirect function calls in hot loops make me very nervous for 
 non-x86
 machines.

Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

What's the problem with virtual calls on ARM?
Apr 13 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 13 April 2012 15:53, Kagamin <spam here.lot> wrote:

 On Sunday, 8 April 2012 at 12:02:10 UTC, Alex Rønne Petersen wrote:

 This sounds important to me. If it is also possible to do the work with
 generated tables, and not calling thousands of indirect functions in
 someone's implementation, it would be nice to reserve that possibility.
 Indirect function calls in hot loops make me very nervous for non-x86
 machines.

Yes, I agree here. The last thing we need is a huge amount of kinda-sorta-virtual function calls on ARM, MIPS, etc. It may work fine on x86, but anywhere else, it's really not what you want in a GC.

What's the problem with virtual calls on ARM?

No other processors have branch prediction units anywhere near the sophistication of modern x86. Any call through a function pointer stalls the pipeline, pipelines are getting longer all the time, and PPC has even more associated costs/hazards. Most processors can only perform trivial binary branch prediction around an 'if'.

It also places a burden on the icache (unable to prefetch), and of course the dcache, both of which are much less sophisticated than x86 as well. The compiler can't do anything with code locality (which improves icache usage), since the target is unknown at compile time... there are also pipeline stalls introduced by the sequence of indirect pointer lookups preceding any virtual call.

Virtuals are possibly the worst hazard to modern CPUs, and the hardest to detect/profile: since their cost is evenly spread throughout the entire code base, you can never gauge their true impact on your performance. You also can't easily measure the effect of icache misses on your code; suffice to say, you will have MANY more in virtual-heavy code.

While I'm at it: 'final:' and 'virtual' keywords please ;)
Apr 13 2012
prev sibling next sibling parent James Miller <james aatch.net> writes:
On 2012-04-13 16:54:28 +0300 Manu <turkeyman gmail.com> wrote:
 While I'm at it. 'final:' and 'virtual' keyword please ;)

Hmmm, I thought we decided that was a good idea. Anybody in the know whether this is going to happen or not? -- James Miller
Apr 13 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Friday, 13 April 2012 at 13:54:39 UTC, Manu wrote:
 No other processors have branch prediction units anywhere near
 the sophistication of modern x86. Any call through a function 
 pointer
 stalls the pipeline, pipelines are getting longer all the time, 
 and PPC has
 even more associated costs/hazards.
 Most processors can only perform trivial binary branch 
 prediction around an
 'if'.
 It also places a burden on the icache (unable to prefetch), and of 
 course the dcache, both of which are much less sophisticated than x86 
 as well.

Allocating small aggregate objects usually means allocating several equally small objects of different types in a row, so they sit one after another in the heap, and the GC will visit them in a row, each time calling a different function than the previous time. To an x86 processor this would result in constant misprediction: AFAIK an x86 processor caches only one target address per branch (ARM caches a flag?). And the icache should not suffer in either case: once you prefetched the function, it will remain in the icache and be reused from there the next time.
Apr 13 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 13 April 2012 17:25, Kagamin <spam here.lot> wrote:
 once you prefetched the function, it will remain in the icache and be
 reused from there the next time.

All depends how much you love object orientation. If you follow the C++ book and make loads of classes for everything, you'll thrash the hell out of it. If you only have a couple of different objects, maybe they'll coexist in the icache.

The GC is a bit of a special case though, because it runs in a tight loop. That said, the pipeline hazards still exist regardless of the state of the icache. Conventional virtuals are worse, since during the process of executing regular code there's not usually such a tight loop pattern. (Note: I was answering the prior question about virtual functions in general, not as applied to the specific use case of a GC scan.)

The latest 'turbo' ARM chips (Cortex, etc.) and such may store a branch target table; they are alleged to have improved prediction, but I haven't checked. Prior chips (standard ARM9 and down; note: most non-top-end Androids fall in this category, as do all games consoles with ARMs in them) don't technically have branch prediction at all. ARM has conditional execution bits on instructions, so it can filter opcodes based on the state of the flags register. This is a cool design for binary branching, or for performing 'select' operations, but it can't do anything to help an indirect jump.

Point is, the GC is the most fundamental performance hazard to D, and I just think it's important to make sure the design is such that it is possible to write a GC loop which can do its job with generated data tables if possible, instead of requiring generated marker functions. It would seem to me that carefully crafted tables of data could technically perform the same function as marker functions, but without any function calls... and the performance characteristics may be better/worse for different architectures.
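The table-driven alternative suggested here could be as simple as a per-type bitmap with one bit per pointer-sized word. A minimal sketch, assuming a hypothetical `PointerMap` layout (the name and structure are invented for illustration, not any actual runtime's format):

```d
// Hypothetical table-driven scan: bit i set means word i of the object
// may hold a pointer. One generic, branch-light loop serves every type,
// with no indirect call per object.
struct PointerMap
{
    size_t sizeInWords; // object size in pointer-sized words
    size_t[] bits;      // packed bitmap: bit i set => word i is a pointer slot
}

void scanWithMap(void* p, const ref PointerMap map,
                 scope void delegate(void* candidate) mark)
{
    enum wordBits = 8 * size_t.sizeof;
    auto words = cast(void**) p;
    foreach (i; 0 .. map.sizeInWords)
        if (map.bits[i / wordBits] & (cast(size_t) 1 << (i % wordBits)))
            mark(words[i]); // hand each candidate pointer to the collector
}

// e.g. for struct S { int a; int b; S* next; } on 64-bit:
// a and b pack into word 0, next is word 1, so bits = [0b10].
```

Whether this beats a generated function depends on the architecture, which is exactly the point about keeping both options open.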
Apr 14 2012
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 14 Apr 2012 05:21:08 -0500, Manu <turkeyman gmail.com> wrote:

 On 13 April 2012 17:25, Kagamin <spam here.lot> wrote:
 once you prefetched the function, it will remain in the icache and be
 reused from there the next time.

All depends how much you love object orientation. If you follow the C++ book and make loads of classes for everything, you'll thrash the hell out of it. If you only have a couple of different objects, maybe they'll coexist in the icache.

The GC is a bit of a special case though, because it runs in a tight loop. That said, the pipeline hazards still exist regardless of the state of the icache. Conventional virtuals are worse, since during the process of executing regular code there's not usually such a tight loop pattern. (Note: I was answering the prior question about virtual functions in general, not as applied to the specific use case of a GC scan.)

The latest 'turbo' ARM chips (Cortex, etc.) and such may store a branch target table; they are alleged to have improved prediction, but I haven't checked. Prior chips (standard ARM9 and down; note: most non-top-end Androids fall in this category, as do all games consoles with ARMs in them) don't technically have branch prediction at all. ARM has conditional execution bits on instructions, so it can filter opcodes based on the state of the flags register. This is a cool design for binary branching, or for performing 'select' operations, but it can't do anything to help an indirect jump.

Point is, the GC is the most fundamental performance hazard to D, and I just think it's important to make sure the design is such that it is possible to write a GC loop which can do its job with generated data tables if possible, instead of requiring generated marker functions. It would seem to me that carefully crafted tables of data could technically perform the same function as marker functions, but without any function calls... and the performance characteristics may be better/worse for different architectures.

Is there any reason to assume that the GC is actually using the mark function for marking? The mark function is only there to provide the GC with the information it needs in order to mark. What it actually is is defined by the runtime. So it may just return a bitmask. Or it might _be_ a bitmask (for objects <= 512 bytes). Or a 64-bit index into some internal table. Etc. In fact, real functions only start making sense with very large arrays and compound structs/objects.

My actual concern with this approach is that a DLL compiled with GC A will have problems interacting with a program using GC B.
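The idea of letting the "mark function" slot hold something other than a function could be encoded with a tag bit. A purely illustrative sketch (this is not druntime's actual layout; all names are made up):

```d
alias MarkFn = void function(void* p);

// Hypothetical encoding: for small objects the TypeInfo slot holds an
// inline pointer bitmap, tagged by its low bit; only large or complex
// types pay for a real generated marker function.
union MarkInfo
{
    size_t inlineBits; // low bit 1 => remaining bits are the pointer bitmap
    MarkFn fn;         // low bit 0 => address of a generated marker function
}

bool isInlineBitmap(MarkInfo m)
{
    // Function addresses are at least 2-byte aligned, so their low bit
    // is always 0; a set low bit can never be a valid code address.
    return (m.inlineBits & 1) != 0;
}
```

The alignment guarantee is what makes the tagging safe: the two representations can share one word without ambiguity.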
Apr 14 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 07 Apr 2012 21:56:09 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Of course, many of us have been thinking about this for a looong time,  
 and what is the best way to go about it. The usual technique is for the  
 compiler to emit some sort of table for each TypeInfo giving the layout  
 of the object, i.e. where the pointers are.

 The general problem with these is the table is non-trivial, as it will  
 require things like iterated data blocks, etc. It has to be compressed  
 to save space, and the gc then has to execute a fair amount of code to  
 decode it.

 It also requires some significant work on the compiler end, leading of  
 course to complexity, rigidity, development bottlenecks, and the usual  
 bugs.

 An alternative Andrei and I have been talking about is to put in the  
 TypeInfo a pointer to a function. That function will contain customized  
 code to mark the pointers in an instance of that type. That custom code  
 will be generated by a template defined by the library. All the compiler  
 has to do is stupidly instantiate the template for the type, and insert  
 an address to the generated function.

 The compiler need know NOTHING about how the marking works.

 Even better, as ctRegex has demonstrated, the custom generated code can  
 be very, very fast compared with a runtime table-driven approach. (The  
 slow part will be calling the function indirectly.)

 And best of all, the design is pushed out of the compiler into the  
 library, so various schemes can be tried out without needing compiler  
 work.

 I think this is an exciting idea, it will enable us to get a precise gc  
 by enabling people to work on it in parallel rather than serially  
 waiting for me.

I think this is a really good idea. I would like to go further and propose that there be an arbitrary way to add members to the TypeInfo types using templates. Not sure how it would be implemented, but I don't see why this has to be specific to GCs. Some way to signify "hey compiler, please initialize this member with template X given the type being compiled". This could be a huge bridge between compile-time and runtime type information. -Steve
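As a sketch of the scheme described above: the names `gcMark` and `MarkDg` are illustrative, not the actual druntime API, and this assumes the compiler simply stores `&gcMark!T` in each type's TypeInfo.

```d
import std.traits : isPointer;

alias MarkDg = void delegate(void** slot);

// Hypothetical library-side template: the compiler would merely
// instantiate gcMark!T for each type and record its address in the
// TypeInfo; the compiler needs to know nothing about how marking works.
void gcMark(T)(void* p, scope MarkDg mark)
{
    auto obj = cast(T*) p;
    foreach (ref field; (*obj).tupleof) // unrolled at compile time
    {
        alias F = typeof(field);
        static if (isPointer!F || is(F == class))
            mark(cast(void**) &field);  // a slot that may hold a GC pointer
        else static if (is(F == struct))
            gcMark!F(&field, mark);     // recurse into embedded structs
        // plain scalars generate no code at all
    }
}

struct Node { int value; Node* next; double weight; Node* prev; }

unittest
{
    Node n;
    size_t slots;
    gcMark!Node(&n, (void** s) { ++slots; });
    assert(slots == 2); // only next and prev are pointer slots
}
```

Because the field walk happens at compile time, the generated function is straight-line code specialized per type, which is where the ctRegex-style speedup would come from.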
Apr 09 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On 08/04/2012 03:56, Walter Bright wrote:
 Of course, many of us have been thinking about this for a looong time,
 and what is the best way to go about it. The usual technique is for the
 compiler to emit some sort of table for each TypeInfo giving the layout
 of the object, i.e. where the pointers are.

 The general problem with these is the table is non-trivial, as it will
 require things like iterated data blocks, etc. It has to be compressed
 to save space, and the gc then has to execute a fair amount of code to
 decode it.

 It also requires some significant work on the compiler end, leading of
 course to complexity, rigidity, development bottlenecks, and the usual
 bugs.

 An alternative Andrei and I have been talking about is to put in the
 TypeInfo a pointer to a function. That function will contain customized
 code to mark the pointers in an instance of that type. That custom code
 will be generated by a template defined by the library. All the compiler
 has to do is stupidly instantiate the template for the type, and insert
 an address to the generated function.

 The compiler need know NOTHING about how the marking works.

 Even better, as ctRegex has demonstrated, the custom generated code can
 be very, very fast compared with a runtime table-driven approach. (The
 slow part will be calling the function indirectly.)

 And best of all, the design is pushed out of the compiler into the
 library, so various schemes can be tried out without needing compiler work.

 I think this is an exciting idea, it will enable us to get a precise gc
 by enabling people to work on it in parallel rather than serially
 waiting for me.

This is a good idea. However, it doesn't handle type qualifiers, and that is important!

D2's type system is made in such a way that most data is either thread-local or immutable, and only a small amount is shared. Both thread-local storage and immutability are sources of BIG improvements for the GC; doing without them is a huge design error.

For instance, OCaml's GC is known to be more performant than Java's, because in OCaml most data is immutable, and the GC takes advantage of this. Immutability means 100% concurrent garbage collection. On the other hand, TLS can be collected independently and only affects the thread that owns the data.

Both are very powerful improvements, and the design you propose, as it stands, cannot provide any means to handle them. That is a big missed opportunity, and it will be hard to change in the future.
Apr 09 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/9/2012 11:30 AM, deadalnix wrote:
 On the other hand, TLS can be collected independently and only affects the
 thread that owns the data. Both are very powerful improvements, and the design
 you propose, as it stands, cannot provide any means to handle them. That is a
 big missed opportunity, and it will be hard to change in the future.

I think this is an orthogonal issue.
Apr 09 2012
parent deadalnix <deadalnix gmail.com> writes:
On 09/04/2012 23:27, Walter Bright wrote:
 On 4/9/2012 11:30 AM, deadalnix wrote:
 On the other hand, TLS can be collected independently and only
 affects the thread that owns the data. Both are very powerful
 improvements, and the design you propose, as it stands, cannot
 provide any means to handle them. That is a big missed opportunity,
 and it will be hard to change in the future.

I think this is an orthogonal issue.

You mean an allocator/deallocator one? I'm not sure.

For instance, concurrent scanning of shared memory will require some magic on reference changes (it can be hooked into the program using page protection). In such a case, you have constraints on what the scanning function can and can't do. If the function is scanning immutable data, that constraint disappears.

In a similar way, when scanning TLS, you'll want to avoid going into the non-TLS world. This is currently possible only if you go back to the main GC code and trigger the indirect call every single time you encounter a pointer or a reference, which is going to be a performance killer on many architectures.

So this code, one way or another, will need to be aware of the qualifiers. Otherwise it will either require passing every single pointer/reference through an indirect function call, or forgo the optimizations that the type system was designed to allow (in the program in general, not just in the GC).
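Since the marker is a library template instantiated per (qualified) type, qualifier-aware strategies like these could in principle be selected at compile time. A hedged sketch of the dispatch only; the strategy bodies are omitted and all names are invented:

```d
// Illustrative only: branch on the qualifier of T when the template is
// instantiated, so immutable, shared and thread-local data could be
// registered with different pools or scanning strategies.
enum ScanKind { threadLocal, sharedData, immutableData }

template scanKindOf(T)
{
    static if (is(T == immutable))
        enum scanKindOf = ScanKind.immutableData; // never mutates: scan concurrently
    else static if (is(T == shared))
        enum scanKindOf = ScanKind.sharedData;    // may need barriers/page tricks
    else
        enum scanKindOf = ScanKind.threadLocal;   // collectible per thread
}

static assert(scanKindOf!(immutable(int[])) == ScanKind.immutableData);
static assert(scanKindOf!(shared(int)) == ScanKind.sharedData);
static assert(scanKindOf!(int*) == ScanKind.threadLocal);
```

Whether the rest of the runtime can exploit this (write barriers, per-thread pools) is the open design question raised above.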
Apr 10 2012