
digitalmars.D - Thread local and memory allocation

reply deadalnix <deadalnix gmail.com> writes:
D uses thread-local storage for most of its data. And it's a good thing.

However, the allocation mechanism isn't aware of it. In addition, it has
no way to handle it in the future as things are specified.

Since you can't have any pointer in shared memory to thread-local data
(thanks to the type system), this is something the GC could use to its
own advantage.

Since good practice is to minimize the usage of shared data as much as
possible, this design choice penalizes good design, which is, IMO, not
D's philosophy.

The advantages of handling this at the memory-management level are the
following:
- Swap friendliness. The data of a given thread can be located in
contiguous blocks, so an idle thread can be swapped out easily without a
huge performance penalty. Anyone who has used Chrome and Firefox with a
lot of tabs on a machine with limited memory knows what I'm talking
about: Firefox uses less memory than Chrome, but performance is terrible
anyway, because Chrome's memory layout is more cache friendly (the tabs'
memory isn't mixed together).
- Efficiency in heavily multithreaded applications like servers: the
more threads run in the program, the more costly a stop-the-world GC is.
Since good design implies separating data by thread as much as possible,
a thread-local collection can be triggered at any time without stopping
the other threads.

Even if those improvements won't be implemented anytime soon, it is kind
of sad that the current interface doesn't allow for them.

What I suggest is adding a SHARED flag to BlkAttr and storing it as an
attribute of the block. Later operations could then be specialized
according to this flag. This attribute shouldn't be modifiable later on.

What do you think? Is it worth working on? If it is, how can I help?
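For illustration, the proposed flag would slot into core.memory's existing BlkAttr interface roughly like this. This is a sketch of the proposal only: BlkAttr.SHARED does not exist in druntime, and the bit value chosen below is an assumption; GC.malloc and the other attributes are real.

```d
import core.memory : GC;

// Hypothetical: the SHARED attribute proposed in this thread. The bit
// value is picked arbitrarily for illustration and may collide with
// attributes added to druntime later.
enum uint SHARED = 1u << 8;

void example()
{
    // An allocation destined for cross-thread use would carry the flag,
    // letting the GC place the block in a shared pool...
    void* sharedBlock = GC.malloc(1024, GC.BlkAttr.NO_SCAN | SHARED);

    // ...while a plain allocation could come from a thread-local pool,
    // with no locking in the common case.
    void* localBlock = GC.malloc(1024, GC.BlkAttr.NO_SCAN);
}
```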
Oct 03 2011
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Oct 3, 2011, at 12:48 PM, deadalnix wrote:

[snip]
There's another important issue that hasn't yet been addressed, which is
that when the GC collects memory, the thread that finalizes non-shared
data should be the one that created it. So that SHARED flag should
really be a thread-id of some sort. Alternately, each thread could
allocate from its own pool, with shared allocations coming from a common
pool. This would allow the lock granularity to be reduced and in some
cases eliminated.

I'd like to move to CDGC as an intermediate step, and that will need
some testing and polish. That would allow for precise collections if the
compiler support is added. Then the thread-local finalization has to be
tackled one way or another. I'd favor per-thread heaps but am open to
suggestions and/or help.
Oct 03 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Sean Kelly:

 I'd favor per-thread heaps but am open to suggestions and/or help.

(I am still ignorant about such issues, so I usually keep quiet about
them. Please forgive me if I am saying stupid things.)

Memory today is organized like a tree: the larger memories are slower,
and the farther away the memory is, the more costly it is to move data
across, and to keep coherence between two pieces of data that are
supposed to be the "same" data.

If I have a CPU with 4 cores, each core has hyper-threading, and each
pair of cores has its own L2 cache, then I think it is better for your
code to use 2 heaps (one heap for each L2 cache). If future CPUs have
even more cores, with totally independent local memory, then this memory
has to correspond to a different heap.

Bye,
bearophile
Oct 03 2011
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
Yes, I was thinking of such a thing. Each thread has a local heap and
you have a common shared heap too, with the shared data in it.

In such a case, the flag is sufficient, because the GC could then handle
that and trigger a thread-local heap allocation instead of a shared one.

This is consistent with the swap friendliness I was talking about and
can reduce the need for synchronization when allocating memory (a lock
will only occur if the GC doesn't have any memory left in its pool for
the given thread).

And it solves the finalization-thread issue, yes.

Le 03/10/2011 22:54, Sean Kelly a écrit :
 On Oct 3, 2011, at 12:48 PM, deadalnix wrote:

 There's another important issue that hasn't yet been addressed, which is that
when the GC collects memory, the thread that finalizes non-shared data should
be the one that created it.  So that SHARED flag should really be a thread-id
of some sort.  Alternately, each thread could allocate from its own pool, with
shared allocations coming from a common pool.  This would allow the lock
granularity to be reduced and in some cases eliminated.

 I'd like to move to CDGC as an intermediate step, and that will need some
testing and polish.  That would allow for precise collections if the compiler
support is added.  Then the thread-local finalization has to be tackled one way
or another.  I'd favor per-thread heaps but am open to suggestions and/or help.

Oct 03 2011
prev sibling parent Jason House <jason.james.house gmail.com> writes:
Sean Kelly Wrote:
 There's another important issue that hasn't yet been addressed, which is that
when the GC collects memory, the thread that finalizes non-shared data should
be the one that created it.  So that SHARED flag should really be a thread-id
of some sort.  Alternately, each thread could allocate from its own pool, with
shared allocations coming from a common pool.  This would allow the lock
granularity to be reduced and in some cases eliminated.

Why not run the collection for a single thread in the thread being
collected? It's a simple way to force where the finalizer runs. It's a
big step up from stop-the-world collections, but still requires pauses.
Oct 03 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/3/2011 12:48 PM, deadalnix wrote:
 What do you think ? Is it something it worth working on ? If it is, how can I
 help ?

It is a great idea, and it has been discussed before. The difficulties
arise when thread-locally allocated data gets shared with other threads,
for instance immutable data, which is implicitly shareable.
Oct 03 2011
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Oct 3, 2011, at 3:55 PM, Walter Bright wrote:

 It is a great idea, and it has been discussed before. The difficulties
 are when thread local allocated data gets shared with other threads,
 like for instance immutable data that is implicitly shareable.

Immutable data would have to be allocated on the shared heap as well,
which means the contention for the shared heap may actually be fairly
significant. But the alternatives are all too complex (migrating
immutable data from local pools to a common pool when a thread
terminates, etc). There's also the problem of transferring knowledge of
whether something is immutable into the allocation routine. As things
stand, I don't believe that type info is available.
Oct 03 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/3/2011 4:20 PM, Sean Kelly wrote:
 Immutable data would have to be allocated on the shared heap as well, which
 means the contention for the shared heap may actually be fairly significant.
 But the alternatives are all too complex (migrating immutable data from local
 pools to a common pool when a thread terminates, etc).  There's also the
 problem of transferring knowledge of whether something is immutable into the
 allocation routine.  As things stand, I don't believe that type info is
 available.

Right. The current language allows no way to determine in advance
whether an allocation will eventually be made immutable (or shared) or
not.

However, if the GC used thread-local pools to do the allocation from
(not the collection), the GC would go faster because it wouldn't need
locking to allocate from those pools. This change can happen without any
language or compiler changes.
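A minimal sketch of the per-thread-pool idea Walter describes (not the actual druntime implementation): each thread bump-allocates from its own pool, and the global lock is taken only on the slow path, when a fresh pool must be fetched. The pool size and structure here are assumptions for illustration.

```d
import core.sync.mutex : Mutex;

// Sketch only: lock-free allocation in the common case, with locking
// confined to pool refills. The collector would still scan allPools.
struct Pool
{
    ubyte[] memory;
    size_t used;
}

Pool* threadPool;            // module-level variables are TLS in D
__gshared Mutex gcLock;
__gshared Pool*[] allPools;  // shared list for the collection phase

shared static this() { gcLock = new Mutex; }

void* alloc(size_t size)
{
    if (threadPool is null || threadPool.used + size > threadPool.memory.length)
    {
        // Slow path: acquire a new pool under the global lock.
        gcLock.lock();
        scope (exit) gcLock.unlock();
        threadPool = new Pool(new ubyte[64 * 1024], 0);
        allPools ~= threadPool;
    }
    // Fast path: bump-pointer allocation, no lock needed because the
    // pool belongs exclusively to this thread.
    auto p = threadPool.memory.ptr + threadPool.used;
    threadPool.used += size;
    return p;
}
```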
Oct 03 2011
parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/10/2011 02:15, Walter Bright a écrit :
 On 10/3/2011 4:20 PM, Sean Kelly wrote:
 Immutable data would have to be allocated on the shared heap as well,
 which
 means the contention for the shared heap may actually be fairly
 significant.
 But the alternatives are all too complex (migrating immutable data
 from local
 pools to a common pool when a thread terminates, etc). There's also the
 problem of transferring knowledge of whether something is immutable
 into the
 allocation routine. As things stand, I don't believe that type info is
 available.

 Right. The current language allows no way to determine in advance if
 an allocation will be eventually made immutable (or shared) or not.
 However, if the gc used thread local pools to do the allocation from
 (not the collection), the gc would go faster because it wouldn't need
 locking to allocate from those pools. This change can happen without
 any language or compiler changes.

Do you mean managing the memory that way:
Shared heap -> TL pool within the shared heap -> allocation in thread
from the TL pool.

And complete GC collect.

This is a good solution to reduce contention on allocation, but a very
different thing from what I was initially talking about.

*******

Back to the point: considering you can have pointers to immutable data
from any dataset, but not the other way around, it is also valid to get
a flag for it in the allocation interface.

What is the issue with the compiler here?
Oct 04 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/4/2011 1:22 AM, deadalnix wrote:
 Do you mean manage the memory that way :
 Shared heap -> TL pool within the shared heap -> allocation in thread from TL
pool.

 And complete GC collect.

Yes.
 This is a good solution do reduce contention on allocation. But a very
different
 thing than I was initially talking about.

Yes.
 Back to the point,

 Considering you have pointer to immutable from any dataset, but not the other
 way around, this is also valid to get a flag for it in the allocation
interface.

 What is the issue with the compiler here ?

Allocate an object, then cast it to immutable, and pass it to another thread.
Oct 04 2011
parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/10/2011 10:52, Walter Bright a écrit :
 Allocate an object, then cast it to immutable, and pass it to another
 thread.

That is explicitly said to be unsafe on D's website. As long as a
reference exists in the creating thread this should work, but if those
references disappear, you'll end up with memory corruption.

This is what the type system is made for, isn't it? And if you decide to
do funky stuff bypassing it, unsafe things can happen.
Oct 04 2011
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/4/2011 2:32 AM, deadalnix wrote:
 Le 04/10/2011 10:52, Walter Bright a écrit :
 Allocate an object, then cast it to immutable, and pass it to another
 thread.

 That is explicitly said to be unsafe on D's website. As long as a
 reference exists in the creating thread this should work, but if those
 references disappear, you'll end up with memory corruption.

Unsafe doesn't mean "undefined behavior". It just means the compiler
cannot guarantee that you did it correctly. If you do it correctly, it
still should work. On the other hand, "undefined behavior" cannot be
done correctly.

With casts to immutable, it is perfectly correct if you, the user,
ensure that there are no other mutable references to the same data. It's
just that the compiler itself cannot make this guarantee, hence it's
"unsafe".

Casting from immutable to mutable, on the other hand, is "undefined
behavior" because neither the compiler nor you, the user, can guarantee
it will work.
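The "do it correctly" case described above is what Phobos encapsulates in std.exception.assumeUnique: the cast to immutable is valid provided the caller relinquishes every mutable reference. A small sketch:

```d
import std.exception : assumeUnique;

immutable(int)[] buildTable()
{
    // Build the data through a mutable, thread-local reference.
    int[] table = new int[256];
    foreach (i, ref v; table)
        v = cast(int)(i * i);

    // Correct cast to immutable: `table` is the only reference, and
    // assumeUnique nulls it so no mutable alias survives. The compiler
    // cannot verify this uniqueness, which is why it is "unsafe" rather
    // than "undefined".
    return assumeUnique(table);
}
```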
Oct 04 2011
parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/10/2011 20:30, Walter Bright a écrit :
 On 10/4/2011 2:32 AM, deadalnix wrote:
 With casts to immutable, it is perfectly correct if you, the user,
 ensure that there are no other mutable references to the same data. It's
 just that the compiler itself cannot make this guarantee, hence it's
 "unsafe".

 Casting from immutable to mutable, on the other hand, is "undefined
 behavior" because neither the compiler nor you, the user, can guarantee
 it will work.

This looks more like a flaw in the type system, or a lack of tools to
deal with the type system, than a real allocation issue.

I see two solutions to deal with this:

Something allocated on a thread-local heap can be seen from other
threads, and this is safe as long as a reference is kept in the
allocating thread. So, if you cast something thread-local and mutable to
immutable, you have to ensure yourself that you will not modify it.
Plus, you need to ensure that you keep a reference to that object in the
allocating thread; otherwise, you'll see it collected. A shared casted
as immutable shouldn't experience any issue.

The other approach is to give a way to explicitly tell the compiler that
this will be casted to shared/immutable at some point and should be
allocated on the corresponding heap.

Those two solutions are not exclusive and can both be implemented. Maybe
I'm wrong, but it doesn't seem that the issue is that big. Anyway, those
things have a big impact, so they should be considered carefully.
Oct 05 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/5/2011 6:25 AM, deadalnix wrote:
 I see two solutions to deal with this :

 Something allocated on a Thread local heap can be seen from other threads, and
 this is safe as long as a reference is kept in the allocating thread.

 So, if you cast something TL and mutable as immutable, you have to ensure
 yourself that you will not modify it. Plus, you need to ensure that you keep a
 reference on that object in the allocating thread, otherwise, you'll see it
 collected.

 A shared casted as immutable shouldn't experience any issue.

 The other apporach is to give a way to explicitely say to the compiler that
this
 will be casted as shared/immutable at some point and should be allocated on the
 corresponding heap.

 Those two solutions are not exclusive and can both be implemented.
 Maybe I'm wrong, but it doesn't seem that the issue is that big.

Both your solutions can work, but they can be highly error prone, as
they rely on the programmer getting the details right, and there are few
means to verify that they did get them right.
Oct 05 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
deadalnix:

 This is why the type system is made for isn't it ?

Casts are often the points where type systems fail :-) Bye, bearophile
Oct 04 2011
prev sibling next sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Tue, Oct 4, 2011 at 3:52 AM, Walter Bright
<newshound2 digitalmars.com> wrote:
 On 10/4/2011 1:22 AM, deadalnix wrote:
 Do you mean manage the memory that way :
 Shared heap -> TL pool within the shared heap -> allocation in thread from
 TL pool.

 And complete GC collect.

Yes.
 This is a good solution do reduce contention on allocation. But a very
 different
 thing than I was initially talking about.

Yes.
 Back to the point,

 Considering you have pointer to immutable from any dataset, but not the
 other
 way around, this is also valid to get a flag for it in the allocation
 interface.

 What is the issue with the compiler here ?

Allocate an object, then cast it to immutable, and pass it to another thread.

Assuming we have to make a call to the GC when an object toggles its
immutable/shared state, it seems like this whole approach would
basically murder anyone doing message passing with ownership changes,
because the workflow tends to be: create an object -> cast to shared ->
send to another thread -> cast away shared -> do work -> cast to
shared...

On the other hand, I guess the counterargument is that locking an
uncontended lock is on the order of two instructions (or so I'm told),
so casting away shared probably isn't ever necessary. It just seems
somewhat counterintuitive that casting to and from shared would be
slower than unnecessarily locking the object.
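The ownership-transfer workflow described above looks roughly like this with std.concurrency (the casts are entirely the user's responsibility; `Task` is a made-up payload type for the sketch):

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

class Task { int[] data; }

void worker(Tid owner)
{
    // Receive ownership; cast away shared to work on it as local data.
    auto t = cast(Task) receiveOnly!(shared Task)();
    t.data[] += 1;                    // do work with no synchronization

    // Done: cast back to shared and hand ownership onward.
    send(owner, cast(shared) t);
}

void main()
{
    auto task = new Task;             // created as thread-local
    task.data = [1, 2, 3];

    auto tid = spawn(&worker, thisTid);
    send(tid, cast(shared) task);     // cast to shared, transfer ownership
    auto back = receiveOnly!(shared Task)();
}
```

Each cast is a point where, under the SHARED-flag proposal, the GC might have to be informed, which is the cost being worried about here.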
Oct 04 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
wrote:

 [snip]

 On the other hand, I guess the counterargument is that locking an
 uncontended lock is on the order of two instructions (or so I'm told),
 so casting away shared probably isn't ever necessary. It just seems
 somewhat counterintuitive that casting to and from shared would be
 slower than unnecessarily locking the object.

It's entirely possible to simply allocate the memory for the object from the shared heap to start with. Then no more calls to the GC are needed.
Oct 04 2011
prev sibling next sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Tue, Oct 4, 2011 at 8:59 PM, Robert Jacques <sandford jhu.edu> wrote:
 On Tue, 04 Oct 2011 10:54:58 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
 wrote:

 [snip]

 It's entirely possible to simply allocate the memory for the object
 from the shared heap to start with. Then no more calls to the GC are
 needed.

When an object is created and later cast to shared, the compiler *can't* know that it should allocate from the shared heap because the cast may not be anywhere near where the object was created. The same problem goes for immutable.
Oct 04 2011
prev sibling next sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Tue, Oct 4, 2011 at 10:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:
 [snip]

 When an object is created and later cast to shared, the compiler
 *can't* know that it should allocate from the shared heap because the
 cast may not be anywhere near where the object was created. The same
 problem goes for immutable.

If you meant that the *user* should be responsible for making sure it's allocated on the shared heap, then yes, that's possible, but you're putting GC implementation details into the type system. That may or may not be a good thing.
Oct 04 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 04 Oct 2011 23:55:19 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
wrote:

 [snip]

 When an object is created and later cast to shared, the compiler
 *can't* know that it should allocate from the shared heap because the
 cast may not be anywhere near where the object was created. The same
 problem goes for immutable.

But that's not the scenario being discussed. In fact, having a dangling
reference, and therefore having an object mutate under you, is just as
dangerous as having the GC re-use a memory block. And honestly, as a GC
clear or re-use is likely to segfault early and often, it's a very
detectable bug.

Besides, anyone attempting to do this is going to be actively managing
references and/or making this a library implementation detail. In fact,
they're probably going to use unique!T, which would always allocate from
the correct heap.
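A library type with the shape of the unique!T mentioned above could pick the destination heap at construction time. This is a sketch of the idea only, not an existing Phobos type; with today's GC the "shared heap" choice is implicit, so the constructor comment marks it as the hypothetical part:

```d
// Sketch: a wrapper that owns the sole reference to its payload, so it
// can safely be built mutably and released as immutable without copying.
struct UniqueBuf(T)
{
    private T[] payload;   // sole reference, never escapes mutably

    this(size_t n)
    {
        // Hypothetical: under the per-thread-heap design, this would
        // allocate from the shared heap up front, since the buffer is
        // destined to cross threads.
        payload = new T[n];
    }

    ref T opIndex(size_t i) { return payload[i]; }

    /// Relinquish ownership. The cast is sound because `payload` was
    /// the only mutable reference and is nulled here.
    immutable(T)[] release()
    {
        auto p = payload;
        payload = null;
        return cast(immutable) p;
    }
}
```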
Oct 04 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 04 Oct 2011 23:56:52 -0400, Andrew Wiley <wiley.andrew.j gmail.com>
wrote:

 [snip]

 If you meant that the *user* should be responsible for making sure
 it's allocated on the shared heap, then yes, that's possible, but
 you're putting GC implementation details into the type system. That
 may or may not be a good thing.

I would phrase it as a shift of D's memory model towards NUMA. By the
way, GPGPU is here to stay and it's NUMA. HPC software is cache aware,
which is NUMA. And all high-end server systems are NUMA aware, to say
nothing of cluster/fabric computing.
Oct 04 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Maybe the correct approach is simply to try and eliminate the mutex
protecting GC operations so allocations can be performed concurrently by
multiple threads?

Sent from my iPhone

On Oct 4, 2011, at 8:55 PM, Andrew Wiley <wiley.andrew.j gmail.com> wrote:

 [snip]
Oct 05 2011
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Oct 3, 2011, at 3:27 PM, Jason House wrote:

 Sean Kelly Wrote:
 [snip]

 Why not run the collection for a single thread in the thread being
 collected? It's a simple way to force where the finalizer runs. It's a
 big step up from stop-the-world collections, but still requires pauses.

The world can't be stopped when finalizers run or the app can deadlock.
So the only correct behavior is to have the creator of a TLS block be
the one to finalize it.
Oct 03 2011
parent Jason House <jason.james.house gmail.com> writes:
Sean Kelly Wrote:

 On Oct 3, 2011, at 3:27 PM, Jason House wrote:
 
 Sean Kelly Wrote:
 There's another important issue that hasn't yet been addressed, which is that
when the GC collects memory, the thread that finalizes non-shared data should
be the one that created it.  So that SHARED flag should really be a thread-id
of some sort.  Alternately, each thread could allocate from its own pool, with
shared allocations coming from a common pool.  This would allow the lock
granularity to be reduced and in some cases eliminated.

Why not run the collection for a single thread in the thread being collected? It's a simple way to force where the finalizer runs. It's a big step up from stop-the-world collections, but still requires pauses.


If only one thread is stopped, how will a deadlock occur? If it's a deadlock due to new allocations, doesn't the current GC already handle that?
Oct 03 2011
prev sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:

[snip]

 What I suggest in add a flag SHARED in BlkAttr and store it as an
 attribute of the block. Later modification could be made according to
 this flag. This attribute shouldn't be modifiable later on.

 What do you think ? Is it something it worth working on ? If it is, how
 can I help ?

I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) There are some GCs specifically tailored for immutable data, so I'd probably wish to add separate SHARED and IMMUTABLE flags.

On the con side, the major issue with thread-local GCs is that currently we don't have good ways of building shared and immutable data. This leads to people building data with mutable structures and casting at the end. Now the issue with shared is mostly a quality-of-implementation issue. However, building immutable data structures efficiently requires a unique (aka mobile) storage type, which we'll probably get at the same time D gets an ownership type system. That is to say, no time in the foreseeable future.

That said, there are mitigating factors. First, by far the most common example of the build & cast pattern involves string/array building, a task Appender addresses in spades. Second, std.allocators could be used to determine which heap to allocate from. Third, we could opt to be able to switch the GC from thread-local to shared mode and vice versa; the idea being that inside an object-building routine, all allocations would be cast to immutable/shared and thus the local heap should be bypassed.

As for how you can help, I'd suggest building a thread-local GC, following the design of the recent discussion on std.allocators, if you're up to it.
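The build-and-cast pattern mentioned above can be sketched in D. `Appender` and `assumeUnique` are real Phobos facilities; the comment about heap migration reflects the proposal under discussion, not current runtime behavior:

```d
import std.array : appender;
import std.exception : assumeUnique;

immutable(int)[] buildSquares(size_t n)
{
    // Build with a mutable Appender in the current thread...
    auto app = appender!(int[])();
    foreach (i; 0 .. n)
        app.put(cast(int)(i * i));

    // ...then cast the uniquely-referenced buffer to immutable. Under a
    // thread-local GC, this is the point where the block would have to
    // migrate from the local heap to the shared/immutable heap.
    return assumeUnique(app.data);
}

void main()
{
    immutable squares = buildSquares(4);
    assert(squares == [0, 1, 4, 9]);
}
```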
Oct 03 2011
next sibling parent deadalnix <deadalnix gmail.com> writes:
On 04/10/2011 08:02, Robert Jacques wrote:
 On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:

 [snip]

 What I suggest in add a flag SHARED in BlkAttr and store it as an
 attribute of the block. Later modification could be made according to
 this flag. This attribute shouldn't be modifiable later on.

 What do you think ? Is it something it worth working on ? If it is, how
 can I help ?

I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) [snip]

The GC switch you suggest doesn't take all cases into account. You cannot get something to work without a shared GC. The thing is that shared data must be collected with a shared collection cycle, but most data isn't shared.
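The flag proposed in the original post could look like a hypothetical extension of core.memory's `BlkAttr`. The existing flags below (`NO_SCAN`, etc.) are real; `SHARED_BLK` is the proposal and is not part of druntime, so it is only shown in a comment rather than actually passed:

```d
import core.memory;

// Hypothetical: a SHARED bit next to the real BlkAttr flags
// (FINALIZE, NO_SCAN, APPENDABLE, ...). Not in druntime today.
enum uint SHARED_BLK = 1u << 8;

void main()
{
    // Thread-local block: eligible for a per-thread collection cycle.
    void* local = GC.malloc(64, GC.BlkAttr.NO_SCAN);

    // Shared block: under the proposal it would be tagged at allocation
    // time and reclaimed only by a shared collection cycle.
    void* shrd = GC.malloc(64, GC.BlkAttr.NO_SCAN /* | SHARED_BLK */);

    assert(local !is null && shrd !is null);
}
```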
Oct 04 2011
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 04 Oct 2011 17:50:03 -0400, deadalnix <deadalnix gmail.com> wrote:
 On 04/10/2011 08:02, Robert Jacques wrote:
 On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix <deadalnix gmail.com> wrote:

 [snip]

 What I suggest in add a flag SHARED in BlkAttr and store it as an
 attribute of the block. Later modification could be made according to
 this flag. This attribute shouldn't be modifiable later on.

 What do you think ? Is it something it worth working on ? If it is, how
 can I help ?

I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) [snip]

The GC switch you suggest doesn't take all cases into account. You cannot get something to work without a shared GC. The thing is that shared data must be collected with a shared collection cycle, but most data isn't shared.

Well, our current GC behaves as a shared GC.
Oct 04 2011