digitalmars.D - Shared keyword and the GC?

renoX (12/12) Oct 17 2012 Hello,

Alex Rønne Petersen (39/51) Oct 17 2012 Let's step back for a bit and think about what we want to achieve with

sclytrack (3/64) Oct 17 2012 Introduce the "noshared" keyword.

Alex Rønne Petersen (6/66) Oct 17 2012 Not a practical solution.

Jacob Carlborg (7/41) Oct 17 2012 All TLS data is handled by collectors running in their one single

Alex Rønne Petersen (11/56) Oct 18 2012 How does it deal with the problem where a pointer in TLS points to

Jacob Carlborg (6/12) Oct 19 2012 I'm not sure how this is handled. But the GC is only used for the
sclytrack (4/11) Oct 19 2012 Could you give an example?

Alex Rønne Petersen (31/44) Oct 19 2012 The problem with D is that we have a (more or less) stable language that...

sclytrack (6/10) Oct 19 2012 Maybe the goal is to run the thread local garbage collectors very

sclytrack (24/31) Oct 19 2012 import std.stdio;

deadalnix (10/10) Oct 17 2012 Why not definitively adopt the following (and already proposed) memory

Alex Rønne Petersen (6/16) Oct 18 2012 Can you elaborate? I'm not sure I understand.

deadalnix (27/41) Oct 22 2012 OK let me detail a little more.

Sean Kelly (19/25) Oct 18 2012 it's not built into the language, so the GC cannot make assumptions.

Jacob Carlborg (5/6) Oct 18 2012 Or move the shared data to the global heap when it's casted. Don't know

Sean Kelly (13/18) Oct 18 2012 instantiation, so to allow thread-local collections we'd have to make =

Jacob Carlborg (7/8) Oct 18 2012 You said the thread local heap would be merged with the global on thread...

Sean Kelly (11/18) Oct 18 2012 block is even movable. I agree that this would be the most efficient =

Jacob Carlborg (4/5) Oct 19 2012 Ah, now I see.

Michel Fortin (25/42) Oct 18 2012 All this is nice, but what is the owner thread for immutable data?

Jacob Carlborg (7/18) Oct 19 2012 Would it be any difference if the immutable data was collected from a

Michel Fortin (22/42) Oct 19 2012 A thread-local GC would be efficient because it scans only one thread.

Sean Kelly (24/27) Oct 22 2012 Because immutable is always implicitly shared, all your strings and =

Alex Rønne Petersen (12/21) Oct 18 2012 I'm not really sure how this solves the problem of having pointers from

Sean Kelly (12/16) Oct 22 2012 from a thread-local heap into the global heap and vice versa. Can you =

Jacob Carlborg (5/6) Oct 22 2012 Funny thing, immutable was supposed to make it easier to do concurrency

Andrei Alexandrescu (3/11) Oct 22 2012 But not garbage collection.

deadalnix (8/20) Oct 22 2012 OCmal's GC is one of the fastest GC ever made. And it is the case

thedeemon (10/17) Oct 23 2012 OCaml, I suppose. It is single threaded (there is no thread-level
Araq (3/5) Oct 23 2012 According to which benchmarks? And does the fact that an object

thedeemon (17/24) Oct 23 2012 I haven't seen proper benchmarks but some time ago I wrote in D

Araq (3/20) Oct 24 2012 That's true. But you don't need to know about immutability at

thedeemon (5/15) Oct 24 2012 No, not that, of course. As I said, I haven't seen proper

Sean Kelly (10/15) Oct 22 2012 thread-local GC collection. Since shared data may never reference =

deadalnix (6/15) Oct 22 2012 This is already unsafe anyway. The clean solution is either to allocate

"renoX" <renozyx gmail.com> writes:

Hello,
In the discussion thread on the recent blog post that 
summarized how GCs work(*), the topic of thread-local GC was 
discussed further. I pointed out that global variables in D are 
thread-local by default, but I was told that the types don't 
say which global variables are thread-local and which are 
shared, so the GC cannot use this information. Is that true?
It seems like a wasted opportunity.

BR,
renoX


*:
http://xtzgzorex.wordpress.com/2012/10/11/demystifying-garbage-collectors/

Oct 17 2012

Alex Rønne Petersen <alex lycus.org> writes:

Let's step back for a bit and think about what we want to achieve with 
thread-local garbage collection. The idea is that we look only at a 
single thread's heap (and stack/registers, of course) when doing a 
collection. This means that we can -- theoretically -- stop only one 
thread at a time and only when it needs to be stopped. This is clearly a 
huge win in scalability and raw speed. With a scheme like this, it might 
even be possible to get away with a simple mark-sweep or copying GC per 
thread instead of a complicated generational GC, mainly due to the 
paradigms the isolation model induces.

Rust, as it is today, can do this. Tasks (or threads if you will - 
though they aren't the same thing) are completely isolated. Types that 
can potentially contain pointers into a task's heap cannot be sent to 
other tasks at all. Rust also does not have global variables.

So, let's look at D:

1. We have global variables.
2. Only std.concurrency enforces isolation at a type system level; it's 
not built into the language, so the GC cannot make assumptions.
3. The shared qualifier effectively allows pointers from one thread's 
heap into another's.
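
The points in this list can be illustrated with a small sketch. The names `tlsCounter`, `escaped` and `observeFromWorker` are made up for illustration, and `__gshared` is used here only as one assumed way of letting a pointer cross threads without `shared` appearing in any type:

```d
import core.thread : Thread;

int tlsCounter;          // thread-local by default: each thread has its own copy
__gshared int* escaped;  // a single global slot with no `shared` in its type

// Returns what a freshly started worker thread observes.
int[2] observeFromWorker()
{
    tlsCounter = 1;            // only changes the calling thread's copy
    auto local = new int;      // allocated by the calling thread
    *local = 42;
    escaped = local;           // the block is now reachable from any thread

    int[2] seen;
    auto t = new Thread({
        seen[0] = tlsCounter;  // the worker's own copy, still int.init (0)
        seen[1] = *escaped;    // reads memory allocated by the other thread
    });
    t.start();
    t.join();
    return seen;
}

void main()
{
    auto seen = observeFromWorker();
    assert(seen[0] == 0);
    assert(seen[1] == 42);
}
```

A collector that scanned only the worker's stack and TLS would have no way to know that the block behind `escaped` was allocated, and is still used, by another thread.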

It's important to keep in mind that in order for thread-local GC (as 
defined above) to be possible at all, *under no circumstances whatsoever 
must there be a pointer in one thread's heap into another thread's heap, 
ever*. If this happens and you apply the above GC strategy (stop one 
thread at a time and scan only that thread's heap), you're effectively 
dealing with something very similar to the lost object problem on 
concurrent GC.

To clarify with regards to the shared qualifier: It does absolutely 
nothing. It's useless. All it does is slap a pretty "I can be shared 
arbitrarily across threads" label on a type. Even if you have this 
knowledge in the GC, it's not going to help you, because you *still* 
have to deal with the problem that arbitrary pointers can be floating 
around in arbitrary threads.

(And don't even get me started on the lack of clear semantics (and even 
the few semi-agreed-upon but flawed semantics) for shared.)

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Oct 17 2012

"sclytrack" <sclytrack noshared.com> writes:

On Wednesday, 17 October 2012 at 08:55:55 UTC, Alex Rønne 
Petersen wrote:
 To clarify with regards to the shared qualifier: It does 
 absolutely nothing. It's useless. All it does is slap a pretty 
 "I can be shared arbitrarily across threads" label on a type. 
 Even if you have this knowledge in the GC, it's not going to 
 help you, because you *still* have to deal with the problem 
 that arbitrary pointers can be floating around in arbitrary 
 threads.

Introduce the "noshared" keyword.

Oct 17 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 17-10-2012 11:50, sclytrack wrote:
 Introduce the "noshared" keyword.

Not a practical solution.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Oct 17 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-17 10:55, Alex Rønne Petersen wrote:

 Let's step back for a bit and think about what we want to achieve with
 thread-local garbage collection. The idea is that we look only at a
 single thread's heap (and stack/registers, of course) when doing a
 collection. This means that we can -- theoretically -- stop only one
 thread at a time and only when it needs to be stopped. This is clearly a
 huge win in scalability and raw speed. With a scheme like this, it might
 even be possible to get away with a simple mark-sweep or copying GC per
 thread instead of a complicated generational GC, mainly due to the
 paradigms the isolation model induces.

All TLS data is handled by collectors running in their own single 
thread, as you describe above. Any non-TLS data is handled the same way 
as the GC currently works.

This is how the, now deprecated, Apple GC used by Objective-C works.

-- 
/Jacob Carlborg

Oct 17 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 17-10-2012 13:51, Jacob Carlborg wrote:
 All TLS data is handled by collectors running in their one single
 thread, as you describe above. Any non-TLS data is handled the same way
 as the GC currently works.

 This is how the, now deprecated, Apple GC used by Objective-C works.

How does it deal with the problem where a pointer in TLS points to 
global data, or worse yet, a pointer in the global heap points to TLS?

I'm pretty sure it can't without doing a full pass over the entire heap, 
which seems to me like it defeats the purpose.

But I may just be missing out on some restriction (type system or 
whatever) Objective-C has that makes it feasible.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Oct 18 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-19 08:48, Alex Rønne Petersen wrote:

 How does it deal with the problem where a pointer in TLS points to
 global data, or worse yet, a pointer in the global heap points to TLS?

 I'm pretty sure it can't without doing a full pass over the entire heap,
 which seems to me like it defeats the purpose.

 But I may just be missing out on some restriction (type system or
 whatever) Objective-C has that makes it feasible.

I'm not sure how this is handled. But the GC is only used for the 
Objective-C allocations, i.e. [NSObject alloc] and not for C 
allocations, i.e. "malloc".

-- 
/Jacob Carlborg

Oct 19 2012

"sclytrack" <sclytrack awake.com> writes:

 How does it deal with the problem where a pointer in TLS points 
 to global data,

Need to run stop-the-world for shared heap. But it would be 
interesting to have blocks that have no shared pointers in them.


 or worse yet, a pointer in the global heap points to TLS?

Could you give an example?

 I'm pretty sure it can't without doing a full pass over the 
 entire heap, which seems to me like it defeats the purpose.

Yeah.

 But I may just be missing out on some restriction (type system 
 or whatever) Objective-C has that makes it feasible.

Oct 19 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 19-10-2012 11:07, sclytrack wrote:
 How does it deal with the problem where a pointer in TLS points to
 global data,

 Need to run stop-the-world for shared heap. But it would be interesting
 to have blocks that have no shared pointers in them.

The problem with D is that we have a (more or less) stable language that 
we can't make major changes to at this point.

 or worse yet, a pointer in the global heap points to TLS?

 Could you give an example?

I don't know Objective-C, so in D:

import core.memory; // for GC.malloc

void* p; // in TLS

void main()
{
     p = GC.malloc(1024); // a pointer to the global heap is now in TLS
}

Or the more complicated case (for any arbitrary graph of objects):

Object p; // in TLS

class C
{
     Object o;

     this(Object o)
     {
         this.o = o;
     }
}

void main()
{
     // The graph can be arbitrarily complex, and any part of it can be
     // allocated with the GC, malloc, or any other mechanism.
     p = new C(new Object);
}

 I'm pretty sure it can't without doing a full pass over the entire
 heap, which seems to me like it defeats the purpose.

 Yeah.

Thread-local GC is all about improving scalability by only stopping 
threads that need to be stopped. If you can't even do that, then any 
effort towards thread-local GC is quite pointless IMO.

 But I may just be missing out on some restriction (type system or
 whatever) Objective-C has that makes it feasible.



-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Oct 19 2012

"sclytrack" <sclytrack iq87.com> writes:

 Thread-local GC is all about improving scalability by only 
 stopping threads that need to be stopped. If you can't even do 
 that, then any effort towards thread-local GC is quite 
 pointless IMO.

Maybe the goal is to run the thread-local garbage collectors very 
frequently and the stop-the-world ones just occasionally. Once 
your thread has shared data in it, it must participate in the 
stop-the-world ones. Maybe the threads need to register with both 
the thread-local garbage collector and the global stop-the-world 
garbage collector.

Oct 19 2012

"sclytrack" <sclytrack answermyself.com> writes:

On Friday, 19 October 2012 at 09:07:55 UTC, sclytrack wrote:
 How does it deal with the problem where a pointer in TLS 
 points to global data,

 Need to run stop-the-world for shared heap. But it would be 
 interesting to have blocks that have no shared pointers in them.


 or worse yet, a pointer in the global heap points to TLS?

 Could you give an example?


import std.stdio;

class Local
{
}

class Global
{
	Local data;
	int [] arr;
}

Local l2;
int [] arr;			//tls

int main()
{
	shared Global g = new shared(Global);		//global heap
	Local l1 = new Local();	//future local heap
	//	g.data = l1;	//disallowed
	l2 = new Local();
	//	g.data = l2;	//disallowed
	arr = new int[10];	//future local heap
	g.arr = cast(shared(int[])) arr; //bypassed.	
	writeln("Complete");
	return 0;
}

Oct 19 2012

deadalnix <deadalnix gmail.com> writes:

Why not definitively adopt the following (already proposed) memory 
scheme? (Some practices that do not respect this scheme are currently 
considered valid.)

Thread-local heap (one per thread) -> shared heap -> immutable heap

This model has multiple benefits:
  - The TL heap can be processed while interacting with only one thread.
  - The immutable heap can be collected 100% concurrently if we allow some 
floating garbage.
  - The shared heap is the only problem, but as long as its size stays 
small, the problem stays small.

Oct 17 2012

Alex Rønne Petersen <alex lycus.org> writes:

Can you elaborate? I'm not sure I understand.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Oct 18 2012

deadalnix <deadalnix gmail.com> writes:

OK let me detail a little more.

First, I'll explain TL GC. Read "shared heap" here as covering both the 
shared and the immutable heap.

A TL collection can be done while disturbing only one thread. When the TL 
collection is done, a set of pointers into the shared heap is known.

After that, every new pointer from the TL heap into the shared heap is 
either:
  - a new allocation, or
  - a pointer read from the shared heap.

So, in the end, we have a set of roots from which to collect the shared 
heap. The thread can continue to run while the shared heap is collected.

Now let's consider the immutable heap. Given a set of roots from the TL 
and shared heaps, the immutable heap can be collected concurrently. I 
think this is straightforward and will not elaborate.

So the problem is now the shared heap. Here is how I see its collection. 
When the GC wants to collect the shared heap, it first signals each 
thread, which collects its TL data and gives back a set of roots. From 
this point on, the GC marks all new allocations as live and sets a write 
barrier on shared data: when a shared object is written, it is marked 
live (obviously), its old value is scanned, and its new value is scanned 
too. The reason is pretty simple: the old value may have been read by a 
thread and stored locally. When the collection is done, the write barrier 
can be removed. Obviously, the write barrier is not needed for immutable 
objects or for any shared write that isn't a pointer write, which leads 
to a very nice way to collect things.

The obvious drawback is that a pointer from one TL heap to another, or 
from shared to TL, will confuse the GC. But that is already not @safe 
anyway.
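
The write barrier described above could look roughly like the following. This is only a sketch under simplifying assumptions: a single pointer field per object, a single-threaded simulation, and made-up names (`Node`, `collecting`, `worklist`, `shade`, `writePointer`); "scanning" is modelled as marking the object and queueing it for the marker.

```d
class Node
{
    Node next;      // the only pointer field, to keep the sketch small
    bool marked;
}

// Hypothetical collector state for one shared-heap collection cycle.
bool collecting;
Node[] worklist;    // objects still to be scanned by the marker

// Mark an object live and queue it for scanning (at most once).
void shade(Node n)
{
    if (n !is null && !n.marked)
    {
        n.marked = true;
        worklist ~= n;
    }
}

// Every pointer store into a shared object would go through this.
void writePointer(Node target, Node newValue)
{
    if (collecting)
    {
        shade(target);        // the written object is live
        shade(target.next);   // old value: a thread may already hold it locally
        shade(newValue);      // new value is reachable from now on
    }
    target.next = newValue;
}

void main()
{
    auto a = new Node, b = new Node, c = new Node;
    a.next = b;

    collecting = true;
    writePointer(a, c);       // overwrites the only heap reference to b

    assert(a.next is c);
    assert(a.marked && b.marked && c.marked);
    assert(worklist.length == 3);
}
```

Without the barrier, `b` could be freed even though a thread that read `a.next` earlier still holds it in a local variable.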

Oct 22 2012

Sean Kelly <sean invisibleduck.org> writes:

On Oct 17, 2012, at 1:55 AM, Alex Rønne Petersen <alex lycus.org> wrote:

 So, let's look at D:

 1. We have global variables.
 2. Only std.concurrency enforces isolation at a type system level;
 it's not built into the language, so the GC cannot make assumptions.
 3. The shared qualifier effectively allows pointers from one thread's
 heap into another's.

Well, the problem is more that a variable can be cast to shared after
instantiation, so to allow thread-local collections we'd have to make
cast(shared) set a flag on the memory block to indicate that it's
shared, and vice-versa for unshared.  Then when a thread terminates, all
blocks not flagged as shared would be finalized, leaving the shared
blocks alone.  Then any pool from the terminated thread containing a
shared block would have to be merged into the global heap instead of
released to the OS.

I think we need to head in this direction anyway, because we need to
make sure that thread-local data is finalized by its owner thread.  A
block's owner would be whoever allocated the block or, if cast to shared
and back to unshared, whichever thread most recently cast the block back
to unshared.  Tracking the owner of a block gives us the shared state
implicitly, making thread-local collections possible.  Who wants to work
on this? :-)
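
A rough model of the owner-tracking idea, for illustration only. Nothing here exists in the D runtime: `Block`, `ThreadId`, `castToShared`, `castToUnshared` and `finalizedOnExit` are hypothetical names, and the sketch only simulates the bookkeeping rather than hooking into real casts.

```d
import std.algorithm : filter, map;
import std.array : array;

alias ThreadId = size_t;             // hypothetical stand-in for a thread handle
enum ThreadId noOwner = size_t.max;  // "no owner" means the block is shared

struct Block
{
    string name;
    ThreadId owner;

    bool isShared() const { return owner == noOwner; }
}

// What cast(shared) would have to do under this scheme.
void castToShared(ref Block b) { b.owner = noOwner; }

// Casting shared away: the casting thread becomes the new owner.
void castToUnshared(ref Block b, ThreadId caster) { b.owner = caster; }

// On thread termination, only blocks still owned by that thread are
// finalized; shared blocks are left alone for the global heap.
string[] finalizedOnExit(Block[] blocks, ThreadId dying)
{
    return blocks.filter!(b => b.owner == dying).map!(b => b.name).array;
}

void main()
{
    auto blocks = [Block("a", 1), Block("b", 1), Block("c", 2)];

    castToShared(blocks[1]);
    assert(blocks[1].isShared);
    assert(finalizedOnExit(blocks, 1) == ["a"]);

    castToUnshared(blocks[1], 2);    // thread 2 takes ownership of "b"
    assert(finalizedOnExit(blocks, 2) == ["b", "c"]);
}
```

Storing the owner rather than a separate flag is what makes the shared state implicit: a block is shared exactly when it has no owner.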

Oct 18 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-18 20:26, Sean Kelly wrote:

 Well, the problem is more that a variable can be cast to shared after
 instantiation, so to allow thread-local collections we'd have to make
 cast(shared) set a flag on the memory block to indicate that it's
 shared, and vice-versa for unshared.  Then when a thread terminates, all
 blocks not flagged as shared would be finalized, leaving the shared
 blocks alone.  Then any pool from the terminated thread containing a
 shared block would have to be merged into the global heap instead of
 released to the OS.

Or move the shared data to the global heap when it's cast. I don't know 
what's best. This way all data in a given pool will be truly thread-local.

-- 
/Jacob Carlborg

Oct 18 2012

Sean Kelly <sean invisibleduck.org> writes:

On Oct 18, 2012, at 11:48 AM, Jacob Carlborg <doob me.com> wrote:

 On 2012-10-18 20:26, Sean Kelly wrote:

 Well, the problem is more that a variable can be cast to shared after
 instantiation, so to allow thread-local collections we'd have to make
 cast(shared) set a flag on the memory block to indicate that it's
 shared, and vice-versa for unshared.  Then when a thread terminates, all
 blocks not flagged as shared would be finalized, leaving the shared
 blocks alone.  Then any pool from the terminated thread containing a
 shared block would have to be merged into the global heap instead of
 released to the OS.

 Or move the shared data to the global heap when it's casted. Don't
 know that's best. This way all data in a give pool will be truly thread
 local.

And back down to a local pool when shared is cast away.  Assuming the
block is even movable.  I agree that this would be the most efficient
use of memory, but I don't know that it's feasible.

Oct 18 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-18 20:54, Sean Kelly wrote:

 And back down to a local pool when shared is cast away.  Assuming the
 block is even movable.  I agree that this would be the most efficient
 use of memory, but I don't know that it's feasible.

You said the thread-local heap would be merged with the global one on 
thread termination. How is that different?

Alternatively, it could stay in the global heap. I mean, not many 
variables should be "shared" and even fewer should be cast back and forth.

-- 
/Jacob Carlborg

Oct 18 2012

Sean Kelly <sean invisibleduck.org> writes:

On Oct 18, 2012, at 12:22 PM, Jacob Carlborg <doob me.com> wrote:

 On 2012-10-18 20:54, Sean Kelly wrote:

 And back down to a local pool when shared is cast away.  Assuming the
 block is even movable.  I agree that this would be the most efficient
 use of memory, but I don't know that it's feasible.

 You said the thread local heap would be merged with the global on
 thread termination. How is that different?

 Alternative it could stay in the global heap. I mean, not many
 variables should be "shared" and even fewer should be casted back and
 forth.

It's different in that a variable's address never actually changes.
When a thread completes it hands all of its pools to the shared
allocator, and then per-thread allocators request free pools from the
shared allocator before going to the OS.  This is basically how the
HOARD allocator works.

Oct 18 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-18 22:29, Sean Kelly wrote:

 It's different in that a variable's address never actually changes.
 When a thread completes it hands all of its pools to the shared
 allocator, and then per-thread allocators request free pools from the
 shared allocator before going to the OS.  This is basically how the
 HOARD allocator works.

Ah, now I see.

-- 
/Jacob Carlborg

Oct 19 2012

Michel Fortin <michel.fortin michelf.ca> writes:

On 2012-10-18 18:26:08 +0000, Sean Kelly <sean invisibleduck.org> said:

 Well, the problem is more that a variable can be cast to shared after
 instantiation, so to allow thread-local collections we'd have to make
 cast(shared) set a flag on the memory block to indicate that it's
 shared, and vice-versa for unshared.  Then when a thread terminates, all
 blocks not flagged as shared would be finalized, leaving the shared
 blocks alone.  Then any pool from the terminated thread containing a
 shared block would have to be merged into the global heap instead of
 released to the OS.
 
 I think we need to head in this direction anyway, because we need to
 make sure that thread-local data is finalized by its owner thread.  A
 block's owner would be whoever allocated the block or if cast to shared
 and back to unshared, whichever thread most recently cast the block back
 to unshared.  Tracking the owner of a block gives us the shared state
 implicitly, making thread-local collections possible.  Who wants to work
 on this? :-)

All this is nice, but what is the owner thread for immutable data? 
Because immutable is always implicitly shared, all your strings and 
everything else that is immutable is thus "shared" and must be tracked 
by the global heap's collector and can never be handled by a 
thread-local collector. Even if most immutable data never leaves the 
thread it was allocated in, there's no way you can know.
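
As an illustration of why the collector cannot tell, the sketch below passes a GC-allocated string to another thread through std.concurrency without any cast or `shared` annotation. The names worker and roundTrip are made up for this example.

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;

// Runs in a second thread: receives a string and reports its length back.
void worker(Tid owner)
{
    string s = receiveOnly!string();
    owner.send(s.length);
}

// Hypothetical helper for this sketch.
size_t roundTrip()
{
    // GC-allocated immutable(char)[]; nothing marks it as shared.
    string msg = "allocated in the main thread".idup;
    auto tid = spawn(&worker, thisTid);
    tid.send(msg); // compiles without any cast(shared)
    return receiveOnly!size_t();
}
```

At allocation time msg looks like any other thread-local allocation; only the later send makes it visible to a second thread.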

I don't think per-thread GCs will work very well without support for
immutable data, and for that you need to have a distinction between
immutable and shared immutable (just like you have with mutable data).
I complained about this almost three years ago when the semantics of 
shared were being defined, but it got nowhere. Quoting Walter at the 
time:

 As for a shared gc vs thread local gc, I just see an awful lot of
 strange irreproducible bugs when someone passes data from one to the
 other. I doubt it's worth it, unless it can be done with compiler
 guarantees, which seem doubtful.

I think you'll have a hard time convincing Walter it is worth changing 
the behaviour of type modifiers at this point.

Reference:
<http://lists.puremagic.com/pipermail/dmd-concurrency/2010-January/000132.html>
<http://lists.puremagic.com/pipermail/dmd-concurrency/2010-January/000146.html>

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/

Oct 18 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-19 03:06, Michel Fortin wrote:

 All this is nice, but what is the owner thread for immutable data?
 Because immutable is always implicitly shared, all your strings and
 everything else that is immutable is thus "shared" and must be tracked
 by the global heap's collector and can never be handled by a
 thread-local collector. Even if most immutable data never leaves the
 thread it was allocated in, there's no way you can know.

 I don't think per-thread GCs will work very well without support for
 immutable data, and for that you need to have a distinction between
 immutable and shared immutable (just like you have with mutable data). I
 complained about this almost three years ago when the semantics of
 shared were being defined, but it got nowhere. Quoting Walter at the time:

Would it make any difference if the immutable data were collected by a
different collector than the shared or thread-local ones?

In this case I guess the collector wouldn't try to distinguish between
shared and non-shared immutable data.

-- 
/Jacob Carlborg

Oct 19 2012

Michel Fortin <michel.fortin michelf.ca> writes:

On 2012-10-19 07:42:53 +0000, Jacob Carlborg <doob me.com> said:

 On 2012-10-19 03:06, Michel Fortin wrote:
 
 All this is nice, but what is the owner thread for immutable data?
 Because immutable is always implicitly shared, all your strings and
 everything else that is immutable is thus "shared" and must be tracked
 by the global heap's collector and can never be handled by a
 thread-local collector. Even if most immutable data never leaves the
 thread it was allocated in, there's no way you can know.
 
 I don't think per-thread GCs will work very well without support for
 immutable data, and for that you need to have a distinction between
 immutable and shared immutable (just like you have with mutable data). I
 complained about this almost three years ago when the semantics of
 shared were being defined, but it got nowhere. Quoting Walter at the time:

 
 Would it be any difference if the immutable data was collected from a 
 different collector than the shared or thread local?
 
 In this case I guess the collector wouldn't try to make a difference 
 between shared and non-shared immutable data.

A thread-local GC would be efficient because it scans only one thread. 
The gain is that you minimize the load on the global GC, reducing 
collection cycles that need to stop all threads. Creating a second 
global GC for immutable data wouldn't free you from the need to stop 
all threads, but now you'd have two global collectors stopping all 
threads which would probably be worse. So I don't see the point in a 
second GC for immutable data.

Immutable data must always be handled by a global GC. There's no way
around it as long as immutable and shared-immutable are the same thing.
The more immutable data you have, the more irrelevant the performance
gains from a thread-local GC become, because it gets used less often.
This creates a strange incentive to *not* make things immutable in
order to make things faster.

I'm all for a thread-local GC, but in the current state of the type 
system it'd just be ridiculous. But then, perhaps an implementation of 
it could convince Walter to change some things. So if someone is 
inclined to implement it, go ahead, I'm not here to stop you.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/

Oct 19 2012

Sean Kelly <sean invisibleduck.org> writes:

On Oct 18, 2012, at 6:06 PM, Michel Fortin <michel.fortin michelf.ca> wrote:

 All this is nice, but what is the owner thread for immutable data?
 Because immutable is always implicitly shared, all your strings and
 everything else that is immutable is thus "shared" and must be tracked
 by the global heap's collector and can never be handled by a
 thread-local collector. Even if most immutable data never leaves the
 thread it was allocated in, there's no way you can know.

Yes.


 I don't think per-thread GCs will work very well without support for
 immutable data, and for that you need to have a distinction between
 immutable and shared immutable (just like you have with mutable data). I
 complained about this almost three years ago when the semantics of
 shared were being defined, but it got nowhere.

Yeah, that's unfortunate.  "shared" today really has two meanings:
instance visibility and what happens when the instance is accessed.  By
comparison, "immutable" just describes what happens when the instance is
accessed.  The really weird part of all this being that immutable data
is exempt from the transitivity requirement of "shared".  Though that
makes me realize that casting a UDT to "shared" could mean traversing
all data reachable by that instance and marking it as shared as well,
which sounds absolutely terrible.

Perhaps something could be done in instances where the only place data
is passed between threads in an app is via std.concurrency?  Allowing
strings to be referenced by global shared references would still be
problematic though.  I'll have to give this some thought.

Oct 22 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 18-10-2012 20:26, Sean Kelly wrote:
 On Oct 17, 2012, at 1:55 AM, Alex Rønne Petersen <alex lycus.org> wrote:
 So, let's look at D:

 1. We have global variables.
 2. Only std.concurrency enforces isolation at a type system level; it's not
    built into the language, so the GC cannot make assumptions.
 3. The shared qualifier effectively allows pointers from one thread's heap
    into another's.

 Well, the problem is more that a variable can be cast to shared after
instantiation, so to allow thread-local collections we'd have to make
cast(shared) set a flag on the memory block to indicate that it's shared, and
vice-versa for unshared.  Then when a thread terminates, all blocks not flagged
as shared would be finalized, leaving the shared blocks alone.  Then any pool
from the terminated thread containing a shared block would have to be merged
into the global heap instead of released to the OS.

 I think we need to head in this direction anyway, because we need to make sure
that thread-local data is finalized by its owner thread.  A block's owner would
be whoever allocated the block or if cast to shared and back to unshared,
whichever thread most recently cast the block back to unshared.  Tracking the
owner of a block gives us the shared state implicitly, making thread-local
collections possible.  Who wants to work on this? :-)

I'm not really sure how this solves the problem of having pointers from 
a thread-local heap into the global heap and vice versa. Can you 
elaborate on that?

The problem is that even if you know whether a piece of memory is 
flagged shared, you cannot know if some arbitrary number of threads 
happen to have pointers to it and can thus mutate anything inside it 
while a thread-local collection is in progress.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Oct 18 2012

Sean Kelly <sean invisibleduck.org> writes:

On Oct 18, 2012, at 11:56 PM, Alex Rønne Petersen <alex lycus.org> wrote:

 I'm not really sure how this solves the problem of having pointers
 from a thread-local heap into the global heap and vice versa. Can you
 elaborate on that?

 The problem is that even if you know whether a piece of memory is
 flagged shared, you cannot know if some arbitrary number of threads
 happen to have pointers to it and can thus mutate anything inside it
 while a thread-local collection is in progress.

Blocks flagged as shared would be completely ignored by the thread-local
GC collection.  Since shared data may never reference unshared data,
that should avoid anything being collected that's still referenced.  I
hadn't thought about "immutable" though, which may turn out to be a
problem.
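
A rough model of that rule, under the stated assumption that shared blocks never reference unshared ones. Block and collectLocal are hypothetical names and the heap is just an array of records; nothing here is druntime API, and the sketch does not cover blocks being mutated by other threads during the collection.

```d
// Toy model: not druntime API.
struct Block
{
    bool isShared;  // would be set by cast(shared), cleared by the reverse cast
    bool marked;
    size_t[] refs;  // indices of the blocks this block points to
}

// Marks from one thread's roots and returns the indices of local blocks
// that are unreachable. Shared blocks are neither traced nor freed.
size_t[] collectLocal(Block[] heap, size_t[] roots)
{
    foreach (ref b; heap) b.marked = false;

    size_t[] stack = roots.dup;
    while (stack.length)
    {
        auto i = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        // Skip shared blocks entirely: by assumption they cannot lead
        // back to unshared data.
        if (heap[i].isShared || heap[i].marked) continue;
        heap[i].marked = true;
        stack ~= heap[i].refs;
    }

    size_t[] garbage;
    foreach (i, ref b; heap)
        if (!b.isShared && !b.marked) garbage ~= i;
    return garbage;
}
```

If the assumption is violated, for instance by a cast that leaves a shared block pointing at local data, this collector would free memory that is still in use.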

Oct 22 2012

Jacob Carlborg <doob me.com> writes:

On 2012-10-22 19:44, Sean Kelly wrote:

 Blocks flagged as shared would be completely ignored by the thread-local GC
collection.  Since shared data may never reference unshared data, that should
avoid anything being collected that's still referenced.  I hadn't thought about
"immutable" though, which may turn out to be a problem.

Funny thing, immutable was supposed to make it easier to do concurrency 
programming.

-- 
/Jacob Carlborg

Oct 22 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/22/12 3:16 PM, Jacob Carlborg wrote:
 On 2012-10-22 19:44, Sean Kelly wrote:

 Blocks flagged as shared would be completely ignored by the
 thread-local GC collection. Since shared data may never reference
 unshared data, that should avoid anything being collected that's still
 referenced. I hadn't thought about "immutable" though, which may turn
 out to be a problem.

 Funny thing, immutable was supposed to make it easier to do concurrency
 programming.

But not garbage collection.

Andrei

Oct 22 2012

deadalnix <deadalnix gmail.com> writes:

On 22/10/2012 22:44, Andrei Alexandrescu wrote:
 On 10/22/12 3:16 PM, Jacob Carlborg wrote:
 On 2012-10-22 19:44, Sean Kelly wrote:

 Blocks flagged as shared would be completely ignored by the
 thread-local GC collection. Since shared data may never reference
 unshared data, that should avoid anything being collected that's still
 referenced. I hadn't thought about "immutable" though, which may turn
 out to be a problem.

 Funny thing, immutable was supposed to make it easier to do concurrency
 programming.

 But not garbage collection.

OCmal's GC is one of the fastest GC ever made. And it is the case 
because it uses immutability to great benefice.

As immutable data can only refer to immutable data, I don't see a
single problem here.

When collecting shared and TL data, you get a set of root pointers to
immutable data. All immutable data can be collected from that set. All
objects allocated during the collection are assumed to be live.

Oct 22 2012

"thedeemon" <dlang thedeemon.com> writes:

On Monday, 22 October 2012 at 21:19:53 UTC, deadalnix wrote:

 Funny thing, immutable was supposed to make it easier to do 
 concurrency programming.

 But not garbage collection.

 OCmal's GC is one of the fastest GC ever made. And it is the 
 case because it uses immutability to great benefice.

OCaml, I suppose. It is single threaded (there is no thread-level 
parallelism in OCaml) and there is nothing in its GC that uses 
immutability really. It's so fast because of the memory model: to 
tell if a word is a pointer one just needs to look at its least 
significant bit, if it's 0 it's a pointer, if it's 1 it's not. 
That's why native ints are 31-bit in OCaml. With this scheme they 
don't need to store type layout info and pointer bitmaps. And it 
is generational (2 gens), which also adds much speed.
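
The tagging scheme can be sketched in D like this. The names Word, isPointer, boxInt and unboxInt are invented for illustration and are not OCaml runtime functions. The point is only that one bit of every word is spent on the tag, so a 32-bit word holds a 31-bit integer (63-bit on 64-bit targets).

```d
// Sketch of low-bit tagging; names are illustrative only.
alias Word = size_t;

// A word whose least significant bit is 0 is treated as a pointer.
bool isPointer(Word w) { return (w & 1) == 0; }

// Integers are stored shifted left by one with the low bit set,
// which costs one bit of range.
Word boxInt(ptrdiff_t n) { return (cast(Word) n << 1) | 1; }

// The arithmetic right shift on a signed value restores the sign.
ptrdiff_t unboxInt(Word w) { return cast(ptrdiff_t) w >> 1; }
```

With this encoding a collector can scan any block word by word without per-type layout information.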

A GC which really relies on immutability you can find in Erlang.

Oct 23 2012

"Araq" <rumpf_a web.de> writes:

 OCmal's GC is one of the fastest GC ever made. And it is the 
 case because it uses immutability to great benefice.

According to which benchmarks? And does the fact that an object 
is immutable really need to be known at compile time for GC 
related optimizations?

Oct 23 2012

"thedeemon" <dlang thedeemon.com> writes:

On Tuesday, 23 October 2012 at 22:33:13 UTC, Araq wrote:
 OCmal's GC is one of the fastest GC ever made. And it is the 
 case because it uses immutability to great benefice.

 According to which benchmarks? And does the fact that an object 
 is immutable really need to be known at compile time for GC 
 related optimizations?

I haven't seen proper benchmarks but some time ago I wrote in D 
and OCaml  basically the same simple program which read and 
parsed some text and performed some calculations, allocating a 
lot of temporary arrays or lists:
https://gist.github.com/2902247
https://gist.github.com/2922399
and OCaml version was 2 times faster than D (29 and 59 seconds on 
input file of 1 million lines). After disabling GC on 
reading/parsing stage and doing calculations without allocations 
and using std.parallelism I made D version work in 4.4 seconds.

One place where immutability really helps is in a generational
GC: the runtime needs to track all the pointers from the old generation
to the young generation. If most of the data is immutable there
are not so many such pointers, which makes collection faster. When
all data is immutable there are no such pointers at all: each
object can only have pointers to older ones.
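
The old-to-young tracking is usually done with a write barrier feeding a remembered set. A toy sketch in D, with hypothetical names Obj, remembered and writeField:

```d
// Toy model of a remembered set; not taken from any real runtime.
final class Obj
{
    bool old;   // true once promoted to the old generation
    Obj field;
}

// Old objects that may point into the young generation.
Obj[] remembered;

// Write barrier: every mutation of a field goes through here.
void writeField(Obj target, Obj value)
{
    target.field = value;
    if (target.old && value !is null && !value.old)
        remembered ~= target; // old-to-young pointer: must be rescanned
}
```

An object that is never mutated after construction never reaches writeField, so it never adds an entry, which is why mostly immutable heaps keep the remembered set small.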

Oct 23 2012

"Araq" <rumpf_a web.de> writes:

 I haven't seen proper benchmarks but some time ago I wrote in D 
 and OCaml  basically the same simple program which read and 
 parsed some text and performed some calculations, allocating a 
 lot of temporary arrays or lists:
 https://gist.github.com/2902247
 https://gist.github.com/2922399
 and OCaml version was 2 times faster than D (29 and 59 seconds 
 on input file of 1 million lines). After disabling GC on 
 reading/parsing stage and doing calculations without 
 allocations and using std.parallelism I made D version work in 
 4.4 seconds.

And that makes it the "fastest GC ever made"?

 One place where immutability really helps is in a generational 
 GC: runtime needs to track all the pointers from old generation 
 to the young generation, if most of the data is immutable there 
 are not so many such pointers, this makes collection faster. 
 When all data is immutable there is no such pointers at all, 
 each object can only have pointers to older ones.

That's true. But you don't need to know about immutability at
compile time to get this benefit.

Oct 24 2012

"thedeemon" <dlang thedeemon.com> writes:

On Wednesday, 24 October 2012 at 17:42:50 UTC, Araq wrote:

 And that makes it the "fastest GC ever made"?

No, not that, of course. As I said, I haven't seen proper 
benchmarks. But OCaml's GC is notorious for its speed and it 
performed very well in all comparisons I saw.

 One place where immutability really helps is in a generational 
 GC: runtime needs to track all the pointers from old 
 generation to the young generation, if most of the data is 
 immutable there are not so many such pointers, this makes 
 collection faster. When all data is immutable there is no such 
 pointers at all, each object can only have pointers to older 
 ones.

 That's true. But you don't need to know about immutability at
 compile time to get this benefit.

I agree.

Oct 24 2012

Sean Kelly <sean invisibleduck.org> writes:

On Oct 22, 2012, at 12:16 PM, Jacob Carlborg <doob me.com> wrote:

 On 2012-10-22 19:44, Sean Kelly wrote:

 Blocks flagged as shared would be completely ignored by the
 thread-local GC collection.  Since shared data may never reference
 unshared data, that should avoid anything being collected that's still
 referenced.  I hadn't thought about "immutable" though, which may turn
 out to be a problem.

 Funny thing, immutable was supposed to make it easier to do
 concurrency programming.

In the realm of shared data concurrency, immutable is definitely useful.
But where data is all thread-local I'm not entirely sure.  Either way
though, immutable within the context of this discussion is a library
optimization issue rather than anything to do with the type itself.

Oct 22 2012

deadalnix <deadalnix gmail.com> writes:

On 18/10/2012 20:26, Sean Kelly wrote:
 On Oct 17, 2012, at 1:55 AM, Alex Rønne Petersen <alex lycus.org> wrote:
 So, let's look at D:

 1. We have global variables.
 2. Only std.concurrency enforces isolation at a type system level; it's not
    built into the language, so the GC cannot make assumptions.
 3. The shared qualifier effectively allows pointers from one thread's heap
    into another's.

 Well, the problem is more that a variable can be cast to shared after
instantiation, so to allow thread-local collections we'd have to make
cast(shared) set a flag on the memory block to indicate that it's shared, and
vice-versa for unshared.  Then when a thread terminates, all blocks not flagged
as shared would be finalized, leaving the shared blocks alone.  Then any pool
from the terminated thread containing a shared block would have to be merged
into the global heap instead of released to the OS.

This is already unsafe anyway. The clean solution is either to allocate
the object as shared, then cast it to TL and back to shared if it makes
sense.

The second option is to clone the object.

Having a flag by object isn't a good idea IMO.

 I think we need to head in this direction anyway, because we need to make sure
that thread-local data is finalized by its owner thread.  A block's owner would
be whoever allocated the block or if cast to shared and back to unshared,
whichever thread most recently cast the block back to unshared.  Tracking the
owner of a block gives us the shared state implicitly, making thread-local
collections possible.  Who wants to work on this? :-)

Oct 22 2012
