www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Removing The Global GC Lock: Largest Plausible Number of Threads?

reply dsimcha <dsimcha yahoo.com> writes:
I'm thinking about ways to remove the global lock from the garbage 
collector for most small allocations.  I'm basically thinking of making 
the free lists thread local.  Every scheme I can come up with that 
doesn't require a radical overhaul of the current implementation 
requires every thread having a unique ID.  I want to do this as simply 
and efficiently as possible, preferably using dense integers.  Is it 
reasonable to assume that no program will ever need more than 2 ^^ 16 
thread (about 65,000) simultaneously so that I can store these indices 
as ushorts?  If a program creates a lot of short-lived threads, the 
indices will be recycled, so having a huge number of threads 
non-simultaneously is not a problem.
May 11 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
May 11 2011
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Am 12.05.2011 06:33, schrieb Jonathan M Davis:
 On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - Daniel
May 11 2011
parent Daniel Gibson <metalcaedes gmail.com> writes:
Am 12.05.2011 06:47, schrieb Daniel Gibson:
 Am 12.05.2011 06:33, schrieb Jonathan M Davis:
 On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - Daniel
Also: http://www.sgi.com/products/servers/altix/uv/configs.html You can actually buy systems with 2048 Cores (AMD64 architecture even) and, thanks to Hyperthreading, 4096 threads (that may run at the same time) that run Linux. I think it isn't *that* unrealistic to have more than ushort.max Threads on such a beast.
May 11 2011
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/11/2011 9:14 PM, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage collector
for
 most small allocations. I'm basically thinking of making the free lists thread
 local. Every scheme I can come up with that doesn't require a radical overhaul
 of the current implementation requires every thread having a unique ID. I want
 to do this as simply and efficiently as possible, preferably using dense
 integers. Is it reasonable to assume that no program will ever need more than 2
 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as
 ushorts? If a program creates a lot of short-lived threads, the indices will be
 recycled, so having a huge number of threads non-simultaneously is not a
problem.
I suggest one of two schemes: 1. Use thread local storage to store the free list for each thread. 2. Use a global free list, and use CAS to pull allocations off of it.
May 11 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-05-11 21:33, Jonathan M Davis wrote:
 On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high.
Okay. I was wrong. Apparently, the limit on Linux is set by what's in /proc/sys/kernel/threads-max. On my system (64-bit Arch Linux), that's 256,245 - though that's system-wide, so not even the most thread-happy program can really get all that near that limit. Still, that's far above 65,000. I can't find as clear an answer for Windows though. It sounds like it doesn't technically have a limit, but you'll run out of memory eventually. I'm not sure if either Linux or Windows can actually get away with having as many as 65,000 threads at a time. And even if you can, it might be completely reasonable to say that druntime doesn't support more than that. That's a _lot_ of threads, and I find it hard to believe that any semi-reasonable D program would have that many. Something like erlang probably would, but I don't believe that they use full-on OS threads for their threads but rather do something more like fibers. So, technically, I'm not sure that you can assume that 65,000 threads is enough, but it certainly sounds to me like it would be reasonable to enforce that limit on D programs, even if the OS doesn't. - Jonathan M Davis
May 11 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/11/11 11:14 PM, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations. I'm basically thinking of making
 the free lists thread local. Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID. I want to do this as simply
 and efficiently as possible, preferably using dense integers. Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts? If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.
http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux On my work machine: 1032845 Andrei
May 11 2011
prev sibling next sibling parent Kagamin <spam here.lot> writes:
dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the garbage 
 collector for most small allocations.  I'm basically thinking of making 
 the free lists thread local.  Every scheme I can come up with that 
 doesn't require a radical overhaul of the current implementation 
 requires every thread having a unique ID.  I want to do this as simply 
 and efficiently as possible, preferably using dense integers.  Is it 
 reasonable to assume that no program will ever need more than 2 ^^ 16 
 thread (about 65,000) simultaneously so that I can store these indices 
 as ushorts?  If a program creates a lot of short-lived threads, the 
 indices will be recycled, so having a huge number of threads 
 non-simultaneously is not a problem.
version(Slim) { alias ushort threadid_t; } else { alias uint threadid_t; } ?
May 12 2011
prev sibling next sibling parent "JimBob" <jim bob.com> writes:
"dsimcha" <dsimcha yahoo.com> wrote in message 
news:iqfn3l$p2p$1 digitalmars.com...
 I'm thinking about ways to remove the global lock from the garbage 
 collector for most small allocations.  I'm basically thinking of making 
 the free lists thread local.
If you use lock free free lists you dont need one for every thread, you only need enough to reduce contention on the free lists, which in my experience is about 1x to 2x the number of cores you have.
May 13 2011
prev sibling parent reply Kagamin <spam here.lot> writes:
dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the garbage 
 collector for most small allocations.  I'm basically thinking of making 
 the free lists thread local.
http://www.liblfds.org/ ?
May 13 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 5/13/2011 6:12 AM, Kagamin wrote:
 dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.
http://www.liblfds.org/ ?
Surprisingly, lock-free free lists wouldn't solve the problem, since the GC flag bookkeeping would still require locking unless it is completely rewritten.
May 13 2011
parent reply "JimBob" <jim bob.com> writes:
"dsimcha" <dsimcha yahoo.com> wrote in message 
news:iqj8ba$18sc$1 digitalmars.com...
 On 5/13/2011 6:12 AM, Kagamin wrote:
 dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.
http://www.liblfds.org/ ?
Surprisingly, lock-free free lists wouldn't solve the problem, since the GC flag bookkeeping would still require locking unless it is completely rewritten.
If the flag needs to be atomicaly updated and thread local free lists solve that problem yet global lock free free lists dont, the implication is that the flag is only ever accessed from the one thread. And that implies that memory cant be allocated in one thread and freed in another?
May 13 2011
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from JimBob (jim bob.com)'s article
 "dsimcha" <dsimcha yahoo.com> wrote in message
 news:iqj8ba$18sc$1 digitalmars.com...
 On 5/13/2011 6:12 AM, Kagamin wrote:
 dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.
http://www.liblfds.org/ ?
Surprisingly, lock-free free lists wouldn't solve the problem, since the GC flag bookkeeping would still require locking unless it is completely rewritten.
If the flag needs to be atomicaly updated and thread local free lists solve that problem yet global lock free free lists dont, the implication is that the flag is only ever accessed from the one thread. And that implies that memory cant be allocated in one thread and freed in another?
There's clearly been some misunderstanding here because I haven't explained in detail what my idea was. First, the GC would still be locked when sweeping or manually freeing. Second, every page would be owned by a single thread. The way the GCBits are laid out, this means the flags issue would become a non-issue because every integer that makes up the bit array only tracks flags for one page. Secondly, I'm concerned about two things if I get too fancy with lock-free/atomic stuff: 1. It might plausibly end up being substantially slower than the current version in single-threaded programs or programs where there's not much contention. 2. I don't want to do anything too clever. I learned the hard way by implementing std.parallelism that: non-trivial manual memory management + clever concurrency in the same code == nightmarish to debug
May 13 2011
prev sibling parent reply liblfds-admin <admin remove-antispam.liblfds.org> writes:
On Friday, 13 May 2011 at 10:12:24 UTC, Kagamin wrote:
 dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the 
 garbage collector for most small allocations.  I'm basically 
 thinking of making the free lists thread local.
http://www.liblfds.org/ ?
Release 7.0.0 of liblfds has just been published.
Dec 30 2015
parent reply extrawurst <stephan extrawurst.org> writes:
On Wednesday, 30 December 2015 at 23:55:25 UTC, liblfds-admin 
wrote:
 On Friday, 13 May 2011 at 10:12:24 UTC, Kagamin wrote:
 dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the 
 garbage collector for most small allocations.  I'm basically 
 thinking of making the free lists thread local.
http://www.liblfds.org/ ?
Release 7.0.0 of liblfds has just been published.
Still no osx support ? or is it just not tested ? -- Stephan
Jan 03 2016
parent libfds-admin <admin nospamatall.liblfds.org> writes:
On Sunday, 3 January 2016 at 18:32:23 UTC, extrawurst wrote:
 On Wednesday, 30 December 2015 at 23:55:25 UTC, liblfds-admin
 Release 7.0.0 of liblfds has just been published.
Still no osx support ? or is it just not tested ?
Well, no build files are provided for out-of-the-box use, because I do not have an OSX platform to make them on. The library itself will work fine on OSX of course. It shouldn't be onerous to make the build files for it - there's nothing special that has to be done. Just load them all up into a project and compile. I'd very much like to support OSX and it's on my mind, but it's not reasonable to carry a Mac around the world with me for that.
Jan 03 2016