www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Removing The Global GC Lock: Largest Plausible Number of Threads?

reply dsimcha <dsimcha yahoo.com> writes:
I'm thinking about ways to remove the global lock from the garbage 
collector for most small allocations.  I'm basically thinking of making 
the free lists thread local.  Every scheme I can come up with that 
doesn't require a radical overhaul of the current implementation 
requires every thread having a unique ID.  I want to do this as simply 
and efficiently as possible, preferably using dense integers.  Is it 
reasonable to assume that no program will ever need more than 2 ^^ 16 
thread (about 65,000) simultaneously so that I can store these indices 
as ushorts?  If a program creates a lot of short-lived threads, the 
indices will be recycled, so having a huge number of threads 
non-simultaneously is not a problem.
May 11 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.

I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
May 11 2011
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Am 12.05.2011 06:33, schrieb Jonathan M Davis:
 On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.

I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis

On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - Daniel
May 11 2011
parent Daniel Gibson <metalcaedes gmail.com> writes:
Am 12.05.2011 06:47, schrieb Daniel Gibson:
 Am 12.05.2011 06:33, schrieb Jonathan M Davis:
 On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.

I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis

On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - Daniel

Also: http://www.sgi.com/products/servers/altix/uv/configs.html You can actually buy systems with 2048 Cores (AMD64 architecture even) and, thanks to Hyperthreading, 4096 threads (that may run at the same time) that run Linux. I think it isn't *that* unrealistic to have more than ushort.max Threads on such a beast.
May 11 2011
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/11/2011 9:14 PM, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage collector
for
 most small allocations. I'm basically thinking of making the free lists thread
 local. Every scheme I can come up with that doesn't require a radical overhaul
 of the current implementation requires every thread having a unique ID. I want
 to do this as simply and efficiently as possible, preferably using dense
 integers. Is it reasonable to assume that no program will ever need more than 2
 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as
 ushorts? If a program creates a lot of short-lived threads, the indices will be
 recycled, so having a huge number of threads non-simultaneously is not a
problem.

I suggest one of two schemes: 1. Use thread local storage to store the free list for each thread. 2. Use a global free list, and use CAS to pull allocations off of it.
May 11 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-05-11 21:33, Jonathan M Davis wrote:
 On 2011-05-11 21:14, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations.  I'm basically thinking of making
 the free lists thread local.  Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID.  I want to do this as simply
 and efficiently as possible, preferably using dense integers.  Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts?  If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.

I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high.

Okay. I was wrong. Apparently, the limit on Linux is set by what's in /proc/sys/kernel/threads-max. On my system (64-bit Arch Linux), that's 256,245 - though that's system-wide, so not even the most thread-happy program can really get all that near that limit. Still, that's far above 65,000. I can't find as clear an answer for Windows though. It sounds like it doesn't technically have a limit, but you'll run out of memory eventually. I'm not sure if either Linux or Windows can actually get away with having as many as 65,000 threads at a time. And even if you can, it might be completely reasonable to say that druntime doesn't support more than that. That's a _lot_ of threads, and I find it hard to believe that any semi-reasonable D program would have that many. Something like erlang probably would, but I don't believe that they use full-on OS threads for their threads but rather do something more like fibers. So, technically, I'm not sure that you can assume that 65,000 threads is enough, but it certainly sounds to me like it would be reasonable to enforce that limit on D programs, even if the OS doesn't. - Jonathan M Davis
May 11 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/11/11 11:14 PM, dsimcha wrote:
 I'm thinking about ways to remove the global lock from the garbage
 collector for most small allocations. I'm basically thinking of making
 the free lists thread local. Every scheme I can come up with that
 doesn't require a radical overhaul of the current implementation
 requires every thread having a unique ID. I want to do this as simply
 and efficiently as possible, preferably using dense integers. Is it
 reasonable to assume that no program will ever need more than 2 ^^ 16
 thread (about 65,000) simultaneously so that I can store these indices
 as ushorts? If a program creates a lot of short-lived threads, the
 indices will be recycled, so having a huge number of threads
 non-simultaneously is not a problem.

http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux On my work machine: 1032845 Andrei
May 11 2011
prev sibling next sibling parent Kagamin <spam here.lot> writes:
dsimcha Wrote:

 I'm thinking about ways to remove the global lock from the garbage 
 collector for most small allocations.  I'm basically thinking of making 
 the free lists thread local.  Every scheme I can come up with that 
 doesn't require a radical overhaul of the current implementation 
 requires every thread having a unique ID.  I want to do this as simply 
 and efficiently as possible, preferably using dense integers.  Is it 
 reasonable to assume that no program will ever need more than 2 ^^ 16 
 thread (about 65,000) simultaneously so that I can store these indices 
 as ushorts?  If a program creates a lot of short-lived threads, the 
 indices will be recycled, so having a huge number of threads 
 non-simultaneously is not a problem.

version(Slim) { alias ushort threadid_t; } else { alias uint threadid_t; } ?
May 12 2011
prev sibling parent "JimBob" <jim bob.com> writes:
"dsimcha" <dsimcha yahoo.com> wrote in message 
news:iqfn3l$p2p$1 digitalmars.com...
 I'm thinking about ways to remove the global lock from the garbage 
 collector for most small allocations.  I'm basically thinking of making 
 the free lists thread local.

If you use lock free free lists you dont need one for every thread, you only need enough to reduce contention on the free lists, which in my experience is about 1x to 2x the number of cores you have.
May 13 2011