digitalmars.D - Removing The Global GC Lock: Largest Plausible Number of Threads?
- dsimcha <dsimcha yahoo.com> May 11 2011
- Jonathan M Davis <jmdavisProg gmx.com> May 11 2011
- Daniel Gibson <metalcaedes gmail.com> May 11 2011
- Daniel Gibson <metalcaedes gmail.com> May 11 2011
- Walter Bright <newshound2 digitalmars.com> May 11 2011
- Jonathan M Davis <jmdavisProg gmx.com> May 11 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> May 11 2011
- Kagamin <spam here.lot> May 12 2011
- "JimBob" <jim bob.com> May 13 2011
I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
May 11 2011
On 2011-05-11 21:14, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
May 11 2011
Am 12.05.2011 06:33, schrieb Jonathan M Davis:On 2011-05-11 21:14, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - Daniel
May 11 2011
Am 12.05.2011 06:47, schrieb Daniel Gibson:Am 12.05.2011 06:33, schrieb Jonathan M Davis:On 2011-05-11 21:14, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high. - Jonathan M Davis
On Windows you get a DWORD (32bit int AFAIK) from GetCurrentThreadId(), so that seems to be a technical limit there. On Linux there's pthread_t which seems to be a 32bit uint on x86 and 64bit ulong on amd64. However, see: http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux It seems like it's limited by the available stack space (especially on 32bit systems) Similar on Windows: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx Cheers, - Daniel
Also: http://www.sgi.com/products/servers/altix/uv/configs.html You can actually buy systems with 2048 Cores (AMD64 architecture even) and, thanks to Hyperthreading, 4096 threads (that may run at the same time) that run Linux. I think it isn't *that* unrealistic to have more than ushort.max Threads on such a beast.
May 11 2011
On 5/11/2011 9:14 PM, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
I suggest one of two schemes: 1. Use thread local storage to store the free list for each thread. 2. Use a global free list, and use CAS to pull allocations off of it.
May 11 2011
On 2011-05-11 21:33, Jonathan M Davis wrote:On 2011-05-11 21:14, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
I don't think that you can legally create that many threads on a typical OS. I'd have to check, but as I recall, the typical limit is much lower than that - still in the tens of thousands, I think, but not that high.
Okay. I was wrong. Apparently, the limit on Linux is set by what's in /proc/sys/kernel/threads-max. On my system (64-bit Arch Linux), that's 256,245 - though that's system-wide, so not even the most thread-happy program can really get all that near that limit. Still, that's far above 65,000. I can't find as clear an answer for Windows though. It sounds like it doesn't technically have a limit, but you'll run out of memory eventually. I'm not sure if either Linux or Windows can actually get away with having as many as 65,000 threads at a time. And even if you can, it might be completely reasonable to say that druntime doesn't support more than that. That's a _lot_ of threads, and I find it hard to believe that any semi-reasonable D program would have that many. Something like erlang probably would, but I don't believe that they use full-on OS threads for their threads but rather do something more like fibers. So, technically, I'm not sure that you can assume that 65,000 threads is enough, but it certainly sounds to me like it would be reasonable to enforce that limit on D programs, even if the OS doesn't. - Jonathan M Davis
May 11 2011
On 5/11/11 11:14 PM, dsimcha wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux On my work machine: 1032845 Andrei
May 11 2011
dsimcha Wrote:I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local. Every scheme I can come up with that doesn't require a radical overhaul of the current implementation requires every thread having a unique ID. I want to do this as simply and efficiently as possible, preferably using dense integers. Is it reasonable to assume that no program will ever need more than 2 ^^ 16 thread (about 65,000) simultaneously so that I can store these indices as ushorts? If a program creates a lot of short-lived threads, the indices will be recycled, so having a huge number of threads non-simultaneously is not a problem.
version(Slim) { alias ushort threadid_t; } else { alias uint threadid_t; } ?
May 12 2011
"dsimcha" <dsimcha yahoo.com> wrote in message news:iqfn3l$p2p$1 digitalmars.com...I'm thinking about ways to remove the global lock from the garbage collector for most small allocations. I'm basically thinking of making the free lists thread local.
If you use lock free free lists you dont need one for every thread, you only need enough to reduce contention on the free lists, which in my experience is about 1x to 2x the number of cores you have.
May 13 2011









Daniel Gibson <metalcaedes gmail.com> 