digitalmars.D - parallel optimizations based on number of memory controllers vs cpus

"Jay Norwood" <jayn prismnet.com> writes:
According to the documentation, the default std.parallelism 
thread pool size is the number of CPUs minus 1.  When I was 
testing some concurrent vs. thread pool parallel implementations, 
I saw improvements in the concurrent version up to about 
14 threads.  I didn't try to figure out how to change the 
thread pool size.

While reading the article linked below, I noticed someone who 
reported similar improvements up to 14 threads on memory-related 
operations and explained it by the number of memory controllers 
being the limiting factor.  See his item number 4, where 
significant gains were made in memory processing up to 14 threads.

So, I wonder if it wouldn't be good to have a couple of different 
built-in threadpool types ... one meant for memory operations, 
and one primarily for cpu crunching ... with different sizes.

http://stackoverflow.com/questions/4260602/how-to-increase-performance-of-memcpy
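
A minimal sketch of the two-pool idea, assuming std.parallelism's TaskPool class is used directly. The pool sizes, the chunk size, and the helper names parallelCopy and parallelSqrt are illustrative assumptions, not something from the thread or the library; 14 is only the figure reported above, not a tuned value.

```d
import std.algorithm : max, min;
import std.math : sqrt;
import std.parallelism : TaskPool, totalCPUs;
import std.range : iota;
import std.stdio : writeln;

// Memory-bound work: copy src in fixed-size chunks, one chunk per work item.
// (Hypothetical helper, named here for illustration only.)
ubyte[] parallelCopy(TaskPool pool, const(ubyte)[] src, size_t chunk = 1 << 16)
{
    auto dst = new ubyte[](src.length);
    foreach (start; pool.parallel(iota(size_t(0), src.length, chunk)))
    {
        immutable end = min(start + chunk, src.length);
        dst[start .. end] = src[start .. end];
    }
    return dst;
}

// CPU-bound work: fill an array with square roots of the indices.
// (Hypothetical helper, named here for illustration only.)
double[] parallelSqrt(TaskPool pool, size_t n)
{
    auto values = new double[](n);
    foreach (i, ref v; pool.parallel(values))
        v = sqrt(cast(double) i);
    return values;
}

void main()
{
    // Assumed sizes: a larger pool for memory traffic, CPUs-1 for computation.
    auto memoryPool = new TaskPool(14);
    auto cpuPool = new TaskPool(max(1u, totalCPUs - 1));

    // Pools created with `new` are not daemon pools, so finish them
    // explicitly or the program will not terminate.
    scope (exit)
    {
        memoryPool.finish(true);
        cpuPool.finish(true);
    }

    auto src = new ubyte[](1 << 24);
    auto copy = parallelCopy(memoryPool, src);
    auto roots = parallelSqrt(cpuPool, 1_000_000);
    writeln(copy.length, " bytes copied, ", roots.length, " roots computed");
}
```

Whether the larger pool actually helps depends on the hardware and on how memory-bound the work is, so the sizes would need to be measured rather than assumed.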
Mar 23 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On program startup, do:

ThreadPool.defaultPoolThreads(14); // or 13
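
For reference, a minimal sketch of that suggestion as a complete program. It assumes the std.parallelism API as currently documented, where defaultPoolThreads is a module-level property rather than a member of a ThreadPool class, and the pool class itself is named TaskPool. The helper name startDefaultPool is hypothetical.

```d
import std.parallelism : defaultPoolThreads, taskPool, totalCPUs;
import std.stdio : writeln;

// Hypothetical helper: set the default pool size, then force the pool
// to be created and report how many worker threads it has.
size_t startDefaultPool(uint workers)
{
    // The setter only takes effect if it runs before the first use of
    // taskPool; an already-created default pool keeps its size.
    defaultPoolThreads = workers;
    return taskPool.size;
}

void main()
{
    writeln("logical CPUs:   ", totalCPUs);
    writeln("worker threads: ", startDefaultPool(13)); // 13 or 14, per the thread
}
```

The call belongs at the very start of main because the default pool is created lazily on first access.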
Mar 23 2012
"Jay Norwood" <jayn prismnet.com> writes:
On Friday, 23 March 2012 at 13:56:09 UTC, Timon Gehr wrote:
 On program startup, do:

 ThreadPool.defaultPoolThreads(14); // or 13

Yes, thank you. I just tried adding that. The gains aren't scalable in this particular test, which is apparently dominated by CPU processing, but even here you can see incremental improvements at 13 vs. 7 threads on all the numbers. I'd probably have to identify operations that were being limited by memory accesses in order to see the type of gains stated in that other app.

This is with the default 7 threads:

finished wcp_wcPointer! time: 98 ms
finished wcp_wcCtRegex! time: 1300 ms
finished wcp_wcRegex! time: 2946 ms
finished wcp_wcRegex2! time: 2687 ms
finished wcp_wcSlices! time: 157 ms
finished wcp_wcStdAscii! time: 225 ms

This is processing the same data with 1 thread:

finished wcp_wcPointer! time: 188 ms
finished wcp_wcCtRegex! time: 2219 ms
finished wcp_wcRegex! time: 5951 ms
finished wcp_wcRegex2! time: 5502 ms
finished wcp_wcSlices! time: 318 ms
finished wcp_wcStdAscii! time: 446 ms

And this is processing the same data with 13 threads:

finished wcp_wcPointer! time: 93 ms
finished wcp_wcCtRegex! time: 1110 ms
finished wcp_wcRegex! time: 2531 ms
finished wcp_wcRegex2! time: 2321 ms
finished wcp_wcSlices! time: 136 ms
finished wcp_wcStdAscii! time: 200 ms

These were from the tests uploaded at https://github.com/jnorwood/wc_test. The only change to the uploaded program is adding the suggested defaultPoolThreads(13) call at the start of main to change the ThreadPool default thread count.
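
To put the timings above in terms of speedup over the single-thread run, a small sketch like the following could be used. The numbers are copied from this post; the Timing struct and the speedup function are hypothetical names introduced only for the illustration.

```d
import std.stdio : writefln;

// Timings in milliseconds, copied from the post above.
struct Timing
{
    string name;
    double t1;  // 1 thread
    double t7;  // default 7 threads
    double t13; // 13 threads
}

immutable Timing[] timings = [
    Timing("wcPointer", 188, 98, 93),
    Timing("wcCtRegex", 2219, 1300, 1110),
    Timing("wcRegex", 5951, 2946, 2531),
    Timing("wcRegex2", 5502, 2687, 2321),
    Timing("wcSlices", 318, 157, 136),
    Timing("wcStdAscii", 446, 225, 200),
];

// Speedup relative to a baseline: values above 1 mean the run was faster.
double speedup(double baseline, double t)
{
    return baseline / t;
}

void main()
{
    foreach (t; timings)
        writefln("%-12s 7 threads: %.2fx   13 threads: %.2fx",
                 t.name, speedup(t.t1, t.t7), speedup(t.t1, t.t13));
}
```

Every test comes out at roughly 2x or a little more with either pool size, which is consistent with the remark that the gains here are incremental rather than scalable.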
Mar 26 2012