digitalmars.D.learn - How do I limit the number of active threads (queuing spawn calls)

Andrej Mitovic <none none.none> writes:
I'm testing out various compilation schemes with DMD. Right now I'm
spawning multiple threads which simply do a `system` call with a string like
"DMD -c somefile.d". I'd like to limit the number of active threads to
something my CPU can handle (4 in this case, since I've got 4 cores).

How do I go about doing this?

Here's the function which I spawn:
void compileObjfile(string name)
{
    shell(r"dmd -od" ~ r".\cache\" ~ r" -c -version=Unicode -version=WindowsNTonly"
        ~ r" -version=Windows2000 -version=WindowsXP -I..\ " ~ name ~ " ");
}

So I just need to pass the module name to it. The trouble is, if I spawn this
function inside a foreach loop, I'll inadvertently create a few dozen threads.
This hogs the system for a while. :) (although this does seem to create some
rather impressive compilation speeds, LOL!)

This is what the main function might look like:
void main()
{
    foreach (string name; dirEntries(curdir, SpanMode.shallow))
    {
        if (name.isfile && name.getExt == "d")
        {
            spawn(&compileObjfile, name);
        }
    }
}

Sidenotes: So I've tried compiling the win32 libraries via `DMD -lib`. DMD eats
up over 300 Megs of memory, and it's quite scary how fast that number grows. It
took over 25 seconds to compile a lib file.

On the other hand, compiling .obj files one by one by blocking a single thread
on system calls (in other words, single-threaded version), it takes about 15
seconds to create a library file. In each instantiation DMD wastes only about a
dozen or so Mbytes, maybe less.

When I spawn an unlimited number of threads via a foreach loop, again compiling
object-by-object, the lib file is generated in only 5(!) seconds. I'm running a
quad-core on XP32, btw.

So I'm a little perplexed, because according to Tomasz (maker of xfBuild) and
his various posts, compiling .obj by .obj file should apparently be really
really slow and -lib makes the fastest builds. But I'm getting the exact
opposite results.
Mar 26 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-03-26 18:15, Andrej Mitovic wrote:
 I'm testing out some various compilation schemes with DMD. Right now I'm
 spawning multiple threads which simply do a `system` call with a string
 like "DMD -c somefile.d". I'd like to limit the number of active threads
 to something my CPU can handle (4 in this case since I've got 4 cores..).
 
 How do I go about doing this?
 

I don't believe that std.concurrency has any way to manage the number of threads that are running. It gives you the means to communicate between threads and a nice way to spawn a thread, but it doesn't really do much with thread management. You could use core.thread.Thread.getAll to get an array of all of the Threads and spin until the number is below whatever threshold you want, but that's not terribly efficient, since you'd have a thread spinning, eating up CPU as it waits for the others to finish.

What I have done when I've wanted to do something like this is to have each spawned thread send a message back when it's done. Then, I increment a thread count when I spawn a thread and decrement it when I receive a message indicating that a thread has terminated. In the loop which is processing whatever list of things I want processed, it will only spawn a thread if the thread count is below the chosen threshold. Otherwise it sits there waiting to receive a message. So, it would do something like this:

foreach(string name; dirEntries(curdir, SpanMode.shallow))
{
    if(name.isfile && name.getExt == "d")
    {
        if(currThreads < maxThreads)
            receiveTimeout(1, recProc);
        else
            receive(recProc);

        spawn(&compileObjfile, name);
        ++currThreads;
    }
}

recProc is then a function which handles receiving messages, and it decrements currThreads when it receives the message that a thread has terminated.

std.concurrency does not manage threads. It only gives you tools for creating them and communicating between them. So, you need to manage the threads yourself if you want to manage them.

However, it should be noted that the task that you're looking to solve here may be better solved by std.parallelism, which David has been working on and which is currently under review on the main list.

- Jonathan M Davis
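The counting scheme described above can be sketched as a small self-contained module. This is only an illustration under stated assumptions: the names `Done`, `runCommand` and `runThrottled` are hypothetical, and it uses `executeShell` from the current std.process (which postdates this thread) in place of `shell`/`system`.

```d
import std.concurrency;
import std.process : executeShell;

// Message a worker sends to its owner when it has finished (hypothetical type).
struct Done { int status; }

// Worker: runs one command, then reports its exit status to the spawning thread.
void runCommand(string cmd)
{
    int status = -1; // reported if the command could not be started at all
    try
    {
        status = executeShell(cmd).status;
    }
    catch (Exception e) {}
    ownerTid.send(Done(status));
}

// Runs every command, keeping at most maxThreads workers alive at once.
// Returns the number of commands that finished with a non-zero status.
size_t runThrottled(string[] cmds, size_t maxThreads)
{
    size_t running, failed;

    // Blocks until one worker reports completion, then frees its slot.
    void waitForOne()
    {
        receive((Done d) { --running; if (d.status != 0) ++failed; });
    }

    foreach (cmd; cmds)
    {
        if (running >= maxThreads)
            waitForOne();
        spawn(&runCommand, cmd);
        ++running;
    }

    // Drain the remaining workers before returning.
    while (running > 0)
        waitForOne();
    return failed;
}
```

For the build in this thread, each element of `cmds` would be one "dmd -c ..." command line and `maxThreads` would be 4. Because the owner blocks in `receive` whenever all slots are taken, nothing spins while the compilers run.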
Mar 26 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Well, I've worked around this by polling a variable which holds the
number of active threads. It's not a pretty solution, and I'd probably
be better off using std.parallelism or some upcoming module. My
solution for now is:

import std.stdio;
import std.file;
import std.path;
import std.process;
import std.concurrency;
import core.thread;
import core.atomic;  // provides atomicOp

shared int threadsCount;

void compileObjfile(string name)
{
    system(r"dmd -od" ~ r".\cache\" ~ r" -c -version=Unicode -version=WindowsNTonly"
        ~ r" -version=Windows2000 -version=WindowsXP -I..\ " ~ name ~ " ");
    atomicOp!"-="(threadsCount, 1);
}

int main()
{
    string libfileName = r".\cache\win32.lib ";
    string objFiles;
    foreach (string name; dirEntries(curdir, SpanMode.shallow))
    {
        if (name.isfile && name.basename.getName != "build" &&
            (name.getExt == "d" || name.getExt == "di"))
        {
            string objfileName = r".\cache\" ~ name.basename.getName ~ ".obj";
            objFiles ~= objfileName ~ " ";

            atomicOp!"+="(threadsCount, 1);
            while (threadsCount > 3)
            {
                Thread.sleep(dur!("msecs")(1));
            }
            spawn(&compileObjfile, name);
        }
    }

    while (threadsCount)
    {
        Thread.sleep(dur!("msecs")(1));  // wait for threads to finish before call to lib
    }
    system(r"lib -c -n -p64 " ~ libfileName ~ objFiles);

    return 0;
}

The timing:

D:\dev\projects\win32\win32>timeit build
Digital Mars Librarian Version 8.02n
Copyright (C) Digital Mars 2000-2007 All Rights Reserved
http://www.digitalmars.com/ctg/lib.html
Digital Mars Librarian complete.

Version Number:   Windows NT 5.1 (Build 2600)
Exit Time:        3:49 am, Sunday, March 27 2011
Elapsed Time:     0:00:06.437
Process Time:     0:00:00.062
System Calls:     627101
Context Switches: 123883
Page Faults:      734997
Bytes Read:       93800813
Bytes Written:    7138927
Bytes Other:      1043652

So about 6.5 seconds. Now compare this to the following build script, which
simply invokes DMD with -lib and all the modules:

import std.stdio;
import std.process;
import std.path;
import std.file;

void main()
{
    string files;
    foreach (string name; dirEntries(curdir, SpanMode.shallow))
    {
        if (name.isfile && name.basename.getName != "build" &&
            name.getExt == "d")
            files ~= name ~ " ";
    }

    system(r"dmd -lib -I..\ -version=Unicode -version=WindowsNTonly"
        ~ r" -version=Windows2000 -version=WindowsXP " ~ files);
}

D:\dev\projects\win32\win32>timeit build.exe

Version Number:   Windows NT 5.1 (Build 2600)
Exit Time:        3:54 am, Sunday, March 27 2011
Elapsed Time:     0:00:25.750
Process Time:     0:00:00.015
System Calls:     139172
Context Switches: 44648
Page Faults:      87440
Bytes Read:       7427284
Bytes Written:    7413372
Bytes Other:      45798

Compiling object by object is almost exactly 4 times faster with
threading than using -lib on all module files. And my multithreaded
script is probably wasting some time by calling Thread.sleep(), but
I'm new to threading and I don't know how else to limit the number of
threads.
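One way to limit the thread count without calling Thread.sleep() is a counting semaphore. The sketch below is not what the script above does: it swaps std.concurrency's spawn for core.thread.Thread plus core.sync.semaphore.Semaphore, and the names `makeWorker` and `runLimited` are hypothetical. It also assumes `executeShell` from the current std.process.

```d
import core.sync.semaphore : Semaphore;
import core.thread : Thread;
import std.process : executeShell;

// Builds one worker thread. A separate function is used so that every
// delegate captures its own copy of cmd and result.
Thread makeWorker(Semaphore slots, string cmd, int* result)
{
    return new Thread({
        scope (exit) slots.notify(); // hand the permit back, even on failure
        try
            *result = executeShell(cmd).status;
        catch (Exception)
            *result = -1;            // the command could not be started
    });
}

// Runs each command on its own thread, with at most maxThreads running at once.
// Returns the exit status of every command, in input order.
int[] runLimited(string[] cmds, uint maxThreads)
{
    auto slots = new Semaphore(maxThreads); // one permit per allowed thread
    auto statuses = new int[cmds.length];
    Thread[] workers;

    foreach (i, cmd; cmds)
    {
        slots.wait();                // blocks, without polling, until a permit is free
        auto t = makeWorker(slots, cmd, &statuses[i]);
        t.start();
        workers ~= t;
    }

    foreach (t; workers)
        t.join();                    // all commands are done before the lib step
    return statuses;
}
```

The final join loop plays the role of the `while (threadsCount)` wait: once `runLimited` returns, every object file has been written and the librarian can be invoked.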
Mar 26 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Edit: It looks like I did almost the same as Jonathan advised.

I'm looking forward to std.parallelism though. I'm thinking I'd
probably use some kind of parallel foreach loop that iterates over 4
files at once, and letting it do its work by spawning 4 threads. Or
something like that. We'll see.
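With std.parallelism as it later shipped in Phobos, a parallel foreach of this kind might look roughly like the sketch below. The function name `runParallel` is hypothetical, and `executeShell` is assumed from the current std.process. As far as I understand the module, the thread that runs the parallel foreach also executes work units, so up to `workerThreads + 1` commands can be in flight at once.

```d
import std.parallelism : TaskPool;
import std.process : executeShell;

// Runs all commands on a dedicated pool and returns each exit status,
// in input order.
int[] runParallel(string[] cmds, size_t workerThreads)
{
    auto pool = new TaskPool(workerThreads);
    scope (exit) pool.finish(); // let the pool's threads exit when we are done

    auto statuses = new int[cmds.length];
    // A work unit size of 1 hands out one command at a time.
    foreach (i, cmd; pool.parallel(cmds, 1))
        statuses[i] = executeShell(cmd).status;
    return statuses;
}
```

On a quad-core machine, passing 3 for `workerThreads` would therefore keep about four compiler processes busy. The loop only finishes after every command has completed, so no separate wait is needed before the lib step.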
Mar 26 2011
Brad Roberts <braddr puremagic.com> writes:
On 3/26/2011 7:00 PM, Andrej Mitrovic wrote:
 Edit: It looks like I did almost the same as Jonathan advised.
 
 I'm looking forward to std.parallelism though. I'm thinking I'd
 probably use some kind of parallel foreach loop that iterates over 4
 files at once, and letting it do its work by spawning 4 threads. Or
 something like that. We'll see.

The way I've typically done this sort of pattern is with a thread pool that gets its work from a queue. The main thread shoves work into the queue and then calls a .join or .waitForEmpty sort of API on the pool. So it'd look something like:

void workerFunc(string str) { ... }

auto tp = new ThreadPool(getNumCpus(), &workerFunc);
foreach(...)
    tp.push(str);
tp.join();

This can suffer from queue size problems if the amount of work is very large, but that's not a problem for the vast majority of the cases I've had, so I never worried about making push capable of blocking or otherwise throttling the producer side.
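The ThreadPool and getNumCpus names above are pseudocode. A rough equivalent of the same push-then-join pattern, using the TaskPool that std.parallelism provides in current Phobos, might look like this sketch. The names `runOne` and `runQueued` are hypothetical, and `executeShell` is assumed from the current std.process.

```d
import std.algorithm : map;
import std.array : array;
import std.parallelism : TaskPool, task;
import std.process : executeShell;

// The unit of work: run one command and return its exit status.
int runOne(string cmd)
{
    return executeShell(cmd).status;
}

// Queues one task per command on a pool of nThreads workers, waits for the
// queue to drain, and returns the exit statuses in input order.
int[] runQueued(string[] cmds, size_t nThreads)
{
    auto pool = new TaskPool(nThreads);
    auto tasks = cmds.map!(c => task!runOne(c)).array;

    foreach (t; tasks)
        pool.put(t);      // push work onto the pool's queue

    pool.finish(true);    // blocking: returns once every queued task has run
    return tasks.map!(t => t.yieldForce).array;
}
```

As in the description above, the producer side is not throttled: every task is queued up front, which is fine for a few dozen modules but would need rethinking for a very large work list.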
Mar 26 2011