digitalmars.D.learn - First experience with Threads

"Era Scarecrow" <rtcvb32 yahoo.com> writes:
  Just a little experience and perhaps some help on the subject. 
This is partly a repost from another forum, too. I've always 
seen how much of an annoyance threading was when trying to follow 
along (the API alone), but programming it is even more annoying. 
I've never actually done multi-threaded programming, so this is a 
first for me.


  First, the problem. Loading up a data structure (that's fairly 
big) can take a fair amount of time, but if the records and 
structures never need to touch each other, there's no reason they 
cannot be handled on separate cores/threads (or that's my logic 
on it, anyway).


  In order to try and use more cores, I've split the loading and 
unpacking into separate stages. So first, within half a second, 
all 80 MB of data is in memory and all the records are separated. 
Now that they are separated, they can all be unpacked by the 
different cores.

  Part of the problem is when the thread activates: just because 
you start a thread doesn't mean it runs right away (it will run 
when it's ready), and any data it still relies on via a delegate 
becomes volatile (at least in VisualD), since that data may 
change. So...

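[code]
   import core.thread : Thread;

   class Record {
     //and stuff
     void loadSubRecords() { /* unpack this record's sub-records */ }
   }
   Record[] recordList; //and stuff

   void startAll() {
     foreach(rec; recordList) {
       // the delegate refers to the loop variable rec itself, so by
       // the time the thread runs, rec may already hold a later record
       Thread th = new Thread( () { rec.loadSubRecords(); } );
       th.start();
     }
   }
[/code]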

  rec (and even ref rec) may change at any time (worse, during its 
update or before the thread starts). So if we switch to copying an 
index instead, it does improve a bit. So long as the value is 
copied before the next iteration of the foreach loop it's fine; 
otherwise i may still change and it may do something unwanted.

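[code]
   foreach(i, rec; recordList) {
     Thread th = new Thread( ()
        {
          // i is only copied once the thread actually runs, so the
          // loop may already have moved on to another index by then
          size_t index = i; // foreach index is size_t, not int
          recordList[index].loadSubRecords();
        });
     th.start();
   }
[/code]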
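A minimal sketch of two ways to bind the record when the thread is created rather than when it runs. At least with the compilers of the time, a delegate created inside a foreach body refers to the loop variable itself, so each thread may see whatever value the loop holds when it finally gets scheduled. Taking `&rec.loadSubRecords` builds a delegate whose context is the object, and a small helper (`makeJob`, a hypothetical name) gives each delegate its own closure. `Record` here is a hypothetical stand-in that only counts its calls.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread, thread_joinAll;

// Hypothetical stand-in for the post's Record class: instead of
// unpacking sub-records it just counts how often it was called.
class Record {
    shared int timesLoaded;
    void loadSubRecords() { atomicOp!"+="(timesLoaded, 1); }
}

// &rec.loadSubRecords is a delegate whose context pointer is the
// object itself, so the reference is fixed when the Thread is created.
void startAllBound(Record[] recordList) {
    foreach (rec; recordList) {
        auto th = new Thread(&rec.loadSubRecords);
        th.start();
    }
    thread_joinAll(); // wait for every non-daemon thread
}

// When the job needs more than one call, build the delegate in a helper:
// each call to makeJob gets its own closure with its own copy of rec.
void delegate() makeJob(Record rec) {
    return () { rec.loadSubRecords(); };
}

void startAllViaHelper(Record[] recordList) {
    foreach (rec; recordList) {
        auto th = new Thread(makeJob(rec));
        th.start();
    }
    thread_joinAll();
}

void main() {
    auto records = new Record[8];
    foreach (ref r; records) r = new Record;
    startAllBound(records);
    startAllViaHelper(records);
    foreach (r; records)
        assert(atomicLoad(r.timesLoaded) == 2); // once per pass
}
```

Either way no mutex is needed, because each thread still only touches its own record.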

Several other combinations came up. I think I found an easy way 
to handle it without adding in unneeded mutexes and whatnot. What 
seems to work is to pack all the data the job needs into a 
structure and have that structure start the thread (from inside); 
then the chances of the problem happening go away (hopefully 
completely).

[code]
   //or something similar
   import core.thread : Thread, thread_joinAll;
   struct Packed {
     Thread thread;
     Record record;
     void run() {
       assert(record);
       thread = new Thread( (){record.loadSubRecords();} );
       thread.start();
     }
   }

   //bad way of thread handling, but makes sense.
   Packed[] obj;
   obj.length = recordList.length;

   foreach(i, rec; recordList) {
     obj[i].record = rec; //class is reference type remember
     obj[i].run(); //returns right away, but thread is running too
   }
   thread_joinAll(); //waits for all non-daemon threads to finish
[/code]

  So long as the records (and subrecords) never touch each other, 
mutexes and semaphores aren't needed 90% of the time.

  Now, since the record count in the original file is 40k, having 
40k threads is not only dumb but also expensive to set up. So 
instead I set up job groups.

[code]
   struct PackedList {
     Thread thread;
     Record[] recordList;

     void runWork() {
       foreach(rec; recordList)
         rec.loadSubRecords();
     }

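     void run() {
       assert(recordList.length > 0);
       // the delegate holds a pointer to this struct, so the PackedList
       // must stay where it is (e.g. in an array that isn't resized)
       // until its thread has finished
       thread = new Thread( (){this.runWork();} );
       thread.start();
     }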
   }
[/code]

  With this basic idea, drop a thousand in one PackedList and 
start it, then grab another thousand and drop them into another 
PackedList. They'll run until their workload is done.
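A minimal sketch of the batching step, assuming `std.range.chunks` is used to cut the record list into slices of a thousand. It uses a small class (`RecordBatch`, a hypothetical name) instead of a struct so the thread's delegate points at a heap object that cannot move; `Record` is again a hypothetical stand-in.

```d
import core.thread : Thread, thread_joinAll;
import std.range : chunks;

// Hypothetical stand-in for the post's Record class.
class Record {
    bool loaded;
    void loadSubRecords() { loaded = true; }
}

// Same idea as PackedList: one thread works through one slice.
// A class keeps the job on the heap, so &job.runWork stays valid.
class RecordBatch {
    private Record[] batch;
    this(Record[] batch) { this.batch = batch; }
    void runWork() {
        foreach (rec; batch)
            rec.loadSubRecords();
    }
}

// Split recordList into slices of at most batchSize records and start
// one thread per slice; returns how many threads were started.
size_t loadInBatches(Record[] recordList, size_t batchSize) {
    size_t started;
    foreach (batch; recordList.chunks(batchSize)) {
        auto job = new RecordBatch(batch);
        auto th = new Thread(&job.runWork);
        th.start();
        ++started;
    }
    thread_joinAll(); // every batch is done after this returns
    return started;
}

void main() {
    import std.algorithm.searching : all;
    auto records = new Record[40_000];
    foreach (ref r; records) r = new Record;
    assert(loadInBatches(records, 1_000) == 40);
    assert(records.all!(r => r.loaded));
}
```

Each slice is disjoint from the others, so the "records never touch each other" rule still holds per thread.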

  Is there a suggested magic number for how many threads per core 
you should use? If you have, say, a quad core, you can have 4 
threads going (obviously), but if they go to sleep waiting on 
system resources or something (loading a file, saving, something 
else), then the core may sit unused. It makes sense to have 2 per 
core, since then if one goes idle the core has another it can pick 
up. I'm guessing 2-4 would be the number of threads for this type 
of work.
Oct 06 2012
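There isn't a universal number, so one way to answer it is to size a `std.parallelism` `TaskPool` from `totalCPUs` and measure. A minimal sketch, where `workersPerCore` is a hypothetical parameter to tune: roughly one worker per core usually suits CPU-bound unpacking, and extra workers tend to pay off only when jobs spend time blocked on I/O. `Record` is a hypothetical stand-in.

```d
import std.parallelism : TaskPool, totalCPUs;

// Hypothetical stand-in for the post's Record class.
class Record {
    bool loaded;
    void loadSubRecords() { loaded = true; }
}

// workersPerCore is the knob the question is about.
void loadWithPool(Record[] records, uint workersPerCore) {
    auto pool = new TaskPool(totalCPUs * workersPerCore);
    scope (exit) pool.finish(true); // wait for workers, then shut down
    foreach (ref rec; pool.parallel(records))
        rec.loadSubRecords();
}

void main() {
    import std.stdio : writeln;
    writeln("logical cores: ", totalCPUs);
    auto records = new Record[10_000];
    foreach (ref r; records) r = new Record;
    loadWithPool(records, 2);
    foreach (r; records) assert(r.loaded);
}
```

Timing the same load with `workersPerCore` set to 1, 2 and 4 is a cheap way to settle the question for a given machine and workload.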
"Ali Çehreli" <acehreli yahoo.com> writes:
On 10/06/2012 06:17 AM, Era Scarecrow wrote:
> if the records and structures
> never need to touch each other, there's no reason they cannot be handled
> on separate cores/threads (or that's my logic on it anyways).
Have you considered std.parallelism? If you can represent the data as a slice, then a parallel foreach loop on that data is all you need:

    foreach (data; parallel(dataSlice)) {
        // ... each data will be handled individually in parallel ...
    }

There is the following chapter about that module, which covers most of std.parallelism:

  http://ddili.org/ders/d.en/parallelism.html

Even though I have made a second pass to include the apparently-newly-added features, there are some features of std.parallelism that are missing in the chapter.

Although you don't seem to need it, there is also message passing concurrency:

  http://ddili.org/ders/d.en/concurrency.html

Ali
Oct 06 2012
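A runnable version of the parallel foreach suggested above, as a minimal sketch: `RawRecord` and `unpack` are hypothetical stand-ins for the already-separated records and the unpacking step from the first post. Each iteration writes only its own slot of `results`, so no mutex is needed. The second argument to `parallel` is the work unit size (how many elements a worker takes at a time); 100 is an arbitrary choice here.

```d
import std.parallelism : parallel;

// Hypothetical stand-in for one record after the loading stage.
struct RawRecord {
    size_t[] fields;
}

// Hypothetical "unpack" step; here it just sums the fields.
size_t unpack(const RawRecord raw) {
    size_t sum;
    foreach (f; raw.fields) sum += f;
    return sum;
}

// Iteration i writes only results[i], so threads never share a slot.
size_t[] unpackAll(RawRecord[] raws) {
    auto results = new size_t[raws.length];
    foreach (i, ref raw; parallel(raws, 100))
        results[i] = unpack(raw);
    return results;
}

void main() {
    auto raws = new RawRecord[1_000];
    foreach (i, ref r; raws) r.fields = [i, 1];
    auto sums = unpackAll(raws);
    foreach (i, s; sums) assert(s == i + 1);
}
```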
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 6 October 2012 at 14:01:30 UTC, Ali Çehreli wrote:
> Have you considered std.parallelism? If you can represent the
> data as a slice, then a parallel foreach loop on that data is
> all you need:
> There is the following chapter about that module, which covers
> most of std.parallelism:
I'm still relying heavily on TDPL, which covered concurrency, message passing and shared, but not std.parallelism. On the other hand, std.parallelism does look like it contains more of what I wanted.
> Even though I have made a second pass to include the
> apparently-newly-added features, there are some features of
> std.parallelism that are missing in the chapter.
> Although you don't seem to need it, there is also message
> passing concurrency:
For the moment I wanted to avoid message passing and shared, as they seem more complex than I need right now. I'm writing a merger (for game files), and in there you have records that modify other records and records that don't. Only the records that modify other records need to run in parallel (and can); the rest, if they qualify, just get added. So once again, thank you, and I'll give it a try after I read through it.
Oct 06 2012
"Era Scarecrow" <rtcvb32 yahoo.com> writes:
  Well, I've tried using parallel as shown, and it appears to be as 
efficient as my own struct/job-based one, which is very 
promising. I'll consider using it more later. I've still got 
plenty of reading and work to do before I get there.
Oct 06 2012