
digitalmars.D.learn - Need help with communication between multiple threads

reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
I'm a bit of a newbie to this whole multithreading thing, so I'm hoping 
someone can help me with this.

First, what I know (or what I think I know):  So I've been reading about 
this, and apparently there's this problem where two threads that are 
reading a value from the same address in memory at the same time may end 
up with two completely different values.  Apparently this is because 
when you write something, it may just end up in the cache and not be 
updated in global memory.  Also, when reading, you may end up with 
something that is just an outdated copy in the cache, and not the actual 
thing from global memory.  But it doesn't stop there, apparently x86 
computers are very forgiving on this stuff so if you make a mistake you 
won't know it until your program is run on some more obscure hardware.

Now then, my question:  In D, how do I ensure that when I write 
something, the write is to global memory, and when I read something, the 
read is from global memory?

Some more info:  This comes up because I am trying to write a Timer 
class for Tango, and it will include timers that trigger events at a 
later date, which requires multithreading.  So it'd be most helpful if I 
could accomplish this using only D features and/or Tango.
Feb 20 2007
parent reply kris <foo bar.com> writes:
Chad J wrote:
 I'm a bit of a newbie to this whole multithreading thing, so I'm hoping 
 someone can help me with this.
 
 First, what I know (or what I think I know):  So I've been reading about 
 this, and apparently there's this problem where two threads that are 
 reading a value from the same address in memory at the same time may end 
 up with two completely different values.  Apparently this is because 
 when you write something, it may just end up in the cache and not be 
 updated in global memory.  Also, when reading, you may end up with 
 something that is just an outdated copy in the cache, and not the actual 
 thing from global memory.  But it doesn't stop there, apparently x86 
 computers are very forgiving on this stuff so if you make a mistake you 
 won't know it until your program is run on some more obscure hardware.
 
 Now then, my question:  In D, how do I ensure that when I write 
 something, the write is to global memory, and when I read something, the 
 read is from global memory?
 
 Some more info:  This comes up because I am trying to write a Timer 
 class for Tango, and it will include timers that trigger events at a 
 later date, which requires multithreading.  So it'd be most helpful if I 
 could accomplish this using only D features and/or Tango.

Basically, you need to protect the value from contention between two threads. There are a number of ways to do this:

1) using native D facilities via the synchronized keyword: expose a getter and setter method, and have them both synch on the same object/lock. This is a fairly heavyweight resolution, but it would work.

2) get under the covers and utilize a mutex, semaphore, or some other classical synchronization construct exposed by the OS itself. Tango will provide a cross-platform way of doing this in the next release. This is potentially lighter weight than #1.

3) use CPU-specific instructions to ensure value access is atomic. This is what Sean has exposed in the Atomic module within Tango. It is a lightweight and low-overhead solution, and works by locking the bus for the duration of the read/write access.

4) use a small discrete unit for the value. If the value is just a byte, the underlying hardware will usually treat it as an indivisible unit, giving you the desired result (similar to #3). However, there are memory barriers involved also, which D respects via the "volatile" keyword. Beyond that, there may be issues with cache-reconciliation on a multi-core device, so this approach is generally not recommended.

- Kris
Feb 20 2007
next sibling parent Sean Kelly <sean f4.ca> writes:
kris wrote:
 
 4) use a small discrete unit for the value. If value is just a byte, the 
 underlying hardware will usually treat it as an indivisible unit, giving 
 you the desired result (similar to #3). However, there are memory 
 barriers involved also, which D respects via the "volatile" keyword. 
 Beyond that, there may be issues with cache-reconciliation on a 
 multi-core device, so this approach is generally not recommended.

The "volatile" keyword is somewhat tricky, as it's effectively a memory barrier for compiler optimizations only. It will prevent the compiler from moving loads/stores across the volatile region during optimization, but it does not affect the ASM code in any way. I think it will also affect whether register caching of loads occurs, etc., which is occasionally necessary if you're performing a busy wait on a shared variable. ie, under normal circumstances:

    while( i == 0 )
    {
        // do nothing
    }

it's clear to the compiler that the value of 'i' will not change during the loop, so the code could theoretically be transformed into:

    if( i == 0 )
    {
        while( true )
        {
            // do nothing
        }
    }

which is an equivalent sequence of operations. In C++, this is called the "as if" rule: the compiler can do whatever the heck it wants so long as the behavior meets expectations /within the context of the virtual machine described for the language/. For C++, this virtual machine is single-threaded, so the above transformation is legal. I believe the same is currently true of D, though D provides "volatile" to tell the compiler "I don't care what fancy stuff you think you can do to this code to make it faster. Don't do it. I know more than you do about what's going on here."

That said, progress is being made towards defining a multithreaded virtual machine for C++, and once it is settled I suspect the D model will follow the C++ model in spirit, if perhaps not exactly.

By the way, for any who are interested, Doug Lea described a memory model last month that I think has tremendous promise, regardless of whether it's chosen for C++. He describes it here:

http://www.decadentplace.org.uk/pipermail/cpp-threads/2007-January/001287.html

Sean
Feb 20 2007
prev sibling parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
kris wrote:
 
 Basicially, you need to protect the value from contention between two 
 threads. There are a number of ways to do this:
 
 1) using native D facilities via the synchronized keyword: expose a 
 getter and setter method, and have them both synch on the same 
 object/lock. This is a fairly heavyweight resolution, but it would work.
 

So would something like this do the trick?

    class Mutex
    {
        uint pointlessVariable;
    }

    Mutex mutex;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(mutex)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(mutex)
            m_value = newbie;
        return newbie;
    }

Or am I supposed to do something else like put m_value inside the mutex class?
 2) get under the covers and utilize a mutex, semaphore, or some other 
 classical synchronization construct exposed by the OS itself. Tango will 
 provide a cross-platform way of doing this in the next release. This is 
 potentially lighter weight than #1
 

I suppose I'll wait for that release and see what happens.
 3) use CPU-specific instructions to ensure value access is atomic. This 
 is what Sean has exposed in the Atomic module within Tango. It is a 
 lightweight and low-overhead solution, and works by locking the bus for 
 the duration of the read/write access.
 

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work? I made a post about this in the Tango forum in case it's more appropriate to discuss there.
 4) use a small discrete unit for the value. If value is just a byte, the 
 underlying hardware will usually treat it as an indivisible unit, giving 
 you the desired result (similar to #3). However, there are memory 
 barriers involved also, which D respects via the "volatile" keyword. 
 Beyond that, there may be issues with cache-reconciliation on a 
 multi-core device, so this approach is generally not recommended.
 

Right, I'll just stay away from that.
 - Kris

When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly (inline asm is not currently available with arm-wince-pe-gdc, and I don't feel like learning ARM asm yet). That said, Atomic looks like it will be very broken on ARM in its current state.

I also benchmarked synchronized reads vs atomic reads and yeah, synchronized was much slower (I picked "whatever makes it compile" values for msync). So I'll probably implement a version using only synchronization and a version that uses Atomic instead whenever possible.
Feb 20 2007
next sibling parent reply kris <foo bar.com> writes:
Chad J wrote:
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature or 
 inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.
Feb 20 2007
parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
kris wrote:
 Chad J wrote:
 
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature 
 or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright. I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't. That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.
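[Editor's note: Chad's compile-time feature-flag idea is exactly what later standards adopted. A C++ sketch of the concept, using the standard ATOMIC_INT_LOCK_FREE macro; the function and variable names are illustrative, not a real Tango API:]

```cpp
#include <atomic>

// The module exports a compile-time boolean saying whether lock-free
// atomics are genuinely supported on this build target, so client code
// can branch on it instead of guessing what will compile.
// ATOMIC_INT_LOCK_FREE == 2 means "always lock-free" per the standard.
constexpr bool atomicsSupported =
#if defined(ATOMIC_INT_LOCK_FREE) && ATOMIC_INT_LOCK_FREE == 2
    true;
#else
    false;
#endif

// Client code can then select an implementation path at compile time:
unsigned load_counter(const std::atomic<unsigned>& c, unsigned fallback)
{
    if (atomicsSupported)
        return c.load(std::memory_order_seq_cst);
    return fallback; // e.g. take a mutex-based path here instead
}
```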
Feb 20 2007
next sibling parent kris <foo bar.com> writes:
Chad J wrote:
 kris wrote:
 
 Chad J wrote:

 When I was porting Phobos to work on ARM-WinCE, it was very helpful 
 to be able to discard a module without breaking other parts of the 
 lib, namely in the case of that module requiring a broken language 
 feature or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright. I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't. That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.

That's a very good point. Some kind of mechanism would be very convenient.
Feb 20 2007
prev sibling parent kris <foo bar.com> writes:
Chad J wrote:
 kris wrote:
 
 Chad J wrote:

 When I was porting Phobos to work on ARM-WinCE, it was very helpful 
 to be able to discard a module without breaking other parts of the 
 lib, namely in the case of that module requiring a broken language 
 feature or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright. I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't. That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.

If you're serious about having Tango run on WinCE, come and chat with us on the IRC channel? - Kris
Feb 20 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Chad J wrote:
 kris wrote:
 Basicially, you need to protect the value from contention between two 
 threads. There are a number of ways to do this:

 1) using native D facilities via the synchronized keyword: expose a 
 getter and setter method, and have them both synch on the same 
 object/lock. This is a fairly heavyweight resolution, but it would work.

So would something like this do the trick?

    class Mutex
    {
        uint pointlessVariable;
    }

    Mutex mutex;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(mutex)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(mutex)
            m_value = newbie;
        return newbie;
    }

Or am I supposed to do something else like put m_value inside the mutex class?

You could use synchronized with no arguments and everything will work fine. The default behavior for free functions is to synchronize on a hidden global object. Alternately:

    Object valueLock = new Object;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(valueLock)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(valueLock)
            m_value = newbie;
        return newbie;
    }

This works if you want to synch only specific functions with respect to one another.
 3) use CPU-specific instructions to ensure value access is atomic. 
 This is what Sean has exposed in the Atomic module within Tango. It is 
 a lightweight and low-overhead solution, and works by locking the bus 
 for the duration of the read/write access.

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work? I made a post about this in the Tango forum in case it's more appropriate to discuss there.

The Tango forums are probably more appropriate, but I can give a quick summary here (I'm on my way out the door as I write this).

tango.core.Atomic does essentially two things: it ensures that any operation it performs is atomic, and it provides methods to control memory ordering regarding such operations. The latter issue is somewhat complicated, but suffice to say that msync.seq is the safest option and should be used in most situations. So for the above:

    uint m_value = 42;

    uint value() // getter
    {
        return atomicLoad!(msync.seq)( m_value );
    }

    uint value(uint newbie) // setter
    {
        atomicStore!(msync.seq)( m_value, newbie );
        return newbie;
    }

For data which will always be modified atomically, a wrapper struct is also provided:

    Atomic!(uint) m_value;

    // Atomic really needs a ctor, but this should work
    // for "fast" construction.
    m_value.store!(msync.raw)( 42 );

    uint value() // getter
    {
        return m_value.load!(msync.seq);
    }

    uint value(uint newbie) // setter
    {
        m_value.store!(msync.seq)( newbie );
        return newbie;
    }

Please note that Atomic currently only supports x86, but if there's a demand for it then I may add support for other architectures. If this happens, it will probably be under Posix (and not Win32), since I'm not entirely sure about out-of-the-box assembler support with DMD/Win32.
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature or 
 inline assembly (inline asm is not currently available with 
 arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That 
 said, Atomic looks like it will be very broken on ARM in its current 
 state.  I also benchmarked synchronized reads vs atomic reads and yeah, 
 synchronized was much slower (I picked "whatever makes it compile" 
 values for msync).  So I'll probably implement a version using only 
 synchronization and a version that uses Atomic instead whenever possible.

See above :-) Atomic won't work on ARM without additional code.

By the way, it may also eventually be necessary to add a hardware instruction for ordering load operations on x86, since I'm becoming convinced that load reordering is actually allowed by the IA-32 spec (and it may actually be done on some AMD CPUs). I've been resisting this until now because it will slow down synchronized loads substantially for what may be only a small portion of the x86 hardware in production. So if you (or anyone) decides to use Atomic as-is and sees weird behavior with atomicLoad using msync.acq or msync.hlb, please let me know.

Sean
Feb 20 2007
parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
Sean Kelly wrote:
 Chad J wrote:
 
 kris wrote:

 Basicially, you need to protect the value from contention between two 
 threads. There are a number of ways to do this:

 1) using native D facilities via the synchronized keyword: expose a 
 getter and setter method, and have them both synch on the same 
 object/lock. This is a fairly heavyweight resolution, but it would work.

So would something like this do the trick?

    class Mutex
    {
        uint pointlessVariable;
    }

    Mutex mutex;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(mutex)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(mutex)
            m_value = newbie;
        return newbie;
    }

Or am I supposed to do something else like put m_value inside the mutex class?

You could use synchronized with no arguments and everything will work fine. The default behavior for free functions is to synchronize on a hidden global object. Alternately:

    Object valueLock = new Object;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(valueLock)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(valueLock)
            m_value = newbie;
        return newbie;
    }

This works if you want to synch only specific functions with respect to one another.
 3) use CPU-specific instructions to ensure value access is atomic. 
 This is what Sean has exposed in the Atomic module within Tango. It 
 is a lightweight and low-overhead solution, and works by locking the 
 bus for the duration of the read/write access.

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work? I made a post about this in the Tango forum in case it's more appropriate to discuss there.

The Tango forums are probably more appropriate, but I can give a quick summary here (I'm on my way out the door as I write this).

tango.core.Atomic does essentially two things: it ensures that any operation it performs is atomic, and it provides methods to control memory ordering regarding such operations. The latter issue is somewhat complicated, but suffice to say that msync.seq is the safest option and should be used in most situations. So for the above:

    uint m_value = 42;

    uint value() // getter
    {
        return atomicLoad!(msync.seq)( m_value );
    }

    uint value(uint newbie) // setter
    {
        atomicStore!(msync.seq)( m_value, newbie );
        return newbie;
    }

For data which will always be modified atomically, a wrapper struct is also provided:

    Atomic!(uint) m_value;

    // Atomic really needs a ctor, but this should work
    // for "fast" construction.
    m_value.store!(msync.raw)( 42 );

    uint value() // getter
    {
        return m_value.load!(msync.seq);
    }

    uint value(uint newbie) // setter
    {
        m_value.store!(msync.seq)( newbie );
        return newbie;
    }

Please note that Atomic currently only supports x86, but if there's a demand for it then I may add support for other architectures. If this happens, it will probably be under Posix (and not Win32), since I'm not entirely sure about out-of-the-box assembler support with DMD/Win32.
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature 
 or inline assembly (inline asm is not currently available with 
 arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That 
 said, Atomic looks like it will be very broken on ARM in its current 
 state.  I also benchmarked synchronized reads vs atomic reads and 
 yeah, synchronized was much slower (I picked "whatever makes it 
 compile" values for msync).  So I'll probably implement a version 
 using only synchronization and a version that uses Atomic instead 
 whenever possible.

See above :-) Atomic won't work on ARM without additional code.

By the way, it may also eventually be necessary to add a hardware instruction for ordering load operations on x86, since I'm becoming convinced that load reordering is actually allowed by the IA-32 spec (and it may actually be done on some AMD CPUs). I've been resisting this until now because it will slow down synchronized loads substantially for what may be only a small portion of the x86 hardware in production. So if you (or anyone) decides to use Atomic as-is and sees weird behavior with atomicLoad using msync.acq or msync.hlb, please let me know.

Sean

Cool, thanks for the info and this handy low-overhead threading tool.

I didn't have any problems with msync.acq or msync.hlb so far, but when loading using msync.seq I get a Win32 Exception. I created a ticket about this.
Feb 20 2007
parent Sean Kelly <sean f4.ca> writes:
Chad J wrote:
 
 Cool thanks for the info and this handy low-overhead threading tool.
 
 I didn't have any problems with msync.acq or msync.hlb so far, but when 
 loading using msync.seq I get a Win32 Exception.  I created a ticket 
 about this.

Well that's new :-) Looks like it may be a codegen bug of some sort (?). I've submitted a bug report for it.

Sean
Feb 20 2007