
digitalmars.D.learn - Need help with communication between multiple threads

reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
I'm a bit of a newbie to this whole multithreading thing, so I'm hoping 
someone can help me with this.

First, what I know (or what I think I know):  So I've been reading about 
this, and apparently there's this problem where two threads that are 
reading a value from the same address in memory at the same time may end 
up with two completely different values.  Apparently this is because 
when you write something, it may just end up in the cache and not be 
updated in global memory.  Also, when reading, you may end up with 
something that is just an outdated copy in the cache, and not the actual 
thing from global memory.  But it doesn't stop there, apparently x86 
computers are very forgiving on this stuff so if you make a mistake you 
won't know it until your program is run on some more obscure hardware.

Now then, my question:  In D, how do I ensure that when I write 
something, the write is to global memory, and when I read something, the 
read is from global memory?

Some more info:  This comes up because I am trying to write a Timer 
class for Tango, and it will include timers that trigger events at a 
later date, which requires multithreading.  So it'd be most helpful if I 
could accomplish this using only D features and/or Tango.
Feb 20 2007
parent reply kris <foo bar.com> writes:
Chad J wrote:
 I'm a bit of a newbie to this whole multithreading thing, so I'm hoping 
 someone can help me with this.
 
 First, what I know (or what I think I know):  So I've been reading about 
 this, and apparently there's this problem where two threads that are 
 reading a value from the same address in memory at the same time may end 
 up with two completely different values.  Apparently this is because 
 when you write something, it may just end up in the cache and not be 
 updated in global memory.  Also, when reading, you may end up with 
 something that is just an outdated copy in the cache, and not the actual 
 thing from global memory.  But it doesn't stop there, apparently x86 
 computers are very forgiving on this stuff so if you make a mistake you 
 won't know it until your program is run on some more obscure hardware.
 
 Now then, my question:  In D, how do I ensure that when I write 
 something, the write is to global memory, and when I read something, the 
 read is from global memory?
 
 Some more info:  This comes up because I am trying to write a Timer 
 class for Tango, and it will include timers that trigger events at a 
 later date, which requires multithreading.  So it'd be most helpful if I 
 could accomplish this using only D features and/or Tango.

Basically, you need to protect the value from contention between two threads. There are a number of ways to do this:

1) using native D facilities via the synchronized keyword: expose a getter and setter method, and have them both synch on the same object/lock. This is a fairly heavyweight resolution, but it would work.

2) get under the covers and utilize a mutex, semaphore, or some other classical synchronization construct exposed by the OS itself. Tango will provide a cross-platform way of doing this in the next release. This is potentially lighter weight than #1.

3) use CPU-specific instructions to ensure value access is atomic. This is what Sean has exposed in the Atomic module within Tango. It is a lightweight and low-overhead solution, and works by locking the bus for the duration of the read/write access.

4) use a small discrete unit for the value. If the value is just a byte, the underlying hardware will usually treat it as an indivisible unit, giving you the desired result (similar to #3). However, there are memory barriers involved also, which D respects via the "volatile" keyword. Beyond that, there may be issues with cache-reconciliation on a multi-core device, so this approach is generally not recommended.

- Kris
Feb 20 2007
next sibling parent Sean Kelly <sean f4.ca> writes:
kris wrote:
 
 4) use a small discrete unit for the value. If value is just a byte, the 
 underlying hardware will usually treat it as an indivisible unit, giving 
 you the desired result (similar to #3). However, there are memory 
 barriers involved also, which D respects via the "volatile" keyword. 
 Beyond that, there may be issues with cache-reconciliation on a 
 multi-core device, so this approach is generally not recommended.

The "volatile" keyword is somewhat tricky, as it's effectively a memory barrier for compiler optimizations only. It will prevent the compiler from moving loads/stores across the volatile region during optimization, but it does not affect the ASM code in any way. I think it will also affect whether register caching of loads occurs, etc., which is occasionally necessary if you're performing a busy wait on a shared variable. ie, under normal circumstances:

    while( i == 0 )
    {
        // do nothing
    }

it's clear to the compiler that the value of 'i' will not change during the loop, so the code could theoretically be transformed into:

    if( i == 0 )
    {
        while( true )
        {
            // do nothing
        }
    }

which is an equivalent sequence of operations. In C++, this is called the "as if" rule: the compiler can do whatever the heck it wants so long as the behavior meets expectations /within the context of the virtual machine described for the language/. For C++, this virtual machine is single-threaded, so the above transformation is legal. I believe the same is currently true of D, though D provides "volatile" to tell the compiler "I don't care what fancy stuff you think you can do to this code to make it faster. Don't do it. I know more than you do about what's going on here."

That said, progress is being made towards defining a multithreaded virtual machine for C++, and once it is settled I suspect the D model will follow the C++ model in spirit, if perhaps not exactly.

By the way, for any who are interested, Doug Lea described a memory model last month that I think has tremendous promise, regardless of whether it's chosen for C++. He describes it here:

http://www.decadentplace.org.uk/pipermail/cpp-threads/2007-January/001287.html

Sean
Feb 20 2007
prev sibling parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
kris wrote:
 
 Basicially, you need to protect the value from contention between two 
 threads. There are a number of ways to do this:
 
 1) using native D facilities via the synchronized keyword: expose a 
 getter and setter method, and have them both synch on the same 
 object/lock. This is a fairly heavyweight resolution, but it would work.
 

So would something like this do the trick?

    class Mutex
    {
        uint pointlessVariable;
    }

    Mutex mutex;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(mutex)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(mutex)
            m_value = newbie;
        return newbie;
    }

Or am I supposed to do something else like put m_value inside the mutex class?
 2) get under the covers and utilize a mutex, semaphore, or some other 
 classical synchronization construct exposed by the OS itself. Tango will 
 provide a cross-platform way of doing this in the next release. This is 
 potentially lighter weight than #1
 

I suppose I'll wait for that release and see what happens.
 3) use CPU-specific instructions to ensure value access is atomic. This 
 is what Sean has exposed in the Atomic module within Tango. It is a 
 lightweight and low-overhead solution, and works by locking the bus for 
 the duration of the read/write access.
 

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work? I made a post about this in the Tango forum in case it's more appropriate to discuss there.
 4) use a small discrete unit for the value. If value is just a byte, the 
 underlying hardware will usually treat it as an indivisible unit, giving 
 you the desired result (similar to #3). However, there are memory 
 barriers involved also, which D respects via the "volatile" keyword. 
 Beyond that, there may be issues with cache-reconciliation on a 
 multi-core device, so this approach is generally not recommended.
 

Right, I'll just stay away from that.
 - Kris

When I was porting Phobos to work on ARM-WinCE, it was very helpful to be able to discard a module without breaking other parts of the lib, namely in the case of that module requiring a broken language feature or inline assembly (inline asm is not currently available with arm-wince-pe-gdc, and I don't feel like learning ARM asm yet). That said, Atomic looks like it will be very broken on ARM in its current state.

I also benchmarked synchronized reads vs atomic reads and yeah, synchronized was much slower (I picked "whatever makes it compile" values for msync). So I'll probably implement a version using only synchronization and a version that uses Atomic instead whenever possible.
Feb 20 2007
next sibling parent reply kris <foo bar.com> writes:
Chad J wrote:
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature or 
 inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.
Feb 20 2007
parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
kris wrote:
 Chad J wrote:
 
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature 
 or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright. I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't. That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.
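[Editor's note: Chad's compile-time feature-flag idea is exactly what later standards adopted. A C++ sketch of the concept, using the standard ATOMIC_INT_LOCK_FREE macro; the function and variable names are illustrative, not a real Tango API:]

```cpp
#include <atomic>

// The module exports a compile-time boolean saying whether lock-free
// atomics are genuinely supported on this build target, so client code
// can branch on it instead of guessing what will compile.
// ATOMIC_INT_LOCK_FREE == 2 means "always lock-free" per the standard.
constexpr bool atomicsSupported =
#if defined(ATOMIC_INT_LOCK_FREE) && ATOMIC_INT_LOCK_FREE == 2
    true;
#else
    false;
#endif

// Client code can then select an implementation path at compile time:
unsigned load_counter(const std::atomic<unsigned>& c, unsigned fallback)
{
    if (atomicsSupported)
        return c.load(std::memory_order_seq_cst);
    return fallback; // e.g. take a mutex-based path here instead
}
```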
Feb 20 2007
next sibling parent kris <foo bar.com> writes:
Chad J wrote:
 kris wrote:
 
 Chad J wrote:

 When I was porting Phobos to work on ARM-WinCE, it was very helpful 
 to be able to discard a module without breaking other parts of the 
 lib, namely in the case of that module requiring a broken language 
 feature or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright. I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't. That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.

That's a very good point. Some kind of mechanism would be very convenient.
Feb 20 2007
prev sibling parent kris <foo bar.com> writes:
Chad J wrote:
 kris wrote:
 
 Chad J wrote:

 When I was porting Phobos to work on ARM-WinCE, it was very helpful 
 to be able to discard a module without breaking other parts of the 
 lib, namely in the case of that module requiring a broken language 
 feature or inline assembly 

If the underlying OS APIs are present, then the upcoming tango.locks ought to work on WinCE. I'd imagine this to be your best bet, or to go with synchronized instead :) Which approach you choose is ultimately down to the manner in which you need to share the entity.

Alright. I'm starting to think it would be handy if modules that only work on some platforms (like Atomic and possibly Locks) would expose a const bool variable that is set to true if the module is supported on the hardware, and false if it isn't. That way I could version different blocks of code by that, rather than trying to guess what will compile on the different platforms.

If you're serious about having Tango run on WinCE, come and chat with us on the IRC channel? - Kris
Feb 20 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Chad J wrote:
 kris wrote:
 Basicially, you need to protect the value from contention between two 
 threads. There are a number of ways to do this:

 1) using native D facilities via the synchronized keyword: expose a 
 getter and setter method, and have them both synch on the same 
 object/lock. This is a fairly heavyweight resolution, but it would work.

So would something like this do the trick?

    class Mutex
    {
        uint pointlessVariable;
    }

    Mutex mutex;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(mutex)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(mutex)
            m_value = newbie;
        return newbie;
    }

Or am I supposed to do something else like put m_value inside the mutex class?

You could use synchronized with no arguments and everything will work fine. The default behavior for free functions is to synchronize on a hidden global object. Alternately:

    Object valueLock = new Object;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(valueLock)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(valueLock)
            m_value = newbie;
        return newbie;
    }

This works if you want to synch only specific functions with respect to one another.
 3) use CPU-specific instructions to ensure value access is atomic. 
 This is what Sean has exposed in the Atomic module within Tango. It is 
 a lightweight and low-overhead solution, and works by locking the bus 
 for the duration of the read/write access.

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work? I made a post about this in the Tango forum in case it's more appropriate to discuss there.

The Tango forums are probably more appropriate, but I can give a quick summary here (I'm on my way out the door as I write this).

tango.core.Atomic does essentially two things: it ensures that any operation it performs is atomic, and it provides methods to control memory ordering regarding such operations. The latter issue is somewhat complicated, but suffice to say that msync.seq is the safest option and should be used in most situations. So for the above:

    uint m_value = 42;

    uint value() // getter
    {
        return atomicLoad!(msync.seq)( m_value );
    }

    uint value(uint newbie) // setter
    {
        atomicStore!(msync.seq)( m_value, newbie );
        return newbie;
    }

For data which will always be modified atomically, a wrapper struct is also provided:

    Atomic!(uint) m_value;

    // Atomic really needs a ctor, but this should work
    // for "fast" construction.
    m_value.store!(msync.raw)( 42 );

    uint value() // getter
    {
        return m_value.load!(msync.seq);
    }

    uint value(uint newbie) // setter
    {
        m_value.store!(msync.seq)( newbie );
        return newbie;
    }

Please note that Atomic currently only supports x86, but if there's a demand for it then I may add support for other architectures. If this happens, it will probably be under Posix (and not Win32), since I'm not entirely sure about out-of-the-box assembler support with DMD/Win32.
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature or 
 inline assembly (inline asm is not currently available with 
 arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That 
 said, Atomic looks like it will be very broken on ARM in its current 
 state.  I also benchmarked synchronized reads vs atomic reads and yeah, 
 synchronized was much slower (I picked "whatever makes it compile" 
 values for msync).  So I'll probably implement a version using only 
 synchronization and a version that uses Atomic instead whenever possible.

See above :-) Atomic won't work on ARM without additional code.

By the way, it may also eventually be necessary to add a hardware instruction for ordering load operations on x86, since I'm becoming convinced that load reordering is actually allowed by the IA-32 spec (and it may actually be done on some AMD CPUs). I've been resisting this until now because it will slow down synchronized loads substantially for what may be only a small portion of the x86 hardware in production. So if you (or anyone) decides to use Atomic as-is and sees weird behavior with atomicLoad using msync.acq or msync.hlb, please let me know.

Sean
Feb 20 2007
parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
Sean Kelly wrote:
 Chad J wrote:
 
 kris wrote:

 Basicially, you need to protect the value from contention between two 
 threads. There are a number of ways to do this:

 1) using native D facilities via the synchronized keyword: expose a 
 getter and setter method, and have them both synch on the same 
 object/lock. This is a fairly heavyweight resolution, but it would work.

So would something like this do the trick?

    class Mutex
    {
        uint pointlessVariable;
    }

    Mutex mutex;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(mutex)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(mutex)
            m_value = newbie;
        return newbie;
    }

Or am I supposed to do something else like put m_value inside the mutex class?

You could use synchronized with no arguments and everything will work fine. The default behavior for free functions is to synchronize on a hidden global object. Alternately:

    Object valueLock = new Object;

    uint m_value = 42; // The thing to be protected.

    uint value() // getter
    {
        synchronized(valueLock)
            return m_value;
    }

    uint value(uint newbie) // setter
    {
        synchronized(valueLock)
            m_value = newbie;
        return newbie;
    }

This works if you want to synch only specific functions with respect to one another.
 3) use CPU-specific instructions to ensure value access is atomic. 
 This is what Sean has exposed in the Atomic module within Tango. It 
 is a lightweight and low-overhead solution, and works by locking the 
 bus for the duration of the read/write access.

This sounds cool, but I don't quite understand how to use the Atomic module - what is msync and which value of it do I pick to make things work? I made a post about this in the Tango forum in case it's more appropriate to discuss there.

The Tango forums are probably more appropriate, but I can give a quick summary here (I'm on my way out the door as I write this).

tango.core.Atomic does essentially two things: it ensures that any operation it performs is atomic, and it provides methods to control memory ordering regarding such operations. The latter issue is somewhat complicated, but suffice to say that msync.seq is the safest option and should be used in most situations. So for the above:

    uint m_value = 42;

    uint value() // getter
    {
        return atomicLoad!(msync.seq)( m_value );
    }

    uint value(uint newbie) // setter
    {
        atomicStore!(msync.seq)( m_value, newbie );
        return newbie;
    }

For data which will always be modified atomically, a wrapper struct is also provided:

    Atomic!(uint) m_value;

    // Atomic really needs a ctor, but this should work
    // for "fast" construction.
    m_value.store!(msync.raw)( 42 );

    uint value() // getter
    {
        return m_value.load!(msync.seq);
    }

    uint value(uint newbie) // setter
    {
        m_value.store!(msync.seq)( newbie );
        return newbie;
    }

Please note that Atomic currently only supports x86, but if there's a demand for it then I may add support for other architectures. If this happens, it will probably be under Posix (and not Win32), since I'm not entirely sure about out-of-the-box assembler support with DMD/Win32.
 When I was porting Phobos to work on ARM-WinCE, it was very helpful to 
 be able to discard a module without breaking other parts of the lib, 
 namely in the case of that module requiring a broken language feature 
 or inline assembly (inline asm is not currently available with 
 arm-wince-pe-gdc, and I don't feel like learning ARM asm yet).  That 
 said, Atomic looks like it will be very broken on ARM in its current 
 state.  I also benchmarked synchronized reads vs atomic reads and 
 yeah, synchronized was much slower (I picked "whatever makes it 
 compile" values for msync).  So I'll probably implement a version 
 using only synchronization and a version that uses Atomic instead 
 whenever possible.

See above :-) Atomic won't work on ARM without additional code.

By the way, it may also eventually be necessary to add a hardware instruction for ordering load operations on x86, since I'm becoming convinced that load reordering is actually allowed by the IA-32 spec (and it may actually be done on some AMD CPUs). I've been resisting this until now because it will slow down synchronized loads substantially for what may be only a small portion of the x86 hardware in production. So if you (or anyone) decides to use Atomic as-is and sees weird behavior with atomicLoad using msync.acq or msync.hlb, please let me know.

Sean

Cool, thanks for the info and this handy low-overhead threading tool.

I didn't have any problems with msync.acq or msync.hlb so far, but when loading using msync.seq I get a Win32 Exception. I created a ticket about this.
Feb 20 2007
parent Sean Kelly <sean f4.ca> writes:
Chad J wrote:
 
 Cool thanks for the info and this handy low-overhead threading tool.
 
 I didn't have any problems with msync.acq or msync.hlb so far, but when 
 loading using msync.seq I get a Win32 Exception.  I created a ticket 
 about this.

Well that's new :-) Looks like it may be a codegen bug of some sort (?). I've submitted a bug report for it.

Sean
Feb 20 2007