www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Unsynchronized int access from threads

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Suppose I have an int[] which may contain some zeroes, and 2 functions,
one scans the array for zeroes but does not modify it, and the other
modifies array elements but never introduces new zeroes (though it may
write a 0 to an existing 0).  Is it thread-safe to run the two functions
in parallel without any synchronization between them?

I.e., will the first function always see zeroes where there are zeroes,
in spite of the 2nd function writing to the array simultaneously?  Are
there any hardware situations where the 1st function may read a non-zero
value if the 2nd function is simultaneously overwriting an existing zero
with another zero?  Or a situation where the 1st function may read a
zero if the 2nd function is simultaneously overwriting a non-zero value
with another non-zero value?

Or am I playing with fire here?


T

-- 
Chance favours the prepared mind. -- Louis Pasteur
Jun 18
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Thursday, 18 June 2020 at 16:42:15 UTC, H. S. Teoh wrote:
 Suppose I have an int[] which may contain some zeroes, and 2 
 functions, one scans the array for zeroes but does not modify 
 it, and the other modifies array elements but never introduces 
 new zeroes (though it may write a 0 to an existing 0).  Is it 
 thread-safe to run the two functions in parallel without any 
 synchronization between them?

 I.e., will the first function always see zeroes where there are 
 zeroes, in spite of the 2nd function writing to the array 
 simultaneously?  Are there any hardware situations where the 
 1st function may read a non-zero value if the 2nd function is 
 simultaneously overwriting an existing zero with another zero?  
 Or a situation where the 1st function may read a zero if the 
 2nd function is simultaneously overwriting a non-zero value 
 with another non-zero value?

 Or am I playing with fire here?


 T
As long as reads and writes of `int` are atomic on the platform in question (which they probably are) you should be ok. Personally, I would use core.atomic.atomic{Load,Store} just to be safe; they should optimize themselves away if they're not needed.
Jun 18
prev sibling next sibling parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 6/18/20 9:42 AM, H. S. Teoh wrote:

 Are
 there any hardware situations where the 1st function may read a non-zero
 value if the 2nd function is simultaneously overwriting an existing zero
 with another zero?
I've worked in high performance projects where exactly that was done. I am not aware of any hardware feature that would make that unsafe. Ali
Jun 18
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/18/20 12:42 PM, H. S. Teoh wrote:
 Suppose I have an int[] which may contain some zeroes, and 2 functions,
 one scans the array for zeroes but does not modify it, and the other
 modifies array elements but never introduces new zeroes (though it may
 write a 0 to an existing 0).  Is it thread-safe to run the two functions
 in parallel without any synchronization between them?
 
 I.e., will the first function always see zeroes where there are zeroes,
 in spite of the 2nd function writing to the array simultaneously?  Are
 there any hardware situations where the 1st function may read a non-zero
 value if the 2nd function is simultaneously overwriting an existing zero
 with another zero?  Or a situation where the 1st function may read a
 zero if the 2nd function is simultaneously overwriting a non-zero value
 with another non-zero value?
I think it's possible there's a problem if the reads are not atomic: Let's say a value is 0xffff0000 Thread that is looking for 0s reads the lower half of the number -> 0x????0000 Context switches, thread 2 writes 0x0000ffff into that memory slot. Back at the zero-scanning thread, it reads the second half -> 0x0000???? Combines with the first part, and now it sees a 0 where there isn't one. I would agree with Paul -- make the reads/writes atomic. -Steve
Jun 18
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 18, 2020 at 03:07:54PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 Let's say a value is 0xffff0000
 
 Thread that is looking for 0s reads the lower half of the number ->
 0x????0000
 
 Context switches, thread 2 writes 0x0000ffff into that memory slot.
 
 Back at the zero-scanning thread, it reads the second half ->
 0x0000????
 
 Combines with the first part, and now it sees a 0 where there isn't
 one.
[...] Hmmm very good point, hadn't thought of that. :-/ Oh well, I'll just think of another way of parallelizing my code. I think I can break the work up into large enough chunks that the overhead of using traditional synchronization primitives will be insignificant. Thanks all for the replies! T -- A mathematician learns more and more about less and less, until he knows everything about nothing; whereas a philospher learns less and less about more and more, until he knows nothing about everything.
Jun 18
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/18/20 6:15 PM, H. S. Teoh wrote:
 On Thu, Jun 18, 2020 at 03:07:54PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 [...]
 Let's say a value is 0xffff0000

 Thread that is looking for 0s reads the lower half of the number ->
 0x????0000

 Context switches, thread 2 writes 0x0000ffff into that memory slot.

 Back at the zero-scanning thread, it reads the second half ->
 0x0000????

 Combines with the first part, and now it sees a 0 where there isn't
 one.
[...] Hmmm very good point, hadn't thought of that. :-/ Oh well, I'll just think of another way of parallelizing my code. I think I can break the work up into large enough chunks that the overhead of using traditional synchronization primitives will be insignificant.
Once again the asinine response "all inter-thread communication must be done by a handshake in which both threads participate knowingly" turns out to be correct.
Jun 18
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 19.06.20 00:15, H. S. Teoh wrote:
 Oh well, I'll just
 think of another way of parallelizing my code.
I think Steven just says that your reads and writes have to be atomic for the behavior to be defined at language level, so using core.atomic with memory order `raw` may in fact do what you want. (Or not, I am not sure what your full use case is.) I don't think in terms of generated assembly this will differ from what you had in mind as aligned word stores and loads are atomic on x86.
Jun 18
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jun 19, 2020 at 06:00:32AM +0200, Timon Gehr via Digitalmars-d wrote:
 On 19.06.20 00:15, H. S. Teoh wrote:
 Oh well, I'll just think of another way of parallelizing my code.
I think Steven just says that your reads and writes have to be atomic for the behavior to be defined at language level, so using core.atomic with memory order `raw` may in fact do what you want. (Or not, I am not sure what your full use case is.) I don't think in terms of generated assembly this will differ from what you had in mind as aligned word stores and loads are atomic on x86.
What I meant is that I'm thinking of optimizing at a higher level, to arrange it in such a way that I don't need to lock. E.g., if scanning for zeroes can't easily multithread with updating the array, perhaps the better way to do it is to let them run sequentially, but multithread across multiple instances of the computation instead. I did some experiments along this line, and I've been finding that I'm able to boost performance by 2x by multithreading at a higher level in the code, but this comes at a significant overhead for small problem sizes. I probably need to apply some kind of heuristic to switch between plain ole single-threaded code (for small problems where the overhead of parallel computations outweigh the benefits -- I'm seeing 2x slowdown for smaller problem sets), and the parallel version for larger problems where the overhead of multithreading is dwarfed by the overall performance gain. T -- "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next. -- (Stolen from the net)
Jun 18
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/19/20 1:54 AM, H. S. Teoh wrote:
 What I meant is that I'm thinking of optimizing at a higher level, to
 arrange it in such a way that I don't need to lock. E.g., if scanning
 for zeroes can't easily multithread with updating the array, perhaps the
 better way to do it is to let them run sequentially, but multithread
 across multiple instances of the computation instead.
Atomic reads/writes are not as heavy as locks. Though certainly if there are better ways to organize the code, that might be more fruitful. -Steve
Jun 19
prev sibling next sibling parent claptrap <clap trap.com> writes:
On Thursday, 18 June 2020 at 16:42:15 UTC, H. S. Teoh wrote:
 Suppose I have an int[] which may contain some zeroes, and 2 
 functions, one scans the array for zeroes but does not modify 
 it, and the other modifies array elements but never introduces 
 new zeroes (though it may write a 0 to an existing 0).  Is it 
 thread-safe to run the two functions in parallel without any 
 synchronization between them?

 I.e., will the first function always see zeroes where there are 
 zeroes, in spite of the 2nd function writing to the array 
 simultaneously?  Are there any hardware situations where the 
 1st function may read a non-zero value if the 2nd function is 
 simultaneously overwriting an existing zero with another zero?  
 Or a situation where the 1st function may read a zero if the 
 2nd function is simultaneously overwriting a non-zero value 
 with another non-zero value?

 Or am I playing with fire here?
If you're on x86 all reads and writes are atomic if they are naturally aligned. IE 16 bits on 2 byte boundry, 32 bits on 4 byte boundry, 64 bits on 64 bit boundry. (Dont think it applies to 128 bit regs) Short version is you're fine as long as your ints are on 4 byte boundrys. If you want unaligned atomicity you need a lock prefix.
Jun 18
prev sibling next sibling parent claptrap <clap trap.com> writes:
On Thursday, 18 June 2020 at 16:42:15 UTC, H. S. Teoh wrote:
 Suppose I have an int[] which may contain some zeroes, and 2 
 functions, one scans the array for zeroes but does not modify 
 it, and the other modifies array elements but never introduces 
 new zeroes (though it may write a 0 to an existing 0).  Is it 
 thread-safe to run the two functions in parallel without any 
 synchronization between them?
Found the relevant doc http://www.cs.cmu.edu/~410-f10/doc/Intel_Reordering_318147.pdf
Jun 18
prev sibling parent aliak <something something.com> writes:
On Thursday, 18 June 2020 at 16:42:15 UTC, H. S. Teoh wrote:
 Suppose I have an int[] which may contain some zeroes, and 2 
 functions, one scans the array for zeroes but does not modify 
 it, and the other modifies array elements but never introduces 
 new zeroes (though it may write a 0 to an existing 0).  Is it 
 thread-safe to run the two functions in parallel without any 
 synchronization between them?

 I.e., will the first function always see zeroes where there are 
 zeroes, in spite of the 2nd function writing to the array 
 simultaneously?  Are there any hardware situations where the 
 1st function may read a non-zero value if the 2nd function is 
 simultaneously overwriting an existing zero with another zero?  
 Or a situation where the 1st function may read a zero if the 
 2nd function is simultaneously overwriting a non-zero value 
 with another non-zero value?

 Or am I playing with fire here?


 T
If there's no alignment hanky panky (e.g. struct packing), and you're on x86-64, you have your guarantee - as reading aligned words are atomic. If you want portability then there's no guarantee. Steven's case could certainly happen if a value crosses a word boundary on a platform that doesn't guarantee atomic reads/writes. I was confirming things and found this which was a fun little read: https://preshing.com/20130618/atomic-vs-non-atomic-operations/
Jun 18