digitalmars.D - Word Tearing: Still a practical problem?
- dsimcha <dsimcha yahoo.com> Mar 21 2011
- bearophile <bearophileHUGS lycos.com> Mar 21 2011
- dsimcha <dsimcha yahoo.com> Mar 21 2011
- "nedbrek" <nedbrek yahoo.com> Mar 21 2011
- dsimcha <dsimcha yahoo.com> Mar 21 2011
- "Nick Sabalausky" <a a.a> Mar 21 2011
- %u <wfunction hotmail.com> Mar 21 2011
A few posts deep in the discussion on std.parallelism have prompted me to double-check an assumption that I made previously. Is writing to adjacent but non-overlapping memory addresses concurrently from different threads safe on all hardware we care about supporting?

I know this isn't safe on some DS9K-like architectures that we don't care about, like old DEC Alphas. This is because the hardware doesn't allow addressing of single bytes.

I'm also aware of the performance implications of false sharing, but this is not of concern because, for the cases where adjacent memory addresses are written to concurrently in std.parallelism or its examples, these are only a tiny fraction of writes and would not have a significant impact on performance.

I'm also aware that the compiler could in theory generate instructions to perform writes at a higher granularity than what's specified by the source code, but I imagine this is a purely theoretical concern, as I can't see any reason why it would in practice. IMHO if this is already the way it works in practice, it should be formally specified by D's memory model.
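To make the concern concrete, here is a minimal sketch of how a byte write can be lost when the only available store writes a whole word. It is a single-threaded model of one unlucky interleaving, not a measurement of real hardware; the 32-bit word size and the names `merge_byte`, `torn_interleaving` and `serialized` are assumptions made up for illustration.

```python
# Illustrative model only: one 32-bit "memory word" holding four bytes, on a
# hypothetical machine whose only store instruction writes a whole word.

WORD_MASK = 0xFFFFFFFF


def merge_byte(word, byte_offset, value):
    """Return `word` with the byte at `byte_offset` (0..3) replaced by `value`."""
    shift = 8 * byte_offset
    cleared = word & ~(0xFF << shift) & WORD_MASK
    return cleared | ((value & 0xFF) << shift)


def torn_interleaving():
    """Thread A writes byte 0, thread B writes byte 1, with overlapping load/store."""
    memory = [0x00000000]
    a_copy = memory[0]                        # A loads the word
    b_copy = memory[0]                        # B loads the same word before A stores
    memory[0] = merge_byte(a_copy, 0, 0xAA)   # A stores its merged word
    memory[0] = merge_byte(b_copy, 1, 0xBB)   # B stores a stale copy; A's byte is gone
    return memory[0]


def serialized():
    """Same two writes, but each load/merge/store completes before the next begins."""
    memory = [0x00000000]
    memory[0] = merge_byte(memory[0], 0, 0xAA)
    memory[0] = merge_byte(memory[0], 1, 0xBB)
    return memory[0]


if __name__ == "__main__":
    print(hex(torn_interleaving()))
    print(hex(serialized()))
```

In the torn case neither thread touched the other's byte in source terms, yet one update disappears. On byte-granular hardware the store itself only touches the addressed byte, so this interleaving cannot arise from the hardware alone.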
Mar 21 2011
dsimcha: Is writing to adjacent but non-overlapping memory addresses concurrently from different threads safe on all hardware we care about supporting?
Aren't some problems caused by writing on the same cache line? Bye, bearophile
Mar 21 2011
== Quote from bearophile (bearophileHUGS lycos.com)'s article
dsimcha: Is writing to adjacent but non-overlapping memory addresses concurrently from different threads safe on all hardware we care about supporting?
Aren't some problems caused by writing on the same cache line? Bye, bearophile
I think you're referring to false sharing. If so, this is only a performance problem, not a correctness problem. If not, please elaborate. Also, on x86, cache coherency circuitry makes the cache much more transparent than on some architectures. I'm not so sure about others.
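For anyone unsure what false sharing means here, a rough sketch of the arithmetic: two addresses contend when they fall in the same cache line, even though the bytes written never overlap. The 64-byte line size and the helper names are assumptions for illustration, not anything taken from std.parallelism.

```python
# Assumption: 64-byte cache lines. Common on x86, but not universal.
CACHE_LINE_BYTES = 64


def cache_line(address, line_bytes=CACHE_LINE_BYTES):
    """Index of the cache line that contains `address`."""
    return address // line_bytes


def share_line(addr_a, addr_b, line_bytes=CACHE_LINE_BYTES):
    """True if both addresses live in the same cache line (false-sharing candidates)."""
    return cache_line(addr_a, line_bytes) == cache_line(addr_b, line_bytes)


def padded_stride(element_bytes, line_bytes=CACHE_LINE_BYTES):
    """Smallest multiple of `line_bytes` that is >= `element_bytes`.

    Spacing per-thread slots by this stride, from a line-aligned base,
    keeps each slot in its own cache line.
    """
    return -(-element_bytes // line_bytes) * line_bytes


if __name__ == "__main__":
    # Two adjacent 8-byte slots at offsets 0 and 8 share a line.
    print(share_line(0, 8))
    # Slots spaced one padded stride apart do not.
    print(share_line(0, padded_stride(8)))
```

Both layouts give the same results; the padded one only avoids the two cores repeatedly invalidating each other's copy of the line.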
Mar 21 2011
Hello all,
"dsimcha" <dsimcha yahoo.com> wrote in message news:im8d3b$j78$1 digitalmars.com...
A few posts deep in the discussion on std.parallelism have prompted me to double-check an assumption that I made previously. Is writing to adjacent but non-overlapping memory addresses concurrently from different threads safe on all hardware we care about supporting? I know this isn't safe on some DS9K-like architectures that we don't care about, like old DEC Alphas. This is because the hardware doesn't allow addressing of single bytes. I'm also aware of the performance implications of false sharing, but this is not of concern because, for the cases where adjacent memory addresses are written to concurrently in std.parallelism or its examples, these are only a tiny fraction of writes and would not have a significant impact on performance.
The main architectures (x86 and ARM) are both byte granular. Most embedded platforms are also byte granular. Alpha is the only architecture I am aware of that had this problem. Possibly other old/high performance ones... (Cray, 360, etc.) Ned
Mar 21 2011
On 3/21/2011 7:55 PM, nedbrek wrote:
Hello all,
"dsimcha" <dsimcha yahoo.com> wrote in message news:im8d3b$j78$1 digitalmars.com...
A few posts deep in the discussion on std.parallelism have prompted me to double-check an assumption that I made previously. Is writing to adjacent but non-overlapping memory addresses concurrently from different threads safe on all hardware we care about supporting? I know this isn't safe on some DS9K-like architectures that we don't care about, like old DEC Alphas. This is because the hardware doesn't allow addressing of single bytes. I'm also aware of the performance implications of false sharing, but this is not of concern because, for the cases where adjacent memory addresses are written to concurrently in std.parallelism or its examples, these are only a tiny fraction of writes and would not have a significant impact on performance.
The main architectures (x86 and ARM) are both byte granular. Most embedded platforms are also byte granular. Alpha is the only architecture I am aware of that had this problem. Possibly other old/high performance ones... (Cray, 360, etc.) Ned
Excellent. I highly doubt we care about std.parallelism working on embedded platforms. (Who the heck has a multicore embedded CPU anyway?) My only other concern is that the compiler could in theory do strange things that effectively increase granularity in some cases. I doubt any would in practice. I'd feel much better if I had some official-looking documentation, or at least assurance from Walter that DMD doesn't. Better yet would be assurance from a compiler expert (i.e. Walter) that all sanely implemented compilers for byte-granular hardware don't increase memory granularity in practice, even if they don't officially guarantee it.
Mar 21 2011
"dsimcha" <dsimcha yahoo.com> wrote in message news:im8pu5$1921$1 digitalmars.com...
On 3/21/2011 7:55 PM, nedbrek wrote:
The main architectures (x86 and ARM) are both byte granular. Most embedded platforms are also byte granular. Alpha is the only architecture I am aware of that had this problem. Possibly other old/high performance ones... (Cray, 360, etc.)
Excellent. I highly doubt we care about std.parallelism working on embedded platforms. (Who the heck has a multicore embedded CPU anyway?)
Parallax's Propeller microcontroller has 8 cores. But it's so low-memory that I doubt D would be appropriate for it. Someone did manage to make a C compiler for it, but even that involved some compromises (although not as many as the Propeller's built-in SPIN language).
Mar 21 2011
Excellent. I highly doubt we care about std.parallelism working on embedded platforms. (Who the heck has a multicore embedded CPU anyway?)
I KNOW!! 64k ought to be enough for anybody, right?
Mar 21 2011








