digitalmars.D - atomic operations compared to c++

gzp <z.p.gaal.devel gmail.com> writes:
I'm trying to port a simple lock-free algorithm to D, and as
the docs are quite minimal I'm a little stuck.

The memory orders seem to be OK:
MemoryOrder.acq -> C++ acquire
MemoryOrder.rel -> C++ release
MemoryOrder.raw -> C++ relaxed
MemoryOrder.seq -> C++ seq_cst or acq_rel (the strongest)
There is no consume in D.

But what about compare_exchange (CAS)? In C++ one has to
provide a memory ordering for success and for failure, but not in D.
Does that mean it is always the strongest, sequentially consistent
ordering, or that some explicit fence has to be provided? Or is the
difference that CAS in D does not update the expected value, and in
C++ the ordering is used for this update? Do I therefore have to add
an explicit fence on success in the usual spin loop?


import core.atomic;

shared ubyte flags_;   // assumed declaration; update() is defined elsewhere

ubyte flagsNow, newFlags;
do {
   flagsNow = atomicLoad!( MemoryOrder.acq )( flags_ );
   newFlags = update( flagsNow );
} while( !cas( &flags_, flagsNow, newFlags ) );
// do I need a fence here?
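
For comparison, here is a minimal C++11 sketch of the same kind of spin loop. It illustrates the two points the question turns on: compare_exchange takes separate success and failure orderings, and on failure it writes the observed value back into the expected variable, so no explicit reload is needed. The names flags, update, update_flags and demo_update_flags are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared flag byte, standing in for flags_ in the D snippet.
std::atomic<std::uint8_t> flags{0};

// Hypothetical update function: sets the lowest bit.
std::uint8_t update(std::uint8_t f) {
    return static_cast<std::uint8_t>(f | 0x01u);
}

// Applies update() atomically and returns the value that was stored.
std::uint8_t update_flags() {
    std::uint8_t expected = flags.load(std::memory_order_relaxed);
    std::uint8_t desired;
    do {
        desired = update(expected);
        // On failure, compare_exchange_weak stores the value it observed
        // into `expected`, so the loop recomputes without a separate load.
    } while (!flags.compare_exchange_weak(expected, desired,
                                          std::memory_order_acq_rel,    // success
                                          std::memory_order_relaxed));  // failure
    return desired;
}

// Usage: starting from 0, one call sets the lowest bit.
void demo_update_flags() {
    flags.store(0);
    assert(update_flags() == 0x01);
    assert(flags.load() == 0x01);
}
```

With acq_rel on success, the sketch needs no additional fence after the loop; whether the same holds for a given D implementation depends on the ordering its cas uses.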


Another issue is the fence. In D there is no memory ordering parameter for
the fence; only the strongest one exists. Is that intentional? (Not that
I have ever used explicit barriers apart from the ones included
in the atomic operations themselves :) )

Thanks: Gzp
Jun 12
Kagamin <spam here.lot> writes:
LDC uses seq_cst for both the success and the failure ordering.
Jun 13
gzp <z.p.gaal.devel gmail.com> writes:
After digging into the source, it seems to me that D is lacking
a "standardized" atomic library. It has some basic concepts, but it is
far behind the C++ standard.
I don't know if there are any RFCs on this topic, but it requires
a lot of work. Just to mention a few points from my first experience:

cas
In every API I've seen, the current value is retrieved on a
failed swap
(in C/C++ there are intrinsics for this)

exchange
There is no API for it, and it is not implementable without spinning
(in C/C++ there are intrinsics for this)

atomicFence
No memory ordering is considered in the API.
Even though the current implementation falls back to the
strongest/slowest one, the ordering should be part of the API.
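
To make the exchange point concrete, here is a rough C++ sketch contrasting an unconditional swap emulated with a CAS retry loop against the native std::atomic exchange. The function names exchange_via_cas, exchange_native and demo_exchange are hypothetical, chosen only for this illustration.

```cpp
#include <atomic>
#include <cassert>

// Emulates an unconditional swap with a CAS loop, which is what remains
// when a library offers compare-and-swap but no exchange primitive.
int exchange_via_cas(std::atomic<int>& target, int desired) {
    int observed = target.load(std::memory_order_relaxed);
    // Retry until no other thread changed `target` between read and swap.
    while (!target.compare_exchange_weak(observed, desired,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
        // `observed` now holds the latest value; just try again.
    }
    return observed;
}

// The native form: one read-modify-write, no retry loop in user code.
int exchange_native(std::atomic<int>& target, int desired) {
    return target.exchange(desired, std::memory_order_acq_rel);
}

// Usage: both variants return the previous value.
void demo_exchange() {
    std::atomic<int> a{1};
    assert(exchange_via_cas(a, 2) == 1);
    assert(exchange_native(a, 3) == 2);
    assert(a.load() == 3);
}
```

Both functions return the same result; the difference is that the emulated version may loop under contention, while the native one is a single atomic operation.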

If D wants to be a real systems programming language (e.g. a
replacement for C++), please address these issues. I'm not an
expert on the subject, but D seems to be at a pre-C++11 stage, where
compiler/memory barriers and atomics had to be implemented
differently for each platform and the programmer could only hope
that the compiler wouldn't f*ck up everything during optimization.

I don't know if the D compiler is aware of the fences and won't move
instructions out of/into guarded areas.

Thanks: gzp
Jun 14
rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2017 11:40 AM, gzp wrote:
 [snip]
Please create an issue here: issues.dlang.org for druntime atomic support. Clearly the requirements that we have been working under are not up to your expectations (or needs).
Jun 14
Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 2017-06-14 at 12:28 +0100, rikki cattermole via Digitalmars-d
wrote:
 On 14/06/2017 11:40 AM, gzp wrote:
 [snip]
Step back a moment. C++ and Java are trying to stop programmers using these features, in favour of higher-level abstractions. In C++ and Java, features such as the above are often required to implement the higher-level abstractions, precisely so that other programmers do not have to use them.

D can do high-level parallel and concurrent programming using a more actor-style model (processes and channels) or data parallelism (tasks on a thread pool), which C++11 didn't have but C++17 has (I believe). That potentially means there is no reason to slavishly follow other languages in providing features that are not needed. So what is it that requires D to have CAS, exchange, and atomicFence? This proposal to introduce them needs to be driven by showing what C++ can do at the application level that D cannot, rather than being tick-box driven via a list of types.

C++ and Java have formal memory models because people use a lot of shared-memory multithreading. If you use actor/dataflow/data parallelism at the application level, then it is entirely feasible to get away without a formal memory model, as long as the actor/dataflow/data parallelism frameworks can be constructed without one. It is, of course, easier to do this if there is a memory model. So the first port of call has to be "does D have a formal memory model?" rather than "does it have CAS, exchange, and fences?".

Oh, and if you can avoid fences you must. Remember, locks, semaphores, mutexes, barriers, and fences are all designed to stop parallelism; they are designed to slow things down. They are needed for implementing operating systems, but unless your application is an operating system in some sort of disguise, you really don't want them in your code.

Investigation may discover that D is missing some of these features, but there needs to be a reason to have them other than "other languages have them".
 I don't know if D compiler is aware of the fences and won't move
 out/in instructions from guarded areas.

 Thanks: gzp
 Please create an issue here: issues.dlang.org for druntime atomic support. Clearly the requirements that we have been working under are not up to your expectations (or needs).
Is an issue the right vehicle for investigating the need for these?

--
Russel.
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Jun 14
rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2017 1:15 PM, Russel Winder via Digitalmars-d wrote:

snip

 I don't know if D compiler is aware of the fences and won't move
 out/in
 instructions from guarded areas.

 Thanks: gzp
 Please create an issue here: issues.dlang.org for druntime atomic support. Clearly the requirements that we have been working under are not up to your expectations (or needs).
 Is an issue the right vehicle for investigating the need for these?
Yes. A N.G. post will be forgotten about quickly, but an issue in the bug tracker can send you updates as things progress. At the end of the day, that module grew organically, it just needs a bit of planning put into it for the future that's all.
Jun 14
Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 2017-06-14 at 13:27 +0100, rikki cattermole via Digitalmars-d
wrote:
 […]
 Yes. A N.G. post will be forgotten about quickly, but an issue in
 the bug tracker can send you updates as things progress.
OK. Feel free to sign me up for the issue.
 At the end of the day, that module grew organically, it just needs a
 bit of planning put into it for the future that's all.
Which module?

--
Russel.
Jun 14
rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2017 1:48 PM, Russel Winder via Digitalmars-d wrote:
 On Wed, 2017-06-14 at 13:27 +0100, rikki cattermole via Digitalmars-d
 wrote:

 […]
 Yes. A N.G. post will be forgotten about quickly, but an issue in
 the
 bug tracker can send you updates as things progress.
 OK. Feel free to sign me up for the issue.
If an issue is created, you can add yourself pretty easily (cc field).
 At the end of the day, that module grew organically, it just needs a
 bit
 of planning put into it for the future that's all.
 Which module?
core.atomic
Jun 14
gzp <z.p.gaal.devel gmail.com> writes:
Actually, I've just found an issue from 2015 (still in NEW state):
https://issues.dlang.org/show_bug.cgi?id=15007
I've updated it and linked this forum thread.
Jun 14
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Wednesday, 14 June 2017 at 12:15:49 UTC, Russel Winder wrote:
 On Wed, 2017-06-14 at 12:28 +0100, rikki cattermole via 
 Digitalmars-d wrote:
 [...]
 Step back a moment. C++ and Java are trying to stop programmers using these features, in favour of using higher level abstractions. In C++ and Java such features as above are often required to implement the higher level abstraction but so as to allow other programmers not to have to use them. [...]
Especially since D officially supports inline assembly. All these low-level constructs are better handled directly at the machine-code level, as their semantics vary significantly between architectures.
Jun 14
Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 2017-06-14 at 10:40 +0000, gzp via Digitalmars-d wrote:
 After digging into the source, it seems to me that D is lacking
 a "standardized" atomic library. It has some basic concepts, but
 it is far behind the C++ standard.
 I don't know if there are any RFCs on this topic, but it requires
 a lot of work. Just to mention a few points from my first experience:

 cas
 In every API I've seen, the current value is retrieved on a
 failed swap
 (in C/C++ there are intrinsics for this)
This appears to be in core.atomic.
 exchange
 There is no API for it, and it is not implementable without spinning
 (in C/C++ there are intrinsics for this)
 atomicFence
 No memory ordering is considered in the API.
 Even though the current implementation falls back to the
 strongest/slowest one, the ordering should be part of the API.
Appears to be in core.atomic.
 If D wants to be a real systems programming language (e.g. a
 replacement for C++), please address these issues. I'm not an
 expert on the subject, but D seems to be at a pre-C++11 stage, where
 compiler/memory barriers and atomics had to be implemented
 differently for each platform and the programmer could only hope
 that the compiler wouldn't f*ck up everything during optimization.

 I don't know if the D compiler is aware of the fences and won't move
 instructions out of/into guarded areas.
I am fairly sure it isn't, but why is this needed if you use a parallelism-oriented approach to the architecture and design? Sorry to repeat, but whilst there are circumstances where this stuff is needed (operating systems), most other applications should be written without the need for locks, mutexes, fences, a memory model, etc. Any need for that stuff should be covered by the frameworks used.

We need to be careful not to bring 1960s views of threads into 2010 programming. Sometimes they are needed, usually they are not.

--
Russel.
Jun 14
gzp <z.p.gaal.devel gmail.com> writes:
 I am fairly sure it isn't, but why is this needed if you use a 
 parallelism oriented approach to the architecture and design? 
 Sorry to repeat but whilst there are circumstances where this 
 stuff is needed (operating systems), most other applications 
 should be written without the need for locks, mutexes, fences, 
 memory model, etc. Any need for that stuff should be covered in
 the frameworks used.

 We need to be careful not to bring 1960s views of threads into 
 2010 programming. Sometimes they are needed, usually they are 
 not.
Atomics are not meant to replace the higher-level abstractions (neither in C++ nor in any other language). They are meant for implementing new higher-level abstraction layers as a library instead of as a language feature. As new parallel algorithms are discovered, they can be (not so) "easily" added to the language through libraries. How would you implement a lock-free list? Or a lock-free multiple-producer, single-consumer queue?

It's good to have a send/receive mechanism in a parallel world, but in my opinion the framework should be a library and not the language itself. And to ease the writing of such libraries, some good, reliable building blocks are required (mutex, atomic, fence, etc.). You are right that these features are not meant to be used much, but they are required to build more general parallel containers, schedulers, algorithms, etc.

Note: Why do I keep mentioning C++11 with respect to atomics? Because some experts spent a lot of time finding a good, stable API for these things.
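
As one possible answer to the "how would you implement it" question, here is a minimal C++ sketch of a lock-free multiple-producer, single-consumer list built from exactly the two primitives discussed in this thread: a CAS loop for push and an unconditional exchange for draining. It is a LIFO stack rather than a FIFO queue, and the names Node, MpscStack and demo_mpsc_stack are hypothetical, not taken from any library.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Multiple-producer, single-consumer LIFO list (illustrative name).
class MpscStack {
public:
    ~MpscStack() { drain(); }  // free any nodes that were never consumed

    // Any thread may push: link the node in front of the current head.
    void push(int value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // On failure, node->next is refreshed with the current head.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // The single consumer detaches the whole list with one exchange,
    // which sidesteps the ABA problem of popping nodes one by one.
    std::vector<int> drain() {
        Node* node = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<int> values;
        while (node != nullptr) {
            values.push_back(node->value);
            Node* next = node->next;
            delete node;
            node = next;
        }
        return values;
    }

private:
    std::atomic<Node*> head_{nullptr};
};

// Usage: values come back newest first.
void demo_mpsc_stack() {
    MpscStack stack;
    stack.push(1);
    stack.push(2);
    std::vector<int> values = stack.drain();
    assert(values.size() == 2 && values[0] == 2 && values[1] == 1);
}
```

The release ordering on a successful push pairs with the acquire ordering on the drain, so the consumer sees fully initialised nodes. Without an exchange primitive, drain would itself need a CAS loop.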
Jun 14
David Nadlinger <code klickverbot.at> writes:
On Wednesday, 14 June 2017 at 12:48:14 UTC, Russel Winder wrote:
 On Wed, 2017-06-14 at 10:40 +0000, gzp via Digitalmars-d wrote:
 […]
 cas
 in all api I've seen on a failed swap, the current value is
 retrieved
 (in c/c++ there are intrinsic for them)
 This appears to be in core.atomic.
There is a misunderstanding here. cas() is in core.atomic, but it returns true/false rather than the read value. However, this is just fine for virtually all algorithms. In fact, the respective <atomic> functions in C++11 also return a boolean result.
 exchange
 no api for it and not implementable without spinning
 (in c/c++ there are intrinsic for them)
 atomicFence
 No memory ordering is considered in the API
 Even though it falls back to the strongest/slowest one for the
 current implementation it should be part of the API.
 Appears to be in core.atomic.
Where exactly would that be? There is no unconditional swap/xchg in core.atomic, and atomicFence() indeed only supports sequentially consistent semantics. — David
Jun 14
Adrian Matoga <dlang.spam matoga.info> writes:
On Tuesday, 13 June 2017 at 06:12:46 UTC, gzp wrote:
 (...)
 There is no consume in D.
What do you currently use it for in C++? Its use is temporarily discouraged in C++17.
Jun 14
Guillaume Piolat <first.last gmail.com> writes:
On Tuesday, 13 June 2017 at 06:12:46 UTC, gzp wrote:
 But what about compare_exchange (CAS)? In C++ one has to
 provide a memory ordering for success and for failure, but not in D.
I already have some difficulty comprehending MemoryOrder.rel and MemoryOrder.acq. A cas with MemoryOrder.raw wouldn't be very useful.
 Does that mean it is always the strongest, sequentially consistent
 ordering, or that some explicit fence has to be provided?
It uses lock xchg:
https://github.com/dlang/druntime/blob/ce0f089fec56f7ff5b1df689f5c81256218e415b/src/core/atomic.d#L769
So no additional fences are needed; it is already the strongest, IIRC. IMHO, if a CAS required additional memory barriers, it would be a bit useless.
Jun 14
David Nadlinger <code klickverbot.at> writes:
Hi,

On Tuesday, 13 June 2017 at 06:12:46 UTC, gzp wrote:
 the docs are quite minimal
That's true. In fact, this applies not only to atomic intrinsics, but all of `shared`. We need to sit down and properly specify things at some point. Andrei has been trying to get an initiative going to do just that recently.
 There is no consume in D.
There is indeed no equivalent to memory_order_consume. Note, however, that consume is about to be deprecated in C/C++, as it turned out to be more or less unimplementable in its current form (at least while still being useful). Introducing the notion of source-level dependencies into a language that otherwise operates with as-if semantics on an abstract machine is a tricky business.
 But what about compare_exchange (CAS)? […] Does that mean
 it is always the strongest, sequentially consistent ordering
Yes, core.atomic.cas() is always seq_cst for the time being (we should fix this).
 Another issue is the fence. In D there is no memory ordering parameter for
 the fence; only the strongest one exists. Is that intentional?
No; it is just a questionable design decision/unnecessary limitation which can easily be remedied by adding an optional parameter. — David
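
What an ordering parameter on the fence buys can be seen in C++11, where std::atomic_thread_fence takes a memory order. Below is a minimal sketch of the release/acquire fence pairing around relaxed operations. The names payload, ready, publish, try_consume and demo_fence are hypothetical, and the demo runs on one thread, so it only shows the call shape rather than proving cross-thread ordering.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // publication flag

// Producer side: the release fence orders the payload write before the
// relaxed store that publishes it.
void publish(int value) {
    payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Consumer side: once the flag is seen, the acquire fence makes the
// payload write visible. Returns false if nothing was published yet.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_relaxed)) {
        return false;
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}

// Usage (single-threaded, for illustration only).
void demo_fence() {
    int out = 0;
    publish(42);
    assert(try_consume(out) && out == 42);
}
```

A fence that only supports sequentially consistent semantics would still be correct here, but on weakly ordered hardware it may cost more than the release and acquire fences the pattern actually needs.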
Jun 14