
digitalmars.D - Dare I ... another volatile discussion ?

"Jens Bauer" <doctor who.no> writes:
I'm sorry for opening such a topic; I've heard it's not liked a 
lot, but I think it might be necessary.

I'm not asking for a 'volatile' keyword, but rather to find out 
what the right thing to use is.
After reading a few different threads related to 
microcontrollers, I started wondering how to program the 
following in D:

1: System-level drivers, which write directly to hardware
registers (any architecture, PC/i386 [Linux, Windows, others],
Atari ST, IBM BladeCenter, etc.)
2: Interrupts that need to share variables.

1) is what we basically need on microcontrollers. If it's 
possible to write a driver in D, which has no problems with 
accessing hardware, then it should be possible to do it for any 
microcontroller as well.

2) Shared variables could be used for interrupts, but what
happens if TLS is disabled; will shared variables still work ?
-Interrupts are not threads.

Regarding (1): marking a variable 'shared' is not enough (it
allows instructions to be moved around), so Johannes already
made a volatileLoad and volatileStore, which will be usable for
microcontrollers, though making them convenient to use requires
writing additional code.
-But I do not know whether this solution would work when
writing a driver for Debian running on an i586 platform or a
PowerMac G3, for instance.

If variables 'alice' and 'bob' are both shared, and I read from
'bob' and then write to 'alice', could the instructions be moved
around, so that the read from 'bob' actually occurs after the
write to 'alice' ?

Imagine that in the microcontroller world, there's a *bunch* of
hardware registers.
What a microcontroller does best is read/write hardware
registers; often only a few bits at a time, but definitely also
8-, 16- and 32-bit values.

Thus you will most likely not find a single piece of
microcontroller source code that does not access hardware.
This means: hardware is accessed often, so reading/writing
hardware registers should not take up a lot of space in the
source code.
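To illustrate those "few bits at a time" accesses, here is a minimal sketch in C (the language of the post's own example) of read-modify-write helpers on a volatile 32-bit register. The helper names are hypothetical, and in the test a plain variable stands in for the register so it can run on a host.

```c
#include <stdint.h>

/* Hypothetical helpers for "a few bits at a time" register access.
 * Each helper does exactly one volatile read followed by one volatile
 * write (a read-modify-write). That is NOT atomic: an interrupt that
 * modifies the same register in between can have its change overwritten. */

static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg = *reg | mask;
}

static inline void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg = *reg & ~mask;
}

/* Replace the field selected by 'mask' (whose lowest bit is 'shift'). */
static inline void reg_write_field(volatile uint32_t *reg, uint32_t mask,
                                   unsigned shift, uint32_t value)
{
    uint32_t v = *reg;                         /* single volatile read  */
    v = (v & ~mask) | ((value << shift) & mask);
    *reg = v;                                  /* single volatile write */
}

static inline uint32_t reg_read_field(const volatile uint32_t *reg,
                                      uint32_t mask, unsigned shift)
{
    return (*reg & mask) >> shift;
}
```

On a microcontroller, `reg` would be the address of a real register, for instance `&LPC_SC->PLL0CON` from the example code in this post.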


For those who are unfamiliar with such code, here's a silly
example in C of how it might look.
The following code changes the clock-frequency of the 
microcontroller, so it runs at 100 MHz:

---8<-----8<-----8<-----
uint32_t setupCoreClock()
{
	/* By default, we run at 100MHz, however,
	 * our PLL clock frequency is 300 MHz.
	 * We'd like to run 400MHz! */

	LPC_SC->SCS = SCS_MainOscillatorEnable;	/* enable Main Oscillator */
	if(LPC_SC->SCS & SCS_MainOscillatorEnable)
	{
		while(0 == (LPC_SC->SCS & SCS_MainOscillatorStatus)){}
	}

	LPC_SC->CCLKCFG = CCLK_Divider - 1;

	LPC_SC->PCLKSEL0 = 0;
	LPC_SC->PCLKSEL1 = 0;

	LPC_SC->CLKSRCSEL = PLL0_ClockSource;

	LPC_SC->PLL0CFG = PLL0_Configuration;
	LPC_SC->PLL0FEED = 0xaa;
	LPC_SC->PLL0FEED = 0x55;

	LPC_SC->PLL0CON = PLL_Enable;
	LPC_SC->PLL0FEED = 0xaa;
	LPC_SC->PLL0FEED = 0x55;
	while(!(LPC_SC->PLL0STAT & PLL0_Lock)){}	/* Wait for PLOCK0 */

	LPC_SC->PLL0CON = PLL_Enable | PLL_Connect;
	LPC_SC->PLL0FEED = 0xaa;
	LPC_SC->PLL0FEED = 0x55;
	while((LPC_SC->PLL0STAT & (PLL0_EnabledFlag | PLL0_ConnectedFlag))
	      != (PLL0_EnabledFlag | PLL0_ConnectedFlag)){}	/* Wait until connected */

	return(F_CCLK);
}
--->8----->8----->8-----

Everything above that starts with 'LPC_' is a hardware register
access.
Not counting the curly braces, nearly every line of the code
accesses hardware registers.

Things like LPC_SC are pointers to structures whose members are
almost all declared volatile; some registers prohibit reading,
some prohibit writing, some allow both, and some allow neither.
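A minimal sketch of such a register block, in the style of ARM's CMSIS headers (which spell the qualifiers __IO, __I and __O). The peripheral name, its members and the static stand-in object are hypothetical, chosen so the sketch compiles and runs on a host. Note that in C "write-only" can only be a comment: the compiler rejects writes to a const member, but nothing stops a read of FEED.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral register block. CMSIS writes the qualifiers as
 * __IO, __I and __O; they expand to roughly the plain qualifiers used here. */
typedef struct {
    volatile uint32_t       CTRL;    /* 0x00: read/write (__IO)               */
    volatile const uint32_t STATUS;  /* 0x04: read-only (__I), writes rejected */
    volatile uint32_t       FEED;    /* 0x08: write-only (__O), by convention  */
} DEMO_TypeDef;

/* On a real part this would be a cast of the peripheral's base address,
 * e.g. ((DEMO_TypeDef *)0x40000000UL) with a made-up address. A static
 * object stands in here so the sketch runs on a host. */
static DEMO_TypeDef demo_regs;
#define DEMO (&demo_regs)

/* Because the member is volatile, both feed writes are emitted, in this
 * order; for ordinary RAM the compiler could keep only the last store. */
static void demo_feed(void)
{
    DEMO->FEED = 0xAAu;
    DEMO->FEED = 0x55u;
}
```

The two-store feed sequence mirrors the PLL0FEED writes in the example above, where dropping the first store would break the unlock sequence.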

Most code for ARM Cortex-M looks like the above, because the
silicon vendors just modify the stock (template) source code
they receive from ARM. Thus there's some convenience, but not a
lot. Bitfields are not provided by ARM, so this propagates down
to the vendors, which do not supply bitfields either.
However, byte/halfword/word (8/16/32-bit) access is sometimes
provided through anonymous unions, which is very helpful.
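The anonymous-union trick can be sketched like this in C11. The register name is hypothetical; the point is that one 32-bit location gets word, halfword and byte views, so the compiler emits an access of the matching width.

```c
#include <stdint.h>

/* Hypothetical data register offering word, halfword and byte views of the
 * same 32-bit location through a C11 anonymous union. Which byte BYTE[0]
 * names depends on endianness (Cortex-M parts are normally little-endian). */
typedef struct {
    union {
        volatile uint32_t WORD;     /* 32-bit access */
        volatile uint16_t HALF[2];  /* 16-bit access */
        volatile uint8_t  BYTE[4];  /*  8-bit access */
    };
} DATA_TypeDef;
```

With a pointer `p` to such a register, `p->BYTE[0] = c;` performs a single byte store, which matters for peripherals that react differently to 8-bit and 32-bit bus accesses.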

So back to what originally triggered this post; this is what the 
question is actually about:
If writing a driver for <platform X>, how is reading/writing 
hardware registers usually done in D ?
May 07 2015
Johannes Pfau <nospam example.com> writes:
On Thu, 07 May 2015 16:04:55 +0000
"Jens Bauer" <doctor who.no> wrote:

 I'm sorry for opening such a topic; I've heard it's not liked a 
 lot, but I think it might be necessary.
 
 I'm not asking for a 'volatile' keyword, but rather to find out 
 what the right thing to use is.
 After reading a few different threads related to 
 microcontrollers, I started wondering how to program the 
 following in D:
 
 1: System level drivers, which writes directly to hardware 
 registers (any architecture, PC/i386 [Linux, Windows, others], 
 Atari ST, IBM BladeCenter, etc.)
 2: Interrupts that needs to share variables.
 
 1) is what we basically need on microcontrollers. If it's 
 possible to write a driver in D, which has no problems with 
 accessing hardware, then it should be possible to do it for any 
 microcontroller as well.
 
 2) shared variables could be used for interrupts, but what 
 happens if the TLS is disabled; will shared variables work ? 
 -Interrupts are not threads.
 
 Regarding (1), because marking a variable 'shared' is not enough 
 (it allows instructions to be moved around), Johannes already 
 made a volatileLoad and volatileStore, which will be usable for 
 microcontrollers, though for convenience, it requires writing 
 additional code.
1) Actually, these were proposed for DMD some time before I implemented the GDC part ;-)

Shared should be avoided for volatile data for now. It might make sense if you want to synchronize accesses from threads to a volatile memory location, but then things might get complicated.

2) This can be done with volatileLoad/Store as well: just access a global variable using volatileLoad/Store. This also works nicely in a wrapper. If you want to enforce atomic access to volatile variables (bigger than the word size), you might need shared(Volatile!T). Again, things might get complicated ;-)
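A rough host-side analogue of point 2 in C: a signal handler, like an interrupt, is not a thread, and standard C only guarantees that a handler may write to an object of type `volatile sig_atomic_t`. This is a sketch, not D code; the function names are hypothetical and SIGINT merely plays the role of the interrupt.

```c
#include <signal.h>

/* Flag shared between the "ISR" (signal handler) and the main loop. */
static volatile sig_atomic_t event_pending = 0;

static void on_event(int sig)       /* plays the role of the ISR */
{
    (void)sig;
    event_pending = 1;              /* volatile store */
}

/* Main-loop side: consume the flag if it is set. */
static int take_event(void)
{
    if (event_pending) {            /* volatile load: re-read on every call */
        event_pending = 0;
        return 1;
    }
    return 0;
}

static int install_handler(void)
{
    return signal(SIGINT, on_event) == SIG_ERR ? -1 : 0;
}
```

The check-then-clear in `take_event` is not atomic: an event arriving between the read and the clear is lost. Volatile alone only guarantees that each access happens; anything that must be indivisible needs more, which is where the "things might get complicated" remark applies.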
 If variable 'alice' and variable 'bob' are both shared, and 
 reading from 'bob', then writing to 'alice'; would instructions 
 be moved around, so reading from 'bob' could actually occur after 
 writing to 'alice' ?
Not sure about shared (I don't think anybody knows what exactly shared is supposed to do) but if you use volatileLoad/Store these instructions won't be moved around.
May 07 2015
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 7 May 2015 at 20:18, Johannes Pfau via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 If variable 'alice' and variable 'bob' are both shared, and
 reading from 'bob', then writing to 'alice'; would instructions
 be moved around, so reading from 'bob' could actually occur after
 writing to 'alice' ?
 Not sure about shared (I don't think anybody knows what exactly shared is supposed to do) but if you use volatileLoad/Store these instructions won't be moved around.
I've changed my mind over the years, so who knows whether or not I'll change it again, but this is the current definition I give to it.

Shared data on an ABI level is exactly the same as __gshared, but that's where the similarities end. On a semantic level, shared comes with a transitive and statically type-checked guarantee that either taking its address or passing its reference around stays within the bounds of its given qualifier, so everything its address passes through must also be shared. To make it an even stronger type, non-shared data cannot be implicitly promoted to shared. Of course, explicit casts allow you to demote and promote the 'shared' qualifier as much as you like, circumventing all guarantees.

When used properly though, its properties make it a prime candidate for the foundation of libraries/programs that centre around the use of atomics. However, shared is not to be confused with a thread-safe atomic type. Thread-safety is left to the end-user on such matters of access.

Regards
Iain.
May 07 2015
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 7 May 2015 at 23:39, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 When used properly though, it's properties make it a prime candidate
 for the foundation of libraries/programs that centre around the use of
 atomics.  However, shared is not to be confused with a thread-safe
 atomic type.  Thread-safety is left to the end-user on such matters of
 access.
That last sentence should also mention that memory ordering, too, is left to the end-user.
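What "memory ordering is left to the end-user" means can be sketched with C11 atomics, where every atomic access names its ordering explicitly. This is a C illustration, not D's shared; the mailbox and its function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-shot mailbox between two execution contexts. The atomic
 * flag makes each access to 'ready' indivisible, but how the plain store to
 * 'payload' is ordered relative to it is chosen explicitly below. */
static int payload;
static atomic_bool ready = false;

static void publish(int value)
{
    payload = value;
    /* release: the payload store above cannot be reordered after this store */
    atomic_store_explicit(&ready, true, memory_order_release);
}

static bool try_consume(int *out)
{
    /* acquire: pairs with the release in publish(), so payload is visible */
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;
        return true;
    }
    return false;
}
```

Had both sides used `memory_order_relaxed`, each access to `ready` would still be atomic, yet a consumer on another core could see `ready == true` before the new `payload`. Atomicity and ordering are separate choices.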
May 07 2015
"Jens Bauer" <doctor who.no> writes:
On Thursday, 7 May 2015 at 21:42:08 UTC, Iain Buclaw wrote:
 On 7 May 2015 at 23:39, Iain Buclaw <ibuclaw gdcproject.org> 
 wrote:
 When used properly though, it's properties make it a prime 
 candidate
 for the foundation of libraries/programs that centre around 
 the use of
 atomics.  However, shared is not to be confused with a 
 thread-safe
 atomic type.  Thread-safety is left to the end-user on such 
 matters of
 access.
 That last sentence should also include a mention of memory ordering too being left to the end-user.
Thank you for explaining this; it does make sense. :)
-So 'shared' is not to be confused with 'atomic'; its behaviour seems closer to C's "extern". Do I understand this correctly ?

One of the reasons I asked the question is that D will no doubt need to write to hardware on every kind of platform, and I'd like it to be done the same way on both microcontrollers and personal computers (not limited to a specific kind of processor).

Another thing, slightly related, is 'atomic'. C's implementation allows you to have a function which reads/modifies/writes a value atomically. This is all implemented 'outside' the C language. Would it make sense to mark a variable 'atomic', or would it be rather crazy, because if it's extern(C) anyway, C wouldn't access it atomically ?
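For reference, since C11 the language itself has an `_Atomic` qualifier, with read-modify-write operations in `<stdatomic.h>`. A minimal sketch with hypothetical variable and function names: one single-operation RMW, and one compare-and-swap retry loop for an operation the library does not provide directly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared state, shown with C11 <stdatomic.h>. */
static _Atomic uint32_t flags = 0;
static _Atomic uint32_t counter = 0;

/* Atomically set bits; returns the previous value (one RMW operation). */
static uint32_t set_flags(uint32_t mask)
{
    return atomic_fetch_or(&flags, mask);
}

/* Saturating increment built from compare-and-swap: retry until no other
 * writer changed 'counter' between our load and our store. */
static uint32_t inc_saturating(uint32_t limit)
{
    uint32_t old = atomic_load(&counter);
    while (old < limit &&
           !atomic_compare_exchange_weak(&counter, &old, old + 1)) {
        /* on failure, 'old' now holds the current value; retry */
    }
    return old;  /* value before the increment (or the saturated value) */
}
```

On cores without exclusive-access instructions (ARMv6-M, e.g. Cortex-M0), the toolchain may have to implement such operations through library calls rather than inline instructions.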
May 09 2015
"Kagamin" <spam here.lot> writes:
On Thursday, 7 May 2015 at 16:04:56 UTC, Jens Bauer wrote:
 Regarding (1), because marking a variable 'shared' is not 
 enough (it allows instructions to be moved around), Johannes 
 already made a volatileLoad and volatileStore, which will be 
 usable for microcontrollers, though for convenience, it 
 requires writing additional code.
 -But this solution ... I do not know if it would work, when 
 writing a driver for Debian running on a i586 platform or 
 PowerMac G3 for instance.
System calls on sufficiently smart processors like x86 follow good C-like ABI practice: they pass data in registers and buffers instead of global variables, because multithreaded programs would have race conditions accessing the global variables. See the read(2) syscall as an example of such an API: http://man7.org/linux/man-pages/man2/read.2.html

On Thursday, 7 May 2015 at 18:18:02 UTC, Johannes Pfau wrote:
 Not sure about shared (I don't think anybody knows what exactly 
 shared is supposed to do)
Shared is supposed to prevent the programmer from accidentally putting unshared data in a shared context. As expected, people wanted it to be a silver bullet for concurrency; instead, std.concurrency provides high-level concurrency safety.
May 09 2015
"Jens Bauer" <doctor who.no> writes:
On Saturday, 9 May 2015 at 12:16:58 UTC, Kagamin wrote:
 On Thursday, 7 May 2015 at 16:04:56 UTC, Jens Bauer wrote:
 Regarding (1), because marking a variable 'shared' is not 
 enough (it allows instructions to be moved around), Johannes 
 already made a volatileLoad and volatileStore, which will be 
 usable for microcontrollers, though for convenience, it 
 requires writing additional code.
 -But this solution ... I do not know if it would work, when 
 writing a driver for Debian running on a i586 platform or 
 PowerMac G3 for instance.
 System calls on sufficiently smart processors like x86 use C-like ABI good practices: registers and buffers to pass data instead of global variables, because multithreaded programs will have race condition on accessing the global variables. See read(2) syscall as an example of such API http://man7.org/linux/man-pages/man2/read.2.html
To make my question a little clearer: this part is not about RAM locations, but about I/O-memory locations, AKA hardware addresses.

On some systems, such as the 68xxx-based Atari and Amiga, peripherals are accessed by reading and writing memory locations; those locations are addresses belonging to hardware, i.e. "hardware space". This depends on the CPU. On the Atari, for example, the I/O address space was usually 0x..ffxxxx, where the first two dots were "don't care", as the 68000 had only a 24-bit address bus; later, 0x00fxxxxx was mirrored to 0xfffxxxxx for backwards compatibility. Thus the address space between 0x..f00000 and 0x..ffffff was not ordinary RAM, but I/O-space.

On the Z80, for instance, it's common to use the IN and OUT instructions to access hardware, so normally you wouldn't use precious memory locations for peripherals on such systems.

... "System calls" will need to access the peripherals in some way, in order to send data to, for instance, a printer or hard disk. If the way it's done is through a memory location, then it's necessary to tell the compiler that this is not ordinary memory, but I/O-memory, AKA hardware address space.
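For memory-mapped hardware like the Atari's I/O page, a common C idiom is a macro that casts an integer address to a volatile pointer. A minimal sketch; the macro and helper names are hypothetical, and the test uses an ordinary array as a stand-in address. Port-mapped I/O (Z80 IN/OUT, x86 in/out) cannot be expressed this way at all and needs special instructions via inline assembly or compiler intrinsics. On an OS with an MMU, a driver must also map the physical address first (Linux drivers, for example, use ioremap() and readl()/writel() rather than raw casts).

```c
#include <stdint.h>

/* Treat an integer address as a 32-bit volatile lvalue. The volatile
 * qualifier is C's way of saying "not ordinary RAM": every access written
 * in the source is performed, none is cached in a register, and volatile
 * accesses stay in program order relative to each other. */
#define MMIO32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

static inline void mmio_write32(uintptr_t addr, uint32_t value)
{
    MMIO32(addr) = value;
}

static inline uint32_t mmio_read32(uintptr_t addr)
{
    return MMIO32(addr);
}
```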
 On Thursday, 7 May 2015 at 18:18:02 UTC, Johannes Pfau wrote:
 Not sure about shared (I don't think anybody knows what 
 exactly shared is supposed to do)
 Shared is supposed to prevent the programmer from accidentally putting unshared data in a shared context. Expectedly people wanted it to be a silver bullet for concurrency, instead std.concurrency provides high-level concurrency safety.
In other words, it's the opposite of 'static' ? -If so, then that makes the purpose much clearer to me, and it absolutely makes sense. :) ... Like the "export <symbol>" or "xdef <symbol>" directives in assembly language.
May 09 2015
"Kagamin" <spam here.lot> writes:
On Saturday, 9 May 2015 at 16:59:35 UTC, Jens Bauer wrote:
 ... "System calls" will need to access the peripherals in some 
 way, in order to send data to for instance a printer or 
 harddisk. If the way it's done is using a memory location, then 
 it's necessary to tell the compiler that this is not ordinary 
 memory, but I/O-memory AKA hardware address space.
Userland code still uses system calls, not global variables; whatever is expressed in the read(2) signature tells the compiler enough to pass data via a buffer.
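A minimal POSIX sketch of that point: the caller only hands read(2) a buffer and a length, and the kernel fills the buffer, so no volatile or shared qualifier is involved in user space. A pipe stands in for a device here, and `roundtrip()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

/* Data crosses the user/kernel boundary through a caller-supplied buffer:
 * the prototype ssize_t read(int fd, void *buf, size_t count) is all the
 * compiler needs to know. A pipe stands in for a device here. */
static ssize_t roundtrip(const char *msg, char *buf, size_t cap)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t n = -1;
    if (len <= cap && write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], buf, cap);     /* kernel copies into 'buf' */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The volatile-qualified accesses live in the driver behind the system call, not in the program that calls read().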
 Shared is supposed to prevent the programmer from accidentally 
 putting unshared data in a shared context. Expectedly people 
 wanted it to be a silver bullet for concurrency, instead 
 std.concurrency provides high-level concurrency safety.
 In other words, it's the opposite of 'static' ?
Whether data is shared or not is not tied to its storage class; that's why its shared nature is expressed in its type, and its storage class can be anything. For the same reason, the shared type qualifier is transitive.
May 10 2015
"Jens Bauer" <doctor who.no> writes:
On Sunday, 10 May 2015 at 12:43:31 UTC, Kagamin wrote:
 On Saturday, 9 May 2015 at 16:59:35 UTC, Jens Bauer wrote:
 ... "System calls" will need to access the peripherals in some 
 way, in order to send data to for instance a printer or 
 harddisk. If the way it's done is using a memory location, 
 then it's necessary to tell the compiler that this is not 
 ordinary memory, but I/O-memory AKA hardware address space.
 Userland code still uses system calls and not global variables,
I think it is essential to emphasize that I/O-space is not, and cannot be compared to, variables. Variables reside in RAM. I/O-space is outside RAM and usually not accessible to anything but the kernel and drivers. On simple microcontrollers, there are no "user" and "supervisor" modes; thus I/O-space can be accessed from any part of the program. You could say that such microcontrollers are always running in 'supervisor mode' or 'privileged mode'.
 whatever is expressed in read(2) signature tells the compiler 
 enough to pass data via buffer.
Yes. If we take 'read' as an example, the system call takes your data block and at some point transfers it to the 'driver'. The driver receives the data, but where does it put the data in order to write it to your hard disk ?

On some systems, it writes to a hard disk controller, which resides in I/O-space. This hard disk controller is not software, it's hardware. It means the data written to I/O-space (AKA hardware registers) goes directly out onto the PCB traces and heads for an external chip outside the CPU; that chip is a bridge chip, which is only the middle-man. The data is passed on to another chip connected to the bridge, and this other chip will then see "oh, it's a command that I should move the arm that holds the hard disk's read/write head"; it also receives a position. This command may be 2 bytes in size. A series of such commands is necessary before the actual data transfer can begin. When the hard disk head is in the right position and all the other preparations have been made, the CPU can start transferring the data, which is held in the buffer in RAM. It may transfer the data byte by byte, until all data has been transferred.

I'm sorry for such a tedious old-fashioned example, but it really explains it best. Today, we have DMA; we set a pointer and a length, give the command "Begin", and the data is transferred automatically, so the CPU is actually free to do other things while our transfer is done in the background, but the basics are the same. To tell the DMA where to start transferring from and how many bytes to transfer, and to trigger the transfer, one has to write to I/O-space (on most systems).

Variables do not behave at all like peripherals, because they reside in normal RAM (well, usually they reside in RAM, especially on computers).

Imagine you write the value 0xA5 to address 0xFFFF3840. If you have all interrupts disabled and read the value immediately after that, what do you expect to read ? Yes, of course, you expect to read 0xA5. But you will never read that value, because the hardware always reads this particular I/O location as 0x13. Meanwhile, your written value is somehow readable at address 0xFFFF382C: whatever you write to 0xFFFF3840 is immediately readable at address 0xFFFF382C.

If this is confusing, then wait until you hear this: some peripherals require you to *read* an address in order to clear an interrupt-pending bit. As soon as you read it, the pending bit is cleared and you can get another interrupt, but not before that happens.

This might sound weird, but how about this: in another location on the same chip, you'll find that you need to write a one in order to clear a bit in a register. However, not all bits in this 32-bit register act that way; some are 'sticky', some are read-only and some are write-one-to-clear.

Alright, alright, we're crazy enough now, aren't we ? -NO! :) Another kind of peripheral resets a counter whenever you write *any* value to a hardware register. It does not matter which value; pick one, and the address you write to will be cleared to 0. Reading it immediately afterwards will give you either 0, or perhaps 1 if it has already started counting.

... That means I/O-space isn't about atomic operations, nor is it only about writing the values in the right order; it may be completely nuts and twist your brain. :)

Thus ... having this kind of space, which is addressable the same way RAM is addressable but behaves dramatically differently, one needs to be able to tell the compiler that "this is not RAM and cannot behave as RAM". C does this with the 'volatile' keyword, which is often used merely to share variables between task-time and interrupt-time. But what volatile is really about is telling the compiler that ...
1: It's not allowed to do *any* caching of the value.
2: It cannot predict which value the address may contain.
3: It's not allowed to move code from one side of an access to such an address to the other side of the access (eg. it's not allowed to move an instruction from before the access to after the access or vice versa).
4: Just in case I forgot something, put it here. ;)

(I'm sorry for the long explanation, I hope it wasn't too boring).
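The write-one-to-clear behaviour described above is exactly where RAM habits break. This sketch models a hypothetical W1C status register in software (real hardware does this in silicon, and the model uses plain functions rather than volatile accesses) to show why acknowledging one interrupt with a RAM-style `reg |= bit` clears every pending bit. On point 3 of the list: strictly, C only guarantees that volatile accesses stay in order relative to each other; ordinary non-volatile loads and stores may still be moved across them, which is one reason compiler barriers exist.

```c
#include <stdint.h>

/* Software model of a hypothetical write-one-to-clear (W1C) status register:
 * writing 1 to a bit clears it, writing 0 leaves it alone. */
static uint32_t status_bits;

static uint32_t status_read(void)        { return status_bits; }
static void     status_write(uint32_t v) { status_bits &= ~v; }

/* Correct: acknowledge only the bit we handled. */
static void ack_correct(uint32_t bit)
{
    status_write(bit);
}

/* Wrong: a RAM-style read-modify-write ("reg |= bit") writes back every bit
 * that was already set, which acknowledges (clears) all pending events. */
static void ack_rmw(uint32_t bit)
{
    status_write(status_read() | bit);
}
```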
 Shared is supposed to prevent the programmer from 
 accidentally putting unshared data in a shared context. 
 Expectedly people wanted it to be a silver bullet for 
 concurrency, instead std.concurrency provides high-level 
 concurrency safety.
 In other words, it's the opposite of 'static' ?
 Whether data is shared or not is not tied to its storage class, that's why its shared nature is expressed in its type and storage class can be anything; for the same reason shared type qualifier is transitive.
This helps me a lot in understanding the nature of 'shared'. Thank you for providing these details. :)
May 10 2015
Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 7 May 2015 at 18:04, Jens Bauer via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 If variable 'alice' and variable 'bob' are both shared, and reading from
 'bob', then writing to 'alice'; would instructions be moved around, so
 reading from 'bob' could actually occur after writing to 'alice' ?
Yes. Take this example:
---
shared int alice, bob;

void foo()
{
	alice = bob + 1;
	bob = 0;
}
---

gdc without optimisations produces:
---
; load bob into %eax
movl _D7reorder3bobOi(%rip), %eax
; + 1
addl $1, %eax
; store to alice
movl %eax, _D7reorder5aliceOi(%rip)
; bob = 0
movl $0, _D7reorder3bobOi(%rip)
---

gdc with optimisations produces:
---
; load bob into eax
movl _D7reorder3bobOi(%rip), %eax
; bob = 0
movl $0, _D7reorder3bobOi(%rip)
; + 1
addl $1, %eax
; store to alice
movl %eax, _D7reorder5aliceOi(%rip)
---

Now, this does not change the behaviour of the program if we consider that bob and alice do not have any side effects. However, on a micro-controller dealing with direct memory I/O ...

Iain.
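For comparison, a C sketch of the same example with both variables declared volatile. The C standard treats volatile accesses as observable side effects, so the compiler has to keep the load of bob, the store to alice and the store to bob in that order even when optimising. Volatile emits no fence instruction, though: whether other CPUs or bus masters observe the accesses in the same order depends on the hardware and on how the memory is mapped.

```c
/* C counterpart of the D example above, with volatile instead of shared. */
volatile int alice, bob;

void foo(void)
{
    alice = bob + 1;   /* volatile load of bob, then volatile store to alice */
    bob = 0;           /* must stay after the store to alice */
}
```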
May 07 2015
"Mike" <none none.com> writes:
On Thursday, 7 May 2015 at 16:04:56 UTC, Jens Bauer wrote:

 So back to what originally triggered this post; this is what 
 the question is actually about:
 If writing a driver for <platform X>, how is reading/writing 
 hardware registers usually done in D ?
Here's my pre-2.067 code for `volatile` semantics:

inline T volatileLoad(T)(T* a)
{
	asm { "" ::: "memory"; };
	return *cast(shared T*)a;
}

inline void volatileStore(T)(T* a, in T v)
{
	asm { "" ::: "memory"; };
	*cast(shared T*)a = v;
}

As Iain showed, shared does not provide any ordering guarantees, which is why I added the asm { "" ::: "memory"; }; memory barrier. This code, however, is still incorrect, because it uses a bug in GDC as a feature. Iain explained that bug here: https://youtu.be/o5m0m_ZG9e8?t=3141

There's also a post here that shows a volatileLoad/Store implementation using memory barriers without shared (http://forum.dlang.org/post/501A6E01.7010809 gmail.com), but when I tried it with my cross-compiler, I got an ICE. Furthermore, I don't even understand it: I don't understand exactly what the +g and +m constraints do, and I hate using code I don't understand.

Once 2.067 is properly merged and implemented, none of this will be necessary anyway, as volatileLoad/Store intrinsics were introduced. That being said, you will still have to add some boilerplate around volatileLoad/Store to make the syntax tolerable. Johannes has a nice implementation of Volatile!T for that purpose here (http://dpaste.dzfl.pl/dd7fa4c3d42b). I have my own CTFE template implementation here (https://github.com/JinShil/stm32f42_discovery_demo/blob/master/source/stm32f42/mmio.d).

In terms of shared, I think the following thread is quite telling: http://forum.dlang.org/post/lruc3n$at1$1 digitalmars.com. Based on that, I wouldn't trust shared for anything, and I think it should just be ignored altogether. See also http://p0nce.github.io/d-idioms/#The-truth-about-shared

I'm not sure what to do yet for synchronized/atomic access. If I had to do it now, I'd probably just stick with C-like techniques.

Mike
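A C rendering of the same idea, assuming GCC or Clang (the empty asm statement with a "memory" clobber is a GNU extension). Unlike the D version, it performs the access through a volatile-qualified pointer rather than a shared one, which is what C compilers honour for device memory. The barrier only constrains the compiler and emits no CPU fence instruction. The function names are hypothetical; in D, the volatileLoad/volatileStore intrinsics Mike mentions make this unnecessary.

```c
#include <stdint.h>

/* GCC/Clang compiler barrier: the compiler may not cache memory values in
 * registers or move memory accesses across it. No instruction is emitted. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static inline uint32_t volatile_load32(const uint32_t *p)
{
    COMPILER_BARRIER();
    return *(const volatile uint32_t *)p;   /* access via a volatile lvalue */
}

static inline void volatile_store32(uint32_t *p, uint32_t v)
{
    COMPILER_BARRIER();
    *(volatile uint32_t *)p = v;
}
```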
May 08 2015