
D - Volatile

reply Jim Starkey <jas netfrastructure.com> writes:
Please pardon my ignorance if this has been hashed and re-hashed.  I just
got a pointer to D from another list, came over for a quick look-see, and
liked what I saw.  So I thought I'd toss in a few thoughts.

I notice there is no support for volatile, which perplexes me.  Volatile
is necessary to warn an optimizer that another thread may change a data
item without warning.  It isn't necessary in a JVM because those types of
optimization can be expressed in byte codes, although it does limit what a
JIT compiler can do.  D is intended for real compilation, however, and
when the instruction set guys give us enough registers, the compiler is
going to want to stick intermediates in them.  Without volatile, this
ain't gonna work.

That said, the C concept of a volatile declaration doesn't go far enough.
While it does warn the compiler that an unexpected change in value is fair
game, it doesn't tell the compiler when or if to generate
multiprocessor-safe instruction sequences.

The obvious response is that data structures should be protected by a
mutex or synchronized.  The problem is that these are vastly too expensive
to use in a tight, fine-grained multi-thread application.  Modern
multi-processors do a wonderful job of implementing processor-interlocked
atomic instructions.  Modern OSes do a reasonable job of scheduling
threads on multi-processors.  Modern languages, however, do a rotten job
of giving us the primitives to exploit these environments.  Yeah, I know I
can write an inline "lock xsub decl" yada yada yada.  But it's painful and
non-portable.  And we all know that writing assembler rots the soul.

So, guys, I would like the following:

    1.  A volatile declaration, so the compiler can do smart things while
        I do fast things.
    2.  A "volatile volatile" declaration, or a distinct operator or
        operator modifier, to tell the compiler to use a processor
        interlock instruction sequence OR give me a compile-time error
        explaining why it can't.

There are probably smarter ways to do this than a volatile declaration.
But something is needed in that niche.

Or, alternatively, I could have my head thoroughly wedged.  But I'll take
on all comers until that is so obvious that I can see it myself.
Mar 21 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Jim Starkey" <jas netfrastructure.com> wrote in message
news:3C9A43BC.AFBA03BA netfrastructure.com...
 I notice there is no support for volatile, which perplexes me.  Volatile
 is necessary to
 warn an optimizer that another thread may change a data item without
 warning.
They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.
 That said, the C concept of volatile declaration doesn't go far enough.
 While it does
 warn the compiler that an unexpected change is value is fair game, it
 doesn't tell
 the compiler when or if to generate multi-process safe instruction
 sequences.
I agree that the C definition of volatile is next to useless.
 The obvious response is that data structures should be protected by a
 mutex or
 synchronize.   The problem is that these are vastly too expensive to use
 in a
 tight, fine-grained multi-thread application.  Modern multi-processors
 do a
 wonderful job of implementing processor interlocked atomic
 instructions.  Modern
 OSes do a reasonable job of scheduling threads on multi-processors.
 Modern
 language, however, do a rotten job of giving the primitives to exploit
 these
 environments.  Yeah, I know I can write an inline "lock xsub decl" yada
 yada
 yada.  But it's painful and non-portable.  And we all know that writing
 assembler
 rots the soul.
 So, guys, I would like the following:

     1.  A volatile declaration so the compiler can do smart things while
 I do
          fast things.
     2.  A "volatile volatile" declaration or distinct operator or
 operator modified
          to tell the compiler to use an processor interlock instruction
 sequence OR
          give me a compile time error why it can't.

 There are probably smarter ways to do this than a volatile declaration.
 But something
 is needed in that niche.
You're wrong, writing assembler puts one into a State of Grace <g>.
Mar 21 2002
next sibling parent reply "Serge K" <skarebo programmer.net> writes:
 I notice there is no support for volatile, which perplexes me.  Volatile
 is necessary to
 warn an optimizer that another thread may change a data item without
 warning.
They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".
"volatile" does not mean "atomic" or even "synchronized". It's just an
indication that some variable in memory can be changed from "outside".
And nobody cares when *exactly* it happens, as long as it happens.
For example:
    by another thread on the same processor
        => everything is in the same cache - no problem here.
    by another processor, or any other hardware (DMA, ...)
        => any modern processor has support for cache coherency
        (MESI or better); in fact, it's a "must" for any processor with a
        cache - no problem there. (..even i486 had it..)
 I agree that the C definition of volatile is next to useless.
Is it?
Mar 21 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Serge K" <skarebo programmer.net> wrote in message
news:a7e2kc$17qp$1 digitaldaemon.com...
 I notice there is no support for volatile, which perplexes me.
Volatile
 is necessary to
 warn an optimizer that another thread may change a data item without
 warning.
They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".
"volatile" does not mean "atomic" or even "synchronized".
It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".
 It's just an indication that some variable in the memory can be changed
from "outside".
 And nobody cares when *exactly* it happens, as long as it happens.
 For example:
     by another thread on the same processor.
         => everything is in the same cache - no problem here.
     by another processor, or any other hardware (DMA, ...)
         => any modern processor has support for cache coherency
         (MESI or better), in fact - it's a "must" thing for any processor
with the cache.
         - no problem there. (..even i486 had it..)
If you are writing to, say, a long, the long will be two write cycles. In between those two, another thread could change part of it, resulting in a scrambled write.
 I agree that the C definition of volatile is next to useless.
Is it?
Since it does not guarantee atomic writes, yes, I believe it is useless.
Mar 21 2002
parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a7entf$1hik$2 digitaldaemon.com...
 "Serge K" <skarebo programmer.net> wrote in message
 news:a7e2kc$17qp$1 digitaldaemon.com...
 I notice there is no support for volatile, which perplexes me.
Volatile
 is necessary to
 warn an optimizer that another thread may change a data item without
 warning.
They'd have to be implemented with mutexes anyway, so might as well
just
 wrap them in "synchronized".
"volatile" does not mean "atomic" or even "synchronized".
It does in Java, which to me makes it more useful than C's notion of
"don't
 put it in a register".
This is necessary in many embedded systems, even when they are single
threaded, and even in some operating system applications.  For example, it
is common in embedded systems to make external hardware visible by memory
mapping the external hardware registers into the process memory space.
This makes it easy to use standard syntax to manipulate the register, and
is the only way to implement I/O on some processors.  However, you can't
let the CPU keep the "data" in a CPU register or it won't work.  For
example, an update to the register has to actually go to the external
register to be effective.  It doesn't accomplish anything to update the
copy in a CPU register without doing the store, as the external hardware
might not see it for a long time.

Similarly, of course, these external registers can change their contents
as the state of the external hardware changes (for example, a status
register showing the completion of some external operation).  You can't
let the data stay in a register, as subsequent reads, in say a polling
loop, wouldn't go to the actual hardware, or worse yet, might even be
"optimized" away altogether.  Note that this is a different issue than
cache coherence.

--
- Stephen Fuld
e-mail address disguised to prevent spam
Mar 22 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7fq36$2uha$1 digitaldaemon.com...
 It does in Java, which to me makes it more useful than C's notion of
 "don't put it in a register".

 This is necessary in many embedded systems, even when they are single
 threaded and even some operating system applications.  For example, it is
 common in embedded systems to have external hardware be made visible by
 memory mapping the external hardware registers into the process memory
 space.  This makes it easy to use standard syntax to manipulate the
register
 and is the only way to implement I/O on some processors.  However, you
can't
 let the CPU keep the "data" in a CPU register or it won't work.  For
 example, an update to the register has to actually go to the external
 register to be effective.  It doesn't accomplish anything to update the
copy
 in a CPU register without doing the store as the external hardware might
not
 see it for a long time.  Similarly, of course, these external registers
can
 change their contents as the state of the external hardware changes (For
 example, a status register showing the completion of some external
 operation.)  You can't let the data be stay in a register as subsequent
 reads, in say a polling loop, wouldn't go to the actual hardware, or worse
 yet, even be "optimized" away altogether.  Note that this is a different
 issue than cache coherence.
I understand what you mean. It's still problematic how that actually winds
up being implemented in the compiler. C doesn't really define how many
reads are done to an arbitrary expression in order to implement it, for
example:

    j = i++;

How many times is i read? Once or twice?

    mov eax, i
    inc i
    mov j, eax

or:

    mov eax, i
    mov j, eax
    inc eax
    mov i, eax

These ambiguities to me mean that if you need precise control over memory
read and write cycles, the appropriate thing to use is the inline
assembler. Volatile may happen to work, but to my mind is unreliable and
may change behavior from compiler to compiler.

BTW, D's inline assembler is well integrated with the compiler. The
compiler can track register usage even in asm blocks, and can still
optimize the surrounding code, unlike any other inline implementation I'm
aware of.
Mar 22 2002
next sibling parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a7ft6h$1ccq$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7fq36$2uha$1 digitaldaemon.com...
 It does in Java, which to me makes it more useful than C's notion of
 "don't put it in a register".

 This is necessary in many embedded systems, even when they are single
 threaded and even some operating system applications.  For example, it
is
 common in embedded systems to have external hardware be made visible by
 memory mapping the external hardware registers into the process memory
 space.  This makes it easy to use standard syntax to manipulate the
register
 and is the only way to implement I/O on some processors.  However, you
can't
 let the CPU keep the "data" in a CPU register or it won't work.  For
 example, an update to the register has to actually go to the external
 register to be effective.  It doesn't accomplish anything to update the
copy
 in a CPU register without doing the store as the external hardware might
not
 see it for a long time.  Similarly, of course, these external registers
can
 change their contents as the state of the external hardware changes (For
 example, a status register showing the completion of some external
 operation.)  You can't let the data be stay in a register as subsequent
 reads, in say a polling loop, wouldn't go to the actual hardware, or
worse
 yet, even be "optimized" away altogether.  Note that this is a different
 issue than cache coherence.
I understand what you mean. It's still problematic how that actually winds up being implemented in the compiler. C doesn't really define how many
reads
 are done to an arbitrary expression in order to implement it, for example:
     j = i++;
 How many times is i read? Once or twice?
     mov eax, i
     inc i
     mov j, eax
 or:
     mov eax, i
     mov j, eax
     inc eax
     mov i, eax
 These ambiguities to me mean that if you need precise control over memory
 read and write cycles, the appropriate thing to use is the inline
assembler.
 Volatile may happen to work, but to my mind is unreliable and may change
 behavior from compiler to compiler.

 BTW, D's inline assembler is well integrated in with the compiler. The
 compiler can track register usage even in asm blocks, and can still
optimize
 the surrounding code, unlike any other inline implementation I'm aware of.
While I agree that you can use inline asm, and there are ways to code that
could cause trouble, in practice it works pretty well.  People don't do
things like post-increment external registers when reading them.  I know
the syntax allows it, but programmers, especially embedded programmers,
learn pretty quickly what things to do and what not to do with the
hardware they have.  In practice, most uses of stuff like this are to read
the whole register and test some bits or extract a field, or to create a
word with the desired contents and write it in one piece to the external
register.  So, while volatile isn't a complete solution, it avoids having
to delve into asm for the vast majority of such uses.

--
- Stephen Fuld
e-mail address disguised to prevent spam
Mar 22 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7g4uv$2q8r$1 digitaldaemon.com...
 While I agree that you can use inline asm, and there are ways to code that
 could cause trouble, in practice, it works pretty well.  People don't do
 things like post increment external registers when reading them.  I know
the
 syntax allows it, but programmers, especially embedded programmers learn
 pretty quickly what things to do and what not to do with the hardware they
 have.  In practice, most uses of stuff like this is to read the whole
 register and test some bits or extract a field, or to create a word with
the
 desired contents and write it in one piece to the external register.  So,
 while volatile isn't a complete solution, it avoids having to delve into
asm
 for the vast majority of such uses.
Wouldn't it be better to have a more reliable method than trial and error?
Trial and error is subject to subtle changes if a new compiler is used.

I also wish to point out that volatile permeates the typing system in a
C/C++ compiler. There is a great deal of code to keep everything straight
in the contexts of overloading, casting, type copying, etc.

I don't see why volatile is that necessary for hardware registers. You can
still easily read a hardware register by setting a pointer to it and going
*p. The compiler isn't going to skip the write to it through *p (it's
very, very hard for a C optimizer to remove dead stores through pointers,
due to the aliasing problem). Any reads through a pointer are not cached
across any assignments through a pointer, including any function calls
(again, due to the aliasing problem). For example, the second read of *p
will not get cached away:

    x = *p;        // first read
    func();        // call function to prevent caching of pointer results
    y = *p;        // second read

func() can simply consist of RET. To do, say, a spin lock on *p:

    while (*p != value)
        func();
Mar 22 2002
next sibling parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a7gfrs$35e$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7g4uv$2q8r$1 digitaldaemon.com...
 While I agree that you can use inline asm, and there are ways to code
that
 could cause trouble, in practice, it works pretty well.  People don't do
 things like post increment external registers when reading them.  I know
the
 syntax allows it, but programmers, especially embedded programmers learn
 pretty quickly what things to do and what not to do with the hardware
they
 have.  In practice, most uses of stuff like this is to read the whole
 register and test some bits or extract a field, or to create a word with
the
 desired contents and write it in one piece to the external register.
So,
 while volatile isn't a complete solution, it avoids having to delve into
asm
 for the vast majority of such uses.
Wouldn't it be better to have a more reliable method than trial and error?
Of course! :-)
 Trial and error is subject to subtle changes if a new compiler is used.
Yes.
 I also wish to point out that volatile permeates the typing system in a
 C/C++ compiler. There is a great deal of code to keep everything straight
in
 the contexts of overloading, casting, type copying, etc.
I'll take your word for what is required within the compiler. I'm a compiler user, not a designer.
 I don't see why volatile is that necessary for hardware registers. You can
 still easilly read a hardware register by setting a pointer to it and
going
 *p.
Sure.  But I am trying, as I thought you were with D, to minimize or
eliminate the use of pointers in the source code as a major source of
error.
 The compiler isn't going to skip the write to it through *p (it's very,
 very hard for a C optimizer to remove dead stores through pointers, due to
 the aliasing problem).
Again, I am not a compiler designer, but "very very hard" implies that it isn't impossible and therefore, some future compiler *could* do it and thus breaking code as you described the problem above. :-(
 Any reads through a pointer are not cached across any
 assignments through a pointer, including any function calls (again, due to
 the aliasing problem). For example, the second read of *p will not get
 cached away:

     x = *p;        // first read
     func();        // call function to prevent caching of pointer results
     y = *p;        // second read

 func() can simply consist of RET. To do, say, a spin lock on *p:

     while (*p != value)
         func();
Oh, that's intuitive!  :-(  Add an extra empty function call in order to
prevent the compiler from doing some undesirable optimization.  Uccccch!
There has got to be a better way to address the problem than this.  I'm
not wedded to the "volatile" syntax, and certainly not wedded to how C
does things.  I was just pointing out, for those who have never done
embedded programming, a major reason for that syntax.  If you can come up
with a better solution (I guess I don't count the ones you have proposed
so far to be better), then I am all for it.  You have showed such
imagination in solving other C/C++ deficiencies that I have reason to hope
you can solve this one elegantly. - Sorry to put you on the spot.  :-)

--
- Stephen Fuld
e-mail address disguised to prevent spam
Mar 22 2002
next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7gom0$96n$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a7gfrs$35e$1 digitaldaemon.com...
 I don't see why volatile is that necessary for hardware registers. You
can
 still easilly read a hardware register by setting a pointer to it and
going
 *p.
Sure. But I am trying, as I thought you were with D, trying to minimize/eliminate the use of pointers in the source code as a major
source
 of error.
Pointers are still in D, for the reason that sometimes you just gotta have them. Minimizing them is a design goal, though. Also, to access hardware registers, you're going to need pointers because there is no way to specify absolute addresses for variables.
 The compiler isn't going to skip the write to it through *p (it's very,
 very hard for a C optimizer to remove dead stores through pointers, due
to
 the aliasing problem).
Again, I am not a compiler designer, but "very very hard" implies that it isn't impossible and therefore, some future compiler *could* do it and
thus
 breaking code as you described the problem above.  :-(
To make it impossible just have the pointer set in a function that the compiler doesn't know about.
 Any reads through a pointer are not cached across any
 assignments through a pointer, including any function calls (again, due
to
 the aliasing problem). For example, the second read of *p will not get
 cached away:
     x = *p;        // first read
     func();        // call function to prevent caching of pointer
results
     y = *p;        // second read
 func() can simply consist of RET. To do, say, a spin lock on *p:
     while (*p != value)
         func();
Oh, that's intuitive! :-( Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'm
not
 wedded to the "volatile" syntax and certainly not wedded to how C does
 things.  I was just pointing out, for those who have never done embedded
 programming, a major reason for that syntax.  If you can come up with a
 better solution (I guess I don't count the ones you have proposed so far
to
 be better.) than I am all for it.
Yeah, I understand it isn't the greatest, but it'll work reliably. I also happen to be fond of inline assembler when dealing with hardware <g>.
 You have showed such immagination in
 solving other C/C++ deficiencies that I have reason to hope you can solve
 this one elegantly.
Ahem. I'm on to that tactic!
Mar 26 2002
parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a7qh13$15fb$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7gom0$96n$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a7gfrs$35e$1 digitaldaemon.com...
 I don't see why volatile is that necessary for hardware registers. You
can
 still easilly read a hardware register by setting a pointer to it and
going
 *p.
Sure. But I am trying, as I thought you were with D, trying to minimize/eliminate the use of pointers in the source code as a major
source
 of error.
Pointers are still in D, for the reason that sometimes you just gotta have them.
Sure.
 Minimizing them is a design goal, though.
And a worthy one.
 Also, to access hardware
 registers, you're going to need pointers because there is no way to
specify
 absolute addresses for variables.
Well, you could change that and eliminate one more use of pointers.  I
know of at least one language that allows the specification of absolute
addresses for variables.  You have to be careful when to allow/implement
it, but it seems to work well.  Some versions of the compiler (like the
one given to students) just ignore the extra specification, but there are
versions (you could use options) that support this.  Another way to do it
is to honor the requests but make the addresses program-absolute and rely
on the linker and other external things like the loader (or PROM/flash
burner) to make them truly absolute.

BTW, their syntax is: varname type address
 The compiler isn't going to skip the write to it through *p (it's
very,
 very hard for a C optimizer to remove dead stores through pointers,
due
 to
 the aliasing problem).
Again, I am not a compiler designer, but "very very hard" implies that
it
 isn't impossible and therefore, some future compiler *could* do it and
thus
 breaking code as you described the problem above.  :-(
To make it impossible just have the pointer set in a function that the compiler doesn't know about.
Yes, but that is another "work around" that just doesn't seem "natural".
Adding extra requirements that the programmer needs to know about in order
to "trick" the compiler into doing the right thing is, IMNSHO, not the
right way to go.
 Any reads through a pointer are not cached across any
 assignments through a pointer, including any function calls (again,
due
 to
 the aliasing problem). For example, the second read of *p will not get
 cached away:
     x = *p;        // first read
     func();        // call function to prevent caching of pointer
results
     y = *p;        // second read
 func() can simply consist of RET. To do, say, a spin lock on *p:
     while (*p != value)
         func();
Oh, that's intuitive! :-( Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'm
not
 wedded to the "volatile" syntax and certainly not wedded to how C does
 things.  I was just pointing out, for those who have never done embedded
 programming, a major reason for that syntax.  If you can come up with a
 better solution (I guess I don't count the ones you have proposed so far
to
 be better.) than I am all for it.
Yeah, I understand it isn't the greatest, but it'll work reliably.
Agreed.  I am working toward coming up with "the greatest" solution.  :-)
 I also
 happen to be fond of inline assembler when dealing with hardware <g>.
An affliction that I am afraid is chronic, and probably not curable. :-) As I believe that the purpose of a high level language is to minimize the use of assembler, I am not so afflicted. You can always drop to assembler, but that is precisely what we are trying to avoid as much as possible.
 You have showed such immagination in
 solving other C/C++ deficiencies that I have reason to hope you can
solve
 this one elegantly.
Ahem. I'm on to that tactic!
But, based on your next post about "sequential", it seems to have worked.
:-) (I'll respond to that post there.) That is my goal here: to promote
discussion on various ways of solving the problems in order for the best
one to come out.

--
- Stephen Fuld
e-mail address disguised to prevent spam
Mar 27 2002
parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7t4c6$2jfd$1 digitaldaemon.com...

 Well, you could change that and eliminate one more use of pointers.  I
know
 of at least one language that allows the specification of absolute
addresses

Borland Pascal had it. It was great for low-level programming, indeed.
Mar 27 2002
parent "OddesE" <OddesE_XYZ hotmail.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:a7tdf1$2o5m$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7t4c6$2jfd$1 digitaldaemon.com...

 Well, you could change that and eliminate one more use of pointers.  I
know
 of at least one language that allows the specification of absolute
addresses

 Borland Pascal had it. It was great for low-level programming, indeed.

Yeah, I loved it! Also great for addressing BIOS vars and VGA memory (in
the old DOS days)... :)

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
Mar 27 2002
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7gom0$96n$1 digitaldaemon.com...
 Oh, that's intuitive!  :-(  Add an extra empty function call in order to
prevent the compiler from doing some undesirable optimization. Uccccch! There has got to be a better way to address the problem than this. I'm
not
 wedded to the "volatile" syntax and certainly not wedded to how C does
 things.  I was just pointing out, for those who have never done embedded
 programming, a major reason for that syntax.  If you can come up with a
 better solution (I guess I don't count the ones you have proposed so far
to
 be better.) than I am all for it.  You have showed such immagination in
 solving other C/C++ deficiencies that I have reason to hope you can solve
 this one elegantly.
I did have a thought. How about a keyword "sequence", as in:

    sequence;        // no caching across this keyword
    x = *p;          // *p is always reloaded

and:

    x = *p;
    sequence;        // *p is not cached
Mar 26 2002
next sibling parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Walter wrote:

 I did have a thought. How about a keyword "sequence", as in:

     sequence;        // no caching across this keyword
     x = *p;            // *p is always reloaded

 and:
     x = *p;
     sequence;        // *p is not cached
Not a bad idea, although I don't like the idea that it removes ALL
caching.  How about also adding a block syntax, where caching is only
disabled on the statements in the block:

    y = *q;
    sequence { x = *p; }   // *p is NOT cached
    func(*q);              // *q is still cached

Pardon me if I'm being anal, but it seems like we should make 'sequence'
impact as few lines of code as possible, so you can still mix good
optimization into the same code block.

Of course, somebody's going to say (for their hardware registers) that
they will have to add 'sequence' to every line that uses the register, and
they're going to ask for a 'sequence' type modifier...and we're back to
volatile. :(

--
The Villagers are Online! villagersonline.com
.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
Mar 26 2002
parent "Walter" <walter digitalmars.com> writes:
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3CA0F98A.265E2A99 deming-os.org...
 Walter wrote:

 I did have a thought. How about a keyword "sequence", as in:

     sequence;        // no caching across this keyword
     x = *p;            // *p is always reloaded

 and:
     x = *p;
     sequence;        // *p is not cached
Not a bad idea, although I don't like the idea that it removes ALL
caching.
 How about also adding a block syntax, where caching is only disabled on
the
 statements in the block:
     y = *q;
     sequence { x = *p; }// *p is NOT cached
     func(*q);    // *q is still cached

 Pardon me if I'm being anal, but it seems like we should make 'sequence'
impact
 as few lines of code as possible, so you can still mix good optimization
into
 the same code block.
Sequence won't affect enregistering variables, which is the big speed win,
not caching. I think it will have a negligible effect on performance.
Sequence fits nicely into the optimizer, because a special op is just
inserted into the instruction stream that causes a 'kill' in the data flow
analysis.
 Of course, somebody's going to say (for their hardware registers) that
they
 will have to add 'sequence' to every line that uses the register, and
they're
 going to ask for a 'sequence' type modifier...and we're back to volatile.
 :(

Nobody's ever happy <g>.
Mar 26 2002
prev sibling next sibling parent reply "Richard Krehbiel" <rich kastle.com> writes:

(Apology: This message is HTML so a massive link might still be
clickable.)

"Walter" <walter digitalmars.com> wrote in message =
news:a7qrji$1bnv$1 digitaldaemon.com...
=20
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7gom0$96n$1 digitaldaemon.com...
 Oh, that's intuitive!  :-(  Add an extra empty function call in order to
 prevent the compiler from doing some undesirable optimization.  Uccccch!
 There has got to be a better way to address the problem than this.  I'm
 not wedded to the "volatile" syntax and certainly not wedded to how C
 does things.  I was just pointing out, for those who have never done
 embedded programming, a major reason for that syntax.  If you can come
 up with a better solution (I guess I don't count the ones you have
 proposed so far to be better.) then I am all for it.  You have shown
 such imagination in solving other C/C++ deficiencies that I have reason
 to hope you can solve this one elegantly.
I did have a thought. How about a keyword "sequence", as in:

    sequence;        // no caching across this keyword
    x = *p;          // *p is always reloaded

and:

    x = *p;
    sequence;        // *p is not cached
This reminded me of something, so I did a quick Google search.

Go read a Linus Torvalds rant about SMP-safety, volatile, and
"barrier()" (which is the Linux kernel's equivalent of "sequence").  And
much of the thread is interesting, so I'm linking the whole thing (with
this massive link - sorry).

http://groups.google.com/groups?hl=en&threadm=linux.kernel.Pine.LNX.4.33.0107231546430.7916-100000%40penguin.transmeta.com&rnum=5&prev=/groups%3Fq%3Dtorvalds%2Btransmeta%2Bbarrier%26hl%3Den

Boiled down, Torvalds believes that "volatile" as a storage class
modifier is always wrong; if "volatile" semantics (whatever they are)
are needed, then apply them at the moment of access (as with a cast).

--
Richard Krehbiel, Arlington, VA, USA
rich kastle.com (work) or krehbiel3 comcast.net (personal)
Mar 27 2002
parent "Walter" <walter digitalmars.com> writes:

That's a great link! Thanks. Interestingly, Linus appears to have come
to the same conclusion about volatile I did:

"But the fact is, that when you add "volatile" to the register, it
really tells gcc "Be afraid.  Be very afraid.  This user expects some
random behaviour that is not actually covered by any standard, so just
don't ever use this variable for any optimizations, even if they are
obviously correct.  That way he can't complain." -Linus
  "Richard Krehbiel" <rich kastle.com> wrote in message
news:a7secs$27fl$1 digitaldaemon.com...
  (Apology: This message is HTML so a massive link might still be
clickable.)

  Go read a Linus Torvalds rant about SMP-safety, volatile, and
"barrier()" (which is the Linux kernel's equivalent of "sequence").
And much of the thread is interesting, so I'm linking the whole thing
(with this massive link - sorry).

  http://groups.google.com/groups?hl=en&threadm=linux.kernel.Pine.LNX.4.33.0107231546430.7916-100000%40penguin.transmeta.com&rnum=5&prev=/groups%3Fq%3Dtorvalds%2Btransmeta%2Bbarrier%26hl%3Den

  Boiled down, Torvalds believes that "volatile" as a storage class
modifier is always wrong; if "volatile" semantics (whatever they are)
are needed, then apply them at the moment of access (as with a cast).
Mar 31 2002
prev sibling parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a7qrji$1bnv$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7gom0$96n$1 digitaldaemon.com...
 Oh, that's intuitive!  :-(  Add an extra empty function call in order to
 prevent the compiler from doing some undesirable optimization.  Uccccch!
 There has got to be a better way to address the problem than this.  I'm
 not wedded to the "volatile" syntax and certainly not wedded to how C
 does things.  I was just pointing out, for those who have never done
 embedded programming, a major reason for that syntax.  If you can come
 up with a better solution (I guess I don't count the ones you have
 proposed so far to be better.) then I am all for it.  You have shown
 such imagination in solving other C/C++ deficiencies that I have reason
 to hope you can solve this one elegantly.
I did have a thought. How about a keyword "sequence", as in:

    sequence;        // no caching across this keyword
    x = *p;          // *p is always reloaded

and:

    x = *p;
    sequence;        // *p is not cached
I think the fundamental question is whether the "non registerability"
should be a property of the variable (that is, "volatile") or of the
particular access to the variable (that is, "sequence").

I guess there are two types of situations where this functionality is
required: variables shared among multiple threads, and physical hardware
registers.  For the latter, since we are talking about a direct, one to
one relationship between a variable and a particular piece of physical
hardware, I think it is clearly a property of the variable itself.  For
the former, I guess it could be considered either.  But in practical
terms, since one thread can't know when another thread is going to
access the variable, you probably don't want the variable living in a
register for any significant length of time, and you probably want a
simple locking mechanism as well.

So I guess I come down on the side of making it a property of the
variable, not the particular access.  I think that will reduce source
program size, eliminate the class of bugs that might occur from someone
"forgetting" to put in the sequence keyword, etc.

The lock mechanism is a separate issue, but I do believe there should be
a defined access to the low cost locks offered by atomic instructions in
most architectures.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam
Mar 27 2002
parent reply "OddesE" <OddesE_XYZ hotmail.com> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7t4ca$2jfd$2 digitaldaemon.com...
<SNIP>
 The lock mechanism is a separate issue, but I do believe there should be
 a defined access to the low cost locks offered by atomic instructions in
 most architectures.

 --
  - Stephen Fuld
    e-mail address disguised to prevent spam
Isn't depending on atomic instructions dangerous?  What about
multi-processor systems, where two atomic instructions might execute
simultaneously?

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
__________________________________________
Remove _XYZ from my address when replying by mail
Mar 27 2002
parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"OddesE" <OddesE_XYZ hotmail.com> wrote in message
news:a7tf5d$2p0k$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7t4ca$2jfd$2 digitaldaemon.com...
 <SNIP>
 The lock mechanism is a separate issue, but I do believe there should be
 a defined access to the low cost locks offered by atomic instructions in
 most architectures.

 --
  - Stephen Fuld
    e-mail address disguised to prevent spam
 Isn't depending on atomic instructions dangerous?  What about
 multi-processor systems, where two atomic instructions might execute
 simultaneously?
The atomic instructions I was talking about are things like test and
set, compare and swap, or atomic fetch-op-store, where the memory is
locked for the duration of the instruction.  These are safe in
multi-processor systems.  Sorry if I confused you.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam
Mar 27 2002
parent "OddesE" <OddesE_XYZ hotmail.com> writes:
"Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
news:a7tm71$2sjd$2 digitaldaemon.com...
 "OddesE" <OddesE_XYZ hotmail.com> wrote in message
 news:a7tf5d$2p0k$1 digitaldaemon.com...
 "Stephen Fuld" <s.fuld.pleaseremove att.net> wrote in message
 news:a7t4ca$2jfd$2 digitaldaemon.com...
 <SNIP>
 The lock mechanism is a separate issue, but I do believe there should be
 a defined access to the low cost locks offered by atomic instructions in
 most architectures.

 --
  - Stephen Fuld
    e-mail address disguised to prevent spam
 Isn't depending on atomic instructions dangerous?  What about
 multi-processor systems, where two atomic instructions might execute
 simultaneously?
 The atomic instructions I was talking about are things like test and
 set, compare and swap, or atomic fetch-op-store, where the memory is
 locked for the duration of the instruction.  These are safe in
 multi-processor systems.
 Sorry if I confused you.

 --
  - Stephen Fuld
    e-mail address disguised to prevent spam
You didn't confuse me, the topic just does.  Multi-threading issues are
one of my weaker points when it comes to programming... :(

Thanks for clearing it up.

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
Mar 28 2002
prev sibling parent "Richard Krehbiel" <krehbiel3 comcast.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a7gfrs$35e$1 digitaldaemon.com...
 I don't see why volatile is that necessary for hardware registers. You
 can still easily read a hardware register by setting a pointer to it and
 going *p. The compiler isn't going to skip the write to it through *p
 (it's very, very hard for a C optimizer to remove dead stores through
 pointers, due to the aliasing problem).
The Linux crowd had the devil of a time with a new release of GCC.  It
seems that the standard for C states that accessing the bytes of one
object does not necessarily alias the bytes of any other object if their
accesses are by different types, unless one is char.  This means that
in:

    auto float f;
    *(volatile long *)&f = 0;

...this need not visibly affect the object f.  Yep.
Mar 26 2002
prev sibling next sibling parent reply "Serge K" <skarebo programmer.net> writes:
 BTW, D's inline assembler is well integrated in with the compiler. The
 compiler can track register usage even in asm blocks, and can still optimize
 the surrounding code, unlike any other inline implementation I'm aware of.
You should try Visual C++ for Alpha.  It can optimize not only the
surrounding code, but inline assembly code as well.  I was truly amazed
when I noticed that.
Mar 22 2002
parent "Walter" <walter digitalmars.com> writes:
"Serge K" <skarebo programmer.net> wrote in message
news:a7gclf$ej$1 digitaldaemon.com...
 BTW, D's inline assembler is well integrated in with the compiler. The
 compiler can track register usage even in asm blocks, and can still
 optimize the surrounding code, unlike any other inline implementation
 I'm aware of.
 You should try Visual C++ for Alpha.  It can optimize not only the
 surrounding code, but inline assembly code as well.  I was truly amazed
 when I noticed that.
D's instruction scheduler (and peephole optimizer) is specifically
prevented from operating on the inline assembler blocks.  I'm a little
surprised that a compiler wouldn't do that.  The whole point of inline
asm is to wrest control away from the compiler and precisely lay out the
instructions.
Mar 22 2002
prev sibling parent reply Karl Bochert <kbochert ix.netcom.com> writes:
On Fri, 22 Mar 2002 10:20:54 -0800, "Walter" <walter digitalmars.com> wrote:

 BTW, D's inline assembler is well integrated in with the compiler. The
 compiler can track register usage even in asm blocks, and can still optimize
 the surrounding code, unlike any other inline implementation I'm aware of.
 
Watcom has a form of asm that allows optimization.

    #pragma aux setSP = \
        "mov ESP, eax"  \
        parm [eax]      \
        modify [EAX] ;

    #pragma aux getSP = \
        "mov edx, esp"  \
        value [edx] modify [eax];

Then:

    ...
    current_sp = getSP()

is fully optimized.  It also has the asm("mov eax, esp") form, which I
believe is opaque to the compiler.

Watcom also allows register passing convention in addition to the
standard _stdcall and _stddecl.  This, and extensive optimization,
enables it to produce the fastest C code of any compiler that I am
aware of.  An excellent back-end for D, someday.

free too ;-)

Karl Bochert
Mar 23 2002
next sibling parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Karl Bochert" <kbochert ix.netcom.com> wrote in message
news:1103_1016902791 bose...

 Watcom also allows register passing convention in addition
 to the standard _stdcall and _stddecl.  This, and extensive optimization,
 enables it to produce the fastest C code of any compiler
AFAIK, D chooses calling convention on its own, and might use fastcall where it seems better.
 that I am aware of. An excellent back-end for D, someday.

 free too ;-)
Hm? Where can I get it, then?
Mar 23 2002
parent Karl Bochert <kbochert ix.netcom.com> writes:
On Sat, 23 Mar 2002 22:49:09 +0300, "Pavel Minayev" <evilone omen.ru> wrote:
 "Karl Bochert" <kbochert ix.netcom.com> wrote in message
 news:1103_1016902791 bose...
 
 Watcom also allows register passing convention in addition
 to the standard _stdcall and _stddecl.  This, and extensive optimization,
 enables it to produce the fastest C code of any compiler
AFAIK, D chooses calling convention on its own, and might use fastcall where it seems better.
 that I am aware of. An excellent back-end for D, someday.

 free too ;-)
Hm? Where can I get it, then?
To quote from a message on the Euphoria newsgroup:

"OpenWatcom is available as most of you know.  The Beta to 11c does
compile Euphoria Translated Code and runs much faster than LCC or
Borland, but you have to know a few tricks to get Watcom to work at all
because the libraries and header files aren't included in the beta
release.  I have the solution to this problem!
Download Watcom 11c beta
Download Masm32 by Hutch"

I did this and the only problem I had was that I downloaded the file
groups individually and missed one.  Also the Watcom resource compiler
is missing.  The URLs are:

http://www.openwatcom.org/
http://www.movsd.com/masm.htm

A couple of benchmarks:

http://www.byte.com/art/9801/sec12/art7.htm
http://www.geocities.com/SiliconValley/Vista/6552/compila.html

Karl Bochert
Mar 23 2002
prev sibling next sibling parent "Sean L. Palmer" <spalmer iname.com> writes:
Watcom did run circles around the competition back in the day.  GCC's
inline asm provides a similar amount of information to the optimizer, so
in theory it should be able to perform as well as Watcom (but in
practice it doesn't, from what I can tell so far).

Watcom's inline asm had one main problem, which GCC doesn't: Watcom
didn't let your inline asm request an empty register from the
compiler... you just used a given register, and the asm around the call
would be rearranged to make room for the register your inline asm used.
For recursive functions that doesn't work so well.  For instance, if you
made a vector add routine where the vectors are pointed to by edx and
eax, then edx and eax would become bottleneck registers whilst doing
lots of vector adds and would end up getting pushed and popped a lot.

Sean


 Watcom also allows register passing convention in addition
 to the standard _stdcall and _stddecl.  This, and extensive optimization,
 enables it to produce the fastest C code of any compiler
 that I am aware of. An excellent back-end for D, someday.

 free too ;-)

 Karl Bochert
Mar 25 2002
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Karl Bochert" <kbochert ix.netcom.com> wrote in message
news:1103_1016902791 bose...
 On Fri, 22 Mar 2002 10:20:54 -0800, "Walter" <walter digitalmars.com>
wrote:
 BTW, D's inline assembler is well integrated in with the compiler. The
 compiler can track register usage even in asm blocks, and can still
optimize
 the surrounding code, unlike any other inline implementation I'm aware
of.
 Watcom has a form of asm that allows optimization.

     #pragma aux setSP = \
         "mov ESP, eax"  \
         parm [eax]      \
         modify [EAX] ;

     #pragma aux getSP = \
         "mov edx, esp"  \
         value [edx] modify [eax];

 Then:

     current_sp = getSP()

 is fully optimized.
The Digital Mars optimizer doesn't need those hints to be specified by the user, it just analyzes the instructions.
 Watcom also allows register passing convention in addition
 to the standard _stdcall and _stddecl.  This, and extensive optimization,
 enables it to produce the fastest C code of any compiler
 that I am aware of.
My marketing has always been bad.  I remember magazine compiler reviews
where the reviewer's own numbers showed us to be the fastest compiler,
but Borland got the writeup as fastest.  Where we produced the fastest
benchmarks according to the reviewer's own numbers, but Watcom got the
writeup as fastest.  It's all a bit maddening <g>.
Mar 26 2002
prev sibling parent reply "Jim Starkey" <jas netfrastructure.com> writes:
Walter wrote in message ...
 I notice there is no support for volatile, which perplexes me.  Volatile
 is necessary to warn an optimizer that another thread may change a data
 item without warning.
They'd have to be implemented with mutexes anyway, so might as well just
wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that
writing to memory will be atomic if the item crosses a 32 bit word
boundary, which can happen writing doubles, longs, or even misaligned
ints.
No, it is neither necessary nor desirable to use mutexes.  Yes, there
are restrictions on the interlocked instructions, but since volatile is
implemented/enforced by the compiler, this should be acceptable.  The
compiler's responsibility should be to either implement an operation
atomically or generate a diagnostic explaining why it can't.

An example of something that can be cheaply handled by enhanced volatile
is use counts of objects shared across threads.  An atomic interlocked
decrement implemented with "lock xsub decl" does the trick correctly
with no more cost than an extra bus cycle, where a mutex requires an OS
call.  The ratio of costs is probably three orders of magnitude or more.
I agree that the C definition of volatile is next to useless.
I didn't mean to imply that the C definition of volatile is next to
useless -- it is, in fact, absolutely critical for all but the most
primitive multi-threaded code.  Even when used with mutexes, volatile is
necessary to warn the optimizer off unwarranted assumptions of
invariance.

If D is going to succeed, it is necessary to anticipate where computer
architectures are going.  Everyone, I hope, understands that memory is
cheap and plentiful, larger virtual address spaces are in easy sight,
and dirt cheap multi-processors are here.  Although we're in a period of
rapidly increasing clock rates, we're also approaching physical limits
on feature size.  In the not distant future it will be cheaper to add
more processors than to buy/build faster ones.  At that point
performance will be gated by the degree to which doubling the number of
processors doubles the speed of the system.

There is a hierarchy of synchronization primitives -- interlocked
instructions, shared/exclusive locks, and mutexes -- with a large
variation in cost.  Interlocked instructions are almost free; mutexes
cost an arm and a leg.  Forcing all synchronization to use mutexes is an
unnecessary waste of resources.  In the absence of volatile, however, it
is impossible to implement finer grained synchronization primitives.
This doesn't strike me as wise....
Mar 22 2002
parent "Walter" <walter digitalmars.com> writes:
"Jim Starkey" <jas netfrastructure.com> wrote in message
news:a7fk2p$20rj$1 digitaldaemon.com...
They'd have to be implemented with mutexes anyway, so might as well just
wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that
writing to memory will be atomic if the item crosses a 32 bit word
boundary, which can happen writing doubles, longs, or even misaligned
ints.
 No, it is neither necessary nor desirable to use mutexes.  Yes, there
 are restrictions on the interlocked instructions, but since volatile is
 implemented/enforced by the compiler, this should be acceptable.  The
 compiler's responsibility should be to either implement an operation
 atomically or generate a diagnostic explaining why it can't.
Writes to bytes and aligned words/dwords are done atomically by the CPU;
misaligned data and multiword data are not.
 An example of something that can be cheaply handled by enhanced volatile
 is use counts by objects shared across threads.  An atomic interlocked
 decrement implemented with "lock xsub decl" does the trick correctly
 with no more cost than an extra bus cycle, where a mutex requires an
 OS call.  The ratio of costs are probably three orders of magnitude or
 more.
Synchronizing mutexes do not require an OS call most of the time,
although they are still slower than a simple lock.  None of the modern
Java VMs do an OS call for each synchronize.
I agree that the C definition of volatile is next to useless.
I didn't mean to imply that the C definition of volatile is next to
useless -- it is, in fact, absolutely critical for all but the most
primitive multi-threaded code.  Even when used with mutexes, volatile is
necessary to warn the optimizer off unwarranted assumptions of
invariance.
I'm sorry, I just don't see how. See my other post here about j=i++; and how volatile doesn't help.
 If D is going to succeed, it is necessary to anticipate where computer
 architectures are going.  Everyone, I hope, understands that memory is
 cheap and plentiful, larger virtual address spaces are in easy sight,
 and dirt cheap multi-processors are here.  Although we're in a period
 of rapidly increasing clock rates, we're also approaching physical
 limits on feature size.  In the not distant future it will be cheaper
 to add more processors than to buy/build faster ones.  At that point
 performance will be gated by the degree to which doubling the number of
 processors doubles the speed of the system.
I think you're right.
 There is a hierarchy of synchronization primitives -- interlocked
 instructions, shared/exclusive locks, and mutexes -- with a large
 variation in cost.  Interlocked instructions are almost free; mutexes
 cost an arm and a leg.  Forcing all synchronization to use mutexes
 is an unnecessary waste of resources.  In the absence of volatile,
 however, it is impossible to implement finer grained synchronization
 primitives.  This doesn't strike me as wise....
I think your points merit further investigation, though I don't see how volatile is the answer.
Mar 22 2002