
digitalmars.D - Inline assembler and optimization

reply Arcane Jill <Arcane_member pathlink.com> writes:
Question: is optimization done before or after the insertion of inline
assembler? That is, is inline assembler "what you see is what you get", or does
the optimizer munge it? I should mention that I don't actually mind what the
answer is.

If the answer turns out to be that the opimizer MAY modify even my inline
assembler then I do have a workaround, so it doesn't matter. I just want to
know.

If the answer turns out to be that the optimizer WILL NOT modify inline
assembler, then I must ask a follow-up question: Do we have any kind of
guarantee that this will always be the case in the future? That is, does there
exist a stability policy in this regard which future incarnations of the
compiler must always respect?

Arcane Jill
Jun 09 2004
next sibling parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
It seems to me that inline assembler must always be as-you-type-it. It's
reasonable in (almost?) all such cases to trust the programmer.

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca6rqj$2300$1 digitaldaemon.com...
 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or does
 the optimizer munge it? I should mention that I don't actually mind what the
 answer is.

 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want to
 know.

 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does there
 exist a stability policy in this regard which future incarnations of the
 compiler must always respect?

 Arcane Jill

Jun 09 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or
 does the optimizer munge it? I should mention that I don't actually mind
 what the answer is.
 
 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want
 to know.
 
 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does
 there exist a stability policy in this regard which future incarnations of
 the compiler must always respect?

I don't understand the purpose of this question: the optimizer is guaranteed
not to change the behaviour of the code. If the compiler were intelligent
enough to perfectly understand some inline-assembler lines, including all
side-effects, it might optimize them; otherwise you can be sure that it will
not touch them. I assume that no compiler is intelligent enough to optimize
inline-assembler code, and I assume there is little reason to take the pain of
doing something like that. But still: if there were such a compiler mangling
your inline assembler, you could still be sure that the resulting code behaves
identically to the original in every respect. Therefore, I don't know why you
are afraid of the optimizer touching your inline assembler code.
Jun 09 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca6vk6$28o6$1 digitaldaemon.com>, Norbert Nemec says...
I don't understand the purpose of this question:

Then I shall explain.
The optimizer is guaranteed
not the change the behaviour of the code. If the compiler were intelligent
enough to perfectly understand some inline-assembler lines including all
side-effects, it might optimize them, otherwise you can be sure that it
will not touch them.

Not if previous experience is anything to go by. In Borland, Microsoft and GNU
compilers, buffers which are memset() to zero to securely wipe their sensitive
content immediately before destruction are considered to be "dead" already by
the optimizers of those compilers - i.e. they will never be read again, so the
compiler marks this code as redundant and removes it. When this problem was
revealed, it was found that a great deal of cryptographic software, including a
variety of cryptographic libraries written by experienced programmers, had
failed to take adequate measures to address this.*

Arcane Jill

* from the paper "Understanding Data Lifetime via Whole System Simulation" by
Chow, Pfaff, Garfinkel, Christopher and Rosenblum.
Jun 09 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7224$2cat$1 digitaldaemon.com...
 Not if previous experience is anything to go by. In Borland, Microsoft and GNU
 compilers, buffers which are memset() to zero to securely wipe their sensitive
 content immediately before destruction are considered to be "dead" already by
 the optimizers of those compilers - i.e. they will never be read again, thus
 the compiler marks this code as redundant and removes it. When this problem
 was revealed it was found that a great deal of cryptographic software,
 including a variety of cryptographic libraries written by experienced
 programmers, had failed to take adequate measures to address this.*

The optimizer won't delete your inline assembler to do that. However, declaring the reference that is memset to be 'volatile' should take care of it.
Jun 09 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca7nq7$db5$2 digitaldaemon.com>, Walter says...

declaring the reference that is memset to be 'volatile' should take care of
it.

I don't completely understand the D meaning of "volatile". It seems to be
different from that to which I am accustomed in C/C++.

In D, "volatile" is part of a STATEMENT. It is not a storage class or an
attribute.

In C++, of course, volatile is a storage class. It means "do not cache the
value of this variable in a register". It means that the compiler has to
actually read it, every time, in case some other thread (or piece of hardware,
etc.) has modified it.

But in D, if I read this correctly, you can do stuff like:
    volatile *p++;

(I just checked, and that does compile). It seems to me that a statement like:
    volatile uint n;

won't actually make a volatile variable in the C sense; it will just guarantee
that all writes are complete before the variable is initialized, and that the
initialization of the variable is complete before the next statement begins.

I may have completely misunderstood this. If I've got it right, then I don't
entirely see why this would be useful.
Jun 09 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7p2s$f9s$1 digitaldaemon.com...
 In article <ca7nq7$db5$2 digitaldaemon.com>, Walter says...

 declaring the reference that is memset to be 'volatile' should take care of
 it.

 I don't completely understand the D meaning of "volatile". It seems to be
 different from that to which I am accustomed in C/C++. In D, "volatile" is
 part of a STATEMENT. It is not a storage class or an attribute.

 In C++, of course, volatile is a storage class. It means "do not cache the
 value of this variable in a register". It means that the compiler has to
 actually read it, every time, in case some other thread (or piece of
 hardware, etc.) has modified it.

 But in D, if I read this correctly, you can do stuff like:

    volatile *p++;

 (I just checked, and that does compile). It seems to me that a statement like:

    volatile uint n;

 won't actually make a volatile variable in the C sense, it will just
 guarantee that all writes are complete before the variable is initialized,
 and that the initialization of the variable is complete before the next
 statement begins.

 I may have completely misunderstood this. If I've got it right, then I don't
 entirely see why this would be useful.

You're right. I was referring to C's notion of volatile in the problem with memset().
Jun 09 2004
prev sibling next sibling parent Derek <derek psyc.ward> writes:
On Wed, 09 Jun 2004 14:25:41 +0200, Norbert Nemec wrote:

 Arcane Jill wrote:
 
 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or
 does the optimizer munge it? I should mention that I don't actually mind
 what the answer is.
 
 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want
 to know.
 
 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does
 there exist a stability policy in this regard which future incarnations of
 the compiler must always respect?

I don't understand the purpose of this question: the optimizer is guaranteed
not to change the behaviour of the code. If the compiler were intelligent
enough to perfectly understand some inline-assembler lines, including all
side-effects, it might optimize them; otherwise you can be sure that it will
not touch them. I assume that no compiler is intelligent enough to optimize
inline-assembler code, and I assume there is little reason to take the pain of
doing something like that. But still: if there were such a compiler mangling
your inline assembler, you could still be sure that the resulting code behaves
identically to the original in every respect. Therefore, I don't know why you
are afraid of the optimizer touching your inline assembler code.

One reason is that one may deliberately require 'under'-optimised machine code
to exist. The compiler can never really know the intentions of a coder; it
just assumes some things.

-- 
Derek
Melbourne, Australia
Jun 09 2004
prev sibling parent Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <ca6vk6$28o6$1 digitaldaemon.com>, Norbert Nemec says...

Therefore, I don't know why you are afraid of the optimizer touching your
inline assembler code.

Maybe working on a self-integrity check? I can pre-calculate the CRC of a
bunch of functions and check them at runtime.

Ciao
Jun 10 2004
prev sibling next sibling parent Ilya Minkov <minkov cs.tum.edu> writes:
Arcane Jill schrieb:

 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or does
 the optimizer munge it? I should mention that I don't actually mind what the
 answer is.

As far as I remember from Walter answering some other question, DMD guards the
inline assembly code to prevent the optimizer from messing with it.
 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does there
 exist a stability policy in this regard which future incarnations of the
 compiler must always respect?

Other incarnations of the compiler are not guaranteed to have an inline
assembler at all. :) In particular, GDC doesn't. As for DMD, it looks like a
deliberate decision, so knowing Walter it's unlikely to change. I haven't seen
such a guarantee in the documentation though, so you can never really know
what other compiler writers will do until it is written down.

-eye
Jun 09 2004
prev sibling next sibling parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca6rqj$2300$1 digitaldaemon.com...
 Question: is optimization done before or after the insertion of inline
 assembler?

After, although the optimizer does not touch the inline assembler.
 That is, is inline assembler "what you see is what you get",

Yes.
 or does
 the optimizer munge it?

No. But it will do a few single-instruction things like:

    replaces jmps to the next instruction with NOPs
    sign extension of modregrm displacement
    sign extension of immediate data (can't do it for OR, AND, XOR as the
        opcodes are not defined)
    short versions for AX EA
    short versions for reg EA
    TEST reg,-1 => TEST reg,reg
    AND reg,0 => XOR reg,reg

It won't do scheduling or reorganizing of it, nor any changes that would
affect the flags.
 I should mention that I don't actually mind what the
 answer is.

 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want

 know.

 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does

 exist a stability policy in this regard which future incarnations of the
 compiler must always respect?

Since the asm language does not exactly specify the opcode to be generated, this would be a difficult rule to enforce in a few cases.
Jun 09 2004
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <ca6rqj$2300$1 digitaldaemon.com>, Arcane Jill says...
Question: is optimization done before or after the insertion of inline
assembler? That is, is inline assembler "what you see is what you get", or does
the optimizer munge it? I should mention that I don't actually mind what the
answer is.

If the answer turns out to be that the opimizer MAY modify even my inline
assembler then I do have a workaround, so it doesn't matter. I just want to
know.

If the answer turns out to be that the optimizer WILL NOT modify inline
assembler, then I must ask a follow-up question: Do we have any kind of
guarantee that this will always be the case in the future? That is, does there
exist a stability policy in this regard which future incarnations of the
compiler must always respect?

Arcane Jill

As a tangential comment:

I wonder if it would make sense to allocate all security-sensitive data from a
special pool, perhaps portions of a large malloc'd chunk, or a linked list of
malloc'd chunks. A call at the end of main() could clear this memory. It could
XOR the return value (of main) with several random array elements after it is
cleared, to try to prevent optimization.

As the code runs, each object would try to clear its own pieces, to further
minimize lifetimes.

The security pool could also verify that the pool was all nulls, perhaps
spitting out messages if the pool was not cleared (even in release mode). This
would also inhibit optimization.

A side benefit of this is that you could do all your mlock or don't-page-out
precautions (however that is done) in one place.

Kevin
Jun 09 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca8149$r84$1 digitaldaemon.com>, Kevin Bealer says...

As a tangential comment:

I wonder if it would makes sense to allocate all security-sensitive data from a
special pool, perhaps portions of a large malloced chunk, or a linked list of
malloc'd chunks.  A call at the end of main() could clear this memory.  

It makes perfect sense, except for the fact that, in a server, main() never returns. The program just keeps running forever.
It could
XOR the return value (of main) with several random array elements after it is
cleared to try to prevent optimization.

It's easy enough just to fill it with zeroes using inline assembler, since Walter says this will never be optimized away.
As the code runs, each object would try to clear its own pieces, to further
minimize lifetimes.

Yup. I just spent the last couple of weeks implementing exactly that. Now the
only problem is, it doesn't work - BECAUSE - I have no way of knowing when an
object is no longer visible (and hence eligible for wiping). Now, this
wouldn't be a problem if operators new() and delete() were globally
overloadable, but they're not. Unless I've got that wrong.

For example, I could recode Int to use just such a custom allocator. Then they
could be used to do RSA calculations, etc. BUT - realistically, no-one is ever
going to call delete() on an Int. That would seriously complicate using them.
And the GC won't touch it, because it has a custom allocator.

Wait - just had an idea! <ping> I'll make that a separate post and see what
Walter thinks.
The security pool could also verify that the pool was all
nulls, perhaps spitting out messages if the pool was not cleared (even in
release mode).  This would also inhibit optimization.

I might do that in a Debug build - that's DbC - but not in a Release build. I mean, if assembler doesn't get optimized away, there's just no problem. It WILL happen.
A side benefit of this is that you could do all your mlock or don't-page-out
precautions (however that is done) in one place.

Yup. Just need that global new() and delete() now. I like your thinking.

Jill
Jun 09 2004
parent Sean Kelly <sean f4.ca> writes:
In article <ca836b$usn$1 digitaldaemon.com>, Arcane Jill says...
Yup. I just spent the last couple of weeks implementing exactly that. Now the
only problem is, it doesn't work - BECAUSE - I have no way of knowing when an
object is no longer visible (and hence eligible for wiping). Now, this wouldn't
be a problem if operators new() and delete() were globally overloadable, but
they're not. Unless I've got that wrong.

For example, I could recode Int to use just such a custom allocator. Then they
could be used to do RSA calculations, etc. BUT - realistically, no-one is ever
going to call delete() on an Int. That would seriously complicate using them.
And the GC won't touch it, because it has a custom allocator.

You could ask that we be allowed to overload the dot operator to make
implementing smart pointers a bit simpler, though aside from that one use the
idea kind of horrifies me :)

Sean
Jun 09 2004