
digitalmars.D - Inline assembler and optimization

reply Arcane Jill <Arcane_member pathlink.com> writes:
Question: is optimization done before or after the insertion of inline
assembler? That is, is inline assembler "what you see is what you get", or does
the optimizer munge it? I should mention that I don't actually mind what the
answer is.

If the answer turns out to be that the opimizer MAY modify even my inline
assembler then I do have a workaround, so it doesn't matter. I just want to
know.

If the answer turns out to be that the optimizer WILL NOT modify inline
assembler, then I must ask a follow-up question: Do we have any kind of
guarantee that this will always be the case in the future? That is, does there
exist a stability policy in this regard which future incarnations of the
compiler must always respect?

Arcane Jill
Jun 09 2004
next sibling parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
It seems to me that inline assembler must always be as-you-type-it. It's
reasonable in (almost?) all such cases to trust the programmer.

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca6rqj$2300$1 digitaldaemon.com...
 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or does
 the optimizer munge it? I should mention that I don't actually mind what the
 answer is.

 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want to
 know.

 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does there
 exist a stability policy in this regard which future incarnations of the
 compiler must always respect?

 Arcane Jill

Jun 09 2004
prev sibling next sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or
 does the optimizer munge it? I should mention that I don't actually mind
 what the answer is.
 
 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want
 to know.
 
 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does
 there exist a stability policy in this regard which future incarnations of
 the compiler must always respect?

I don't understand the purpose of this question: the optimizer is guaranteed
not to change the behaviour of the code. If the compiler were intelligent
enough to perfectly understand some inline-assembler lines, including all
side-effects, it might optimize them; otherwise you can be sure that it will
not touch them. I assume that no compiler is intelligent enough to optimize
inline-assembler code, and I assume there is little reason to take the pain of
doing something like that. But still: if there were such a compiler mangling
your inline assembler, you could still be sure that the resulting code behaves
identically to the original in every respect. Therefore, I don't know why you
are afraid of the optimizer touching your inline assembler code.
Jun 09 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca6vk6$28o6$1 digitaldaemon.com>, Norbert Nemec says...
I don't understand the purpose of this question:

Then I shall explain.
The optimizer is guaranteed
not the change the behaviour of the code. If the compiler were intelligent
enough to perfectly understand some inline-assembler lines including all
side-effects, it might optimize them, otherwise you can be sure that it
will not touch them.

Not if previous experience is anything to go by. In Borland, Microsoft and GNU
compilers, buffers which are memset() to zero to securely wipe their sensitive
content immediately before destruction are considered to be "dead" already by
the optimizers of those compilers - i.e. they will never be read again, so the
compiler marks this code as redundant and removes it. When this problem was
revealed, it was found that a great deal of cryptographic software, including a
variety of cryptographic libraries written by experienced programmers, had
failed to take adequate measures to address this.*

Arcane Jill

* from the paper "Understanding Data Lifetime via Whole System Simulation" by
Chow, Pfaff, Garfinkel, Christopher and Rosenblum.
Jun 09 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7224$2cat$1 digitaldaemon.com...
 Not if previous experience is anything to go by. In Borland, Microsoft and GNU
 compilers, buffers which are memset() to zero to securely wipe their sensitive
 content immediately before destruction are considered to be "dead" already by
 the optimizers of those compilers - i.e. they will never be read again, thus
 the compiler marks this code as redundant and removes it. When this problem
 was revealed it was found that a great deal of cryptographic software,
 including a variety of cryptographic libraries written by experienced
 programmers, had failed to take adequate measures to address this.*

The optimizer won't delete your inline assembler to do that. However, declaring the reference that is memset to be 'volatile' should take care of it.
Jun 09 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca7nq7$db5$2 digitaldaemon.com>, Walter says...

declaring the reference that is memset to be 'volatile' should take care of
it.

I don't completely understand the D meaning of "volatile". It seems to be
different from that to which I am accustomed in C/C++.

In D, "volatile" is part of a STATEMENT. It is not a storage class or an
attribute.

In C++, of course, volatile is a storage class. It means "do not cache the
value of this variable in a register". It means that the compiler has to
actually read it, every time, in case some other thread (or piece of hardware,
etc.) has modified it.

But in D, if I read this correctly, you can do stuff like:
    volatile *p++;

(I just checked, and that does compile). It seems to me that a statement like:
    volatile uint n;

won't actually make a volatile variable in the C sense; it will just guarantee
that all writes are complete before the variable is initialized, and that the
initialization of the variable is complete before the next statement begins.

I may have completely misunderstood this. If I've got it right, then I don't
entirely see why this would be useful.
Jun 09 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7p2s$f9s$1 digitaldaemon.com...
 In article <ca7nq7$db5$2 digitaldaemon.com>, Walter says...

 declaring the reference that is memset to be 'volatile' should take care of
 it.

 I don't completely understand the D meaning of "volatile". It seems to be
 different from that to which I am accustomed in C/C++. In D, "volatile" is
 part of a STATEMENT. It is not a storage class or an attribute.

 In C++, of course, volatile is a storage class. It means "do not cache the
 value of this variable in a register". It means that the compiler has to
 actually read it, every time, in case some other thread (or piece of
 hardware, etc.) has modified it.

 But in D, if I read this correctly, you can do stuff like:

    volatile *p++;

 (I just checked, and that does compile). It seems to me that a statement like:

    volatile uint n;

 won't actually make a volatile variable in the C sense, it will just
 guarantee that all writes are complete before the variable is initialized,
 and that the initialization of the variable is complete before the next
 statement begins.

 I may have completely misunderstood this. If I've got it right, then I don't
 entirely see why this would be useful.

You're right. I was referring to C's notion of volatile in the problem with memset().
Jun 09 2004
prev sibling next sibling parent Derek <derek psyc.ward> writes:
On Wed, 09 Jun 2004 14:25:41 +0200, Norbert Nemec wrote:

 Arcane Jill wrote:
 
 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or
 does the optimizer munge it? I should mention that I don't actually mind
 what the answer is.
 
 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want
 to know.
 
 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does
 there exist a stability policy in this regard which future incarnations of
 the compiler must always respect?

I don't understand the purpose of this question: the optimizer is guaranteed
not to change the behaviour of the code. If the compiler were intelligent
enough to perfectly understand some inline-assembler lines, including all
side-effects, it might optimize them; otherwise you can be sure that it will
not touch them. I assume that no compiler is intelligent enough to optimize
inline-assembler code, and I assume there is little reason to take the pain of
doing something like that. But still: if there were such a compiler mangling
your inline assembler, you could still be sure that the resulting code behaves
identically to the original in every respect. Therefore, I don't know why you
are afraid of the optimizer touching your inline assembler code.

One reason is that one may deliberately require 'under'-optimised machine code
to exist. The compiler can never really know the intentions of a coder; it
just assumes some things.

-- 
Derek
Melbourne, Australia
Jun 09 2004
prev sibling parent Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <ca6vk6$28o6$1 digitaldaemon.com>, Norbert Nemec says...

Therefore, I don't know why you are afraid of the optimizer touching your
inline assembler code.

Maybe working on a self-integrity check? I can pre-calculate the CRC of a
bunch of functions and check them at runtime.

Ciao
Jun 10 2004
prev sibling next sibling parent Ilya Minkov <minkov cs.tum.edu> writes:
Arcane Jill schrieb:

 Question: is optimization done before or after the insertion of inline
 assembler? That is, is inline assembler "what you see is what you get", or does
 the optimizer munge it? I should mention that I don't actually mind what the
 answer is.

As far as I remember from Walter answering some other question, DMD guards the
inline assembly code to prevent the optimizer from messing with it.
 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does there
 exist a stability policy in this regard which future incarnations of the
 compiler must always respect?

Other incarnations of the compiler are not guaranteed to have an inline
assembler at all. :) In particular, GDC doesn't. As for DMD, it looks like a
deliberate decision, so knowing Walter it's unlikely to change. I haven't seen
such a guarantee in the documentation though, so you can never really know
what other compiler writers will do until it is written down.

-eye
Jun 09 2004
prev sibling next sibling parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca6rqj$2300$1 digitaldaemon.com...
 Question: is optimization done before or after the insertion of inline
 assembler?

After, although the optimizer does not touch the inline assembler.
 That is, is inline assembler "what you see is what you get",

Yes.
 or does
 the optimizer munge it?

No. But it will do a few single-instruction things like:

    replaces jmps to the next instruction with NOPs
    sign extension of modregrm displacement
    sign extension of immediate data (can't do it for OR, AND, XOR as the
        opcodes are not defined)
    short versions for AX EA
    short versions for reg EA
    TEST reg,-1 => TEST reg,reg
    AND reg,0 => XOR reg,reg

It won't do scheduling or reorganizing of it, nor any changes that would
affect the flags.
 I should mention that I don't actually mind what the
 answer is.

 If the answer turns out to be that the opimizer MAY modify even my inline
 assembler then I do have a workaround, so it doesn't matter. I just want

 know.

 If the answer turns out to be that the optimizer WILL NOT modify inline
 assembler, then I must ask a follow-up question: Do we have any kind of
 guarantee that this will always be the case in the future? That is, does

 exist a stability policy in this regard which future incarnations of the
 compiler must always respect?

Since the asm language does not exactly specify the opcode to be generated, this would be a difficult rule to enforce in a few cases.
Jun 09 2004
prev sibling parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <ca6rqj$2300$1 digitaldaemon.com>, Arcane Jill says...
Question: is optimization done before or after the insertion of inline
assembler? That is, is inline assembler "what you see is what you get", or does
the optimizer munge it? I should mention that I don't actually mind what the
answer is.

If the answer turns out to be that the opimizer MAY modify even my inline
assembler then I do have a workaround, so it doesn't matter. I just want to
know.

If the answer turns out to be that the optimizer WILL NOT modify inline
assembler, then I must ask a follow-up question: Do we have any kind of
guarantee that this will always be the case in the future? That is, does there
exist a stability policy in this regard which future incarnations of the
compiler must always respect?

Arcane Jill

As a tangential comment:

I wonder if it would make sense to allocate all security-sensitive data from a
special pool, perhaps portions of a large malloc'd chunk, or a linked list of
malloc'd chunks. A call at the end of main() could clear this memory. It could
XOR the return value (of main) with several random array elements after it is
cleared, to try to prevent optimization.

As the code runs, each object would try to clear its own pieces, to further
minimize lifetimes.

The security pool could also verify that the pool was all nulls, perhaps
spitting out messages if the pool was not cleared (even in release mode). This
would also inhibit optimization.

A side benefit of this is that you could do all your mlock or don't-page-out
precautions (however that is done) in one place.

Kevin
Jun 09 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca8149$r84$1 digitaldaemon.com>, Kevin Bealer says...

As a tangential comment:

I wonder if it would makes sense to allocate all security-sensitive data from a
special pool, perhaps portions of a large malloced chunk, or a linked list of
malloc'd chunks.  A call at the end of main() could clear this memory.  

It makes perfect sense, except for the fact that, in a server, main() never returns. The program just keeps running forever.
It could
XOR the return value (of main) with several random array elements after it is
cleared to try to prevent optimization.

It's easy enough just to fill it with zeroes using inline assembler, since Walter says this will never be optimized away.
As the code runs, each object would try to clear its own pieces, to further
minimize lifetimes.

Yup. I just spent the last couple of weeks implementing exactly that. Now the
only problem is, it doesn't work - BECAUSE - I have no way of knowing when an
object is no longer visible (and hence eligible for wiping). Now, this
wouldn't be a problem if operators new() and delete() were globally
overloadable, but they're not. Unless I've got that wrong.

For example, I could recode Int to use just such a custom allocator. Then they
could be used to do RSA calculations, etc. BUT - realistically, no-one is ever
going to call delete() on an Int. That would seriously complicate using them.
And the GC won't touch it, because it has a custom allocator.

Wait - just had an idea! <ping> I'll make that a separate post and see what
Walter thinks.
The security pool could also verify that the pool was all
nulls, perhaps spitting out messages if the pool was not cleared (even in
release mode).  This would also inhibit optimization.

I might do that in a Debug build - that's DbC - but not in a Release build. I mean, if assembler doesn't get optimized away, there's just no problem. It WILL happen.
A side benefit of this is that you could do all your mlock or don't-page-out
precautions (however that is done) in one place.

Yup. Just need that global new() and delete() now. I like your thinking.

Jill
Jun 09 2004
parent Sean Kelly <sean f4.ca> writes:
In article <ca836b$usn$1 digitaldaemon.com>, Arcane Jill says...
Yup. I just spent the last couple of weeks implementing exactly that. Now the
only problem is, it doesn't work - BECAUSE - I have no way of knowing when an
object is no longer visible (and hence eligible for wiping). Now, this wouldn't
be a problem if operators new() and delete() were globally overloadable, but
they're not. Unless I've got that wrong.

For example, I could recode Int to use just such a custom allocator. Then they
could be used to do RSA calculations, etc. BUT - realistically, no-one is ever
going to call delete() on an Int. That would seriously complicate using them.
And the GC won't touch it, because it has a custom allocator.

You could ask that we be allowed to overload the dot operator to make
implementing smart pointers a bit simpler, though aside from that one use the
idea kind of horrifies me :)

Sean
Jun 09 2004