D - Inlining


Helmut Leitner <helmut.leitner chello.at> writes:

What do we know about inlining except that the compiler
will do it when it feels so?

Is there a guarantee that a simple macro-like definition like 

   void MemClear(char *p,int size) 
   {
      memset(p,0,size);
   }

will be inlined? What if this goes through multiple levels?

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com

Apr 21 2003

"Matthew Wilson" <dmd synesis.com.au> writes:

afaik, it is entirely up to the compiler, which is where it should be in
almost all cases.

I think I remember there being discussion about the use of the inline
keyword as something to _force_ the compiler to inline, which I kind of
like, but maybe using that keyword is bad, since all the C++ programmers
will use it everywhere, which may not be appropriate.

force_inline or forceinline might be better, as they're uglier, or even

forceinline
{
    void MemClear(char *p,int size)
    {
       memset(p,0,size);
    }
}

which would be unambiguous and quite obvious
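
Editor's sketch: GCC and Clang already offer a forced-inline spelling for C, which is close to what is being proposed here. The example below is a minimal illustration assuming a GCC-compatible compiler; `FORCE_INLINE` and `ClearBuffer` are names made up for this note, not part of C, D, or any library.

```c
#include <string.h>

/* Hypothetical macro name; always_inline is a GCC/Clang extension. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline   /* only a hint on other compilers */
#endif

/* Same wrapper as in the original question. */
FORCE_INLINE void MemClear(char *p, int size)
{
    memset(p, 0, (size_t)size);
}

/* A second level of wrapping, mirroring the "multiple levels" question. */
FORCE_INLINE void ClearBuffer(char *buf, int size)
{
    MemClear(buf, size);
}
```

With a GCC-compatible compiler, both wrappers are expected to disappear at the call site and leave only the `memset`; on other compilers the fallback is an ordinary `inline` hint with no such promise.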

Apr 21 2003

Ilya Minkov <midiclub 8ung.at> writes:

Why not inline(always), inline(prefer), inline(never), 
inline(SomeConstantComparedToStandardizedInlinabilityIndex)?
Like the way version already works?


Apr 21 2003

"Matthew Wilson" <matthew stlsoft.org> writes:

Sounds ok to me

Apr 21 2003

"Walter" <walter digitalmars.com> writes:

"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EA3AEB3.DDC28183 chello.at...
 What do we know about inlining except that the compiler
 will do it when it feels so?

 Is there a guarantee that a simple macro-like definition like

    void MemClear(char *p,int size)
    {
       memset(p,0,size);
    }

 will be inlined? What if this goes through multiple levels?

Think of inlining like the obsolete register keyword in C.

Whether obvious inlining is done or not is a quality of implementation
issue, not a language issue.

Apr 24 2003

Mark T <Mark_member pathlink.com> writes:

 will be inlined? What if this goes through multiple levels?

Walter wrote:
 Think of inlining like the obsolete register keyword in C.

 Whether obvious inlining is done or not is a quality of implementation
 issue, not a language issue.

I agree. In the future most D compilers could have various compile-for-speed
and compile-for-size switches for different environments (e.g. small embedded
targets).

The design of the language should also allow for "Global System Analysis"; see
the JOOP article from May 2001, or look for similar information at
http://smarteiffel.loria.fr/ and
http://smarteiffel.loria.fr/papers/papers.html

Apr 25 2003

"Walter" <walter digitalmars.com> writes:

"Mark T" <Mark_member pathlink.com> wrote in message
news:b8bb7j$d7b$1 digitaldaemon.com...
 I agree, in the future most D compilers could have various compile-for-speed and
 compile-for-size switches for various environments ( ex: small embedded targets )

Yes.

 The design of the language should also allow for "Global System Analysis" see
 JOOP article May 2001 or look for similar info at http://smarteiffel.loria.fr/
 http://smarteiffel.loria.fr/papers/papers.html

D's design does allow for extensive inter-module analysis, although DMD
makes no attempt at it.

Apr 25 2003

Scott Wood <scott buserror.net> writes:

On Thu, 24 Apr 2003 13:03:22 -0700, Walter <walter digitalmars.com> wrote:
 Think of inlining like the obsolete register keyword in C.
 
 Whether obvious inlining is done or not is a quality of implementation
 issue, not a language issue.

It'd still be nice to have a way of explicitly saying that a function
either must or must not be inlined.  For example, the dynamic linker
in the GNU libc will break if certain functions are not inlined,
because the relocation has not yet been done.  The schedule()
function in Linux will break on sparc (and perhaps some other
platforms) if it is inlined, if you switch to a task that entered the
scheduler via a different containing function.

As for the analogy with the register keyword, GCC extends that to
allow you to explicitly place variables in specific registers, which
is useful in conjunction with assembly code.  The uselessness of the
original keyword does not mean that anything similar is also useless.

Neither of these are things you'd need very often, but when you do,
it'd be really unpleasant if they weren't there.  After all, D
claims to support "Down and dirty programming". :-)

-Scott
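
Editor's sketch: the opposite request, "never inline this", also exists as a GCC/Clang extension in C. A minimal illustration, assuming a GCC-compatible compiler; `NO_INLINE`, `schedule_stub`, and `run_once` are hypothetical names, and the body is a placeholder rather than anything taken from the Linux scheduler or glibc.

```c
/* Hypothetical macro name; noinline is a GCC/Clang extension. */
#if defined(__GNUC__)
#define NO_INLINE __attribute__((noinline))
#else
#define NO_INLINE   /* no guarantee on other compilers */
#endif

/* Stand-in for a function that must keep its own call frame. */
NO_INLINE int schedule_stub(int next_task)
{
    return next_task + 1;
}

/* The call below stays a real call even at high optimization levels. */
int run_once(int task)
{
    return schedule_stub(task);
}
```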

Apr 27 2003

Ilya Minkov <midiclub 8ung.at> writes:

Who says the register keyword is useless?

I remember some case of some guys using a fairly recent GCC, where they 
could raise performance by 20% by putting in the simple register hint in 
a couple of spots. While the compilers are getting smart, they don't 
know anything particular about the program's typical input values, as 
the programmer usually does.

-i.

Scott Wood wrote:
 As for the analogy with the register keyword, GCC extends that to
 allow you to explicitly place variables in specific registers, which
 is useful in conjunction with assembly code.  The uselessness of the
 original keyword does not mean that anything similar is also useless.
 
 Neither of these are things you'd need very often, but when you do,
 it'd be really unpleasant if they weren't there.  After all, D
 claims to support "Down and dirty programming". :-)
 
 -Scott

Apr 28 2003

"Walter" <walter digitalmars.com> writes:

"Scott Wood" <scott buserror.net> wrote in message
news:slrnbaoctt.3jp.scott ti.buserror.net...
 On Thu, 24 Apr 2003 13:03:22 -0700, Walter <walter digitalmars.com> wrote:
 Think of inlining like the obsolete register keyword in C.

 Whether obvious inlining is done or not is a quality of implementation
 issue, not a language issue.

 It'd still be nice to have a way of explicitly saying that a function
 either must or must not be inlined.  For example, the dynamic linker
 in the GNU libc will break if certain functions are not inlined,
 because the relocation has not yet been done.  The schedule()
 function in Linux will break on sparc (and perhaps some other
 platforms) if it is inlined, if you switch to a task that entered the
 scheduler via a different containing function.

I suspect those functions are heavily dependent on how a *particular*
compiler generates code for that. Depending on that is going outside of the
language definition. It makes successful operation of the code overly
sensitive to particular compiler versions, etc. (Some Linux kernel
developers are open about the kernel code being heavily dependent on how a
particular revision of GCC generates code.) You could as easily write code
in D that depends on a particular implementation of D, though with D's
support for inline assembler I'd argue that is unnecessary.


 As for the analogy with the register keyword, GCC extends that to
 allow you to explicitly place variables in specific registers, which
 is useful in conjunction with assembly code.  The uselessness of the
 original keyword does not mean that anything similar is also useless.

Those features are not part of the C language; although they are part of
GCC, they will not work with every version of GCC, and will not work with
any other C compiler. Contrast that with D, which has defined support for
inline assembler. Try doing some inline assembler work in GCC, then with D.
I think you'll find it supported far better in D, despite GCC's extensions.


 Neither of these are things you'd need very often, but when you do,
 it'd be really unpleasant if they weren't there.  After all, D
 claims to support "Down and dirty programming". :-)

Those things are what the inline assembler is for, and D has very strong
support for inline assembler. The C language itself has no support at all
for inline assembler, and GCC's support for it is very weak and error-prone
(for example, there's an arcane syntax you have to add to say which
registers were read and which were written by each asm block - get that
wrong, and your code will behave unpredictably. D, on the other hand, keeps
track of that automatically).
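
Editor's sketch: for readers who have not seen it, this is roughly what the GCC operand and clobber syntax under discussion looks like. It assumes a GCC-compatible compiler on x86 and falls back to plain C elsewhere so that it still builds; `add_with_asm` is a hypothetical name.

```c
/* Adds b to a. Uses GCC extended asm on x86, plain C on other targets. */
static int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"
             : "+r"(a)   /* output: a is both read and written, any register */
             : "r"(b)    /* input: b, any register */
             : "cc");    /* clobber: the condition flags are modified */
    return a;
#else
    return a + b;        /* fallback so the sketch compiles everywhere */
#endif
}
```

Leaving out the `"cc"` clobber or declaring `a` as write-only would be the kind of mistake the post describes: the code still assembles, but the compiler is working from a wrong description of the block.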

May 03 2003

Scott Wood <scott buserror.net> writes:

On Sat, 3 May 2003 14:38:10 -0700, Walter <walter digitalmars.com> wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbaoctt.3jp.scott ti.buserror.net...
 It'd still be nice to have a way of explicitly saying that a function
 either must or must not be inlined.  For example, the dynamic linker
 in the GNU libc will break if certain functions are not inlined,
 because the relocation has not yet been done.  The schedule()
 function in Linux will break on sparc (and perhaps some other
 platforms) if it is inlined, if you switch to a task that entered the
 scheduler via a different containing function.

 
 I suspect those functions are heavily dependent on how a *particular*
 compiler generates code for that. 

Not particularly, at least in the case of the scheduler.  The
scheduler's only concern with inlining is that the destination
thread doesn't resume in the wrong inlined instance.  The inline
assembly is non-portable as well, but only because inline assembly is
not part of C.

 Depending on that is going outside of the language definition. 

That depends on what the language definition is. :-)

 It makes successful operation of the code overly sensitive to
 particular compiler versions, etc. (Some linux kernel developers
 are open about the kernel code being heavily dependent on how a
 particular revision of GCC generates code.)

Some bits have been, but it's mainly been due to Linux developers
ignoring GCC's own rules for things like inline assembly constraints,
or making assumptions about weird stuff like "inline" assembly
outside of any function.

 Those things are what the inline assembler is for, and D has very strong
 support for inline assembler. 

How do you use the inline assembler to tell the compiler not to
inline a certain function written in D, not assembly?

 The C language itself has no support at all
 for inline assembler, and GCC's support for it is very weak and error-prone
 (for example, there's an arcane syntax you have to add to say which
 registers were read and which were written by each asm block - get that
 wrong, and your code will behave unpredictably. D, on the other hand, keeps
 track of that automatically).

Is there a way in D inline assembly to ask for a temporary register
without mandating a specific one?  How about specifying clobbers that
aren't explicitly in the code, such as when calling a function with
an unusual calling convention, or when switching threads?

Also, one of the example code sequences is this:

    void *pc;
    asm
    {
        call L1             ;
     L1:                    ;
        pop EBX             ;
        mov pc[EBP],EBX     ;       // pc now points to code at L1
    }

Why do you need to specify EBP when accessing pc?  Shouldn't the
compiler know what the best way to access pc is?  It might want to
get rid of the frame pointer, or it might want to keep it around in a
register for use after the asm block, etc.

GCC's inline assembly also has the sometimes desirable attribute
that the compiler doesn't touch the instructions you specify, other
than to schedule the block and substitute the things you asked it to. 
Will a D compiler be allowed to stick code in the middle of it, in
order to satisfy symbolic references, or to schedule instructions?
Is it allowed to optimize away mov instructions if it can get the
data there on its own?  Can it move memory accesses across the asm
block?

Usually, those sorts of things would be beneficial, but there should
be a way to tell it not to do it.

-Scott
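
Editor's sketch: in GCC's extended asm, the counterpart of the example quoted above can leave both the choice of register and the location of `pc` to the compiler. A minimal illustration, assuming a GCC-compatible compiler targeting x86-64, with a plain C stand-in on other targets; `current_pc` is a hypothetical name.

```c
#include <stdint.h>

/* Returns an address inside this function on x86-64; on other targets it
   returns the function's own address as a stand-in. */
static void *current_pc(void)
{
    void *pc;
#if defined(__GNUC__) && defined(__x86_64__)
    /* "=r" lets the compiler pick the register and decide where pc lives;
       no frame pointer is named anywhere. */
    __asm__ volatile ("leaq 0(%%rip), %0" : "=r"(pc));
#else
    pc = (void *)(uintptr_t)&current_pc;
#endif
    return pc;
}
```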

May 04 2003

"Walter" <walter digitalmars.com> writes:

"Scott Wood" <scott buserror.net> wrote in message
news:slrnbbalqd.ud.scott ti.buserror.net...
 Some bits have been, but it's mainly been due to Linux developers
 ignoring GCC's own rules for things like inline assembly constraints,
 or making assumptions about weird stuff like "inline" assembly
 outside of any function.

 Those things are what the inline assembler is for, and D has very strong
 support for inline assembler.

 How do you use the inline assembler to tell the compiler not to
 inline a certain function written in D, not assembly?

The compiler does not optimize inline assembly that you write. Therefore, if
you use the inline assembler to call a function, that function won't be
inlined.


 The C language itself has no support at all
 for inline assembler, and GCC's support for it is very weak and error-prone
 (for example, there's an arcane syntax you have to add to say which
 registers were read and which were written by each asm block - get that
 wrong, and your code will behave unpredictably. D, on the other hand, keeps
 track of that automatically).

 Is there a way in D inline assembly to ask for a temporary register
 without mandating a specific one?

No. The idea is "what you write is what you get" with the inline assembler.


  How about specifying clobbers that
 aren't explicitly in the code, such as when calling a function with
 an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is
an unusual function that clobbers other registers, you'll need to
save/restore them in the inline assembler.


 Also, one of the example code sequences is this:
     void *pc;
     asm
     {
         call L1             ;
      L1:                    ;
         pop EBX             ;
         mov pc[EBP],EBX     ;       // pc now points to code at L1
     }
 Why do you need to specify EBP when accessing pc?  Shouldn't the
 compiler know what the best way to access pc is?  It might want to
 get rid of the frame pointer, or it might want to keep it around in a
 register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler
is used, because the results of the inline assembler shouldn't be affected
by whether optimization is on or off. If you want, though, you can use the
'naked' pseudo-op and write the entire function in assembler, and what you
write is what you get.


 GCC's inline assembly also has the sometimes desirable attribute
 that the compiler doesn't touch the instructions you specify, other
 than to schedule the block and substitute the things you asked it to.
 Will a D compiler be allowed to stick code in the middle of it, in
 order to satisfy symbolic references, or to schedule instructions?
 Is it allowed to optimize away mov instructions if it can get the
 data there on its own?  Can it move memory accesses across the asm
 block?

The D compiler does not schedule, move around, optimize, or alter the inline
assembler instructions. The assumption is that if the programmer is going to
use inline assembler, the programmer knows exactly what he wants, and will
write it that way. What you write is what you get.

 Usually, those sorts of things would be beneficial, but there should
 be a way to tell it not to do it.

I guess I'm philosophically opposed to such things. I much prefer the
straightforward approach of inline assembler that what you write is what you
get. I also find it odd that gcc provides such things, yet still requires me
to specify which registers were read/written for the simplest inline asm.

May 07 2003

Scott Wood <scott buserror.net> writes:

On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
 The compiler does not optimize inline assembly that you write. Therefore, if
 you use the inline assembler to call a function, that function won't be
 inlined.

I suppose, though it'd be a little awkward to use the assembler just
to call a function without it being inlined.

Still, I'm a bit uncomfortable with the idea that the compiler's
always right and cannot be corrected, even explicitly.  I've seen GCC
silently decide not to inline a function (on which inlining was
requested) because it was "too big", even though it was just a large
switch statement on a constant, which ended up being one or two
instructions after optimization.  Given that no compiler is going to
make the right choice all the time, it's nice to be able to declare
one's intent when there's a clear reason to do so.

  How about specifying clobbers that
 aren't explicitly in the code, such as when calling a function with
 an unusual calling convention, or when switching threads?

 
 Called functions must follow the normal register saving convention. If it is
 an unusual function that clobbers other registers, you'll need to
 save/restore them in the inline assembler.

Which would defeat the purpose of using a special convention.  For
example, in a mutex implementation, one might want to make the
contended case call a function that saves all registers, so that
the common case doesn't have to spill any registers (other than
whatever's needed to test the mutex).

Thread switching would also be slower on architectures with a
reasonable number of registers if you have to manually save all of
them just because you can't tell the compiler to save (or
reconstruct) the 2 or 3 it might still care about.

BTW, will there be any way to tell the inline assembler to put some
code out-of-line?  Something like:

inline int lock_mutex(Mutex m)
{
   int new = whatever_goes_in_there;

   asm {
      eax = 0;  /* This tells the compiler to get a zero into eax,
                   in whatever way it chooses.  Maybe the caller
                   (which is inlining this function) had one lying
                   around in a register, and it can now choose to use
                   eax for that variable. */
      lock; cmpxchg [m.lock], new;
      jnz failed;  /* cmpxchg sets ZF on success */
      
      outofline {
         failed: /* I hope this label isn't visible outside of this
                    instantiation of this assembly block... */
            push ecx;
            push edx;
            call handle_failed;
            pop edx;
            pop ecx;
            return; /* This tells the compiler to exit the assembly
                       block.  Alternatively, a return label could
                       be declared. */
      }
      
      /* Tell the compiler that these registers were not, in fact,
         clobbered.  It can't assume it automatically, though, since
         it has no idea what handle_failed might be doing to those
         values on the stack.  Or, to save space, I may have buried
         those pushes into a wrapper assembly function instead,
         where the compiler probably won't see them. */

      noclobber ecx, edx;
      
      /* Tell the compiler that, since this thing acts as a mutex,
         no memory accesses can be reordered across it.  It's
         probably not necessary in this case, though, as it contains
         a function call. */

      clobber memory;
   }
}
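
Editor's sketch: the fast-path/slow-path split above can be approximated in C11 without inline assembly, at the cost of the custom register convention. This assumes `<stdatomic.h>` is available; the `Mutex` layout, `lock_mutex_slow`, and `slow_path_calls` are inventions for this note, and the `noinline`/`cold` attributes are GCC/Clang extensions.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int lock;          /* 0 = free, 1 = held */
} Mutex;

static int slow_path_calls;   /* counts contended attempts, for illustration */

/* Contended case, kept out of line so callers only pay for the fast path. */
#if defined(__GNUC__)
__attribute__((noinline, cold))
#endif
static bool lock_mutex_slow(Mutex *m)
{
    /* A real implementation would spin or block here; this sketch only
       records the contended attempt and retries once. */
    slow_path_calls++;
    int expected = 0;
    return atomic_compare_exchange_strong(&m->lock, &expected, 1);
}

/* Uncontended case: a single compare-and-swap, suitable for inlining. */
static inline bool lock_mutex(Mutex *m)
{
    int expected = 0;
    if (atomic_compare_exchange_strong(&m->lock, &expected, 1))
        return true;
    return lock_mutex_slow(m);
}

static inline void unlock_mutex(Mutex *m)
{
    atomic_store(&m->lock, 0);
}
```

This version cannot tell the compiler that the slow path preserves all registers, so the register-spilling concern raised in the post still applies to it.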

 Also, one of the example code sequences is this:
     void *pc;
     asm
     {
         call L1             ;
      L1:                    ;
         pop EBX             ;
         mov pc[EBP],EBX     ;       // pc now points to code at L1
     }
 Why do you need to specify EBP when accessing pc?  Shouldn't the
 compiler know what the best way to access pc is?  It might want to
 get rid of the frame pointer, or it might want to keep it around in a
 register for use after the asm block, etc.

 
 The compiler doesn't do frame pointer optimization when the inline assembler
 is used, because the results of the inline assembler shouldn't be affected
 by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the
assignment to pc rather than the programmer.  And what if I move to a
compiler that *never* uses frame pointers?  The code is now broken,
because I had to make an assumption about what the compiler was doing
with its registers.

Plus, pc is probably going to be used soon after the asm block; why
force it onto the stack and then back?

 If you want, though, you can use the
 'naked' pseudo-op and write the entire function in assembler, and what you
 write is what you get.

Yes, but you can get that by using an external assembler as well. 
The point of inline assembly is to, well, be inline. :-)

 The D compiler does not schedule, move around, optimize, or alter the inline
 assembler instructions. The assumption is that if the programmer is going to
 use inline assembler, the programmer knows exactly what he wants, and will
 write it that way. What you write is what you get.

The problem is that the programmer can't know exactly what he wants,
without knowing some decisions that the compiler will make.  GCC's
syntax allows the programmer to tell the compiler exactly where to
substitute those decisions.  Removing the ability of the compiler to
make the decisions will lead to slower code.

 I guess I'm philosophically opposed to such things. I much prefer the
 straightforward approach of inline assembler that what you write is what you
 get. I also find it odd that gcc provides such things, yet still requires me
 to specify which registers were read/written for the simplest inline asm.

It's not really that odd, seeing as it needs those features to make
up for its inability to parse the assembly code itself.  However,
those features end up granting the programmer more power than what
they replace.

-Scott

May 07 2003

"Walter" <walter digitalmars.com> writes:

"Scott Wood" <scott buserror.net> wrote in message
news:slrnbbjfj0.1a2.scott ti.buserror.net...
 On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
 I suppose, though it'd be a little awkward to use the assembler just
 to call a function without it being inlined.

I'd agree with that.

 Still, I'm a bit uncomfortable with the idea that the compiler's
 always right and cannot be corrected, even explicitly.  I've seen GCC
 silently decide not to inline a function (on which inlining was
 requested) because it was "too big", even though it was just a large
 switch statement on a constant, which ended up being one or two
 instructions after optimization.  Given that no compiler is going to
 make the right choice all the time, it's nice to be able to declare
 one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a
particular routine is a major bottleneck in your program (and it does
usually come down to one!), and you want to make the effort to tune it to
the max, write it in inline assembler.

  How about specifying clobbers that
 aren't explicitly in the code, such as when calling a function with
 an unusual calling convention, or when switching threads?

 Called functions must follow the normal register saving convention. If it is
 an unusual function that clobbers other registers, you'll need to
 save/restore them in the inline assembler.

 Which would defeat the purpose of using a special convention.  For
 example, in a mutex implementation, one might want to make the
 contended case call a function that saves all registers, so that
 the common case doesn't have to spill any registers (other than
 whatever's needed to test the mutex).

 Thread switching would also be slower on architectures with a
 reasonable number of registers if you have to manually save all of
 them just because you can't tell the compiler to save (or
 reconstruct) the 2 or 3 it might still care about.

 BTW, will there be any way to tell the inline assembler to put some
 code out-of-line?  Something like:

 inline int lock_mutex(Mutex m)
 {
    int new = whatever_goes_in_there;

    asm {
       eax = 0;  /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses.  Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability
for 15 years it just never proved out to be very useful.

       lock; cmpxchg [m.lock], new;
       jnz failed;  /* cmpxchg sets ZF on success */

       outofline {
          failed: /* I hope this label isn't visible outside of this
                     instantiation of this assembly block... */

Yes, it is visible outside. All labels are in one scope per function,
including the inline asm labels.

             push ecx;
             push edx;
             call handle_failed;
             pop edx;
             pop ecx;
             return; /* This tells the compiler to exit the assembly
                        block.  Alternatively, a return label could
                        be declared. */

Exit the assembly block? I don't know what you mean by that.

       }

       /* Tell the compiler that these registers were not, in fact,
          clobbered.  It can't assume it automatically, though, since
          it has no idea what handle_failed might be doing to those
          values on the stack.  Or, to save space, I may have buried
          those pushes into a wrapper assembly function instead,
          where the compiler probably won't see them. */

       noclobber ecx, edx;

That might be a reasonable addition.

       /* Tell the compiler that, since this thing acts as a mutex,
          no memory accesses can be reordered across it.  It's
          probably not necessary in this case, though, as it contains
          a function call. */

       clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

    }
 }
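
Editor's sketch: for comparison, the GCC spelling of the "clobber memory" request is a `"memory"` entry in the clobber list, and an empty asm statement with only that entry is a common compiler-level barrier. A minimal illustration assuming a GCC-compatible compiler; `COMPILER_BARRIER`, `publish`, `shared_data`, and `ready_flag` are hypothetical names.

```c
/* Compiler-level barrier: emits no instructions, but a GCC-compatible
   compiler will not move memory accesses across it. It is not a CPU fence. */
#if defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void)0)   /* no equivalent assumed here */
#endif

static int shared_data;
static int ready_flag;

static void publish(int value)
{
    shared_data = value;
    COMPILER_BARRIER();   /* keep the data store ahead of the flag store */
    ready_flag = 1;
}
```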

 Also, one of the example code sequences is this:
     void *pc;
     asm
     {
         call L1             ;
      L1:                    ;
         pop EBX             ;
         mov pc[EBP],EBX     ;       // pc now points to code at L1
     }
 Why do you need to specify EBP when accessing pc?  Shouldn't the
 compiler know what the best way to access pc is?  It might want to
 get rid of the frame pointer, or it might want to keep it around in a
 register for use after the asm block, etc.

 The compiler doesn't do frame pointer optimization when the inline assembler
 is used, because the results of the inline assembler shouldn't be affected
 by whether optimization is on or off.

 But it wouldn't affect the results, if the compiler handles the
 assignment to pc rather than the programmer.  And what if I move to a
 compiler that *never* uses frame pointers?  The code is now broken,
 because I had to make an assumption about what the compiler was doing
 with its registers.

When using inline asm, you'll always run the risk of nonportability between
compilers - after all, things like register conventions, calling
conventions, etc., are not defined by the language. Only the syntax of the
inline assembler is.


 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

Because the inline assembler assembles the code long before any register
assignments are done.


 If you want, though, you can use the
 'naked' pseudo-op and write the entire function in assembler, and what you
 write is what you get.

 Yes, but you can get that by using an external assembler as well.
 The point of inline assembly is to, well, be inline. :-)

I'm currently porting D to linux. Believe me, the inline assembler is a
great boon to that. Just try converting MASM files to gas files! To me,
using gas is like trying to write code looking in a mirror.


 The D compiler does not schedule, move around, optimize, or alter the inline
 assembler instructions. The assumption is that if the programmer is going to
 use inline assembler, the programmer knows exactly what he wants, and will
 write it that way. What you write is what you get.

 The problem is that the programmer can't know exactly what he wants,
 without knowing some decisions that the compiler will make.  GCC's
 syntax allows the programmer to tell the compiler exactly where to
 substitute those decisions.  Removing the ability of the compiler to
 make the decisions will lead to slower code.

You are correct in the abstract. In my experience, though, the difference is
negligible. I profile code extensively to make it faster. The
bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
Those I just write completely in hand-tuned inline assembler.


 I guess I'm philosophically opposed to such things. I much prefer the
 straightforward approach of inline assembler that what you write is what you
 get. I also find it odd that gcc provides such things, yet still requires me
 to specify which registers were read/written for the simplest inline asm.

 It's not really that odd, seeing as it needs those features to make
 up for its inability to parse the assembly code itself.  However,
 those features end up granting the programmer more power than what
 they replace.

I understand what you're driving at. It is heavily integrated with how
gcc parses, optimizes, and generates code. I don't think that's a good thing
to put in a language spec, as it may unnecessarily constrain how the
compiler is built.

May 08 2003

C <cc.news gateway.mirlex.com> writes:

Walter wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbbjfj0.1a2.scott ti.buserror.net...

[-snip-]

            return; /* This tells the compiler to exit the assembly
                       block.  Alternatively, a return label could
                       be declared. */

 
 
 Exit the assembly block? I don't know what you mean by that.

If that means what I think is intended, would 'break' be more
appropriate?

      }

      /* Tell the compiler that these registers were not, in fact,
         clobbered.  It can't assume it automatically, though, since
         it has no idea what handle_failed might be doing to those
         values on the stack.  Or, to save space, I may have buried
         those pushes into a wrapper assembly function instead,
         where the compiler probably won't see them. */

      noclobber ecx, edx;

 
 
 That might be a reasonable addition.

Agreed, though I would change the keyword, maybe 'retain' would be good,
or the list could be added to the assembler declaration ..

assembler: 'asm' '(' '!' noClobberList ')' '{' assemblerStatements '}'
	| 'asm' '{' assemblerStatements '}'
	;

noClobberList : registerName ',' noClobberList
	| registerName
	;

such as ...

asm (! ecx, edx ) {
	xor eax, eax;
	push ecx;
	call myFunc;
}

This is efficient, but its meaning is not immediately clear.

C 2003/5/8

May 07 2003

Scott Wood <scott buserror.net> writes:

On Thu, 8 May 2003 10:50:21 -0700, Walter <walter digitalmars.com> wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbbjfj0.1a2.scott ti.buserror.net...
 Still, I'm a bit uncomfortable with the idea that the compiler's
 always right and cannot be corrected, even explicitly.  I've seen GCC
 silently decide not to inline a function (on which inlining was
 requested) because it was "too big", even though it was just a large
 switch statement on a constant, which ended up being one or two
 instructions after optimization.  Given that no compiler is going to
 make the right choice all the time, it's nice to be able to declare
 one's intent when there's a clear reason to do so.

 
 I think that comes with the territory of using a high level language. If a
 particular routine is a major bottleneck in your program (and it does
 usually come down to one!), and you want to make the effort to tune it to
 the max, write it in inline assembler.

Except that in this case, using inline assembler would have made it
worse.  The code was expecting to have the switch(constant) optimized
away to just the relevant case.  Writing the containing functions in
assembly would not have been realistic, as the to-be-inlined
functions were used all over the source tree (they were used to move
data to/from userspace).
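
A minimal C sketch of the pattern described here, assuming GCC or Clang. The names copy_fixed and load_u32 are hypothetical, not the kernel's actual user-copy helpers, and __attribute__((always_inline)) is a GCC extension rather than anything the D spec provides. The switch only collapses to the relevant case when the function is inlined into a caller that passes a constant size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size copy helper. The switch is on a value that is a
   compile-time constant at every intended call site, so after inlining
   only one case should survive optimization. always_inline asks GCC/Clang
   to inline regardless of the function's apparent size. */
static inline __attribute__((always_inline))
void copy_fixed(void *dst, const void *src, size_t size)
{
    switch (size) {
    case 1:  memcpy(dst, src, 1); break;   /* typically a single byte move */
    case 2:  memcpy(dst, src, 2); break;
    case 4:  memcpy(dst, src, 4); break;
    case 8:  memcpy(dst, src, 8); break;
    default: memcpy(dst, src, size); break; /* general, slower path */
    }
}

/* Caller with a constant size: after inlining, the switch is expected to
   reduce to the 4-byte case alone. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    copy_fixed(&v, p, sizeof v);
    return v;
}
```

If the compiler declined to inline copy_fixed, every call would pay for the full switch at run time, which is the situation the post complains about.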

       eax = 0;  /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses.  Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */

 
 The Digital Mars C++ compiler can do this, but after having that capability
 for 15 years it just never proved out to be very useful.

It's a pretty small gain in this case, but what if it were a
non-constant, that is almost guaranteed to be in some register before
the asm statement?

       lock; cmpxchg [m.lock], new;
       jz failed;

       outofline {
          failed: /* I hope this label isn't visible outside of this
                     instantiation of this assembly block... */

 
 Yes, it is visible outside. All labels are in one scope per function,
 including the inline asm labels.

I was more worried about it being visible throughout the file (or
caller of the inline function), like it would have been in GCC, since
there's no support for find-the-first-one-in-a-given-direction
labels.
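
For comparison, GNU as does offer numeric local labels: "1:" can be defined many times, and "1b" / "1f" refer to the nearest definition backward or forward. That is what lets the same inline asm block be instantiated several times in one file. A small sketch in GCC-style C, assuming x86 and the default AT&T operand syntax; sum_to and two_sums are made-up names, and a plain C loop stands in on other targets.

```c
#include <stdint.h>

/* Adds n + (n-1) + ... + 1 using a loop written in inline asm.
   The labels 1 and 2 are local, so each inlined copy gets its own. */
static inline uint32_t sum_to(uint32_t n)
{
    uint32_t total = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "testl %1, %1\n\t"   /* n == 0? then skip the loop      */
        "jz 2f\n"
        "1:\n\t"
        "addl %1, %0\n\t"    /* total += n                      */
        "decl %1\n\t"        /* n--, sets ZF when it reaches 0  */
        "jnz 1b\n"
        "2:"
        : "+r"(total), "+r"(n)   /* both are read and written    */
        :
        : "cc");
#else
    while (n)
        total += n--;
#endif
    return total;
}

/* Two instantiations in one function: the repeated labels do not clash. */
static uint32_t two_sums(uint32_t a, uint32_t b)
{
    return sum_to(a) + sum_to(b);
}
```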

             push ecx;
             push edx;
             call handle_failed;
             pop edx;
             pop ecx;
             return; /* This tells the compiler to exit the assembly
                        block.  Alternatively, a return label could
                        be declared. */

 
 Exit the assembly block? I don't know what you mean by that.

Just a shortcut for declaring a new label at the end and branching
there, which is a rather common construct (especially when using
out-of-line sections).  I agree with "C" that break would be a better
keyword, though.

       /* Tell the compiler that, since this thing acts as a mutex,
          no memory accesses can be reordered across it.  It's
          probably not necessary in this case, though, as it contains
          a function call. */

       clobber memory;

 
 Unnecessary, as the inline assembler assumes memory is clobbered.

It'd be nice if the language didn't force the compiler to do this in
all cases, though.  For instance, it's not necessary when just
reading timestamps, or making use of some fancy computational
instruction for which the compiler doesn't have an intrinsic, or as a
touch-up in a critical function that the compiler doesn't optimize
well enough.  At the very least, "noclobber memory" should exist, but
a compiler should also be allowed to look for itself.  If the
compiler doesn't support this, it could always fall back on assuming
"clobber memory" for everything.
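
As an illustration of the distinction, here is how GCC's extended asm (not D's inline assembler) lets the programmer state it. This is a sketch assuming GCC or Clang; read_timestamp, compiler_barrier and sum_with_timing are hypothetical names, and on non-x86 targets the timestamp helper simply returns 0.

```c
#include <stdint.h>

/* Reads the x86 time-stamp counter. Only EAX and EDX are written, so no
   "memory" clobber is listed and the compiler may keep ordinary variables
   in registers across the statement. Returns 0 on non-x86 targets. */
static inline uint64_t read_timestamp(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Emits no instructions, but the "memory" clobber tells the compiler that
   memory may have changed, so it does not move loads or stores across it. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Hypothetical use: time a small loop. The sum can stay in a register
   across both timestamp reads. */
static uint64_t sum_with_timing(const uint32_t *v, int n, uint64_t *elapsed)
{
    uint64_t start = read_timestamp();
    uint64_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    compiler_barrier();
    *elapsed = read_timestamp() - start;
    return sum;
}
```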

 When using inline asm, you'll always run the risk of nonportability between
 compilers - after all, things like register conventions, calling
 conventions, etc., are not defined by the language. Only the syntax of the
 inline assembler is.

But would it not be better to reduce the potential sources of
nonportability, by letting the programmer tell the compiler to handle
certain details?  If the compiler can know the offset from EBP at
assembly time, it presumably knows that it's on the stack, and thus
that it should index off of EBP.

 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

 
 Because the inline assembler assembles the code long before any register
 assignments are done.

That's a compiler implementation detail.  Other compilers might not
have that restriction (for example, they may allow the registers to
be patched into the assembled code later on, or use an external
assembler).

If the compiler has to choose registers for the asm block in advance,
it could just add the store instruction itself at the time it handles
the inline assembly (in which case you get exactly the same code as
you do now), or it could remember which register the asm block used
and use that in the subsequent non-asm code.
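
A sketch of what this looks like with GCC's extended asm, assuming GCC or Clang on x86; swap_bytes and load_swapped are hypothetical names. The "+r" constraint asks the compiler to place x in whichever general register it prefers, both on entry and on exit, so the value need not travel through the stack around the asm statement.

```c
#include <stdint.h>

/* Reverses the byte order of x. On x86 this uses the bswap instruction,
   with the register chosen by the compiler via the "+r" constraint.
   Other targets fall back to portable shifts. */
static inline uint32_t swap_bytes(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("bswap %0" : "+r"(x));
    return x;
#else
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}

/* The loaded value goes straight into the asm statement and the result is
   used immediately afterwards; the compiler decides where it lives. */
static uint32_t load_swapped(const uint32_t *p)
{
    return swap_bytes(*p);
}
```

A simpler compiler could satisfy the same constraint by always picking one fixed register and emitting a mov before and after the block.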

 The problem is that the programmer can't know exactly what he wants,
 without knowing some decisions that the compiler will make.  GCC's
 syntax allows the programmer to tell the compiler exactly where to
 substitute those decisions.  Removing the ability of the compiler to
 make the decisions will lead to slower code.

 
 You are correct in the abstract. In my experience, I believe the difference
 to be negligible. I profile code extensively to make it faster. The
 bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
 Those I just write completely in hand-tuned inline assembler.

It's a little harder when it's 30,000 lines out of a few million, and
most of that needs to stay portable, so any assembler has to be
buried in separate inline functions.  In any case, I don't think the
language should throw away the opportunity for such optimizations
just because they don't help the majority of programs.  The compiler
is free to not implement them if it doesn't feel they're important.

-Scott

May 08 2003

"Walter" <walter digitalmars.com> writes:

"Scott Wood" <scott buserror.net> wrote in message
news:slrnbbluah.1cq.scott ti.buserror.net...
 On Thu, 8 May 2003 10:50:21 -0700, Walter <walter digitalmars.com> wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbbjfj0.1a2.scott ti.buserror.net...
 Still, I'm a bit uncomfortable with the idea that the compiler's
 always right and cannot be corrected, even explicitly.  I've seen GCC
 silently decide not to inline a function (on which inlining was
 requested) because it was "too big", even though it was just a large
 switch statement on a constant, which ended up being one or two
 instructions after optimization.  Given that no compiler is going to
 make the right choice all the time, it's nice to be able to declare
 one's intent when there's a clear reason to do so.

 I think that comes with the territory of using a high level language. If a
 particular routine is a major bottleneck in your program (and it does
 usually come down to one!), and you want to make the effort to tune it to
 the max, write it in inline assembler.

 Except that in this case, using inline assembler would have made it
 worse.  The code was expecting to have the switch(constant) optimized
 away to just the relevant case.  Writing the containing functions in
 assembly would not have been realistic, as the to-be-inlined
 functions were used all over the source tree (they were used to move
 data to/from userspace).

I see the inline/not inline as a quality of implementation issue. The
language design should specify semantics, and the semantics should not
change if something is inlined or not. I want to allow the compiler writer
to be as free as possible to innovate how D is implemented. Trying to
specify exactly what optimizations are performed in the language spec can
forestall that. Note that DMD has a compiler switch to turn inlining on or
off.


       eax = 0;  /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses.  Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */

 The Digital Mars C++ compiler can do this, but after having that capability
 for 15 years it just never proved out to be very useful.

 It's a pretty small gain in this case, but what if it were a
 non-constant, that is almost guaranteed to be in some register before
 the asm statement?

It's not worth it. I have a lot of practice writing fast applications (DMC
is the fastest compiler, and has been for 15 years).


       /* Tell the compiler that, since this thing acts as a mutex,
          no memory accesses can be reordered across it.  It's
          probably not necessary in this case, though, as it contains
          a function call. */
       clobber memory;

 Unnecessary, as the inline assembler assumes memory is clobbered.

 It'd be nice if the language didn't force the compiler to do this in
 all cases, though.  For instance, it's not necessary when just
 reading timestamps, or making use of some fancy computational
 instruction for which the compiler doesn't have an intrinsic, or as a
 touch-up in a critical function that the compiler doesn't optimize
 well enough.  At the very least, "noclobber memory" should exist, but
 a compiler should also be allowed to look for itself.  If the
 compiler doesn't support this, it could always fall back on assuming
 "clobber memory" for everything.

I misspoke. It doesn't do it in cases where none of the asm instructions
could possibly modify memory.


 When using inline asm, you'll always run the risk of nonportability between
 compilers - after all, things like register conventions, calling
 conventions, etc., are not defined by the language. Only the syntax of the
 inline assembler is.

 But would it not be better to reduce the potential sources of
 nonportability, by letting the programmer tell the compiler to handle
 certain details?  If the compiler can know the offset from EBP at
 assembly time, it presumably knows that it's on the stack, and thus
 that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame
registers. The variable name gives me an offset as if I hadn't - I then
adjust it as necessary.

 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

 Because the inline assembler assembles the code long before any register
 assignments are done.

 That's a compiler implementation detail.  Other compilers might not
 have that restriction (for example, they may allow the registers to
 be patched into the assembled code later on, or use an external
 assembler).

They may not have that restriction, yes, but I don't want to force the
compiler to be built that way. I want to keep the bar low for building a
basic spec compliant D compiler, while making it possible to build very
advanced spec compliant ones.


 The problem is that the programmer can't know exactly what he wants,
 without knowing some decisions that the compiler will make.  GCC's
 syntax allows the programmer to tell the compiler exactly where to
 substitute those decisions.  Removing the ability of the compiler to
 make the decisions will lead to slower code.

 You are correct in the abstract. In my experience, I believe the difference
 to be negligible. I profile code extensively to make it faster. The
 bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
 Those I just write completely in hand-tuned inline assembler.

 It's a little harder when it's 30,000 lines out of a few million, and
 most of that needs to stay portable, so any assembler has to be
 buried in separate inline functions.  In any case, I don't think the
 language should throw away the opportunity for such optimizations
 just because they don't help the majority of programs.  The compiler
 is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the
language spec. D doesn't preclude any vendors from adding extensions,
though. Extensions are important as they're how new innovations get tried
out. The good ones will wind up getting folded into D. I'm not sure what you
mean by portable, as GCC's way of doing inline assembler is not portable to
any other compiler. As far as I've been able to figure out (with google),
most of it isn't even documented. I figured out how to use it by reading the
kernel listings.

I'm currently in the process of building a linux version of D. It's pretty
sweet to be able to take the inline asm code from win32 and recompile it
under linux and it works just the same with no modification. That's a
hopeless task if you're using separate asm files, or if you're using the
inline assembler from a C compiler. I've even got obj2asm to work on elf
files, so now you can disassemble .o files and see it in intel syntax!

P.S. How I write a whole function in hand-tuned asm is write it in C,
compile it, disassemble it with obj2asm, cut & paste the code back into the
C source in an asm block, and then tune.

May 09 2003

Scott Wood <scott buserror.net> writes:

On Fri, 9 May 2003 01:23:17 -0700, Walter <walter digitalmars.com> wrote:
 I see the inline/not inline as a quality of implementation issue.

For the default case, sure.  I'll wait until compilers have a full,
working AI built in before I trust even the best compiler to *always*
get it right, though.

 The language design should specify semantics, and the semantics
 should not change if something is inlined or not. I want to allow
 the compiler writer to be as free as possible to innovate how D is
 implemented. Trying to specify exactly what optimizations are
 performed in the language spec can forestall that.

I'm not suggesting that the language mandate certain optimizations;
just that there be a standard way of communicating one's intentions
to the compiler.  If the compiler doesn't support inlining at all,
then fine, don't inline; however, if it does support it, it should
pay attention to the programmer's request.

 It's a pretty small gain in this case, but what if it were a
 non-constant, that is almost guaranteed to be in some register before
 the asm statement?

 
 It's not worth it.

If there's no cost to it (as is the case with compilers which already
implement such things, including GCC), then any optimization is
worth it.  It doesn't make the language any harder to write a
compiler for, as a compiler can choose to always interpret an
assignment as a mov statement.

 I have a lot of practice writing fast applications (DMC
 is the fastest compiler, and has been for 15 years).

But how much do you need to use assembly in a compiler?  Take
something like a kernel instead, which often needs to use assembly
for various things, including the aforementioned copying of data
between user and kernel.  This is done a lot, and saving a few cycles
on every such occurrence *does* show up in the benchmarks, especially
since so many of them are just copying one or two words (making the
overhead very visible).  Loading the value from userspace, then
storing it on the stack, then loading it again immediately after the
asm block is over will be noticeable.  If you're on anything but a
non-regparm x86, add the cost of storing the user address to the
stack (since it was passed in a register) and then loading it again.

The compiler will generally do these sorts of things for its own
generated code; it doesn't strike me as a freak occurrence for a
compiler to allow the user access to the same thing when using inline
assembly.

 But would it not be better to reduce the potential sources of
 nonportability, by letting the programmer tell the compiler to handle
 certain details?  If the compiler can know the offset from EBP at
 assembly time, it presumably knows that it's on the stack, and thus
 that it should index off of EBP.

 
 One thing I do in inline asm sometimes is muck with stack and the frame
 registers. The variable name gives me an offset as if I hadn't - I then
 adjust it as necessary.

If you can specify that the value must be in a register in the
beginning and/or end of the block, you don't need to worry about the
validity of the address in the middle of the block.

 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

 Because the inline assembler assembles the code long before any register
 assignments are done.

 That's a compiler implementation detail.  Other compilers might not
 have that restriction (for example, they may allow the registers to
 be patched into the assembled code later on, or use an external
 assembler).

 
 They may not have that restriction, yes, but I don't want to force the
 compiler to be built that way.

If the compiler isn't built that way, just act as if the user put a
mov instruction there.  If the syntax allows the user to ask the
compiler to choose the register, it can pick one arbitrarily if it's
not capable of picking a good one.

 It's a little harder when it's 30,000 lines out of a few million, and
 most of that needs to stay portable, so any assembler has to be
 buried in separate inline functions.  In any case, I don't think the
 language should throw away the opportunity for such optimizations
 just because they don't help the majority of programs.  The compiler
 is free to not implement them if it doesn't feel they're important.

 
 If the compiler is free not to implement it, then it can't be part of the
 language spec.

The semantics behind what the programmer requests must be
implemented; it's the optimization that the semantics allow that
does not need to be there in simpler compilers.

 D doesn't preclude any vendors from adding extensions, though.
 Extensions are important as they're how new innovations get tried
 out. The good ones will wind up getting folded into D.

Sure.  However, this often leads to different compilers implementing
the same feature in incompatible ways, requiring programs that want
to use the feature to use lots of conditional compilation to remain
semi-portable. 

If the new feature would require significant effort to implement
correctly (not necessarily efficiently), then I agree that it should
stay out of the language unless it is demonstrated to be sufficiently
useful (though it might sometimes be beneficial to formalize it into
an optional yet standardized extension, so that if it is implemented,
it's implemented in the same way).  However, some of these things
could be implemented (poorly, but correctly and no worse than if the
feature weren't used) with a sed script if one were so inclined.

 I'm not sure what you mean by portable, as GCC's way of doing
 inline assembler is not portable to any other compiler.

Intel's compiler claims to support GCC inline assembly on x86 (their
IA64 compiler apparently doesn't support inline assembly at all). 
However, in general, the lack of portability of inline assembly
between compilers for the same architecture is a bit annoying.

I was hoping that, with D's placing it into the language itself, it
would cease to be an issue.  However, once extensions to the basic
syntax are relied on, you're right back to the current state of
incompatibility.

 As far as I've been able to figure out (with google), most of it
 isn't even documented. I figured out how to use it by reading the
 kernel listings.

It's documented in the GCC info pages.  Look for the "Extended Asm"
node, as well as the section on constraints.

 I'm currently in the process of building a linux version of D. It's pretty
 sweet to be able to take the inline asm code from win32 and recompile it
 under linux and it works just the same with no modification. That's a
 hopeless task if you're using separate asm files,

Not really.  There are Intel-syntax assemblers for Linux (even gas
can be told to use it now), and gas is available for Windows should
one want to go the other way.

 or if you're using the inline assembler from a C compiler. 

Unless you're using the same C compiler on both platforms.

 I've even got obj2asm to work on elf files, so now you can
 disassemble .o files and see it in intel syntax!

GNU objdump can do that as well, by passing "-m i386:intel".

 P.S. How I write a whole function in hand-tuned asm is write it in C,
 compile it, disassemble it with obj2asm, cut & paste the code back into the
 C source in an asm block, and then tune.

And do it over again every time the C code changes, or when a header
it depends on changes (if you notice!).  Each time, doing it for
every supported architecture.  It's still a useful technique for
certain situations, but it's not a replacement for flexible inline
assembly.

-Scott

May 09 2003

Ilya Minkov <midiclub 8ung.at> writes:

Walter wrote:

 I'm currently porting D to linux. Believe me, the inline assembler is a
 great boon to that. Just try converting MASM files to gas files! To me,
 using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both 
use a (cleaned-up?) Intel-Syntax.

There have also been a number of converters NASM <-> GAS <-> MASM. And 
besides, the new GAS is said to be able to use Intel-Syntax.

BTW, I didn't find a reliable way to use NASM with DigitalMars compilers 
for Windows. It has Borland format, but it somehow didn't work. I'll try 
to reproduce this problem someday later.

-i.

May 10 2003

"Nic Tiger" <tiger7 progtech.ru> writes:

I did find a reliable way to use NASM with Digital Mars for Win32 and DOSX
targets.

The problem is that the common statement
    section .data
or
    section .code
in COFF and other formats is expanded to something like 'dword aligned
32-bit segment of code (or text)'.

When the same statement is used for the OBJ format, it is not treated the
same way.
To make them identical, you should write
    section .code align=4 use32

As for DOSX target, the previous is not sufficient. You should write
    section _DATA class=DATA align=4 use32
or
    section _CODE class=CODE align=4 use32
Moreover, you should place the directive
    group DGROUP _DATA
somewhere, to tell the linker to group the data segment in this module with
the others.

The last described technique (I mean for DOSX target) is fully compatible
with Win32 target code.
I used this in order to compile XVID codec sources both for Win32 and DOSX
with DMC and it works.

BTW, with optimizations turned on, the C version of the codec (when asm is
not used) runs almost twice as fast as the unoptimized one. I think the DMC
optimizer is cool!

Nic Tiger.

"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:b9j6pl$32m$1 digitaldaemon.com...
 Walter wrote:

 I'm currently porting D to linux. Believe me, the inline assembler is a
 great boon to that. Just try converting MASM files to gas files! To me,
 using gas is like trying to write code looking in a mirror.

 Why are you using GAS? You can use NASM (or maybe FASM) instead! Both
 use a (cleaned-up?) Intel-Syntax.

 There have also been a number of converters NASM <-> GAS <-> MASM. And
 besides, the new GAS is said to be able to use Intel-Syntax.

 BTW, I didn't find a reliable way to use NASM with DigitalMars compilers
 for Windows. It has Borland format, but it somehow didn't work. I'll try
 to reproduce this problem someday later.

 -i.

May 10 2003
