
D - Inlining

reply Helmut Leitner <helmut.leitner chello.at> writes:
What do we know about inlining except that the compiler
will do it when it feels so?

Is there a guarantee that a simple macro-like definition like 

   void MemClear(char *p,int size) 
   {
      memset(p,0,size);
   }

will be inlined? What if this goes through multiple levels?

--
Helmut Leitner    leitner hls.via.at   
Graz, Austria   www.hls-software.com
Apr 21 2003
next sibling parent reply "Matthew Wilson" <dmd synesis.com.au> writes:
afaik, it is entirely up to the compiler, which is where it should be in
almost all cases.

I think I remember there being discussion about the use of the inline
keyword as something to _force_ the compiler to inline, which I kind of
like, but maybe using that keyword is bad, since all the C++ programmers
will use it everywhere, which may not be appropriate.

force_inline or forceinline might be better, as they're uglier, or even

forceinline
{
    void MemClear(char *p,int size)
    {
       memset(p,0,size);
    }
}

which would be unambiguous and quite obvious

Apr 21 2003
parent reply Ilya Minkov <midiclub 8ung.at> writes:
Why not inline(always), inline(prefer), inline(never), 
inline(SomeConstantComparedToStandardizedInlinabilityIndex)?
Like the way version already works?
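
For comparison only, here is a rough sketch of how the three policies proposed above can be approximated today in C with GCC-style attributes. This is not D syntax and not anything from the D specification; the macro names `INLINE_ALWAYS`, `INLINE_PREFER` and `INLINE_NEVER` and the helpers `ClearRecord` and `CountZeroBytes` are hypothetical, and the attributes are GCC/Clang extensions rather than standard C. It also shows forced inlining passing through two levels, as asked in the original question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro names. The attributes are GCC/Clang extensions;
   other compilers fall back to a plain hint or to nothing. */
#if defined(__GNUC__)
#  define INLINE_ALWAYS static inline __attribute__((always_inline))
#  define INLINE_NEVER  static __attribute__((noinline))
#else
#  define INLINE_ALWAYS static inline
#  define INLINE_NEVER  static
#endif
#define INLINE_PREFER static inline   /* only a hint to the compiler */

/* First level: the macro-like wrapper from the original post. */
INLINE_ALWAYS void MemClear(char *p, size_t size)
{
    memset(p, 0, size);
}

/* Second level: a wrapper around the wrapper (hypothetical helper). */
INLINE_PREFER void ClearRecord(char *rec, size_t size)
{
    MemClear(rec, size);
}

/* A function that must stay out of line (hypothetical helper). */
INLINE_NEVER size_t CountZeroBytes(const char *p, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        if (p[i] == 0)
            n++;
    return n;
}
```

Calling `ClearRecord(buf, sizeof buf)` and then `CountZeroBytes(buf, sizeof buf)` should report every byte as zero regardless of which inlining decisions the compiler actually makes; the attributes affect code generation, not behaviour.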

Apr 21 2003
parent "Matthew Wilson" <matthew stlsoft.org> writes:
Sounds ok to me

Apr 21 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3EA3AEB3.DDC28183 chello.at...
 What do we know about inlining except that the compiler
 will do it when it feels so?

 Is there a guarantee that a simple macro-like definition like

    void MemClear(char *p,int size)
    {
       memset(p,0,size);
    }

 will be inlined? What if this goes through multiple levels?

Think of inlining like the obsolete register keyword in C. Whether obvious inlining is done or not is a quality of implementation issue, not a language issue.
Apr 24 2003
next sibling parent reply Mark T <Mark_member pathlink.com> writes:
 will be inlined? What if this goes through multiple levels?

Think of inlining like the obsolete register keyword in C. Whether obvious inlining is done or not is a quality of implementation issue, not a language issue.

I agree. In the future, most D compilers could have various compile-for-speed and compile-for-size switches for various environments (e.g. small embedded targets).

The design of the language should also allow for "Global System Analysis"; see the JOOP article from May 2001, or look for similar info at http://smarteiffel.loria.fr/ and http://smarteiffel.loria.fr/papers/papers.html
Apr 25 2003
parent "Walter" <walter digitalmars.com> writes:
"Mark T" <Mark_member pathlink.com> wrote in message
news:b8bb7j$d7b$1 digitaldaemon.com...
 I agree, in the future most D compilers could have various
 compile-for-speed and compile-for-size switches for various environments
 ( ex: small embedded targets )

Yes.
 The design of the language should also allow for "Global System Analysis"
 see JOOP article May 2001 or look for similar info at
 http://smarteiffel.loria.fr/
 http://smarteiffel.loria.fr/papers/papers.html

D's design does allow for extensive inter-module analysis, although DMD makes no attempt at it.
Apr 25 2003
prev sibling parent reply Scott Wood <scott buserror.net> writes:
On Thu, 24 Apr 2003 13:03:22 -0700, Walter <walter digitalmars.com> wrote:
 Think of inlining like the obsolete register keyword in C.
 
 Whether obvious inlining is done or not is a quality of implementation
 issue, not a language issue.

It'd still be nice to have a way of explicitly saying that a function either must or must not be inlined. For example, the dynamic linker in the GNU libc will break if certain functions are not inlined, because the relocation has not yet been done. The schedule() function in Linux will break on sparc (and perhaps some other platforms) if it is inlined, if you switch to a task that entered the scheduler via a different containing function.

As for the analogy with the register keyword, GCC extends that to allow you to explicitly place variables in specific registers, which is useful in conjunction with assembly code. The uselessness of the original keyword does not mean that anything similar is also useless.

Neither of these are things you'd need very often, but when you do, it'd be really unpleasant if they weren't there. After all, D claims to support "Down and dirty programming". :-)

-Scott
Apr 27 2003
next sibling parent Ilya Minkov <midiclub 8ung.at> writes:
Who says the register keyword is useless?

I remember some case of some guys using a fairly recent GCC, where they 
could raise performance by 20% by putting in the simple register hint in 
a couple of spots. While the compilers are getting smart, they don't 
know anything particular about the program's typical input values, as 
the programmer usually does.

-i.

Scott Wood wrote:
 As for the analogy with the register keyword, GCC extends that to
 allow you to explicitly place variables in specific registers, which
 is useful in conjunction with assembly code.  The uselessness of the
 original keyword does not mean that anything similar is also useless.
 
 Neither of these are things you'd need very often, but when you do,
 it'd be really unpleasant if they weren't there.  After all, D
 claims to support "Down and dirty programming". :-)
 
 -Scott

Apr 28 2003
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Scott Wood" <scott buserror.net> wrote in message
news:slrnbaoctt.3jp.scott ti.buserror.net...
 On Thu, 24 Apr 2003 13:03:22 -0700, Walter <walter digitalmars.com> wrote:
 Think of inlining like the obsolete register keyword in C.

 Whether obvious inlining is done or not is a quality of implementation
 issue, not a language issue.

 It'd still be nice to have a way of explicitly saying that a function
 either must or must not be inlined.  For example, the dynamic linker
 in the GNU libc will break if certain functions are not inlined,
 because the relocation has not yet been done.  The schedule()
 function in Linux will break on sparc (and perhaps some other
 platforms) if it is inlined, if you switch to a task that entered the
 scheduler via a different containing function.

I suspect those functions are heavily dependent on how a *particular* compiler generates code for that. Depending on that is going outside of the language definition. It makes successful operation of the code overly sensitive to particular compiler versions, etc. (Some linux kernel developers are open about the kernel code being heavily dependent on how a particular revision of GCC generates code.) You could as easily write code in D that depends on a particular implementation of D, though with D's support for inline assembler I'd argue that is unnecessary.
 As for the analogy with the register keyword, GCC extends that to
 allow you to explicitly place variables in specific registers, which
 is useful in conjunction with assembly code.  The uselessness of the
 original keyword does not mean that anything similar is also useless.

Those features are not part of the C language; although they are part of GCC, they will not work with every version of GCC, and will not work with any other C compiler. Contrast that with D, which has defined support for inline assembler. Try doing some inline assembler work in GCC, then with D. I think you'll find it supported far better in D, despite GCC's extensions.
 Neither of these are things you'd need very often, but when you do,
 it'd be really unpleasant if they weren't there.  After all, D
 claims to support "Down and dirty programming". :-)

Those things are what the inline assembler is for, and D has very strong support for inline assembler. The C language itself has no support at all for inline assembler, and GCC's support for it is very weak and error-prone (for example, there's an arcane syntax you have to add to say which registers were read and which were written by each asm block - get that wrong, and your code will behave unpredictably. D, on the other hand, keeps track of that automatically).
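
To make the "arcane syntax" concrete, here is a minimal sketch of GCC's extended asm in C, in which the programmer declares which operands are read, which are written, and what else is clobbered. The function name `add_with_asm` is hypothetical. The asm path assumes GCC or Clang on x86 and uses AT&T operand order; on any other compiler or architecture the sketch falls back to plain C, so it illustrates the notation rather than anything D itself provides.

```c
/* Adds b to a. On GCC/Clang for x86 this goes through extended asm;
   elsewhere it falls back to ordinary C. */
static int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"   /* AT&T order: %0 = %0 + %1 */
            : "+r"(a)       /* operand 0: read and written, any register */
            : "r"(b)        /* operand 1: read only, any register */
            : "cc");        /* the condition flags are clobbered */
    return a;
#else
    return a + b;
#endif
}
```

If the `"+r"` on the first operand were written as a plain output (`"=r"`), the compiler would be free to assume the old value of `a` is never read, which is the kind of silent misbehaviour described above.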
May 03 2003
parent reply Scott Wood <scott buserror.net> writes:
On Sat, 3 May 2003 14:38:10 -0700, Walter <walter digitalmars.com> wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbaoctt.3jp.scott ti.buserror.net...
 It'd still be nice to have a way of explicitly saying that a function
 either must or must not be inlined.  For example, the dynamic linker
 in the GNU libc will break if certain functions are not inlined,
 because the relocation has not yet been done.  The schedule()
 function in Linux will break on sparc (and perhaps some other
 platforms) if it is inlined, if you switch to a task that entered the
 scheduler via a different containing function.

 I suspect those functions are heavily dependent on how a *particular*
 compiler generates code for that.

Not particularly, at least in the case of the scheduler. The scheduler's only concern with inlining is that the destination thread doesn't resume in the wrong inlined instance. The inline assembly is non-portable as well, but only because inline assembly is not part of C.
 Depending on that is going outside of the language definition. 

That depends on what the language definition is. :-)
 It makes successful operation of the code overly sensitive to
 particular compiler versions, etc. (Some linux kernel developers
 are open about the kernel code being heavily dependent on how a
 particular revision of GCC generates code.)

Some bits have been, but it's mainly been due to Linux developers ignoring GCC's own rules for things like inline assembly constraints, or making assumptions about weird stuff like "inline" assembly outside of any function.
 Those things are what the inline assembler is for, and D has very strong
 support for inline assembler. 

How do you use the inline assembler to tell the compiler not to inline a certain function written in D, not assembly?
 The C language itself has no support at all
 for inline assembler, and GCC's support for it is very weak and error-prone
 (for example, there's an arcane syntax you have to add to say which
 registers were read and which were written by each asm block - get that
 wrong, and your code will behave unpredictably. D, on the other hand, keeps
 track of that automatically).

Is there a way in D inline assembly to ask for a temporary register without mandating a specific one? How about specifying clobbers that aren't explicitly in the code, such as when calling a function with an unusual calling convention, or when switching threads?

Also, one of the example code sequences is this:

    void *pc;
    asm
    {
        call L1             ;
     L1:                    ;
        pop EBX             ;
        mov pc[EBP],EBX     ;       // pc now points to code at L1
    }

Why do you need to specify EBP when accessing pc? Shouldn't the compiler know what the best way to access pc is? It might want to get rid of the frame pointer, or it might want to keep it around in a register for use after the asm block, etc.

GCC's inline assembly also has the sometimes desirable attribute that the compiler doesn't touch the instructions you specify, other than to schedule the block and substitute the things you asked it to. Will a D compiler be allowed to stick code in the middle of it, in order to satisfy symbolic references, or to schedule instructions? Is it allowed to optimize away mov instructions if it can get the data there on its own? Can it move memory accesses across the asm block?

Usually, those sorts of things would be beneficial, but there should be a way to tell it not to do it.

-Scott
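
As an illustration of "a temporary register without mandating a specific one", here is a small sketch using GCC's extended asm in C. The function name `swap_ints` is hypothetical. The `"=&r"` constraint asks the compiler for any free scratch register (early-clobbered, so it is not shared with an input), and the compiler, not the programmer, decides how the C variables are reached. The asm path assumes GCC or Clang on x86; other targets take the plain C branch.

```c
/* Swaps two ints. The scratch register is chosen by the compiler. */
static void swap_ints(int *pa, int *pb)
{
    int a = *pa;
    int b = *pb;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int tmp;
    __asm__("movl %1, %0\n\t"   /* tmp = a */
            "movl %2, %1\n\t"   /* a   = b */
            "movl %0, %2"       /* b   = tmp */
            : "=&r"(tmp),       /* operand 0: scratch, any register */
              "+r"(a),          /* operand 1: read and written */
              "+r"(b));         /* operand 2: read and written */
#else
    int tmp = a;
    a = b;
    b = tmp;
#endif
    *pa = a;
    *pb = b;
}
```

No register name and no frame-pointer-relative address appears in the asm text, so the same source keeps working whether or not the compiler maintains a frame pointer.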
May 04 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Scott Wood" <scott buserror.net> wrote in message
news:slrnbbalqd.ud.scott ti.buserror.net...
 Some bits have been, but it's mainly been due to Linux developers
 ignoring GCC's own rules for things like inline assembly constraints,
 or making assumptions about weird stuff like "inline" assembly
 outside of any function.
 Those things are what the inline assembler is for, and D has very strong
 support for inline assembler.

 How do you use the inline assembler to tell the compiler not to
 inline a certain function written in D, not assembly?

The compiler does not optimize inline assembly that you write. Therefore, if you use the inline assembler to call a function, that function won't be inlined.
 The C language itself has no support at all
 for inline assembler, and GCC's support for it is very weak and error-prone
 (for example, there's an arcane syntax you have to add to say which
 registers were read and which were written by each asm block - get that
 wrong, and your code will behave unpredictably. D, on the other hand, keeps
 track of that automatically).

 Is there a way in D inline assembly to ask for a temporary register
 without mandating a specific one?

No. The idea is "what you write is what you get" with the inline assembler.
  How about specifying clobbers that
 aren't explicitly in the code, such as when calling a function with
 an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.
 Also, one of the example code sequences is this:
     void *pc;
     asm
     {
         call L1             ;
      L1:                    ;
         pop EBX             ;
         mov pc[EBP],EBX     ;       // pc now points to code at L1
     }
 Why do you need to specify EBP when accessing pc?  Shouldn't the
 compiler know what the best way to access pc is?  It might want to
 get rid of the frame pointer, or it might want to keep it around in a
 register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off. If you want, though, you can use the 'naked' pseudo-op and write the entire function in assembler, and what you write is what you get.
 GCC's inline assembly also has the sometimes desirable attribute
 that the compiler doesn't touch the instructions you specify, other
 than to schedule the block and substitute the things you asked it to.
 Will a D compiler be allowed to stick code in the middle of it, in
 order to satisfy symbolic references, or to schedule instructions?
 Is it allowed to optimize away mov instructions if it can get the
 data there on its own?  Can it move memory accesses across the asm
 block?

The D compiler does not schedule, move around, optimize, or alter the inline assembler instructions. The assumption is that if the programmer is going to use inline assembler, the programmer knows exactly what he wants, and will write it that way. What you write is what you get.
 Usually, those sorts of things would be beneficial, but there should
 be a way to tell it not to do it.

I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you get. I also find it odd that gcc provides such things, yet still requires me to specify which registers were read/written for the simplest inline asm.
May 07 2003
parent reply Scott Wood <scott buserror.net> writes:
On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
 The compiler does not optimize inline assembly that you write. Therefore, if
 you use the inline assembler to call a function, that function won't be
 inlined.

I suppose, though it'd be a little awkward to use the assembler just to call a function without it being inlined.

Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly. I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization. Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.
  How about specifying clobbers that
 aren't explicitly in the code, such as when calling a function with
 an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.

Which would defeat the purpose of using a special convention. For example, on a mutex implementation, one might want to make the contended case call a function that saves all registers, so that the common case doesn't have to spill any registers (other than whatever's needed to test the mutex). Thread switching would also be slower on architectures with a reasonable number of registers if you have to manually save all of them just because you can't tell the compiler to save (or reconstruct) the 2 or 3 it might still care about.

BTW, will there be any way to tell the inline assembler to put some code out-of-line? Something like:

inline int lock_mutex(Mutex m)
{
   int new = whatever_goes_in_there;

   asm {
      eax = 0;  /* This tells the compiler to get a zero into eax,
                   in whatever way it chooses.  Maybe the caller
                   (which is inlining this function) had one lying
                   around in a register, and it can now choose to use
                   eax for that variable. */

      lock; cmpxchg [m.lock], new;
      jz failed;

      outofline {
         failed: /* I hope this label isn't visible outside of this
                    instantiation of this assembly block... */
            push ecx;
            push edx;
            call handle_failed;
            pop edx;
            pop ecx;
            return; /* This tells the compiler to exit the assembly
                       block.  Alternatively, a return label could
                       be declared. */
      }

      /* Tell the compiler that these registers were not, in fact,
         clobbered.  It can't assume it automatically, though, since
         it has no idea what handle_failed might be doing to those
         values on the stack.  Or, to save space, I may have buried
         those pushes into a wrapper assembly function instead,
         where the compiler probably won't see them. */

      noclobber ecx, edx;

      /* Tell the compiler that, since this thing acts as a mutex,
         no memory accesses can be reordered across it.  It's
         probably not necessary in this case, though, as it contains
         a function call. */

      clobber memory;
   }
}
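
The same fast-path/slow-path split can be sketched without any assembler, using C11 atomics plus GCC-style attributes. This is only an approximation of the idea above, not the proposed `outofline` syntax: the `Mutex` layout, the `slow_path_calls` counter, and the macros `OUT_OF_LINE` and `UNLIKELY` are all hypothetical, `noinline`, `cold` and `__builtin_expect` are GCC/Clang extensions, and the slow path simply spins instead of using a special register-saving convention.

```c
#include <stdatomic.h>

#if defined(__GNUC__)
#  define OUT_OF_LINE __attribute__((noinline, cold))
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define OUT_OF_LINE
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical mutex: 0 means free, 1 means held. */
typedef struct { atomic_int lock; } Mutex;

/* Illustration only: counts how often the contended path ran. */
static atomic_int slow_path_calls;

/* Contended case: kept out of line so the inlined fast path stays small. */
OUT_OF_LINE static void handle_failed(Mutex *m)
{
    int expected = 0;
    atomic_fetch_add(&slow_path_calls, 1);
    while (!atomic_compare_exchange_weak(&m->lock, &expected, 1))
        expected = 0;   /* a failed exchange overwrites 'expected'; reset it */
}

/* Uncontended case: one compare-and-exchange, meant to be inlined. */
static inline void lock_mutex(Mutex *m)
{
    int expected = 0;
    if (UNLIKELY(!atomic_compare_exchange_strong(&m->lock, &expected, 1)))
        handle_failed(m);
}

static inline void unlock_mutex(Mutex *m)
{
    atomic_store(&m->lock, 0);
}
```

The default sequentially consistent ordering of these atomic operations plays the role of `clobber memory` in the pseudo-syntax: the compiler may not move ordinary memory accesses across the lock and unlock calls.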
 Also, one of the example code sequences is this:
     void *pc;
     asm
     {
         call L1             ;
      L1:                    ;
         pop EBX             ;
         mov pc[EBP],EBX     ;       // pc now points to code at L1
     }
 Why do you need to specify EBP when accessing pc?  Shouldn't the
 compiler know what the best way to access pc is?  It might want to
 get rid of the frame pointer, or it might want to keep it around in a
 register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the assignment to pc rather than the programmer. And what if I move to a compiler that *never* uses frame pointers? The code is now broken, because I had to make an assumption about what the compiler was doing with its registers. Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
 If you want, though, you can use the
 'naked' pseudo-op and write the entire function in assembler, and what you
 write is what you get.

Yes, but you can get that by using an external assembler as well. The point of inline assembly is to, well, be inline. :-)
 The D compiler does not schedule, move around, optimize, or alter the inline
 assembler instructions. The assumption is that if the programmer is going to
 use inline assembler, the programmer knows exactly what he wants, and will
 write it that way. What you write is what you get.

The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make. GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions. Removing the ability of the compiler to make the decisions will lead to slower code.
 I guess I'm philosophically opposed to such things. I much prefer the
 straightforward approach of inline assembler that what you write is what you
 get. I also find it odd that gcc provides such things, yet still requires me
 to specify which registers were read/written for the simplest inline asm.

It's not really that odd, seeing as it needs those features to make up for its inability to parse the assembly code itself. However, those features end up granting the programmer more power than what they replace. -Scott
May 07 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Scott Wood" <scott buserror.net> wrote in message
news:slrnbbjfj0.1a2.scott ti.buserror.net...
 On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
 I suppose, though it'd be a little awkward to use the assembler just
 to call a function without it being inlined.

I'd agree with that.
 Still, I'm a bit uncomfortable with the idea that the compiler's
 always right and cannot be corrected, even explicitly.  I've seen GCC
 silently decide not to inline a function (on which inlining was
 requested) because it was "too big", even though it was just a large
 switch statement on a constant, which ended up being one or two
 instructions after optimization.  Given that no compiler is going to
 make the right choice all the time, it's nice to be able to declare
 one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.
  How about specifying clobbers that
 aren't explicitly in the code, such as when calling a function with
 an unusual calling convention, or when switching threads?

 Called functions must follow the normal register saving convention. If it is
 an unusual function that clobbers other registers, you'll need to
 save/restore them in the inline assembler.

 Which would defeat the purpose of using a special convention.  For
 example, on a mutex implementation, one might want to make the
 contended case call a function that saves all registers, so that the
 common case doesn't have to spill any registers (other than
 whatever's needed to test the mutex).  Thread switching would also be
 slower on architectures with a reasonable number of registers if you
 have to manually save all of them just because you can't tell the
 compiler to save (or reconstruct) the 2 or 3 it might still care
 about.

 BTW, will there be any way to tell the inline assembler to put some
 code out-of-line?  Something like:

 inline int lock_mutex(Mutex m)
 {
    int new = whatever_goes_in_there;

    asm {
       eax = 0;  /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses.  Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.
       lock; cmpxchg [m.lock], new;
       jz failed;

       outofline {
          failed: /* I hope this label isn't visible outside of this
                     instantiation of this assembly block... */

Yes, it is visible outside. All labels are in one scope per function, including the inline asm labels.
             push ecx;
             push edx;
             call handle_failed;
             pop edx;
             pop ecx;
             return; /* This tells the compiler to exit the assembly
                        block.  Alternatively, a return label could
                        be declared. */

Exit the assembly block? I don't know what you mean by that.
       }

       /* Tell the compiler that these registers were not, in fact,
          clobbered.  It can't assume it automatically, though, since
          it has no idea what handle_failed might be doing to those
          values on the stack.  Or, to save space, I may have buried
          those pushes into a wrapper assembly function instead,
          where the compiler probably won't see them. */

       noclobber ecx, edx;

That might be a reasonable addition.
       /* Tell the compiler that, since this thing acts as a mutex,
          no memory accesses can be reordered across it.  It's
          probably not necessary in this case, though, as it contains
          a function call. */

       clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.
    }
 }

 Also, one of the example code sequences is this:
     void *pc;
     asm
     {
         call L1             ;
      L1:                    ;
         pop EBX             ;
         mov pc[EBP],EBX     ;       // pc now points to code at L1
     }
 Why do you need to specify EBP when accessing pc?  Shouldn't the
 compiler know what the best way to access pc is?  It might want to
 get rid of the frame pointer, or it might want to keep it around in a
 register for use after the asm block, etc.

 The compiler doesn't do frame pointer optimization when the inline
 assembler is used, because the results of the inline assembler shouldn't be
 affected by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the assignment to pc rather than the programmer. And what if I move to a compiler that *never* uses frame pointers? The code is now broken, because I had to make an assumption about what the compiler was doing with its registers.

When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.
 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

Because the inline assembler assembles the code long before any register assignments are done.
 If you want, though, you can use the
 'naked' pseudo-op and write the entire function in assembler, and what you
 write is what you get.

 Yes, but you can get that by using an external assembler as well.
 The point of inline assembly is to, well, be inline. :-)

I'm currently porting D to linux. Believe me, the inline assembler is a great boon to that. Just try converting MASM files to gas files! To me, using gas is like trying to write code looking in a mirror.
 The D compiler does not schedule, move around, optimize, or alter the inline
 assembler instructions. The assumption is that if the programmer is going to
 use inline assembler, the programmer knows exactly what he wants, and will
 write it that way. What you write is what you get.

 The problem is that the programmer can't know exactly what he wants,
 without knowing some decisions that the compiler will make.  GCC's
 syntax allows the programmer to tell the compiler exactly where to
 substitute those decisions.  Removing the ability of the compiler to
 make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.
 I guess I'm philosophically opposed to such things. I much prefer the
 straightforward approach of inline assembler that what you write is what you
 get. I also find it odd that gcc provides such things, yet still requires me
 to specify which registers were read/written for the simplest inline asm.

 It's not really that odd, seeing as it needs those features to make
 up for its inability to parse the assembly code itself.  However,
 those features end up granting the programmer more power than what
 they replace.

I understand what you're driving at. It is heavily integrated in with how gcc parses, optimizes, and generates code. I don't think that's a good thing to put in a language spec, as it may unnecessarily constrain how the compiler is built.
May 08 2003
next sibling parent C <cc.news gateway.mirlex.com> writes:
Walter wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbbjfj0.1a2.scott ti.buserror.net...

[-snip-]
            return; /* This tells the compiler to exit the assembly
                       block.  Alternatively, a return label could
                       be declared. */

Exit the assembly block? I don't know what you mean by that.

If that means what I think is intended, wouldn't 'break' be more appropriate?
      }

      /* Tell the compiler that these registers were not, in fact,
         clobbered.  It can't assume it automatically, though, since
         it has no idea what handle_failed might be doing to those
         values on the stack.  Or, to save space, I may have buried
         those pushes into a wrapper assembly function instead,
         where the compiler probably won't see them. */

      noclobber ecx, edx;

That might be a reasonable addition.

Agreed, though I would change the keyword; maybe 'retain' would be good, or the list could be added to the assembler declaration:

   assembler
       : 'asm' '(' '!' noClobberList ')' '{' assemblerStatements '}'
       | 'asm' '{' assemblerStatements '}'
       ;

   noClobberList
       : registerName ',' noClobberList
       | registerName
       ;

such as:

   asm (! ecx, edx )
   {
       xor eax, eax
       push ecx
       call myFunc;
   }

This is efficient, but its meaning is not immediately clear.

C 2003/5/8
May 07 2003
prev sibling next sibling parent reply Scott Wood <scott buserror.net> writes:
On Thu, 8 May 2003 10:50:21 -0700, Walter <walter digitalmars.com> wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbbjfj0.1a2.scott ti.buserror.net...
 Still, I'm a bit uncomfortable with the idea that the compiler's
 always right and cannot be corrected, even explicitly.  I've seen GCC
 silently decide not to inline a function (on which inlining was
 requested) because it was "too big", even though it was just a large
 switch statement on a constant, which ended up being one or two
 instructions after optimization.  Given that no compiler is going to
 make the right choice all the time, it's nice to be able to declare
 one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.

Except that in this case, using inline assembler would have made it worse. The code was expecting to have the switch(constant) optimized away to just the relevant case. Writing the containing functions in assembly would not have been realistic, as the to-be-inlined functions were used all over the source tree (they were used to move data to/from userspace).
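To make the scenario above concrete, here is a minimal sketch in C of a helper whose switch on a constant argument collapses to a single case once it is inlined. It assumes GCC or Clang; `FORCE_INLINE`, `load_width` and `load_u32` are hypothetical names invented for illustration, not the code from the kernel in question. The `always_inline` attribute is a GCC extension that asks for inlining regardless of the size heuristic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: request inlining even if the body looks "too big". */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Hypothetical helper: load n bytes from p, where n is normally a
   compile-time constant at the call site. */
FORCE_INLINE uint64_t load_width(const void *p, size_t n)
{
    uint64_t v = 0;
    switch (n) {
    case 1: { uint8_t  t; memcpy(&t, p, sizeof t); v = t; break; }
    case 2: { uint16_t t; memcpy(&t, p, sizeof t); v = t; break; }
    case 4: { uint32_t t; memcpy(&t, p, sizeof t); v = t; break; }
    case 8: { uint64_t t; memcpy(&t, p, sizeof t); v = t; break; }
    default: break; /* unsupported width: return 0 */
    }
    return v;
}

/* After inlining, only the "case 4" branch is relevant here. */
uint32_t load_u32(const void *p)
{
    return (uint32_t)load_width(p, sizeof(uint32_t));
}
```

Whether the switch actually disappears still depends on the optimizer; the attribute only removes the "too big to inline" decision from the compiler's hands.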
       eax = 0;  /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses.  Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.

It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?
       lock; cmpxchg [m.lock], new;
       jz failed;

       outofline {
          failed: /* I hope this label isn't visible outside of this
                     instantiation of this assembly block... */

Yes, it is visible outside. All labels are in one scope per function, including the inline asm labels.

I was more worried about it being visible throughout the file (or caller of the inline function), like it would have been in GCC, since there's no support for find-the-first-one-in-a-given-direction labels.
             push ecx;
             push edx;
             call handle_failed;
             pop edx;
             pop ecx;
             return; /* This tells the compiler to exit the assembly
                        block.  Alternatively, a return label could
                        be declared. */

Exit the assembly block? I don't know what you mean by that.

Just a shortcut for declaring a new label at the end and branching there, which is a rather common construct (especially when using out-of-line sections). I agree with "C" that break would be a better keyword, though.
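For comparison, this is roughly how the "label at the end and branch there" pattern looks with GCC's extended asm and the assembler's numeric local labels, which is the mechanism the proposed `return`/`break` would abbreviate. It is only a sketch, assuming GCC or Clang on x86 with the default AT&T syntax; `clamp_nonneg` is a hypothetical function, and a plain C fallback is used elsewhere.

```c
/* Hypothetical example: clamp negative values to zero. */
static inline int clamp_nonneg(int v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("testl %0, %0\n\t"   /* set flags from v                      */
            "jns 1f\n\t"         /* v >= 0: branch forward to nearest 1:  */
            "xorl %0, %0\n"      /* v < 0: replace it with zero           */
            "1:"                 /* local label marking the end of block  */
            : "+r"(v)            /* v is read and written, any register   */
            :
            : "cc");             /* condition codes are modified          */
#else
    if (v < 0)
        v = 0;                   /* portable fallback, same behaviour     */
#endif
    return v;
}
```

Because `1:` is a numeric local label, the block can be inlined at many call sites without the labels colliding, which is the visibility concern raised above.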
       /* Tell the compiler that, since this thing acts as a mutex,
          no memory accesses can be reordered across it.  It's
          probably not necessary in this case, though, as it contains
          a function call. */

       clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

It'd be nice if the language didn't force the compiler to do this in all cases, though. For instance, it's not necessary when just reading timestamps, or making use of some fancy computational instruction for which the compiler doesn't have an intrinsic, or as a touch-up in a critical function that the compiler doesn't optimize well enough. At the very least, "noclobber memory" should exist, but a compiler should also be allowed to look for itself. If the compiler doesn't support this, it could always fall back on assuming "clobber memory" for everything.
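As a point of reference, GCC's extended asm already lets the programmer make this distinction through the clobber list. The sketch below assumes GCC or Clang; `rotate_left8` and `compiler_barrier` are hypothetical names. The first block is pure computation and declares no memory clobber; the second is an empty asm whose only purpose is to declare one.

```c
#include <stdint.h>

/* Pure computation: no "memory" clobber, so the compiler remains free to
   keep other values in registers and to move loads and stores across it. */
static inline uint32_t rotate_left8(uint32_t v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("roll $8, %0" : "+r"(v) : : "cc");
#else
    v = (v << 8) | (v >> 24);    /* portable fallback */
#endif
    return v;
}

/* Compiler-level barrier: the empty template emits no instruction, but the
   "memory" clobber tells the compiler not to cache or reorder memory
   accesses across this point. */
static inline void compiler_barrier(void)
{
#if defined(__GNUC__)
    __asm__ __volatile__("" : : : "memory");
#endif
}
```

Note that the barrier constrains only the compiler, not the processor; a mutex implementation would still rely on a locked instruction or a fence.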
 When using inline asm, you'll always run the risk of nonportability between
 compilers - after all, things like register conventions, calling
 conventions, etc., are not defined by the language. Only the syntax of the
 inline assembler is.

But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details? If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.
 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

Because the inline assembler assembles the code long before any register assignments are done.

That's a compiler implementation detail. Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler). If the compiler has to choose registers for the asm block in advance, it could just add the store instruction itself at the time it handles the inline assembly (in which case you get exactly the same code as you do now), or it could remember which register the asm block used and use that in the subsequent non-asm code.
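A small sketch of what "letting the compiler choose the registers" looks like in GCC's syntax may help here. It assumes GCC or Clang on x86 with AT&T syntax; `add_regs` is a hypothetical function. The `"r"` constraints leave register allocation to the compiler, so the operands need not be stored to the stack and reloaded around the block.

```c
/* Hypothetical example: a + b with the registers picked by the compiler. */
static inline int add_regs(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"   /* AT&T order: %0 = %0 + %1             */
            : "+r"(a)       /* a: read and written, any register    */
            : "r"(b)        /* b: read only, any register           */
            : "cc");        /* flags are modified                   */
#else
    a += b;                 /* portable fallback */
#endif
    return a;
}
```

A compiler that cannot defer register choice could still accept this form by picking registers arbitrarily and emitting the moves itself, which is the fallback described above.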
 The problem is that the programmer can't know exactly what he wants,
 without knowing some decisions that the compiler will make.  GCC's
 syntax allows the programmer to tell the compiler exactly where to
 substitute those decisions.  Removing the ability of the compiler to
 make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.

It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions. In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs. The compiler is free to not implement them if it doesn't feel they're important. -Scott
May 08 2003
parent reply "Walter" <walter digitalmars.com> writes:
"Scott Wood" <scott buserror.net> wrote in message
news:slrnbbluah.1cq.scott ti.buserror.net...
 On Thu, 8 May 2003 10:50:21 -0700, Walter <walter digitalmars.com> wrote:
 "Scott Wood" <scott buserror.net> wrote in message
 news:slrnbbjfj0.1a2.scott ti.buserror.net...
 Still, I'm a bit uncomfortable with the idea that the compiler's
 always right and cannot be corrected, even explicitly.  I've seen GCC
 silently decide not to inline a function (on which inlining was
 requested) because it was "too big", even though it was just a large
 switch statement on a constant, which ended up being one or two
 instructions after optimization.  Given that no compiler is going to
 make the right choice all the time, it's nice to be able to declare
 one's intent when there's a clear reason to do so.



 I think that comes with the territory of using a high level language. If a
 particular routine is a major bottleneck in your program (and it does
 usually come down to one!), and you want to make the effort to tune it to
 the max, write it in inline assembler.

 Except that in this case, using inline assembler would have made it worse. The code was expecting to have the switch(constant) optimized away to just the relevant case. Writing the containing functions in assembly would not have been realistic, as the to-be-inlined functions were used all over the source tree (they were used to move data to/from userspace).

I see the inline/not inline as a quality of implementation issue. The language design should specify semantics, and the semantics should not change if something is inlined or not. I want to allow the compiler writer to be as free as possible to innovate how D is implemented. Trying to specify exactly what optimizations are performed in the language spec can forestall that. Note that DMD has a compiler switch to turn inlining on or off.
       eax = 0;  /* This tells the compiler to get a zero into eax,
                    in whatever way it chooses.  Maybe the caller
                    (which is inlining this function) had one lying
                    around in a register, and it can now choose to use
                    eax for that variable. */



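 The Digital Mars C++ compiler can do this, but after having that capability
 for 15 years it just never proved out to be very useful.

 It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?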

It's not worth it. I have a lot of practice writing fast applications (DMC is the fastest compiler, and has been for 15 years).
       /* Tell the compiler that, since this thing acts as a mutex,
          no memory accesses can be reordered across it.  It's
          probably not necessary in this case, though, as it contains
          a function call. */
       clobber memory;


 Unnecessary, as the inline assembler assumes memory is clobbered.

 It'd be nice if the language didn't force the compiler to do this in all cases, though. For instance, it's not necessary when just reading timestamps, or making use of some fancy computational instruction for which the compiler doesn't have an intrinsic, or as a touch-up in a critical function that the compiler doesn't optimize well enough. At the very least, "noclobber memory" should exist, but a compiler should also be allowed to look for itself. If the compiler doesn't support this, it could always fall back on assuming "clobber memory" for everything.

I misspoke. It doesn't do it in cases where none of the asm instructions could possibly modify memory.
 When using inline asm, you'll always run the risk of nonportability between
 compilers - after all, things like register conventions, calling
 conventions, etc., are not defined by the language. Only the syntax of the
 inline assembler is.

 But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details? If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame registers. The variable name gives me an offset as if I hadn't - I then adjust it as necessary.
 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

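 Because the inline assembler assembles the code long before any register
 assignments are done.

 That's a compiler implementation detail. Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).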

They may not have that restriction, yes, but I don't want to force the compiler to be built that way. I want to keep the bar low for building a basic spec compliant D compiler, while making it possible to build very advanced spec compliant ones.
 The problem is that the programmer can't know exactly what he wants,
 without knowing some decisions that the compiler will make.  GCC's
 syntax allows the programmer to tell the compiler exactly where to
 substitute those decisions.  Removing the ability of the compiler to
 make the decisions will lead to slower code.



 You are correct in the abstract. In my experience, I believe the difference
 to be negligible. I profile code extensively to make it faster. The
 bottlenecks turn out to be maybe 30 lines of code out of a few thousand.
 Those I just write completely in hand-tuned inline assembler.

 It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions. In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs. The compiler is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the language spec. D doesn't preclude any vendors from adding extensions, though. Extensions are important as they're how new innovations get tried out. The good ones will wind up getting folded into D.

I'm not sure what you mean by portable, as GCC's way of doing inline assembler is not portable to any other compiler. As far as I've been able to figure out (with google), most of it isn't even documented. I figured out how to use it by reading the kernel listings.

I'm currently in the process of building a linux version of D. It's pretty sweet to be able to take the inline asm code from win32 and recompile it under linux and it works just the same with no modification. That's a hopeless task if you're using separate asm files, or if you're using the inline assembler from a C compiler. I've even got obj2asm to work on elf files, so now you can disassemble .o files and see it in intel syntax!

P.S. How I write a whole function in hand-tuned asm is write it in C, compile it, disassemble it with obj2asm, cut & paste the code back into the C source in an asm block, and then tune.
May 09 2003
parent Scott Wood <scott buserror.net> writes:
On Fri, 9 May 2003 01:23:17 -0700, Walter <walter digitalmars.com> wrote:
 I see the inline/not inline as a quality of implementation issue.

For the default case, sure. I'll wait until compilers have a full, working AI built in before I trust even the best compiler to *always* get it right, though.
 The language design should specify semantics, and the semantics
 should not change if something is inlined or not. I want to allow
 the compiler writer to be as free as possible to innovate how D is
 implemented. Trying to specify exactly what optimizations are
 performed in the language spec can forestall that.

I'm not suggesting that the language mandate certain optimizations; just that there be a standard way of communicating one's intentions to the compiler. If the compiler doesn't support inlining at all, then fine, don't inline; however, if it does support it, it should pay attention to the programmer's request.
 It's a pretty small gain in this case, but what if it were a
 non-constant, that is almost guaranteed to be in some register before
 the asm statement?

It's not worth it.

If there's no cost to it (as is the case with compilers which already implement such things, including GCC), then any optimization is worth it. It doesn't make the language any harder to write a compiler for, as a compiler can choose to always interpret an assignment as a mov statement.
 I have a lot of practice writing fast applications (DMC
 is the fastest compiler, and has been for 15 years).

But how much do you need to use assembly in a compiler? Take something like a kernel instead, which often needs to use assembly for various things, including the aforementioned copying of data between user and kernel. This is done a lot, and saving a few cycles on every such occurrence *does* show up in the benchmarks, especially since so many of them are just copying one or two words (making the overhead very visible). Loading the value from userspace, then storing it on the stack, then loading it again immediately after the asm block is over will be noticeable. If you're on anything but a non-regparm x86, add the cost of storing the user address to the stack (since it was passed in a register) and then loading it again. The compiler will generally do these sorts of things for its own generated code; it doesn't strike me as a freak occurrence for a compiler to allow the user access to the same thing when using inline assembly.
 But would it not be better to reduce the potential sources of
 nonportability, by letting the programmer tell the compiler to handle
 certain details?  If the compiler can know the offset from EBP at
 assembly time, it presumably knows that it's on the stack, and thus
 that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame registers. The variable name gives me an offset as if I hadn't - I then adjust it as necessary.

If you can specify that the value must be in a register in the beginning and/or end of the block, you don't need to worry about the validity of the address in the middle of the block.
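In GCC's syntax this kind of request is written with register-specific constraints. The sketch below assumes GCC or Clang on x86 with AT&T syntax; `mul_hi` is a hypothetical function. It states that one input must be in EAX when the block starts and that the results are in EAX and EDX when it ends, so no stack address has to stay valid inside the block.

```c
#include <stdint.h>

/* Hypothetical example: upper 32 bits of the 64-bit product x * y. */
static inline uint32_t mul_hi(uint32_t x, uint32_t y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__("mull %3"               /* EDX:EAX = EAX * operand 3       */
            : "=a"(lo), "=d"(hi)    /* results leave in EAX and EDX    */
            : "a"(x), "r"(y)        /* x enters in EAX, y anywhere     */
            : "cc");
    (void)lo;                       /* low half is not needed here     */
    return hi;
#else
    return (uint32_t)(((uint64_t)x * y) >> 32);  /* portable fallback */
#endif
}
```

How the values get into or out of those registers is left to the compiler, which can fall back to plain moves if it has nothing better.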
 Plus, pc is probably going to be used soon after the asm block; why
 force it onto the stack and then back?

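 Because the inline assembler assembles the code long before any register
 assignments are done.

 That's a compiler implementation detail. Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).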

They may not have that restriction, yes, but I don't want to force the compiler to be built that way.

If the compiler isn't built that way, just act as if the user put a mov instruction there. If the syntax allows the user to ask the compiler to choose the register, it can pick one arbitrarily if it's not capable of picking a good one.
 It's a little harder when it's 30,000 lines out of a few million, and
 most of that needs to stay portable, so any assembler has to be
 buried in separate inline functions.  In any case, I don't think the
 language should throw away the opportunity for such optimizations
 just because they don't help the majority of programs.  The compiler
 is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the language spec.

The semantics behind what the programmer requests must be implemented; it's the optimization that the semantics allow that does not need to be there in simpler compilers.
 D doesn't preclude any vendors from adding extensions, though.
 Extensions are important as they're how new innovations get tried
 out. The good ones will wind up getting folded into D.

Sure. However, this often leads to different compilers implementing the same feature in incompatible ways, requiring programs that want to use the feature to use lots of conditional compilation to remain semi-portable. If the new feature would require significant effort to implement correctly (not necessarily efficiently), then I agree that it should stay out of the language unless it is demonstrated to be sufficiently useful (though it might sometimes be beneficial to formalize it into an optional yet standardized extension, so that if it is implemented, it's implemented in the same way). However, some of these things could be implemented (poorly, but correctly and no worse than if the feature weren't used) with a sed script if one were so inclined.
 I'm not sure what you mean by portable, as GCC's way of doing
 inline assembler is not portable to any other compiler.

Intel's compiler claims to support GCC inline assembly on x86 (their IA64 compiler apparently doesn't support inline assembly at all). However, in general, the lack of portability of inline assembly between compilers for the same architecture is a bit annoying. I was hoping that, with D's placing it into the language itself, it would cease to be an issue. However, once extensions to the basic syntax are relied on, you're right back to the current state of incompatibility.
 As far as I've been able to figure out (with google), most of it
 isn't even documented. I figured out how to use it by reading the
 kernel listings.

It's documented in the GCC info pages. Look for the "Extended Asm" node, as well as the section on constraints.
 I'm currently in the process of building a linux version of D. It's pretty
 sweet to be able to take the inline asm code from win32 and recompile it
 under linux and it works just the same with no modification. That's a
 hopeless task if you're using separate asm files,

Not really. There are Intel-syntax assemblers for Linux (even gas can be told to use it now), and gas is available for Windows should one want to go the other way.
 or if you're using the inline assembler from a C compiler. 

Unless you're using the same C compiler on both platforms.
 I've even got obj2asm to work on elf files, so now you can
 disassemble .o files and see it in intel syntax!

GNU objdump can do that as well, by passing "-m i386:intel".
 P.S. How I write a whole function in hand-tuned asm is write it in C,
 compile it, disassemble it with obj2asm, cut & paste the code back into the
 C source in an asm block, and then tune.

And do it over again every time the C code changes, or when a header it depends on changes (if you notice!). Each time, doing it for every supported architecture. It's still a useful technique for certain situations, but it's not a replacement for flexible inline assembly. -Scott
May 09 2003
prev sibling parent reply Ilya Minkov <midiclub 8ung.at> writes:
Walter wrote:

 I'm currently porting D to linux. Believe me, the inline assembler is a
 great boon to that. Just try converting MASM files to gas files! To me,
 using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both use a (cleaned-up?) Intel syntax. There have also been a number of converters NASM <-> GAS <-> MASM. And besides, the new GAS is said to be able to use Intel syntax. BTW, I didn't find a reliable way to use NASM with the Digital Mars compilers for Windows. It has the Borland format, but it somehow didn't work. I'll try to reproduce this problem sometime later. -i.
May 10 2003
parent "Nic Tiger" <tiger7 progtech.ru> writes:
I did find a reliable way to use NASM with Digital Mars for Win32 and DOSX
targets.

The problem is that the common statement
    section .data
or
    section .code
in COFF and other formats is expanded to something like 'dword aligned
32-bit segment of code (or text)'.

When the same statement is used for the OBJ format, it is not treated the
same way.
To make them identical, you should write
    section .code align=4 use32

As for the DOSX target, the above is not sufficient. You should write
    section _DATA class=DATA align=4 use32
or
    section _CODE class=CODE align=4 use32
Moreover, you should place somewhere the directive
    group DGROUP _DATA
to tell the linker to group the data segment in this module with the others.

The last technique described (the one for the DOSX target) is fully compatible
with Win32 target code.
I used this to compile the XVID codec sources for both Win32 and DOSX
with DMC, and it works.

BTW, with optimizations turned on, the C version of the codec (when asm is not
used) runs almost twice as fast as the unoptimized one. I think the DMC
optimizer is cool!

Nic Tiger.

"Ilya Minkov" <midiclub 8ung.at> wrote in message
news:b9j6pl$32m$1 digitaldaemon.com...
 Walter wrote:

 I'm currently porting D to linux. Believe me, the inline assembler is a
 great boon to that. Just try converting MASM files to gas files! To me,
 using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both use a (cleaned-up?) Intel syntax. There have also been a number of converters NASM <-> GAS <-> MASM. And besides, the new GAS is said to be able to use Intel syntax. BTW, I didn't find a reliable way to use NASM with the Digital Mars compilers for Windows. It has the Borland format, but it somehow didn't work. I'll try to reproduce this problem sometime later. -i.

May 10 2003