
digitalmars.D - GDC review process.

reply "Iain Buclaw" <ibuclaw ubuntu.com> writes:
Hi,

Had round one of the code review process, so I'm going to post 
the main issues here that most affect D users / the platforms 
they want to run on / the compiler version they want to use.



1) D Inline Asm and naked function support is raising far too
many alarm bells. So it would just be easier to remove it and avoid
all the other comments on why we need middle-end and backend
headers in gdc.


2) Code with #if V1 and V2 raised another bell, with the request
to replace all code that relies on internal macros with proper
if() conditions. If something is always going to be turned off,
remove it.

So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


3) For anyone who has submitted patches for Mingw and Apple - 
sorry, but I'm going to have to yank out or alter certain bits.  
Apple GCC is irrelevant now, and some Mingw checks look for 
if(target) when it should really be checking if(host) and vice 
versa!


Most discussion, I would imagine, will be on the decision to remove D
inline assembler support from gdc.  So, naysayers, do your
worst, but unfortunately there is a +1 here for removal.


Regards
Iain
Jun 19 2012
next sibling parent Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 20:19, Iain Buclaw wrote:
 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on
 / the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other
 comments on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to
 remove all code that relies on internal macros with proper if()
 conditions. If something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC. We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits. Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc. So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

+1 for removal of inline asm. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Iain Buclaw:

 Most discussion I would imagine be on the decision to remove D 
 inline assembler support from gdc.  So, nay sayers, do your 
 worst, but unfortunately there is a +1 here for removal.

I suggest trying to do the opposite, that is, to try to increase the current conformance of GDC to the D/DMD specs (like introducing D calling conventions, if they are missing). Bye, bearophile
Jun 19 2012
parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 20:44, bearophile wrote:
 Iain Buclaw:

 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc. So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.

 I suggest trying to do the opposite, that is, to try to increase the current conformance of GDC to the D/DMD specs (like introducing D calling conventions, if they are missing). Bye, bearophile

Not gonna happen. The D calling convention is Windows/32-bit only. Implementing a new calling convention in all major compiler back ends is not something you do trivially. Further, I doubt the GCC maintainers would actually approve of doing this. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 21:05, bearophile wrote:
 Iain Buclaw:

     "Does D *really* require a new calling convention?
     Also does it *really* require naked support?

 I guess the answers are yes and yes.

     I think naked support is a bad idea

 Maybe he/she's wrong, and it's a good idea. Where are the deep discussions that justify his/her words?

     and people who require naked support should be writing an assembly
     function wrapper."

 Why? And why do they have to impose a design on another language they haven't designed? They are the owners of their compilers and D is a guest, but imposing too much of your customs on a guest is not polite. Bye, bearophile

Because the guest wants to rearrange the host's home. D better have a good reason to do so. So far, I have seen none. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
parent Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 21:17, bearophile wrote:
 Alex Rønne Petersen:

 Because the guest wants to rearrange the host's home.

 D better have a good reason to do so. So far, I have seen none.

 A long time ago I suggested to Walter that he look through the D design for features that are hard to implement on normal back-ends (at that time I was helping a bit with the development of LDC), and reconsider them. So I agree that asking for useless changes in a host's home is bad. On the other hand, introducing one more calling convention is an additive change for GCC, so it's not a rearrangement. I have used inline asm many times in DMD, as I have used it many times in Delphi, and I like it a lot.

Just to make it clear: GDC still has GCC-style inline assembly.

 This is mostly a technical discussion, but I don't expect all Walter
 opinions to be the same as the opinions of GCC designers.
 Generally GDC should try to be as close to the D specs as possible :-)

 Bye,
 bearophile

-- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 11:57 AM, Iain Buclaw wrote:
 To quote from one of the i386 backend maintainers:
 ---
 "Does D *really* require a new calling convention?

No, but the idea was to allow D to innovate on calling conventions without disturbing code that needed to interface with C.

 Also does it *really* require naked support?
 I think naked support is a bad idea
 and people who require naked support should be writing an assembly
 function wrapper."
 ---

Naked support allows people to write max efficient assembler without needing to exit the language and use the (often miserable) standalone assembler.
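
For readers who have not used the feature, the following is a minimal sketch of a naked function written in DMD-style inline assembler. The function name `fortyTwo` and the version identifier `DmdStyleAsm` are made up for illustration; the sketch assumes a compiler that defines `D_InlineAsm_X86` or `D_InlineAsm_X86_64`, and falls back to plain D everywhere else.

```d
// Hypothetical version identifier: set when DMD-style inline asm is available.
version (D_InlineAsm_X86_64)
    version = DmdStyleAsm;
else version (D_InlineAsm_X86)
    version = DmdStyleAsm;

version (DmdStyleAsm)
{
    int fortyTwo()
    {
        asm
        {
            naked;          // ask the compiler not to emit a prologue/epilogue
            mov EAX, 42;    // an int return value is passed back in EAX
            ret;            // with no epilogue, we have to return ourselves
        }
    }
}
else
{
    // Fallback for targets without DMD-style inline asm.
    int fortyTwo()
    {
        return 42;
    }
}

unittest
{
    assert(fortyTwo() == 42);
}
```

Because nothing is generated around the asm block, the body is responsible for everything a normal function frame would do, which is what makes the feature both useful and hard for a back end to reason about.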
Jun 19 2012
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 19/06/2012 22:08, Iain Buclaw a écrit :
  From what I gathered from further discussion, it made sense for
 embedded platforms, such as ARM, but not x86.

It has proven to be useful to me, not only for performance reasons, but also for low-level manipulations. I don't see what makes ARM that different with regard to inline assembly capabilities.
Jun 19 2012
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 19/06/2012 23:22, Manu a écrit :
 If you had the register alias feature I described above, would you be
 able to write such low-level manipulations using intrinsics?
 I think I would be able to rewrite all x86 asm blocks I've ever written
 using that feature.

No, I couldn't. Such code involved stack manipulations that cannot be emulated by such a feature.

 ARM and PPC both have unique features relating to their branch control
 and branch prediction that x86 doesn't have. Sadly, all high level
 languages COMPLETELY overlook such features when designing high level
 expressions, because they are traditionally designed for x86 first.
 A thorough set of intrinsics can allow access to these features though,
 although since they're related to branch control/conditional execution,
 it feels clumsy, since you lose the feeling of structured code; ie, no
 scoped if blocks, loop constructs, etc,  if you have to use intrinsics
 to generate conditions or masks.

 ARM is the most common architecture on earth now. It would be nice if D
 were able to take better advantage of the architecture.

Even if that is true, you don't address the actual question. The discussion was about the naked functionality, and some argued that naked can be useful on ARM, but not on x86. The specifics of ARM you mention here don't explain that point. (I won't comment on PPC, as I have no experience with it.)
Jun 19 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 19/06/2012 23:54, Manu a écrit :
 On 20 June 2012 00:41, deadalnix <deadalnix gmail.com
 <mailto:deadalnix gmail.com>> wrote:

     Le 19/06/2012 23:22, Manu a écrit :

         If you had the register alias feature I described above, would
         you be
         able to write such low-level manipulations using intrinsics?
         I think I would be able to rewrite all x86 asm blocks I've ever
         written
         using that feature.


     No, I couldn't. Such code involved stack manipulations that cannot
     be emulated by such a feature.


 Really?
 Can you elaborate? Give me an example that couldn't be done with
 register aliasing and intrinsics?

Walter gave you examples. You'll find many others in druntime. Here is something I wrote recently that uses this again: http://www.deadalnix.me/2012/03/24/get-an-exception-from-a-segfault-on-linux-x86-and-x86_64-using-some-black-magic/
Jun 19 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 4:23 PM, Manu wrote:
 That code could all be done with the register alias I described, and
 __push/__pop intrinsics.

Push/pop intrinsics won't work reliably, because on 16 byte aligned stack machines the compiler must emit stack alignment instructions at various points. With intrinsics, it won't know where to put the stack alignment.
Jun 19 2012
prev sibling parent Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 23:22, Manu wrote:
 On 19 June 2012 23:59, deadalnix <deadalnix gmail.com
 <mailto:deadalnix gmail.com>> wrote:

     Le 19/06/2012 22:08, Iain Buclaw a écrit :

           From what I gathered from further discussion, it made sense for
         embedded platforms, such as ARM, but not x86.


     It has proven to be useful to me, not only for performances reasons,
     but also for low level manipulations.

     It don't see what make ARM that different on regard to inline
     assembly capabilities.


 If you had the register alias feature I described above, would you be
 able to write such low-level manipulations using intrinsics?
 I think I would be able to rewrite all x86 asm blocks I've ever written
 using that feature.

 ARM and PPC both have unique features relating to their branch control
 and branch prediction that x86 doesn't have. Sadly, all high level
 languages COMPLETELY overlook such features when designing high level
 expressions, because they are traditionally designed for x86 first.

To be fair, ARM v8/AArch64 has eliminated predicated execution, simply because it turned out that the complexity of writing languages and compilers for it was not worth it, compared to just having good branch prediction.

 A thorough set of intrinsics can allow access to these features though,
 although since they're related to branch control/conditional execution,
 it feels clumsy, since you lose the feeling of structured code; ie, no
 scoped if blocks, loop constructs, etc,  if you have to use intrinsics
 to generate conditions or masks.

 ARM is the most common architecture on earth now. It would be nice if D
 were able to take better advantage of the architecture.

-- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 1:08 PM, Iain Buclaw wrote:
 On Tuesday, 19 June 2012 at 20:04:12 UTC, Walter Bright wrote:
 Naked support allows people to write max efficient assembler without needing
 to exit the language and use the (often miserable) standalone assembler.

 From what I gathered from further discussion, it made sense for embedded platforms, such as ARM, but not x86.

I find occasion to use it now and then on x86.
Jun 19 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.
Jun 19 2012
parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 23:52, Walter Bright wrote:
 On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

 GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.

Then let's please rename it to the DMD ABI instead of calling it the D ABI and making it look like it's part of the language on the website. Further, D mangling rules should be separate from calling convention. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/20/2012 12:47 AM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

 GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D ABI and making it look like it's part of the language on the website. Further, D mangling rules should be separate from calling convention.

IIRC currently, the calling convention is mangled into the symbol name. Do you want to remove this?
Jun 19 2012
parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 20-06-2012 01:55, Timon Gehr wrote:
 On 06/20/2012 12:47 AM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

 GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D ABI and making it look like it's part of the language on the website. Further, D mangling rules should be separate from calling convention.

 IIRC currently, the calling convention is mangled into the symbol name. Do you want to remove this?

Not that I can see from http://dlang.org/abi.html ? -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/20/2012 02:04 AM, Alex Rønne Petersen wrote:
 On 20-06-2012 01:55, Timon Gehr wrote:
 On 06/20/2012 12:47 AM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

 GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D ABI and making it look like it's part of the language on the website. Further, D mangling rules should be separate from calling convention.

 IIRC currently, the calling convention is mangled into the symbol name. Do you want to remove this?

 Not that I can see from http://dlang.org/abi.html ?

TypeFunction:
    CallConvention FuncAttrs Arguments ArgClose Type

CallConvention:
    F       // D
    U       // C
    W       // Windows
    V       // Pascal
    R       // C++
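
As a rough illustration of the grammar quoted above, the sketch below asks the compiler for the mangled names of two function pointer types that differ only in linkage. The function names `dAdd` and `cAdd` are hypothetical; the sketch assumes a compiler that follows the documented mangling scheme, which, as this thread discusses, an implementation is not strictly required to do.

```d
// Hypothetical functions that differ only in their linkage.
int dAdd(int a, int b) { return a + b; }
extern (C) int cAdd(int a, int b) { return a + b; }

// Mangled names of the function *pointer* types. Per the grammar these read:
// 'P' (pointer), the CallConvention letter, the parameter types,
// the ArgClose marker, and finally the return type.
enum dTypeMangle = typeof(&dAdd).mangleof;
enum cTypeMangle = typeof(&cAdd).mangleof;

// The second character is the CallConvention code from the grammar.
static assert(dTypeMangle[1] == 'F'); // F = D
static assert(cTypeMangle[1] == 'U'); // U = C
```

The point of encoding the convention in the type's mangled name is that a function declared with one convention cannot silently link against a definition that uses another.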
Jun 19 2012
parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 20-06-2012 02:58, Timon Gehr wrote:
 On 06/20/2012 02:04 AM, Alex Rønne Petersen wrote:
 On 20-06-2012 01:55, Timon Gehr wrote:
 On 06/20/2012 12:47 AM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

 GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D ABI and making it look like it's part of the language on the website. Further, D mangling rules should be separate from calling convention.

 IIRC currently, the calling convention is mangled into the symbol name. Do you want to remove this?

 Not that I can see from http://dlang.org/abi.html ?

 TypeFunction:
     CallConvention FuncAttrs Arguments ArgClose Type

 CallConvention:
     F       // D
     U       // C
     W       // Windows
     V       // Pascal
     R       // C++

I see. I think it's a mistake to call that calling convention "D". I'm not against removing it, but the description is highly misleading. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
parent Don Clugston <dac nospam.com> writes:
On 20/06/12 03:01, Alex Rønne Petersen wrote:
 On 20-06-2012 02:58, Timon Gehr wrote:
 On 06/20/2012 02:04 AM, Alex Rønne Petersen wrote:
 On 20-06-2012 01:55, Timon Gehr wrote:
 On 06/20/2012 12:47 AM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 On 6/19/2012 1:36 PM, bearophile wrote:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

 The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC?

 GDC can certainly define its D calling convention to match GCC's. It's an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D ABI and making it look like it's part of the language on the website. Further, D mangling rules should be separate from calling convention.

 IIRC currently, the calling convention is mangled into the symbol name. Do you want to remove this?

 Not that I can see from http://dlang.org/abi.html ?

 TypeFunction:
     CallConvention FuncAttrs Arguments ArgClose Type

 CallConvention:
     F       // D
     U       // C
     W       // Windows
     V       // Pascal
     R       // C++

 I see. I think it's a mistake to call that calling convention "D". I'm not against removing it, but the description is highly misleading.

And "C++ calling convention" doesn't make any sense. There is no such thing. On Windows, every vendor does it differently (even the ones who claim to be compatible with one another!).
Jun 20 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 GDC can certainly define its D calling convention to match GCC's. It's
 an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D ABI
 and making it look like it's part of the language on the website.

The ABI is not part of the language. For example, the C Standard says nothing whatsoever about the C ABI.

 Further, D mangling rules should be separate from calling convention.

I disagree. The mangling rules are not part of the language specification, either. But they are necessary so that a function with one convention won't be connected to one with another.
Jun 19 2012
parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 20-06-2012 03:01, Walter Bright wrote:
 On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 GDC can certainly define its D calling convention to match GCC's. It's
 an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D
 ABI and making it look like it's part of the language on the website.

 The ABI is not part of the language. For example, the C Standard says nothing whatsoever about the C ABI.

Then it's very misleading that it's under the language reference area of the website and calls it the "D ABI" and not the "DMD ABI". This might have been fine back when there was only DMD, but it really needs to be made clear that this is not an ABI that compilers are required to follow.

 Further, D mangling rules should be separate from calling convention.

 I disagree. The mangling rules are not part of the language specification, either. But they are necessary so that a function with one convention won't be connected to one with another.

If compilers employed their own mangling schemes, debuggers and other tools would never be able to properly demangle names. I think it is important that the mangling is at least emphasized as a highly recommended (but not required) part of the language to implementors. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 6:06 PM, Alex Rønne Petersen wrote:
 On 20-06-2012 03:01, Walter Bright wrote:
 On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 GDC can certainly define its D calling convention to match GCC's. It's
 an "implementation defined" thing, not a language defined one.

 Then let's please rename it to the DMD ABI instead of calling it the D
 ABI and making it look like it's part of the language on the website.

 The ABI is not part of the language. For example, the C Standard says nothing whatsoever about the C ABI.

 Then it's very misleading that it's under the language reference area of the website and calls it the "D ABI" and not the "DMD ABI". This might have been fine back when there was only DMD, but it really needs to be made clear that this is not an ABI that compilers are required to follow.

You're probably right.

 Further, D mangling rules should be separate from calling convention.

 I disagree. The mangling rules are not part of the language specification, either. But they are necessary so that a function with one convention won't be connected to one with another.

 If compilers employed their own mangling schemes, debuggers and other tools would never be able to properly demangle names. I think it is important that the mangling is at least emphasized as a highly recommended (but not required) part of the language to implementors.

I don't think we need to worry about that. Implementers tend to follow existing practice unless there is a very, very good reason.
Jun 19 2012
next sibling parent deadalnix <deadalnix gmail.com> writes:
Le 20/06/2012 04:34, Walter Bright a écrit :
 I don't think we need to worry about that. Implementers tend to follow
 existing practice unless there is a very, very good reason.

Have you ever heard of a company named Microsoft?
Jun 20 2012
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/20/2012 4:26 AM, Bernard Helyer wrote:
 I was sputtering with rage. Sputtering!

Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over. I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal.
Jun 20 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 1:58 PM, Manu wrote:

 I find a thorough suite of architecture intrinsics are usually the fastest and
 cleanest way to the best possible code, although 'naked' may be handy in this
 circumstance too...

Do a grep for "naked" across the druntime library sources. For example, its use in druntime/src/rt/alloca.d, where it is very much needed, as alloca() is one of those "magic" functions.

 If a function is written from intrinsics, then it can inline and better adapt to
 the calling context. It's very common that you use asm to write super-efficient
 micro-function (memory copying/compression, linear algebra, matrix routines,
 DSPs, etc), which are classic candidates for being inlined.
 So I maintain, naked is useful, but asm is not (assuming you have a high level
 way to address registers like the stack pointer directly).

Do a grep for "asm" across the druntime library sources. Can you justify all of that with some other scheme?

 Thinking more about the implications of removing the inline asm, what would
 REALLY roxors, would be a keyword to insist a variable is represented by a
 register, and by extension, to associate it with a specific register:
    register int x;          // compiler assigns an unused register, promises it will
                             // remain resident, error if it can't maintain promise.
    register int x : rsp;    // x aliases RSP; can now produce a function
                             // pre/postable in high level code.
 Repeat for the argument registers -> readable, high-level custom calling
 conventions!

This was a failure in C.

 This would almost entirely eliminate the usefulness of an inline assembler.
 Better yet, this could use the 'new' attribute syntax, which most agree will
 support arguments:
  register(rsp) int x;

Some C compilers did have such pseudo-register abilities. It was a failure in practice.

I really don't understand preferring all these rather convoluted enhancements to avoid something simple and straightforward like the inline assembler. The use of IA in the D runtime library, for example, has been quite successful.

For example, consider this bit from druntime/src/rt/lifetime.d:
-------------------------------------------------------------------
    auto isshared = ti.classinfo is TypeInfo_Shared.classinfo;
    auto bic = !isshared ? __getBlkInfo((*p).ptr) : null;
    auto info = bic ? *bic : gc_query((*p).ptr);
    auto size = ti.next.tsize();
    version (D_InlineAsm_X86)
    {
        size_t reqsize = void;

        asm
        {
            mov EAX, newcapacity;
            mul EAX, size;
            mov reqsize, EAX;
            jc  Loverflow;
        }
    }
    else
    {
        size_t reqsize = size * newcapacity;

        if (newcapacity > 0 && reqsize / newcapacity != size)
            goto Loverflow;
    }

    // step 2, get the actual "allocated" size.  If the allocated size does not
    // match what we expect, then we will need to reallocate anyways.

    // TODO: this probably isn't correct for shared arrays
    size_t curallocsize = void;
    size_t curcapacity = void;
    size_t offset = void;
    size_t arraypad = void;
----------------------------------------------
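
For comparison only, here is a sketch of the same overflow check written without inline asm. It assumes the `core.checkedint` module, which as far as I know only appeared in druntime releases later than this thread, so it was not an option for the code quoted above. The helper names `checkedReqSize` and `overflowsByDivision` are made up for illustration.

```d
import core.checkedint : mulu;

/// Hypothetical helper: computes size * newcapacity and sets `overflow`
/// to true if the product does not fit in size_t.
size_t checkedReqSize(size_t size, size_t newcapacity, ref bool overflow)
{
    // mulu only ever sets the flag, so the caller initialises it to false.
    return mulu(size, newcapacity, overflow);
}

/// The portable division test from the else-branch above, as a predicate.
bool overflowsByDivision(size_t size, size_t newcapacity)
{
    immutable reqsize = size * newcapacity; // wraps around on overflow
    return newcapacity > 0 && reqsize / newcapacity != size;
}

unittest
{
    bool overflow = false;
    assert(checkedReqSize(4, 10, overflow) == 40 && !overflow);

    checkedReqSize(size_t.max, 2, overflow);
    assert(overflow);

    assert(!overflowsByDivision(4, 10));
    assert(overflowsByDivision(size_t.max, 2));
}
```

Unlike a free-standing jump-on-carry intrinsic, the flag here is an explicit result of the multiply itself, so the check does not depend on which instruction the compiler picks for the multiplication.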
Jun 19 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 3:55 PM, Manu wrote:
 On 20 June 2012 01:07, Walter Bright <newshound2 digitalmars.com
     Do a grep for "naked" across the druntime library sources. For example, its
     use in druntime/src/rt/alloca.d, where it is very much needed, as alloca()
     is one of those "magic" functions.
 I never argued against naked... I agree it's mandatory.

Then I misunderstood you.

     Do a grep for "asm" across the druntime library sources. Can you justify all
     of that with some other scheme?


 I think almost all the blocks I just browsed through could be easily written
 with nothing more than the register alias feature I suggested, and perhaps a
 couple of opcode intrinsics.

But I see nothing gained by that.

 And as a bonus, they would also be readable.

I don't agree. The point of IA to me is so I can specify exactly what I want. If I wanted to do it at a higher level, I'd use normal D syntax.

 I can imagine cases where the
 optimiser would have more freedom too.

But if I'm writing IA, I want to do it my way. Not the optimizer's way, which may or may not be able to give me what I want.

         Thinking more about the implications of removing the inline asm, what would
         REALLY roxors, would be a keyword to insist a variable is represented by a
         register, and by extension, to associate it with a specific register:


     This was a failure in C.


 Really?

Yes. C has a register keyword, and nobody uses it anymore. The troubles are many, starting with people always "register"ing the wrong variables, and it really didn't work out too well when compilers started doing live range register assignments. It's ignored by modern C compilers, and hasn't been carried forward into other languages.

 This is the missing link between mandatory asm blocks, and being able to
 do it in high level code with intrinsics.
 The 'register' keyword was similarly fail as 'inline'.. __forceinline was not
 fail, it is actually mandatory. I'd argue that __forceregister would be
 similarly useful in C aswell, but the real power would come from being able to
 specify the particular register to alias.

         This would almost entirely eliminate the usefulness of an inline assembler.
         Better yet, this could use the 'new' attribute syntax, which most agree will
         support arguments:
          register(rsp) int x;


     Some C compilers did have such pseudo-register abilities. It was a failure
     in practice.


 Really? I've never seen that. What about it was fail?

It's actually in DMC, believe it or not. It was a giant failure because nobody used it. It was in Borland's TurboC, too. It pretty much just throws a wrench into the gears of more sophisticated code generators.

     I really don't understand preferring all these rather convoluted
     enhancements to avoid something simple and straightforward like the inline
     assembler. The use of IA in the D runtime library, for example, has been
     quite successful.


 I agree, IA is useful and has been successful, but it has drawbacks too.
    * IA ruins optimisation around the IA block

dmd's optimizer is not so sensitive to that.
    * IA doesn't inline well.

True, but that's fixable (excluding naked functions). Currently, you can use mixins to do it.
 intrinsics allow much greater opportunity for
 efficient integration into the calling context
    * most IA functions are small, and prime candidates for inlining (see points
 1 and 2)
    * IA is difficult for the majority of programmers to follow/understand

IA isn't for everyone. But when you do need it, it has been a marvelous tool for D.
    * even to experienced programmers, poorly commented asm takes a lot of time
 to mentally parse

 It's a shame that there are IA constructs that can't be expressed any other
 way.
 I don't think it would take much to address that.

 This one seems trivial, you just need one intrinsic:

    size_t reqsize = size * newcapacity;
    __jc(&Loverflow);

That's highly risky. The optimizer knows nothing at all about the state of the flags register, and does not take into account a dependency on the C flag when doing code motion. Nor would the compiler guarantee that the C flag is even set by however it chose to do the previous multiply (for example, the LEA instruction is often used to do multiplies, which leaves the C flag untouched. Oops!). Nothing connects the __jc intrinsic to that multiply operation.
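The objection above is that nothing ties a flag-reading intrinsic to the multiply that was supposed to set the flag. One way around that is an intrinsic that performs the multiply and reports overflow as a single operation. The following is only a sketch in C, assuming a GCC or Clang recent enough to provide `__builtin_mul_overflow` (which postdates this thread); the helper name `checked_reqsize` is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes size * newcapacity into *reqsize.
   Returns true on success and false if the product does not fit in size_t.
   The multiply and the overflow test are one operation from the
   optimizer's point of view, so no separate read of the carry flag exists
   that code motion could separate from the multiply. */
static bool checked_reqsize(size_t size, size_t newcapacity, size_t *reqsize)
{
    /* __builtin_mul_overflow returns true when the result overflowed. */
    return !__builtin_mul_overflow(size, newcapacity, reqsize);
}
```

A caller would branch on the return value where the quoted snippet jumped to `Loverflow`. How the compiler detects the overflow (carry flag, widening multiply, or something else) is left to the code generator.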
  Although it depends on a '&codeLabel' mechanism to get the label address (GCC
 supports this in C, I'd love to see this in D too).

Note that supporting such will wind up disabling a lot of the data flow analysis, which is not set up to handle unknown edges between basic blocks.

To summarize, I see a lot of complex new features, a significant rewrite of the optimizer, and a rewrite of a lot of existing code, and at the end of all that we're pretty much at the same state we are at now.
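For readers who have not seen the GCC feature mentioned in the quote, here is a small sketch of GNU C "labels as values": `&&label` takes a label's address and `goto *ptr` jumps to it. The toy opcodes and the function name `run` are invented for this example. Each `goto *` is an indirect jump whose target the compiler cannot see in the source, which is the kind of unknown edge between basic blocks described above.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

/* Runs a program made of the opcodes above and returns the accumulator.
   The program must end with OP_HALT. */
static int run(const unsigned char *code)
{
    /* GNU C extension: unary && yields the address of a label. */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;
    size_t pc = 0;

    /* GNU C extension: jump to an address computed at run time. */
    goto *dispatch[code[pc++]];

do_inc:
    acc += 1;
    goto *dispatch[code[pc++]];
do_double:
    acc *= 2;
    goto *dispatch[code[pc++]];
do_halt:
    return acc;
}
```

This compiles with GCC and Clang but is not standard C.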
Jun 19 2012
prev sibling parent reply Don Clugston <dac nospam.com> writes:
On 20/06/12 00:55, Manu wrote:
 On 20 June 2012 01:07, Walter Bright <newshound2 digitalmars.com
 <mailto:newshound2 digitalmars.com>> wrote:

     On 6/19/2012 1:58 PM, Manu wrote:

         I find a thorough suite of architecture intrinsics are usually
         the fastest and
         cleanest way to the best possible code, although 'naked' may be
         handy in this
         circumstance too...


     Do a grep for "naked" across the druntime library sources. For
     example, its use in druntime/src/rt/alloca.d, where it is very much
     needed, as alloca() is one of those "magic" functions.


 I never argued against naked... I agree it's mandatory.


     Do a grep for "asm" across the druntime library sources. Can you
     justify all of that with some other scheme?


 I think almost all the blocks I just browsed through could be easily
 written with nothing more than the register alias feature I suggested,
 and perhaps a couple of opcode intrinsics.
 And as a bonus, they would also be readable. I can imagine cases where
 the optimiser would have more freedom too.


         Thinking more about the implications of removing the inline asm,
         what would
         REALLY roxors, would be a keyword to insist a variable is
         represented by a
         register, and by extension, to associate it with a specific
         register:


     This was a failure in C.


 Really? This is the missing link between mandatory asm blocks, and being
 able to do it in high level code with intrinsics.
 The 'register' keyword was similarly fail as 'inline'.. __forceinline
 was not fail, it is actually mandatory. I'd argue that __forceregister
 would be similarly useful in C as well, but the real power would come
 from being able to specify the particular register to alias.

         This would almost entirely eliminate the usefulness of an inline
         assembler.
         Better yet, this could use the 'new' attribute syntax, which
         most agree will
         support arguments:
          register(rsp) int x;


     Some C compilers did have such pseudo-register abilities. It was a
     failure in practice.


 Really? I've never seen that. What about it was fail?

     I really don't understand preferring all these rather convoluted
     enhancements to avoid something simple and straightforward like the
     inline assembler. The use of IA in the D runtime library, for
     example, has been quite successful.


 I agree, IA is useful and has been successful, but it has drawbacks too.
    * IA ruins optimisation around the IA block
    * IA doesn't inline well. intrinsics allow much greater opportunity
 for efficient integration into the calling context
    * most IA functions are small, and prime candidates for inlining (see
 points 1 and 2)

You and I seem to be from different planets. I have almost never written an asm function which was suitable for inlining.

Take a look at std.internal.math.biguintX86.d

I do not know how to write that code without inline asm.
Jun 20 2012
parent reply Don Clugston <dac nospam.com> writes:
On 20/06/12 13:22, Manu wrote:
 On 20 June 2012 13:59, Don Clugston <dac nospam.com
 <mailto:dac nospam.com>> wrote:

     You and I seem to be from different planets. I have almost never
     written an asm function which was suitable for inlining.

     Take a look at std.internal.math.biguintX86.d

     I do not know how to write that code without inline asm.


 Interesting.
 I wish I could paste some counter-examples, but they're all proprietary >_<

 I think the key detail here is where you stated, they _always_ include
 a loop. Is this because it's hard to manipulate the compiler into the
 correct interaction with the flags register?

No. It's just because speed doesn't matter outside loops. A consequence of having the loop be inside the asm code, is that the parameter passing is much less significant for speed, and calling convention is the big
 I'd be interested to compare the compiled D code, and your hand written
 asm code, to see where exactly the optimiser goes wrong. It doesn't look
 like you're exploiting too many tricks (at a brief glance), it's just
 nice tight hand written code, which the optimiser should theoretically
 be able to get right...

Theoretically, yes. In practice, DMD doesn't get anywhere near, and gcc isn't much better. I don't think there's any reason why they couldn't, but I don't have much hope that they will.

As you say, the code looks fairly straightforward, but actually there are very many similar ways of writing the code, most of which are much slower. There are many bottlenecks you need to avoid. I was only able to get it to that speed by using the processor profiling registers.

So, my original two uses for asm are actually:
(1) when the language prevents you from accessing low-level functionality; and
(2) when the optimizer isn't good enough.
 I find optimisers are very good at code simplification, assuming that
 you massage the code/expressions to neatly match any architectural quirks.
 I also appreciate that good x86 code is possibly the hardest
 architecture for an optimiser to get right...

Optimizers improved enormously during the 80's and 90's, but the rate of improvement seems to have slowed.

With x86, out-of-order execution has made it very easy to get reasonably good code, and much harder to achieve perfection. Still, Core i7 is much easier than Core2, since Intel removed one of the most complicated bottlenecks (on core2 and earlier there is a max 3 reads per cycle, of registers you haven't written to in the previous 3 cycles).
Jun 20 2012
parent Don Clugston <dac nospam.com> writes:
On 20/06/12 16:37, Manu wrote:
 On 20 June 2012 17:15, Don Clugston <dac nospam.com
 <mailto:dac nospam.com>> wrote:

     On 20/06/12 13:22, Manu wrote:

         I find optimisers are very good at code simplification, assuming
         that

         you massage the code/expressions to neatly match any
         architectural quirks.
         I also appreciate that good x86 code is possibly the hardest
         architecture for an optimiser to get right...


     Optimizers improved enormously during the 80's and 90's, but the
     rate of improvement seems to have slowed.

     With x86, out-of-order execution has made it very easy to get
     reasonably good code, and much harder to achieve perfection. Still,
     Core i7 is much easier than Core2, since Intel removed one of the
     most complicated bottlenecks (on core2 and earlier there is a max 3
     reads per cycle, of registers you haven't written to in the previous
     3 cycles).


 Yeah okay, I can easily imagine the complexity for an x86 codegen.
 RISC architectures are so much more predictable.

 How do you define 'perfection'? Performance as measured on what
 particular machine? :)

The theoretical limit for a particular architecture. E.g. in BigInt, the most crucial functions do an integer multiply in each loop iteration. Since the machine only has one integer multiply unit, it is impossible to do better than one multiply per cycle. If you've achieved that, it's perfect.

If the processors are different enough you may also need a separate branch for different processors.
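To make the "one multiply per loop iteration" bound concrete, here is a portable C sketch of the kind of loop being discussed: multiplying a multi-word integer by a single word. This is not the code from std.internal.math.biguintX86; the function name `mul_by_word` and the 32-bit limb size are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes dest[0..n) = src[0..n) * multiplier + carry, least significant
   word first, and returns the final carry word.
   Each iteration needs exactly one 32x32->64 multiply, so on a machine with
   a single integer multiply unit the loop cannot run faster than one
   iteration per cycle, however the surrounding instructions are scheduled. */
static uint32_t mul_by_word(uint32_t *dest, const uint32_t *src, size_t n,
                            uint32_t multiplier, uint32_t carry)
{
    for (size_t i = 0; i < n; ++i) {
        /* Cannot overflow: (2^32-1)^2 + (2^32-1) < 2^64. */
        uint64_t t = (uint64_t)src[i] * multiplier + carry;
        dest[i] = (uint32_t)t;          /* low half is the result word */
        carry = (uint32_t)(t >> 32);    /* high half carries into the next word */
    }
    return carry;
}
```

Whether a given compiler reaches that bound for this loop is exactly the question raised in the thread; the sketch only shows where the bound comes from.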
Jun 21 2012
prev sibling next sibling parent Brad Roberts <braddr slice-2.puremagic.com> writes:
On Tue, 19 Jun 2012, Walter Bright wrote:

 On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 GDC can certainly define its D calling convention to match GCC's. It's
 an "implementation defined" thing, not a language defined one.

and making it look like it's part of the language on the website.

The ABI is not part of the language. For example, the C Standard says nothing whatsoever about the C ABI.
 Further, D mangling rules should be separate from calling convention.

I disagree. The mangling rules are not part of the language specification, either. But they are necessary so that a function with one convention won't be connected to one with another.

Let's not repeat the mistakes of C++: the mangling has got to be part of the language definition to facilitate interoperability between compilers. Similarly, the runtimes need to be interchangeable. Requiring the entire body of code to all come from the same compiler would be horrible.
Jun 19 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 19/06/2012 22:58, Manu a écrit :
 Thinking more about the implications of removing the inline asm, what
 would REALLY roxors, would be a keyword to insist a variable is
 represented by a register, and by extension, to associate it with a
 specific register:
    register int x;             // compiler assigns an unused register,
 promises it will remain resident, error if it can't maintain promise.
    register int x : rsp;    // x aliases RSP; can now produce a function
 pre/postable in high level code.
 Repeat for the argument registers -> readable, high-level custom calling
 conventions!

 This would almost entirely eliminate the usefulness of an inline assembler.
 Better yet, this could use the 'new' attribute syntax, which most agree
 will support arguments:
  register(rsp) int x;

Choosing registers is something the compiler is better at than us most of the time. For this very reason, I think we want to go in the exact opposite direction: asm with compiler-chosen registers when possible.
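GCC's extended asm is one existing example of "asm with compiler-chosen registers": the programmer writes the instruction, and operand constraints such as `"r"` let the register allocator pick the registers. A minimal sketch in C follows; the function name `add_asm` is made up, the instruction is x86 in AT&T syntax, and a plain C fallback is assumed for other targets.

```c
/* Adds b to a with a hand-written instruction, leaving register
   selection to the compiler. */
static inline unsigned add_asm(unsigned a, unsigned b)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "+r"(a): a is read and written, in any general-purpose register.
       "r"(b):  b is read, in any general-purpose register.
       "cc":    the instruction modifies the flags register. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;   /* fallback for targets this sketch does not cover */
#endif
    return a;
}
```

Because the compiler knows the inputs, outputs and clobbers, it can inline such a function and allocate registers around it, which is the opposite of pinning a variable to a named register.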
Jun 20 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 20/06/2012 09:58, Manu a écrit :
 On 20 June 2012 10:42, deadalnix <deadalnix gmail.com
 <mailto:deadalnix gmail.com>> wrote:

     Le 19/06/2012 22:58, Manu a écrit :

         This would almost entirely eliminate the usefulness of an inline
         assembler.
         Better yet, this could use the 'new' attribute syntax, which
         most agree
         will support arguments:
          register(rsp) int x;


     Choosing registers is something the compiler is better at than us
     most of the time.

     For this very reason, I think we want to go in the exact opposite
     direction: asm with compiler-chosen registers when possible.


 ...I think you've missed the entire point of my suggestion.
 But that's okay. I give up ;)

We presented you with example code where your approach isn't going to do the trick. You are free to ignore it.
Jun 20 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 19/06/2012 20:51, Alex Rønne Petersen a écrit :
 On 19-06-2012 20:44, bearophile wrote:
 Iain Buclaw:

 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc. So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.

I suggest trying to do the opposite, that is, to try to increase the current conformance of GDC to D/DMD specs (like introducing D calling conventions, if they are missing).

Bye,
bearophile

Not gonna happen. The D calling convention is Windows/32-bit only. Implementing a new calling convention in all major compiler back ends is not something you do trivially. Further, I doubt the GCC maintainers would actually approve of doing this.

Frankly, we don't care. 32-bit Windows is not the platform of the future. GDC and DMD have a consistent ABI on all other platforms. This allows writing code that compiles with both DMD and GDC on most platforms.

It makes no sense to drop D asm support. Just because the situation is broken on 32-bit Windows doesn't mean we should break it on all other platforms. The asm syntax should be DMD-compliant on x86 and x86_64.

Plus, GCC asm syntax is horrible, and DMD's is really nice.
Jun 19 2012
parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 22:56, Trass3r wrote:
 Plus, gcc asm syntax is horrible, and DMD's is really nice.

yep, AT&T vs. Intel syntax :)

Please be informed that GCC inline asm supports Intel syntax... -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
parent Alex Rønne Petersen <alex lycus.org> writes:
On 20-06-2012 00:48, Trass3r wrote:
 Please be informed that GCC inline asm supports Intel syntax...

With -masm=intel.

No, you can tell the inline assembler to use Intel syntax from inside code. Iain showed me how on IRC at some point, but I forget the specifics. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 19 June 2012 19:44, bearophile <bearophileHUGS lycos.com> wrote:
 Iain Buclaw:


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc. So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.

I suggest trying to do the opposite, that is, to try to increase the current conformance of GDC to D/DMD specs (like introducing D calling conventions, if they are missing).

Bye,
bearophile

The D spec states:

The extern (C) and extern (D) calling convention matches the C calling convention used by the supported C compiler on the host system. Except that the extern (D) calling convention for Windows x86 is described here.

</insert D calling convention>

So GDC already conforms to the spec on all platforms except Windows x86.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 19 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 19 June 2012 19:51, Alex Rønne Petersen <alex lycus.org> wrote:
 On 19-06-2012 20:44, bearophile wrote:
 Iain Buclaw:

 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc. So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.

I suggest trying to do the opposite, that is, to try to increase the current conformance of GDC to D/DMD specs (like introducing D calling conventions, if they are missing).

Bye,
bearophile

Not gonna happen. The D calling convention is Windows/32-bit only. Implementing a new calling convention in all major compiler back ends is not something you do trivially. Further, I doubt the GCC maintainers would actually approve of doing this.

To quote from one of the i386 backend maintainers:
---
"Does D *really* require a new calling convention?
Also does it *really* require naked support?
I think naked support is a bad idea and people who require naked support should be writing an assembly function wrapper."
---

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 19 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Alex Rønne Petersen:

Further, I doubt the GCC maintainers would actually approve of 
doing this.

I know nothing about how GCC maintainers do their work. Can you explain why they have accepted a D front-end but refuse a change like that? Do they believe that D is a kind of re-syntaxed C/C++ that requires zero back-end changes?

Do LLVM developers share the same opinions? In the LLVM blog I have read plenty about small but significant changes in LLVM needed to implement an efficient back-end for GHC (the main Haskell compiler), so are LLVM developers different and more gentle than the GCC devs?

Thank you for your answers,
bearophile
Jun 19 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Iain Buclaw:

 "Does D *really* require a new calling convention?
 Also does it *really* require naked support?

I guess the answers are yes and yes.
 I think naked support is a bad idea

Maybe he/she's wrong, and it's a good idea. Where are the deep discussions that justify his/her words?
 and people who require naked support should be writing an 
 assembly function wrapper."

Why? And why do they have to impose a design on another language they haven't designed? They are the owners of their compilers and D is a guest, but imposing too much of your customs on a guest is not polite.

Bye,
bearophile
Jun 19 2012
prev sibling next sibling parent reply Russel Winder <russel winder.org.uk> writes:

On Tue, 2012-06-19 at 20:19 +0200, Iain Buclaw wrote:
[…]
 1) D Inline Asm and naked function support is raising far too
 many alarm bells. So would just be easier to remove it and avoid
 all the other comments on why we need middle-end and backend
 headers in gdc.

I can do without inline assembly language. If I need assembly language code, I can write a function in assembly language.
 2) Code with #if V1 and V2 raised another bell with the request
 to remove all code that relies on internal macros with proper
 if() conditions. If something is always going to be turned off,
 remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!

I never used V1 (though I do have the Tango book) so enforcing V2 will not be a problem. Actually this means I can delete 65% of the SCons D tool :-)))))))))
 3) For anyone who has submitted patches for Mingw and Apple -
 sorry, but I'm going to have to yank out or alter certain bits.
 Apple GCC is irrelevant now, and some Mingw checks look for
 if(target) when it should really be checking if(host) and vice
 versa!

Is Apple GCC irrelevant? Apple itself has switched to Clang but GCC is still available via MacPorts – or am I missing something obvious, I am only an occasional Mac OS X user.
 Most discussion I would imagine be on the decision to remove D
 inline assembler support from gdc.  So, nay sayers, do your
 worst, but unfortunately there is a +1 here for removal.

I could put in a -1 if pushed, but +1 is fine by me!

-- 
Russel.
Jun 19 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-06-19 21:14, Russel Winder wrote:

 I never used V1 (though I do have the Tango book) so enforcing V2 will
 not be a problem. Actually this means I can delete 65% of the SCons D
 tool :-)))))))))

You can use the Tango book with D2.
 3) For anyone who has submitted patches for Mingw and Apple -
 sorry, but I'm going to have to yank out or alter certain bits.
 Apple GCC is irrelevant now, and some Mingw checks look for
 if(target) when it should really be checking if(host) and vice
 versa!

Is Apple GCC irrelevant? Apple itself has switched to Clang but GCC is still available via MacPorts – or am I missing something obvious, I am only an occasional Mac OS X user.

Probably not so much. Note that GCC is still available from Apple, still stuck at 4.2.x. -- /Jacob Carlborg
Jun 19 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Alex Rønne Petersen:

 Because the guest wants to rearrange the host's home.

 D better have a good reason to do so. So far, I have seen none.

A long time ago I suggested to Walter that he look at the D design, searching for features that are hard to implement on normal back-ends (at that time I was helping a bit with the development of LDC), and reconsider them. So I agree that asking for useless changes in a host's home is bad. On the other hand, introducing one more calling convention is an additive change for GCC, so it's not a rearrangement.

I have used inline asm many times in DMD, as I have used it many times in Delphi, and I like it a lot. This is mostly a technical discussion, but I don't expect all of Walter's opinions to be the same as the opinions of the GCC designers. Generally GDC should try to be as close to the D specs as possible :-)

Bye,
bearophile
Jun 19 2012
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/19/2012 11:19 AM, Iain Buclaw wrote:
 1) D Inline Asm and naked function support is raising far too many alarm bells.
 So would just be easier to remove it and avoid all the other comments on why we
 need middle-end and backend headers in gdc.

I'm not clear on why the inline assembler is a problem?
Jun 19 2012
prev sibling next sibling parent "Iain Buclaw" <ibuclaw ubuntu.com> writes:
On Tuesday, 19 June 2012 at 20:04:12 UTC, Walter Bright wrote:
 On 6/19/2012 11:57 AM, Iain Buclaw wrote:
 To quote from one of the i386 backend maintainers:
 ---
 "Does D *really* require a new calling convention?

No, but the idea was to allow D to innovate on calling conventions without disturbing code that needed to interface with C.
 Also does it *really* require naked support?
 I think naked support is a bad idea
 and people who require naked support should be writing an 
 assembly
 function wrapper."
 ---

Naked support allows people to write max efficient assembler without needing to exit the language and use the (often miserable) standalone assembler.

From what I gathered from further discussion, it made sense for embedded platforms, such as ARM, but not x86.
Jun 19 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Iain Buclaw:

 From what I gathered from further discussion, it made sense for 
 embedded platforms, such as ARM, but not x86.

Why? If this whole discussion wants to go somewhere, and not just be a waste of everyone's time like similar past discussions on such topics, it needs a *much larger* amount of technical arguments, both for and against the various positions. This discussion also requires people who work on LDC, to see their problems.

If I write inline asm code on DMD I'd really like the code to compile on GDC too, on the same CPU and OS.

----------------------

Walter:
 No, but the idea was to allow D to innovate on calling
 conventions without disturbing code that needed to
 interface with C.

The idea is nice, but ideas aren't enough. Where are the benchmarks that show a performance improvement over the C calling convention? And even if such improvement is present, is it worth it in the face of people that don't want to add it to GCC? Bye, bearophile
Jun 19 2012
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:

On 19 June 2012 21:19, Iain Buclaw <ibuclaw ubuntu.com> wrote:

 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other
 comments on why we need middle-end and backend headers in gdc.

Inline assembly has been relatively useless in GCC for years. Inline asm interferes with the optimiser's ability to do a good job, which basically makes use of inline assembly self-defeating.
The only time I ever need to use inline asm is to interface an arch feature that has no API. As long as there are intrinsics for all the opcodes one might want, then it's better to use them.

There are 2 operations that spring to mind that typically don't have intrinsics, or high level APIs, which I always use asm to interface: the fine-grain manual manipulation of the flags register on PPC (ie, the '.' suite of opcodes), and conditional execution opcodes on ARM. Neither of these have high level expressions, and they are both relatively important.
That said, as stated above, if use of this stuff is for performance, then using an inline-asm block will ruin the surrounding code anyway, so I almost always find I'm required to write the entire function in asm to achieve the expected result...

I see no major loss to removing the inline assembler.
I would like to know what the issue is though? Why are you compelled to remove it?
I thought GCC optionally supported the microsoft asm syntax instead, which should make it syntactically consistent with D?
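As an illustration of the "intrinsic per opcode" approach described above, here is a short C sketch using the GCC/Clang builtin `__builtin_popcount`. The function name `count_bits` is invented for this example. Depending on the target options the builtin may become a single population-count instruction or a software fallback; either way the compiler sees an ordinary expression it can inline and schedule with the surrounding loop.

```c
#include <stdint.h>

/* Counts the set bits in an array of words using the compiler's
   population-count intrinsic instead of an inline asm block. */
static unsigned count_bits(const uint32_t *words, unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; ++i)
        total += (unsigned)__builtin_popcount(words[i]);
    return total;
}
```

No register constraints or clobber lists are needed, and the optimizer remains free to unroll or vectorize the loop.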
Jun 19 2012
parent Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 22:40, Manu wrote:
 On 19 June 2012 21:19, Iain Buclaw <ibuclaw ubuntu.com
 <mailto:ibuclaw ubuntu.com>> wrote:

     1) D Inline Asm and naked function support is raising far too many
     alarm bells. So would just be easier to remove it and avoid all the
     other comments on why we need middle-end and backend headers in gdc.


 Inline assembly has been relatively useless in GCC for years. Inline asm
 interferes with the optimisers ability to do a good job, which basically
 makes use of inline assembly self-defeating.
 The only time I ever need to use inline-asm is to interface an arch
 feature that has no API. As long as there are intrinsics for all the
 opcodes one might want, then it's better to use them.

 There are 2 operations that spring to mind that typically don't have
 intrinsics, or high level API's, which I always use asm to
 interface; the fine-grain manual manipulation of the flags register on
 PPC (ie, the '.' suite of opcodes), and conditional execution opcodes on
 ARM. Neither of these have high level expressions, and they are both
 relatively important.
 That said, as stated above, if use of this stuff is for performance,
 then using an inline-asm block will ruin the surrounding code anyway, so
 I almost always find I'm required to write the entire function in asm to
 achieve the expected result...

 I see no major loss to removing the inline assembler.
 I would like to know what the issue is though? Why are you compelled to
 remove it?
 I thought GCC optionally supported the microsoft asm syntax instead,
 which should make it syntactically consistent with D?

Not "Microsoft", but Intel syntax. But GDC's inline assembly syntax is very different from DMD's: https://bitbucket.org/goshawk/gdc/wiki/UserDocumentation#!extended-assembler -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
prev sibling next sibling parent reply Trass3r <un known.com> writes:
 1) D Inline Asm and naked function support is raising far too many alarm  
 bells. So would just be easier to remove it and avoid all the other  
 comments on why we need middle-end and backend headers in gdc.

And the C++ frontend doesn't need these headers for its inline assembler implementation?
Jun 19 2012
parent Alex Rønne Petersen <alex lycus.org> writes:
On 19-06-2012 22:51, Trass3r wrote:
 1) D Inline Asm and naked function support is raising far too many
 alarm bells. So would just be easier to remove it and avoid all the
 other comments on why we need middle-end and backend headers in gdc.

And the C++ frontend doesn't need these headers for its inline assembler implementation?

No, it passes the assembly on to the assembler. Simple as that. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 19 2012
prev sibling next sibling parent Trass3r <un known.com> writes:
 Plus, gcc asm syntax is horrible, and DMD's is really nice.

yep, AT&T vs. Intel syntax :)
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 19 June 2012 23:03, Walter Bright <newshound2 digitalmars.com> wrote:

 On 6/19/2012 11:57 AM, Iain Buclaw wrote:

 To quote from one of the i386 backend maintainers:
 ---
 "Does D *really* require a new calling convention?

No, but the idea was to allow D to innovate on calling conventions without disturbing code that needed to interface with C.

Properly implemented multiple-return-values being the killer app here! Using ALL the argument registers for returning multiple values as well ;)

 Also does it *really* require naked support?
 I think naked support is a bad idea
 and people who require naked support should be writing an assembly
 function wrapper."
 ---

Naked support allows people to write max efficient assembler without needing to exit the language and use the (often miserable) standalone assembler.

I find a thorough suite of architecture intrinsics is usually the fastest and cleanest way to the best possible code, although 'naked' may be handy in this circumstance too...
If a function is written from intrinsics, then it can inline and better adapt to the calling context. It's very common to use asm to write super-efficient micro-functions (memory copying/compression, linear algebra, matrix routines, DSP, etc), which are classic candidates for being inlined.
So I maintain, naked is useful, but asm is not (assuming you have a high-level way to address registers like the stack pointer directly).

Thinking more about the implications of removing the inline asm, what would REALLY rock would be a keyword to insist a variable is represented by a register, and by extension, to associate it with a specific register:
  register int x;         // compiler assigns an unused register, promises it will remain resident, error if it can't maintain promise.
  register int x : rsp;   // x aliases RSP; a function's pre/postamble can now be written in high-level code.
Repeat for the argument registers -> readable, high-level custom calling conventions!

This would almost entirely eliminate the usefulness of an inline assembler.
Better yet, this could use the 'new' attribute syntax, which most agree will support arguments:
  register(rsp) int x;
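For comparison, GCC and Clang's C dialect already exposes a small, read-only piece of the "address the stack/frame registers from high-level code" idea as a builtin. This is only a sketch of that existing builtin, not the register alias feature proposed above; the function name current_frame_address is made up for illustration, and __builtin_frame_address is a compiler extension rather than standard C.

```c
#include <stdint.h>

/* Returns the frame address of this function as an integer.
 * __builtin_frame_address(0) is a GCC/Clang builtin, not standard C.
 * noinline keeps the function in its own frame so the result is
 * meaningful for this call rather than for whatever it got inlined into. */
__attribute__((noinline))
uintptr_t current_frame_address(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}
```

Unlike the proposal, this cannot write to the register and cannot name an arbitrary register; it only shows that this kind of access does not strictly need an asm block.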
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 19 June 2012 23:59, deadalnix <deadalnix gmail.com> wrote:

 On 19/06/2012 22:08, Iain Buclaw wrote:

   From what I gathered from further discussion, it made sense for
 embedded platforms, such as ARM, but not x86.

It has proven to be useful to me, not only for performance reasons, but also for low-level manipulations. I don't see what makes ARM that different with regard to inline assembly capabilities.

If you had the register alias feature I described above, would you be able to write such low-level manipulations using intrinsics?
I think I would be able to rewrite all x86 asm blocks I've ever written using that feature.

ARM and PPC both have unique features relating to their branch control and branch prediction that x86 doesn't have. Sadly, all high-level languages COMPLETELY overlook such features when designing high-level expressions, because they are traditionally designed for x86 first.
A thorough set of intrinsics can allow access to these features, but since they relate to branch control/conditional execution, it feels clumsy: you lose the feeling of structured code (i.e. no scoped if blocks, loop constructs, etc.) if you have to use intrinsics to generate conditions or masks.

ARM is the most common architecture on earth now. It would be nice if D were able to take better advantage of the architecture.
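To make the "conditions or masks" point concrete, here is a rough sketch in plain C of what code looks like once a condition is turned into data instead of control flow. The helper names (mask_from_cond, select_u32, wrap_increment) are invented for this example and do not correspond to any real ARM or PPC intrinsic set.

```c
#include <stdint.h>

/* Build an all-ones mask when cond is non-zero, all-zeros otherwise. */
uint32_t mask_from_cond(int cond)
{
    return (uint32_t)0 - (uint32_t)(cond != 0);
}

/* Pick a when mask is all ones, b when mask is all zeros; no branch needed. */
uint32_t select_u32(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Structured form:  r = (x < limit) ? x + 1 : 0;
 * Mask form: the condition is computed as a value and then applied. */
uint32_t wrap_increment(uint32_t x, uint32_t limit)
{
    uint32_t m = mask_from_cond(x < limit);
    return select_u32(m, x + 1, 0);
}
```

For example, wrap_increment(3, 5) gives 4 and wrap_increment(5, 5) gives 0. It works, but the if block is gone from the source, which is the loss of structure described above.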
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 00:41, deadalnix <deadalnix gmail.com> wrote:

 On 19/06/2012 23:22, Manu wrote:

  If you had the register alias feature I described above, would you be
 able to write such low-level manipulations using intrinsics?
 I think I would be able to rewrite all x86 asm blocks I've ever written
 using that feature.

No, I couldn't. Such code involved stack manipulations that cannot be emulated by such a feature.

Really?
Can you elaborate? Give me an example that couldn't be done with register aliasing and intrinsics?

 Even if it is true, you don't address the actual question. The
 discussion was about the naked functionality, and some argued that naked
 can be useful on ARM, but not on x86. The ARM specifics you mention
 here don't explain that point. (I don't want to comment on PPC as
 I have no experience with it.)

It seemed to me your comment was about inline assembly, not about naked functions:

 I don't see what makes ARM that different with regard to inline assembly
 capabilities.

I was just listing some reasons I think the inline assembler is more useful to ARM + PPC code than it is to x86 code.
Naked is what it is... and it's occasionally useful in very low-level situations.
Jun 19 2012
prev sibling next sibling parent Trass3r <un known.com> writes:
 Please be informed that GCC inline asm supports Intel syntax...

With -masm=intel.
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 01:07, Walter Bright <newshound2 digitalmars.com> wrote:

 On 6/19/2012 1:58 PM, Manu wrote:

  I find a thorough suite of architecture intrinsics are usually the
 fastest and
 cleanest way to the best possible code, although 'naked' may be handy in
 this
 circumstance too...

Do a grep for "naked" across the druntime library sources. For example, its use in druntime/src/rt/alloca.d, where it is very much needed, as alloca() is one of those "magic" functions.

I never argued against naked... I agree it's mandatory.

 Do a grep for "asm" across the druntime library sources. Can you justify
 all of that with some other scheme?

I think almost all the blocks I just browsed through could be easily written with nothing more than the register alias feature I suggested, and perhaps a couple of opcode intrinsics.
And as a bonus, they would also be readable. I can imagine cases where the optimiser would have more freedom too.

  Thinking more about the implications of removing the inline asm, what would
  REALLY rock would be a keyword to insist a variable is represented by a
  register, and by extension, to associate it with a specific register:

This was a failure in C.

Really? This is the missing link between mandatory asm blocks, and being able to do it in high-level code with intrinsics.
The 'register' keyword failed in the same way 'inline' did. __forceinline did not fail; it is actually mandatory. I'd argue that a __forceregister would be similarly useful in C as well, but the real power would come from being able to specify the particular register to alias.

  This would almost entirely eliminate the usefulness of an inline assembler.
  Better yet, this could use the 'new' attribute syntax, which most agree
  will support arguments:
   register(rsp) int x;

Some C compilers did have such pseudo-register abilities. It was a failure in practice.

Really? I've never seen that. What about it failed?

 I really don't understand preferring all these rather convoluted
 enhancements to avoid something simple and straightforward like the inline
 assembler. The use of IA in the D runtime library, for example, has been
 quite successful.

I agree, IA is useful and has been successful, but it has drawbacks too.
  * IA ruins optimisation around the IA block
  * IA doesn't inline well. Intrinsics allow much greater opportunity for efficient integration into the calling context
  * most IA functions are small, and prime candidates for inlining (see points 1 and 2)
  * IA is difficult for the majority of programmers to follow/understand
  * even to experienced programmers, poorly commented asm takes a lot of time to mentally parse

It's a shame that there are IA constructs that can't be expressed any other way. I don't think it would take much to address that.

 For example, consider this bit from druntime/src/rt/lifetime.d:

 ---------------------------------------------------------------------
    auto isshared = ti.classinfo is TypeInfo_Shared.classinfo;
    auto bic = !isshared ? __getBlkInfo((*p).ptr) : null;
    auto info = bic ? *bic : gc_query((*p).ptr);
    auto size = ti.next.tsize();
    version (D_InlineAsm_X86)
    {
        size_t reqsize = void;

        asm
        {
            mov EAX, newcapacity;
            mul EAX, size;
            mov reqsize, EAX;
            jc  Loverflow;
        }
    }
    else
    {
        size_t reqsize = size * newcapacity;

        if (newcapacity > 0 && reqsize / newcapacity != size)
            goto Loverflow;
    }

    // step 2, get the actual "allocated" size.  If the allocated size does not
    // match what we expect, then we will need to reallocate anyways.

    // TODO: this probably isn't correct for shared arrays
    size_t curallocsize = void;
    size_t curcapacity = void;
    size_t offset = void;
    size_t arraypad = void;
 ---------------------------------------------------------------------

This one seems trivial, you just need one intrinsic:

  size_t reqsize = size * newcapacity;
  __jc(&Loverflow);

Although it depends on a '&codeLabel' mechanism to get the label address (GCC supports this in C through its unary && "labels as values" extension; I'd love to see this in D too).
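As a point of comparison, the mul/jc pair in the quoted druntime code has a close intrinsic equivalent in GCC and Clang's C dialect. This is a minimal sketch, assuming a compiler that provides __builtin_mul_overflow (GCC 5 or later, recent Clang). The function names checked_mul_portable and checked_mul_builtin are made up for illustration and are not druntime code, and this is not the hypothetical __jc intrinsic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Portable check, same idea as the non-asm branch quoted above:
 * divide the product back and compare. Returns true on overflow. */
bool checked_mul_portable(size_t size, size_t newcapacity, size_t *reqsize)
{
    *reqsize = size * newcapacity;   /* unsigned multiply wraps on overflow */
    return newcapacity > 0 && *reqsize / newcapacity != size;
}

/* Intrinsic version: the compiler chooses the instructions itself and can
 * still inline and schedule around the call. Returns true on overflow;
 * *reqsize then holds the wrapped result. */
bool checked_mul_builtin(size_t size, size_t newcapacity, size_t *reqsize)
{
    return __builtin_mul_overflow(size, newcapacity, reqsize);
}
```

A caller would write something like `if (checked_mul_builtin(size, newcapacity, &reqsize)) goto Loverflow;`, which keeps the branch in ordinary structured code and needs no label address.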
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 01:50, Alex Rønne Petersen <alex lycus.org> wrote:

 On 19-06-2012 23:22, Manu wrote:

 On 19 June 2012 23:59, deadalnix <deadalnix gmail.com> wrote:

    On 19/06/2012 22:08, Iain Buclaw wrote:

          From what I gathered from further discussion, it made sense for
        embedded platforms, such as ARM, but not x86.


    It has proven to be useful to me, not only for performance reasons,
    but also for low-level manipulations.

    I don't see what makes ARM that different with regard to inline
    assembly capabilities.


 If you had the register alias feature I described above, would you be
 able to write such low-level manipulations using intrinsics?
 I think I would be able to rewrite all x86 asm blocks I've ever written
 using that feature.

 ARM and PPC both have unique features relating to their branch control
 and branch prediction that x86 doesn't have. Sadly, all high level
 languages COMPLETELY overlook such features when designing high level
 expressions, because they are traditionally designed for x86 first.

To be fair, ARM v8/AArch64 has eliminated predicated execution, simply because it turned out that the complexity of writing languages and compilers for it was not worth it, compared to just having good branch prediction.

I suspect it may have been because C didn't have expressions to support it, and D... ;)
Shame though, it's a totally awesome hardware feature.

I don't know of any mass-market arm-v8 devices yet. arm-v7 is still very much alive, and will exist for many years yet.
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 01:37, deadalnix <deadalnix gmail.com> wrote:
 Walter gave you examples. You'll find many others in druntime.

 Here is something I wrote recently that uses this again:
 http://www.deadalnix.me/2012/03/24/get-an-exception-from-a-segfault-on-linux-x86-and-x86_64-using-some-black-magic/

That code could all be done with the register alias I described, and __push/__pop intrinsics.
Jun 19 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 19 June 2012 23:51, Alex Rønne Petersen <alex lycus.org> wrote:
 On 20-06-2012 00:48, Trass3r wrote:
 Please be informed that GCC inline asm supports Intel syntax...

With -masm=intel.

No, you can tell the inline assembler to use Intel syntax from inside code.
 Iain showed me how on IRC at some point, but I forget the specifics.

iirc, it's:

asm {
    ".intel_syntax noprefix"
    /* Intel syntax here */
    ".att_syntax"
}

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 19 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Manu:

 Properly implemented multiple-return-values being the killer 
 app here!
 Using ALL the argument registers for returning multiple values 
 aswell ;)

Well, if D has a specific calling convention, it's better for it to give a lot back.

Regarding the LLVM back-end for Haskell, it has required a new calling convention, used to replace a feature of GCC (missing in D and in LLVM) ("Thankfully GCC offers an extension, 'Global Register Variables', which allows you to assign a global variable to always reside in a specific hardware register"):
http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html

Bye,
bearophile
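On the multiple-return-values point quoted above, existing C ABIs already give a limited version of this. A small sketch, with the struct and function names (divmod_result, divmod_u64) invented for the example: on the x86-64 System V ABI a struct of two 64-bit integers is returned in RAX and RDX rather than through memory. Other ABIs may do it differently, and larger structs generally go through a hidden pointer, which is where a custom convention could give more back.

```c
#include <stdint.h>

/* Two results packed into one by-value return. */
struct divmod_result {
    uint64_t quotient;
    uint64_t remainder;
};

/* Returns both quotient and remainder without an out-pointer.
 * The caller must pass den != 0. */
struct divmod_result divmod_u64(uint64_t num, uint64_t den)
{
    struct divmod_result r = { num / den, num % den };
    return r;
}
```

For instance, divmod_u64(17, 5) yields quotient 3 and remainder 2.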
Jun 19 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 10:42, deadalnix <deadalnix gmail.com> wrote:

 On 19/06/2012 22:58, Manu wrote:
 This would almost entirely eliminate the usefulness of an inline
 assembler.
 Better yet, this could use the 'new' attribute syntax, which most agree
 will support arguments:
  register(rsp) int x;

Choosing registers is something the compiler is better at than us most of the time. For this very reason, I think we want to go in the exact opposite direction: asm with compiler-chosen registers when possible.

...I think you've missed the entire point of my suggestion.
But that's okay. I give up ;)
Jun 20 2012
prev sibling next sibling parent "Tobias Pankrath" <tobias pankrath.net> writes:
 Inline assembly has been relatively useless in GCC for years. Inline asm
 interferes with the optimiser's ability to do a good job, which basically
 makes use of inline assembly self-defeating.
 The only time I ever need to use inline asm is to interface with an arch
 feature that has no API. As long as there are intrinsics for all the
 opcodes one might want, then it's better to use them.

 That said, as stated above, if use of this stuff is for performance, then
 using an inline-asm block will ruin the surrounding code anyway,

Could someone explain to me why inline asm screws up the optimizer? My naive view on the matter is that the optimizer has full knowledge of what is going on regardless of whether intrinsics or asm is used. I could also imagine an optimizer that optimizes inline asm too, for example by reassigning registers.
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 11:14, deadalnix <deadalnix gmail.com> wrote:

 On 20/06/2012 09:58, Manu wrote:

 On 20 June 2012 10:42, deadalnix <deadalnix gmail.com
 <mailto:deadalnix gmail.com>> wrote:

    On 19/06/2012 22:58, Manu wrote:

        This would almost entirely eliminate the usefulness of an inline
        assembler.
        Better yet, this could use the 'new' attribute syntax, which
        most agree
        will support arguments:
         register(rsp) int x;


    Choosing registers is something the compiler is better at than us
    most of the time.

    For this very reason, I think we want to go in the exact opposite
    direction: asm with compiler-chosen registers when possible.


 ...I think you've missed the entire point of my suggestion.
 But that's okay. I give up ;)

We presented you example code where your approach isn't going to do the trick. You are free to ignore them.

No, the entire point of my suggestion IS to allow seamless mixing with conventional code, which includes compiler register assignment. The main problem with IA is its interference with the optimiser, and its inability to make automatic register selection.

Walter claimed push/pop intrinsics wouldn't work due to alignment issues, but I think that's a moot argument, since it's identical to writing your code in asm anyway. If the asm works, then it'll work using an intrinsic exactly the same. The neat bonus is, you can interleave it with structured code, any non-critical variables can be automatically assigned by the compiler as usual... and if the compiler feels comfortable to reorder the code, it can do so.
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 09:32, Tobias Pankrath <tobias pankrath.net> wrote:
 Inline assembly has been relatively useless in GCC for years. Inline asm
 interferes with the optimisers ability to do a good job, which basically
 makes use of inline assembly self-defeating.
 The only time I ever need to use inline-asm is to interface an arch
 feature
 that has no API. As long as there are intrinsics for all the opcodes one
 might want, then it's better to use them.

 That said, as stated above, if use of this stuff is for performance, then
 using an inline-asm block will ruin the surrounding code anyway,

Could someone explain to me, why inline asm screws up the optimizer? My naive view on the matter is, that the optimizer has full knowledge of what is going on regardless of whether intrinsics or asm is used. I could also think of an optimizer that optimizes inline asm, too. For example by reassigning registers etc.

Actually, the compiler has little knowledge of what the assembly does at all, other than the input/output constraints and which registers get clobbered. That is enough for the compiler to know how to avoid stepping on your toes when trying to work around it. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
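
To illustrate the point about constraints, here is a minimal C sketch, assuming GCC or Clang extended asm. The function name opaque_identity is made up for this example and does not come from GDC or druntime. The asm template is empty, so no instruction is emitted, yet the "+r" constraint alone tells the compiler that x is read and rewritten in a register, and it has to treat the result as unknown.

```c
/* Hypothetical helper for illustration; assumes GCC/Clang extended asm. */
int opaque_identity(int x)
{
    /* Empty template: no instruction is emitted. The "+r" constraint says
       x is both an input and an output held in a general register, and
       that is all the compiler knows about this block. */
    __asm__ volatile("" : "+r"(x));
    return x;
}
```

The value comes back unchanged, but the compiler can no longer fold opaque_identity(42) into a constant, because it only sees the constraint and not the (empty) body.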
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 03:58, Walter Bright <newshound2 digitalmars.com> wrote:

    Do a grep for "asm" across the druntime library sources. Can you justify
    all of that with some other scheme?


 I think almost all the blocks I just browsed through could be easily written
 with nothing more than the register alias feature I suggested, and perhaps a
 couple of opcode intrinsics.

But I see nothing gained by that.

The gain is that by not using IA, the compiler could much better optimise and inline your code. Your code is likely more readable by more people. Also, since Iain is proposing removing the inline assembler from GDC, it's clearly hard to maintain across different compilers. A higher level language defined construct may be simpler...

 And as a bonus, they would also be readable.

I don't agree. The point of IA to me is so I can specify exactly what I want. If I wanted to do it at a higher level, I'd use normal D syntax.

In many cases, you need to write a big block of asm to do one single operation that's not expressible at the higher level... and in my experience, most of the time, that operation is addressing a register directly; most commonly, dealing with the stack pointer, or argument registers directly.

 I can imagine cases where the
 optimiser would have more freedom too.

But if I'm writing IA, I want to do it my way. Not the optimizer's way, which may or may not be able to give me what I want.

I think you typically want to do one very small detail your way, and let the optimiser make the best of the rest of the function. The result is very much comparable to the use of intrinsics in high level code.

 Yes. C has a register keyword, and nobody uses it anymore. The troubles are
 many, starting with people always "register"ed the wrong variables, and it
 really didn't work out too well when compilers started doing live range
 register assignments. It's ignored by modern C compilers, and hasn't been
 carried forward into other languages.

You miss the point of the suggestion; as a mechanism to directly address particular registers in high level code, allowing you to eliminate many small asm blocks. C's failing is unrelated, the goal was totally different.

 Really? I've never seen that. What about it was fail?

It's actually in DMC, believe it or not. It was a giant failure because nobody used it. It was in Borland's TurboC, too. It pretty much just throws a wrench into the gears of more sophisticated code generators.

I'm not surprised nobody used it in a niche compiler like DMC, especially when it's not supported by major compilers like GCC or MSC... It's not a feature of C, so most people wouldn't ever consider it, or even realise it's possible.

Of course it throws a gear in the works, it's a reasonably complex feature, but IA blocks themselves throw an equally large (and rather similar) gear in the works. The most naive implementation could probably do precisely what IA does, that is, to stop reordering across the IA block. That should be just as safe when using intrinsics or explicit register aliasing as it is with inline asm. And that's only a start, I think the compiler could do better with time. The compiler doesn't have much opportunity for improvement with IA, unless the compiler attempts to understand the IA block, which is in a totally different language, and architecture specific. Well defined high-level constructs help the compiler with the understanding it needs to do a good/safe job. It's the same logic that supports opcode intrinsics, which became almost universally preferred to IA in appropriate situations, and are an undeniable success.

    I really don't understand preferring all these rather convoluted
    enhancements to avoid something simple and straightforward like the inline
    assembler. The use of IA in the D runtime library, for example, has been
    quite successful.


 I agree, IA is useful and has been successful, but it has drawbacks too.
   * IA ruins optimisation around the IA block

dmd's optimizer is not so sensitive to that.

How can you safely reorder across an IA block? Is there a well defined mechanism to determine it's safe? GCC has been failing at that forever. It takes a very conservative approach. I guess the main problem is because GCC doesn't attempt to understand the asm block, it just pastes it in the output.

 This one seems trivial, you just need one intrinsic:
   size_t reqsize = size * newcapacity;
   __jc(&Loverflow);

That's highly risky. The optimizer knows nothing at all about the state of the flags register, and does not take into account a dependency on the C flag when doing code motion. Nor would the compiler guarantee that the C flag is even set by however it chose to do the previous multiply (for example, the LEA instruction is often used to do multiplies, which leaves the C flag untouched. Oops!). Nothing connects the __jc intrinsic to that multiply operation.

True, but you could also perform the multiply explicitly with another intrinsic. This reordering problem is perhaps the most difficult issue, but not necessarily insurmountable. And it's only really relevant where explicit interaction with the flags is involved. I suspect it wouldn't be too much trouble to make that intrinsic encode some information that fuses it with the preceding operation as written in the source. Alternatively use a __noreorder {} scope block or something surrounding the mul and jc. Another possibility might be to make the intrinsic combine both operations as a compound:
  if(__mul_getc(T a, T b, ref in T res)) goto blah; // <- eliminates the need to take the address of a label
There are lots of different approaches, I'm sure an elegant solution is possible.

 Although it depends on a '&codeLabel' mechanism to get the label address (GCC
 supports this in C, I'd love to see this in D too).

Note that supporting such will wind up disabling a lot of the data flow analysis, which is not set up to handle unknown edges between basic blocks.

No doubt, but it only affects code where that operation appears, which would be rather rare.

 To summarize, I see a lot of complex new features, a significant rewrite of
 the optimizer, and a rewrite of a lot of existing code, and at the end of
 all that we're pretty much at the same state we are at now.

I agree, it's not trivial. It was just something to think about. It's not quite the same place. The examples that have come up here are relatively trivial, so it doesn't add so much to those. It would add an awful lot to larger uses of asm, where it's really nice to be able to mix the explicit pseudo-asm code with regular automatic register assignments, and use of standard control structures (if/for/etc).
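
The compound multiply-and-test idea in this post (the hypothetical __mul_getc) can be sketched in C, assuming GCC 5 or later, or Clang, where __builtin_mul_overflow is available. The wrapper name checked_reqsize is invented for this example; it is not a D or druntime API, and the builtin is only a stand-in for the intrinsic proposed in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper. __builtin_mul_overflow is a GCC/Clang builtin,
   used here as a stand-in for the __mul_getc idea in the post. */
bool checked_reqsize(size_t size, size_t newcapacity, size_t *reqsize)
{
    /* The multiply and its overflow test are a single operation, so the
       optimizer cannot move code between them or pick an instruction
       that loses the overflow information.
       Returns true when the product fits in size_t. */
    return !__builtin_mul_overflow(size, newcapacity, reqsize);
}
```

Because the result and the overflow indication come out of one call, there is no separate flag-reading step for the compiler to reorder, and no label address is needed.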
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 11:32, Tobias Pankrath <tobias pankrath.net> wrote:

 Inline assembly has been relatively useless in GCC for years. Inline asm
 interferes with the optimisers ability to do a good job, which basically
 makes use of inline assembly self-defeating.
 The only time I ever need to use inline-asm is to interface an arch
 feature
 that has no API. As long as there are intrinsics for all the opcodes one
 might want, then it's better to use them.

 That said, as stated above, if use of this stuff is for performance, then
 using an inline-asm block will ruin the surrounding code anyway,

Could someone explain to me, why inline asm screws up the optimizer? My naive view on the matter is, that the optimizer has full knowledge of what is going on regardless of whether intrinsics or asm is used. I could also think of an optimizer that optimizes inline asm, too. For example by reassigning registers etc.

It's because the compiler doesn't understand assembly code. It has no knowledge of what it actually does, and as a result, just treats it as a black box. Since it has no idea what it does, and doesn't know how it may or may not relate to the surrounding code, the compiler conservatively preserves the order of operations on either side of the asm block for safety. Worse, the asm block may write to memory, which potentially invalidates the state of values resident in registers. Most compilers will force a store and reload of non-local variables on either side of the asm block.

This is the main reason opcode intrinsics became popular rather than using the IA, particularly for things like maths/simd/etc, where use of asm is typically for optimisation. You can't use SSE code within an IA block as an optimisation if your use of IA itself causes optimisation to fail in the surrounding code. Usage of IA blocks in most cases of that type will result in slower code.
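
The store-and-reload effect described above can be seen in a small C sketch, assuming GCC or Clang extended asm. The names counter and bump_twice are made up for this example. The asm block is empty, but its "memory" clobber tells the compiler the block may touch any memory, so the global is typically written back before the block and read again after it.

```c
/* Hypothetical example; assumes GCC/Clang extended asm. */
int counter;

int bump_twice(void)
{
    counter += 1;
    /* Empty asm with a "memory" clobber: the compiler must assume the
       block may read or write any memory, so counter cannot stay cached
       in a register across this point. */
    __asm__ volatile("" ::: "memory");
    counter += 1;
    return counter;
}
```

Without the asm statement an optimizing compiler would usually merge the two increments into a single add of 2; with it, the two updates are generally kept separate, which is the kind of lost optimisation the post refers to.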
Jun 20 2012
prev sibling next sibling parent reply Don Clugston <dac nospam.com> writes:
On 19/06/12 20:19, Iain Buclaw wrote:
 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on
 / the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other
 comments on why we need middle-end and backend headers in gdc.

You seem to be conflating a couple of unrelated issues here. One is the calling convention. The other is inline asm.

Comments in the thread about "asm is mostly used for short things which get inlined" leave me completely baffled, as it is completely wrong.

There are two uses for asm, and they are very different:
(1) Functionality. This happens when there are gaps in the language, and you get an abstraction inversion. You can address these with intrinsics.
(2) Speed. High-speed, all-asm functions. These _always_ include a loop.

You seem to be focusing on (1), but case (2) is completely different.

Case (2) cannot be replaced with intrinsics. For example, you can't write asm code using MSVC intrinsics (because the compiler rewrites your code). Currently, D is the best way to write (2). It is much, much better than an external assembler.
Jun 20 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-06-20 12:51, Don Clugston wrote:

 You seem to be conflating a couple of unrelated issues here.
 One is the calling convention. The other is inline asm.

 Comments in the thread about "asm is mostly used for short things which
 get inlined" leave me completely baffled, as it is completely wrong.

 There are two uses for asm, and they are very different:
 (1) Functionality. This happens when there are gaps in the language, and
 you get an abstraction inversion. You can address these with intrinsics.
 (2) Speed. High-speed, all-asm functions. These _always_ include a loop.


 You seem to be focusing on (1), but case (2) is completely different.

 Case (2) cannot be replaced with intrinsics. For example, you can't
 write asm code using MSVC intrinsics (because the compiler rewrites your
 code).
 Currently, D is the best way to write (2). It is much, much better than
 an external assembler.

You do understand that the GCC-style inline assembly will still be available? -- /Jacob Carlborg
Jun 20 2012
next sibling parent Alex Rønne Petersen <alex lycus.org> writes:
On 20-06-2012 18:08, Jonathan M Davis wrote:
 On Wednesday, June 20, 2012 13:33:53 Jacob Carlborg wrote:
 You do understand that the GCC-style inline assembly will still be
 available?

But inline assembler with the syntax that dmd uses is supposed to be part of the language. So, if gdc doesn't support it, it's not a fully compliant D compiler. It would be like if gdc didn't do auto a = expression; but instead did expression = a auto; except that the problem is more localized, because inline assembly is rather rare (unlike variable declarations). So, this is a _huge_ deal. - Jonathan M Davis

In practice, no it isn't. Do you really think all C/C++ compilers are truly standard compliant in every single aspect of the standard, for instance? And besides, how many of D's users actually write inline assembly in the first place? In reality, I don't think removing inline assembly support from GDC is going to be as problematic as you make it sound, especially when GDC does provide its own syntax based on the very well-established GCC syntax. And I think the comparison you offer is very exaggerated. Besides, the D spec has always been incredibly x86-centric, something I've been screaming about for a long time now (see my rants on shared). Making it less x86-centric is a *good* thing IMHO. Implementing a D compiler shouldn't require implementing an inline assembler for x86. It just doesn't make any sense, as much as it is neat to have a standard inline assembler. Actually, why would we even have the inline assembly version identifiers if compilers weren't allowed to omit inline assembly syntax? And let's not forget interpreters, JITs, ... -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 20 2012
prev sibling parent reply =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 20-06-2012 21:08, Joseph Rushton Wakeling wrote:
 On 20/06/12 18:10, David Nadlinger wrote:
 I am not too sure about that: In my opinion, your description of the
 problem
 would be accurate if some compiler implemented asm {}, but with a
 different
 syntax or different semantics. But GDC simply does not (resp. will not)
 implement D-style inline assembly at all. From my point of view, this
 is not
 necessarily a problem spec-wise, as it is not guaranteed to be
 available – if it
 was, there would be no reason to have D_InlineAsm_X86 at all.

Reading http://dlang.org/iasm.html I don't have the impression that the inline assembler is an optional part of the D spec or not guaranteed to be available -- it's very deliberately intended to be there.
 Needless to say, inline assembly is sometimes a very convenient
 feature to have,
 but if it is the only issue stopping GDC from being merged to mainline
 GCC, I'd
 say the only sensible choice is to yank it, at least it for the time
 being. If,
 at a later point, somebody comes up with a clever way to implement it
 given the
 constraints imposed by the GCC infrastructure, or manages to convince
 the GCC
 maintainers to accept the »dirty« solution, it could still be added in
 again.

For sure it makes sense as a short-term compromise, but I don't see how GDC can meet the D specifications without implementing the inline assembler at some point in the (hopefully near) future. When you consider that GDC is the best bet for being able to compile D on ARM processors, and a major application here is embedded systems, it really seems necessary to plan to have this functionality in there.

And x86 inline assembler... on ARM? I don't think I follow. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 20 2012
parent =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 20-06-2012 21:48, Joseph Rushton Wakeling wrote:
 On 20/06/12 20:35, Alex Rønne Petersen wrote:
 And x86 inline assembler... on ARM? I don't think I follow.

If I understand http://dlang.org/iasm.html correctly, the idea is that D should have an inline assembler for each target architecture. AFAICS what's desired is that you should be able to insert

    asm
    {
        // target-specific assembly goes here
    }

.... and have it accepted by _any_ D compiler. That seems to me to be an important part of the language in general and even more so on architectures that are suited to embedded systems. So while it may make sense to cut the inline assembly in the short term for GDC, it doesn't make sense to me for it to be a change that lasts.

GDC currently supports x86, ARM, PowerPC, MIPS, SPARC, and possibly others. The language reference lists assembly syntax for x86. I understand that in an ideal world, we'd have standardized assembly syntaxes for all of these architectures, but somebody has to actually spec and implement them. Besides, Iain has already pointed out that the x86 syntax in the spec doesn't integrate with GCC's inline assembly support at all (which is why GDC had the glue code for it). It took around 2000 lines (if memory serves) to translate the D inline assembly to GCC inline assembly. Now imagine having to do this for every architecture ever supported. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 20 2012
prev sibling next sibling parent reply Don Clugston <dac nospam.com> writes:
On 20/06/12 13:04, Manu wrote:
 On 20 June 2012 13:51, Don Clugston <dac nospam.com> wrote:

     On 19/06/12 20:19, Iain Buclaw wrote:

         Hi,

         Had round one of the code review process, so I'm going to post
         the main
         issues here that most affect D users / the platforms they want
         to run on
         / the compiler version they want to use.



         1) D Inline Asm and naked function support is raising far too
         many alarm
         bells. So would just be easier to remove it and avoid all the other
         comments on why we need middle-end and backend headers in gdc.


     You seem to be conflating a couple of unrelated issues here.
     One is the calling convention. The other is inline asm.

     Comments in the thread about "asm is mostly used for short things
     which get inlined" leave me completely baffled, as it is completely
     wrong.

     There are two uses for asm, and they are very different:
     (1) Functionality. This happens when there are gaps in the language,
     and you get an abstraction inversion. You can address these with
     intrinsics.
     (2) Speed. High-speed, all-asm functions. These _always_ include a loop.


     You seem to be focusing on (1), but case (2) is completely different.

     Case (2) cannot be replaced with intrinsics. For example, you can't
     write asm code using MSVC intrinsics (because the compiler rewrites
     your code).
     Currently, D is the best way to write (2). It is much, much better
     than an external assembler.


 Case 1 has no alternative to inline asm. I've thrown out some crazy
 ideas to think about (but nobody seems to like them). I still think it
 could be addressed though.

 Case 2; I'm not convinced. These such long functions are the type I'm
 generally interested in aswell, and have the most experience with. But
 in my experience, they're almost always best written with intrinsics.
 If they're small enough to be inlined, then you can't afford not to use
 intrinsics. If they are truly big functions, then you begin to sacrifice
 readability and maintain-ability, and certainly limit the number of
 programmers that can maintain the code.

I don't agree with that. In the situations I'm used to, using intrinsics would not make it easier to read, and would definitely not make it easier to maintain. I find it inconceivable that somebody could understand the processor well enough to maintain the code, and yet not understand asm.
 I rarely fail to produce identical code with intrinsics to that which I
 would write with hand written asm. The flags are always the biggest
 challenge, as discussed prior in this thread. I think that could be
 addressed with better intrinsics.

Again, look at std.internal.math.BiguintX86. There are many cases there where you can swap two instructions, and the code will still produce the correct result, but it will be 30% slower. I think that the SIMD case gives you a misleading impression, because on x86 they are very easy to schedule (they nearly all take the same number of cycles, etc). So it's not hard for the compiler to do a good job of it.
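
For readers who haven't looked at that module, the routines in question are multi-word arithmetic loops. The following is a portable sketch of the simplest one, written in plain D rather than asm. It is not the Phobos code: the name addWords and the 32-bit word size are assumptions made for illustration. The carry forms a serial dependency from one iteration to the next, which is the sort of chain the hand-written versions are tuned around:

```d
/// Hypothetical helper: dest[] = a[] + b[] word by word.
/// Returns the carry out of the most significant word (0 or 1).
uint addWords(uint[] dest, const(uint)[] a, const(uint)[] b) pure nothrow
{
    assert(dest.length == a.length && a.length == b.length);
    ulong carry = 0;
    foreach (i; 0 .. dest.length)
    {
        // Widen to 64 bits so the carry out of the 32-bit add lands in bit 32.
        immutable ulong sum = cast(ulong) a[i] + b[i] + carry;
        dest[i] = cast(uint) sum; // keep the low 32 bits
        carry = sum >> 32;        // feed the carry into the next word
    }
    return cast(uint) carry;
}

unittest
{
    // Least significant word first: 0x1_FFFF_FFFF + 1 == 0x2_0000_0000.
    uint[2] a = [0xFFFF_FFFF, 1];
    uint[2] b = [1, 0];
    uint[2] r;
    assert(addWords(r[], a[], b[]) == 0);
    assert(r[0] == 0 && r[1] == 2);
}
```

In asm the same loop would use an add-with-carry instruction and keep the carry in the flags register, which plain D has no direct way to express.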
Jun 20 2012
next sibling parent deadalnix <deadalnix gmail.com> writes:
Le 20/06/2012 14:51, Manu a écrit :
 On 20 June 2012 14:44, Don Clugston <dac nospam.com> wrote:

     On 20/06/12 13:04, Manu wrote:

         On 20 June 2012 13:51, Don Clugston <dac nospam.com> wrote:

             On 19/06/12 20:19, Iain Buclaw wrote:

                 Hi,

                 Had round one of the code review process, so I'm going
         to post
                 the main
                 issues here that most affect D users / the platforms
         they want
                 to run on
                 / the compiler version they want to use.



                 1) D Inline Asm and naked function support is raising
         far too
                 many alarm
                 bells. So would just be easier to remove it and avoid
         all the other
                 comments on why we need middle-end and backend headers
         in gdc.


             You seem to be conflating a couple of unrelated issues here.
             One is the calling convention. The other is inline asm.

             Comments in the thread about "asm is mostly used for short
         things
             which get inlined" leave me completely baffled, as it is
         completely
             wrong.

             There are two uses for asm, and they are very different:
             (1) Functionality. This happens when there are gaps in the
         language,
             and you get an abstraction inversion. You can address these with
             intrinsics.
             (2) Speed. High-speed, all-asm functions. These _always_
         include a loop.


             You seem to be focusing on (1), but case (2) is completely
         different.

             Case (2) cannot be replaced with intrinsics. For example,
         you can't
             write asm code using MSVC intrinsics (because the compiler
         rewrites
             your code).
             Currently, D is the best way to write (2). It is much, much
         better
             than an external assembler.


         Case 1 has no alternative to inline asm. I've thrown out some crazy
         ideas to think about (but nobody seems to like them). I still
         think it
         could be addressed though.

         Case 2; I'm not convinced. These such long functions are the
         type I'm
         generally interested in aswell, and have the most experience
         with. But
         in my experience, they're almost always best written with
         intrinsics.
         If they're small enough to be inlined, then you can't afford not
         to use
         intrinsics. If they are truly big functions, then you begin to
         sacrifice
         readability and maintain-ability, and certainly limit the number of
         programmers that can maintain the code.


     I don't agree with that. In the situations I'm used to, using
     intrinsics would not make it easier to read, and would definitely
     not make it easier to maintain. I find it inconceivable that
     somebody could understand the processor well enough to maintain the
     code, and yet not understand asm.


 These functions of yours are 100% asm, that's not really what I would
 usually call 'inline asm'. That's really just 'asm' :)

You are being picky here. Yes, this is 100% asm. But still, 100% asm is inline asm. It is asm within D code.
Jun 20 2012
prev sibling parent Don Clugston <dac nospam.com> writes:
On 20/06/12 14:51, Manu wrote:
 On 20 June 2012 14:44, Don Clugston <dac nospam.com> wrote:

     On 20/06/12 13:04, Manu wrote:

         On 20 June 2012 13:51, Don Clugston <dac nospam.com> wrote:

             On 19/06/12 20:19, Iain Buclaw wrote:

                 Hi,

                 Had round one of the code review process, so I'm going
         to post
                 the main
                 issues here that most affect D users / the platforms
         they want
                 to run on
                 / the compiler version they want to use.



                 1) D Inline Asm and naked function support is raising
         far too
                 many alarm
                 bells. So would just be easier to remove it and avoid
         all the other
                 comments on why we need middle-end and backend headers
         in gdc.


             You seem to be conflating a couple of unrelated issues here.
             One is the calling convention. The other is inline asm.

             Comments in the thread about "asm is mostly used for short
         things
             which get inlined" leave me completely baffled, as it is
         completely
             wrong.

             There are two uses for asm, and they are very different:
             (1) Functionality. This happens when there are gaps in the
         language,
             and you get an abstraction inversion. You can address these with
             intrinsics.
             (2) Speed. High-speed, all-asm functions. These _always_
         include a loop.


             You seem to be focusing on (1), but case (2) is completely
         different.

             Case (2) cannot be replaced with intrinsics. For example,
         you can't
             write asm code using MSVC intrinsics (because the compiler
         rewrites
             your code).
             Currently, D is the best way to write (2). It is much, much
         better
             than an external assembler.


         Case 1 has no alternative to inline asm. I've thrown out some crazy
         ideas to think about (but nobody seems to like them). I still
         think it
         could be addressed though.

         Case 2; I'm not convinced. These such long functions are the
         type I'm
         generally interested in aswell, and have the most experience
         with. But
         in my experience, they're almost always best written with
         intrinsics.
         If they're small enough to be inlined, then you can't afford not
         to use
         intrinsics. If they are truly big functions, then you begin to
         sacrifice
         readability and maintain-ability, and certainly limit the number of
         programmers that can maintain the code.


     I don't agree with that. In the situations I'm used to, using
     intrinsics would not make it easier to read, and would definitely
     not make it easier to maintain. I find it inconceivable that
     somebody could understand the processor well enough to maintain the
     code, and yet not understand asm.


 These functions of yours are 100% asm, that's not really what I would
 usually call 'inline asm'. That's really just 'asm' :)
 I think you've just illustrated one of my key points actually; that is
 that you can't just insert small inline asm blocks within regular code,
 the optimiser can't deal with it in most cases, so inevitably, the
 entire function becomes asm from start to end.

Personally I call it "inline asm" if I don't need to use a separate assembler. If you're using a different definition, then we don't actually disagree.
 I find I can typically produce equivalent code using carefully crafted
 intrinsics within regular C language structures. Also, often enough, the
 code outside the hot loop can be written in normal C for readability,
 since it barely affects performance, and trivial setup code will usually
 optimise perfectly anyway.

 You're correct that a person 'maintaining' such code, who doesn't have
 such a thorough understanding of the codegen may ruin it's perfectly
 tuned efficiency. This may be the case, but in a commercial coding
 environment, where a build MUST be delivered yesterday, the guy that
 understands it is on holiday, and you need to tweak the behaviour
 immediately, this is a much safer position to be in.
 This is a very real scenario. I can't afford to ignore this practical
 reality.

OK, it sounds like your use case is a bit different. The kinds of things I deal with are
 I might have a go at compiling the regular D code tonight, and seeing if
 I can produce identical assembly. I haven't tried this so much with x86
 as I have with RISC architectures, which have much more predictable codegen.


         I rarely fail to produce identical code with intrinsics to that
         which I
         would write with hand written asm. The flags are always the biggest
         challenge, as discussed prior in this thread. I think that could be
         addressed with better intrinsics.


     Again, look at std.internal.math.BiguintX86. There are many cases
     there where you can swap two instructions, and the code will still
     produce the correct result, but it will be 30% slower.


 But that's precisely the sort of thing optimisers/schedulers are best
 at. Can you point at a particular example where that is the case, that
 the scheduler would get it wrong if left to its own ordering algorithm?
 The opcode tables should have thorough information about the opcode
 timings and latencies.

I don't know. I can just tell you that they don't get it right. I suspect they don't take all of the bottlenecks into account. For x86 I think the primary difficulty is that you cannot do it in independent passes. Eg, you won't find a register contention bottleneck until you've assigned registers, and the only way to get rid of it is to change the instructions you're using. Which involves backtracking through several passes. Very messy.
 The only thing that I find usually trips it up is
 not having knowledge of the probability of the data being in nearby
 cache. If it has 2 loads, and one is less likely to be in cache, it
 should be scheduled earlier.

Yes, that's definitely true.
 As a side question, x86 architectures perform wildly differently from
 each other. How do you reliably say some block of hand written x86 code
 is the best possible code on all available processors?
 Do you just benchmark on a suite of common processors available at the
 time? I can imagine the opcode timing tables, which are presumably
 rather different for every cpu, could easily feed wrong data to the
 codegen...

Yes. You can fairly easily determine a theoretical limit for a piece of code, and if you've reached that, you're optimal. It's not possible to be simultaneously optimal on Pentium4 and something else, but my experience is that code optimized for PPro-series Intel machines is usually near-optimal on AMD. (The reverse is not true, it's much easier to be optimal on AMD).
     I think that the SIMD case gives you a misleading impression,
     because on x86 they are very easy to schedule (they nearly all take
     the same number of cycles, etc). So it's not hard for the compiler
     to do a good job of it.


 True, but it's one of the most common usage scenarios, so it can't be
 ignored. Some other case studies I feel close to are hardware emulation,
 software rasterisation, particles, fluid dynamics, rigid body dynamics,
 FFT's, and audio signal processing. In each, the only time I rarely need
 inline asm, usually only when there is a hole in the high level
 language, as you said earlier. I find this typically surfaces when
 needing to interact with the flags regs directly.

I agree with that. I think the need for asm in those cases could be greatly reduced. I'm just saying that there are cases where eliminating asm is not realistic.
Jun 20 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
Le 20/06/2012 13:04, Manu a écrit :
 Case 1 has no alternative to inline asm. I've thrown out some crazy
 ideas to think about (but nobody seems to like them). I still think it
 could be addressed though.

 Case 2; I'm not convinced. These such long functions are the type I'm
 generally interested in aswell, and have the most experience with. But
 in my experience, they're almost always best written with intrinsics.
 If they're small enough to be inlined, then you can't afford not to use
 intrinsics. If they are truly big functions, then you begin to sacrifice
 readability and maintain-ability, and certainly limit the number of
 programmers that can maintain the code.
 I rarely fail to produce identical code with intrinsics to that which I
 would write with hand written asm. The flags are always the biggest
 challenge, as discussed prior in this thread. I think that could be
 addressed with better intrinsics.

I'm sorry, but what you say is rather ignorant. Not that it is wrong, but it only covers YOUR usage of inline asm. You are talking about performance, but many other usages of assembly code are very useful, valid, and cannot be replaced by intrinsics. druntime is full of that; Walter and I presented you pieces of code specifically. None of that could have been done without 100% asm functions. It is clear, however, that the compiler should get a better understanding of asm.
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 13:51, Don Clugston <dac nospam.com> wrote:

 On 19/06/12 20:19, Iain Buclaw wrote:

 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on
 / the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other
 comments on why we need middle-end and backend headers in gdc.

You seem to be conflating a couple of unrelated issues here. One is the calling convention. The other is inline asm. Comments in the thread about "asm is mostly used for short things which get inlined" leave me completely baffled, as it is completely wrong. There are two uses for asm, and they are very different: (1) Functionality. This happens when there are gaps in the language, and you get an abstraction inversion. You can address these with intrinsics. (2) Speed. High-speed, all-asm functions. These _always_ include a loop. You seem to be focusing on (1), but case (2) is completely different. Case (2) cannot be replaced with intrinsics. For example, you can't write asm code using MSVC intrinsics (because the compiler rewrites your code). Currently, D is the best way to write (2). It is much, much better than an external assembler.

Case 1 has no alternative to inline asm. I've thrown out some crazy ideas to think about (but nobody seems to like them). I still think it could be addressed though.

Case 2; I'm not convinced. Such long functions are the type I'm generally interested in as well, and have the most experience with. But in my experience, they're almost always best written with intrinsics. If they're small enough to be inlined, then you can't afford not to use intrinsics. If they are truly big functions, then you begin to sacrifice readability and maintainability, and certainly limit the number of programmers that can maintain the code.

I rarely fail to produce identical code with intrinsics to that which I would write with hand-written asm. The flags are always the biggest challenge, as discussed prior in this thread. I think that could be addressed with better intrinsics.
Jun 20 2012
prev sibling next sibling parent "Tobias Pankrath" <tobias pankrath.net> writes:
 It's because the compiler doesn't understand assembly code. It 
 has no
 knowledge of what it actually does, and as a result, just 
 treats it as a
 black box.

But this is not set in stone. If I teach a compiler how to optimize intrinsics, can't I teach him to understand and optimize a (maybe small) subset of assembler, too? This must happen in the backend anyway, since intrinsics are platform-dependent, no?
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 13:59, Don Clugston <dac nospam.com> wrote:

 You and I seem to be from different planets. I have almost never written
 an asm function which was suitable for inlining.

 Take a look at std.internal.math.biguintX86.d

 I do not know how to write that code without inline asm.

Interesting. I wish I could paste some counter-examples, but they're all proprietary >_<

I think the key detail here is where you stated, they _always_ include a loop. Is this because it's hard to manipulate the compiler into the correct interaction with the flags register? I'd be interested to compare the compiled D code, and your hand-written asm code, to see where exactly the optimiser goes wrong. It doesn't look like you're exploiting too many tricks (at a brief glance), it's just nice tight hand-written code, which the optimiser should theoretically be able to get right...

I find optimisers are very good at code simplification, assuming that you massage the code/expressions to neatly match any architectural quirks. I also appreciate that good x86 code is possibly the hardest architecture for an optimiser to get right...
Jun 20 2012
prev sibling next sibling parent "Bernard Helyer" <b.helyer gmail.com> writes:
On Wednesday, 20 June 2012 at 02:35:10 UTC, Walter Bright wrote:
 On 6/19/2012 6:06 PM, Alex Rønne Petersen wrote:
 On 20-06-2012 03:01, Walter Bright wrote:
 On 6/19/2012 3:47 PM, Alex Rønne Petersen wrote:
 On 19-06-2012 23:52, Walter Bright wrote:
 GDC can certainly define its D calling convention to match 
 GCC's. It's
 an "implementation defined" thing, not a language defined 
 one.

calling it the D ABI and making it look like it's part of the language on the website.

The ABI is not part of the language. For example, the C Standard says nothing whatsoever about the C ABI.

Then it's very misleading that it's under the language reference area of the website and calls it the "D ABI" and not the "DMD ABI". This might have been fine back when there was only DMD, but it really needs to be made clear that this is not an ABI that compilers are required to follow.

You're probably right.

He's definitely right. To have the mangling rules on the same page as the ABI and then act confused when people think it's part of the language? I was sputtering with rage. Sputtering!
Jun 20 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Jacob Carlborg:

 You do understand that the GCC-style inline assembly will still 
 be available?

Are DMD and LDC2 going to accept that GCC-style inline assembly? Bye, bearophile
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 14:19, Tobias Pankrath <tobias pankrath.net> wrote:

 It's because the compiler doesn't understand assembly code. It has no
 knowledge of what it actually does, and as a result, just treats it as a
 black box.

But this is not set in stone. If I teach a compiler how to optimize intrinsics, can't I teach him to understand and optimize a (maybe small) subset of assembler, too? This must happen in the backend anyway, since intrinsics are platform-dependent, no?

It's MUCH easier with intrinsics. Teaching it to understand assembly involves learning a foreign language, and also requires the _compiler_ to understand and predict the product of the codegen step. The compiler is usually completely separated from the codegen; it doesn't understand the architecture it targets. But using the knowledge supplied by the intrinsic API, it can do the optimisations it needs to in the usual way.

Declare the intrinsic as pure/nothrow, declare its arguments as in/scope/const/etc, and it now knows a lot more about what to expect from the magic code beneath the intrinsic, so it can safely perform regular optimisation around it. Also, the codegen can use standard register assignment, which is important, and integrate nicely with regular program control structure.
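To make the intrinsic argument concrete, here is a minimal sketch in C, assuming GCC or Clang. `__builtin_popcount` is a real builtin in both compilers; the function name `count_set_bits` is made up for illustration. Because the compiler knows the builtin has no side effects, it remains free to choose registers and to optimise the surrounding loop, which it could not safely do around an opaque asm block.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits across a buffer of 32-bit words.
   count_set_bits is a hypothetical name; __builtin_popcount is the
   GCC/Clang builtin, which the compiler treats as side-effect free. */
static unsigned count_set_bits(const uint32_t *words, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; ++i) {
        /* No registers are named here: allocation is left to the compiler. */
        total += (unsigned)__builtin_popcount(words[i]);
    }
    return total;
}
```

Whether a hardware population-count instruction is actually emitted depends on the target and the compiler flags in use.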
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 14:44, Don Clugston <dac nospam.com> wrote:

 On 20/06/12 13:04, Manu wrote:

 On 20 June 2012 13:51, Don Clugston <dac nospam.com> wrote:

    On 19/06/12 20:19, Iain Buclaw wrote:

        Hi,

        Had round one of the code review process, so I'm going to post
        the main
        issues here that most affect D users / the platforms they want
        to run on
        / the compiler version they want to use.



        1) D Inline Asm and naked function support is raising far too
        many alarm
        bells. So would just be easier to remove it and avoid all the other
        comments on why we need middle-end and backend headers in gdc.


    You seem to be conflating a couple of unrelated issues here.
    One is the calling convention. The other is inline asm.

    Comments in the thread about "asm is mostly used for short things
    which get inlined" leave me completely baffled, as it is completely
    wrong.

    There are two uses for asm, and they are very different:
    (1) Functionality. This happens when there are gaps in the language,
    and you get an abstraction inversion. You can address these with
    intrinsics.
    (2) Speed. High-speed, all-asm functions. These _always_ include a
    loop.


    You seem to be focusing on (1), but case (2) is completely different.

    Case (2) cannot be replaced with intrinsics. For example, you can't
    write asm code using MSVC intrinsics (because the compiler rewrites
    your code).
    Currently, D is the best way to write (2). It is much, much better
    than an external assembler.


 Case 1 has no alternative to inline asm. I've thrown out some crazy
 ideas to think about (but nobody seems to like them). I still think it
 could be addressed though.

 Case 2; I'm not convinced. These such long functions are the type I'm
 generally interested in aswell, and have the most experience with. But
 in my experience, they're almost always best written with intrinsics.
 If they're small enough to be inlined, then you can't afford not to use
 intrinsics. If they are truly big functions, then you begin to sacrifice
 readability and maintain-ability, and certainly limit the number of
 programmers that can maintain the code.

I don't agree with that. In the situations I'm used to, using intrinsics would not make it easier to read, and would definitely not make it easier to maintain. I find it inconceivable that somebody could understand the processor well enough to maintain the code, and yet not understand asm.

These functions of yours are 100% asm; that's not really what I would usually call 'inline asm'. That's really just 'asm' :)
I think you've just illustrated one of my key points actually: you can't just insert small inline asm blocks within regular code, because the optimiser can't deal with them in most cases, so inevitably the entire function becomes asm from start to end.

I find I can typically produce equivalent code using carefully crafted intrinsics within regular C language structures. Also, often enough, the code outside the hot loop can be written in normal C for readability, since it barely affects performance, and trivial setup code will usually optimise perfectly anyway.

You're correct that a person 'maintaining' such code who doesn't have such a thorough understanding of the codegen may ruin its perfectly tuned efficiency. That may be the case, but in a commercial coding environment, where a build MUST be delivered yesterday, the guy that understands it is on holiday, and you need to tweak the behaviour immediately, this is a much safer position to be in.
This is a very real scenario. I can't afford to ignore this practical reality.

I might have a go at compiling the regular D code tonight, and seeing if I can produce identical assembly. I haven't tried this so much with x86 as I have with RISC architectures, which have much more predictable codegen.

 I rarely fail to produce identical code with intrinsics to that which I
 would write with hand written asm. The flags are always the biggest
 challenge, as discussed prior in this thread. I think that could be
 addressed with better intrinsics.

Again, look at std.internal.math.BiguintX86. There are many cases there where you can swap two instructions, and the code will still produce the correct result, but it will be 30% slower.

But that's precisely the sort of thing optimisers/schedulers are best at. Can you point at a particular example where that is the case, that the scheduler would get it wrong if left to its own ordering algorithm?
The opcode tables should have thorough information about the opcode timings and latencies. The only thing that I find usually trips it up is not having knowledge of the probability of the data being in nearby cache. If it has 2 loads, and one is less likely to be in cache, it should be scheduled earlier.

As a side question, x86 architectures perform wildly differently from each other. How do you reliably say some block of hand written x86 code is the best possible code on all available processors?
Do you just benchmark on a suite of common processors available at the time? I can imagine the opcode timing tables, which are presumably rather different for every cpu, could easily feed wrong data to the codegen...

 I think that the SIMD case gives you a misleading impression, because on
 x86 they are very easy to schedule (they nearly all take the same number of
 cycles, etc). So it's not hard for the compiler to do a good job of it.

True, but it's one of the most common usage scenarios, so it can't be ignored. Some other case studies I feel close to are hardware emulation, software rasterisation, particles, fluid dynamics, rigid body dynamics, FFTs, and audio signal processing. In each, I rarely need inline asm, and usually only when there is a hole in the high level language, as you said earlier. I find this typically surfaces when needing to interact with the flags regs directly.
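The flags problem can be seen in a multi-word addition, the kind of loop a bignum library needs. The following is a minimal C sketch, not code from std.internal.math; the name `add_limbs` and the limb layout (least significant word first) are assumptions for illustration. Plain C has no access to the carry flag, so the carry has to be reconstructed with comparisons.

```c
#include <stddef.h>
#include <stdint.h>

/* dst = a + b over n 64-bit limbs, least significant limb first.
   Returns the carry out of the top limb (0 or 1).
   add_limbs is a hypothetical name used only for this sketch. */
static unsigned add_limbs(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; ++i) {
        uint64_t sum = a[i] + b[i];     /* wraps modulo 2^64 */
        unsigned c1 = sum < a[i];       /* carry out of a[i] + b[i] */
        uint64_t total = sum + carry;
        unsigned c2 = total < sum;      /* carry out of adding the incoming carry */
        dst[i] = total;
        carry = c1 | c2;                /* at most one of the two can be set */
    }
    return carry;
}
```

Whether a given compiler turns the comparisons back into an add-with-carry chain is not guaranteed, which is the gap a carry-aware intrinsic would be meant to close.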
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 15:30, deadalnix <deadalnix gmail.com> wrote:

 On 20/06/2012 13:04, Manu wrote:

 Case 1 has no alternative to inline asm. I've thrown out some crazy
 ideas to think about (but nobody seems to like them). I still think it
 could be addressed though.

 Case 2; I'm not convinced. These such long functions are the type I'm
 generally interested in aswell, and have the most experience with. But
 in my experience, they're almost always best written with intrinsics.
 If they're small enough to be inlined, then you can't afford not to use
 intrinsics. If they are truly big functions, then you begin to sacrifice
 readability and maintain-ability, and certainly limit the number of
 programmers that can maintain the code.
 I rarely fail to produce identical code with intrinsics to that which I
 would write with hand written asm. The flags are always the biggest
 challenge, as discussed prior in this thread. I think that could be
 addressed with better intrinsics.

I'm sorry, but what you say is rather ignorant.

Not that it is wrong, but it only covers YOUR usage of inline asm. You are talking about performance, but many other usages of assembly code are very useful, valid, and cannot be replaced by intrinsics. druntime is full of that; Walter and I each showed you a specific piece of code. None of that could have been done without 100% asm functions.

I wasn't talking about performance strictly; I'm talking about pure functionality, but with an intent not to inhibit optimisation. The high level language can't interact with registers directly; there's no mechanism to do so.

I offered trivial solutions. You never suggested any reason why they couldn't work. In your code, you only need push/pop intrinsics and a register alias to produce identical code.
In Walter's example, I offered a number of options (neat handling of JC being the key issue). I'm not saying what it SHOULD be, just some possibilities to think about/explore.

 It is clear, however, that the compiler should get a better understanding
 of asm.

Such a better understanding of asm is more easily implemented via intrinsics. That's the basis of my suggestion: extend the high level language such that it is capable of that understanding within conventional expressions.
Intrinsics are already mechanically present in the language, and adding more as they are needed is no problem. The only missing component I can identify is the ability to directly address specific registers in high level code.
Jun 20 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 19/06/12 19:19, Iain Buclaw wrote:
 Had round one of the code review process, so I'm going to post the main issues
 here that most affect D users / the platforms they want to run on / the compiler
 version they want to use.

A somewhat different take on these issues -- we've several times now had discussions over the backend of DMD, and the whole "reference implementation not entirely free/open source" issue. One of the points made in those discussions is that the issue is somewhat moot given that the real "reference implementation" is the frontend, and this already has at least 2 free backends (GCC and LLVM).

However, that point stops being moot the moment there are compiler-specific constraints that mean that code that will compile with DMD won't compile with GDC, or vice-versa. If I can't use inline asm with GDC, or I have to go about it in a different way to DMD, then we can hardly say that GDC reflects the reference implementation.

It seems to me that guaranteeing equal capabilities between DMD and GDC should be a "red line" in determining what changes or deletions are acceptable or not.
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 14:01, Joseph Rushton Wakeling
<joseph.wakeling webdrake.net> wrote:
 On 19/06/12 19:19, Iain Buclaw wrote:
 Had round one of the code review process, so I'm going to post the main
 issues
 here that most affect D users / the platforms they want to run on / the
 compiler
 version they want to use.

 A somewhat different take on these issues -- we've several times now had
 discussions over the backend of DMD, and the whole "reference implementation
 not entirely free/open source" issue. One of the points made in those
 discussions is that the issue is somewhat moot given that the real
 "reference implementation" is the frontend, and this already has at least 2
 free backends (GCC and LLVM).

 However, that point stops being moot the moment there are compiler-specific
 constraints that mean that code that will compile with DMD won't compile
 with GDC, or vice-versa. If I can't use inline asm with GDC, or I have to
 go about it in a different way to DMD, then we can hardly say that GDC
 reflects the reference implementation.

 It seems to me that guaranteeing equal capabilities between DMD and GDC
 should be a "red line" in determining what changes or deletions are
 acceptable or not.

Unfortunately this is a red line I am going to cross.

I haven't pushed anything yet, but feel free to visualise:

I have made the following changes to the gdc build for all gdc-specific sources (which includes the D inline assembler implementation):
- GCC system headers are included first and foremost, before all other headers.
- Now compiles with macro -DIN_GCC_FRONTEND turned on.

Result: GDC now fails to compile, as we pull in many middle-end and backend headers that have been POISONED for GCC frontends to use. Apparently I somehow bypassed this. :o)

Fix: Remove all included headers that are poisoned - but wait! - now D inline assembler is missing crucial key elements of what made it just about work in GDC.

Hands are tied, sorry.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
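For readers unfamiliar with poisoning, the sketch below shows the general mechanism in C using `#pragma GCC poison`, a directive that GCC and Clang both support. Every identifier here (`BACKEND_INTERNAL_H`, `FRONTEND_OK`, `frontend_only`) is hypothetical; this is not GCC's actual header layout, only an illustration of how a guard name can be made unusable once a frontend-only build is selected.

```c
/* Hypothetical names throughout; not GCC's real headers or macros. */
#define FRONTEND_OK 1

/* From this point on, any use of the identifier named below in this
   translation unit is a hard compile error. Poisoning the include
   guard of a header makes that header impossible to include later. */
#pragma GCC poison BACKEND_INTERNAL_H

/* Code that stays within the permitted interface still compiles. */
static int frontend_only(void)
{
    return FRONTEND_OK;
}
```

Any later line that mentions the poisoned identifier, such as an `#ifdef` test on it inside a header, stops the build, which matches the failure described above once the offending includes are reached.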
Jun 20 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 20/06/12 14:31, Iain Buclaw wrote:
 Hands are tied, sorry.

Is this planned as a short-term change for which a long-term solution will be developed, or is it likely to be a permanent split with DMD?
Jun 20 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:

On 20 June 2012 17:15, Don Clugston <dac nospam.com> wrote:

 On 20/06/12 13:22, Manu wrote:

 I find optimisers are very good at code simplification, assuming that
 you massage the code/expressions to neatly match any architectural quirks.
 I also appreciate that good x86 code is possibly the hardest
 architecture for an optimiser to get right...

Optimizers improved enormously during the 80's and 90's, but the rate of improvement seems to have slowed. With x86, out-of-order execution has made it very easy to get reasonably good code, and much harder to achieve perfection. Still, Core i7 is much easier than Core2, since Intel removed one of the most complicated bottlenecks (on core2 and earlier there is a max 3 reads per cycle, of registers you haven't written to in the previous 3 cycles).

Yeah okay, I can easily imagine the complexity for an x86 codegen.
RISC architectures are so much more predictable.

How do you define 'perfection'? Performance as measured on what particular machine? :)
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 15:07, Joseph Rushton Wakeling
<joseph.wakeling webdrake.net> wrote:
 On 20/06/12 14:31, Iain Buclaw wrote:
 Hands are tied, sorry.

Is this planned as a short-term change for which a long-term solution will be developed, or is it likely to be a permanent split with DMD?

Likely a permanent move away from having a good portion of the frontend be one big special case for i386. I don't see it as a huge problem though.

However, one or two people on IRC have asked if the GDC Extended Assembler could be renamed to __gcc_asm or __asm to make it a special / reserved feature of GDC, rather than competing with the D spec's namespace.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
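As a rough picture of the GCC-style extended assembler that remains available, here is a minimal sketch in C rather than D, assuming GCC or Clang. The function name `add_one` is made up, and the fallback branch exists only so the sketch also builds on targets other than x86. The operand constraints tell the compiler what the asm reads, writes and clobbers, so it still picks the registers itself.

```c
/* Add 1 to x. On x86 with GCC or Clang this uses an extended asm
   statement; elsewhere it falls back to plain C so the sketch stays
   portable. add_one is a hypothetical name. */
static unsigned add_one(unsigned x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl $1, %0"
             : "+r"(x)   /* x is read and written, in any general register */
             :           /* no further inputs */
             : "cc");    /* the condition flags are modified */
#else
    x += 1;
#endif
    return x;
}
```

This differs from DMD-style inline asm, where the programmer names concrete registers and the compiler has to work around them.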
Jun 20 2012
prev sibling next sibling parent Brad Anderson <eco gnuk.net> writes:

On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw <ibuclaw ubuntu.com> wrote:

 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on /
 the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other
 comments on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to remove
 all code that relies on internal macros with proper if() conditions. If
 something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits.  Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc.  So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

I'm very much outside of my area of understanding, but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm, allowing people to still use a single version of the asm for both DMD and GDC?

Regards,
Brad Anderson
Jun 20 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, June 20, 2012 13:33:53 Jacob Carlborg wrote:
 You do understand that the GCC-style inline assembly will still be
 available?

But inline assembler with the syntax that dmd uses is supposed to be part of the language. So, if gdc doesn't support it, it's not a fully compliant D compiler. It would be like if gdc didn't do

    auto a = expression;

but instead did

    expression = a auto;

except that the problem is more localized, because inline assembly is rather rare (unlike variable declarations). So, this is a _huge_ deal.

- Jonathan M Davis
Jun 20 2012
prev sibling next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 17:00, Brad Anderson <eco gnuk.net> wrote:
 On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on /
 the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other comments
 on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to remove
 all code that relies on internal macros with proper if() conditions. If
 something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits.  Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc.  So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC?

 Regards,
 Brad Anderson

Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler. GCC Assembler is much more expressive than D Inline Assembler, and requires you to describe everything a given asm command is doing: inputs, outputs, clobbers, and labels that we may jump to (if any).

The only thing I worry about is that CTFE is not powerful enough to process a long set of instructions at a fast enough rate to make it beneficial.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
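
To make the operand lists mentioned above concrete, here is a minimal sketch of GCC-style extended asm written in D for GDC. It assumes an x86 or x86-64 target and AT&T operand order; the names `useGccAsm` and `addInts` are invented for this illustration. Any other compiler or target falls through to plain D, so the file should still build there.

```d
// Use the GCC-style block only where GDC advertises it AND the target is x86.
version (GNU_InlineAsm)
{
    version (X86)         enum useGccAsm = true;
    else version (X86_64) enum useGccAsm = true;
    else                  enum useGccAsm = false;
}
else
    enum useGccAsm = false;

int addInts(int a, int b)
{
    int result = a;
    static if (useGccAsm)
    {
        asm
        {
            "addl %1, %0"
            : "+r" (result)   // output operand, read and written, in a register
            : "r" (b)         // input operand, in a register
            : "cc";           // clobber: the instruction changes the flags
        }
    }
    else
    {
        // Fallback for compilers/targets without GCC-style x86 asm.
        result += b;
    }
    return result;
}
```

Every piece of information the backend needs (what is read, what is written, what is clobbered) is spelled out by the programmer here, which is exactly what a DMD-style `asm` block leaves implicit. The jump-label list is omitted from this sketch.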
Jun 20 2012
parent reply deadalnix <deadalnix gmail.com> writes:
On 20/06/2012 18:18, Iain Buclaw wrote:
 On 20 June 2012 17:00, Brad Anderson<eco gnuk.net>  wrote:
 On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw<ibuclaw ubuntu.com>  wrote:
 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on /
 the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other comments
 on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to remove
 all code that relies on internal macros with proper if() conditions. If
 something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits.  Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc.  So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC? Regards, Brad Anderson

Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler. GCC Assembler is much more expressive than D Inline Assembler, and requires you to describe everything a given asm command is doing: inputs, outputs, clobbers, and labels that we may jump to (if any). The only thing I worry about is that CTFE is not powerful enough to process a long set of instructions at a fast enough rate to make it beneficial.

Can't gdc frontend process asm to gcc's asm and go from that ?
Jun 20 2012
parent deadalnix <deadalnix gmail.com> writes:
On 20/06/2012 18:40, Iain Buclaw wrote:
 On 20 June 2012 17:23, deadalnix<deadalnix gmail.com>  wrote:
 On 20/06/2012 18:18, Iain Buclaw wrote:
 On 20 June 2012 17:00, Brad Anderson<eco gnuk.net>    wrote:
 On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw<ibuclaw ubuntu.com>    wrote:

 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on
 /
 the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other
 comments
 on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to
 remove
 all code that relies on internal macros with proper if() conditions. If
 something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits.  Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc.  So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC? Regards, Brad Anderson

Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler. GCC Assembler is much more expressive than D Inline Assembler, and requires you to describe everything a given asm command is doing: inputs, outputs, clobbers, and labels that we may jump to (if any). The only thing I worry about is that CTFE is not powerful enough to process a long set of instructions at a fast enough rate to make it beneficial.

Can't gdc frontend process asm to gcc's asm and go from that ?

It's what we did, but there's a lot of information that we require about, eg: the function frame pointer, that is not available to the frontend when trying to re-create just exactly what the assembly code is requiring us to do.

So, how is the programmer supposed to handle that when writing gcc's asm?
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 17:08, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 On Wednesday, June 20, 2012 13:33:53 Jacob Carlborg wrote:
 You do understand that the GCC-style inline assembly will still be
 available?

But inline assembler with the syntax that dmd uses is supposed to be part of the language. So, if gdc doesn't support it, it's not a fully compliant D compiler. It would be like if gdc didn't do

    auto a = expression;

but instead did

    expression = a auto;

except that the problem is more localized, because inline assembly is rather rare (unlike variable declarations). So, this is a _huge_ deal.

1) DMD is capable of parsing both D Inline and GCC Extended assembler without throwing errors in the lexer/parser.

2) GDC defines GNU_InlineAsm, and does *not* define D_InlineAsm, D_InlineAsm_X86, or D_InlineAsm_X86_64.

Not a huge deal if you follow standard coding practices, putting inline asm in D_InlineAsm blocks, etc.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
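
The coding practice referred to above could look roughly like the following sketch. It assumes the standard `D_InlineAsm_X86` and `D_InlineAsm_X86_64` version identifiers; `haveDAsm` and `addInts` are hypothetical names used only here. On a compiler that defines neither identifier (as described for GDC above), the plain D branch is compiled instead.

```d
// True only when the compiler advertises DMD-style x86 inline assembler.
version (D_InlineAsm_X86)         enum haveDAsm = true;
else version (D_InlineAsm_X86_64) enum haveDAsm = true;
else                              enum haveDAsm = false;

int addInts(int a, int b)
{
    // Work on locals so the asm block can refer to them by name.
    int x = a, y = b, sum;
    static if (haveDAsm)
    {
        asm
        {
            mov EAX, x;
            add EAX, y;
            mov sum, EAX;
        }
    }
    else
    {
        // Portable fallback, used wherever D-style asm is unavailable.
        sum = x + y;
    }
    return sum;
}
```

Code written this way keeps compiling when a compiler drops D-style inline asm; it only loses the hand-written fast path.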
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 17:23, deadalnix <deadalnix gmail.com> wrote:
 On 20/06/2012 18:18, Iain Buclaw wrote:
 On 20 June 2012 17:00, Brad Anderson<eco gnuk.net>  wrote:
 On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw<ibuclaw ubuntu.com>  wrote:

 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on /
 the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other comments
 on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to remove
 all code that relies on internal macros with proper if() conditions. If
 something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits.  Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc.  So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC?

 Regards,
 Brad Anderson

Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler. GCC Assembler is much more expressive than D Inline Assembler, and requires you to describe everything a given asm command is doing: inputs, outputs, clobbers, and labels that we may jump to (if any). The only thing I worry about is that CTFE is not powerful enough to process a long set of instructions at a fast enough rate to make it beneficial.

Can't gdc frontend process asm to gcc's asm and go from that ?

It's what we did, but there's a lot of information that we require, e.g. about the function frame pointer, that is not available to the frontend when trying to re-create exactly what the assembly code requires us to do.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 17:44, deadalnix <deadalnix gmail.com> wrote:
 On 20/06/2012 18:40, Iain Buclaw wrote:
 On 20 June 2012 17:23, deadalnix<deadalnix gmail.com>  wrote:

 On 20/06/2012 18:18, Iain Buclaw wrote:
 On 20 June 2012 17:00, Brad Anderson<eco gnuk.net>  wrote:
 On Tue, Jun 19, 2012 at 12:19 PM, Iain Buclaw<ibuclaw ubuntu.com>  wrote:

 Hi,

 Had round one of the code review process, so I'm going to post the main
 issues here that most affect D users / the platforms they want to run on /
 the compiler version they want to use.



 1) D Inline Asm and naked function support is raising far too many alarm
 bells. So would just be easier to remove it and avoid all the other comments
 on why we need middle-end and backend headers in gdc.


 2) Code with #if V1 and V2 raised another bell with the request to remove
 all code that relies on internal macros with proper if() conditions. If
 something is always going to be turned off, remove it.

 So, we shall also be saying bye bye D1 in GDC.  We'll miss you!


 3) For anyone who has submitted patches for Mingw and Apple - sorry, but
 I'm going to have to yank out or alter certain bits.  Apple GCC is
 irrelevant now, and some Mingw checks look for if(target) when it should
 really be checking if(host) and vice versa!


 Most discussion I would imagine be on the decision to remove D inline
 assembler support from gdc.  So, nay sayers, do your worst, but
 unfortunately there is a +1 here for removal.


 Regards
 Iain

I'm very much outside of my area of understanding but would it be possible to use CTFE+mixin to generate GCC asm from DMD style asm allowing people to still use a single version of the asm for both DMD and GDC? Regards, Brad Anderson

Hmm... doable, yes, but it would require a construct as complex as the implementation in the compiler. GCC Assembler is much more expressive than D Inline Assembler, and requires you to describe everything a given asm command is doing: inputs, outputs, clobbers, and labels that we may jump to (if any). The only thing I worry about is that CTFE is not powerful enough to process a long set of instructions at a fast enough rate to make it beneficial.

Can't gdc frontend process asm to gcc's asm and go from that ?

It's what we did, but there's a lot of information that we require about, eg: the function frame pointer, that is not available to the frontend when trying to re-create just exactly what the assembly code is requiring us to do.

So, how is the programmer supposed to handle that when writing gcc's asm?

Make an inaccurate guess of where a variable is located in relation to the stack frame. :-)

Most actions are reasonably simple to deduce and convert. There are lots of notable exceptions though:

    ptr[1] = &foo;
    asm { jmp ptr[1*4]; }

May produce a number of variants depending on whether ptr is a local, parameter, static, shared, thread local, or whether you are running Linux or Windows :-)

But it's code like this that is the worst reason for removal:

    int zz( int p1 )
    {
        asm {
            naked;
            mov EAX, p1[EBP];
        }
    }

Here, depending on what optimisation you have turned on, p1 could be passed on the stack, or in a register. So how do you plan to work out what value you are mov'ing to EAX without knowledge of the function frame that the GCC backend sets up in RTL? That is information the frontend should never have access to (and is #poison'd to enforce this).

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
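
For contrast, a hedged sketch of how the same "load p1" intent might be written with GCC-style extended asm, where the programmer never names a stack slot and the backend picks the operand location. This assumes GDC on x86 or x86-64; `useGccAsm` is a hypothetical helper constant, and the function is an ordinary (not naked) one, so it is not a drop-in replacement for the example above.

```d
// Enable the GCC-style path only for GDC targeting x86/x86-64.
version (GNU_InlineAsm)
{
    version (X86)         enum useGccAsm = true;
    else version (X86_64) enum useGccAsm = true;
    else                  enum useGccAsm = false;
}
else
    enum useGccAsm = false;

int zz(int p1)
{
    int result;
    static if (useGccAsm)
    {
        asm
        {
            "movl %1, %0"
            : "=r" (result)   // output: any general-purpose register
            : "rm" (p1);      // input: register OR memory, chosen by the backend
        }
    }
    else
    {
        // Fallback for other compilers and targets.
        result = p1;
    }
    return result;
}
```

The `"rm"` constraint is the point of the sketch: whether `p1` ends up in a register or on the stack under a given optimisation level is resolved by the backend when it substitutes `%1`, rather than guessed by the frontend.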
Jun 20 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Wednesday, 20 June 2012 at 16:09:14 UTC, Jonathan M Davis 
wrote:
 But inline assembler with the syntax that dmd uses is supposed 
 to be part of
 the language. So, if gdc doesn't support it, it's not a fully 
 compliant D
 compiler.

I am not too sure about that: In my opinion, your description of the problem would be accurate if some compiler implemented asm {}, but with a different syntax or different semantics. But GDC simply does not (resp. will not) implement D-style inline assembly at all. From my point of view, this is not necessarily a problem spec-wise, as it is not guaranteed to be available – if it was, there would be no reason to have D_InlineAsm_X86 at all.

Needless to say, inline assembly is sometimes a very convenient feature to have, but if it is the only issue stopping GDC from being merged to mainline GCC, I'd say the only sensible choice is to yank it, at least for the time being. If, at a later point, somebody comes up with a clever way to implement it given the constraints imposed by the GCC infrastructure, or manages to convince the GCC maintainers to accept the »dirty« solution, it could still be added in again.

David
Jun 20 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 20/06/12 18:10, David Nadlinger wrote:
 I am not too sure about that: In my opinion, your description of the problem
 would be accurate if some compiler implemented asm {}, but with a different
 syntax or different semantics. But GDC simply does not (resp. will not)
 implement D-style inline assembly at all. From my point of view, this is not
 necessarily a problem spec-wise, as it is not guaranteed to be available – if it
 was, there would be no reason to have D_InlineAsm_X86 at all.

Reading http://dlang.org/iasm.html I don't have the impression that the inline assembler is an optional part of the D spec or not guaranteed to be available -- it's very deliberately intended to be there.
 Needless to say, inline assembly is sometimes a very convenient feature to have,
 but if it is the only issue stopping GDC from being merged to mainline GCC, I'd
 say the only sensible choice is to yank it, at least for the time being. If,
 at a later point, somebody comes up with a clever way to implement it given the
 constraints imposed by the GCC infrastructure, or manages to convince the GCC
 maintainers to accept the »dirty« solution, it could still be added in again.

For sure it makes sense as a short-term compromise, but I don't see how GDC can meet the D specifications without implementing the inline assembler at some point in the (hopefully near) future. When you consider that GDC is the best bet for being able to compile D on ARM processors, and a major application here is embedded systems, it really seems necessary to plan to have this functionality in there.
Jun 20 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 20/06/12 20:35, Alex Rønne Petersen wrote:
 And x86 inline assembler... on ARM? I don't think I follow.

If I understand http://dlang.org/iasm.html correctly, the idea is that D should have an inline assembler for each target architecture. AFAICS what's desired is that you should be able to insert

    asm
    {
        // target-specific assembly goes here
    }

... and have it accepted by _any_ D compiler. That seems to me to be an important part of the language in general and even more so on architectures that are suited to embedded systems.

So while it may make sense to cut the inline assembly in the short term for GDC, it doesn't make sense to me for it to be a change that lasts.
Jun 20 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Wednesday, 20 June 2012 at 19:08:43 UTC, Joseph Rushton 
Wakeling wrote:
 Reading http://dlang.org/iasm.html I don't have the impression 
 that the inline assembler is an optional part of the D spec or 
 not guaranteed to be available -- it's very deliberately 
 intended to be there.

… yet code is only to assume it is actually available if D_InlineAsm_X86 is defined?

I don't want to argue about the fact that there is certainly code out there which assumes inline x86 assembly to be available. My point was that this is _not_ like Jonathan's »auto a = expression« vs. »expression = a auto« example, because the language contains specific provisions for inline assembly not being available by having a standardized D_InlineAsm_X86 version identifier.

David
Jun 20 2012
prev sibling next sibling parent "Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Tuesday, 19 June 2012 at 18:19:01 UTC, Iain Buclaw wrote:

 1) D Inline Asm and naked function support is raising far too 
 many alarm bells. So would just be easier to remove it and 
 avoid all the other comments on why we need middle-end and 
 backend headers in gdc.

I'll give my opinion. I have yet to write some x86 assembly and only have anticipations of playing around in it, so take this with as much force as you desire.

D specifies inline ASM, and I don't see this GCC submission dictating its removal. So I think it would be best to actually support D.

Now, the way you phrase this statement, it sounds like inline ASM will make it harder to get through the review, as there might be many changes needed to get the ASM support through. In which case I think postponing inline ASM support to get through an initial review and approval is fine. But it should be considered only temporary, and approval should be given in the knowledge that inline ASM will be coming.

The other items do not appear relevant to supporting the D specification, and we are halfway to DigitalMars not supporting a D1 compiler.
Jun 20 2012
prev sibling next sibling parent "Bernard Helyer" <b.helyer gmail.com> writes:
On Thursday, 21 June 2012 at 00:02:58 UTC, Walter Bright wrote:
 On 6/20/2012 4:26 AM, Bernard Helyer wrote:
 I was sputtering with rage. Sputtering!

Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over. I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal.

Open the source repo doors, Walt.
Jun 20 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 21 June 2012 04:06, Bernard Helyer <b.helyer gmail.com> wrote:
 On Thursday, 21 June 2012 at 00:02:58 UTC, Walter Bright wrote:
 On 6/20/2012 4:26 AM, Bernard Helyer wrote:
 I was sputtering with rage. Sputtering!

Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over. I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal.

Open the source repo doors, Walt.

It's not pining. It's passed on. This door is no more. It has ceased to be. It's expired and gone to meet its maker. This is a late door. It's a stiff. Bereft of life, it rests in peace. If you hadn't nailed it to the perch, it would be pushing up the daisies. It's rung down the curtain and joined the choir invisible. THIS IS AN EX-DOOR. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 21 2012
prev sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 20 June 2012 21:16, Alex Rønne Petersen <alex lycus.org> wrote:
 On 20-06-2012 21:48, Joseph Rushton Wakeling wrote:
 On 20/06/12 20:35, Alex Rønne Petersen wrote:
 And x86 inline assembler... on ARM? I don't think I follow.

If I understand http://dlang.org/iasm.html correctly, the idea is that D should have an inline assembler for each target architecture. AFAICS what's desired is that you should be able to insert

    asm
    {
        // target-specific assembly goes here
    }

... and have it accepted by _any_ D compiler. That seems to me to be an important part of the language in general and even more so on architectures that are suited to embedded systems.

So while it may make sense to cut the inline assembly in the short term for GDC, it doesn't make sense to me for it to be a change that lasts.

GDC currently supports x86, ARM, PowerPC, MIPS, SPARC, and possibly others.
 The language reference lists assembly syntax for x86. I understand that in
 an ideal world, we'd have standardized assembly syntaxes for all of these
 architectures, but somebody has to actually spec and implement them.

 Besides, Iain has already pointed out that the x86 syntax in the spec
 doesn't integrate with GCC's inline assembly support at all (which is why
 GDC had the glue code for it). It took around 2000 lines (if memory serves)
 to translate the D inline assembly to GCC inline assembly. Now imagine
 having to do this for every architecture ever supported.

Closer to 4000 lines, and the current implementation is in no state to be able to add more architectures into the mix.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 21 2012