digitalmars.D - DIP56 Provide pragma to control function inlining

Walter Bright <newshound2 digitalmars.com> writes:
http://wiki.dlang.org/DIP56

Manu has needed always inlining, and I've needed never inlining. This DIP 
proposes a simple solution.
Feb 23 2014
"Mike" <none none.com> writes:
On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never 
 inlining. This DIP proposes a simple solution.

Is this a front-end thing or something specific to DMD? I'm wondering because I'd like something like this for GDC and LDC when targeting ARM microcontrollers. The inline keyword makes quite a significant performance improvement in one of my current C++ projects, and I anticipate the same result when I convert it to D.

Any chance of also adding an "optimize, true/false" pragma to get around the lack of a volatile keyword? (Just a question; I don't mean to hijack this thread and turn it into another volatile keyword debate.)

Mike
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:21 AM, Mike wrote:
 Is this a front-end thing or something specific to DMD?  I'm wondering because
 I'd like something like this for GDC and LCD when targeting ARM
 microcontrollers.  The inline keyword makes quite a significant performance
 improvement in one of my current C++ projects, and I anticipate the same result
 when I convert it to D.

It's a hint to the compiler - the compiler is allowed to ignore it if it doesn't support it.
 Any chance of adding a "optimize, true/false" pragma also to get around the lack
 of a volatile keyword? (Just a question, I don't mean to hijack this thread and
 turn into another volatile keyword debate).

Please start another thread with your proposal.
Feb 23 2014
Benjamin Thaut <code benjamin-thaut.de> writes:
On 23.02.2014 13:07, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

Why a pragma? Can't we use a UDA and give it some special meaning inside the compiler?
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:25 AM, Benjamin Thaut wrote:
 Why a pragma? Can't we use a UDA and give it some special meaning inside the
 compiler?

This shouldn't be an attribute, it's a hint to the compiler optimizer. Pragma is ideally suited to that.
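
A minimal sketch of how the pragma form from the DIP might be attached to individual functions. The names `square` and `reportFailure` are hypothetical, chosen only for illustration, and whether either request is honored is up to the compiler, as described above.

```d
// Hypothetical hot-path helper: ask the compiler to inline it.
pragma(inline, true)
int square(int x)
{
    return x * x;
}

// Hypothetical cold-path helper: ask the compiler not to inline it.
pragma(inline, false)
int reportFailure(int code)
{
    return -code;
}
```

The pragma changes nothing about what the functions compute; it only carries the inlining request.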
Feb 23 2014
"Namespace" <rswhite4 googlemail.com> writes:
On Sunday, 23 February 2014 at 12:25:20 UTC, Benjamin Thaut wrote:
 On 23.02.2014 13:07, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never 
 inlining. This
 DIP proposes a simple solution.

Why a pragma? Can't we use a UDA and give it some special meaning inside the compiler?

+1
I would also prefer an attribute which can be used as label.
----
inline(true):
// ...
inline(false):
// ...
inline(default):
----
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 1:41 PM, Namespace wrote:
 pragma(inline, true);
 pragma(inline, false);
 pragma(inline, default);

'default' being a keyword makes for an ugly special case in how pragmas are parsed.
Feb 23 2014
Lionello Lunesu <lionello lunesu.remove.com> writes:
On 24/02/14 06:12, Walter Bright wrote:
 On 2/23/2014 1:41 PM, Namespace wrote:
 pragma(inline, true);
 pragma(inline, false);
 pragma(inline, default);

'default' being a keyword makes for an ugly special case in how pragmas are parsed.

Aren't true and false keywords as well?
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 5:47 PM, Lionello Lunesu wrote:
 On 24/02/14 06:12, Walter Bright wrote:
 On 2/23/2014 1:41 PM, Namespace wrote:
 pragma(inline, true);
 pragma(inline, false);
 pragma(inline, default);

'default' being a keyword makes for an ugly special case in how pragmas are parsed.

Aren't true and false keywords as well?

Yes, but they are also expressions. default is not.
Feb 23 2014
"Namespace" <rswhite4 googlemail.com> writes:
On Sunday, 23 February 2014 at 19:10:08 UTC, Namespace wrote:
 On Sunday, 23 February 2014 at 12:25:20 UTC, Benjamin Thaut wrote:
 On 23.02.2014 13:07, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never 
 inlining. This
 DIP proposes a simple solution.

Why a pragma? Can't we use a UDA and give it some special meaning inside the compiler?

+1 I would also prefer an attribute which can be used as label. ---- inline(true): // ... inline(false): // ... inline(default): ----

I still prefer the attribute/UDA idea, but in case of a pragma:

pragma(inline, true);
pragma(inline, false);
pragma(inline, default);

?
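
A sketch of how the label-style use asked for here could look with the pragma form. It assumes the usual D pragma scoping rules (single declaration, `{ }` block, or `:` label) and assumes that an argument-less `pragma(inline)` stands in for the "default" setting, which avoids needing `default` as an argument. `twice`, `thrice`, and `Counter` are hypothetical names.

```d
// Block form: applies to every declaration inside the braces.
pragma(inline, true)
{
    int twice(int x) { return x + x; }
    int thrice(int x) { return x + x + x; }
}

struct Counter
{
    int n;

pragma(inline, false):   // label form: applies to the declarations that follow
    void reset() { n = 0; }

pragma(inline):          // no second argument: back to the compiler's default
    void bump() { ++n; }
}
```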
Feb 23 2014
"Tove" <tove fransson.se> writes:
On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never 
 inlining. This DIP proposes a simple solution.

yay, all for it! The DIP should probably specify what happens if inlining fails, i.e. generate a compilation error.

Could we consider adding "flatten" in the same DIP? Quote from gcc:

"Flatten: Generally, inlining into a function is limited. For a function marked with this attribute, every call inside this function is inlined, if possible. Whether the function itself is considered for inlining depends on its size and the current inlining parameters."
Feb 23 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 23-Feb-2014 16:25, Tove wrote:
 On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

yay, all for it! The DIP should probably specify what happens if inlining fails, i.e. generate a compilation error. Could we consider adding "flatten" in the same dip? quote from gcc "Flatten Generally, inlining into a function is limited. For a function marked with this attribute, every call inside this function is inlined, if possible. Whether the function itself is considered for inlining depends on its size and the current inlining parameters. "

Yes, please. -- Dmitry Olshansky
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:25 AM, Tove wrote:
 The DIP should probably specify what happens if inlining fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.
Feb 23 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 23-Feb-2014 16:57, Walter Bright wrote:
 On 2/23/2014 4:25 AM, Tove wrote:
 The DIP should probably specify what happens if inlining fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.

It's going to be near useless if it doesn't make sure inlining happened. Part of the reason for forced inline is always inlining some core primitives, even in debug builds.

The other point is what Vladimir mentioned - we are already doing micro-optimization, hence it had better error out than turn a blind eye on our tinkering. I wouldn't like to ever have to get down and look at ASM for every function just to make sure it was inlined.

-- Dmitry Olshansky
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 5:07 AM, Dmitry Olshansky wrote:
 Part of the reason for forced inline is always inlining some core primitives,
 even in debug builds.

Right - and if the compiler won't do it, how does the error message help?
 I wouldn't not like to ever have to get down and look at ASM for every
 function just to make sure it was inlined.

By the time you get to the point of checking on inlining, you're already looking at the assembler output, because the function is on the top of the profile of time wasters, and that's how you take it to the next level of performance. The trouble with an error message, is what (as the user) can you do about it?
Feb 23 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 24-Feb-2014 00:46, Walter Bright wrote:
 On 2/23/2014 5:07 AM, Dmitry Olshansky wrote:
 Part of the reason for forced inline is always inlining some core
 primitives,
 even in debug builds.

Right - and if the compiler won't do it, how does the error message help?

That programmer is instantly aware that it can't be done due to some reason. Keep in mind that code changes with time, and running a profiler/disassembler on every tiny change to make sure the stuff is still inlined is highly counter-productive.
 I wouldn't not like to ever have to get down and look at ASM for
 every function just to make sure it was inlined.

 By the time you get to the point of checking on inlining, you're already
 looking at the assembler output, because the function is on the top of
 the profile of time wasters, and that's how you take it to the next
 level of performance.

A one-off activity. Now what guarantees you will have that it will keep getting inlined? Right, nothing.
 The trouble with an error message, is what (as the user) can you do
 about it?

Re-write till the compiler loves it; that is what we do today anyway. Else we wouldn't mark it as force_inline in the first place.

With an error you get a huge advantage: _instant_ feedback that it doesn't do what you want it to do. Otherwise you get the extra pleasure of running a disassembler to pinpoint your favorite call sites, or observing that your profiler shows the same awful stats.

-- Dmitry Olshansky
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 1:04 PM, Dmitry Olshansky wrote:
 That programmer is instantly aware that it can't be done due to some reason.
 Keep in mind that code changes with time and running profiler/disassembler on
 every tiny change to make sure the stuff is still inlined is highly
 counter-productive.

I'm aware of that, but once you add the:

    version(BadCompiler) { } else pragma(inline, true);

things will never get better for BadCompiler. And besides, that line looks awful.
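
To make the objection concrete: if a failed inline were a hard error, portable code would need a guard along the lines of the sketch below. `BadCompiler` is the placeholder name used above, here treated as a user-defined version identifier rather than a real predefined one, and `square` is hypothetical.

```d
version (BadCompiler)
{
    // Plain declaration: no inlining request, so nothing can fail to compile.
    int square(int x) { return x * x; }
}
else
{
    // Everywhere else, keep the inlining request.
    pragma(inline, true)
    int square(int x) { return x * x; }
}
```

Both branches declare the same function, so callers are unaffected; the cost is the duplicated declaration and the fact that nothing ever prompts anyone to remove the guard.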
 By the time you get to the point of checking on inlining, you're already
 looking at the assembler output, because the function is on the top of
 the profile of time wasters, and that's how you take it to the next
 level of performance.

A one-off activity. Now what guarantees you will have that it will keep getting inlined? Right, nothing.

You're always going to have that issue when optimizing at that level, and it will be for a large range of constructs. For example, you may need variable x to be enregistered. You may need some construct to be implemented as a ROL instruction. You may need a switch to be implemented as a binary search.
 The trouble with an error message, is what (as the user) can you do
 about it?

 Re-write till compiler loves it, that is what we do today anyway. Else we
 wouldn't mark it as force_inline in the first place.

In which case there will be two code paths selected with a version(BadCompiler). I have a hard time seeing the value in supporting both code paths - the programmer would just use the workaround code always.
 With error - yo get a huge advantage - an _instant_ feedback that it doesn't do
 what you want it to do. Otherwise it gets the extra pleasure of running
 disassembler to pinpoint your favorite call sites or observing that your
 profiler shows the same awful stats.

My point is you're going to have to look at the asm of the top functions on the profiler stats anyway, or you're wasting your time trying to optimize the code. (Speaking from considerable experience doing that.) There's a heluva lot more to optimizing effectively than inlining, and it takes some back-and-forth tweaking source code and looking at the assembler. I gave some examples of that above. And yes, performance critical code often suffers from bit rot, and changes in the compiler, and needs to be re-tuned now and then. I suspect if the compiler errors out on a failed inline, it'll be much less useful than one might think.
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 1:53 PM, Walter Bright wrote:
 And yes, performance critical code often suffers from bit rot, and changes in
 the compiler, and needs to be re-tuned now and then.

BTW, just to reiterate, there are *thousands* of optimizations the compiler may or may not do. And yes, performance critical code will often rely on them, and code is often tuned to 'tickle' certain ones. For example, I know a fellow years ago who thought he had invented a spectacular new string processing algorithm. He had the benchmarks to prove it, and published an article with his with/without benchmark. Unfortunately, the without benchmark contained an extra DIV instruction that, due to the vagaries of optimization, the compiler hadn't elided. That DIV had nothing to do with the algorithm, but the benchmark timing differences were totally due to its presence/absence. He would have spotted it if he'd ever looked at the asm generated, and saved himself from some embarrassment. I understand that in an ideal world one should never have to look at asm, but if you're writing high performance code and don't look at asm, the code is never going to beat the competition.
Feb 23 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 24-Feb-2014 01:53, Walter Bright wrote:
 On 2/23/2014 1:04 PM, Dmitry Olshansky wrote:
 That programmer is instantly aware that it can't be done due to some
 reason.
 Keep in mind that code changes with time and running
 profiler/disassembler on
 every tiny change to make sure the stuff is still inlined is highly
 counter-productive.

I'm aware of that, but once you add the: version(BadCompiler) { } else pragma(inline, true); things will never get better for BadCompiler. And besides, that line looks awful.

You are actually going against yourself with this argument - for porting you typically suggest:

    version(OS1)
      ...
    else version(OS2)
      ...
    else
      static assert(0);

Why is forced_inline any different from other porting (where you want to fail fast)?
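
The fail-fast porting idiom referred to here, sketched with the predefined `Windows` and `Posix` version identifiers in place of the OS1/OS2 placeholders. `pathSeparator` is a hypothetical name chosen for illustration.

```d
// Pick a value per platform; refuse to compile anywhere else.
version (Windows)
    enum char pathSeparator = '\\';
else version (Posix)
    enum char pathSeparator = '/';
else
    static assert(0, "unsupported platform: add a pathSeparator for it");
```

On an unlisted platform the build stops at the `static assert` instead of silently picking a wrong value, which is the behavior being argued for with forced inlining.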
 By the time you get to the point of checking on inlining, you're already
 looking at the assembler output, because the function is on the top of
 the profile of time wasters, and that's how you take it to the next
 level of performance.

A one-off activity. Now what guarantees you will have that it will keep getting inlined? Right, nothing.

You're always going to have that issue when optimizing at that level, and it will be for a large range of constructs. For example, you may need variable x to be enregistered. You may need some construct to be implemented as a ROL instruction. You may need a switch to be implemented as a binary search.

Let's not stray from the original point. ROL is done as an intrinsic, and there are different answers to many of these questions that are BETTER than _always_ triple checking by hand and doing re-writes. Switch may benefit from pragmas as well, and modern compilers allow tweaking it. In fact, LLVM allows assigning weights to specify which cases are more probable.

Almost all of the listed issues could be addressed better than by dancing around the disassembler and trying to please a PARTICULAR COMPILER. Yes, looking at ASM is important, but not every single case should require the painful cycle of: compile -> disassemble -> re-write -> compile -> ...
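
As an illustration of asking for a rotate by name instead of hoping a shift/or pattern gets recognized, here is a sketch that assumes a druntime providing `core.bitop.rol`. Whether the call actually becomes a single ROL instruction still depends on the compiler and target; `mix` is a hypothetical function.

```d
import core.bitop : rol;

// Hypothetical hash-mixing step: combine two words, then rotate left by 5 bits.
uint mix(uint h, uint k)
{
    return rol(h ^ k, 5);
}
```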
 The trouble with an error message, is what (as the user) can you do
 about it?

 Re-write till compiler loves it, that is what we do today anyway. Else we
 wouldn't mark it as force_inline in the first place.

In which case there will be two code paths selected with a version(BadCompiler). I have a hard time seeing the value in supporting both code paths - the programmer would just use the workaround code always.

Your nice tried and true way of doing things is EQUALLY FRAGILE (if not more) and highly coupled to the compiler, but only SILENTLY so.
 With error - yo get a huge advantage - an _instant_ feedback that it
 doesn't do
 what you want it to do. Otherwise it gets the extra pleasure of running
 disassembler to pinpoint your favorite call sites or observing that your
 profiler shows the same awful stats.

My point is you're going to have to look at the asm of the top functions on the profiler stats anyway, or you're wasting your time trying to optimize the code.

Like I don't know already, getting in this discussion.
 (Speaking from considerable experience doing that.)

And since you've come to enjoy it as is, you accept no improvements over that process? So you know it's hard fighting the compiler, and like a samurai you decidedly reject any help in dealing with it. I seriously don't get the point.

GCC has force inline; let's look at what GCC does with its always_inline:
http://gcc.gnu.org/ml/gcc-help/2007-01/msg00051.html

Quote of interest:
---
 **5) Could there be any situation, where a function with always_inline
 is _silently_ not embedded?

I hope not. I don't know of any.
---
 There's a heluva lot more to optimizing effectively than inlining, and
 it takes some back-and-forth tweaking source code and looking at the
 assembler. I gave some examples of that above.

Just because there are other reasons to look at disassembly is not a good reason to forcibly send people to double-check compiler for basic inlining.
 And yes, performance critical code often suffers from bit rot, and
 changes in the compiler, and needs to be re-tuned now and then.

And you accept no safe-guards against this because that is "the true old way"?
 I suspect if the compiler errors out on a failed inline, it'll be much
 less useful than one might think.

On the contrary, at least I may have to spend less time checking that intended optimizations are being done in ASM listings.

-- Dmitry Olshansky
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 3:00 PM, Dmitry Olshansky wrote:
 You actually going against yourself with this argument - for porting you
 typically suggest:

 version(OS1)
   ...
 else version(OS2)
   ...
 else
 static assert(0);

There's not much choice about that. I also suggest moving such code into separate modules.
 Your nice tired and true way of doing things is EQUALLY FRAGILE (if not more)
 and highly coupled to the compiler but only SILENTLY so.

That's very true. Do you suggest the compiler emit a list of what optimizations it did or did not do? What makes inlining special, as opposed to, say, enregistering particular variables?
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 3:55 PM, Mike wrote:
 The difference is it was explicitly told to do something and didn't.  That's
 insubordination.

I view this as more in the manner of providing the equivalent of runtime profiling information to the optimizer, in indirectly saying how often a function is executed. Optimizing is a rather complicated process, and particular optimizations very often have weird and unpredictable interactions with other optimizations.

For example, in the olden days, C compilers had a 'register' storage class. Optimizers' register allocation strategy was so primitive it needed help. Over time, however, it became apparent that uses of 'register' became bit-rotted due to maintenance, resulting in all the wrong variables being enregistered. Compiler register allocation got a lot better, almost always being better than the users'. Not only that, but with generic code, and optimization rewrites of code, many variables would disappear and new ones would take their place. Different CPUs needed different register allocation strategies. What to do with 'register' then?

The result was compilers began to take the 'register' as a hint, and eventually moved to totally ignoring 'register', as it turned out to be a pessimization.

I suspect that elevating one particular optimization hint to being an absolute command may not turn out well. Inlining already has performance issues, as it may increase the size of an inner loop beyond what will fit in the cache, for just one unexpected result. For another it may mess up the register allocation of the caller. "Inlining makes it faster" is not always true. Do you really want to weld this in as an absolute requirement in the language?
Feb 23 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/23/14, 4:33 PM, Walter Bright wrote:
 On 2/23/2014 3:55 PM, Mike wrote:
 The difference is it was explicitly told to do something and didn't.
 That's
 insubordination.

I'll add an anecdote - in HHVM we owe a lot of speedups to the careful use of "never inline" and "always inline" gcc pragmas IN ADDITION TO the usual "inline" directives. We have factual proof that gcc makes the wrong inline decisions BOTH WAYS if left to decide. If we define pragmas for inlining, "always inline" must mean always inline no questions asked and "never inline" must mean always prevent inlining no questions asked. Anything else would be a frustrating waste of time. Andrei
Feb 23 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/23/14, 8:26 PM, Vladimir Panteleev wrote:
 On Monday, 24 February 2014 at 04:14:08 UTC, Andrei Alexandrescu wrote:
 I'll add an anecdote - in HHVM we owe a lot of speedups to the careful
 use of "never inline" and "always inline" gcc pragmas IN ADDITION TO
 the usual "inline" directives. We have factual proof that gcc makes
 the wrong inline decisions BOTH WAYS if left to decide.

 If we define pragmas for inlining, "always inline" must mean always
 inline no questions asked and "never inline" must mean always prevent
 inlining no questions asked. Anything else would be a frustrating
 waste of time.

I think there is another, distinct use case for an inline pragma where "try to inline" is useful - namely, turning on the equivalent of the compiler "-inline" switch for just one function. I believe this is the original rationale behind the DIP (enabling inlining for certain functions even in debug builds, because otherwise the debug builds become so slow as to be unusable). In this case, whether the compiler actually succeeds at inlining the function doesn't matter as long as it does the same thing as for an optimized (-inline) build. Thus, I think there should be "try to inline" (same as -inline) and "always inline" (failure stops compilation).

Sounds fair enough. Andrei
Feb 23 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 24-Feb-2014 04:33, Walter Bright wrote:
 On 2/23/2014 3:55 PM, Mike wrote:
 The difference is it was explicitly told to do something and didn't.
 That's
 insubordination.

I view this as more in the manner of providing the equivalent of runtime profiling information to the optimizer, in indirectly saying how often a function is executed. Optimizing is a rather complicated process, and particular optimizations very often have weird and unpredictable interactions with other optimizations.

Speaking of other optimizations. There is a thing called tail-call. Funnily enough compilers still consider it an optimization whereas in practice the difference usually means "stack overflow" vs "normal execution" for functional-style code. But I'd rather prefer we stay focused on one particular optimization here.
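
To illustrate the tail-call remark, here is a small sketch in accumulator-passing style next to the loop that a tail-call-eliminating compiler would effectively produce. `sumTo` and `sumToLoop` are hypothetical names, and nothing here assumes that any particular D compiler performs the elimination.

```d
// The recursive call is the last thing the function does, so a compiler that
// eliminates tail calls can reuse the current stack frame for it.
ulong sumTo(ulong n, ulong acc = 0)
{
    if (n == 0)
        return acc;
    return sumTo(n - 1, acc + n);   // tail call
}

// Equivalent loop: constant stack use regardless of n.
ulong sumToLoop(ulong n)
{
    ulong acc = 0;
    for (; n != 0; --n)
        acc += n;
    return acc;
}
```

Without the elimination, `sumTo` uses one stack frame per step, so a large enough `n` overflows the stack while `sumToLoop` keeps running.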
 For example, in the olden days, C compilers had a 'register' storage
 class. Optimizers' register allocation strategy was so primitive it
 needed help. Over time, however, it became apparent that uses of
 'register' became bit-rotted due to maintenance, resulting in all the
 wrong variables being enregistered. Compiler register allocation got a
 lot better, almost always being better than the users'.

When the time comes that the compiler can actually produce the best inlining decisions on its own, these kinds of options may become irrelevant. However, it may need to run a profiler on relevant input to understand that and do it all by itself.
 Not only that,
 but with generic code, and optimization rewrites of code, many variables
 would disappear and new ones would take their place. Different CPUs
 needed different register allocation strategies. What to do with
 'register' then?

Indeed, register was tied to something immaterial - a variable, whereas in fact there are plenty of temporaries and induction variables that a programmer can't label. In contrast, generic code is functions upon functions passed through other tiny functions. This is in part what makes inlining so special.
 The result was compilers began to take the 'register' as a hint, and
 eventually moved to totally ignoring 'register', as it turned out to be
 a pessimization.

 I suspect that elevating one particular optimization hint to being an
 absolute command may not turn out well. Inlining already has performance
 issues, as it may increase the size of an inner loop beyond what will
 fit in the cache, for just one unexpected result. For another it may
 mess up the register allocation of the caller. "Inlining makes it faster" is
 not always true.

Like I'm a bloody idiot. But once your performance problem is (after perusing the ASM) a particular function not being inlined, dancing around the compiler in the DARK until it strikes home (if ever) isn't a viable option. And with DMD, in like 90% of cases my problem is some critical one-liner not being inlined.

In contrast, register allocation is mostly fine. There are some marvelous codegen gems though: https://d.puremagic.com/issues/show_bug.cgi?id=10932 where the compiler moves from ebx to edx via a stack slot for no apparent reason.
 Do you really want to weld this in as an
 absolute requirement in the language?

Aye. That and explicit tail calls but that's a separate matter. Experimental compilers may choose to issue warnings saying that they basically can't inline (yet or by design). -- Dmitry Olshansky
Feb 24 2014
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 24-Feb-2014 03:49, Walter Bright wrote:
 On 2/23/2014 3:00 PM, Dmitry Olshansky wrote:
 You actually going against yourself with this argument - for porting you
 typically suggest:

 version(OS1)
   ...
 else version(OS2)
   ...
 else
 static assert(0);

There's not much choice about that. I also suggest moving such code into separate modules.
 Your nice tired and true way of doing things is EQUALLY FRAGILE (if
 not more)
 and highly coupled to the compiler but only SILENTLY so.

That's very true. Do you suggest the compiler emit a list of what optimizations it did or did not do? What makes inlining special, as opposed to, say, enregistering particular variables?

GCC has these attributes (including flatten to fully inline all calls in a function) for a good reason. Let's face the fact that compilers are nowhere near perfect with decisions about inlining. Especially so when building libraries.

Inlining is special in the sense that the compiler doesn't know (there is not a single hint today in D) if any particular function should be a part of object code (following the ABI and referenced elsewhere) or just a logical piece of code that is reused (using any convenient calling convention or inlined).

Let me turn the question sideways - what if no_inline were a hint to the compiler and it could feel free to inline the function anyway? Would you be happy with such a pragma? It's that simple - you either gain control, or stay with wishy-washy hopes.

As you said, the contrast is with register allocation (a ridiculously hard problem), where with time it turned out that trying to outsmart the compiler is something people were not good at in general.

-- Dmitry Olshansky
Feb 24 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:21 PM, Tove wrote:
 Inspecting asm output doesn't scale well to huge projects. Imagine simply
 updating the existing codebase to use a new compiler version.

Again, this is treating 'inline' as being the only optimization that matters? It's not even the most important - that would likely be register allocation. At some point, you're going to need to trust the compiler.
 You are right in that there is nothing special about inlining, but I'd rather
 add warnings for all other failed optimisation opportunities than not to warn
 about failed inlining. RVCT for instance has --diag_warning=optimizations, which
 gives many helpful hints, such as alias issues: please add "restrict", or
 possible alignment issues etc.

There are *thousands* of optimization patterns. Logging which ones were applied to each expression node would be utterly useless to anyone but a compiler writer. (You can turn this on in debug builds of the compiler and see for yourself.) The most effective log is to look at the asm output. There isn't a substitute. I know that doesn't scale, going back to my point that at some point you're going to have to spot check here and there and otherwise trust the compiler. I know that most programmers don't want to look at the asm output. Whether an error for failed inlining is or is not issued won't change the need to have a look now and then, if you want your code to be the fastest it can be. BTW, although the DIP says the compiler can ignore it, in practice there aren't going to be perverse compilers. Compiler writers want their compilers to be useful, and don't go out of their way to sneakily interpret the spec to do as bad a job as possible. Conversely, the history of programmer-supplied optimizer edicts (see 'register') is not a very good one, as programmers are often not terribly cognizant of the tradeoffs and tend to use overly simplistic rules when applying these edicts. As optimizers improve, they shouldn't be impeded by well-intentioned but wrong optimization edicts. (An early version of my C compiler had a long list of various optimization strategies that could be turned on/off. Never once was any appropriate use made of these. It's why dmd has evolved to simply have -O. -inline is a separate switch for reasons of symbolic debuggability.)
Feb 23 2014
Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 5:45 PM, Brad Roberts wrote:
 At this point, you're starting to argue that the entire DIP isn't relevant.  I
 agree with the majority that if you're going to have the directive, then it
 needs to be enforcement, not suggestion.

1. It provides information to the compiler about runtime frequency that it cannot obtain otherwise. This is very useful information for generating better code.
2. Making it a hard requirement then means the user will have to put versioning in it. It becomes inherently non-portable. There is no way to predict what some other version of some other compiler on some other system will do.
3. In the end, the compiler should make the decision. Inlining does not always result in faster code, as I pointed out in another post.
4. I don't see that users really are asking for inlining or not. They are asking for the fastest code. As such, providing hints about usage frequencies is entirely appropriate. Micromanaging the method used is not so appropriate. After all, the reason one uses a compiler in the first place rather than assembler is to not micromanage the actual instructions.

Perhaps the lesson is the word 'inline' carries certain expectations with it, and the feature would be better positioned as something like:

    pragma(usage, often);
    pragma(usage, rare);
Feb 23 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/23/14, 6:05 PM, Walter Bright wrote:
 4. I don't see that users really are asking for inlining or not. They
 are asking for the fastest code. As such, providing hints about usage
 frequencies is entirely appropriate. Micromanaging the method used is
 not so appropriate. After all, the reason one uses a compiler in the
 first place rather than assembler is to not micromanage the actual
 instructions.

In HHVM we plainly ask for specific decisions on inlining or not. We have a reasonably good understanding of how and where our code has trouble with ICache misses, and adjust our inline decisions and validate using experiments. A decision to force inlining or against it already indicates a failure of the compiler's heuristics to address the situation. Keeping it an option is insisting on failing.
 Perhaps the lesson is the word 'inline' carries certain expectations
 with it, and the feature would be better positioned as something like:

      pragma(usage, often);
      pragma(usage, rare);

That's an interesting unrelated idea. But if we define pragmas to "force inline" and "never inline" we must make damn sure the compiler always does that. It's "listen to your customers" as plainly as it gets. Andrei
Feb 23 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 11:04 AM, Dicebot wrote:
 Optional recommendation for inlining already exists - it is current default.

That is not the point of the pragma. The point of always inlining is (as Manu explained) some functions need to be inlined even in debug mode, as the code would otherwise be too slow to even debug.
Feb 23 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 5:01 AM, Vladimir Panteleev wrote:
 I think there should be some way to force the compiler to inline a function. As
 a bonus, the error message can tell the programmer why the function could not be
 inlined, allowing them to make the necessary adjustments.

 Different compilers will have different inlining capabilities, however at the
 point where programmers are forcing inlining on or off, they are already
 micro-optimizing at a level which implies dependency on particular compiler
 implementations.

I think it would be a porting nuisance to error out when the compiler can't inline. The user would then fix it by versioning out for that compiler, and then the user is back to the same state as it being a recommendation. Generally, when I optimize at that level, I have a window open on the assembler output of the compiler and I go back and forth on the source code until I get the shape of the assembler I need. Having compiler messages wouldn't be very helpful.
Feb 23 2014
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
24-Feb-2014 00:40, Walter Bright пишет:
 On 2/23/2014 5:01 AM, Vladimir Panteleev wrote:
 I think there should be some way to force the compiler to inline a
 function. As
 a bonus, the error message can tell the programmer why the function
 could not be
 inlined, allowing them to make the necessary adjustments.

 Different compilers will have different inlining capabilities, however
 at the
 point where programmers are forcing inlining on or off, they are already
 micro-optimizing at a level which implies dependency on particular
 compiler
 implementations.

I think it would be a porting nuisance to error out when the compiler can't inline. The user would then fix it by versioning out for that compiler, and then the user is back to the same state as it being a recommendation.

Porting across compilers, you mean? While porting, making temporary changes is fine, like turning off force_inline where it doesn't work. Without this error you are facing a silent performance disaster you still need to figure out. Fail fast for the win.
 Generally, when I optimize at that level, I have a window open on the
 assembler output of the compiler and I go back and forth on the source
 code until I get the shape of the assembler I need. Having compiler
 messages wouldn't be very helpful.

It will save you the trouble of looking at the assembly window to begin with, because you know ahead of time that you wouldn't see what you like. -- Dmitry Olshansky
Feb 23 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 1:32 PM, Francesco Cattoglio wrote:
 [...]

I addressed these three messages in another reply to Dmitry.
Feb 23 2014
prev sibling next sibling parent reply "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never 
 inlining. This DIP proposes a simple solution.

What if you want to mark a series of functions to be inlined? E.g. in an entire module:

-----
module fast;

// ??
pragma(inline, true):

Vec vecSum();
Vec vecMul();
-----

Seems like a solution would be preferred where this can be used for multiple functions. A UDA/@property of some sort.
Feb 23 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:31 AM, Andrej Mitrovic wrote:
 What if you want to mark a series of functions to be inlined? E.g. in an entire
 module:

 -----
 module fast;

 // ??
 pragma(inline, true):

 Vec vecSum();
 Vec vecMul();
 -----

That can work because pragmas can have blocks associated with them.
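A minimal sketch of the block form, assuming pragma(inline, ...) is accepted in front of a declaration block the way other pragmas are. The Vec type, the function bodies and vecDot are made up for illustration, and whether a given compiler treats the pragma as a hint or as a hard requirement is exactly what the rest of this thread argues about:

```d
// Hypothetical 2-D vector type, only for illustration.
struct Vec
{
    double x, y;
}

// Block form: the pragma applies to every declaration inside the braces.
pragma(inline, true)
{
    Vec vecSum(Vec a, Vec b) { return Vec(a.x + b.x, a.y + b.y); }
    Vec vecMul(Vec a, double s) { return Vec(a.x * s, a.y * s); }
}

// Declared outside the block, so it keeps the compiler's default behaviour.
double vecDot(Vec a, Vec b) { return a.x * b.x + a.y * b.y; }

void main()
{
    auto v = vecMul(vecSum(Vec(1, 2), Vec(3, 4)), 2);
    assert(v.x == 8 && v.y == 12);
    assert(vecDot(Vec(1, 2), Vec(3, 4)) == 11);
}
```

The colon form from the question (pragma(inline, true): near the top of the module) would instead cover everything that follows it in the enclosing scope.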
Feb 23 2014
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
23-Feb-2014 16:07, Walter Bright пишет:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

Why pragma? Also, how exactly is it supposed to work:

    pragma(inline, true);
    ... // every declaration that follows is forcibly inlined?
    pragma(inline, false);
    ... // every declaration that follows is forcibly NOT inlined?

How to return to normal state then? I think pragma is not attached to declaration. I'd strongly favor introducing a compiler-hint family of UDAs and force_inline/force_notinline as first among many.
-- Dmitry Olshansky
Feb 23 2014
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:38 AM, Dmitry Olshansky wrote:
 Why pragma?

Answered in another post.
 Also how exactly it is supposed to work:

    T func(args)
    {
        ...
        pragma(inline, true);
        ...
    }
 How to return to normal state then?

Not necessary when it's inside a function.
 I'd strongly favor introducing a compiler-hint family of UDAs and
 force_inline/force_notinline as first among many.

I don't see an advantage of that over pragma. It also seems like something that should be inside a function, not outside. (After all, a function with no body cannot be inlined.)
Feb 23 2014
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 23.02.2014 13:38, schrieb Dmitry Olshansky:
 23-Feb-2014 16:07, Walter Bright пишет:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

 Why pragma? Also, how exactly is it supposed to work:

     pragma(inline, true);
     ... // every declaration that follows is forcibly inlined?
     pragma(inline, false);
     ... // every declaration that follows is forcibly NOT inlined?

 How to return to normal state then? I think pragma is not attached to declaration. I'd strongly favor introducing a compiler-hint family of UDAs and force_inline/force_notinline as first among many.

Yeah, it feels strange. Like naked in inline asm, it's a scope changer that sits inside the scope it changes??? It's like making a method public by putting public inside the method. And public is also compiler-relevant for the generated interface, and align is also not a pragma yet still changes code generation. It's a function (compile-time) attribute, but that doesn't mean it has to be a pragma. BTW: is the pragma way just easier to implement? Otherwise I don't understand why this is handled so specially.
Feb 23 2014
prev sibling parent "Joseph Cassman" <jc7919 outlook.com> writes:
On Sunday, 23 February 2014 at 12:50:58 UTC, Walter Bright wrote:
 On 2/23/2014 4:38 AM, Dmitry Olshansky wrote:
 Why pragma?

Answered in another post.
 Also how exactly it is supposed to work:

    T func(args)
    {
        ...
        pragma(inline, true);
        ...
    }
 How to return to normal state then?

Not necessary when it's inside a function.
 I'd strongly favor introducing a compiler-hint family of UDAs 
 and
 force_inline/force_notinline as first among many.

I don't see an advantage of that over pragma. It also seems like something that should be inside a function, not outside. (After all, a function with no body cannot be inlined.)

Thanks for the code example. That helped me better understand what is being proposed. I like the idea of using pragma since it is built specifically for the purpose of sending information to the compiler from code. Also, I like not having to add another keyword to a function definition. Especially since I already have "@safe pure nothrow" in as many places as possible, for inline-able functions I'd prefer to not have to add "inline" to that list. Using a pragma would mean it could be implemented right away without worrying about breaking any existing code. The proposal also satisfies the needs of both parties. Especially since D is a flexible language it would be nice to give such ability to customize code generation to the programmer. Given the above I think this is a good idea. Joseph
Feb 23 2014
prev sibling next sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never 
 inlining. This DIP proposes a simple solution.

This is great. I bet this will be useful. I tend to prefer force-inline/force-not-inline at call site, but realized the proposal will let me do it:

    void myFun(bool inlined)(int arg)
    {
        static if (inlined)
            pragma(inline, true);
        else
            pragma(inline, false);
    }

Then inlining can be entirely explicit :)
Feb 23 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 4:53 AM, ponce wrote:
 On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This DIP
 proposes a simple solution.

This is great. I bet this will be useful. I tend to prefer force-inline/force-not-inline at call site, but realized the proposal will let me do it:

    void myFun(bool inlined)(int arg)
    {
        static if (inlined)
            pragma(inline, true);
        else
            pragma(inline, false);
    }

Then inlining can be entirely explicit :)

Or better:

    void myFun(bool inlined)(int arg)
    {
        pragma(inline, inlined);
    }

:-)
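Spelled out as a compilable sketch: the function name scale and its body are placeholders, and this assumes the pragma's second argument may be any compile-time boolean, which is what the shorter version relies on.

```d
// The template parameter selects the inlining request per instantiation.
int scale(bool inlined)(int arg)
{
    pragma(inline, inlined);
    return arg * 3;
}

void main()
{
    // Each call site picks its own instantiation, so the choice is explicit.
    assert(scale!true(2) == 6);   // instantiation that asks to be inlined
    assert(scale!false(2) == 6);  // instantiation that asks not to be inlined
}
```

Both instantiations compute the same result; only the request made to the inliner differs.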
Feb 23 2014
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 23 February 2014 at 12:57:00 UTC, Walter Bright wrote:
 On 2/23/2014 4:25 AM, Tove wrote:
 The DIP should probably specify what happens if inlining fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.

I think there should be some way to force the compiler to inline a function. As a bonus, the error message can tell the programmer why the function could not be inlined, allowing them to make the necessary adjustments. Different compilers will have different inlining capabilities, however at the point where programmers are forcing inlining on or off, they are already micro-optimizing at a level which implies dependency on particular compiler implementations.
Feb 23 2014
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 23/02/2014 13:07, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This DIP
 proposes a simple solution.

Sounds good in principle. So, if I understand right, a pragma(inline, true) anywhere inside a function adds a compiler hint to always inline this function, while with false it's a hint to _never_ do so, and no pragma at all gives the usual compiler-decides situation? Question: what happens if someone is daft enough to put both true and false inside the same function? In any case, could you possibly provide a slightly more detailed code example with accompanying explanation of what the intended results are?
Feb 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 5:06 AM, Joseph Rushton Wakeling wrote:
 So, if I understand right, a pragma(inline, true)
 anywhere inside a function adds a compiler hint to always inline this function,
 while with false it's a hint to _never_ do so, and no pragma at all gives the
 usual compiler-decides situation?

I'll add:

    pragma(inline);

meaning revert to default behavior.
 Question: what happens if someone is daft enough to put both true and false
 inside the same function?

The last one wins.
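A rough sketch of the three states discussed here, one per function so that the "last one wins" rule never comes into play. The function names are invented for illustration, and the comments describe the requested behaviour under the proposal, not a guarantee from any particular compiler:

```d
// Request: always inline.
int twice(int x)
{
    pragma(inline, true);
    return x * 2;
}

// Request: never inline.
int thrice(int x)
{
    pragma(inline, false);
    return x * 3;
}

// No argument: revert to the compiler's own default heuristics.
int quad(int x)
{
    pragma(inline);
    return x * 4;
}

void main()
{
    assert(twice(5) == 10);
    assert(thrice(5) == 15);
    assert(quad(5) == 20);
}
```

Keeping a single pragma per function sidesteps the question of which of two conflicting requests takes effect.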
Feb 23 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 12:31 PM, Andrej Mitrovic wrote:
 On Sunday, 23 February 2014 at 20:29:19 UTC, Walter Bright wrote:
 I'll add:

     pragma(inline);

That's just going to confuse people, because they'll think *this* forces inlining.

Perhaps, but there's precedent with how align works, and how default initialization of variables works.
 I'd prefer 3 separate states. pragma(inline), pragma(no_inline), and
 pragma(default_inline) or something like that.

That makes documentation with a sorted list of pragmas impractical.
Feb 23 2014
prev sibling next sibling parent "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Sunday, 23 February 2014 at 13:07:27 UTC, Dmitry Olshansky 
wrote:
 It's going to be near useless if it doesn't make sure inlining 
 happened.

Feb 23 2014
prev sibling next sibling parent "Tove" <tove fransson.se> writes:
On Sunday, 23 February 2014 at 12:57:00 UTC, Walter Bright wrote:
 On 2/23/2014 4:25 AM, Tove wrote:
 The DIP should probably specify what happens if inlining fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.

Would assert be feasible or difficult to implement with the current compiler design?

    static assert(pragma(inline, true));
Feb 23 2014
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 23 February 2014 14:19, Tove <tove fransson.se> wrote:
 On Sunday, 23 February 2014 at 12:57:00 UTC, Walter Bright wrote:
 On 2/23/2014 4:25 AM, Tove wrote:
 The DIP should probably specify what happens if inlining fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.

Would assert be feasible or difficult to implement with the current compiler design?

    static assert(pragma(inline, true));

WAT!
Feb 23 2014
prev sibling next sibling parent "Joseph Cassman" <jc7919 outlook.com> writes:
On Sunday, 23 February 2014 at 13:07:27 UTC, Dmitry Olshansky 
wrote:
 23-Feb-2014 16:57, Walter Bright пишет:
 On 2/23/2014 4:25 AM, Tove wrote:
 The DIP should probably specify what happens if inlining 
 fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.

It's going to be near useless if it doesn't make sure inlining happened. Part of the reason for forced inline is always inlining some core primitives, even in debug builds. The other point is what Vladimir mentioned - we are already doing micro-optimization, hence it had better error out than turn a blind eye to our tinkering. I would not like to ever have to get down and look at ASM for every function just to make sure it was inlined.

That is most likely when I would make use of the concept too. And a message from the compiler in its output telling me when such an inline request failed would be helpful. Joseph
Feb 23 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 23 February 2014 at 13:07:27 UTC, Dmitry Olshansky 
wrote:
 It's going to be near useless if it doesn't make sure inlining 
 happened.
 Part of the reason for forced inline is always inlining some 
 core primitives, even in debug builds.

Optional recommendation for inlining already exists - it is the current default. This pragma needs to result in a compile-time error if used where inlining is not possible to be of any use. Other than that, looks fine.
Feb 23 2014
prev sibling next sibling parent "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Sunday, 23 February 2014 at 20:29:19 UTC, Walter Bright wrote:
 I'll add:

     pragma(inline);

That's just going to confuse people, because they'll think *this* forces inlining. I'd prefer 3 separate states. pragma(inline), pragma(no_inline), and pragma(default_inline) or something like that.
Feb 23 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 23 February 2014 at 20:40:44 UTC, Walter Bright wrote:
 Generally, when I optimize at that level, I have a window open 
 on the assembler output of the compiler and I go back and forth 
 on the source code until I get the shape of the assembler I 
 need. Having compiler messages wouldn't be very helpful.

OK, say you are at this point: you check the assembly and find out that the compiler ignores your recommendation with no error messages or explanations. Next step?
Feb 23 2014
prev sibling next sibling parent "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Sunday, 23 February 2014 at 20:40:44 UTC, Walter Bright wrote:
 Generally, when I optimize at that level, I have a window open 
 on the assembler output of the compiler and I go back and forth 
 on the source code until I get the shape of the assembler I 
 need. Having compiler messages wouldn't be very helpful.

Not everyone has time/knowledge for checking the ASM at every recompile. Personally I wouldn't be able to do something like this that often, and yet I'd love to know that something is not working ASAP. Code changes, and it changes a lot during development. Having a way to make sure that one or more functions stay inlined is handy to have. If such a pragma doesn't guarantee inlining, that means we will have no way to check it quickly. Sometimes fail fast is really the best choice.
Feb 23 2014
prev sibling next sibling parent "tn" <no email.com> writes:
On Sunday, 23 February 2014 at 21:38:46 UTC, Walter Bright wrote:
 On 2/23/2014 12:31 PM, Andrej Mitrovic wrote:
 I'd prefer 3 separate states. pragma(inline), 
 pragma(no_inline), and
 pragma(default_inline) or something like that.

That makes documentation with a sorted list of pragmas impractical.

    pragma(inline_always)
    pragma(inline_never)
    pragma(inline_default)

or

    pragma(inline_force)
    pragma(inline_prevent)
    pragma(inline_default)
Feb 23 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Sunday, 23 February 2014 at 21:53:43 UTC, Walter Bright wrote:
 On 2/23/2014 1:04 PM, Dmitry Olshansky wrote:
 That programmer is instantly aware that it can't be done due 
 to some reason.
 Keep in mind that code changes with time and running 
 profiler/disassembler on
 every tiny change to make sure the stuff is still inlined is 
 highly
 counter-productive.

 I'm aware of that, but once you add the:

     version(BadCompiler) { } else pragma(inline, true);

Once one resorts to force_inline and similar micro-optimisations, one usually sticks to a single "good" compiler, as code gen needs to be re-profiled for each compiler anyway.
Feb 23 2014
prev sibling next sibling parent "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Sunday, 23 February 2014 at 21:55:11 UTC, Walter Bright wrote:
 On 2/23/2014 1:32 PM, Francesco Cattoglio wrote:
 [...]

I addressed these three messages in another reply to Dmitry.

Read that, and you do make a point. I am no expert on optimization, but as far as I could tell, inlining is usually the easiest and most rewarding of the optimizations one can do. I know you kind of hate warnings, but perhaps we could at least get a warning if something cannot be inlined?
Feb 23 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
As a compromise diagnostics about refused inlining can be added 
as special output category to 
https://github.com/D-Programming-Language/dmd/pull/645
Feb 23 2014
prev sibling next sibling parent "QAston" <qaston gmail.com> writes:
On Sunday, 23 February 2014 at 21:53:43 UTC, Walter Bright wrote:
 I'm aware of that, but once you add the:

     version(BadCompiler) { } else pragma(inline, true);

 things will never get better for BadCompiler.

This is exactly what caused the mess with HTTP user agent strings, when both browsers tried to present web pages better and web devs tried to tune their pages to browsers with distinct features. Now Chrome says it's Mozilla, KHTML, Gecko and Safari. But is that really a problem? I don't think much code relies on compiler intrinsics. If it does, perhaps a way to specify attributes in one place and then reference those (like a CUSTOM_INLINE define in C) would help.
Feb 23 2014
prev sibling next sibling parent "Mike" <none none.com> writes:
On Sunday, 23 February 2014 at 23:49:57 UTC, Walter Bright wrote:
 What makes inlining special, as opposed to, say, enregistering 
 particular variables?

The difference is it was explicitly told to do something and didn't. That's insubordination. Mike
Feb 23 2014
prev sibling next sibling parent "Tove" <tove fransson.se> writes:
On Sunday, 23 February 2014 at 21:53:43 UTC, Walter Bright wrote:
 I'm aware of that, but once you add the:

     version(BadCompiler) { } else pragma(inline, true);

 things will never get better for BadCompiler. And besides, that 
 line looks awful.

If I need to support multiple compilers and one of them is not good enough, I would first try to figure out which statement causes it to fail. If left with no other alternatives, I would manually inline it in the common path for all compilers, _not_ create version blocks. Inspecting asm output doesn't scale well to huge projects. Imagine simply updating the existing codebase to use a new compiler version. Based on my experience, even if we are profiling and benchmarking a lot and have many performance-based KPIs, they will still never be as fine-grained as the functional test coverage. Also not forgetting, some performance issues may only be detected in live usage scenarios on the other side of the earth, as the developers don't even have access to the needed environment (only imperfect simulations). In those scenarios you are quite grateful for every static compilation error/warning you can get... You are right in that there is nothing special about inlining, but I'd rather add warnings for all other failed optimisation opportunities than not to warn about failed inlining. RVCT for instance has --diag_warning=optimizations, which gives many helpful hints, such as alias issues: please add "restrict", or possible alignment issues etc.
Feb 23 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Monday, 24 February 2014 at 00:33:09 UTC, Walter Bright wrote:
 I suspect that elevating one particular optimization hint to 
 being an absolute command may not turn out well. Inlining 
 already has performance issues, as it may increase the size of 
 an inner loop beyond what will fit in the cache, for just one 
 unexpected result. For another it may mess up the register 
 allocation of the caller. "Inlining makes it faster" is not 
 always true. Do you really want to weld this in as an absolute 
 requirement in the language?

The fact that the original C "inline" was designed in the same "permissive" way and is almost unused in practice (as opposed to compiler-specific force_inline attributes) does say something. It is not a feature that should be designed for mass usage.
Feb 23 2014
prev sibling next sibling parent Xavier Bigand <flamaros.xavier gmail.com> writes:
Le 23/02/2014 13:07, Walter Bright a écrit :
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

Many times I have seen C++ developers, working on applications that don't need this level of optimization, put the inline keyword or implementations in header files without doing any performance analysis!!! And from what I saw, they don't know x86 either... For those who don't have the necessary knowledge, it is just counterproductive and increases compilation times with no evidence of any benefit. So my point is that this kind of feature has to be hidden from newbies (like me) and other overzealous developers.
Feb 23 2014
prev sibling next sibling parent "Araq" <rumpf_a web.de> writes:
 The fact that original C "inline" was designed in same 
 "permissive" way and is almost unused in practice (as opposed 
 to compiler-specific force_inline attributes) does say 
 something.

Would you mind backing up your "fact" with some numbers? Afaict 'inline' is more common than __attribute__((always_inline)). (Well OK, for C code #define is even more common, but most C code is stuck in the 70s anyway so that doesn't mean anything.)
Feb 23 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/23/14, 4:07 AM, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

This makes inlining dependent on previously-seen code. Would that make parallel compilation more difficult? I've always thought the obvious/simple way would be an attribute such as @forceinline and @noinline that applies to individual functions. Andrei
Feb 23 2014
next sibling parent "Meta" <jared771 gmail.com> writes:
On Monday, 24 February 2014 at 01:12:56 UTC, Andrei Alexandrescu 
wrote:
 This makes inlining dependent on previously-seen code. Would 
 that make parallel compilation more difficult?

 I've always thought the obvious/simple way would be an 
 attribute such as @forceinline and @noinline that applies to 
 individual functions.


 Andrei

That seems to be how Rust does it, but I'm not really clear how attributes work in Rust. http://static.rust-lang.org/doc/master/rust.html#inline-attributes
Feb 23 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I've always thought the obvious/simple way would be an 
 attribute such as @forceinline and @noinline that applies to 
 individual functions.

Seems good. And what do you think the D compiler should do when you use forceinline and it can't inline? Bye, bearophile
Feb 23 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/23/14, 5:55 PM, bearophile wrote:
 Andrei Alexandrescu:

 I've always thought the obvious/simple way would be an attribute such
 as @forceinline and @noinline that applies to individual functions.

Seems good. And what do you think the D compiler should do when you use forceinline and it can't inline?

Compile-time error, no two ways about it. Andrei
Feb 23 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 5:12 PM, Andrei Alexandrescu wrote:
 This makes inlining dependent on previously-seen code. Would that make parallel
 compilation more difficult?

I don't understand the question. Inlining always depends on the compiler having seen the function body.
 I've always thought the obvious/simple way would be an attribute such as
 @forceinline and @noinline that applies to individual functions.

Since inlining can't be done without the function body, putting the pragma in the function body makes sense.
Feb 23 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/23/14, 6:12 PM, Walter Bright wrote:
 On 2/23/2014 5:12 PM, Andrei Alexandrescu wrote:
 This makes inlining dependent on previously-seen code. Would that make
 parallel
 compilation more difficult?

I don't understand the question. Inlining always depends on the compiler having seen the function body.

Decision to inline at line 2000 may be caused by a pragma in line 2. Andrei
Feb 23 2014
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 8:18 PM, Andrei Alexandrescu wrote:
 On 2/23/14, 6:12 PM, Walter Bright wrote:
 On 2/23/2014 5:12 PM, Andrei Alexandrescu wrote:
 This makes inlining dependent on previously-seen code. Would that make
 parallel
 compilation more difficult?

I don't understand the question. Inlining always depends on the compiler having seen the function body.

Decision to inline at line 2000 may be caused by a pragma in line 2.

I still don't understand the question. Successfully compiling anything in D can have dependencies on arbitrary other parts of the code. Why would inlining be any different, or be a special problem?
Feb 24 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/24/14, 12:55 AM, Walter Bright wrote:
 On 2/23/2014 8:18 PM, Andrei Alexandrescu wrote:
 On 2/23/14, 6:12 PM, Walter Bright wrote:
 On 2/23/2014 5:12 PM, Andrei Alexandrescu wrote:
 This makes inlining dependent on previously-seen code. Would that make
 parallel
 compilation more difficult?

I don't understand the question. Inlining always depends on the compiler having seen the function body.

Decision to inline at line 2000 may be caused by a pragma in line 2.

I still don't understand the question. Successfully compiling anything in D can have dependencies on arbitrary other parts of the code. Why would inlining be any different, or be a special problem?

Probably it makes no difference, sorry for the distraction. Andrei
Feb 24 2014
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
On 2/23/14, 5:05 PM, Walter Bright wrote:
 On 2/23/2014 4:21 PM, Tove wrote:
 Inspecting asm output doesn't scale well to huge projects. Imagine simply
 updating the existing codebase to use a new compiler version.

Again, this is treating 'inline' as being the only optimization that matters? It's not even the most important - that would likely be register allocation. At some point, you're going to need to trust the compiler.

At this point, you're starting to argue that the entire DIP isn't relevant. I agree with the majority that if you're going to have the directive, then it needs to be enforcement, not suggestion.
Feb 23 2014
prev sibling next sibling parent reply Lionello Lunesu <lionello lunesu.remove.com> writes:
On 23/02/14 20:07, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

    void A() { }

    void B()
    {
        pragma(inline, true) A();
    }

    void C()
    {
        B();
    }

Reading that code, I would guess that within B(), the call to A() would get inlined. Reading the DIP, it appears that the pragma controls whether B() gets inlined.

When the pragma is used outside of the scope at the function declaration it would work more like "inline" or "__inline" in C++, correct?

L.
Feb 23 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/23/2014 6:12 PM, Lionello Lunesu wrote:
 On 23/02/14 20:07, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

 void A() { }

 void B()
 {
     pragma(inline, true) A();

No. This would be:

    pragma(inline, true);
    A();

and then B() will be inlined when it is encountered.
 }

 void C()
 {
    B();
 }

 Reading that code, I would guess that within B(), the call to A() would get
 inlined. Reading the DIP, it appears that the pragma controls whether B() gets
 inlined.

 When the pragma is used outside of the scope at the function declaration it
 would work more like "inline" or "__inline" in C++, correct?

Yes.
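
To make the two placements discussed above concrete, here is a minimal sketch. It assumes a compiler that implements pragma(inline) along the lines of the DIP; the function names a and b are made up for illustration, and whether a failed request is an error or only a hint is exactly what this thread is still arguing about.

```d
import std.stdio;

// Statement form: inside a function body, the pragma applies to the
// enclosing function, so this requests inlining of a() itself.
int a(int x)
{
    pragma(inline, true);
    return x + 1;
}

// Declaration form: placed in front of the declaration, the pragma
// applies to b(), here asking that b() never be inlined.
pragma(inline, false)
int b(int x)
{
    return a(x) * 2; // a() is a candidate for inlining at this call
}

void main()
{
    writeln(b(20)); // prints 42
}
```

In both forms the pragma describes the function it is attached to, not the individual calls that appear in its body.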
Feb 24 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 23 Feb 2014 21:05:32 -0500, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 2/23/2014 5:45 PM, Brad Roberts wrote:
 At this point, you're starting to argue that the entire DIP isn't  
 relevant.  I
 agree with the majority that if you're going to have the directive,  
 then it
 needs to be enforcement, not suggestion.

1. It provides information to the compiler about runtime frequency that it cannot obtain otherwise. This is very useful information for generating better code.

But you are under-utilizing the message. There is the case that one wants inlining, even when -inline isn't passed to the compiler, for functions that would have been inlined if -inline was specified. That is your case, right? But there is a case where the compiler for some reason has decided that inlining a function is not worth it, so even with -inline it doesn't do it. However, without the inlining, the function becomes horrendously slow. For example, functions that contain lazy parameters.
 2. Making it a hard requirement then means the user will have to put  
 versioning in it. It becomes inherently non-portable. There is no way to  
 predict what some other version of some other compiler on some other  
 system will do.

This is not a problem. The whole point is, if the compiler doesn't support the inlining, the code is useless. I WANT it to fail, there is no reason to version it out.
 3. In the end, the compiler should make the decision. Inlining does not  
 always result in faster code, as I pointed out in another post.

Huh? Then why even have the feature if the compiler is going to ignore your request! This feature sounds completely useless to me, it certainly adds no real value that warrants adding a pragma. It may as well be called pragma(please_inline_pretty_pretty_please_ill_be_your_best_friend)
 4. I don't see that users really are asking for inlining or not. They  
 are asking for the fastest code. As such, providing hints about usage  
 frequencies are entirely appropriate. Micromanaging the method used is  
 not so appropriate. After all, the reason one uses a compiler in the  
 first place rather than assembler is to not micromanage the actual  
 instructions.

Compilers are not infallible. They may make mistakes, or not have enough information, which is the point of this feature. What is to say they don't make mistakes even with the correct amount of information? And the reason I use a compiler rather than assembler is because I hate writing assembler :)
 Perhaps the lesson is the word 'inline' carries certain expectations  
 with it, and the feature would be better positioned as something like:

      pragma(usage, often);
      pragma(usage, rare);

This is totally the wrong tack. First, I may have no idea how often a function will be used. Second, usage frequency has nothing to do with how inlining may affect the performance of an individual call. If an inlined function always executes faster than calling the function, I always want to inline. For example, foo:

    void foo(ref int x)
    {
        ++x;
    }

-Steve
Feb 23 2014
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 24 February 2014 at 04:14:08 UTC, Andrei Alexandrescu 
wrote:
 I'll add an anecdote - in HHVM we owe a lot of speedups to the 
 careful use of "never inline" and "always inline" gcc pragmas 
 IN ADDITION TO the usual "inline" directives. We have factual 
 proof that gcc makes the wrong inline decisions BOTH WAYS if 
 left to decide.

 If we define pragmas for inlining, "always inline" must mean 
 always inline no questions asked and "never inline" must mean 
 always prevent inlining no questions asked. Anything else would 
 be a frustrating waste of time.

I think there is another, distinct use case for an inline pragma where "try to inline" is useful - namely, turning on the equivalent of the compiler "-inline" switch for just one function. I believe this is the original rationale behind the DIP (enabling inlining for certain functions even in debug builds, because otherwise the debug builds become so slow as to be unusable). In this case, whether the compiler actually succeeds at inlining the function doesn't matter as long as it does the same thing as for an optimized (-inline) build. Thus, I think there should be "try to inline" (same as -inline) and "always inline" (failure stops compilation).
Feb 23 2014
prev sibling next sibling parent "francesco cattoglio" <francesco.cattoglio gmail.com> writes:
On Monday, 24 February 2014 at 02:05:31 UTC, Walter Bright wrote:
 1. It provides information to the compiler about runtime 
 frequency that it cannot obtain otherwise. This is very useful 
 information for generating better code.

"inline" a special optimization.
 3. In the end, the compiler should make the decision. Inlining 
 does not always result in faster code, as I pointed out in 
 another post.

inlining never made my code slower. But I do realize this is not relevant to the discussion.
 Perhaps the lesson is the word 'inline' carries certain 
 expectations with it, and the feature would be better 
 positioned as something like:

     pragma(usage, often);
     pragma(usage, rare);

for the compiler to comply. If the plan is hinting frequency information, then "usage" makes way more sense. It might be used in if blocks and in switch cases too, when branch prediction might be sloppy or suboptimal.
Feb 23 2014
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Feb 24, 2014 1:15 AM, "Andrei Alexandrescu" <
SeeWebsiteForEmail erdani.org> wrote:
 On 2/23/14, 4:07 AM, Walter Bright wrote:
 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This
 DIP proposes a simple solution.

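 This makes inlining dependent on previously-seen code. Would that make
 parallel compilation more difficult?

 I've always thought the obvious/simple way would be an attribute such as
 forceinline and noinline that applies to individual functions.

 Andrei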

GDC already has both of these as a compiler extended attribute (need to document these!!!)

    import gcc.attribute;

    attribute("forceinline") ...

Being backend attributes, you can't enforce that these attributes actually take effect in user code (no static asserts!) - but you have some guarantee in that the backend will complain if it can't apply the attribute - this is good because the compiler will always produce a better diagnostic than some user static assert, always.

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Feb 23 2014
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Feb 24, 2014 2:10 AM, "Walter Bright" <newshound2 digitalmars.com> wrote:
 On 2/23/2014 5:45 PM, Brad Roberts wrote:
 At this point, you're starting to argue that the entire DIP isn't relevant.  I
 agree with the majority that if you're going to have the directive, then it
 needs to be enforcement, not suggestion.

 1. It provides information to the compiler about runtime frequency that it
 cannot obtain otherwise. This is very useful information for generating
 better code.

 2. Making it a hard requirement then means the user will have to put
 versioning in it. It becomes inherently non-portable. There is no way to
 predict what some other version of some other compiler on some other system
 will do.

 3. In the end, the compiler should make the decision. Inlining does not
 always result in faster code, as I pointed out in another post.

 4. I don't see that users really are asking for inlining or not. They are
 asking for the fastest code. As such, providing hints about usage
 frequencies are entirely appropriate. Micromanaging the method used is not
 so appropriate. After all, the reason one uses a compiler in the first place
 rather than assembler is to not micromanage the actual instructions.

 Perhaps the lesson is the word 'inline' carries certain expectations with
 it, and the feature would be better positioned as something like:

     pragma(usage, often);
     pragma(usage, rare);

Also known as, hot and cold functions.

Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Feb 23 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Monday, 24 February 2014 at 01:09:46 UTC, Araq wrote:
 Do you mind to back up your "fact" with some numbers? Afaict 
 'inline' is more common than __attribute__((forceinline)). 
 (Well ok for C code #define is even more common, but most C 
 code is stuck in the 70ies anyway so that doesn't mean 
 anything.)

I can't link you to closed projects I have worked on before, so you surely cannot just trust my memories. Normal `inline` is common in headers because you can't have non-inlined function bodies in headers. In actual translation units it comes only from those who actually expect it to have a forceinline effect (I have not met a single case where adding it made any difference to gcc's decision to inline or not). This was my actual point: not that no one uses "inline", but that the very same lax definition has turned it essentially into a no-op, creating the need for a compiler-specific alternative.
Feb 24 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Monday, 24 February 2014 at 02:05:31 UTC, Walter Bright wrote:
     pragma(usage, often);
     pragma(usage, rare);

This is also a useful feature, especially when also applicable to if branches (I have been using __builtin_expect quite a lot with GCC). But it is different; I think we need both.
Feb 24 2014
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
This will probably do, but I still don't understand why not a function
attribute?

Will marking a function as inline notify the compiler that code should
never be emitted to object files for that function?

Perhaps OT:
I've been playing with ranges a lot recently, and std.algorithm and
friends, and I'm finding that using lambdas is a real problem. They don't
reliably inline, and the optimiser seems to have problems on occasion even
when they do. (Perhaps they inline at the wrong stage?)
How can we have some guarantees about the inlining and inline-ability of
trivial lambdas?
I'm very concerned about the performance of debug code when using something
like filter!"condition", which results in a whole bunch of extra function
calls per loop iteration.
I raised a thread recently about the idea of adding an additional optional
argument to foreach to provide a filtering or termination condition, which
if implemented by the language would have no overhead cost. The suggestion
was to use filter!"", which sounds like a reasonable idea, but I'm really
worried about the performance implications of using library primitives that
produce a bunch of extra function calls on every loop cycle. I'm not sure
these are practical when used in sufficiently trivial loops. Imagine I'm
looping over a vertex array or an image or something, skipping over
transparent pixels, or something like that... millions of iterations
performing very trivial transformation, calling a bunch of functions every
cycle.
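
As a rough illustration of the concern above, the sketch below puts a filter-based pipeline next to the equivalent hand-written loop. The pixel data and variable names are invented for the example, and it makes no claim about what any particular compiler actually inlines; it only shows where the extra calls would come from if the lambda and the range primitives are left as real function calls.

```d
import std.algorithm : filter, sum;
import std.stdio;

void main()
{
    // Hypothetical data: 0 stands for a transparent pixel to skip.
    int[] pixels = [0, 5, 0, 7, 3];

    // Range version: every element goes through the predicate lambda and
    // the range primitives (empty/front/popFront) of the filter result.
    // If those are not inlined, each is a call per loop iteration.
    auto viaFilter = pixels.filter!(p => p != 0).sum;

    // Hand-written version: the condition sits directly in the loop body.
    int viaLoop = 0;
    foreach (p; pixels)
        if (p != 0)
            viaLoop += p;

    writeln(viaFilter, " ", viaLoop); // prints 15 15
}
```

Both versions compute the same result; the open question in this thread is only whether the first one can be relied on to compile down to something like the second, particularly in debug builds.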

On 23 February 2014 22:07, Walter Bright <newshound2 digitalmars.com> wrote:

 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This DIP
 proposes a simple solution.

Feb 24 2014
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 25 February 2014 02:30, Manu <turkeyman gmail.com> wrote:

 This will probably do, but I still don't understand why not a function
 attribute?

Note: GDC and LDC already have inline attributes. It's a pain in the arse to use them though, since in D, we have no way to alias attributes and can't do the typical C preprocessor tricks to insert appropriate attributes for different compilers.
I'd strongly encourage considering making it an attribute for the reason that all compilers could then share the same attribute, rather than remaining fragmented as it is.

 Will marking a function as inline notify the compiler that code should
 never be emitted to object files for that function?

 Perhaps OT:
 I've been playing with ranges a lot recently, and std.algorithm and
 friends, and I'm finding that using lambdas is a real problem. They don't
 reliably inline, and the optimiser seems to have problems on occasion even
 when they do. (Perhaps they inline at the wrong stage?)
 How can we have some guarantees about the inlining and inline-ability of
 trivial lambdas?
 I'm very concerned about the performance of debug code when using
 something like filter!"condition", which results in a whole bunch of extra
 function calls per loop iteration.
 I raised a thread recently about the idea of adding an additional optional
 argument to foreach to provide a filtering or termination condition, which
 if implemented by the language would have no overhead cost. The suggestion
 was to use filter!"", which sounds like a reasonable idea, but I'm really
 worried about the performance implications of using library primitives that
 produce a bunch of extra function calls on every loop cycle. I'm not sure
 these are practical when used in sufficiently trivial loops. Imagine I'm
 looping over a vertex array or an image or something, skipping over
 transparent pixels, or something like that... millions of iterations
 performing very trivial transformation, calling a bunch of functions every
 cycle.

 On 23 February 2014 22:07, Walter Bright <newshound2 digitalmars.com> wrote:

 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This DIP
 proposes a simple solution.


Feb 24 2014
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 23 February 2014 22:57, Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/23/2014 4:25 AM, Tove wrote:

 The DIP should probably specify what happens if inlining fails,
 i.e. generate a compilation error.

I suspect that may cause problems, because different compilers will have different inlining capabilities. I think it should be a 'recommendation' to the compiler.

Does this depend on how it is implemented?
Will DMD just patch it directly into the AST like a mixin in the front end, or is it always left to the back end?
Feb 24 2014
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 23 February 2014 22:55, Walter Bright <newshound2 digitalmars.com> wrote:

 On 2/23/2014 4:53 AM, ponce wrote:

 On Sunday, 23 February 2014 at 12:07:40 UTC, Walter Bright wrote:

 http://wiki.dlang.org/DIP56

 Manu has needed always inlining, and I've needed never inlining. This DIP
 proposes a simple solution.

 This is great. I bet this will be useful.

 I tend to prefer force-inline/force-not-inline at call site, but realized the
 proposal will let me do it:

 void myFun(bool inlined)(int arg)
 {
     static if (inlined)
         pragma(inline, true);
     else
         pragma(inline, false);
 }

 Then inlining can be entirely explicit :)

 Or better:

 void myFun(bool inlined)(int arg)
 {
     pragma(inline, inlined);
 }

 :-)

Really? I think you're just trying to be different for the sake of being different :P
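
For reference, the call-site trick quoted above could look like the following from the caller's side. This is only a sketch: the helper name scale and its body are made up, and it assumes a compiler that accepts a compile-time boolean expression as the pragma's second argument, as the DIP proposes.

```d
import std.stdio;

// Hypothetical helper: the template flag selects the inlining request,
// so each instantiation is a distinct function with its own setting.
int scale(bool inlined)(int arg)
{
    pragma(inline, inlined);
    return arg * 3;
}

void main()
{
    writeln(scale!true(2));  // instantiation that requests inlining
    writeln(scale!false(2)); // instantiation that requests no inlining
}
```

The cost of this approach is that the two instantiations are separate functions, so choosing both at different call sites duplicates the function body in the binary.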
Feb 24 2014
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 24 February 2014 07:53, Walter Bright <newshound2 digitalmars.com> wrote:

  With error - you get a huge advantage - an _instant_ feedback that it
 doesn't do
 what you want it to do. Otherwise it gets the extra pleasure of running
 disassembler to pinpoint your favorite call sites or observing that your
 profiler shows the same awful stats.

My point is you're going to have to look at the asm of the top functions on the profiler stats anyway, or you're wasting your time trying to optimize the code. (Speaking from considerable experience doing that.) There's a heluva lot more to optimizing effectively than inlining, and it takes some back-and-forth tweaking source code and looking at the assembler. I gave some examples of that above.

For those interested, in my experience, the value of inlining is rarely related to eliminating the cost of the function call. call and ret have virtually no impact on performance on any architecture I've used. The main value is that it eliminates stuffing around with parameter lists, and managing save registers. Also, some argument types can't pass in registers, which means they pass through memory, and memory access should be treated no differently from the hard drive in realtime code ;) .. The worst case is a write followed by an immediate read (non-register argument, or save register value); some architectures stall waiting for the full flush before they can read it back. It's called a Load-Hit-Store hazard, and it's the most expensive low level hazard short of an L2 miss. But the most important use by far is that you can control which functions are leaf functions. Leaf functions (functions that don't allocate a stack frame at all) are critical for good performance. Any small helper functions you call MUST be inlined, or your function is no longer eligible to be a leaf function. I agree that inline should be a hint (a STRONG hint, not like 'inline' in C, more like __force_inline, perhaps stronger), but I'd like it if I received a warning when it failed for whatever reason. I don't want it to stop compiling, but a nice notification that I should look into it, and the ability to disable/silence the warning if I can't/don't intend to. 
Feb 24 2014
prev sibling next sibling parent "ponce" <contact gam3sfrommars.fr> writes:
On Monday, 24 February 2014 at 02:05:31 UTC, Walter Bright wrote:
 1. It provides information to the compiler about runtime 
 frequency that it cannot obtain otherwise. This is very useful 
 information for generating better code.

 2. Making it a hard requirement then means the user will have 
 to put versioning in it. It becomes inherently non-portable. 
 There is no way to predict what some other version of some 
 other compiler on some other system will do.

I've never hit that limitation with ICC. Like others, I would like unconditional and explicit optimization from the compiler.
 3. In the end, the compiler should make the decision. Inlining 
 does not always result in faster code, as I pointed out in 
 another post.

Also, when I use "force inline" it's very often to force "not-inline", to reuse the same bit of code where the compiler would otherwise have inlined it. Each optimization here is taken through a repeatable automated A-B test with 95% statistical significance on various inputs, and forcing inline/not-inline has been an effective tool to reduce the I-cache stress that plagues some very particular program areas that the compiler doesn't differentiate. This can be checked by looking at the assembly or binary size afterwards. I'm perfectly OK with the compiler doing what it wants when I don't tell it to inline or not. AFAIK the C/C++ inline keyword is mostly ignored by optimizing compilers; it's precisely a keyword that is both overused and meaningless.
 Perhaps the lesson is the word 'inline' carries certain 
 expectations with it, and the feature would be better 
 positioned as something like:

     pragma(usage, often);
     pragma(usage, rare);

To me it's not so much about usage frequency as about I-cache misses. Some inlining can be nearly free (small I-cache working set), or very costly (I-cache actively being the bottleneck through repeated misses due to a large working set).
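A minimal sketch of the "force not-inline" case, assuming the `pragma(inline, false)` spelling from DIP56. The functions `reportOverflow` and `sumClamped` are hypothetical; the idea is only that a rarely taken path is emitted once instead of being duplicated into every hot call site. Whether this actually helps the I-cache has to be measured, as described in the post.

```d
// Hypothetical cold path: kept out of line so its body exists once in the
// binary rather than being copied into each caller.
pragma(inline, false)
int reportOverflow(int value, int limit)
{
    // Imagine bulky diagnostics here; the sketch only clamps to the limit.
    return limit;
}

// Hot loop: the common case stays compact, the rare case pays for a call.
int sumClamped(const(int)[] values, int limit)
{
    int total = 0;
    foreach (v; values)
        total += (v > limit) ? reportOverflow(v, limit) : v;
    return total;
}
```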
Feb 24 2014
prev sibling next sibling parent "Kapps" <opantm2+spam gmail.com> writes:
On Monday, 24 February 2014 at 16:58:21 UTC, Manu wrote:
 I agree that inline should be a hint (a STRONG hint, not like 
 'inline' in
 C, more like __force_inline, perhaps stronger), but I'd like it 
 if I
 received a warning when it failed for whatever reason. I don't 
 want it to
 stop compiling, but a nice notification that I should look into 
 it, and the
 ability to disable/silence the warning if I can't/don't intend 
 to.

Perhaps something like a -vinline similar to -vtls? You don't need to be spammed repeatedly every time you build saying something isn't inlined, yet this still gives an easy way of seeing which methods you requested to be inlined that were not. The flag would display only functions marked with pragma(inline, true).
Feb 24 2014
prev sibling next sibling parent "Dicebot" <public dicebot.lv> writes:
On Monday, 24 February 2014 at 18:00:39 UTC, Kapps wrote:
 Perhaps something like a -vinline similar to -vtls? You don't
 need to be spammed repeatedly every time you build saying
 something isn't inlined, yet this still gives an easy way of
 seeing which methods you requested to be inlined that were not.
 The flag would display only functions marked with pragma(inline,
 true).

As I have already mentioned in this thread, there is already a pull request that adds a flag to print inlining diagnostics. It can be reused once merged.
Feb 24 2014
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Monday, 24 February 2014 at 16:58:21 UTC, Manu wrote:
 For those interested, in my experience, the value of inlining 
 is rarely
 related to eliminating the cost of the function call. call and 
 ret have
 virtually no impact on performance on any architecture I've 
 used.

It highly depends on the architecture you run on. X86 is astonishingly good at this.
 The main value is that it eliminates stuffing around with 
 parameter lists,
 and managing save registers. Also, some argument types can't 
 pass in
 registers, which means they pass through memory, and memory 
 access should
 be treated no differently from the hard drive in realtime code 
 ;) .. The
 worst case is a write followed by an immediate read 
 (non-register argument,
 or save register value); some architectures stall waiting for 
 the full
 flush before they can read it back. It's called a 
 Load-Hit-Store hazard,
 and it's the most expensive low level hazard short of an L2 
 miss.

All modern architectures that I know of (if I put aside PIC) have a store buffer to avoid this. Also, not inlining prevents the compiler from doing constant propagation across the call, and as such prevents it from doing a lot of other optimizations.
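The constant propagation point can be sketched as follows, again assuming the DIP56 `pragma(inline, true)` spelling; `scale` and `caller` are hypothetical names. What the optimizer actually does with this is compiler-dependent, so treat the comments as the intended effect rather than a guarantee.

```d
// Hypothetical helper that branches on one of its parameters.
pragma(inline, true)
int scale(int x, bool doubleIt)
{
    return doubleIt ? x * 2 : x;
}

int caller(int x)
{
    // Once scale is inlined here, doubleIt is the constant true, so the
    // optimizer is free to drop the branch and keep only x * 2.
    // Out of line, scale must keep the test for every caller.
    return scale(x, true);
}
```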
 I agree that inline should be a hint (a STRONG hint, not like 
 'inline' in
 C, more like __force_inline, perhaps stronger), but I'd like it 
 if I
 received a warning when it failed for whatever reason. I don't 
 want it to
 stop compiling, but a nice notification that I should look into 
 it, and the
 ability to disable/silence the warning if I can't/don't intend 
 to.

Proposed semantics: inline unless for some reason you cannot. If you cannot, warn about it.
Feb 24 2014
prev sibling next sibling parent Jerry <jlquinn optonline.net> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/23/14, 8:26 PM, Vladimir Panteleev wrote:
 Thus, I think there should be "try to inline" (same as -inline) and
 "always inline" (failure stops compilation).

Sounds fair enough.

    pragma(inline, false);
    pragma(inline, true);
    pragma(inline, force); // inline or die

How is that?
Feb 24 2014
prev sibling parent "francesco cattoglio" <francesco.cattoglio gmail.com> writes:
On Monday, 24 February 2014 at 22:09:49 UTC, Jerry wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

 On 2/23/14, 8:26 PM, Vladimir Panteleev wrote:
 Thus, I think there should be "try to inline" (same as 
 -inline) and
 "always inline" (failure stops compilation).

Sounds fair enough.

    pragma(inline, false);
    pragma(inline, true);
    pragma(inline, force); // inline or die

How is that?

Personally I like it. Perhaps you forgot

    pragma(inline, never); // don't inline or die

but I honestly have no idea if this would actually be useful. Anyway, I'm really fine if there will be no way to force inline. But if we can't guarantee that inlining actually happens, please change the pragma name.
Feb 25 2014