www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Optimize away immediately-called delegate literals?

reply "Nick Sabalausky" <a a.a> writes:
Suppose you have a delegate literal and immediately call it:

auto a = x + (){ doStuff(); return y; }() + z;

Does DMD ever (or always?) optimize away a delegate if it's executed 
immediately and never stored into a variable? If not, can it, and would it 
be a simple change? Is something like this already on the table?
Mar 10 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:
 
 auto a = x + (){ doStuff(); return y; }() + z;
 
 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

[...] I've always wondered about whether delegates passed to opApply ever get inlined. Seems to be pretty silly to do the entire function pointer + context pointer and full function call thing, if both opApply and the delegate are very simple. Especially if opApply is not much more than a for loop of some sort, perhaps with a line or two of private member access code inserted. T -- Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte
Mar 10 2012
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
news:mailman.455.1331448575.4860.digitalmars-d puremagic.com...
 Never ascribe to malice that which is adequately explained by
 incompetence. -- Napoleon Bonaparte

Pardon me veering offtopic at one of your taglines yet again, but I have to say, this is one that I've believed very strongly in for years. It's practically a mantra of mine, a big part of my own personal philosophy. I've never heard it attributed to Napoleon, though. I always knew it as "Hanlon's Razor", somewhat of a corollary to Occam's Razor.
Mar 10 2012
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Mar 11, 2012 at 03:03:39AM -0400, Nick Sabalausky wrote:
 "H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
 news:mailman.455.1331448575.4860.digitalmars-d puremagic.com...
 Never ascribe to malice that which is adequately explained by
 incompetence. -- Napoleon Bonaparte

Pardon me veering offtopic at one of your taglines yet again, but I have to say, this is one that I've believed very strongly in for years. It's practically a mantra of mine, a big part of my own personal philosophy. I've never heard it attributed to Napoleon, though. I always knew it as "Hanlon's Razor", somewhat of a corollary to Occam's Razor.

[...] I guess my sources aren't that reliable. I just copy-n-pasted that quote from somebody else's sig. :-) T -- Some ideas are so stupid that only intellectuals could believe them. -- George Orwell
Mar 12 2012
prev sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:
 
 auto a = x + (){ doStuff(); return y; }() + z;
 
 Does DMD ever (or always?) optimize away a delegate if it's 
 executed
 immediately and never stored into a variable? If not, can it, 
 and
 would it be a simple change? Is something like this already on 
 the
 table?

[...] I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); } $ dmd test.d -O -release -inline __Dmain: 000000010000106c pushq %rbp 000000010000106d movq %rsp,%rbp 0000000100001070 pushq %rax 0000000100001071 pushq %rbx 0000000100001072 movq $0x000000000000000c,%rdi 000000010000107c callq 0x1000237f0 ; symbol stub for: __d_allocmemory 0000000100001081 movq %rax,%rbx 0000000100001084 movq $0x00000000,(%rbx) 000000010000108b movl $0x00000002,0x08(%rbx) 0000000100001092 movq %rbx,%rdi 0000000100001095 call *0x0002318c(%rip) 000000010000109c leal 0x04(%rax),%edx 000000010000109f movl $0x0000000a,%esi 00000001000010a4 leaq 0x00033eed(%rip),%rdi 00000001000010ab callq 0x10002319c ; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv 00000001000010b0 xorl %eax,%eax 00000001000010b2 popq %rbx 00000001000010b3 movq %rbp,%rsp 00000001000010b6 popq %rbp 00000001000010b7 ret In short. No! It doesn't currently inline in this case. Even if the lambda just returns a constant, it doesn't get inlined.
Mar 12 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Mar 13, 2012 at 12:15:02AM +0100, Peter Alexander wrote:
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
Suppose you have a delegate literal and immediately call it:

auto a = x + (){ doStuff(); return y; }() + z;

Does DMD ever (or always?) optimize away a delegate if it's executed
immediately and never stored into a variable? If not, can it, and
would it be a simple change? Is something like this already on the
table?

[...] I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); } $ dmd test.d -O -release -inline __Dmain: 000000010000106c pushq %rbp 000000010000106d movq %rsp,%rbp 0000000100001070 pushq %rax 0000000100001071 pushq %rbx 0000000100001072 movq $0x000000000000000c,%rdi 000000010000107c callq 0x1000237f0 ; symbol stub for: __d_allocmemory 0000000100001081 movq %rax,%rbx 0000000100001084 movq $0x00000000,(%rbx) 000000010000108b movl $0x00000002,0x08(%rbx) 0000000100001092 movq %rbx,%rdi 0000000100001095 call *0x0002318c(%rip) 000000010000109c leal 0x04(%rax),%edx 000000010000109f movl $0x0000000a,%esi 00000001000010a4 leaq 0x00033eed(%rip),%rdi 00000001000010ab callq 0x10002319c ; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv 00000001000010b0 xorl %eax,%eax 00000001000010b2 popq %rbx 00000001000010b3 movq %rbp,%rsp 00000001000010b6 popq %rbp 00000001000010b7 ret In short. No! It doesn't currently inline in this case. Even if the lambda just returns a constant, it doesn't get inlined.

Hmph. I tried this code: import std.stdio; struct A { int[] data; int opApply(int delegate(ref int) dg) { foreach (d; data) { if (dg(d)) return 1; } return 0; } } void main() { A a; int n = 0; a.data = [1,2,3,4,5]; foreach (d; a) { n++; } } With both dmd and gdc, the delegate is never inlined. :-( Compiling with gdc -O3 causes opApply to get inlined and loop-unrolled, but the call to the delegate is still there. With dmd -O, even opApply is not inlined, and the code is generally much longer per loop iteration than gdc -O3. Here's the code generated by gdc for the foreach() loop in main(): 404839: 48 89 c3 mov %rax,%rbx 40483c: 48 8b 04 24 mov (%rsp),%rax 404840: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 404845: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 40484a: 48 89 03 mov %rax,(%rbx) 40484d: 48 8b 44 24 08 mov 0x8(%rsp),%rax 404852: 48 89 43 08 mov %rax,0x8(%rbx) 404856: 8b 44 24 10 mov 0x10(%rsp),%eax 40485a: 89 43 10 mov %eax,0x10(%rbx) 40485d: 8b 03 mov (%rbx),%eax 40485f: 89 44 24 3c mov %eax,0x3c(%rsp) 404863: ff d5 callq *%rbp 404865: 85 c0 test %eax,%eax 404867: 75 58 jne 4048c1 <_Dmain+0xd1> 404869: 8b 43 04 mov 0x4(%rbx),%eax 40486c: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 404871: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 404876: 89 44 24 3c mov %eax,0x3c(%rsp) 40487a: ff d5 callq *%rbp 40487c: 85 c0 test %eax,%eax 40487e: 75 41 jne 4048c1 <_Dmain+0xd1> 404880: 8b 43 08 mov 0x8(%rbx),%eax 404883: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 404888: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 40488d: 89 44 24 3c mov %eax,0x3c(%rsp) 404891: ff d5 callq *%rbp 404893: 85 c0 test %eax,%eax 404895: 75 2a jne 4048c1 <_Dmain+0xd1> 404897: 8b 43 0c mov 0xc(%rbx),%eax 40489a: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 40489f: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 4048a4: 89 44 24 3c mov %eax,0x3c(%rsp) 4048a8: ff d5 callq *%rbp 4048aa: 85 c0 test %eax,%eax 4048ac: 75 13 jne 4048c1 <_Dmain+0xd1> 4048ae: 8b 43 10 mov 0x10(%rbx),%eax 4048b1: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 4048b6: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 4048bb: 89 44 24 3c mov %eax,0x3c(%rsp) 4048bf: ff d5 callq *%rbp 4048c1: 48 83 c4 48 add $0x48,%rsp 4048c5: 31 c0 xor %eax,%eax 4048c7: 5b pop %rbx 4048c8: 5d pop %rbp 4048c9: c3 retq Notice that each loop iteration is only 7 instructions, with array elements loaded directly via an offset. The loop in opApply generated by dmd looks like this: 08049314 <_D4test1A7opApplyMFDFKiZiZi>: 8049314: 55 push %ebp 8049315: 8b ec mov %esp,%ebp 8049317: 83 ec 18 sub $0x18,%esp 804931a: 53 push %ebx 804931b: 56 push %esi 804931c: 89 45 f8 mov %eax,-0x8(%ebp) 804931f: 83 7d f8 00 cmpl $0x0,-0x8(%ebp) 8049323: 75 1f jne 8049344 <_D4test1A7opApplyMFDFKiZiZi+0x30> 8049325: 6a 06 push $0x6 8049327: ff 35 d4 a3 05 08 pushl 0x805a3d4 804932d: ff 35 d0 a3 05 08 pushl 0x805a3d0 8049333: ff 35 ec a3 05 08 pushl 0x805a3ec 8049339: ff 35 e8 a3 05 08 pushl 0x805a3e8 804933f: e8 8c 04 00 00 call 80497d0 <_d_assert_msg> 8049344: 8b 45 f8 mov -0x8(%ebp),%eax 8049347: 8b 50 04 mov 0x4(%eax),%edx 804934a: 8b 00 mov (%eax),%eax 804934c: 89 45 e8 mov %eax,-0x18(%ebp) 804934f: 89 55 ec mov %edx,-0x14(%ebp) 8049352: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%ebp) 8049359: 8b 4d f0 mov -0x10(%ebp),%ecx 804935c: 3b 4d e8 cmp -0x18(%ebp),%ecx 804935f: 73 41 jae 80493a2 <_D4test1A7opApplyMFDFKiZiZi+0x8e> 8049361: 8b 5d f0 mov -0x10(%ebp),%ebx 8049364: 3b 5d e8 cmp -0x18(%ebp),%ebx 8049367: 72 0a jb 8049373 <_D4test1A7opApplyMFDFKiZiZi+0x5f> 8049369: b8 07 00 00 00 mov $0x7,%eax 804936e: e8 b9 00 00 00 call 804942c <_D4test7__arrayZ> 8049373: 8b 55 ec mov -0x14(%ebp),%edx 8049376: 8b 45 e8 mov -0x18(%ebp),%eax 8049379: 8b 34 9a mov (%edx,%ebx,4),%esi 804937c: 89 75 f4 mov %esi,-0xc(%ebp) 804937f: 8d 4d f4 lea -0xc(%ebp),%ecx 8049382: 51 push %ecx 8049383: 8b 45 08 mov 0x8(%ebp),%eax 8049386: 8b 55 0c mov 0xc(%ebp),%edx 8049389: 8b 5d 08 mov 0x8(%ebp),%ebx 804938c: ff d2 call *%edx 804938e: 85 c0 test %eax,%eax 8049390: 74 0b je 804939d <_D4test1A7opApplyMFDFKiZiZi+0x89> 8049392: b8 01 00 00 00 mov $0x1,%eax 8049397: 5e pop %esi 8049398: 5b pop %ebx 8049399: c9 leave 804939a: c2 08 00 ret $0x8 804939d: ff 45 f0 incl -0x10(%ebp) 80493a0: eb b7 jmp 8049359 <_D4test1A7opApplyMFDFKiZiZi+0x45> 80493a2: 31 c0 xor %eax,%eax 80493a4: 5e pop %esi 80493a5: 5b pop %ebx 80493a6: c9 leave 80493a7: c2 08 00 ret $0x8 80493aa: 90 nop 80493ab: 90 nop The loop body is significantly longer, about 22 instructions when the exit branch isn't taken, and includes a call to a bounds check routine per iteration. I suppose the gdc case has been heavily optimized by gcc's optimizing backend. :-) Though I'm kinda disappointed that it didn't inline the trivial delegate. Or is that because of the way the front-end generates the AST? T -- Nothing in the world is more distasteful to a man than to take the path that leads to himself. -- Herman Hesse
Mar 12 2012
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Mar 2012 20:28:15 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 Hmph.

 I tried this code:

 	import std.stdio;
 	struct A {
 		int[] data;
 		int opApply(int delegate(ref int) dg) {
 			foreach (d; data) {
 				if (dg(d)) return 1;
 			}
 			return 0;
 		}
 	}
 	void main() {
 		A a;
 		int n = 0;

 		a.data = [1,2,3,4,5];
 		foreach (d; a) {
 			n++;
 		}
 	}

 With both dmd and gdc, the delegate is never inlined. :-(  Compiling
 with gdc -O3 causes opApply to get inlined and loop-unrolled, but the
 call to the delegate is still there. With dmd -O, even opApply is not
 inlined, and the code is generally much longer per loop iteration than
 gdc -O3.

IIRC, ldc does inline opApply. But this is somewhat hearsay since I don't use ldc. I'm just remembering what others have posted here. -Steve
Mar 14 2012
prev sibling next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
On 3/12/2012 4:15 PM, Peter Alexander wrote:
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:

 auto a = x + (){ doStuff(); return y; }() + z;

 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

[...] I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); }

See also: bug 4440 The patch in there, if it hasn't bit rotten to badly (I suspect it has) will handle _this_ case. But almost no other case of inlining delegates. It'd be a good area for someone who wants an interesting and non-trivial problem to dig into. It wouldn't touch all that much of the codebase as the inliner is fairly self-contained. At least, that's what I recall from when I looked at this stuff a couple years ago. Later, Brad
Mar 12 2012
parent reply "Nick Sabalausky" <a a.a> writes:
"Brad Roberts" <braddr puremagic.com> wrote in message 
news:mailman.582.1331607753.4860.digitalmars-d puremagic.com...
 On 3/12/2012 4:15 PM, Peter Alexander wrote:
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:

 auto a = x + (){ doStuff(); return y; }() + z;

 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

[...] I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); }

See also: bug 4440 The patch in there, if it hasn't bit rotten to badly (I suspect it has) will handle _this_ case. But almost no other case of inlining delegates. It'd be a good area for someone who wants an interesting and non-trivial problem to dig into. It wouldn't touch all that much of the codebase as the inliner is fairly self-contained. At least, that's what I recall from when I looked at this stuff a couple years ago.

Do you think that patch would be a good starting place for further work, or would a proper solution likely necessitate an entirely different approach?
Mar 12 2012
parent Brad Roberts <braddr puremagic.com> writes:
On 3/12/2012 8:10 PM, Nick Sabalausky wrote:
 "Brad Roberts" <braddr puremagic.com> wrote:
 See also: bug 4440

 The patch in there, if it hasn't bit rotten to badly (I suspect it has) 
 will handle _this_ case.  But almost no other
 case of inlining delegates.

 It'd be a good area for someone who wants an interesting and non-trivial 
 problem to dig into.  It wouldn't touch all
 that much of the codebase as the inliner is fairly self-contained.  At 
 least, that's what I recall from when I looked at
 this stuff a couple years ago.

Do you think that patch would be a good starting place for further work, or would a proper solution likely necessitate an entirely different approach?

I suspect that, to fix it for a broader set of cases, it'll need much larger changes. During those changes the simple version in that patch would likely disappear. But take that all with a huge grain of salt.
Mar 12 2012
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Peter Alexander" <peter.alexander.au gmail.com> wrote in message 
news:thetmhnnbeepmxgusntr forum.dlang.org...
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:

 auto a = x + (){ doStuff(); return y; }() + z;

 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

[...] I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); } $ dmd test.d -O -release -inline __Dmain: 000000010000106c pushq %rbp

[...] I keep forgetting I can do that. ;)
 In short. No! It doesn't currently inline in this case.

 Even if the lambda just returns a constant, it doesn't get inlined.

Darn. The reason this came up is I've gotten back into some more work on HaxeD (Haxe -> D converter). Haxe allows statements to be used as expressions. For example: x = 5 + if(cond) { foo(); 1; } else 2; At the moment, I'm converting those by just tossing them into an immediately-called anonymous delegate: x = 5 + (){ ...blah... }(); But as I was afraid of, there's a performance hit with that. So eventually, I'll have to either do some fancier reworking: // Something like typeof(1) __tmp1; if(cond) { foo(); __tmp1 = 1; } else __tmp1 = 2; x = 5 + __tmp1; ...which might even run into some problems with order-of-execution. Or delegate inlining would have to get added to DMD. Now that I actually look (heh :) ), it looks like there's an old Buzilla issue for it with an apperently overly-limited patch: http://d.puremagic.com/issues/show_bug.cgi?id=4440 Unfortunately, according to Brad Roberts in the last comment: "Getting delegate inlining in dmd is going to take serious work." (Ugh, I wish Windows had an easy way to copy/paste text *without* the styling.)
Mar 12 2012