www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Optimize away immediately-called delegate literals?

reply "Nick Sabalausky" <a a.a> writes:
Suppose you have a delegate literal and immediately call it:

auto a = x + (){ doStuff(); return y; }() + z;

Does DMD ever (or always?) optimize away a delegate if it's executed 
immediately and never stored into a variable? If not, can it, and would it 
be a simple change? Is something like this already on the table?
Mar 10 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:
 
 auto a = x + (){ doStuff(); return y; }() + z;
 
 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

I've always wondered about whether delegates passed to opApply ever get inlined. Seems to be pretty silly to do the entire function pointer + context pointer and full function call thing, if both opApply and the delegate are very simple. Especially if opApply is not much more than a for loop of some sort, perhaps with a line or two of private member access code inserted. T -- Never ascribe to malice that which is adequately explained by incompetence. -- Napoleon Bonaparte
Mar 10 2012
next sibling parent "Nick Sabalausky" <a a.a> writes:
"H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
news:mailman.455.1331448575.4860.digitalmars-d puremagic.com...
 Never ascribe to malice that which is adequately explained by
 incompetence. -- Napoleon Bonaparte

Pardon me veering offtopic at one of your taglines yet again, but I have to say, this is one that I've believed very strongly in for years. It's practically a mantra of mine, a big part of my own personal philosophy. I've never heard it attributed to Napoleon, though. I always knew it as "Hanlon's Razor", somewhat of a corollary to Occam's Razor.
Mar 10 2012
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Peter Alexander" <peter.alexander.au gmail.com> wrote in message 
news:thetmhnnbeepmxgusntr forum.dlang.org...
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:

 auto a = x + (){ doStuff(); return y; }() + z;

 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); } $ dmd test.d -O -release -inline __Dmain: 000000010000106c pushq %rbp

I keep forgetting I can do that. ;)
 In short. No! It doesn't currently inline in this case.

 Even if the lambda just returns a constant, it doesn't get inlined.

Darn. The reason this came up is I've gotten back into some more work on HaxeD (Haxe -> D converter). Haxe allows statements to be used as expressions. For example: x = 5 + if(cond) { foo(); 1; } else 2; At the moment, I'm converting those by just tossing them into an immediately-called anonymous delegate: x = 5 + (){ ...blah... }(); But as I was afraid of, there's a performance hit with that. So eventually, I'll have to either do some fancier reworking: // Something like typeof(1) __tmp1; if(cond) { foo(); __tmp1 = 1; } else __tmp1 = 2; x = 5 + __tmp1; ...which might even run into some problems with order-of-execution. Or delegate inlining would have to get added to DMD. Now that I actually look (heh :) ), it looks like there's an old Buzilla issue for it with an apperently overly-limited patch: http://d.puremagic.com/issues/show_bug.cgi?id=4440 Unfortunately, according to Brad Roberts in the last comment: "Getting delegate inlining in dmd is going to take serious work." (Ugh, I wish Windows had an easy way to copy/paste text *without* the styling.)
Mar 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Mar 11, 2012 at 03:03:39AM -0400, Nick Sabalausky wrote:
 "H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
 news:mailman.455.1331448575.4860.digitalmars-d puremagic.com...
 Never ascribe to malice that which is adequately explained by
 incompetence. -- Napoleon Bonaparte

Pardon me veering offtopic at one of your taglines yet again, but I have to say, this is one that I've believed very strongly in for years. It's practically a mantra of mine, a big part of my own personal philosophy. I've never heard it attributed to Napoleon, though. I always knew it as "Hanlon's Razor", somewhat of a corollary to Occam's Razor.

I guess my sources aren't that reliable. I just copy-n-pasted that quote from somebody else's sig. :-) T -- Some ideas are so stupid that only intellectuals could believe them. -- George Orwell
Mar 12 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:
 
 auto a = x + (){ doStuff(); return y; }() + z;
 
 Does DMD ever (or always?) optimize away a delegate if it's 
 executed
 immediately and never stored into a variable? If not, can it, 
 and
 would it be a simple change? Is something like this already on 
 the
 table?

I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); } $ dmd test.d -O -release -inline __Dmain: 000000010000106c pushq %rbp 000000010000106d movq %rsp,%rbp 0000000100001070 pushq %rax 0000000100001071 pushq %rbx 0000000100001072 movq $0x000000000000000c,%rdi 000000010000107c callq 0x1000237f0 ; symbol stub for: __d_allocmemory 0000000100001081 movq %rax,%rbx 0000000100001084 movq $0x00000000,(%rbx) 000000010000108b movl $0x00000002,0x08(%rbx) 0000000100001092 movq %rbx,%rdi 0000000100001095 call *0x0002318c(%rip) 000000010000109c leal 0x04(%rax),%edx 000000010000109f movl $0x0000000a,%esi 00000001000010a4 leaq 0x00033eed(%rip),%rdi 00000001000010ab callq 0x10002319c ; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv 00000001000010b0 xorl %eax,%eax 00000001000010b2 popq %rbx 00000001000010b3 movq %rbp,%rsp 00000001000010b6 popq %rbp 00000001000010b7 ret In short. No! It doesn't currently inline in this case. Even if the lambda just returns a constant, it doesn't get inlined.
Mar 12 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Mar 13, 2012 at 12:15:02AM +0100, Peter Alexander wrote:
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
Suppose you have a delegate literal and immediately call it:

auto a = x + (){ doStuff(); return y; }() + z;

Does DMD ever (or always?) optimize away a delegate if it's executed
immediately and never stored into a variable? If not, can it, and
would it be a simple change? Is something like this already on the
table?

I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); } $ dmd test.d -O -release -inline __Dmain: 000000010000106c pushq %rbp 000000010000106d movq %rsp,%rbp 0000000100001070 pushq %rax 0000000100001071 pushq %rbx 0000000100001072 movq $0x000000000000000c,%rdi 000000010000107c callq 0x1000237f0 ; symbol stub for: __d_allocmemory 0000000100001081 movq %rax,%rbx 0000000100001084 movq $0x00000000,(%rbx) 000000010000108b movl $0x00000002,0x08(%rbx) 0000000100001092 movq %rbx,%rdi 0000000100001095 call *0x0002318c(%rip) 000000010000109c leal 0x04(%rax),%edx 000000010000109f movl $0x0000000a,%esi 00000001000010a4 leaq 0x00033eed(%rip),%rdi 00000001000010ab callq 0x10002319c ; symbol stub for: _D3std5stdio4File14__T5writeTiTaZ5writeMFiaZv 00000001000010b0 xorl %eax,%eax 00000001000010b2 popq %rbx 00000001000010b3 movq %rbp,%rsp 00000001000010b6 popq %rbp 00000001000010b7 ret In short. No! It doesn't currently inline in this case. Even if the lambda just returns a constant, it doesn't get inlined.

Hmph. I tried this code: import std.stdio; struct A { int[] data; int opApply(int delegate(ref int) dg) { foreach (d; data) { if (dg(d)) return 1; } return 0; } } void main() { A a; int n = 0; a.data = [1,2,3,4,5]; foreach (d; a) { n++; } } With both dmd and gdc, the delegate is never inlined. :-( Compiling with gdc -O3 causes opApply to get inlined and loop-unrolled, but the call to the delegate is still there. With dmd -O, even opApply is not inlined, and the code is generally much longer per loop iteration than gdc -O3. Here's the code generated by gdc for the foreach() loop in main(): 404839: 48 89 c3 mov %rax,%rbx 40483c: 48 8b 04 24 mov (%rsp),%rax 404840: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 404845: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 40484a: 48 89 03 mov %rax,(%rbx) 40484d: 48 8b 44 24 08 mov 0x8(%rsp),%rax 404852: 48 89 43 08 mov %rax,0x8(%rbx) 404856: 8b 44 24 10 mov 0x10(%rsp),%eax 40485a: 89 43 10 mov %eax,0x10(%rbx) 40485d: 8b 03 mov (%rbx),%eax 40485f: 89 44 24 3c mov %eax,0x3c(%rsp) 404863: ff d5 callq *%rbp 404865: 85 c0 test %eax,%eax 404867: 75 58 jne 4048c1 <_Dmain+0xd1> 404869: 8b 43 04 mov 0x4(%rbx),%eax 40486c: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 404871: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 404876: 89 44 24 3c mov %eax,0x3c(%rsp) 40487a: ff d5 callq *%rbp 40487c: 85 c0 test %eax,%eax 40487e: 75 41 jne 4048c1 <_Dmain+0xd1> 404880: 8b 43 08 mov 0x8(%rbx),%eax 404883: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 404888: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 40488d: 89 44 24 3c mov %eax,0x3c(%rsp) 404891: ff d5 callq *%rbp 404893: 85 c0 test %eax,%eax 404895: 75 2a jne 4048c1 <_Dmain+0xd1> 404897: 8b 43 0c mov 0xc(%rbx),%eax 40489a: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 40489f: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 4048a4: 89 44 24 3c mov %eax,0x3c(%rsp) 4048a8: ff d5 callq *%rbp 4048aa: 85 c0 test %eax,%eax 4048ac: 75 13 jne 4048c1 <_Dmain+0xd1> 4048ae: 8b 43 10 mov 0x10(%rbx),%eax 4048b1: 48 8d 74 24 3c lea 0x3c(%rsp),%rsi 4048b6: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi 4048bb: 89 44 24 3c mov %eax,0x3c(%rsp) 4048bf: ff d5 callq *%rbp 4048c1: 48 83 c4 48 add $0x48,%rsp 4048c5: 31 c0 xor %eax,%eax 4048c7: 5b pop %rbx 4048c8: 5d pop %rbp 4048c9: c3 retq Notice that each loop iteration is only 7 instructions, with array elements loaded directly via an offset. The loop in opApply generated by dmd looks like this: 08049314 <_D4test1A7opApplyMFDFKiZiZi>: 8049314: 55 push %ebp 8049315: 8b ec mov %esp,%ebp 8049317: 83 ec 18 sub $0x18,%esp 804931a: 53 push %ebx 804931b: 56 push %esi 804931c: 89 45 f8 mov %eax,-0x8(%ebp) 804931f: 83 7d f8 00 cmpl $0x0,-0x8(%ebp) 8049323: 75 1f jne 8049344 <_D4test1A7opApplyMFDFKiZiZi+0x30> 8049325: 6a 06 push $0x6 8049327: ff 35 d4 a3 05 08 pushl 0x805a3d4 804932d: ff 35 d0 a3 05 08 pushl 0x805a3d0 8049333: ff 35 ec a3 05 08 pushl 0x805a3ec 8049339: ff 35 e8 a3 05 08 pushl 0x805a3e8 804933f: e8 8c 04 00 00 call 80497d0 <_d_assert_msg> 8049344: 8b 45 f8 mov -0x8(%ebp),%eax 8049347: 8b 50 04 mov 0x4(%eax),%edx 804934a: 8b 00 mov (%eax),%eax 804934c: 89 45 e8 mov %eax,-0x18(%ebp) 804934f: 89 55 ec mov %edx,-0x14(%ebp) 8049352: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%ebp) 8049359: 8b 4d f0 mov -0x10(%ebp),%ecx 804935c: 3b 4d e8 cmp -0x18(%ebp),%ecx 804935f: 73 41 jae 80493a2 <_D4test1A7opApplyMFDFKiZiZi+0x8e> 8049361: 8b 5d f0 mov -0x10(%ebp),%ebx 8049364: 3b 5d e8 cmp -0x18(%ebp),%ebx 8049367: 72 0a jb 8049373 <_D4test1A7opApplyMFDFKiZiZi+0x5f> 8049369: b8 07 00 00 00 mov $0x7,%eax 804936e: e8 b9 00 00 00 call 804942c <_D4test7__arrayZ> 8049373: 8b 55 ec mov -0x14(%ebp),%edx 8049376: 8b 45 e8 mov -0x18(%ebp),%eax 8049379: 8b 34 9a mov (%edx,%ebx,4),%esi 804937c: 89 75 f4 mov %esi,-0xc(%ebp) 804937f: 8d 4d f4 lea -0xc(%ebp),%ecx 8049382: 51 push %ecx 8049383: 8b 45 08 mov 0x8(%ebp),%eax 8049386: 8b 55 0c mov 0xc(%ebp),%edx 8049389: 8b 5d 08 mov 0x8(%ebp),%ebx 804938c: ff d2 call *%edx 804938e: 85 c0 test %eax,%eax 8049390: 74 0b je 804939d <_D4test1A7opApplyMFDFKiZiZi+0x89> 8049392: b8 01 00 00 00 mov $0x1,%eax 8049397: 5e pop %esi 8049398: 5b pop %ebx 8049399: c9 leave 804939a: c2 08 00 ret $0x8 804939d: ff 45 f0 incl -0x10(%ebp) 80493a0: eb b7 jmp 8049359 <_D4test1A7opApplyMFDFKiZiZi+0x45> 80493a2: 31 c0 xor %eax,%eax 80493a4: 5e pop %esi 80493a5: 5b pop %ebx 80493a6: c9 leave 80493a7: c2 08 00 ret $0x8 80493aa: 90 nop 80493ab: 90 nop The loop body is significantly longer, about 22 instructions when the exit branch isn't taken, and includes a call to a bounds check routine per iteration. I suppose the gdc case has been heavily optimized by gcc's optimizing backend. :-) Though I'm kinda disappointed that it didn't inline the trivial delegate. Or is that because of the way the front-end generates the AST? T -- Nothing in the world is more distasteful to a man than to take the path that leads to himself. -- Herman Hesse
Mar 12 2012
prev sibling next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
On 3/12/2012 4:15 PM, Peter Alexander wrote:
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:

 auto a = x + (){ doStuff(); return y; }() + z;

 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); }

See also: bug 4440 The patch in there, if it hasn't bit rotten to badly (I suspect it has) will handle _this_ case. But almost no other case of inlining delegates. It'd be a good area for someone who wants an interesting and non-trivial problem to dig into. It wouldn't touch all that much of the codebase as the inliner is fairly self-contained. At least, that's what I recall from when I looked at this stuff a couple years ago. Later, Brad
Mar 12 2012
parent "Nick Sabalausky" <a a.a> writes:
"Brad Roberts" <braddr puremagic.com> wrote in message 
news:mailman.582.1331607753.4860.digitalmars-d puremagic.com...
 On 3/12/2012 4:15 PM, Peter Alexander wrote:
 On Sunday, 11 March 2012 at 06:49:27 UTC, H. S. Teoh wrote:
 On Sun, Mar 11, 2012 at 01:29:01AM -0500, Nick Sabalausky wrote:
 Suppose you have a delegate literal and immediately call it:

 auto a = x + (){ doStuff(); return y; }() + z;

 Does DMD ever (or always?) optimize away a delegate if it's executed
 immediately and never stored into a variable? If not, can it, and
 would it be a simple change? Is something like this already on the
 table?

I've always wondered about whether delegates passed to opApply ever get inlined.

Don't wonder. Find out! import std.stdio; void doStuff() { writeln("Howdy!"); } void main() { int x = 1, y = 2, z = 3; auto a = x + (){ doStuff(); return y; }() + z; writeln(a); }

See also: bug 4440 The patch in there, if it hasn't bit rotten to badly (I suspect it has) will handle _this_ case. But almost no other case of inlining delegates. It'd be a good area for someone who wants an interesting and non-trivial problem to dig into. It wouldn't touch all that much of the codebase as the inliner is fairly self-contained. At least, that's what I recall from when I looked at this stuff a couple years ago.

Do you think that patch would be a good starting place for further work, or would a proper solution likely necessitate an entirely different approach?
Mar 12 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 12 Mar 2012 20:28:15 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 Hmph.

 I tried this code:

 	import std.stdio;
 	struct A {
 		int[] data;
 		int opApply(int delegate(ref int) dg) {
 			foreach (d; data) {
 				if (dg(d)) return 1;
 			}
 			return 0;
 		}
 	}
 	void main() {
 		A a;
 		int n = 0;

 		a.data = [1,2,3,4,5];
 		foreach (d; a) {
 			n++;
 		}
 	}

 With both dmd and gdc, the delegate is never inlined. :-(  Compiling
 with gdc -O3 causes opApply to get inlined and loop-unrolled, but the
 call to the delegate is still there. With dmd -O, even opApply is not
 inlined, and the code is generally much longer per loop iteration than
 gdc -O3.

IIRC, ldc does inline opApply. But this is somewhat hearsay since I don't use ldc. I'm just remembering what others have posted here. -Steve
Mar 14 2012