digitalmars.D.learn - Is there a way to get a list of functions that get inlined by dmd?
- Trass3r (1/1) Feb 08 2010 Would be interesting.
- Scorn (15/16) Feb 08 2010 Yes, this would be very interesting indeed. A list of the rules which
- bearophile (4/10) Feb 08 2010 Use LDC (D1), you will note a significant improvement over DMD.
- Trass3r (2/3) Feb 08 2010 I believe that, but... D1.
- Scorn (14/26) Feb 08 2010 Hi bearophile. Thanks for your advice (i might try out ldc in the future
- bearophile (10/21) Feb 09 2010 I like DMD for some of the D2 features, for its speed, for allowing exce...
- Scorn (58/93) Feb 09 2010 I absolutely understand this :-) That is why i am using gdc at the
- bearophile (14/18) Feb 09 2010 I understand. The problem is that D is a new language with new compilers...
- bearophile (3/4) Feb 09 2010 larsivi and Deew have told me that compiling with -inline -singleobj sol...
- Scorn (3/9) Feb 09 2010 Should that not be a default then for the ldc compiler when compiling in
- Scorn (19/54) Feb 09 2010 Sure. And that is something i do not expect yet. Since Walter is doing
- Nick Sabalausky (3/11) Feb 08 2010 Unless you're on windows.
- Trass3r (34/39) Feb 09 2010 Well if I read the code correctly the following is not supported:
- Scorn (7/56) Feb 09 2010 Thanks for your work. This is very interesting. It would be good to have
- Trass3r (3/7) Feb 09 2010 Yeah, I think so as well as each module is compiled separately.
- Don (4/13) Feb 09 2010 I don't think so. It's hard to believe that the inliner would be so limi...
- Scorn (47/62) Feb 09 2010 Thank you very much for your answer Don. I think it might be interesting
- bearophile (140/147) Feb 09 2010 Don't write code like that, add some parenthesys like this:
- Scorn (37/207) Feb 09 2010 :-)
Trass3r wrote:
> Would be interesting.

Yes, this would be very interesting indeed. A list of the rules which dmd uses internally for inlining functions and methods, similar to this one (which is for .NET):
http://blogs.msdn.com/ericgu/archive/2004/01/29/64717.aspx
would be really nice. Sure, you can figure this out on your own by studying the compiler sources, but a simple list of rules (it does not have to be exhaustive) on http://www.digitalmars.com/d/1.0/lex.html or on a separate page about optimizations for D would be much appreciated.

The only things I have figured out so far: functions across modules do not seem to get inlined (I don't know if this is the case in general), which would be really bad. I also believe that functions / methods which take ref parameters are never inlined at all, which is again really annoying, since passing huge structs by value is not a clever alternative.
Feb 08 2010
Scorn:
> The only things i figured out so far is that functions across modules do not seem to get inlined (i don't know if this is the case in general) which would be really bad. Another thing which i believe is that functions / methods which contain ref parameters are never inlined at all [...]

Use LDC (D1), you will note a significant improvement over DMD.

Bye,
bearophile
Feb 08 2010
> Use LDC (D1), you will note a significant improvement over DMD.

I believe that, but... D1.
Also, LDC doesn't seem to get much attention (development-wise) recently.
Feb 08 2010
bearophile wrote:
> Scorn:
>> The only things i figured out so far is that functions across modules do not seem to get inlined [...]
>
> Use LDC (D1), you will note a significant improvement over DMD.
>
> Bye,
> bearophile

Hi bearophile. Thanks for your advice (I might try out ldc in the future, but for now I need phobos and not tango as the standard library).

At the moment I am using gdc, where I nearly always see a significant improvement in the speed of the generated code compared to dmd. But since all three (dmd, ldc and gdc) use the same frontend, Trass3r's question still remains: under which conditions are functions/methods inlined?

Do you know if ldc inlines functions/methods across modules? (dmd doesn't seem to do it, and neither does gdc.) That is bad, because for a current project I have made a separate module with a lot of small utility math functions which should be inlined but aren't because of this. When I inline them using mixins or manually, I get an overall speedup of about 20%.
Feb 08 2010
Scorn:
> Hi bearophile. Thanks for your advice (i might try out ldc in the future but now i need phobos and not tango as standard library).

I like DMD for some of the D2 features, for its speed, for allowing exceptions to be used on Windows too, for its built-in profiler and code coverage analyser that are not present in LDC, and for other small things. But I like how LDC feels more like a real-world compiler: it has smaller features like force_inline that look like they come from a more practical compiler. When I optimize code on LDC I see predictable performance improvements, while with DMD it's like a shot in the dark, and I usually have to avoid several tweaks of the code, otherwise I get a negative improvement.

> But since all three dmd, ldc and gdc use the same frontend, the question from Trass3r still remains: Under which conditions are functions/methods inlined ?

I think LDC doesn't use the inliner of the front-end and just uses the much better inliner of the back-end. So the inlining rules are probably all different (but in theory the front-end knows more about the D semantics, so LDC has to work even more to regain the lost semantics).

[...] programmers don't look obsessed with performance, this is often positive), but it's not easy to find them on most C/C++ compilers I know of. Have you ever seen the exact inlining rules of C code compiled with GCC 4.4.3?

> Do you know if ldc inlines functions/methods across modules ? (dmd doesn't seem to do it and neither does gdc).

I have just done a test: normally LDC is not able to inline across modules. This is a shitty situation. But with LDC you can perform Link-Time Optimization too (but you have to ask for it!), which in my test I've just seen is able to inline across modules.

> And that is bad since for an actual project i have made a separate module with a lot of small utility math functions which should be inlined but don't because of this. When i inline them using mixins or manually i get an overall speed up of about 20%.

If you explain this problem to Walter he will surely tell you that such inlining can't be done because mumble mumble separate compilation mumble mumble was done fifteen years ago mumble mumble mumble (even if LDC is currently doing it) :-)

Good luck,
bearophile
Feb 09 2010
bearophile wrote:
> Scorn:
>> Hi bearophile. Thanks for your advice [...]
> I like DMD for some of the D2 features [...] When I optimize code on LDC, I see predictable improvements of the performance, while with DMD it's like a shoot in the dark, and I usually have to avoid several tweaks of the code, otherwise I get a negative improvement.

I absolutely understand this :-) That is why I am using gdc at the moment and not dmd (with all of the very nice optimization flags of the gcc backend, which improve speed substantially compared to dmd).

>> But since all three dmd, ldc and gdc use the same frontend, the question from Trass3r still remains: Under which conditions are functions/methods inlined ?
> I think LDC doesn't use the inliner of the front-end and just uses the much better inliner of the back-end. [...]

That would be nice. You nearly convinced me to port my project to tango and ldc.

> [...] programmers don't look obsessed with performance, this is often positive), but it's not easy to find them on most C/C++ compilers I know of.

[...] application-development language and it's very good for these kinds of things. It's not meant to be used for high-performance numerical computing, but astonishingly it is sometimes not so bad when you use it for this purpose. But D, as a systems programming language with the ambition to be an alternative to C / C++, has to compete with C / C++ on these things, at least in the long term.

So when I consider using D for a project I do it because I am looking [...] makes me always very sad when I figure out that I can't use D successfully because of little flaws like this. And that means another chance of using D for a project is lost.

That's why I like your posts about benchmarking so much. They might not be liked by parts of the community, because they put the finger where it hurts most and might be annoying sometimes, but please carry on :-) (you have at least one fan here :-) ). And please make a collection of all the issues you have found so far and put them on a web page somewhere.

> Have you ever seen the exact inlining rules of C code compiled with GCC 4.4.3?

No, I have not yet seen the exact inlining rules of C code compiled with GCC 4.4.3. But I don't need them and I don't care. Why? Because gcc does its job well without me knowing anything about the heuristics it uses for inlining. For D it's different. I have found so many times that I can increase the speed of the generated code in D tremendously just by inlining small pieces of code manually (or using mixins). That is something I have never found with any C++ compiler (they do it so much better than I ever could). So I think that's why Trass3r and I are so interested in the rules D uses for inlining.

>> Do you know if ldc inlines functions/methods across modules ? (dmd doesn't seem to do it and neither does gdc).
> I have just done a test, normally LDC is not able to inline across modules. This is a shitty situation.

You are absolutely right here. That is a shitty situation for all D compilers, because it basically means that you are not able to follow decent software engineering practices (a clean separation of functionality into different modules) without sacrificing performance. That is a situation you have in no other language I know of, and it is a big design flaw of the module system, making it nearly senseless.

> But with LDC you can perform Link-Time Optimization too (but you have to ask for it!), that in my test I've just seen is able to inline across modules.

Which compiler switches do you need for this?

>> And that is bad since for an actual project i have made a separate module with a lot of small utility math functions which should be inlined but don't because of this. [...]
> If you explain this problem to Walter he will surely tell you that such inlining can't be done because mumble mumble separate compilation mumble mumble was done fifteen years ago mumble mumble mumble (even if LDC is currently doing it) :-)

:-) (Big smile.) Yes. That's what I would expect. But the whole thing is easily solvable: instead of compiling the modules separately and then linking them together, include the source of a module in every module which imports it (like a C or C++ compiler includes files). Then every function / method needed is visible while compiling the module, and the compiler can decide by its heuristics whether a function / method is worth inlining or not. Compile times might be a bit longer, but since the grammar of D is much simpler than that of C++ it would not hurt compile time so much.

> Good luck,
> bearophile

Thanks. Good luck to you too. But for this project I think I will abandon D and stick to C++ again (even if I hate to do so), because the only options would be using mixins all over the place or putting everything in one big file (and both solutions are ugly). Thanks for your help.
Feb 09 2010
Scorn:
> But D as a system-programming language with the ambition to be an alternative to C / C++ has, at least in the long term, to compete with C / C++ regarding these things.

I understand. The problem is that D is a new language with new compilers, so it can't optimize as well as GCC, which has been compiling C code for so many years. LDC compiles C-like D1 code well enough, about as well as GCC or better.

> And please make a collection of all the issues you have found so far and but them on a web page somewhere.

http://www.fantascienza.net/leonardo/js/slow_d.zip

> Which compiler switches do you need for this ?

Found after a few hours of tests of mine plus a suggestion from the LLVM lead developer :-)
For example, you have a "temp.d" main module and a "mo.d" imported module:

ldc -O5 -release -inline -output-bc temp.d
ldc -O5 -release -inline -output-bc mo.d
opt -std-compile-opts temp.bc > tempo.bc
opt -std-compile-opts mo.bc > moo.bc
llvm-ld -L/usr/lib/d -native -ltango-ldc -ldl -lm -lpthread -internalize-public-api-list=_Dmain -o=tempo tempo.bc moo.bc

> But the whole thing is easy solvable by not doing just a separate compilation of the modules and then linking them together

If you take a look at my previous post ( http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.le rn&article_id=18822 ) you can see I have found DMD does inline functions from other modules (while LDC doesn't do it, so I have to report this to the ldc devs and not to Walter).

Bye,
bearophile
Feb 09 2010
> (while LDC doesn't do it, so I have to report this to the ldc devs and not to Walter).

larsivi and Deew have told me that compiling with -inline -singleobj solves the problem with ldc, and my test shows it's true.

Bye,
bearophile
Feb 09 2010
bearophile wrote:
>> (while LDC doesn't do it, so I have to report this to the ldc devs and not to Walter).
> larsivi and Deew have told me that compiling with -inline -singleobj solves the problem with ldc, and my test shows it's true.
>
> Bye,
> bearophile

Should that not be the default, then, for the ldc compiler when compiling in release mode?
Feb 09 2010
bearophile wrote:
> Scorn:
>> But D as a system-programming language with the ambition to be an alternative to C / C++ has, at least in the long term, to compete with C / C++ regarding these things.
> I understand. The problem is that D is a new language with new compilers, so it can't optimize as well as GCC that is compiling C code for so many years. LDC compiles C-like D1 code well enough, about as well as GCC or better.

Sure. And that is something I do not expect yet. Since Walter is doing nearly all the compiler work alone, I would never ever blame him for anything. And it's not that I am unwilling to sacrifice speed for the productivity gains which programming in D gets me. But it's always annoying when simple things like this do not seem to work, things which work in virtually every other compiled language out there. This sometimes costs a lot of confidence in the compiler.

>> And please make a collection of all the issues you have found so far and but them on a web page somewhere.
> http://www.fantascienza.net/leonardo/js/slow_d.zip

This test code is very nice. But some kind of web page where you keep a collection of all the performance problems you have found so far in the D compilers would be great, so that none of your research gets lost.

>> Which compiler switches do you need for this ?
> Found after few hours of tests of mine plus a suggestion from the LLVM lead developer :-) [...]
>> But the whole thing is easy solvable by not doing just a separate compilation of the modules and then linking them together
> If you take a look at my precedent post [...] you can see I have found DMD does inline functions from other modules (while LDC doesn't do it, so I have to report this to the ldc devs and not to Walter).

This is good news. But hopefully the compiler does it not only for your test case but always tries to inline functions across modules when it is worth it (which was the purpose of the original question). Is it constrained by the size of the function or by the parameters? Trass3r did some nice research regarding this.

> Bye,
> bearophile

Thanks for your hints,
Bye.
Feb 09 2010
"bearophile" <bearophileHUGS lycos.com> wrote in message news:hkpiai$2kt6$1 digitalmars.com...
> Scorn:
>> The only things i figured out so far is that functions across modules do not seem to get inlined [...]
>
> Use LDC (D1), you will note a significant improvement over DMD.

Unless you're on windows.
Feb 08 2010
>> Use LDC (D1), you will note a significant improvement over DMD.
> Unless you're on windows.

Yep, second big problem.
Feb 08 2010
>>> Use LDC (D1), you will note a significant improvement over DMD.
>> Unless you're on windows.
> Yep, second big problem.

And unless you're using CodeBlocks (like me at the moment) for development ...
Feb 08 2010
On 08.02.2010, 16:33, Scorn <scorn trash-mail.com> wrote:
> Trass3r wrote:
>> Would be interesting.
> Yes, this would be very interesting indeed. A list of the rules which dmd uses internally for inlining functions and methods would be really nice.

Well, if I read the code correctly, the following is not supported:
- nested inline?
- variadic functions (T t, ...)
- synchronized
- imported functions
- functions with closure vars
- virtual functions that aren't final
- functions with out, ref or static array parameters
- functions with more than 250 elementary expressions

Created my own little inline dumping patch:

Index: inline.c
===================================================================
--- inline.c (revision 363)
+++ inline.c (working copy)
@@ -1126,6 +1126,7 @@
 if (fd && fd != iss->fd && fd->canInline(0))
 {
 e = fd->doInline(iss, NULL, arguments);
+ printf("Inlined function %s.\n", fd->toPrettyChars());
 }
 }
 else if (e1->op == TOKdotvar)
@@ -1145,7 +1146,10 @@
 ;
 }
 else
- e = fd->doInline(iss, dve->e1, arguments);
+ {
+ e = fd->doInline(iss, dve->e1, arguments);
+ printf("Inlined method %s.\n", fd->toPrettyChars());
+ }
 }
 }
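To make a few entries of the list above concrete, here is a minimal sketch in D2 syntax. All names (square, squareInPlace, Counter) are made up for illustration, and the comments only restate what the list says about that dmd revision. The program itself cannot show what any compiler actually inlines; for that you would still have to look at the generated code or use a patch like the one above.

```d
import std.stdio : writeln;

// Small by-value helper: none of the exclusions from the list apply.
double square(double x)
{
    return x * x;
}

// Same computation through a ref parameter. According to the list,
// functions with out/ref parameters were skipped by that inliner.
void squareInPlace(ref double x)
{
    x = x * x;
}

class Counter
{
    private int n;

    // Methods are virtual by default in D, so per the list this one
    // would not be an inlining candidate.
    int next() { return ++n; }

    // Marked final: cannot be overridden, so the virtual-call
    // exclusion from the list does not apply.
    final int nextFinal() { return ++n; }
}

void main()
{
    double a = 3.0;
    double b = 3.0;
    squareInPlace(b);
    assert(square(a) == b); // both variants compute the same value

    auto c = new Counter;
    assert(c.next() == 1);
    assert(c.nextFinal() == 2);

    writeln(square(a), " ", b);
}
```

Both variants of each helper do the same work, so any speed difference between them in a tight loop would come from call overhead alone.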
Feb 09 2010
Trass3r wrote:
> On 08.02.2010, 16:33, Scorn <scorn trash-mail.com> wrote:
>> Trass3r wrote:
>>> Would be interesting.
>> Yes, this would be very interesting indeed. A list of the rules which dmd uses internally for inlining functions and methods would be really nice.
>
> Well if I read the code correctly the following is not supported:
> - nested inline?
> - variadic functions (T t, ...)
> - synchronized
> - imported functions
> - functions with closure vars
> - virtual functions that aren't final
> - functions with out, ref or static array parameters
> - functions with more than 250 elementary expressions
>
> Created my own little inline dumping patch:
> [...]

Thanks for your work. This is very interesting. It would be good to have this on a separate page about optimizations in D.

But a big problem is that inlining seems to be done only inside one module and not across modules, which makes modules containing things like small helper or math functions that are used across different modules senseless.
Feb 09 2010
> But a big problem is that inlining seems to be done only inside one module and not across modules which makes modules with something like small helper or math functions which are used across different modules senseless.

Yeah, I think so as well, as each module is compiled separately.
Nevertheless, my patch lists functions that aren't used in the same module. Maybe I've missed something.
Feb 09 2010
Trass3r wrote:
>> But a big problem is that inlining seems to be done only inside one module and not across modules which makes modules with something like small helper or math functions which are used across different modules senseless.
>
> Yeah, I think so as well as each module is compiled separately.
> Nevertheless my patch lists functions that aren't used in the same module. Maybe I've missed something.

I don't think so. It's hard to believe that the inliner would be so limited. It'd be great to assemble some important test cases that currently fail. Probably 'ref' arguments are the main culprit.
Feb 09 2010
Don wrote:
> Trass3r wrote:
>>> But a big problem is that inlining seems to be done only inside one module and not across modules [...]
>> Yeah, I think so as well as each module is compiled separately. Nevertheless my patch lists functions that aren't used in the same module. Maybe I've missed something.
>
> I don't think so. It's hard to believe that the inliner would be so limited. It'd be great to assemble some important test cases that currently fail. Probably 'ref' arguments are the main culprit.

Thank you very much for your answer, Don. I think it might be interesting for you too, since from what I know you are also using D for numerical stuff.

I still hope that the inliner is not so limited. But I am sure I once read a post from Walter where he said that optimizations are done only per module. I hope that does not apply to inlining too, because otherwise the whole module system would be, in my humble opinion, totally unusable. Just think about things like setter / getter methods in classes, little math functions which you would put in a separate math module, or operator overloading in a complex number or vector / matrix class module. When things like these are used in a program which does a lot of numerical computation inside a big loop, this would be really, really bad.

But from my experience (which I have to admit is not that big), small utility functions like the following are not inlined when they are in a separate module, but are inlined when they are in the same module in which they are called:

double min(double a, double b, double c)
{
    return a < b && a < c ? a : b < c ? b : c;
}

At the moment I help myself with mixins, because they just get copied and pasted into the code:

template Min(char[] a, char[] b, char[] c)
{
    const char[] Min = a~"<"~b~"&&"~a~"<"~c~"?"~a~":"~b~"<"~c~"?"~b~":" ~ c;
}

min.x = mixin(Min!("vertex0.x", "vertex1.x", "vertex2.x"));
min.y = mixin(Min!("vertex0.y", "vertex1.y", "vertex2.y"));
min.z = mixin(Min!("vertex0.z", "vertex1.z", "vertex2.z"));

This (and a little bit more) runs in a tight loop which executes about 10000000 times. With these "optimizations" I get a speed increase of about 20%. And I get the same increase in speed when I just copy the function into the module in which it is called. It's not only this function where I notice this behaviour, but all of the tiny helper functions in my separate math module.

So for the moment it seems I have two alternatives: copy every math helper function into every module which needs it (so that it gets inlined), which is a software-engineering nightmare because I spread the same functionality over and over in my code, or use mixins to death.

So this is why I was so interested in the question of under which conditions functions are inlined (it is sometimes very strange). I still hope it's not true that inlining in D is so limited, but from my experience it seems to be (at least in many cases).
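For readers on a D2 compiler, the mixin workaround above can be sketched roughly as follows. This is only an illustration under a few assumptions: string replaces the D1 char[] template parameters, the generated expression is parenthesized, and the Vec3 struct and the componentMin helper are invented here to make the snippet self-contained (they are not from the project discussed in this thread).

```d
import std.stdio : writeln;

// Builds the source text of a three-way minimum expression at compile
// time. The arguments are pieces of source code, not values.
template Min(string a, string b, string c)
{
    enum string Min =
        "((" ~ a ~ " < " ~ b ~ " && " ~ a ~ " < " ~ c ~ ") ? " ~ a ~
        " : (" ~ b ~ " < " ~ c ~ " ? " ~ b ~ " : " ~ c ~ "))";
}

// Hypothetical vector type, only here so the example compiles on its own.
struct Vec3
{
    double x, y, z;
}

// Hypothetical helper: component-wise minimum of three vertices. Each
// mixin pastes the comparison expression directly into this function,
// so there is no call left for the compiler to inline.
Vec3 componentMin(Vec3 vertex0, Vec3 vertex1, Vec3 vertex2)
{
    Vec3 lo;
    lo.x = mixin(Min!("vertex0.x", "vertex1.x", "vertex2.x"));
    lo.y = mixin(Min!("vertex0.y", "vertex1.y", "vertex2.y"));
    lo.z = mixin(Min!("vertex0.z", "vertex1.z", "vertex2.z"));
    return lo;
}

void main()
{
    auto lo = componentMin(Vec3(3, 5, 9), Vec3(1, 7, 8), Vec3(2, 4, 6));
    assert(lo.x == 1 && lo.y == 4 && lo.z == 6);
    writeln(lo.x, " ", lo.y, " ", lo.z);
}
```

The price of this approach is the one described above: the arguments are strings, so the compiler reports errors inside the generated text rather than at a normal call site, and every use has to be spelled as mixin(...).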
Feb 09 2010
Scorn:
> double min(double a, double b, double c)
> {
>     return a < b && a < c ? a : b < c ? b : c;
> }

Don't write code like that, add some parentheses like this:

return (a < b && a < c) ? a : (b < c ? b : c);

because the compiler is able to sort out those operator precedences, but the programmer that comes after you and reads that code will have problems.

A compiler compiles that code with 3 FP tests, while I think two suffice, so there are better ways to write that.

> This is (and a little bit more) is running in a tight loop which runs about 10000000 times. With these "optimizations" i get a speed increase about 20% percent.

---------------------

I have created a module named "mo" and a main module named "temp":

module mo;

int foo(int x) {
    return x * x;
}

double min3(double a, double b, double c) {
    return (a <= b) ? (a <= c ? a : c) : (b <= c ? b : c);
}

---------------------

module temp; // main module
version (Tango) {
    import tango.stdc.stdio: printf;
    import tango.stdc.stdlib: atoi, atof;
} else {
    import std.c.stdio: printf;
    import std.c.stdlib: atoi, atof;
}
import mo: foo, min3;

void main() {
    int x = atoi("12");
    printf("%d\n", foo(x));
    double x1 = atof("10");
    double x2 = atof("20");
    double x3 = atof("30");
    printf("%f\n", min3(x1, x2, x3));
}

---------------------

From my tests it seems LDC isn't able to inline those functions, while DMD is able to inline them :-)

ldc -O5 -release -output-s -inline temp.d mo.d

08049600 <_Dmain>:
 8049600: 83 ec 34             sub    $0x34,%esp
 8049603: c7 04 24 e8 8c 05 08 movl   $0x8058ce8,(%esp)
 804960a: e8 99 fd ff ff       call   80493a8 <atoi@plt>
 804960f: e8 9c 00 00 00       call   80496b0 <_D2mo3fooFiZi>
 8049614: 89 44 24 04          mov    %eax,0x4(%esp)
 8049618: c7 04 24 eb 8c 05 08 movl   $0x8058ceb,(%esp)
 804961f: e8 64 fd ff ff       call   8049388 <printf@plt>
 8049624: c7 04 24 ef 8c 05 08 movl   $0x8058cef,(%esp)
 804962b: e8 98 fd ff ff       call   80493c8 <atof@plt>
 8049630: db 7c 24 28          fstpt  0x28(%esp)
 8049634: c7 04 24 f2 8c 05 08 movl   $0x8058cf2,(%esp)
 804963b: e8 88 fd ff ff       call   80493c8 <atof@plt>
 8049640: db 7c 24 1c          fstpt  0x1c(%esp)
 8049644: c7 04 24 f5 8c 05 08 movl   $0x8058cf5,(%esp)
 804964b: e8 78 fd ff ff       call   80493c8 <atof@plt>
 8049650: db 6c 24 28          fldt   0x28(%esp)
 8049654: dd 5c 24 10          fstpl  0x10(%esp)
 8049658: db 6c 24 1c          fldt   0x1c(%esp)
 804965c: dd 5c 24 08          fstpl  0x8(%esp)
 8049660: dd 1c 24             fstpl  (%esp)
 8049663: e8 58 00 00 00       call   80496c0 <_D2mo4min3FdddZd>
 8049668: 83 ec 18             sub    $0x18,%esp
 804966b: dd 5c 24 04          fstpl  0x4(%esp)
 804966f: c7 04 24 f8 8c 05 08 movl   $0x8058cf8,(%esp)
 8049676: e8 0d fd ff ff       call   8049388 <printf@plt>
 804967b: 31 c0                xor    %eax,%eax
 804967d: 83 c4 34             add    $0x34,%esp
 8049680: c2 08 00             ret    $0x8
 8049683: 8d b6 00 00 00 00    lea    0x0(%esi),%esi
 8049689: 8d bc 27 00 00 00 00 lea    0x0(%edi,%eiz,1),%edi

-----------------

dmd -O -release -inline temp.d mo.d

__Dmain comdat
L0:     sub   ESP,038h
        mov   EAX,offset FLAT:_DATA
        push  EBX
        push  ESI
        push  EDI
        push  EAX
        call  near ptr _atoi
        add   ESP,4
        mov   EBX,EAX
        mov   ECX,EAX
        imul  ECX,ECX
        mov   EDX,offset FLAT:_DATA[4]
        push  ECX
        push  EDX
        call  near ptr _printf
        mov   ESI,offset FLAT:_DATA[8]
        push  ESI
        call  near ptr _atof
        mov   EDI,offset FLAT:_DATA[0Ch]
        fstp  qword ptr 018h[ESP]
        push  EDI
        call  near ptr _atof
        mov   EAX,offset FLAT:_DATA[010h]
        fstp  qword ptr 024h[ESP]
        push  EAX
        call  near ptr _atof
        add   ESP,4
        fld   qword ptr 01Ch[ESP]
        fxch  ST1
        fstp  qword ptr 02Ch[ESP]
        fcomp qword ptr 024h[ESP]
        fstsw AX
        sahf
        ja    L83
        jp    L83
        fld   qword ptr 01Ch[ESP]
        fcomp qword ptr 02Ch[ESP]
        fstsw AX
        sahf
        ja    L7D
        jp    L7D
        fld   qword ptr 01Ch[ESP]
        jmp short L9C
L7D:    fld   qword ptr 02Ch[ESP]
        jmp short L9C
L83:    fld   qword ptr 024h[ESP]
        fcomp qword ptr 02Ch[ESP]
        fstsw AX
        sahf
        ja    L98
        jp    L98
        fld   qword ptr 024h[ESP]
        jmp short L9C
L98:    fld   qword ptr 02Ch[ESP]
L9C:    sub   ESP,8
        mov   ECX,offset FLAT:_DATA[014h]
        fstp  qword ptr [ESP]
        push  ECX
        call  near ptr _printf
        add   ESP,01Ch
        xor   EAX,EAX
        pop   EDI
        pop   ESI
        pop   EBX
        add   ESP,038h
        ret

-----------------

Using Link-Time optimization LDC is able to inline those functions.
So here it seems LDC is worse :-(

Bye,
bearophile
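As a quick sanity check of the "two tests suffice" remark, the following D2 sketch compares the original nested form with min3 over a small grid of values. The name minNested and the sample values are made up for this illustration. NaN inputs are deliberately left out, since the two forms are not guaranteed to agree there.

```d
import std.stdio : writeln;

// Form from the earlier post, with explicit parentheses: it can
// evaluate up to three comparisons (a < b, a < c, b < c).
double minNested(double a, double b, double c)
{
    return (a < b && a < c) ? a : (b < c ? b : c);
}

// Form used in module mo: exactly two comparisons on every path.
double min3(double a, double b, double c)
{
    return (a <= b) ? (a <= c ? a : c) : (b <= c ? b : c);
}

void main()
{
    // Small sample set including a repeated value, so ties are covered.
    immutable double[] vals = [-2.5, 0.0, 1.0, 1.0, 3.75];

    foreach (a; vals)
        foreach (b; vals)
            foreach (c; vals)
                assert(minNested(a, b, c) == min3(a, b, c));

    writeln("all sampled combinations agree");
}
```

This only shows that the two forms return equal values on the sampled inputs. It says nothing about which one a given compiler turns into faster code.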
Feb 09 2010
bearophile wrote:
> Scorn:
>> double min(double a, double b, double c)
>> {
>>     return a < b && a < c ? a : b < c ? b : c;
>> }
> Don't write code like that, add some parenthesys like this:
> return (a < b && a < c) ? a : (b < c ? b : c);
> because the compiler is able to sort out those operator precedences, but the programmer that comes after you and reads that code will have problems.

Ok. The next time I post an example I will take care that it is more readable :-)

> A compiler compiles that code with 3 FP tests, while I think two suffice, so there are better ways to write that.

:-) Sure. Yes, you are right. Since I do not want to sort the values a, b and c (have a total order of things), I could of course write something longer and a bit more efficient, like this:

double max(double a, double b, double c)
{
    if (a >= b)
    {
        if (a >= c)
            return a;
        else
            return c;
    }
    else
    {
        if (b >= c)
            return b;
        else
            return c;
    }
}

which uses just two comparisons instead of three. But trust me: that bad code from above is not the explanation for the lack of speed in my program, and the longer version would be a bit more to write as a mixin. ;-)

But here comes the interesting part:

>> This is (and a little bit more) is running in a tight loop which runs about 10000000 times. With these "optimizations" i get a speed increase about 20% percent.
> I have created a module named "mo" and a main module named "temp":
> [...]
> From my tests it seems LDC isn't able to inline those functions, while DMD is able to inline them :-)

And gdc does not seem to inline those functions either :-(

> ldc -O5 -release -output-s -inline temp.d mo.d
> [...]
> dmd -O -release -inline temp.d mo.d
> [...]
> Using Link-Time optimization LDC is able to inline those functions.
> So here it seems LDC is worse :-(

I have to try it with gdc too.

> Bye,
> bearophile

Thank you very much for your research, bearophile. It's very much appreciated. But now the interesting question is why the different compilers inline functions so differently. Is it because of different versions of the frontend (has Walter changed something?), or because they use different backends (which should not matter so much, since inlining is normally best done in the frontend)?

And of course Trass3r's original question, under which conditions functions are inlined, still remains. Are setters/getters inlined? Overloaded operators? Short helper functions? Functions with ref or out parameters? In which cases does it simply not work when it should?
Feb 09 2010