digitalmars.D.announce - GC vs. Manual Memory Management Real World Comparison
- Benjamin Thaut (20/20) Sep 05 2012 I rewrote a 3d game I created during my studies with D 2.0 to manual
- Alex Rønne Petersen (12/32) Sep 05 2012 Is source code available anywhere?
- Benjamin Thaut (13/20) Sep 05 2012 The source code is not available yet, as it is in a repository of my
- bearophile (5/8) Sep 05 2012 Maybe a compiler-enforced annotation for functions and modules is
- Alex Rønne Petersen (7/30) Sep 05 2012 Sure, I just want to point out that it's a problem with the language (GC...
- Benjamin Thaut (15/17) Sep 05 2012 That's exactly what I want to cause with this post. More effort should be...
- Benjamin Thaut (9/16) Sep 05 2012 Should be:
- Alex Rønne Petersen (9/28) Sep 05 2012 Very true. I've often thought we should ship a GC-less druntime in the
- Benjamin Thaut (37/39) Sep 05 2012 Everything is in object_.d:
- Peter Alexander (3/10) Sep 05 2012 Wow.
- Benjamin Thaut (8/18) Sep 05 2012 I already have a fix for this.
- Iain Buclaw (14/45) Sep 05 2012 This got fixed. Said code is now:
- Iain Buclaw (12/60) Sep 05 2012 Oops, let me correct myself.
- Benjamin Thaut (5/67) Sep 05 2012 Still, comparing two type info objects will result in one or multiple
- Andrei Alexandrescu (4/6) Sep 05 2012 Could you please submit a patch for that? Thanks!
- Piotr Szturmaj (3/7) Sep 05 2012 There's one proposed solution to this problem:
- SomeDude (5/15) Sep 10 2012 It's a bad solution imho. Monitoring the druntime and hunting
- bearophile (6/9) Sep 11 2012 Why do you think such hunt is better than letting the compiler
- Iain Buclaw (10/15) Sep 11 2012 Is not difficult to implement, as the compiler only needs to warn that t...
- SomeDude (21/30) Sep 11 2012 My problem is you litter your codebase with nogc everywhere. In
- Felix Hufnagel (31/61) Sep 12 2012 class Foo
- Paulo Pinto (2/36) Sep 13 2012 This is partially what happens in C++/CLI and C++/CX.
- Rob T (7/14) Oct 23 2012 The compiler option warning about undesirable heap allocations
- Alex Rønne Petersen (6/26) Sep 05 2012 BTW, your blog post appears to have comparison misspelled.
- anonymous (4/42) Sep 05 2012 Also "development".
- Benjamin Thaut (9/10) Sep 05 2012 The GDC druntime does have a different folder structure, which makes it
- Andrei Alexandrescu (4/5) Sep 05 2012 Smile, you're on reddit:
- bearophile (9/12) Sep 05 2012 Regarding your issues list, most of them are fixable, like the
- Iain Buclaw (15/19) Sep 05 2012 I have no clue what the issue with invariant handlers is... Calls to
- bearophile (40/47) Sep 05 2012 Iain Buclaw:
- Iain Buclaw (10/53) Sep 05 2012 I think it was mostly due to that you can't tell the difference
- bearophile (14/20) Sep 05 2012 I use fixed size arrays all the time in D. Heap-allocated arrays
- bearophile (6/10) Sep 05 2012 Also, the lack of variable length stack allocated arrays in D
- Benjamin Thaut (14/24) Sep 05 2012 Well, as overloading new and delete is deprecated, and the new which is
- Sean Kelly (5/13) Sep 05 2012 It sounds like your code has escaping references? I think the presence ...
- Benjamin Thaut (4/4) Sep 05 2012 My "standard" library is now available on github:
- Johannes Pfau (10/37) Sep 05 2012 Would be great if some of the code could be merged into phobos,
- Benjamin Thaut (8/16) Sep 05 2012 I personally really like my composite template, which allows for direct
- Nathan M. Swan (3/25) Sep 05 2012 Did you try GC.disable/enable?
- Walter Bright (3/6) Sep 05 2012 I'd like it if you could add some instrumentation to see what accounts f...
- Iain Buclaw (6/14) Sep 05 2012 I'd say they are identical, but I don't really look at what goes on
- Andrej Mitrovic (5/7) Sep 05 2012 Speaking of which, I'd like to see if the Unilink linker would make
- bearophile (6/9) Sep 05 2012 Maybe that performance difference comes from the sum of some
- Walter Bright (11/16) Sep 05 2012 We can trade guesses all day, and not get anywhere. Instrumentation and
- bearophile (11/12) Sep 06 2012 In that case I think I didn't specify what subsystem of the D
- Peter Alexander (4/13) Sep 06 2012 In addition to Walter's response, it is very rare for advanced
- Sean Cavanaugh (14/17) Sep 06 2012 I love trying to explain to people our debug builds are too slow because...
- Benjamin Thaut (12/19) Sep 06 2012 The code is identical, I did not change anything in the GC code. So it
- Jacob Carlborg (6/23) Sep 06 2012 I don't know what Windows has but on Mac OS X there's this application:
- ponce (2/9) Sep 06 2012 You don't necessarily need to recompile anything with a sampling
- Benjamin Thaut (10/19) Sep 06 2012 I just tried profiling it with Very Sleepy but basically it only tells
- ponce (3/18) Sep 06 2012 You might try AMD Code Analyst, it will highlight the bottleneck
- Walter Bright (2/5) Sep 06 2012 Even so, that in itself is a good clue.
- Sven Torvinger (22/30) Sep 06 2012 my bet is on, cross-module-inlining of bitop.btr failing...
- Iain Buclaw (10/42) Sep 06 2012 You would be wrong. btr is a compiler intrinsic, so it is *always* inli...
- Walter Bright (2/5) Sep 07 2012 Would it be easy to give that a try, and see what happens?
- Iain Buclaw (6/12) Sep 07 2012 Sure, can do. Give me something to work against, and I will be able
- Walter Bright (2/15) Sep 07 2012 Well, gdc with and without it!
- Sean Kelly (6/25) Sep 06 2012 version.
- Jacob Carlborg (4/5) Sep 06 2012 He's using only Windows as far as I understand, GDC MinGW.
- Sean Kelly (6/11) Sep 07 2012 Well sure, but MinGW is weird. I'd expect the Windows flag to be set for...
- Benjamin Thaut (11/32) Sep 07 2012 I did build druntime and phobos with -release -noboundscheck -inline -O
- Andrei Alexandrescu (7/17) Sep 07 2012 You mentioned some issues in Phobos with memory allocation, that you had...
- Benjamin Thaut (53/59) Sep 07 2012 Let me give a bit more details about what I did and why.
- ponce (7/9) Sep 07 2012 You make some good points about what happen under the hood.
- Jens Mueller (7/18) Sep 07 2012 Interesting.
- Benjamin Thaut (5/5) Sep 09 2012 The full source code for the non-GC version is now available on github. The...
- Benjamin Thaut (9/9) Oct 23 2012 Here a small update:
- Rob T (20/30) Oct 23 2012 That's a very significant difference in performance that should
- Paulo Pinto (6/39) Oct 24 2012 Having dealt with systems programming in languages with GC
- Rob T (20/25) Oct 24 2012 Well, performnce is only part of the GC equation. There's
- Paulo Pinto (20/48) Oct 24 2012 I do understand that.
- Rob T (17/21) Oct 24 2012 Probably no one in here is thinking that we should not have a GC.
- Jakob Ovrum (7/12) Oct 24 2012 You can very much link to C and C++ code, or have C and C++ code
- Jakob Ovrum (7/12) Oct 24 2012 You can very much link to C and C++ code, or have C and C++ code
- Paulo Pinto (9/21) Oct 24 2012 I am speaking without knowing if such thing already exists.
- Rob T (14/20) Oct 25 2012 My understanding of dynamic linking and the runtime is based on
- Jakob Ovrum (6/19) Oct 25 2012 You are right that compiling the runtime itself (druntime and
- Rob T (19/24) Oct 25 2012 Yes I can build my own D shared libs, both as static PIC (.a) and
- Jakob Ovrum (27/45) Oct 25 2012 Sorry, I keep forgetting that this is needed on non-Windows
- Jakob Ovrum (27/45) Oct 25 2012 Sorry, I keep forgetting that this is needed on non-Windows
- Rob T (3/6) Oct 25 2012 What is the GC proxy system, and how do I make use of it?
- Jakob Ovrum (14/20) Oct 25 2012 There's a function Runtime.loadLibrary in core.runtime that is
- bearophile (20/20) Oct 26 2012 I use this GC thread to show a little GC-related benchmark.
- Rob T (2/6) Oct 26 2012 Is this happening with dmd 2.060 as released?
- bearophile (6/7) Oct 26 2012 I'm using 2.061alpha git head, but I guess the situation is the
- bearophile (4/5) Oct 27 2012 And setting trades.length to zero and then using GC.free() on its
- bearophile (5/5) Oct 27 2012 And with the usual optimizations (struct splitting) coming from
I rewrote a 3d game I created during my studies with D 2.0 to manual memory management. When I'm not studying, I'm working in the 3d engine department of Havok. As I needed to practice manual memory management and had wanted to get rid of the GC in D for quite some time, I went through all this effort to create a GC-free version of my game.

The results are:

DMD GC Version: 71 FPS, 14.0 ms frametime
GDC GC Version: 128.6 FPS, 7.72 ms frametime
DMD MMM Version: 142.8 FPS, 7.02 ms frametime

GC collection times:

DMD GC Version: 8.9 ms
GDC GC Version: 4.1 ms

As you can see, the manually managed version is twice as fast as the garbage collected one. Even the highly optimized version created with GDC is still slower than the one with manual memory management.

You can find the full article at: http://3d.benjamin-thaut.de/?p=20#more-20

Feedback is welcome.

Kind Regards
Benjamin Thaut
Sep 05 2012
On 05-09-2012 13:03, Benjamin Thaut wrote:

I rewrote a 3d game I created during my studies with D 2.0 to manual memory mangement. If I'm not studying I'm working in the 3d Engine deparement of Havok. As I needed to pratice manual memory management and did want to get rid of the GC in D for quite some time, I did go through all this effort to create a GC free version of my game. The results are: DMD GC Version: 71 FPS, 14.0 ms frametime GDC GC Version: 128.6 FPS, 7.72 ms frametime DMD MMM Version: 142.8 FPS, 7.02 ms frametime GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 ms As you see the manual managed version is twice as fast as the garbage collected one. Even the highly optimized version created with GDC is still slower the the manual memory management. You can find the full article at: http://3d.benjamin-thaut.de/?p=20#more-20 Feedback is welcome. Kind Regards Benjamin Thaut

Is source code available anywhere?

Also, I have to point out that programming for a garbage collected runtime is very different from doing manual memory management. The same patterns don't apply, and you optimize in different ways. For instance, when using a GC, it is very recommendable that you allocate up front and use object pooling - and most importantly, don't allocate at all during your render loop.

-- Alex Rønne Petersen alex lycus.org http://lycus.org
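To make the pooling pattern concrete, here is a bare-bones sketch (the names and the capacity handling are made up for the example; real pools in a game engine would be more elaborate):

// Sketch only: allocate everything up front, hand out slots during the
// frame, never touch the allocator or GC in the render loop itself.
struct Pool(T)
{
    T[] items;
    size_t used;

    this(size_t capacity)
    {
        items = new T[capacity];   // one up-front allocation, outside the render loop
    }

    T* acquire()
    {
        assert(used < items.length, "pool exhausted");
        return &items[used++];     // handing out a slot allocates nothing
    }

    void reset()                   // e.g. once per frame or per level
    {
        used = 0;
    }
}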
Sep 05 2012
Am 05.09.2012 13:10, schrieb Alex Rønne Petersen:

Is source code available anywhere? Also, I have to point out that programming for a garbage collected runtime is very different from doing manual memory management. The same patterns don't apply, and you optimize in different ways. For instance, when using a GC, it is very recommendable that you allocate up front and use object pooling - and most importantly, don't allocate at all during your render loop.

The source code is not available yet, as it is in a repository of my university, but I can zip it and upload the current version if that is wanted. It currently only supports Windows and does not have any setup instructions yet. I do object pooling in both versions, as in game development you usually don't allocate during the frame. But still, in the GC version you have the problem that way too many parts of the language allocate and you don't even notice it when using the GC. Just to clarify, I've been into 3d engine development for about 7 years now, so I'm not a newcomer to the subject.

Kind Regards
Benjamin Thaut
Sep 05 2012
Benjamin Thaut:

But still in the GC version you have the problem that way to many parts of the language allocate and you don't event notice it when using the GC.

Maybe a compiler-enforced annotation for functions and modules could remove this problem in D.

Bye,
bearophile
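To make the idea concrete, a minimal sketch of what such an annotation could look like. The attribute name and its exact semantics are assumptions; nothing like this existed in the language at the time of this discussion:

// Sketch only: assumes a hypothetical @nogc-style attribute checked by the compiler.
@nogc int sum(const(int)[] a)
{
    int s = 0;
    foreach (x; a)
        s += x;

    // int[] tmp = [1, 2, 3];  // a checker would reject this line: the array
    //                         // literal is a hidden GC allocation

    return s;                  // fine: nothing in this function allocates
}

The compiler would then flag hidden allocations like the opEquals case above, instead of the programmer having to hunt them down by hand.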
Sep 05 2012
On 05-09-2012 13:19, Benjamin Thaut wrote:Am 05.09.2012 13:10, schrieb Alex Rønne Petersen:Sure, I just want to point out that it's a problem with the language (GC allocations being very non-obvious) as opposed to the nature of GC. -- Alex Rønne Petersen alex lycus.org http://lycus.orgIs source code available anywhere? Also, I have to point out that programming for a garbage collected runtime is very different from doing manual memory management. The same patterns don't apply, and you optimize in different ways. For instance, when using a GC, it is very recommendable that you allocate up front and use object pooling - and most importantly, don't allocate at all during your render loop.The sourcecode is not aviable yet, as it is in a repository of my university, but I can zip it and upload the current version if that is wanted. But it currently does only support Windows and does not have any setup instructions yet. I do object pooling in both versions, as in game developement you usually don't allocate during the frame. But still in the GC version you have the problem that way to many parts of the language allocate and you don't event notice it when using the GC. Just to clarify, I'm into 3d engine developement since about 7 years now. So I'm not a newcomer to the subject. Kind Regards Benjamin Thaut
Sep 05 2012
Am 05.09.2012 14:00, schrieb Alex Rønne Petersen:

Sure, I just want to point out that it's a problem with the language (GC allocations being very non-obvious) as opposed to the nature of GC.

That's exactly what I want to bring about with this post. More effort should be put into the parts of D that currently allocate but absolutely don't have to. Also, the statement "You can use D without a GC" is not quite as easy as the homepage makes it sound.

My favorite hidden allocation so far is:

class A {}
class B : A {}

A a = new A();
B b = new B();

if(a == b) //this will allocate
{
}

Kind Regards
Benjamin Thaut
Sep 05 2012
Am 05.09.2012 14:07, schrieb Benjamin Thaut:

class A {}
class B : A {}

A a = new A();
B b = new B();

if(a == b) //this will allocate
{
}

Should be:

class A {}
class B : A {}

const(A) a = new A();
const(B) b = new B();

if(a == b) //this will allocate
{
}
Sep 05 2012
On 05-09-2012 14:07, Benjamin Thaut wrote:Am 05.09.2012 14:00, schrieb Alex Rønne Petersen: >Very true. I've often thought we should ship a GC-less druntime in the normal distribution.Sure, I just want to point out that it's a problem with the language (GC allocations being very non-obvious) as opposed to the nature of GC.Thats exactly what I want to cause with this post. More effort should be put into the parts of D that currently allocate, but absolutley don't have to. Also the statement "You can use D without a GC" is not quite as easy as the homepage makes it sound.My favorite hidden allocation so far is: class A {} class B : A{} A a = new A(); B b = new B(); if(a == b) //this will allocate { }Where's the catch? From looking in druntime, I don't see where the allocation could occur.Kind Regards Benjamin Thaut-- Alex Rønne Petersen alex lycus.org http://lycus.org
Sep 05 2012
Am 05.09.2012 14:14, schrieb Alex Rønne Petersen:

Where's the catch? From looking in druntime, I don't see where the allocation could occur.

Everything is in object_.d:

equals_t opEquals(Object lhs, Object rhs)
{
    if (lhs is rhs)
        return true;
    if (lhs is null || rhs is null)
        return false;
    if (typeid(lhs) == typeid(rhs))
        return lhs.opEquals(rhs);
    return lhs.opEquals(rhs) && rhs.opEquals(lhs);
}

This will trigger a comparison of the TypeInfo objects with

if (typeid(lhs) == typeid(rhs))

which, after some function calls, will trigger TypeInfo's opEquals:

override equals_t opEquals(Object o)
{
    /* TypeInfo instances are singletons, but duplicates can exist
     * across DLL's. Therefore, comparing for a name match is
     * sufficient.
     */
    if (this is o)
        return true;
    TypeInfo ti = cast(TypeInfo)o;
    return ti && this.toString() == ti.toString();
}

Then, because they are const, TypeInfo_Const.toString() will be called:

override string toString()
{
    return cast(string) ("const(" ~ base.toString() ~ ")");
}

which allocates, due to array concatenation. But this only happens if they are not of the same type and one of them has a storage qualifier.

Kind Regards
Benjamin Thaut
Sep 05 2012
On Wednesday, 5 September 2012 at 12:27:05 UTC, Benjamin Thaut wrote:Then because they are const, TypeInfo_Const.toString() will be called: override string toString() { return cast(string) ("const(" ~ base.toString() ~ ")"); } which allocates, due to array concardination.Wow.
Sep 05 2012
Am 05.09.2012 14:34, schrieb Peter Alexander:On Wednesday, 5 September 2012 at 12:27:05 UTC, Benjamin Thaut wrote:I already have a fix for this. https://github.com/Ingrater/druntime/commit/74713f7af496fd50fe4cfe60b3d9906b87efbdb6 https://github.com/Ingrater/druntime/commit/05c440b0322d39cf98425f50172c468c6659efb8 If I find a good description how to do pull requests, I might be able to do one. Kind Regards Benjamin ThautThen because they are const, TypeInfo_Const.toString() will be called: override string toString() { return cast(string) ("const(" ~ base.toString() ~ ")"); } which allocates, due to array concardination.Wow.
Sep 05 2012
On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:Am 05.09.2012 14:14, schrieb Alex Rønne Petersen:

This got fixed. Said code is now:

override equals_t opEquals(Object o)
{
    if (this is o)
        return true;
    auto c = cast(const TypeInfo_Class)o;
    return c && this.info.name == c.info.name;
}

Causing no hidden allocation.

Regards
--
Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';

Where's the catch? From looking in druntime, I don't see where the allocation could occur. Everything is in object_.d: equals_t opEquals(Object lhs, Object rhs) { if (lhs is rhs) return true; if (lhs is null || rhs is null) return false; if (typeid(lhs) == typeid(rhs)) return lhs.opEquals(rhs); return lhs.opEquals(rhs) && rhs.opEquals(lhs); } Will trigger a comparison of the TypeInfo objects with if (typeid(lhs) == typeid(rhs)) Which will after some function calls trigger opEquals of TypeInfo override equals_t opEquals(Object o) { /* TypeInfo instances are singletons, but duplicates can exist * across DLL's. Therefore, comparing for a name match is * sufficient. */ if (this is o) return true; TypeInfo ti = cast(TypeInfo)o; return ti && this.toString() == ti.toString(); }
Sep 05 2012
On 5 September 2012 14:04, Iain Buclaw <ibuclaw ubuntu.com> wrote:On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:

Oops, let me correct myself. This was hacked at to call the *correct* opEquals method above.

bool opEquals(const Object lhs, const Object rhs)
{
    // A hack for the moment.
    return opEquals(cast()lhs, cast()rhs);
}

Regards
--
Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';

Am 05.09.2012 14:14, schrieb Alex Rønne Petersen:

This got fixed. Said code is now:

override equals_t opEquals(Object o)
{
    if (this is o)
        return true;
    auto c = cast(const TypeInfo_Class)o;
    return c && this.info.name == c.info.name;
}

Causing no hidden allocation.

Where's the catch? From looking in druntime, I don't see where the allocation could occur. Everything is in object_.d: equals_t opEquals(Object lhs, Object rhs) { if (lhs is rhs) return true; if (lhs is null || rhs is null) return false; if (typeid(lhs) == typeid(rhs)) return lhs.opEquals(rhs); return lhs.opEquals(rhs) && rhs.opEquals(lhs); } Will trigger a comparison of the TypeInfo objects with if (typeid(lhs) == typeid(rhs)) Which will after some function calls trigger opEquals of TypeInfo override equals_t opEquals(Object o) { /* TypeInfo instances are singletons, but duplicates can exist * across DLL's. Therefore, comparing for a name match is * sufficient. */ if (this is o) return true; TypeInfo ti = cast(TypeInfo)o; return ti && this.toString() == ti.toString(); }
Sep 05 2012
Am 05.09.2012 15:07, schrieb Iain Buclaw:On 5 September 2012 14:04, Iain Buclaw <ibuclaw ubuntu.com> wrote:Still, comparing two type info objects will result in one or multiple allocations most of the time. Kind Regards Benjamin ThautOn 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:Oops, let me correct myself. This was hacked at to call the *correct* opEquals method above. bool opEquals(const Object lhs, const Object rhs) { // A hack for the moment. return opEquals(cast()lhs, cast()rhs); } RegardsAm 05.09.2012 14:14, schrieb Alex Rønne Petersen:This got fixed. Said code is now: override equals_t opEquals(Object o) { if (this is o) return true; auto c = cast(const TypeInfo_Class)o; return c && this.info.name == c.info.name; } Causing no hidden allocation.Where's the catch? From looking in druntime, I don't see where the allocation could occur.Everything is in object_.d: equals_t opEquals(Object lhs, Object rhs) { if (lhs is rhs) return true; if (lhs is null || rhs is null) return false; if (typeid(lhs) == typeid(rhs)) return lhs.opEquals(rhs); return lhs.opEquals(rhs) && rhs.opEquals(lhs); } Will trigger a comparison of the TypeInfo objects with if (typeid(lhs) == typeid(rhs)) Which will after some function calls trigger opEquals of TypeInfo override equals_t opEquals(Object o) { /* TypeInfo instances are singletons, but duplicates can exist * across DLL's. Therefore, comparing for a name match is * sufficient. */ if (this is o) return true; TypeInfo ti = cast(TypeInfo)o; return ti && this.toString() == ti.toString(); }
Sep 05 2012
On 9/5/12 4:59 PM, Benjamin Thaut wrote:Still, comparing two type info objects will result in one or multiple allocations most of the time.Could you please submit a patch for that? Thanks! Andrei P.S. Very nice work. Congrats!
Sep 05 2012
Benjamin Thaut wrote:I do object pooling in both versions, as in game developement you usually don't allocate during the frame. But still in the GC version you have the problem that way to many parts of the language allocate and you don't event notice it when using the GC.There's one proposed solution to this problem: http://forum.dlang.org/thread/k1rlhn$19du$1 digitalmars.com
Sep 05 2012
On Wednesday, 5 September 2012 at 12:28:43 UTC, Piotr Szturmaj wrote:Benjamin Thaut wrote:It's a bad solution imho. Monitoring the druntime and hunting every part that allocates until our codebase is correct like Benjamen Thaut is a much better solutionI do object pooling in both versions, as in game developement you usually don't allocate during the frame. But still in the GC version you have the problem that way to many parts of the language allocate and you don't event notice it when using the GC.There's one proposed solution to this problem: http://forum.dlang.org/thread/k1rlhn$19du$1 digitalmars.com
Sep 10 2012
SomeDude:It's a bad solution imho. Monitoring the druntime and hunting every part that allocates until our codebase is correct like Benjamen Thaut is a much better solutionWhy do you think such hunt is better than letting the compiler tell you what parts of your program have the side effects you want to avoid? Bye, bearophile
Sep 11 2012
It's not difficult to implement, as the compiler only needs to warn that the emission of /certain/ library calls /may/ cause heap allocations.

Regards.
----
Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';

On 11 Sep 2012 11:31, "bearophile" <bearophileHUGS lycos.com> wrote:

SomeDude:

It's a bad solution imho. Monitoring the druntime and hunting every part that allocates until our codebase is correct like Benjamen Thaut is a much better solution

Why do you think such hunt is better than letting the compiler tell you what parts of your program have the side effects you want to avoid?

Bye, bearophile
Sep 11 2012
On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:

SomeDude:

My problem is that you litter your codebase with @nogc everywhere. In similar fashion, the nothrow keyword, for instance, has to be appended just about everywhere and I find it very ugly on its own. Basically, with this scheme, you have to annotate every single method you write for each and every guarantee (nothrow, @nogc, @nosideeffect, @noshared, whatever you fancy) you want to ensure. This doesn't scale well at all.

I would find it okay to use a @noalloc annotation as a shortcut for a compiler switch or an external tool to detect allocations in some part of the code (as a digression, I tend to think of D annotations as compiler or tooling switches; one could imagine a general scheme where one associates an annotation with a compiler/tool switch whose effect is limited to the annotated scope).

I suppose the tool has to build the full call tree starting with the @nogc method until it reaches the leaves or finds calls to new or malloc; you would have to do that for every single @nogc annotation, which could be very slow, unless you trust the developer that his code indeed doesn't allocate, which means he effectively needs to litter his codebase with @nogc keywords.
Sep 11 2012
class Foo
{
  @safe nothrow:
    void method_is_nothrow(){}
    void method_is_also_nothrow(){}
}

or

class Foo
{
  @safe nothrow
  {
    void method_is_nothrow(){}
    void method_is_also_nothrow(){}
  }
}

no need to append it to every single method by hand...

Am 12.09.2012, 04:38 Uhr, schrieb SomeDude <lovelydear mailmetrash.com>:

On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote: SomeDude: It's a bad solution imho. Monitoring the druntime and hunting every part that allocates until our codebase is correct like Benjamen Thaut is a much better solution. Why do you think such hunt is better than letting the compiler tell you what parts of your program have the side effects you want to avoid? Bye, bearophile. My problem is you litter your codebase with @nogc everywhere. In similar fashion, the nothrow keyword, for instance, has to be appended just about everywhere and I find it very ugly on its own. Basically, with this scheme, you have to annotate every single method you write for each and every guarantee (nothrow, @nogc, @nosideeffect, @noshared, whatever you fancy) you want to ensure. This doesn't scale well at all. I would find it okay to use a @noalloc annotation as a shortcut for a compiler switch or an external tool to detect allocations in some part of code (as a digression, I tend to think of D annotations as compiler or tooling switches. One could imagine a general scheme where one associates an annotation with a compiler/tool switch whose effect is limited to the annotated scope). I suppose the tool has to build the full call tree starting with the @nogc method until it reaches the leaves or finds calls to new or malloc; you would have to do that for every single @nogc annotation, which could be very slow, unless you trust the developer that indeed his code doesn't allocate, which means he effectively needs to litter his codebase with @nogc keywords.

--
Created with Opera's revolutionary e-mail module: http://www.opera.com/mail/
Sep 12 2012
On Wednesday, 12 September 2012 at 02:37:52 UTC, SomeDude wrote:On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:This is partially what happens in C++/CLI and C++/CX.SomeDude:My problem is you litter your codebase with nogc everywhere. In similar fashion, the nothrow keyword, for instance, has to be appended just about everywhere and I find it very ugly on its own. Basically, with this scheme, you have to annotate every single method you write for each and every guarantee (nothrow, nogc, nosideeffect, noshared, whatever you fancy) you want to ensure. This doesn't scale well at all. I would find it okay to use a noalloc annotation as a shortcut for a compiler switch or a an external tool to detect allocations in some part of code (as a digression, I tend to think D annotations as compiler or tooling switches. One could imagine a general scheme where one associates a annotation with a compiler/tool switch whose effect is limited to the annotated scope). I suppose the tool has to build the full call tree starting with the nogc method until it reaches the leaves or finds calls to new or malloc; you would have to do that for every single nogc annotation, which could be very slow, unless you trust the developer that indeed his code doesn't allocate, which means he effectively needs to litter his codebase with nogc keywords.It's a bad solution imho. Monitoring the druntime and hunting every part that allocates until our codebase is correct like Benjamen Thaut is a much better solutionWhy do you think such hunt is better than letting the compiler tell you what parts of your program have the side effects you want to avoid? Bye, bearophile
Sep 13 2012
On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:SomeDude:The compiler option warning about undesirable heap allocations will allow for complete undesirable allocations to be identified much more easily and without missing anything. This is a general solution to a general problem where a programmer wishes to avoid heap allocations for whatever reason. --rtIt's a bad solution imho. Monitoring the druntime and hunting every part that allocates until our codebase is correct like Benjamen Thaut is a much better solutionWhy do you think such hunt is better than letting the compiler tell you what parts of your program have the side effects you want to avoid?
Oct 23 2012
On 05-09-2012 13:03, Benjamin Thaut wrote:I rewrote a 3d game I created during my studies with D 2.0 to manual memory mangement. If I'm not studying I'm working in the 3d Engine deparement of Havok. As I needed to pratice manual memory management and did want to get rid of the GC in D for quite some time, I did go through all this effort to create a GC free version of my game. The results are: DMD GC Version: 71 FPS, 14.0 ms frametime GDC GC Version: 128.6 FPS, 7.72 ms frametime DMD MMM Version: 142.8 FPS, 7.02 ms frametime GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 ms As you see the manual managed version is twice as fast as the garbage collected one. Even the highly optimized version created with GDC is still slower the the manual memory management. You can find the full article at: http://3d.benjamin-thaut.de/?p=20#more-20 Feedback is welcome. Kind Regards Benjamin ThautBTW, your blog post appears to have comparison misspelled. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Sep 05 2012
On Wednesday, 5 September 2012 at 12:22:52 UTC, Alex Rønne Petersen wrote:On 05-09-2012 13:03, Benjamin Thaut wrote:Also "development". It was interesting to read it. What about GDC MMM?I rewrote a 3d game I created during my studies with D 2.0 to manual memory mangement. If I'm not studying I'm working in the 3d Engine deparement of Havok. As I needed to pratice manual memory management and did want to get rid of the GC in D for quite some time, I did go through all this effort to create a GC free version of my game. The results are: DMD GC Version: 71 FPS, 14.0 ms frametime GDC GC Version: 128.6 FPS, 7.72 ms frametime DMD MMM Version: 142.8 FPS, 7.02 ms frametime GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 ms As you see the manual managed version is twice as fast as the garbage collected one. Even the highly optimized version created with GDC is still slower the the manual memory management. You can find the full article at: http://3d.benjamin-thaut.de/?p=20#more-20 Feedback is welcome. Kind Regards Benjamin ThautBTW, your blog post appears to have comparison misspelled.
Sep 05 2012
Am 05.09.2012 16:07, schrieb anonymous:It was interesting to read it. What about GDC MMM?The GDC druntime does have a different folder structure, which makes it a lot more time consuming to add in the changes. Also it is not possible to rebuild phobos or druntime with the binary release of GDC Mingw. You need the complete build setup for GDC mingw to do that. As this is not documented very well and quite some work I didn't go through that additional effort. Kind Regards Benjamin Thaut
Sep 05 2012
On 9/5/12 1:03 PM, Benjamin Thaut wrote:http://3d.benjamin-thaut.de/?p=20#more-20Smile, you're on reddit: http://www.reddit.com/r/programming/comments/ze4cx/real_world_comparison_gc_vs_manual_memory/ Andrei
Sep 05 2012
Benjamin Thaut:

http://3d.benjamin-thaut.de/?p=20#more-20

Regarding your issues list, most of them are fixable, like the one regarding array literals, and even the one regarding the invariant handler. But I didn't know about this one, and I don't know how and if it is fixable:

"The new statement will not free any memory if the constructor throws an exception."

Insights welcome.

Bye,
bearophile
Sep 05 2012
On 5 September 2012 15:57, bearophile <bearophileHUGS lycos.com> wrote:Benjamin Thaut:I have no clue what the issue with invariant handlers is... Calls to them are not emitted in release code, and if you think they are, then you've probably built either your application, or the library you are using wrong. Array literals are not so easy to fix. I once thought that it would be optimal to make it a stack initialisation given that all values are known at compile time, this infact caused many strange SEGV's in quite a few of my programs (most are parsers / interpreters, so things that go down *heavy* nested into itself, and it was under these circumstances that array literals on the stack would go corrupt in one way or another causing *huge* errors in perfectly sound code). -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';http://3d.benjamin-thaut.de/?p=20#more-20Regardind your issues list, most of them are fixable, like the one regarding array literals, and even the one regarding the invariant handler.
Sep 05 2012
Iain Buclaw: Most of the array allocation cases we are talking about are like: void main() { int[3] a = [1, 2, 3]; // fixed size array } That currently produces, with DMD: __Dmain: L0: sub ESP, 010h mov EAX, offset FLAT:_D12TypeInfo_xAi6__initZ push EBX push 0Ch push 3 push EAX call near ptr __d_arrayliteralTX add ESP, 8 mov EBX, EAX mov dword ptr [EAX], 1 mov ECX, EBX push EBX lea EDX, 010h[ESP] mov dword ptr 4[EBX], 2 mov dword ptr 8[EBX], 3 push EDX call near ptr _memcpy add ESP, 0Ch xor EAX, EAX pop EBX add ESP, 010h ret There is also the case for dynamic arrays: void main() { int[] a = [1, 2, 3]; // use a here } But this is a harder problem, to leave for later.this infact caused many strange SEGV's in quite a few of my programs (most are parsers / interpreters, so things that go down *heavy* nested into itself, and it was under these circumstances that array literals on the stack would go corrupt in one way or another causing *huge* errors in perfectly sound code).Do you know the cause of such corruptions? maybe they are caused by other compiler bugs... And what to do regarding those exceptions in constructors? :-) Bye, bearophile
Sep 05 2012
On 5 September 2012 16:31, bearophile <bearophileHUGS lycos.com> wrote:Iain Buclaw: Most of the array allocation cases we are talking about are like: void main() { int[3] a = [1, 2, 3]; // fixed size array } That currently produces, with DMD: __Dmain: L0: sub ESP, 010h mov EAX, offset FLAT:_D12TypeInfo_xAi6__initZ push EBX push 0Ch push 3 push EAX call near ptr __d_arrayliteralTX add ESP, 8 mov EBX, EAX mov dword ptr [EAX], 1 mov ECX, EBX push EBX lea EDX, 010h[ESP] mov dword ptr 4[EBX], 2 mov dword ptr 8[EBX], 3 push EDX call near ptr _memcpy add ESP, 0Ch xor EAX, EAX pop EBX add ESP, 010h ret There is also the case for dynamic arrays: void main() { int[] a = [1, 2, 3]; // use a here } But this is a harder problem, to leave for later.I think it was mostly due to that you can't tell the difference between array literals that are to be assigned to either dynamic or static arrays (as far as I can tell). I do believe that the issues surrounded dynamic arrays causing SEGVs, and not static (I don't recall ever needing the use of a static array :-). Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';this infact caused many strange SEGV's in quite a few of my programs (most are parsers / interpreters, so things that go down *heavy* nested into itself, and it was under these circumstances that array literals on the stack would go corrupt in one way or another causing *huge* errors in perfectly sound code).Do you know the cause of such corruptions? maybe they are caused by other compiler bugs... And what to do regarding those exceptions in constructors? :-)
Sep 05 2012
Iain Buclaw:I think it was mostly due to that you can't tell the difference between array literals that are to be assigned to either dynamic or static arrays (as far as I can tell).I see.I do believe that the issues surrounded dynamic arrays causing SEGVs, and not static (I don't recall ever needing the use of a static array :-).I use fixed size arrays all the time in D. Heap-allocated arrays are overused in D. They produce garbage and in lot of cases they are not needed. Using them a lot is sometimes a bad habit (if you are writing script-like programs they are OK), that's also encouraged by making them almost second-class citizens in Phobos (and druntime, using them as AA keys causes performance troubles). If you take a look at Ada language you see how much static/stack-allocated arrays are used. In high performance code they help, and I'd like D programmers and Phobos devs to give them a little more consideration. Bye, bearophile
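A trivial illustration of the difference, using nothing beyond the language itself:

// Sketch: fixed-size array on the stack vs. a GC-allocated dynamic array.
void hot()
{
    int[16] buf;                  // fixed-size, lives on the stack, no GC involved
    int[] view = buf[];           // slicing stack memory still allocates nothing
    view[] = 0;                   // (just don't let the slice escape the function)

    // int[] heap = new int[16];  // this variant would allocate from the GC heap
}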
Sep 05 2012
If you take a look at Ada language you see how much static/stack-allocated arrays are used. In high performance code they help, and I'd like D programmers and Phobos devs to give them a little more consideration.Also, the lack of variable length stack allocated arrays in D forces you to over-allocate, wasting stack space, or forces you to use alloca() that is bug-prone and makes things not easy if you need a multi dimensional array. Bye, bearophile
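For reference, the alloca() route looks roughly like the sketch below, which also shows why it is considered bug-prone:

import core.stdc.stdlib : alloca;

// Sketch only: variable-length stack allocation via alloca.
void process(size_t n)
{
    // The memory is only valid until this function returns, and a large n
    // can silently overflow the stack -- two easy ways to shoot yourself.
    int* p = cast(int*) alloca(n * int.sizeof);
    int[] tmp = p[0 .. n];
    tmp[] = 0;
    // ... use tmp here, but never return or store a reference to it ...
}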
Sep 05 2012
Am 05.09.2012 16:57, schrieb bearophile:Benjamin Thaut:

Well, as overloading new and delete is deprecated, and the new which is part of the language only works together with a GC, I don't think that anything will be done about this. It's not a big problem in D because you can't create arrays of objects such that multiple constructors will be called at the same time (which is the biggest issue in C++ with exceptions and constructors). Also, due to memory pre-initialization the object will always be in a meaningful state, which helps with exception handling too.

My replacement just calls the constructor, and if an exception is thrown, the destructor is called and the memory is freed; the new statement then returns null. Works flawlessly so far.

Kind Regards
Benjamin Thaut

http://3d.benjamin-thaut.de/?p=20#more-20 Regardind your issues list, most of them are fixable, like the one regarding array literals, and even the one regarding the invariant handler. But I didn't know about this, and I don't know how and if this is fixable: The new statement will not free any memory if the constructor throws a exception. Insights welcome. Bye, bearophile
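A rough sketch of the construction behaviour described in the reply above. The helper name is hypothetical, this is not the actual thBase code, and out-of-memory handling plus the destructor call on the partially constructed object are omitted for brevity:

import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Sketch only: a GC-free replacement for `new` that cleans up if the
// constructor throws and makes the expression evaluate to null.
T tryNew(T, Args...)(auto ref Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void[] mem = malloc(size)[0 .. size];
    try
    {
        return emplace!T(mem, args);   // run the constructor in place
    }
    catch (Exception)
    {
        free(mem.ptr);                 // constructor threw: release the memory
        return null;
    }
}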
Sep 05 2012
On Sep 5, 2012, at 8:08 AM, Iain Buclaw <ibuclaw ubuntu.com> wrote:

Array literals are not so easy to fix. I once thought that it would be optimal to make it a stack initialisation given that all values are known at compile time, this infact caused many strange SEGV's in quite a few of my programs (most are parsers / interpreters, so things that go down *heavy* nested into itself, and it was under these circumstances that array literals on the stack would go corrupt in one way or another causing *huge* errors in perfectly sound code).

It sounds like your code has escaping references? I think the presence of a GC tends to eliminate a lot of thought about data ownership. This is usually beneficial in that maintaining ownership rules tends to be a huge pain, but then it also tends to avoid issues like this.
Sep 05 2012
My "standard" library is now aviable on github: https://github.com/Ingrater/thBase Kind Regards Benjamin Thaut
Sep 05 2012
Am Wed, 05 Sep 2012 13:03:37 +0200 schrieb Benjamin Thaut <code benjamin-thaut.de>:I rewrote a 3d game I created during my studies with D 2.0 to manual memory mangement. If I'm not studying I'm working in the 3d Engine deparement of Havok. As I needed to pratice manual memory management and did want to get rid of the GC in D for quite some time, I did go through all this effort to create a GC free version of my game. The results are: DMD GC Version: 71 FPS, 14.0 ms frametime GDC GC Version: 128.6 FPS, 7.72 ms frametime DMD MMM Version: 142.8 FPS, 7.02 ms frametime GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 ms As you see the manual managed version is twice as fast as the garbage collected one. Even the highly optimized version created with GDC is still slower the the manual memory management. You can find the full article at: http://3d.benjamin-thaut.de/?p=20#more-20 Feedback is welcome.Would be great if some of the code could be merged into phobos, especially the memory tracker. But also things like memory or object pools would be great in phobos, an emplace wrapper which accepts a custom alloc function to replace new (and something similar for delete), etc. We really need a module for manual memory management (std.mmm?). And functions which currently use the GC to allocate should get overloads which take buffers (Or better support custom allocators, but that needs an allocator design first).
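The emplace wrapper being suggested might look roughly like this. The allocator parameter and its allocate() method returning void[] are assumptions about a not-yet-designed allocator interface; a matching delete counterpart would destroy the object and hand the memory back:

import std.conv : emplace;

// Sketch only: construct a class instance in memory obtained from a
// user-supplied allocator instead of the GC.
T allocNew(T, Allocator, Args...)(ref Allocator allocator, auto ref Args args)
    if (is(T == class))
{
    void[] mem = allocator.allocate(__traits(classInstanceSize, T));
    return emplace!T(mem, args);
}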
Sep 05 2012
Am 05.09.2012 19:31, schrieb Johannes Pfau:

Would be great if some of the code could be merged into phobos, especially the memory tracker. But also things like memory or object pools would be great in phobos, an emplace wrapper which accepts a custom alloc function to replace new (and something similar for delete), etc. We really need a module for manual memory management (std.mmm?). And functions which currently use the GC to allocate should get overloads which take buffers (Or better support custom allocators, but that needs an allocator design first).

I personally really like my composite template, which allows for direct composition of one class instance into another. It does not introduce any additional indirection, and the compiler will remind you if you forget to initialize it.

https://github.com/Ingrater/druntime/blob/master/src/core/allocator.d#L670

Kind Regards
Benjamin Thaut
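Sketched from the description only (the real implementation is in the linked allocator.d; alignment handling is ignored and the compile-time initialization reminder is reduced to a runtime assert here), the idea is roughly:

import std.conv : emplace;

// Sketch only: embed a class instance's storage directly inside the owning
// object, so composition adds no separate allocation and no indirection.
struct composite(T) if (is(T == class))
{
    private ubyte[__traits(classInstanceSize, T)] _buffer;
    private bool _constructed;

    void construct(Args...)(auto ref Args args)
    {
        emplace!T(_buffer[], args);
        _constructed = true;
    }

    T get()
    {
        assert(_constructed, "composite was never constructed");
        return cast(T) cast(void*) _buffer.ptr;
    }

    alias get this;   // lets the wrapper be used like a plain reference
}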
Sep 05 2012
On Wednesday, 5 September 2012 at 11:03:03 UTC, Benjamin Thaut wrote:I rewrote a 3d game I created during my studies with D 2.0 to manual memory mangement. If I'm not studying I'm working in the 3d Engine deparement of Havok. As I needed to pratice manual memory management and did want to get rid of the GC in D for quite some time, I did go through all this effort to create a GC free version of my game. The results are: DMD GC Version: 71 FPS, 14.0 ms frametime GDC GC Version: 128.6 FPS, 7.72 ms frametime DMD MMM Version: 142.8 FPS, 7.02 ms frametime GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 ms As you see the manual managed version is twice as fast as the garbage collected one. Even the highly optimized version created with GDC is still slower the the manual memory management. You can find the full article at: http://3d.benjamin-thaut.de/?p=20#more-20 Feedback is welcome. Kind Regards Benjamin ThautDid you try GC.disable/enable?
Sep 05 2012
On 9/5/2012 4:03 AM, Benjamin Thaut wrote:GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 msI'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.
Sep 05 2012
On 6 September 2012 00:10, Walter Bright <newshound2 digitalmars.com> wrote:On 9/5/2012 4:03 AM, Benjamin Thaut wrote:I'd say they are identical, but I don't really look at what goes on over on the MinGW port. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 msI'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.
Sep 05 2012
On 9/6/12, Iain Buclaw <ibuclaw ubuntu.com> wrote:I'd say they are identical, but I don't really look at what goes on over on the MinGW port.Speaking of which, I'd like to see if the Unilink linker would make any difference as well. It's known to make smaller binaries than Optlink. I think Unilink could be tested with MinGW if it supports whatever GDC outputs, to compare against LD.
Sep 05 2012
Walter Bright:I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.Maybe that performance difference comes from the sum of some metric tons of different little optimizations done by the GCC back-end. Bye, bearophile
Sep 05 2012
On 9/5/2012 5:01 PM, bearophile wrote:Walter Bright:We can trade guesses all day, and not get anywhere. Instrumentation and measurement is needed. I've investigated many similar things, and the truth usually turned out to be something nobody guessed or assumed. I recall the benchmark you posted where you guessed that dmd's integer code generation was woefully deficient. Examining the actual output showed that there wasn't a dime's worth of difference in the code generated from dmd vs gcc. The problem turned out to be the long division runtime library function. Fixing that brought the timings to parity. No code gen changes whatsoever were needed.I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.Maybe that performance difference comes from the sum of some metric tons of different little optimizations done by the GCC back-end.
Sep 05 2012
Walter Bright:No code gen changes whatsoever were needed.In that case I think I didn't specify what subsystem of the D compiler was not "good enough", I have just shown a performance difference. The division was slow, regardless of the cause. This is what's important for the final C/D programmer, not if the cause is a badly written division routine, or a bad/missing optimization stage. And regarding divisions, currently they are not optimized by dmd if divisors are small (like 10) and statically known. Bye, bearophile
Sep 06 2012
On Thursday, 6 September 2012 at 00:00:31 UTC, bearophile wrote:Walter Bright:In addition to Walter's response, it is very rare for advanced compiler optimisations to make >2x difference on any non-trivial code. Not impossible, but it's definitely suspicious.I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.Maybe that performance difference comes from the sum of some metric tons of different little optimizations done by the GCC back-end. Bye, bearophile
Sep 06 2012
On 9/6/2012 4:30 AM, Peter Alexander wrote:

In addition to Walter's response, it is very rare for advanced compiler optimisations to make >2x difference on any non-trivial code. Not impossible, but it's definitely suspicious.

I love trying to explain to people that our debug builds are too slow because they have instrumented too much of the code and haven't disabled any of it. A lot of people are pushed into debugging release builds as a result, which is pretty silly.

Now there are some pathological cases:

- non-inlined constructors can sometimes kill you in some cases for 3d vector math type libraries

- 128 bit SIMD intrinsics with Microsoft's compiler in debug builds make horrifically slow code: each operation has its results written to memory and then reloaded for the next 'instruction'. I believe it's two orders of magnitude slower (the extra instructions, plus pegging the read and write ports of the CPU, hurt quite a lot too). These tend to be the right functions to optimize selectively in debug builds . . .
Sep 06 2012
Am 06.09.2012 01:10, schrieb Walter Bright:On 9/5/2012 4:03 AM, Benjamin Thaut wrote:

The code is identical; I did not change anything in the GC code, so it uses whatever code comes with the MinGW GDC 2.058 release. The problem with instrumentation is that I cannot recompile druntime for MinGW GDC, as this is not possible with the binary release of MinGW GDC, and I did not go through the effort of setting up the whole build. I'm open to suggestions, though, for how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled builds of both versions.

--
Kind Regards
Benjamin Thaut

GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 ms

I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.
Sep 06 2012
On 2012-09-06 14:12, Benjamin Thaut wrote:Am 06.09.2012 01:10, schrieb Walter Bright:I don't know what Windows has but on Mac OS X there's this application: https://developer.apple.com/library/mac/#documentation/developertools/conceptual/InstrumentsUserGuide/Introduction/Introduction.html It lets you instrument any running application. -- /Jacob CarlborgOn 9/5/2012 4:03 AM, Benjamin Thaut wrote:The code is identical, I did not change anything in the GC code. So it uses whatever code comes with the MinGW GDC 2.058 release. The problem with intstrumentation is, that I can not recompile druntime for the MinGW GDC, as this is not possible with the binary release of MinGW GDC and I did not go thorugh the effort to setup the whole build. I'm open to suggestions though how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled versions of both versions.GC collection times: DMD GC Version: 8.9 ms GDC GC Version: 4.1 msI'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.
Sep 06 2012
The problem with intstrumentation is, that I can not recompile druntime for the MinGW GDC, as this is not possible with the binary release of MinGW GDC and I did not go thorugh the effort to setup the whole build. I'm open to suggestions though how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled versions of both versions.You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy
Sep 06 2012
Am 06.09.2012 15:30, schrieb ponce:

The problem with intstrumentation is, that I can not recompile druntime for the MinGW GDC, as this is not possible with the binary release of MinGW GDC and I did not go thorugh the effort to setup the whole build. I'm open to suggestions though how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy

I just tried profiling it with Very Sleepy, but basically it only tells me for both versions that most of the time is spent in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect than the DMD version. As I cannot rebuild druntime with GDC it will be quite hard to get detailed profiling results. I'm open to suggestions.

Kind Regards
Benjamin Thaut
Sep 06 2012
I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spend in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect then the DMD version.

As I can not rebuild druntime with GDC it will be quite hard to get detailed profiling results. I'm open for suggestions. Kind Regards Benjamin Thaut

You might try AMD Code Analyst; it will highlight the bottleneck in the assembly listing. Then use a disassembler like IDA to get a feel for what the bottleneck could be.
Sep 06 2012
On 9/6/2012 10:50 AM, Benjamin Thaut wrote:I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spend in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect then the DMD version.Even so, that in itself is a good clue.
Sep 06 2012
On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright wrote:On 9/6/2012 10:50 AM, Benjamin Thaut wrote:my bet is on, cross-module-inlining of bitop.btr failing... https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d version (DigitalMars) { version = bitops; } else version (GNU) { // use the unoptimized version } else version (D_InlineAsm_X86) { version = Asm86; } wordtype testClear(size_t i) { version (bitops) { return core.bitop.btr(data + 1, i); // this is faster! }I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spend in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect then the DMD version.Even so, that in itself is a good clue.
Sep 06 2012
On 7 September 2012 07:28, Sven Torvinger <Sven torvinger.se> wrote:On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright wrote:You would be wrong. btr is a compiler intrinsic, so it is *always* inlined! Leaning towards Walter here that I would very much like to see hard evidence of your claims. :-) On a side note of that though, GDC has bt, btr, bts, etc, as intrinsics to its compiler front-end. So it would be no problem switching to version = bitops for version GNU. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';On 9/6/2012 10:50 AM, Benjamin Thaut wrote:my bet is on, cross-module-inlining of bitop.btr failing... https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d version (DigitalMars) { version = bitops; } else version (GNU) { // use the unoptimized version } else version (D_InlineAsm_X86) { version = Asm86; } wordtype testClear(size_t i) { version (bitops) { return core.bitop.btr(data + 1, i); // this is faster! }I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spend in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect then the DMD version.Even so, that in itself is a good clue.
Sep 06 2012
On 9/6/2012 11:47 PM, Iain Buclaw wrote:On a side note of that though, GDC has bt, btr, bts, etc, as intrinsics to its compiler front-end. So it would be no problem switching to version = bitops for version GNU.Would it be easy to give that a try, and see what happens?
Sep 07 2012
On 7 September 2012 10:31, Walter Bright <newshound2 digitalmars.com> wrote:On 9/6/2012 11:47 PM, Iain Buclaw wrote:Sure, can do. Give me something to work against, and I will be able to produce the difference. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';On a side note of that though, GDC has bt, btr, bts, etc, as intrinsics to its compiler front-end. So it would be no problem switching to version = bitops for version GNU.Would it be easy to give that a try, and see what happens?
Sep 07 2012
On 9/7/2012 2:52 AM, Iain Buclaw wrote:On 7 September 2012 10:31, Walter Bright <newshound2 digitalmars.com> wrote:Well, gdc with and without it!On 9/6/2012 11:47 PM, Iain Buclaw wrote:Sure, can do. Give me something to work against, and I will be able to produce the difference.On a side note of that though, GDC has bt, btr, bts, etc, as intrinsics to its compiler front-end. So it would be no problem switching to version = bitops for version GNU.Would it be easy to give that a try, and see what happens?
Sep 07 2012
On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

Am 06.09.2012 15:30, schrieb ponce:

The problem with intstrumentation is, that I can not recompile druntime for the MinGW GDC, as this is not possible with the binary release of MinGW GDC and I did not go thorugh the effort to setup the whole build. I'm open to suggestions though how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy

I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spend in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect then the DMD version. As I can not rebuild druntime with GDC it will be quite hard to get detailed profiling results. I'm open for suggestions.

What version flags are set by GDC vs. DMD in your target apps? The way "stop the world" is done on Linux vs. Windows is different, for example.
Sep 06 2012
On 2012-09-07 01:53, Sean Kelly wrote:What version flags are set by GDC vs. DMD in your target apps? The way "stop the world" is done on Linux vs. Windows is different, for example.He's using only Windows as far as I understand, GDC MinGW. -- /Jacob Carlborg
Sep 06 2012
On Sep 6, 2012, at 10:57 PM, Jacob Carlborg <doob me.com> wrote:

    On 2012-09-07 01:53, Sean Kelly wrote:

        What version flags are set by GDC vs. DMD in your target apps? The
        way "stop the world" is done on Linux vs. Windows is different, for
        example.

    He's using only Windows as far as I understand, GDC MinGW.

Well sure, but MinGW is weird. I'd expect the Windows flag to be set for
MinGW and both the Windows and Posix flags set for Cygwin, but it seemed
worth asking. If Windows and Posix are both set, the Windows method will be
used for "stop the world".
Sep 07 2012
On 07.09.2012 01:53, Sean Kelly wrote:

    On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

        On 06.09.2012 15:30, ponce wrote:

                The problem with instrumentation is that I cannot recompile
                druntime for the MinGW GDC, as this is not possible with the
                binary release of MinGW GDC, and I did not go through the
                effort of setting up the whole build. I'm open to
                suggestions, though, on how I could profile the GC without
                recompiling druntime. If someone else wants to profile this,
                I can also provide precompiled versions of both versions.

            You don't necessarily need to recompile anything with a sampling
            profiler like AMD Code Analyst or Very Sleepy.

        I just tried profiling it with Very Sleepy, but basically it only
        tells me for both versions that most of the time is spent in
        gcx.fullcollect, just that the GDC version spends less time in
        gcx.fullcollect than the DMD version. As I cannot rebuild druntime
        with GDC it will be quite hard to get detailed profiling results.
        I'm open for suggestions.

    What version flags are set by GDC vs. DMD in your target apps? The way
    "stop the world" is done on Linux vs. Windows is different, for example.

I did build druntime and phobos with -release -noboundscheck -inline -O for
DMD. For MinGW GDC I just used whatever version of druntime and phobos came
precompiled with it, so I can't tell you which flags were used to compile
that. But I can tell you that Cygwin is not required to run or compile, so I
think it's not using any Posix stuff.

I'm going to upload a zip package with the source for the GC version soon,
but I have to deal with some licence stuff first.

Kind Regards
Benjamin Thaut
Sep 07 2012
On 9/7/12 6:31 PM, Benjamin Thaut wrote:

    I did build druntime and phobos with -release -noboundscheck -inline -O
    for DMD. For MinGW GDC I just used whatever version of druntime and
    phobos came precompiled with it, so I can't tell you which flags were
    used to compile that. But I can tell you that Cygwin is not required to
    run or compile, so I think it's not using any Posix stuff.

    I'm going to upload a zip package with the source for the GC version
    soon, but I have to deal with some licence stuff first.

    Kind Regards
    Benjamin Thaut

You mentioned some issues in Phobos with memory allocation that you had to
replace with your own code. It would be awesome if you could post more about
that, and possibly post a few pull requests where directly applicable.

Thanks,

Andrei
Sep 07 2012
On 07.09.2012 18:36, Andrei Alexandrescu wrote:

    You mentioned some issues in Phobos with memory allocation that you had
    to replace with your own code. It would be awesome if you could post more
    about that, and possibly post a few pull requests where directly
    applicable.

    Thanks,

    Andrei

Let me give a bit more detail about what I did and why.

Druntime:

I added a reference counting mechanism (core.refcounted in my druntime
branch). I created a reference-counted array which is as close to the native
D array as currently possible (compiler bugs, type system issues, etc.), also
in core.refcounted. It does not, however, replace the default string or array
type in all cases, because that would lead to reference counting in
unnecessary places. The focus is to get reference counting only where
absolutely necessary. I'm still using the standard string type as an "only
valid for the current scope" kind of string.

I created an allocator base interface which is used by everything that
allocates, and I created replacement templates for new and delete. Located in
core.allocator.

I created a new hashmap container which is cache friendly and does not leak
memory. Located in core.hashmap.

I created a memory tracking allocator in core.allocator which can be turned
on and off with a version statement (as it has to run before and after module
ctors, dtors, etc.).

I changed all parts of druntime that do string processing to use the
reference-counted array, so it no longer leaks. I made the Thread class
reference counted so it no longer leaks. I fixed the type info comparison and
numerous other issues.

Of all these changes, only the type info fix will be easily convertible into
the default druntime, because it does not depend on any of my other stuff. I
will do a merge request for this fix as soon as I find some time.

Phobos:

I threw away most of Phobos because it didn't match my requirements. The only
modules I kept are std.traits, std.random, std.math, std.typetuple and
std.uni. The parts of these modules that I use have been changed so they
don't leak memory. Mostly this comes down to using reference-counted strings
for exception error message generation.

I did require the option to specify an allocator for any function that
allocates, either by template argument, by function parameter, or both,
depending on the case. As custom allocators cannot be pure, this is a major
issue with Phobos, because adding allocators to the functions would instantly
make them impure. I know about the C-linkage pure hack, but it really is a
hack and it does not work for templates.

So I think most of my changes are not directly applicable because:

- You most likely won't like the way I implemented reference counting.
- You might not like my allocator design.
- My standard library goes more in the C++ direction and is not as easily
  usable as Phobos (as performance comes first for me, and usability second).
- All my changes heavily depend on some of the functionality I added to
  druntime.
- The necessary changes to Phobos would break a lot of code, because some of
  the function properties like pure couldn't be used any more, as a result of
  language limitations.

Kind Regards
Benjamin Thaut
Sep 07 2012
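To make the "allocator base interface plus replacement templates for new and delete" idea easier to picture, here is a minimal sketch. The names (IAllocator, MallocAllocator, New, Delete) and the exact shapes are assumptions made up for illustration; this is not Benjamin's actual core.allocator code.

// Minimal sketch only: not Benjamin's core.allocator, just an illustration
// of routing allocations through an allocator interface with New/Delete
// replacement templates.
import core.stdc.stdlib : malloc, free;
import core.exception : onOutOfMemoryError;
import std.conv : emplace;

interface IAllocator
{
    void[] allocate(size_t size);
    void deallocate(void[] mem);
}

class MallocAllocator : IAllocator
{
    void[] allocate(size_t size)
    {
        auto p = malloc(size);
        if (p is null) onOutOfMemoryError();
        return p[0 .. size];
    }

    void deallocate(void[] mem)
    {
        free(mem.ptr);
    }
}

// Replacement for the built-in `new`: take memory from the allocator and
// construct the class instance in place.
T New(T, Args...)(IAllocator a, Args args) if (is(T == class))
{
    auto mem = a.allocate(__traits(classInstanceSize, T));
    return emplace!T(mem, args);
}

// Replacement for `delete`: run the destructor, hand the memory back and
// null out the reference.
void Delete(T)(IAllocator a, ref T obj) if (is(T == class))
{
    if (obj is null) return;
    auto mem = (cast(void*) obj)[0 .. __traits(classInstanceSize, T)];
    destroy(obj);
    a.deallocate(mem);
    obj = null;
}

With such templates a call site reads New!T(alloc, args) instead of new T(args), and Delete(alloc, obj) instead of delete obj, so every allocation is explicitly routed through a chosen allocator.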
    You can find the full article at:
    http://3d.benjamin-thaut.de/?p=20#more-20

You make some good points about what happens under the hood. Especially:

- homogeneous variadic function calls allocate
- comparisons of const objects allocate
- useless druntime invariant handler calls

I removed some homogeneous variadic function calls from my own code.
Sep 07 2012
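Since the first point in that list trips people up, here is a small hedged example of the pattern and one way to avoid it. The names (Point, drawLines) are made up, and whether the temporary array actually lands on the GC heap depends on the compiler; the post above reports that it did for this codebase.

// Illustrative only: a typed variadic parameter builds a temporary array
// from the call-site arguments, which can mean a hidden allocation.
// Passing an explicit slice of a fixed-size array keeps the storage on the
// stack. Point and drawLines are made-up names.
struct Point { float x, y; }

void drawLines(const(Point)[] pts...)   // drawLines(a, b, c) builds an array
{
    foreach (p; pts) { /* use p.x, p.y ... */ }
}

void caller()
{
    auto a = Point(0, 0), b = Point(1, 0), c = Point(1, 1);

    drawLines(a, b, c);                  // temporary pts array built here

    Point[3] buf;
    buf[0] = a; buf[1] = b; buf[2] = c;  // fixed-size array lives on the stack
    drawLines(buf[]);                    // explicit slice: no hidden allocation
}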
Benjamin Thaut wrote:

    I rewrote a 3d game I created during my studies with D 2.0 to manual
    memory management. If I'm not studying I'm working in the 3d engine
    department of Havok. As I needed to practice manual memory management and
    had wanted to get rid of the GC in D for quite some time, I went through
    all this effort to create a GC-free version of my game. The results are:

    DMD GC Version:  71 FPS,    14.0 ms frametime
    GDC GC Version:  128.6 FPS,  7.72 ms frametime
    DMD MMM Version: 142.8 FPS,  7.02 ms frametime

Interesting. What about measuring a GDC MMM version? Because I wonder what
the GC overhead is. With DMD it's a factor of two; maybe that factor is lower
with GDC. I would also be interested in some numbers regarding memory
overhead, to get a more complete picture of the impact on resources when
using the GC.

Jens
Sep 07 2012
The full source code for the non-GC version is now available on GitHub; the
GC version will follow soon.

https://github.com/Ingrater/Spacecraft

Kind Regards
Benjamin Thaut
Sep 09 2012
Here is a small update: I found a piece of code that deliberately slowed down
the simulation in case it got too fast. This code never kicked in with the GC
version, because it never reached the margin. The manually memory-managed
version, however, did reach the margin and was slowed down. With this piece
of code removed, the manually memory-managed version runs at 5 ms, which is
200 FPS and thus nearly 3 times as fast as the GC-collected version.

Kind Regards
Benjamin Thaut
Oct 23 2012
On Tuesday, 23 October 2012 at 16:30:41 UTC, Benjamin Thaut wrote:

    Here is a small update: I found a piece of code that deliberately slowed
    down the simulation in case it got too fast. This code never kicked in
    with the GC version, because it never reached the margin. The manually
    memory-managed version, however, did reach the margin and was slowed
    down. With this piece of code removed, the manually memory-managed
    version runs at 5 ms, which is 200 FPS and thus nearly 3 times as fast as
    the GC-collected version.

    Kind Regards
    Benjamin Thaut

That's a very significant difference in performance that should not be taken
lightly.

I don't really see a general solution to the GC problem other than to design
things such that a D programmer has a truly practical ability to not use the
GC at all, and to ensure that it does not sneak back in. IMHO it was a
mistake to assume that D should depend on a GC to the degree that has taken
place.

The GC is also the reason why D has a few other significant technical
problems not related to performance, such as the inability to link D code to
C/C++ code if the GC is required on the D side, and the inability to build
dynamic libraries and runtime-loadable plugins that link to the runtime
system; the GC apparently does not work correctly in these situations.
Although the problem is solvable, how this was allowed to happen in the first
place is difficult to understand.

I'll be a much happier D programmer if I can guarantee where and when the GC
is used; therefore the GC should be 100% optional in practice, not just in
theory.

--rt
Oct 23 2012
On Tuesday, 23 October 2012 at 22:31:03 UTC, Rob T wrote:

    On Tuesday, 23 October 2012 at 16:30:41 UTC, Benjamin Thaut wrote:

        Here is a small update: I found a piece of code that deliberately
        slowed down the simulation in case it got too fast. This code never
        kicked in with the GC version, because it never reached the margin.
        The manually memory-managed version, however, did reach the margin
        and was slowed down. With this piece of code removed, the manually
        memory-managed version runs at 5 ms, which is 200 FPS and thus nearly
        3 times as fast as the GC-collected version.

        Kind Regards
        Benjamin Thaut

    That's a very significant difference in performance that should not be
    taken lightly.

    I don't really see a general solution to the GC problem other than to
    design things such that a D programmer has a truly practical ability to
    not use the GC at all, and to ensure that it does not sneak back in. IMHO
    it was a mistake to assume that D should depend on a GC to the degree
    that has taken place.

    The GC is also the reason why D has a few other significant technical
    problems not related to performance, such as the inability to link D code
    to C/C++ code if the GC is required on the D side, and the inability to
    build dynamic libraries and runtime-loadable plugins that link to the
    runtime system; the GC apparently does not work correctly in these
    situations. Although the problem is solvable, how this was allowed to
    happen in the first place is difficult to understand.

    I'll be a much happier D programmer if I can guarantee where and when the
    GC is used; therefore the GC should be 100% optional in practice, not
    just in theory.

    --rt

Having dealt with systems programming in languages with GC (Native Oberon,
Modula-3), I wonder how much an optional GC would really matter, if D's GC
had better performance.

--
Paulo
Oct 24 2012
On Wednesday, 24 October 2012 at 12:21:03 UTC, Paulo Pinto wrote:

    Having dealt with systems programming in languages with GC (Native
    Oberon, Modula-3), I wonder how much an optional GC would really matter,
    if D's GC had better performance.

    --
    Paulo

Well, performance is only part of the GC equation. There's determinism,
knowing when the GC is invoked and the ability to control it, and the
increased complexity introduced by a GC, which tends to grow considerably
when improving the GC's performance and its ability to be managed manually.
All this means there's a lot more potential for things going wrong, and this
cycle of fixing the fix may never end.

The cost of clinging onto a GC may be too high to be worth relying on as
heavily as is being done, and effectively forcing a GC on programmers is the
wrong approach, because not everyone has the same requirements that would
justify its use. When I say "forcing", look at what had to be done to fix the
performance of the game in question: getting rid of the GC was a super-human
effort, and that is simply not a practical solution by any stretch of the
imagination.

A GC is both good and bad, not good for everyone and not bad for everyone,
with shades of gray in between, so it has to be made fully optional, with
good manual control, and easily so.

--rt
Oct 24 2012
On Wednesday, 24 October 2012 at 18:26:48 UTC, Rob T wrote:

    On Wednesday, 24 October 2012 at 12:21:03 UTC, Paulo Pinto wrote:

        Having dealt with systems programming in languages with GC (Native
        Oberon, Modula-3), I wonder how much an optional GC would really
        matter, if D's GC had better performance.

        --
        Paulo

    Well, performance is only part of the GC equation. There's determinism,
    knowing when the GC is invoked and the ability to control it, and the
    increased complexity introduced by a GC, which tends to grow considerably
    when improving the GC's performance and its ability to be managed
    manually. All this means there's a lot more potential for things going
    wrong, and this cycle of fixing the fix may never end.

    The cost of clinging onto a GC may be too high to be worth relying on as
    heavily as is being done, and effectively forcing a GC on programmers is
    the wrong approach, because not everyone has the same requirements that
    would justify its use. When I say "forcing", look at what had to be done
    to fix the performance of the game in question: getting rid of the GC was
    a super-human effort, and that is simply not a practical solution by any
    stretch of the imagination.

    A GC is both good and bad, not good for everyone and not bad for
    everyone, with shades of gray in between, so it has to be made fully
    optional, with good manual control, and easily so.

    --rt

I do understand that. But on the other hand, there are operating systems
fully developed in such languages, like Blue Bottle:

http://www.ocp.inf.ethz.ch/wiki/Documentation/WindowManager

or the real-time system developed at ETHZ to control robot helicopters:

http://static.usenix.org/events/vee05/full_papers/p35-kirsch.pdf

I surely tremble at the thought of a full GC collection in plane software. On
the other hand, I am old enough to remember the complaints that C was too
slow and that one needed to write everything in Assembly to have full control
of the application code. That was followed by C++ being too slow, and that
one should use C structs with embedded pointers to have full control over the
memory layout of the object table, instead of strange compiler-generated VMT
tables.

So I always take the assertions that manual memory management is a must with
a grain of salt.

--
Paulo
Oct 24 2012
On Wednesday, 24 October 2012 at 21:02:34 UTC, Paulo Pinto wrote:

    So I always take the assertions that manual memory management is a must
    with a grain of salt.

    --
    Paulo

Probably no one in here is thinking that we should not have a GC. I'm sure
that many applications will benefit from a GC, but I'm also certain that not
all applications require a GC, and it's a mistake to assume everyone will be
happy to have one, as was illustrated in the OP.

In my case, I'm not too concerned about performance, or pauses in the
execution, but I do require dynamically loadable libraries, and I do want to
link D code to existing C/C++ code; but in order to do these things, I cannot
use the GC, because I'm told that it will not work in these situations.

It may be theoretically possible to build a near-perfect GC that will work
well even for RT applications, and will work for dynamically loadable
libraries, etc., but while waiting for one to materialize in D, what are we
supposed to do when the current GC is unsuitable?

--rt
Oct 24 2012
On Wednesday, 24 October 2012 at 23:05:29 UTC, Rob T wrote:

    In my case, I'm not too concerned about performance, or pauses in the
    execution, but I do require dynamically loadable libraries, and I do want
    to link D code to existing C/C++ code; but in order to do these things, I
    cannot use the GC, because I'm told that it will not work in these
    situations.

You can very much link to C and C++ code, or have C and C++ code link to your
D code, while still using the GC; you just have to be careful when you send
GC memory to external code.

You can even share the same GC between dynamic libraries and the host
application (if both are D and use the GC, of course) using the GC proxy
system.
Oct 24 2012
On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:

    On Wednesday, 24 October 2012 at 23:05:29 UTC, Rob T wrote:

        In my case, I'm not too concerned about performance, or pauses in the
        execution, but I do require dynamically loadable libraries, and I do
        want to link D code to existing C/C++ code; but in order to do these
        things, I cannot use the GC, because I'm told that it will not work
        in these situations.

    You can very much link to C and C++ code, or have C and C++ code link to
    your D code, while still using the GC; you just have to be careful when
    you send GC memory to external code.

    You can even share the same GC between dynamic libraries and the host
    application (if both are D and use the GC, of course) using the GC proxy
    system.

I am speaking without knowing whether such a thing already exists, but maybe
someone who knows the best way to do this could write an article about best
practices for using C and C++ code together with D applications, so that we
could point people to it, in a similar vein to the wonderful article about
templates.

--
Paulo
Oct 24 2012
On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:

    You can very much link to C and C++ code, or have C and C++ code link to
    your D code, while still using the GC; you just have to be careful when
    you send GC memory to external code.

    You can even share the same GC between dynamic libraries and the host
    application (if both are D and use the GC, of course) using the GC proxy
    system.

My understanding of dynamic linking and the runtime is based on this thread:

http://www.digitalmars.com/d/archives/digitalmars/D/dynamic_library_building_and_loading_176983.html

The runtime is not compiled to be sharable, so you cannot link it to shared
libs by default. However, hacking the gdc build system allowed me to compile
the runtime into a sharable state, and all seemed well. However, based on the
input from that thread, my understanding was that the GC would be unreliable
at best.

I suppose I could do some tests on it, but tests can only confirm so much.
I'd also have to decipher the runtime source code to see what the heck it is
doing or not.

--rt
Oct 25 2012
On Thursday, 25 October 2012 at 08:34:15 UTC, Rob T wrote:

    My understanding of dynamic linking and the runtime is based on this
    thread:

    http://www.digitalmars.com/d/archives/digitalmars/D/dynamic_library_building_and_loading_176983.html

    The runtime is not compiled to be sharable, so you cannot link it to
    shared libs by default. However, hacking the gdc build system allowed me
    to compile the runtime into a sharable state, and all seemed well.
    However, based on the input from that thread, my understanding was that
    the GC would be unreliable at best.

    I suppose I could do some tests on it, but tests can only confirm so
    much. I'd also have to decipher the runtime source code to see what the
    heck it is doing or not.

    --rt

You are right that compiling the runtime itself (druntime and Phobos) as a
shared library is not yet fully realized, but that doesn't stop you from
compiling your own libraries and applications as shared libraries, even if
they statically link to the runtime (which is the current default behaviour).
Oct 25 2012
On Thursday, 25 October 2012 at 08:50:19 UTC, Jakob Ovrum wrote:

    You are right that compiling the runtime itself (druntime and Phobos) as
    a shared library is not yet fully realized, but that doesn't stop you
    from compiling your own libraries and applications as shared libraries,
    even if they statically link to the runtime (which is the current default
    behaviour).

Yes, I can build my own D shared libs, both as static PIC (.a) and
dynamically loadable (.so); however, I cannot statically link my shared libs
to druntime + Phobos as-is. The only way I can do that is to also compile
druntime + Phobos as PIC, which can be done as a static PIC lib.

So what you are saying is that I can statically link PIC-compiled druntime to
my own shared lib, but I cannot build druntime as a dynamically loadable
shared lib? I can see why that may work, if each shared lib has its own
private copy of the GC. Correct?

I recall that druntime may have some ASM code that will not work when
compiled to PIC. I think gdc removed the offending ASM code, but it may still
be present in the dmd version; I don't know for sure.

Another question is whether I can link a dynamically loadable D lib to C/C++
code or not. Yes, I can do it, and it seems to work, but I was told that the
GC will not necessarily work. Am I misunderstanding this part?

--rt
Oct 25 2012
On Thursday, 25 October 2012 at 17:17:01 UTC, Rob T wrote:

    Yes, I can build my own D shared libs, both as static PIC (.a) and
    dynamically loadable (.so); however, I cannot statically link my shared
    libs to druntime + Phobos as-is. The only way I can do that is to also
    compile druntime + Phobos as PIC, which can be done as a static PIC lib.

Sorry, I keep forgetting that this is needed on non-Windows systems.

    So what you are saying is that I can statically link PIC-compiled
    druntime to my own shared lib, but I cannot build druntime as a
    dynamically loadable shared lib? I can see why that may work, if each
    shared lib has its own private copy of the GC. Correct?

Yes, this is possible. Sending references to GC memory between the D modules
then follows the same rules as when sending them to non-D code, unless the
host (the loader module) uses druntime to load the other modules, in which
case it can in principle share the same GC with them.

    I recall that druntime may have some ASM code that will not work when
    compiled to PIC. I think gdc removed the offending ASM code, but it may
    still be present in the dmd version; I don't know for sure.

I think it was relatively recently that DMD could also compile the runtime as
PIC, but I might be remembering wrong.

    Another question is whether I can link a dynamically loadable D lib to
    C/C++ code or not. Yes, I can do it, and it seems to work, but I was told
    that the GC will not necessarily work. Am I misunderstanding this part?

The GC will work the same as usual inside the D code, but you have to
manually keep track of references you send outside the scope of the GC, such
as references to GC memory put on the C heap. This can be done with the
GC.addRoot/removeRoot and GC.addRange/removeRange functions found in
core.memory, or by retaining the references in global, TLS or GC memory. It's
good practice to do this for all GC references sent to external code, as you
don't know where the reference may end up.

Of course, you have other options. You don't have to send references to GC
memory to external code; you can always copy the data over to a different
buffer, such as one on the C heap (i.e. malloc()). If the caller (in the case
of a return value) or the callee (in the case of a function argument) expects
to be able to call free() on the memory referenced, then you must do it this
way regardless.
Oct 25 2012
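As a concrete, hedged illustration of the addRoot advice above: the C functions in this sketch (c_store_buffer, c_release_buffer) are made up, and the NO_MOVE pinning is defensive rather than strictly required with the current non-moving collector.

// Minimal sketch of pinning GC memory before handing it to C code.
// c_store_buffer / c_release_buffer are hypothetical extern(C) functions.
import core.memory : GC;

extern(C) void c_store_buffer(void* buf, size_t len);
extern(C) void c_release_buffer(void* buf);

void handOver()
{
    auto buf = new ubyte[](4096);            // GC-allocated buffer

    GC.addRoot(buf.ptr);                     // keep it alive while C holds the pointer
    GC.setAttr(buf.ptr, GC.BlkAttr.NO_MOVE); // defensively ensure it is never moved

    c_store_buffer(buf.ptr, buf.length);

    // ... later, once the C side is done with it:
    c_release_buffer(buf.ptr);
    GC.clrAttr(buf.ptr, GC.BlkAttr.NO_MOVE);
    GC.removeRoot(buf.ptr);
}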
On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:

    You can even share the same GC between dynamic libraries and the host
    application (if both are D and use the GC, of course) using the GC proxy
    system.

What is the GC proxy system, and how do I make use of it?

--rt
Oct 25 2012
On Thursday, 25 October 2012 at 17:20:40 UTC, Rob T wrote:

    On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:

        You can even share the same GC between dynamic libraries and the host
        application (if both are D and use the GC, of course) using the GC
        proxy system.

    What is the GC proxy system, and how do I make use of it?

    --rt

There's a function Runtime.loadLibrary in core.runtime that is supposed to
load a shared library, get the symbol named `gc_setProxy` using the
platform's dynamic library loading routines, and then use that to share the
host GC with the loaded library. I say "is supposed to" because I checked the
code and it's currently a throwing stub on POSIX systems; it's only
implemented for Windows (the source of the function can be found in
rt_loadLibrary in rt/dmain2.d of druntime).

When it comes to gc_setProxy: GDC exports this symbol by default on Windows,
while DMD doesn't. I don't know why this is the case. I haven't built shared
libraries on other OSes before, so I don't know how GDC and DMD behave there.
Oct 25 2012
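For what it's worth, the host side of that mechanism looks roughly like the sketch below. The library name mylib.dll is made up, and as noted above this path only worked on Windows at the time of the thread.

// Sketch only: loading a D DLL through druntime on Windows, so druntime can
// look up gc_setProxy in it and share the host's GC with the library.
// "mylib.dll" is a hypothetical library name.
import core.runtime : Runtime;

void useLibrary()
{
    void* lib = Runtime.loadLibrary("mylib.dll");
    if (lib is null)
        throw new Exception("could not load mylib.dll");

    // ... resolve and call extern(C) entry points exported by the DLL ...

    Runtime.unloadLibrary(lib);   // undoes the proxy setup and frees the DLL
}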
I use this GC thread to show a little GC-related benchmark.

A little Reddit thread about using memory more compactly in Java:
http://www.reddit.com/r/programming/comments/120xvf/compact_offheap_structurestuples_in_java/

The related blog post:
http://mechanical-sympathy.blogspot.it/2012/10/compact-off-heap-structurestuples-in.html

So I have written a D version; in my test I have reduced the amount of memory
allocated (NUM_RECORDS = 10_000_000):
http://codepad.org/IhHjqUua

With this lower memory usage the D version is more than twice as fast as the
compact Java version that uses the same NUM_RECORDS (0.5 seconds against 1.2
seconds for each loop after the first two). In D I have improved the loops,
and I have used an align() and a minimallyInitializedArray; this is not too
bad.

But in main() I have also had to use a deprecated "delete", because otherwise
the GC doesn't deallocate the arrays and the program burns all the memory
(setting the array to null and using GC.collect() isn't enough). This is not
good.

Bye,
bearophile
Oct 26 2012
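For readers who don't want to chase the codepad link, the shape of the allocation-and-free pattern being described is roughly the following; Trade and NUM are placeholder names rather than the benchmark's actual declarations.

// Rough shape of the deallocation workaround discussed above; Trade and NUM
// are placeholder names, not the benchmark's actual code.
import core.memory : GC;
import std.array : minimallyInitializedArray;

struct Trade { long id; double price; int quantity; }

void oneIteration()
{
    enum NUM = 10_000_000;
    auto trades = minimallyInitializedArray!(Trade[])(NUM);

    // ... fill and process trades ...

    // Setting the reference to null and calling GC.collect() was reported
    // not to reclaim the memory here; explicitly freeing the block does.
    auto p = trades.ptr;
    trades = null;
    GC.free(p);
}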
On Friday, 26 October 2012 at 14:21:51 UTC, bearophile wrote:

    But in main() I have also had to use a deprecated "delete", because
    otherwise the GC doesn't deallocate the arrays and the program burns all
    the memory (setting the array to null and using GC.collect() isn't
    enough). This is not good.

Is this happening with dmd 2.060 as released?
Oct 26 2012
Rob T:

    Is this happening with dmd 2.060 as released?

I'm using 2.061alpha git head, but I guess the situation is the same with dmd
2.060. The code is linked in my post, so trying it is easy; it's one small
module.

Bye,
bearophile
Oct 26 2012
On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:

    Rob T:

        Is this happening with dmd 2.060 as released?

    I'm using 2.061alpha git head, but I guess the situation is the same with
    dmd 2.060. The code is linked in my post, so trying it is easy; it's one
    small module.

    Bye,
    bearophile

I tried it with dmd 2.060 (released) and the gdc 4.7 branch.

I tried to check if memory was being freed by creating a struct destructor
for JavaMemoryTrade, but that did not work as expected, leading me down the
confusing and inconsistent path of figuring out why destructors do not get
called when memory is freed.

Long story short, I could not force a struct to execute its destructor if it
was allocated on the heap unless I used delete. I tried destroy and clear, as
well as GC.collect and GC.free(); nothing else worked.

Memory heap management as well as struct destructors appear to be seriously
broken.

--rt
Oct 26 2012
On Saturday, 27 October 2012 at 01:03:57 UTC, Rob T wrote:

    Long story short, I could not force a struct to execute its destructor if
    it was allocated on the heap unless I used delete. I tried destroy and
    clear, as well as GC.collect and GC.free(); nothing else worked.

    Memory heap management as well as struct destructors appear to be
    seriously broken.

    --rt

OK, my bad, partially. Heap-allocated struct destructors will not get called
using clear or destroy unless the struct pointer is manually dereferenced. I
got confused because class references behave differently than heap-allocated
struct pointers. I cannot be the first person to do this, and it must happen
all the time.

The auto-dereferencing of a struct pointer when accessing members may be
nice, but it makes struct pointers look exactly like class references, which
will lead to mistakes. I do get the conceptual difference between classes and
structs, but why does calling clear or destroy on a struct pointer not give
me a compiler error, or at least a warning? Is there any valid purpose to
clearing or destroying a pointer that is not dereferenced? Seems like a bug
to me.

--rt
Oct 26 2012
On Saturday, 27 October 2012 at 01:03:57 UTC, Rob T wrote:

    Long story short, I could not force a struct to execute its destructor if
    it was allocated on the heap unless I used delete. I tried destroy and
    clear, as well as GC.collect and GC.free(); nothing else worked.

    Memory heap management as well as struct destructors appear to be
    seriously broken.

    --rt

I made a mistake: the clear and destroy operations require that a pointer to
a struct be manually dereferenced. What I don't understand is why the
compiler allows you to pass a non-dereferenced pointer to clear and destroy;
this looks like a bug to me. It should either work just like a class
reference does, or it should refuse to compile.

I'm sure you've heard this many times before, but I have to say that it's
very confusing when struct pointers behave exactly like class references, but
not always.

--rt
Oct 26 2012
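A tiny example of the pointer-versus-pointee behaviour being described; the struct S is made up for illustration.

// Illustration of the confusion discussed above: calling destroy() on a
// struct pointer operates on the pointer itself (it is reset to null), while
// the pointed-to struct's destructor only runs if you dereference first.
// S is a made-up example type.
import std.stdio : writeln;

struct S
{
    ~this() { writeln("S destructor ran"); }
}

void main()
{
    S* p = new S;

    destroy(p);    // compiles, but only resets the pointer; no destructor runs
    p = new S;

    destroy(*p);   // dereference first: now the struct's destructor runs
}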
    But in main() I have also had to use a deprecated "delete",

And setting trades.length to zero and then using GC.free() on its ptr gives
the same good result.

Bye,
bearophile
Oct 27 2012
And with the usual optimizations (struct splitting) that come from taking a
look at the access patterns, the D code gets faster:
http://codepad.org/SnxnpcAB

Bye,
bearophile
Oct 27 2012