
digitalmars.D.announce - GC vs. Manual Memory Management Real World Comparison

reply Benjamin Thaut <code benjamin-thaut.de> writes:
I rewrote a 3d game I created during my studies with D 2.0 to use manual 
memory management. When I'm not studying I'm working in the 3d engine 
department of Havok. As I needed to practice manual memory management and 
have wanted to get rid of the GC in D for quite some time, I went through 
all this effort to create a GC-free version of my game.

The results are:

     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

GC collection times:

     DMD GC Version: 8.9 ms
     GDC GC Version: 4.1 ms

As you can see, the manually managed version is twice as fast as the garbage 
collected one. Even the highly optimized version created with GDC is 
still slower than the one with manual memory management.

You can find the full article at:

http://3d.benjamin-thaut.de/?p=20#more-20


Feedback is welcome.

Kind Regards
Benjamin Thaut
Sep 05 2012
next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 05-09-2012 13:03, Benjamin Thaut wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to use manual
 memory management. When I'm not studying I'm working in the 3d engine
 department of Havok. As I needed to practice manual memory management and
 have wanted to get rid of the GC in D for quite some time, I went through
 all this effort to create a GC-free version of my game.

 The results are:

      DMD GC Version: 71 FPS, 14.0 ms frametime
      GDC GC Version: 128.6 FPS, 7.72 ms frametime
      DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

 As you can see, the manually managed version is twice as fast as the garbage
 collected one. Even the highly optimized version created with GDC is
 still slower than the one with manual memory management.

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Feedback is welcome.

 Kind Regards
 Benjamin Thaut

Is source code available anywhere?

Also, I have to point out that programming for a garbage collected runtime is very different from doing manual memory management. The same patterns don't apply, and you optimize in different ways. For instance, when using a GC, it is very recommendable that you allocate up front and use object pooling - and most importantly, don't allocate at all during your render loop.

--
Alex Rønne Petersen
alex lycus.org
http://lycus.org
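The up-front allocation and pooling pattern described above can be sketched in D roughly like this (Particle, ParticlePool, and the sizes are hypothetical illustrations, not code from the game):

```d
// Free-list object pool: every Particle is allocated once at startup,
// so the render loop only recycles objects and never hits the GC heap.
class Particle
{
    Particle next;            // intrusive free-list link
    float x = 0, y = 0, z = 0;
}

struct ParticlePool
{
    Particle freeList;

    void preallocate(size_t count)
    {
        foreach (i; 0 .. count)
        {
            auto p = new Particle();   // the only allocations, up front
            p.next = freeList;
            freeList = p;
        }
    }

    Particle acquire()
    {
        auto p = freeList;
        if (p !is null)
            freeList = p.next;
        return p;                      // null when the pool is exhausted
    }

    void release(Particle p)
    {
        p.next = freeList;
        freeList = p;
    }
}

void main()
{
    ParticlePool pool;
    pool.preallocate(1024);        // allocate before the game loop starts
    auto p = pool.acquire();       // inside a frame: no allocation at all
    p.x = 1.0f;
    pool.release(p);
}
```

A fixed-capacity pool like this also makes exhaustion explicit (acquire returns null) instead of silently growing the heap mid-frame.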
Sep 05 2012
next sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 13:10, Alex Rønne Petersen wrote:
 Is source code available anywhere?

 Also, I have to point out that programming for a garbage collected
 runtime is very different from doing manual memory management. The same
 patterns don't apply, and you optimize in different ways. For instance,
 when using a GC, it is very recommendable that you allocate up front and
 use object pooling - and most importantly, don't allocate at all during
 your render loop.

The source code is not available yet, as it is in a repository of my university, but I can zip it and upload the current version if that is wanted. It currently only supports Windows and does not have any setup instructions yet.

I do object pooling in both versions, as in game development you usually don't allocate during the frame. But in the GC version you still have the problem that way too many parts of the language allocate, and you don't even notice it when using the GC.

Just to clarify, I have been into 3d engine development for about 7 years now. So I'm not a newcomer to the subject.

Kind Regards
Benjamin Thaut
Sep 05 2012
next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 05-09-2012 13:19, Benjamin Thaut wrote:
 On 05.09.2012 13:10, Alex Rønne Petersen wrote:
 Is source code available anywhere?

 Also, I have to point out that programming for a garbage collected
 runtime is very different from doing manual memory management. The same
 patterns don't apply, and you optimize in different ways. For instance,
 when using a GC, it is very recommendable that you allocate up front and
 use object pooling - and most importantly, don't allocate at all during
 your render loop.

 The source code is not available yet, as it is in a repository of my
 university, but I can zip it and upload the current version if that is
 wanted. It currently only supports Windows and does not have any setup
 instructions yet.

 I do object pooling in both versions, as in game development you usually
 don't allocate during the frame. But in the GC version you still have the
 problem that way too many parts of the language allocate, and you don't
 even notice it when using the GC.

 Just to clarify, I have been into 3d engine development for about 7 years
 now. So I'm not a newcomer to the subject.

 Kind Regards
 Benjamin Thaut

Sure, I just want to point out that it's a problem with the language (GC allocations being very non-obvious) as opposed to the nature of GC.

--
Alex Rønne Petersen
alex lycus.org
http://lycus.org
Sep 05 2012
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 14:00, Alex Rønne Petersen wrote:
 Sure, I just want to point out that it's a problem with the language (GC
 allocations being very non-obvious) as opposed to the nature of GC.

That's exactly the discussion I want to start with this post. More effort should be put into the parts of D that currently allocate but absolutely don't have to. Also, the statement "You can use D without a GC" is not quite as easy as the homepage makes it sound.

My favorite hidden allocation so far is:

class A {}
class B : A {}

A a = new A();
B b = new B();

if(a == b) //this will allocate
{
}

Kind Regards
Benjamin Thaut
Sep 05 2012
next sibling parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 14:07, Benjamin Thaut wrote:
 class A {}
 class B : A{}

 A a = new A();
 B b = new B();

 if(a == b) //this will allocate
 {
 }

Should be:

class A {}
class B : A {}

const(A) a = new A();
const(B) b = new B();

if(a == b) //this will allocate
{
}
Sep 05 2012
prev sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 05-09-2012 14:07, Benjamin Thaut wrote:
 On 05.09.2012 14:00, Alex Rønne Petersen wrote:
  >
 Sure, I just want to point out that it's a problem with the language (GC
 allocations being very non-obvious) as opposed to the nature of GC.

 That's exactly the discussion I want to start with this post. More effort
 should be put into the parts of D that currently allocate but absolutely
 don't have to. Also, the statement "You can use D without a GC" is not
 quite as easy as the homepage makes it sound.

Very true. I've often thought we should ship a GC-less druntime in the normal distribution.
 My favorite hidden allocation so far is:

 class A {}
 class B : A{}

 A a = new A();
 B b = new B();

 if(a == b) //this will allocate
 {
 }

Where's the catch? From looking in druntime, I don't see where the allocation could occur.
 Kind Regards
 Benjamin Thaut

--
Alex Rønne Petersen
alex lycus.org
http://lycus.org
Sep 05 2012
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 14:14, Alex Rønne Petersen wrote:
 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

Everything is in object_.d:

equals_t opEquals(Object lhs, Object rhs)
{
    if (lhs is rhs)
        return true;
    if (lhs is null || rhs is null)
        return false;
    if (typeid(lhs) == typeid(rhs))
        return lhs.opEquals(rhs);
    return lhs.opEquals(rhs) && rhs.opEquals(lhs);
}

This will trigger a comparison of the TypeInfo objects with

if (typeid(lhs) == typeid(rhs))

which will, after some function calls, trigger opEquals of TypeInfo:

override equals_t opEquals(Object o)
{
    /* TypeInfo instances are singletons, but duplicates can exist
     * across DLL's. Therefore, comparing for a name match is
     * sufficient.
     */
    if (this is o)
        return true;
    TypeInfo ti = cast(TypeInfo)o;
    return ti && this.toString() == ti.toString();
}

Then, because they are const, TypeInfo_Const.toString() will be called:

override string toString()
{
    return cast(string) ("const(" ~ base.toString() ~ ")");
}

which allocates, due to array concatenation. But this only happens if they are not of the same type, and if one of them has a storage qualifier.

Kind Regards
Benjamin Thaut
Sep 05 2012
next sibling parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 14:34, Peter Alexander wrote:
 On Wednesday, 5 September 2012 at 12:27:05 UTC, Benjamin Thaut wrote:
 Then because they are const, TypeInfo_Const.toString() will be called:

     override string toString()
     {
         return cast(string) ("const(" ~ base.toString() ~ ")");
     }

 which allocates, due to array concatenation.

Wow.

I already have a fix for this:

https://github.com/Ingrater/druntime/commit/74713f7af496fd50fe4cfe60b3d9906b87efbdb6
https://github.com/Ingrater/druntime/commit/05c440b0322d39cf98425f50172c468c6659efb8

If I find a good description of how to do pull requests, I might be able to do one.

Kind Regards
Benjamin Thaut
Sep 05 2012
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 15:07, Iain Buclaw wrote:
 On 5 September 2012 14:04, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:
 On 05.09.2012 14:14, Alex Rønne Petersen wrote:

 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

Everything is in object_.d:

equals_t opEquals(Object lhs, Object rhs)
{
    if (lhs is rhs)
        return true;
    if (lhs is null || rhs is null)
        return false;
    if (typeid(lhs) == typeid(rhs))
        return lhs.opEquals(rhs);
    return lhs.opEquals(rhs) && rhs.opEquals(lhs);
}

Will trigger a comparison of the TypeInfo objects with

if (typeid(lhs) == typeid(rhs))

Which will after some function calls trigger opEquals of TypeInfo

override equals_t opEquals(Object o)
{
    /* TypeInfo instances are singletons, but duplicates can exist
     * across DLL's. Therefore, comparing for a name match is
     * sufficient.
     */
    if (this is o)
        return true;
    TypeInfo ti = cast(TypeInfo)o;
    return ti && this.toString() == ti.toString();
}

This got fixed. Said code is now:

override equals_t opEquals(Object o)
{
    if (this is o)
        return true;
    auto c = cast(const TypeInfo_Class)o;
    return c && this.info.name == c.info.name;
}

Causing no hidden allocation.

Oops, let me correct myself. This was hacked at to call the *correct* opEquals method above.

bool opEquals(const Object lhs, const Object rhs)
{
    // A hack for the moment.
    return opEquals(cast()lhs, cast()rhs);
}

Regards

Still, comparing two type info objects will result in one or multiple allocations most of the time.

Kind Regards
Benjamin Thaut
Sep 05 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/5/12 4:59 PM, Benjamin Thaut wrote:
 Still, comparing two type info objects will result in one or multiple
 allocations most of the time.

Could you please submit a patch for that? Thanks!

Andrei

P.S. Very nice work. Congrats!
Sep 05 2012
prev sibling parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Benjamin Thaut wrote:
 I do object pooling in both versions, as in game development you
 usually don't allocate during the frame. But still in the GC version you
 have the problem that way too many parts of the language allocate and you
 don't even notice it when using the GC.

There's one proposed solution to this problem: http://forum.dlang.org/thread/k1rlhn$19du$1@digitalmars.com
Sep 05 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Benjamin Thaut:

 But still in the GC version you have the problem that way too 
 many parts of the language allocate and you don't even notice 
 it when using the GC.

Maybe a compiler-enforced annotation for functions and modules could remove this problem in D.

Bye,
bearophile
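As an illustration of what such a compiler-enforced annotation could look like (using the @nogc spelling that later versions of D in fact adopted; the example is an editorial sketch, not part of the original proposal):

```d
// @nogc makes the compiler reject any construct inside the annotated
// function that may allocate on the GC heap, which is exactly the kind
// of compiler-enforced check suggested above.
@nogc int sum(const(int)[] a)
{
    int s = 0;
    foreach (x; a)
        s += x;
    return s;
}

void main() @nogc
{
    int[3] buf = [1, 2, 3];        // fixed-size array lives on the stack
    int total = sum(buf[]);        // slicing a static array: no allocation
    // int[] dyn = [1, 2, 3];      // would not compile: the dynamic array
    //                             // literal may allocate on the GC heap
    assert(total == 6);
}
```

With the annotation in place, hidden allocations like the ones discussed in this thread become compile errors instead of silent GC work.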
Sep 05 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Wednesday, 5 September 2012 at 12:27:05 UTC, Benjamin Thaut 
wrote:
 Then because they are const, TypeInfo_Const.toString() will be 
 called:

     override string toString()
     {
         return cast(string) ("const(" ~ base.toString() ~ ")");
     }

 which allocates, due to array concatenation.

Wow.
Sep 05 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:
 On 05.09.2012 14:14, Alex Rønne Petersen wrote:

 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

Everything is in object_.d:

equals_t opEquals(Object lhs, Object rhs)
{
    if (lhs is rhs)
        return true;
    if (lhs is null || rhs is null)
        return false;
    if (typeid(lhs) == typeid(rhs))
        return lhs.opEquals(rhs);
    return lhs.opEquals(rhs) && rhs.opEquals(lhs);
}

Will trigger a comparison of the TypeInfo objects with

if (typeid(lhs) == typeid(rhs))

Which will after some function calls trigger opEquals of TypeInfo

override equals_t opEquals(Object o)
{
    /* TypeInfo instances are singletons, but duplicates can exist
     * across DLL's. Therefore, comparing for a name match is
     * sufficient.
     */
    if (this is o)
        return true;
    TypeInfo ti = cast(TypeInfo)o;
    return ti && this.toString() == ti.toString();
}

This got fixed. Said code is now:

override equals_t opEquals(Object o)
{
    if (this is o)
        return true;
    auto c = cast(const TypeInfo_Class)o;
    return c && this.info.name == c.info.name;
}

Causing no hidden allocation.

Regards
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 05 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 5 September 2012 14:04, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 5 September 2012 13:27, Benjamin Thaut <code benjamin-thaut.de> wrote:
 On 05.09.2012 14:14, Alex Rønne Petersen wrote:

 Where's the catch? From looking in druntime, I don't see where the
 allocation could occur.

Everything is in object_.d:

equals_t opEquals(Object lhs, Object rhs)
{
    if (lhs is rhs)
        return true;
    if (lhs is null || rhs is null)
        return false;
    if (typeid(lhs) == typeid(rhs))
        return lhs.opEquals(rhs);
    return lhs.opEquals(rhs) && rhs.opEquals(lhs);
}

Will trigger a comparison of the TypeInfo objects with

if (typeid(lhs) == typeid(rhs))

Which will after some function calls trigger opEquals of TypeInfo

override equals_t opEquals(Object o)
{
    /* TypeInfo instances are singletons, but duplicates can exist
     * across DLL's. Therefore, comparing for a name match is
     * sufficient.
     */
    if (this is o)
        return true;
    TypeInfo ti = cast(TypeInfo)o;
    return ti && this.toString() == ti.toString();
}

This got fixed. Said code is now:

override equals_t opEquals(Object o)
{
    if (this is o)
        return true;
    auto c = cast(const TypeInfo_Class)o;
    return c && this.info.name == c.info.name;
}

Causing no hidden allocation.

Oops, let me correct myself. This was hacked at to call the *correct* opEquals method above.

bool opEquals(const Object lhs, const Object rhs)
{
    // A hack for the moment.
    return opEquals(cast()lhs, cast()rhs);
}

Regards
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 05 2012
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Wednesday, 5 September 2012 at 12:28:43 UTC, Piotr Szturmaj 
wrote:
 Benjamin Thaut wrote:
 I do object pooling in both versions, as in game development you
 usually don't allocate during the frame. But still in the GC version
 you have the problem that way too many parts of the language allocate
 and you don't even notice it when using the GC.

There's one proposed solution to this problem: http://forum.dlang.org/thread/k1rlhn$19du$1@digitalmars.com

It's a bad solution imho. Monitoring druntime and hunting down every part that allocates until our codebase is correct, like Benjamin Thaut did, is a much better solution.
Sep 10 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
SomeDude:

 It's a bad solution imho. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, like 
 Benjamin Thaut did, is a much better solution.

Why do you think such a hunt is better than letting the compiler tell you which parts of your program have the side effects you want to avoid?

Bye,
bearophile
Sep 11 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:

It is not difficult to implement, as the compiler only needs to warn that the
emission of /certain/ library calls /may/ cause heap allocations.

Regards.
----
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

On 11 Sep 2012 11:31, "bearophile" <bearophileHUGS lycos.com> wrote:
 SomeDude:

  It's a bad solution imho. Monitoring druntime and hunting down every
  part that allocates until our codebase is correct, like Benjamin Thaut
  did, is a much better solution.

 Why do you think such a hunt is better than letting the compiler tell
 you which parts of your program have the side effects you want to avoid?

 Bye,
 bearophile
Sep 11 2012
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution imho. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, like 
 Benjamin Thaut did, is a much better solution.

Why do you think such a hunt is better than letting the compiler tell you which parts of your program have the side effects you want to avoid?

Bye,
bearophile

My problem is that you litter your codebase with @nogc everywhere. In similar fashion, the nothrow keyword, for instance, has to be appended just about everywhere, and I find it very ugly on its own. Basically, with this scheme, you have to annotate every single method you write for each and every guarantee (nothrow, @nogc, @nosideeffect, @noshared, whatever you fancy) you want to ensure. This doesn't scale well at all.

I would find it okay to use a @noalloc annotation as a shortcut for a compiler switch or an external tool to detect allocations in some part of the code (as a digression, I tend to think of D annotations as compiler or tooling switches. One could imagine a general scheme where one associates an annotation with a compiler/tool switch whose effect is limited to the annotated scope).

I suppose the tool has to build the full call tree starting with the @nogc method until it reaches the leaves or finds calls to new or malloc; you would have to do that for every single @nogc annotation, which could be very slow, unless you trust the developer that his code indeed doesn't allocate, which means he effectively needs to litter his codebase with @nogc keywords.
Sep 11 2012
prev sibling next sibling parent "Felix Hufnagel" <suicide xited.de> writes:
class Foo
{
    @safe nothrow:
    void method_is_nothrow(){}
    void method_is_also_nothrow(){}
}

or

class Foo
{
    @safe nothrow
    {
        void method_is_nothrow(){}
        void method_is_also_nothrow(){}
    }
}

no need to append it to every single method by hand...


On 12.09.2012, 04:38, SomeDude <lovelydear mailmetrash.com> wrote:

 On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

  It's a bad solution imho. Monitoring druntime and hunting down every
  part that allocates until our codebase is correct, like Benjamin Thaut
  did, is a much better solution.

 Why do you think such a hunt is better than letting the compiler tell
 you which parts of your program have the side effects you want to avoid?

 Bye,
 bearophile

 My problem is that you litter your codebase with @nogc everywhere. In
 similar fashion, the nothrow keyword, for instance, has to be appended
 just about everywhere, and I find it very ugly on its own. Basically,
 with this scheme, you have to annotate every single method you write
 for each and every guarantee (nothrow, @nogc, @nosideeffect, @noshared,
 whatever you fancy) you want to ensure. This doesn't scale well at all.

 I would find it okay to use a @noalloc annotation as a shortcut for a
 compiler switch or an external tool to detect allocations in some part
 of the code (as a digression, I tend to think of D annotations as
 compiler or tooling switches. One could imagine a general scheme where
 one associates an annotation with a compiler/tool switch whose effect
 is limited to the annotated scope).
 I suppose the tool has to build the full call tree starting with the
 @nogc method until it reaches the leaves or finds calls to new or
 malloc; you would have to do that for every single @nogc annotation,
 which could be very slow, unless you trust the developer that his code
 indeed doesn't allocate, which means he effectively needs to litter
 his codebase with @nogc keywords.

--
Created with Opera's revolutionary e-mail module: http://www.opera.com/mail/
Sep 12 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 12 September 2012 at 02:37:52 UTC, SomeDude wrote:
 On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution imho. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, like 
 Benjamin Thaut did, is a much better solution.

Why do you think such a hunt is better than letting the compiler tell you which parts of your program have the side effects you want to avoid?

Bye,
bearophile

My problem is that you litter your codebase with @nogc everywhere. In similar fashion, the nothrow keyword, for instance, has to be appended just about everywhere, and I find it very ugly on its own. Basically, with this scheme, you have to annotate every single method you write for each and every guarantee (nothrow, @nogc, @nosideeffect, @noshared, whatever you fancy) you want to ensure. This doesn't scale well at all.

I would find it okay to use a @noalloc annotation as a shortcut for a compiler switch or an external tool to detect allocations in some part of the code (as a digression, I tend to think of D annotations as compiler or tooling switches. One could imagine a general scheme where one associates an annotation with a compiler/tool switch whose effect is limited to the annotated scope).

I suppose the tool has to build the full call tree starting with the @nogc method until it reaches the leaves or finds calls to new or malloc; you would have to do that for every single @nogc annotation, which could be very slow, unless you trust the developer that his code indeed doesn't allocate, which means he effectively needs to litter his codebase with @nogc keywords.

This is partially what happens in C++/CLI and C++/CX.
Sep 13 2012
prev sibling parent "Rob T" <rob ucora.com> writes:
On Tuesday, 11 September 2012 at 10:28:29 UTC, bearophile wrote:
 SomeDude:

 It's a bad solution imho. Monitoring druntime and hunting down 
 every part that allocates until our codebase is correct, like 
 Benjamin Thaut did, is a much better solution.

Why do you think such a hunt is better than letting the compiler tell you which parts of your program have the side effects you want to avoid?

A compiler option warning about undesirable heap allocations would allow all such allocations to be identified much more easily and without missing anything. This is a general solution to a general problem, where a programmer wishes to avoid heap allocations for whatever reason.

--rt
Oct 23 2012
prev sibling next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 05-09-2012 13:03, Benjamin Thaut wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to use manual
 memory management. When I'm not studying I'm working in the 3d engine
 department of Havok. As I needed to practice manual memory management and
 have wanted to get rid of the GC in D for quite some time, I went through
 all this effort to create a GC-free version of my game.

 The results are:

      DMD GC Version: 71 FPS, 14.0 ms frametime
      GDC GC Version: 128.6 FPS, 7.72 ms frametime
      DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

 As you can see, the manually managed version is twice as fast as the garbage
 collected one. Even the highly optimized version created with GDC is
 still slower than the one with manual memory management.

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Feedback is welcome.

 Kind Regards
 Benjamin Thaut

BTW, your blog post appears to have "comparison" misspelled.

--
Alex Rønne Petersen
alex lycus.org
http://lycus.org
Sep 05 2012
parent reply "anonymous" <anonymous nobody.alone> writes:
On Wednesday, 5 September 2012 at 12:22:52 UTC, Alex Rønne 
Petersen wrote:
 On 05-09-2012 13:03, Benjamin Thaut wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to use
 manual memory management. When I'm not studying I'm working in the
 3d engine department of Havok. As I needed to practice manual memory
 management and have wanted to get rid of the GC in D for quite some
 time, I went through all this effort to create a GC-free version of
 my game.

 The results are:

     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

     DMD GC Version: 8.9 ms
     GDC GC Version: 4.1 ms

 As you can see, the manually managed version is twice as fast as
 the garbage collected one. Even the highly optimized version created
 with GDC is still slower than the one with manual memory management.

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Feedback is welcome.

 Kind Regards
 Benjamin Thaut

BTW, your blog post appears to have comparison misspelled.

Also "development". It was interesting to read. What about GDC MMM?
Sep 05 2012
parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 16:07, anonymous wrote:
 It was interesting to read it. What about GDC MMM?

The GDC druntime has a different folder structure, which makes it a lot more time consuming to add in the changes. Also, it is not possible to rebuild phobos or druntime with the binary release of GDC MinGW; you need the complete build setup for GDC MinGW to do that. As this is not documented very well and is quite some work, I didn't go through that additional effort.

Kind Regards
Benjamin Thaut
Sep 05 2012
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/5/12 1:03 PM, Benjamin Thaut wrote:
 http://3d.benjamin-thaut.de/?p=20#more-20

Smile, you're on reddit:

http://www.reddit.com/r/programming/comments/ze4cx/real_world_comparison_gc_vs_manual_memory/

Andrei
Sep 05 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Benjamin Thaut:

 http://3d.benjamin-thaut.de/?p=20#more-20

Regarding your issues list, most of them are fixable, like the one regarding array literals, and even the one regarding the invariant handler. But I didn't know about this one, and I don't know how and if it is fixable:
 The new statement will not free any memory if the constructor 
 throws an exception.

Insights welcome.

Bye,
bearophile
Sep 05 2012
parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 05.09.2012 16:57, bearophile wrote:
 Benjamin Thaut:

 http://3d.benjamin-thaut.de/?p=20#more-20

 Regarding your issues list, most of them are fixable, like the one
 regarding array literals, and even the one regarding the invariant
 handler. But I didn't know about this one, and I don't know how and if
 it is fixable:
 The new statement will not free any memory if the constructor throws an
 exception.

 Insights welcome.

 Bye,
 bearophile

Well, as overloading new and delete is deprecated, and the new that is part of the language only works together with a GC, I don't think that anything will be done about this.

It's not a big problem in D, because you can't create arrays of objects in a way that calls multiple constructors at the same time (which is the biggest issue in C++ with exceptions and constructors). Also, due to memory pre-initialization the object will always be in a meaningful state, which helps with exception handling too.

My replacement just calls the constructor, and if an exception is thrown, the destructor is called and the memory is freed; then the new statement returns null. It has worked flawlessly so far.

Kind Regards
Benjamin Thaut
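A rough sketch of such a replacement (allocNew and Widget are hypothetical names for illustration, not the article's actual code; the described version additionally runs the destructor before freeing):

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// GC-free stand-in for `new`: allocate with malloc, construct in place,
// and if the constructor throws, free the memory and return null.
T allocNew(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void* mem = malloc(size);
    if (mem is null)
        return null;
    try
    {
        return emplace!T(mem[0 .. size], args);
    }
    catch (Exception)
    {
        free(mem);          // constructor threw: release the raw memory
        return null;
    }
}

void main()
{
    static class Widget
    {
        int id;
        this(int id, bool fail)
        {
            if (fail)
                throw new Exception("construction failed");
            this.id = id;
        }
    }

    auto ok = allocNew!Widget(1, false);
    assert(ok !is null && ok.id == 1);
    assert(allocNew!Widget(2, true) is null);  // memory freed, null returned
}
```

A matching manual delete would call the destructor and then free; it is omitted here to keep the sketch focused on the construction path.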
Sep 05 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 5 September 2012 15:57, bearophile <bearophileHUGS lycos.com> wrote:
 Benjamin Thaut:

 http://3d.benjamin-thaut.de/?p=20#more-20

Regarding your issues list, most of them are fixable, like the one regarding array literals, and even the one regarding the invariant handler.

I have no clue what the issue with invariant handlers is... Calls to them are not emitted in release code, and if you think they are, then you've probably built either your application or the library you are using wrong.

Array literals are not so easy to fix. I once thought that it would be optimal to make them a stack initialisation, given that all values are known at compile time. This in fact caused many strange SEGVs in quite a few of my programs (most are parsers / interpreters, so things that recurse *heavily* into themselves, and it was under these circumstances that array literals on the stack would get corrupted in one way or another, causing *huge* errors in perfectly sound code).

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 05 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Iain Buclaw:

Most of the array allocation cases we are talking about are like:

void main() {
   int[3] a = [1, 2, 3]; // fixed size array
}


That currently produces, with DMD:

__Dmain:
L0:     sub ESP, 010h
         mov EAX, offset FLAT:_D12TypeInfo_xAi6__initZ
         push EBX
         push 0Ch
         push 3
         push EAX
         call near ptr __d_arrayliteralTX
         add ESP, 8
         mov EBX, EAX
         mov dword ptr [EAX], 1
         mov ECX, EBX
         push EBX
         lea EDX, 010h[ESP]
         mov dword ptr 4[EBX], 2
         mov dword ptr 8[EBX], 3
         push EDX
         call near ptr _memcpy
         add ESP, 0Ch
         xor EAX, EAX
         pop EBX
         add ESP, 010h
         ret



There is also the case for dynamic arrays:

void main() {
   int[] a = [1, 2, 3];
   // use a here
}

But this is a harder problem, to leave for later.


 this in fact caused many strange SEGVs in quite
 a few of my programs  (most are parsers / interpreters, so 
 things that
 go down *heavy* nested into itself, and it was under these
 circumstances that array literals on the stack would go corrupt 
 in one
 way or another causing *huge* errors in perfectly sound code).

Do you know the cause of such corruptions? Maybe they are caused by other compiler bugs... And what to do regarding those exceptions in constructors? :-)

Bye,
bearophile
Sep 05 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 5 September 2012 16:31, bearophile <bearophileHUGS lycos.com> wrote:
 Iain Buclaw:

 Most of the array allocation cases we are talking about are like:

 void main() {
   int[3] a = [1, 2, 3]; // fixed size array
 }


 That currently produces, with DMD:

 __Dmain:
 L0:     sub ESP, 010h
         mov EAX, offset FLAT:_D12TypeInfo_xAi6__initZ
         push EBX
         push 0Ch
         push 3
         push EAX
         call near ptr __d_arrayliteralTX
         add ESP, 8
         mov EBX, EAX
         mov dword ptr [EAX], 1
         mov ECX, EBX
         push EBX
         lea EDX, 010h[ESP]
         mov dword ptr 4[EBX], 2
         mov dword ptr 8[EBX], 3
         push EDX
         call near ptr _memcpy
         add ESP, 0Ch
         xor EAX, EAX
         pop EBX
         add ESP, 010h
         ret



 There is also the case for dynamic arrays:

 void main() {
   int[] a = [1, 2, 3];
   // use a here
 }

 But this is a harder problem, to leave for later.



 this in fact caused many strange SEGVs in quite
 a few of my programs  (most are parsers / interpreters, so things that
 go down *heavy* nested into itself, and it was under these
 circumstances that array literals on the stack would go corrupt in one
 way or another causing *huge* errors in perfectly sound code).

Do you know the cause of such corruptions? maybe they are caused by other compiler bugs... And what to do regarding those exceptions in constructors? :-)

I think it was mostly because you can't tell the difference between array literals that are to be assigned to dynamic arrays and those assigned to static arrays (as far as I can tell). I do believe that the issues surrounded dynamic arrays causing SEGVs, and not static (I don't recall ever needing to use a static array :-). Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 05 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Iain Buclaw:

 I think it was mostly due to that you can't tell the difference
 between array literals that are to be assigned to either 
 dynamic or static arrays (as far as I can tell).

I see.
 I do believe that the issues
 surrounded dynamic arrays causing SEGVs, and not static  (I 
 don't recall ever needing the use of a static array :-).

I use fixed-size arrays all the time in D. Heap-allocated arrays are overused in D. They produce garbage, and in a lot of cases they are not needed. Using them a lot is sometimes a bad habit (if you are writing script-like programs they are OK), one that's also encouraged by fixed-size arrays being almost second-class citizens in Phobos (and druntime; using them as AA keys causes performance troubles). If you take a look at the Ada language you see how much static/stack-allocated arrays are used. In high performance code they help, and I'd like D programmers and Phobos devs to give them a little more consideration. Bye, bearophile
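A small sketch of the difference being argued here: the first function produces no garbage at all, the second performs a fresh GC allocation on every call.

```d
// Fixed-size array: storage lives in the stack frame, no GC pressure.
int sumFixed()
{
    int[8] buf;
    foreach (i, ref e; buf)
        e = cast(int) i;
    int s = 0;
    foreach (e; buf) s += e;
    return s;
}

// Dynamic array: a GC heap allocation on every call.
int sumHeap()
{
    auto buf = new int[8];
    foreach (i, ref e; buf)
        e = cast(int) i;
    int s = 0;
    foreach (e; buf) s += e;
    return s;
}

void main()
{
    assert(sumFixed() == 28 && sumHeap() == 28);
}
```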
Sep 05 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
 If you take a look at Ada language you see how much 
 static/stack-allocated arrays are used. In high performance 
 code they help, and I'd like D programmers and Phobos devs to 
 give them a little more consideration.

Also, the lack of variable-length stack-allocated arrays in D forces you to over-allocate, wasting stack space, or to use alloca(), which is bug-prone and makes things not easy if you need a multi-dimensional array. Bye, bearophile
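For reference, this is the alloca() route being criticized; it works, but the buffer must never escape the function, and a multi-dimensional layout has to be built by hand (sketch):

```d
import core.stdc.stdlib : alloca;

// Variable-length stack buffer: no GC, but bug-prone -- the memory dies
// with the stack frame, and there is no bounds or overflow checking.
int sumUpTo(size_t n)
{
    int* p = cast(int*) alloca(n * int.sizeof);
    int[] buf = p[0 .. n];       // slice over raw stack memory
    foreach (i, ref e; buf)
        e = cast(int) i;
    int s = 0;
    foreach (e; buf) s += e;
    return s;                    // return the value, never the slice!
}

void main()
{
    assert(sumUpTo(5) == 10);    // 0+1+2+3+4
}
```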
Sep 05 2012
prev sibling next sibling parent Benjamin Thaut <code benjamin-thaut.de> writes:
My "standard" library is now available on GitHub:

https://github.com/Ingrater/thBase

Kind Regards
Benjamin Thaut
Sep 05 2012
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Wed, 05 Sep 2012 13:03:37 +0200
schrieb Benjamin Thaut <code benjamin-thaut.de>:

 I rewrote a 3d game I created during my studies with D 2.0 to manual 
 memory management. If I'm not studying I'm working in the 3d Engine 
 department of Havok. As I needed to practice manual memory management
 and did want to get rid of the GC in D for quite some time, I did go
 through all this effort to create a GC free version of my game.
 
 The results are:
 
      DMD GC Version: 71 FPS, 14.0 ms frametime
      GDC GC Version: 128.6 FPS, 7.72 ms frametime
      DMD MMM Version: 142.8 FPS, 7.02 ms frametime
 
 GC collection times:
 
      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms
 
 As you see the manual managed version is twice as fast as the garbage 
 collected one. Even the highly optimized version created with GDC is 
 still slower than the manual memory management.
 
 You can find the full article at:
 
 http://3d.benjamin-thaut.de/?p=20#more-20
 
 
 Feedback is welcome.

Would be great if some of the code could be merged into phobos, especially the memory tracker. But also things like memory or object pools would be great in phobos, an emplace wrapper which accepts a custom alloc function to replace new (and something similar for delete), etc. We really need a module for manual memory management (std.mmm?). And functions which currently use the GC to allocate should get overloads which take buffers (Or better support custom allocators, but that needs an allocator design first).
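A rough sketch of what such an emplace-based replacement for new/delete could look like (the names New/Delete and the malloc backing are illustrative assumptions, not existing Phobos API):

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

// Hypothetical New!T / Delete: construct a class instance in malloc'd
// memory instead of on the GC heap.
T New(T, Args...)(Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    void[] mem = malloc(size)[0 .. size];
    return emplace!T(mem, args);
}

void Delete(T)(T obj) if (is(T == class))
{
    destroy(obj);            // run the destructor chain
    free(cast(void*) obj);   // release the raw memory
}

class Foo { int x; this(int x) { this.x = x; } }

void main()
{
    auto f = New!Foo(42);
    assert(f.x == 42);
    Delete(f);
}
```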
Sep 05 2012
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 05.09.2012 19:31, schrieb Johannes Pfau:
 Would be great if some of the code could be merged into phobos,
 especially the memory tracker. But also things like memory or object
 pools would be great in phobos, an emplace wrapper which accepts a
 custom alloc function to replace new (and something similar for delete),
 etc. We really need a module for manual memory management (std.mmm?).
 And functions which currently use the GC to allocate should get
 overloads which take buffers (Or better support custom allocators, but
 that needs an allocator design first).

I personally really like my composite template, which allows for direct composition of one class instance into another. It does not introduce additional indirection, and the compiler will remind you if you forget to initialize it. https://github.com/Ingrater/druntime/blob/master/src/core/allocator.d#L670 Kind Regards Benjamin Thaut
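The idea can be sketched roughly like this (a simplified illustration of the concept only; the linked core.allocator implementation is the real thing and handles more details, e.g. alignment and destruction):

```d
import std.conv : emplace;

// Embed a class instance's storage directly inside the owning object:
// no separate allocation, no extra pointer indirection.
struct Composite(T) if (is(T == class))
{
    private void[__traits(classInstanceSize, T)] _buf; // inline storage
    private T _obj;  // stays null until construct() is called

    void construct(Args...)(Args args)
    {
        _obj = emplace!T(_buf[], args);
    }

    @property T get()
    {
        assert(_obj !is null, "forgot to call construct()");
        return _obj;
    }
    alias get this;
}

class Engine { int rpm; this(int r) { rpm = r; } }

class Car
{
    Composite!Engine engine;  // embedded, not a separate heap object
    this() { engine.construct(7200); }
}
```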
Sep 05 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Sep 5, 2012, at 8:08 AM, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 Array literals are not so easy to fix.  I once thought that it would
 be optimal to make it a stack initialisation given that all values are
 known at compile time, this in fact caused many strange SEGVs in quite
 a few of my programs  (most are parsers / interpreters, so things that
 go down *heavy* nested into itself, and it was under these
 circumstances that array literals on the stack would go corrupt in one
 way or another causing *huge* errors in perfectly sound code).

It sounds like your code has escaping references? I think the presence of a GC tends to eliminate a lot of thought about data ownership. This is usually beneficial in that maintaining ownership rules tends to be a huge pain, but then it also tends to avoid issues like this.
Sep 05 2012
prev sibling next sibling parent "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Wednesday, 5 September 2012 at 11:03:03 UTC, Benjamin Thaut 
wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to 
 manual memory management. If I'm not studying I'm working in the 
 3d Engine department of Havok. As I needed to practice manual 
 memory management and did want to get rid of the GC in D for 
 quite some time, I did go through all this effort to create a 
 GC free version of my game.

 The results are:

     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

 GC collection times:

     DMD GC Version: 8.9 ms
     GDC GC Version: 4.1 ms

 As you see the manual managed version is twice as fast as the 
 garbage collected one. Even the highly optimized version 
 created with GDC is still slower than the manual memory 
 management.

 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20


 Feedback is welcome.

 Kind Regards
 Benjamin Thaut

Did you try GC.disable/enable?
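For readers who haven't tried it: GC.disable defers collections without stopping allocation, so a game can move the collection work to a controlled point between frames or levels (a sketch of the technique, not a measurement of this game):

```d
import core.memory : GC;

void frame()
{
    // Allocations still succeed while disabled; collections are deferred.
    GC.disable();
    scope(exit) GC.enable();

    // ... update simulation, render ...
}

void main()
{
    foreach (i; 0 .. 3)
        frame();
    GC.collect();   // pay the collection cost at a point we choose
}
```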
Sep 05 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.
Sep 05 2012
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 9/6/12, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 I'd say they are identical, but I don't really look at what goes on
 over on the MinGW port.

Speaking of which, I'd like to see if the Unilink linker would make any difference as well. It's known to make smaller binaries than Optlink. I think Unilink could be tested with MinGW if it supports whatever GDC outputs, to compare against LD.
Sep 05 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 I'd like it if you could add some instrumentation to see what 
 accounts for the time difference. I presume they both use the 
 same D source code.

Maybe that performance difference comes from the sum of some metric tons of different little optimizations done by the GCC back-end. Bye, bearophile
Sep 05 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/5/2012 5:01 PM, bearophile wrote:
 Walter Bright:

 I'd like it if you could add some instrumentation to see what accounts for the
 time difference. I presume they both use the same D source code.

Maybe that performance difference comes from the sum of some metric tons of different little optimizations done by the GCC back-end.

We can trade guesses all day, and not get anywhere. Instrumentation and measurement is needed. I've investigated many similar things, and the truth usually turned out to be something nobody guessed or assumed. I recall the benchmark you posted where you guessed that dmd's integer code generation was woefully deficient. Examining the actual output showed that there wasn't a dime's worth of difference in the code generated from dmd vs gcc. The problem turned out to be the long division runtime library function. Fixing that brought the timings to parity. No code gen changes whatsoever were needed.
Sep 05 2012
prev sibling parent Sean Cavanaugh <WorksOnMyMachine gmail.com> writes:
On 9/6/2012 4:30 AM, Peter Alexander wrote:
 In addition to Walter's response, it is very rare for advanced compiler
 optimisations to make >2x difference on any non-trivial code. Not
 impossible, but it's definitely suspicious.

I love trying to explain to people that our debug builds are too slow because they have instrumented too much of the code and haven't disabled any of it. A lot of people are pushed into debugging release builds as a result, which is pretty silly. Now there are some pathological cases: non-inlined constructors can sometimes kill you for 3d vector math type libraries. 128 bit SIMD intrinsics with Microsoft's compiler in debug builds make horrifically slow code: each operation has its results written to memory and then reloaded for the next 'instruction'. I believe it's two orders of magnitude slower (the extra instructions, plus pegging the read and write ports of the CPU, hurt quite a lot too). These tend to be the right functions to optimize selectively in debug builds . . .
Sep 06 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Thursday, 6 September 2012 at 00:00:31 UTC, bearophile wrote:
 Walter Bright:

 I'd like it if you could add some instrumentation to see what 
 accounts for the time difference. I presume they both use the 
 same D source code.

Maybe that performance difference comes from the sum of some metric tons of different little optimizations done by the GCC back-end. Bye, bearophile

In addition to Walter's response, it is very rare for advanced compiler optimisations to make >2x difference on any non-trivial code. Not impossible, but it's definitely suspicious.
Sep 06 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Walter Bright:

 No code gen changes whatsoever were needed.

In that case I think I didn't specify which subsystem of the D compiler was not "good enough"; I have just shown a performance difference. The division was slow, regardless of the cause. This is what's important for the final C/D programmer, not whether the cause is a badly written division routine or a bad/missing optimization stage. And regarding divisions, currently they are not optimized by dmd if the divisors are small (like 10) and statically known. Bye, bearophile
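The optimization being referred to replaces the expensive div instruction with a multiply by a precomputed reciprocal plus a shift. For unsigned 32-bit division by 10, the standard magic constant looks like this:

```d
// x / 10 via multiply-and-shift: 0xCCCCCCCD is ceil(2^35 / 10), so the
// 64-bit product shifted right by 35 equals floor(x / 10) for all uint x.
uint div10(uint x)
{
    return cast(uint)((cast(ulong) x * 0xCCCC_CCCD) >> 35);
}

void main()
{
    foreach (uint x; 0 .. 100_000)
        assert(div10(x) == x / 10);
}
```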
Sep 06 2012
prev sibling next sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 06.09.2012 01:10, schrieb Walter Bright:
 On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

 DMD GC Version: 8.9 ms
 GDC GC Version: 4.1 ms

I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.

The code is identical, I did not change anything in the GC code. So it uses whatever code comes with the MinGW GDC 2.058 release. The problem with instrumentation is that I can not recompile druntime for the MinGW GDC, as this is not possible with the binary release of MinGW GDC, and I did not go through the effort of setting up the whole build. I'm open to suggestions though for how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled versions of both versions. -- Kind Regards Benjamin Thaut
Sep 06 2012
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-09-06 14:12, Benjamin Thaut wrote:
 Am 06.09.2012 01:10, schrieb Walter Bright:
 On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

 DMD GC Version: 8.9 ms
 GDC GC Version: 4.1 ms

I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.

The code is identical, I did not change anything in the GC code. So it uses whatever code comes with the MinGW GDC 2.058 release. The problem with instrumentation is that I can not recompile druntime for the MinGW GDC, as this is not possible with the binary release of MinGW GDC, and I did not go through the effort of setting up the whole build. I'm open to suggestions though for how I could profile the GC without recompiling druntime. If someone else wants to profile this, I can also provide precompiled versions of both versions.

I don't know what Windows has but on Mac OS X there's this application: https://developer.apple.com/library/mac/#documentation/developertools/conceptual/InstrumentsUserGuide/Introduction/Introduction.html It lets you instrument any running application. -- /Jacob Carlborg
Sep 06 2012
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 06.09.2012 15:30, schrieb ponce:
 The problem with instrumentation is that I can not recompile
 druntime for the MinGW GDC, as this is not possible with the binary
 release of MinGW GDC and I did not go through the effort to setup the
 whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can
 also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy

I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spent in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect than the DMD version. As I can not rebuild druntime with GDC it will be quite hard to get detailed profiling results. I'm open for suggestions. Kind Regards Benjamin Thaut
Sep 06 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
 I just tried profiling it with Very Sleepy but basically it only tells me for
 both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the DMD
version.

Even so, that in itself is a good clue.
Sep 06 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/6/2012 11:47 PM, Iain Buclaw wrote:
 On a side note of that though, GDC has bt, btr, bts, etc, as
 intrinsics to its compiler front-end.  So it would be no problem
 switching to version = bitops for version GNU.

Would it be easy to give that a try, and see what happens?
Sep 07 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/7/2012 2:52 AM, Iain Buclaw wrote:
 On 7 September 2012 10:31, Walter Bright <newshound2 digitalmars.com> wrote:
 On 9/6/2012 11:47 PM, Iain Buclaw wrote:
 On a side note of that though, GDC has bt, btr, bts, etc, as
 intrinsics to its compiler front-end.  So it would be no problem
 switching to version = bitops for version GNU.

Would it be easy to give that a try, and see what happens?

Sure, can do. Give me something to work against, and I will be able to produce the difference.

Well, gdc with and without it!
Sep 07 2012
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-09-07 01:53, Sean Kelly wrote:

 What version flags are set by GDC vs. DMD in your target apps?  The way "stop
the world" is done on Linux vs. Windows is different, for example.

He's using only Windows as far as I understand, GDC MinGW. -- /Jacob Carlborg
Sep 06 2012
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 07.09.2012 01:53, schrieb Sean Kelly:
 On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 Am 06.09.2012 15:30, schrieb ponce:
 The problem with instrumentation is that I can not recompile
 druntime for the MinGW GDC, as this is not possible with the binary
 release of MinGW GDC and I did not go through the effort to setup the
 whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can
 also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy

I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spent in gcx.fullcollect. Just that the GDC version spends less time in gcx.fullcollect than the DMD version. As I can not rebuild druntime with GDC it will be quite hard to get detailed profiling results. I'm open for suggestions.

What version flags are set by GDC vs. DMD in your target apps? The way "stop the world" is done on Linux vs. Windows is different, for example.

I did build druntime and phobos with -release -noboundscheck -inline -O for DMD. For MinGW GDC I just used whatever version of druntime and phobos came precompiled with it, so I can't tell you which flags have been used to compile that. But I can tell you that cygwin is not required to run or compile, so I think it's not using any posix stuff. I'm going to upload a zip-package with the source for the GC version soon, but I have to deal with some licence stuff first. Kind Regards Benjamin Thaut
Sep 07 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/7/12 6:31 PM, Benjamin Thaut wrote:
 I did build druntime and phobos with -release -noboundscheck -inline -O
 for DMD.
 For MinGW GDC I just used whatever version of druntime and phobos came
 precompiled with it, so I can't tell you which flags have been used to
 compile that. But I can tell you that cygwin is not required to run or
 compile, so I think its not using any posix stuff.


 I'm going to upload a zip-package with the source for the GC version
 soon, but I have to deal with some licence stuff first.

 Kind Regards
 Benjamin Thaut

You mentioned some issues in Phobos with memory allocation, that you had to replace with your own code. It would be awesome if you could post more about that, and possibly post a few pull requests where directly applicable. Thanks, Andrei
Sep 07 2012
parent Benjamin Thaut <code benjamin-thaut.de> writes:
Am 07.09.2012 18:36, schrieb Andrei Alexandrescu:
 You mentioned some issues in Phobos with memory allocation, that you had
 to replace with your own code. It would be awesome if you could post
 more about that, and possibly post a few pull requests where directly
 applicable.

 Thanks,

 Andrei

Let me give a bit more detail about what I did and why. Druntime: I added a reference counting mechanism, core.refcounted in my druntime branch. I created a reference counted array which is as close to the native D array as currently possible (compiler bugs, type system issues, etc.), also in core.refcounted. It however does not replace the default string or array type in all cases, because that would lead to reference counting in unnecessary places. The focus is to get reference counting only where absolutely necessary. I'm still using the standard string type as a "only valid for current scope" kind of string. I created an allocator base interface which is used by everything that allocates; I also created replacement templates for new and delete. Located in core.allocator. I created a new hashmap container which is cache friendly and does not leak memory. Located in core.hashmap. I created a memory tracking allocator in core.allocator which can be turned on and off with a version statement (as it has to run before and after module ctors, dtors, etc.). I changed all parts of druntime that do string processing to use the reference counted array, so it no longer leaks. I made the Thread class reference counted so it no longer leaks. I fixed the type info comparison and numerous other issues. Of all these changes only the type info fix will be easily convertible into the default druntime, because it does not depend on any of my other stuff. I will do a merge request for this fix as soon as I find some time. Phobos: I threw away most of phobos because it didn't match my requirements. The only modules I kept are std.traits, std.random, std.math, std.typetuple, std.uni. The parts of these modules that I use have been changed so they don't leak memory. Mostly this comes down to using reference counted strings for exception error message generation. I did require the option to specify an allocator for any function that allocates. 
Either by template argument, by function parameter, or both, depending on the case. As custom allocators cannot be pure, this is a major issue with phobos, because adding allocators to the functions would instantly make them impure. I know about the C-linkage pure hack, but it's really a hack and it does not work for templates. So I think most of my changes are not directly applicable because: - You most likely won't like the way I implemented reference counting - You might not like my allocator design - My standard library goes more in the C++ direction and is not as easily usable as phobos (as performance comes first for me, and usability second) - All my changes heavily depend on some of the functionality I added to druntime. - The necessary changes to phobos would break a lot of code, because some of the function properties like pure couldn't be used any more as a result of language limitations. Kind Regards Benjamin Thaut
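To make the "allocator base interface" idea concrete, here is a minimal sketch in the same spirit (hypothetical names; the actual core.allocator design in thBase differs in its details):

```d
import core.stdc.stdlib : free, malloc;

// Minimal allocator interface: everything that allocates takes one of
// these instead of calling into the GC directly.
interface IAllocator
{
    void[] allocateMemory(size_t size);
    void freeMemory(void[] mem);
}

// A malloc-backed implementation.
class MallocAllocator : IAllocator
{
    void[] allocateMemory(size_t size)
    {
        auto p = malloc(size);
        return p is null ? null : p[0 .. size];
    }

    void freeMemory(void[] mem)
    {
        if (mem.ptr !is null)
            free(mem.ptr);
    }
}

void main()
{
    IAllocator alloc = new MallocAllocator;
    auto mem = alloc.allocateMemory(64);
    assert(mem.length == 64);
    alloc.freeMemory(mem);
}
```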
Sep 07 2012
prev sibling next sibling parent "ponce" <spam spam.org> writes:
 The problem with instrumentation is that I can not recompile 
 druntime for the MinGW GDC, as this is not possible with the 
 binary release of MinGW GDC and I did not go through the effort 
 to setup the whole build.
 I'm open to suggestions though how I could profile the GC 
 without recompiling druntime. If someone else wants to profile 
 this, I can also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy
Sep 06 2012
prev sibling next sibling parent "ponce" <spam spam.org> writes:
 I just tried profiling it with Very Sleepy but basically it 
 only tells me for both versions that most of the time is spent 
 in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect 
 than the DMD version.

 As I can not rebuild druntime with GDC it will be quite hard to 
 get detailed profiling results.

 I'm open for suggestions.

 As I can not rebuild druntime with GDC it will be quite hard to 
 get detailed profiling results.
 
 I'm open for suggestions.
 
 Kind Regards
 Benjamin Thaut

You might try AMD Code Analyst, it will highlight the bottleneck in the assembly listing. Then use a disassembler like IDA to get a feel of what the bottleneck could be.
Sep 06 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Sep 6, 2012, at 10:50 AM, Benjamin Thaut <code benjamin-thaut.de> wrote:

 Am 06.09.2012 15:30, schrieb ponce:
 The problem with instrumentation is that I can not recompile
 druntime for the MinGW GDC, as this is not possible with the binary
 release of MinGW GDC and I did not go through the effort to setup the
 whole build.
 I'm open to suggestions though how I could profile the GC without
 recompiling druntime. If someone else wants to profile this, I can
 also provide precompiled versions of both versions.

You don't necessarily need to recompile anything with a sampling profiler like AMD Code Analyst or Very Sleepy

 I just tried profiling it with Very Sleepy but basically it only tells me for both versions that most of the time is spent in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect than the DMD version.

 As I can not rebuild druntime with GDC it will be quite hard to get detailed profiling results.

 I'm open for suggestions.

What version flags are set by GDC vs. DMD in your target apps? The way "stop the world" is done on Linux vs. Windows is different, for example.
Sep 06 2012
prev sibling next sibling parent "Sven Torvinger" <Sven torvinger.se> writes:
On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright 
wrote:
 On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
 I just tried profiling it with Very Sleepy but basically it 
 only tells me for
 both versions that most of the time is spend in 
 gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect 
 then the DMD version.

Even so, that in itself is a good clue.

my bet is on cross-module inlining of bitop.btr failing...

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d

version (DigitalMars)
{
    version = bitops;
}
else version (GNU)
{
    // use the unoptimized version
}
else version (D_InlineAsm_X86)
{
    version = Asm86;
}

wordtype testClear(size_t i)
{
    version (bitops)
    {
        return core.bitop.btr(data + 1, i);   // this is faster!
    }
Sep 06 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 7 September 2012 07:28, Sven Torvinger <Sven torvinger.se> wrote:
 On Thursday, 6 September 2012 at 20:44:29 UTC, Walter Bright wrote:
 On 9/6/2012 10:50 AM, Benjamin Thaut wrote:
 I just tried profiling it with Very Sleepy but basically it only tells me
 for
 both versions that most of the time is spend in gcx.fullcollect.
 Just that the GDC version spends less time in gcx.fullcollect then the
 DMD version.

Even so, that in itself is a good clue.

my bet is on, cross-module-inlining of bitop.btr failing... https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gcbits.d version (DigitalMars) { version = bitops; } else version (GNU) { // use the unoptimized version } else version (D_InlineAsm_X86) { version = Asm86; } wordtype testClear(size_t i) { version (bitops) { return core.bitop.btr(data + 1, i); // this is faster! }

You would be wrong. btr is a compiler intrinsic, so it is *always* inlined! I lean towards Walter here in that I would very much like to see hard evidence for your claims. :-) On a side note though, GDC has bt, btr, bts, etc. as intrinsics in its compiler front-end. So it would be no problem switching to version = bitops for version GNU. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 06 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 7 September 2012 10:31, Walter Bright <newshound2 digitalmars.com> wrote:
 On 9/6/2012 11:47 PM, Iain Buclaw wrote:
 On a side note of that though, GDC has bt, btr, bts, etc, as
 intrinsics to its compiler front-end.  So it would be no problem
 switching to version = bitops for version GNU.

Would it be easy to give that a try, and see what happens?

Sure, can do. Give me something to work against, and I will be able to produce the difference. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 07 2012
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Sep 6, 2012, at 10:57 PM, Jacob Carlborg <doob me.com> wrote:

 On 2012-09-07 01:53, Sean Kelly wrote:
 What version flags are set by GDC vs. DMD in your target apps?  The way "stop the world" is done on Linux vs. Windows is different, for example.

 He's using only Windows as far as I understand, GDC MinGW.

Well sure, but MinGW is weird. I'd expect the Windows flag to be set for MinGW and both the Windows and Posix flags set for Cygwin, but it seemed worth asking. If Windows and Posix are both set, the Windows method will be used for "stop the world".
Sep 07 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 6 September 2012 00:10, Walter Bright <newshound2 digitalmars.com> wrote:
 On 9/5/2012 4:03 AM, Benjamin Thaut wrote:
 GC collection times:

      DMD GC Version: 8.9 ms
      GDC GC Version: 4.1 ms

I'd like it if you could add some instrumentation to see what accounts for the time difference. I presume they both use the same D source code.

I'd say they are identical, but I don't really look at what goes on over on the MinGW port. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 05 2012
prev sibling next sibling parent "ponce" <spam spam.org> writes:
 You can find the full article at:

 http://3d.benjamin-thaut.de/?p=20#more-20

You make some good points about what happens under the hood. Especially:

- homogeneous variadic function calls allocate
- comparison of const objects allocates
- useless druntime invariant handler calls

I removed some homogeneous variadic function calls from my own code.
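The first point is easy to trip over: a homogeneous (typed) variadic parameter is sugar for a dynamic array, so with the compilers discussed here each call builds that array on the GC heap. A sketch of the pattern and an allocation-free alternative:

```d
// Each call constructs an int[] from the arguments -- a hidden GC
// allocation per call with the compilers discussed in this thread.
int sumVariadic(int[] xs...)
{
    int s = 0;
    foreach (x; xs) s += x;
    return s;
}

// Taking an explicit slice lets the caller pass a fixed-size array instead.
int sumSlice(in int[] xs)
{
    int s = 0;
    foreach (x; xs) s += x;
    return s;
}

void main()
{
    assert(sumVariadic(1, 2, 3) == 6);  // builds a temporary int[]
    int[3] tmp;
    tmp[0] = 1; tmp[1] = 2; tmp[2] = 3;
    assert(sumSlice(tmp[]) == 6);       // no per-call array construction
}
```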
Sep 07 2012
prev sibling next sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Benjamin Thaut wrote:
 I rewrote a 3d game I created during my studies with D 2.0 to manual
 memory management. If I'm not studying I'm working in the 3d Engine
 department of Havok. As I needed to practice manual memory management
 and did want to get rid of the GC in D for quite some time, I did go
 through all this effort to create a GC free version of my game.
 
 The results are:
 
     DMD GC Version: 71 FPS, 14.0 ms frametime
     GDC GC Version: 128.6 FPS, 7.72 ms frametime
     DMD MMM Version: 142.8 FPS, 7.02 ms frametime

Interesting. What about measuring a GDC MMM version? I wonder what the GC overhead is: with DMD the factor is two, and maybe it is lower with GDC. I would also be interested in some numbers regarding memory overhead, to get a more complete picture of the impact on resources when using the GC. Jens
Sep 07 2012
prev sibling next sibling parent Benjamin Thaut <code benjamin-thaut.de> writes:
The full source code for the non-GC version is now available on GitHub. The 
GC version will follow soon.

https://github.com/Ingrater/Spacecraft

Kind Regards
Benjamin Thaut
Sep 09 2012
prev sibling next sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Here a small update:

I found a piece of code that manually slowed down the simulation in
case it got too fast. This code never kicked in with the GC version,
because it never reached the margin. The manually memory managed version
however did reach the margin and was slowed down. With this piece of
code removed, the manually memory managed version runs at 5 ms, which is
200 FPS and thus nearly 3 times as fast as the GC collected version.

Kind Regards
Benjamin Thaut
Oct 23 2012
next sibling parent "Rob T" <rob ucora.com> writes:
On Tuesday, 23 October 2012 at 16:30:41 UTC, Benjamin Thaut wrote:
 Here a small update:

 I found a piece of code that did manually slow down the 
 simulation in case it got too fast. This code never kicked in 
 with the GC version, because it never reached the margin. The 
 manual memory managed version however did reach the margin and 
 was slowed down. With this piece of code removed the manual 
 memory managed version runs at 5 ms which is 200 FPS and thus 
 nearly 3 times as fast as the GC collected version.

 Kind Regards
 Benjamin Thaut

That's a very significant difference in performance that should not be taken lightly.

I don't really see a general solution to the GC problem other than to design things such that a D programmer has a truly practical ability to not use the GC at all, and to ensure that it does not sneak back in. IMHO it was a mistake to assume that D should depend on a GC to the degree that has taken place.

The GC is also the reason why D has a few other significant technical problems not related to performance, such as the inability to link D code to C/C++ code if the GC is required on the D side, and the inability to build dynamic libraries and runtime-loadable plugins that link to the runtime system - the GC apparently does not work correctly in these situations. Although the problem is solvable, how this was allowed to happen in the first place is difficult to understand.

I'll be a much happier D programmer if I can guarantee where and when the GC is used; therefore the GC should be 100% optional in practice, not just in theory.

--rt
Oct 23 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 23 October 2012 at 22:31:03 UTC, Rob T wrote:
 On Tuesday, 23 October 2012 at 16:30:41 UTC, Benjamin Thaut 
 wrote:
 Here a small update:

 I found a piece of code that did manually slow down the 
 simulation in case it got to fast. This code never kicked in 
 with the GC version, because it never reached the margin. The 
 manual memory managed version however did reach the margin and 
 was slowed down. With this piece of code removed the manual 
 memory managed version runs at 5 ms which is 200 FPS and thus 
 nearly 3 times as fast as the GC collected version.

 Kind Regards
 Benjamin Thaut

That's a very significant difference in performance that should not be taken lightly.

I don't really see a general solution to the GC problem other than to design things such that a D programmer has a truly practical ability to not use the GC at all, and to ensure that it does not sneak back in. IMHO it was a mistake to assume that D should depend on a GC to the degree that has taken place.

The GC is also the reason why D has a few other significant technical problems not related to performance, such as the inability to link D code to C/C++ code if the GC is required on the D side, and the inability to build dynamic libraries and runtime-loadable plugins that link to the runtime system - the GC apparently does not work correctly in these situations. Although the problem is solvable, how this was allowed to happen in the first place is difficult to understand.

I'll be a much happier D programmer if I can guarantee where and when the GC is used; therefore the GC should be 100% optional in practice, not just in theory.

--rt

Having dealt with systems programming in languages with GC (Native Oberon, Modula-3), I wonder how much an optional GC would really matter, if D's GC had better performance.

--
Paulo
Oct 24 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 24 October 2012 at 12:21:03 UTC, Paulo Pinto wrote:
 Having dealt with systems programming in languages with GC 
 (Native Oberon, Modula-3), I wonder how much an optional GC 
 would really matter, if D's GC had better performance.

 --
 Paulo

Well, performance is only part of the GC equation. There's determinism, knowing when the GC is invoked and the ability to control it, and the increased complexity introduced by a GC, which tends to grow considerably when improving the GC's performance and the ability to manage it manually. All this means there's a lot more potential for things going wrong, and this cycle of fixing the fix may never end.

The cost of clinging onto a GC may be too high to be worth relying on as heavily as is being done, and effectively forcing a GC on programmers is the wrong approach, because not everyone has the same requirements that demand its use. When I say "forcing", look at what had to be done to fix the performance of the game in question: what was done to get rid of the GC was a super-human effort, and that is simply not a practical solution by any stretch of the imagination.

A GC is both good and bad, not good for everyone and not bad for everyone, with shades of gray in between, so it has to be made fully optional, with good manual control, and easily so.

--rt
Oct 24 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 24 October 2012 at 18:26:48 UTC, Rob T wrote:
 On Wednesday, 24 October 2012 at 12:21:03 UTC, Paulo Pinto 
 wrote:
 Having dealt with systems programming in languages with GC 
 (Native Oberon, Modula-3), I wonder how much an optional GC 
 would really matter, if D's GC had better performance.

 --
 Paulo

Well, performance is only part of the GC equation. There's determinism, knowing when the GC is invoked and the ability to control it, and the increased complexity introduced by a GC, which tends to grow considerably when improving the GC's performance and the ability to manage it manually. All this means there's a lot more potential for things going wrong, and this cycle of fixing the fix may never end.

The cost of clinging onto a GC may be too high to be worth relying on as heavily as is being done, and effectively forcing a GC on programmers is the wrong approach, because not everyone has the same requirements that demand its use. When I say "forcing", look at what had to be done to fix the performance of the game in question: what was done to get rid of the GC was a super-human effort, and that is simply not a practical solution by any stretch of the imagination.

A GC is both good and bad, not good for everyone and not bad for everyone, with shades of gray in between, so it has to be made fully optional, with good manual control, and easily so.

--rt

I do understand that. But on the other hand there are operating systems fully developed in such languages, like Bluebottle,

http://www.ocp.inf.ethz.ch/wiki/Documentation/WindowManager

or the real-time system developed at ETHZ to control robot helicopters,

http://static.usenix.org/events/vee05/full_papers/p35-kirsch.pdf

I surely tremble at the thought of a full GC collection in airplane software. On the other hand, I am old enough to remember the complaints that C was too slow and one needed to write everything in Assembly to have full control of the application code. Followed by: C++ was too slow, and one should use C structs with embedded pointers to have full control over the memory layout of the object table, instead of strange compiler-generated VMT tables.

So I always take the assertions that manual memory management is a must with a grain of salt.

--
Paulo
Oct 24 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 24 October 2012 at 21:02:34 UTC, Paulo Pinto wrote:
 So I always take the assertions that manual memory management 
 is a must with a grain of salt.

 --
 Paulo

Probably no one in here is thinking that we should not have a GC. I'm sure that many applications will benefit from a GC, but I'm also certain that not all applications require a GC, and it's a mistake to assume everyone will be happy to have one, as was illustrated in the OP.

In my case, I'm not too concerned about performance, or pauses in the execution, but I do require dynamically loadable libraries, and I do want to link D code to existing C/C++ code; but in order to do these things, I cannot use the GC, because I'm told that it will not work under these situations.

It may be theoretically possible to build a near-perfect GC that will work well for even RT applications, and will work for dynamically loadable libraries, etc., but while waiting for one to materialize in D, what are we supposed to do when the current GC is unsuitable?

--rt
Oct 24 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Wednesday, 24 October 2012 at 23:05:29 UTC, Rob T wrote:
 In my case, I'm not too concerned about performance, or pauses 
 in the execution, but I do require dynamic loadable libraries, 
 and I do want to link D code to existing C/C++ code, but in 
 order to do these things, I cannot use the GC because I'm told 
 that it will not work under these situations.

You can very much link to C and C++ code, or have C and C++ code link to your D code, while still using the GC; you just have to be careful when you send GC memory to external code.

You can even share the same GC between dynamic libraries and the host application (if both are D and use GC, of course) using the GC proxy system.
Oct 24 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 On Wednesday, 24 October 2012 at 23:05:29 UTC, Rob T wrote:
 In my case, I'm not too concerned about performance, or pauses 
 in the execution, but I do require dynamic loadable libraries, 
 and I do want to link D code to existing C/C++ code, but in 
 order to do these things, I cannot use the GC because I'm told 
 that it will not work under these situations.

You can very much link to C and C++ code, or have C and C++ code link to your D code, while still using the GC, you just have to be careful when you send GC memory to external code. You can even share the same GC between dynamic libraries and the host application (if both are D and use GC, of course) using the GC proxy system.

I am speaking without knowing if such a thing already exists.

Maybe someone who knows the best way to do so could write an article about best practices for using C and C++ code together in D applications, so that we could point people to it, in a similar vein to the wonderful article about templates.

--
Paulo
Oct 24 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 You can very much link to C and C++ code, or have C and C++ 
 code link to your D code, while still using the GC, you just 
 have to be careful when you send GC memory to external code.

 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

My understanding of dynamic linking and the runtime is based on this thread:

http://www.digitalmars.com/d/archives/digitalmars/D/dynamic_library_building_and_loading_176983.html

The runtime is not compiled to be sharable, so you cannot link it to shared libs by default. However, hacking the gdc build system allowed me to compile the runtime into a sharable state, and all seemed well. However, based on the input from that thread, my understanding was that the GC would be unreliable at best.

I suppose I could do some tests on it, but tests can only confirm so much. I'd also have to decipher the runtime source code to see what the heck it is doing or not.

--rt
Oct 25 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Thursday, 25 October 2012 at 08:34:15 UTC, Rob T wrote:
 My understanding of dynamic linking and the runtime is based on 
 this thread

 http://www.digitalmars.com/d/archives/digitalmars/D/dynamic_library_building_and_loading_176983.html

 The runtime is not compiled to be sharable, so you cannot link 
 it to shared libs by defult. However, hacking the gdc build 
 system allowed me to compile the runtime into a sharable state, 
 and all seemed well.

 However, based on the input from that thread, my understanding 
 was that the GC would be unreliable at best.

 I suppose I could do some tests on it, but tests can only 
 confirm so much. I'd also have to decipher the runtime source 
 code to see what the heck it is doing or not.

 --rt

You are right that compiling the runtime itself (druntime and Phobos) as a shared library is not yet fully realized, but that doesn't stop you from compiling your own libraries and applications as shared libraries even if they statically link to the runtime (which is the current default behaviour).
Oct 25 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Thursday, 25 October 2012 at 08:50:19 UTC, Jakob Ovrum wrote:
 You are right that compiling the runtime itself (druntime and 
 Phobos) as a shared library is not yet fully realized, but that 
 doesn't stop you from compiling your own libraries and 
 applications as shared libraries even if they statically link 
 to the runtime (which is the current default behaviour).

Yes I can build my own D shared libs, both as static PIC (.a) and dynamically loadable (.so); however, I cannot statically link my shared libs to druntime + phobos as-is. The only way I can do that is to also compile druntime + phobos into PIC, which can be done as a static PIC lib.

So what you are saying is that I can statically link PIC-compiled druntime to my own shared lib, but I cannot build druntime as a dynamically loadable shared lib? I can see why that may work, if each shared lib has its own private copy of the GC. Correct?

I recall that druntime may have some ASM code that will not work when compiled to PIC. I think gdc removed the offending ASM code, but it may still be present in the dmd version; I don't know for sure.

Another question is whether I can link a dynamically loadable D lib to C/C++ code or not. Yes I can do it, and it seems to work, but I was told that the GC will not necessarily work. Am I misunderstanding this part?

--rt
Oct 25 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

What is the GC proxy system, and how do I make use of it? --rt
Oct 25 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Thursday, 25 October 2012 at 17:17:01 UTC, Rob T wrote:
 Yes I can build my own D shared libs, both as static PIC (.a) 
 and dynamically loadable (.so). however I cannot statically 
 link my shared libs to druntime + phobos as-is. The only way I 
 can do that, is to also compile druntime + phobos into PIC, 
 which can be done as a static PIC lib.

Sorry, I keep forgetting that this is needed on non-Windows systems.
 So what you are saying is that I can statically link PIC 
 compiled druntime to my own shared lib, but I cannot build 
 druntime as a dynamically loadable shared lib? I can see why 
 that may work, if each shared lib has its own private copy of 
 the GC. Correct?

Yes, this is possible. Sending references to GC memory between the D modules then has the same rules as when sending it to non-D code, unless the host (the loader module) uses druntime to load the other modules, in which case it can in principle share the same GC with them.
 I recall that druntime may have some ASM code that will not 
 work when compiled to PIC. I think gdc removed the offending 
 ASM code, but it may still be present in the dmd version, but I 
 don't know for sure.

I think it was relatively recently that DMD could also compile the runtime as PIC, but I might be remembering wrong.
 Another question is if I can link a dynamic loadable D lib to 
 C/C++ code or not? Yes I can do it, and it seems to work, but I 
 was told that the GC will not necessarily work. Am I 
 misunderstanding this part?

The GC will work the same as usual inside the D code, but you have to manually keep track of references you send outside the scope of the GC, such as references to GC memory put on the C heap. This can be done with the GC.addRoot/removeRoot and GC.addRange/removeRange functions found in core.memory, or by retaining the references in global, TLS or GC memory. It's good practice to do this for all GC references sent to external code, as you don't know where the reference may end up.

Of course, you have other options. You don't have to send references to GC memory to external code; you can always copy the data over to a different buffer, such as one on the C heap (i.e. malloc()). If the caller (in the case of a return value) or the callee (in the case of a function argument) expects to be able to call free() on the memory referenced, then you must do it this way regardless.
Oct 25 2012
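To make the GC.addRoot advice concrete, here is a small self-contained sketch (my own, not from the thread). A slot on the C heap stands in for storage owned by external C code; the D collector does not scan the C heap, so the object must be pinned as a root for as long as that is the only reference:

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

struct Context { int value; }

// Hand a GC object to a C-heap slot (standing in for external C code,
// which the collector does not scan), pinning it first.
int handOffAndCollect()
{
    auto slot = cast(Context**) malloc((Context*).sizeof);
    scope (exit) free(slot);

    auto ctx = new Context(42); // GC-allocated
    GC.addRoot(ctx);            // pin: the only reference will live
    *slot = ctx;                // where the collector cannot see it
    ctx = null;

    GC.collect();               // the object survives thanks to the root
    auto v = (*slot).value;

    GC.removeRoot(*slot);       // external code is done: unpin so a
    *slot = null;               // later collection may reclaim it
    return v;
}

void main()
{
    assert(handOffAndCollect() == 42);
}
```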
prev sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Thursday, 25 October 2012 at 17:20:40 UTC, Rob T wrote:
 On Thursday, 25 October 2012 at 02:15:41 UTC, Jakob Ovrum wrote:
 You can even share the same GC between dynamic libraries and 
 the host application  (if both are D and use GC, of course) 
 using the GC proxy system.

What is the GC proxy system, and how do I make use of it? --rt

There's a function Runtime.loadLibrary in core.runtime that is supposed to load a shared library and get the symbol named `gc_setProxy` using the platform's dynamic library loading routines, then use that to share the host GC with the loaded library.

I say "is supposed to" because I checked the code, and it's currently a throwing stub on POSIX systems; it's only implemented for Windows (the source of the function can be found in rt_loadLibrary in rt/dmain2.d of druntime).

When it comes to gc_setProxy: GDC exports this symbol by default on Windows, while DMD doesn't. I don't know why this is the case. I haven't built shared libraries on other OSes before, so I don't know how GDC and DMD behave there.
Oct 25 2012
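For reference, a minimal sketch of what the host side of this looks like, as described above. This is Windows-only at the time of the thread, and the DLL name is hypothetical; it is an illustration, not a tested recipe:

```d
import core.runtime : Runtime;

void main()
{
    // On Windows, Runtime.loadLibrary loads the DLL and, if the DLL
    // exports gc_setProxy, hands it the host's GC so both sides share
    // one collector. On POSIX this path was a throwing stub at the
    // time of this thread.
    void* lib = Runtime.loadLibrary("plugin.dll"); // hypothetical name
    if (lib is null)
        return; // load failed

    // ... look up and call symbols from the plugin; GC references can
    // now be passed between host and plugin ...

    Runtime.unloadLibrary(lib); // clears the proxy before unloading
}
```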
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
I use this GC thread to show a little GC-related benchmark.

A little Reddit thread about using memory more compactly in Java:

http://www.reddit.com/r/programming/comments/120xvf/compact_offheap_structurestuples_in_java/

The relative blog post:
http://mechanical-sympathy.blogspot.it/2012/10/compact-off-heap-structurestuples-in.html

So I have written a D version, in my test I have reduced the 
amount of memory allocated (NUM_RECORDS = 10_000_000):
http://codepad.org/IhHjqUua

With this lower memory usage the D version is more than twice 
as fast as the compact Java version that uses the same 
NUM_RECORDS (0.5 seconds against 1.2 seconds per loop, after the 
first two).

In D I have improved the loops, and I have used an align() and a 
minimallyInitializedArray; this is not too bad.

But in the main() I have also had to use a deprecated "delete", 
because otherwise the GC doesn't deallocate the arrays and the 
program burns all the memory (setting the array to null and using 
GC.collect() isn't enough). This is not good.

Bye,
bearophile
Oct 26 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Friday, 26 October 2012 at 14:21:51 UTC, bearophile wrote:
 But in the main() I have also had to use a deprecated "delete", 
 because otherwise the GC doesn't deallocate the arrays and the 
 program burns all the memory (setting the array to null and 
 using GC.collect() isn't enough). This is not good.

Is this happening with dmd 2.060 as released?
Oct 26 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Rob T:

 Is this happening with dmd 2.060 as released?

I'm using 2.061alpha git head, but I guess the situation is the same with dmd 2.060. The code is linked in my post, so trying it is easy; it's one small module.

Bye,
bearophile
Oct 26 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:
 Rob T:

 Is this happening with dmd 2.060 as released?

I'm using 2.061alpha git head, but I guess the situation is the same with dmd 2.060. The code is linked in my post, so trying it is easy, it's one small module. Bye, bearophile

I tried it with dmd 2.060 (released), and the gdc 4.7 branch.

I tried to check if memory was being freed by creating a struct destructor for JavaMemoryTrade, but that did not work as expected, leading me down the confusing and inconsistent path of figuring out why destructors do not get called when memory is freed.

Long story short, I could not force a struct to execute its destructor if it was allocated on the heap unless I used delete. I tried destroy and clear, as well as GC.collect and GC.free(); nothing else worked.

Memory heap management as well as struct destructors appear to be seriously broken.

--rt
Oct 26 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Saturday, 27 October 2012 at 01:03:57 UTC, Rob T wrote:
 On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:
 Rob T:

 Is this happening with dmd 2.060 as released?

I'm using 2.061alpha git head, but I guess the situation is the same with dmd 2.060. The code is linked in my post, so trying it is easy, it's one small module. Bye, bearophile

I tried it with dmd 2.060 (released), and the gdc 4.7 branch.

I tried to check if memory was being freed by creating a struct destructor for JavaMemoryTrade, but that did not work as expected, leading me down the confusing and inconsistent path of figuring out why destructors do not get called when memory is freed.

Long story short, I could not force a struct to execute its destructor if it was allocated on the heap unless I used delete. I tried destroy and clear, as well as GC.collect and GC.free(); nothing else worked.

Memory heap management as well as struct destructors appear to be seriously broken.

--rt

OK my bad, partially. Heap-allocated struct destructors will not get called using clear or destroy unless the struct reference is manually dereferenced. I got confused because class references behave differently than heap-allocated struct references.

I cannot be the first person to do this, and it must happen all the time. The auto-dereferencing of a struct pointer when accessing members may be nice, but it makes struct pointers look exactly like class references, which will lead to mistakes.

I do get the distinction between classes and structs, but why does clear and destroy on a struct pointer not give me a compiler error, or at least a warning? Is there any valid purpose to clear or destroy a pointer that is not dereferenced? Seems like a bug to me.

--rt
Oct 26 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Saturday, 27 October 2012 at 01:03:57 UTC, Rob T wrote:
 On Friday, 26 October 2012 at 23:10:48 UTC, bearophile wrote:
 Rob T:

 Is this happening with dmd 2.060 as released?

I'm using 2.061alpha git head, but I guess the situation is the same with dmd 2.060. The code is linked in my post, so trying it is easy, it's one small module. Bye, bearophile

I tried it with dmd 2.060 (released), and the gdc 4.7 branch.

I tried to check if memory was being freed by creating a struct destructor for JavaMemoryTrade, but that did not work as expected, leading me down the confusing and inconsistent path of figuring out why destructors do not get called when memory is freed.

Long story short, I could not force a struct to execute its destructor if it was allocated on the heap unless I used delete. I tried destroy and clear, as well as GC.collect and GC.free(); nothing else worked.

Memory heap management as well as struct destructors appear to be seriously broken.

--rt

I made a mistake. The clear and destroy operations require that a pointer to a struct be manually dereferenced.

What I don't understand is why the compiler allows you to pass a -not- dereferenced pointer to clear and destroy; this looks like a bug to me. It should either work just like a class reference does, or it should refuse to compile.

I'm sure you've heard this many times before, but I have to say that it's very confusing when struct pointers behave exactly like class references, but not always.

--rt
Oct 26 2012
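The pitfall described above can be reproduced in a few lines (my own sketch, using the modern name destroy; at the time of the thread the function was called clear). destroy on the pointer itself compiles fine but merely resets the pointer to its .init value; only destroy on the dereferenced struct runs the destructor:

```d
struct Resource
{
    static int dtorCalls;
    ~this() { ++dtorCalls; }
}

void main()
{
    auto p = new Resource;   // heap-allocated struct; p is a Resource*

    destroy(p);              // compiles, but only resets the POINTER
    assert(Resource.dtorCalls == 0); // destructor did NOT run
    assert(p is null);

    p = new Resource;
    destroy(*p);             // dereference first: the destructor runs
    assert(Resource.dtorCalls == 1);
}
```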
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
 But in the main() I have also had to use a deprecated "delete",

And setting trades.length to zero and then using GC.free() on its ptr gives the same good result.

Bye,
bearophile
Oct 27 2012
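The workaround bearophile settled on, sketched as a small standalone loop. This is my own reconstruction; the element type, sizes, and the helper name are placeholders, not the benchmark's actual data:

```d
import core.memory : GC;

// Allocate, use, and eagerly free one buffer; returns its length so
// the call is observable.
size_t roundTrip()
{
    auto trades = new int[1_000_000];
    auto n = trades.length;
    // ... use trades ...

    // Dropping the reference alone did not return the memory promptly
    // in bearophile's test, so free the block eagerly instead:
    auto p = trades.ptr;
    trades = null; // drop the only GC-visible reference
    GC.free(p);    // explicitly release the allocation
    return n;
}

void main()
{
    foreach (i; 0 .. 100)
        assert(roundTrip() == 1_000_000);
}
```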
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
And with the usual optimizations (struct splitting) coming from 
taking a look at the access patterns, the D code gets faster:

http://codepad.org/SnxnpcAB

Bye,
bearophile
Oct 27 2012