digitalmars.D.learn - GC interpreting integer values as pointers

reply Ivo Kasiuk <i.kasiuk gmx.de> writes:
Hi!

In my D programs I am having problems with objects not getting finalised
although there is no reference anymore. It turned out that this is
caused by integers which happen to have values corresponding to pointers
into the heap. So I wrote a test program to check the GC behaviour
concerning integer values:

----------------------------------------
import std.stdio;
import core.memory;
class C {
  string s;
  this(string s) { this.s = s; }
  ~this() { writeln(s); }
}
struct S {
  uint r;
  this(uint x) { r = x; }
}
class X {
  C c;
  uint r;
  S s;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    s = S(cast(uint) cast(void*) new C("struct"));
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}
void main(string[] args) {
  X x = new X;
  GC.collect();
  writefln("========== %s, %x, %x, %x, %x",
           x.c.s, x.r, x.s.r, x.a[0], *x.p);
}
----------------------------------------

This writes:

new uint
no reference
========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
AA
struct
uint
reference


So in most but not all situations the integer value keeps the object
from getting finalised. This observation corresponds to the effects I
saw in my programs.

I find this rather unfortunate. Is this known, documented behaviour? In
a typical program there are such integer values all over the place. How
should such values be stored to avoid unwanted interaction with the GC?

Thanks,
Ivo
Oct 09 2010
next sibling parent reply %u <e ee.com> writes:
== Quote from Ivo Kasiuk (i.kasiuk gmx.de)'s article
 Hi!
~snip
 ----------------------------------------
 This writes:
 new uint
 no reference
 ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
 AA
 struct
 uint
 reference
 So in most but not all situations the integer value keeps the object
 from getting finalised. This observation corresponds to the effects I
 saw in my programs.
 I find this rather unfortunate. Is this known, documented behaviour? In
 a typical program there are such integer values all over the place. How
 should such values be stored to avoid unwanted interaction with the GC?
 Thanks,
 Ivo
In D1:

----------------------------------------
import std.stdio;
import std.gc;
class C {
  string s;
  this(string s) { this.s = s; }
  ~this() { writefln(s); }
}
class X {
  C c;
  uint r;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}
void main(string[] args) {
  X x = new X;
  std.gc.fullCollect();
  writefln("========== %s, %x, %x, %x",
           x.c.s, x.r, x.a[0], *x.p);
}
----------------------------------------

Writes:

no reference
========== reference, ad3fd0, ad3fb0, ad3f90
new uint  << ;)
AA
uint
reference
Oct 11 2010
parent reply Ivo Kasiuk <i.kasiuk gmx.de> writes:
 ~snip
 ----------------------------------------
 This writes:
 new uint
 no reference
 ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
 AA
 struct
 uint
 reference
...
 Thanks,
 Ivo
 In D1:
...
 Writes:
 no reference
 ========== reference, ad3fd0, ad3fb0, ad3f90
 new uint  << ;)
 AA
 uint
 reference
Thanks for trying it out in D1. So, summing up this means that:

- In most cases, memory is by default scanned for pointers regardless of
  the actual data types.
- In D2, newly allocated memory for a non-pointer data type (like "new
  uint" or "new uint[10]") is not scanned by default.
- In D1, you have to use hasNoPointers if you want some memory not to be
  scanned.

Is this observation correct?
And what about structs/classes that have integer fields as well as
pointer/reference fields?
And what about associative arrays - apparently these are scanned even if
the type is uint?

Ivo
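
One way to check the observations in the list above without relying on finalizer output is to ask the GC for the attributes of each block. The following is only a sketch, assuming a current D2 druntime where core.memory provides GC.query and GC.BlkAttr.NO_SCAN; the struct Mixed and the helper isScanned are hypothetical names made up for this illustration and do not appear in the thread.

```d
import core.memory : GC;
import std.stdio : writefln;

// Hypothetical struct mixing an integer field with a pointer field.
struct Mixed { uint n; void* p; }

// Hypothetical helper: true if p lies inside a GC block that the collector scans.
bool isScanned(const(void)* p)
{
    // GC.query accepts base or interior pointers and returns the block info.
    auto info = GC.query(cast(void*) p);
    return info.base !is null && (info.attr & GC.BlkAttr.NO_SCAN) == 0;
}

void main()
{
    auto ints  = new uint[10];   // element type has no pointers
    auto ptrs  = new void*[10];  // element type is a pointer
    auto mixed = new Mixed[10];  // one pointer field is enough to scan the block

    writefln("uint[]  scanned: %s", isScanned(ints.ptr));
    writefln("void*[] scanned: %s", isScanned(ptrs.ptr));
    writefln("Mixed[] scanned: %s", isScanned(mixed.ptr));
}
```

The sketch only reports the per-block flag. It says nothing about the stack, globals, or the nodes of built-in associative arrays, which the runtime allocates itself.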
Oct 11 2010
parent reply %u <e ee.com> writes:
== Quote from Ivo Kasiuk (i.kasiuk gmx.de)'s article
 ~snip
 ----------------------------------------
 This writes:
 new uint
 no reference
 ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
 AA
 struct
 uint
 reference
...
 Thanks,
 Ivo
In D1:
...
 Writes:
 no reference
 ========== reference, ad3fd0, ad3fb0, ad3f90
 new uint  << ;)
 AA
 uint
 reference
 Thanks for trying it out in D1. So, summing up this means that:
 - In most cases, memory is by default scanned for pointers regardless of
   the actual data types.
 - In D2, newly allocated memory for a non-pointer data type (like "new
   uint" or "new uint[10]") is not scanned by default.
Isn't p a pointer data type? I didn't even know I could do "i = new int;" :D
 - In D1, you have to use hasNoPointers if you want some memory not to be
 scanned.
 Is this observation correct?
 And what about structs/classes that have integer fields as well as
 pointer/reference fields?
 And what about associative arrays - apparently these are scanned even if
 the type is uint?
 Ivo
I added the struct again and also ran without the enclosing X class.

With X :
no reference
========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
new uint
AA
struct
uint
reference

Without X :
no reference
========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
new uint

----------------------------------------
import std.stdio;
import std.gc;
class C {
  string s;
  this(string s) { this.s = s; }
  ~this() { writefln(s); }
}
struct S {
  uint r;
  static S opCall(uint x) {
    S s;
    s.r = x;
    return s;
  }
}
class X {
  C c;
  uint r;
  S s;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    s = S(cast(uint) cast(void*) new C("struct"));
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}
void main(string[] args) {
  /+
  c = new C("reference");
  new C("no reference");
  r = cast(uint) cast(void*) new C("uint");
  s = S(cast(uint) cast(void*) new C("struct"));
  a[0] = cast(uint) cast(void*) new C("AA");
  p = new uint;
  *p = (cast(uint) cast(void*) new C("new uint"));
  +/
  X x = new X;
  std.gc.fullCollect();
  writefln("========== %s, %x, %x, %x, %x",
           x.c.s, x.r, x.s.r, x.a[0], *x.p);
  //writefln("========== %s, %x, %x, %x, %x", c.s, r, s.r, a[0], *p);
}
----------------------------------------
Oct 11 2010
parent reply Ivo Kasiuk <i.kasiuk gmx.de> writes:
 ~snip
 ----------------------------------------
 This writes:
 new uint
 no reference
 ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
 AA
 struct
 uint
 reference
...
 Thanks,
 Ivo
In D1:
...
 Writes:
 no reference
 ========== reference, ad3fd0, ad3fb0, ad3f90
 new uint  << ;)
 AA
 uint
 reference
 Thanks for trying it out in D1. So, summing up this means that:
 - In most cases, memory is by default scanned for pointers regardless of
   the actual data types.
 - In D2, newly allocated memory for a non-pointer data type (like "new
   uint" or "new uint[10]") is not scanned by default.
 Isn't p a pointer data type?
 I didn't even know I could do "i = new int;" :D
What I mean is that p is pointing to data which has a simple data type
(not a struct/class/union) that is not a pointer/reference type. For
instance, with "p = new uint[10]" the compiler knows that the newly
allocated memory that p points to does not contain any pointers. With D2,
that seems to cause the memory not to be scanned.
 - In D1, you have to use hasNoPointers if you want some memory not
 to be
 scanned.
 Is this observation correct?
 And what about structs/classes that have integer fields as well as
 pointer/reference fields?
 And what about associative arrays - apparently these are scanned
 even if
 the type is uint?
 Ivo
 I added the struct again and also ran without the enclosing X class.

 With X :
 no reference
 ========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
 new uint
 AA
 struct
 uint
 reference

 Without X :
 no reference
 ========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
 new uint
...

No surprises with the struct. And the "Without X" example... I am not
sure, with the variables all in the current stack frame that might be a
special case. What about global variables instead:

----------------------------------------
...
C c;
uint r;
S s;
uint[int] a;
uint* p;
uint[] arr;
void f() {
  c = new C("reference");
  new C("no reference");
  r = cast(uint) cast(void*) new C("uint");
  s = S(cast(uint) cast(void*) new C("struct"));
  a[0] = cast(uint) cast(void*) new C("AA");
  p = new uint;
  *p = (cast(uint) cast(void*) new C("new uint"));
  arr = new uint[1];
  arr[0] = (cast(uint) cast(void*) new C("array"));
}
void main(string[] args) {
  f();
  GC.collect();
  writefln("========== %s, %x, %x, %x, %x, %x",
           c.s, r, s.r, a[0], *p, arr[0]);
}
----------------------------------------

That gives me (with D2):

array
new uint
no reference
========== reference, f74c3e20, f74c3e10, f74c3df0, f74c3dd0, f74c3db0
AA
struct
uint
reference
Oct 11 2010
parent %u <e ee.com> writes:
== Quote from Ivo Kasiuk (i.kasiuk gmx.de)'s article
 I added the struct again and also ran without the enclosing X class.

 With X :
 no reference
 ========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
 new uint
 AA
 struct
 uint
 reference

 Without X :
 no reference
 ========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
 new uint
 ...
 No surprises with the struct. And the "Without X" example... I am not
 sure, with the variables all in the current stack frame that might be a
 special case. What about global variables instead:
Actually, those were global variables: I simply commented out the
encapsulating class and constructor. But I left all the allocation in
main. Would that matter?
 ...
 C c;
 uint r;
 S s;
 uint[int] a;
 uint* p;
 uint[] arr;
 void f() {
   c = new C("reference");
   new C("no reference");
   r = cast(uint) cast(void*) new C("uint");
   s = S(cast(uint) cast(void*) new C("struct"));
   a[0] = cast(uint) cast(void*) new C("AA");
   p = new uint;
   *p = (cast(uint) cast(void*) new C("new uint"));
   arr = new uint[1];
   arr[0] = (cast(uint) cast(void*) new C("array"));
 }
 void main(string[] args) {
   f();
   GC.collect();
   writefln("========== %s, %x, %x, %x, %x, %x",
            c.s, r, s.r, a[0], *p, arr[0]);
 }
 That gives me (with D2):
 array
 new uint
 no reference
 ========== reference, f74c3e20, f74c3e10, f74c3df0, f74c3dd0, f74c3db0
 AA
 struct
 uint
 reference
Oct 12 2010
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 09 Oct 2010 15:51:37 -0400, Ivo Kasiuk <i.kasiuk gmx.de> wrote:

 Hi!

 In my D programs I am having problems with objects not getting finalised
 although there is no reference anymore. It turned out that this is
 caused by integers which happen to have values corresponding to pointers
 into the heap. So I wrote a test program to check the GC behaviour
 concerning integer values:
[snip]
 So in most but not all situations the integer value keeps the object
 from getting finalised. This observation corresponds to the effects I
 saw in my programs.

 I find this rather unfortunate. Is this known, documented behaviour? In
 a typical program there are such integer values all over the place. How
 should such values be stored to avoid unwanted interaction with the GC?
Yes, D's garbage collector is a conservative garbage collector. One which
doesn't have this problem is called a precise garbage collector.

There are two problems here. First, D has unions, so it is impossible for
the GC to determine if a union contains an integer or a pointer.

Second problem is the granularity of scanning. A memory block is scanned
as if every n bits (n being your architecture) is a pointer, or there are
no pointers. This is determined by a bit associated with the block (the
NO_SCAN bit).

If you allocate a memory block that contains at least one pointer, then
all the words in the memory block are considered to be pointers by the
GC. There is a (continually updated) patch which allows the GC to be
semi-precise. That is, the type information of the memory block will be
linked to it. This will allow precise scanning except for unions. Once
this is integrated, the false pointer problem will be much less prevalent.

-Steve
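
To make the per-block NO_SCAN bit concrete, here is a minimal sketch of requesting it explicitly, assuming the D2 core.memory API (GC.calloc, GC.malloc, GC.setAttr, GC.getAttr, GC.BlkAttr.NO_SCAN). The helper name allocNoScan is hypothetical and only serves the illustration.

```d
import core.memory : GC;

// Hypothetical helper: a zero-filled size_t array in a block the GC never scans.
size_t[] allocNoScan(size_t count)
{
    auto raw = cast(size_t*) GC.calloc(count * size_t.sizeof, GC.BlkAttr.NO_SCAN);
    return raw[0 .. count];
}

void main()
{
    auto values = allocNoScan(4);

    // An address stored here is not treated as a reference by the collector.
    // (obj itself is still referenced from the stack in this sketch.)
    auto obj = new Object;
    values[0] = cast(size_t) cast(void*) obj;
    assert(GC.getAttr(values.ptr) & GC.BlkAttr.NO_SCAN);

    // The bit can also be set on a block after it has been allocated.
    auto block = GC.malloc(64);
    GC.setAttr(block, GC.BlkAttr.NO_SCAN);
    assert(GC.getAttr(block) & GC.BlkAttr.NO_SCAN);
}
```

The flag applies to the whole block, so this only helps when integers and real references are kept in separate allocations. A block marked NO_SCAN must never hold the only reference to a live GC object.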
Oct 14 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 There are two problems here.  First, D has unions, so it is impossible for  
 the GC to determine if a union contains an integer or a pointer.
D has unions, and sometimes normal C-style unions are useful. But in many
situations when you have a union you also keep a tag that represents the
type, so in many of those situations you may use the tagged union of
Phobos, std.variant.Algebraic (if the Phobos implementation is good
enough, currently unfinished and not good enough yet) and the D GC may be
aware and read and use the tag of an Algebraic union to know at runtime
what's the type. This improves the GC precision a little.

Bye,
bearophile
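
For readers who have not used it, the following sketch shows what a tagged union built with std.variant.Algebraic looks like in user code. It only illustrates the tag that the post refers to. It does not show any GC integration, and nothing here implies that the collector actually reads the tag. The alias IntOrString and the helper describe are hypothetical names for this example.

```d
import std.variant : Algebraic;

// A tagged union that holds either an int or a string.
alias IntOrString = Algebraic!(int, string);

// Hypothetical helper: report which alternative is currently stored.
string describe(IntOrString v)
{
    if (v.peek!int !is null) return "int";
    if (v.peek!string !is null) return "string";
    return "empty";
}

void main()
{
    IntOrString v = 42;
    assert(describe(v) == "int");
    assert(v.get!int == 42);

    v = "hello";                          // the stored tag changes with the value
    assert(describe(v) == "string");
    assert(v.type == typeid(string));     // the tag is available at runtime
}
```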
Oct 14 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 12:39:33 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:

 There are two problems here.  First, D has unions, so it is impossible  
 for
 the GC to determine if a union contains an integer or a pointer.
 D has unions, and sometimes normal C-style unions are useful. But in many
 situations when you have a union you also keep a tag that represents the
 type, so in many of those situations you may use the tagged union of
 Phobos, std.variant.Algebraic (if the Phobos implementation is good
 enough, currently unfinished and not good enough yet) and the D GC may be
 aware and read and use the tag of an Algebraic union to know at runtime
 what's the type. This improves the GC precision a little.
Unions are rare enough that I think this may not be worth doing. But yes,
it could be had.

-Steve
Oct 14 2010
prev sibling parent reply Ivo Kasiuk <i.kasiuk gmx.de> writes:
 On Sat, 09 Oct 2010 15:51:37 -0400, Ivo Kasiuk <i.kasiuk gmx.de> wrote:

 Hi!

 In my D programs I am having problems with objects not getting finalised
 although there is no reference anymore. It turned out that this is
 caused by integers which happen to have values corresponding to pointers
 into the heap. So I wrote a test program to check the GC behaviour
 concerning integer values:
 [snip]
 So in most but not all situations the integer value keeps the object
 from getting finalised. This observation corresponds to the effects I
 saw in my programs.

 I find this rather unfortunate. Is this known, documented behaviour? In
 a typical program there are such integer values all over the place. How
 should such values be stored to avoid unwanted interaction with the GC?

 Yes, D's garbage collector is a conservative garbage collector. One which
 doesn't have this problem is called a precise garbage collector.

 There are two problems here.  First, D has unions, so it is impossible for
 the GC to determine if a union contains an integer or a pointer.

 Second problem is the granularity of scanning.  A memory block is scanned
 as if every n bits (n being your architecture) is a pointer, or there are
 no pointers.  This is determined by a bit associated with the block (the
 NO_SCAN bit).

 If you allocate a memory block that contains at least one pointer, then
 all the words in the memory block are considered to be pointers by the
 GC.  There is a (continually updated) patch which allows the GC to be
 semi-precise.  That is, the type information of the memory block will be
 linked to it.  This will allow precise scanning except for unions.  Once
 this is integrated, the false pointer problem will be much less prevalent.

 -Steve
Thanks! This absolutely makes sense. It is basically a trade-off between
precision and efficiency of the GC.

Slowly, I am learning all the little details of D's garbage collection.
It is more complicated than it seems at first, but understanding it
better greatly helps to write better programs in terms of memory
management.

There is one case though that I am still not sure about: associative
arrays. It seems that keys as well as values in AAs are scanned for
pointers even if both are integer types. How can I tell the GC that I do
not want them to be scanned? I know about the NO_SCAN flag but what
memory region should it be applied to in this case?

BTW: considering the "conservative" scanning, the implementation of
Object.toHash() is somewhat interesting:

----------------------------------------
hash_t toHash()
{
    // BUG: this prevents a compacting GC from working, needs to be fixed
    return cast(hash_t)cast(void*)this;
}
----------------------------------------

So an object's hash value will keep the GC from freeing the object, if
that value is scanned. But as the comment indicates, this implementation
needs to be changed anyway (I am eager to see the result). A compacting
GC probably gives rise to some whole new problems.

Ivo
Oct 14 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 13:35:13 -0400, Ivo Kasiuk <i.kasiuk gmx.de> wrote:
 There is one case though that I am still not sure about: associative
 arrays. It seems that keys as well as values in AAs are scanned for
 pointers even if both are integer types. How can I tell the GC that I do
 not want them to be scanned? I know about the NO_SCAN flag but what
 memory region should it be applied to in this case?
This is a common problem. I am not intimately familiar with AAs, but it
may have something to do with the fact that it's not a templated type.
That means the runtime is responsible for allocating AA nodes.

I think at the moment there is no way to do this. I also think there is
likely a bug report to this effect, and that others may have implemented
better AAs to fix the issue. Try searching the bug database for AA and
NO_SCAN.

-Steve
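
As an illustration of what a replacement container along these lines might look like, here is a rough sketch of a fixed-capacity uint-to-uint map whose storage is one NO_SCAN block. This is not an implementation referenced in the thread. NoScanMap and all of its members are hypothetical, and the design (linear probing, no growth, no removal) is chosen purely to keep the sketch short. It assumes GC.calloc and GC.BlkAttr.NO_SCAN from core.memory.

```d
import core.memory : GC;

/// Hypothetical fixed-capacity uint -> uint map stored in a NO_SCAN block.
struct NoScanMap
{
    private struct Slot { uint key; uint value; bool used; }
    private Slot[] slots;

    this(size_t capacity)
    {
        // GC.calloc zero-fills the block, so every slot starts out unused.
        auto raw = cast(Slot*) GC.calloc(capacity * Slot.sizeof,
                                         GC.BlkAttr.NO_SCAN);
        slots = raw[0 .. capacity];
    }

    /// Linear probing: index of the slot holding key, or of the first free slot.
    private size_t find(uint key) const
    {
        size_t i = key % slots.length;
        foreach (_; 0 .. slots.length)
        {
            if (!slots[i].used || slots[i].key == key)
                return i;
            i = (i + 1) % slots.length;
        }
        assert(0, "NoScanMap is full");
    }

    /// map[key] = value
    void opIndexAssign(uint value, uint key)
    {
        slots[find(key)] = Slot(key, value, true);
    }

    /// key in map: pointer to the stored value, or null if key is absent.
    const(uint)* opBinaryRight(string op : "in")(uint key) const
    {
        auto i = find(key);
        return slots[i].used ? &slots[i].value : null;
    }
}

void main()
{
    auto m = NoScanMap(16);
    m[0] = 0xf7490df0;          // a pointer-like value the GC will not scan
    assert(*(0 in m) == 0xf7490df0);
    assert((1 in m) is null);
}
```

Because the sketch never grows, a lookup in a completely full table for a missing key hits the assert. A real container would need resizing and removal.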
Oct 14 2010