
digitalmars.D - D1 garbage collector + threads + malloc = garbage?

reply Bane <branimir.milosavljevic gmail.com> writes:
Bug with GC fullCollect() in a multithreaded environment. Grauzone explained it
here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610

Test case that freezes the app (tested on DMD 1.053):


import std.c.stdlib;
import std.stdio;
import std.gc;
import std.thread;
import std.c.time;


class Foo {
  void* p;
  this(){
    // Constructor and destructor serialize on the same classinfo monitor.
    // The GC runs ~this() on the collecting thread, so this lock can end up
    // contended between the collector and a suspended or blocked thread.
    synchronized(this.classinfo) {
      writefln("malloc begin");
      p = std.c.stdlib.malloc(1024*1024);
      writefln("malloc finished");
    }
  }

  ~this(){
    synchronized(this.classinfo){
      writefln("free begin");
      std.c.stdlib.free(p);
      writefln("free finished");
    }
  }
}

class MyThread : Thread {
  int run(){
    // Allocate garbage as fast as possible to provoke the race.
    while(true){
      new Foo;
      msleep(1);
    }
    return 0;
  }
}

void main(){
  // Ten mutator threads, plus the main thread triggering collections.
  for(int i=0; i<10; i++){
    (new MyThread).start;
  }

  while(true){
    writefln("collect begin");
    std.gc.fullCollect;
    writefln("collect finished");
    msleep(1000);
  }
}



Can it be fixed, or at least mentioned somewhere in the docs as a known
limitation of the D1 GC? It would save a lot of time for some people (too late
for me :).
Dec 11 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Bane, on December 11 at 06:00 you wrote:
 Bug with GC fullCollect() in a multithreaded environment. Grauzone explained it
 here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
 
 [... full test case snipped; it is quoted in full above ...]
 
 Can it be fixed, or at least mentioned somewhere in the docs as a known
 limitation of the D1 GC? It would save a lot of time for some people (too
 late for me :).
The patch is very simple (as pointed out by Grauzone); I guess it should make
it into D1 (I hope):

------------------------------------------------------------
diff --git a/phobos/internal/gc/gcx.d b/phobos/internal/gc/gcx.d
index b0d2821..7b1a7a3 100644
--- a/phobos/internal/gc/gcx.d
+++ b/phobos/internal/gc/gcx.d
@@ -2033,8 +2033,6 @@ struct Gcx
             }
         }
 
-        Thread.resumeAll();
-
         // Free up everything not marked
         debug(COLLECT_PRINTF) printf("\tfree'ing\n");
         size_t freedpages = 0;
@@ -2194,6 +2192,8 @@ struct Gcx
         debug(COLLECT_PRINTF) printf("recovered pages = %d\n", recoveredpages);
         debug(COLLECT_PRINTF) printf("\tfree'd %u bytes, %u pages from %u pools\
+        Thread.resumeAll();
+
         return freedpages + recoveredpages;
     }
------------------------------------------------------------

I wanted to reproduce it to make a bug report and attach the patch, but
I couldn't. I replaced msleep() with usleep() (and multiplied the values by
1000) because I'm on Linux, but that's all I changed. I have an old AMD
Athlon 1.8GHz. How do you reproduce it?

I want to be sure the bug is present before the patch and fixed after the
patch before submitting the patch via Bugzilla.

Thanks!

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Dec 11 2009
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Leandro Lucarella Wrote:

 Bane, on December 11 at 06:00 you wrote:
 Bug with GC fullCollect() in multithreaded environment. Grauzone explained it
here http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
To be perfectly safe, it's a tiny bit trickier than that. You have to always lock on allocations instead of just locking the collection cycle in single-threaded apps. I think the reason for this is that once the lock is acquired it must be held until the current thread exits the GC code, and the "synchronized" statement only allows you to lock for the length of the collect() call (because the start and end must be within the same scope). However, this /may/ actually be a separate bug with the D 1.0 GC. I really can't recall for sure if this was an issue with collecting when all the other threads were suspended.
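To see the scoping issue concretely, here is a minimal sketch (gcMalloc,
gcLock, tryAlloc and fullCollect are hypothetical names for illustration, not
the actual Phobos internals): a lock taken with "synchronized" around the
collection alone is released at the end of that block, before the allocating
thread has left the GC code.

void* gcMalloc(size_t size)
{
    void* p = tryAlloc(size);
    if (p is null)
    {
        synchronized (gcLock)   // the lock covers only this block...
        {
            fullCollect();
        }                       // ...and is forcibly released here,
        p = tryAlloc(size);     // so the retry runs unprotected
    }
    return p;
}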
Dec 11 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 Leandro Lucarella Wrote:
 
 Bane, on December 11 at 06:00 you wrote:
 Bug with GC fullCollect() in multithreaded environment. Grauzone
 explained it here
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
 
 To be perfectly safe, it's a tiny bit trickier than that. You have to
 always lock on allocations instead of just locking the collection cycle in
 single-threaded apps. [...] I really can't recall for sure if this was an
 issue with collecting when all the other threads were suspended.
Yes, the lock on the GC must still be held until the collection work is completed, but the other threads can be unfrozen once the mark phase is complete.
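In outline, that ordering would look like this (a rough sketch with
hypothetical gcLock/mark/sweep names, not the real gcx.d code; pauseAll and
resumeAll are the D1 std.thread calls):

void collect()
{
    synchronized (gcLock)      // held for the entire collection
    {
        Thread.pauseAll();     // stop the world
        mark();                // scan roots, mark reachable memory
        Thread.resumeAll();    // threads may run again after marking...
        sweep();               // ...but the GC lock stays held while
    }                          // finalizers run and pages are freed
}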
Dec 11 2009
parent reply Sean Kelly <sean invisibleduck.org> writes:
Walter Bright Wrote:

 Sean Kelly wrote:
 [...]
 
 Yes, the lock on the GC must still be held until the collection work is
 completed, but the other threads can be unfrozen once the mark phase is
 complete.
Hm... so if a dtor creates a new thread, then that thread could enter and "lock" the GC while the first thread is still completing its allocation. Couldn't this cause problems?
Dec 12 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 Walter Bright Wrote:
 [...]
 
 Hm... so if a dtor creates a new thread, then that thread could enter and
 "lock" the GC while the first thread is still completing its allocation.
 Couldn't this cause problems?
No, because the thread doing the GC collection has locked the GC. Any other thread attempting to alloc will wait on that lock. There is only one GC lock.
Dec 12 2009
parent reply Sean Kelly <sean invisibleduck.org> writes:
Walter Bright Wrote:

 Sean Kelly wrote:
 [...]
 
 No, because the thread doing the GC collection has locked the GC. Any
 other thread attempting to alloc will wait on that lock. There is only one
 GC lock.
Yeah, but it releases the lock when the collection completes, and then it retries the allocation to actually obtain memory. If the lock were held until the thread exited the GC code completely there wouldn't be a problem. I've thought about switching to using a Mutex in D2 so the lock could be acquired at the collection and released at the end of GC.malloc(), but this isn't an option in D1.
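A rough illustration of that idea (a sketch only; gcLock, tryAlloc and
fullCollect are hypothetical, and the Mutex is presumably D2 druntime's
core.sync.mutex.Mutex, whose explicit lock()/unlock() calls are not tied to
a single scope the way "synchronized" is):

import core.sync.mutex;

__gshared Mutex gcLock;         // hypothetical single GC lock

void* gcMalloc(size_t size)
{
    gcLock.lock();              // explicit lock call: no scope restriction
    void* p = tryAlloc(size);   // tryAlloc/fullCollect are assumed helpers
    if (p is null)
    {
        fullCollect();          // the collection runs under the same lock
        p = tryAlloc(size);     // retry without ever dropping the lock
    }
    gcLock.unlock();            // released only at the end of malloc
    return p;
}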
Dec 12 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 Walter Bright Wrote:
 No, because the thread doing the GC collection has locked the GC.
 Any other thread attempting to alloc will wait on that lock. There
 is only one GC lock.
 Yeah, but it releases the lock when the collection completes, and then it
 retries the allocation to actually obtain memory. If the lock were held
 until the thread exited the GC code completely there wouldn't be a problem.
I don't understand why there is a problem with this. When the collection is complete, release the lock, then take the lock again for the allocation; if the allocation fails, go for another collection.
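As a sketch, that sequence (same hypothetical names as the earlier sketches;
a real allocator would also grow the heap rather than loop forever):

void* gcMalloc(size_t size)
{
    while (true)
    {
        synchronized (gcLock)
        {
            void* p = tryAlloc(size);
            if (p !is null)
                return p;
            fullCollect();      // collect while holding the lock...
        }                       // ...release it when the collection completes,
        // then loop: take the lock again, retry the allocation, and go for
        // another collection if it still fails
    }
}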
Dec 12 2009
parent Sean Kelly <sean invisibleduck.org> writes:
Walter Bright Wrote:

 Sean Kelly wrote:
 [...]
 
 I don't understand why there is a problem with this. When the collection
 is complete, release the lock, then take the lock again for the
 allocation; if the allocation fails, go for another collection.
Ah, I didn't realize the allocation call backed up that far after the collection. It's been a while since I looked at the code.
Dec 12 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Leandro Lucarella wrote:
 I want to be sure the bug is present before the patch and fixed after the
 patch before submitting the patch via Bugzilla.
It's a good fix, I checked in a patch for it.
Dec 11 2009
next sibling parent reply Leandro Lucarella <llucax gmail.com> writes:
Walter Bright, on December 11 at 20:51 you wrote:
 Leandro Lucarella wrote:
I want to be sure the bug is present before the patch and fixed after the
patch before submitting the patch via Bugzilla.
It's a good fix, I checked in a patch for it.
Great, thanks!

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Dec 12 2009
parent reply Leandro Lucarella <llucax gmail.com> writes:
Leandro Lucarella, on December 12 at 11:18 you wrote:
 Walter Bright, on December 11 at 20:51 you wrote:
 Leandro Lucarella wrote:
I want to be sure the bug is present before the patch and fixed after the
patch before submitting the patch via Bugzilla.
It's a good fix, I checked in a patch for it.
Great, thanks!
If you could add a small description of what each commit is about, besides
pointing to the NG discussion/Bugzilla number, as you are doing with DMD now,
it would be really great! =)

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Dec 12 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Leandro Lucarella wrote:
 If you could add a small description of what the commit is about besides
 pointing the NG discussion/Bugzilla number, as you are doing with DMD now,
 it would be really great! =)
Just click on the link! (At least, that's how the changelog works.) Also, I added a comment in the source code that explains the why.
Dec 12 2009
parent Leandro Lucarella <llucax gmail.com> writes:
Walter Bright, on December 12 at 10:29 you wrote:
 Leandro Lucarella wrote:
If you could add a small description of what the commit is about besides
pointing the NG discussion/Bugzilla number, as you are doing with DMD now,
it would be really great! =)
Just click on the link! (At least, that's how the changelog works.) Also, I added a comment in the source code that explains the why.
There are no links in the commit messages, and my RSS feed gets only the
commit message, not the diff, so to actually see the diff I have to open the
RSS "link" to go to the full changeset information. So, if there are 10
changes, I have to open 10 new web pages to see what the changes are about.

It would be nice to know what the changes are about *before* opening any new
web pages, so I can click only on the changes I'm interested in seeing fully.

Thanks.

-- 
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Dec 13 2009
prev sibling parent Bane <branimir.milosavljevic gmail.com> writes:
Walter Bright Wrote:

 Leandro Lucarella wrote:
 I want to be sure the bug is present before the patch and fixed after the
 patch before submitting the patch via Bugzilla.
 It's a good fix, I checked in a patch for it.
Thanx :) I tested it with the patched Phobos on WinXP/CentOS and it works.
Dec 12 2009
prev sibling next sibling parent Bane <branimir.milosavljevic gmail.com> writes:
Leandro Lucarella Wrote:

 Bane, on December 11 at 06:00 you wrote:
 [... full test case and patch snipped; both are quoted in full above ...]
 
 I wanted to reproduce it to make a bug report and attach the patch, but
 I couldn't. I replaced msleep() with usleep() (and multiplied the values
 by 1000) because I'm on Linux, but that's all I changed. I have an old
 AMD Athlon 1.8GHz. How do you reproduce it?
The home computer I tested it on is WinXP with a Phenom quad core, but the bug
is confirmed on CentOS 5.2 + Intel dual core and on CentOS 5.2 + Intel Xeon.

The bug has been present for a long time, that's for sure. It took me a while
to track it down because I thought it was just me abusing the keyboard, not
a bug in Phobos. The worst thing about this bug is that there is no output or
exception when it happens - just that dreaded deadlock :)

One thing I noted: if you reduce the memory requested in that malloc call from
1 MB to 1 KB or less (or increase the sleep time between calls), the chances
of the bug occurring drop drastically, so in most applications this will
probably never happen.
Dec 12 2009
prev sibling parent Bane <branimir.milosavljevic gmail.com> writes:
 I wanted to reproduce it to make a bug report to attach the patch but
 I couldn't. I replaced msleep() with usleep() (and multiplied the values
 by 1000) because I'm in Linux, but that's all I changed. I have an old AMD
 Athlon 1.8GHz. How do you reproduce it?
 
 I want to be sure the bug is present before the patch and fixed after the
 patch before submiting the patch via Bugzilla.
 
 Thanks!
I forgot: on Linux, just remove the msleep lines completely. It will deadlock soon after the program starts. It seems that Linux is more resistant to this bug than Windows.
Dec 12 2009