digitalmars.D - Bug in phobos Thread module?

Babele Dunnit <babele.dunnit gmail.com> writes:
Hi all,

I am a newbie with D but an experienced C++ developer. This is my first post, so
I MUST say that D ROCKS! Really impressive language. Walter, you are definitely
the Lord of Compilers. Back to the subject: I am writing a massively
multithreaded piece of code, and after some time I get a "failed to start"
error. I dug through the forums and found someone else with the same problem,
but no answer. So it was time to dig into the Thread sources...

...and I see a static destructor cleaning a single handle (the main thread, I
suppose), but no CloseHandle on any other handle created via _beginthreadex (I
am talking about Windows, I should have specified before, sorry). 

I believe there should be an explicit Thread destructor able to free the handle
via CloseHandle; also, because thread handles under Windows are a limited
resource, probably a RAII scheme (or explicit "delete" call) should be used,
instead of waiting for the GC to pass by.

So, I added a CloseHandle(mythread.hdl) call and now my handle count (in the
Task Manager) is much more under control...
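
[Editor's sketch] The RAII idea mentioned above could look roughly like this in present-day D. It is a minimal illustration, not the Phobos code: `acquireHandle`, `releaseHandle`, `liveHandles` and `ScopedHandle` are hypothetical stand-ins (for `_beginthreadex`, `CloseHandle` and the process handle count) so that the example runs on any platform.

```d
import std.stdio : writeln;

// Hypothetical stand-ins for the OS calls; in the thread these would be
// _beginthreadex (acquire) and CloseHandle (release).
int liveHandles = 0;

int acquireHandle()
{
    ++liveHandles;
    return liveHandles; // non-zero, so 0 can mean "no handle"
}

void releaseHandle(int handle)
{
    --liveHandles;
}

// Owns exactly one handle and releases it when the owner goes out of scope.
struct ScopedHandle
{
    private int handle;

    this(int handle) { this.handle = handle; }

    @disable this(this); // no copies, so the handle is released exactly once

    ~this()
    {
        if (handle != 0)
        {
            releaseHandle(handle);
            handle = 0;
        }
    }
}

void main()
{
    foreach (i; 0 .. 1000)
    {
        auto guard = ScopedHandle(acquireHandle());
        assert(liveHandles == 1); // only this iteration's handle is live
    } // guard's destructor runs here, without waiting for a GC cycle

    writeln("live handles after loop: ", liveHandles);
}
```

The point of the struct is that release is tied to scope exit rather than to garbage collection, which is the property asked for above.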

ciao
Bab
May 11 2007
Regan Heath <regan netmail.co.nz> writes:
I suspect you have found a bug, in which case it should be reported in the bug
tracker...  Now, as I've been away for 6 months and forgotten, can someone tell
us both how that's done?

Regan
May 11 2007
david <ta-nospam-zz gmx.at> writes:
Regan Heath wrote:
> I suspect you have found a bug, in which case it should be reported in the bug
> tracker...  Now, as I've been away for 6 months and forgotten, can someone tell
> us both how that's done.
>
> Regan

Look here: http://d.puremagic.com/issues/index.cgi

david
May 11 2007
"Chris Miller" <chris dprogramming.com> writes:
On Fri, 11 May 2007 18:47:04 -0400, Regan Heath <regan netmail.co.nz> wrote:

> I suspect you have found a bug, in which case it should be reported in
> the bug tracker...  Now, as I've been away for 6 months and forgotten,
> can someone tell us both how that's done.
>
> Regan

It has been reported and confirmed several times:
http://d.puremagic.com/issues/show_bug.cgi?id=318
May 11 2007
Walter Bright <newshound1 digitalmars.com> writes:
Babele Dunnit wrote:

> I am a newbie with D but an experienced C++ developer. This is my first post, so
> I MUST say that D ROCKS! Really impressive language. Walter, you are definitely
> the Lord of Compilers. Back to the subject: I am writing a massively
> multithreaded piece of code, and after some time I get a "failed to start"
> error. I dug through the forums and found someone else with the same problem,
> but no answer. So it was time to dig into the Thread sources...
>
> ...and I see a static destructor cleaning a single handle (the main thread, I
> suppose), but no CloseHandle on any other handle created via _beginthreadex (I
> am talking about Windows, I should have specified before, sorry).
>
> I believe there should be an explicit Thread destructor able to free the
> handle via CloseHandle; also, because thread handles under Windows are a
> limited resource, probably a RAII scheme (or explicit "delete" call) should be
> used, instead of waiting for the GC to pass by.
>
> So, I added a CloseHandle(mythread.hdl) call and now my handle count (in the
> Task Manager) is much more under control...

I'm not too experienced with threading, so if you could post a patch that would
be welcome.
May 11 2007
kenny <funisher gmail.com> writes:
Walter Bright wrote:
> Babele Dunnit wrote:
>
>> I am a newbie with D but an experienced C++ developer. This is my first
>> post, so I MUST say that D ROCKS! Really impressive language. Walter,
>> you are definitely the Lord of Compilers. Back to the subject: I am
>> writing a massively multithreaded piece of code, and after some time I
>> get a "failed to start" error. I dug through the forums and found
>> someone else with the same problem, but no answer. So it was time to dig
>> into the Thread sources...
>>
>> ...and I see a static destructor cleaning a single handle (the main
>> thread, I suppose), but no CloseHandle on any other handle created via
>> _beginthreadex (I am talking about Windows, I should have specified
>> before, sorry).
>>
>> I believe there should be an explicit Thread destructor able to free
>> the handle via CloseHandle; also, because thread handles under Windows
>> are a limited resource, probably a RAII scheme (or explicit "delete"
>> call) should be used, instead of waiting for the GC to pass by.
>>
>> So, I added a CloseHandle(mythread.hdl) call and now my handle count
>> (in the Task Manager) is much more under control...
>
> I'm not too experienced with threading, so if you could post a patch that
> would be welcome.

I also experience the same issue on Linux. It happens after I create/delete
over 8000 threads. I dunno why 8000 is the magic number, but it is for my
machine. I'll try to look into it on Linux as well. Just wanted to let you
know that the problem also occurs on Linux.
May 12 2007
Regan Heath <regan netmail.co.nz> writes:
kenny wrote:
> I also experience the same issue on Linux. It happens after I create/delete
> over 8000 threads. I dunno why 8000 is the magic number, but it is for my
> machine. I'll try to look into it on Linux as well. Just wanted to let you
> know that the problem also occurs on Linux.

Each operating system has a maximum number of handles. It differs between
operating systems and between versions of each operating system, and it can
also be set for each process when it is started. Or so I recall from my travels.

Regan Heath
May 12 2007
kenny <funisher gmail.com> writes:
Regan Heath wrote:
> kenny wrote:
>> I also experience the same issue on Linux. It happens after I create/delete
>> over 8000 threads. I dunno why 8000 is the magic number, but it is for my
>> machine. I'll try to look into it on Linux as well. Just wanted to let you
>> know that the problem also occurs on Linux.
>
> Each operating system has a maximum number of handles. It differs between
> operating systems and between versions of each operating system, and it can
> also be set for each process when it is started. Or so I recall from my
> travels.
>
> Regan Heath

Wow, now it's only giving me 382 threads... here is test code. Later today, I
will dip into Phobos and try to find the problem.
May 12 2007
Lars Ivar Igesund <larsivar igesund.net> writes:
kenny wrote:

> Regan Heath wrote:
>> kenny wrote:
>>> I also experience the same issue on Linux. It happens after I
>>> create/delete over 8000 threads. I dunno why 8000 is the magic number,
>>> but it is for my machine. I'll try to look into it on Linux as well.
>>> Just wanted to let you know that the problem also occurs on Linux.
>>
>> Each operating system has a maximum number of handles. It differs between
>> operating systems and between versions of each operating system, and it
>> can also be set for each process when it is started. Or so I recall from
>> my travels.
>>
>> Regan Heath
>
> Wow, now it's only giving me 382 threads... here is test code. Later today,
> I will dip into Phobos and try to find the problem.

For what it's worth, the Tango version of Thread has several fixes and changes
compared to the Phobos version, and also provides other features like thread
local storage and fibers/co-routines.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
May 12 2007
kenny <funisher gmail.com> writes:
Lars Ivar Igesund wrote:
> kenny wrote:
>
>> Regan Heath wrote:
>>> kenny wrote:
>>>> I also experience the same issue on Linux. It happens after I
>>>> create/delete over 8000 threads. I dunno why 8000 is the magic number,
>>>> but it is for my machine. I'll try to look into it on Linux as well.
>>>> Just wanted to let you know that the problem also occurs on Linux.
>>>
>>> Each operating system has a maximum number of handles. It differs between
>>> operating systems and between versions of each operating system, and it
>>> can also be set for each process when it is started. Or so I recall from
>>> my travels.
>>>
>>> Regan Heath
>>
>> Wow, now it's only giving me 382 threads... here is test code. Later today,
>> I will dip into Phobos and try to find the problem.
>
> For what it's worth, the Tango version of Thread has several fixes and
> changes compared to the Phobos version, and also provides other features
> like thread local storage and fibers/co-routines.

well, it looks like I'll be doing one of two things this weekend :)

1. read http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
2. install tango / tangbos

because I have to make a program by Wednesday that needs to create unlimited
threads over unlimited time (e.g. a daemon of sorts). I'll check out tango, as
it sounds a lot easier than bugfixing tango.

Thanks for the tip :)
Kenny
May 12 2007
Lars Ivar Igesund <larsivar igesund.net> writes:
kenny wrote:

> well, it looks like I'll be doing one of two things this weekend :)
>
> 1. read http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html

I can also suggest http://www.dsource.org/projects/tango/wiki/ChapterThreading

> ... as it sounds a lot easier than bugfixing tango.

I hope you meant Phobos ;)

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
May 12 2007
kenny <funisher gmail.com> writes:
Lars Ivar Igesund wrote:
> kenny wrote:
>
>> well, it looks like I'll be doing one of two things this weekend :)
>>
>> 1. read http://yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
>
> I can also suggest http://www.dsource.org/projects/tango/wiki/ChapterThreading
>
>> ... as it sounds a lot easier than bugfixing tango.
>
> I hope you meant Phobos ;)

awesome, thanks for the link. Yep, I meant Phobos :)
May 12 2007
Regan Heath <regan netmail.co.nz> writes:
Walter Bright wrote:
> Babele Dunnit wrote:
>
>> I am a newbie with D but an experienced C++ developer. This is my first post, so
>> I MUST say that D ROCKS! Really impressive language. Walter, you are definitely
>> the Lord of Compilers. Back to the subject: I am writing a massively
>> multithreaded piece of code, and after some time I get a "failed to start"
>> error. I dug through the forums and found someone else with the same problem,
>> but no answer. So it was time to dig into the Thread sources...
>>
>> ...and I see a static destructor cleaning a single handle (the main thread, I
>> suppose), but no CloseHandle on any other handle created via _beginthreadex (I
>> am talking about Windows, I should have specified before, sorry).
>>
>> I believe there should be an explicit Thread destructor able to free the
>> handle via CloseHandle; also, because thread handles under Windows are a
>> limited resource, probably a RAII scheme (or explicit "delete" call) should be
>> used, instead of waiting for the GC to pass by.
>>
>> So, I added a CloseHandle(mythread.hdl) call and now my handle count (in the
>> Task Manager) is much more under control...
>
> I'm not too experienced with threading, so if you could post a patch that
> would be welcome.

I've no idea how to do a patch as such, but I believe all you need is a call to
CloseHandle in:

    extern (Windows) static uint threadstart(void *p)

as this wrapper is used for all threads, and the code at the end of the wrapper
removes the thread from allThreads and does cleanup, e.g.

    debug (thread) printf("Ending thread %d\n", t.idx);
    version (Win32)
    {
        CloseHandle(t.hdl);
        t.hdl = cast(thread_hdl)0;
    }
    t.state = TS.TERMINATED;
    allThreads[t.idx] = null;

I cannot recall (nor do I have old code I can refer to, sadly) whether a
similar call is required on Linux etc. I suspect not, given that pthread_create
returns a thread id as opposed to a handle per se. I have vague recollections
of calling kill, but that may be to forcibly terminate a thread as opposed to
cleaning up a handle.

Regan Heath
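
[Editor's sketch] On the Linux question: POSIX does require cleanup for threads created joinable, which is the default. A finished thread keeps its stack and bookkeeping until someone calls `pthread_join` or `pthread_detach` on it. Whether that accounts for the limits kenny saw is not established in this thread. The following is a small POSIX-only illustration using the `core.sys.posix.pthread` bindings from today's druntime, not the 2007 Phobos source; `worker` and `rounds` are names made up for the example.

```d
import core.sys.posix.pthread;
import std.stdio : writeln;

// Thread entry point with the C signature that pthread_create expects.
extern (C) void* worker(void* arg) nothrow @nogc
{
    return arg; // does no work; the example is only about cleanup
}

void main()
{
    enum rounds = 2_000;
    int joined = 0;

    foreach (i; 0 .. rounds)
    {
        pthread_t tid;
        int rc = pthread_create(&tid, null, &worker, null);
        assert(rc == 0, "pthread_create failed");

        // Joining (or calling pthread_detach(tid) instead) lets the system
        // reclaim the finished thread's stack and bookkeeping.
        rc = pthread_join(tid, null);
        assert(rc == 0, "pthread_join failed");
        ++joined;
    }

    writeln("created and joined ", joined, " threads");
}
```

A thread id is not a handle in the Windows sense, but it still names resources that must be released one way or the other.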
May 12 2007
Regan Heath <regan netmail.co.nz> writes:
Thought you might want some test code to demonstrate the problem.

# import std.thread, std.c.time, std.c.windows.windows;
# 
# class Threader : Thread {
# 	int id;
# 	
# 	this(int _id) { id = _id; printf("thread %d\n",id); }
# 	~this() {printf("deconstructing %d\n",id);}
# 	
# 	int run() {
# 		msleep(1000);
# 		CloseHandle(hdl);
# 		delete this;
# 		return 0;
# 	}
# }
# 
# void main() {
# 	for(uint i = 0; i < 4000; i++) {
# 		(new Threader(i)).start;
# 		msleep(50);
# 	}
# }

If you remove the call to CloseHandle and configure Task Manager to show
handles and threads, you should see the number of threads hold constant at
around 17 while the number of handles climbs steadily.

The addition of CloseHandle causes the number of handles to remain steady also,
at approximately 2x the number of threads.

The correct location for the call to CloseHandle, I believe, is at the end of
threadstart, as shown in my other reply.

Can someone run this same code on Linux (without CloseHandle etc) and use 'top'
or similar to watch the thread and handle counts? I suspect no fix is required
for Linux, but I may be mistaken.
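
[Editor's sketch] For readers trying this today: `std.thread`, `std.c.time` and `msleep` belong to the 2007 Phobos and are not in current releases. A rough portable equivalent, assuming the present-day `core.thread` API, might look like the following. It deliberately differs from the test above: it waits for each worker with `join()` before starting the next one, and uses neither `CloseHandle` nor `delete this`.

```d
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writeln;

class Threader : Thread
{
    int id;

    this(int id)
    {
        super(&run); // hand the thread body to the base class
        this.id = id;
    }

    private void run()
    {
        Thread.sleep(5.msecs); // stand-in for real work
    }
}

void main()
{
    int finished = 0;

    foreach (i; 0 .. 200)
    {
        auto t = new Threader(i);
        t.start();
        t.join(); // wait for the worker at a known point
        ++finished;
    }

    writeln(finished, " threads started and joined");
}
```

Watching the process in Task Manager or 'top' while this runs is the same experiment as above, just against the newer runtime.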
May 13 2007
Regan Heath <regan netmail.co.nz> writes:
One further question: is it actually valid to call delete this in a non-static
method of a class, as shown below?  It seems to work but I would not have
thought it possible...

 # class Threader : Thread {
 # 	int run() {
 # 		delete this;
 # 		return 0;
 # 	}
 # }
Regan Heath
May 13 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 13 May 2007 18:23:13 +0300, Regan Heath <regan netmail.co.nz> wrote:

> One further question, is it actually valid to call delete this in a non-static
> method of a class as shown below.  It seems to work but I would not have
> thought it possible...

I don't see why this wouldn't work, with the condition that you don't access
class fields or virtual methods after deleting the object (this includes
invariants). The above might work in very specific circumstances, but you
should never rely on accessing data marked as deallocated.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 13 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Vladimir Panteleev wrote:
> On Sun, 13 May 2007 18:23:13 +0300, Regan Heath <regan netmail.co.nz> wrote:
>
>> One further question, is it actually valid to call delete this in a non-static
>> method of a class as shown below.  It seems to work but I would not have
>> thought it possible...
>
> I don't see why this wouldn't work, with the condition that you don't access
> class fields or virtual methods after deleting the object (this includes
> invariants). The above might work in very specific circumstances, but you
> should never rely on accessing data marked as deallocated.

The problem I can see with this is that *something* must still have a
reference to the object, otherwise the method couldn't have been called. And
if that's the case, that reference is now left dangling.

(Plus, you can't use this on any class with invariants or they'll blow up.)

I suppose, if you were careful, it would be OK. But frankly, it just seems
wrong... a bit like returning pointers to local variables based on the
assumption they get used before calling something else; technically feasible,
but just *asking* for problems.

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
May 13 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 14 May 2007 02:11:56 +0300, Daniel Keep <daniel.keep.lists gmail.com>
wrote:

> Vladimir Panteleev wrote:
>> On Sun, 13 May 2007 18:23:13 +0300, Regan Heath <regan netmail.co.nz> wrote:
>>
>>> One further question, is it actually valid to call delete this in a non-static
>>> method of a class as shown below.  It seems to work but I would not have
>>> thought it possible...
>>
>> I don't see why this wouldn't work, with the condition that you don't access
>> class fields or virtual methods after deleting the object (this includes
>> invariants). The above might work in very specific circumstances, but you
>> should never rely on accessing data marked as deallocated.
>
> The problem I can see with this is that *something* must still have a
> reference to the object, otherwise the method couldn't have been called. And
> if that's the case, that reference is now left dangling.

Not really. The object context, aka "this", is nothing more than a "hidden"
function parameter. Classes are just pointers to structs with VMTs. So "delete
this" isn't different from deallocating something whose reference was given to
you as a function parameter. The same logic applies: the calling code shouldn't
use the object reference after calling the function/method, just like you
wouldn't use a reference after you delete it explicitly :)
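
[Editor's sketch] The "hidden parameter" view can be checked directly in D. The class and function names below are invented for the illustration; the only language facts relied on are that a method call passes the object as an implicit context, and that a delegate's `.ptr` holds that context.

```d
class Counter
{
    int n;

    void bump() { ++n; } // "this" arrives as a hidden argument
}

// The same operation with the object passed as an explicit parameter.
void bumpFree(Counter self)
{
    ++self.n;
}

void main()
{
    auto c = new Counter;

    c.bump();
    bumpFree(c);
    assert(c.n == 2); // both forms modified the same object

    // A delegate is a function pointer plus the hidden context pointer.
    auto dg = &c.bump;
    assert(dg.ptr is cast(void*) c);

    dg();
    assert(c.n == 3);
}
```

This supports the mechanical half of the argument only; it says nothing about whether self-deletion is good style.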

> I suppose, if you were careful, it would be OK.  But frankly, it just
> seems wrong... a bit like returning pointers to local variables based on
> the assumption they get used before calling something else; technically
> feasible, but just *asking* for problems.

Returning pointers to local variables is comparable to this situation only
when you actually DO use the class's fields or virtual methods after you
delete it. Just like with the stack, memory has been deallocated but usually
not overwritten. In pre-multi-tasking OSes (MS-DOS specifically), data left in
the callees' stacks ran a real risk of being overwritten by hardware interrupt
handlers.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 13 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Vladimir Panteleev wrote:
> On Mon, 14 May 2007 02:11:56 +0300, Daniel Keep <daniel.keep.lists gmail.com>
> wrote:
>
>> Vladimir Panteleev wrote:
>>> On Sun, 13 May 2007 18:23:13 +0300, Regan Heath <regan netmail.co.nz> wrote:
>>>
>>>> One further question, is it actually valid to call delete this in a non-static
>>>> method of a class as shown below.  It seems to work but I would not have
>>>> thought it possible...
>>>
>>> I don't see why this wouldn't work, with the condition that you don't access
>>> class fields or virtual methods after deleting the object (this includes
>>> invariants). The above might work in very specific circumstances, but you
>>> should never rely on accessing data marked as deallocated.
>>
>> The problem I can see with this is that *something* must still have a
>> reference to the object, otherwise the method couldn't have been called. And
>> if that's the case, that reference is now left dangling.
>
> Not really. The object context, aka "this", is nothing more than a "hidden"
> function parameter. Classes are just pointers to structs with VMTs. So "delete
> this" isn't different from deallocating something whose reference was given to
> you as a function parameter. The same logic applies: the calling code shouldn't
> use the object reference after calling the function/method, just like you
> wouldn't use a reference after you delete it explicitly :)

I *know* how classes work; I wrote assembler code a while back to figure out
how the D ABI worked back before it was documented :P

>> I suppose, if you were careful, it would be OK.  But frankly, it just
>> seems wrong... a bit like returning pointers to local variables based on
>> the assumption they get used before calling something else; technically
>> feasible, but just *asking* for problems.
>
> Returning pointers to local variables is comparable to this situation only
> when you actually DO use the class's fields or virtual methods after you
> delete it. Just like with the stack, memory has been deallocated but usually
> not overwritten. In pre-multi-tasking OSes (MS-DOS specifically), data left in
> the callees' stacks ran a real risk of being overwritten by hardware interrupt
> handlers.

My problem with this is basically the same reason Joel asserts that exceptions
are evil: it's not that it necessarily introduces bugs or bad behaviour, but
it sure as hell helps mask it! For instance:

    foo.run();

What about that line indicates that "foo" is no longer a valid reference?
Absolutely nothing; you could argue that "the docs should state that the
object deletes itself", but let's face it: reading the documentation is the
*last* thing most people do. Programmers are lazy by their very definition (if
we *weren't* lazy, we wouldn't be spending all this time and effort trying to
get computers to do stuff for us.)

Hell, I know I've looked at code before that did something sneaky and
completely missed what it was doing, causing me no end of grief: and that's on
code I wrote myself!

On the other hand:

    foo.runThenDelete();

is marginally better, but it's still obfuscating the deletion; what's it
deleting?

    foo.run();
    delete foo;

is not that much more typing, and is explicit about what's going on.

Anyway, just my AU$0.02 :)

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
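
[Editor's sketch] The explicit style argued for above can still be written in present-day D, with one change: later D releases deprecated the `delete` operator, and `destroy` from the runtime is the usual way to run a destructor at a chosen point. The `Job` class is a made-up example, and the sketch assumes nothing about the Thread classes discussed in this thread.

```d
import std.stdio : writeln;

class Job
{
    bool ran;

    void run() { ran = true; }

    ~this() { writeln("Job cleaned up"); }
}

void main()
{
    auto job = new Job;

    job.run();
    assert(job.ran);

    // Teardown is spelled out at the call site instead of being hidden in run().
    destroy(job);

    writeln("after destroy");
}
```

Anyone reading `main` can see where the object stops being usable, which is the property the post is asking for.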
May 13 2007