digitalmars.D - druntime thread_needLock()

dsimcha <dsimcha yahoo.com> writes:
According to both the docs and my own experiments, thread_needLock() in
core.thread returns a bool that depends on whether the current process has
*ever* been multithreaded at *any* point in its execution.  In Phobos's GC
(pre-druntime), a similar function existed, but it returned a bool based on
whether more than 1 thread was *currently* running.  It seems to me that
omitting locks should be safe if no more than 1 thread is currently running,
even if more than 1 was running at some point in the past.  Why is druntime's
thread_needLock() designed the way it is?
Dec 05 2008
Sean Kelly <sean invisibleduck.org> writes:
Typically, the stores of a terminating thread are only guaranteed to be visible when join() returns for that thread... and then to the joining thread only. While it's true that the stores will eventually be visible to all threads in a program, there's no easy way to figure out exactly when this is (the lock-free people would probably say you'd have to wait for a "quiescent state").

I also don't know of any apps that are multi threaded for a while and then later become single threaded, so the issue of performance loss seems like somewhat of a corner case.

Sean
Dec 05 2008
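
A minimal D sketch of the join() guarantee described above, not druntime code. The names `result` and `computeOnWorker` are hypothetical; the only assumption is that core.thread's `Thread.join()` acts as the synchronization point, so the joining thread (and only that thread) may read the worker's plain stores afterwards.

```d
import core.thread : Thread;

// Plain global, deliberately not atomic. __gshared makes it a single
// variable shared by all threads (D module variables are thread-local
// by default).
__gshared int result;

// Hypothetical helper: run a computation on a worker thread and read
// its store only after join() has returned.
int computeOnWorker()
{
    auto worker = new Thread({ result = 6 * 7; });
    worker.start();
    worker.join();   // from here on the worker's stores are visible to this thread
    return result;
}

unittest
{
    assert(computeOnWorker() == 42);
}
```

A third thread that never called join() gets no such guarantee, which is why a runtime-wide "only one thread is left" test cannot simply piggyback on it.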
Fawzi Mohamed <fmohamed mac.com> writes:
ok so this is the reason, good to know... Fawzi
Dec 06 2008
Fawzi Mohamed <fmohamed mac.com> writes:
To make a count of currently running threads safe, a memory barrier would be needed, plus atomic decrements, but I see that this is not portable...
Dec 06 2008
Sean Kelly <sean invisibleduck.org> writes:
Fawzi Mohamed wrote:
 
 a memory barrier would be needed, and atomic decrements, but I see that 
 it is not portable...

It would also somewhat defeat the purpose of thread_needLock, since IMO this routine should be fast. If memory barriers are involved then it may as well simply use a mutex itself, and this is exactly what it's intended to avoid.

Sean
Dec 06 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-06 17:13:34 +0100, Sean Kelly <sean invisibleduck.org> said:

 It would also somewhat defeat the purpose of thread_needLock, since IMO this routine should be fast. If memory barriers are involved then it may as well simply use a mutex itself, and this is exactly what it's intended to avoid.

The memory barrier would be needed in the code that decrements the number of active threads, so that you are sure no pending writes remain (that is the problem you said made you switch to a multithreaded flag), not in thread_needLock itself...

But again, I would say this optimization is not really worth it (as you also said), even if it is relevant for GUI applications.

Fawzi
Dec 06 2008
Sean Kelly <sean invisibleduck.org> writes:
Fawzi Mohamed wrote:
 the memory barrier would be needed in the code that decrements the number of active threads, so that you are sure that no pending writes are still there, (that is the problem that you said brought you to switch to a multithreaded flag), not in the code of thread_needLock...

Not true. You would need an acquire barrier in thread_needLock. However, on x86 the point is probably moot since loads have acquire semantics anyway.

 But again I would say that this optimization is not really worth it (as
 you also said it), even if it is relevant for GUI applications.

:-) Sean
Dec 06 2008
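
The two halves of this exchange can be sketched in D as follows. This is only an illustration of the proposal under discussion, not how druntime implements thread_needLock(). The names `liveThreads`, `noteThreadStart`, `noteThreadExit` and `needLockByCount` are hypothetical, and the sketch assumes the runtime calls the two hooks at the right moments.

```d
import core.atomic : atomicLoad, atomicOp, MemoryOrder;

// Hypothetical live-thread count; the main thread counts as 1.
shared int liveThreads = 1;

// Assumed to be called by the spawning thread before the new thread runs.
void noteThreadStart()
{
    atomicOp!"+="(liveThreads, 1);
}

// Assumed to be the last thing a terminating thread does. atomicOp is
// sequentially consistent, so it also serves as the release barrier
// that publishes the thread's earlier stores (Fawzi's point).
void noteThreadExit()
{
    atomicOp!"-="(liveThreads, 1);
}

// Count-based stand-in for thread_needLock(). The acquire load pairs
// with the release in noteThreadExit() (Sean's point).
bool needLockByCount()
{
    return atomicLoad!(MemoryOrder.acq)(liveThreads) > 1;
}

unittest
{
    assert(!needLockByCount());   // only the main thread
    noteThreadStart();
    assert(needLockByCount());    // a second thread is alive
    noteThreadExit();
    assert(!needLockByCount());   // single-threaded again
}
```

On x86 the acquire load compiles to an ordinary load, so the fast path stays cheap there; on weaker architectures it costs a barrier, which is the overhead Sean wants to keep out of this routine.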
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-07 03:48:40 +0100, Sean Kelly <sean invisibleduck.org> said:

 Not true. You would need an acquire barrier in thread_needLock. However, on x86 the point is probably moot since loads have acquire semantics anyway.

You would need a very good processor to reorder speculative loads before a function call and a branch. As far as I know, even Alpha did not do it. A volatile statement will probably be enough in all cases, but you are right that to be really correct a load barrier should be used, and even on a processor where this might matter its cost in the fast path will be basically 0 (so still better than a lock).
Dec 07 2008
Sean Kelly <sean invisibleduck.org> writes:
Fawzi Mohamed wrote:
 You would need a very good processor to reorder speculative loads before a function call and a branch. As far as I know even alpha did not do it.

But if thread_needLock() is inlined...

 A volatile statement will probably be enough in all cases, but you are
 right that to be really correct a load barrier should be done, and even
 in a processor where this might matter the cost of it in the fast path
 will be basically 0 (so still better than a lock).

Aye. I'd do this if there were a common use case that justified it, but I don't see one.

Sean
Dec 07 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-07 09:23:01 +0100, Sean Kelly <sean invisibleduck.org> said:

 Aye. I'd do this if there were a common use case that justified it, but I don't see one.

I fully agree with you (see my answer to Robert Fraser)
Dec 07 2008
Leandro Lucarella <llucax gmail.com> writes:
Sean Kelly, on December 5 at 23:33, you wrote me:
 Typically, the stores of a terminating thread are only guaranteed to be visible when join() returns for that thread... and then to the joining thread only. While it's true that the stores will eventually be visible to all threads in a program, there's no easy way to figure out exactly when this is (the lock-free people would probably say you'd have to wait for a "quiescent state").

FYI, I've added this to the druntime FAQ: http://www.dsource.org/projects/druntime/wiki/DevelFAQ

-- Leandro Lucarella (luca)
Dec 06 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-06 06:02:44 +0100, dsimcha <dsimcha yahoo.com> said:

 According to both the docs and my own experiments, thread_needLock() in
 core.thread returns a bool that depends on whether the current process has
 *ever* been multithreaded at *any* point in its execution.  In Phobos's GC
 (pre-druntime), a similar function existed, but it returned a bool based on
 whether more than 1 thread was *currently* running.  It seems to me that
 omitting locks should be safe if no more than 1 thread is currently running,
 even if more than 1 was running at some point in the past.  Why is druntime's
 thread_needLock() designed the way it is?

Indeed, I see no real reason not to keep a thread count that would be incremented before spawn or in thread_attach, and decremented at the end of thread_entryFunction and thread_detach.

Potentially one could think of badly written code similar to this

    if (thread_needLock()) lock();
    if (thread_needLock()) unlock();

or of initializations done unconditionally when the runtime becomes multithreaded, but I found no issues like this in Tango's runtime; thread_needLock is used only to then do synchronized(...){...}

So yes, one could probably switch back to the old Phobos style. I would guess that it is not really a common situation for a program to become single-threaded again, though...

Fawzi
Dec 06 2008
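
A small D sketch of why the double query above is fragile. Everything here is hypothetical: `CountingLock` stands in for a real mutex so the imbalance can be observed, and `everMultiThreaded` with `needLock()` stands in for the runtime's state and thread_needLock(). The point is only that the answer can change between the two calls, so it should be sampled once.

```d
// Hypothetical lock that records its balance instead of blocking:
// depth > 0 means still held, depth < 0 means unlocked without a lock.
struct CountingLock
{
    int depth;
    void lock()   { ++depth; }
    void unlock() { --depth; }
}

// Hypothetical stand-ins for the runtime flag and thread_needLock().
__gshared bool everMultiThreaded;
bool needLock() { return everMultiThreaded; }

// Queries twice, as in the post: the second answer may differ from the first.
int fragile(ref CountingLock l, scope void delegate() work)
{
    if (needLock()) l.lock();
    work();
    if (needLock()) l.unlock();
    return l.depth;
}

// Queries once and reuses the answer for both lock and unlock.
int robust(ref CountingLock l, scope void delegate() work)
{
    immutable locked = needLock();
    if (locked) l.lock();
    work();
    if (locked) l.unlock();
    return l.depth;
}

unittest
{
    // The "work" here makes the program multithreaded mid-section.
    everMultiThreaded = false;
    CountingLock a;
    assert(fragile(a, delegate() { everMultiThreaded = true; }) == -1);

    everMultiThreaded = false;
    CountingLock b;
    assert(robust(b, delegate() { everMultiThreaded = true; }) == 0);
}
```

With a count-based thread_needLock() the answer could also flip from true to false, in which case the fragile version would leave the lock held instead.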
Christopher Wright <dhasenan gmail.com> writes:
Fawzi Mohamed wrote:
 So yes one could probably switch back to the old Phobos style.
 I would guess that it is not really a common situation for a program to 
 become single threaded again, though...
 
 Fawzi
 

At work, we have a single-threaded application -- everything happens on the GUI thread. There are some operations that take a long time, though. For those, we throw up a spinny dialog box. But if these operations happened on the GUI thread, the spinny dialog box would not spin. So we do the expensive operations on a background thread. So, our application becomes multithreaded on rare occasions and becomes single-threaded again after. Not sure how common this is.
Dec 06 2008
Leandro Lucarella <llucax gmail.com> writes:
Christopher Wright, on December 6 at 09:06, you wrote me:
 So, our application becomes multithreaded on rare occasions and becomes single-threaded again after. Not sure how common this is.

I think this is pretty common in GUI applications, but I don't think GUI applications are usually performance-critical, right?

-- Leandro Lucarella (luca)
Dec 06 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Leandro Lucarella wrote:
 I think this is pretty common in GUI applications, but I don't think GUI applications usually are performance critical, right?

Maya? Combustion? Final Cut Pro? Photoshop? Visual Studio (it shouldn't be, but it can get damn slow on occasion)? Heck, most GUI programs seem like they "could be faster". Opening Outlook takes 30 seconds. Firefox takes 5-10 seconds to start. Even Windows Explorer feels sluggish (to its credit, much less so than Gnome or KDE). I'm not sure if this translates to "performance-critical", but it's certainly something to think about.
Dec 06 2008
Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-12-07 06:34:20 +0100, Robert Fraser <fraserofthenight gmail.com> said:

 Maya? Combustion? Final Cut Pro? Photoshop? Visual Studio (it shouldn't be, but it can get damn slow on occasion)? Heck, most GUI programs seem like they "could be faster".

All the examples you gave are heavily multithreaded as far as I know (Visual Studio I do not know). And the standard way to make a GUI more responsive is to make it multithreaded (offloading computationally intensive tasks). The GUI is driven by a single thread, but the application itself is multithreaded. If you have a single-threaded application that is too slow in its single-threaded parts, you would probably want to make it multithreaded to speed it up.

So is the speedup of the single-threaded parts worth making the runtime depend on memory barriers, which are not implemented for every platform? I don't think so.

Fawzi
Dec 06 2008
BCS <ao pathlink.com> writes:
Reply to Robert,

 Opening
 Outlook takes 30 seconds.

The early (internal) versions of Outlook took 5 MINUTES to open an e-mail. The solution ended up including fewer threads, IIRC (but probably not single-threaded).
Dec 07 2008