digitalmars.D - druntime thread_needLock()

dsimcha <dsimcha yahoo.com> writes:

According to both the docs and my own experiments, thread_needLock() in
core.thread returns a bool that depends on whether the current process has
*ever* been multithreaded at *any* point in its execution.  In Phobos's GC
(pre-druntime), a similar function existed, but it returned a bool based on
whether more than 1 thread was *currently* running.  It seems to me that
omitting locks should be safe if no more than 1 thread is currently running,
even if more than 1 was running at some point in the past.  Why is druntime's
thread_needLock() designed the way it is?
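
To make the two behaviours concrete, here is a minimal sketch in D. It is not druntime's or Phobos's actual code: everMultiThreaded, liveThreads, onThreadStart, onThreadExit, needLockSticky and needLockCurrent are hypothetical names invented for illustration, and the runtime's thread hooks are simulated by plain function calls.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;

// Hypothetical bookkeeping, not the runtime's real state.
shared bool everMultiThreaded = false; // sticky: set once, never cleared
shared int  liveThreads = 1;           // current count, starting with the main thread

// Would be called by the (hypothetical) runtime when a thread is started.
void onThreadStart()
{
    atomicStore(everMultiThreaded, true);
    atomicOp!"+="(liveThreads, 1);
}

// Would be called by the (hypothetical) runtime when a thread finishes.
void onThreadExit()
{
    atomicOp!"-="(liveThreads, 1);
}

// Behaviour described for druntime: true once a second thread has ever existed.
bool needLockSticky()
{
    return atomicLoad(everMultiThreaded);
}

// Behaviour described for the old Phobos GC: true only while more than one thread runs.
bool needLockCurrent()
{
    return atomicLoad(liveThreads) > 1;
}

void main()
{
    assert(!needLockSticky() && !needLockCurrent()); // still single threaded
    onThreadStart();
    assert(needLockSticky() && needLockCurrent());   // two threads alive
    onThreadExit();
    assert(needLockSticky() && !needLockCurrent());  // the two policies now disagree
}
```

The last assert is the case this question is about: only one thread is left, yet the sticky flag still asks for locking.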

Dec 05 2008

Sean Kelly <sean invisibleduck.org> writes:

dsimcha wrote:
 According to both the docs and my own experiments, thread_needLock() in
 core.thread returns a bool that depends on whether the current process has
 *ever* been multithreaded at *any* point in its execution.  In Phobos's GC
 (pre-druntime), a similar function existed, but it returned a bool based on
 whether more than 1 thread was *currently* running.  It seems to me that
 omitting locks should be safe if no more than 1 thread is currently running,
 even if more than 1 was running at some point in the past.  Why is druntime's
 thread_needLock() designed the way it is?

Typically, the stores of a terminating thread are only guaranteed to be 
visible when join() returns for that thread... and then to the joining 
thread only.  While it's true that the stores will eventually be visible 
to all threads in a program, there's no easy way to figure out exactly 
when this is (the lock-free people would probably say you'd have to wait 
for a "quiescent state").  I also don't know of any apps that are multi 
threaded for a while and then later become single threaded, so the issue 
of performance loss seems like somewhat of a corner case.


Sean

Dec 05 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-06 08:33:40 +0100, Sean Kelly <sean invisibleduck.org> said:

 dsimcha wrote:
 According to both the docs and my own experiments, thread_needLock() in
 core.thread returns a bool that depends on whether the current process has
 *ever* been multithreaded at *any* point in its execution.  In Phobos's GC
 (pre-druntime), a similar function existed, but it returned a bool based on
 whether more than 1 thread was *currently* running.  It seems to me that
 omitting locks should be safe if no more than 1 thread is currently running,
 even if more than 1 was running at some point in the past.  Why is druntime's
 thread_needLock() designed the way it is?

 
 Typically, the stores of a terminating thread are only guaranteed to be 
 visible when join() returns for that thread... and then to the joining 
 thread only.  While it's true that the stores will eventually be 
 visible to all threads in a program, there's no easy way to figure out 
 exactly when this is (the lock-free people would probably say you'd 
 have to wait for a "quiescent state").  I also don't know of any apps 
 that are multi threaded for a while and then later become single 
 threaded, so the issue of performance loss seems like somewhat of a 
 corner case.
 
 
 Sean

ok so this is the reason, good to know...

Fawzi

Dec 06 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-06 09:44:06 +0100, Fawzi Mohamed <fmohamed mac.com> said:

 On 2008-12-06 08:33:40 +0100, Sean Kelly <sean invisibleduck.org> said:
 
 dsimcha wrote:
 According to both the docs and my own experiments, thread_needLock() in
 core.thread returns a bool that depends on whether the current process has
 *ever* been multithreaded at *any* point in its execution.  In Phobos's GC
 (pre-druntime), a similar function existed, but it returned a bool based on
 whether more than 1 thread was *currently* running.  It seems to me that
 omitting locks should be safe if no more than 1 thread is currently running,
 even if more than 1 was running at some point in the past.  Why is druntime's
 thread_needLock() designed the way it is?

 
 Typically, the stores of a terminating thread are only guaranteed to be 
 visible when join() returns for that thread... and then to the joining 
 thread only.  While it's true that the stores will eventually be 
 visible to all threads in a program, there's no easy way to figure out 
 exactly when this is (the lock-free people would probably say you'd 
 have to wait for a "quiescent state").  I also don't know of any apps 
 that are multi threaded for a while and then later become single 
 threaded, so the issue of performance loss seems like somewhat of a 
 corner case.
 
 
 Sean

 
 ok so this is the reason, good to know...
 
 Fawzi

To track the number of currently running threads instead, a memory barrier 
and atomic decrements would be needed, but I see that this is not portable...

Dec 06 2008

Sean Kelly <sean invisibleduck.org> writes:

Fawzi Mohamed wrote:
 
 a memory barrier would be needed, and atomic decrements, but I see that 
 it is not portable...

It would also somewhat defeat the purpose of thread_needLock, since IMO 
this routine should be fast.  If memory barriers are involved then it 
may as well simply use a mutex itself, and this is exactly what it's 
intended to avoid.


Sean

Dec 06 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-06 17:13:34 +0100, Sean Kelly <sean invisibleduck.org> said:

 Fawzi Mohamed wrote:
 
 a memory barrier would be needed, and atomic decrements, but I see that 
 it is not portable...

 
 It would also somewhat defeat the purpose of thread_needLock, since IMO 
 this routine should be fast.  If memory barriers are involved then it 
 may as well simply use a mutex itself, and this is exactly what it's 
 intended to avoid.

the memory barrier would be needed in the code that decrements the 
number of active threads, so that you are sure that no pending writes 
are still there, (that is the problem that you said brought you to 
switch to a multithreaded flag), not in the code of thread_needLock...

But again I would say that this optimization is not really worth it (as 
you also said it), even if it is relevant for GUI applications.

Fawzi

Dec 06 2008

Sean Kelly <sean invisibleduck.org> writes:

Fawzi Mohamed wrote:
 On 2008-12-06 17:13:34 +0100, Sean Kelly <sean invisibleduck.org> said:
 
 Fawzi Mohamed wrote:
 a memory barrier would be needed, and atomic decrements, but I see 
 that it is not portable...

 It would also somewhat defeat the purpose of thread_needLock, since 
 IMO this routine should be fast.  If memory barriers are involved then 
 it may as well simply use a mutex itself, and this is exactly what 
 it's intended to avoid.

 
 the memory barrier would be needed in the code that decrements the 
 number of active threads, so that you are sure that no pending writes 
 are still there, (that is the problem that you said brought you to 
 switch to a multithreaded flag), not in the code of thread_needLock...

Not true.  You would need an acquire barrier in thread_needLock. 
However, on x86 the point is probably moot since loads have acquire 
semantics anyway.
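
A rough sketch of the counter-based variant under discussion, assuming core.atomic as found in current druntime. The names liveThreads, payload, threadExiting, needLock and worker are hypothetical and not part of druntime. The decrement is the release side and the load inside needLock is the acquire side, so once needLock() returns false the caller should also see the stores the exiting thread made before it decremented the count.

```d
import core.atomic : MemoryOrder, atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;

// Hypothetical bookkeeping, not druntime's actual implementation.
shared int liveThreads = 1; // the main thread
shared int payload;         // stands in for data the worker writes

// Release side: the last thing a thread does before it ends.
void threadExiting()
{
    // atomicOp is assumed to be a sequentially consistent read-modify-write,
    // which is at least as strong as the release ordering needed here.
    atomicOp!"-="(liveThreads, 1);
}

// Acquire side: the check itself carries the barrier.
bool needLock()
{
    return atomicLoad!(MemoryOrder.acq)(liveThreads) > 1;
}

void worker()
{
    atomicStore!(MemoryOrder.raw)(payload, 42); // a plain store made before exiting
    threadExiting();
}

void main()
{
    atomicOp!"+="(liveThreads, 1); // count the worker before starting it
    auto t = new Thread(&worker);
    t.start();

    while (needLock())
        Thread.yield();            // wait until the count drops back to one

    // The acquire load observed the decrement, so the worker's store is visible.
    assert(atomicLoad!(MemoryOrder.raw)(payload) == 42);
    t.join();
}
```

On x86 the acquire load compiles to an ordinary load, which matches the remark that the point is probably moot there.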

 But again I would say that this optimization is not really worth it (as 
 you also said it), even if it is relevant for GUI applications.

:-)


Sean

Dec 06 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-07 03:48:40 +0100, Sean Kelly <sean invisibleduck.org> said:

 Fawzi Mohamed wrote:
 On 2008-12-06 17:13:34 +0100, Sean Kelly <sean invisibleduck.org> said:
 
 Fawzi Mohamed wrote:
 
 a memory barrier would be needed, and atomic decrements, but I see that 
 it is not portable...

 
 It would also somewhat defeat the purpose of thread_needLock, since IMO 
 this routine should be fast.  If memory barriers are involved then it 
 may as well simply use a mutex itself, and this is exactly what it's 
 intended to avoid.

 
 the memory barrier would be needed in the code that decrements the 
 number of active threads, so that you are sure that no pending writes 
 are still there, (that is the problem that you said brought you to 
 switch to a multithreaded flag), not in the code of thread_needLock...

 
 Not true.  You would need an acquire barrier in thread_needLock. 
 However, on x86 the point is probably moot since loads have acquire 
 semantics anyway.

You would need a very good processor to reorder speculative loads 
before a function call and a branch. As far as I know even Alpha did 
not do it.
A volatile statement will probably be enough in all cases, but you are 
right that to be really correct a load barrier should be used, and even 
on a processor where this might matter its cost in the fast path 
will be basically 0 (so still better than a lock).

 
 But again I would say that this optimization is not really worth it (as 
 you also said it), even if it is relevant for GUI applications.

 
 :-)
 
 
 Sean

Dec 07 2008

Sean Kelly <sean invisibleduck.org> writes:

Fawzi Mohamed wrote:
 On 2008-12-07 03:48:40 +0100, Sean Kelly <sean invisibleduck.org> said:
 Not true.  You would need an acquire barrier in thread_needLock. 
 However, on x86 the point is probably moot since loads have acquire 
 semantics anyway.

 
 You would need a very good processor to reorder speculative loads before 
 a function call and a branch. As far as I know even alpha did not do it.

But if thread_needLock() is inlined...

 A volatile statement will probably be enough in all cases, but you are 
 right that to be really correct a load barrier should be used, and even 
 on a processor where this might matter its cost in the fast path 
 will be basically 0 (so still better than a lock).

Aye.  I'd do this if there were a common use case that justified it, but 
I don't see one.


Sean

Dec 07 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-07 09:23:01 +0100, Sean Kelly <sean invisibleduck.org> said:

 Fawzi Mohamed wrote:
 On 2008-12-07 03:48:40 +0100, Sean Kelly <sean invisibleduck.org> said:
 
 Not true.  You would need an acquire barrier in thread_needLock. 
 However, on x86 the point is probably moot since loads have acquire 
 semantics anyway.

 
 You would need a very good processor to reorder speculative loads 
 before a function call and a branch. As far as I know even alpha did 
 not do it.

 
 But if thread_needLock() is inlined...
 
 A volatile statement will probably be enough in all cases, but you are 
 right that to be really correct a load barrier should be used, and even 
 on a processor where this might matter its cost in the fast path 
 will be basically 0 (so still better than a lock).

 
 Aye.  I'd do this if there were a common use case that justified it, 
 but I don't see one.

I fully agree with you (see my answer to Robert Fraser)
 
 
 Sean

Dec 07 2008

Leandro Lucarella <llucax gmail.com> writes:

Sean Kelly, on December 5 at 23:33, you wrote me:
 dsimcha wrote:
 According to both the docs and my own experiments, thread_needLock() in
 core.thread returns a bool that depends on whether the current process has
 *ever* been multithreaded at *any* point in its execution.  In Phobos's GC
 (pre-druntime), a similar function existed, but it returned a bool based on
 whether more than 1 thread was *currently* running.  It seems to me that
 omitting locks should be safe if no more than 1 thread is currently running,
 even if more than 1 was running at some point in the past.  Why is druntime's
 thread_needLock() designed the way it is?

 
 Typically, the stores of a terminating thread are only guaranteed to be
 visible when join() returns for that thread... and then to the joining
 thread only.  While it's true that the stores will eventually be visible
 to all threads in a program, there's no easy way to figure out exactly
 when this is (the lock-free people would probably say you'd have to wait
 for a "quiescent state").  I also don't know of any apps that are multi
 threaded for a while and then later become single threaded, so the issue
 of performance loss seems like somewhat of a corner case.

FYI, I've added this to the druntime FAQ:
http://www.dsource.org/projects/druntime/wiki/DevelFAQ

-- 
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Karma police
arrest this man,
he talks in maths,
he buzzes like a fridge,
he's like a detuned radio.

Dec 06 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-06 06:02:44 +0100, dsimcha <dsimcha yahoo.com> said:

 According to both the docs and my own experiments, thread_needLock() in
 core.thread returns a bool that depends on whether the current process has
 *ever* been multithreaded at *any* point in its execution.  In Phobos's GC
 (pre-druntime), a similar function existed, but it returned a bool based on
 whether more than 1 thread was *currently* running.  It seems to me that
 omitting locks should be safe if no more than 1 thread is currently running,
 even if more than 1 was running at some point in the past.  Why is druntime's
 thread_needLock() designed the way it is?

Indeed I see no real reason not to keep a thread count that would be 
incremented before spawn or in thread_attach, and decremented at the 
end of thread_entryFunction and thread_detach.

Potentially one could imagine badly written code similar to this
  if (thread_needLock()) lock();
  if (thread_needLock()) unlock();
(which breaks if the answer changes between the two calls), or
initializations done unconditionally when the runtime becomes multithreaded,
but I found no issues like this in Tango's runtime; thread_needLock is 
used only to then do synchronized(...){...}
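
One way to rule out the mismatched lock/unlock hazard is to ask once and remember the answer. A minimal sketch, assuming nothing about Tango's or druntime's internals: needLock here is a hand-flipped stand-in for thread_needLock(), and multiThreaded, gate, counter and bump are hypothetical names.

```d
import core.sync.mutex : Mutex;

// Hypothetical stand-in for thread_needLock(); flipped by hand in this sketch.
__gshared bool multiThreaded = false;

bool needLock()
{
    return multiThreaded;
}

__gshared Mutex gate;  // protects counter when locking is needed
__gshared int counter;

void bump()
{
    // Query once, so the lock and the unlock decisions can never disagree,
    // even if the runtime's answer changes while we are inside this function.
    immutable locked = needLock();
    if (locked)
        gate.lock();
    scope (exit)
        if (locked)
            gate.unlock();

    ++counter;
}

void main()
{
    gate = new Mutex;
    bump();               // single-threaded path: no locking
    multiThreaded = true;
    bump();               // multithreaded path: guarded by the mutex
    assert(counter == 2);
}
```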

So yes one could probably switch back to the old Phobos style.
I would guess that it is not really a common situation for a program to 
become single threaded again, though...

Fawzi

Dec 06 2008

Christopher Wright <dhasenan gmail.com> writes:

Fawzi Mohamed wrote:
 So yes one could probably switch back to the old Phobos style.
 I would guess that it is not really a common situation for a program to 
 become single threaded again, though...
 
 Fawzi
 

At work, we have a single-threaded application -- everything happens on 
the GUI thread. There are some operations that take a long time, though. 
For those, we throw up a spinny dialog box. But if these operations 
happened on the GUI thread, the spinny dialog box would not spin. So we 
do the expensive operations on a background thread.

So, our application becomes multithreaded on rare occasions and becomes 
single-threaded again after.

Not sure how common this is.

Dec 06 2008

Leandro Lucarella <llucax gmail.com> writes:

Christopher Wright, on December 6 at 09:06, you wrote me:
 Fawzi Mohamed wrote:
 So yes one could probably switch back to the old Phobos style.
 I would guess that it is not really a common situation for a program to
 become single threaded again, though...
 Fawzi

 At work, we have a single-threaded application -- everything happens on
 the GUI thread. There are some operations that take a long time, though.
 For those, we throw up a spinny dialog box. But if these operations
 happened on the GUI thread, the spinny dialog box would not spin. So we
 do the expensive operations on a background thread.
 
 So, our application becomes multithreaded on rare occasions and becomes
 single-threaded again after.
 
 Not sure how common this is.

I think this is pretty common in GUI applications, but I don't think GUI
applications usually are performance critical, right?

-- 
Leandro Lucarella (luca) | Group blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
You can do better than me. You could throw a dart out the window and hit
someone better than me. I'm no good!
	-- George Constanza

Dec 06 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Leandro Lucarella wrote:
 Christopher Wright, on December 6 at 09:06, you wrote me:
 Fawzi Mohamed wrote:
 So yes one could probably switch back to the old Phobos style.
 I would guess that it is not really a common situation for a program to
 become single threaded again, though...
 Fawzi

 At work, we have a single-threaded application -- everything happens on
 the GUI thread. There are some operations that take a long time, though.
 For those, we throw up a spinny dialog box. But if these operations
 happened on the GUI thread, the spinny dialog box would not spin. So we
 do the expensive operations on a background thread.

 So, our application becomes multithreaded on rare occasions and becomes
 single-threaded again after.

 Not sure how common this is.

 
 I think this is pretty common in GUI applications, but I don't think GUI
 applications usually are performance critical, right?
 

Maya? Combustion? Final Cut Pro? Photoshop? Visual Studio (it shouldn't 
be, but it can get damn slow on occasion)?

Heck, most GUI programs seem like they "could be faster". Opening 
Outlook takes 30 seconds. Firefox takes 5-10 seconds to start. Even 
Windows Explorer feels sluggish (to its credit, much less so than Gnome 
or KDE). I'm not sure if this translates to "performance-critical", but 
it's certainly something to think about.

Dec 06 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-12-07 06:34:20 +0100, Robert Fraser <fraserofthenight gmail.com> said:

 Leandro Lucarella wrote:
 Christopher Wright, on December 6 at 09:06, you wrote me:
 Fawzi Mohamed wrote:
 So yes one could probably switch back to the old Phobos style.
 I would guess that it is not really a common situation for a program to 
 become single threaded again, though...
 Fawzi

 At work, we have a single-threaded application -- everything happens on 
 the GUI thread. There are some operations that take a long time, 
 though. For those, we throw up a spinny dialog box. But if these 
 operations happened on the GUI thread, the spinny dialog box would not 
 spin. So we do the expensive operations on a background thread.
 
 So, our application becomes multithreaded on rare occasions and becomes 
 single-threaded again after.
 
 Not sure how common this is.

 
 I think this is pretty common in GUI applications, but I don't think GUI
 applications usually are performance critical, right?
 

 
 Maya? Combustion? Final Cut Pro? Photoshop? Visual Studio (it shouldn't 
 be, but it can get damn slow on occasion)?
 
 Heck, most GUI programs seem like they "could be faster". Opening 
 Outlook takes 30 seconds. Firefox takes 5-10 seconds to start. Even 
 Windows Explorer feels sluggish (to its credit, much less so than Gnome 
 or KDE). I'm not sure if this translates to "performance-critical", but 
 it's certainly something to think about.

All the examples you gave are heavily multithreaded as far as I know 
(Visual Studio I do not know).
And the standard way to make a GUI more responsive is to make it 
multithreaded (offloading computationally intensive tasks).
The GUI is driven by a single thread, but the application itself is 
multithreaded.
If you have a single-threaded application that is too slow in the 
single-threaded parts, you would probably want to make it multithreaded 
to speed it up.
So is the speedup of the single-threaded parts worth making the runtime 
depend on memory barriers, which are not implemented for every platform?
I don't think so.

Fawzi

Dec 06 2008

BCS <ao pathlink.com> writes:

Reply to Robert,

 Opening
 Outlook takes 30 seconds.

The early (internal) versions of Outlook took 5 MINUTES to open an e-mail. 
The solution ended up using fewer threads IIRC (but probably not a single 
thread).

Dec 07 2008
