
digitalmars.D - Thread-local storage and Performance

dsimcha <dsimcha yahoo.com> writes:
Has D's built-in TLS been optimized in the past 6 months to a year? I had benchmarked it a while back when optimizing some code that I wrote and discovered it was significantly slower than regular globals (the kind that are now __gshared). Now, at least on Windows, there seems to be no discernible difference, and if anything, TLS is slightly faster than __gshared. What's changed?
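A minimal sketch of the kind of micro-benchmark being described, assuming a current D2 compiler with Phobos' `std.datetime.stopwatch` (not the 2009 toolchain). The names `tlsCounter`, `sharedCounter`, `bumpTls` and `bumpGshared` are hypothetical. Each timed call also includes function-call overhead, so only large, repeatable differences between the two numbers mean anything; no particular result is implied.

```d
// Hypothetical micro-benchmark: TLS module variable vs. __gshared global.
// Timings depend on compiler, flags and OS; treat them as rough indications only.
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

int tlsCounter;              // module-level variables are thread-local by default in D2
__gshared int sharedCounter; // one unsynchronized instance shared by all threads

void bumpTls()     { ++tlsCounter; }
void bumpGshared() { ++sharedCounter; }

void main()
{
    enum uint runs = 10_000_000;
    // benchmark!(f, g)(n) calls each function n times and returns one Duration per function.
    auto results = benchmark!(bumpTls, bumpGshared)(runs);
    writefln("TLS:       %s", results[0]);
    writefln("__gshared: %s", results[1]);
}
```

Comparing the generated code for `bumpTls` and `bumpGshared` is usually more informative than the raw timings.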
Oct 26 2009
Pelle Månsson <pelle.mansson gmail.com> writes:
dsimcha wrote:
 Now, at least on Windows, it seems that there is no discernible difference and if anything, TLS is slightly faster than __gshared.
I was under the impression that TLS should be faster due to the absence of synchronization.
Oct 26 2009
Denis Koroskin <2korden gmail.com> writes:
On Mon, 26 Oct 2009 18:26:02 +0300, Pelle Månsson <pelle.mansson gmail.com> wrote:
 I was under the impression that TLS should be faster due to absence of synchronization.
__gshared variables don't have any locks or barriers associated with them. TLS should be slightly slower due to an additional indirection, but I don't think it would be noticeable.
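To illustrate that __gshared carries no locks or barriers of its own, here is a hedged sketch (modern D2; `counter` and `work` are hypothetical names) in which two threads update a __gshared counter. Any synchronization has to be written by hand, here with a `synchronized` statement; without it, concurrent increments could be lost.

```d
// Sketch: __gshared adds no implicit locking, so concurrent updates need an explicit lock.
import core.thread : Thread;
import std.stdio : writeln;

__gshared int counter; // plain global: no implicit synchronization

void work()
{
    foreach (i; 0 .. 100_000)
    {
        // A synchronized statement without an argument uses one mutex tied to
        // this statement, so only one thread at a time runs the increment.
        synchronized
        {
            ++counter;
        }
    }
}

void main()
{
    auto t1 = new Thread(&work).start();
    auto t2 = new Thread(&work).start();
    t1.join();
    t2.join();
    writeln(counter); // prints 200000
}
```

The lock is the cost you opt into; a TLS variable avoids it only because each thread works on its own copy.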
Oct 26 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Pelle Månsson (pelle.mansson gmail.com)'s article
 I was under the impression that TLS should be faster due to absence of synchronization.
__gshared == old-skool cowboy sharing, i.e. plain old unsynchronized globals.

Without getting into the details of my specific case, the reason I'm interested in this is that I have some code that I want to be as fast as possible in both single- and multithreaded environments. Right now, it has a hack that checks thread_needLock() and uses plain old globals for everything as long as the program is single-threaded, because that seemed faster than TLS lookups a while ago. However, rerunning the same benchmark now shows otherwise.
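The single-threaded fast path described above might look roughly like the following sketch. It is not dsimcha's actual code: the `multiThreaded` flag is a hypothetical stand-in for the thread_needLock() check, and `scratch`, `globalBuffer` and `tlsBuffer` are invented names. One consequence is visible in `scratch()`: every access pays for a branch, which may eat into whatever the plain global saves.

```d
// Hypothetical "single-threaded fast path": a plain global while only one thread
// exists, per-thread storage afterwards. Not the original code from the thread.
import core.thread : Thread;
import std.stdio : writeln;

__gshared bool multiThreaded;    // hypothetical flag standing in for thread_needLock()
__gshared size_t[] globalBuffer; // used while the program is single-threaded
size_t[] tlsBuffer;              // thread-local: each thread has its own

// Every access pays for this branch, even on the single-threaded path.
ref size_t[] scratch()
{
    if (multiThreaded)
        return tlsBuffer;
    return globalBuffer;
}

void main()
{
    scratch() ~= 1;       // single-threaded: appends to globalBuffer
    multiThreaded = true; // hypothetically flipped before the first extra thread starts
    auto t = new Thread({ scratch() ~= 2; }).start(); // appends to the worker's own tlsBuffer
    t.join();
    writeln(globalBuffer, " ", tlsBuffer); // prints "[1] []"
}
```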
Oct 26 2009
Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 Right now, it has a hack that checks thread_needLock() and uses plain old globals for everything as long as the program is single-threaded because that seemed faster than TLS lookups a while ago. However, running the same benchmark again shows otherwise.
Nothing has changed. What I would do is look at the assembler output and verify that the TLS globals really are TLS, and that the ones that aren't really aren't.
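Walter's suggestion is to read the generated assembly. As a complementary runtime check (a sketch, not his method), one can compare a variable's address on two threads: a thread-local variable has a different address in each thread, while a __gshared one has a single address. The names `tlsVar`, `gVar` and `recordAddresses` are hypothetical.

```d
// Runtime check: a thread-local variable has a different address in every thread,
// a __gshared variable has exactly one address.
import core.thread : Thread;
import std.stdio : writeln;

int tlsVar;         // expected to be thread-local (D2 default)
__gshared int gVar; // expected to be a plain global

__gshared int* tlsAddrInWorker; // addresses as seen from the worker thread
__gshared int* gAddrInWorker;

void recordAddresses()
{
    tlsAddrInWorker = &tlsVar;
    gAddrInWorker = &gVar;
}

void main()
{
    auto t = new Thread(&recordAddresses).start();
    t.join();
    writeln("tlsVar is thread-local: ", &tlsVar != tlsAddrInWorker); // prints "... true"
    writeln("gVar is shared:         ", &gVar == gAddrInWorker);     // prints "... true"
}
```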
Oct 26 2009