
digitalmars.D - Testing some singleton implementations

reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
There was a nice blog-post about implementing low-lock singletons in D, here:
http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/

One suggestion on Reddit was by dawgfoto (I think this is Martin
Nowak?), to use atomic primitives instead:
http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/c9tmz07

I wanted to benchmark these different approaches. I was expecting
Martin's implementation to be the fastest one, but on my machine
(Athlon II X4 620 - 2.61GHz) the implementation in the blog post turns
out to be the fastest one. I'm wondering whether my test case is
flawed in some way. Btw, I think we should put an implementation of
this into Phobos.

The timings on my machine:

Test time for LockSingleton: 542 msecs.
Test time for SyncSingleton: 20 msecs.
Test time for AtomicSingleton: 755 msecs.

Here's the code:

http://codepad.org/TMb0xxYw

And pasted below for convenience:

-----
module singleton;

import std.concurrency;
import core.atomic;
import core.thread;

class LockSingleton
{
    static LockSingleton get()
    {
        __gshared LockSingleton _instance;

        synchronized
        {
            if (_instance is null)
                _instance = new LockSingleton;
        }

        return _instance;
    }

private:
    this() { }
}

class SyncSingleton
{
    static SyncSingleton get()
    {
        static bool _instantiated;  // tls
        __gshared SyncSingleton _instance;

        if (!_instantiated)
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new SyncSingleton;

                _instantiated = true;
            }
        }

        return _instance;
    }

private:
    this() { }
}

class AtomicSingleton
{
    static AtomicSingleton get()
    {
        shared bool _instantiated;
        __gshared AtomicSingleton _instance;

        // only enter synchronized block if not instantiated
        if (!atomicLoad!(MemoryOrder.acq)(_instantiated))
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new AtomicSingleton;

                atomicStore!(MemoryOrder.rel)(_instantiated, true);
            }
        }

        return _instance;
    }
}

version (unittest)
{
    ulong _thread_call_count;  // TLS
}

unittest
{
    import std.datetime;
    import std.stdio;
    import std.string;
    import std.typetuple;

    foreach (TestClass; TypeTuple!(LockSingleton, SyncSingleton, AtomicSingleton))
    {
        // mixin to avoid multiple definition errors
        mixin(q{

        static void test_%1$s()
        {
            foreach (i; 0 .. 1024_000)
            {
                // just trying to keep the compiler from doing dead-code optimization
                _thread_call_count += (TestClass.get() !is null);
            }
        }

        auto sw = StopWatch(AutoStart.yes);

        enum threadCount = 4;
        foreach (i; 0 .. threadCount)
            spawn(&test_%1$s);
        thread_joinAll();

        }.format(TestClass.stringof));

        sw.stop();
        writefln("Test time for %s: %s msecs.", TestClass.stringof,
sw.peek.msecs);
    }
}

void main() { }
-----
Jan 31 2014
next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
You forgot to make the flag static for AtomicSingleton. I'd also 
move the timing into the threads themselves, for fairness :)

http://codepad.org/gvm3A88k
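
For reference, in case the paste rots: the fix is essentially a one-word change. A minimal sketch of the corrected class (same imports as the original listing):

-----
class AtomicSingleton
{
    static AtomicSingleton get()
    {
        // 'static' makes this a single process-global flag; without it,
        // every call creates a fresh local flag that is always false,
        // so every call takes the synchronized path (the bug above)
        static shared bool _instantiated;
        __gshared AtomicSingleton _instance;

        if (!atomicLoad!(MemoryOrder.acq)(_instantiated))
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new AtomicSingleton;

                atomicStore!(MemoryOrder.rel)(_instantiated, true);
            }
        }

        return _instance;
    }
}
-----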

Timings on my machine:

ldc2 -unittest -release -O3:

Test time for LockSingleton: 537 msecs.
Test time for SyncSingleton: 2 msecs.
Test time for AtomicSingleton: 2.25 msecs.

dmd -unittest -release -O -inline:

Test time for LockSingleton: 451.5 msecs.
Test time for SyncSingleton: 7.75 msecs.
Test time for AtomicSingleton: 99.75 msecs.
Jan 31 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 You forgot to make the flag static for AtomicSingleton.
Ah. It was copied verbatim from reddit, I guess we both missed it.
 Timings on my machine:

 ldc2 -unittest -release -O3:

 Test time for LockSingleton: 537 msecs.
 Test time for SyncSingleton: 2 msecs.
 Test time for AtomicSingleton: 2.25 msecs.
Here's mine:

$ dmd -release -inline -O -noboundscheck -unittest -run singleton.d

Test time for LockSingleton: 577.5 msecs.
Test time for SyncSingleton: 9.25 msecs.
Test time for AtomicSingleton: 159.75 msecs.

Maybe ldc's optimizer is just much better at this? In either case, how come the atomic version is slower?
Jan 31 2014
parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 10:39:19 UTC, Andrej Mitrovic wrote:
 On 1/31/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 You forgot to make the flag static for AtomicSingleton.
Ah. It was copied verbatim from reddit, I guess we both missed it.
Yeah, with D's verbosity in these cases it's easy to miss.
 Here's mine:

 $ dmd -release -inline -O -noboundscheck -unittest -run 
 singleton.d

 Test time for LockSingleton: 577.5 msecs.
 Test time for SyncSingleton: 9.25 msecs.
 Test time for AtomicSingleton: 159.75 msecs.

 Maybe ldc's optimizer is just much better at this?
It is :) http://forum.dlang.org/thread/lqmqsnucadaqlkxkoffc forum.dlang.org
 In either case how come the atomic version is slower?
It may not be universally true, as Dmitry mentioned. On some platforms, TLS could be slow but atomics fast. I'm suspecting that on Windows TLS could be slower, actually.
Jan 31 2014
prev sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 31.01.2014 10:18, Stanislav Blinov wrote:
 You forgot to make the flag static for AtomicSingleton. I'd also move
 the timing into the threads themselves, for fairness :)

 http://codepad.org/gvm3A88k

 Timings on my machine:

 ldc2 -unittest -release -O3:

 Test time for LockSingleton: 537 msecs.
 Test time for SyncSingleton: 2 msecs.
 Test time for AtomicSingleton: 2.25 msecs.

 dmd -unittest -release -O -inline:

 Test time for LockSingleton: 451.5 msecs.
 Test time for SyncSingleton: 7.75 msecs.
 Test time for AtomicSingleton: 99.75 msecs.
For x86 CPUs you don't really need MemoryOrder.acq as reads are atomic by default. So I replaced that with MemoryOrder.raw and named it AtomicSingletonRaw.

On Windows 7:

dmd -unittest -release -O -inline -noboundscheck

Test time for LockSingleton: 299 msecs.
Test time for SyncSingleton: 5 msecs.
Test time for AtomicSingleton: 304 msecs.
Test time for AtomicSingletonRaw: 280 msecs.

ldc2 -release -unittest -O3

Test time for LockSingleton: 320 msecs.
Test time for SyncSingleton: 2 msecs.
Test time for AtomicSingleton: 271 msecs.
Test time for AtomicSingletonRaw: 209 msecs.

It seems that the SyncSingleton is superior in all cases.
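
A minimal sketch of that raw variant (assuming the static-flag fix from earlier in the thread; only the load ordering differs from AtomicSingleton):

-----
class AtomicSingletonRaw
{
    static AtomicSingletonRaw get()
    {
        static shared bool _instantiated;
        __gshared AtomicSingletonRaw _instance;

        // raw emits no barrier for the load; a plain x86 load is
        // already atomic, as discussed below
        if (!atomicLoad!(MemoryOrder.raw)(_instantiated))
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new AtomicSingletonRaw;

                atomicStore!(MemoryOrder.rel)(_instantiated, true);
            }
        }

        return _instance;
    }
}
-----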
Jan 31 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Benjamin Thaut <code benjamin-thaut.de> wrote:
 For x86 CPUs you don't really need MemoryOrder.acq as reads are atomic
 by default.
Hmm, I guess we could use a version(X86) block to pick this. When you say x86, do you also imply X86_64? Where can I read about the memory reads being atomic by default?
Jan 31 2014
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
On 31.01.2014 12:44, Andrej Mitrovic wrote:
 On 1/31/14, Benjamin Thaut <code benjamin-thaut.de> wrote:
 For x86 CPUs you don't really need MemoryOrder.acq as reads are atomic
 by default.
Hmm, I guess we could use a version(X86) block to pick this. When you say x86, do you also imply X86_64? Where can I read about the memory reads being atomic by default?
It depends on the processor architecture. Usually, if you have a "normal" CPU architecture, it guarantees a consistent view of memory, meaning all reads and writes are atomic (but not read-modify-write, or even read-write). Usually only NUMA architectures don't guarantee a consistent view of memory, resulting in reads and writes not being atomic. For example, the Intel Itanium architecture does not guarantee this. But usually all single-processor architectures guarantee a consistent view of memory; I have not come across one yet that didn't (so ARM, PPC, X86 and X86_64 all have atomic reads/writes).

Also see: http://en.wikipedia.org/wiki/Cache_coherence
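
A sketch of the version(X86) selection Andrej suggests (hypothetical code; note that later in the thread acq/rel is argued to be the safer choice after all):

-----
// Hypothetical: pick the cheapest safe load ordering per architecture.
version (X86)
    enum loadOrder = MemoryOrder.raw;  // plain x86 loads are atomic
else version (X86_64)
    enum loadOrder = MemoryOrder.raw;
else
    enum loadOrder = MemoryOrder.acq;  // be conservative elsewhere

// ...and in get():
//     if (!atomicLoad!loadOrder(_instantiated)) { ... }
-----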
Jan 31 2014
parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
If you need the details, read:

http://lwn.net/Articles/250967/

Kind Regards
Benjamin Thaut
Jan 31 2014
next sibling parent reply "Jonathan Bettencourt" <jbetten gmail.com> writes:
Is it just me or does the implementation of atomic.d look grossly 
inefficient and badly in need of a rewrite?
Jan 31 2014
parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 31.01.2014 15:27, Jonathan Bettencourt wrote:
 Is it just me or does the implementation of atomic.d look grossly
 inefficient and badly in need of a rewrite?
I can't really judge that, as I don't have much experience in lock-free programming. But if someone is to rewrite this module, then it should be someone with quite some experience in lock-free programming. Taking a look at the memory model of C++11 and copying from there might not hurt either.
Jan 31 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Benjamin Thaut <code benjamin-thaut.de> wrote:
 If you need the details, read:

 http://lwn.net/Articles/250967/
Aye, it's been on my todo list forever, even though I've read the first part when it was a single blog post, afair.
Jan 31 2014
parent Benjamin Thaut <code benjamin-thaut.de> writes:
On 31.01.2014 15:30, Andrej Mitrovic wrote:
 On 1/31/14, Benjamin Thaut <code benjamin-thaut.de> wrote:
 If you need the details, read:

 http://lwn.net/Articles/250967/
Aye, it's been on my todo list forever, even though I've read the first part when it was a single blog post, afair.
You should really take the time to read it. It's one of the best articles on the internet I ever read, and it has tons of relevant information for programmers. You can skip the first chapter, as it mostly talks about the hardware details of how memory works, and why it is hard to make it faster.
Jan 31 2014
prev sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 11:31:53 UTC, Benjamin Thaut wrote:

 For x86 CPUs you don't really need MemoryOrder.acq as reads are 
 atomic by default.
Uhm... atomicLoad() itself guarantees that the read is atomic. It's not about atomicity of operation, it's about sequential consistency. Using raw in this case is safe because the further synchronized block guarantees that this read will not be reordered to follow write. In fact, the presence of that synchronized block allows for making both load and store raw.
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:

synchronized block:

         // (2)
         if (!atomicLoad!(MemoryOrder.raw)(_instantiated))
         {
             // (1)
             synchronized
             { // <- this is 'acquire'
                 if (_instance is null) {
                     _instance = new AtomicSingleton;
                 }

             } // <- this is 'release'

             // This store cannot be moved to positions (1) or (2) 
because
             // of 'synchronized' above
             atomicStore!(MemoryOrder.raw)(_instantiated, true);
         }
Jan 31 2014
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
31-Jan-2014 17:26, Stanislav Blinov wrote:

 synchronized block:

          // (2)
          if (!atomicLoad!(MemoryOrder.raw)(_instantiated))
          {
              // (1)
              synchronized
              { // <- this is 'acquire'
                  if (_instance is null) {
//(3)
                      _instance = new AtomicSingleton;
                  }

              } // <- this is 'release'
//(4)
              // This store cannot be moved to positions (1) or (2) because
              // of 'synchronized' above
              atomicStore!(MemoryOrder.raw)(_instantiated, true);
          }
No it's not - the second thread may get to (3) while some other thread is at (4).

-- 
Dmitry Olshansky
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 15:18:43 UTC, Dmitry Olshansky
wrote:
31-Jan-2014 17:26, Stanislav Blinov wrote:

 the
 synchronized block:

         // (2)
         if (!atomicLoad!(MemoryOrder.raw)(_instantiated))
         {
             // (1)
             synchronized
             { // <- this is 'acquire'
                 if (_instance is null) {
//(3)
                     _instance = new AtomicSingleton;
                 }

             } // <- this is 'release'
//(4)
             // This store cannot be moved to positions (1) or 
 (2) because
             // of 'synchronized' above
             atomicStore!(MemoryOrder.raw)(_instantiated, true);
         }
No it's not - the second thread may get to (3) while some other thread is at (4).
Nope. The only way the thread is going to end up past the null check is if it's instantiating the singleton. It's inside the locked region. As long as the bool is false, one of the threads will get inside the synchronized block; all others will lock. Once that "first" thread is done, the others will see a non-null reference. No thread can get to (4) until the singleton is created.
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 23:35:25 UTC, Stanislav Blinov 
wrote:

        // (2)
        if (!atomicLoad!(MemoryOrder.raw)(_instantiated))
        {
            // (1)
            synchronized
            { // <- this is 'acquire'
                if (_instance is null) {
//(3)
                    _instance = new AtomicSingleton;
                }

            } // <- this is 'release'
//(4)
            // This store cannot be moved to positions (1) or 
 (2) because
            // of 'synchronized' above
            atomicStore!(MemoryOrder.raw)(_instantiated, true);
        }
No it's not - the second thread may get to (3) while some other thread is at (4).
 Nope. The only way the thread is going to end up past the null check is if it's instantiating the singleton. It's inside the locked region. As long as the bool is false, one of the threads will get inside the synchronized block; all others will lock. Once that "first" thread is done, the others will see a non-null reference. No thread can get to (4) until the singleton is created.
To clarify: only one thread will ever get to position (3). All others that follow it will see that _instance is not null, thus will just leave the synchronized section. Of course, this means that some N threads (that arrived to the synchronized section before the singleton was created) will all write 'true' into the flag. No big deal :)
Feb 01 2014
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
01-Feb-2014 18:23, Stanislav Blinov wrote:
 On Friday, 31 January 2014 at 23:35:25 UTC, Stanislav Blinov wrote:

        // (2)
        if (!atomicLoad!(MemoryOrder.raw)(_instantiated))
        {
            // (1)
            synchronized
            { // <- this is 'acquire'
                if (_instance is null) {
//(3)
                    _instance = new AtomicSingleton;
                }

            } // <- this is 'release'
//(4)
            // This store cannot be moved to positions (1) or (2)
 because
            // of 'synchronized' above
            atomicStore!(MemoryOrder.raw)(_instantiated, true);
        }
No it's not - the second thread may get to (3) while some other thread is at (4).
 Nope. The only way the thread is going to end up past the null check is if it's instantiating the singleton. It's inside the locked region. As long as the bool is false, one of the threads will get inside the synchronized block; all others will lock. Once that "first" thread is done, the others will see a non-null reference. No thread can get to (4) until the singleton is created.
To clarify: only one thread will ever get to position (3). All others that follow it will see that _instance is not null, thus will just leave the synchronized section. Of course, this means that some N threads (that arrived to the synchronized section before the singleton was created) will all write 'true' into the flag. No big deal :)
Yes, I see there could be many writes to the _instantiated field but not _instance.

-- 
Dmitry Olshansky
Feb 01 2014
prev sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
There's a lot more to these singletons than meets the eye.

- It would seem that such usage of raw MemoryOrder in 
AtomicSingleton would be wrong (e.g. return to acq/rel is in 
order, which should not pose any performance issues on X86, as 
Sean mentioned).

- The instance references should be qualified shared (see the sketch below).

This needs more serious review, even if only for academic 
purposes. I'll see what I can come up with :)
In the meantime, if anyone has anything to add to the list, 
please chime in!
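
For the second point, one possible shape — a sketch only, keeping the SyncSingleton-style TLS fast path; whether the cast can be avoided is exactly the kind of thing such a review would need to settle:

-----
class SharedSingleton
{
    // shared: the type system now knows the instance crosses threads
    private static shared SharedSingleton _instance;

    static shared(SharedSingleton) get()
    {
        static bool _instantiated;  // TLS fast path, as in SyncSingleton

        if (!_instantiated)
        {
            synchronized
            {
                if (_instance is null)
                    _instance = cast(shared) new SharedSingleton;

                _instantiated = true;
            }
        }

        return _instance;
    }

private:
    this() { }
}
-----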
Feb 07 2014
next sibling parent "Jonathan Bettencourt" <jbetten gmail.com> writes:
On Friday, 7 February 2014 at 20:09:29 UTC, Stanislav Blinov 
wrote:
 There's a lot more to these singletons than meets the eye.

 - It would seem that such usage of raw MemoryOrder in 
 AtomicSingleton would be wrong (e.g. return to acq/rel is in 
 order, which should not pose any performance issues on X86, as 
 Sean mentioned).
I agree that acq/rel is the correct way to go, but it will cause performance issues with the current implementation of AtomicLoad.
Feb 07 2014
prev sibling parent reply "Cecil Ward" <d cecilward.com> writes:
On Friday, 7 February 2014 at 20:09:29 UTC, Stanislav Blinov
wrote:
 There's a lot more to these singletons than meets the eye.

 - It would seem that such usage of raw MemoryOrder in 
 AtomicSingleton would be wrong (e.g. return to acq/rel is in 
 order, which should not pose any performance issues on X86, as 
 Sean mentioned).

 - The instance references should be qualified shared.

 This needs more serious review, even if only for academic 
 purposes. I'll see what I can come up with :)
 In the meantime, if anyone has anything to add to the list, 
 please chime in!
Hi Martin, Sean, Stanislav et al,

I would quite like to code-review atomics.d and maybe think about improving the documentation and adding a few comments, especially for the purposes of knowledge capture in this sticky field.

Would that be ok, in principle?

There are a few rough edges here and there _in my very unworthy opinion_, and the odd bit that doesn't look quite right somehow, especially in the x64 branch. If I could even find the odd bug then that would be good. Or rather bad.

A big amount of work has clearly gone into this module. So, many beers to Sean and others who put their time into it. Research can be quite a pig too on a project of this kind, I would imagine.

There is quite a list of things that I'm currently unclear about when I read through the D, and this might mean me whimpering for help occasionally..?

Best,
Cecil.
Feb 27 2014
parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 28 February 2014 at 00:29:49 UTC, Cecil Ward wrote:
 On Friday, 7 February 2014 at 20:09:29 UTC, Stanislav Blinov
 This needs more serious review, even if only for academic 
 purposes. I'll see what I can come up with :)
 In the meantime, if anyone has anything to add to the list, 
 please chime in!
Hi Martin, Sean, Stanislav et al I would quite like to code-review atomics.d
When I said "review" I meant this specific issue, i.e. singletons. Since then I got a bit carried away into general issues with the 'shared' qualifier, so for me the quirks of singletons are on hold for now. But if you find other bugs (in atomic.d or anywhere else), inconsistencies, documentation omissions, etc., please post them.

This thread clearly shows the value of more thorough testing. Who knows how long it would've taken to notice that atomicLoad() issue if Andrej hadn't created this thread.
 and maybe think about improving the documentation and adding a 
 few comments, especially
 for the purposes of knowledge capture in this sticky field.

 Would that be ok, in principle?
IMO submitting issues, enhancements, and documentation updates is always a good idea. Though don't be surprised if your submissions hang in the air for a while; it's pretty common, esp. when the people responsible for the original code are busy with other things.
 There are a few rough edges here and there _in my very unworthy
 opinion_, and the odd bit that doesn't look quite right somehow
 especially in the x64 branch. If I could even find the odd bug
 then that would be good. Or rather bad.

 A big amount of work has clearly gone into this module. So, many
 beers to Sean and others who put their time into it. Research 
 can
 be quite a pig too on a project of this kind, I would imagine.
Use bugzilla (https://d.puremagic.com/issues/) to submit issues/enhancement requests; or submit ready pull requests on github so that they can be reviewed, improved, and if all is good, eventually accepted. It's best done that way since it presents clear history and more focused discussion, and because threads in this NG sink rather quickly.
 There is quite a list of things that I'm currently unclear about
 when I read through the D, and this might mean me whimpering for
 help occasionally..?
I don't see a big red banner saying "don't post your questions here" anywhere ;)
Mar 03 2014
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
31-Jan-2014 12:25, Andrej Mitrovic wrote:
 There was a nice blog-post about implementing low-lock singletons in D, here:
 http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/

 One suggestion on Reddit was by dawgfoto (I think this is Martin
 Nowak?), to use atomic primitives instead:
 http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/c9tmz07

 I wanted to benchmark these different approaches. I was expecting
 Martin's implementation to be the fastest one, but on my machine
 (Athlon II X4 620 - 2.61GHz) the implementation in the blog post turns
 out to be the fastest one.
And it was a big thing because of that. Also keep in mind that atomic ops are _relatively_ cheap on x86; the stuff should get even better on, say, ARM.

-- 
Dmitry Olshansky
Jan 31 2014
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 Also keep in mind that atomic
 ops are _relatively_ cheap on x86; the stuff should get even better on,
 say, ARM.
Hmm yeah, but I was expecting better numbers. Even after the 'static' bug fix noted by Stanislav, the atomic version is slower.
Jan 31 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 Hmm yeah, but I was expecting better numbers. Even after the 'static'
 bug fix noted by Stanislav, the atomic version is slower.
Actually, I think I understand why this happens. Logically, the atomic version will do an atomic read for *every* access, whereas the TLS implementation only checks a thread-local boolean flag. Even though the TLS implementation forces each new thread to enter the synchronized block *on the first read for that thread*, on subsequent reads that thread will not enter the synchronized block anymore.

After the very first call of every thread, the cost of the read operation for the TLS version is a TLS read, whereas for the atomic version it is an atomic read. I guess TLS read operations simply beat atomic read operations.

The atomic implementation probably beats the TLS version when a lot of new threads are being spawned at once and they only retrieve the singleton which has already been initialized. E.g., say a 1000 threads are spawned. In the atomic version, the 1000 threads will all do an atomic read and not enter the synchronized block, whereas in the TLS version the 1000 threads will all need to enter a synchronized block on the very first read.
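
To make that concrete, here is a minimal sketch that times just the two steady-state reads in isolation (the names are made up for this example, and an optimizer may hoist the loads out of the loops, so treat the numbers as a rough guide only):

-----
import core.atomic;
import std.datetime;
import std.stdio;

bool tlsFlag = true;            // thread-local, like SyncSingleton's fast path
shared bool sharedFlag = true;  // read atomically, like AtomicSingleton's

void main()
{
    enum N = 10_000_000;
    ulong sink;  // prevent dead-code elimination

    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. N)
        sink += tlsFlag;  // plain TLS read
    writefln("TLS read:    %s msecs.", sw.peek.msecs);

    sw.reset();
    foreach (i; 0 .. N)
        sink += atomicLoad!(MemoryOrder.acq)(sharedFlag);  // atomic read
    writefln("atomic read: %s msecs.", sw.peek.msecs);

    writeln(sink);
}
-----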
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 10:57:53 UTC, Andrej Mitrovic wrote:

 The atomic implementation probably beats the TLS version when a 
 lot of
 new threads are being spawned at once and they only retrieve the
 singleton which has already been initialized. E.g., say a 1000 
 threads
 are spawned.
Easy enough to test. But inconclusive. I just ran some tests with 1024 threads :)

First, subsequent runs on my machine show interleaving results:

Test time for SyncSingleton: 61.2334 msecs.
Test time for AtomicSingleton: 15.9795 msecs.

Test time for SyncSingleton: 11.209 msecs.
Test time for AtomicSingleton: 25.4395 msecs.

Test time for SyncSingleton: 22.8105 msecs.
Test time for AtomicSingleton: 35.1865 msecs.

I guess I'd need a different CPU (and probably one that's not doing anything else at the time) to get conclusive results.

It also seems that either there *is* a race in there somewhere, or maybe a bug?.. Some runs just flat freeze (even on small thread counts) :\
Jan 31 2014
parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 First, subsequent runs on my machine show interleaving results.
 It also seems that either there *is* a race in there somewhere,
 or maybe a bug?.. Some runs just flat freeze (even on small
 thread counts) :\
Hmm.. Well I know we've had some issues with threads on FreeBSD. It's hard to just guess what's wrong though. :)
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 11:18:03 UTC, Andrej Mitrovic wrote:
 On 1/31/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 First, subsequent runs on my machine show interleaving results.
 It also seems that either there *is* a race in there somewhere,
 or maybe a bug?.. Some runs just flat freeze (even on small
 thread counts) :\
Hmm.. Well I know we've had some issues with threads on FreeBSD. It's hard to just guess what's wrong though. :)
I'm not comfortable with that atomicOp in the thread function. I've reworked the unittest a little, to accommodate multiple runs:

http://codepad.org/ghZdjvUE

And here are ldc's results (you may want to lower the thread count for dmd, I killed the program after the very first test took 27 seconds :o):

Test 0 time for SyncSingleton: 35.4775 msecs.
Test 0 time for AtomicSingleton: 58.5859 msecs.

Test 1 time for SyncSingleton: 64.9863 msecs.
Test 1 time for AtomicSingleton: 12.5479 msecs.

Test 2 time for SyncSingleton: 44.2617 msecs.
Test 2 time for AtomicSingleton: 26.2842 msecs.

Test 3 time for SyncSingleton: 24.8008 msecs.
Test 3 time for AtomicSingleton: 34.416 msecs.

Test 4 time for SyncSingleton: 5.63477 msecs.
Test 4 time for AtomicSingleton: 28.458 msecs.

Test 5 time for SyncSingleton: 18.1123 msecs.
Test 5 time for AtomicSingleton: 29.6738 msecs.

Test 6 time for SyncSingleton: 12.0234 msecs.
Test 6 time for AtomicSingleton: 53.2061 msecs.

Test 7 time for SyncSingleton: 70.6982 msecs.
Test 7 time for AtomicSingleton: 13.2285 msecs.

Test 8 time for SyncSingleton: 12.3447 msecs.
Test 8 time for AtomicSingleton: 8.06348 msecs.

Test 9 time for SyncSingleton: 20.3145 msecs.
Test 9 time for AtomicSingleton: 14.334 msecs.

Again, inconclusive :)
Jan 31 2014
parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 I've reworked the unittest a little, to accomodate for multiple
 runs:

 http://codepad.org/ghZdjvUE
I've finally managed to build LDC2 on Windows (MinGW version), here are the timings between DMD and LDC2:

$ dmd -release -inline -O -noboundscheck -unittest singleton_2.d -oftest.exe && test.exe
Test time for LockSingleton: 606.5 msecs.
Test time for SyncSingleton: 7 msecs.
Test time for AtomicSingleton: 138 msecs.

$ ldmd2 -release -inline -O -noboundscheck -unittest singleton_2.d -oftest.exe && test.exe
Test time for LockSingleton: 536.25 msecs.
Test time for SyncSingleton: 5 msecs.
Test time for AtomicSingleton: 3 msecs.

Freaking awesome!
Feb 04 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Tuesday, 4 February 2014 at 09:44:04 UTC, Andrej Mitrovic 
wrote:

 I've finally managed to build LDC2 on Windows (MinGW version), 
 here
 are the timings between DMD and LDC2:

 $ dmd -release -inline -O -noboundscheck -unittest singleton_2.d
  -oftest.exe && test.exe
 Test time for LockSingleton: 606.5 msecs.
 Test time for SyncSingleton: 7 msecs.
 Test time for AtomicSingleton: 138 msecs.

 $ ldmd2 -release -inline -O -noboundscheck -unittest 
 singleton_2.d
  -oftest.exe && test.exe
 Test time for LockSingleton: 536.25 msecs.
 Test time for SyncSingleton: 5 msecs.
 Test time for AtomicSingleton: 3 msecs.

 Freaking awesome!
:)

Have you also included fixes from http://forum.dlang.org/post/khidcgetalmguhassvqm forum.dlang.org ?

How do the test results look in multiple runs? Is AtomicSingleton always faster than SyncSingleton on Windows?
Feb 04 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/4/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 Have you also included fixes from
 http://forum.dlang.org/post/khidcgetalmguhassvqm forum.dlang.org ?
I haven't figured out exactly what you're trying to swap there. Do you have a full example:
 How do the test results look in multiple runs? Is AtomicSingleton
 always faster than SyncSingleton on Windows?
Pretty much. I'm getting reliable results. But I'm not a statistics pro (and yeah I've read http://zedshaw.com/essays/programmer_stats.html - still doesn't make me a pro).
Feb 04 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Tuesday, 4 February 2014 at 14:23:51 UTC, Andrej Mitrovic 
wrote:
 On 2/4/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 Have you also included fixes from
 http://forum.dlang.org/post/khidcgetalmguhassvqm forum.dlang.org 
 ?
I haven't figured out exactly what you're trying to swap there. Do you have a full example:
Both atomicLoad and atomicStore use raw MemoryOrder, and also the atomicStore is out of the synchronized {} section: http://dpaste.dzfl.pl/291abc51bb0e
 How do the test results look in multiple runs? Is 
 AtomicSingleton
 always faster than SyncSingleton on Windows?
Pretty much. I'm getting reliable results.
Interesting. As you've seen, for me on Linux it's 50/50.
 But I'm not a statistics pro (and yeah I've read
 http://zedshaw.com/essays/programmer_stats.html - still doesn't 
 make me a pro).
Same here :)
Feb 04 2014
parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/4/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 Both atomicLoad and atomicStore use raw MemoryOrder, and also the
 atomicStore is out of the synchronized {} section:

 http://dpaste.dzfl.pl/291abc51bb0e
No difference, but maybe the timing precision isn't proper. It always displays one of 3/3.25/4 msecs.

Anywho, what's important is that Atomic is really speedy and Sync is almost as fast. Except with DMD, which is bad at optimizing this specific code.
Feb 05 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Wednesday, 5 February 2014 at 08:39:08 UTC, Andrej Mitrovic 
wrote:

 No difference, but maybe the timing precision isn't proper. It 
 always displays one of 3/3.25/4 msecs.
Hmm... It should be as proper as it gets, judging from StopWatch's docs.
 Anywho what's important is that Atomic is really speedy and 
 Sync is almost as fast. Except with DMD  which is
 bad at optimizing this specific code.
Yup, at least we have two fast low-lock implementations to choose from depending on platform's capabilities regarding TLS and atomics.
Feb 05 2014
parent "Jonathan Bettencourt" <jbetten gmail.com> writes:
On Wednesday, 5 February 2014 at 09:30:51 UTC, Stanislav Blinov 
wrote:
 On Wednesday, 5 February 2014 at 08:39:08 UTC, Andrej Mitrovic 
 wrote:

 No difference, but maybe the timing precision isn't proper. It 
 always displays one of 3/3.25/4 msecs.
Hmm... It should be as proper as it gets, judging from StopWatch's docs.
 Anywho what's important is that Atomic is really speedy and 
 Sync is almost as fast. Except with DMD  which is
 bad at optimizing this specific code.
Yup, at least we have two fast low-lock implementations to choose from depending on platform's capabilities regarding TLS and atomics.
The atomics implementation in druntime is very inefficient, it uses compare-and-swap for nearly everything. I'm working on a rewrite.
Feb 05 2014
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/4/14, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 I haven't figured out exactly what you're trying to swap there. Do you
 have a full example:
s/:/?
Feb 04 2014
prev sibling parent reply Jerry <jlquinn optonline.net> writes:
"Stanislav Blinov" <stanislav.blinov gmail.com> writes:

 On Tuesday, 4 February 2014 at 09:44:04 UTC, Andrej Mitrovic wrote:

 I've finally managed to build LDC2 on Windows (MinGW version), here
 are the timings between DMD and LDC2:

 $ dmd -release -inline -O -noboundscheck -unittest singleton_2.d
  -oftest.exe && test.exe
 Test time for LockSingleton: 606.5 msecs.
 Test time for SyncSingleton: 7 msecs.
 Test time for AtomicSingleton: 138 msecs.

 $ ldmd2 -release -inline -O -noboundscheck -unittest singleton_2.d
  -oftest.exe && test.exe
 Test time for LockSingleton: 536.25 msecs.
 Test time for SyncSingleton: 5 msecs.
 Test time for AtomicSingleton: 3 msecs.

 Freaking awesome!
Here's the best and worst times I get on my linux laptop. These are with 2.064.2 dmd and gdc 4.9 with 2.064.2.

On Ubuntu x86_64:

~/dmd2/linux/bin64/dmd -O -release -inline -noboundscheck -unittest singleton.d

Test 2 time for SyncSingleton: 753.547 msecs.
Test 2 time for AtomicSingleton: 22290.3 msecs.

Test 3 time for SyncSingleton: 254.968 msecs.
Test 3 time for AtomicSingleton: 22903.3 msecs.

Test 6 time for SyncSingleton: 510.118 msecs.
Test 6 time for AtomicSingleton: 23970.9 msecs.

Test 8 time for SyncSingleton: 480.175 msecs.
Test 8 time for AtomicSingleton: 12827.9 msecs.

../bin/gdc -frelease -funittest -O3 singleton.d

Test 0 time for SyncSingleton: 458.605 msecs.
Test 0 time for AtomicSingleton: 1985.87 msecs.

Test 1 time for SyncSingleton: 334.097 msecs.
Test 1 time for AtomicSingleton: 2030.29 msecs.

Test 5 time for SyncSingleton: 355.765 msecs.
Test 5 time for AtomicSingleton: 1040.87 msecs.

Test 9 time for SyncSingleton: 295.145 msecs.
Test 9 time for AtomicSingleton: 1272.22 msecs.

It seems like gdc and dmd are similar for SyncSingleton. AtomicSingleton is significantly faster for gdc, but not as fast as SyncSingleton.
Feb 04 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Wednesday, 5 February 2014 at 00:11:58 UTC, Jerry wrote:

 Here's the best and worst times I get on my linux laptop.  
 These are
 with 2.064.2 dmd and gdc 4.9 with 2.064.2

 On Ubuntu x86_64:

 ~/dmd2/linux/bin64/dmd -O -release -inline -noboundscheck 
 -unittest singleton.d

 Test 2 time for SyncSingleton: 753.547 msecs.
 Test 2 time for AtomicSingleton: 22290.3 msecs.

 Test 3 time for SyncSingleton: 254.968 msecs.
 Test 3 time for AtomicSingleton: 22903.3 msecs.

 Test 6 time for SyncSingleton: 510.118 msecs.
 Test 6 time for AtomicSingleton: 23970.9 msecs.

 Test 8 time for SyncSingleton: 480.175 msecs.
 Test 8 time for AtomicSingleton: 12827.9 msecs.
Whoah, those times for AtomicSingleton are way high. What kind of machine is your laptop?

Perhaps we need to repost the test with the latest implementation of AtomicSingleton.
Feb 05 2014
parent reply Jerry <jlquinn optonline.net> writes:
"Stanislav Blinov" <stanislav.blinov gmail.com> writes:

 On Wednesday, 5 February 2014 at 00:11:58 UTC, Jerry wrote:

 Here's the best and worst times I get on my linux laptop.  These are
 with 2.064.2 dmd and gdc 4.9 with 2.064.2

 On Ubuntu x86_64:

 ~/dmd2/linux/bin64/dmd -O -release -inline -noboundscheck -unittest
 singleton.d

 Test 2 time for SyncSingleton: 753.547 msecs.
 Test 2 time for AtomicSingleton: 22290.3 msecs.

 Test 3 time for SyncSingleton: 254.968 msecs.
 Test 3 time for AtomicSingleton: 22903.3 msecs.

 Test 6 time for SyncSingleton: 510.118 msecs.
 Test 6 time for AtomicSingleton: 23970.9 msecs.

 Test 8 time for SyncSingleton: 480.175 msecs.
 Test 8 time for AtomicSingleton: 12827.9 msecs.
Whoah, those times for AtomicSingleton are way high. What kind of machine is your laptop?
Core 2 Duo T9400. The gdc times were much better for AtomicSingleton - about 4x slower than SyncSingleton.
 Perhaps we need to repost the test with the latest implementation of
 AtomicSingleton.
I downloaded the test program yesterday.
Feb 05 2014
next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Wednesday, 5 February 2014 at 21:47:40 UTC, Jerry wrote:

 I downloaded the test program yesterday.
Here's my latest revision:

http://dpaste.dzfl.pl/5b54df1c7004

Andrej, I hope you don't mind me fiddling with that code? I've put that atomic fix in there, also switched timing to use hnsecs (converted back to msecs for output), which seems to give more accurate readings.
Feb 05 2014
parent reply Jerry <jlquinn optonline.net> writes:
"Stanislav Blinov" <stanislav.blinov gmail.com> writes:

 On Wednesday, 5 February 2014 at 21:47:40 UTC, Jerry wrote:

 I downloaded the test program yesterday.
Here's my latest revision: http://dpaste.dzfl.pl/5b54df1c7004 Andrej, I hope you don't mind me fiddling with that code? I've put that atomic fix in there, also switched timing to use hnsecs (converted back to msecs for output), which seems to give more accurate readings.
Yup, that helps out the AtomicSingleton a lot. Here's best and worst times for each for dmd and gdc:

jlquinn wyvern:~/d/tests$ ~/dmd2/linux/bin64/dmd -O -release -inline -unittest singleton2.d
jlquinn wyvern:~/d/tests$ ./singleton2

*Test 2 time for SyncSingleton: 585.992 msecs.
Test 2 time for AtomicSingleton: 1189.03 msecs.

Test 5 time for SyncSingleton: 796.834 msecs.
*Test 5 time for AtomicSingleton: 1069.08 msecs.

*Test 7 time for SyncSingleton: 811.711 msecs.
Test 7 time for AtomicSingleton: 1263.36 msecs.

Test 9 time for SyncSingleton: 605.729 msecs.
*Test 9 time for AtomicSingleton: 2173.74 msecs.

jlquinn wyvern:~/d/tests$ ../bin/gdc -O3 -finline -frelease -fno-bounds-check -funittest singleton2.d
jlquinn wyvern:~/d/tests$ ./a.out

Test 0 time for SyncSingleton: 542.797 msecs.
*Test 0 time for AtomicSingleton: 257.805 msecs.

*Test 5 time for SyncSingleton: 620.052 msecs.
Test 5 time for AtomicSingleton: 248.951 msecs.

Test 7 time for SyncSingleton: 437.124 msecs.
*Test 7 time for AtomicSingleton: 605.781 msecs.

*Test 8 time for SyncSingleton: 252.643 msecs.
Test 8 time for AtomicSingleton: 279.854 msecs.
Feb 06 2014
next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), 
and atomicStore(raw) should be the same as atomicStore(rel).  At 
least on x86.  I don't know why that change made a difference in 
performance.
Feb 07 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:
 Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), 
 and atomicStore(raw) should be the same as atomicStore(rel).  
 At least on x86.  I don't know why that change made a 
 difference in performance.
huh?

--8<-- core/atomic.d

        template needsLoadBarrier( MemoryOrder ms )
        {
            enum bool needsLoadBarrier = ms != MemoryOrder.raw;
        }

-->8--

Didn't you write this? :)
Feb 07 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov 
wrote:
 On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:
 Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), 
 and atomicStore(raw) should be the same as atomicStore(rel).  
 At least on x86.  I don't know why that change made a 
 difference in performance.
 huh?

 --8<-- core/atomic.d

         template needsLoadBarrier( MemoryOrder ms )
         {
             enum bool needsLoadBarrier = ms != MemoryOrder.raw;
         }

 -->8--

 Didn't you write this? :)
Oops. I thought that since Intel has officially defined loads as having acquire semantics, I had eliminated the barrier requirement there. But I guess not. I suppose it's an issue worth discussing. Does anyone know offhand what C++0x implementations do for load acquires on x86?
Feb 07 2014
next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 15:42:06 UTC, Sean Kelly wrote:

 Oops.  I thought that since Intel has officially defined loads 
 as having acquire semantics, I had eliminated the barrier 
 requirement there.  But I guess not.  I suppose it's an issue 
 worth discussing.  Does anyone know offhand what C++0x 
 implementations do for load acquires on x86?
Offhand - no. But who forbids empirical tests? :)

--8<-- main.cpp

#include <atomic>
#include <cstdint>
#include <iostream>

int test32()
{
    std::atomic<int> ai(0xfacefeed);
    return ai.load(std::memory_order_acquire);
}

int64_t test64()
{
    std::atomic<int64_t> ai(0xbadface00badface);
    return ai.load(std::memory_order_acquire);
}

int main(int argc, char** argv)
{
    auto i1 = test32();
    auto i2 = test64();
    // Prevent dead code optimization
    std::cout << i1 << " " << i2 << std::endl;
}

-->8--

I've pulled the atomic ops into separate functions to try and prevent the compiler from being too clever. I'm using --std=c++11 but --std=c++0x would work as well.

$ g++ -Ofast -m32 --std=c++11 main.cpp
$ objdump -d -w -r -C --no-show-raw-insn --disassembler-options=intel a.out | less -S

--8<--
08048830 <test32()>:
 8048830: sub    esp,0x10
 8048833: mov    DWORD PTR [esp+0xc],0xfacefeed
 804883b: mov    eax,DWORD PTR [esp+0xc]
 804883f: add    esp,0x10
 8048842: ret
 8048843: lea    esi,[esi+0x0]
 8048849: lea    edi,[edi+eiz*1+0x0]

08048850 <test64()>:
 8048850: sub    esp,0x1c
 8048853: mov    DWORD PTR [esp+0x10],0xbadface
 804885b: mov    DWORD PTR [esp+0x14],0xbadface0
 8048863: fild   QWORD PTR [esp+0x10]
 8048867: fistp  QWORD PTR [esp]
 804886a: mov    eax,DWORD PTR [esp]
 804886d: mov    edx,DWORD PTR [esp+0x4]
 8048871: add    esp,0x1c
 8048874: ret
 8048875: xchg   ax,ax
 8048877: xchg   ax,ax
 8048879: xchg   ax,ax
 804887b: xchg   ax,ax
 804887d: xchg   ax,ax
 804887f: nop
-->8--

$ g++ -Ofast -m64 --std=c++11 main.cpp
$ objdump -d -w -r -C --no-show-raw-insn --disassembler-options=intel a.out | less -S

--8<--
0000000000400950 <test32()>:
 400950: mov    DWORD PTR [rsp-0x18],0xfacefeed
 400958: mov    eax,DWORD PTR [rsp-0x18]
 40095c: ret
 40095d: nop    DWORD PTR [rax]

0000000000400960 <test64()>:
 400960: movabs rax,0xbadface00badface
 40096a: mov    QWORD PTR [rsp-0x18],rax
 40096f: mov    rax,QWORD PTR [rsp-0x18]
 400974: ret
 400975: nop    WORD PTR cs:[rax+rax*1+0x0]
 40097f: nop
-->8--

No barriers in sight.
Feb 07 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 7 February 2014 at 16:36:03 UTC, Stanislav Blinov 
wrote:
 No barriers in sight.
Awesome. Then I think we can go back to the old logic.
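
Presumably something along these lines — a hypothetical sketch only, not the actual druntime patch (the real fix ends up tracked in a bug report later in this thread):

-----
// Hypothetical: on x86/x86_64, plain loads already have acquire
// semantics, so only a sequentially-consistent load would pay a barrier.
template needsLoadBarrier( MemoryOrder ms )
{
    version (X86)
        enum bool needsLoadBarrier = ms == MemoryOrder.seq;
    else version (X86_64)
        enum bool needsLoadBarrier = ms == MemoryOrder.seq;
    else
        enum bool needsLoadBarrier = ms != MemoryOrder.raw;
}
-----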
Feb 07 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 16:57:50 UTC, Sean Kelly wrote:
 On Friday, 7 February 2014 at 16:36:03 UTC, Stanislav Blinov 
 wrote:
 No barriers in sight.
Awesome. Then I think we can go back to the old logic.
Cool. Also, from http://en.cppreference.com/w/cpp/atomic/memory_order:

--8<--

On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire).

-->8--
Feb 07 2014
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Fri, 07 Feb 2014 17:10:06 +0000,
"Stanislav Blinov" <stanislav.blinov gmail.com> wrote:

 On Friday, 7 February 2014 at 16:57:50 UTC, Sean Kelly wrote:
 On Friday, 7 February 2014 at 16:36:03 UTC, Stanislav Blinov 
 wrote:
 No barriers in sight.
Awesome. Then I think we can go back to the old logic.
 Cool. Also, from http://en.cppreference.com/w/cpp/atomic/memory_order:

 --8<--

 On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire).

 -->8--

Strong-ordering does not work on x86/amd64 in two cases:

http://preshing.com/20120913/acquire-and-release-semantics/#IDComment721195739

Just thought I should throw that in. Only the official CPU docs will give certainty :)

-- 
Marco
Feb 07 2014
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On 02/07/2014 06:10 PM, Stanislav Blinov wrote:
 On Friday, 7 February 2014 at 16:57:50 UTC, Sean Kelly wrote:
 --8<--

 On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire
 ordering is automatic for the majority of operations. No additional CPU
 instructions are issued for this synchronization mode, only certain
 compiler optimizations are affected (e.g. the compiler is prohibited
 from moving non-atomic stores past the atomic store-release or perform
 non-atomic loads earlier than the atomic load-acquire)

 -->8--
So, who is going to fix core.atomic?
Feb 08 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Sunday, 9 February 2014 at 01:40:51 UTC, Martin Nowak wrote:

 So, who is going to fix core.atomic?
I was under impression that Sean was onto it.
Feb 09 2014
parent reply Martin Nowak <code dawg.eu> writes:
On 02/09/2014 03:07 PM, Stanislav Blinov wrote:
 On Sunday, 9 February 2014 at 01:40:51 UTC, Martin Nowak wrote:

 So, who is going to fix core.atomic?
I was under impression that Sean was onto it.
Can you please submit a bug report, so we don't lose track of this.
Feb 09 2014
parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Sunday, 9 February 2014 at 18:07:50 UTC, Martin Nowak wrote:

 Can you please submit a bug report, so we don't lose track of 
 this.
Sure: https://d.puremagic.com/issues/show_bug.cgi?id=12121
Feb 09 2014
prev sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On 7 Feb 2014 15:45, "Sean Kelly" <sean invisibleduck.org> wrote:
 On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov wrote:
 On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:
 Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), and
atomicStore(raw) should be the same as atomicStore(rel). At least on x86. I don't know why that change made a difference in performance.
 huh?

 --8<-- core/atomic.d

         template needsLoadBarrier( MemoryOrder ms )
         {
             enum bool needsLoadBarrier = ms != MemoryOrder.raw;
         }

 -->8--

 Didn't you write this? :)
Oops. I thought that since Intel has officially defined loads as having
 acquire semantics, I had eliminated the barrier requirement there. But I guess not. I suppose it's an issue worth discussing. Does anyone know offhand what C++0x implementations do for load acquires on x86?

Speaking of which, I need to add 'Update gcc.atomics to use new C++0x intrinsics' to the GDCProjects page - they map closely to what core.atomic is doing, and should see better performance compared to the __sync intrinsics. :)
Feb 07 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Fri, 7 Feb 2014 18:42:29 +0000,
Iain Buclaw <ibuclaw gdcproject.org> wrote:

 On 7 Feb 2014 15:45, "Sean Kelly" <sean invisibleduck.org> wrote:
 On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov wrote:
 On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:
 Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), and
atomicStore(raw) should be the same as atomicStore(rel). At least on x86. I don't know why that change made a difference in performance.
 huh?

 --8<-- core/atomic.d

         template needsLoadBarrier( MemoryOrder ms )
         {
             enum bool needsLoadBarrier = ms != MemoryOrder.raw;
         }

 -->8--

 Didn't you write this? :)
Oops. I thought that since Intel has officially defined loads as having
 acquire semantics, I had eliminated the barrier requirement there. But I guess not. I suppose it's an issue worth discussing. Does anyone know offhand what C++0x implementations do for load acquires on x86?

 Speaking of which, I need to add 'Update gcc.atomics to use new C++0x intrinsics' to the GDCProjects page - they map closely to what core.atomic is doing, and should see better performance compared to the __sync intrinsics. :)

You send shared variables as "volatile" to the backend and that is correct. I wonder, since that should create strong ordering of memory operations (correct?), if DMD has something similar, or if D's "shared" isn't really shared at all and relies entirely on the correct use of atomicLoad/atomicStore and atomicFence. In that case, would the GCC backend be able to optimize more around shared variables (by not considering them volatile) and still be no worse off than DMD?

-- 
Marco
Feb 07 2014
parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On 8 Feb 2014 01:20, "Marco Leise" <Marco.Leise gmx.de> wrote:
 On Fri, 7 Feb 2014 18:42:29 +0000,
 Iain Buclaw <ibuclaw gdcproject.org> wrote:

 On 7 Feb 2014 15:45, "Sean Kelly" <sean invisibleduck.org> wrote:
 On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov wrote:
 On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:
 Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), and
atomicStore(raw) should be the same as atomicStore(rel). At least on
x86.
  I don't know why that change made a difference in performance.
 huh?

 --8<-- core/atomic.d

         template needsLoadBarrier( MemoryOrder ms )
         {
             enum bool needsLoadBarrier = ms != MemoryOrder.raw;
         }

 -->8--

 Didn't you write this? :)
Oops. I thought that since Intel has officially defined loads as
having
 acquire semantics, I had eliminated the barrier requirement there.  But
I
 guess not.  I suppose it's an issue worth discussing.  Does anyone know
 offhand what C++0x implementations do for load acquires on x86?

 Speaking of which, I need to add 'Update gcc.atomics to use new C++0x
 intrinsics' to the GDCProjects page - they map closely to what
core.atomic
 is doing, and should see better performance compared to the __sync
 intrinsics.  :)
 You send shared variables as "volatile" to the backend and that is correct. I wonder, since that should create strong ordering of memory operations (correct?), if DMD has something similar, or if D's "shared" isn't really shared at all and relies entirely on the correct use of atomicLoad/atomicStore and atomicFence. In that case, would the GCC backend be able to optimize more around shared variables (by not considering them volatile) and still be no worse off than DMD?
No. The fact that I decided shared data be marked volatile was *not* because of a strong ordering. Remember, we follow C semantics here, which is quite specific in not guaranteeing this.

The reason it is set as volatile is that it (instead) guarantees the compiler will not generate code that explicitly caches the shared data.
Feb 09 2014
next sibling parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
Isn't it great how a simple benchmark thread can reveal such 
valuable insights and important problems?
Feb 09 2014
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 9 Feb 2014 20:47:07 +0000,
Iain Buclaw <ibuclaw gdcproject.org> wrote:

 On 8 Feb 2014 01:20, "Marco Leise" <Marco.Leise gmx.de> wrote:
 On Fri, 7 Feb 2014 18:42:29 +0000,
 Iain Buclaw <ibuclaw gdcproject.org> wrote:

 On 7 Feb 2014 15:45, "Sean Kelly" <sean invisibleduck.org> wrote:
 On Friday, 7 February 2014 at 11:17:49 UTC, Stanislav Blinov wrote:
 On Friday, 7 February 2014 at 08:10:58 UTC, Sean Kelly wrote:
 Weird.  atomicLoad(raw) should be the same as atomicLoad(acq), and
atomicStore(raw) should be the same as atomicStore(rel). At least on
x86.
  I don't know why that change made a difference in performance.
 huh?

 --8<-- core/atomic.d

         template needsLoadBarrier( MemoryOrder ms )
         {
             enum bool needsLoadBarrier = ms != MemoryOrder.raw;
         }

 -->8--

 Didn't you write this? :)
Oops. I thought that since Intel has officially defined loads as
having
 acquire semantics, I had eliminated the barrier requirement there.  But
 I
 guess not.  I suppose it's an issue worth discussing.  Does anyone know
 offhand what C++0x implementations do for load acquires on x86?

 Speaking of which, I need to add 'Update gcc.atomics to use new C++0x
 intrinsics' to the GDCProjects page - they map closely to what
core.atomic
 is doing, and should see better performance compared to the __sync
 intrinsics.  :)
 You send shared variables as "volatile" to the backend and that is correct. I wonder, since that should create strong ordering of memory operations (correct?), if DMD has something similar, or if D's "shared" isn't really shared at all and relies entirely on the correct use of atomicLoad/atomicStore and atomicFence. In that case, would the GCC backend be able to optimize more around shared variables (by not considering them volatile) and still be no worse off than DMD?
 No. The fact that I decided shared data be marked volatile was *not* because of a strong ordering. Remember, we follow C semantics here, which is quite specific in not guaranteeing this.

 The reason it is set as volatile is that it (instead) guarantees the compiler will not generate code that explicitly caches the shared data.

Ah, alright then.

-- 
Marco
Feb 17 2014
prev sibling parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 04:06:40 UTC, Jerry wrote:
 "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
 Here's my latest revision: http://dpaste.dzfl.pl/5b54df1c7004
Yup, that helps out the AtomicSingleton a lot. Here's best and worst times for each for dmd and gdc:
Cool, I almost started to research that CPU of yours :)
 jlquinn wyvern:~/d/tests$ ~/dmd2/linux/bin64/dmd -O -release 
 -inline -unittest singleton2.d
 jlquinn wyvern:~/d/tests$ ./singleton2
 *Test 2 time for SyncSingleton: 585.992 msecs.
 Test 2 time for AtomicSingleton: 1189.03 msecs.

 Test 5 time for SyncSingleton: 796.834 msecs.
 *Test 5 time for AtomicSingleton: 1069.08 msecs.

 *Test 7 time for SyncSingleton: 811.711 msecs.
 Test 7 time for AtomicSingleton: 1263.36 msecs.

 Test 9 time for SyncSingleton: 605.729 msecs.
 *Test 9 time for AtomicSingleton: 2173.74 msecs.

 jlquinn wyvern:~/d/tests$ ../bin/gdc -O3 -finline -frelease 
 -fno-bounds-check -funittest singleton2.d
 jlquinn wyvern:~/d/tests$ ./a.out
 Test 0 time for SyncSingleton: 542.797 msecs.
 *Test 0 time for AtomicSingleton: 257.805 msecs.

 *Test 5 time for SyncSingleton: 620.052 msecs.
 Test 5 time for AtomicSingleton: 248.951 msecs.

 Test 7 time for SyncSingleton: 437.124 msecs.
 *Test 7 time for AtomicSingleton: 605.781 msecs.

 *Test 8 time for SyncSingleton: 252.643 msecs.
 Test 8 time for AtomicSingleton: 279.854 msecs.
Nice.
Feb 07 2014
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Wed, 05 Feb 2014 16:47:40 -0500,
Jerry <jlquinn optonline.net> wrote:

 "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
 
 On Wednesday, 5 February 2014 at 00:11:58 UTC, Jerry wrote:

 Here's the best and worst times I get on my linux laptop.  These are
 with 2.064.2 dmd and gdc 4.9 with 2.064.2

 On Ubuntu x86_64:

 ~/dmd2/linux/bin64/dmd -O -release -inline -noboundscheck -unittest
 singleton.d

 Test 2 time for SyncSingleton: 753.547 msecs.
 Test 2 time for AtomicSingleton: 22290.3 msecs.

 Test 3 time for SyncSingleton: 254.968 msecs.
 Test 3 time for AtomicSingleton: 22903.3 msecs.

 Test 6 time for SyncSingleton: 510.118 msecs.
 Test 6 time for AtomicSingleton: 23970.9 msecs.

 Test 8 time for SyncSingleton: 480.175 msecs.
 Test 8 time for AtomicSingleton: 12827.9 msecs.
Whoah, those times for AtomicSingleton are way high. What kind of machine is your laptop?
Core 2 Duo T9400. The gdc times were much better for AtomicSingleton - about 4x slower than SyncSingleton.
 Perhaps we need to repost the test with the latest implementation of
 AtomicSingleton.
I downloaded the test program yesterday.
I just tested with DMD 2.064.2 and my numbers for the AtomicSingleton
are not as high. This is on a Core 2 Duo T7250 / 2.0 GHz.

Test 0 time for SyncSingleton: 1068.83 msecs.
Test 0 time for AtomicSingleton: 2102.32 msecs.

Test 1 time for SyncSingleton: 901.215 msecs.
Test 1 time for AtomicSingleton: 2479.6 msecs.

Test 2 time for SyncSingleton: 1091.91 msecs.
Test 2 time for AtomicSingleton: 2269.45 msecs.

Test 3 time for SyncSingleton: 1156.74 msecs.
Test 3 time for AtomicSingleton: 2498.25 msecs.

Also for GDC my numbers are like this:

Test 0 time for SyncSingleton: 657.928 msecs.
Test 0 time for AtomicSingleton: 851.795 msecs.

Test 1 time for SyncSingleton: 655.204 msecs.
Test 1 time for AtomicSingleton: 893.51 msecs.

Test 2 time for SyncSingleton: 613.881 msecs.
Test 2 time for AtomicSingleton: 843.635 msecs.

Test 3 time for SyncSingleton: 657.87 msecs.
Test 3 time for AtomicSingleton: 709.823 msecs.

Which is far from the difference you see.

--
Marco
Feb 07 2014
prev sibling next sibling parent reply "Dejan Lekic" <dejan.lekic gmail.com> writes:
I was thinking about implementing a typical Java singleton in D, 
and then decided to first check whether someone already did that, 
and guess what - yes, someone did. Check this URL: 
http://dblog.aldacron.net/2007/03/03/singletons-in-d/

Something like this (taken from the article above) in the case 
you do not want lazy initialisation:

     class Singleton2(T)
     {
     public:
         static const T instance;

     private:
         this() {}

         static this() { instance = new T; }
     }

     class TMySingleton2 : Singleton2!(TMySingleton2)
     {
     }

Something like this (taken from the article above) in the case 
you want lazy initialisation:

     class Singleton(T)
     {
     public:
         static T instance()
         {
             if(_instance is null) _instance = new T;
             return _instance;
         }

     private:
         this() {}

         static T _instance;
     }

     class TMySingleton : Singleton!(TMySingleton)
     {
     }

If there are some Java programmers around who are curious how the 
Java version is done: 
http://www.javaworld.com/article/2073352/core-java/simply-singleton.html
Jan 31 2014
next sibling parent reply "Dejan Lekic" <dejan.lekic gmail.com> writes:
I should have mentioned two things in my previous post.

1) There are no locks involved. No need, because the solution 
relies on the fact that static member variables are guaranteed to 
be created the first time they are accessed.

2) Note that we have the constructor disabled. This is important not 
to forget. ;)
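(A minimal sketch of point 2, with made-up module names: since private in D is module-private, only code in the same module can construct the class; everyone else has to go through the static instance.)

-----
module mysingleton;

class TMySingleton
{
    static TMySingleton instance;

    static this() { instance = new TMySingleton; }

private:
    this() { }
}

// In some other module:
//   import mysingleton;
//   auto a = new TMySingleton;       // error: constructor is not accessible
//   auto b = TMySingleton.instance;  // fine
-----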
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 10:26:50 UTC, Dejan Lekic wrote:
 I should have mentioned two things in my previous post.

 1) There are no locks involved. No need, because the solution 
 relies on the fact that static member variables are guaranteed 
 to be created the first time they are accessed.
And they are thread-local :)
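(A minimal sketch of what that means in practice, with made-up names: each thread runs its own static constructor and gets its own instance.)

-----
import core.thread;
import std.stdio;

class S
{
    static S instance;                   // TLS by default
    static this() { instance = new S; }  // runs once per thread
}

void main()
{
    auto here = cast(void*) S.instance;
    void* there;

    auto t = new Thread({ there = cast(void*) S.instance; });
    t.start();
    t.join();

    // prints "different instances": the new thread built its own copy
    writeln(here is there ? "same instance" : "different instances");
}
-----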
 2) Note that we have the constructor disabled. This is important 
 not to forget. ;)
What use would the const version have? You'd still need some way to access the instance, right? Cast away const?
Jan 31 2014
parent reply "Dejan Lekic" <dejan.lekic gmail.com> writes:
 What use would the const version have? You'd still need some 
 way to access the instance, right? Cast away const?
I believe it should have been "final" instead of "const".
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 11:08:42 UTC, Dejan Lekic wrote:

 I believe it should have been "final" instead of "const".
But D doesn't have "final" :) In any event, that article by Mike Parker is about D1.
Jan 31 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Stanislav Blinov <stanislav.blinov gmail.com> wrote:
 On Friday, 31 January 2014 at 11:08:42 UTC, Dejan Lekic wrote:

 I believe it should have been "final" instead of "const".
But D doesn't have "final" :) In any event, that article by Mike Parker is about D1.
AFAIK D1's final was equivalent to D2's immutable. But I may be remembering that wrong. Or maybe D2 initially used final before settling on the new keyword immutable, to avoid confusion by users.
Jan 31 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-31 12:27, Andrej Mitrovic wrote:

 AFAIK D1's final was equivalent to D2's immutable. But I may be
 remembering that wrong.
In D2, if a variable is immutable or const, you cannot call non-const, non-immutable methods via that variable. D1 didn't have any concept of this. "const" and "final" in D1 were more like "you cannot change this variable".
 Or maybe D2 initially used final before
 settling for the new keyword immutable, to avoid confusion by users.
D2 used "invariant" before it used "immutable". It also changed the meaning of "const" compared to D1. -- /Jacob Carlborg
Jan 31 2014
parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Jacob Carlborg <doob me.com> wrote:
 In D2, if a variable is immutable or const, you cannot call non-const,
 non-immutable methods via that variable. D1 didn't have any concept of
 this. "const" and "final" in D1 were more like "you cannot change this variable".
So in D1 const is non-transitive?
Jan 31 2014
parent "Dicebot" <public dicebot.lv> writes:
On Friday, 31 January 2014 at 12:09:49 UTC, Andrej Mitrovic wrote:
 On 1/31/14, Jacob Carlborg <doob me.com> wrote:
 In D2, if a variable is immutable or const, you cannot call 
 non-const,
 non-immutable methods via that variable. D1 didn't have any 
 concept of
 this. "const" and "final" in D1 were more like "you cannot 
 change this variable".
So in D1 const is non-transitive?
It is completely different in D1. I think it is not even a qualifier there but a storage class - you can't have const function arguments, it is not printed in typeof and, yes, it is non-transitive. It basically just says "you can't modify this memory block". Also, const variables with an initializer act as D2 enums. This is one of the reasons why porting Sociomantic code will be quite painful :)
Jan 31 2014
prev sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
 But D doesn't have "final" :) In any event, that article by 
 Mike Parker is about D1.
Well, "final" still works. Until it does not we will agree that D does not have it. ;) That article applies to D2 as well, without any problems.
Jan 31 2014
prev sibling next sibling parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Friday, 31 January 2014 at 10:20:45 UTC, Dejan Lekic wrote:
 I was thinking about implementing a typical Java singleton in 
 D, and then decided to first check whether someone already did 
 that, and guess what - yes, someone did. Check this URL: 
 http://dblog.aldacron.net/2007/03/03/singletons-in-d/

 Something like this (taken from the article above) in the case 
 you do not want lazy initialisation:

     class Singleton2(T)
     {
     public:
         static const T instance;

     private:
         this() {}

         static this() { instance = new T; }
     }

      class TMySingleton2 : Singleton2!(TMySingleton2)
     {
     }

 Something like this (taken from the article above) in the case 
 you want lazy initialisation:

     class Singleton(T)
     {
     public:
         static T instance()
         {
             if(_instance is null) _instance = new T;
             return _instance;
         }

     private:
         this() {}

         static T _instance;
     }

     class TMySingleton : Singleton!(TMySingleton)
     {
     }

 If there are some Java programmers around who are curious how 
 the Java version is done: 
 http://www.javaworld.com/article/2073352/core-java/simply-singleton.html
Why is someone interested in implementing such an Anti-Pattern like Singletons? In most cases Singletons are misused.
Jan 31 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 10:27:28 UTC, Namespace wrote:

 Why is someone interested in implementing such an Anti-Pattern 
 like Singletons?
Why is someone overquoting without reason? ;)
 In most of all cases Singletons are misused.
Any sort of shared (as in, between threads) resource is often a singleton. A queue for message passing, concurrent GC, a pipe... Even if it doesn't have SINGLETON (yes, in all capitals to irritate reviewers) in its name.
Jan 31 2014
parent "Namespace" <rswhite4 googlemail.com> writes:
On Friday, 31 January 2014 at 10:50:57 UTC, Stanislav Blinov 
wrote:
 On Friday, 31 January 2014 at 10:27:28 UTC, Namespace wrote:

 Why is someone interested in implementing such an Anti-Pattern 
 like Singletons?
Why is someone overquoting without reason? ;)
I know so many people and have read so many books where Singletons are misused that I react a bit allergically to them. In most cases, a singleton is absolutely unnecessary and is just a hidden global variable. Sorry if it may have sounded too harsh. ;)
Jan 31 2014
prev sibling parent reply "Dejan Lekic" <dejan.lekic gmail.com> writes:
Here is an updated version of Andrej's code: 
http://dpaste.dzfl.pl/c85f487c7f70
SingletonSimple is a winner, followed by the SyncSingleton and 
SingletonLazy.
Jan 31 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Dejan Lekic <dejan.lekic gmail.com> wrote:
 SingletonSimple is a winner
Well yeah, but that's not really the only thing a singleton is about. It's also about being able to initialize the singleton at an arbitrary time, rather than in a module constructor before main() is called.
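(A minimal sketch of the distinction, with made-up names; the lazy variant here is deliberately not thread-safe and only illustrates when construction happens:)

-----
import std.stdio;

class Eager
{
    __gshared Eager instance;

    shared static this()
    {
        instance = new Eager;
        writeln("Eager built before main()");
    }
}

class Lazy
{
    static Lazy get()
    {
        __gshared Lazy _instance;

        if (_instance is null)  // not thread-safe; illustration only
        {
            writeln("Lazy built on first use");
            _instance = new Lazy;
        }

        return _instance;
    }
}

void main()
{
    writeln("main() starts");
    Lazy.get();  // construction happens here, at a time of our choosing
}
-----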
Jan 31 2014
next sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Friday, 31 January 2014 at 11:42:29 UTC, Andrej Mitrovic wrote:
 On 1/31/14, Dejan Lekic <dejan.lekic gmail.com> wrote:
 SingletonSimple is a winner
Well yeah, but that's not really the only thing a singleton is about. It's also about being able to initialize the singleton at an arbitrary time, rather than in a module constructor before main() is called.
Absolutely, that is why I would use both alternatives, depending on the use case.
Jan 31 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/31/14, 3:42 AM, Andrej Mitrovic wrote:
 On 1/31/14, Dejan Lekic <dejan.lekic gmail.com> wrote:
 SingletonSimple is a winner
Well yeah, but that's not really the only thing a singleton is about. It's also about being able to initialize the singleton at an arbitrary time, rather than in a module constructor before main() is called.
Well yah Singleton should be created on first access. Andrei
Jan 31 2014
parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Friday, 31 January 2014 at 17:10:08 UTC, Andrei Alexandrescu 
wrote:
 On 1/31/14, 3:42 AM, Andrej Mitrovic wrote:
 On 1/31/14, Dejan Lekic <dejan.lekic gmail.com> wrote:
 SingletonSimple is a winner
Well yeah, but that's not really the only thing a singleton is about. It's also about being able to initialize the singleton at an arbitrary time, rather than in a module constructor before main() is called.
Well yah Singleton should be created on first access. Andrei
If that is what people want, then David's version is definitely the best one.
Jan 31 2014
prev sibling next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 31 January 2014 at 11:34:13 UTC, Dejan Lekic wrote:

 SingletonSimple is a winner, followed by the SyncSingleton and 
 SingletonLazy.
Dejan, your singletons are thread-local :)
Jan 31 2014
parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Friday, 31 January 2014 at 11:44:10 UTC, Stanislav Blinov
wrote:
 On Friday, 31 January 2014 at 11:34:13 UTC, Dejan Lekic wrote:

 SingletonSimple is a winner, followed by the SyncSingleton and 
 SingletonLazy.
Dejan, your singletons are thread-local :)
YAY, that is correct! :'(
Jan 31 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/31/14, Dejan Lekic <dejan.lekic gmail.com> wrote:
 SingletonLazy.
SingletonLazy isn't thread-safe. :)
Jan 31 2014
next sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
 SingletonLazy isn't thread-safe. :)
EEK!
Jan 31 2014
prev sibling parent "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Friday, 31 January 2014 at 11:45:56 UTC, Andrej Mitrovic wrote:
 On 1/31/14, Dejan Lekic <dejan.lekic gmail.com> wrote:
 SingletonLazy.
SingletonLazy isn't thread-safe. :)
I made it thread-safe, and guess what - I ended up with a SyncSingleton-like solution! So SyncSingleton is a clear winner if you want to make it lazy.
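(Presumably something along these lines: a sketch of a thread-safe lazy Singleton(T), combining the template above with the SyncSingleton pattern. This is a guess at the shape, not Dejan's actual code:)

-----
class Singleton(T)
{
    static T instance()
    {
        static bool _instantiated;  // TLS fast path, one check per thread
        __gshared T _instance;

        if (!_instantiated)
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new T;

                _instantiated = true;
            }
        }

        return _instance;
    }
}
-----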
Jan 31 2014
prev sibling next sibling parent reply "TC" <chalucha gmail.com> writes:
On Friday, 31 January 2014 at 08:25:16 UTC, Andrej Mitrovic wrote:
 class LockSingleton
 {
     static LockSingleton get()
     {
         __gshared LockSingleton _instance;

         synchronized
         {
             if (_instance is null)
                 _instance = new LockSingleton;
         }

         return _instance;
     }

 private:
     this() { }
 }
Shouldn't LockSingleton be implemented like this instead?

class LockSingleton
{
    static auto get()
    {
        if (_instance is null)
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new LockSingleton;
            }
        }

        return _instance;
    }

private:
    this() { }

    __gshared LockSingleton _instance;
}

At least this is the way a singleton is usually suggested to be implemented; synchronization is then needed only for the initial instantiation and not always.
Feb 07 2014
next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 7 February 2014 10:25, TC <chalucha gmail.com> wrote:
 On Friday, 31 January 2014 at 08:25:16 UTC, Andrej Mitrovic wrote:
 class LockSingleton
 {
     static LockSingleton get()
     {
         __gshared LockSingleton _instance;

         synchronized
         {
             if (_instance is null)

                 _instance = new LockSingleton;
         }

         return _instance;
     }

 private:
     this() { }
 }
Shouldn't LockSingleton be implemented like this instead?

class LockSingleton
{
    static auto get()
    {
        if (_instance is null)
        {
            synchronized
            {
                if (_instance is null)
                    _instance = new LockSingleton;
            }
        }

        return _instance;
    }

private:
    this() { }

    __gshared LockSingleton _instance;
}

Synchronization is then needed only for the initial instantiation and not always.
We don't want double-checked locking. :)  This was discussed at DConf; the D way is to leverage native thread-local storage.  I seem to recall that when David tested this, GDC had pretty much identical speeds to unsafe get()s.  You'll have to consult the slides, but I think it was something like:

class LockSingleton
{
    static auto get()
    {
        if (!_instantiated)
        {
            synchronized (LockSingleton.classinfo)
            {
                if (_instance is null)
                    _instance = new LockSingleton;

                _instantiated = true;
            }
        }

        return _instance;
    }

private:
    this() { }

    static bool _instantiated;
    __gshared LockSingleton _instance;
}
Feb 07 2014
prev sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 10:25:52 UTC, TC wrote:

 Should't be the LockSingleton implemented like this instead?

 class LockSingleton
 {
     static auto get()
     {
         if (_instance is null)
(_instance is null) will most likely not be an atomic operation. References are two words. Imagine that one thread writes half a reference inside synchronized {}, then goes to sleep. What would the thread that gets to that 'if' return? I'd say it'll return "ouch".
Feb 07 2014
next sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Stanislav Blinov"  wrote in message 
news:idrxthgkumydmiszdtcx forum.dlang.org...
 (_instance is null) will most likely not be an atomic operation. 
 References are two words.
References are one word.
Feb 07 2014
parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 11:36:23 UTC, Daniel Murphy wrote:
 "Stanislav Blinov"  wrote in message 
 news:idrxthgkumydmiszdtcx forum.dlang.org...
 (_instance is null) will most likely not be an atomic 
 operation. References are two words.
References are one word.
Heh, indeed. Need to go have my brain scanned :\ I have no idea why I thought that.
Feb 07 2014
prev sibling parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 7 February 2014 at 11:31:14 UTC, Stanislav Blinov 
wrote:
 On Friday, 7 February 2014 at 10:25:52 UTC, TC wrote:

 Should't be the LockSingleton implemented like this instead?

 class LockSingleton
 {
    static auto get()
    {
        if (_instance is null)
(_instance is null) will most likely not be an atomic operation. References are two words. Imagine that one thread writes half a reference inside synchronized {}, then goes to sleep. What would the thread that gets to that 'if' return? I'd say it'll return "ouch".
Scratch that.
Feb 07 2014
prev sibling parent reply luka8088 <luka8088 owave.net> writes:
On 31.1.2014. 9:25, Andrej Mitrovic wrote:
 There was a nice blog-post about implementing low-lock singletons in D, here:
 http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/
 
 One suggestion on Reddit was by dawgfoto (I think this is Martin
 Nowak?), to use atomic primitives instead:
 http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/c9tmz07
 
 I wanted to benchmark these different approaches. I was expecting
 Martin's implementation to be the fastest one, but on my machine
 (Athlon II X4 620 - 2.61GHz) the implementation in the blog post turns
 out to be the fastest one. I'm wondering whether my test case is
 flawed in some way. Btw, I think we should put an implementation of
 this into Phobos.
 
 The timings on my machine:
 
 Test time for LockSingleton: 542 msecs.
 Test time for SyncSingleton: 20 msecs.
 Test time for AtomicSingleton: 755 msecs.
 
What about swapping the function pointer so the check is done only once per thread? (The thread is tl;dr, so I am sorry if someone already suggested this.)

--------------------------------------------------
class FunctionPointerSingleton {

   private static __gshared typeof(this) instance_;

   // tls
   @property static typeof(this) function () get;

   static this () {
      get = {
        synchronized {
          if (instance_ is null)
            instance_ = new typeof(this)();
          get = { return instance_; };
          return instance_;
        }
      };
   }

}
--------------------------------------------------

dmd -release -inline -O -noboundscheck -unittest -run singleton.d

Test time for LockSingleton: 901 msecs.
Test time for SyncSingleton: 20.75 msecs.
Test time for AtomicSingleton: 169 msecs.
Test time for FunctionPointerSingleton: 7.5 msecs.

I don't have such a muscular machine xD
Feb 09 2014
next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Sunday, 9 February 2014 at 12:20:54 UTC, luka8088 wrote:

 What about swapping the function pointer so the check is done only 
 once per thread? (The thread is tl;dr, so I am sorry if someone 
 already suggested this.)
That is an interesting idea indeed, though it seems to be faster only for dmd. I haven't studied the assembly yet, but with LDC I don't see any noticeable difference between SyncSingleton and FunctionPointerSingleton.
Feb 09 2014
parent luka8088 <luka8088 owave.net> writes:
On 9.2.2014. 15:09, Stanislav Blinov wrote:
 On Sunday, 9 February 2014 at 12:20:54 UTC, luka8088 wrote:
 
 What about swapping the function pointer so the check is done only once per
 thread? (The thread is tl;dr, so I am sorry if someone already suggested this.)
That is an interesting idea indeed, though it seems to be faster only for dmd. I haven't studied the assembly yet, but with LDC I don't see any noticeable difference between SyncSingleton and FunctionPointerSingleton.
I got it while writing code for dynamic languages (especially JavaScript). The thought came that instead of checking for something that you know will always have the same result, you just remove that piece of code, and voila :)
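(The same trick in miniature, with made-up names: a thread-local function pointer that does the expensive work once, then replaces itself with a cheap version.)

-----
import std.stdio;

int expensiveThenCheap()
{
    writeln("expensive path (runs once per thread)");
    compute = function int() { return 42; };  // swap in the cheap version
    return 42;
}

int function() compute = &expensiveThenCheap;  // TLS by default

void main()
{
    writeln(compute());  // takes the expensive path first
    writeln(compute());  // cheap path only from now on
}
-----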
Feb 09 2014
prev sibling next sibling parent reply Martin Nowak <code dawg.eu> writes:
On 02/09/2014 01:20 PM, luka8088 wrote:
 class FunctionPointerSingleton {

    private static __gshared typeof(this) instance_;

    // tls
    @property static typeof(this) function () get;
You don't even need to make this TLS, right?
    static this () {
      get = {
        synchronized {
          if (instance_ is null)
            instance_ = new typeof(this)();
          get = { return instance_; };
          return instance_;
        }
      };
    }

 }
Feb 09 2014
parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Sunday, 9 February 2014 at 18:06:46 UTC, Martin Nowak wrote:
 On 02/09/2014 01:20 PM, luka8088 wrote:
 class FunctionPointerSingleton {

   private static __gshared typeof(this) instance_;

   // tls
   @property static typeof(this) function () get;
You don't even need to make this TLS, right?
I don't follow. get should be TLS, as a replacement for SyncSingleton's _instantiated TLS bool.
Feb 09 2014
parent luka8088 <luka8088 owave.net> writes:
On 9.2.2014. 19:51, Stanislav Blinov wrote:
 On Sunday, 9 February 2014 at 18:06:46 UTC, Martin Nowak wrote:
 On 02/09/2014 01:20 PM, luka8088 wrote:
 class FunctionPointerSingleton {

   private static __gshared typeof(this) instance_;

   // tls
   @property static typeof(this) function () get;
You don't even need to make this TLS, right?
I don't follow. get should be TLS, as a replacement for SyncSingleton's _instantiated TLS bool.
It is TLS and it needs to be TLS, because one thread could be replacing what get points to while another is trying to access it. It's either TLS or putting some synchronization above it, which would break the whole idea of executing the synchronized block only once per thread.
Feb 09 2014
prev sibling next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/9/14, luka8088 <luka8088 owave.net> wrote:
 What about swapping the function pointer so the check is done only once per
 thread? (The thread is tl;dr, so I am sorry if someone already suggested this.)
Interesting solution for sure.
   // tls
   @property static typeof(this) function () get;
This confused me for a second since @property is meaningless for variables. :>
Feb 10 2014
parent luka8088 <luka8088 owave.net> writes:
On 10.2.2014. 10:52, Andrej Mitrovic wrote:
 On 2/9/14, luka8088 <luka8088 owave.net> wrote:
 What about swapping the function pointer so the check is done only once per
 thread? (The thread is tl;dr, so I am sorry if someone already suggested this.)
Interesting solution for sure.
   // tls
   @property static typeof(this) function () get;
This confused me for a second since property is meaningless for variables. :>
Yeah. My mistake. It should be removed.
Feb 10 2014
prev sibling next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/9/14, luka8088 <luka8088 owave.net> wrote:
   private static __gshared typeof(this) instance_;
Also, "static __gshared" is really meaningless here, it's either static (TLS), or globally shared, either way it's not a class instance, so you can type __gshared alone here. Otherwise I'm not sure what the semantics of a per-class-instance __gshared field would be, if that can exist.
Feb 10 2014
parent reply luka8088 <luka8088 owave.net> writes:
On 10.2.2014. 10:54, Andrej Mitrovic wrote:
 On 2/9/14, luka8088 <luka8088 owave.net> wrote:
   private static __gshared typeof(this) instance_;
Also, "static __gshared" is really meaningless here, it's either static (TLS), or globally shared, either way it's not a class instance, so you can type __gshared alone here. Otherwise I'm not sure what the semantics of a per-class-instance __gshared field would be, if that can exist.
"static" does not meat it must be tls, as "static shared" is valid. I just like to write that it is static and not shared. I know that __gshared does imply static but this implication is not intuitive to me so I write it explicitly. For example, I think that the following code should output 5 and 6 (as it would it __gshared did not imply static): module program; import std.stdio; import core.thread; class A { __gshared int i; } void main () { auto a1 = new A(); auto a2 = new A(); (new Thread({ a1.i = 5; a2.i = 6; (new Thread({ writeln(a1.i); writeln(a2.i); })).start(); })).start(); } But in any case, this variable is just __gshared.
Feb 10 2014
next sibling parent luka8088 <luka8088 owave.net> writes:
On 10.2.2014. 13:44, luka8088 wrote:
 On 10.2.2014. 10:54, Andrej Mitrovic wrote:
 On 2/9/14, luka8088 <luka8088 owave.net> wrote:
   private static __gshared typeof(this) instance_;
Also, "static __gshared" is really meaningless here, it's either static (TLS), or globally shared, either way it's not a class instance, so you can type __gshared alone here. Otherwise I'm not sure what the semantics of a per-class-instance __gshared field would be, if that can exist.
"static" does not meat it must be tls, as "static shared" is valid. I just like to write that it is static and not shared. I know that __gshared does imply static but this implication is not intuitive to me so I write it explicitly. For example, I think that the following code should output 5 and 6 (as it would it __gshared did not imply static): module program; import std.stdio; import core.thread; class A { __gshared int i; } void main () { auto a1 = new A(); auto a2 = new A(); (new Thread({ a1.i = 5; a2.i = 6; (new Thread({ writeln(a1.i); writeln(a2.i); })).start(); })).start(); } But in any case, this variable is just __gshared.
Um, actually this makes no sense. But anyway, I'll mark it static.
Feb 10 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/10/14, luka8088 <luka8088 owave.net> wrote:
 "static" does not mean it must be tls, as "static shared" is valid.
Yes you're right. I'm beginning to really dislike the 20 different meanings of "static". :)
Feb 10 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Andrej Mitrovic"  wrote in message 
news:mailman.111.1392039607.21734.digitalmars-d puremagic.com...

 Yes you're right. I'm beginning to really dislike the 20 different
 meanings of "static". :)
Don't forget that __gshared static and static __gshared do different things!
Feb 10 2014
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/10/14, Daniel Murphy <yebbliesnospam gmail.com> wrote:
 Don't forget that __gshared static and static __gshared do different things!
wat.
Feb 10 2014
parent "Dicebot" <public dicebot.lv> writes:
On Monday, 10 February 2014 at 16:53:35 UTC, Andrej Mitrovic 
wrote:
 On 2/10/14, Daniel Murphy <yebbliesnospam gmail.com> wrote:
 Don't forget that __gshared static and static __gshared do 
 different things!
wat.
To be more specific: "WATWATWAT"
Feb 10 2014
prev sibling parent reply "Dejan Lekic" <dejan.lekic gmail.com> writes:
On Monday, 10 February 2014 at 14:15:58 UTC, Daniel Murphy wrote:
 "Andrej Mitrovic"  wrote in message 
 news:mailman.111.1392039607.21734.digitalmars-d puremagic.com...

 Yes you're right. I'm beginning to really dislike the 20 
 different
 meanings of "static". :)
Don't forget that __gshared static and static __gshared do different things!
Care to elaborate?
Feb 10 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Dejan Lekic"  wrote in message news:nvakemdpugwupoqctrtd forum.dlang.org...
 Don't forget that __gshared static and static __gshared do different 
 things!
Care to elaborate?
https://d.puremagic.com/issues/show_bug.cgi?id=4419
Feb 10 2014
parent reply "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Tuesday, 11 February 2014 at 03:43:35 UTC, Daniel Murphy wrote:
 "Dejan Lekic"  wrote in message 
 news:nvakemdpugwupoqctrtd forum.dlang.org...
 Don't forget that __gshared static and static __gshared do 
 different things!
Care to elaborate?
https://d.puremagic.com/issues/show_bug.cgi?id=4419
Ah, that thing. Yeah this whole issue is rather messy IMO.
Feb 11 2014
parent reply Jerry <jlquinn optonline.net> writes:
"Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:

 On Tuesday, 11 February 2014 at 03:43:35 UTC, Daniel Murphy wrote:
 "Dejan Lekic"  wrote in message news:nvakemdpugwupoqctrtd forum.dlang.org...
 Don't forget that __gshared static and static __gshared do > different
things! Care to elaborate?
https://d.puremagic.com/issues/show_bug.cgi?id=4419
Ah, that thing. Yeah this whole issue is rather messy IMO.
Looking at the bug, I see the compiler doesn't implement what the spec says. The spec says __gshared implies static. Is the messiness fixing the implementation to match the spec, or refining the spec to better define what should happen?
Feb 11 2014
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Jerry"  wrote in message news:87sirpbjdf.fsf optonline.net...

 Looking at the bug, I see the compiler doesn't implement what the spec
 says.  The spec says __gshared implies static.  Is the messiness fixing
 the implementation to match the spec, or refining the spec to better
 define what should happen?
It's just messy in the sense that it doesn't behave in a logical or useful way.
Feb 12 2014
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/9/14, luka8088 <luka8088 owave.net> wrote:
 dmd -release -inline -O -noboundscheck -unittest -run singleton.d

 Test time for LockSingleton: 901 msecs.
 Test time for SyncSingleton: 20.75 msecs.
 Test time for AtomicSingleton: 169 msecs.
 Test time for FunctionPointerSingleton: 7.5 msecs.
C:\dev\code\d_code>test_dmd
Test time for LockSingleton: 438 msecs.
Test time for SyncSingleton: 6.25 msecs.
Test time for AtomicSingleton: 8 msecs.
Test time for FunctionPointerSingleton: 5 msecs.

C:\dev\code\d_code>test_ldc
Test time for LockSingleton: 575.5 msecs.
Test time for SyncSingleton: 5 msecs.
Test time for AtomicSingleton: 3 msecs.
Test time for FunctionPointerSingleton: 5.25 msecs.

It seems it makes a tiny bit of difference for DMD, but LDC still generates better codegen for the atomic version.
Feb 10 2014
parent luka8088 <luka8088 owave.net> writes:
On 10.2.2014. 10:59, Andrej Mitrovic wrote:
 On 2/9/14, luka8088 <luka8088 owave.net> wrote:
 dmd -release -inline -O -noboundscheck -unittest -run singleton.d

 Test time for LockSingleton: 901 msecs.
 Test time for SyncSingleton: 20.75 msecs.
 Test time for AtomicSingleton: 169 msecs.
 Test time for FunctionPointerSingleton: 7.5 msecs.
C:\dev\code\d_code>test_dmd
Test time for LockSingleton: 438 msecs.
Test time for SyncSingleton: 6.25 msecs.
Test time for AtomicSingleton: 8 msecs.
Test time for FunctionPointerSingleton: 5 msecs.

C:\dev\code\d_code>test_ldc
Test time for LockSingleton: 575.5 msecs.
Test time for SyncSingleton: 5 msecs.
Test time for AtomicSingleton: 3 msecs.
Test time for FunctionPointerSingleton: 5.25 msecs.

It seems it makes a tiny bit of difference for DMD, but LDC still generates better codegen for the atomic version.
Could it be that TLS is slower in LLVM?
Feb 10 2014