digitalmars.D - Lets talk about fibers

"Liran Zvibel" <liran weka.io> writes:

Hi,

We discussed (not) moving fibers between threads at DConf last 
week, and later it was discussed in the announce group. I think 
this matter is important enough to get a thread of its own.

Software fibers/coroutines were created to make asynchronous 
programming using a Reactor (or another "event loop i/o 
scheduler") more seamless.

For those unaware of the Reactor pattern, I advise reading [ 
http://en.wikipedia.org/wiki/Reactor_pattern ; 
http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf ], 
and for some perspective on how other languages have addressed 
this I recommend watching Guido van Rossum's talk about asyncio 
and Python: https://www.youtube.com/watch?v=aurOB4qYuFM

The Reactor pattern is a long-standing, widely accepted way to 
achieve low-latency async IO operations, which became famous 
thanks to the Web and the C10k requirement/problem. Using the 
Reactor is the most efficient way to leverage current CPU 
architectures to perform lots of IO, for many reasons outside 
the scope of this post.
Another very important quality of a reactor-based approach is 
that since all event handlers just serialize on a single IO 
scheduler ("the reactor") on each thread, if designed correctly 
programmers don't have to think about concurrency or worry about 
race conditions.

Another thing to note: when using the reactor pattern you have to 
make sure that no event handler ever blocks. Since this is a 
non-preemptive model, once an event handler blocks the other 
event handlers cannot run, which starves them and the clients on 
the other side of the network.
Reactor implementations usually detect and report when an event 
handler takes too long to give up control (the threshold is 
application dependent, but should be in the usec range on 
current hw).

The downside of the reactor pattern is (or used to be) that the 
programmer has to manually keep the state/context of how far the 
event handler has progressed. Since each "logical" operation is 
composed of many I/O transactions (some NW protocol to keep 
track of, maybe accessing a networked DB for some data, 
reading/writing local/remote files, etc.), the reactor would 
also keep a context for each callback and IO event, and the 
programmer had to update the context and keep registering 
new event handlers manually for all extra I/O transactions, and 
in many cases change callback registrations as well.
This downside means that it's more difficult to program for a 
Reactor model, but since programmers don't have to think about 
races and concurrency issues (and then debug them...), in our 
experience it is still more efficient to program than 
general-purpose threads if you care about correctness/coherency.
One way to mitigate this complexity was the Proactor 
pattern -- implementing higher-level async IO services over the 
reactor, thus sparing the programmer a lot of the low-level 
context headaches.

Up until now I did not say anything about Fibers/coroutines.

What Fibers bring to the table is the ability to program within 
the reactor model without having to manually keep a context that 
is separate from the program logic, and without the requirement to 
manually re/register callbacks for different IO events.
D's Fibers allowed us to create an async IO library with support 
for network/file/disk operations and higher-level conditions 
(waiters, barriers, etc.) that allows the programmer to write code 
as if it runs in its own thread (almost: sometimes fibers are 
explicitly "spawned", i.e. added to the reactor, and 
fiber-conditions are slightly different from spawning and joining 
threads) without paying the huge correctness/coherence and 
performance penalties of the threading model.
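
For readers who have not used core.thread.Fiber, here is a minimal sketch of the idea, not our library. It assumes a toy round-robin loop in place of a real reactor: `runAll`, `demoInterleaving` and `makeHandler` are hypothetical names, and each `Fiber.yield()` stands in for "wait until the IO event fires". The handlers read as straight-line code, with no manual context struct and no callback re-registration.

```d
import core.thread : Fiber;

/// Runs every fiber on the calling thread until all have terminated.
/// A real reactor would resume a fiber only when its I/O event fires;
/// this hypothetical loop simply resumes them round-robin.
void runAll(Fiber[] fibers)
{
    bool anyAlive = true;
    while (anyAlive)
    {
        anyAlive = false;
        foreach (f; fibers)
        {
            if (f.state == Fiber.State.TERM)
                continue;
            f.call();            // resume until the next Fiber.yield() or return
            anyAlive = true;
        }
    }
}

/// Two "handlers" written as straight-line code. Returns the order in
/// which their steps were executed.
string[] demoInterleaving()
{
    string[] log;

    Fiber makeHandler(string name)
    {
        return new Fiber({
            log ~= name ~ ": request sent";
            Fiber.yield();       // stand-in for "wait for the reply"
            log ~= name ~ ": reply handled";
        });
    }

    runAll([makeHandler("A"), makeHandler("B")]);
    return log;
}
```

Calling `demoInterleaving()` from `main` or a unittest shows the two handlers interleaving at their yield points while everything stays on one thread.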

There are two main reasons why it does not make sense to move 
fibers between threads:

1. You'll start having concurrency issues. Let's assume we have a 
main fiber that received some request, and it spawns 3 fibers 
looking into different DBs to get some info and update an array 
with the data. The array will probably be on the stack of the 
first fiber. If fibers don't move between threads, there is 
nothing to worry about (as expected by the model). If you start 
moving fibers across threads you now have to start guarding this 
array to make sure it's still coherent.
This is a simple example, but it shows that you're 
"losing" one of the biggest selling points of the whole 
reactor-based model.
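
The scenario can be sketched roughly as follows, assuming everything runs on a single thread. `makeLookup` and `gather` are hypothetical names, the "DB lookup" is just a yield followed by a store, and the parent drives its children directly instead of going through a reactor.

```d
import core.thread : Fiber;

/// Hypothetical child task: "waits" once, then stores its answer in *dst.
Fiber makeLookup(int* dst, int value)
{
    return new Fiber({
        Fiber.yield();       // stand-in for the DB round trip
        *dst = value;        // plain store, no lock and no atomic
    });
}

/// Parent fiber: owns the array on its own stack and drives three children.
int[3] gather()
{
    int[3] copy;
    auto parent = new Fiber({
        int[3] results;      // lives on the parent fiber's stack
        Fiber[3] children;
        foreach (i; 0 .. 3)
            children[i] = makeLookup(&results[i], (i + 1) * 10);
        foreach (round; 0 .. 2)      // first pass: run to the yield; second: finish
            foreach (c; children)
                c.call();
        copy = results;      // hand the data out before the stack goes away
    });
    parent.call();
    return copy;
}
```

The unguarded writes through `dst` are only acceptable because no two of these fibers can ever run at the same time. If the children could be resumed on other threads, the same code would need synchronization, and the pointers into the parent's stack would become much harder to reason about.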

2. Fibers and reactor-based IO work well (read: make sense) 
when you have lots of concurrent, very 
small transactions (similar to the Web C10k problem or a storage 
machine). In this case, if one of the threads has more capacity 
than the rest, then the IO scheduler ("reactor") will just make 
sure to spawn new fibers accepting new transactions in that 
thread. If you don't have a situation where balancing can be done 
by placing new requests in the right place, then you probably 
should not use the reactor model, but a different one that suits 
your application better.
Currently we can spawn another reactor to take more load, but the 
load is balanced statically at a system-wide level. On previous 
projects we had several reactors running on different threads and 
providing very different functionality (with different handlers, 
naturally).
We never got to a situation where moving a fiber between threads 
made any sense.

As we see, there is nothing to gain and lots to lose by moving 
fibers between threads.

Now, if we want to make sure fibers are well supported in D there 
are several other things we should do:

1. Implement a good asyncIO library that supports fiber-based 
programming. I don't know Vibe.d very well (i.e. at all); maybe 
we (Weka.IO) can help review it and suggest ways to make it into 
a general async IO library (we have over 15 years of experience 
developing with the reactor model in many environments).

2. Add better compiler support. The one problem with fibers is 
that upon creation you have to know the stack size for that 
fiber. Different functions will create different stack depths. It 
is very convenient to use the stack to hold all objects (recall 
Walter's first day talk, for example), and it can be used as a very 
convenient way to "garbage collect" all resources added during 
the run of that fiber, but currently we don't leverage it to the 
max since we don't have a good way to know/limit the amount of 
memory used this way.
If the compiler were able to analyze stack usage by functions 
(recursively) and give us hints regarding the 
upper bounds of stack usage, we would be able to use the stack 
more aggressively and utilize memory much better.
Also -- I think such static analysis would be a big selling point 
for D for systems like ours.
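
To make the stack-size point concrete, here is a small sketch using the optional size argument of the core.thread.Fiber constructor. `depthSum` and `runWithStack` are hypothetical names, and the 256-byte buffer per frame is only there so that each call visibly consumes stack.

```d
import core.thread : Fiber;

/// Recursive helper whose frames each hold a 256-byte buffer, so the stack
/// it needs grows with `depth`. Intended for depth <= 255.
int depthSum(int depth)
{
    ubyte[256] scratch;
    scratch[0] = cast(ubyte) depth;
    return depth == 0 ? 0 : scratch[0] + depthSum(depth - 1);
}

/// The stack size is fixed when the fiber is created; it does not grow later.
int runWithStack(size_t stackBytes)
{
    int result;
    auto f = new Fiber({ result = depthSum(16); }, stackBytes);
    f.call();
    return result;
}
```

Today the caller of something like `runWithStack(64 * 1024)` has to guess a size that is large enough for the deepest call chain. That guess is exactly what the proposed static analysis would replace with a computed upper bound.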

I think now everything is written down, and we can move the 
discussion here.

Liran.

Jun 03 2015

"Joakim" <dlang joakim.fea.st> writes:

On Wednesday, 3 June 2015 at 18:34:34 UTC, Liran Zvibel wrote:
 There are two main reasons why it does not make sense to move 
 fibers between threads:

 1. You'll start having concurrency issues. Let's assume we have 
 a main fiber that received some request, and it spawns 3 fibers 
 looking into different DBs to get some info and update an array 
 with the data. The array will probably be on the stack of the 
 first fiber. If fibers don't move between threads, there is 
 nothing to worry about (as expected by the model). If you start 
 moving fibers across threads you have to start guarding this 
 array now, to make sure it's still coherent.
 This is a simple example, but basically shows that you're 
 "losing" one of the biggest selling points of the whole reactor 
 based model.

 2. Fibers and reactor based IO work well (read: make 
 sense) when you have a situation where you have lots of 
 concurrent very small transactions (similar to the Web C10k 
 problem or a storage machine). In this case, if one of the 
 threads has more capacity than the rest, then the IO scheduler 
 ("reactor") will just make sure to spawn new fibers accepting 
 new transactions in that thread. If you don't have a situation 
 where balancing can be done via placing new requests in the 
 right place, then probably you should not use the reactor 
 model, but a different one that suits your application better.
 Currently we can spawn another reactor to take more load, but 
 the load is balanced statically at a system-wide level. On 
 previous projects we had several reactors running on different 
 threads and providing very different functionality (with 
 different handlers, naturally).
 We never got to a situation where moving a fiber between threads 
 made any sense.

 As we see, there is nothing to gain and lots to lose by moving 
 fibers between threads.

Your entire argument seems based on fibers moving between threads
breaking your reactor IO model.  If there was an option to
disable fibers moving or if you had to explicitly ask for a fiber
to move, your argument is moot.

I have no dog in this fight, just pointing out that your argument
is very specific to your use.

Jun 03 2015

"Liran Zvibel" <liran weka.io> writes:

On Thursday, 4 June 2015 at 01:51:25 UTC, Joakim wrote:
 Your entire argument seems based on fibers moving between 
 threads
 breaking your reactor IO model.  If there was an option to
 disable fibers moving or if you had to explicitly ask for a 
 fiber
 to move, your argument is moot.

 I have no dog in this fight, just pointing out that your 
 argument
 is very specific to your use.

This is not "my" reactor IO model, this is the model that was 
popularized by ACE in the '90s (and since this is how I got to 
know it, this is what I call it), and later became the asyncio 
programming model.
This model was important enough for Guido van Rossum to spend a 
lot of his time adding it to Python, and Google created a whole 
programming language around it [and I can give more references to 
that model if you like].

My point is that moving fibers between threads is difficult to 
implement and makes the model WEAKER. So you work hard, and get 
less (or just never use that feature you worked hard on as it 
breaks the model).

The main problem with adding flexibility is that initially it 
always sounds like a "good idea". I just want to stress the point 
that in this case it's actually not such a good idea.

If you can come up with another programming model that leverages 
fibers (and is popular), and moving fibers between threads makes 
sense in that model, then I think the discussion should be how 
much stronger that other model is with fibers being able to move, 
and whether it's worth the effort.

Since I think you won't come up with a very good case for moving 
them between threads on that other popular programming model, and 
since it's difficult to implement, and since it already makes one 
popular programming model weaker -- I suggest not doing it.

Currently asyncio is supported by D (Vibe.d and Weka.IO are using 
it) well without this ability.

At the end of my post I suggested using the resources freed by 
not-moving-fibers differently and just endorsing the asyncio 
programming model rather than adding generic "flexibility" features.

Jun 04 2015

"Ola Fosheim Grøstad" writes:

On Thursday, 4 June 2015 at 07:24:48 UTC, Liran Zvibel wrote:
 Since I think you won't come up with a very good case for moving 
 them between threads on that other popular programming model,

INCOMING WORKLOAD ("__" denotes yield+delay):

a____aaaaaaa
         b____bbbbbb
         c____cccccccc
         d____dddddd
         e____eeeeeee

SCHEDULING WITHOUT MIGRATION:

CORE 1: aaaaaaaa
CORE 2: bcdef___bbbbbbccccccccddddddeeeeeee


SCHEDULING WITH MIGRATION:

CORE 1: aaaaaaaacccccccceeeeeee
CORE 2: bcdef___bbbbbbdddddd

And this isn't even a worst case scenario. Please note that it is 
common to start a task by looking up global caches first. So this 
is a common pattern:

1. look up caches
2. wait for response
3. process
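
To put numbers on the two schedules in the diagram, here is a small arithmetic sketch. The slice lengths are read off the diagram by counting characters (so treat them as approximate), the 3-unit idle gap on core 2 is entered as a slice of its own, and `busyUntil`, `makespan`, `withoutMigration` and `withMigration` are hypothetical helper names.

```d
/// Time at which a core becomes free: the sum of the slices placed on it
/// (idle gaps are entered as slices too).
int busyUntil(const int[] slices)
{
    int total = 0;
    foreach (s; slices)
        total += s;
    return total;
}

/// Time at which the last core finishes.
int makespan(const int[] core1, const int[] core2)
{
    immutable a = busyUntil(core1);
    immutable b = busyUntil(core2);
    return a > b ? a : b;
}

/// CORE 1: aaaaaaaa / CORE 2: bcdef, idle gap, then b, c, d, e in turn.
int withoutMigration()
{
    return makespan([8], [5, 3, 6, 8, 6, 7]);
}

/// CORE 1: a, then c and e migrate over / CORE 2: bcdef, idle gap, b, d.
int withMigration()
{
    return makespan([8, 8, 7], [5, 3, 6, 6]);
}
```

Under these assumptions the last task finishes at time 35 without migration and at time 23 with it, which is the gap the diagram is meant to show.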

Jun 04 2015

"Liran Zvibel" <liran weka.io> writes:

On Thursday, 4 June 2015 at 08:43:31 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 June 2015 at 07:24:48 UTC, Liran Zvibel wrote:
 Since I think you won't come up with a very good case for 
 moving them between threads on that other popular programming 
 model,

 INCOMING WORKLOAD ("__" denotes yield+delay):

 a____aaaaaaa
         b____bbbbbb
         c____cccccccc
         d____dddddd
         e____eeeeeee

 SCHEDULING WITHOUT MIGRATION:

 CORE 1: aaaaaaaa
 CORE 2: bcdef___bbbbbbccccccccddddddeeeeeee


 SCHEDULING WITH MIGRATION:

 CORE 1: aaaaaaaacccccccceeeeeee
 CORE 2: bcdef___bbbbbbdddddd

 And this isn't even a worst case scenario. Please note that it 
 is common to start a task by looking up global caches first. So 
 this is a common pattern:

 1. look up caches
 2. wait for response
 3. process

Fibers are good when you get tons of new work constantly.

If you just have a few things that run forever, you're most 
probably better off with threads.

It's true that you can misuse fibers and then complain that 
things don't work well for you, but I don't think that should be 
supported by the language.

If you assume that new jobs always come in (and then you schedule 
new jobs to the more-empty fibers), there is no need to balance 
old jobs (which will finish very soon anyway).

If you have a blocking operation it should not be in fibers 
anyway.
We have a deferToThread mechanism with a thread pool that waits 
for such functions (if we want to do something that takes some 
time, or use an external library).
Fibers should never ever block. If your fiber is blocking you're 
violating the model.
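
Our deferToThread is not public, so the following is only a rough sketch of the idea built on Phobos' std.parallelism, not our implementation. `slowSquare`, `deferSquare` and `demoDefer` are hypothetical names, and the polling loop in `demoDefer` stands in for a reactor that would run other fibers in the meantime.

```d
import core.thread : Fiber;
import std.parallelism : task, TaskPool;

/// Stand-in for a long-running or blocking call (hypothetical).
int slowSquare(int x)
{
    return x * x;
}

/// Hypothetical helper, to be called from inside a fiber: runs slowSquare
/// on a pool thread and yields the calling fiber until the result is ready.
int deferSquare(TaskPool pool, int x)
{
    auto t = task!slowSquare(x);
    pool.put(t);
    while (!t.done)
        Fiber.yield();       // give control back instead of blocking
    return t.yieldForce;     // the task has finished, so this returns at once
}

int demoDefer()
{
    auto pool = new TaskPool(1);     // one dedicated worker thread
    scope (exit) pool.finish();      // let the worker terminate afterwards

    int result;
    auto f = new Fiber({ result = deferSquare(pool, 7); });
    while (f.state != Fiber.State.TERM)
        f.call();            // a real reactor would resume other fibers here
    return result;
}
```

The fiber never blocks its own thread: the slow call runs on the pool thread, and the fiber only checks whether it is done each time it is resumed.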

Fibers aren't some magic to solve every CS problem possible. 
There is a defined class of problems that work well for fibers, 
and there fibers should be utilized (and even then with great 
discipline). If your problem is not one of these -- use another 
form of concurrency/parallelism. One of my main arguments against 
Go is "If your only tool is a hammer, then every problem looks 
like a nail" -- D should not go that route.

Looking at your example -- a good scheduler should have 
distributed a-e evenly across both cores to begin with. Then a 
good fibers programmer should yield() after each unit of work, so 
aaaaaaa won't be a valid state. Finally, the blocking code should 
have run outside the fibers io scheduler, and just have that 
fiber waiting in suspended mode until it's runnable again, 
allowing other fibers to execute.

Jun 04 2015

"Ola Fosheim Grøstad" writes:

On Thursday, 4 June 2015 at 13:42:41 UTC, Liran Zvibel wrote:
 If you assume that new jobs always come in (and then you 
 schedule new jobs to the more-empty fibers), there is no need 
 to balance old jobs (That will finish very soon anyway).

That assumes that the tasks don't do much work but just wait and 
wait and wait.


 If you have a blocking operation it should not be in fibers 
 anyways.
 We have a deferToThread mechanism with a thread pool that waits 
 for such functions (if we want to do something that takes some 
 time, or use external library).
 Fibers should never ever block. If your fiber is blocking 
 you're violating the model.

 Fibers aren't some magic to solve every CS problem possible.

Actually, co-routines have been basic concurrency building blocks 
since the 50s, and from a CS perspective the degree of 
parallelism is an implementation detail.

 Looking at your example -- a good scheduler should have 
 distributed a-e evenly across both cores to begin with.

Nah, because that would require an a priori estimate.

 Then a good fibers programmer should yield() after each unit of 
 work, so aaaaaaa won't be a valid state.

Won't work when you call external libraries. Here is a likely 
pattern for an image scaling service:

1. check cache
2. request data if not found
3. process, save in cache and return

1____________2____________33333333

You can't just break up workload 3, you would run out of memory.

Jun 04 2015

Ivan Timokhin <timokhin.iv gmail.com> writes:

On Thu, Jun 04, 2015 at 07:24:47AM +0000, Liran Zvibel wrote:
 If you can come up with another programming model that leverages
 fibers (and is popular), and moving fibers between threads makes
 sense in that model, then I think the discussion should be how
 stronger that other model is with fibers being able to move, and
 whether it's worth the effort.

This might be relevant:
https://channel9.msdn.com/Events/GoingNative/2013/Bringing-await-to-Cpp

Specifically slide 12 (~12:30 in the video), where he discusses implementation.

Jun 04 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/3/15 9:51 PM, Joakim wrote:

 Your entire argument seems based on fibers moving between threads
 breaking your reactor IO model.  If there was an option to
 disable fibers moving or if you had to explicitly ask for a fiber
 to move, your argument is moot.

 I have no dog in this fight, just pointing out that your argument
 is very specific to your use.

I plead complete ignorance and inexperience with fibers and thread 
scheduling.

But I think the sanest approach here is to NOT support moving fibers, 
and then add support if it becomes necessary. We can make the scheduler 
something that's parameterized, or hell, just edit your own runtime if 
you need it!

It may also be that fibers that move can't be statically checked to see 
if they will break on moving. That may simply just be on you, like casting.

I think for the most part, the safest default is to have a fiber 
scheduler that cannot possibly create races. Let's build from there.

-Steve

Jun 04 2015

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, 4 June 2015 at 13:16:48 UTC, Steven Schveighoffer 
wrote:
 On 6/3/15 9:51 PM, Joakim wrote:

 Your entire argument seems based on fibers moving between 
 threads
 breaking your reactor IO model.  If there was an option to
 disable fibers moving or if you had to explicitly ask for a 
 fiber
 to move, your argument is moot.

 I have no dog in this fight, just pointing out that your 
 argument
 is very specific to your use.

 I plead complete ignorance and inexperience with fibers and 
 thread scheduling.

 But I think the sanest approach here is to NOT support moving 
 fibers, and then add support if it becomes necessary. We can 
 make the scheduler something that's parameterized, or hell, 
 just edit your own runtime if you need it!

 It may also be that fibers that move can't be statically 
 checked to see if they will break on moving. That may simply 
 just be on you, like casting.

 I think for the most part, the safest default is to have a 
 fiber scheduler that cannot possibly create races. Let's build 
 from there.

One thing that needs to be considered that deadalnix pointed out 
at dconf is that we _do_ have shared(Fiber), and we have to deal 
with that in some manner, even if we don't want to support moving 
fibers across threads (even if that simply means disallowing 
shared(Fiber)).

- Jonathan M Davis

Jun 04 2015

"Marc Schütz" <schuetzm gmx.net> writes:

I mostly agree with what you wrote, but I'd like to point out 
that it's probably safe to move some kinds of fibers across 
threads:

If the fiber's main function is pure and its parameters have no 
mutable indirection (i.e. if the function is strongly pure), 
there should be no way to get data races.

Therefore I believe we could theoretically support moving such 
fibers. But currently I see no way most fibers can be made 
pure; after all, you want to do IO in them. Of course, we could 
forgo the purity requirement, but then the compiler can no 
longer support us.

Jun 04 2015

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 03-Jun-2015 21:34, Liran Zvibel wrote:
 Hi,

[snip]

 There are two main reasons why it does not make sense to move fibers
 between threads:

For me, the language being TLS by default is enough to not even try 
this madness. If we allow moves, a typical fiber will see different 
"globals" depending on where it is scheduled next.

For instance, if a thread local connection is used (inside of some pool 
presumably) then:

Socket socket;

first_part = socket.read(...); // assume this yields
second_part = socket.read(...); // then this may use different socket
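
For readers less familiar with D's defaults, here is a minimal sketch of what "TLS by default" means, using plain threads rather than fibers. `counter`, `sharedCounter` and `observe` are hypothetical names chosen for the illustration.

```d
import core.thread : Thread;

int counter;                 // module-level: one copy per thread in D
__gshared int sharedCounter; // opt-out: a single copy for the whole process

/// Returns what the calling thread sees after another thread wrote to both.
int[2] observe()
{
    counter = 1;
    sharedCounter = 1;
    auto t = new Thread({
        counter = 42;        // writes the new thread's own copy
        sharedCounter = 42;  // writes the one process-wide variable
    });
    t.start();
    t.join();
    return [counter, sharedCounter];
}
```

The calling thread still sees its own `counter` untouched, while `sharedCounter` reflects the other thread's write. A fiber that yields on one thread and resumes on another would be switched between such per-thread copies in the same way.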



-- 
Dmitry Olshansky

Jun 04 2015

Dan Olson <gorox comcast.net> writes:

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

 On 03-Jun-2015 21:34, Liran Zvibel wrote:
 Hi,

 [snip]

 There are two main reasons why it does not make sense to move fibers
 between threads:

 For me language being TLS by default is enough to not even try this
 madness. If we allow moves a typical fiber will see different
 "globals" depending on where it is scheduled next.

Opposite problem too, with LLVM's TLS optimizations, the Fiber may keep
accessing same "global" even when yield() resumes on a different thread.

int someTls;      // optimizer caches address 

    auto fib = new Fiber({

        for (;;)
        {
            printf("%d fiber before yield\n", someTls);
            ++someTls;       // thread A's var
            Fiber.yield();
            ++someTls;       // resumed thread B, but still A's var
            printf("%d fiber after yield\n", someTls);
        }
    });

Jun 04 2015

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Wednesday, 3 June 2015 at 18:34:34 UTC, Liran Zvibel wrote:
 As we see, there is nothing to gain and lots to lose by moving 
 fibers between threads.

Given that it sounds like LLVM _can't_ implement moving fibers 
(or if it can, it'll really hurt performance), I think that we 
need a really compelling reason to allow it. And I haven't heard 
one from anyone thus far.

Initially, at dconf, Walter asserted that we needed to make 
fibers moveable across threads, but I haven't really heard anyone 
give a reason why we need to. deadalnix talked about load 
balancing that way, but you gave good reasons as to why that 
didn't make sense, and that argument is the closest that I've 
seen to a reason why it would make sense to move fibers across 
threads.

Now, like Steven, I've never used a fiber in my life (I really 
should look into them one of these days), so I'm ill-suited for 
making a decision on this, but it sounds to me like we should 
start by having it be illegal to move fibers across threads and 
then add the ability later if someone comes up with a good enough 
reason. Certainly, it sounds questionable whether it even _can_ be 
implemented, and costly if it can.

Another approach would be to make it so that shared(Fiber) could 
be moved across threads but that Fiber can't be (or at least, 
it's undefined behavior if you do, since the compiler will assume 
that you won't), and if the 3 major backends can all support 
moving fibers across threads (even in an inefficient fashion), 
then we can just implement that support for shared(Fiber) and say 
that folks are free to shoot themselves in the foot using that if 
they so desire and let Fiber be more restrictive and not have it 
take the performance hit incurred by allowing fibers to be passed 
across threads.

But if LLVM really can't support moving fibers across threads, 
then I think that the clear answer is that we shouldn't allow it 
at all (in which case, shared(Fiber) should probably be outright 
disallowed).

- Jonathan M Davis

Jun 04 2015

"Ola Fosheim Grøstad" writes:

On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
 anyone give a reason why we need to. deadalnix talked about 
 load balancing that way, but you gave good reasons as to why 
 that didn't make sense,

What good reasons?

By the time you get a response from your shared memcache or 
database, x86 cache levels 1 and possibly 2 are cold. And cache 
level 3 is shared, so there is no cache penalty for switching 
cores. Add to this that two-and-two cores share primary caches so 
if you don't pair tasks that address the same memory you lose up 
to 10-20% performance in addition to unused capacity and 
increased latency. Smart scheduling matters, both at the OS level 
and at the application level. That's not a controversial 
statement (only in these forums…)!

The only good reason for not switching is that you lack 
resources/know-how. But then you probably should not make it a 
language feature in the first place...?

There is no reason to pretend that synthetic performance 
benchmarks don't carry weight when people pick a language for 
production. That's just wishful thinking.

Jun 05 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/5/15 7:29 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
 anyone give a reason why we need to. deadalnix talked about load
 balancing that way, but you gave good reasons as to why that didn't
 make sense,

 What good reasons?

 By the time you get response from your shared memcache or database the
 x86 cache level 1 and possibly 2 is cold. And cache level 3 is shared,
 so there is no cache penalty for switching cores. Add to this that
 two-and-two cores share primary caches so if you don't pair tasks that
 address the same memory you lose up to 10-20% performance in addition
 to unused capacity and increased latency.

I think I'll go with Liran's experience over your hypothetical 
anecdotes. You seem to have a lot of academic knowledge, but I'd rather 
see what actually happens. If you have that data, please share.

-Steve

Jun 05 2015

"Ola Fosheim Grøstad" writes:

On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer 
wrote:
 I think I'll go with Liran's experience over your hypothetical 
 anecdotes. You seem to have a lot of academic knowledge, but 
 I'd rather see what actually happens. If you have that data, 
 please share.

There is absolutely no reason to go personal. I address weak 
arguments when I see them. Liran claimed there were no benefits 
to migrating fibers. That's not true. He is speaking for his 
particular use case, that is fine. It is easy to create a 
benchmark where locking fibers to a thread is beneficial. But it 
is completely orthogonal to my most likely D use case which is in 
low-latency web-services.

There will be no data that benefits D until D is making itself 
look like a serious contender and does well in aggressive 
external benchmarking. You don't get the luxury of choosing what 
workload D's performance is benchmarked with!

D is an underdog compared to C++/Rust/Go. That means you need to 
get that 10-20% performance edge in benchmarks to make D look 
attractive.

If you want D to succeed you need to figure out what is D's main 
selling point and make it a compiler-based feature. If it is a 
library only solution, then any language can steal your thunder...

Jun 05 2015

"Chris" <wendlec tcd.ie> writes:

On Friday, 5 June 2015 at 14:17:35 UTC, Ola Fosheim Grøstad wrote:
 On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer 
 wrote:
 I think I'll go with Liran's experience over your hypothetical 
 anecdotes. You seem to have a lot of academic knowledge, but 
 I'd rather see what actually happens. If you have that data, 
 please share.

 There is absolutely no reason to go personal. I address weak 
 arguments when I see them. Liran claimed there were no benefits 
 to migrating fibers. That's not true. He is speaking for his 
 particular use case, that is fine. It is easy to create a 
 benchmark where locking fibers to a thread is beneficial. But 
 it is completely orthogonal to my most likely D use case which 
 is in low-latency web-services.

 There will be no data that benefits D until D is making 
 itself look like a serious contender and does well in 
 aggressive external benchmarking. You don't get the luxury of 
 choosing what workload D's performance is benchmarked with!

 D is an underdog compared to C++/Rust/Go. That means you need 
 to get that 10-20% performance edge in benchmarks to make D 
 look attractive.

I agree, but I dare doubt that a slight performance edge will 
make the difference. There are loads of factors (knowledge base, 
infrastructure, complacency, C++-Guruism, marketing etc.) why D 
is an underdog.

 If you want D to succeed you need to figure out what is D's 
 main selling point and make it a compiler-based feature. If it 
 is a library only solution, then any language can steal your 
 thunder...

The "problem" D has is that it has loads of selling points. Rust 
and Go were designed with very specific goals in mind, thus it's 
easy to sell them "You want X? We have X!". D has been developed 
over the years by a community not a committee. D is more like 
"You want X? Yeah, we have X, actually a slightly improved 
version of X we call it EX, and Y and Z on top of that. And A B C 
too! And templates!" - "Sorry, man! Too complicated for me! Can I 
just have a for-loop, please? Milk, no sugar, thanks." I know, as 
usual I simplify things and exaggerate! He he he. But programming 
languages are like everything else, only because something is 
good doesn't mean that people will buy it.

As regards compiler-based features, as soon as features are 
compiler-based people will complain "Why is it built-in? That 
should be handled by a library! I want more freedom!" I know for 
sure.

Jun 05 2015

"Ola Fosheim Grøstad" writes:

On Friday, 5 June 2015 at 14:51:05 UTC, Chris wrote:
 I agree, but I dare doubt that a slight performance edge will 
 make the difference. There are load of factors (knowledge base, 
 infrastructure, complacency, C++-Guruism, marketing etc.) why D 
 is an underdog.

But everybody loves the underdog when it catches up to the pack 
and beats the pack on the finish line. ;^)

I now follow Pony because of this self-provided benchmark:

http://ponylang.org/benchmarks_all.pdf

They are communicating a focus for a domain, a good understanding 
of their area, and it makes me want to give it a spin even at 
this early stage where I obviously can't actually use it.

I am not saying Pony is good, but it makes a good case for itself 
IMO.

 no sugar, thanks." I know, as usual I simplify things and 
 exaggerate! He he he. But programming languages are like 
 everything else, only because something is good doesn't mean 
 that people will buy it.

Sure, but it is also important to make people take notice. People 
take notice of benchmark leaders. And too often benchmarks 
measure throughput while latency is just as important.

End users don't notice peak throughput (which is measurable as a 
blip on the cloud server instance-count logs), they notice 
reduced latency. So to me latency is the most important aspect of 
a web-service (+ programmer productivity).

I don't find Go exciting, but they show concern for latency 
(concurrent GC etc). Communicating that concern is good, even 
before they reach whatever goals they have.

 As regard compiler-based features, as soon as features are 
 compiler-based people will complain "Why is it built-in? That 
 should be handled by a library! I want more freedom!" I know 
 for sure.

Heh, not if it is getting you an edge. But if it is a second-class 
citizen addition, then yes, I agree.

Cheers!

Jun 05 2015

"Chris" <wendlec tcd.ie> writes:

On Friday, 5 June 2015 at 17:28:39 UTC, Ola Fosheim Grøstad wrote:
 On Friday, 5 June 2015 at 14:51:05 UTC, Chris wrote:
 I agree, but I dare doubt that a slight performance edge will 
 make the difference. There are load of factors (knowledge 
 base, infrastructure, complacency, C++-Guruism, marketing 
 etc.) why D is an underdog.

 But everybody loves the underdog when it catches up to the pack 
 and beats the pack on the finish line. ;^)

 I now follow Pony because of this self-provided benchmark:

 http://ponylang.org/benchmarks_all.pdf

 They are communicating a focus for a domain, a good 
 understanding of their area, and it makes me want to give it a 
 spin even at this early stage where I obviously can't actually 
 use it.

 I am not saying Pony is good, but it makes a good case for 
 itself IMO.

 no sugar, thanks." I know, as usual I simplify things and 
 exaggerate! He he he. But programming languages are like 
 everything else, only because something is good doesn't mean 
 that people will buy it.

 Sure, but it is also important to make people take notice. 
 People take notice of benchmark leaders. And too often 
 benchmarks measure throughput while latency is just as 
 important.

 End user don't notice peak throughput (which is measurable as a 
 bleep on the cloud server instance-count logs), they notice 
 reduced latency. So to me latency is the most important aspect 
 of a web-service (+ programmer productivity).

 I don't find Go exciting, but they show concern for latency 
 (concurrent GC etc). Communicating that concern is good, even 
 before they reach whatever goals they have.

 As regard compiler-based features, as soon as features are 
 compiler-based people will complain "Why is it built-in? That 
 should be handled by a library! I want more freedom!" I know 
 for sure.

 Heh, not if it is getting you an edge, but if it is a second 
 citizen addition. Yes, then I agree.

 Cheers!

Thanks for showing me Pony. Languages like Nim and Pony keep 
popping up which shows a) how important native compilation is and 
b) that there are still loads of issues in standard languages 

usable, and new languages often re-invent D.

Jun 05 2015

"Paulo Pinto" <pjmlp progtools.org> writes:

On Friday, 5 June 2015 at 18:25:26 UTC, Chris wrote:
 On Friday, 5 June 2015 at 17:28:39 UTC, Ola Fosheim Grøstad 
 wrote:
 On Friday, 5 June 2015 at 14:51:05 UTC, Chris wrote:
 I agree, but I dare doubt that a slight performance edge will 
 make the difference. There are load of factors (knowledge 
 base, infrastructure, complacency, C++-Guruism, marketing 
 etc.) why D is an underdog.

 But everybody loves the underdog when it catches up to the 
 pack and beats the pack on the finish line. ;^)

 I now follow Pony because of this self-provided benchmark:

 http://ponylang.org/benchmarks_all.pdf

 They are communicating a focus for a domain, a good 
 understanding of their area, and it makes me want to give it a 
 spin even at this early stage where I obviously can't actually 
 use it.

 I am not saying Pony is good, but it makes a good case for 
 itself IMO.

 no sugar, thanks." I know, as usual I simplify things and 
 exaggerate! He he he. But programming languages are like 
 everything else, only because something is good doesn't mean 
 that people will buy it.

 Sure, but it is also important to make people take notice. 
 People take notice of benchmark leaders. And too often 
 benchmarks measure throughput while latency is just as 
 important.

 End user don't notice peak throughput (which is measurable as 
 a bleep on the cloud server instance-count logs), they notice 
 reduced latency. So to me latency is the most important aspect 
 of a web-service (+ programmer productivity).

 I don't find Go exciting, but they show concern for latency 
 (concurrent GC etc). Communicating that concern is good, even 
 before they reach whatever goals they have.

 As regard compiler-based features, as soon as features are 
 compiler-based people will complain "Why is it built-in? That 
 should be handled by a library! I want more freedom!" I know 
 for sure.

 Heh, not if it is getting you an edge, but if it is a second 
 citizen addition. Yes, then I agree.

 Cheers!

 Thanks for showing me Pony. Languages like Nim and Pony keep 
 popping up which shows a) how important native compilation is 
 and [...]

Which is why after all those years, the OpenJDK will eventually 
support AOT compilation to native code for Java 10 with some work 
being done in JEP 220[0], and .NET does AOT native code on 
Windows Phone 8 (MDIL), with static compilation with Visual C++ 
backend coming with .NET Native.

And Android also went native with the Dalvik re-write.

The best approach is anyway to have a JIT/AOT capable toolchain 
and use them according to the deployment target.

[0]Which means Oracle finally accepted why almost all commercial 
JVM vendors do offer such a feature. I read somewhere that JIT 
only was a kind of Sun political issue.

Jun 08 2015

maik klein <maikklein googlemail.com> writes:

Here is an interesting talk from Naughty Dog

http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

They move Fibers between threads.

A rough overview:

You create task A that depends on task B. The task is submitted 
as a fiber and executed by a thread. Now task A has to wait for 
task B to finish so you hold the fiber and put it into a queue, 
you also create an atomic counter that tracks all dependencies, 
once the counter reaches 0 you know that all dependencies have 
finished.

Now you put task A into a queue and execute a different task. 
Once a thread completes a task it looks into the queue and checks 
if there is one task that has a counter of 0, which means it can 
continue to execute that task.

Now move that fiber/task onto a free thread and you can continue 
to execute that fiber.
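
To make the scheme above concrete, here is a rough sketch in Python rather than D. Everything in it is an assumption for illustration: generators stand in for fibers, a lock-protected `Counter` stands in for the atomic counter, and `run_jobs`, `task_a` and `task_b` are made-up names. It is not how the Naughty Dog engine is implemented, only the same idea in miniature: a suspended fiber goes back into a shared queue and whichever thread is free resumes it.

```python
import queue
import threading
import time


class Counter:
    """Dependency counter; a lock stands in for a hardware atomic."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def decrement(self):
        with self._lock:
            self._value -= 1

    def is_zero(self):
        with self._lock:
            return self._value == 0


def run_jobs(fibers, num_workers=4):
    """Run generator "fibers" on worker threads until all have finished.

    A fiber yields a Counter to suspend itself until that counter is zero.
    Any worker may resume it afterwards, so a fiber can change threads.
    """
    pending = queue.Queue()            # shared queue of (fiber, counter or None)
    for fiber in fibers:
        pending.put((fiber, None))
    remaining = Counter(len(fibers))   # fibers that have not finished yet

    def worker():
        while not remaining.is_zero():
            try:
                fiber, counter = pending.get(timeout=0.01)
            except queue.Empty:
                continue
            if counter is not None and not counter.is_zero():
                pending.put((fiber, counter))   # dependencies still running
                time.sleep(0.001)
                continue
            try:
                waiting_on = next(fiber)        # resume until the next yield
            except StopIteration:
                remaining.decrement()           # fiber ran to completion
            else:
                pending.put((fiber, waiting_on))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


log = []                # list.append is thread-safe in CPython
b_done = Counter(1)     # task A depends on exactly one task: B


def task_b():
    log.append("B ran")
    b_done.decrement()
    return
    yield               # unreachable; only makes this function a generator


def task_a():
    log.append("A started")
    yield b_done        # suspend until B has finished
    log.append("A resumed")


run_jobs([task_a(), task_b()])
```

The order of "A started" and "B ran" depends on thread timing, but "A resumed" can only be logged after B has decremented the counter.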

What is the current state of fibers in D? I have asked this 
question on SO 
https://stackoverflow.com/questions/36663720/how-to-pass-a-fiber-to-a-thread

Apr 16 2016

Dicebot <public dicebot.lv> writes:

On 04/16/2016 03:45 PM, maik klein wrote:
 Here is an interesting talk from Naughty Dog
 
 http://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine
 
 They move Fibers between threads.
 
 A rough overview:
 
 You create task A that depends on task B. The task is submitted as a
 fiber and executed by a thread. Now task A has to wait for task B to
 finish so you hold the fiber and put it into a queue, you also create an
 atomic counter that tracks all dependencies, once the counter reaches 0
 you know that all dependencies have finished.
 
 Now you put task A into a queue and execute a different task. Once a
 thread completes a task it looks into the queue and checks if there is
 one task that has a counter of 0, which means it can continue to execute
 that task.
 
 Now move that fiber/task onto a free thread and you can continue to
 execute that fiber.
 
 What is the current state of fibers in D? I have asked this question on
 SO
 https://stackoverflow.com/questions/36663720/how-to-pass-a-fiber-to-a-thread

Such design is neither needed for good concurrency, nor actually
helpful. Under heavy load (and that is the only case that is worth
optimizing for) there will be so many fibers that thread-local fiber
queues will always have enough work to keep them busy.

At the same time moving fibers between threads is harmful for plain
performance - it screws the cache and makes it impossible to share
thread-local storage between fibers on the same worker thread.

Simply picking a worker thread + worker fiber when a task is assigned and
sticking to it until finished should work well enough. It is also
important to note though that "fiber" is not the same as "task". The former
is an execution context primitive, the latter is a scheduling abstraction. In
fact, heavy load systems are likely to have many more tasks than fibers
at certain spike points.
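
As an illustration only, the arrangement described above might look like the following in Python, with generators standing in for fibers. The names `worker_fiber`, `handle_request` and `run_worker` are hypothetical. One worker thread owns a fixed set of fibers and a thread-local task queue; tasks outnumber fibers, and every task stays on the fiber that picked it up.

```python
from collections import deque


def handle_request(request_id):
    """Build a task: a piece of work that may yield at (pretend) I/O points."""
    def task(fiber_name, log):
        log.append((fiber_name, request_id, "start"))
        yield                                    # pretend to wait on I/O
        log.append((fiber_name, request_id, "finish"))
    return task


def worker_fiber(name, task_queue, log):
    """A reusable fiber: keeps taking tasks from its own thread's queue."""
    while task_queue:
        task = task_queue.popleft()
        yield from task(name, log)               # the task runs on this fiber


def run_worker(num_fibers, tasks):
    """One worker thread's loop: round-robin over a fixed set of fibers."""
    task_queue = deque(tasks)                    # thread-local, so no locking
    log = []
    fibers = deque(
        worker_fiber(f"fiber{i}", task_queue, log) for i in range(num_fibers)
    )
    while fibers:
        fiber = fibers.popleft()
        try:
            next(fiber)                          # run until it yields
        except StopIteration:
            continue                             # queue drained, fiber retires
        fibers.append(fiber)
    return log


log = run_worker(num_fibers=2, tasks=[handle_request(i) for i in range(5)])
```

Five tasks are served by two fibers here; a real server would run one such loop per worker thread.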

Apr 16 2016

Suliman <evermind live.ru> writes:

 Simply picking a worker thread + worker fiber when task is 
 assigned and sticking to it until finished should work good 
 enough. It is also important to note though that "fiber" is not 
 the same as "task". Former is execution context primitive, 
 latter is scheduling abstraction. In fact, heavy load systems 
 are likely to have many more tasks than fibers at certain spike 
 points.

Could you explain the difference between fibers and tasks? I have read 
a lot, but still can't understand the difference.

Jan 08 2017

Suliman <evermind live.ru> writes:

"The type of concurrency used when logical threads are created is 
determined by the Scheduler selected at initialization time. The 
default behavior is currently to create a new kernel thread per 
call to spawn, but other schedulers are available that multiplex 
fibers across the main thread or use some combination of the two 
approaches" (c) dlang docs

Am I right in understanding that `concurrency` is just a wrapper that 
hides the implementation of tasks and fibers? So the programmer can work 
with threads like with fibers and vice versa?

If yes, does it mean that spawned tasks are scheduled not by the system 
scheduler but by a DRuntime scheduler (or whatever it is called), 
and all of them work in user space?

Jan 08 2017

Chris Wright <dhasenan gmail.com> writes:

On Sun, 08 Jan 2017 09:18:19 +0000, Suliman wrote:

 Simply picking a worker thread + worker fiber when task is assigned and
 sticking to it until finished should work good enough. It is also
 important to note though that "fiber" is not the same as "task". Former
 is execution context primitive, latter is scheduling abstraction. In
 fact, heavy load systems are likely to have many more tasks than fibers
 at certain spike points.

 
 Could you explain difference between fibers and tasks. I read a lot, but
 still can't understand the difference.

A task is a unit of work to be scheduled.

A fiber is a concurrency mechanism supporting multiple independent 
stacks, like threads, that you can switch between. Unlike threads, a 
fiber continues to execute until it voluntarily yields execution.

You might have a task: send a registration message to a user who just 
registered. That gets scheduled onto a fiber. Your email sending stuff is 
vibe.d all the way down, and also you have to make some database queries. 
The IO involved causes the fiber that the task was scheduled on to yield 
execution several times. Finally, the task finishes, and the fiber can be 
destroyed -- or reused for another task.
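
A small sketch of that example, in Python since generators make the voluntary yield easy to see. The function names and the users "alice" and "bob" are made up for illustration, and no real database or mail library is involved. The task is the function body; the generator object plays the role of the fiber it was scheduled on.

```python
from collections import deque


def send_registration_email(user, trace):
    """A task: the work to do. Each `yield` marks a point where I/O would block."""
    trace.append(f"{user}: query database")
    yield                        # the fiber yields while the query is in flight
    trace.append(f"{user}: send email")
    yield                        # ... and again while the mail is being sent
    trace.append(f"{user}: done")


def run_on_fibers(fibers):
    """Round-robin scheduler: a fiber keeps the CPU until it yields by itself."""
    ready = deque(fibers)
    while ready:
        fiber = ready.popleft()
        try:
            next(fiber)          # run until the next voluntary yield
        except StopIteration:
            continue             # task finished; the fiber can be dropped
        ready.append(fiber)


trace = []
run_on_fibers([
    send_registration_email("alice", trace),
    send_registration_email("bob", trace),
])
```

The two tasks interleave only at their yield points; nothing preempts a fiber in between.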

Jan 08 2017

Dicebot <public dicebot.lv> writes:

On Sunday, 8 January 2017 at 09:18:19 UTC, Suliman wrote:
 Simply picking a worker thread + worker fiber when task is 
 assigned and sticking to it until finished should work good 
 enough. It is also important to note though that "fiber" is 
 not the same as "task". Former is execution context primitive, 
 latter is scheduling abstraction. In fact, heavy load systems 
 are likely to have many more tasks than fibers at certain 
 spike points.

 Could you explain difference between fibers and tasks. I read a 
 lot, but still can't understand the difference.

A fiber is a context-switching primitive very similar to a thread. It 
differs from a thread in that it is completely invisible to the 
operating system and only switches context when explicitly told 
to in code. But it can still execute arbitrary code. When we talk 
about fibers in D, we usually mean 
https://dlang.org/library/core/thread/fiber.html

A task is an abstraction over some specific piece of work to do. The 
simplest task one can think of is just a function to execute. 
Other details may vary a lot - different languages and libraries 
implement tasks differently, and the D standard library doesn't 
define it at all. The most widespread task definition in D comes from 
vibe.d - http://vibed.org/api/vibe.core.task/Task

To summarize - fiber defines HOW to execute code but doesn't care 
which code to execute. Task defines WHAT code to execute but 
normally has no assumptions over how exactly it gets run.

Jan 08 2017

Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Sun, 2017-01-08 at 09:18 +0000, Suliman via Digitalmars-d wrote:
 Simply picking a worker thread + worker fiber when task is
 assigned and sticking to it until finished should work good
 enough. It is also important to note though that "fiber" is not
 the same as "task". Former is execution context primitive,
 latter is scheduling abstraction. In fact, heavy load systems
 are likely to have many more tasks than fibers at certain spike
 points.

 Could you explain difference between fibers and tasks. I read a
 lot, but still can't understand the difference.

A fibre is what a thread used to be before kernels supported threads
directly. Having provided that historical backdrop, that seems sadly
missing from the entire Web, the current status is roughly described
by:

https://en.wikipedia.org/wiki/Fiber_(computer_science)

http://stackoverflow.com/questions/796217/what-is-the-difference-between-a-thread-and-a-fiber

Tasks are things that can be scheduled using threads or fibres. It's
all down to thread pools and kernel processes. Which probably doesn't
help per se, but:

http://docs.paralleluniverse.co/quasar/

Quasar, GPars, std.parallelism, Java Fork/Join all harness all these
ideas.

In the end as a programmer you should be using actors, agents,
dataflow, data parallelism or some similar high level model. Anything
lower level and, to be honest, you are doing it wrong.


-- 
Russel.

Jan 08 2017

Ola Fosheim Grøstad writes:

On Sunday, 8 January 2017 at 09:18:19 UTC, Suliman wrote:
 Simply picking a worker thread + worker fiber when task is 
 assigned and sticking to it until finished should work good 
 enough. It is also important to note though that "fiber" is 
 not the same as "task". Former is execution context primitive, 
 latter is scheduling abstraction. In fact, heavy load systems 
 are likely to have many more tasks than fibers at certain 
 spike points.

 Could you explain difference between fibers and tasks. I read a 
 lot, but still can't understand the difference.

The meaning of the word "task" is contextual:

https://en.wikipedia.org/wiki/Task_(computing)

So, yes, it is a confusing term that one should avoid using 
without defining it.

Ola.

Jan 23 2017

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/5/15 10:17 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Friday, 5 June 2015 at 13:20:27 UTC, Steven Schveighoffer wrote:
 I think I'll go with Liran's experience over your hypothetical
 anecdotes. You seem to have a lot of academic knowledge, but I'd
 rather see what actually happens. If you have that data, please share.

 There is absolutely no reason to go personal.

I didn't, actually. Your arguments seem well crafted and persuasive, but 
I've seen so many arguments based on theory that don't always pan out. I 
like to see hard data. That's what Liran's experience provides. Perhaps 
you have it too? Please share if you do.

-Steve

Jun 05 2015

"Ola Fosheim Grøstad" writes:

On Friday, 5 June 2015 at 19:21:32 UTC, Steven Schveighoffer 
wrote:
 I didn't, actually. Your arguments seem well crafted and 
 persuasive, but I've seen so many arguments based on theory 
 that don't always pan out. I like to see hard data. That's what 
 Liran's experience provides. Perhaps you have it too? Please 
 share if you do.

I have absolutely no idea what you are talking about. Experience 
is data? Huh?

If you talk about benchmarking, you do this by defining a 
baseline to measure up against and run a wide set of demanding 
workloads with increasing load until the system performance 
collapses, then you analyze the outcome for each workload. One 
usually picks a best-of-breed "competitor" as the baseline. E.g. 
Nginx gained traction by benchmarking against Apache.
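
A toy sketch of that procedure, in Python. The "system under test" here is not a real server: `mm1_latency` is a made-up stand-in that uses the textbook M/M/1 mean response time 1/(service rate - load), and the service rates and latency limit are arbitrary numbers chosen for illustration. In a real benchmark `measure` would drive actual load against the baseline and the candidate.

```python
def mm1_latency(service_rate):
    """Toy stand-in for a system under test: mean M/M/1 response time."""
    def measure(load):
        if load >= service_rate:
            return float("inf")      # saturated: the queue grows without bound
        return 1.0 / (service_rate - load)
    return measure


def collapse_point(measure, latency_limit, start=100, step=100, max_load=100_000):
    """Increase the offered load until the measured latency exceeds the limit.

    Returns the last load that stayed within the limit, or None if even the
    starting load was too much.
    """
    last_ok = None
    load = start
    while load <= max_load:
        if measure(load) > latency_limit:
            break
        last_ok = load
        load += step
    return last_ok


# Same ramp, same limit, two systems: the baseline and the candidate.
baseline = collapse_point(mm1_latency(service_rate=1000), latency_limit=0.02)
candidate = collapse_point(mm1_latency(service_rate=1200), latency_limit=0.02)
```

The point of the shape is that both systems face an identical ramp, so the comparison is between collapse points rather than between single measurements.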

If you are talking about multi-threading/fibers/event-based 
systems you read technical optimization manuals from CPU vendors 
for each processor generation, they provide what you need to know 
when designing scheduling heuristics. The problem is how to give 
the scheduler meta information. In event systems that is 
explicit, in D you could provide information through "yield" 
either by profiling, analysis, or explicitly... but getting to event 
based performance isn't all that easy...

Jun 06 2015

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 05-Jun-2015 14:29, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Thursday, 4 June 2015 at 22:28:52 UTC, Jonathan M Davis wrote:
 anyone give a reason why we need to. deadalnix talked about load
 balancing that way, but you gave good reasons as to why that didn't
 make sense,

 What good reasons?

 By the time you get response from your shared memcache or database the
 x86 cache level 1 and possibly 2 is cold.

Cache arguments are hard to get right w/o experiment. That "possibly" 
may be enough compared to certainly cold.

However I'll answer theoretically to equally theoretical argument.

If there is affinity and we assume that OS schedules threads on the same 
cores*  then each core has its cache loaded with (some of) the stacks of 
its fibers. If we assume sharing fibers across all cores, then each core 
will have to cache stacks for all of the fibers, which is wasteful.

So fiber affinity => that much less burden on each of core's caches, 
making them that much hotter.

* You seem to assume the same. Fine assumption given that OS usually 
tries to keep the same cores working on the same threads, for the 
similar reasons I believe.

  Add to this that
 two-and-two cores share primary caches so if you don't pair tasks that
 address the same memory you lose up to 10-20% performance in addition
 to unused capacity and increased latency. Smart scheduling matters, both
 at the OS level and at the application level. That's not a controversial
 statement (only in these forums…)!

Moving fibers across threads has no effect on all of the above even if 
there is some truth. There is simply no way to control what core 
executes which thread to begin with, this assignment is OS territory.

 The only good reason for not switching is that you lack
 resources/know-how.

Reasons were presented, but there is nothing in your answer that at 
least acknowledges that.

 But then you probably should not make it a language
 feature in the first place...?

Then it's a good chance for you to prove your design by experimentation. 
That is, if we all accept concurrency issues with moving fibers that violate 
some language guarantees.

-- 
Dmitry Olshansky

Jun 05 2015

"Ola Fosheim Grøstad" writes:

On Friday, 5 June 2015 at 13:44:16 UTC, Dmitry Olshansky wrote:
 If there is affinity and we assume that OS schedules threads on 
 the same cores*  then each core has it's cache loaded with 
 (some of) stacks of its fibers. If we assume sharing fibers 
 across all cores, then each core will have to cache stacks for 
 all of fibers which is wasteful.

If you cannot control affinity then you can't take advantage of 
hyper-threading either? I need to think of this in terms of 
_smart_ scheduling and adaptive load balancing.

 Moving fibers across threads have no effect on all of the above 
 even if there is some truth.

In order to get benefits from hyper-threading you need to pay close 
attention to how you schedule, or you should turn it off.

 There is simply no way to control what core executes which 
 thread to begin with, this assignment is the OS territory.

If your OS is does not support hyper-threading level control you 
should turn it off...

 The only good reason for not switching is that you lack
 resources/know-how.

 Reasons were presented, but there is nothing in your answer 
 that at least acknowledges that.

No, there were no performance related reasons, only TLS (which is 
a questionable feature to begin with).

 Then it's a good chance for you to prove your design by 
 experimentation. That if we all accept concurrency issues with 
 moving fibers that violate some language guarantees.

There is nothing to prove. You either perform worse or better 
than a carefully scheduled event-based solution in C++. You 
either perform worse or better than Go 1.5 in scheduling and GC.

However, doing well in externally designed and executed 
benchmarks on _language_ _features_ is good marketing (even if 
that 10-20% edge does not matter in real world applications).

Right now, neither concurrency nor GC are really D language 
features, they are more like library/runtime features. That makes 
it difficult to excel in those areas. In languages like Go, 
Erlang and Pony concurrency is a language feature.

Jun 05 2015

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 05-Jun-2015 17:04, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Friday, 5 June 2015 at 13:44:16 UTC, Dmitry Olshansky wrote:
 If there is affinity and we assume that OS schedules threads on the
 same cores*  then each core has it's cache loaded with (some of)
 stacks of its fibers. If we assume sharing fibers across all cores,
 then each core will have to cache stacks for all of fibers which is
 wasteful.

 If you cannot control affinity then you can't take advantage of
 hyper-threading either?

You choose to ignore the point about duplicating the same memory in each 
core's cache. To me it seems like throwing random CPU technologies won't 
help make your argument stronger.

However I stand corrected - there are sys-calls to confine a thread to a 
specific subset of cores. The point about cache stands as is, since it 
only assumed that each thread prefers to run on the same core, as opposed 
to e.g. always running on the same core.

 I need to think of this in terms of _smart_
 scheduling and adaptive load balancing.

Can't help you there, especially w/o definition of the first.

Adaptive load-balancing is quite possible with fibers sticking to a 
thread and is a question of application design.

 Moving fibers across threads have no effect on all of the above even
 if there is some truth.

 In order to get benefits from hyper-threading you need pay close
 attention how you schedule, or you should turn it off.

I bet it still helps some workloads and hurts others without "me" 
scheduling anything. There are some things OS can do just fine.

 There is simply no way to control what core executes which thread to
 begin with, this assignment is the OS territory.

 If your OS is does not support hyper-threading level control you should
 turn it off...

Not sure if this is English, but I stand corrected in that one may set 
thread affinity for each thread manually. What I argued for is that 
default is mostly the same and the point stands as is.

 The only good reason for not switching is that you lack
 resources/know-how.

 Reasons were presented, but there is nothing in your answer that at
 least acknowledges that.

 No, there were no performance related reasons,

I haven't said performance. Fast and incorrect is cheap.

 only TLS (which is a
 questionable feature to begin with).

Aye, no implicit data-races by default is questionable design. What 
questions do you have?


-- 
Dmitry Olshansky

Jun 05 2015

"Ola Fosheim Grøstad" writes:

On Friday, 5 June 2015 at 15:06:04 UTC, Dmitry Olshansky wrote:
 You choose to ignore the point about duplicating the same 
 memory in each core's cache. To me it seems like throwing

Not sure what you mean by this. 3rd level cache is shared. 
Die-level cache is shared. Primary caches are small and are 
shared between pairs of hyper-threaded cores. If a task has been 
suspended for 100ms you can just assume that primary cache is 
cold.

 Adaptive load-balancing is quite possible with fibers sticking 
 to a thread and is a question of application design.

Then you should not have fibers at all since an event based 
solution is even faster (but more work). Coroutines are a 
convenience feature, not a performance feature. You need control 
over workload scheduling to prevent 3rd level cache 
pollution. Random fine grained scheduling is not good for memory 
intensive workloads because you push out data from the caches 
prematurely.

 I bet it still helps some workloads and hurts others without 
 "me" scheduling anything.

Hyperthreading requires two cores to run specific workloads at 
the same time. If not you are better off just halting that extra 
core. The idea with hyperthreading is that one thread fills in 
holes in the pipeline when the other thread is stalled.

 Not sure if this is English,

When people pick on typos the debate is essentially over...

EOD

Jun 05 2015

Dan Olson <gorox comcast.net> writes:

"Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> writes:

 No, there were no performance related reasons, only TLS (which is a
 questionable feature to begin with).

On TLS and migrating Fibers - these were posted elsewhere, and I want to
make sure that when you read about the TLS Fiber problem here, it is
understood to be something that could be solved by a compiler solution.

David has a good overview of the problem here:

https://github.com/ldc-developers/ldc/issues/666

And Boost discussion to show D is not alone here:

http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread.html
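
For readers who have not run into the problem, here is a minimal Python sketch of the simplest form of it. It is an illustration under stated assumptions, not a reproduction of the LDC issue linked above: that one is about the compiler caching a TLS address across a yield, whereas this only shows that a fiber resumed on another thread sees different thread-local state. The generator stands in for a fiber, and `resume` and the thread names are made up.

```python
import threading

tls = threading.local()     # every thread sees its own `name` attribute
results = []


def fiber():
    before = tls.name       # TLS as seen by the thread that starts the fiber
    yield                   # suspended here; may be resumed elsewhere
    after = tls.name        # TLS as seen by the thread that resumes it
    results.append((before, after))


def resume(fib, thread_name):
    """Resume `fib` once on a fresh thread whose TLS holds `thread_name`."""
    def body():
        tls.name = thread_name
        next(fib, None)     # the default swallows the end-of-generator signal

    t = threading.Thread(target=body)
    t.start()
    t.join()


f = fiber()
resume(f, "worker-1")       # runs up to the yield on one thread
resume(f, "worker-2")       # finishes on a different thread
```

The same line of code, `tls.name`, gives two different answers inside one fiber, which is the kind of surprise a thread-local-by-default language has to either forbid or handle.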

Jun 05 2015

"Ola Fosheim Grøstad" writes:

On Friday, 5 June 2015 at 15:18:59 UTC, Dan Olson wrote:
 On TLS and migrating Fibers - these were posted elsewhere, and 
 want to
 make sure that when you read TLS Fiber problem here, it is 
 understood to
 be something that could be solved by compiler solution.

What I meant is that I don't have a use case for TLS in my own 
programs.

I think TLS is primarily useful for runtime-level issues like 
thread local allocators. I either read from global immutables or 
use lock-free datastructures for sharing...

Jun 05 2015

Shachar Shemesh <shachar weka.io> writes:

On 05/06/15 16:44, Dmitry Olshansky wrote:
 * You seem to assume the same. Fine assumption given that OS usually
 tries to keep the same cores working on the same threads, for the
 similar reasons I believe.

I see that people already raised the point that the OS does allow you to 
pin a thread to specific cores, so let's skip repeating that.

AFAIK, the reason the kernel tries to keep threads running on the same core 
they ran on before is that moving them requires so much locking, synchronous 
assembly instructions and barriers, resulting in huge costs for 
migrating threads between cores.

Which turns out to be relevant to this discussion, because that will, 
likely, also be required in order to move fibers between threads.

A while back, a friend and myself ran an (incomplete) research project 
where we tried reverting to the long discarded "one thread per socket" 
model. It actually performed really well (much much better than the 
"common wisdom" would have it perform), provided you did two things:
1. Use a thread pool. Do not actually spawn a new thread each time a new 
incoming connection arrives
and
2. pin that thread to a core, don't let it migrate

Since we are talking about several tens of thousands of threads, each 
random fluctuation in the load resulted in the kernel's scheduler 
wishing to migrate them, resulting in losing thousands of percent worth 
of performance. Once we locked the threads into place, we were, more or 
less, on par with micro-threading in terms of overall performance the 
server could take.
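
For illustration, the two rules above can be sketched in Python roughly as follows. This is not the code from that project. `make_pinned_pool` and `handle_connection` are made-up names, pinning uses `os.sched_setaffinity`, which exists only on Linux, and the sketch silently falls back to unpinned threads where that call is missing or refused.

```python
import itertools
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def make_pinned_pool(num_threads):
    """Create a thread pool whose workers each try to pin themselves to one core."""
    can_pin = hasattr(os, "sched_setaffinity")       # Linux-only API
    cpus = sorted(os.sched_getaffinity(0)) if can_pin else []
    next_cpu = itertools.cycle(cpus)
    lock = threading.Lock()

    def pin_current_thread():
        if not cpus:
            return
        with lock:
            cpu = next(next_cpu)                     # hand out cores round-robin
        try:
            # On Linux, pid 0 applies the mask to the calling thread.
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass                                     # best effort only

    # Rule 1: reuse pooled threads. Rule 2: pin each one when it starts.
    return ThreadPoolExecutor(max_workers=num_threads,
                              initializer=pin_current_thread)


def handle_connection(conn_id):
    """Hypothetical per-connection handler; reports which cores it may run on."""
    allowed = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
    return conn_id, allowed


with make_pinned_pool(num_threads=4) as pool:
    results = list(pool.map(handle_connection, range(8)))
```

Where pinning succeeds, each handler reports a single allowed core, so the kernel has no freedom to migrate that thread.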

Shachar

Jun 06 2015

"Ola Fosheim Grøstad" writes:

On Saturday, 6 June 2015 at 18:49:30 UTC, Shachar Shemesh wrote:
 Since we are talking about several tens of thousands of 
 threads, each random fluctuation in the load resulted in the

Using an unlikely workload that the kernel has not been designed 
and optimized for is in general a bad idea. Especially on a 
generic scheduler that has no knowledge of the nature of the 
workload and therefore is (or should be) designed to avoid worst 
case starvation scenarios.

Jun 07 2015

"Dicebot" <public dicebot.lv> writes:

For the record : I am fully with Liran on this case.

Jun 04 2015

"Paolo Invernizzi" <paolo.invernizzi no.address> writes:

On Friday, 5 June 2015 at 06:03:13 UTC, Dicebot wrote:
 For the record : I am fully with Liran on this case.

+1 also for me.

At work we are using fibers when appropriate, and I see no 
advantages in moving them.

/P

Jun 04 2015
