www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Dual Core Support

reply Manfred Nowak <svv1999 hotmail.com> writes:
The shipping of the "AMD Athlon 64 X2" is announced to start at the 
end of this month.

A review is available:
http://www.amdreview.com/reviews.php?rev=athlonx24200

As the review suggests, WinXP and Sandra are prepared to use more than 
one CPU.

Will D be outdated before the release of 1.0 because D has no support 
for multi core units?

-manfred
Jun 16 2005
next sibling parent Brad Beveridge <brad somewhere.net> writes:
Manfred Nowak wrote:
 The shipping of the "AMD Athlon 64 X2" is announced to start at the 
 end of this month.
 
 A review is available:
 http://www.amdreview.com/reviews.php?rev=athlonx24200
 
 As the review suggests WinXP and Sandra are prepared to use more than 
 one CPU.
 
 Will D be outdated before the release of 1.0 because D has no support 
 for multi core units?
 
 -manfred

should take advantage of multiple cpus/cores. Or am I missing something? Brad
Jun 16 2005
prev sibling next sibling parent reply "Lionello Lunesu" <lio lunesu.removethis.com> writes:
| Will D be outdated before the release of 1.0 because D has no support
| for multi core units?

There's nothing special about multi-core processors, at least as far as the 
compiler is concerned; it's all the same. A PC with a dual-core CPU (or two 
'single-core' CPUs, for that matter) can simply run two programs at full 
speed, at the same time.

On a single-core CPU, the operating system lets each running program use the 
CPU for a fraction of a second, so it seems they are running at the same 
time, but they never really are.

L. 
Jun 16 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
"Lionello Lunesu" <lio lunesu.removethis.com> wrote:

 There's nothing special about multi-core processors, at least
 when it comes to the compiler, it's all the same.

Thank you both for your responses, Brad and Lionello.

In essence both of you seem to want the OS to present a multicore system
to you as a virtual single core system. In this case you are right:
neglecting the fact that you have a multicore system does not raise any
need to use its capabilities.

On the other hand the OS has to do the work to make the multicore system
appear as a virtual single core system to you.

| If control of Northbridge functions is shared between software
| on both cores, software must ensure that only one core at a time
| is allowed to access the shared MSR.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF
(p. 324)

So there is a need to address the specialities of dual core machines.

Please recall that an AMD Athlon64 system can contain up to 8 dual core
units and that one of D's major goals is to

| Provide low level bare metal access as required
http://www.digitalmars.com/d/overview.html

Is this really true when all bare metal access has to use the asm
statement?

Please look deeper into the D specs:
http://www.digitalmars.com/d/statement.html

The throw-statement:
| The Object reference is thrown as an exception.

What will happen if both cores throw an exception at the same clock
impulse?

The volatile statement:
| Memory writes occurring before the Statement are performed
| before any reads within or after the Statement. Memory reads
| occurring after the Statement occur after any writes before or
| within Statement are completed.

What does this mean for a multi core system, which shares the main
memory between all activated cores?

Algorithmically it is simply not true that a dual core system is
equivalent to a higher clocked single core system!

Please recall the simple task of deciding whether a given, fixed value
occurs in a sufficiently large array.

Using a virtual single core machine you would simply loop through all
indices until you find the given value or reach the end without finding
it, and then issue the appropriate result.

Given a natural number n (n>=2 && n<=16) and a machine with n cores you
would divide the array into n equal sized pieces and assign a core to
each piece of the array. In case of not finding the searched value you
would in essence end up having cut down the number of clock cycles
needed to an n-th of the time of a virtual single core system.

But if you cannot assign a core to a task because the language used does
not allow this assignment, you can do nothing more than assign the n
parts of the array to n threads and then _hope_ that the OS will execute
them in parallel.

Would you trust your life to a system that is usually fast but cannot be
guaranteed to be free of reaction time prolongations of more than a
factor of ten?

You may want to answer with "no", and in this case my initial question
about D being outdated gets a positive answer.

-manfred
Jun 17 2005
next sibling parent reply xs0 <xs0 xs0.com> writes:
Manfred Nowak wrote:
 "Lionello Lunesu" <lio lunesu.removethis.com> wrote:
 
 
There's nothing special about multi-core processors, at least
when it comes to the compiler, it's all the same.

Thank you both for your responses, Brad and Lionello. In essence both of you seem to want the OS to represent a multicore system as a virtual single core system to you. In this case you are right: neglecting the fact that you have a multicore system does not raise any need to use its capabilities.

AFAIK, multi-core processors are almost exactly the same as having multiple CPUs, except they're in a single box and share a single bus to the outside world. So I'd say that there's not much that can be done beyond what is already done (which is basically multi-threading support and synchronization objects). I don't think starting a thread is lightweight enough for the compiler to try to multi-thread code automatically, because in 99.9% of cases there'd be no benefit.
 On the other hand the OS has to do the work to make the multicore
 system appear as a virtual single core system to you. 

I think the OS does just the opposite - by scheduling and task-switching, it hides the actual CPUs/cores, and makes the system appear as having any number of them (where the number is the number of threads that are running).
 | If control of Northbridge functions is shared between software
 | on both cores, software must ensure that only one core at a time
 | is allowed to access the shared MSR. 
 http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF
 (p. 324)
 
 So there is a need to address the specialities of dual core machines.

You should've also mentioned the title of the white paper, which is BIOS and Kernel Developer's Guide for [AMD processors]. I disagree that D should be specialized for those types of software, and I think you'd still need assembler anyway; a lot of important kernel code is both speed-critical and extremely specific, so coding it in a high-level language is just not a realistic option.
 Please look deeper into the D specs:
 http://www.digitalmars.com/d/statement.html
 
 The throw-statement:
 | The Object reference is thrown as an exception.
 
 What will happen if both cores throw an exception at the same
 clock impulse? 

Each thread will unwind its stack, as it does now, until it reaches an exception handler. I don't see what difference more than one core makes.
 The volatile statement:
 | Memory writes occurring before the Statement are performed
 | before any reads within or after the Statement. Memory reads
 | occurring after the Statement occur after any writes before or
 | within Statement are completed.
 
 What does this mean for a multi core system, which shares the main 
 memory between all activated cores?

Again, you skipped an important part: A volatile statement does not guarantee atomicity. Whenever more than one thread can access the same memory (where at least one is writing to it), the accesses should be synchronized, multi-core or not. Providing synchronization methods is the job of OS and/or hardware, and using them is already simple in D.
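To make the point about synchronization concrete, here is a minimal sketch, written in C++ rather than D, of several threads updating one shared value under a lock. The names `SharedCounter` and `countWithThreads` are hypothetical and exist only for this illustration; D's `synchronized` statement would play the role of the lock guard. The same code is needed whether the threads run on one core or on two.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared counter; the names are illustrative only.
struct SharedCounter {
    std::mutex lock;   // serializes access from all threads
    long value = 0;

    void increment() {
        std::lock_guard<std::mutex> guard(lock);
        ++value;       // the read-modify-write happens under the lock
    }
};

// Runs `threads` workers that each increment the counter `perThread` times
// and returns the final value after all workers have been joined.
long countWithThreads(int threads, int perThread) {
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) counter.increment();
        });
    }
    for (auto& w : workers) w.join();
    return counter.value;
}
```

Without the lock the increments could interleave and some updates could be lost, and that can happen on a single-core machine as well, because the OS may switch threads between the read and the write.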
 Algorithmically it is simply not true that a dual core system is 
 equivalent to a higher clocked single core system!

Unfortunately, no, it isn't.
 [snip]
 
 But if you cannot assign a core to a task because the used language 
 does not allow this assignment you can do nothing more than assigning 
 the n parts of the array to n threads and then _hope_ that the OS 
 will execute them in parallel.

The OS is in charge of both cores anyway; you can't bypass it and somehow take control of the cores, so you hope for the best in any case. That's another reason why automatic multi-threading doesn't make much sense.
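The array search under discussion can be sketched as follows, in C++ rather than D and under stated assumptions: `parallelContains` is a hypothetical helper, and the number of slices is simply passed in by the caller. Each slice gets its own thread, and which core runs which thread is left entirely to the OS scheduler, which is the situation described above.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Returns true if `needle` occurs in `data`, scanning `parts` slices on
// separate threads. Where each thread runs is decided by the OS scheduler.
bool parallelContains(const std::vector<int>& data, int needle, std::size_t parts) {
    if (parts == 0) parts = 1;
    std::atomic<bool> found{false};
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + parts - 1) / parts; // ceiling division

    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = std::min(p * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &found, needle, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                // Stop early if another slice has already found the value.
                if (found.load(std::memory_order_relaxed)) return;
                if (data[i] == needle) {
                    found.store(true, std::memory_order_relaxed);
                    return;
                }
            }
        });
    }
    for (auto& w : workers) w.join(); // joining makes the workers' writes visible
    return found.load();
}
```

On a machine with n idle cores the unsuccessful search may approach an n-fold speedup, but nothing in the code can demand it; the program only expresses the parallelism and the scheduler decides the rest.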
 Would you trust your life to a system that is usually fast but 
 cannot be guaranteed to be free of reaction time prolongations of 
 more than a factor of ten?

No, but luckily both the software and the OSs in such systems are usually written with hard guarantees about how much time anything takes.
 You may want to answer with "no", and in this case my initial 
 question on the outdatedness of D is assigned a positive value.

Well, I certainly wouldn't like D to be outdated so soon, but I think that as far as performance is concerned, there are several better things that could be done first (any-order loops, array ops, easier MMX/SSE utilization, etc.). I think that only after single-thread optimizations are exhausted should we (or D or Walter) move towards multi-CPU/core stuff.

xs0
Jun 17 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
xs0 <xs0 xs0.com> wrote:

[...]
 AFAIK, multi-core processors are almost exactly the same as
 having multiple cpus, except they're in a single box and share a
 single bus to the outside world.

Thanks for your opinions, I have read them carefully several times.

There is one fundamental difference between dual-cores and dual-CPUs: dual-cores can exchange data over the internal bus and do not need any bandwidth on the bus to the "outside world".

I.e. if you have a multi dual-core machine and know that two threads have to communicate intensively, you lose performance if you cannot arrange for both threads to run on a single dual-core die.
 So, I'd say that there's
 nothing much that can be done beyond what is already done (which
 is basically multi-threading support and synchronization
 objects). 

I do not find in your arguments the explanation of why control of the two points of execution (which are implied by a dual core machine) is not necessary.

In the example of the throw statement you even explicitly say that you are not interested in guiding the machine; instead the machine is allowed to do whatever _randomly_ occurs first.

To explain why this might be wrong, imagine security rules for a train:
- if the pressing of the alive-knob by the driver times out, then stop the train as if you were pulling into a station
- if a fire alarm is issued, then bring the train to a stop as fast as possible, unless you are in a tunnel; in that case delay the stopping of the train until you have left the tunnel

Now what will your machine do if a fire alarm is issued in a tunnel and the pressing of the alive-knob is timing out as well?

-manfred
Jun 18 2005
next sibling parent reply xs0 <xs0 xs0.com> writes:
Manfred Nowak wrote:
 Thanks for your opinions, I have read them carefully several times.
 
 There is one fundamental difference between dual-cores and
 dual-CPUs: dual-cores can exchange data over the internal bus and do
 not need any bandwidth on the bus to the "outside world".
 
 I.e. if you have a multi dual-core machine and know that two 
 threads have to communicate intensively you lose performance if 
 you cannot control to have both threads running on a single
 dual-core die.

I don't know what can or can't be done over the internal bus, but as far as thread control is concerned, it's not something that can be done by user apps, no matter what you do to the language they were coded in, because it's in the OS domain. If/when the OS supports it, the functionality is available through an OS library, so everything that D needs for multi-core CPU support is already there (access to the OS :)

Again, I think it'd be better to focus on providing constructs that allow optimization in general. When/if it is feasible to optimize them by utilizing multi-core CPUs in the way you'd want, the only thing that needs to be done is to improve the compiler. In the meantime, they can be optimized for other cases, for example by making use of MMX/SSE instructions, which I think are generally underutilized and which could easily provide comparable gains in speed.

Well, having written all this, I'm not sure what you are actually proposing. You seem to want some sort of multi-core support, but what would that be? Can you give an example or two?
So, I'd say that there's
nothing much that can be done beyond what is already done (which
is basically multi-threading support and synchronization
objects). 

[...] I do not find the hook in your arguments to the explanation why control of the two points of execution (which are implied by a dual core machine) is not necessary.

I'm not saying it's not necessary, I'm just saying it's not something that can be done in the language itself.
 In the example of the throw statement you even explicitly say 
 that you are not interested in guiding the machine, instead the 
 machine is allowed to do whatever _randomly_ occurs first

In a general-purpose OS, everything is basically random: at any time, the OS can switch to another task. In a real-time OS, things are different (although, admittedly, I don't know by how much), but I guess most software we're writing won't be running on such an OS.

Even regardless of all this, consider the case of two simultaneous exceptions: if they can occur simultaneously, it's almost certain that they can also occur within, say, 1 microsecond of each other. If that is so, you must handle both orderings anyway; once that is done, it no longer matters which comes first.
 To explain why this might be wrong imagine security rules for a 
 train:
 - if the pressing of the alive-knob for the driver times out then 
 stop the train as if you are pulling into a station
 - if fire alarm is issued then bring the train to a stop as fast as 
 possible except you are in a tunnel, then delay the stopping of the 
 train until you have left the tunnel
 
 Now what will your machine do if fire alarm is issued in a tunnel 
 and the pressing of the alive-knob is timing out also?

Hmm, I'm not sure where you see randomness in all this (hopefully, the software would be coded to handle the case where both things occur), but as for "my machine": for something this simple (stop if (at_station && !alive) || (on_fire && !inside_tunnel)), I wouldn't use a CPU at all. This can be done far more reliably with a few really big logic gates :)

xs0
Jun 18 2005
parent Manfred Nowak <svv1999 hotmail.com> writes:
xs0 <xs0 xs0.com> wrote:

 You seem to want some sort of multi-core support, but what would
 that be? Can you give an example or two? 

Please have a look at
http://plg.uwaterloo.ca/~usystem/pub/uSystem/uC++book.pdf

Thanks to "Marco A", whose post 29355 in the old D group directed me to this reference.

-manfred
Jun 18 2005
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d90h6h$134$1 digitaldaemon.com>, Manfred Nowak says...
xs0 <xs0 xs0.com> wrote:

[...]
 AFAIK, multi-core processors are almost exactly the same as
 having multiple cpus, except they're in a single box and share a
 single bus to the outside world.

Thanks for your opinions, I have read them carefully several times. There is one fundamental difference between dual-cores and dual-CPUs: dual-cores can exchange data over the internal bus and do not need any bandwidth on the bus to the "outside world". I.e. if you have a multi dual-core machine and know that two threads have to communicate intensively you lose performance if you cannot control to have both threads running on a single dual-core die.

Yikes. So you're saying you'd have lockless sharing of data between the cores and only force a cache sync when communicating between processors? Makes sense, I suppose, but it sounds risky.
In the example of the throw statement you even explicitly say 
that you are not interested in guiding the machine, instead the 
machine is allowed to do whatever _randomly_ occurs first

To explain why this might be wrong imagine security rules for a 
train:
- if the pressing of the alive-knob for the driver times out then 
stop the train as if you are pulling into a station
- if fire alarm is issued then bring the train to a stop as fast as 
possible except you are in a tunnel, then delay the stopping of the 
train until you have left the tunnel

Now what will your machine do if fire alarm is issued in a tunnel 
and the pressing of the alive-knob is timing out also?

Perhaps I'm missing something, but I don't see why this example requires special assembly-level handling of exceptions. If the button failure exception is thrown before the fire warning is signalled, then the train will begin to slow down. Then, when the fire warning is signalled, I assume the train will continue on at its existing speed until it exits the tunnel, and then it will stop? And if the reverse happens, the train will ignore the stop button time-out because it's handling a more important directive.

Is the issue that you don't want to use traditional synchronization in the error handling mechanism and would rather prioritize at the signalling level? I'll admit I haven't done this sort of programming before.

Sean
Jun 18 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Sean Kelly <sean f4.ca> wrote:
[...] 
 Yikes.  So you're saying you'd have lockless sharing of data
 between the cores and only force a cache sync when communicating
 between processors?  Makes sense, I suppose, but it sounds
 risky. 

Why lockless?
In the example of the throw statement you even explicitly say 
that you are not interested in guiding the machine, instead the 
machine is allowed to do whatever _randomly_ occurs first

To explain why this might be wrong imagine security rules for a 
train:
- if the pressing of the alive-knob for the driver times out
then stop the train as if you are pulling into a station
- if fire alarm is issued then bring the train to a stop as fast
as possible except you are in a tunnel, then delay the stopping
of the train until you have left the tunnel

Now what will your machine do if fire alarm is issued in a
tunnel and the pressing of the alive-knob is timing out also?

If the button failure exception is thrown before the fire warning

 And if the reverse happens

 Is the issue that you don't want to use traditional
 synchronization in the error handling mechanism and would rather
 prioritize at the signalling level?

I see that you caught the basic principle behind my example. And as you may see above, it is difficult for the human brain to think in concurrency: you serialized the events but did not handle the case where, depending on an unlucky implementation, both cores might independently raise the two exceptions, one core the fire exception and the other the alive-knob exception.

In this case you have a control leak.

There is one more thing to mention: it is not uncommon that specifications are incomplete or even contradictory and that these specification faults are detected late in the software production process.

Depending on the awareness of the implementers such a fault might make its way into the final product.

Have a look at your two cases: you are handling the case that the alive-knob exception comes first, but you missed that the fire-knob exception might be thrown when the train has already stopped, but in a tunnel.

-manfred
Jun 19 2005
parent reply Sean Kelly <sean f4.ca> writes:
In article <d94alb$2gld$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:
[...] 
 Yikes.  So you're saying you'd have lockless sharing of data
 between the cores and only force a cache sync when communicating
 between processors?  Makes sense, I suppose, but it sounds
 risky. 

Why lockless?

If multiple cores share a single cache, then there's no need to force cache coherency when sharing data between them. Of course, that assumes there's some way to tell that you're running on two cores sharing a cache, which may not be possible. As for why: cache syncs take time. Less time than full locking, but time nevertheless. I don't know how useful this would be for PCs, but for NUMA machines that have clustered cores where inter-cluster ops involve message-passing, this may be a reasonable strategy. Though I'm speculating here, as I've never actually coded for such a machine.
I see that you caught the basic principle behind my example. And 
as you may see above it is difficult to the human brain to think in 
concurrency: you serialized the events but do not handle the case 
when depending on an unlucky implementation both cores might 
independently raise both exceptions, one core the fire exception 
and the other the alive-knob exception.

In this case you have a control leak.

Why can't the exception handlers serialize error handling though? There ultimately has to be some coordination to resolve potentially conflicting directives. Why should this happen when the exception is thrown as opposed to when it's caught?
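As a rough sketch of what serializing at the handling side could look like (in C++ rather than D, with every name here hypothetical): each event only records a fact in shared state under a lock, and a single pure rule then decides the action from the combined state. The rule that fire takes precedence over the alive-knob timeout is an assumption made for this illustration; the two rules as stated in the thread do not settle that case.

```cpp
#include <mutex>

enum class Action { Continue, NormalStop, EmergencyStop, LeaveTunnelThenStop };

struct TrainState {
    bool aliveTimedOut = false;
    bool onFire = false;
    bool inTunnel = false;
};

// Pure decision rule over the combined state. Giving fire priority over the
// alive-knob timeout is an assumption, not part of the rules as posted.
Action decide(const TrainState& s) {
    if (s.onFire) return s.inTunnel ? Action::LeaveTunnelThenStop : Action::EmergencyStop;
    if (s.aliveTimedOut) return Action::NormalStop;
    return Action::Continue;
}

// Hypothetical controller: handlers on any thread or core report events here.
class TrainController {
public:
    Action reportAliveTimeout() { return update([](TrainState& s) { s.aliveTimedOut = true; }); }
    Action reportFire()         { return update([](TrainState& s) { s.onFire = true; }); }
    Action setInTunnel(bool v)  { return update([v](TrainState& s) { s.inTunnel = v; }); }

private:
    // Applies one change under the lock, then re-evaluates the whole state.
    template <typename F>
    Action update(F change) {
        std::lock_guard<std::mutex> guard(lock_);
        change(state_);
        return decide(state_);
    }

    std::mutex lock_;
    TrainState state_;
};
```

Because the decision depends only on the accumulated state, the result is the same whichever event is recorded first. A train that has already stopped inside the tunnel would still need further state and rules, which this sketch does not attempt to cover.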
There is one more thing to mention: it is not seldom, that 
specifications are incomplete or even contradictory and that 
detection of these specification faults occurs late in the software 
production process.

Depending on the awareness of the implementers such a fault might 
traverse into the final product.

Have a look at your two cases: you are handling the case that the 
alive-knob exception comes first, but you missed that the fire-knob 
exception might be thrown, when the train stopped already, but in a 
tunnel.

And what if the train had already stopped because of an engine failure, or because someone pulled the emergency brake? The 'fire' routine would need to know whether it should try to move a stopped train out of a tunnel, etc. How can this be solved by prioritizing exceptions? Or am I missing something?

Sean
Jun 19 2005
parent reply James Dunne <james.jdunne gmail.com> writes:
It's been said in this thread before, but multi-threading control is a function
of the OS and not the language.  Is C a dead language because it doesn't have
dual-core functionality?  Of course not.  Although, we're still not clear on
what dual-core functionality is being proposed to be added to the language.
Regardless, it shouldn't be a concern.  Simple multi-threading constructs and
locking mechanisms should be enough to guarantee that D will work in dual-core
systems.

Regards, James Dunne
Jun 19 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
James Dunne <james.jdunne gmail.com> wrote:

 Is C a dead language because it doesn't have dual-core
 functionality? Of course not.

True. But have you read why Buhr abandoned his concurrency project in C?
 Simple
 multi-threading constructs and locking mechanisms should be
 enough to guarantee that D will work in dual-core systems.

Can you prove that? [...]
In this case you have a control leak.



Why should they? This kind of argument has shown up repeatedly: Why should a concurrent working machine be viewed as a serial working machine? In fact the AMD cores are designed to have a programmable lower bound on the priority of interrupts they will handle: so they will handle interrupts concurrently. [...]
And what if the train had already stopped because of an engine
failure, or because someone pulled the emergency brake?


You are right, that you can extend the security rules and will have more complex scenes to solve. Therefore I limited the example to only three variables.
The
'fire' routine would need to know whether it should try and move
a stopped train out of a tunnel, etc.  How can this be solved by
prioritizing exceptions?  Or am I missing something? 


This truly cannot be done by prioritizing, and therefore I said that you have a control leak: depending on the implementation it might be necessary to preempt both tasks assigned to the cores and start one adapted to the more complex scene.

-manfred
Jun 19 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <d94ls5$2o57$1 digitaldaemon.com>, Manfred Nowak says...
In this case you have a control leak.



Why should they? This kind of argument has shown up repeatedly: Why should a concurrent working machine be viewed as a serial working machine? In fact the AMD cores are designed to have a programmable lower bound on the priority of interrupts they will handle: so they will handle interrupts concurrently.

They should because the way errors are handled depends on system state, and the resources for handling these errors are shared. If two errors are thrown concurrently that both want to do something with the speed of the train, for example, something will need to prioritize those operations. What would the speed control do if it simultaneously received errors to stop and to accelerate?
The
'fire' routine would need to know whether it should try and move
a stopped train out of a tunnel, etc.  How can this be solved by
prioritizing exceptions?  Or am I missing something? 


This truly cannot be done by prioritizing and therefore I said, that you have a control leak: depending on the implementation it might be necessary to preempt both tasks assigned to the cores and start one adapted to the more complex scene.

This can all be done in code though. Do multi-core CPUs actually offer instructions to do this in a way that requires language support beyond what D already has? (I suppose I should go read the references you've been posting.)

Sean
Jun 19 2005
prev sibling next sibling parent Matthias Becker <Matthias_member pathlink.com> writes:
 Simple
 multi-threading constructs and locking mechanisms should be
 enough to guarantee that D will work in dual-core systems.

Can you prove that?

A dual core isn't that much different from dual CPUs. Give an example of a problem that could arise on a dual core but not on dual CPUs.
[...]
In this case you have a control leak.



Why should they? This kind of argument has shown up repeatedly: Why should a concurrent working machine be viewed as a serial working machine? In fact the AMD cores are designed to have a programmable lower bound on the priority of interrupts they will handle: so they will handle interrupts concurrently. [...]
And what if the train had already stopped because of an engine
failure, or because someone pulled the emergency brake?


You are right, that you can extend the security rules and will have more complex scenes to solve. Therefore I limited the example to only three variables.
The
'fire' routine would need to know whether it should try and move
a stopped train out of a tunnel, etc.  How can this be solved by
prioritizing exceptions?  Or am I missing something? 


This truly cannot be done by prioritizing and therefore I said, that you have a control leak: depending on the implementation it might be necessary to preempt both tasks assigned to the cores and start one adapted to the more complex scene.

Anyway, this isn't a new problem, as real concurrency isn't an invention of this year. We've had it for a long time. There are a lot of dual-CPU machines with real concurrency. You haven't described any problem that wouldn't arise on such a machine.
Jun 20 2005
prev sibling parent reply Brad Beveridge <brad somewhere.net> writes:
Manfred Nowak wrote:
 James Dunne <james.jdunne gmail.com> wrote:

 
Simple
multi-threading constructs and locking mechanisms should be
enough to guarantee that D will work in dual-core systems.

Can you prove that?

begs the question - can you prove that existing multi-threaded controls will not work correctly on SMP machines?

I've read this thread, and I am sorry to say that I am too thick to see why dual core CPUs are any different to programming multiple CPU machines - or for that matter any different to programming a multi-threaded application.

Manfred, you look to be most concerned with concurrency issues - but from a programmer's point of view I cannot see the difference between programming with multiple threads and programming with multiple CPUs/cores. Assuming a general purpose OS (and I think we have to), then your train example has (to my mind) exactly the same problems regardless of what kind of machine it is run on. The only true difference is that on a multiple core machine the instructions can actually run at the same physical time, whereas on a single core machine the threads need to share the CPU. But that means nothing, because the CPU could change threads every few operations - i.e. you need to provide the same locks and measures anyhow.

Brad
Jun 20 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d96n2l$11lq$1 digitaldaemon.com>, Brad Beveridge says...
Manfred Nowak wrote:
 James Dunne <james.jdunne gmail.com> wrote:

 
Simple
multi-threading constructs and locking mechanisms should be
enough to guarantee that D will work in dual-core systems.

Can you prove that?

begs the question - can you prove that existing multi-threaded controls will not work correctly on SMP machines?

They will.
I've read this thread, and I am sorry to say that I am too thick to see 
why dual core CPUs are any different to programming multiple CPU 
machines - or for that matter any different to programming a 
multi-threaded application.

AFAIK, dual core machines are indistinguishable from 'true' SMP machines to all but perhaps an OS programmer. The most obvious example of this is that Windows reports each core of a multi-core machine as a separate CPU.
Manfred, you look to be most concerned with concurrency issues - but 
from a programmer's point of view I cannot see the difference between 
programming with multiple threads and programming with multiple 
CPUs/cores.

The only difference I can think of is that cache coherency is not an issue with single CPU machines, though you typically have to pretend that it is anyway (since not many applications are written to target a specific hardware configuration). Theoretically, I could see some of what Manfred mentioned being a potential point of optimization for realtime systems, but those would probably be built with a custom compiler and target a specific run environment anyway.
Assuming a general purpose OS (and I think we have to), 
then your train example has (to my mind) exactly the same problems 
regardless of what kind of machine it is run on.  The only true 
difference is that on a multiple core machine the instructions can 
actually run at the same physical time, on a single core machine the 
threads need to share the CPU, but that means nothing because the CPU 
could change threads every few operations - ie you need to provide the 
same locks and measures anyhow.

Exactly. D is no different than any other procedural language in how it deals with concurrency. Though as a point of geek interest I suppose it's worth mentioning that Bjarne Stroustrup's original purpose for C++ was as a concurrent language--it just didn't really stay that way once he'd finished his research. In any case, if there's anything that D lacks, I'd love to hear some concrete examples. It's much easier to address issues when you know specifically what they are, and the discussion has remained pretty abstract up to this point. Sean
Jun 20 2005
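
To make the "same locks on one core or many" point concrete, here is a minimal sketch in C++ rather than D (nobody in the thread posted this; the function name `count_with_lock` is made up for illustration). The lock is needed whether the threads are time-sliced on one CPU or truly running side by side, because `++counter` is a read-modify-write that can be interrupted or interleaved either way.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one shared counter.
// The mutex makes each increment atomic with respect to the other threads,
// regardless of how many cores the machine has.
long count_with_lock(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);  // released at end of scope
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Without the `lock_guard` the result may come out short on either kind of machine; a multi-core box just tends to show the bug sooner.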
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
I need to read up a bit on multi-core systems, but they act the same as SMP
systems, correct?  So your concern is having library facilities which allow you
to assign tasks to different processors and so on?  If so, I think at least some
basic functionality is a candidate for 1.0, especially if some motivated person
is willing to write it :)  I'm currently experimenting with some lockless synch.
functionality in Ares, and would be happy to build processor affinity support
and such into the Thread class if someone is willing to supply the assembly for
it... and I believe Walter would do the same for Phobos.


Sean
Jun 17 2005
next sibling parent reply Brad Beveridge <brad somewhere.net> writes:
Sean Kelly wrote:
 I need to read up a bit on multi-core systems, but they act the same as SMP
 systems, correct?  So your concern is having library facilities which allow you
 to assign tasks to different processors and so on?  If so, I think at least
some
 basic functionality is a candidate for 1.0, especially if some motivated person
 is willing to write it :)  I'm currently experimenting with some lockless
synch.
 functionality in Ares, and would be happy to build processor affinity support
 and such into the Thread class if someone is willing to supply the assembly for
 it... and I believe Walter would do the same for Phobos.
 
 
 Sean
 
 

idea. Of course, as far as I am aware at the application level you don't really get to choose anyhow - you can provide hints to the OS about processor affinity, but that is about it. Writing software for multicore systems is almost the same as writing multithreaded programs - the main difference being that even more subtle bugs can show up, because the threads really are executing at the same time rather than merely being interleaved on one CPU. As an aside, I don't particularly see the true use for multicore systems in real life applications at the moment. Right now most CPUs, unless you program very carefully, are memory bound - they spend a lot of their time waiting for memory accesses. Having multiple cores just increases the demand on the main memory bus, so the CPUs (unless executing completely out of cache) will still be waiting a lot. But I guess that is why we are seeing larger and larger L1 caches. Brad
Jun 17 2005
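
As a rough illustration of "the application only gets hints", here is a small C++ sketch (not D, and only an assumption about how one might size a worker pool; `pick_worker_count` is a made-up name). Standard C++ can ask how many hardware threads exist, but actually pinning a thread to a core needs OS-specific calls, which are deliberately left out here.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: choose a worker count from the hardware hint.
// hardware_concurrency() is only a hint and may return 0 when the
// value cannot be determined, so a caller-supplied fallback is used then.
unsigned pick_worker_count(unsigned fallback) {
    unsigned hint = std::thread::hardware_concurrency();
    return hint != 0 ? hint : fallback;
}
```

Where the threads actually run is still up to the OS scheduler.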
parent reply Sean Kelly <sean f4.ca> writes:
In article <d8uq8m$1heq$1 digitaldaemon.com>, Brad Beveridge says...
As an aside, I don't particularly see the true use for multicore systems 
in real life applications at the moment.  Right now most CPUs, unless 
you program very carefully, are memory bound - they spend a lot of their 
time waiting for memory accesses.  Having multiple cores just increases 
the demand on the main memory bus, so the CPUs (unless executing 
completely out of cache) will still be waiting a lot.  But I guess that 
is why we are seeing larger and larger L1 caches.

Exactly. And that leaves us with cache coherency problems. I think we're getting close to a fundamental change in how applications are designed, but I haven't seen any suggestion for how to handle SMP efficiently and easily as locks and such just don't cut it. It's an interesting time for software design :) Sean
Jun 17 2005
parent reply Brad Beveridge <brad somewhere.net> writes:
Sean Kelly wrote:
 In article <d8uq8m$1heq$1 digitaldaemon.com>, Brad Beveridge says...
 

 
 Exactly.  And that leaves us with cache coherency problems.  I think we're
 getting close to a fundamental change in how applications are designed, but I
 haven't seen any suggestion for how to handle SMP efficiently and easily as
 locks and such just don't cut it.  It's an interesting time for software design
 :)
 
 
 Sean
 
 

Thinking along these lines, performance programming in D would possibly benefit more from a library that lets you manipulate the cache. Such a library could possibly provide functions to prefill the cache, lock portions of it, etc. Of course, messing with caches is not the kind of thing that you want to do even 1% of the time - there is just too much chance that locking the cache down will negatively impact performance. Especially if the OS wants to do a context switch. Sigh, programming just ain't what it used to be when you could cycle count your assembler instructions & figure out how fast your loop would be :) Brad
Jun 17 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d8utov$1khp$1 digitaldaemon.com>, Brad Beveridge says...
Thinking along these lines, performance programming in D would possibly 
benefit more from a library that lets you manipulate the cache.  Such a 
library could possibly provide functions to prefill the cache, lock 
portions of it, etc.  Of course, messing with caches is not the kind of 
thing that you want to do even 1% of the time - there is just too much 
chance that locking the cache down will negatively impact performance. 
Especially if the OS wants to do a context switch.  Sigh, programming 
just ain't what it used to be when you could cycle count your assembler 
instructions & figure out how fast your loop would be :)

True enough :) And things are changing for x86 architectures in this regard. Until recently, x86 machines only had full mfence facilities (with the LOCK prefix) but IIRC acquire/release instructions were added to the Itanium, and I think things are moving towards more fine-grained cache control. But this is something that is sufficiently complex (even for experts) that it really needs to be done right in a library so that the average Joe doesn't have to worry about it. Lockless containers are one such feature, and perhaps some other design patterns would be appropriate to support as well. Ben's work is a definite step in the right direction, and it may well be a basis for some of the stuff that ends up in Ares. As for the rest... it's worth keeping an eye on the C++ standardization process as they're facing similar issues for the next release. But D has a lead on C++ at the moment because of the way Walter implemented 'volatile'. It's my hope that D will be well suited for concurrent programming years before the next iteration of the C++ standard is finalized. Sean
Jun 17 2005
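
For a flavour of what "lockless" means in practice, here is a minimal C++ sketch (it is not Ben's code or anything from Ares; the names `lockless_add` and `lockless_sum` are made up). Instead of taking a lock, each thread retries a compare-and-swap until its update goes through untouched by anyone else. A real lockless container is much harder than this counter, which is rather the point about leaving it to a library.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: add `delta` to `total` without taking a lock.
void lockless_add(std::atomic<long>& total, long delta) {
    long expected = total.load(std::memory_order_relaxed);
    // Retry until no other thread changed `total` between our read and write.
    // On failure, compare_exchange_weak reloads `expected` with the current value.
    while (!total.compare_exchange_weak(expected, expected + delta,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
    }
}

// Hypothetical driver: several threads add 1 repeatedly.
long lockless_sum(int num_threads, int adds_per_thread) {
    std::atomic<long> total{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < adds_per_thread; ++i) lockless_add(total, 1);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```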
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Sean Kelly <sean f4.ca> wrote:

 I need to read up a bit on multi-core systems, but they act the
 same as SMP systems, correct?

Dual-cores _are_ an implementation of SMP.
 So your concern is having library
 facilities which allow you to assign tasks to different
 processors and so on?

No. I have somewhere seen an argument, that if concurrency is not implemented into the language then no compiler can be guaranteed to deliver correct code under all circumstances---therefore concurrency must be implemented into the language. -manfred
Jun 18 2005
next sibling parent Matthias Becker <Matthias_member pathlink.com> writes:
 I need to read up a bit on multi-core systems, but they act the
 same as SMP systems, correct?

Dual-cores _are_ an implementation of SMP.
 So your concern is having library
 facilities which allow you to assign tasks to different
 processors and so on?

No. I have somewhere seen an argument, that if concurrency is not implemented into the language then no compiler can be guaranteed to deliver correct code under all circumstances---therefore concurrency must be implemented into the language.

There are some problems with optimizers that can move code around, so things might get called before a library lock directive if the compiler isn't aware that it mustn't move code in front of or behind this function call.
Jun 18 2005
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d90hkm$134$2 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

 I need to read up a bit on multi-core systems, but they act the
 same as SMP systems, correct?

Dual-cores _are_ an implementation of SMP.

Just making sure I wasn't missing something.
 So your concern is having library
 facilities which allow you to assign tasks to different
 processors and so on?

No. I have somewhere seen an argument, that if concurrency is not implemented into the language then no compiler can be guaranteed to deliver correct code under all circumstances---therefore concurrency must be implemented into the language.

This is an issue with C/C++. Specifically, it relates to the "as if" rule and the fact that the theoretical virtual machine that optimizers target has no concept of concurrency. So there's no real way to ensure volatile instructions aren't being reordered unless you use a synchronization library. D addresses this particular issue somewhat in its reinterpretation of "volatile," and I'm sure Walter is keeping an eye on the C++ standardization talks about this issue as well. Sean
Jun 18 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Sean Kelly <sean f4.ca> wrote:

[...]
 This is an issue with C/C++.  Specifically, it relates to the
 "as if" rule and the fact that the theoretical virtual machine
 optimizers target has no concept of concurrency.  So there's no
 real way to ensure volatile instructions aren't being reordered
 unless you use a synchronization library.  D addresses this 
 particular issue somewhat in its reinterpretation of "volatile,"
 and I'm sure Walter is keeping an eye on the C++ standardization
 talks about this issue as well.

Are you able to prove, that the argument holds for C++ only, which would be a contradiction to a paper accepted by ACM and available here: http://plg.uwaterloo.ca/~usystem/pub/uSystem/LibraryApproach.ps.gz -manfred
Jun 19 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <d948ms$2feb$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

[...]
 This is an issue with C/C++.  Specifically, it relates to the
 "as if" rule and the fact that the theoretical virtual machine
 optimizers target has no concept of concurrency.  So there's no
 real way to ensure volatile instructions aren't being reordered
 unless you use a synchronization library.  D addresses this 
 particular issue somewhat in its reinterpretation of "volatile,"
 and I'm sure Walter is keeping an eye on the C++ standardization
 talks about this issue as well.

Are you able to prove, that the argument holds for C++ only, which would be a contradiction to a paper accepted by ACM and available here: http://plg.uwaterloo.ca/~usystem/pub/uSystem/LibraryApproach.ps.gz

Not at all. I imagine many languages target a single-threaded virtual machine. Java is probably one of the few exceptions. Sean
Jun 19 2005
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <d948ms$2feb$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

[...]
 This is an issue with C/C++.  Specifically, it relates to the
 "as if" rule and the fact that the theoretical virtual machine
 optimizers target has no concept of concurrency.  So there's no
 real way to ensure volatile instructions aren't being reordered
 unless you use a synchronization library.  D addresses this 
 particular issue somewhat in its reinterpretation of "volatile,"
 and I'm sure Walter is keeping an eye on the C++ standardization
 talks about this issue as well.

Are you able to prove, that the argument holds for C++ only, which would be a contradiction to a paper accepted by ACM and available here: http://plg.uwaterloo.ca/~usystem/pub/uSystem/LibraryApproach.ps.gz

Okay, I dug up a copy of Ghostscript for the PC and read the first few pages of this paper. I definitely agree with it, but I don't know that it applies to D. For reference, here are the suggested solutions:

1. provide some explicit language facilities to control optimization (eg. pragma, volatile, etc.)
2. provide some concurrency constructs that allow the translator to determine when to disable certain optimizations
3. a combination of approaches one and two

It's worth noting that D already provides both of these proposed solutions in the language. The 'synchronized' keyword could be used to prevent the compiler from optimizing code around these areas (if it isn't already). And 'volatile' provides programmers who need to implement concurrent code outside of synchronization blocks a means of preventing compiler optimization of critical code blocks. More work may still be useful in this area. For example, 'volatile' in D just prevents optimization across a code block, but it might be worthwhile to provide a means for something akin to acquire and release semantics to allow *some* optimization to occur. Sean
Jun 20 2005
parent reply Brad Beveridge <brad somewhere.net> writes:
Sean Kelly wrote:
<Snip>
 It's worth noting that D already provides both of these proposed solutions in
 language.  The 'synchronized' keyword could be used to prevent the compiler
from
 optimizing code around these areas (if it isn't already).  And 'volatile'
 provides programmers who need to implement concurrent code outside of
 synchronization blocks a means of preventing compiler optimization of critical
 code blocks.  More work may still be useful in this area.  For example,
 'volatile' in D just prevents optimization across a code block, but it might be
 worthwhile to provide a means for something akin to acquire and release
 semantics to allow *some* optimization to occur.
 

    ... some optimised code (A) ...
    volatile
    {
        ... some order critical code ...
    }
    ... some optimised code (B)

It is obvious from the description of volatile that the 3 sections of code above will have memory barriers, ie when the volatile section begins all memory writes from A will have occurred, and when B begins executing all memory writes from the volatile block will have finished. But, does code within the volatile block get optimised? It would be nice if code within a volatile statement is strictly ordered, with no opportunity for the compiler to move memory read/write operations. Does anybody know if this is true in practice? Brad
Jun 20 2005
parent Sean Kelly <sean f4.ca> writes:
In article <d97jeu$1mcv$1 digitaldaemon.com>, Brad Beveridge says...
Sean Kelly wrote:
<Snip>
 It's worth noting that D already provides both of these proposed solutions in
 language.  The 'synchronized' keyword could be used to prevent the compiler
from
 optimizing code around these areas (if it isn't already).  And 'volatile'
 provides programmers who need to implement concurrent code outside of
 synchronization blocks a means of preventing compiler optimization of critical
 code blocks.  More work may still be useful in this area.  For example,
 'volatile' in D just prevents optimization across a code block, but it might be
 worthwhile to provide a means for something akin to acquire and release
 semantics to allow *some* optimization to occur.
 

    ... some optimised code (A) ...
    volatile
    {
        ... some order critical code ...
    }
    ... some optimised code (B)

It is obvious from the description of volatile that the 3 sections of code above will have memory barriers, ie when the volatile section begins all memory writes from A will have occurred, and when B begins executing all memory writes from the volatile block will have finished. But, does code within the volatile block get optimised? It would be nice if code within a volatile statement is strictly ordered, with no opportunity for the compiler to move memory read/write operations. Does anybody know if this is true in practice?

The spec just says that "Memory writes occurring before the Statement are performed before any reads within or after the Statement. Memory reads occurring after the Statement occur after any writes before or within Statement are completed." So the compiler is currently free to optimize within the code block, just not across the boundaries. And now that I look at it, it sounds like volatile statements already implement acquire/release semantics. I think the current behavior is actually okay though, as the code within the volatile block could theoretically be thousands of lines long, and I wouldn't want the optimizer to ignore that code completely, just not optimize it beyond the boundaries I've established. Also, the requirements for 'synchronized' say nothing about optimizer behavior, and I think they should--'synchronized' should probably be identical to 'volatile' except that the block is also atomic. I grant that it would be easy enough for a Mutex writer to add volatile blocks to his code, but as a synchronized block is implicitly volatile, it's worth changing simply to improve clarity if nothing else. Sean
Jun 21 2005
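
The boundary rule quoted from the spec can be compared, loosely, with acquire/release ordering as it later appeared in C++11. The sketch below is C++ and not D, and it is only an analogy under that assumption (the name `publish_and_consume` is made up): the release store keeps the earlier write to `payload` from moving after it, the acquire load keeps the later read from moving before it, and the compiler stays free to optimize everything on either side.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: hand one value from the calling thread to a consumer.
int publish_and_consume(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the ordering point
    int seen = -1;

    std::thread consumer([&] {
        // Acquire: reads after this load cannot be moved before it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    payload = value;
    // Release: writes before this store cannot be moved after it.
    ready.store(true, std::memory_order_release);

    consumer.join();
    return seen;
}
```

If both orderings were relaxed instead, the consumer would have no guarantee of seeing the new `payload`, and reading it would be a data race.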
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 16 Jun 2005 16:09:44 +0000 (UTC), Manfred Nowak wrote:

 The shipping of the "AMD Athlon 64 X2" is announced to start at the 
 end of this month.
 
 A review is available:
 http://www.amdreview.com/reviews.php?rev=athlonx24200
 
 As the review suggests WinXP and Sandra are prepared to use more than 
 one CPU.
 
 Will D be outdated before the release of 1.0 because D has no support 
 for multi core units?

Yes. In the exact same manner that all existing 3+GL languages are: C, C++, C#, Eiffel, Smalltalk, Forth, COBOL, VB, Fortran, ... But maybe you are talking about library support rather than language support? Are you talking about the need for D to have new keywords or new object code generation when the target is a dual/triple/quadruple/quintuple/... core machine? Maybe this thread can be renamed "Duel Core Support" ;-) -- Derek Parnell Melbourne, Australia 20/06/2005 7:35:55 AM
Jun 19 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Derek Parnell <derek psych.ward> wrote:

[...]
 Will D be outdated before the release of 1.0 because D has no
 support for multi core units?

Yes. In the exact same manner that all existing 3+GL languages are: C, C++, C#, Eiffel, Smalltalk, Forth, COBOL, VB, Fortran, ...

I disagree. All these languages are way beyond version 1.0 whereas D isn't.
 But maybe you are talking about library support rather than
 language support?

If the paper of Buhr, which I have mentioned somewhere above, is right then it is possible to include all concurrency support into a library, but only if the language follows the rules dictated by the library. And I agree with Buhr that such dictation is the same as having changed the language.
 Are you talking about the need for D to have
 new keywords or new object code generation when the target is a
 dual/triple/quadruple/quintuple/... core machine? 

According to my statement above, a clear: maybe. And the reason for this is that I do not believe that the only two keywords in D that have something to do with concurrency can be shown to be equivalent to Buhr's "mutex" and "monitor". But I may be wrong.
 Maybe this thread can be renamed "Duel Core Support"  ;-)

Thx for this broad hint. In fact I feel thrown onto a position which I did not want to be engaged in. All I wanted to know is whether there is a proof that D can handle concurrency in general and as the title shows dual cores as a special case. Maybe I should have posted this into the "learn" group. However, I posted here and found myself confronted with opinions, that dual cores are not different from single cores or unfounded claims that D can handle any kind of concurrency. Somehow I feel very uncomfortable. -manfred
Jun 20 2005
next sibling parent Brad Beveridge <brad somewhere.net> writes:
Manfred Nowak wrote:

 Thx for this broad hint. In fact I feel thrown onto a position which 
 I did not want to be engaged in. All I wanted to know is whether 
 there is a proof that D can handle concurrency in general and as the 
 title shows dual cores as a special case. Maybe I should have posted 
 this into the "learn" group.
 
 However, I posted here and found myself confronted with opinions, 
 that dual cores are not different from single cores or unfounded 
 claims that D can handle any kind of concurrency.
 
 Somehow I feel very uncomfortable.

If I have contributed to your discomfort, I am sorry - that was certainly not my intention. I truly am interested in this topic, but as I've said before I just don't understand the problem. I also have not read the references previously posted as they are not in a format I can easily open (need to get a ps viewer, etc). I think the primary things I don't understand are (all are from a logical/programmer's point of view):

1) Is there any difference between multiple core CPUs, and machines with multiple CPUs?
   * I don't believe that there is any significant difference, in which case we perhaps should agree that we are talking about SMP in general.
2) From a programmer's point of view, what _is_ the difference between a program that runs in multiple threads and a program that runs in multiple threads on multiple cores?
   * I understand that physically there are different things happening, but I currently believe that logically there is no difference.
3) Can you please summarise the primitives that are required to program properly on SMP machines?
   * Although I do little multi-threaded programming, I understand that threads need to have atomic operations as a basic synchronizing mechanism, other than that I am not familiar enough to comment.
4) Could you please show a specific case that D is not able to handle an SMP situation, and how it could/should be fixed with additions to the language?
   * I liked the train example, could you perhaps make it pseudo-code & point out the weaknesses?

Thanks
Brad
Jun 20 2005
prev sibling parent reply Matthias Becker <Matthias_member pathlink.com> writes:
 Are you talking about the need for D to have
 new keywords or new object code generation when the target is a
 dual/triple/quadruple/quintuple/... core machine? 

According to my statement above, a clear: maybe. And the reason for this is that I do not believe that the only two keywords in D that have something to do with concurrency can be shown to be equivalent to Buhr's "mutex" and "monitor". But I may be wrong.

You can build mutexes and monitors with synchronized without problems.
Jun 21 2005
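
To show what "building a monitor" amounts to, here is a small C++ sketch (C++ rather than D's `synchronized`, and not Buhr's implementation; `MonitorQueue` and `sum_through_monitor` are made-up names). The monitor is a class whose methods all take the same lock, plus a condition to wait on. In D the locking part would presumably be expressed with `synchronized`, but that translation is an assumption and is not shown.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical monitor: every public method runs under the same mutex,
// and take() waits on a condition until there is something to return.
class MonitorQueue {
    std::mutex m;
    std::condition_variable not_empty;
    std::queue<int> items;
public:
    void put(int v) {
        {
            std::lock_guard<std::mutex> guard(m);
            items.push(v);
        }
        not_empty.notify_one();  // wake a waiting consumer, if any
    }
    int take() {
        std::unique_lock<std::mutex> guard(m);
        // wait() releases the lock while blocked and re-takes it on wake-up.
        not_empty.wait(guard, [this] { return !items.empty(); });
        int v = items.front();
        items.pop();
        return v;
    }
};

// Hypothetical driver: push 1..n through the monitor and sum on the other side.
int sum_through_monitor(int n) {
    MonitorQueue q;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += q.take();
    });
    for (int i = 1; i <= n; ++i) q.put(i);
    consumer.join();
    return sum;
}
```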
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Matthias Becker <Matthias_member pathlink.com> wrote:

 You can build mutexes and monitors with synchronized without
 problems. 

So why did Buhr implement them? -manfred
Jun 21 2005
parent Brad Beveridge <brad somewhere.net> writes:
Manfred Nowak wrote:
 Matthias Becker <Matthias_member pathlink.com> wrote:
 
 
You can build mutexes and monitors with synchronized without
problems. 

So why did Buhr implement them? -manfred

I don't see that he implemented anything. He made two basic points:

1) Variables cached in registers will not be visible between tasks
2) Code optimisation can reorder instructions aggressively, which can lead to code that should be inside critical sections being moved outside critical sections.

C addresses point 1 with the volatile keyword: any variable that is "volatile" will be written to memory rather than kept solely in registers. D's meaning of volatile addresses both concerns: code cannot move around a volatile statement, and reads and writes are performed to memory. D also adds "synchronized", but in reality you could build your own locks on top of volatile without the language feature "synchronized". So D as a language meets the criteria for concurrent programming that Buhr laid out. Brad
Jun 22 2005
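
As a sketch of "building your own lock" from an ordering primitive, here is a spinlock in C++ (not D; it uses an atomic flag with acquire/release ordering as a stand-in for the barrier role that D's volatile statement is given in this thread, which is an assumption on my part; `SpinLock` and `count_with_spinlock` are made-up names). Note that an atomic test-and-set is needed in addition to the ordering guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical hand-rolled lock built on one atomic flag.
class SpinLock {
    std::atomic_flag busy = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value: spin while someone holds it.
        // Acquire keeps the critical section from moving above this point.
        while (busy.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release keeps the critical section from moving below this point.
        busy.clear(std::memory_order_release);
    }
};

// Hypothetical driver: the same shared-counter test, using the hand-rolled lock.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

A spinlock like this wastes CPU under contention, so an OS-backed mutex is usually the better default; the point is only that the lock itself needs nothing beyond an atomic operation and the two ordering guarantees.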