
digitalmars.D - Multicores and Publication Safety

reply Walter Bright <newshound1 digitalmars.com> writes:
"What memory fences are useful for on multiprocessors; and why you 
should care, even if you're not an assembly programmer."

http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
Aug 04 2008
next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Walter Bright wrote:
 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
There seems to be a cadre of reddit readers who immediately vote down anything on D. That can be counteracted if the community votes them up!
Aug 04 2008
prev sibling parent reply "Jb" <jb nowhere.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you should 
 care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32.

http://www.intel.com/products/processor/manuals/318147.pdf

The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Aug 04 2008
next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you should 
 care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Pay very close attention to sections 2.3 and 2.4 of that document.
Aug 04 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Brad Roberts wrote:
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you should 
 care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Pay very close attention to sections 2.3 and 2.4 of that document.
2.4 is the most interesting aspect of PC. It means that you can run into situations like this:

  // Thread A
  x = 1;

  // Thread B
  if( x == 1 )
      y = 1;

  // Thread C
  if( y == 1 )
      assert( x == 1 ); // may fail

Alex Terekhov came up with a sneaky solution for this based on how the IA-32 spec says CAS is currently implemented:

  // Thread A
  x = 1;

  // Thread B
  t = CAS( x, 0, 0 );
  if( t == 1 )
      y = 1;

  // Thread C
  if( y == 1 )
      assert( x == 1 ); // true

In essence, Intel currently implements CAS by either storing the new value /or/ re-storing the old value based on the result of the comparison, and because all stores from a single processor are ordered, Thread C is therefore guaranteed to see the store to x before the store to y.

As cool as I find the above solution, however, I do hope that this helps to demonstrate the complexity of lock-free programming. It also shows just how complex analysis of this stuff is. Even with the full source code available it would take some doing for a compiler to recognize a problem similar to the above.

Sean
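For concreteness, here is a rough D sketch of the same three threads. It assumes an atomics module along the lines of core.atomic (atomicLoad / atomicStore / cas), where cas takes a pointer and returns a bool rather than the old value, so a failed CAS is how we detect that x was already 1. Note that the guarantee being exploited is a property of the hardware (the locked CMPXCHG that cas compiles to on x86 also acts as a full barrier), not something the raw memory order promises by itself.

  import core.atomic;
  import core.thread;

  shared int x;
  shared int y;

  void threadA()
  {
      atomicStore!(MemoryOrder.raw)(x, 1);
  }

  void threadB()
  {
      // cas returns true if it swapped (x was 0) and false if the compare
      // failed (x was already 1), which is the case we care about here.
      if (!cas(&x, 0, 0))
          atomicStore!(MemoryOrder.raw)(y, 1);
  }

  void threadC()
  {
      if (atomicLoad!(MemoryOrder.raw)(y) == 1)
          assert(atomicLoad!(MemoryOrder.raw)(x) == 1);
  }

  void main()
  {
      new Thread(&threadA).start();
      new Thread(&threadB).start();
      new Thread(&threadC).start();
      thread_joinAll();
  }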
Aug 04 2008
parent Brad Roberts <braddr puremagic.com> writes:
Sean Kelly wrote:
 Brad Roberts wrote:
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you
 should care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/


 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Pay very close attention to sections 2.3 and 2.4 of that document.
2.4 is the most interesting aspect of PC. It means that you can run into situations like this: // Thread A x = 1; // Thread B if( x == 1 ) y = 1; // Thread C if( y == 1 ) assert( x == 1 ); // may fail Alex Terekhov came up with a sneaky solution for this based on how the IA-32 spec says CAS is currently implemented: // Thread A x = 1; // Thread B t = CAS( x, 0, 0 ); if( t == 1 ) y = 1; // Thread C if( y == 1 ) assert( x == 1 ); // true In essence, Intel currently implements CAS by either storing the new value /or/ re-storing the old value based on the result of the comparison, and because all stores from a single processor are ordered, Thread C is therefore guaranteed to see the store to x before the store to y. As cool as I find the above solution, however, I do hope that this helps to demonstrate the complexity of lock-free programming. It also shows just how complex analysis of this stuff is. Even with the full source code available it would take some doing for a compiler to recognize a problem similar to the above. Sean
For that example, section 2.8 kicks in: locked instructions (such as CAS) help constrain ordering.

So, to summarize: reordering is real, even on x86-class hardware. To make life even more interesting, there are also various CPU bugs that help make things even worse. See this thread (unconfirmed info, but interesting nonetheless) on the linux-kernel mailing list:

http://www.ussg.iu.edu/hypermail/linux/kernel/0808.0/0882.html

Whee,
Brad
Aug 04 2008
prev sibling parent reply "Jb" <jb nowhere.com> writes:
"Brad Roberts" <braddr puremagic.com> wrote in message 
news:mailman.10.1217908384.1156.digitalmars-d puremagic.com...
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you 
 should
 care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Pay very close attention to sections 2.3 and 2.4 of that document.
They don't override 2.1, they complement it. I.e.:

*Stores cannot be reordered with other stores*
*Loads cannot be reordered with other loads*

  x = 1;
  ready = 1;

happens in order whether or not a load is reordered with those stores. You can't have a situation where a processor sees the write to "ready" before it sees the write to "x".

What Bartosz said, "writes to memory can be completed out of order and ...", is not true on x86.

What 2.3 is saying is that a later load could be reordered before either store, but it still can't be reordered before the store to 'x' and after the store to 'ready', because the order of those stores cannot be changed. If it gets reordered before the store to 'x' it implicitly gets reordered before the store to 'ready'. That's the whole point of the ordering of stores / loads being enforced.

Regarding 2.4: what this is saying is that there may be a delay between processors seeing each other's stores, not that they can be seen out of order. Processor 1 may see its own write to 'x' before processor 2 does, but processor 2 still won't see the write to 'ready' before the write to 'x'.
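To spell the publication pattern out in D: below is a sketch written with explicit acquire/release operations, assuming an atomics module along the lines of core.atomic. On x86 the release store and the acquire load compile down to ordinary MOVs precisely because of the store-store and load-load rules quoted above; on a more weakly ordered CPU the same source emits the required barriers.

  import core.atomic;

  shared int  x;
  shared bool ready;

  // Publisher: write the payload, then raise the flag.
  void publish()
  {
      atomicStore!(MemoryOrder.raw)(x, 1);        // the payload
      atomicStore!(MemoryOrder.rel)(ready, true); // release: payload is visible first
  }

  // Consumer: wait for the flag, then read the payload.
  void consume()
  {
      while (!atomicLoad!(MemoryOrder.acq)(ready)) {} // acquire: pairs with the release
      assert(atomicLoad!(MemoryOrder.raw)(x) == 1);
  }

  void main()
  {
      import core.thread : Thread, thread_joinAll;
      new Thread(&consume).start();
      new Thread(&publish).start();
      thread_joinAll();
  }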
Aug 05 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jb wrote:
 What Bartoz said.. "writes to memory can be completed out of order and"
 
 Is not true on x86.
It's risky to write such code, however, because:

1. someone else may try to port it to another processor, and then be mystified as to why it breaks

2. Intel may change this behavior on future x86's, which means your code will break years from now
Aug 05 2008
parent reply "Jb" <jb nowhere.com> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message 
news:g795mq$25jq$1 digitalmars.com...
 Jb wrote:
 What Bartoz said.. "writes to memory can be completed out of order and"

 Is not true on x86.
It's risky to write such code, however, because: 1. someone else may try to port it to another processor, and then be mystified as to why it breaks
You can't design / write your code based on the idea that someone who doesn't know what they are doing will try and modify it later. And if they are unaware of memory ordering they are likely unaware of alignment and atomicity, and probably don't understand the subtleties of synchronization, and a whole bunch of other issues.

I'm not saying every Joe Bloggs programmer should know about memory ordering and use it where they can to avoid more expensive synchronization primitives. But the compiler and stdlib, or multithreading libraries, should know about it. I don't think the compiler should be dumping memory fences all over the place on the assumption that they might be needed by the x86 processors of 2012.
 2. Intel may change this behavior on future x86's, which means your code 
 will break years from now
I don't think they could, because I think a lot of code probably already relies on it. And I think it's likely that the new commitment to strong memory ordering from both AMD and Intel (both have PDFs regarding 64-bit that specify it) is mainly because they realize it is needed to help progress with multicore.
Aug 05 2008
parent reply Walter Bright <walter nospammm-digitalmars.com> writes:
Jb Wrote:

 
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g795mq$25jq$1 digitalmars.com...
 Jb wrote:
 What Bartoz said.. "writes to memory can be completed out of order and"

 Is not true on x86.
It's risky to write such code, however, because: 1. someone else may try to port it to another processor, and then be mystified as to why it breaks
You cant design / write your code based on the idea that someone who doesnt know what they are doing will try and modify it later. And if they are unaware of memory ordering they are likely unaware of alignment atomicity, and probably dont understand the subtleties of syncronization, and a whole bunch of other issues. I'm not saying every joe blogs programmer should know about memory ordering and use it where they can to avoid more expensive syncronization primatives. But the compiler and stdlib, or multithreding librarys, should know about it. I dont think the compiler should be dumping memory fences all over the place on the assumtion that they might be needed by the x86 processors of 2012.
The model the compiler uses is to generate code "as if" fences were inserted everywhere. The compiler may, however, as part of optimization and generating code for a particular CPU, elide as many as it can.
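A rough source-level illustration of that "as if" model (my reading of it, not actual compiler output), again assuming a core.atomic-style API where the default memory order is sequentially consistent:

  import core.atomic;

  shared int g;

  void writeG()
  {
      // Conceptually: fence; store g; fence.
      atomicStore(g, 42); // defaults to sequentially consistent
      // On x86 the acquire/release halves of those conceptual fences are
      // already guaranteed by the hardware, so the backend can reduce this
      // to a plain MOV plus a single serializing instruction (an MFENCE or
      // an XCHG); on a weakly ordered CPU the explicit barriers remain.
  }

  int readG()
  {
      // A sequentially consistent load on x86 is just a MOV; the conceptual
      // fences around it cost nothing and are elided.
      return atomicLoad(g);
  }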
 2. Intel may change this behavior on future x86's, which means your code 
 will break years from now
I dont think they could because i think a lot of code probably already relys on it. And i think it's likely that the new comitment to strong memory ordering, from both AMD and INTEL (both have pdfs regarding 64 bit that specify it), is mainly because they realize it is needed to help progress with multi core.
I think that is because the current language technology is deficient. We aim to fix that with D :-)
Aug 05 2008
parent "Jb" <jb nowhere.com> writes:
"Walter Bright" <walter nospammm-digitalmars.com> wrote in message 
news:g7b7h1$aeb$1 digitalmars.com...
 2. Intel may change this behavior on future x86's, which means your 
 code
 will break years from now
I dont think they could because i think a lot of code probably already relys on it. And i think it's likely that the new comitment to strong memory ordering, from both AMD and INTEL (both have pdfs regarding 64 bit that specify it), is mainly because they realize it is needed to help progress with multi core.
I think that is because the current language technology is deficient. We aim to fix that with D :-)
FWIW I think you're right. But a little more help from the hardware would be nice as well. I'd like to see "lock free" (non-blocking) synchronization made a bit easier, something like a double CAS.
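For reference, this is what a plain single-word CAS loop looks like, assuming a core.atomic-style cas. A double CAS would let two independent words be compared and swapped in one atomic step, which is what you end up wanting for things like ABA-safe lock-free lists; with a single CAS everything has to be squeezed into one word.

  import core.atomic;

  shared int counter;

  // Lock-free increment: retry until our compare-and-swap wins the race.
  int increment()
  {
      int seen;
      do
      {
          seen = atomicLoad(counter);
      } while (!cas(&counter, seen, seen + 1));
      return seen + 1;
  }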
Aug 06 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you should 
 care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads. Also, even under the PCsc model it is completely legal to "hoist" loads above stores, or equivalently, to "sink" stores below loads.

In short, unless you've *really* done your homework I suggest being very careful with respect to lock-free programming--i.e. always perform fully sequenced operations just to be safe. Tango has had such a module from the start, and it looks like Phobos2 may get one fairly soon as well.

Sean
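To make the "fully sequenced" advice concrete: the reordering that actually bites on x86 is a store followed by a load of a different location, which is the handshake at the heart of Dekker/Peterson-style locks. A sketch (not a complete lock, just the handshake), assuming a core.atomic-style API, of why a full fence is needed there:

  import core.atomic;

  shared int flag0, flag1;

  bool tryEnter0()
  {
      atomicStore!(MemoryOrder.raw)(flag0, 1);  // announce intent
      atomicFence();                            // full barrier: the load below must not be hoisted
      return atomicLoad!(MemoryOrder.raw)(flag1) == 0;
  }

  bool tryEnter1()
  {
      atomicStore!(MemoryOrder.raw)(flag1, 1);
      atomicFence();
      return atomicLoad!(MemoryOrder.raw)(flag0) == 0;
  }

Without the fences (or fully sequenced stores and loads), each thread's load may be hoisted above its store, both see 0, and both "enter" at once.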
Aug 04 2008
parent reply "Jb" <jb nowhere.com> writes:
"Sean Kelly" <sean invisibleduck.org> wrote in message 
news:g78man$17sb$1 digitalmars.com...
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you 
 should care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
That's news to me.
 Also, even under the PCsc model it is completely legal to "hoist" loads 
 above stores, or equivalently, to "sink" stores below loads.
Yes, but as long as stores are not reordered with other stores, and loads not reordered with other loads, then that kind of reordering won't result in the situation Bartosz described.
Aug 05 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Jb wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message 
 news:g78man$17sb$1 digitalmars.com...
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you 
 should care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
Thats news to me.
 Also, even under the PCsc model it is completely legal to "hoist" loads 
 above stores, or equivalently, to "sink" stores below loads.
Yes but as long as stores are not reordered with other stores, and loads not reordered with other loads, then that kind of re-ordering wont result in the situation Bartoz described.
True enough. It's mostly an issue with creating mutexes and the like. Sean
Aug 05 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Jb wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message 
 news:g78man$17sb$1 digitalmars.com...
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you 
 should care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
Thats news to me.
I don't know that this was ever confirmed with anyone at AMD, but it did come up in the C++0x talks and I believe the linux kernel accounts for it. Sean
Aug 05 2008
parent reply "Jb" <jb nowhere.com> writes:
"Sean Kelly" <sean invisibleduck.org> wrote in message 
news:g79ugv$mdd$1 digitalmars.com...
 Jb wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message 
 news:g78man$17sb$1 digitalmars.com...
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you 
 should care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
Thats news to me.
I don't know that this was ever confirmed with anyone at AMD, but it did come up in the C++0x talks and I believe the linux kernel accounts for it.
I did a bit of googling and it does seem older AMDs were less strongly ordered, SSE/3DNow non-temporal stores in particular. But it looks like they have gone for strong ordering with AMD64.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

From 7.2: Multiprocessor Memory Ordering:

"Loads do not pass previous loads (loads are not re-ordered). Stores do not pass previous stores (stores are not re-ordered)"

Although skim-reading more of chapter 7, it looks like they might do reordering behind the scenes, or "such that the appearance of in-order execution is maintained" as they say.

My guess is that strong ordering, or at least the appearance of it, is an important factor in multicore CPUs scaling well.
Aug 05 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
Jb wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message 
 news:g79ugv$mdd$1 digitalmars.com...
 Jb wrote:
 "Sean Kelly" <sean invisibleduck.org> wrote in message 
 news:g78man$17sb$1 digitalmars.com...
 Jb wrote:
 "Walter Bright" <newshound1 digitalmars.com> wrote in message 
 news:g7855a$2sd3$1 digitalmars.com...
 "What memory fences are useful for on multiprocessors; and why you 
 should care, even if you're not an assembly programmer."

 http://bartoszmilewski.wordpress.com/2008/08/04/multicores-and-publication-safety/

 http://www.reddit.com/comments/6uuqc/multicores_and_publication_safety/
None of that is relevant on x86 as far as I understand. I could only find the one regarding x86-64, but as far as I know it's the same on x86-32. http://www.intel.com/products/processor/manuals/318147.pdf The key point being loads are not reordered with other loads, and stores are not reordered with other stores.
Not true. The actual behavior of IA-32 processors has been hotly debated, but it's been established that at least certain AMD processors may reorder loads.
Thats news to me.
I don't know that this was ever confirmed with anyone at AMD, but it did come up in the C++0x talks and I believe the linux kernel accounts for it.
I did a bit of googling and it does seem older AMDs were less strongly ordered. It seems SSE/3DNow non temporal stores particulary. But it looks like they have gone for strong ordering with AMD64. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf From 7.2 : Multiprocessor Memory Ordering. "Loads do not pass previous loads (loads are not re-ordered). Stores do not pass previous stores (stores are not re-ordered)" Although skim reading more of chapter 7 it looks like they might do reordering behind the scence, or "such that the appearance of in-order execution is maintained" as they say.
At least AMD and Intel have figured out how to separate discussion of implementation issues from visible behavior. The original IA-32 spec was an absolute disaster in this respect. I'm also encouraged that the memory model has been both fully specified and strengthened to PCsc or better. The x86 has always been pretty easy to deal with and it's nice to see that this will continue to be true.

I suppose my only question at this point is how the official memory barrier instructions apply to normal (non-SSE) instruction ordering. I don't suppose the recent specs say anything about this?
 My guess is that strong ordering, or at least the appearance of it, is an 
 important factor in multi core cpus scalling well.
Yup. And the Intel announcement makes the very good point that it's a huge factor in performance per watt as well. Strengthening the memory model and shrinking the pipeline allows for a tremendous amount of logic hardware to simply be thrown away, which means smaller, cooler, more energy-efficient CPUs. My big question now is how computers will be built in the coming years... will we have a few traditional (fast) cores plus a general-purpose parallel computing cluster? I suppose I should read that Intel paper posted yesterday. Sean
Aug 05 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Sean Kelly wrote:
 My big question now is how computers will be 
 built in the coming years... will we have a few traditional (fast) cores 
 plus a general-purpose parallel computing cluster?
 
 Sean
Interesting you should bring this up. I was just reading an article yesterday about the "Cell Broadband Engine" used in the Playstation 3. It features one general-purpose 64-bit PowerPC chip (the "Power Processor Element") and eight co-processing cores (the "Synergistic Processing Units"), each with a 128-bit SIMD architecture. So, at least from the perspective of IBM and Sony, the answer is "yes". --benji
Aug 05 2008