digitalmars.D - Dual CPU code

bearophile (8/8) Feb 02 2009 This comes after a small discussion I've had in the #D IRC channel.

Don (2/15) Feb 02 2009 Is this mostly integer, or floating point code?

bearophile (4/5) Feb 02 2009 In that specific cases, it's mostly FP. If I compile it with LDC with -s...

Walter Bright (18/26) Feb 02 2009 This is a very old problem, it even cropped up in the bad old DOS days

bearophile (7/15) Feb 02 2009 I think that solves my problem, thank you. It's a simple solution (maybe...

Walter Bright (2/15) Feb 02 2009 That's one way to do it.

grauzone (2/18) Feb 02 2009 The glorious return of include files!
Andrei Alexandrescu (5/22) Feb 02 2009 I must be missing something - why isn't

Walter Bright (4/28) Feb 02 2009 Because importing something does not change how it was compiled. If you
BCS (2/15) Feb 02 2009 the code generator needs to be run on the code more than once.
Christopher Wright (4/31) Feb 02 2009 The shared code has to be compiled with two sets of compiler switches,

BCS (10/32) Feb 02 2009 my first thought would be to play games with the linker:

Tim M (4/27) Feb 02 2009 Is this the sort thing you are looking for:

bearophile <bearophileHUGS lycos.com> writes:

This comes after a small discussion I've had in the #D IRC channel.

I have seen that the LDC compiler generates much more efficient code if you
enable the SSE(2) extensions, while its code is not very efficient if you don't
(GCC/GDC don't seem as sensitive to the presence of the SSE extensions).

I often have to switch between an old and a new CPU, so if I compile with SSE2
extensions the program doesn't run on the old CPU, while if I don't use them, I
sometimes have a program that runs much slower on the newer CPU.

So, it may be useful to have a way to build executables able to run well on
both CPUs (Apple has done something like this two or more times in the past).
There are several ways to do this. One solution is to compile just the critical
functions for different CPUs, but that may require compiler support.
My executables are generally small, so doubling their size isn't a problem. So
a simple solution is to bundle two whole executables into one executable and add
a small header that detects the current CPU and runs the right executable.
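
A minimal sketch of the launcher idea, assuming two separately built copies of
the program sit next to the launcher. The file names app_sse2 and app_generic
and the helper pickBinary are hypothetical. core.cpuid, std.file and
std.process are the current druntime/Phobos modules, which may differ from
what was available when this was posted:

```d
import core.cpuid : sse2;
import std.file : thisExePath;
import std.path : buildPath, dirName;
import std.process : spawnProcess, wait;

/// Returns the file name of the build to run (both names are hypothetical).
string pickBinary(bool hasSse2)
{
    return hasSse2 ? "app_sse2" : "app_generic";
}

int main(string[] args)
{
    // Look for the real program next to the launcher itself.
    string dir = dirName(thisExePath());
    string target = buildPath(dir, pickBinary(sse2));

    // Forward the command-line arguments and return the child's exit code.
    auto pid = spawnProcess([target] ~ args[1 .. $]);
    return wait(pid);
}
```

This keeps two whole executables on disk instead of packing them into one
file, which is simpler to build; on Windows the file names would also need an
.exe suffix.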

Notice that the problem I have shown isn't limited to SSE2, it's more general:
for example, in the near future you may want code compiled for the GPU and/or
CPU, etc.

Bye,
bearophile

Feb 02 2009

Don <nospam nospam.com> writes:

bearophile wrote:
 This comes after a small discussion I've had in the #D IRC channel.
 
 I have seen that the LDC compiler is much more efficient if you use SSE(2)
extensions, while it's not much efficient if you don't use them (GCC/GDC don't
seem so much sensitive to the presence of the SSE extensions).
 
 I often have to switch from an old and a new CPU, so if I compile with SSE2
extensions the program doesn't run on the old CPU, while if I don't use them, I
sometimes have a program that goes much slower on the newer CPU.
 
 So, it may be useful to have a way to build executables able to run well on
both CPUs (Apple has done something like this two or more times in the past).
There are several ways to do this, a solution is to compile just critical
functions for different CPUs, but that may require compiler support.
 My executables are generally small, so doubling their size isn't a problem. So
a simple solution is to bundle two whole executables into an executable and add
a small header that looks for the current CPU, and runs the right executable.
 
 Notice that the problem I have shown isn't limited to SSE2, it's more common,
for example in the close future you may want code compiled for the GPU and/or
CPU, etc.
 
 Bye,
 bearophile

Is this mostly integer, or floating point code?

Feb 02 2009

bearophile <bearophileHUGS lycos.com> writes:

Don:
 Is this mostly integer, or floating point code?

In that specific case, it's mostly FP. If I compile it with LDC with -sse3
flags the resulting asm is a jungle of the new registers :-)

Bye,
bearophile

Feb 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 So, it may be useful to have a way to build executables able to run
 well on both CPUs (Apple has done something like this two or more
 times in the past). There are several ways to do this, a solution is
 to compile just critical functions for different CPUs, but that may
 require compiler support. My executables are generally small, so
 doubling their size isn't a problem. So a simple solution is to
 bundle two whole executables into an executable and add a small
 header that looks for the current CPU, and runs the right executable.


This is a very old problem, it even cropped up in the bad old DOS days 
where you had the choice of emulator or FPU. The solution is fairly 
simple - you don't need to bind together two executables. Simply put a 
runtime switch in:


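import std.cpuid;
import sse;     // shared code compiled with SSE2 enabled
import nosse;   // the same code compiled without SSE2

...

// Pick the implementation that matches the CPU detected at runtime.
if (std.cpuid.sse2())
     sse.foo();
else
     nosse.foo();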


and then compile sse.d and nosse.d with different compiler switches. The 
std.cpuid module will tell you what you've got at runtime. To see a real 
example of this, look at the array op implementation code in the 
standard library, such as internal/arrayfloat.d, it does a runtime 
switch for several different FPU flavors.
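
A hedged sketch of the same runtime switch, resolved once at startup through a
function pointer instead of at every call site. It assumes the current
druntime module core.cpuid in place of std.cpuid. The names fooSse,
fooGeneric, FooFn and selectFoo are hypothetical stand-ins, and since
everything here sits in one file, both bodies are compiled with the same
switches; in a real build they would come from sse.d and nosse.d:

```d
import core.cpuid : sse2;

// Stand-ins for sse.foo and nosse.foo; in a real build each would live in its
// own module, compiled with its own switches.
double fooSse(const double[] xs)
{
    double total = 0;
    foreach (x; xs)
        total += x * x;
    return total;
}

double fooGeneric(const double[] xs)
{
    double total = 0;
    foreach (x; xs)
        total += x * x;
    return total;
}

alias FooFn = double function(const double[]);

/// Chooses an implementation from the detected CPU feature.
FooFn selectFoo(bool hasSse2)
{
    return hasSse2 ? &fooSse : &fooGeneric;
}

/// Resolved once at startup, so call sites pay only an indirect call.
immutable FooFn foo;

shared static this()
{
    foo = selectFoo(sse2);
}

void main()
{
    import std.stdio : writeln;
    writeln(foo([1.0, 2.0, 3.0]));
}
```

Taking the CPU feature as a parameter of selectFoo keeps the selection logic
testable on any machine, whatever CPU the tests happen to run on.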

Feb 02 2009

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:
 import std.cpuid;
 import sse;
 import nosse;
 ...
 if (std.cpuid.sse2())
      sse.foo();
 else
      nosse.foo();

I think that solves my problem, thank you. It's a simple solution (maybe I
didn't think of it because I use bud, which compiles the whole program in one go).

I presume that usually the D code in the sse and nosse modules is the same,
it's just compiled in two different ways, so the two modules may just contain
two lines of code such as:

module sse;
mixin(import("shared_module_code.dd"));
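
A small single-file sketch of what the string mixin does, under stated
assumptions: the token string sharedSource stands in for the text of
shared_module_code.dd, and the templates sse and nosse stand in for the two
wrapper modules. With a real file, import("shared_module_code.dd") also needs
a -J path on the compiler command line. Being one file, this only shows the
same source ending up under two names; it does not show the different
compiler switches:

```d
// Stand-in for the text of shared_module_code.dd; with a real file this would
// be import("shared_module_code.dd") and the build would need a -J path.
enum sharedSource = q{
    double foo(const double[] xs)
    {
        double total = 0;
        foreach (x; xs)
            total += x * x;
        return total;
    }
};

// Each template plays the role of one wrapper module: the same source text is
// pasted in twice and ends up as two separately named functions.
template sse()
{
    mixin(sharedSource);
}

template nosse()
{
    mixin(sharedSource);
}

void main()
{
    const xs = [1.0, 2.0, 3.0];
    // Same source, so both copies must agree on the result.
    assert(sse!().foo(xs) == nosse!().foo(xs));
}
```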

Bye,
bearophile

Feb 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 Walter Bright:
 import std.cpuid; import sse; import nosse; ... if
 (std.cpuid.sse2()) sse.foo(); else nosse.foo();

 
 I think that solves my problem, thank you. It's a simple solution
 (maybe I didn't think of it because I use bud that compiles all the
 program in one go).
 
 I presume that usually the D code in the sse and nosse modules is the
 same, it's just compiled in two different ways, so the two modules
 may just contain two lines of code as:
 
 module sse; mixin(import("shared_module_code.dd"));

That's one way to do it.

Feb 02 2009

grauzone <none example.net> writes:

Walter Bright wrote:
 bearophile wrote:
 Walter Bright:
 import std.cpuid; import sse; import nosse; ... if
 (std.cpuid.sse2()) sse.foo(); else nosse.foo();

 I think that solves my problem, thank you. It's a simple solution
 (maybe I didn't think of it because I use bud that compiles all the
 program in one go).

 I presume that usually the D code in the sse and nosse modules is the
 same, it's just compiled in two different ways, so the two modules
 may just contain two lines of code as:

 module sse; mixin(import("shared_module_code.dd"));

 
 That's one way to do it.

The glorious return of include files!

Feb 02 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 bearophile wrote:
 Walter Bright:
 import std.cpuid; import sse; import nosse; ... if
 (std.cpuid.sse2()) sse.foo(); else nosse.foo();

 I think that solves my problem, thank you. It's a simple solution
 (maybe I didn't think of it because I use bud that compiles all the
 program in one go).

 I presume that usually the D code in the sse and nosse modules is the
 same, it's just compiled in two different ways, so the two modules
 may just contain two lines of code as:

 module sse; mixin(import("shared_module_code.dd"));

 
 That's one way to do it.
 

I must be missing something - why isn't

import shared_module_code;

good?


Andrei

Feb 02 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 Walter Bright wrote:
 bearophile wrote:
 Walter Bright:
 import std.cpuid; import sse; import nosse; ... if
 (std.cpuid.sse2()) sse.foo(); else nosse.foo();

 I think that solves my problem, thank you. It's a simple solution
 (maybe I didn't think of it because I use bud that compiles all the
 program in one go).

 I presume that usually the D code in the sse and nosse modules is the
 same, it's just compiled in two different ways, so the two modules
 may just contain two lines of code as:

 module sse; mixin(import("shared_module_code.dd"));

 That's one way to do it.

 
 I must be missing something - why isn't
 
 import shared_module_code;
 
 good?

Because importing something does not change how it was compiled. If you 
have one module that you want two separate instances of, compiled with 
different switches, they have to be somehow given different names.

Feb 02 2009

BCS <none anon.com> writes:

Hello Andrei,

 bearophile wrote:
 
 
 module sse; mixin(import("shared_module_code.dd"));
 


 I must be missing something - why isn't
 
 import shared_module_code;
 
 good?
 
 Andrei
 

The code generator needs to be run on the code more than once, once for each
set of compiler switches.

Feb 02 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 Walter Bright wrote:
 bearophile wrote:
 Walter Bright:
 import std.cpuid; import sse; import nosse; ... if
 (std.cpuid.sse2()) sse.foo(); else nosse.foo();

 I think that solves my problem, thank you. It's a simple solution
 (maybe I didn't think of it because I use bud that compiles all the
 program in one go).

 I presume that usually the D code in the sse and nosse modules is the
 same, it's just compiled in two different ways, so the two modules
 may just contain two lines of code as:

 module sse; mixin(import("shared_module_code.dd"));

 That's one way to do it.

 
 I must be missing something - why isn't
 
 import shared_module_code;
 
 good?
 
 
 Andrei

The shared code has to be compiled with two sets of compiler switches, 
resulting in two distinct modules with different ModuleInfo, TypeInfo, 
and so forth. You can't do that with import.

Feb 02 2009

BCS <ao pathlink.com> writes:

Reply to bearophile,

 Walter Bright:
 
 import std.cpuid;
 import sse;
 import nosse;
 ...
 if (std.cpuid.sse2())
 sse.foo();
 else
 nosse.foo();

 I think that solves my problem, thank you. It's a simple solution
 (maybe I didn't think of it because I use bud that compiles all the
 program in one go).
 
 I presume that usually the D code in the sse and nosse modules is the
 same, it's just compiled in two different ways, so the two modules may
 just contain two lines of code as:
 
 module sse;
 mixin(import("shared_module_code.dd"));
 Bye,
 bearophile

My first thought would be to play games with the linker:

1. Define a function EnterA() that calls the code.
2. Define a function EnterB() that calls the code.
3. Compile the needed code for CPU A to A.obj.
4. Compile the needed code for CPU B to B.obj.
5. Make a lib with EnterA and A.obj, forcing internal linking.
6. Make a lib with EnterB and B.obj, forcing internal linking.
7. Link the common code and both libs.

Making the libs becomes the fun part.

Feb 02 2009

"Tim M" <a b.com> writes:

On Tue, 03 Feb 2009 00:31:17 +1300, bearophile <bearophileHUGS lycos.com>  
wrote:

 This comes after a small discussion I've had in the #D IRC channel.

 I have seen that the LDC compiler is much more efficient if you use  
 SSE(2) extensions, while it's not much efficient if you don't use them  
 (GCC/GDC don't seem so much sensitive to the presence of the SSE  
 extensions).

 I often have to switch from an old and a new CPU, so if I compile with  
 SSE2 extensions the program doesn't run on the old CPU, while if I don't  
 use them, I sometimes have a program that goes much slower on the newer  
 CPU.

 So, it may be useful to have a way to build executables able to run well  
 on both CPUs (Apple has done something like this two or more times in  
 the past). There are several ways to do this, a solution is to compile  
 just critical functions for different CPUs, but that may require  
 compiler support.
 My executables are generally small, so doubling their size isn't a  
 problem. So a simple solution is to bundle two whole executables into an  
 executable and add a small header that looks for the current CPU, and  
 runs the right executable.

 Notice that the problem I have shown isn't limited to SSE2, it's more  
 common, for example in the close future you may want code compiled for  
 the GPU and/or CPU, etc.

 Bye,
 bearophile

Is this the sort of thing you are looking for:
http://www.songho.ca/misc/sse/sse.html

Feb 02 2009
