digitalmars.D - Dual CPU code

bearophile <bearophileHUGS lycos.com> writes:
This comes after a small discussion I've had in the #D IRC channel.

I have seen that the LDC compiler produces much more efficient code if you enable
the SSE(2) extensions, and much less efficient code if you don't (GCC/GDC don't
seem as sensitive to the presence of the SSE extensions).

I often have to switch between an old and a new CPU: if I compile with SSE2
extensions, the program doesn't run on the old CPU, while if I don't use them, I
sometimes get a program that runs much slower on the newer CPU.

So it may be useful to have a way to build executables able to run well on
both CPUs (Apple has done something like this two or more times in the past).
There are several ways to do this. One solution is to compile just the critical
functions for different CPUs, but that may require compiler support.
My executables are generally small, so doubling their size isn't a problem. So
a simple solution is to bundle two whole executables into one executable and add
a small header that detects the current CPU and runs the right executable.
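A minimal sketch of that launcher "header", written as a separate tiny D program rather than a true bundled binary. Assumptions: two hypothetical builds named prog_sse2 and prog_generic sit next to the launcher (add ".exe" on Windows), and it uses core.cpuid from current druntime instead of the D1-era std.cpuid module mentioned later in the thread.

```d
// launcher.d: sketch of a CPU-detecting launcher.
// Hypothetical file names: prog_sse2 (built with SSE2) and prog_generic
// (built without) are assumed to live in the launcher's own directory.
import core.cpuid : sse2;
import std.file : exists, thisExePath;
import std.path : buildPath, dirName;
import std.process : spawnProcess, wait;
import std.stdio : stderr;

/// Returns the file name of the build to run on a CPU with or without SSE2.
string chooseExecutable(bool hasSse2)
{
    return hasSse2 ? "prog_sse2" : "prog_generic";
}

int main(string[] args)
{
    string exe = buildPath(dirName(thisExePath()), chooseExecutable(sse2));
    if (!exists(exe))
    {
        stderr.writeln("launcher: missing ", exe);
        return 1;
    }
    // Forward the remaining command-line arguments and propagate the exit code.
    auto pid = spawnProcess([exe] ~ args[1 .. $]);
    return wait(pid);
}
```

The cost of this design is an extra process start and shipping three files instead of one; the runtime switch suggested later in the thread avoids both.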

Notice that the problem I have shown isn't limited to SSE2; it's more general.
For example, in the near future you may want code compiled for the GPU and/or
the CPU, etc.

Bye,
bearophile
Feb 02 2009
Don <nospam nospam.com> writes:
Is this mostly integer, or floating point code?
Feb 02 2009
bearophile <bearophileHUGS lycos.com> writes:
Don:
 Is this mostly integer, or floating point code?

In that specific case, it's mostly FP. If I compile it with LDC with the -sse3 flag, the resulting asm is a jungle of the new registers :-)

Bye,
bearophile
Feb 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 So, it may be useful to have a way to build executables able to run
 well on both CPUs (Apple has done something like this two or more
 times in the past). There are several ways to do this, a solution is
 to compile just critical functions for different CPUs, but that may
 require compiler support. My executables are generally small, so
 doubling their size isn't a problem. So a simple solution is to
 bundle two whole executables into an executable and add a small
 header that looks for the current CPU, and runs the right executable.

This is a very old problem; it even cropped up in the bad old DOS days, when you had the choice of emulator or FPU. The solution is fairly simple: you don't need to bind together two executables. Simply put a runtime switch in:

    import std.cpuid;
    import sse;
    import nosse;
    ...
    if (std.cpuid.sse2())
        sse.foo();
    else
        nosse.foo();

and then compile sse.d and nosse.d with different compiler switches. The std.cpuid module will tell you what you've got at runtime.

To see a real example of this, look at the array op implementation code in the standard library, such as internal/arrayfloat.d; it does a runtime switch for several different FPU flavors.
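A sketch of the same runtime switch, with the choice made once at program start through a function pointer instead of an if/else at every call. Assumptions: fooSse2 and fooGeneric are hypothetical stand-ins for the foo functions in sse.d and nosse.d; in a real build each would sit in its own module compiled with different switches, but here they share one file so the example compiles on its own. It uses core.cpuid from current druntime rather than std.cpuid.

```d
// Runtime CPU dispatch resolved once at startup (sketch).
import core.cpuid : sse2;
import std.stdio : writeln;

// Hypothetical stand-in for the foo in sse.d (built with SSE2 code generation).
double fooSse2(const(double)[] a)
{
    double s = 0;
    foreach (x; a)
        s += x * x;
    return s;
}

// Hypothetical stand-in for the foo in nosse.d (built without SSE2).
double fooGeneric(const(double)[] a)
{
    double s = 0;
    foreach (x; a)
        s += x * x;
    return s;
}

// Shared across threads; set once, so hot loops don't re-query the CPU.
__gshared double function(const(double)[]) foo;

shared static this()
{
    foo = sse2 ? &fooSse2 : &fooGeneric;
}

void main()
{
    writeln(foo([1.0, 2.0, 3.0])); // sum of squares, prints 14
}
```

Calling through the pointer costs an indirect call, so the switch pays off best when foo does a substantial amount of work per call.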
Feb 02 2009
bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 import std.cpuid;
 import sse;
 import nosse;
 ...
 if (std.cpuid.sse2())
     sse.foo();
 else
     nosse.foo();

I think that solves my problem, thank you. It's a simple solution (maybe I didn't think of it because I use bud, which compiles the whole program in one go).

I presume that usually the D code in the sse and nosse modules is the same; it's just compiled in two different ways, so each of the two modules may contain just two lines of code, such as:

    module sse;
    mixin(import("shared_module_code.dd"));

Bye,
bearophile
Feb 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
I think that solves my problem, thank you. It's a simple solution (maybe I didn't think of it because I use bud that compiles all the program in one go). I presume that usually the D code in the sse and nosse modules is the same, it's just compiled in two different ways, so the two modules may just contain two lines of code as: module sse; mixin(import("shared_module_code.dd"));

That's one way to do it.
Feb 02 2009
grauzone <none example.net> writes:
Walter Bright wrote:
 bearophile wrote:
I think that solves my problem, thank you. It's a simple solution (maybe I didn't think of it because I use bud that compiles all the program in one go). I presume that usually the D code in the sse and nosse modules is the same, it's just compiled in two different ways, so the two modules may just contain two lines of code as: module sse; mixin(import("shared_module_code.dd"));

That's one way to do it.

The glorious return of include files!
Feb 02 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 bearophile wrote:
I think that solves my problem, thank you. It's a simple solution (maybe I didn't think of it because I use bud that compiles all the program in one go). I presume that usually the D code in the sse and nosse modules is the same, it's just compiled in two different ways, so the two modules may just contain two lines of code as: module sse; mixin(import("shared_module_code.dd"));

That's one way to do it.

I must be missing something: why isn't import shared_module_code; good?

Andrei
Feb 02 2009
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 bearophile wrote:
I think that solves my problem, thank you. It's a simple solution (maybe I didn't think of it because I use bud that compiles all the program in one go). I presume that usually the D code in the sse and nosse modules is the same, it's just compiled in two different ways, so the two modules may just contain two lines of code as: module sse; mixin(import("shared_module_code.dd"));

That's one way to do it.

I must be missing something - why isn't import shared_module_code; good?

Because importing something does not change how it was compiled. If you have one module that you want two separate instances of, compiled with different switches, they have to be somehow given different names.
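A single-file sketch of the "same source text, two names" point. Assumption: the enum string below stands in for shared_module_code.dd; in the real setup sse.d and nosse.d would each contain mixin(import("shared_module_code.dd")) (string imports need the compiler's -J switch) and be compiled separately with different flags. One file can only show the naming half of the idea, not the different code generation.

```d
// Same source text mixed into two scopes yields two distinct sets of symbols.
import std.stdio : writeln;

// Stand-in for the contents of shared_module_code.dd (hypothetical kernel).
enum sharedCode = q{
    double dot(const(double)[] a, const(double)[] b)
    {
        assert(a.length == b.length);
        double s = 0;
        foreach (i; 0 .. a.length)
            s += a[i] * b[i];
        return s;
    }
};

// Each mixin expands the text separately: Sse.dot and NoSse.dot are
// different functions that happen to share one source.
struct Sse   { mixin(sharedCode); }
struct NoSse { mixin(sharedCode); }

void main()
{
    writeln(Sse().dot([1.0, 2.0], [3.0, 4.0]));   // prints 11
    writeln(NoSse().dot([1.0, 2.0], [3.0, 4.0])); // prints 11
}
```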
Feb 02 2009
BCS <none anon.com> writes:
Hello Andrei,

 bearophile wrote:
 module sse; mixin(import("shared_module_code.dd"));

I must be missing something - why isn't import shared_module_code; good? Andrei

The code generator needs to be run on that code more than once, once per set of switches.
Feb 02 2009
Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 bearophile wrote:
I think that solves my problem, thank you. It's a simple solution (maybe I didn't think of it because I use bud that compiles all the program in one go). I presume that usually the D code in the sse and nosse modules is the same, it's just compiled in two different ways, so the two modules may just contain two lines of code as: module sse; mixin(import("shared_module_code.dd"));

That's one way to do it.

I must be missing something - why isn't import shared_module_code; good? Andrei

The shared code has to be compiled with two sets of compiler switches, resulting in two distinct modules with different ModuleInfo, TypeInfo, and so forth. You can't do that with import.
Feb 02 2009
BCS <ao pathlink.com> writes:
Reply to bearophile,

(maybe I didn't think of it because I use bud that compiles all the program in one go). I presume that usually the D code in the sse and nosse modules is the same, it's just compiled in two different ways, so the two modules may just contain two lines of code as: module sse; mixin(import("shared_module_code.dd")); Bye, bearophile

My first thought would be to play games with the linker:

- define a function EnterA() that calls into the code
- define a function EnterB() that calls into the code
- compile the needed code for CPU A to A.obj
- compile the needed code for CPU B to B.obj
- make a lib with EnterA and A.obj, forcing internal linking
- make a lib with EnterB and B.obj, forcing internal linking
- link the common code and both libs

Making the libs is the fun part.
Feb 02 2009
"Tim M" <a b.com> writes:
On Tue, 03 Feb 2009 00:31:17 +1300, bearophile <bearophileHUGS lycos.com>  
wrote:

 So, it may be useful to have a way to build executables able to run well  
 on both CPUs (Apple has done something like this two or more times in  
 the past). There are several ways to do this, a solution is to compile  
 just critical functions for different CPUs, but that may require  
 compiler support.
 My executables are generally small, so doubling their size isn't a  
 problem. So a simple solution is to bundle two whole executables into an  
 executable and add a small header that looks for the current CPU, and  
 runs the right executable.

Is this the sort of thing you are looking for? http://www.songho.ca/misc/sse/sse.html
Feb 02 2009