
digitalmars.D.ldc - CT Information about target CPU and Related cross-compile

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
Hi all,

I am going to write std.blas, and it will be heavily optimised for LDC. 
Can these features be added to LDC?

1. Basic compile-time information about the target CPU, such as 
L1/L2/L3 cache sizes and the available instruction sets, e.g. SSE2, 
AVX, AVX2, AVX512.

2. Related cross-compilation. For example: the target is x86_64; AVX 
support can be checked at runtime using core.cpuid; so I want to 
force LDC to compile three versions of BLAS, for SSE, AVX and 
AVX512, and choose the best one at runtime.

Links:
std.blas announcement: 
http://forum.dlang.org/thread/nilhvnqbsgqhxdshpqfl forum.dlang.org
Dec 26 2015
next sibling parent reply Johan Engelen <j j.nl> writes:
On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
wrote:
 Hi all,

 I will write std.blas and it will be heavily optimised for LDC.
jay! :-)
 Can these features be added to LDC?

 1. Basic compile time information about target CPU such as 
 L1/L2/L3 cache sizes and available instructions set, e.g. SSE2, 
 AVX, AVX2, AVX512.
Do you have a proposal for a set of function names / version IDs / ...? This sounds like a simple thing to add. I'm not sure about cache sizes: is it currently possible to specify the target microarchitecture on the cmdline?
 2. Related cross-compile. For example: target is x86_64; AVX 
 support can be checked at runtime using core.cpuid; so I want 
 to force LDC to compile three versions of BLAS for SSE, AVX and 
 AVX512, and choose better in runtime.
Something like this? https://gcc.gnu.org/wiki/FunctionMultiVersioning
Dec 27 2015
next sibling parent Johan Engelen <j j.nl> writes:
 On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
 wrote:
 Hi all,

 2. Related cross-compile. For example: target is x86_64; AVX 
 support can be checked at runtime using core.cpuid; so I want 
 to force LDC to compile three versions of BLAS for SSE, AVX 
 and AVX512, and choose better in runtime.
An LLVM presentation I found on the topic: http://llvm.org/devmtg/2014-10/Slides/Christopher-Function%20Multiversioning%20Talk.pdf (perhaps mostly a reminder to self ;)
Dec 27 2015
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Sunday, 27 December 2015 at 17:34:26 UTC, Johan Engelen wrote:
 On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
 wrote:
 Hi all,

 I will write std.blas and it will be heavily optimised for LDC.
jay! :-)
 Can these features be added to LDC?

 1. Basic compile time information about target CPU such as 
 L1/L2/L3 cache sizes and available instructions set, e.g. 
 SSE2, AVX, AVX2, AVX512.
Do you have a proposal for a set of function names / version IDs / ...? This sounds like a simple thing to add. I'm not sure about cache sizes: is it currently possible to specify the target microarchitecture on the cmdline?
I have found that core.cpuid can provide runtime information about cache sizes, and that is enough. However, the number of SIMD registers and their sizes should be known at compile time. What do you mean by "set of function names / version IDs"?
 2. Related cross-compile. For example: target is x86_64; AVX 
 support can be checked at runtime using core.cpuid; so I want 
 to force LDC to compile three versions of BLAS for SSE, AVX 
 and AVX512, and choose better in runtime.
Something like this? https://gcc.gnu.org/wiki/FunctionMultiVersioning
Yes! Or runtime check at least. Ilya
Dec 27 2015
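For reference, the runtime side mentioned above can be sketched with core.cpuid alone. This is a minimal sketch, not part of the original posts: the helper name bestSimdLevel is hypothetical, and only the feature flags listed in the import are checked. Cache levels that the CPU does not report may show placeholder values rather than real sizes.

```d
import core.cpuid : avx, avx2, dataCaches, sse2, sse42;
import std.stdio : writefln, writeln;

/// Hypothetical helper: name of the widest SIMD level the running CPU
/// reports, limited to the levels checked here.
string bestSimdLevel()
{
    if (avx2)  return "avx2";
    if (avx)   return "avx";
    if (sse42) return "sse4.2";
    if (sse2)  return "sse2";
    return "scalar";
}

void main()
{
    // dataCaches has one entry per data cache level, L1 first.
    // core.cpuid documents the size field as kilobytes.
    const caches = dataCaches;
    foreach (level, cache; caches)
        writefln("L%s data cache: %s KiB, line size %s bytes",
                 level + 1, cache.size, cache.lineSize);

    writeln("best SIMD level checked here: ", bestSimdLevel());
}
```

All of this is known only at runtime, which is why the number and width of SIMD registers still need a compile-time mechanism.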
parent reply Johan Engelen <j j.nl> writes:
On Sunday, 27 December 2015 at 23:47:41 UTC, Ilya Yaroshenko 
wrote:
 On Sunday, 27 December 2015 at 17:34:26 UTC, Johan Engelen 
 wrote:
 On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
 wrote:

 Can these features be added to LDC?

 1. Basic compile time information about target CPU such as 
 L1/L2/L3 cache sizes and available instructions set, e.g. 
 SSE2, AVX, AVX2, AVX512.
Do you have a proposal for a set of function names / version IDs / ...? This sounds like a simple thing to add. I'm not sure about cache sizes: is it currently possible to specify the target microarchitecture on the cmdline?
I have found that core.cpuid can provide runtime information about cache sizes, and that is enough. However, the number of SIMD registers and their sizes should be known at compile time. What do you mean by "set of function names / version IDs"?
(I am pretty new to D, etc.) Can you give me a sample of code showing what "API" you expect for this stuff?
 2. Related cross-compile. For example: target is x86_64; AVX 
 support can be checked at runtime using core.cpuid; so I want 
 to force LDC to compile three versions of BLAS for SSE, AVX 
 and AVX512, and choose better in runtime.
Something like this? https://gcc.gnu.org/wiki/FunctionMultiVersioning
Yes! Or runtime check at least.
I had been thinking about implementing function multiversioning before. It's great that someone wants it :-)
Dec 30 2015
parent reply Ilya <ilyayaroshenko gmail.com> writes:
On Wednesday, 30 December 2015 at 15:20:35 UTC, Johan Engelen 
wrote:
 On Sunday, 27 December 2015 at 23:47:41 UTC, Ilya Yaroshenko 
 wrote:
 On Sunday, 27 December 2015 at 17:34:26 UTC, Johan Engelen 
 wrote:
 On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya 
 Yaroshenko wrote:

 Can these features be added to LDC?

 1. Basic compile time information about target CPU such as 
 L1/L2/L3 cache sizes and available instructions set, e.g. 
 SSE2, AVX, AVX2, AVX512.
Do you have a proposal for a set of function names / version IDs / ...? This sounds like a simple thing to add. I'm not sure about cache sizes: is it currently possible to specify the target microarchitecture on the cmdline?
I have found that core.cpuid can provide runtime information about cache sizes, and that is enough. However, the number of SIMD registers and their sizes should be known at compile time. What do you mean by "set of function names / version IDs"?
(I am pretty new to D, etc.) Can you give me a sample of code showing what "API" you expect for this stuff?
Dispatching example:

    @target("default") // used for CTFE code
    int foo() {
        // The default version of foo.
        return 0;
    }

    @target("sse4.2")
    int foo() {
        // foo version for SSE4.2 if compiler is LDC
        return 1;
    }

    @target("arch=atom,+sse2")
    int foo() {
        // foo version for the Intel Atom processor with SSE2 support
        return 2;
    }

Compile-time features example:

    version(LDC) {
        enum bool a = __target(has, "avx2");
        enum bool b = __target(compatible, "core-avx2");
        enum bool c = __target("broadwell");
    }
    else version(GNU) {
        ...
    }
 2. Related cross-compile. For example: target is x86_64; AVX 
 support can be checked at runtime using core.cpuid; so I 
 want to force LDC to compile three versions of BLAS for SSE, 
 AVX and AVX512, and choose better in runtime.
Something like this? https://gcc.gnu.org/wiki/FunctionMultiVersioning
Yes! Or runtime check at least.
I had been thinking about implementing function multiversioning before. It's great that someone wants it :-)
Dec 30 2015
parent reply JohanEngelen <j j.nl> writes:
On Wednesday, 30 December 2015 at 20:07:02 UTC, Ilya wrote:
 @target("sse4.2")
 int foo() {
     // foo version for SSE4.2 if compiler is LDC
     return 1;
 }
I'm working on (a rudimentary version of) @target at the moment.
I assume you build LDC yourself and you are happy to help with some testing and give feedback? :)

cheers,
  Johan
Jan 02 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 2 January 2016 at 23:27:16 UTC, JohanEngelen wrote:
 On Wednesday, 30 December 2015 at 20:07:02 UTC, Ilya wrote:
 @target("sse4.2")
 int foo() {
     // foo version for SSE4.2 if compiler is LDC
     return 1;
 }
 I'm working on (a rudimentary version of) @target at the moment.
 I assume you build LDC yourself and you are happy to help with some testing and give feedback? :)

 cheers,
   Johan
Yes! You can count on me ;) --Ilya
Jan 02 2016
parent reply JohanEngelen <j j.nl> writes:
On Sunday, 3 January 2016 at 05:16:36 UTC, Ilya Yaroshenko wrote:
 On Saturday, 2 January 2016 at 23:27:16 UTC, JohanEngelen wrote:
 I'm working on (a rudimentary version of) @target at the 
 moment.
 I assume you build LDC yourself and you are happy to help with 
 some testing and give feedback? :)

 cheers,
   Johan
Yes! You can count on me ;) --Ilya
Great, thanks :)

The branch is ready:
https://github.com/JohanEngelen/ldc/tree/attr_target
(make sure git correctly fetches the druntime branch with ldc.attributes.target in it)

Usage examples can be found in the test file: tests/ir/attr_target_x86.d

It'd be great if you can run the IR tests (and can help improve the tests):

    cd tests/ir
    python runlit.py -v .

I myself often modify a test file locally and rerun the test to quickly see if things are working or not (inspect output .ll and .s).

cheers,
  Johan
Jan 03 2016
parent Johan Engelen <j j.nl> writes:
On Sunday, 3 January 2016 at 13:11:55 UTC, JohanEngelen wrote:
 The branch is ready:
See: https://github.com/ldc-developers/ldc/pull/1244
Jan 03 2016
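A rough idea of how the @target attribute from ldc.attributes might be used together with a runtime check is sketched below. It assumes an x86 target and an LDC build that ships ldc.attributes.target. The function names (dotGeneric, dotAvx2, dot) and the stand-in struct for other compilers are hypothetical and do not come from the branch or its tests. Note that the variants have distinct names: this is manual dispatch, not the same-name multiversioning proposed earlier in the thread.

```d
import core.cpuid : avx2;

version (LDC)
{
    import ldc.attributes : target;
}
else
{
    // Assumption for portability: an inert stand-in so that the
    // @target annotation below still compiles on other compilers.
    struct target { string specifier; }
}

/// Baseline variant, compiled for the default target features.
float dotGeneric(const(float)[] a, const(float)[] b)
{
    float sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

/// Same source, but LDC may use AVX2 instructions inside this function.
@target("avx2")
float dotAvx2(const(float)[] a, const(float)[] b)
{
    float sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

/// Calls the AVX2 variant only when the running CPU reports AVX2.
float dot(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    return avx2 ? dotAvx2(a, b) : dotGeneric(a, b);
}

void main()
{
    float[] x = [1.0f, 2, 3, 4];
    float[] y = [4.0f, 3, 2, 1];
    assert(dot(x, y) == 20);
}
```

The runtime guard matters: calling an AVX2-compiled function on a CPU without AVX2 would fault, so the annotated variant should never be reached unchecked.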
prev sibling next sibling parent Johan Engelen <j j.nl> writes:
On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
wrote:
 2. Related cross-compile. For example: target is x86_64; AVX 
 support can be checked at runtime using core.cpuid; so I want 
 to force LDC to compile three versions of BLAS for SSE, AVX and 
 AVX512, and choose better in runtime.
I think we could also implement this as a library solution, instead of compiler-internally. Would that make more sense?
Jan 04 2016
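The message above does not spell out a design, so the following is only one way a library-level solution might look: a function pointer chosen once at program start from core.cpuid. All names here (Kernel, selectKernel, sum and the three variants) are hypothetical, and the SSE2/AVX variants are placeholders that simply forward to the scalar code. In a real library each would be built for its own instruction set.

```d
import core.cpuid : avx, sse2;

alias Kernel = double function(const(double)[] x);

/// Portable fallback.
double sumScalar(const(double)[] x)
{
    double s = 0;
    foreach (v; x)
        s += v;
    return s;
}

// Placeholders (assumption): stand-ins for separately compiled variants.
double sumSse2(const(double)[] x) { return sumScalar(x); }
double sumAvx(const(double)[] x)  { return sumScalar(x); }

/// Picks the most capable variant that the running CPU reports support for.
Kernel selectKernel()
{
    if (avx)  return &sumAvx;
    if (sse2) return &sumSse2;
    return &sumScalar;
}

/// Public entry point; set once before main() runs.
immutable Kernel sum;

shared static this()
{
    sum = selectKernel();
}

void main()
{
    assert(sum([1.0, 2.0, 3.0]) == 6.0);
}
```

Selecting once at startup keeps the per-call cost to a single indirect call, at the price of losing inlining across the dispatch boundary.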
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
wrote:
 Hi all,

 I will write std.blas and it will be heavily optimised for LDC. 
 Can these features be added to LDC?

 1. Basic compile time information about target CPU such as 
 L1/L2/L3 cache sizes and available instructions set, e.g. SSE2, 
 AVX, AVX2, AVX512.
I looked a little more into adding this to LDC. LLVM seems to support it nicely, so it should be straightforward to get basic functionality that you can test and play with.

Adding a "__target()" interface would be a big front-end addition, I think. How about using existing interfaces? To me it seems it is easiest to add extra __traits(...), which has an easily extendable interface. Is there another pre-existing interface that could be used?

e.g.:
    enum sse4 = __traits(targetHasFeature, "sse4");

Possible __traits(targetXXX) that I can think of:
- targetArch ("aarch64", "x86", "x86_64",...)
- targetOS ("Linux", "Win32",...)
- targetFeatures, returning tuple of feature strings
- targetCPU

What do you think?
Apr 13 2016
next sibling parent Johan Engelen <j j.nl> writes:
On Wednesday, 13 April 2016 at 17:27:54 UTC, Johan Engelen wrote:
 
 Possible __traits(targetXXX) that I can think of:
 - targetArch ("aarch64", "x86", "x86_64",...)
 - targetOS ("Linux", "Win32",...)
Forgot that there are already predefined versions for these.
Apr 13 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Wednesday, 13 April 2016 at 17:27:54 UTC, Johan Engelen wrote:
 On Saturday, 26 December 2015 at 20:47:39 UTC, Ilya Yaroshenko 
 wrote:
 [...]
I looked a little more into adding this to LDC. LLVM seems to support it nicely, so it should be straightforward to get basic functionality that you can test and play with. [...]
Hi Johan,

Thank you for doing this!
Yes, __traits(targetHasFeature, "sse4") looks good.

Best regards,
Ilya
Apr 15 2016
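As an illustration of how the trait discussed above could feed compile-time decisions, here is a minimal sketch. It assumes an LDC release that provides __traits(targetHasFeature, ...) and that feature strings follow LLVM's naming (for example "avx2" or "sse2"). The constant names and the fallback for other compilers are hypothetical.

```d
import std.stdio : writeln;

version (LDC)
{
    // Evaluated at compile time for the CPU and features LDC targets.
    enum bool hasAvx2 = __traits(targetHasFeature, "avx2");
    enum bool hasSse2 = __traits(targetHasFeature, "sse2");
}
else
{
    // Assumption: without the LDC trait, conservatively assume no SIMD.
    enum bool hasAvx2 = false;
    enum bool hasSse2 = false;
}

/// Number of 32-bit floats per SIMD register assumed for kernel blocking.
enum size_t floatsPerVector = hasAvx2 ? 8 : hasSse2 ? 4 : 1;

static assert(floatsPerVector == 1 || floatsPerVector == 4 || floatsPerVector == 8);

void main()
{
    writeln("floats per vector assumed at compile time: ", floatsPerVector);
}
```

Because the values are enums, they can drive static if branches or template parameters, which is what a blocked BLAS kernel would need.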