
digitalmars.D - alignment on stack-allocated arrays/structs

Trass3r <mrmocool gmx.de> writes:
I originally posted a question about this in D.learn. bearophile advised 
me to ask for that feature here.


Original post:
==============

OpenCL requires all types to be naturally aligned.

The D specs state:
"AlignAttribute is ignored when applied to declarations that are not 
struct members."

Could there arise any problems translating the following

/*
  * Vector types
  *
  *  Note:   OpenCL requires that all types be naturally aligned.
  *          This means that vector types must be naturally aligned.
  *          For example, a vector of four floats must be aligned to
  *          a 16 byte boundary (calculated as 4 * the natural 4-byte
  *          alignment of the float).  The alignment qualifiers here
  *          will only function properly if your compiler supports them
  *          and if you don't actively work to defeat them.  For example,
  *          in order for a cl_float4 to be 16 byte aligned in a struct,
  *          the start of the struct must itself be 16-byte aligned.
  *
  *          Maintaining proper alignment is the user's responsibility.
  */

typedef double          cl_double2[2]   __attribute__((aligned(16)));
typedef double          cl_double4[4]   __attribute__((aligned(32)));
typedef double          cl_double8[8]   __attribute__((aligned(64)));
typedef double          cl_double16[16] __attribute__((aligned(128)));



into just


alias double[2]    cl_double2;
alias double[4]    cl_double4;
alias double[8]    cl_double8;
alias double[16]   cl_double16;

?
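The sketch below shows what the plain alias gives up. It is a C11 illustration that assumes GCC or Clang, because `__attribute__((aligned(N)))` is a compiler extension. The attributed typedef carries its 32-byte alignment into any struct that contains it, while a plain `double[4]` only gets the alignment of `double`. The names `plain_double4`, `with_plain` and `with_cl` are hypothetical.

```c
#include <stddef.h>

/* The attributed typedef from the OpenCL header quoted above (GCC/Clang extension). */
typedef double cl_double4[4] __attribute__((aligned(32)));

/* Hypothetical C counterpart of the plain D `alias double[4] cl_double4;`. */
typedef double plain_double4[4];

/* Hypothetical structs: a small leading member makes the compiler insert padding. */
struct with_plain { char tag; plain_double4 v; };
struct with_cl    { char tag; cl_double4 v; };

size_t cl_double4_alignment(void)    { return _Alignof(cl_double4); }    /* 32 */
size_t plain_double4_alignment(void) { return _Alignof(plain_double4); } /* same as double */

size_t plain_member_offset(void) { return offsetof(struct with_plain, v); }
size_t cl_member_offset(void)    { return offsetof(struct with_cl, v); }
```

With the attribute, `v` is pushed to offset 32. Without it, `v` sits at offset 8 on x86-64, or 4 on 32-bit x86 Linux.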
Nov 17 2009
Tomas Lindquist Olsen <tomas.l.olsen gmail.com> writes:
On Tue, Nov 17, 2009 at 9:12 PM, Trass3r <mrmocool gmx.de> wrote:
 Could there arise any problems translating the following

 typedef double          cl_double2[2]   __attribute__((aligned(16)));
 typedef double          cl_double4[4]   __attribute__((aligned(32)));
 typedef double          cl_double8[8]   __attribute__((aligned(64)));
 typedef double          cl_double16[16] __attribute__((aligned(128)));

 into just

 alias double[2]    cl_double2;
 alias double[4]    cl_double4;
 alias double[8]    cl_double8;
 alias double[16]   cl_double16;

 ?

Yep, D provides no way to do this; they'd all align to 4 bytes (at least on x86-32).
Nov 17 2009
bearophile <bearophileHUGS lycos.com> writes:
Tomas Lindquist Olsen:

 yep, D provides no way to do this, they'd all align to 4 bytes (at
 least on x86-32)

The idea, which I also suggested to the LDC team, is to extend the semantics of align; no new syntax seems needed:

align(8) alias int[4] Foo;
align(8) double good;

Bye,
bearophile
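For comparison, bearophile's two proposed forms have rough C counterparts. This is a sketch assuming a C11 compiler (GCC or Clang for the typedef case), not a description of anything D or LDC implements. Standard C11 accepts `alignas` on an object declaration, much like `align(8) double good;`. It forbids an alignment specifier in a typedef, however, so the `alias` form only exists as the GCC/Clang attribute. The helper function names are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Rough analog of `align(8) double good;`: standard C11 alignas on an object. */
static alignas(8) double good;

/* Rough analog of `align(8) alias int[4] Foo;`: C11 forbids alignas in a typedef,
   so this uses the GCC/Clang attribute instead. */
typedef int Foo[4] __attribute__((aligned(8)));

int good_is_8_byte_aligned(void) { return ((uintptr_t)&good % 8) == 0; }
size_t foo_alignment(void) { return _Alignof(Foo); }

/* Every object of type Foo inherits the typedef's alignment, including locals. */
int local_foo_is_8_byte_aligned(void) {
    Foo f = {1, 2, 3, 4};
    return ((uintptr_t)f % 8) == 0 && f[3] == 4;
}
```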
Nov 17 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 17 Nov 2009 15:12:50 -0500, Trass3r <mrmocool gmx.de> wrote:

 Could there arise any problems translating the following

 typedef double          cl_double2[2]   __attribute__((aligned(16)));
 typedef double          cl_double4[4]   __attribute__((aligned(32)));
 typedef double          cl_double8[8]   __attribute__((aligned(64)));
 typedef double          cl_double16[16] __attribute__((aligned(128)));



 into just


 alias double[2]    cl_double2;
 alias double[4]    cl_double4;
 alias double[8]    cl_double8;
 alias double[16]   cl_double16;

 ?

To the best of my knowledge, D only supports align(1) and align(4). On the other hand, compile-time introspection allows my CUDA API to convert alignment correctly for any given struct.

As for your question: yes, there's lots of trouble with simple aliases. You'll run into alignment issues both when calling functions and when you use cl_double2 etc. inside structs. Of course, alignment issues only rear their ugly heads some of the time, which often leads to brittle code. A robust OpenCL binding for D needs to do alignment correction.
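Alignment correction can also be enforced at build time in plain C. The C11 sketch below is only an analog of the compile-time introspection Robert describes, not his CUDA API. It pins a four-float member to a 16-byte boundary with `_Alignas` and uses `_Static_assert` with `offsetof`, so the build fails if the host layout drifts from what a kernel expects. The struct `particle` and its fields are hypothetical and not from any OpenCL header.

```c
#include <stddef.h>

/* Hypothetical host-side mirror of a kernel struct holding a float4. */
struct particle {
    float mass;                /* offset 0, 4 bytes */
    _Alignas(16) float pos[4]; /* padded up to offset 16, like an OpenCL float4 */
};

/* Compile-time checks: the build fails if the layout stops matching. */
_Static_assert(offsetof(struct particle, pos) % 16 == 0, "pos must be 16-byte aligned");
_Static_assert(_Alignof(struct particle) == 16, "struct must start on a 16-byte boundary");
_Static_assert(sizeof(struct particle) == 32, "unexpected trailing padding");

size_t particle_pos_offset(void) { return offsetof(struct particle, pos); }
size_t particle_size(void) { return sizeof(struct particle); }
```

The second assertion corresponds to the header comment's point that the start of the struct must itself be 16-byte aligned.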
Nov 17 2009
Trass3r <mrmocool gmx.de> writes:
Robert Jacques schrieb:
 To the best of my knowledge, D only supports align(1) and align(4). On 
 the other hand, compile time introspection allows my CUDA api to convert 
 alignment correctly for any given struct.
 

Gotta look that up in your code. Maybe I'll also find some other ideas for writing my wrapper. It is currently a plain OO approach using classes for platform, device, kernel, etc. But maybe one can exploit D's capabilities to make things easier to program, something along the lines of http://ochafik.free.fr/blog/?p=207, without restricting what can be done with the wrapper compared to plain OpenCL...
Nov 17 2009
Don <nospam nospam.com> writes:
 OpenCL requires all types to be naturally aligned.
 /*
  * Vector types
  *
  *  Note:   OpenCL requires that all types be naturally aligned.
  *          This means that vector types must be naturally aligned.
  *          For example, a vector of four floats must be aligned to
  *          a 16 byte boundary (calculated as 4 * the natural 4-byte
  *          alignment of the float).  The alignment qualifiers here
  *          will only function properly if your compiler supports them
  *          and if you don't actively work to defeat them.  For example,
  *          in order for a cl_float4 to be 16 byte aligned in a struct,
  *          the start of the struct must itself be 16-byte aligned.

http://d.puremagic.com/issues/show_bug.cgi?id=2278
Nov 18 2009
Trass3r <mrmocool gmx.de> writes:
Don schrieb:
 http://d.puremagic.com/issues/show_bug.cgi?id=2278

Isn't this a distinct problem, or am I wrong? This is not only about 8-byte boundaries.
Nov 18 2009
Don <nospam nospam.com> writes:
Trass3r wrote:
 Don schrieb:
 http://d.puremagic.com/issues/show_bug.cgi?id=2278

 Isn't this a distinct problem or am I wrong? This is not only about 8-byte boundaries.

Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca().

Since D2.007, static items use align(16); before that, they were also limited to align(4).

Nothing on x86 benefits from more than 16-byte alignment, AFAIK, and it's never mandatory to use more than 8-byte alignment. I don't know so much about the recent GPUs, though. Do they really require 16-byte alignment or more?
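The heap route Don mentions can be sketched in C. This assumes a C11 library that provides `aligned_alloc`, such as glibc, and the helper names are hypothetical. The manual variant over-allocates by `align - 1` bytes and rounds the address up, the same trick alloca()-based code would apply to a stack block. The original pointer has to be kept so it can be passed to `free`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t round_up(uintptr_t addr, uintptr_t align) {
    return (addr + (align - 1)) & ~(align - 1);
}

/* C11 route: the size passed to aligned_alloc should be a multiple of the alignment. */
double *new_double4_aligned(void) {
    return aligned_alloc(32, 4 * sizeof(double));
}

/* Manual route: over-allocate, then round up. *raw_out receives the pointer to free. */
double *new_double4_manual(void **raw_out) {
    const uintptr_t align = 32;
    void *raw = malloc(4 * sizeof(double) + (align - 1));
    *raw_out = raw;
    if (raw == NULL)
        return NULL;
    return (double *)round_up((uintptr_t)raw, align);
}
```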
Nov 18 2009
Trass3r <mrmocool gmx.de> writes:
Don schrieb:
 Well, sort of.
 It's impossible to align stack-allocated structs with any alignment 
 greater than the alignment of the stack itself (which is 4 bytes). 
 Anything larger than that and you HAVE to use the heap or alloca().
 

So how do other compilers supporting that alignment syntax do it?
 Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and 
 it's never mandatory to use more than 8 byte alignment. I don't know so 
 much about the recent GPUs, though -- do they really require 16 byte 
 alignment or more?
 

I'm not sure exactly how this works or why they require alignment. I couldn't find anything about it in the clEnqueueWriteBuffer description, which is where data gets written into GPU memory.

The specification for the OpenCL C language itself only states:
"A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary."
"A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions."

They also, strangely, state:
"The components of vector data types with 1 ... 4 components can be addressed as <vector_data_type>.xyzw."

float4 c, a, b;
c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
c.z = 1.0f;                  // is a float
c.xy = (float2)(3.0f, 4.0f); // is a float2

So I wonder why they used arrays in the headers and not structs, to be consistent with this.
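The quoted alignment rule is easy to compute. A union also offers one possible answer to the array-versus-struct question, because a host type can expose both an array and named `x/y/z/w` members. The C11 sketch below uses hypothetical names (`opencl_natural_alignment`, `my_float4`) that do not come from the OpenCL headers. As the spec text says, the rule is meant for built-in types only.

```c
#include <stddef.h>

/* Size in bytes rounded up to the next power of two (power-of-two sizes are kept as-is). */
size_t opencl_natural_alignment(size_t size_in_bytes) {
    size_t align = 1;
    while (align < size_in_bytes)
        align <<= 1;
    return align;
}

/* Hypothetical host-side float4: array access via s[], named access via x/y/z/w.
   Anonymous structs inside unions are standard since C11. */
typedef union {
    _Alignas(16) float s[4];
    struct { float x, y, z, w; };
} my_float4;
```

For the spec's own examples, `opencl_natural_alignment(16)` (a float4) gives 16 and `opencl_natural_alignment(2)` (a char2) gives 2. A 12-byte type would be rounded up to 16.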
Nov 18 2009
Don <nospam nospam.com> writes:
Trass3r wrote:
 Don schrieb:
 Well, sort of.
 It's impossible to align stack-allocated structs with any alignment 
 greater than the alignment of the stack itself (which is 4 bytes). 
 Anything larger than that and you HAVE to use the heap or alloca().

 So how do other compilers supporting that alignment syntax do it?

It might only be required on particular CPUs/OSes. E.g., the requirements for SPARC are quite different. Some of them might be doing alloca() under the covers.
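One way to see what a particular compiler does is to request an over-aligned local and test its address. GCC and Clang on x86 accept such locals. When the requested alignment exceeds the ABI's stack alignment, they realign the stack frame in the function prologue rather than rejecting the declaration. This C11 sketch, with hypothetical function names, only checks the result on whatever platform it is compiled for.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 if a 32-byte-aligned local array really starts on a 32-byte boundary. */
int local_double4_is_aligned(void) {
    alignas(32) double v[4] = {1.0, 2.0, 3.0, 4.0};
    return ((uintptr_t)v % 32) == 0 && v[3] == 4.0;
}

/* Same check a few calls deeper, where the incoming stack pointer differs. */
int nested_local_is_aligned(int depth) {
    alignas(32) double pad[3] = {0};
    if (depth == 0)
        return local_double4_is_aligned() && ((uintptr_t)pad % 32) == 0;
    return nested_local_is_aligned(depth - 1);
}
```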
Nov 18 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Wed, 18 Nov 2009 11:03:19 -0500, Don <nospam nospam.com> wrote:

 I don't know so much about the recent GPUs, though -- do they really require 16 byte alignment or more?

NVIDIA only requires 16-byte alignment.
Nov 18 2009