digitalmars.D - alignment on stack-allocated arrays/structs

Trass3r <mrmocool gmx.de> writes:

I originally posted a question about this in D.learn. bearophile advised 
me to ask for that feature here.


Original post:
==============

OpenCL requires all types to be naturally aligned.

The D specs state:
"AlignAttribute is ignored when applied to declarations that are not 
struct members."

Could there arise any problems translating the following

/*
  * Vector types
  *
  *  Note:   OpenCL requires that all types be naturally aligned.
  *          This means that vector types must be naturally aligned.
  *          For example, a vector of four floats must be aligned to
  *          a 16 byte boundary (calculated as 4 * the natural 4-byte
  *          alignment of the float).  The alignment qualifiers here
  *          will only function properly if your compiler supports them
  *          and if you don't actively work to defeat them.  For example,
  *          in order for a cl_float4 to be 16 byte aligned in a struct,
  *          the start of the struct must itself be 16-byte aligned.
  *
  *          Maintaining proper alignment is the user's responsibility.
  */

typedef double          cl_double2[2]   __attribute__((aligned(16)));
typedef double          cl_double4[4]   __attribute__((aligned(32)));
typedef double          cl_double8[8]   __attribute__((aligned(64)));
typedef double          cl_double16[16] __attribute__((aligned(128)));



into just


alias double[2]    cl_double2;
alias double[4]    cl_double4;
alias double[8]    cl_double8;
alias double[16]   cl_double16;

?

Nov 17 2009
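
To make the gap concrete, here is a minimal sketch in D of what the plain aliases do and do not carry. The names `requiredAlignment` and `isNaturallyAligned` are hypothetical helpers invented for this illustration; they are not part of OpenCL or of any D library. The sketch assumes, as in the header above, that every vector size is a power of two, so the natural alignment equals the size of the type.

```d
// The plain aliases from the question.
alias double[2]  cl_double2;
alias double[4]  cl_double4;
alias double[8]  cl_double8;
alias double[16] cl_double16;

/// Alignment the C header asks for: the full size of the vector type.
/// (Hypothetical helper; valid only while T.sizeof is a power of two.)
template requiredAlignment(T)
{
    enum size_t requiredAlignment = T.sizeof;
}

/// Runtime check a binding could perform before handing a pointer to OpenCL.
bool isNaturallyAligned(T)(const(void)* p)
{
    return cast(size_t) p % requiredAlignment!T == 0;
}

unittest
{
    static assert(requiredAlignment!cl_double2  == 16);
    static assert(requiredAlignment!cl_double4  == 32);
    static assert(requiredAlignment!cl_double8  == 64);
    static assert(requiredAlignment!cl_double16 == 128);

    // The alias itself promises far less than the header demands.
    static assert(cl_double16.alignof < requiredAlignment!cl_double16);
}
```

The last assertion is the point of the question: the alias records the element type and length, but nothing in it tells the compiler about the 128-byte requirement.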

Tomas Lindquist Olsen <tomas.l.olsen gmail.com> writes:

On Tue, Nov 17, 2009 at 9:12 PM, Trass3r <mrmocool gmx.de> wrote:
 I originally posted a question about this in D.learn. bearophile advised
 me to ask for that feature here.


 Original post:
 ==============

 OpenCL requires all types to be naturally aligned.

 The D specs state:
 "AlignAttribute is ignored when applied to declarations that are not
 struct members."

 Could there arise any problems translating the following

 /*
   * Vector types
   *
   *  Note:   OpenCL requires that all types be naturally aligned.
   *          This means that vector types must be naturally aligned.
   *          For example, a vector of four floats must be aligned to
   *          a 16 byte boundary (calculated as 4 * the natural 4-byte
   *          alignment of the float).  The alignment qualifiers here
   *          will only function properly if your compiler supports them
   *          and if you don't actively work to defeat them.  For example,
   *          in order for a cl_float4 to be 16 byte aligned in a struct,
   *          the start of the struct must itself be 16-byte aligned.
   *
   *          Maintaining proper alignment is the user's responsibility.
   */

 typedef double          cl_double2[2]   __attribute__((aligned(16)));
 typedef double          cl_double4[4]   __attribute__((aligned(32)));
 typedef double          cl_double8[8]   __attribute__((aligned(64)));
 typedef double          cl_double16[16] __attribute__((aligned(128)));

 into just


 alias double[2]    cl_double2;
 alias double[4]    cl_double4;
 alias double[8]    cl_double8;
 alias double[16]   cl_double16;

 ?

yep, D provides no way to do this, they'd all align to 4 bytes (at
least on x86-32)

Nov 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Tomas Lindquist Olsen:

 yep, D provides no way to do this, they'd all align to 4 bytes (at
 least on x86-32)

The idea, that I suggested to the LDC team too, is to extend the semantics of
align, no new syntax seems needed:

align(8) alias int[4] Foo;
align(8) double good;

Bye,
bearophile

Nov 17 2009
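
A related option is to wrap each vector in a struct and put the alignment on the struct declaration. The sketch below assumes a compiler that honors `align(N)` on a struct declaration and reflects it in `.alignof`; that is an assumption, and it goes beyond what the compilers discussed in this thread provide. The field name `s` is made up for the example. Whether stack-allocated instances actually land on such a boundary is a separate question that this sketch does not test.

```d
// Assumption: the compiler applies align(N) to the struct type itself.
align(16) struct cl_double2
{
    double[2] s; // hypothetical field name for the component array
}

align(32) struct cl_double4
{
    double[4] s;
}

unittest
{
    // The type now records the requirement from the C header.
    static assert(cl_double2.alignof == 16);
    static assert(cl_double4.alignof == 32);

    // The wrapper adds no padding: sizes match the C typedefs.
    static assert(cl_double2.sizeof == 16);
    static assert(cl_double4.sizeof == 32);
}
```

The wider types would follow the same pattern with 64 and 128. Compared with extending `align` to aliases, this costs an extra `.s` at every use site, but it needs no new semantics for `alias`.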

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 17 Nov 2009 15:12:50 -0500, Trass3r <mrmocool gmx.de> wrote:

 I originally posted a question about this in D.learn. bearophile advised  
 me to ask for that feature here.


 Original post:
 ==============

 OpenCL requires all types to be naturally aligned.

 The D specs state:
 "AlignAttribute is ignored when applied to declarations that are not  
 struct members."

 Could there arise any problems translating the following

 /*
   * Vector types
   *
   *  Note:   OpenCL requires that all types be naturally aligned.
   *          This means that vector types must be naturally aligned.
   *          For example, a vector of four floats must be aligned to
   *          a 16 byte boundary (calculated as 4 * the natural 4-byte
   *          alignment of the float).  The alignment qualifiers here
   *          will only function properly if your compiler supports them
   *          and if you don't actively work to defeat them.  For example,
   *          in order for a cl_float4 to be 16 byte aligned in a struct,
   *          the start of the struct must itself be 16-byte aligned.
   *
   *          Maintaining proper alignment is the user's responsibility.
   */

 typedef double          cl_double2[2]   __attribute__((aligned(16)));
 typedef double          cl_double4[4]   __attribute__((aligned(32)));
 typedef double          cl_double8[8]   __attribute__((aligned(64)));
 typedef double          cl_double16[16] __attribute__((aligned(128)));



 into just


 alias double[2]    cl_double2;
 alias double[4]    cl_double4;
 alias double[8]    cl_double8;
 alias double[16]   cl_double16;

 ?

To the best of my knowledge, D only supports align(1) and align(4). On the  
other hand, compile time introspection allows my CUDA api to convert  
alignment correctly for any given struct.

As for your question, yes, there's lots of trouble using simple aliases.  
You'll run into alignment issues with both function calling and if you use  
cl_double2, etc in structs. Of course, alignment issues only raise their  
ugly heads some of the time, which often leads to brittle code. A robust  
OpenCL binding for D needs to do alignment correction.

Nov 17 2009
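
As a rough illustration of what alignment correction through compile-time introspection could look like, the sketch below walks the fields of a struct and computes the offset each field would need if it were aligned to its own size. This is not the CUDA API code mentioned above; `naturalOffsets` and `Params` are hypothetical names, and the sketch assumes every field is a scalar or vector whose size is a power of two (nested structs are not handled). `Fields` comes from `std.traits`.

```d
import std.traits : Fields;

/// Offsets the fields of T would get if each were aligned to its own size.
/// Assumption: every field is a scalar or vector with a power-of-two size.
size_t[] naturalOffsets(T)()
{
    size_t[] offsets;
    size_t pos = 0;
    foreach (F; Fields!T)
    {
        enum size_t size = F.sizeof;
        static assert((size & (size - 1)) == 0,
                      "field size must be a power of two");
        pos = (pos + size - 1) & ~(size - 1); // round up to natural alignment
        offsets ~= pos;
        pos += size;
    }
    return offsets;
}

// Hypothetical argument struct mixing scalars and vectors.
struct Params
{
    float     scale;
    double[2] origin;
    float     bias;
    double[4] weights;
}

unittest
{
    // Evaluated at compile time, so a binding could use it to generate
    // a padded copy of the struct or a field-by-field marshalling routine.
    enum size_t[] offsets  = naturalOffsets!Params();
    enum size_t[] expected = [0, 16, 32, 64];
    static assert(offsets == expected);
}
```

In `Params`, `origin` needs to start at byte 16 and `weights` at byte 64, whereas the host compiler is free to place them at smaller offsets. That difference is the kind of thing that only bites some of the time.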

Trass3r <mrmocool gmx.de> writes:

Robert Jacques wrote:
 To the best of my knowledge, D only supports align(1) and align(4). On 
 the other hand, compile time introspection allows my CUDA api to convert 
 alignment correctly for any given struct.
 

I'll have to look that up in your code.

Maybe I'll also find some other ideas for writing my wrapper. It currently 
is a plain OO approach using classes for platform, device, kernel, etc.
But maybe one can exploit D's capabilities to make things easier to 
program. Something along the lines of http://ochafik.free.fr/blog/?p=207 
while not restricting what can be done with the wrapper compared to plain 
OpenCL...

Nov 17 2009

Don <nospam nospam.com> writes:

 OpenCL requires all types to be naturally aligned.
 /*
  * Vector types
  *
  *  Note:   OpenCL requires that all types be naturally aligned.
  *          This means that vector types must be naturally aligned.
  *          For example, a vector of four floats must be aligned to
  *          a 16 byte boundary (calculated as 4 * the natural 4-byte
  *          alignment of the float).  The alignment qualifiers here
  *          will only function properly if your compiler supports them
  *          and if you don't actively work to defeat them.  For example,
  *          in order for a cl_float4 to be 16 byte aligned in a struct,
  *          the start of the struct must itself be 16-byte aligned.

http://d.puremagic.com/issues/show_bug.cgi?id=2278

Nov 18 2009

Trass3r <mrmocool gmx.de> writes:

Don wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=2278

Isn't this a distinct problem or am I wrong? This is not only about 
8-byte boundaries.

Nov 18 2009

Don <nospam nospam.com> writes:

Trass3r wrote:
 Don wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=2278

 
 Isn't this a distinct problem or am I wrong? This is not only about 
 8-byte boundaries.

Well, sort of.
It's impossible to align stack-allocated structs with any alignment 
greater than the alignment of the stack itself (which is 4 bytes). 
Anything larger than that and you HAVE to use the heap or alloca().

Since D2.007, static items use align(16); before that, they were also 
limited to align(4).

Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and 
it's never mandatory to use more than 8 byte alignment. I don't know so 
much about the recent GPUs, though -- do they really require 16 byte 
alignment or more?

Nov 18 2009
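
One workaround in the same spirit as alloca() is to reserve a slightly larger byte buffer on the stack and round a pointer into it up to the required boundary. The sketch below shows the idea; `alignUp` and `alignedIn` are hypothetical helper names, and the approach assumes the caller is willing to access the value through a pointer and to waste up to `alignment - 1` bytes.

```d
/// Rounds addr up to the next multiple of alignment (a power of two).
size_t alignUp(size_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/// Returns a pointer to a T located at the first suitably aligned
/// address inside buf. The buffer must outlive the returned pointer.
T* alignedIn(T)(ubyte[] buf, size_t alignment)
{
    auto start   = cast(size_t) buf.ptr;
    auto aligned = alignUp(start, alignment);
    assert(aligned + T.sizeof <= start + buf.length, "buffer too small");
    return cast(T*) aligned;
}

unittest
{
    alias double[4] cl_double4;
    enum alignment = 32;

    // Over-allocate on the stack: the buffer itself may start anywhere.
    ubyte[cl_double4.sizeof + alignment - 1] raw;

    cl_double4* v = alignedIn!cl_double4(raw[], alignment);
    assert(cast(size_t) v % alignment == 0);

    (*v)[] = 0.0;   // the aligned view is usable like the array itself
    (*v)[3] = 1.5;
    assert((*v)[3] == 1.5);
}
```

The stack frame is still only aligned to whatever the ABI guarantees; the rounding simply picks an address inside the frame that happens to satisfy the stricter requirement.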

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 18 Nov 2009 11:03:19 -0500, Don <nospam nospam.com> wrote:

 Trass3r wrote:
 Don wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=2278

  Isn't this a distinct problem or am I wrong? This is not only about  
 8-byte boundaries.

 Well, sort of.
 It's impossible to align stack-allocated structs with any alignment  
 greater than the alignment of the stack itself (which is 4 bytes).  
 Anything larger than that and you HAVE to use the heap or alloca().

 Since D2.007, static items use align(16); before that, they were also  
 limited to align(4).

 Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and  
 it's never mandatory to use more than 8 byte alignment. I don't know so  
 much about the recent GPUs, though -- do they really require 16 byte  
 alignment or more?

NVIDIA only requires 16-byte alignment.

Nov 18 2009

Trass3r <mrmocool gmx.de> writes:

Don wrote:
 Well, sort of.
 It's impossible to align stack-allocated structs with any alignment 
 greater than the alignment of the stack itself (which is 4 bytes). 
 Anything larger than that and you HAVE to use the heap or alloca().
 

So how do other compilers supporting that alignment syntax do it?

 Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and 
 it's never mandatory to use more than 8 byte alignment. I don't know so 
 much about the recent GPUs, though -- do they really require 16 byte 
 alignment or more?
 

I'm not sure how exactly this works and why they require alignment. 
Couldn't find anything about that in the clEnqueueWriteBuffer 
description where data gets written into GPU memory.


The specification for the OpenCL C language itself only states:

A data item declared to be a data type in memory is always aligned to 
the size of the data type in bytes.  For example, a float4 variable will 
be aligned to a 16-byte boundary, a char2 variable will be aligned to a 
2-byte boundary.

A built-in data type that is not a power of two bytes in size must be 
aligned to the next larger power of two.  This rule applies to built-in 
types only, not structs or unions.



They also strangely state:

The components of vector data types with 1 ... 4 components can be 
addressed as <vector_data_type>.xyzw.

float4 c, a, b;

c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
c.z = 1.0f;         // is a float
c.xy = (float2)(3.0f, 4.0f); // is a float2



So I wonder why they used arrays in the headers and not structs to be 
consistent with this.

Nov 18 2009
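
For a D binding, the array form and the named-component form do not have to be mutually exclusive. The following sketch overlays both views with an anonymous union; the member names `s`, `x`, `y`, `z` and `w` are choices made for this example, and multi-component accessors such as `.xy` are not covered.

```d
/// A four-float vector that can be addressed both as an array
/// and through named components. Both views share the same 16 bytes.
struct cl_float4
{
    union
    {
        float[4] s;                   // array view, as in the C header
        struct { float x, y, z, w; }  // named components
    }
}

unittest
{
    static assert(cl_float4.sizeof == 16); // the overlay adds no storage

    cl_float4 c;
    c.s = [1.0f, 2.0f, 3.0f, 4.0f];
    c.z = 1.0f;                            // writes the third element
    assert(c.s == [1.0f, 2.0f, 1.0f, 4.0f]);
    assert(c.w == 4.0f);
}
```

This only addresses how components are named on the host side; it does not change the alignment of the type.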

Don <nospam nospam.com> writes:

Trass3r wrote:
 Don wrote:
 Well, sort of.
 It's impossible to align stack-allocated structs with any alignment 
 greater than the alignment of the stack itself (which is 4 bytes). 
 Anything larger than that and you HAVE to use the heap or alloca().

 
 So how do other compilers supporting that alignment syntax do it?

It might only be required on particular CPUs/OSes. Eg requirements for 
Sparc are quite different.
Some of them might be doing alloca() under the covers.

 Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and 
 it's never mandatory to use more than 8 byte alignment. I don't know 
 so much about the recent GPUs, though -- do they really require 16 
 byte alignment or more?

 
 I'm not sure how exactly this works and why they require alignment. 
 Couldn't find anything about that in the clEnqueueWriteBuffer 
 description where data gets written into GPU memory.
 
 
 The specification for the OpenCL C language itself only states:
 
 A data item declared to be a data type in memory is always aligned to 
 the size of the data type in bytes.  For example, a float4 variable will 
 be aligned to a 16-byte boundary, a char2 variable will be aligned to a 
 2-byte boundary.
 
 A built-in data type that is not a power of two bytes in size must be 
 aligned to the next larger power of two.  This rule applies to built-in 
 types only, not structs or unions.
 
 
 
 They also strangely state:
 
 The components of vector data types with 1 ... 4 components can be 
 addressed as <vector_data_type>.xyzw.
 
 float4 c, a, b;
 
 c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
 c.z = 1.0f;         // is a float
 c.xy = (float2)(3.0f, 4.0f); // is a float2
 
 
 
 So I wonder why they used arrays in the headers and not structs to be 
 consistent with this.

Nov 18 2009
