digitalmars.D - alignment on stack-allocated arrays/structs
- Trass3r <mrmocool gmx.de> Nov 17 2009
- Tomas Lindquist Olsen <tomas.l.olsen gmail.com> Nov 17 2009
- bearophile <bearophileHUGS lycos.com> Nov 17 2009
- "Robert Jacques" <sandford jhu.edu> Nov 17 2009
- Trass3r <mrmocool gmx.de> Nov 17 2009
- Don <nospam nospam.com> Nov 18 2009
- Trass3r <mrmocool gmx.de> Nov 18 2009
- Don <nospam nospam.com> Nov 18 2009
- Trass3r <mrmocool gmx.de> Nov 18 2009
- Don <nospam nospam.com> Nov 18 2009
- "Robert Jacques" <sandford jhu.edu> Nov 18 2009
I originally posted a question about this in D.learn. bearophile advised me to ask for that feature here.

Original post:
==============
OpenCL requires all types to be naturally aligned. The D specs state: "AlignAttribute is ignored when applied to declarations that are not struct members."

Could there arise any problems translating the following

/*
 * Vector types
 *
 * Note: OpenCL requires that all types be naturally aligned.
 * This means that vector types must be naturally aligned.
 * For example, a vector of four floats must be aligned to
 * a 16 byte boundary (calculated as 4 * the natural 4-byte
 * alignment of the float). The alignment qualifiers here
 * will only function properly if your compiler supports them
 * and if you don't actively work to defeat them. For example,
 * in order for a cl_float4 to be 16 byte aligned in a struct,
 * the start of the struct must itself be 16-byte aligned.
 *
 * Maintaining proper alignment is the user's responsibility.
 */
typedef double cl_double2[2] __attribute__((aligned(16)));
typedef double cl_double4[4] __attribute__((aligned(32)));
typedef double cl_double8[8] __attribute__((aligned(64)));
typedef double cl_double16[16] __attribute__((aligned(128)));

into just

alias double[2] cl_double2;
alias double[4] cl_double4;
alias double[8] cl_double8;
alias double[16] cl_double16;

?
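A short C sketch of the difference the question is about. It assumes GCC or Clang, since __attribute__((aligned)) is a compiler extension, and plain_double2 is a made-up name. The attribute raises the alignment of the typedef itself, whereas a bare array type, which is all the D alias provides, only inherits the alignment of double.

```c
#include <stdalign.h>

/* What a bare alias amounts to: an array type whose alignment is
   simply that of its element type, double. */
typedef double plain_double2[2];

/* The OpenCL header's spelling (GCC/Clang extension): the typedef
   itself carries the stronger alignment. */
typedef double cl_double2[2] __attribute__((aligned(16)));
typedef double cl_double4[4] __attribute__((aligned(32)));
```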
Nov 17 2009
On Tue, Nov 17, 2009 at 9:12 PM, Trass3r <mrmocool gmx.de> wrote:
yep, D provides no way to do this, they'd all align to 4 bytes (at least on x86-32)
Nov 17 2009
Tomas Lindquist Olsen:
yep, D provides no way to do this, they'd all align to 4 bytes (at least on x86-32)
The idea, which I have also suggested to the LDC team, is to extend the semantics of align; no new syntax seems to be needed:

align(8) alias int[4] Foo;
align(8) double good;

Bye,
bearophile
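For reference, standard C11 draws roughly this line already: an alignment specifier is accepted on a variable but not in a typedef, so an over-aligned array type is usually wrapped in a struct. A minimal sketch; Foo and good merely mirror the names in the proposal above.

```c
#include <stdalign.h>

/* C11 forbids alignas in a typedef, so the closest analogue of the
   proposed "align(8) alias int[4] Foo;" wraps the array in a struct. */
typedef struct {
    alignas(8) int data[4];
} Foo;

/* On an object the specifier is allowed directly, much like the
   proposed "align(8) double good;". */
static alignas(8) double good;
```

The price of the struct wrapper is that elements are reached as foo.data[i] instead of foo[i], which is one argument for letting align apply to aliases directly.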
Nov 17 2009
On Tue, 17 Nov 2009 15:12:50 -0500, Trass3r <mrmocool gmx.de> wrote:
To the best of my knowledge, D only supports align(1) and align(4). On the other hand, compile-time introspection allows my CUDA API to convert alignment correctly for any given struct. As for your question, yes, there's lots of trouble in using simple aliases. You'll run into alignment issues both in function calls and when you use cl_double2, etc., in structs. Of course, alignment issues only rear their ugly heads some of the time, which often leads to brittle code. A robust OpenCL binding for D needs to do alignment correction.
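To illustrate the kind of correction involved, here is a C sketch of a hypothetical kernel argument struct { float a; float4 v; }. Assuming the device side follows OpenCL's natural-alignment rule, v sits at offset 16 there, while a naive host-side mirror puts it at offset 4. Forcing the member's alignment restores the expected layout. The struct names and the align_up helper are invented for the example.

```c
#include <stdalign.h>
#include <stddef.h>

/* Naive host-side mirror: the array only gets float's 4-byte alignment. */
struct naive_params {
    float a;
    float v[4];
};

/* Corrected mirror: v is forced onto a 16-byte boundary, matching the
   natural alignment OpenCL assumes for a float4. */
struct corrected_params {
    float a;
    alignas(16) float v[4];
};

/* Round offset up to the next multiple of align (a power of two);
   this is the arithmetic an alignment-correcting binding performs. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}
```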
Nov 17 2009
Robert Jacques wrote:
To the best of my knowledge, D only supports align(1) and align(4). On the other hand, compile-time introspection allows my CUDA API to convert alignment correctly for any given struct.
I'll have to look that up in your code. Maybe I'll also find some other ideas for writing my wrapper. It is currently a plain OO approach, using classes for platform, device, kernel, etc. But maybe one can exploit D's capabilities to make things easier to program, something along the lines of http://ochafik.free.fr/blog/?p=207, while not restricting what can be done with the wrapper compared to plain OpenCL...
Nov 17 2009
OpenCL requires all types to be naturally aligned.
/*
 * Vector types
 *
 * Note: OpenCL requires that all types be naturally aligned.
 * This means that vector types must be naturally aligned.
 * For example, a vector of four floats must be aligned to
 * a 16 byte boundary (calculated as 4 * the natural 4-byte
 * alignment of the float). The alignment qualifiers here
 * will only function properly if your compiler supports them
 * and if you don't actively work to defeat them. For example,
 * in order for a cl_float4 to be 16 byte aligned in a struct,
 * the start of the struct must itself be 16-byte aligned.
http://d.puremagic.com/issues/show_bug.cgi?id=2278
Nov 18 2009
Don wrote:
http://d.puremagic.com/issues/show_bug.cgi?id=2278
Isn't this a distinct problem or am I wrong? This is not only about 8-byte boundaries.
Nov 18 2009
Trass3r wrote:
Don wrote:
http://d.puremagic.com/issues/show_bug.cgi?id=2278
Isn't this a distinct problem or am I wrong? This is not only about 8-byte boundaries.
Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca(). Since D2.007, static items use align(16); before that, they were also limited to align(4). Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and it's never mandatory to use more than 8 byte alignment. I don't know so much about the recent GPUs, though -- do they really require 16 byte alignment or more?
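When the stack cannot provide the alignment, the usual heap fallback is to over-allocate and round the pointer up. A minimal C sketch, assuming align is a power of two no smaller than sizeof(void *); aligned_malloc and aligned_free are hypothetical helper names (C11's aligned_alloc covers the same need where it is available).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to align by over-allocating with malloc
   and rounding up. The original malloc pointer is stored just before
   the aligned block so that aligned_free can release it. */
static void *aligned_malloc(size_t size, size_t align)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Leave room for the stashed pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

static void aligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

For example, storage for one cl_double8 would be requested as aligned_malloc(8 * sizeof(double), 64).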
Nov 18 2009
Don wrote:
Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca().
So how do other compilers supporting that alignment syntax do it?
Nothing on x86 benefits from more than 16 byte alignment, AFAIK, and it's never mandatory to use more than 8 byte alignment. I don't know so much about the recent GPUs, though -- do they really require 16 byte alignment or more?
I'm not sure exactly how this works or why they require alignment. I couldn't find anything about it in the clEnqueueWriteBuffer description, where data gets written into GPU memory. The specification for the OpenCL C language itself only states:

A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary. A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions.

They also, strangely, state:

The components of vector data types with 1 ... 4 components can be addressed as <vector_data_type>.xyzw.

float4 c, a, b;
c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
c.z = 1.0f; // is a float
c.xy = (float2)(3.0f, 4.0f); // is a float2

So I wonder why they used arrays in the headers rather than structs, which would be consistent with this.
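The alignment rule quoted from the OpenCL C specification boils down to rounding a built-in type's size up to a power of two. A small C sketch of that arithmetic; the function name is made up, and, as the quote says, the rule covers built-in types only, not structs or unions.

```c
#include <stddef.h>

/* Alignment OpenCL C gives a built-in type of size bytes: the size
   itself if it is a power of two, otherwise the next larger power of
   two. */
static size_t opencl_builtin_alignment(size_t size)
{
    size_t align = 1;
    while (align < size)
        align <<= 1;
    return align;
}
```

So a float4 (16 bytes) gets 16, a char2 (2 bytes) gets 2, and a hypothetical 12-byte built-in type would get 16.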
Nov 18 2009
Trass3r wrote:
Don wrote:
Well, sort of. It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca().
So how do other compilers supporting that alignment syntax do it?
It might only be required on particular CPUs/OSes. E.g., the requirements for SPARC are quite different. Some of them might be doing alloca() under the covers.
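As for how other compilers handle it: in C11 an alignment specifier on a local variable is honoured whenever the implementation supports that alignment, and on x86, as far as I know, GCC and Clang achieve it by realigning the stack frame in the function prologue. A minimal sketch; the function name is hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* A local with cl_double4's 32-byte alignment. Returns nonzero if the
   array really starts on a 32-byte boundary, and stores the sum of its
   elements in *sum. */
static int local_double4_is_aligned(double *sum)
{
    alignas(32) double v[4] = {1.0, 2.0, 3.0, 4.0};
    *sum = v[0] + v[1] + v[2] + v[3];
    return (uintptr_t)v % 32 == 0;
}
```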
Nov 18 2009
On Wed, 18 Nov 2009 11:03:19 -0500, Don <nospam nospam.com> wrote:
I don't know so much about the recent GPUs, though -- do they really require 16 byte alignment or more?
NVIDIA only requires 16-byte alignment.
Nov 18 2009








