
digitalmars.D.learn - How to do aligned allocation?

Quirin Schroll <qs.il.paperinik gmail.com> writes:
When I do `new void[](n)`, is that buffer allocated with an 
alignment of 1 or what are the guarantees? How can I set an 
alignment? Also, is the alignment of any type guaranteed to be a 
power of 2?
Sep 30 2022
mw <mingwu gmail.com> writes:
On Friday, 30 September 2022 at 15:57:22 UTC, Quirin Schroll 
wrote:
 When I do `new void[](n)`, is that buffer allocated with an 
 alignment of 1 or what are the guarantees? How can I set an 
 alignment? Also, is the alignment of any type guaranteed to be 
 a power of 2?
https://dlang.org/library/core/stdc/stdlib/aligned_alloc.html

It's the C func, so check C lib doc.
Sep 30 2022
mw <mingwu gmail.com> writes:
On Friday, 30 September 2022 at 16:23:00 UTC, mw wrote:
 On Friday, 30 September 2022 at 15:57:22 UTC, Quirin Schroll 
 wrote:
 When I do `new void[](n)`, is that buffer allocated with an 
 alignment of 1 or what are the guarantees? How can I set an 
 alignment? Also, is the alignment of any type guaranteed to be 
 a power of 2?
 https://dlang.org/library/core/stdc/stdlib/aligned_alloc.html

 It's the C func, so check C lib doc.
and then use emplace on the C-alloc-ed memory.
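
A minimal sketch of that suggestion, assuming the C runtime in use actually provides C11 `aligned_alloc` (not every platform's does). The type `Float8` and the helper `makeFloat8` are hypothetical names chosen for illustration; only `aligned_alloc`, `free`, and `core.lifetime.emplace` are existing library calls.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : aligned_alloc, free;

// Hypothetical type standing in for anything that needs 32-byte alignment.
align(32) struct Float8
{
    float[8] lanes = 0;
}

// Allocates one Float8 on the C heap and constructs it in place.
// The caller owns the result and must release it with free().
Float8* makeFloat8()
{
    // The size passed here (Float8.sizeof) is a multiple of the alignment,
    // which is what the C11 wording of aligned_alloc asks for.
    void* raw = aligned_alloc(Float8.alignof, Float8.sizeof);
    if (raw is null)
        return null;

    // emplace initializes the object inside the raw chunk and returns a typed pointer.
    return emplace!Float8(raw[0 .. Float8.sizeof]);
}

unittest
{
    Float8* v = makeFloat8();
    assert(v !is null);
    scope(exit) free(v);

    assert((cast(size_t) v) % Float8.alignof == 0);
    assert(v.lanes[0] == 0);
}
```

Memory obtained this way is invisible to the GC, so it has to be released with `free`, and it should not hold the only reference to GC-managed objects unless it is registered with the GC separately.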
Sep 30 2022
tsbockman <thomas.bockman gmail.com> writes:
On Friday, 30 September 2022 at 15:57:22 UTC, Quirin Schroll 
wrote:
 When I do `new void[](n)`, is that buffer allocated with an 
 alignment of 1 or what are the guarantees?
It is guaranteed an alignment of at least 1 because `void.alignof == 1` (and because that is the lowest possible integer alignment).

When I last checked, `new T` guaranteed a minimum alignment of `min(T.alignof, 16)`, meaning that all basic scalar types (`int`, `double`, pointers, etc.), and SIMD `__vector`s up to 128 bits will be correctly aligned, while 256 bit (for example, AVX's `__vector(double[4])`) and 512 bit (AVX512) types might not be.

Arrays and aggregate types (`struct`s and `class`es) by default use the maximum alignment required by any of their elements or fields (including hidden fields, like `__vptr` for `class`es). This can be overridden manually using the `align` attribute, which must be applied to the aggregate type as a whole. (Applying `align` to an individual field does something else.)
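
The rules above can be checked at compile time. The following is only a sketch: the types `Pair`, `Wide`, and `Mixed` are hypothetical and exist purely for illustration.

```d
import std.math : isPowerOf2;

// Hypothetical example types; none of these names come from the thread.
struct Pair { ubyte tag; int value; }                 // alignment follows the strictest field
align(32) struct Wide { ubyte tag; }                  // `align` applied to the aggregate as a whole
struct Mixed { ubyte tag; align(8) ubyte payload; }   // `align` applied to a single field

static assert(void.alignof == 1);
static assert(Pair.alignof == int.alignof);
static assert(Wide.alignof == 32);

// On a field, `align` controls where the field is placed inside the aggregate.
static assert(Mixed.payload.offsetof == 8);

// Both alignments are powers of two.
static assert(isPowerOf2(Pair.alignof) && isPowerOf2(Wide.alignof));
```

Because every check is a `static assert`, the module either compiles or reports which assumption does not hold for the target platform.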
 How can I set an alignment?
If the desired alignment is `<= 16`, you can specify a type with that `.alignof`. However, if you may need higher alignment than the maximum guaranteed to be available from the allocator, or you are not writing strongly typed code to begin with, as implied by your use of `void[]`, you can just align the allocation yourself:

```D
import std.math : isPowerOf2;

void[] newAligned(const(size_t) alignment)(const(size_t) size) pure @trusted nothrow
    if(1 <= alignment && isPowerOf2(alignment))
{
    enum alignMask = alignment - 1;
    // Over-allocate so that an aligned sub-slice of `size` bytes always fits.
    void[] ret = new void[size + alignMask];
    const misalign = (cast(size_t) ret.ptr) & alignMask;
    const offset = (alignment - misalign) & alignMask;
    ret = ret[offset .. offset + size];
    return ret;
}
```

However, aligning memory outside of the allocator itself like this does waste up to `alignment - 1` bytes per allocation, so it's best to use as much of the allocator's internal alignment capability as possible:

```D
import core.bitop : bsr;
import std.math : isPowerOf2;
import std.meta : AliasSeq;

void[] newAligned(const(size_t) alignment)(const(size_t) size) pure @trusted nothrow
    if(1 <= alignment && isPowerOf2(alignment))
{
    alias Aligned = .Aligned!alignment;

    // Allocate whole chunks so the allocator itself provides up to 16-byte alignment.
    void[] ret = new Aligned.Chunk[(size + Aligned.mask) >> Aligned.chunkShift];
    static if(Aligned.Chunk.alignof == alignment)
        enum size_t offset = 0;
    else {
        // Only the part of the alignment the allocator cannot provide is fixed up manually.
        const misalign = (cast(size_t) ret.ptr) & Aligned.mask;
        const offset = (alignment - misalign) & Aligned.mask;
    }
    ret = ret[offset .. offset + size];
    return ret;
}

private {
    align(16) struct Chunk16 {
        void[16] data;
    }

    template Aligned(size_t alignment)
        if(1 <= alignment && isPowerOf2(alignment))
    {
        enum int shift = bsr(alignment);
        enum size_t mask = alignment - 1;

        static if(alignment <= 16) {
            enum chunkShift = shift, chunkMask = mask;
            alias Chunk = AliasSeq!(ubyte, ushort, uint, ulong, Chunk16)[shift];
        } else {
            enum chunkShift = Aligned!(16).shift, chunkMask = Aligned!(16).mask;
            alias Chunk = Aligned!(16).Chunk;
        }
    }
}

@safe unittest {
    static immutable(size_t[]) alignments =
        [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ];
    static immutable(size_t[]) sizes =
        [ 9, 31, 4, 57, 369, 3358 ];

    foreach(size; sizes) {
        static foreach(alignment; alignments) { {
            void[] memory = newAligned!alignment(size);
            assert(memory.length == size);
            assert((cast(size_t) &(memory[0])) % alignment == 0);
        } }
    }
}
```
 Also, is the alignment of any type guaranteed to be a power of 
 2?
In practice, yes.

On Friday, 30 September 2022 at 16:23:00 UTC, mw wrote:
 https://dlang.org/library/core/stdc/stdlib/aligned_alloc.html

 It's the C func, so check C lib doc.
https://en.cppreference.com/w/c/memory/aligned_alloc

Note that common implementations place arbitrary restrictions on the alignments and sizes accepted by `aligned_alloc`, so to support the general case you would still need a wrapper function like the one I provided above.

(If this all seems overly complicated, that's because it is. I have no idea why allocators don't just build in the logic above; it's extremely simple compared to the rest of what a good general-purpose heap allocator does.)
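
As a rough sketch of what such a wrapper around the C function might look like: the helper name `cAlignedAlloc` is hypothetical, and the two restrictions it works around (a minimum alignment of pointer size, and a size that is a multiple of the alignment) are assumptions about typical C libraries rather than guarantees about any particular one. It also assumes the C runtime provides `aligned_alloc` at all.

```d
import core.stdc.stdlib : aligned_alloc, free;
import std.math : isPowerOf2;

// Hypothetical wrapper: returns `size` bytes aligned to at least `alignment`,
// or null on failure. Release the result with free(result.ptr).
void[] cAlignedAlloc(size_t alignment, size_t size) @trusted nothrow @nogc
{
    assert(isPowerOf2(alignment));

    // Assumption: some C libraries reject alignments smaller than a pointer.
    if (alignment < (void*).sizeof)
        alignment = (void*).sizeof;

    // The C11 wording asks for a size that is a multiple of the alignment,
    // so round up; also avoid passing a zero-byte request.
    size_t padded = (size + alignment - 1) & ~(alignment - 1);
    if (padded == 0)
        padded = alignment;

    void* p = aligned_alloc(alignment, padded);
    return p is null ? null : p[0 .. size];
}

unittest
{
    static immutable size_t[] alignments = [1, 8, 64, 4096];
    foreach (alignment; alignments)
    {
        void[] mem = cAlignedAlloc(alignment, 100);
        assert(mem.length == 100);
        assert((cast(size_t) mem.ptr) % alignment == 0);
        free(mem.ptr);
    }
}
```

The sketch does not guard against overflow when `size` is close to `size_t.max`, and the returned memory is not scanned or collected by the GC.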
Sep 30 2022
tsbockman <thomas.bockman gmail.com> writes:
On Saturday, 1 October 2022 at 00:32:28 UTC, tsbockman wrote:
             alias Chunk = AliasSeq!(ubyte, ushort, uint, ulong, 
 Chunk16)[shift];
Oops, I forgot that `ulong.alignof` is platform dependent. It's probably best to just go ahead and explicitly specify the alignment for all `Chunk` types:

```D
private template Aligned(size_t alignment)
    if(1 <= alignment && isPowerOf2(alignment))
{
    enum int shift = bsr(alignment);
    enum size_t mask = alignment - 1;

    static if(alignment <= 16) {
        enum chunkShift = shift, chunkMask = mask;
        align(alignment) struct Chunk {
            void[alignment] data;
        }
    } else {
        enum chunkShift = Aligned!(16).shift, chunkMask = Aligned!(16).mask;
        alias Chunk = Aligned!(16).Chunk;
    }
}
```

(This also eliminates the `std.meta : AliasSeq` dependency.)
Sep 30 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/30/22 11:57 AM, Quirin Schroll wrote:
 When I do `new void[](n)`, is that buffer allocated with an alignment of 
 1 or what are the guarantees? How can I set an alignment? Also, is the 
 alignment of any type guaranteed to be a power of 2?
In practice, it's not necessarily a power of 2, but it's *at least* 16 bytes.

In general there are very few types (maybe vectors?) that need alignment more than 16 bytes.

The list of bit sizes is currently here:

https://github.com/dlang/dmd/blob/82870e890f6f0e0dca3e8f0032a7819416319124/druntime/src/core/internal/gc/impl/conservative/gc.d#L1392-L1414

-Steve
Sep 30 2022
tsbockman <thomas.bockman gmail.com> writes:
On Saturday, 1 October 2022 at 01:37:00 UTC, Steven Schveighoffer 
wrote:
 On 9/30/22 11:57 AM, Quirin Schroll wrote:
 Also, is the alignment of any type guaranteed to be a power of 
 2?
In practice, it's not necessarily a power of 2, but it's *at least* 16 bytes.
**Types** always require some power of 2 alignment (on any sensible platform, anyway), and it is usually *less* than 16 bytes - typically `size_t.sizeof`. The fact that the current GC implementation apparently has a minimum block size of 16 bytes, and that minimum size blocks are always size-aligned, is not guaranteed by the public API and *should not be* when requesting memory for something that the type system says only requires an alignment of `void.alignof == 1`. D and C both have formal ways to communicate alignment requirements to the allocator; people should use them and not constrain all future D GC development to conform to undocumented details of the current implementation.
 In general there are very few types (maybe vectors?) that need 
 alignment more than 16 bytes.
256 bit SIMD (AVX/AVX2) and 512 bit SIMD (AVX512) `__vector`s should be `.sizeof` aligned (32 and 64 bytes, respectively). Memory used for inter-thread communication (such as mutexes) may perform significantly better if cache line aligned (typically 64 bytes, but CPU dependent). I don't know any other examples off the top of my head.
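
The cache-line case can be sketched as follows, assuming a 64-byte cache line (which, as noted, is CPU dependent). `cacheLineSize` and `PaddedCounter` are hypothetical names used only for illustration.

```d
// Assumption: 64-byte cache lines; adjust for the target CPU.
enum cacheLineSize = 64;

// Hypothetical counter type padded so that each instance fills one cache line.
align(cacheLineSize) struct PaddedCounter
{
    long value;
}

static assert(PaddedCounter.alignof == cacheLineSize);
static assert(PaddedCounter.sizeof == cacheLineSize);

unittest
{
    // Neighbouring array elements are one full cache line apart.
    PaddedCounter[4] counters;
    const first = cast(size_t) &counters[0];
    const second = cast(size_t) &counters[1];
    assert(second - first == cacheLineSize);
}
```

The type only declares the requirement. Since `new T` was described above as guaranteeing `min(T.alignof, 16)`, a heap-allocated `PaddedCounter` would still need an allocation path that honours 64-byte alignment.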
 The list of bit sizes is currently here:
I'm pretty sure those are in **bytes** not **bits**.
 https://github.com/dlang/dmd/blob/82870e890f6f0e0dca3e8f0032a7819416319124/druntime/src/core/internal/gc/impl/conservative/gc.d#L1392-L1414
That's not a list of alignments, it is block sizes for some GC memory pools. The alignment of each block depends on the alignment of its pool, not just its size. It's not immediately obvious from the context, but I suspect the pools are actually page aligned, which would mean that the non power of 2 sized blocks are **not** consistently aligned to their own sizes. Regardless, it's not part of the public API, so it could change without warning.
Sep 30 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/1/22 12:57 AM, tsbockman wrote:
 On Saturday, 1 October 2022 at 01:37:00 UTC, Steven Schveighoffer wrote:
 The list of bit sizes is currently here:
I'm pretty sure those are in **bytes** not **bits**.
Yes, I meant bytes, sorry.
 
 That's not a list of alignments, it is block sizes for some GC memory 
 pools. The alignment of each block depends on the alignment of its pool, 
 not just its size.
Pools are all page multiples. Each pool is split equally into bin sizes, from that list.
 Regardless, it's not part of the public API, so it could change without 
 warning.
Hence the "in practice" qualifier. It is theoretically possible for a GC implementation to use smaller bin sizes, but it will never happen. Consider that in small bins ( < 1 page ), no two bins are concatenated together. So if you had a bin of size 1, it means you would only allocate *one byte blocks*, never combining them.

Again, these are all implementation details that will likely never change.

-Steve
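
For anyone who wants to observe these implementation details on their own runtime, here is a small sketch using `core.memory.GC`. The helper names `observedBlockSize` and `observedMisalignment` are hypothetical, and the results describe only the GC build being run, not a documented guarantee.

```d
import core.memory : GC;

// Hypothetical helper: size of the GC block that backs a `request`-byte allocation.
size_t observedBlockSize(size_t request)
{
    void* p = GC.malloc(request);
    return GC.sizeOf(p);
}

// Hypothetical helper: address of a fresh `request`-byte GC allocation modulo `alignment`.
size_t observedMisalignment(size_t request, size_t alignment)
{
    void* p = GC.malloc(request);
    return (cast(size_t) p) % alignment;
}

unittest
{
    // The block is at least as large as the request; how much larger is an
    // implementation detail of the current GC.
    assert(observedBlockSize(1) >= 1);

    // Any address is a multiple of 1, the only alignment `void` requires.
    assert(observedMisalignment(1, 1) == 0);
}
```

Calling `observedMisalignment(1, 16)` shows whether the running GC happens to hand out 16-byte-aligned blocks for tiny requests, which is the behaviour under discussion here.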
Oct 01 2022