www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Stack-allocated arrays

reply dsimcha <dsimcha yahoo.com> writes:
I've noticed that, when writing multithreaded number crunching code,
allocating memory for temporary storage can be a huge threading bottleneck.
Of course, the obvious, but ugly solution is to pre-allocate and recycle temp
variables.  However, this breaks encapsulation at every place imaginable.

An alternative that I've tried recently is to use alloca to stack allocate the
temporary arrays if they are small enough.  If the arrays being allocated are
larger, I heap allocate, and this isn't a problem because in the large array
case, the number crunching takes up enough time that heap allocations aren't
much of a bottleneck anymore.  However, this is also butt ugly because I have
to use a bunch of pointers and casts and explicitly test the size for it to
work.  Given that this is highly useful, I think a nice feature for D would be
to have some kind of a scheme to make this clean.  For example:

auto foo = newTemp[length];

If length is larger than some arbitrary but reasonable value, this is heap
allocated.  If length is small, it is stack allocated.  Either way, allowing
the array to escape the current scope is undefined behavior and the compiler
would detect at least the simple cases of this.  Ideally, since this would
only be used for temp variables in performance-critical code, the contents of
the array would also not be initialized.

It would be nice to do this in a library, but there's no obvious way to do it
without compiler support, since calling a function sets up a new stack frame.
 I'd like to just do it with an interface like:

auto foo = newTemp!(T)(length).

Anyone know any ASM frame pointer hacks to make something like this workable?
 Otherwise, is there enough interest in this that it might belong in the core
language?
Nov 11 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 I've noticed that, when writing multithreaded number crunching code,
 allocating memory for temporary storage can be a huge threading bottleneck.
 Of course, the obvious, but ugly solution is to pre-allocate and recycle temp
 variables.  However, this breaks encapsulation at every place imaginable.
 
 An alternative that I've tried recently is to use alloca to stack allocate the
 temporary arrays if they are small enough.  If the arrays being allocated are
 larger, I heap allocate, and this isn't a problem because in the large array
 case, the number crunching takes up enough time that heap allocations aren't
 much of a bottleneck anymore.  However, this is also butt ugly because I have
 to use a bunch of pointers and casts and explicitly test the size for it to
 work.  Given that this is highly useful, I think a nice feature for D would be
 to have some kind of a scheme to make this clean.  For example:
 
 auto foo = newTemp[length];
 
 If length is larger than some arbitrary but reasonable value, this is heap
 allocated.  If length is small, it is stack allocated.  Either way, allowing
 the array to escape the current scope is undefined behavior and the compiler
 would detect at least the simple cases of this.  Ideally, since this would
 only be used for temp variables in performance-critical code, the contents of
 the array would also not be initialized.
 
 It would be nice to do this in a library, but there's no obvious way to do it
 without compiler support, since calling a function sets up a new stack frame.
  I'd like to just do it with an interface like:
 
 auto foo = newTemp!(T)(length).
 
 Anyone know any ASM frame pointer hacks to make something like this workable?
  Otherwise, is there enough interest in this that it might belong in the core
 language?
I, for one, am very interested and pleaded to Walter to introduce stack-allocation for simple arrays. I could point at an informal case study in which many students of mine discovered the related g++ language extension for C++ and used it to great effect. Andrei
Nov 11 2008
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 11 Nov 2008 20:39:07 +0300, dsimcha <dsimcha yahoo.com> wrote:

 I've noticed that, when writing multithreaded number crunching code,
 allocating memory for temporary storage can be a huge threading  
 bottleneck.
 Of course, the obvious, but ugly solution is to pre-allocate and recycle  
 temp
 variables.  However, this breaks encapsulation at every place imaginable.

 An alternative that I've tried recently is to use alloca to stack  
 allocate the
 temporary arrays if they are small enough.  If the arrays being  
 allocated are
 larger, I heap allocate, and this isn't a problem because in the large  
 array
 case, the number crunching takes up enough time that heap allocations  
 aren't
 much of a bottleneck anymore.  However, this is also butt ugly because I  
 have
 to use a bunch of pointers and casts and explicitly test the size for it  
 to
 work.  Given that this is highly useful, I think a nice feature for D  
 would be
 to have some kind of a scheme to make this clean.  For example:

 auto foo = newTemp[length];

 If length is larger than some arbitrary but reasonable value, this is  
 heap
 allocated.  If length is small, it is stack allocated.  Either way,  
 allowing
 the array to escape the current scope is undefined behavior and the  
 compiler
 would detect at least the simple cases of this.  Ideally, since this  
 would
 only be used for temp variables in performance-critical code, the  
 contents of
 the array would also not be initialized.

 It would be nice to do this in a library, but there's no obvious way to  
 do it
 without compiler support, since calling a function sets up a new stack  
 frame.
  I'd like to just do it with an interface like:

 auto foo = newTemp!(T)(length).

 Anyone know any ASM frame pointer hacks to make something like this  
 workable?
  Otherwise, is there enough interest in this that it might belong in the  
 core
 language?
I would *LOVE* to see Variable-Length Arrays in D (they are implemented in C99) void foo(int i) { int[i] myArray; // this is a static stack-allocated array (size can't be changed) assert(myArray.length() == i); }
Nov 11 2008
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Denis Koroskin (2korden gmail.com)'s article
 I would *LOVE* to see Variable-Length Arrays in D (they are implemented in
 C99)
 void foo(int i)
 {
     int[i] myArray; // this is a static stack-allocated array (size can't
 be changed)
     assert(myArray.length() == i);
 }
Yes, a few people have said this in the past. I'm suggesting something a little more than that. I'm saying that, to make this relatively safe, the compiler automatically "does the right thing" depending on how large an array you are allocating. If it's big enough to risk overflowing the stack, then it does heap allocation. Better yet, if the array is large, it does heap allocation *and* deletes the array for you at any exit points of the function so that the GC doesn't run as much, since the lifetime of the array is known at compile time.
Nov 11 2008
prev sibling next sibling parent Kagamin <spam here.lot> writes:
Since your code is designed to work in this way, you can implement allocation
in a library. If you have one heap per thread, alloc is no more bottleneck. If
you allocate only scope arrays, heap will have simplest stack design and will
be extremely fast for this. Though with such design you should take caution
when choosing in what heap to allocate and in what heap to free :3
Nov 11 2008
prev sibling next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
dsimcha wrote:
 I've noticed that, when writing multithreaded number crunching code,
 allocating memory for temporary storage can be a huge threading bottleneck.
 Of course, the obvious, but ugly solution is to pre-allocate and recycle temp
 variables.  However, this breaks encapsulation at every place imaginable.
 
 An alternative that I've tried recently is to use alloca to stack allocate the
 temporary arrays if they are small enough.  If the arrays being allocated are
 larger, I heap allocate, and this isn't a problem because in the large array
 case, the number crunching takes up enough time that heap allocations aren't
 much of a bottleneck anymore.  However, this is also butt ugly because I have
 to use a bunch of pointers and casts and explicitly test the size for it to
 work.  Given that this is highly useful, I think a nice feature for D would be
 to have some kind of a scheme to make this clean.  For example:
 
 auto foo = newTemp[length];
 
 If length is larger than some arbitrary but reasonable value, this is heap
 allocated.  If length is small, it is stack allocated.  Either way, allowing
 the array to escape the current scope is undefined behavior and the compiler
 would detect at least the simple cases of this.  Ideally, since this would
I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compiler can always fall back on heap-allocation. That could be slightly tricky if you want to make sure the array is deleted in the heap-alloc case, but an initial implementation could always just let the GC handle it. ("T[len] foo;" for non-constant len would also be a nice syntax) A workaround you could use: ----- // N == your "arbitrary but reasonable value", use whatever you want. // (Add "= void" to next line to avoid initialization if you prefer) T[N] stackbuf; // don't use this except in next line T[] buf = stackbuf; // Set reference to stack buffer // Shrink to smaller size or heap-allocate bigger size: buf.length = length; ----- The disadvantage being, of course, that stackbuf will always be N items. If the majority of the cases is for the needed space to be much smaller an alloca-using solution may use much less stack space, especially in code where functions using this call each other.
 only be used for temp variables in performance-critical code, the contents of
 the array would also not be initialized.
I disagree with not initializing it. By default it should be initialized. An explicit "don't initialize this, please" syntax would be nice, but the default should be safe.
 It would be nice to do this in a library, but there's no obvious way to do it
 without compiler support, since calling a function sets up a new stack frame.
  I'd like to just do it with an interface like:
 
 auto foo = newTemp!(T)(length).
 
 Anyone know any ASM frame pointer hacks to make something like this workable?
  Otherwise, is there enough interest in this that it might belong in the core
 language?
Implementing alloca isn't actually all that hard in ASM; it just won't be in any way portable. The needed code depends on both the architecture (obviously) and the calling convention. Oh, and on the assumption that the compiler generates no code that assumes it knows the stack frame size (i.e. don't use -fomit-frame-pointer or equivalent, make sure your compiler uses a frame pointer[1]) [1]: And doesn't extend the frame and address the extended part using the frame pointer -- but most compilers don't do that anyway because it's less efficient, better to just pre-allocate the entire stackframe at the start.
Nov 11 2008
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Frits van Bommel (fvbommel REMwOVExCAPSs.nl)'s article
 A workaround you could use:
 -----
 // N == your "arbitrary but reasonable value", use whatever you want.
 // (Add "= void" to next line to avoid initialization if you prefer)
 T[N] stackbuf;	        // don't use this except in next line
 T[] buf = stackbuf;     // Set reference to stack buffer
 // Shrink to smaller size or heap-allocate bigger size:
 buf.length = length;
 -----
If I'm going to do something this hackish and ugly, I may as well just use alloca explicitly.
Nov 11 2008
prev sibling parent reply "Dave" <Dave_member pathlink.com> writes:
"Frits van Bommel" <fvbommel REMwOVExCAPSs.nl> wrote in message 
news:gfcjb4$25un$1 digitalmars.com...
 dsimcha wrote:
 I've noticed that, when writing multithreaded number crunching code,
 allocating memory for temporary storage can be a huge threading 
 bottleneck.
 Of course, the obvious, but ugly solution is to pre-allocate and recycle 
 temp
 variables.  However, this breaks encapsulation at every place imaginable.

 An alternative that I've tried recently is to use alloca to stack 
 allocate the
 temporary arrays if they are small enough.  If the arrays being allocated 
 are
 larger, I heap allocate, and this isn't a problem because in the large 
 array
 case, the number crunching takes up enough time that heap allocations 
 aren't
 much of a bottleneck anymore.  However, this is also butt ugly because I 
 have
 to use a bunch of pointers and casts and explicitly test the size for it 
 to
 work.  Given that this is highly useful, I think a nice feature for D 
 would be
 to have some kind of a scheme to make this clean.  For example:

 auto foo = newTemp[length];

 If length is larger than some arbitrary but reasonable value, this is 
 heap
 allocated.  If length is small, it is stack allocated.  Either way, 
 allowing
 the array to escape the current scope is undefined behavior and the 
 compiler
 would detect at least the simple cases of this.  Ideally, since this 
 would
I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
 can always fall back on heap-allocation. That could be slightly tricky if 
 you want to make sure the array is deleted in the heap-alloc case, but an 
 initial implementation could always just let the GC handle it.
 ("T[len] foo;" for non-constant len would also be a nice syntax)

 A workaround you could use:
 -----
 // N == your "arbitrary but reasonable value", use whatever you want.
 // (Add "= void" to next line to avoid initialization if you prefer)
 T[N] stackbuf;         // don't use this except in next line
 T[] buf = stackbuf;     // Set reference to stack buffer
 // Shrink to smaller size or heap-allocate bigger size:
 buf.length = length;
 -----
 The disadvantage being, of course, that stackbuf will always be N items. 
 If the majority of the cases is for the needed space to be much smaller an 
 alloca-using solution may use much less stack space, especially in code 
 where functions using this call each other.

 only be used for temp variables in performance-critical code, the 
 contents of
 the array would also not be initialized.
I disagree with not initializing it. By default it should be initialized. An explicit "don't initialize this, please" syntax would be nice, but the default should be safe.
 It would be nice to do this in a library, but there's no obvious way to 
 do it
 without compiler support, since calling a function sets up a new stack 
 frame.
  I'd like to just do it with an interface like:

 auto foo = newTemp!(T)(length).

 Anyone know any ASM frame pointer hacks to make something like this 
 workable?
  Otherwise, is there enough interest in this that it might belong in the 
 core
 language?
Implementing alloca isn't actually all that hard in ASM; it just won't be in any way portable. The needed code depends on both the architecture (obviously) and the calling convention. Oh, and on the assumption that the compiler generates no code that assumes it knows the stack frame size (i.e. don't use -fomit-frame-pointer or equivalent, make sure your compiler uses a frame pointer[1]) [1]: And doesn't extend the frame and address the extended part using the frame pointer -- but most compilers don't do that anyway because it's less efficient, better to just pre-allocate the entire stackframe at the start.
Nov 11 2008
parent reply Janderson <ask me.com> writes:
Dave wrote:
 I'd love for "scope foo = new T[len];" to do for arrays what "scope 
 bar = new Class;" does for classes. And indeed, if it's too big the 
 compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.
Nov 11 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
Janderson wrote:
 Dave wrote:
 I'd love for "scope foo = new T[len];" to do for arrays what "scope 
 bar = new Class;" does for classes. And indeed, if it's too big the 
 compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.
But this won't work if size is runtime-determined.
Nov 12 2008
parent reply Janderson <ask me.com> writes:
KennyTM~ wrote:
 Janderson wrote:
 Dave wrote:
 I'd love for "scope foo = new T[len];" to do for arrays what "scope 
 bar = new Class;" does for classes. And indeed, if it's too big the 
 compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.
But this won't work if size is runtime-determined.
Thanks for clarifying my last point :) -Joel
Nov 12 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
Janderson wrote:
 KennyTM~ wrote:
 Janderson wrote:
 Dave wrote:
 I'd love for "scope foo = new T[len];" to do for arrays what "scope 
 bar = new Class;" does for classes. And indeed, if it's too big the 
 compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.
But this won't work if size is runtime-determined.
Thanks for clarifying my last point :) -Joel
Is it? I mean things like: int size = read_from_stdin(); scope array = new FixedSizeArray!(int)(size); You can't do it FixedSizeArray!(int, size) because all template arguments must be determined at compile time. The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods. I hope I don't misunderstand your last point :).
Nov 12 2008
next sibling parent Max Samukha <samukha voliacable.com.removethis> writes:
On Thu, 13 Nov 2008 01:54:59 +0800, KennyTM~ <kennytm gmail.com>
wrote:

Janderson wrote:
 KennyTM~ wrote:
 Janderson wrote:
 Dave wrote:
 I'd love for "scope foo = new T[len];" to do for arrays what "scope 
 bar = new Class;" does for classes. And indeed, if it's too big the 
 compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.
But this won't work if size is runtime-determined.
Thanks for clarifying my last point :) -Joel
Is it? I mean things like: int size = read_from_stdin(); scope array = new FixedSizeArray!(int)(size); You can't do it FixedSizeArray!(int, size) because all template arguments must be determined at compile time. The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods. I hope I don't misunderstand your last point :).
Mixin brute force: import std.c.stdlib; template FixedSizeArray(T, string name, alias size) { mixin ("scope " ~ name ~ " = size > 1000 ? new T[size] : (cast(T*)alloca(size * T.sizeof))[0..size];"); } void main() { size_t i = 100; mixin FixedSizeArray!(int, "arr", i); arr[] = 123; i = 10000; mixin FixedSizeArray!(int, "arr2", i); arr2[] = 123; }
Nov 12 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a 
 struct/class that contains a static array member T[1024] and a dynamic 
 array member T[] initialized to null; and the code chooses which member 
 to use in the constructor. But this always occupies 1024*T.sizeof bytes 
 and there will always be a conditional (if) sticked to all access methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing? Andrei
Nov 12 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a 
 struct/class that contains a static array member T[1024] and a dynamic 
 array member T[] initialized to null; and the code chooses which 
 member to use in the constructor. But this always occupies 
 1024*T.sizeof bytes and there will always be a conditional (if) 
 sticked to all access methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.
It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.
 With the SuperStack in place, code could look like this:
 
 void foo(in size_t s)
 {
     auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
     scope(exit) SuperStack.free(s);
     ... play with a ...
 }
 
 Is there interest for such a thing?
I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory. Is it the fact that allocations are non-locking? Pure stack allocations make sense because all such allocations are scoped and guaranteed to be extremely fast. So I suppose this is simply a parallel local "stack" for the same purpose and the goal is to avoid the assembly magic necessary for an alloca() type solution? And if this is the case, then I assume that sharing would be illegal (at least on D2)? My initial reaction is that I'm skeptical of any attempt at manual replacement of built-in functionality. It's been shown, for example, that custom allocators often perform worse than the default allocators. That aside, the general approach does seem reasonable. Though I'd still prefer we just get real stack-based dynamic allocation in D. It seems quite natural that "scope" should provide this for arrays as a QOI feature, as it does for classes. Sean
Nov 12 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a 
 struct/class that contains a static array member T[1024] and a 
 dynamic array member T[] initialized to null; and the code chooses 
 which member to use in the constructor. But this always occupies 
 1024*T.sizeof bytes and there will always be a conditional (if) 
 sticked to all access methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.
It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.
Damn straight it could free() them assuming it has malloc()ated them!
 With the SuperStack in place, code could look like this:

 void foo(in size_t s)
 {
     auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
     scope(exit) SuperStack.free(s);
     ... play with a ...
 }

 Is there interest for such a thing?
I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory.
Speed. An allocation/deallocation is most of the time a counter bump and a check.
 Is it the fact that allocations are 
 non-locking?
No.
 Pure stack allocations make sense because all such 
 allocations are scoped and guaranteed to be extremely fast.  So I 
 suppose this is simply a parallel local "stack" for the same purpose and 
 the goal is to avoid the assembly magic necessary for an alloca() type 
 solution?  And if this is the case, then I assume that sharing would be 
 illegal (at least on D2)?
It is better than the stack because it can tap into the entire heap, as opposed to whatever was reserved for the stack. You can also expand an array on the superstack, whereas you can't with alloca. (I forgot to provide that primitive.) Yah, and indeed recall that D2 is all built around no implicit sharing across threads.
 My initial reaction is that I'm skeptical of any attempt at manual 
 replacement of built-in functionality.  It's been shown, for example, 
 that custom allocators often perform worse than the default allocators.
I seem to recall that. Oh, wait, I gave a talk on that, several times :o). Emery Berger studied the problem. It turns out user-defined heaps are usually worse off, with one exception: regions. SuperStack is a region. That's why I'm counting it could make a difference.
  That aside, the general approach does seem reasonable.  Though I'd 
 still prefer we just get real stack-based dynamic allocation in D.  It 
 seems quite natural that "scope" should provide this for arrays as a QOI 
 feature, as it does for classes.
I agree. There are a few problems. One is that loops can make things tricky, as you need to call alloca only once. The other has to do with the way Walter generates code, I forgot the details, but it's fixable if some time is invested. And the third is, as I said, stack allocation can actually fail long before the system runs out of memory. It has happened to me. Andrei
Nov 12 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 It's an implementation detail, but when the owner thread dies it 
 should null the links for all the list nodes, assuming the data 
 contained in those nodes can be shared across threads.
Damn straight it could free() them assuming it has malloc()ated them!
Or delete them if it's newed them :-) Assuming we want the GC to potentially scan at least some of these superblocks.
 I suppose I'm wondering what the advantage is of this approach over 
 simply using dynamic memory.
Speed. An allocation/deallocation is most of the time a counter bump and a check.
Gotcha.
 
 Is it the fact that allocations are non-locking?
No.
 Pure stack allocations make sense because all such allocations are 
 scoped and guaranteed to be extremely fast.  So I suppose this is 
 simply a parallel local "stack" for the same purpose and the goal is 
 to avoid the assembly magic necessary for an alloca() type solution?  
 And if this is the case, then I assume that sharing would be illegal 
 (at least on D2)?
It is better than the stack because it can tap into the entire heap, as opposed to whatever was reserved for the stack. You can also expand an array on the superstack, whereas you can't with alloca. (I forgot to provide that primitive.) Yah, and indeed recall that D2 is all built around no implicit sharing across threads.
Makes sense. I think stack allocations tend to get a bit weird if you ask for more than a page or so of memory anyway.
 My initial reaction is that I'm skeptical of any attempt at manual 
 replacement of built-in functionality.  It's been shown, for example, 
 that custom allocators often perform worse than the default allocators.
I seem to recall that. Oh, wait, I gave a talk on that, several times :o). Emery Berger studied the problem. It turns out user-defined heaps are usually worse off, with one exception: regions. SuperStack is a region. That's why I'm counting it could make a difference.
Fair enough :-)
  That aside, the general approach does seem reasonable.  Though I'd 
 still prefer we just get real stack-based dynamic allocation in D.  It 
 seems quite natural that "scope" should provide this for arrays as a 
 QOI feature, as it does for classes.
I agree. There are a few problems. One is that loops can make things tricky, as you need to call alloca only once. The other has to do with the way Walter generates code, I forgot the details, but it's fixable if some time is invested. And the third is, as I said, stack allocation can actually fail long before the system runs out of memory. It has happened to me.
Sounds like it's worth doing then. At worst it makes for an easy search/replace later if we get this feature natively. I'd certainly use it anyway. I currently use alloca() or new/delete depending on the situation, and it seems convenient to flatten all this into the use of a specialized library module. Sean
Nov 12 2008
prev sibling parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a 
 struct/class that contains a static array member T[1024] and a 
 dynamic array member T[] initialized to null; and the code chooses 
 which member to use in the constructor. But this always occupies 
 1024*T.sizeof bytes and there will always be a conditional (if) 
 sticked to all access methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.
It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.
Damn straight it could free() them assuming it has malloc()ated them!
It wouldn't even need to return the memory to the OS. It could even defer that until the next run of the gc, possibly avoiding the need for slack. (So the first action of the gc would be drop the unused space from the superstacks; if that's enough, there's no need to do a full collect). Of course that only makes sense if the OS allocation is slow. Anyway, the idea sounds great to me. It can be faster than stack allocation in some cases, since it can result in less cache pollution (dramatically so in the case where you have tail recursion of a function which is allocating large arrays; all the return addresses stay next to each other on the stack, and the allocated arrays don't need to be touched when they're being deleted).
Nov 12 2008
prev sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 12 Nov 2008 21:59:16 +0300, Sean Kelly <sean invisibleduck.org>  
wrote:

 Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a  
 struct/class that contains a static array member T[1024] and a dynamic  
 array member T[] initialized to null; and the code chooses which  
 member to use in the constructor. But this always occupies  
 1024*T.sizeof bytes and there will always be a conditional (if)  
 sticked to all access methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.
It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.
 With the SuperStack in place, code could look like this:
  void foo(in size_t s)
 {
     auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
     scope(exit) SuperStack.free(s);
     ... play with a ...
 }
  Is there interest for such a thing?
I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory. Is it the fact that allocations are non-locking? Pure stack allocations make sense because all such allocations are scoped and guaranteed to be extremely fast.
Not only it is almost as fast as stack allocation, it is much safer (you never get stack overflow, for example).
 So I suppose this is simply a parallel local "stack" for the same  
 purpose and the goal is to avoid the assembly magic necessary for an  
 alloca() type solution?  And if this is the case, then I assume that  
 sharing would be illegal (at least on D2)?

 My initial reaction is that I'm skeptical of any attempt at manual  
 replacement of built-in functionality.  It's been shown, for example,  
 that custom allocators often perform worse than the default allocators.
   That aside, the general approach does seem reasonable.  Though I'd  
 still prefer we just get real stack-based dynamic allocation in D.  It  
 seems quite natural that "scope" should provide this for arrays as a QOI  
 feature, as it does for classes.
I agree, making "scope T[] t = new T[100];" stack-allocated is great. // D1/D2 scope T[] t1 = new T[size]; // stack-allocated // D2 shared scope T[] t2 = new T[size]; // Does it make any sense? Should it be an error?
Nov 12 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Denis Koroskin wrote:
 
 // D2
 shared scope T[] t2 = new T[size]; // Does it make any sense? Should it 
 be an error?
It makes sense. Consider how futures work: shared scope T[] t2 = new T[size]; auto f = new ComputeAsync(t2); ... writefln( f.val ); Here, the computation could be done by another thread but in this situation we know for certain that the other thread will be done with it before t2 is destroyed. Sean
Nov 12 2008
prev sibling next sibling parent reply "Dave" <Dave_Member pathlink.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:gff7g0$t54$1 digitalmars.com...
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a 
 struct/class that contains a static array member T[1024] and a dynamic 
 array member T[] initialized to null; and the code chooses which member 
 to use in the constructor. But this always occupies 1024*T.sizeof bytes 
 and there will always be a conditional (if) sticked to all access 
 methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?
That would be interesting and even better if it could be a built-in (not clear on if that was the intention or not). The reason for having it be a built-in is (IMO) it's important for D to provide language constructs that can improve performance for common things like allocating temporary, scope-limited arrays and classes, especially with the new direction D2 discussions have been heading in lately (like pure functions and such).
 Andrei 
Nov 12 2008
parent "Dave" <Dave_Member pathlink.com> writes:
"Dave" <Dave_Member pathlink.com> wrote in message 
news:gffgig$1ge6$1 digitalmars.com...
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:gff7g0$t54$1 digitalmars.com...
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a 
 struct/class that contains a static array member T[1024] and a dynamic 
 array member T[] initialized to null; and the code chooses which member 
 to use in the constructor. But this always occupies 1024*T.sizeof bytes 
 and there will always be a conditional (if) sticked to all access 
 methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?
That would be interesting and even better if it could be a built-in (not clear on if that was the intention or not). The reason for having it be a built-in is (IMO) it's important for D to provide language constructs that can improve performance for common things like allocating temporary, scope-limited arrays and classes, especially with the new direction D2 discussions have been heading in lately (like pure functions and such).
Oh, and any built-in syntax needs to provide a way to allocate uninitialized data also (yeah, I know)...
 Andrei
Nov 12 2008
prev sibling next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 KennyTM~ wrote:
 The best solution I can think of, without compiler modification is a
 struct/class that contains a static array member T[1024] and a dynamic
 array member T[] initialized to null; and the code chooses which member
 to use in the constructor. But this always occupies 1024*T.sizeof bytes
 and there will always be a conditional (if) sticked to all access methods.
This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing? Andrei
That sounds terrific, although I think it would be even better if it were built into the runtime. This way, all scope objects, including arrays, could be allocated on the SuperStack. If this is in the language core, the compiler could even optimize at least simple cases automatically, if it could prove that something doesn't escape. The only hard part would be getting the heuristics right for when to free the chunks. I haven't thought through all the contingencies, but I guess this would be as fast as stack allocation, assuming enough slack.
Nov 13 2008
prev sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Andrei Alexandrescu wrote:
 enum Uninitialized { yeahIKnow }
Hahaha, brilliant! :D
 Is there interest for such a thing?
Yes, absolutely. I use lots of stack allocations and the vision of an overflow is always lurking somewhere in the shadows... EASTL has a stack allocator, so we must have it as well ;) -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Nov 14 2008
prev sibling parent Janderson <ask me.com> writes:
KennyTM~ wrote:
 Janderson wrote:
 KennyTM~ wrote:
 Janderson wrote:
 Dave wrote:
 I'd love for "scope foo = new T[len];" to do for arrays what 
 "scope bar = new Class;" does for classes. And indeed, if it's too 
 big the compiler
I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.
But this won't work if size is runtime-determined.
Thanks for clarifying my last point :) -Joel
Is it? I mean things like: int size = read_from_stdin(); scope array = new FixedSizeArray!(int)(size); You can't do it FixedSizeArray!(int, size) because all template arguments must be determined at compile time. The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods. I hope I don't misunderstand your last point :).
I assumed you knew the size at compile time (like in Dave's example) and wanted to avoid stack overflow. Modifying the stack at compile time would be a dangerous thing. I think the best solution would be to have a second stack for that sort of thing, which could be managed in a library still. -Joel
Nov 12 2008
prev sibling next sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 11 Nov 2008 17:39:07 +0000 (UTC), dsimcha wrote:

 If length is larger than some arbitrary but reasonable value, this is heap
 allocated.  If length is small, it is stack allocated.  Either way, allowing
 the array to escape the current scope is undefined behavior and the compiler
 would detect at least the simple cases of this.  Ideally, since this would
 only be used for temp variables in performance-critical code, the contents of
 the array would also not be initialized.
Would there be a problem if multiple stack-arrays were asked for. It might be that one or two such arrays are not going to upset the stack size but allocating 100 of them on the same stack might be a problem. Can the compiler help here? -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 11 2008
prev sibling parent reply Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 With the SuperStack in place, code could look like this:
 
 void foo(in size_t s)
 {
      auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
      scope(exit) SuperStack.free(s);
      ... play with a ...
 }
void foo(in size_t s) { StackFrame m; //struct, remember heap status in constructor //restore in destructor //bonus: can't alloc unless StackFrame is created auto a = m.alloc(int[])(s, Uninitialized.yeahIKnow); //scope(exit) SuperStack.free(s); you don't need free to free stack ... play with a ... }
Nov 13 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 Andrei Alexandrescu Wrote:
 
 With the SuperStack in place, code could look like this:

 void foo(in size_t s)
 {
      auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
      scope(exit) SuperStack.free(s);
      ... play with a ...
 }
void foo(in size_t s) { StackFrame m; //struct, remember heap status in constructor //restore in destructor //bonus: can't alloc unless StackFrame is created auto a = m.alloc(int[])(s, Uninitialized.yeahIKnow); //scope(exit) SuperStack.free(s); you don't need free to free stack ... play with a ... }
Sweet!! Andrei
Nov 13 2008