www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Q: fast stack allocation?

reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
How does one allocate an array on the stack without incurring the wrath of a
"clear everything to zero" loop? I've tried the obvious, such as

char[Size] array;

and

char[] array = (cast(char *) alloca(Size))[0..Size];

both of which insist on clearing the content to zero. I appreciate
deterministic state as much as the next person, but there are times when it
gets in the way of execution speed ...

- Kris
May 30 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9c1l3$1tfp$1 digitaldaemon.com...
 How does one allocate an array on the stack without incurring the wrath of

 "clear everything to zero" loop? I've tried the obvious, such as

 char[Size] array;

 and

 char[] array = (cast(char *) alloca(Size))[0..Size];

 both of which insist on clearing the content to zero. I appreciate
 deterministic state as much as the next person, but there are times when

 gets in the way of execution speed ...

char[] array; array = (cast(char *) alloca(Size))[0..Size];
May 30 2004
parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albeit a
shorter loop than < char[Size] array > emits.

Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet still
have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...

- Kris


"Walter" <newshound digitalmars.com> wrote in message
news:c9c7p0$260b$1 digitaldaemon.com...
 "Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
 news:c9c1l3$1tfp$1 digitaldaemon.com...
 How does one allocate an array on the stack without incurring the wrath


 a
 "clear everything to zero" loop? I've tried the obvious, such as

 char[Size] array;

 and

 char[] array = (cast(char *) alloca(Size))[0..Size];

 both of which insist on clearing the content to zero. I appreciate
 deterministic state as much as the next person, but there are times when

 gets in the way of execution speed ...

char[] array; array = (cast(char *) alloca(Size))[0..Size];

May 30 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9d2n6$7hp$1 digitaldaemon.com...
 alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albeit

 shorter loop than < char[Size] array > emits.

That loop doesn't initialize the array, it only adjusts the stack frame to make room. The array itself is uninitialized.
Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet still
have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...

Uninitialized variables are a huge source of hidden bugs in C/C++. Getting rid of that is a design goal of D.
May 30 2004
next sibling parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

That adjustment of the stack frame is a 44 byte *copy* loop. With today's
12+ multiplier CPUs, that's a noticeable amount of time spent waiting for
memory access ...

What I'm doing here is making a scratchpad available for lower-level
functions to format content into. This has to be done on a thread-by-thread
basis, so the stack is a safe location to house said scratchpad. The
alternative is for me to provide a 'freelist' of preallocated buffers ... in
the latter case, the freelist would likely end up (over time) with as many
instances as there are threads running. If the number of threads is quite
large, and the buffer is of a reasonable size,  you can end up eating gobs
of memory instead of simply placing the buffer on the stack in the first
place. To get around this, you end up creating a "reaper" for the freelist.

This is not a good solution. To make matters worse, freelists must be
synchronized which adds more overhead. To add insult to injury, a
synchronized method often includes the hidden equivalent of a try/catch to
ensure the monitor is released. All this, just because I wanted to quickly
allocate a scratchpad on the stack? Good grief :-)

This issue has arisen simply because I want to ensure the overhead for
runtime logging/tracing is minimal. Surely this is a valid and practical
goal?

To illustrate, here's a simplified example:

        void test ()
        {
                // provide a scratchpad on the stack
                char[40] output;

                // do something with the returned slice
                uitoa (output, 17);
        }

        static char[] uitoa (char[] s, uint i)
        {
                uint len = s.length;
                while (i && len)
                      {
                      s[--len] = i % 10 + '0';
                      i /= 10;
                      }
                return s[len..s.length];
        }

That is, rather than uitoa() allocating from the heap (a la Phobos) it
populates and returns a slice on the provided buffer. You must realize that
the initialization of the caller's stack array is actually likely to cost
more (in time) than the uitoa() call itself for small numbers ...

There has to be a better solution, and I'd much appreciate your suggestions.

- Kris


"Walter" <newshound digitalmars.com> wrote in message
news:c9d510$akc$2 digitaldaemon.com...
 "Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
 news:c9d2n6$7hp$1 digitaldaemon.com...
 alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >,


 a
 shorter loop than < char[Size] array > emits.

That loop doesn't initialize the array, it only adjusts the stack frame to make room. The array itself is uninitialized.
Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet


have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...

Uninitialized variables are a huge source of hidden bugs in C/C++. Getting rid of that is a design goal of D.

May 30 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <c9d8n1$fq6$1 digitaldaemon.com>, Kris says...
I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

inline assembler? Jill
May 30 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Yeah, I didn't dare mention that one AJ 'cos it seems ridiculous to go there
just for a fast stack array :-)

I'm not saying that D should follow the C/C++ mentality here (for fast stack
arrays), but surely this is a valid issue?

- Kris

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:c9da9d$hp3$1 digitaldaemon.com...
 In article <c9d8n1$fq6$1 digitaldaemon.com>, Kris says...
I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

inline assembler? Jill

May 30 2004
prev sibling next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9d8n1$fq6$1 digitaldaemon.com...
 I have absolutely no argument with your point regarding uninitialized
 variables, but there are occasions where a "practical programmer needs a
 practical language".

 That adjustment of the stack frame is a 44 byte *copy* loop. With today's
 12+ multiplier CPUs, that's a noticeable amount of time spent waiting for
 memory access ...

Copying a few bytes that are on the stack are in the cache already.
 What I'm doing here is making a scratchpad available for lower-level
 functions to format content into. This has to be done on a

 basis, so the stack is a safe location to house said scratchpad. The
 alternative is for me to provide a 'freelist' of preallocated buffers ...

 the latter case, the freelist would likely end up (over time) with as many
 instances as there are threads running. If the number of threads is quite
 large, and the buffer is of a reasonable size,  you can end up eating gobs
 of memory instead of simply placing the buffer on the stack in the first
 place. To get around this, you end up creating a "reaper" for the

 This is not a good solution. To make matters worse, freelists must be
 synchronized which adds more overhead. To add insult to injury, a
 synchronized method often includes the hidden equivalent of a try/catch to
 ensure the monitor is released. All this, just because I wanted to quickly
 allocate a scratchpad on the stack? Good grief :-)

 This issue has arisen simply because I want to ensure the overhead for
 runtime logging/tracing is minimal. Surely this is a valid and practical
 goal?

 To illustrate, here's a simplified example:

         void test ()
         {
                 // provide a scratchpad on the stack
                 char[40] output;

                 // do something with the returned slice
                 uitoa (output, 17);
         }

         static char[] uitoa (char[] s, uint i)
         {
                 uint len = s.length;
                 while (i && len)
                       {
                       s[--len] = i % 10 + '0';
                       i /= 10;
                       }
                 return s[len..s.length];
         }

 That is, rather than uitoa() allocating from the heap (a la Phobos) it
 populates and returns a slice on the provided buffer. You must realize

 the initialization of the caller's stack array is actually likely to cost
 more (in time) than the uitoa() call itself for small numbers ...

 There has to be a better solution, and I'd much appreciate your

In most cases, a good optimizer will be able to eliminate the redundant initialization. But not all cases. If the array is just a few bytes long, I doubt you'll be able to detect it in a real app. For large arrays, alloca() is the way to go. I think the price of a bit of an occaisional over-initialization is worth paying in exchange for more reliable programs. And I speak as one who has a mania for detailed optimization and have a reputation for using dirty tricks to get it.
May 30 2004
next sibling parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
"Walter" wrote
 Copying a few bytes that are on the stack are in the cache already.

If so, then fair point ... (though it would hurt a typical MCU).
 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has

 mania for detailed optimization and have a reputation for using dirty

 to get it.

I was going to pull you on the latter, but thought it might be out-of-order <g>. However, let's not confuse the current issue with a dirty trick; it's a perfectly reasonable and legitimate thing to do. And it's not even close to manic optimization. If you disagree with those two points, then please say so :-) Let me ask a question (or two), and then I'll drop it :: suppose there were some reasonable assignment syntax that permitted the programmer to omit default array initialization under carefully considered circumstances, would you still claim that alloca() is the right way to go, even with its overhead? For all purposes? If nothing else, the exercise uncovered two bugs (in the buglist) ... - Kris
May 30 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
"Kris"  wrote in
 If nothing else, the exercise uncovered two bugs (in the buglist) ...

Errrr ... perhaps one bug (should have checked first).
May 30 2004
prev sibling next sibling parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has a
 mania for detailed optimization and have a reputation for using dirty tricks
 to get it.

Some of them are positively filthy! (More power to your elbow.)
Jun 02 2004
prev sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Walter wrote:

 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has
 a mania for detailed optimization and have a reputation for using dirty
 tricks to get it.

Implicit initialization does not make programs more reliable at all. If the default value is what you want for the variable, implicit initialization saves a bit of typing. If the variable needs to be initialized to some other value but you forget to do this, the default value is just as incorrect as any random value that might have been in that memory location by chance. Implicit initialization makes bugs behave more predictable and portable, but they stay bugs. Furthermore, they will even be harder to localize, because you never know whether the programmer really intended the variable to be initialized with the default value, or whether he just forgot some explicit initialization. C/C++ may warn if an initialization was forgotten. D will just accept the code, since there is no way to recognize that it has wrong semantics. If you want to prevent bugs, go for *explicit initialization* demanded by the language. Just call it an *error* if the compiler cannot assert that a variable was explicitely initialized before the first use. (In C/C++, this is just a warning which might be ignored. To turn it into an error, the specs might have to be slightly restricted compared to C/C++, no idea whether this is possible.)
Jun 02 2004
parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c9kmvm$1v8g$1 digitaldaemon.com...
 Walter wrote:

 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has
 a mania for detailed optimization and have a reputation for using dirty
 tricks to get it.

Implicit initialization does not make programs more reliable at all. If the default value is what you want for the variable, implicit initialization saves a bit of typing. If the variable needs to be initialized to some other value but you forget to do this, the default value is just as incorrect as any random value that might have been in that memory location by chance. Implicit initialization makes bugs behave more predictable and portable, but they stay bugs. Furthermore, they will even be harder to localize, because you never know whether the programmer really intended the variable to be initialized with the default value, or whether he just forgot some explicit initialization. C/C++ may warn if an initialization was forgotten. D will just accept the code, since there is no way to recognize that it has wrong semantics. If you want to prevent bugs, go for *explicit initialization* demanded by the language. Just call it an *error* if the compiler cannot assert that a variable was explicitely initialized before the first use. (In C/C++, this is just a warning which might be ignored. To turn it into an error, the specs might have to be slightly restricted compared to C/C++, no idea whether this is possible.)

And another long, and eventually ignored, thread begins. I wish you the best of luck. :-) Derek the Doomed
Jun 04 2004
prev sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Kris wrote:

I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

That adjustment of the stack frame is a 44 byte *copy* loop. With today's
12+ multiplier CPUs, that's a noticeable amount of time spent waiting for
memory access ...

What I'm doing here is making a scratchpad available for lower-level
functions to format content into. This has to be done on a thread-by-thread
basis, so the stack is a safe location to house said scratchpad. The
alternative is for me to provide a 'freelist' of preallocated buffers ... in
the latter case, the freelist would likely end up (over time) with as many
instances as there are threads running. If the number of threads is quite
large, and the buffer is of a reasonable size,  you can end up eating gobs
of memory instead of simply placing the buffer on the stack in the first
place. To get around this, you end up creating a "reaper" for the freelist.

This is not a good solution. To make matters worse, freelists must be
synchronized which adds more overhead. To add insult to injury, a
synchronized method often includes the hidden equivalent of a try/catch to
ensure the monitor is released. All this, just because I wanted to quickly
allocate a scratchpad on the stack? Good grief :-)

This issue has arisen simply because I want to ensure the overhead for
runtime logging/tracing is minimal. Surely this is a valid and practical
goal?

To illustrate, here's a simplified example:

        void test ()
        {
                // provide a scratchpad on the stack
                char[40] output;

                // do something with the returned slice
                uitoa (output, 17);
        }

        static char[] uitoa (char[] s, uint i)
        {
                uint len = s.length;
                while (i && len)
                      {
                      s[--len] = i % 10 + '0';
                      i /= 10;
                      }
                return s[len..s.length];
        }

That is, rather than uitoa() allocating from the heap (a la Phobos) it
populates and returns a slice on the provided buffer. You must realize that
the initialization of the caller's stack array is actually likely to cost
more (in time) than the uitoa() call itself for small numbers ...

There has to be a better solution, and I'd much appreciate your suggestions.

- Kris
  

If performace is an issue, generally I try to pre-allocate the array. Obviously there are circumstances where that doesn't work but generally it's sufficient.
"Walter" <newshound digitalmars.com> wrote in message
news:c9d510$akc$2 digitaldaemon.com...
  

"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9d2n6$7hp$1 digitaldaemon.com...
    

alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >,
      


a
    

shorter loop than < char[Size] array > emits.
      

make room. The array itself is uninitialized.
Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet
      


have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...
      

rid of that is a design goal of D.


-- -Anderson: http://badmama.com.au/~anderson/
May 30 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
"J Anderson" wrote
 If performace is an issue, generally I try to pre-allocate the array.
 Obviously there are circumstances where that doesn't work but generally
 it's sufficient.

Me too, and seriously so. For example, mango.server and mango.servlet happily process each HTTP request without making a single heap allocation. That includes all the header/params/url parsing & decoding etc, and is why those two make arguably the fastest HTTP/servlet engine in existance. However, in this particular case there should be absolutely no need to go to the trouble of freelists and/or thread-local-storage (like mango.server et. al. do). Those options incurr a reasonable amount of development and runtine overhead for what really should be a no-brainer operation. Other than that, I fully agree with your sentiment. - Kris
May 31 2004
prev sibling parent Norbert Nemec <Norbert.Nemec gmx.de> writes:
Walter wrote:

 Uninitialized variables are a huge source of hidden bugs in C/C++. Getting
 rid of that is a design goal of D.

If you forget to initialize a variable that should be initialized, that will be a bug in D just the same way that it would be one in C/C++. The only difference is, that the behavior of the buggy program is deterministic. Even worse: In C/C++, a good compiler can give a warning if a variable may be read uninitialized. In D, this is not possible, because a variable is always set to some default value, so it is legal to read it. Default values in D are a great feature of practical use, but as a means to prevent bugs, they will not help at all. If the default value is what you want, then implicit initialization might help. For pointers/references, initializing to zero by default is reasonable. But forcing every type to be initialized to something does not help at all but even is counterproductive (preventing the compiler from throwing reasonable warnings.)
Jun 01 2004