www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - D stack allocation

reply "van eeshan" <vanee hotmail.net> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:cfbip7$ora$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfb72j$9l7$1 digitaldaemon.com...
 "Walter" <newshound digitalmars.com> wrote in message
 news:cfb5v9$76u$1 digitaldaemon.com...
 The problem, however, is that since Java does not allow any objects or
 arrays on the stack (or in static data), whenever you need one it
needs
 to
 be allocated on the heap. This is a lot more expensive than stack
 allocation. Since one winds up allocating a lot more heap objects in
Java
 than one does in C/C++, the end result is slower. Java offers no
 alternatives to gc allocation.

 This is why D allows stack based objects and value objects.
Are you saying that D allows one to create Objects entirely upon the
stack? For auto objects, it is allowed for the compiler to put them there. But I was primarilly referring to structs and arrays.
 And entirely within static data?
Structs and arrays, yes. Java does not allow structs, nor does it allow arrays on the stack.
Thanks for clarifying that Walter. Would it be prudent to note that D allocates auto Objects on the heap? Stack allocation may be allowed for in the spec, but that's not how the compiler operates. Also, there is that point about heap allocations being a lot more expensive than stack allocations. Naturally, one would lean towards wholeheartedly agreeing; but this is just not true when it comes to D. Here's a little (contrived) test jig to illustrate: private import std.c.time; private import std.c.stdlib; private import std.c.windows.windows; void heap() { void *x = malloc (16 * 1024); free (x); } void stack() { ubyte[16 * 1024] x; } void trace (uint startTime) { FILETIME ft; GetSystemTimeAsFileTime(&ft); printf ("%u\n", (ft.dwLowDateTime - startTime) / 10000); } void main() { FILETIME start; GetSystemTimeAsFileTime(&start); trace (start.dwLowDateTime); sleep (3); trace (start.dwLowDateTime); for (int i=1_000_000; --i;) heap(); trace (start.dwLowDateTime); for (int i=1_000_000; --i;) stack(); trace (start.dwLowDateTime); } The results? On this machine it takes the heap() routine ~500ms to execute, and the stack() version ~3500ms. Yep, that's seven times longer. Why? What wasn't stated is that D ensures all stack variables are initialized ... including arrays. The innocuous stack allocation actually clears the 16KB to zero each time, which is why it takes so long. For a 32KB 'buffer', the stack() execution time goes up to ~8600ms versus the ~500ms for heap(); you can see there's some rather non-linear behavior for the stack version <g> The point here is not about the benefits/detriments of deterministic initialization. It's about clarifying a point of performance-tuning that is far from obvious. Without wishing to put too fine a point on it, the quoted post was somewhat misleading. That is; allocating an array on the D stack can consume far more cycles than the equivalent heap allocation. It depends upon the size of the array, the bus speed, the size of the cache, heap fragmentation and many other things. D stack allocations are simply not like C/C++ at all. Far from it. For the curious, here's the disassembly for both routines. It's worth noting that the array initialization could be sped up. Also worth noting is that the loop/call overhead was 10ms. 10: { 11: void *x = malloc (16 * 1024); 12: free (x); 00402011 push 4000h 00402016 call _malloc (0040574c) 0040201B add esp,4 0040201E push eax 0040201F call _free (004057a4) 00402024 add esp,4 13: } 00402027 pop eax 00402028 ret 14: 15: 16: void stack() 0040202C push ebp 0040202D mov ebp,esp 0040202F mov edx,4 00402034 sub esp,1000h 0040203A test dword ptr [esp],esp 0040203D dec edx 0040203E jne _D5hello5stackFZv+8 (00402034) 00402040 sub esp,edx 00402042 mov ecx,4000h 00402047 xor eax,eax 17: { 18: ubyte[16 * 1024] x; 00402049 push edi 0040204A lea edi,[x] 00402050 rep stos byte ptr [edi] 19: } 00402052 pop edi 00402053 mov esp,ebp 00402055 pop ebp 00402056 ret
Aug 10 2004
next sibling parent reply Andy Friesen <andy ikagames.com> writes:
van eeshan wrote:
 Thanks for clarifying that Walter. Would it be prudent to note that D
 allocates auto Objects on the heap? Stack allocation may be allowed for in
 the spec, but that's not how the compiler operates.
 
 Also, there is that point about heap allocations being a lot more expensive
 than stack allocations. Naturally, one would lean towards wholeheartedly
 agreeing; but this is just not true when it comes to D. Here's a little
 (contrived) test jig to illustrate:
You'll notice, though, that there's absolutely nothing preventing D from performing these allocations on the stack as an optimization. All the compiler needs is to be certain that the object's lifetime is limited to the scope. Auto references offer precisely this. (incidently, it is legal to write "auto int[] x = new int[...];") -- andy
Aug 10 2004
next sibling parent "van eeshan" <vanee hotmail.net> writes:
Yep; agreed (and noted below). It just doesn't do that today, so I thought
it was worth a clarification (for those who didn't know).

"Andy Friesen" <andy ikagames.com> wrote in message
news:cfbpc5$13eh$1 digitaldaemon.com...
 van eeshan wrote:
 Thanks for clarifying that Walter. Would it be prudent to note that D
 allocates auto Objects on the heap? Stack allocation may be allowed for
in
 the spec, but that's not how the compiler operates.

 Also, there is that point about heap allocations being a lot more
expensive
 than stack allocations. Naturally, one would lean towards wholeheartedly
 agreeing; but this is just not true when it comes to D. Here's a little
 (contrived) test jig to illustrate:
You'll notice, though, that there's absolutely nothing preventing D from performing these allocations on the stack as an optimization. All the compiler needs is to be certain that the object's lifetime is limited to the scope. Auto references offer precisely this. (incidently, it is legal to write "auto int[] x = new int[...];") -- andy
Aug 10 2004
prev sibling parent reply "van eeshan" <vanee hotmail.net> writes:
Attached a little graph that shows arrays allocated on the heap versus
arrays declared as local (stack) variables. You can see that stack
allocations are faster only for array sizes of below ~64 bytes (on this
machine), and quickly become significantly slower once you start using local
arrays for anything bigger than around 512-bytes.

There's an interesting quirk at the 4096-byte mark: the codegen changes
strategy slightly at this point, and I'd guess it is doing something akin to
paging (ensuring the stack page-set is available), though can't be sure.
Take a look at the previously posted assembler for reference.

If local (stack) arrays were to use the C/C++ approach, the time consumed
would be ~10ms across the board, rather than the ~500ms for heap
allocations. That is, for a 32KB local array, the initialization cost is
~865 times /more/ than it would be for the equivalent C/C++ approach (~10ms
versus ~8650ms); almost three orders of magnatude.

This indicates two things:

1) making claims about D's perfomance superiority over Java due to the
former's ability to use the stack (versus always on the heap) is based on
speculation; not fact. Testing shows this to be entirely the opposite for
arrays beyond 64 bytes in length. Similarly, any kind of performance testing
against C/C++ within this realm will be viewed as a negative mark against D.

2) Implicit initialization is great. However, it can clearly and easily be
demonstrated to be a serious impediment. Given that D is supposed to be a
'systems' language, I humbly suggest that a mechanism be provided to bypass
such nicities when one is prepared to assume full responsibility. For
example, there might be a qualifier for variables that disables the
automatic initialization (along the lines of the 'auto' qualifier).

What do you folks think?









begin 666 stack allocation.PNG
MB5!.1PT*& H````-24A$4 ```C$```$L" (```"]K75[````!&=!34$`````
MBR5 30```"!C2%)-``````````````````````````````````````````" 





M`0;P.    $;KI%$P"D;!*! %`PS =1! `(W62:- %(R"43 *!AC ZR" `*)5

M>($/$$#TZ

M$*V3``)H8,;NB.<28_ H& 6C8!2,`C0P1.LD  `:DI,W0\NUHV 4C()10'\P

M"D;!*!B&8(C620`!-%HGC8)1, I&P3 $0[1.` B T3II%(R"43 *AB$8HG42


MZZ11, I&P2 8U("! 9S:98C620`!-%HGC8)1, I&P: &(ZI.` B T3II%(R"
M43 *!B\ KT+Z/V3K)( `&JV31L$H& 6C8) "LBND_T.V3 ((H-$Z:12, E$P
M" 8I&(%U$D `C=9)HV 4C()1,! !)172_R%;)P$$T&B=- I&P2 8!8,1C,PZ

M`.45TO\A6R<!!!!-ZB3,. :901(7E_E4=.TH& 6C8!0,*C"2ZR2 `*)'/XD!
M!M#M'JV31L$H& 6C`!50I4+Z/V3K)( `HE.=A,P8K9-&P2 8!:,`*Z!6A?1_



M^C]DZR2 `!JMDT;!*! %HV  P6B=]!^I5 <(H-$Z:12, E$P" 8,T*A"^C]D


M==(H& 6C8!30&]"Z0OH_9.LD  `:K9-&P2 8!:. KH .%=+_(5LG`030:)TT
M"D;!*! %= 6C=1(F )?J``$T6B>- E$P"D8!_0!]*J3_0[9.` B T3II%(R"
M43 *Z 3H5B']'[)U$D `C=9)HV 4C()10"<P6B?A`O!2'2" 1NND43 *1L$H
MH > 9X7T?\C620`!-%HGC8)1, I&`<T!G2ND_T.V3 ((H-$Z:12, E$P"F .
M1NLD_ !>J ,$T&B=- I&P2 8!;0%]*^0_ _9. D  &A5)V$U$&$K&""+X^$2
M:? H& 6C8!0,3C!:)Q$$\%(=((!H4B=AK53  L U$T$N+O.IZ-I1, I&P2B 
M'1B0"NG_D*V3``*(3OTDY/IFM$X:!:- %(P0,% 5TO\A6R<!!! ]ZB22*J'1
M.FD4C()1,&S :)U$)("7Z `!1*<Z"0T JR&I3D(S812, E$P" 8M&, *Z?^0


MP6"HD/X/V3H)((!&ZZ11, I&P2B )ABMD\ `\%(=((!&ZZ11, I&P2B &A D
M%=+_(5LG`030:)TT"D;!*! %U &#IT+Z/V3K)( `&JV31L$H& 6C #I M$XB
M&\!+=8 `&JV31L$H& 6C  I 4%5(_X=LG0000*-UTB 8!:- %% !C-9)E !X
MJ0X00*-UTB 8!:- %% *!EN%]'_(UDD``31:)XV"43 *1 %%8!!62/^';)T$


M%(R"43 *R &#N4+Z/V3K)( `&JV31L$H& 6C !PP6B=1$<!+=8 `&JV31L$H


M)G429AV#S"")B\M\*KIV%(R"43 *B 1#I4+Z/V3K)( `HE,_"5EPM$X:!:- 
M% Q1,%HGT0C 2W6 `*)?G819V8S62:- %(R"(02&4(7T?\C620`!-,3J) 88
MH*YK1\$H& 6C " 8K9-H!^"E.D `T6^-`Q:[1_M)HV 4C((A`H96A?1_R-9)


M:(7T?\C620`!-%HGC8)1, I&`4XP6B?1!\!+=8 `&JV31L$H& 6C`#L8NA72
M_R%;)P$$T&B=- I&P2 8!=C!:)U$-P`OU0$":+1.& 6C8!2,`BQ 2%=(_X=L
MG0000*-UTB 8!:- %*"#H5XA_1^R=1) `(W62:- %(R"48 .1NLD. -XJ0X0
M0*-UTB 8!:- %*" 85 A_1^R=1) `(W62:- %(R"48 "1NLD^ -XJ0X00*-U



M"^"E.D `C8[=C8)1, I&.AA^%=+_(5LG`030:)TT"D;!*!CI8+1.&G `+]4!
M`HA6=1*: 6 S3R1Q"1H^"D;!*! %9(-A62']'[)U$D `D5PG(73BUH)9QR S


M-JA8)Z%5K:- %(R"44 V&*V3!AQ %O ``42/-0ZC_:11, I&P6 #P[M"^C]D

M_KJ[`02#P0VC8!2, J$+1NLDJ ."QA)9)P$$T&B=- I&P2 866 D5$C_:5\G
M(8]I,6 #R,K^8ZM^L H"!!"MQNYH"D;KI%$P"D8!>6"$5$C_:5PG81J")H)9
M#Z
M6B>- E$P"D8*&#D5TG]ZS2>A52IXN$3VDP`"B)PZ:73L;A2, E$P%,%HG?2?


MCH)1, H&/QBMD^! 4)6?F'420 "-CMV- E$P"H8Y&($5TO\A6R<!!-!HG30*

M%<& "M-1, I&P: %([9"^C]DZR2 `")SS^QHG30*1L$H&/Q M$["!(.J_,2L

MUDE8Q0=YG0000*-K'$;!*! %PQ",\ KI/]W/%H(+HBF #ZIA[<E UDD``411
M/VET[&X4C()1,#C!:)U$5)W$0#I


M""!Z])-&ZZ11, I&`=W :)T$`8-J?Q(N2S$+?( `&H"QN]$Z:12, E% (S!:
M(<'!$*V3``*(G#II



MD,%HA80)AFB=!!! =%H+3ETP&-PP"D;!*! \8+1.P 1#M
MP= &HQ425C!$ZR2 `!JMDT;!*! %0QN,UDE8P1"MDP`":+1.& 6C8!0,83!:
M(>$"0[1.` B T3II%(R"43"$P6B=A L,T3H)((!&ZZ11, I&P5 %HQ42'C!$
MZR2 `!JMDT;!*! %0Q6,UDEXP!"MDP`":+1.& 6C8!0,23!:(>$'0[1.` B 
MT3II%(R"43 DP6B=A!\,T3H)((!&ZZ11, I&P= #HQ4203!$ZR2 `!JMDT;!
M*! %0P^,UDD$P1"MDP`":+1.& 6C8!0,,3!:(1$#AFB=!!! HW72*! %HV"(
M =$ZB1 P1.LD  `:K9-&P2 8!4,)C%9(1((A6B<!!-!HG30*1L$H&$I M$XB
M$ S1. D  .A4)Z%=MD02%ZMIM'#D*! %HV"0 ]$*B7 P1.LD  "B1YV$L S,
M((F+W\!1, I&P8 "HW42\6"(UDD``42G. FYZS-:)XV"43 *R "C%1))8(C6

M` B (58G$90:!:- % Q+,%HGD0J&:)T$$$ #L.X.<WH)#Q>7:31U[2 8!:- 

M$AE B-9)``$T6B>- E% !3!::-(.C(8M>6"(UDD``31:)XV"44 I !2:HT4G
MC<!HP)('AFB=!!! HW72*! %% %XB3E:=-("C(8JV6"(UDD``31:)XV"44 ^



M.1 -0\K!$*V3``)HM$X:!:. !$!D63E:I%((1 .0<C!$ZR2 `!JMDT;!*" 6
MD%10CI:J9(/1H*,*&*)U$D `C=9)HV 4$ 5(+2A'"U:RP6C0404,T3H)((!&
MZZ11, H(`_)*R=&RE0PP&FC4`D.T3 ((H-$Z:12,` * [%)RM' E`XP&&K7 
M$*V3``)HM$X:!:,`'Z"PB!PM84D"H\%%13!$ZR2 `!JMDT;!*, )*"\B1PM9




M`IH6CJ,E+RXP&C*T`$.T3 (((#K526 `V5*2ZB0T$T;!** BH'7).%KR8 6C
MP4(C,$3K)( `&H U#J/]I%$P" %]2L;1\A<3C(8)C< 0K9,``FA ZB2TWA(>
M+AYS1L$HH!: 6[$X6OZB =$`H1T8HG420 "-[D\:!2,=T+E8'"V%D<%H:- .

M-!QH"H9HG0000*-UTB 8H6! "\31XG T!& -AFB=!!! HW72*!B)8, +Q %W
MP("#T1" -1BB=1) `(W62:- Q(%!4AH.$F?0'P`]/F+]3D\P1.LD  `:K9-&





MK9,``FBT3AH%PPH,W<)QB#K[_U!V^? &0[1.` B T3II% P3,'1K(S 8<NX?
M!F$^C,$0K9,``FBT3AH%0QX,FY)Q:/EB:+EV!((A6B<!!-!HG30*AC 8-K41


MHX 2,.QK(S 8;-X<.2$_G, 0K9,``HA^=1)F93-:)XT"(L%(*Q,'E6<'E6-&

MW:/]I%& &XSP`G% _3["`W]X "%:)P$$$)W6."#W;]#Z.OBYN



MJ]$`'[% B-9)``$T6B>-`GJ T6*1$D!&N(T&^" 8HG420 "-UDFC (9 M&2D

M #I M%BD-< ,V]$P'P5XP!"MDP`":+1.& 7D ]$N$3T!/)!'PWP4$ .&:)T$

M&JV31 $^,#HZ-PI&P1 %0[1.` B T3II%" `< TT6 ^- E$PI,$0K9,``FBT
M3AJ)`+/N&:V!1L$H&&9 B-9)``$T6B<-6X"KXAFM>T;!*! )8(C620`!-%HG


M20B7X782.:X=\*[W(.O7CX)1, J&-QBB=1) `(V8. F?C;3LP8R"43 *1L% 
M "%:)P$$T$BIDT8KF%$P"D;!B )#M$X""* A5B<QC()1, I&P2  ` S1. D 
M (98G410BA8N&;5E\- R;#PR:LL M&+4E $$<-<"!-! K), `+\:^KADU);!


MHV 4C()1, H&&,#K
M8 "O P`":+1.& 6C8!2, E$PP !>!P$$X-Z,<0``01CH_U_M1 Q8H-K!R.AP
M;0G$B9__)'?G5)X]253L42N!Y"2A(L1=B0E5G/G<!LM'*$1FY3I85=,01S(`


M2::1EXF(CV7*?80G6G$E8^)M(9 !\1M+HY*-_ #N?H `W%4Q$ ` "/K_JVMK
M2$!3N^MR<PA25#ZI1X10MP4<#N4EYV


MS2>)2Y(M)%42Y%E!O&D45A $N63'#C


M#R8




M  $&\#H(((!&ZZ11, I&P2 8!0,,X'400 "-UDFC8!2, E$P" 88P.L   `:





MC_\8W2!D76B,_WCK)/S=IO^CV7 4C& `3_P``31:)XV"40`%:-4)FC N]5 5

MHV"D`7CB!PB T3II%(P", '^7$-&GAK-AJ- Q )XX <(H-$Z:12,`O(!KJ4*
MHQ72*! %) %X^ <(H-$Z:12, E$P"D;!``-X'0000*-UTB 8!:- %(R"`0;P
M.    $;KI%$P"D;!*! %`PS =1! `(W62:- %(R"43 *!AC ZR" `!JMDT;!


M& 6C8!2,  $&\#H((("P'"(Y"D;!*! %HV 4T!E *B. `!KM'HV"43 *1L$H
J&"P`((!&ZZ11, I&P2 8!8,%``08`&FN<"M[6$Y2`````$E%3D2N0F""
`
end
Aug 11 2004
next sibling parent reply Sean Kelly <sean f4.ca>s writes:
In article <cfdoer$13pt$1 digitaldaemon.com>, van eeshan says...
1) making claims about D's perfomance superiority over Java due to the
former's ability to use the stack (versus always on the heap) is based on
speculation; not fact. Testing shows this to be entirely the opposite for
arrays beyond 64 bytes in length. Similarly, any kind of performance testing
against C/C++ within this realm will be viewed as a negative mark against D.
Regan, if I remember correctly, reported a bug where executable size is proportional to the size of static arrays the application contains. This performance issue sounds like it may be related.
2) Implicit initialization is great. However, it can clearly and easily be
demonstrated to be a serious impediment. Given that D is supposed to be a
'systems' language, I humbly suggest that a mechanism be provided to bypass
such nicities when one is prepared to assume full responsibility. For
example, there might be a qualifier for variables that disables the
automatic initialization (along the lines of the 'auto' qualifier).
An impediment how? Just in terms of performance? In general, I would say that default initialization is worth the performance cost for its ability to prevent bugs. Sean
Aug 11 2004
next sibling parent stonecobra <scott stonecobra.com> writes:
2) Implicit initialization is great. However, it can clearly and easily be
demonstrated to be a serious impediment. Given that D is supposed to be a
'systems' language, I humbly suggest that a mechanism be provided to bypass
such nicities when one is prepared to assume full responsibility. For
example, there might be a qualifier for variables that disables the
automatic initialization (along the lines of the 'auto' qualifier).
An impediment how? Just in terms of performance? In general, I would say that default initialization is worth the performance cost for its ability to prevent bugs.
That's why I think the ability to have it on by default is great. If it were able to be turned off, that is no different from me reusing a buffer. The buffer is only initialized once, yet I re-use it thousands of times without much problem. Performance does and will always matter, regardless of the secret contract between Intel and Microsoft to castrate newer, faster PCs. MHO, of course. Scott
Aug 11 2004
prev sibling parent reply "van eeshan" <vanee hotmail.net> writes:
"Sean Kelly s" <sean f4.ca> wrote in message
news:cfe4hu$19cf$1 digitaldaemon.com...
 In article <cfdoer$13pt$1 digitaldaemon.com>, van eeshan says...
1) making claims about D's perfomance superiority over Java due to the
former's ability to use the stack (versus always on the heap) is based on
speculation; not fact. Testing shows this to be entirely the opposite for
arrays beyond 64 bytes in length. Similarly, any kind of performance
testing
against C/C++ within this realm will be viewed as a negative mark against
D.
 Regan, if I remember correctly, reported a bug where executable size is
 proportional to the size of static arrays the application contains.  This
 performance issue sounds like it may be related.
AFAIK this is not related in the slightest, I'm afraid. This topic is purely about CPU cycles, and the stack. Nothing to do with static data or executable size.
2) Implicit initialization is great. However, it can clearly and easily
be
demonstrated to be a serious impediment. Given that D is supposed to be a
'systems' language, I humbly suggest that a mechanism be provided to
bypass
such nicities when one is prepared to assume full responsibility. For
example, there might be a qualifier for variables that disables the
automatic initialization (along the lines of the 'auto' qualifier).
An impediment how? Just in terms of performance? In general, I would say
that
 default initialization is worth the performance cost for its ability to
prevent
 bugs.
I don't think anything negative was said about default initialization as a feature, Sean. If it was then I apologize. Default/implicit initialization is great (as was noted; so we agree), but there are times when you need to take full control over what's going on. It may not matter to you specifically, but there are many situations whereby one needs to take full responsibility over a select set of variables (such as temporary stack-based buffers). Providing the mechanism as an /option/ is not detrimental to the language. It's like applying "const", or one of the other optional qualifiers. I'm just illustrating that the language could really use a means to disable default-initialization for those cases where it is deemed both completely unecessary and too costly. Surely that's not too much to consider?
Aug 11 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <cfe60s$1a2i$1 digitaldaemon.com>, van eeshan says...
"Sean Kelly s" <sean f4.ca> wrote in message
news:cfe4hu$19cf$1 digitaldaemon.com...

 Regan, if I remember correctly, reported a bug where executable size is
 proportional to the size of static arrays the application contains.  This
 performance issue sounds like it may be related.
AFAIK this is not related in the slightest, I'm afraid. This topic is purely about CPU cycles, and the stack. Nothing to do with static data or executable size.
Ah okay. I thought perhaps the current code generation involving arrays had issues that might be corrected in the future.
 An impediment how?  Just in terms of performance?  In general, I would say
that
 default initialization is worth the performance cost for its ability to
prevent
 bugs.
I don't think anything negative was said about default initialization as a feature, Sean. If it was then I apologize. Default/implicit initialization is great (as was noted; so we agree), but there are times when you need to take full control over what's going on. It may not matter to you specifically, but there are many situations whereby one needs to take full responsibility over a select set of variables (such as temporary stack-based buffers). Providing the mechanism as an /option/ is not detrimental to the language. It's like applying "const", or one of the other optional qualifiers. I'm just illustrating that the language could really use a means to disable default-initialization for those cases where it is deemed both completely unecessary and too costly. Surely that's not too much to consider?
Not at all. However this might violate Walter's rule against pragmas. Even so, I had overlooked the fact that declaring an array would implicitly initialize all array elements. This kind of stinks from a performance perspective, since 90% of the time I use arrays I have no need for them to be initialized. Perhaps a new attribute? declare int x[10000000]; declare { float a[100000]; char b[1000000]; } Sean
Aug 11 2004
prev sibling next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Wed, 11 Aug 2004 15:21:50 -0700, van eeshan <vanee hotmail.net> wrote:
 "Sean Kelly s" <sean f4.ca> wrote in message
 news:cfe4hu$19cf$1 digitaldaemon.com...
 In article <cfdoer$13pt$1 digitaldaemon.com>, van eeshan says...
1) making claims about D's perfomance superiority over Java due to the
former's ability to use the stack (versus always on the heap) is based 
on
speculation; not fact. Testing shows this to be entirely the opposite 
for
arrays beyond 64 bytes in length. Similarly, any kind of performance
testing
against C/C++ within this realm will be viewed as a negative mark 
against
D.
 Regan, if I remember correctly, reported a bug where executable size is
 proportional to the size of static arrays the application contains.  
 This
 performance issue sounds like it may be related.
AFAIK this is not related in the slightest, I'm afraid. This topic is purely about CPU cycles, and the stack. Nothing to do with static data or executable size.
I'm not sure it was me who reported it initially, Arcane Jill was discussing it more recently.. I think what Sean was referring to by mentioning the size of executables is an indication of how static arrays are currently implemented, the array seems to be compiled into the executable, whether this is to increase performance or decrease complexity, we don't know.
2) Implicit initialization is great. However, it can clearly and easily
be
demonstrated to be a serious impediment. Given that D is supposed to 
be a
'systems' language, I humbly suggest that a mechanism be provided to
bypass
such nicities when one is prepared to assume full responsibility. For
example, there might be a qualifier for variables that disables the
automatic initialization (along the lines of the 'auto' qualifier).
An impediment how? Just in terms of performance? In general, I would say
that
 default initialization is worth the performance cost for its ability to
prevent
 bugs.
I don't think anything negative was said about default initialization as a feature, Sean. If it was then I apologize. Default/implicit initialization is great (as was noted; so we agree), but there are times when you need to take full control over what's going on. It may not matter to you specifically, but there are many situations whereby one needs to take full responsibility over a select set of variables (such as temporary stack-based buffers). Providing the mechanism as an /option/ is not detrimental to the language. It's like applying "const", or one of the other optional qualifiers. I'm just illustrating that the language could really use a means to disable default-initialization for those cases where it is deemed both completely unecessary and too costly. Surely that's not too much to consider?
After seeing the cost of default initialization, I can see when someone might not want it, it should be optional IMO. Perhaps it's a good default behaviour, I am un-convinced, AFAIKS un-initialised data generally causes a spectaclar and obvious failure, in fact shouldn't good DBC catch them, so, if I have to choose between default initialization and efficient arrays I choose the latter. default initialization also seems to be responsible for hiding my un-used variables? or is dmd just not looking for them, I guess an 'error' is overkill for an un-used variable, and there might possibly be a reason to use one, can anyone think of one? I am a tidy programmer and these bug me. Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 11 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfe60s$1a2i$1 digitaldaemon.com...
 I don't think anything negative was said about default initialization as a
 feature, Sean. If it was then I apologize. Default/implicit initialization
 is great (as was noted; so we agree), but there are times when you need to
 take full control over what's going on. It may not matter to you
 specifically, but there are many situations whereby one needs to take full
 responsibility over a select set of variables (such as temporary
stack-based
 buffers). Providing the mechanism as an /option/ is not detrimental to the
 language. It's like applying "const", or one of the other optional
 qualifiers.

 I'm just illustrating that the language could really use a means to
disable
 default-initialization for those cases where it is deemed both completely
 unecessary and too costly. Surely that's not too much to consider?
You can take full control in D by doing: byte* p = malloc(size); and p will point to an uninitialized buffer.
Aug 11 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfdoer$13pt$1 digitaldaemon.com...
 There's an interesting quirk at the 4096-byte mark: the codegen changes
 strategy slightly at this point, and I'd guess it is doing something akin
to
 paging (ensuring the stack page-set is available), though can't be sure.
 Take a look at the previously posted assembler for reference.
That's because of how the underlying OS allocates space to the stack. This shift in strategy happens with C/C++, too.
 If local (stack) arrays were to use the C/C++ approach, the time consumed
 would be ~10ms across the board, rather than the ~500ms for heap
 allocations. That is, for a 32KB local array, the initialization cost is
 ~865 times /more/ than it would be for the equivalent C/C++ approach
(~10ms
 versus ~8650ms); almost three orders of magnatude.
To do an apples/apples comparison, benchmark against calloc/free, not malloc/free. I normally use calloc in C++, not malloc, because the uninitialization bugs aren't worth it.
 This indicates two things:

 1) making claims about D's perfomance superiority over Java due to the
 former's ability to use the stack (versus always on the heap) is based on
 speculation; not fact. Testing shows this to be entirely the opposite for
No, it does not show that at all: 1) Your test was with C++, not Java. 2) You cannot allocate uninitialized arrays (or any uninitialized data) in Java. 3) I've written a Java compiler and have done a lot of performance tuning of Java apps trying to make them go faster. I'm very familiar with where the cycles got eaten. 4) I presume that someone allocating an array in C++ would presumably store something in it - just allocating it, doing nothing with it, and freeing it is about as representative as benchmarking the sequence of code: x = x + 1; x = x - 1;
 arrays beyond 64 bytes in length.
Most lightweight objects are less than 64 bytes in length. Allocating 1000 byte objects on the stack is very rare, not the usual case.
 Similarly, any kind of performance testing
 against C/C++ within this realm will be viewed as a negative mark against
D. I've had problems with inappropriate use of benchmarks since my compiler executed the following loop: void foo() { int i; double d; for (i = 0; i < 1000; i++) d += 1; } in orders of magnitude faster than the competition. The journalist disassembled the code, found that the compiler had removed the floating point instructions, and so declared the compiler to be "buggy" in his writeup. He never bothered to contact us to find out why. What actually happened was that it was the ONLY compiler reviewed that did data flow analysis, and concluded that the benchmark routine did nothing and so deleted it. Today, nearly all compilers do that.
 2) Implicit initialization is great. However, it can clearly and easily be
 demonstrated to be a serious impediment. Given that D is supposed to be a
 'systems' language, I humbly suggest that a mechanism be provided to
bypass
 such nicities when one is prepared to assume full responsibility. For
 example, there might be a qualifier for variables that disables the
 automatic initialization (along the lines of the 'auto' qualifier).
The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays.
Aug 11 2004
parent reply "van eeshan" <vanee hotmail.net> writes:
"Walter" <newshound digitalmars.com> wrote
 If local (stack) arrays were to use the C/C++ approach, the time
consumed
 would be ~10ms across the board, rather than the ~500ms for heap
 allocations. That is, for a 32KB local array, the initialization cost is
 ~865 times /more/ than it would be for the equivalent C/C++ approach
(~10ms
 versus ~8650ms); almost three orders of magnatude.
To do an apples/apples comparison, benchmark against calloc/free, not malloc/free. I normally use calloc in C++, not malloc, because the uninitialization bugs aren't worth it.
The whole point of that post was to selectively and optionally /inhibit/ initialization, where that eats far too many cycles. That's why malloc() was used rather than calloc(). There are plenty times where you need to avoid the (totally superfluous) overhead with workspace buffers. You were stating that heap allocation is a lot slower than stack allocation. Well, that is absolutely not true when you don't want the initialization; heap allocation is (bizarrely) faster within D since you /cannot/ inhibit the darned array initialization when you really need to. C/C++ stack allocations are blindingly fast by comparision; just a couple of instructions in many cases.
 1) Your test was with C++, not Java.
That was D source-code.
 4) I presume that someone allocating an array in C++ would presumably
store
 something in it - just allocating it, doing nothing with it, and freeing
it
 is about as representative as benchmarking the sequence of code:

     x = x + 1;
     x = x - 1;
Thanks Walter ... you have never heard of allocating a stack buffer, and then passing it off to another function for population? Yes, something else is done with the buffer, but often there's only a very /small/ part of it used. The overhead of initialization totally swamps the typical processing time. Do that often enough (in, say, something like text processing) and it adds up rapidly. You've never done this kind of thing before? Fine. This was a test to see where the unknown overhead was coming from during a C port; naturally, the test was whittled down to isolate the culprit. Isn't that how you like them?
 Most lightweight objects are less than 64 bytes in length. Allocating 1000
 byte objects on the stack is very rare, not the usual case.
What usual case? Do you mean in all software? Please be a little more specific, as "usual case" is terribly vague. Allocating double[1024], char[8192], and often much bigger, on the stack is not at all rare; far from it. How about just one example? Microsoft does it /all/ the time in their Operating Systems, IE, and probably all their other mainstream software. Why do you think AMD and Intel have the NX instruction in their CPUs now? What's nice about D is that you always know the length of the array, and debug options catch out-of-bounds conditions. Rare? I'd claim it's perhaps a rather usual case in existing C and C++ code. May I ask where you get your use-cases from, please?
 I've had problems with inappropriate use of benchmarks since my compiler
 executed the following loop:

 void foo()
 {  int i;
     double d;
     for (i = 0; i < 1000; i++)
         d += 1;
 }

 in orders of magnitude faster than the competition. The journalist
 disassembled the code, found that the compiler had removed the floating
 point instructions, and so declared the compiler to be "buggy" in his
 writeup. He never bothered to contact us to find out why. What actually
 happened was that it was the ONLY compiler reviewed that did data flow
 analysis, and concluded that the benchmark routine did nothing and so
 deleted it. Today, nearly all compilers do that.
You didn't bother to ask about the code-paths that prompted this topic, Walter; yet seem to go out of your way to belittle any and all points made; and then vehemently deny any possible existence of the problem. I really don't understand why this post has apparently inflamed you so, but do offer sincere and profound apologize. One would have thought knowing where the codegen was seriously non-optimal for given scenarios would be helpful to you. Would you rather not know?
 2) Implicit initialization is great. However, it can clearly and easily
be
 demonstrated to be a serious impediment. Given that D is supposed to be
a
 'systems' language, I humbly suggest that a mechanism be provided to
bypass
 such nicities when one is prepared to assume full responsibility. For
 example, there might be a qualifier for variables that disables the
 automatic initialization (along the lines of the 'auto' qualifier).
The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays.
Great! Will it eliminate initialization for this type of (trivail) usage: void A (void delegate (char[]) formatter) { char [1024 * 32] workspace; formatter (workspace); } Because, if not, then the following modification is the sort of (keyword qualifier) thing that would hand control back to the programmer: { uninitialized char[1024 * 32] workspace; formatter (workspace); } That's perhaps not the best name for it, but you get the idea. It's very clear what's going on, it's entirely optional, and the compiler will do exactly what the programmer intends it to do. No under-the-tablecloth-magic required. Oh, and it's easy to grep for. It's just a suggestion.
Aug 12 2004
next sibling parent Derek Parnell <derek psych.ward> writes:
On Thu, 12 Aug 2004 00:31:38 -0700, van eeshan wrote:

 "Walter" <newshound digitalmars.com> wrote
[snip]
 The solution is improving the optimizer so it can determine when the
 initialization is redundant. It already does this for most variables, just
 not for arrays.
Great! Will it eliminate initialization for this type of (trivail) usage: void A (void delegate (char[]) formatter) { char [1024 * 32] workspace; formatter (workspace); } Because, if not, then the following modification is the sort of (keyword qualifier) thing that would hand control back to the programmer: { uninitialized char[1024 * 32] workspace; formatter (workspace); } That's perhaps not the best name for it, but you get the idea. It's very clear what's going on, it's entirely optional, and the compiler will do exactly what the programmer intends it to do. No under-the-tablecloth-magic required. Oh, and it's easy to grep for. It's just a suggestion.
Sounds like a good idea. It allows a coder to take some responsibility for their own work. There is also nothing wrong with D's auto-initialized arrays *when* the coder accepts them too. I feel that D ought to support both ideas. A new keyword is probably the way to go. -- Derek Melbourne, Australia 12/Aug/04 5:54:15 PM
Aug 12 2004
prev sibling next sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
[snip]

 2) Implicit initialization is great. However, it can clearly and easily
be
 demonstrated to be a serious impediment. Given that D is supposed to be
a
 'systems' language, I humbly suggest that a mechanism be provided to
bypass
 such nicities when one is prepared to assume full responsibility. For
 example, there might be a qualifier for variables that disables the
 automatic initialization (along the lines of the 'auto' qualifier).
The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays.
Great! Will it eliminate initialization for this type of (trivail) usage: void A (void delegate (char[]) formatter) { char [1024 * 32] workspace; formatter (workspace); } Because, if not, then the following modification is the sort of (keyword qualifier) thing that would hand control back to the programmer: { uninitialized char[1024 * 32] workspace; formatter (workspace); } That's perhaps not the best name for it, but you get the idea. It's very clear what's going on, it's entirely optional, and the compiler will do exactly what the programmer intends it to do. No under-the-tablecloth-magic required. Oh, and it's easy to grep for. It's just a suggestion.
As a general principle I agree it would be nice to be able to skip initialization just like it's nice to be able to turn off array-bounds checking. But in practice (in D) I wonder how often that would pay off. Allocating a 32K array on the stack is pretty evil, and I wonder how often that really happens. What C code were you porting that was doing that? I assume it was allocating such a big array not because it was going to use it all (as you said only a small portion of the array gets used) but because the coder was lazy and was trying to avoid worrying about array bounds errors. As we all know overwriting array bounds invites exploits and bugs. In D the way I've been coding APIs that take buffers as input is to take a dynamic array as "a typical size buffer" and if the buffer turns out to be too small then the function increases the length field of the local variable which could potentially allocate new GC'ed memory. For example in std/stream.d there is the function char[] readLine(char[] buffer); which takes an input buffer and fills it with the next line of text from the stream. If the line fits in the buffer the output array's length is the number of characters read and the char* is the buffer's char*. If the line didn't fit then readLine (potentially) allocates a new buffer (just by increasing the length field of the local copy of the buffer variable) and returns the char* of the new buffer. The "overflow" buffer is garbage collected eventually so there isn't any memory leak going on. So in essence the input buffer is just a suggested buffer for readLine but there is no guarantee that the buffer will contain the result of the funtion call when it is done. This approach means you can target your buffers to the typical usage and avoid array-bounds errors.
Aug 12 2004
parent reply "van eeshan" <vanee hotmail.net> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:cfg7af$2dbv$1 digitaldaemon.com...
 [snip]

 2) Implicit initialization is great. However, it can clearly and
easily
 be
 demonstrated to be a serious impediment. Given that D is supposed to
be
 a
 'systems' language, I humbly suggest that a mechanism be provided to
bypass
 such nicities when one is prepared to assume full responsibility. For
 example, there might be a qualifier for variables that disables the
 automatic initialization (along the lines of the 'auto' qualifier).
The solution is improving the optimizer so it can determine when the initialization is redundant. It already does this for most variables, just not for arrays.
Great! Will it eliminate initialization for this type of (trivail)
usage:
 void A (void delegate (char[]) formatter)
 {
     char [1024 * 32] workspace;

     formatter (workspace);
 }

 Because, if not, then the following modification is the sort of (keyword
 qualifier) thing that would hand control back to the programmer:

 {
     uninitialized char[1024 * 32] workspace;

     formatter (workspace);
 }

 That's perhaps not the best name for it, but you get the idea. It's very
 clear what's going on, it's entirely optional, and the compiler will do
 exactly what the programmer intends it to do. No
 under-the-tablecloth-magic required. Oh, and it's easy to grep for. It's
 just a suggestion.
As a general principle I agree it would be nice to be able to skip initialization just like it's nice to be able to turn off array-bounds checking. But in practice (in D) I wonder how often that would pay off. Allocating a 32K array on the stack is pretty evil, and I wonder how often that really happens.
Is allocating a 2K array on the stack equally evil? From the reply to AJ: 2K allocations: heap: 504 stack[]: 5681 alloca(): 57 asm: 10 32K allocations: heap: 520 stack[]: 8812 alloca(): 118 asm: 10 You can see that the 2K array initialization is almost equally obnoxious, compared to the traditional [asm] C/C++ uninitialized approach. The whole point is that here's a scenario where the compiler thinks it knows best, and won't hand over control unless we run rings around it. Perhaps worse, the whole issue is squirrelled away under the covers; you have to dig like a rabbit just to find the culprit.
 What C code were you porting that was doing that? I
 assume it was allocating such a big array not because it was going to use
 it all (as you said only a small portion of the array gets used) but
 because the coder was lazy and was trying to avoid worrying about array
 bounds errors.
Heavy-duty text processing. The coder wasn't being lazy at all, as out-of-bounds conditions are being checked for. The allocated (temporary) workspace is typically used in only a small amounts, but sporadically will get used more fully. The approach taken was apparently "we'll check for end-of-buffer, but if anything exceeds these limits, then it's so beyond the design-specification that it's an error condition". I understand that to be a rather common approach. It's certainly valid, and rather effective. The D approach to this is, " ...you /vill/ pay ze initialization cost, regardless of vot ze design!" (that is a caricature; no offense intended to anyone) The existing code simply took advantage of the fact that it's perfectly legitimate, and very fast, to do that kind of thing in C and C++. If D is going to outlaw such approaches then it had better get it down in the "manifesto". However, I didn't realize that D was such an ideological exercise. <snip>
 it is done. This approach means you can target your buffers to the typical
 usage and avoid array-bounds errors.
Yes, this is all well and good. But /redesigning/ existing software to suit is just not a solution. While the approach you discuss is perfectly valid, so was the original (given its apparent design scope) since it did not have array-bounds or buffer-overrun conditions. Further, you can't reliably place an extensible buffer on the stack; thus, you are forced to forego the performance benefits of doing so. This thread is not about a design-methodology issue ~ if it were, nobody would be allowed to use the stack for array allocation, at all! It's a simple issue of the D language not providing the programmer with "accessible" tools to decide what's best. In this particular case it showed up in ported code. There's nothing to say that it won't show up in original D software either. As you say, Ben; as a general principal, D should (absolutely) permit the skipping of array initialization. The whole default initialization thing is arguably wonderful, but then not providing accessible control over it is akin to tinpot-dictatorship. For example, I will probably resort to assembly to get around this issue: that's not exactly an accessible or acceptable solution, for obvious reasons. I think we can agree on that basic tenet. Your other points are perhaps more about a design philosophy: that's a different ball-game and, applicable or not, unfortunately does not jive with porting existing code. The optional "keyword qualifier" example (at the top of the page) is likely to be as trivial as one could hope for within the compiler; combined with a little documentation, that would resolve such issues in a pragmatic and considerate manner. Surely that is far simpler than building an optimizer to read the programmers mind? Or, is there some fundamental ideology at stake here?
Aug 12 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfge6g$2g1j$1 digitaldaemon.com...
 Heavy-duty text processing. The coder wasn't being lazy at all, as
 out-of-bounds conditions are being checked for. The allocated (temporary)
 workspace is typically used in only a small amounts, but sporadically will
 get used more fully. The approach taken was apparently "we'll check for
 end-of-buffer, but if anything exceeds these limits, then it's so beyond
the
 design-specification that it's an error condition". I understand that to
be
 a rather common approach. It's certainly valid, and rather effective. The
D
 approach to this is, " ...you /vill/ pay ze initialization cost,
regardless
 of vot ze design!"
Here's an example of an alternate way to do it, for code that would typically use only a small amount, but occaisionally could use a great deal. It can also be set up to never overflow (at least until all memory is exhausted) by repeatedly resizing it. This has the advantage of not needing to set up any error recovery, or create and document an error message for the limits. import std.stdio; char[] formatter(char[] s) { if (s.length < 14) // if our passed buffer isn't big enough s.length = 14; // yes, we can reallocate things that were originally on the stack s[0..14] = "abcdefghijklmn"; // copy result into s[] return s; } void main() { char[4] workspace; // assume 4 is size typically needed in formatter() char[] s; s = formatter(workspace); writefln(s); }
Aug 12 2004
next sibling parent reply "van eeshan" <vanee hotmail.net> writes:
"Walter" <newshound digitalmars.com> wrote ...
 Here's an example of an alternate way to do it, for code that would
 typically use only a small amount, but occaisionally could use a great
deal. Thanks Walter, for the alternate implementation. Ben had already suggested that, and I am aware of various other approaches too. However, let's not confuse this topic with that of design-methodology. As was noted (in the post you replied to) this is a port. Do you think it reasonable to state, " ... Now listen up you ...scurrilous ... miserable bunch of C coders! If you want to join the D party, then you're gonna' have to shape-up, and drop all those implementation approaches that were valid and comfortable. We don't have any of that nonsense over here! If you want your damned existing code to run fast, well then, by golly you'd better make sure you redesign it too!" ? ... and all because there's no accessible way to inhibit default initialization? The issues at hand are several: 1) the original implementation was, and still is, a perfectly valid approach (given the requirements). It is also executes rather efficiently. The same pattern is quite likely to occur in original D code also. Let's try to avoid whether other approaches are better or worse; that doesn't help. Not even the best education can thoroughly resolve such problems. For the record, your alternative still needs an additional boundary-check plus error-message for equivalent pathological conditions (out of memory). 2) what you are now suggesting is that the program should be redesigned instead. That is simply not realistic for working, optimal code. The code compiled and executed without too much effort, but it exhibits performance issues. That's what drove the foray into this topic. I will probably fix it with the assembly-code thoughtfully provided by AJ (and thanks to Ben also for both his suggestions) but that's hardly an accessible, or portable, resolution in the general sense. 3) the solution offered (by both yourself and by Ben) is not allocated on the stack. The whole point here is to utilize the stack because it's a much faster means of providing a temporary workspace. Going to the heap means a partial redesign, and the program will still run slower in D than it did originally. Forcing stack arrays to initialize will /always/ be dramatically slower than the equivalent C/C++ code, whereas a simple embellishment to the existing initialization mechanism would resolve that completely; and in a jiffy. This is about choice, Walter, and supporting the programmer in what they are doing. 4) the approach to addressing this seems to be one of "guilty until proven innocent". That is, there doesn't seem to be any notion on your part that this might, actually, be a valid observation. With solid number to back it up. You're basically saying, " ... the approach never had, and never will, have any validity whatsoever. Rip out the code and do it just like I would". I do sincerely hope that I'm misunderstanding you, but that's kinda' how it has been coming across. May I ask why some easy manual control over default-initialization is such a terrible thing? Is it really /so/ impossible that you'd rather force people into re-implementing and retesting parts of their existing codebase instead (as you suggest above) ? Or is there perhaps some fundamental ideology at stake, that I just don't see? "Walter" <newshound digitalmars.com> wrote in message news:cfgqv4$2lq9$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfge6g$2g1j$1 digitaldaemon.com...
 Heavy-duty text processing. The coder wasn't being lazy at all, as
 out-of-bounds conditions are being checked for. The allocated
(temporary)
 workspace is typically used in only a small amounts, but sporadically
will
 get used more fully. The approach taken was apparently "we'll check for
 end-of-buffer, but if anything exceeds these limits, then it's so beyond
the
 design-specification that it's an error condition". I understand that to
be
 a rather common approach. It's certainly valid, and rather effective.
The
 D
 approach to this is, " ...you /vill/ pay ze initialization cost,
regardless
 of vot ze design!"
Here's an example of an alternate way to do it, for code that would typically use only a small amount, but occaisionally could use a great
deal.
 It can also be set up to never overflow (at least until all memory is
 exhausted) by repeatedly resizing it. This has the advantage of not
needing
 to set up any error recovery, or create and document an error message for
 the limits.

 import std.stdio;

 char[] formatter(char[] s)
 {
     if (s.length < 14)        // if our passed buffer isn't big enough
         s.length = 14;        // yes, we can reallocate things that were
 originally on the stack
     s[0..14] = "abcdefghijklmn";    // copy result into s[]
     return s;
 }


 void main()
 {   char[4] workspace;    // assume 4 is size typically needed in
 formatter()
     char[] s;

     s = formatter(workspace);
     writefln(s);
 }
Aug 12 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfgvi5$2npu$1 digitaldaemon.com...
 1) the original implementation was, and still is, a perfectly valid
approach
 (given the requirements).
And it is valid and works in D.
 It is also executes rather efficiently. The same
 pattern is quite likely to occur in original D code also. Let's try to
avoid
 whether other approaches are better or worse; that doesn't help. Not even
 the best education can thoroughly resolve such problems. For the record,
 your alternative still needs an additional boundary-check
Yes.
 plus error-message
 for equivalent pathological conditions (out of memory).
This is not necessary, because very few programs can proceed if they run out of memory, so they just issue an error message and abort. In D, this is handled automatically - running out of memory throws an exception with a message in it, if you don't catch the exception, the message is printed and the program aborted. For nearly all programs, this is the right behavior.
 2) what you are now suggesting is that the program should be redesigned
 instead. That is simply not realistic for working, optimal code. The code
 compiled and executed without too much effort, but it exhibits performance
 issues. That's what drove the foray into this topic. I will probably fix
it
 with the assembly-code thoughtfully provided by AJ (and thanks to Ben also
 for both his suggestions) but that's hardly an accessible, or portable,
 resolution in the general sense.
I suggest the alloca() approach, and do some timing tests on the real app with it vs AJ's approach. I'm willing to bet you a beer that you won't be able to measure the difference, especially if the typical use of the array will fill it 2 or 3 Kb worth.
 3) the solution offered (by both yourself and by Ben) is not allocated on
 the stack.
The alloca() version is.
 The whole point here is to utilize the stack because it's a much
 faster means of providing a temporary workspace. Going to the heap means a
 partial redesign, and the program will still run slower in D than it did
 originally. Forcing stack arrays to initialize will /always/ be
dramatically
 slower than the equivalent C/C++ code, whereas a simple embellishment to
the
 existing initialization mechanism would resolve that completely; and in a
 jiffy. This is about choice, Walter, and supporting the programmer in what
 they are doing.

 4) the approach to addressing this seems to be one of "guilty until proven
 innocent". That is, there doesn't seem to be any notion on your part that
 this might, actually, be a valid observation. With solid number to back it
 up. You're basically saying, " ... the approach never had, and never will,
 have any validity whatsoever. Rip out the code and do it just like I
would".
 I do sincerely hope that I'm misunderstanding you, but that's kinda' how
it
 has been coming across.

 May I ask why some easy manual control over default-initialization is such
a
 terrible thing? Is it really /so/ impossible that you'd rather force
people
 into re-implementing and retesting parts of their existing codebase
instead
 (as you suggest above) ?

 Or is there perhaps some fundamental ideology at stake, that I just don't
 see?
Uninitialized data can be a source of bugs and trouble, as I pointed out in my other posting, even when used correctly. One design goal of D is to improve reliability and portability by eliminating sources of undefined behavior, and uninitialized data is one huge source of undefined, unportable, erratic and unpredictable behavior. I'm trying to squeeze that out of the normal constructs in D - but access to C functions and inline assembler is there for folks who really need it. Give the other methods presented a chance, and time them in context.
Aug 12 2004
next sibling parent reply Michael <Michael_member pathlink.com> writes:
Just out of curiosity, how does alloca() work? I guess it just manipulates the
Stack and Base Pointers?

My apologies for going a bit off topic, I've just never seen alloca() before.
Aug 13 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Michael" <Michael_member pathlink.com> wrote in message
news:cfhr6d$9qj$1 digitaldaemon.com...
 Just out of curiosity, how does alloca() work? I guess it just manipulates
the
 Stack and Base Pointers?

 My apologies for going a bit off topic, I've just never seen alloca()
before. It manipulates the ESP register. There is some complexity in it due to needing to 'touch' each 4k page allocated, and to relocate expression temporaries that are pushed on the stack. The source for it comes on the DMC++ CD.
Aug 13 2004
parent Michael <Michael_member pathlink.com> writes:
Ahh, I hadn't thought about expression temps. Very interesting stuff. I think I
may order that CD. 

Thanks very much for the information.


 Just out of curiosity, how does alloca() work? I guess it just manipulates
the
 Stack and Base Pointers?

 My apologies for going a bit off topic, I've just never seen alloca()
before. It manipulates the ESP register. There is some complexity in it due to needing to 'touch' each 4k page allocated, and to relocate expression temporaries that are pushed on the stack. The source for it comes on the DMC++ CD.
Aug 13 2004
prev sibling parent reply "van eeshan" <vanee hotmail.net> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:cfhk9n$5jh$1 digitaldaemon.com...
 plus error-message
 for equivalent pathological conditions (out of memory).
This is not necessary, because very few programs can proceed if they run
out
 of memory, so they just issue an error message and abort. In D, this is
 handled automatically - running out of memory throws an exception with a
 message in it, if you don't catch the exception, the message is printed
and
 the program aborted. For nearly all programs, this is the right behavior.
Perhaps you haven't had the opportunity to write resiliant software? For example, halting an entire web-site, because some file is not in the correct format, is just not a solution. Frankly, I'm really surprised you'd encourage something so obviously naiive in this day and age. With a server, you do whatever is necessary to track and release allocations to ensure the server stays alive as long as it can. Even better; you focus on temporarily /stack/ allocations (surprise!), and thread-local reusable buffers, so there's never any issue about running out of memory (as a bonus, it's all thread-safe). Please don't waste your efforts on me in this respect: Been there. Done that. Seen the movie a gazillion times. Optimize it while I sleep.
 2) what you are now suggesting is that the program should be redesigned
 instead. That is simply not realistic for working, optimal code. The
code
 compiled and executed without too much effort, but it exhibits
performance
 issues. That's what drove the foray into this topic. I will probably fix
it
 with the assembly-code thoughtfully provided by AJ (and thanks to Ben
also
 for both his suggestions) but that's hardly an accessible, or portable,
 resolution in the general sense.
I suggest the alloca() approach, and do some timing tests on the real app with it vs AJ's approach. I'm willing to bet you a beer that you won't be able to measure the difference, especially if the typical use of the array will fill it 2 or 3 Kb worth.
You are well aware that this is not about alloca() versus assembly.
 3) the solution offered (by both yourself and by Ben) is not allocated
on
 the stack.
The alloca() version is.
Was referring to the example Ben gave that looked rather like your own one. The one he noted in the post you recently replied to.
 The whole point here is to utilize the stack because it's a much
 faster means of providing a temporary workspace. Going to the heap means
a
 partial redesign, and the program will still run slower in D than it did
 originally. Forcing stack arrays to initialize will /always/ be
dramatically
 slower than the equivalent C/C++ code, whereas a simple embellishment to
the
 existing initialization mechanism would resolve that completely; and in
a
 jiffy. This is about choice, Walter, and supporting the programmer in
what
 they are doing.

 4) the approach to addressing this seems to be one of "guilty until
proven
 innocent". That is, there doesn't seem to be any notion on your part
that
 this might, actually, be a valid observation. With solid number to back
it
 up. You're basically saying, " ... the approach never had, and never
will,
 have any validity whatsoever. Rip out the code and do it just like I
would".
 I do sincerely hope that I'm misunderstanding you, but that's kinda' how
it
 has been coming across.

 May I ask why some easy manual control over default-initialization is
such
 a
 terrible thing? Is it really /so/ impossible that you'd rather force
people
 into re-implementing and retesting parts of their existing codebase
instead
 (as you suggest above) ?

 Or is there perhaps some fundamental ideology at stake, that I just
don't
 see?
Uninitialized data can be a source of bugs and trouble, as I pointed out
in
 my other posting, even when used correctly. One design goal of D is to
 improve reliability and portability by eliminating sources of undefined
 behavior, and uninitialized data is one huge source of undefined,
 unportable, erratic and unpredictable behavior. I'm trying to squeeze that
 out of the normal constructs in D - but access to C functions and inline
 assembler is there for folks who really need it.
Most admirable, Walter. And agreed, as I've stated in virtually every single post on this topic. But here's where that position truly falls apart ... what about those multitude of cases where buffers are reused over and over again? What about thread-local workspaces? What about pooled buffers? what about all those X, Y, and Z variations? They do /not/ get automagically re-initialized. Your argument, as you state it, is one-shot only. That is, the whole basis of said argument hold validity only as long as the "uninitialized data" is used once, and once only. Anyone who produces high-performance designs and/or code will chortle with glee at this glaring logic hole (where such buffer are deliberately reused thousands, sometimes millions, of times per second). Such systems do not have the time to spend clearing out each little byte of data ... they simply ensure that it's never a problem. Can we please get past this level of discussion? You know; in the time you've taken over these postings, you could probably have implemented a keyword qualifier to optionally inhibit the default initialization. Again, we're talking about a deliberate action by the programmer to say " I don't want that frickin CPU gobblin initialization!". From the developers position, your noble ideology has already been placed to one side at that point. The D language should be able to help there, rather than just get in the way. Remember? It's a "Practical Language for Practical Programmers". Please consider.
Aug 13 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
van eeshan wrote:
 
 Perhaps you haven't had the opportunity to write resiliant software? For
 example, halting an entire web-site, because some file is not in the correct
 format, is just not a solution. Frankly, I'm really surprised you'd
 encourage something so obviously naiive in this day and age.
I think Walter meant that this is typically the proper behavior for an out of memory condition. He also notes that an exception is thrown so "resilient software" has a chance to do something about the problem instead of aborting. C++ behaves the exact same way, however C++ does not automatically catch the error and display a meaningful message if the programmer forgot to write a try/catch block. Sean
Aug 13 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfhv05$bl9$1 digitaldaemon.com...
 Uninitialized data can be a source of bugs and trouble, as I pointed out
in
 my other posting, even when used correctly. One design goal of D is to
 improve reliability and portability by eliminating sources of undefined
 behavior, and uninitialized data is one huge source of undefined,
 unportable, erratic and unpredictable behavior. I'm trying to squeeze
that
 out of the normal constructs in D - but access to C functions and inline
 assembler is there for folks who really need it.
But here's where that position truly falls apart ... what about those multitude of cases where buffers are reused over and over again? What
about
 thread-local workspaces? What about pooled buffers?  what about all those
X,
 Y, and Z variations? They do /not/ get automagically re-initialized.
Right, they don't. However, data buffers are unlikely to contain garbage that looks like pointers into the gc heap, whereas garbage on the stack *is* very likely to. Next, detritus in recycled data buffers is not going to consist of objects that are no longer valid, the source of the other problem I mentioned. And lastly, D can't close every hole, but that doesn't mean it shouldn't close what it can.
Aug 13 2004
parent "van eeshan" <vanee hotmail.net> writes:
May I point you towards this post, which has an attempted summary of the
discussion thus far? news:cfhqhu$9jj$1 digitaldaemon.com

Would very much appreciate your comments on those points.

I think, perhaps, this unfolding picture indicates it's not a question of
ideology per se; but one of stack-usage being somewhat incompatible with the
current GC design. If there were some means to quickly eliminate what look
like old GC references from the (reused) portion of the stack that's still
live, then what Ben has suggested would presumably be more palatable?
Currently, the method employed to achieve this is to wipe everything to a
predefined default value; which can be agonizingly slow compared to the
C/C++ approach.

Is that a fair assessment?


"Walter" <newshound digitalmars.com> wrote in message
news:cfisfb$s39$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfhv05$bl9$1 digitaldaemon.com...
 Uninitialized data can be a source of bugs and trouble, as I pointed
out
 in
 my other posting, even when used correctly. One design goal of D is to
 improve reliability and portability by eliminating sources of
undefined
 behavior, and uninitialized data is one huge source of undefined,
 unportable, erratic and unpredictable behavior. I'm trying to squeeze
that
 out of the normal constructs in D - but access to C functions and
inline
 assembler is there for folks who really need it.
But here's where that position truly falls apart ... what about those multitude of cases where buffers are reused over and over again? What
about
 thread-local workspaces? What about pooled buffers?  what about all
those
 X,
 Y, and Z variations? They do /not/ get automagically re-initialized.
Right, they don't. However, data buffers are unlikely to contain garbage that looks like pointers into the gc heap, whereas garbage on the stack
*is*
 very likely to. Next, detritus in recycled data buffers is not going to
 consist of objects that are no longer valid, the source of the other
problem
 I mentioned. And lastly, D can't close every hole, but that doesn't mean
it
 shouldn't close what it can.
Aug 13 2004
prev sibling parent reply "van eeshan" <vanee hotmail.net> writes:
Addendum.

I misread your code somewhat, Water, but what you suggest has a fatal flaw
in it:  you are still initializing the original stack-based array, and
you're making the assumption that the typical size is very small. Further,
you are then extending the array via the heap which will, I imagine,
initialize the extended part of the array. That just doesn't work because:

a) there's a significant amount of effort to convert all the "workers" to do
what you suggest. Plus, the code is less efficient, there's more of it, and
there's possible heap-fragmentation involved.

b) the initial size needs to be around the 2 to 3KB mark. Given that, your
example is hardly faster than the original code. In fact, the initialization
cost is ~57000% slower than the equivalent C/C++ array allocation. That's
better than ~88000%, but who's to quibble at these levels?

I wish it were so simple as your suggestion conveyed. Conversely, if there
were a syntactic means of optionally inhibiting the array initialization,
there would be no related issue at all. We could all be radiantly happy, and
have a beer or something.




"Walter" <newshound digitalmars.com> wrote in message
news:cfgqv4$2lq9$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfge6g$2g1j$1 digitaldaemon.com...
 Heavy-duty text processing. The coder wasn't being lazy at all, as
 out-of-bounds conditions are being checked for. The allocated
(temporary)
 workspace is typically used in only a small amounts, but sporadically
will
 get used more fully. The approach taken was apparently "we'll check for
 end-of-buffer, but if anything exceeds these limits, then it's so beyond
the
 design-specification that it's an error condition". I understand that to
be
 a rather common approach. It's certainly valid, and rather effective.
The
 D
 approach to this is, " ...you /vill/ pay ze initialization cost,
regardless
 of vot ze design!"
Here's an example of an alternate way to do it, for code that would typically use only a small amount, but occaisionally could use a great
deal.
 It can also be set up to never overflow (at least until all memory is
 exhausted) by repeatedly resizing it. This has the advantage of not
needing
 to set up any error recovery, or create and document an error message for
 the limits.

 import std.stdio;

 char[] formatter(char[] s)
 {
     if (s.length < 14)        // if our passed buffer isn't big enough
         s.length = 14;        // yes, we can reallocate things that were
 originally on the stack
     s[0..14] = "abcdefghijklmn";    // copy result into s[]
     return s;
 }


 void main()
 {   char[4] workspace;    // assume 4 is size typically needed in
 formatter()
     char[] s;

     s = formatter(workspace);
     writefln(s);
 }
Aug 12 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfh14k$2ouk$1 digitaldaemon.com...
 I misread your code somewhat, Water, but what you suggest has a fatal flaw
 in it:  you are still initializing the original stack-based array, and
 you're making the assumption that the typical size is very small.
I'm confused - you wrote: "The allocated (temporary) workspace is typically used in only a small amounts, but sporadically will get used more fully." My suggestion was tuned to that.
 Further,
 you are then extending the array via the heap which will, I imagine,
 initialize the extended part of the array.
That's correct.
 That just doesn't work because:
 a) there's a significant amount of effort to convert all the "workers" to
do
 what you suggest.
Correct, but resizing dynamic arrays in D is pretty easy. It's not as klunky and error-prone as in C.
 Plus, the code is less efficient,
Since the resize will only happen in atypical cases, its efficiency should not impact overall performance.
 there's more of it, and
 there's possible heap-fragmentation involved.
Heap fragmentation is much less of an issue than it used to be, with large memory pools and virtual memory. I know how to build a gc that will compact memory to deal with fragmentation when it does arise, so although this is not in the current D it is not a technical problem to do it.
 b) the initial size needs to be around the 2 to 3KB mark. Given that, your
 example is hardly faster than the original code. In fact, the
initialization
 cost is ~57000% slower than the equivalent C/C++ array allocation. That's
 better than ~88000%, but who's to quibble at these levels?
If your app typically is going to fill 2 to 3 Kb, I suggest using malloc() (or alloca()) for it. The handful of extra instructions to allocate via malloc as opposed to the stack will be lost in the thousands of instructions needed to fill 2 to 3 Kb.
 I wish it were so simple as your suggestion conveyed. Conversely, if there
 were a syntactic means of optionally inhibiting the array initialization,
 there would be no related issue at all. We could all be radiantly happy,
and
 have a beer or something.
Allocating large, uninitialized arrays on the stack has its own problems: 1) The uninitialized data that is on the stack will get scanned by the garbage collector looking for any references to allocated memory. Since the uninitialized data consists of old D stack frames, it is highly likely that some of that garbage will look like references into the gc heap, and the gc memory will not get freed. This problem really does happen, and can be pretty frustrating to track down. Some of my C code that uses GC gets a memset() on the stack frame for that reason. 2) It's possible for a function to pass out of it a reference to data on that function's stack frame. By then allocating a new stack frame over the old data, and not initializing, the reference to the old data may still appear to be valid. The program will then behave erratically. Initializing all data on the stack frame will greatly increase the probability of forcing that bug into the open in a repeatable manner. 3) Large arrays on the stack bring up the specter of stack exhaustion with even modest amounts of recursion. That said, there are things I can do to generate faster code for the initialization.
Aug 12 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfhhbe$3jf$1 digitaldaemon.com>, Walter says...

I know how to build a gc that will compact
memory to deal with fragmentation when it does arise, so although this is
not in the current D it is not a technical problem to do it.
That's probably good news, but: Will it be possible to mark specific arrays on the heap as "immovable"? (Please say yes. I'm sure it is not a technical problem). Arcane Jill
Aug 12 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfhoe3$793$1 digitaldaemon.com...
 In article <cfhhbe$3jf$1 digitaldaemon.com>, Walter says...

I know how to build a gc that will compact
memory to deal with fragmentation when it does arise, so although this is
not in the current D it is not a technical problem to do it.
That's probably good news, but: Will it be possible to mark specific arrays on the heap as "immovable"?
(Please
 say yes. I'm sure it is not a technical problem).
Yes. It's called "pinning". And I can guess why you need this feature <g>.
Aug 13 2004
prev sibling next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfhhbe$3jf$1 digitaldaemon.com>, Walter says...

Allocating large, uninitialized arrays on the stack has its own problems:

1) The uninitialized data that is on the stack will get scanned by the
garbage collector looking for any references to allocated memory. Since the
uninitialized data consists of old D stack frames, it is highly likely that
some of that garbage will look like references into the gc heap, and the gc
memory will not get freed. This problem really does happen, and can be
pretty frustrating to track down. Some of my C code that uses GC gets a
memset() on the stack frame for that reason.
That is a very good point. In fact, it's an excellent point, and one which I had not previously considered. Although, I suppose you could get round it with something like this:
2) It's possible for a function to pass out of it a reference to data on
that function's stack frame. By then allocating a new stack frame over the
old data, and not initializing, the reference to the old data may still
appear to be valid. The program will then behave erratically. Initializing
all data on the stack frame will greatly increase the probability of forcing
that bug into the open in a repeatable manner.
That's another very good point.
3) Large arrays on the stack bring up the specter of stack exhaustion with
even modest amounts of recursion.
I'm not sure about this one. I thought that in the 80386+ architecture, the stack(s) can grow indefinitely. Or is it just that Windows doesn't take advantage of this possibility.
That said, there are things I can do to generate faster code for the
initialization.
I suspect that if you succeed in making this lightening-fast, everyone will stop complaining. You only have to look at the figures which have been posted recently to see what everybody's complaining about.
Aug 13 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfhppt$97t$1 digitaldaemon.com...
 In article <cfhhbe$3jf$1 digitaldaemon.com>, Walter says...
3) Large arrays on the stack bring up the specter of stack exhaustion
with
even modest amounts of recursion.
I'm not sure about this one. I thought that in the 80386+ architecture,
the
 stack(s) can grow indefinitely. Or is it just that Windows doesn't take
 advantage of this possibility.
Stacks are allocated a range of memory. Once that's filled up, since the stack cannot be relocated, the program comes to a (crashing) halt. The problem gets worse if you create a lot of threads, because each thread will necessarilly have a fixed maximum stack size. The stack size only appears to be infinite to those of us used to 16 bit machines <g>.
Aug 13 2004
prev sibling next sibling parent reply "van eeshan" <vanee hotmail.net> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:cfhhbe$3jf$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfh14k$2ouk$1 digitaldaemon.com...
 I misread your code somewhat, Water, but what you suggest has a fatal
flaw
 in it:  you are still initializing the original stack-based array, and
 you're making the assumption that the typical size is very small.
I'm confused - you wrote: "The allocated (temporary) workspace is
typically
 used in only a small amounts, but sporadically will get used more fully."
My
 suggestion was tuned to that.
This topic had become attuned somewhat to 16KB and 32KB buffers, so the typical population of 2 to 3KB is relatively small; I can understand how that might have been misinterpreted to become the < 5 bytes you had tuned for.
 b) the initial size needs to be around the 2 to 3KB mark. Given that,
your
 example is hardly faster than the original code. In fact, the
initialization
 cost is ~57000% slower than the equivalent C/C++ array allocation.
That's
 better than ~88000%, but who's to quibble at these levels?
If your app typically is going to fill 2 to 3 Kb, I suggest using malloc() (or alloca()) for it. The handful of extra instructions to allocate via malloc as opposed to the stack will be lost in the thousands of
instructions
 needed to fill 2 to 3 Kb.
Good that you should note the thousands of instructions, because that's effectively what the default initialization does (fills the entire array with default data). This particular app is actually very efficient in filling the 'typical' space used; you might think of it as effectively a quick lookup followed by a memcpy(). If you consider that as the core element of the algorithm; then the default initialization will slow that core down by almost a factor of two (clear the entire buffer, then fill the whole thing again). That slows down the overall application quite noticeably. One is rather likely to notice that level of redundant overhead in any CPU-bound application that doesn't completely toss the notion of efficiency out of the proverbial window.
 Allocating large, uninitialized arrays on the stack has its own problems:

 1) The uninitialized data that is on the stack will get scanned by the
 garbage collector looking for any references to allocated memory. Since
the
 uninitialized data consists of old D stack frames, it is highly likely
that
 some of that garbage will look like references into the gc heap, and the
gc
 memory will not get freed. This problem really does happen, and can be
 pretty frustrating to track down. Some of my C code that uses GC gets a
 memset() on the stack frame for that reason.
So what you are saying is that the current GC is incompatible with stack allocation in general? What you describe is not restricted to uninitialized data, Walter ~ what you appear to be saying is that /any/ data on the stack could potentially look like a reference into the gc heap (as you describe it). If this is the case, the use of alloca() will have exactly the same issues, yet you recommend that as a solution a few paragraphs above. Where does this really stand? Are you really saying that the D GC is incompatible with long-term stack content? That is, if I place some arbitrary values in a stack array during main(), will those values cause the GC confusion throughout the remainder of program execution if they happen to look like GC references?
 2) It's possible for a function to pass out of it a reference to data on
 that function's stack frame. By then allocating a new stack frame over the
 old data, and not initializing, the reference to the old data may still
 appear to be valid. The program will then behave erratically. Initializing
 all data on the stack frame will greatly increase the probability of
forcing
 that bug into the open in a repeatable manner.
Oh ... I thought we were already well past the general case here, and into the realm of where the programmer had alreay taken full responsibility for their actions. Specifically, not initializing the stack content. Let's face it: it's quite feasible for any function to barf all over the heap, stack, and everywhere else until the hardware catches it. And what about the cases whereby a stack buffer is consistently reused within the same, or subordinate, function? It's not suddenly cleared all over again before each subsequent usage (e.g. between foreach loops), is it? This appears to be a completely moot (and potentially misleading) point, but please educate me if I'm wrong. Seriously ...
 3) Large arrays on the stack bring up the specter of stack exhaustion with
 even modest amounts of recursion.
Anyone who has read this thread is well aware that we're not talking recursion. Regardless; your sudden concern didn't stop this being a common and successful approach in C/C++. I mean, did you outlaw large(ish) stack-arrays in the DM C++ compiler? That is not a rhetorical question ... Besides, as noted previously, it's best we not confuse this topic with design-methodologies; as even the best education cannot reconcile such issues. I hope we can agree on that.
 That said, there are things I can do to generate faster code for the
 initialization.
Yes, I'm aware of that. You could, for example, initialize in chunks of four rather than one. However, that will still not even get close to a C/C++ style array-allocation on the stack. But the thought is both appreciated and noted. To summarize thus far: a) we have a D language facility to initialize all variables with default values. This is arguably a wonderful facility, and can help catch certain bugs in the case of general programming. b) unfortunately there's no easy "accessible" means to inhibit that initialization for stack-based arrays; leaving the programmer to eat a potentially (and repeated) vast overhead of CPU consumption, or resort to some non-obvious guru-tricks (none of which will easily work on anything other than a single dimensional array) ... Not to mention that this CPU cost is completely hidden from source-level view in the first place. c) you indicate, Walter, that the GC has problems with uninitialized data on the stack. Well, that will happen regardless of whether or not the language supports non-initialization. That is; a valid stack array can contain arbitrary data patterns ~ said content may look exactly like gc references. Please correct me if I'm wrong. d) The language explicitly exposes a built-in assembler, to do whatever the heck anyone might want to (barring the usual platform issues). Yet, there seems to be a massive pushback against providing an easy to use, symmetrical, high performance, and thoroughly useful counter to the default initialization ~ the alternative to which is to drop into assembly (or similar guru-grok), to simulate what C/C++ provides as its default, highly efficient, behavior. There's not even the vaguest counter-argument relating to implementation complexity. Please forgive the unsavory mess while this latter, wildly contradictory concept, blows my tiny non-academic mind all over the inside of your screen ... Let me finish with the sincere hope that my (obviously) thoroughly weak grasp on reality will soon be at an end. Can you please help me wrangle that into shape Walter?
Aug 13 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfhqhu$9jj$1 digitaldaemon.com>, van eeshan says...

Since
the
 uninitialized data consists of old D stack frames, it is highly likely
that
 some of that garbage will look like references into the gc heap, and the
gc
 memory will not get freed.
Are you really saying that the D GC is incompatible
with long-term stack content? That is, if I place some arbitrary values in a
stack array during main(), will those values cause the GC confusion
throughout the remainder of program execution if they happen to look like GC
references?
I think you may have misunderstood Walter. Arbitrary random data can indeed give the GC false positives, but this usually doesn't matter, since it is rare. What Walter said is that uninitialized stack data is /not random/ - in that it is almost certain to contain fragments of some previous stack frame, and therefore also /real, valid/ (but out-of-date) references to GC-controlled memory. So the number of false positives will shoot up from "almost none" to "very many". (And my previous suggestion of temporarily disabling the GC won't help either, so I withdraw the suggestion). Jill
Aug 13 2004
next sibling parent "van eeshan" <vanee hotmail.net> writes:
Fair point Jill.

But at that stage, one might as well state that the stack is entirely off
limits for general array usage. You cannot predict the frequency of what
might or might not look like GC references for any given application. It
would require dynamic sampling to find a trend and try to offset that
against GC behavior. The real problem, in this particular case, is the GC ~
it's simply incompatible with arbitrary (yet valid) stack-based array
content. How much of a problem is that? It may actually be a red-herring. On
the other hand, I may be waning considerably at this very moment (eight
hours behind you).



"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfhvbq$bra$1 digitaldaemon.com...
 In article <cfhqhu$9jj$1 digitaldaemon.com>, van eeshan says...

Since
the
 uninitialized data consists of old D stack frames, it is highly likely
that
 some of that garbage will look like references into the gc heap, and
the
gc
 memory will not get freed.
Are you really saying that the D GC is incompatible
with long-term stack content? That is, if I place some arbitrary values
in a
stack array during main(), will those values cause the GC confusion
throughout the remainder of program execution if they happen to look like
GC
references?
I think you may have misunderstood Walter. Arbitrary random data can
indeed give
 the GC false positives, but this usually doesn't matter, since it is rare.
What
 Walter said is that uninitialized stack data is /not random/ - in that it
is
 almost certain to contain fragments of some previous stack frame, and
therefore
 also /real, valid/ (but out-of-date) references to GC-controlled memory.
So the
 number of false positives will shoot up from "almost none" to "very many".

 (And my previous suggestion of temporarily disabling the GC won't help
either,
 so I withdraw the suggestion).

 Jill
Aug 13 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfhvbq$bra$1 digitaldaemon.com...
 In article <cfhqhu$9jj$1 digitaldaemon.com>, van eeshan says...

Since the
 uninitialized data consists of old D stack frames, it is highly likely
that
 some of that garbage will look like references into the gc heap, and
the
 gc memory will not get freed.
Are you really saying that the D GC is incompatible
with long-term stack content? That is, if I place some arbitrary values
in a
stack array during main(), will those values cause the GC confusion
throughout the remainder of program execution if they happen to look like
GC
references?
I think you may have misunderstood Walter. Arbitrary random data can
indeed give
 the GC false positives, but this usually doesn't matter, since it is rare.
What
 Walter said is that uninitialized stack data is /not random/ - in that it
is
 almost certain to contain fragments of some previous stack frame, and
therefore
 also /real, valid/ (but out-of-date) references to GC-controlled memory.
So the
 number of false positives will shoot up from "almost none" to "very many".
You're right, I'd forgotten to make that clear. Uninitialized data is "garbage", but that's not at all the same thing as "random". In my extensive experiments with this, text and integer "garbage" rarely looks like a valid pointer into the gc heap, but old stack frames are very likely to. Also, it is not a matter of "incompatibility" with gc, it's more a matter of the gc being more effective.
Aug 13 2004
prev sibling parent "van eeshan" <vanee hotmail.net> writes:
So, the resolution on this has apparently been implemented. If you write:

{
    char[] x;
    const int size = 2048;

    x = cast(char[]) alloca(size)[0..size];
}

... then you will get the equivalent of a C/C++ stack array, without all the
prior initialization overhead involved. While this is a VeryGoodThing, it
could use some syntactic sugar ...

There were a number of suggestions made throughout this topic; here's a
brief recap of three:

{
    uninitialized char[2048] x;

    char[2048] x = void;

    char[] x = stackalloc char[2048];
}

The first two may find use outside of stack-allocation, whilst the third is
clearly stack-based. The second suggestion avoids creating an additional
language 'qualifier'.

How about it?



"Walter" <newshound digitalmars.com> wrote in message
news:cfhhbe$3jf$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfh14k$2ouk$1 digitaldaemon.com...
 I misread your code somewhat, Water, but what you suggest has a fatal
flaw
 in it:  you are still initializing the original stack-based array, and
 you're making the assumption that the typical size is very small.
I'm confused - you wrote: "The allocated (temporary) workspace is
typically
 used in only a small amounts, but sporadically will get used more fully."
My
 suggestion was tuned to that.

 Further,
 you are then extending the array via the heap which will, I imagine,
 initialize the extended part of the array.
That's correct.
 That just doesn't work because:
 a) there's a significant amount of effort to convert all the "workers"
to
 do
 what you suggest.
Correct, but resizing dynamic arrays in D is pretty easy. It's not as
klunky
 and error-prone as in C.

 Plus, the code is less efficient,
Since the resize will only happen in atypical cases, its efficiency should not impact overall performance.
 there's more of it, and
 there's possible heap-fragmentation involved.
Heap fragmentation is much less of an issue than it used to be, with large memory pools and virtual memory. I know how to build a gc that will
compact
 memory to deal with fragmentation when it does arise, so although this is
 not in the current D it is not a technical problem to do it.

 b) the initial size needs to be around the 2 to 3KB mark. Given that,
your
 example is hardly faster than the original code. In fact, the
initialization
 cost is ~57000% slower than the equivalent C/C++ array allocation.
That's
 better than ~88000%, but who's to quibble at these levels?
If your app typically is going to fill 2 to 3 Kb, I suggest using malloc() (or alloca()) for it. The handful of extra instructions to allocate via malloc as opposed to the stack will be lost in the thousands of
instructions
 needed to fill 2 to 3 Kb.

 I wish it were so simple as your suggestion conveyed. Conversely, if
there
 were a syntactic means of optionally inhibiting the array
initialization,
 there would be no related issue at all. We could all be radiantly happy,
and
 have a beer or something.
Allocating large, uninitialized arrays on the stack has its own problems: 1) The uninitialized data that is on the stack will get scanned by the garbage collector looking for any references to allocated memory. Since
the
 uninitialized data consists of old D stack frames, it is highly likely
that
 some of that garbage will look like references into the gc heap, and the
gc
 memory will not get freed. This problem really does happen, and can be
 pretty frustrating to track down. Some of my C code that uses GC gets a
 memset() on the stack frame for that reason.

 2) It's possible for a function to pass out of it a reference to data on
 that function's stack frame. By then allocating a new stack frame over the
 old data, and not initializing, the reference to the old data may still
 appear to be valid. The program will then behave erratically. Initializing
 all data on the stack frame will greatly increase the probability of
forcing
 that bug into the open in a repeatable manner.

 3) Large arrays on the stack bring up the specter of stack exhaustion with
 even modest amounts of recursion.

 That said, there are things I can do to generate faster code for the
 initialization.
Aug 30 2004
prev sibling next sibling parent Ilya Minkov <minkov cs.tum.edu> writes:
Ahem, you're speaking of allocating a *buffer*. A large thing, perhaps 
used partially. Normally, i would feel somewhat uneasy to do that on the 
stack - the heap does it quite well, and, well, if there is still a 
performance problem, a buffer can be cached/shared/whatever, fairly 
easily. And if i remember correctly, there was some circumstance which 
could make the OS break if it had to extend the stack by more than 1 
page at a time.

The bulk of performance loss from heap allocation comes from many little 
objects, much more than from a few big ones... So what's the fuss about? 
Besides, you'd be using it by a pointer anyway.

-eye
Aug 12 2004
prev sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
 Because, if not, then the following modification is the sort of (keyword
 qualifier) thing that would hand control back to the programmer:
 
 {
     uninitialized char[1024 * 32] workspace;
 
     formatter (workspace);
 }
 
 That's perhaps not the best name for it, but you get the idea.
another option is to look for "void" as an initialization value: { char[1024*32] workspace = void; formatter (workspace); } that way we don't need another keyword. If the initializer is void the variable isn't initialized.
Aug 13 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfiam0$hhc$1 digitaldaemon.com>, Ben Hinkle says...

another option is to look for "void" as an initialization value:
{
  char[1024*32] workspace = void;
  formatter (workspace);
}
that way we don't need another keyword. If the initializer is void the
variable isn't initialized.
/another/ option is for the compiler, on seeing the code: would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a mighty large number of bytes to be putting on the stack. I think I'll allocate it from the heap instead. The programmer won't care. Hell, the programmer won't even /notice/ unless they start looking at the generated assembler!" Arcane Jill
Aug 13 2004
next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfidel$j0u$1 digitaldaemon.com...
 In article <cfiam0$hhc$1 digitaldaemon.com>, Ben Hinkle says...

another option is to look for "void" as an initialization value:
{
  char[1024*32] workspace = void;
  formatter (workspace);
}
that way we don't need another keyword. If the initializer is void the
variable isn't initialized.
/another/ option is for the compiler, on seeing the code: would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a mighty
large
 number of bytes to be putting on the stack. I think I'll allocate it from
the
 heap instead.
That would be very unusual for a compiler to do - and it raises various questions like what "large" is. Plus unless it uses malloc to allocate it still won't be as fast as C due to the initialization overhead.
 The programmer won't care. Hell, the programmer won't even
 /notice/ unless they start looking at the generated assembler!"
uhh - that's obviously not completely true. Some programmers will care and will definitely notice. The question is how many (and how bad the problem is). Walter thinks the number is small (ie-not large enough to change things). The OP thinks the number is significant. I'm in the camp of nice-to-have but not a high priority problem.
 Arcane Jill
Aug 13 2004
prev sibling parent reply Andy Friesen <andy ikagames.com> writes:
Arcane Jill wrote:

 /another/ option is for the compiler, on seeing the code:
 

 
 would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a mighty large
 number of bytes to be putting on the stack. I think I'll allocate it from the
 heap instead. The programmer won't care. Hell, the programmer won't even
 /notice/ unless they start looking at the generated assembler!"
I like this approach. The programmer didn't say where he wants the memory to come from, so it's only logical to assume that he isn't thinking about it and is therefore relying on the compiler to do the right thing for him. If the programmer wants to force the storage to one place or another, he can say so explicitly. (using calloc, malloc, gc.malloc, etc) -- andy
Aug 13 2004
parent reply "van eeshan" <vanee hotmail.net> writes:
With respect Andy, the programmer was quite explicit about where the memory
should come from ... it's a locally-declared array, and the language
documentation points out such declarations are place on the stack (just like
C/C++) because there are some very good reasons to do so. If you abstract
that notion away, then we're not discussing D anymore. Note that the
declaration does not have static, const, or any other qualifier either.


"Andy Friesen" <andy ikagames.com> wrote in message
news:cfim8j$nq2$1 digitaldaemon.com...
 Arcane Jill wrote:

 /another/ option is for the compiler, on seeing the code:



 would think to itself: "Hmmm. Now 1024*32 - that's 32768. That's a
mighty large
 number of bytes to be putting on the stack. I think I'll allocate it
from the
 heap instead. The programmer won't care. Hell, the programmer won't even
 /notice/ unless they start looking at the generated assembler!"
I like this approach. The programmer didn't say where he wants the memory to come from, so it's only logical to assume that he isn't thinking about it and is therefore relying on the compiler to do the right thing for him. If the programmer wants to force the storage to one place or another, he can say so explicitly. (using calloc, malloc, gc.malloc, etc) -- andy
Aug 13 2004
parent reply Andy Friesen <andy ikagames.com> writes:
van eeshan wrote:
 With respect Andy, the programmer was quite explicit about where the memory
 should come from ... it's a locally-declared array, and the language
 documentation points out such declarations are place on the stack (just like
 C/C++) because there are some very good reasons to do so. If you abstract
 that notion away, then we're not discussing D anymore. Note that the
 declaration does not have static, const, or any other qualifier either.
One of the greatest things about D is that we have a chance to change it. Therefore, we are discussing D even if we abstract that notion away, because D itself may follow suit if we can convince Walter. :) The declaration, taken purely at face value: char[1024 * 32] workspace; declares 32768 consecutive chars under the name 'workspace'. Since the 'new' keyword is not used, this storage is only guaranteed to exist within the function's scope. It doesn't necessarily have to say any more than that: the D language (not the D compiler implementation) is much more than spec for a portable assembler, (as C was) and it makes perfect sense to describe it in conceptual terms where possible because concepts are by far the most common use case. (code is rarely optimized before it is written :) ) Ideally, the compiler knows what the output has to do, but is completely free to achieve that any way it wants. (this is the main reason O'Caml is so fast, as I understand it) However, there's no denying that D is a systems language, so it goes without saying that there absolutely must be a convenient way to say that you *do* care precisely where that storage comes from and how you want it to be handled. its name, allocates stack storage. By being a language primitive, it presents plenty of information to the code optimizer. So, hypothetically, we would be able to write // want chars. Don't care where they came from. char[1024*32] workspace; // get 'new' storage from the GC. char[] workspace = new char[1024*32]; // allocate stack storage char[] workspace = stackalloc char[1024*32]; -- andy
Aug 13 2004
parent reply "van eeshan" <vanee hotmail.net> writes:
All good points. As long as there is some easy means of avoiding unwanted
initialization overhead (particularly vis-a-vis the stack, since it is by
far the fastest place to allocate locally-scoped storage), then I'd
certainly be all for it.

Regarding the stack in particular, the declaration:

      // allocate stack storage
      char[] workspace = stackalloc char[1024*32];
... would likely map to the alloca() function (which does not waste valuable time initializing all of the requested storage space). However, as Walter has described it, the GC is apparently incompatible with usage of alloca() also ~ because potentially old GC references will suddenly appear back in the "live" stack area again. If this is truly the case, then you should expect the same pushback on this notion as I've had with the notion of an "uninitialized" qualifier, or Ben's alternative "=void". After all, your "stackalloc" keyword is a variation on the theme. On the other hand, if the GC turns out to be compatible with alloca (as it is today), then we can all sing and dance with joy when Walter wraps that function with some "more accessible" syntax <g>. I prefer Ben's approach though, because it's potentially applicable in more places than just the stack. Bon Chance. "Andy Friesen" <andy ikagames.com> wrote in message news:cfivtl$vcr$1 digitaldaemon.com...
 van eeshan wrote:
 With respect Andy, the programmer was quite explicit about where the
memory
 should come from ... it's a locally-declared array, and the language
 documentation points out such declarations are place on the stack (just
like
 C/C++) because there are some very good reasons to do so. If you
abstract
 that notion away, then we're not discussing D anymore. Note that the
 declaration does not have static, const, or any other qualifier either.
One of the greatest things about D is that we have a chance to change it. Therefore, we are discussing D even if we abstract that notion away, because D itself may follow suit if we can convince Walter. :) The declaration, taken purely at face value: char[1024 * 32] workspace; declares 32768 consecutive chars under the name 'workspace'. Since the 'new' keyword is not used, this storage is only guaranteed to exist within the function's scope. It doesn't necessarily have to say any more than that: the D language (not the D compiler implementation) is much more than spec for a portable assembler, (as C was) and it makes perfect sense to describe it in conceptual terms where possible because concepts are by far the most common use case. (code is rarely optimized before it is written :) ) Ideally, the compiler knows what the output has to do, but is completely free to achieve that any way it wants. (this is the main reason O'Caml is so fast, as I understand it) However, there's no denying that D is a systems language, so it goes without saying that there absolutely must be a convenient way to say that you *do* care precisely where that storage comes from and how you want it to be handled. its name, allocates stack storage. By being a language primitive, it presents plenty of information to the code optimizer. So, hypothetically, we would be able to write // want chars. Don't care where they came from. char[1024*32] workspace; // get 'new' storage from the GC. char[] workspace = new char[1024*32]; // allocate stack storage char[] workspace = stackalloc char[1024*32]; -- andy
Aug 13 2004
parent reply Andy Friesen <andy ikagames.com> writes:
van eeshan wrote:
 All good points. As long as there is some easy means of avoiding unwanted
 initialization overhead (particularly vis-a-vis the stack, since it is by
 far the fastest place to allocate locally-scoped storage), then I'd
 certainly be all for it.
 
 Regarding the stack in particular, the declaration:
 
 
     // allocate stack storage
     char[] workspace = stackalloc char[1024*32];
... would likely map to the alloca() function (which does not waste valuable time initializing all of the requested storage space). However, as Walter has described it, the GC is apparently incompatible with usage of alloca() also ~ because potentially old GC references will suddenly appear back in the "live" stack area again. If this is truly the case, then you should expect the same pushback on this notion as I've had with the notion of an "uninitialized" qualifier, or Ben's alternative "=void". After all, your "stackalloc" keyword is a variation on the theme.
I actually didn't mean to address implicit initialization directly at all so much as a need for some way to let the compiler decide what's faster. I think the same reasoning applies, though: as a language which allows the programmer to hit the metal when he needs to, D needs to offer some mechanism that communicates that the programmer knows better than the compiler. Using the 'void' keyword for this purpose sounds a lot like the Perl practice of assigning arbitrary meaning to symbols for the sake of terseness. That being said, I can't think of anything better off the top of my head, so I'll shut up now. :) -- andy
Aug 13 2004
next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
 Using the 'void' keyword for this purpose sounds a lot like the Perl
 practice of assigning arbitrary meaning to symbols for the sake of
 terseness.  That being said, I can't think of anything better off the
 top of my head, so I'll shut up now. :)
Perl! ouch! ;-) Void currently means "no type" so it seems very natural to me to extend that to mean "no initial value" when used as an initializer. On a related note, I think the only way to get uninitialized memory from the heap should be through malloc.
Aug 13 2004
prev sibling parent reply "van eeshan" <vanee hotmail.net> writes:
"Andy Friesen"  wrote ..
 I actually didn't mean to address implicit initialization directly at
 all so much as a need for some way to let the compiler decide what's
faster.
 I think the same reasoning applies, though: as a language which allows
 the programmer to hit the metal when he needs to, D needs to offer some
 mechanism that communicates that the programmer knows better than the
 compiler.
In light of what we now know (but still don't quite know :-) I think it's worth recapturing the initial request. I hope you folks will correct and/or embellish the following, as my knowledge of the internal GC operation is meager: a) there are two things that a programmer uses the stack for: 1) nonchalant locally-scoped variables, and 2) a deliberate and explicit means of allocating temporary storage very quickly ( ~50x faster than the heap). b) the default initialization scheme works great for most variables, and even arrays, where they belong to the nonchalant-usage group. storage requested is of anything other than a truly trivial size (~5 bytes). These issues apparently (still to be clarified) are borne due to the nature and behavior of GC stack-scans. That is, the GC apparently performs more efficiently where the "live" stack does not contain "old" things that look attempt to eliminate such references from the stack content (wipe everything to a default value). This begs the question, "... how often does the GC run?" And also, "... how much impact do "old" GC references have on its performance?" Let's make an assertion here first: If a programmer is explicitly and assume said storage will be given up very soon. After all, if the execution time is long-lived then one might as well allocate on the heap instead; right? The core issue brought up by this thread/topic is about those cases where execution-time is very short-lived, such that the overhead of initialization is enough to be noticeably detrimental. Let's give this short execution-window a name: X ... Okay; so given this assertion, how many times will the GC become active during X? One might expect something close to zero ... especially so because the currently executing thread will not be making any heap requests. If it did, then the benefits of the original stack-allocation would likely be drowned out anyway. The problem remains that /another/ thread may cause the GC to scan the (now multiple) stacks during X. So, how often does that happen? Within this short window? One might expect "very few". Another approach would presumably be to disable GC stack-walks during X; that is, inhibit the GC from scanning a specific stack (the one belonging to the X-thread) for the duration of X. Either way, the upshot is this: the programmer wants to take full control over what's going on; and D needs a decent mechanism to enable such control. Placing blinkers over the eyes of the programmer is not a good approach to this.
Aug 13 2004
parent reply ben 0x539.de writes:
Pretty ignorant of some of this thread, I would like to make some points anyway.

Firstly, I would consider the ability to request storage on the stack without it
being initialised a rather important feature for a system programming language.
Yes, I absolutely want this piece of storage on the stack, and no, my dear
compiler, I do not want you to waste my time by filling it with zeroes. ;)
There have been some fine suggestions that address this point, yet they appear
to be less accessible (like alloca, or even asm which is not even portable)
than, or bring the semantics of the language too far away from, the way C or C++
do it.
I recognise that D need not be another iteration of C++, yet I consider the ease
of adoption of D for folks used to C'ish languages to be an important strength.
If someone as simple as int[2048] foo; could have a meaning (`` allocate GC'ed
heap memory '') `` totally '' different from that in C (`` subq %rsp, 8192 ''),
it would probably appear to be unintuitive.

But that is just my unbased personal opinion.

Therefore I consider some simple qualifier with obvious semantics like ``
uninitialised '' to be worth the possible danger. This does not mean that
everybody should default to using uninitialised memory unless they absolutely
need initialisation, for methological reasons that were well stated by Walter.
But if you want it, here it is, no need for alloca or asm hacks.

This might confuse the garbage collector, yet I do not see how that is much of a
problem. Some of the uses of uninitialised arrays demonstrated in this thread, I
think, quickly overwrote the array with memcpy'd data, and in others the array
might not even live long enough to `` block '' memory from being free'd for any
timespan that would matter.
Of course the problem exists, but `` false positives '' are not exactly fatal.


As another approach to this problem, I think it might be a worthwhile hack to
the semantics of the language itself to have all pointers' memory be zeroed when
it goes out of scope.
This basically means that pointers do the opposite of what arrays do, being ``
initialised '' once we do not need them anymore instead of being initialised
when they are created.
So when the memory occupied by them goes back in scope due to any means of
allocating uninitialised memory on the stack, what used to be used for pointers
does now not confuse the garbage collector.
Considering we already have a lot of initialisation going on, I do not think
that it would be a terrible overhead to have deinitialisation, but I might be
wrong, of course.


Please excuse my amateurish intervention. I certainly have no qualifications at
all to point at anything you folks have said and say `` NO YOU ARE WRONG! '',
yet I hope that I might have helped.
Aug 15 2004
parent Derek Parnell <derek psych.ward> writes:
On Mon, 16 Aug 2004 04:43:43 +0000 (UTC), ben 0x539.de wrote:

 Pretty ignorant of some of this thread, I would like to make some points
anyway.
 
 Firstly, I would consider the ability to request storage on the stack without
it
 being initialised a rather important feature for a system programming language.
 Yes, I absolutely want this piece of storage on the stack, and no, my dear
 compiler, I do not want you to waste my time by filling it with zeroes. ;)
 There have been some fine suggestions that address this point, yet they appear
 to be less accessible (like alloca, or even asm which is not even portable)
 than, or bring the semantics of the language too far away from, the way C or
C++
 do it.
 I recognise that D need not be another iteration of C++, yet I consider the
ease
 of adoption of D for folks used to C'ish languages to be an important strength.
 If someone as simple as int[2048] foo; could have a meaning (`` allocate GC'ed
 heap memory '') `` totally '' different from that in C (`` subq %rsp, 8192 ''),
 it would probably appear to be unintuitive.
 
 But that is just my unbased personal opinion.
 
 Therefore I consider some simple qualifier with obvious semantics like ``
 uninitialised '' to be worth the possible danger. This does not mean that
 everybody should default to using uninitialised memory unless they absolutely
 need initialisation, for methological reasons that were well stated by Walter.
 But if you want it, here it is, no need for alloca or asm hacks.
 
 This might confuse the garbage collector, yet I do not see how that is much of
a
 problem. Some of the uses of uninitialised arrays demonstrated in this thread,
I
 think, quickly overwrote the array with memcpy'd data, and in others the array
 might not even live long enough to `` block '' memory from being free'd for any
 timespan that would matter.
 Of course the problem exists, but `` false positives '' are not exactly fatal.
 
 As another approach to this problem, I think it might be a worthwhile hack to
 the semantics of the language itself to have all pointers' memory be zeroed
when
 it goes out of scope.
 This basically means that pointers do the opposite of what arrays do, being ``
 initialised '' once we do not need them anymore instead of being initialised
 when they are created.
 So when the memory occupied by them goes back in scope due to any means of
 allocating uninitialised memory on the stack, what used to be used for pointers
 does now not confuse the garbage collector.
 Considering we already have a lot of initialisation going on, I do not think
 that it would be a terrible overhead to have deinitialisation, but I might be
 wrong, of course.
 
 Please excuse my amateurish intervention. I certainly have no qualifications at
 all to point at anything you folks have said and say `` NO YOU ARE WRONG! '',
 yet I hope that I might have helped.
This idea is stilling making sense to me, even when taking Walter's objections into consideration. Okay, so an uninitialized stack array might be a likely source of bugs. But if its not the default behaviour, and the coder explicitly takes responsibility of it by coding the keyword 'uninitialized' or similar, then why shouldn't D say "Okay if that's what you want, that's what you'll get!" D allows 'goto' which is probably a more common source of bugs and high maintenance costs. -- Derek Melbourne, Australia 16/Aug/04 3:26:36 PM
Aug 15 2004
prev sibling parent "van eeshan" <vanee hotmail.net> writes:
That's a very nice idea Ben ...

"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:cfiam0$hhc$1 digitaldaemon.com...
 Because, if not, then the following modification is the sort of (keyword
 qualifier) thing that would hand control back to the programmer:

 {
     uninitialized char[1024 * 32] workspace;

     formatter (workspace);
 }

 That's perhaps not the best name for it, but you get the idea.
another option is to look for "void" as an initialization value: { char[1024*32] workspace = void; formatter (workspace); } that way we don't need another keyword. If the initializer is void the variable isn't initialized.
Aug 13 2004
prev sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Have you tried alloca? I haven't tried running it but I'm thinking of

void stack2()
{
  void* x = alloca(16*1024);
}


van eeshan wrote:

 "Walter" <newshound digitalmars.com> wrote in message
 news:cfbip7$ora$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfb72j$9l7$1 digitaldaemon.com...
 "Walter" <newshound digitalmars.com> wrote in message
 news:cfb5v9$76u$1 digitaldaemon.com...
 The problem, however, is that since Java does not allow any objects
 or arrays on the stack (or in static data), whenever you need one it
needs
 to
 be allocated on the heap. This is a lot more expensive than stack
 allocation. Since one winds up allocating a lot more heap objects in
Java
 than one does in C/C++, the end result is slower. Java offers no
 alternatives to gc allocation.

 This is why D allows stack based objects and value objects.
Are you saying that D allows one to create Objects entirely upon the
stack? For auto objects, it is allowed for the compiler to put them there. But I was primarilly referring to structs and arrays.
 And entirely within static data?
Structs and arrays, yes. Java does not allow structs, nor does it allow arrays on the stack.
Thanks for clarifying that Walter. Would it be prudent to note that D allocates auto Objects on the heap? Stack allocation may be allowed for in the spec, but that's not how the compiler operates. Also, there is that point about heap allocations being a lot more expensive than stack allocations. Naturally, one would lean towards wholeheartedly agreeing; but this is just not true when it comes to D. Here's a little (contrived) test jig to illustrate: private import std.c.time; private import std.c.stdlib; private import std.c.windows.windows; void heap() { void *x = malloc (16 * 1024); free (x); } void stack() { ubyte[16 * 1024] x; } void trace (uint startTime) { FILETIME ft; GetSystemTimeAsFileTime(&ft); printf ("%u\n", (ft.dwLowDateTime - startTime) / 10000); } void main() { FILETIME start; GetSystemTimeAsFileTime(&start); trace (start.dwLowDateTime); sleep (3); trace (start.dwLowDateTime); for (int i=1_000_000; --i;) heap(); trace (start.dwLowDateTime); for (int i=1_000_000; --i;) stack(); trace (start.dwLowDateTime); } The results? On this machine it takes the heap() routine ~500ms to execute, and the stack() version ~3500ms. Yep, that's seven times longer. Why? What wasn't stated is that D ensures all stack variables are initialized ... including arrays. The innocuous stack allocation actually clears the 16KB to zero each time, which is why it takes so long. For a 32KB 'buffer', the stack() execution time goes up to ~8600ms versus the ~500ms for heap(); you can see there's some rather non-linear behavior for the stack version <g> The point here is not about the benefits/detriments of deterministic initialization. It's about clarifying a point of performance-tuning that is far from obvious. Without wishing to put too fine a point on it, the quoted post was somewhat misleading. That is; allocating an array on the D stack can consume far more cycles than the equivalent heap allocation. It depends upon the size of the array, the bus speed, the size of the cache, heap fragmentation and many other things. D stack allocations are simply not like C/C++ at all. Far from it. For the curious, here's the disassembly for both routines. It's worth noting that the array initialization could be sped up. Also worth noting is that the loop/call overhead was 10ms. 10: { 11: void *x = malloc (16 * 1024); 12: free (x); 00402011 push 4000h 00402016 call _malloc (0040574c) 0040201B add esp,4 0040201E push eax 0040201F call _free (004057a4) 00402024 add esp,4 13: } 00402027 pop eax 00402028 ret 14: 15: 16: void stack() 0040202C push ebp 0040202D mov ebp,esp 0040202F mov edx,4 00402034 sub esp,1000h 0040203A test dword ptr [esp],esp 0040203D dec edx 0040203E jne _D5hello5stackFZv+8 (00402034) 00402040 sub esp,edx 00402042 mov ecx,4000h 00402047 xor eax,eax 17: { 18: ubyte[16 * 1024] x; 00402049 push edi 0040204A lea edi,[x] 00402050 rep stos byte ptr [edi] 19: } 00402052 pop edi 00402053 mov esp,ebp 00402055 pop ebp 00402056 ret
Aug 11 2004
next sibling parent reply "van eeshan" <vanee hotmail.net> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:cfe87l$1akg$1 digitaldaemon.com...
 Have you tried alloca? I haven't tried running it but I'm thinking of

 void stack2()
 {
   void* x = alloca(16*1024);
 }
Tried that, Ben; alloca() also clears everything to zero.
Aug 11 2004
next sibling parent Ben Hinkle <bhinkle4 juno.com> writes:
van eeshan wrote:

 "Ben Hinkle" <bhinkle4 juno.com> wrote in message
 news:cfe87l$1akg$1 digitaldaemon.com...
 Have you tried alloca? I haven't tried running it but I'm thinking of

 void stack2()
 {
   void* x = alloca(16*1024);
 }
Tried that, Ben; alloca() also clears everything to zero.
Hmm. I just tried on Linux and got the times: heap: .17sec stack: 14 stack2: .06 looks like on Linux alloca is the fastest.
Aug 11 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"van eeshan" <vanee hotmail.net> wrote in message
news:cfe8ob$1b85$1 digitaldaemon.com...
 "Ben Hinkle" <bhinkle4 juno.com> wrote in message
 news:cfe87l$1akg$1 digitaldaemon.com...
 Have you tried alloca? I haven't tried running it but I'm thinking of

 void stack2()
 {
   void* x = alloca(16*1024);
 }
Tried that, Ben; alloca() also clears everything to zero.
No, it does not. It simply calls C's alloca(), and gets whatever C's alloca() does. Note, however, that under linux the OS seems to clear pages that are faulted into the stack, so it could *appear* to be 0 initialized when it really is just fresh snow <g>.
Aug 11 2004
prev sibling parent reply "van eeshan" <vanee hotmail.net> writes:
Well, blow me down with a feather. I must have really screwed that test up;
thank you Ben ~ that gets me out of a hole for now. So, in light of this,
where are we now? Currently, if you want to quickly allocate an array on the
stack you must do something like the following:

  char[] array = (cast(char *) alloca(Size))[0..Size];

I feel it would be beneficial to both programmers and to the language
reputation to provide something rather more "accessible" than this ...
thing. Perhaps a globally available template? Something, where taking
advantage of the stack is not hidden away in the darkest recesses. This is
why I figured something like an optional qualifier on the variable (to
inhibit default initialization) would be ideal: it's easy to apply, easy to
understand, easy to document, and very likely easy to implement. If this
sort of thing remains poorly documented, or is not easily accessible, people
will be left wondering why their supposed fast stack-arrays are actually
eating serious chunks of CPU time. After all, porting some C code is what
led to this foray (temporary stack arrays are commonplace in C).

Finally, the alloca() approach is still ~700% slower than the C/C++ standard
mechanism, yet it is faster than simply declaring a "char[20]" D local
variable (that's kinda' hoky when you think about it ... a char[20]
declaration is ~750% slower in D than in C).

What to do? Can we please have a simple, clean, optional mechanism for
inhibiting default initialization? Is it really so much trouble?













"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:cfe87l$1akg$1 digitaldaemon.com...
 Have you tried alloca? I haven't tried running it but I'm thinking of

 void stack2()
 {
   void* x = alloca(16*1024);
 }


 van eeshan wrote:

 "Walter" <newshound digitalmars.com> wrote in message
 news:cfbip7$ora$1 digitaldaemon.com...
 "van eeshan" <vanee hotmail.net> wrote in message
 news:cfb72j$9l7$1 digitaldaemon.com...
 "Walter" <newshound digitalmars.com> wrote in message
 news:cfb5v9$76u$1 digitaldaemon.com...
 The problem, however, is that since Java does not allow any objects
 or arrays on the stack (or in static data), whenever you need one
it
 needs
 to
 be allocated on the heap. This is a lot more expensive than stack
 allocation. Since one winds up allocating a lot more heap objects
in
 Java
 than one does in C/C++, the end result is slower. Java offers no
 alternatives to gc allocation.

 This is why D allows stack based objects and value objects.
Are you saying that D allows one to create Objects entirely upon the
stack? For auto objects, it is allowed for the compiler to put them there. But
I
 was primarilly referring to structs and arrays.

 And entirely within static data?
Structs and arrays, yes. Java does not allow structs, nor does it allow arrays on the stack.
Thanks for clarifying that Walter. Would it be prudent to note that D allocates auto Objects on the heap? Stack allocation may be allowed for
in
 the spec, but that's not how the compiler operates.

 Also, there is that point about heap allocations being a lot more
 expensive than stack allocations. Naturally, one would lean towards
 wholeheartedly agreeing; but this is just not true when it comes to D.
 Here's a little (contrived) test jig to illustrate:

 private import std.c.time;
 private import std.c.stdlib;
 private import std.c.windows.windows;

 void heap()
 {
         void *x = malloc (16 * 1024);
         free (x);
 }

 void stack()
 {
         ubyte[16 * 1024] x;
 }

 void trace (uint startTime)
 {
         FILETIME ft;
         GetSystemTimeAsFileTime(&ft);
         printf ("%u\n", (ft.dwLowDateTime - startTime) / 10000);
 }

 void main()
 {
         FILETIME start;

        GetSystemTimeAsFileTime(&start);
         trace (start.dwLowDateTime);

         sleep (3);
         trace (start.dwLowDateTime);

         for (int i=1_000_000; --i;)
              heap();

         trace (start.dwLowDateTime);

         for (int i=1_000_000; --i;)
              stack();

         trace (start.dwLowDateTime);
 }

 The results? On this machine it takes the heap() routine ~500ms to
 execute, and the stack() version ~3500ms. Yep, that's seven times
longer.
 Why? What wasn't stated is that D ensures all stack variables are
 initialized ... including arrays. The innocuous stack allocation
actually
 clears the 16KB to zero each time, which is why it takes so long. For a
 32KB 'buffer', the stack() execution time goes up to ~8600ms versus the
 ~500ms for heap(); you can see there's some rather non-linear behavior
for
 the stack version <g>

 The point here is not about the benefits/detriments of deterministic
 initialization. It's about clarifying a point of performance-tuning that
 is far from obvious. Without wishing to put too fine a point on it, the
 quoted post was somewhat misleading. That is; allocating an array on the
D
 stack can consume far more cycles than the equivalent heap allocation.
It
 depends upon the size of the array, the bus speed, the size of the
cache,
 heap fragmentation and many other things.

 D stack allocations are simply not like C/C++ at all. Far from it.


 For the curious, here's the disassembly for both routines. It's worth
 noting that the array initialization could be sped up. Also worth noting
 is that the loop/call overhead was 10ms.

 10:   {
 11:           void *x = malloc (16 * 1024);
 12:           free (x);
 00402011   push        4000h
 00402016   call        _malloc (0040574c)
 0040201B   add         esp,4
 0040201E   push        eax
 0040201F   call        _free (004057a4)
 00402024   add         esp,4
 13:   }
 00402027   pop         eax
 00402028   ret
 14:
 15:
 16:   void stack()
 0040202C   push        ebp
 0040202D   mov         ebp,esp
 0040202F   mov         edx,4
 00402034   sub         esp,1000h
 0040203A   test        dword ptr [esp],esp
 0040203D   dec         edx
 0040203E   jne         _D5hello5stackFZv+8 (00402034)
 00402040   sub         esp,edx
 00402042   mov         ecx,4000h
 00402047   xor         eax,eax
 17:   {
 18:           ubyte[16 * 1024] x;
 00402049   push        edi
 0040204A   lea         edi,[x]
 00402050   rep stos    byte ptr [edi]
 19:   }
 00402052   pop         edi
 00402053   mov         esp,ebp
 00402055   pop         ebp
 00402056   ret
Aug 11 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfete2$1p7g$1 digitaldaemon.com>, van eeshan says...

Currently, if you want to quickly allocate an array on the
stack you must do something like the following:

  char[] array = (cast(char *) alloca(Size))[0..Size];
My experiments seem to suggest that the following actually works rather well.
Aug 12 2004
next sibling parent "van eeshan" <vanee hotmail.net> writes:
Now that is exactly what the compiler ought to (optionally) provide for, at
the high level :-)

16 bytes allocations (versus relative execution time):
heap:      269
stack[]:   50
alloca():  57
asm:       6

2K allocations:
heap:      504
stack[]:   5681
alloca():  57
asm:       10

32K allocations:
heap:      520
stack[]:   8812
alloca():  118
asm:       10

There's that 32KB stack-array initialization being ~88000% slower than the
traditional [asm] C/C++ uninitialized style. Barring further issue, I will
use that ~ thanks very much Jill. Perhaps one of the Wiki owners will take
an interest in noting these workarounds.



"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cff6a9$1ugb$1 digitaldaemon.com...
 In article <cfete2$1p7g$1 digitaldaemon.com>, van eeshan says...

Currently, if you want to quickly allocate an array on the
stack you must do something like the following:

  char[] array = (cast(char *) alloca(Size))[0..Size];
My experiments seem to suggest that the following actually works rather
well.










Aug 12 2004
prev sibling parent "van eeshan" <vanee hotmail.net> writes:
I've been trying to convert this to a mixin, but not having much luck. Guess
I don't know enough about templates ... can you point me in the right
direction please Jill? Or will a mixin always place this inside its own
private function scope (and thus restore ESP before the array can be used) ?


"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cff6a9$1ugb$1 digitaldaemon.com...
 In article <cfete2$1p7g$1 digitaldaemon.com>, van eeshan says...

Currently, if you want to quickly allocate an array on the
stack you must do something like the following:

  char[] array = (cast(char *) alloca(Size))[0..Size];
My experiments seem to suggest that the following actually works rather
well.










Aug 12 2004