
digitalmars.D.bugs - array .dup bug in 0.97 and 0.98? - matrix_dup_bug.zip

reply Dave <Dave_member pathlink.com> writes:
Wow! Great language Walter!!

Please see attached. It contains the .d and two .cod files (one with the bug and
one without). Comments in the .d file explain the potential problem fairly well,
hopefully.

Also, would you like reports and examples on performance issues, should I run
into any at this stage? I would be more than happy to take the time to provide
this info if it is wanted.

Thanks,
Dave
Aug 05 2004
parent reply Nick <Nick_member pathlink.com> writes:
In article <cetin9$1oo1$1 digitaldaemon.com>, Dave says...

Wow! Great language Walter!!

Please see attached. It contains the .d and two .cod files (one with the
bug and one without). Comments in the .d file explain the potential
problem fairly well, hopefully.

I'm not quite sure I understand the problem. Are you trying to copy a double array with a dup? In a double array such as int[][] mm, each element mm[i] is a reference to an array int[]. Writing mm.dup gives you a copy of the table of row references, but those references still point to the same actual data. If you want to copy the entire matrix you would write something like

# int[][] copyMatrix(int[][] m)
# {
#     int[][] m2;
#     m2.length = m.length;
#     foreach(int i, inout int[] row; m2)
#         row = m[i].dup;
#     return m2;
# }
#
# ...
# m1 = copyMatrix(m2);

Incidentally, you can also write the last part as m1 = m2.copyMatrix(), but this only works on arrays (as discussed recently on one of these NGs).

Nick
Aug 05 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <cetnkl$1r0r$1 digitaldaemon.com>, Nick says...
In article <cetin9$1oo1$1 digitaldaemon.com>, Dave says...

I'm not quite sure I understand the problem. Are you trying to copy a double
array with a dup? In a double array such as int[][] mm, each element mm[] is a
pointer to an array int[]. Writing mm.dup gives you a copy of the table of row
pointer, but the pointers still point to the same actual data. If you want to
copy the entire matrix you would write something like

Yes, I was trying to copy the whole matrix with a dup on m1. I guess what you say makes sense (if you are used to the C/C++ 'way', at least), but I gotta say it is not explained well in the docs, the compiler didn't complain at all, and the thing didn't crash at runtime. Besides that, I tried this:

int[][] m1 = mkmatrix(SIZE, SIZE); // mkmatrix allocates/init's each element
printf("%d, %d, %d\n", m1.length, m1[0].length, m1[m1.length - 1].length);
printf("%d %d %d %d\n", m1[0][0], m1[2][3], m1[3][2], m1[4][4]);

int[][] mx = m1.dup;
printf("%d, %d, %d\n", mx.length, mx[0].length, mx[mx.length - 1].length);
printf("%d %d %d %d\n", mx[0][0], mx[2][3], mx[3][2], mx[4][4]);

The results were the same for both:

30, 30, 30
1 64 93 125
30, 30, 30
1 64 93 125

The way I look at it - and how the compiler apparently looks at it - is that m1 and mx are both of type int[][], so a 'dup' applied directly to m1 should allocate for and copy the entire thing.

Either way, there is a bug somewhere in the compiler: either it is copying the whole matrix as it is supposed to and the buglet is further along in the code, or it is copying the entire matrix and is _not_ supposed to do that. If it is not supposed to allocate and copy everything, then at least a warning should be issued, because the m1.dup code is pretty intuitive (and nice! if that is how it is supposed to work).

I think the dup part is working the way it should but the buglet is further along in the code, like in the mmult(...) part. Maybe the compiler is getting references mixed up after a 'copy on write' or something...

- Dave
Aug 05 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <cetvkv$1vpp$1 digitaldaemon.com>, Dave says...
In article <cetnkl$1r0r$1 digitaldaemon.com>, Nick says...
In article <cetin9$1oo1$1 digitaldaemon.com>, Dave says...

I'm not quite sure I understand the problem. Are you trying to copy a double
array with a dup? In a double array such as int[][] mm, each element mm[] is a
pointer to an array int[]. Writing mm.dup gives you a copy of the table of row
pointer, but the pointers still point to the same actual data. If you want to
copy the entire matrix you would write something like

Yes, I was trying to copy the whole matrix with a dup. on m1.

I apologize - you are exactly correct about what is happening. I guess it would not necessarily be considered a bug, and my reply just confused things more. After the dup and subsequent writes to mm, the m1 data was being modified to match mm, as you implied.

What was further confusing me was the copy-on-write stuff, for example for operations on char[]. Which begs the question: since 1-D arrays are copy-on-write, shouldn't the 2nd dimension in a dup'ed matrix also be copy-on-write? Or does copy-on-write only apply to char[]s and not to any other type of array?

I think that is really non-intuitive. I think of dup as a synonym for 'copy', and most object.copy() type methods I've used and written actually make a deep copy of the object and its contents (in this case, I'm looking at an int[][] as an object).

- Dave
Aug 05 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Thu, 5 Aug 2004 20:54:08 +0000 (UTC), Dave <Dave_member pathlink.com> 
wrote:
 In article <cetvkv$1vpp$1 digitaldaemon.com>, Dave says...
 In article <cetnkl$1r0r$1 digitaldaemon.com>, Nick says...
 In article <cetin9$1oo1$1 digitaldaemon.com>, Dave says...

 I'm not quite sure I understand the problem. Are you trying to copy a 
 double
 array with a dup? In a double array such as int[][] mm, each element 
 mm[] is a
 pointer to an array int[]. Writing mm.dup gives you a copy of the 
 table of row
 pointer, but the pointers still point to the same actual data. If you 
 want to
 copy the entire matrix you would write something like

Yes, I was trying to copy the whole matrix with a dup. on m1.

I apologize - you are exactly correct in what is happening. I guess it would not necessarily be considered a bug and my reply just confused things more. After the dup and subsequent writes to mm, the m1 data was being modified to be the same as mm, as you implied.

What was further confusing me was the copy-on-write stuff, for example, for operations on char[].

char[] doesn't exactly have copy-on-write semantics. For example:

char[] a = "regan";
char[] b = a;
b[0] = 'm';
assert(b[0] == a[0]);

will _not_ fail the assert; this is because a and b are references, in this case to the same data. Similarly, this:

char[] a = "regan";
char[] b = a[1..3];
b[0] = 'o';
assert(a[1] == b[0]);

will also not fail, because b simply references the same data as a. This however:

char[] a = "regan";
char[] b;
b = a ~ " was here";

will copy the contents of a for b and append the new data. The docs specifically state that an append will copy.
 Which begs the question, since 1-D arrays are copy-on-write, shouldn't 
 the 2nd
 dimension in a dup'ed matrix also be copy-on-write?? Or does 
 copy-on-write only
 apply to char[]'s and not any other type of array??

An 'int[][]' is a reference to an array of references to arrays of ints. So given:

int[][] a;

a.dup says: duplicate the data 'a' references, which is an array of references to arrays of ints. The array of references gets duplicated, but the arrays of ints they refer to do not. Nick's function was duplicating the arrays of ints as well.
 I think that is really non-intuitive. I think of dup as a synonym for 
 'copy' and
 most object.copy() type of methods I've used and written actually make a 
 deep
 copy of the object and it's contents (in this case, I'm looking at an 
 int[][] as
 an object).

And it does - the difference is what is considered the object in this case.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 05 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <opsb99pvnp5a2sq9 digitalmars.com>, Regan Heath says...
And it does, the difference is what is considered the object in this case.

Regan

Got it - thanks for the quick replies.

In that same attachment with the original post, there is a performance comparison between dmd, dmc, Intel and gcc for that code. I suspect the question of array performance is going to be on many potential users' minds, and gcc and Intel perform 3x better for this. Of course, Java is still living down a bad-performance reputation even though most runtimes are now pretty good, at least for native data types, and I would like to see D avoid that rap. I know this code is kind of trivial, artificial, etc., but a 3x difference is still pretty large (and yes, I used -O -inline -release).

Not complaining - just trying to make the language and tools better. Actually, I am quite pleased with things like OutBuffer performance. Other things, like the native associative arrays, seem pretty slow compared to C++ hash_map<> and Java HashMap.

Does anyone know if there are planned improvements before what I presume will be the big release of 1.0? Is there any "roadmap" around I could look at? Thanks.
Aug 05 2004
next sibling parent J C Calvarese <jcc7 cox.net> writes:
Dave wrote:
 In article <opsb99pvnp5a2sq9 digitalmars.com>, Regan Heath says...
 
And it does, the difference is what is considered the object in this case.

Regan

Got it - thanks for the quick replies.

In that same attachment with the original post, there is a performance comparison between dmd, dmc, Intel and gcc for that code. I suspect the question of array performance is going to be on many potential users minds and gcc and intel perform 3x better for this. Of course, Java is still living bad perf. down even though most runtimes are now pretty good, at least for native data types, and I would like to see D avoid that wrap. I know, this code is kind-of trivial, artificial, etc. but a 3x difference is still pretty large (and yes, I used -O -inline -release).

Not complaining - just trying to make the language and tools better. Actually am quite pleased with things like OutBuffer performance, etc. Other things, like the native associative arrays though seem pretty slow compared to C++ hash_map<> and Java HashMap().

Does anyone know if there are planned improvements before what I presume will be the big release of 1.0? Is there any "roadmap" around I could look at? Thanks.

From: http://www.digitalmars.com/d/future.html

Future Directions

The following new features for D are planned, but the details have not been worked out:
1. Mixins.
2. Template inheritance.
3. Array literal expressions.

Since #1 is essentially done, this page is likely out of date. I get the impression that once the important bugs are squashed (and don't ask me which ones are important, because I don't know), it'll be stamped D 1.0 and sent out the door. I suspect that the only major features that might be added before 1.0 would be the ones inspired/demanded by the D Template Library project.

Then, I think Walter's planning on adding optimizations in the trek to 2.0. And new features that fit into the D paradigm would be accumulated, too.

-- 
Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Aug 05 2004
prev sibling parent reply Juanjo =?ISO-8859-15?Q?=C1lvarez?= <juanjuxNO SPAMyahoo.es> writes:
Dave wrote:

 In article <opsb99pvnp5a2sq9 digitalmars.com>, Regan Heath says...
And it does, the difference is what is considered the object in this case.

Regan

Got it - thanks for the quick replies.

In that same attachment with the original post, there is a performance comparison between dmd, dmc, Intel and gcc for that code. I suspect the question of array performance is going to be on many potential users minds and gcc and intel perform 3x better for this. Of course, Java is still living bad perf. down even though most runtimes are now pretty good, at least for native data types, and I would like to see D avoid that wrap. I know, this code is kind-of trivial, artificial, etc. but a 3x difference is still pretty large (and yes, I used -O -inline -release).

Not complaining - just trying to make the language and tools better. Actually am quite pleased with things like OutBuffer performance, etc. Other things, like the native associative arrays though seem pretty slow compared to C++ hash_map<> and Java HashMap().

Does anyone know if there are planned improvements before what I presume will be the big release of 1.0? Is there any "roadmap" around I could look at? Thanks.

Somebody with time should implement and run the "Great Language Shootout" (updated version) to find problems like that in the current DMD compiler.

http://shootout.alioth.debian.org/

PS: How does GDC compare to DMD on array handling?
Aug 06 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <cevk5l$3e4$1 digitaldaemon.com>, Juanjo =?ISO-8859-15?Q?=C1lvarez?=
says...
Somebody with time should implement and run the "Great Language
Shootout" (updated version) to find problems like that in the current DMD
compiler.

http://shootout.alioth.debian.org/

PS: How GDC compare to DMD on array handling?

Where is the updated version? I've already run half the tests, thanks to a quick start from http://www.functionalfuture.com/d/

For this particular test, GDC actually took over 2x as long, compiled with '-O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4 -static'. Based on similar differences between g++ and gcj, I think this has a lot to do with how the frontend is implemented on the gcc backend and doesn't have anything to do with D itself (gdc is brand-new, right?).

I would be happy to start on this comparison (but I can't 'promise' that I'll have the time to finish it anytime real soon). I therefore need to know thoughts on:

- Use the built-in associative array, or not? If not, is there a DTL hash_map-like class I should use? I would vote for the built-in AA because that's what'll be used most, and so far it seems pretty slow.
- Use the built-in string, OutBuffer, or a DTL string (if there is one)?
- I plan to always use the built-in arrays for the numeric stuff, unless somebody can give me a real good reason to use something else (like DTL). Or...
- Just make it go as fast as possible (barring inline assembler), no matter what the intent of the language design?

Thanks..
Aug 06 2004
parent reply Juanjo =?ISO-8859-15?Q?=C1lvarez?= <juanjuxNO SPAMyahoo.es> writes:
Dave wrote:


Somebody with time should implement and run the "Great Language
Shootout" (updated version) to find problems like that in the current DMD
compiler.

http://shootout.alioth.debian.org/

PS: How GDC compare to DMD on array handling?

Where is the updated version? I've already run half the tests, thanks to a quick start from http://www.functionalfuture.com/d/

The URL posted is the updated version.
 I would be happy to start on this comparison (but I can't 'promise' that
 I'll have the time to finish it anytime real soon).
 
 I therefore need to know thoughts on:
 
 - Use the built-in Associative Array, or not? If not, is there a DTL
 hash_map-like class I should use? I would vote for the built-in AA because
 that's what'll be used most and so far it seems pretty slow.

The objective would be to fix the slow parts of the language; if the native associative array is damn slow, use it so maybe big W fixes it :)
 - Use the built-in string, OutBuffer or DTL string (if there is one)?
 - I plan to always use the built-in arrays for the numeric stuff, unless
 somebody can give me a real good reason to use something else (like DTL).

Native; it will show better how nice string manipulation can be in D.
 - Just make it go as fast as possible (barring inline assembler) no matter
 what the intent of the language design?

Please, not.
Aug 06 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <cf13rk$14jn$1 digitaldaemon.com>, Juanjo =?ISO-8859-15?Q?=C1lvarez?=
says...
Dave wrote:


Somebody with time should implement and run the "Great Language
Shootout" (updated version) to find problems like that in the current DMD
compiler.

http://shootout.alioth.debian.org/

PS: How GDC compare to DMD on array handling?

Where is the updated version? I've already run half the tests, thanks to a quick start from http://www.functionalfuture.com/d/

The url posted is the one of the updated version.

Can't believe I missed it the first time - been a long week, TGIF.
The objective would be to fix the slow parts of the language, if the native
associative array is damn slow use it so maybe big W fix it :)

Makes total sense, but being new to the language I thought I'd ask, because I don't know the history or intended use of the built-ins. For example, Java has String too, but it is not meant for heavy concatenation; the recommendation has always been to use StringBuffer (and now StringBuilder in Java 1.5, when it doesn't need to be thread synchronized).

I will use the built-in AA, string, and indexed arrays for everything applicable, but I bet when the comparisons come out, I'll get lots of "hey, you should've used class X instead" type complaints ;).

- Dave
Aug 06 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Sat, 7 Aug 2004 03:39:21 +0000 (UTC), Dave <Dave_member pathlink.com> 
wrote:
 In article <cf13rk$14jn$1 digitaldaemon.com>, Juanjo 
 =?ISO-8859-15?Q?=C1lvarez?=
 says...
 Dave wrote:


 Somebody with time should implement and run the "Great Language
 Shootout" (updated version) to find problems like that in the current 
 DMD
 compiler.

 http://shootout.alioth.debian.org/

 PS: How GDC compare to DMD on array handling?

Where is the updated version? I've already run half the tests, thanks to a quick start from http://www.functionalfuture.com/d/

The url posted is the one of the updated version.

Can't believe I missed it the first time - been a long week, TGIF.
 The objective would be to fix the slow parts of the language, if the 
 native
 associative array is damn slow use it so maybe big W fix it :)

Makes total sense, but being new to the language I thought I'd ask because I don't know the history or intended use of the built-ins. For example, Java has String also, but it is not meant for heavy concatenation and the recommendation has always been to use StringBuffer (and now StringBuilder for Java1.5 when it doesn't need to be thread sychronized).

The built-in D strings do not handle concatenation too well either; they reallocate each time, creating only enough space for each concatenation. You can, however, use a little trick like so:

char[] s = "test";
int keep;

keep = s.length;
s.length = 1000;
s.length = keep;
s = s ~ "a";

to pre-allocate the space you think you might need. However, there is no guarantee it won't release that memory at some point during your concatenations.

I have requested a .reserve property for strings that reserves space as the above does, but with the guarantee not to release it. It also has the benefit of a more intuitive and simple syntax, eg.

char[] s = "test";

s.reserve = 1000;
s = s ~ "a";

So far no luck.
 I will use the built-in AA, string and indexed arrays for everything 
 applicable,
 but I bet when the comparisons come out, I'll get lots of the "hey, you
 should've used class X instead" type of complaints ;).

To avoid that, comment your intent at the top of the code.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 08 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <opscftmwn55a2sq9 digitalmars.com>, Regan Heath says...
The built in D strings do not handle concatenation too well either, they 
reallocate each time creating only enough space for each concatentation. 
You can however use a little trick like so:

char[] s = "test";
int keep;

keep = s.length;
s.length = 1000;
s.length = keep;
s = s ~ "a";

to pre-allocate the space you think you might need. However there is no 
guarantee it won't release that memory at some point during your 
concatenations.

I have requested a .reserve property for strings that reserves space as 
the above does but with the guarantee not to release it. It also has the 
benefit of more intuitive and simple syntax eg.

char[] s = "test";

s.reserve = 1000;
s = s ~ "a";

If 'reserve' is added, another suggestion would be to preemptively expand the buffer as the string grows from concatenation, to lower the copying/reallocations. I believe most C++ std::string implementations do this.

Maybe for most situations the programmer would then be able to avoid using string.reserve, and performance would be better for the times it is tough to make a good guess at the size at runtime. Maybe this is how OutBuffer currently works as well.
Aug 09 2004
parent reply Juanjo =?ISO-8859-15?Q?=C1lvarez?= <juanjuxNO SPAMyahoo.es> writes:
Dave wrote:

 Maybe for most situations, the programmer would be able to avoid using
 string.reserve and performance would be better for the times it is tough
 to make a good guess at the size at runtime.

An algorithm that I've found good enough for most situations, and simple to code, is to double the reserved size of the string every time it runs out of space. I think that's also the way Python lists work.
Aug 19 2004
parent Dave <Dave_member pathlink.com> writes:
Juanjo Álvarez wrote:

 Dave wrote:
 
 Maybe for most situations, the programmer would be able to avoid using
 string.reserve and performance would be better for the times it is tough
 to make a good guess at the size at runtime.

An algorithm that I've found good enough for most situations and simple to code is to double the reserved size of the string every time it gets out of space. I think that's also the way Python lists work.

Good idea - I've written some wicked fast C++ string classes (for, among other things, 'buffering' very large dynamically generated HTML pages) that would basically add 10% - 20% (can't remember exactly) whenever things were realloc'd. This avoided a bunch of swapping once the strings got large, and the perf difference for the smaller requirements was negligible. Pretty simple algorithm, but it worked. No special attention to alignment or anything else - just realloc a bit more than requested, and perf improved dramatically in tight loops.

I've never dug into what std::basic_string<> is doing internally, but it looks to be something along those lines rather than doubling. And basic_string<> is as fast as or faster for concatenation than anything else I've seen, including other built-ins like Borland Object Pascal.

Matthew, if you happen to read this, what do the STLSoft implementations do in these cases?

Thanks,

Dave
Aug 19 2004