
digitalmars.D - newCTFE gets a 10x faster string concat

Stefan Koch <uplink.coder googlemail.com> writes:
Hi there,

in preparation for my little talk/demo of newCTFE I have worked 
on a few things to make it less embarrassing.
Consider the following code:
```d
string makeBigString(int N)
{
     string x = "this is the string I want to append\n";
     string result = "";
     foreach(_; 0 .. N)
     {
         result ~= x;
     }
     return result;
}

pragma(msg, makeBigString(short.max / 4).length);
```
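`pragma(msg, …)` needs its argument at compile time, so the call above runs entirely in the CTFE interpreter. A minimal way to reproduce the same workload without `pragma(msg)` is to force CTFE through an `enum` initializer and check the result with `static assert`. In this sketch `pieceLen`, `ctfeLen` and `runtimeLen` are hypothetical names, not taken from the post:

```d
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach (_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// Hypothetical helper: length of the appended piece (36 bytes).
enum size_t pieceLen = "this is the string I want to append\n".length;
static assert(pieceLen == 36);

// An enum initializer must be known at compile time, which forces CTFE.
enum size_t ctfeLen = makeBigString(short.max / 4).length;
static assert(ctfeLen == pieceLen * (short.max / 4));

// Hypothetical helper: the same function evaluated at run time.
size_t runtimeLen(int n)
{
    return makeBigString(n).length;
}
```

`runtimeLen(short.max / 4)` evaluates the same function at run time and returns 294,876 (8191 appends of 36 bytes), matching `ctfeLen`.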

An hour ago this would have had this embarrassing outcome:
```

testStringConcat.d  -new-ctfe
   Time (mean ± σ):     831.7 ms ±  29.3 ms    [User: 320.9 ms, System: 509.5 ms]
   Range (min … max):   805.9 ms … 880.9 ms    10 runs


   Time (mean ± σ):     378.2 ms ±  12.1 ms    [User: 102.6 ms, System: 274.9 ms]
   Range (min … max):   366.7 ms … 400.0 ms    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d' ran
     2.20 ± 0.10 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe'
```

In other words, newCTFE was about twice as slow as the old interpreter.
And if you had written
```d
pragma(msg, makeBigString(short.max).length);
```
you would have gotten something even more embarrassing:

`core.exception.AssertError src/dmd/ctfe/bc.d(3675): !!! HEAP OVERFLOW !!!`

I have fixed that now, and as of a few moments ago the results look different.

For
```d
pragma(msg, makeBigString(short.max/4).length);
```

you now get:
```

testStringConcat.d  -new-ctfe
   Time (mean ± σ):      55.3 ms ±   2.7 ms    [User: 40.4 ms, System: 14.7 ms]
   Range (min … max):    48.2 ms …  63.8 ms    50 runs


   Time (mean ± σ):     387.6 ms ±  16.6 ms    [User: 112.0 ms, System: 274.6 ms]
   Range (min … max):   372.5 ms … 420.9 ms    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe' ran
     7.01 ± 0.45 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

And for `pragma(msg, makeBigString(short.max).length);`:

```

testStringConcat.d  -new-ctfe
   Time (mean ± σ):     498.6 ms ±  16.0 ms    [User: 209.3 ms, System: 287.7 ms]
   Range (min … max):   481.8 ms … 523.6 ms    10 runs


   Time (mean ± σ):      5.094 s ±  0.130 s    [User: 995.8 ms, System: 4086.8 ms]
   Range (min … max):    4.909 s …  5.270 s    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe' ran
    10.22 ± 0.42 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

That is the 10x speedup I was talking about.
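Each Summary line is the ratio of the two mean run times. Assuming hyperfine estimates the spread with the usual first-order error propagation for a quotient (our assumption; it does reproduce the three summaries above to within rounding), the figures can be recomputed as follows. `Speedup` and `relativeSpeed` are hypothetical names:

```d
import std.math : sqrt;

/// Hypothetical result type: speedup of the faster command and its spread.
struct Speedup
{
    double ratio;  // how many times faster the fast command is
    double stddev; // first-order uncertainty of that ratio
}

/// Hypothetical helper: ratio of two mean run times with propagated spread.
/// For a quotient, relative standard deviations add in quadrature.
Speedup relativeSpeed(double meanSlow, double sdSlow,
                      double meanFast, double sdFast)
{
    immutable ratio = meanSlow / meanFast;
    immutable relSlow = sdSlow / meanSlow;
    immutable relFast = sdFast / meanFast;
    return Speedup(ratio, ratio * sqrt(relSlow * relSlow + relFast * relFast));
}
```

With times in milliseconds, `relativeSpeed(5094.0, 130.0, 498.6, 16.0)` gives a ratio of about 10.22 with a spread of about 0.42, which is the last summary above.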

If you want to know how I was able to speed it up, attend my demonstration at beerconf on Saturday.

P.S.
In terms of memory use, we are looking at `1.3 GB` for `newCTFE` and `18.1 GB` for "oldCTFE", which is roughly a 14x difference.
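The post does not say where the memory goes. As a back-of-the-envelope model only (an assumption, not a description of either interpreter), suppose every `result ~= x` allocated a fresh buffer for the whole accumulated string and nothing was ever freed. Then N appends of a 36-byte piece allocate 36·N(N+1)/2 bytes in total. `copyEverythingBytes` and `gib` are hypothetical names:

```d
/// Hypothetical model: total bytes allocated by `n` appends of a
/// `piece`-byte string when each append copies the entire result into a
/// new buffer and no memory is reclaimed:
/// piece * (1 + 2 + ... + n) = piece * n * (n + 1) / 2.
ulong copyEverythingBytes(ulong n, ulong piece)
{
    return piece * n * (n + 1) / 2;
}

/// One gibibyte, for converting the result.
enum ulong gib = 1UL << 30;
```

For N = `short.max` (32767) the model gives 19,326,763,008 bytes, about 18.0 GiB; for N = `short.max / 4` (8191) it gives 1,207,812,096 bytes, about 1.1 GiB. The first figure happens to be close to the 18.1 GB quoted for oldCTFE, but that closeness alone is not evidence of what the old interpreter actually does.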

Cheers,
Stefan
Sep 23
Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 23 September 2021 at 13:01:33 UTC, Stefan Koch wrote:
 Hi there,
 [ ... 10x difference  bla bla ...]
Of course, by varying the test cases it is possible to get an almost arbitrary speedup.

```
testStringConcat.d -new-ctfe
   Time (mean ± σ):     160.3 ms ±   2.8 ms    [User: 121.6 ms, System: 38.4 ms]
   Range (min … max):   154.1 ms … 164.9 ms    18 runs

   Time (mean ± σ):      6.538 s ±  0.105 s    [User: 3.253 s, System: 3.276 s]
   Range (min … max):    6.450 s …  6.768 s    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
    40.79 ± 0.96 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

The highest I have been able to get it is 50x; beyond that, the old interpreter runs out of memory and freezes my computer.

The code for the benchmark is:

```d
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach(_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// pragma(msg, makeBigString(cast(uint)(short.max * 1.91)).length);
// max for newCTFE; we run out of 32-bit address space after this
// commented out because without newCTFE we just crash

int[] crappyIota(int N)
{
    int[] result = [];
    foreach(i; 0 .. N)
    {
        result ~= i;
    }
    return result;
}

pragma(msg, crappyIota(short.max).length + crappyIota(short.max)[$-1]);
pragma(msg, makeBigString(cast(uint)(short.max / 4)).length);
pragma(msg, makeBigString(cast(uint)(short.max / 2)).length);
```

As you can see, `makeBigString(cast(uint)(short.max * 1.91)).length` is the most I can test at all, since the newCTFE VM uses a 31-bit heap address space, as half of the space is reserved for the stack.
I intend to change the 2GB/2GB split to a 3.498 GB / 0.512 GB split, but I haven't done that yet.

For the example above, newCTFE uses 60 times less memory than the current interpreter.
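The limits in that paragraph can be spelled out with compile-time arithmetic. This sketch only restates the post's numbers (a 32-bit VM address space split evenly, which leaves a 31-bit heap) together with the largest input that could be tested; all enum names are hypothetical:

```d
// Hypothetical names; the numbers are taken from the post.
enum ulong addressSpace = 1UL << 32;              // 32-bit VM address space: 4 GiB
enum ulong heapBytes = 1UL << 31;                 // 31-bit heap addresses: 2 GiB
enum ulong stackBytes = addressSpace - heapBytes; // the other half is stack
static assert(heapBytes == stackBytes);           // the current 2 GiB / 2 GiB split

// Largest N tested: 32767 * 1.91 = 62584.97, truncated by the cast.
enum uint maxN = cast(uint)(short.max * 1.91);
static assert(maxN == 62_584);

// Length of the final string for that N (36 bytes per append).
enum size_t pieceLen = "this is the string I want to append\n".length;
enum size_t finalLen = maxN * pieceLen;
static assert(finalLen == 2_253_024);
```

So the final string for the largest input is only about 2.25 MB, a small fraction of the 2 GiB heap.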
Sep 23