digitalmars.D - newCTFE gets a 10x faster string concat


Stefan Koch <uplink.coder googlemail.com> writes:

Hi there,

In preparation for my little talk/demo of newCTFE, I have worked
on a few things to make it less embarrassing.
Consider the following code:
```d
string makeBigString(int N)
{
     string x = "this is the string I want to append\n";
     string result = "";
     foreach(_; 0 .. N)
     {
         result ~= x;
     }
     return result;
}

pragma(msg, makeBigString(short.max / 4).length);
```
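Because `x` is 36 bytes long (35 visible characters plus the newline), the length printed by the `pragma` should simply be `N * 36`. A minimal sketch that checks this at compile time; the function is copied from above so the block is self-contained, and the `static assert` lines are an illustrative addition, not part of the original benchmark:

```d
// Same function as in the benchmark above.
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach (_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// The appended string is 36 bytes: 35 visible characters plus '\n'.
static assert("this is the string I want to append\n".length == 36);

// short.max / 4 is integer division: 32767 / 4 == 8191 iterations.
static assert(short.max / 4 == 8191);

// Forces CTFE of the same call the pragma uses and checks its length.
static assert(makeBigString(short.max / 4).length == 8191 * 36);

void main() {}
```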

An hour ago, this would have had this embarrassing outcome:
```

testStringConcat.d  -new-ctfe
   Time (mean ± σ):     831.7 ms ±  29.3 ms    [User: 320.9 ms, System: 509.5 ms]
   Range (min … max):   805.9 ms … 880.9 ms    10 runs


   Time (mean ± σ):     378.2 ms ±  12.1 ms    [User: 102.6 ms, System: 274.9 ms]
   Range (min … max):   366.7 ms … 400.0 ms    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d' ran
     2.20 ± 0.10 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe'
```

That is, newCTFE was more than twice as slow as the old interpreter.
And if you had written
```d
pragma(msg, makeBigString(short.max).length);
```
you would have gotten something even more embarrassing:

`core.exception.AssertError src/dmd/ctfe/bc.d(3675): !!! HEAP OVERFLOW !!!`

I have fixed that now, and as of a few moments ago the results
look quite different.

For
```d
pragma(msg, makeBigString(short.max/4).length);
```

you now get:
```

testStringConcat.d  -new-ctfe
   Time (mean ± σ):      55.3 ms ±   2.7 ms    [User: 40.4 ms, System: 14.7 ms]
   Range (min … max):    48.2 ms …  63.8 ms    50 runs


   Time (mean ± σ):     387.6 ms ±  16.6 ms    [User: 112.0 ms, System: 274.6 ms]
   Range (min … max):   372.5 ms … 420.9 ms    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe' ran
     7.01 ± 0.45 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

And for `pragma(msg, makeBigString(short.max).length);`:

```

testStringConcat.d  -new-ctfe
   Time (mean ± σ):     498.6 ms ±  16.0 ms    [User: 209.3 ms, System: 287.7 ms]
   Range (min … max):   481.8 ms … 523.6 ms    10 runs


   Time (mean ± σ):      5.094 s ±  0.130 s    [User: 995.8 ms, System: 4086.8 ms]
   Range (min … max):    4.909 s …  5.270 s    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d  -new-ctfe' ran
    10.22 ± 0.42 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

That is the 10x speedup I was talking about.

If you want to know how I was able to speed it up, attend my
demonstration at beerconf on Saturday.

P.S.
In terms of memory use, we are looking at `1.3 GB` for newCTFE
and `18.1 GB` for "oldCTFE",
which is roughly a 14x difference.
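One back-of-envelope way to see how a naive interpreter's footprint can explode on this benchmark: assume, purely as a model and not as a description of either interpreter's allocator, that every `result ~= x` copies the whole accumulated string into a fresh allocation and that no intermediate copy is ever freed. Iteration i then allocates i * 36 bytes, so N iterations allocate 36 * N(N+1)/2 bytes in total, which is quadratic in N. The function name `naiveConcatBytes` is hypothetical:

```d
import std.stdio : writefln;

/// Hypothetical model: total bytes allocated by n appends of a `chunk`-byte
/// string when each append copies the whole result and nothing is freed.
/// Iteration i (1-based) allocates a fresh buffer of i * chunk bytes.
ulong naiveConcatBytes(ulong n, ulong chunk)
{
    // n * (n + 1) is always even, so the division is exact.
    return chunk * n * (n + 1) / 2;
}

void main()
{
    enum ulong n = short.max; // 32767 iterations, as in the larger benchmark
    enum ulong chunk = 36;    // length of "this is the string I want to append\n"

    const total = naiveConcatBytes(n, chunk);
    const finalLength = n * chunk;

    // The final string is only about 1.1 MiB; the sum of all copies is about 18 GiB.
    writefln("final string: %s bytes", finalLength);
    writefln("all copies:   %s bytes (%.2f GiB)", total, total / cast(double)(1UL << 30));
}
```

Under this model the total lands close to the 18.1 GB reported above, which is suggestive but does not prove that this is what the old interpreter actually does.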

Cheers,
Stefan

Sep 23 2021

Stefan Koch <uplink.coder googlemail.com> writes:

On Thursday, 23 September 2021 at 13:01:33 UTC, Stefan Koch wrote:
> Hi there,
> [ ... 10x difference  bla bla ...]

Of course it is possible by varying the test-cases to get an 
almost arbitrary speedup.

```

testStringConcat.d -new-ctfe
   Time (mean ± σ):     160.3 ms ±   2.8 ms    [User: 121.6 ms, System: 38.4 ms]
   Range (min … max):   154.1 ms … 164.9 ms    18 runs


   Time (mean ± σ):      6.538 s ±  0.105 s    [User: 3.253 s, System: 3.276 s]
   Range (min … max):    6.450 s …  6.768 s    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
    40.79 ± 0.96 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```

The highest I have been able to get is a 50x speedup; beyond that, the
old interpreter runs out of memory and freezes my computer.
The code for the benchmark is:
```d
string makeBigString(int N)
{
     string x = "this is the string I want to append\n";
     string result = "";
     foreach(_; 0 .. N)
     {
         result ~= x;
     }
     return result;
}

// pragma(msg, makeBigString(cast(uint)(short.max * 1.91)).length);
// ^ the maximum for newCTFE: beyond this we run out of 32-bit address space.
// Commented out because without newCTFE the compiler just crashes.

int[] crappyIota(int N)
{
     int[] result = [];
     foreach(i; 0 .. N)
     {
         result ~= i;
     }
     return result;
}

pragma(msg, crappyIota(short.max).length + 
crappyIota(short.max)[$-1]);
pragma(msg, makeBigString(cast(uint)(short.max / 4)).length);
pragma(msg, makeBigString(cast(uint)(short.max / 2)).length);
```
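The value printed by the first `pragma` can be worked out by hand: `crappyIota(short.max)` holds 0 through 32766, so its length is 32767, its last element is 32766, and their sum is 65533. A small sketch that asserts this at compile time; the helper is copied from above, and the `enum` name `iota32767` is made up for illustration:

```d
// Same helper as in the benchmark above.
int[] crappyIota(int N)
{
    int[] result = [];
    foreach (i; 0 .. N)
    {
        result ~= i;
    }
    return result;
}

// Binding the call to an `enum` runs it through CTFE once.
enum iota32767 = crappyIota(short.max);

static assert(iota32767.length == 32_767); // elements 0 .. 32766
static assert(iota32767[$ - 1] == 32_766); // last element is N - 1
static assert(iota32767.length + iota32767[$ - 1] == 65_533);

void main() {}
```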

As you can see, `makeBigString(cast(uint)(short.max * 1.91)).length`
is the most I can test at all, since the newCTFE VM uses a 31-bit
heap address space: half of the space is reserved for the stack.
I mean to change the 2GB/2GB split to a 3.498 GB / 0.512 GB split,
but I haven't done that yet.
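The post does not describe the VM's memory layout beyond the 2GB/2GB split. Purely as an assumption for illustration, if the heap occupied the lower half of a 32-bit address space and the stack the upper half, the top address bit would select the region, leaving 31 bits for heap addresses. The names `heapBits`, `heapSize`, `stackSize`, and `isHeapAddress` are hypothetical and not taken from newCTFE's source:

```d
// Hypothetical layout, assumed for illustration only; not newCTFE's actual code.
// A 32-bit address space split evenly: heap in the lower half, stack in the upper.
enum uint heapBits = 31;                       // heap addresses fit in 31 bits
enum ulong heapSize = 1UL << heapBits;         // 2 GiB of heap
enum ulong stackSize = (1UL << 32) - heapSize; // the remaining 2 GiB for the stack

// Under this assumed layout, the top bit of an address selects the region.
bool isHeapAddress(uint addr)
{
    return (addr >> heapBits) == 0;
}

static assert(heapSize == 2UL * 1024 * 1024 * 1024);
static assert(stackSize == heapSize);
static assert(isHeapAddress(0x7FFF_FFFF));  // highest heap address
static assert(!isHeapAddress(0x8000_0000)); // first stack address

void main() {}
```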

For the example above, newCTFE uses 60 times less memory than the
current interpreter.

Sep 23 2021
