digitalmars.D - DMD Performance Regression Publisher [GSoC 2026]

Abul Hossain Khan <abulkhan19175 gmail.com> writes:
Hi everyone,
I am working on the Performance Regression Publisher project 
under the mentorship of Dennis.

My progress so far: the initial end-to-end pipeline has been 
built and is working on my fork. The bot builds DMD at a PR's 
merge-base and at its head, measures a small set of metrics under 
cachegrind, and posts a single sticky comment with the diff.

**What's done**

The harness is in `tools/perfrunner/` and is written in D (as a 
dub project):

- `app.d`: the CLI. It takes the two already-built dmd binaries 
plus metadata and writes `results.json`.
- `cachegrind.d`: runs the compile under cachegrind and reads 
the instruction count.
- `metrics.d`: the five metrics below.
- `report.d` / `stats.d`: the schema-v1 JSON and the 
percent-delta math.
- `workloads/hello.d`: the single workload for now; more will be 
added soon.
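
To illustrate the "reads the instruction count" step, here is a minimal sketch in Python (the real harness is in D, and this is not its code). It assumes the count is taken from the `events:` and `summary:` lines of a cachegrind output file; the function name and the sample file contents are made up for illustration.

```python
def total_instructions(cachegrind_out: str) -> int:
    """Return the total Ir (instructions executed) from cachegrind output text."""
    events = None
    for line in cachegrind_out.splitlines():
        if line.startswith("events:"):
            # Column names for every count line, e.g. "events: Ir I1mr ILmr"
            events = line.split()[1:]
        elif line.startswith("summary:"):
            if events is None:
                raise ValueError("summary line seen before events line")
            totals = line.split()[1:]
            # Pick the total that sits in the same column as "Ir"
            return int(totals[events.index("Ir")])
    raise ValueError("no summary line found")


# Made-up sample in the shape of a cachegrind.out.<pid> file
sample = """\
desc: I1 cache: ...
cmd: ./dmd -c hello.d
events: Ir
fl=main.d
fn=main
1 100
summary: 422900000
"""

print(total_instructions(sample))
```

Looking up the `Ir` column by name rather than by position keeps the sketch working when cache simulation is enabled and more event columns appear.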

Around it, `.github/workflows/perf.yml` runs on every PR (and on 
pushes to master), builds both refs with the 
existing `ci/run.sh`, runs the harness, and hands the result 
to `.github/scripts/perf_comment.py`, which upserts one sticky 
comment so force-pushes don't spam the thread.
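
The upsert idea can be sketched roughly as follows. This is not the actual `perf_comment.py`; it only shows the decision logic, assuming the bot tags its comment with a hidden HTML marker. The marker string and the `plan_upsert` name are hypothetical, and the comment dicts only mimic the `id` and `body` fields of GitHub's issue-comment objects.

```python
# Hypothetical hidden marker; the real script may identify its comment differently.
MARKER = "<!-- dmd-perf-report -->"


def plan_upsert(existing_comments, report_markdown):
    """Decide whether to update the existing sticky comment or create a new one.

    existing_comments: list of dicts with "id" and "body" keys.
    """
    body = MARKER + "\n" + report_markdown
    for comment in existing_comments:
        if MARKER in (comment.get("body") or ""):
            # A previous report exists: overwrite it instead of posting again
            return {"action": "update", "comment_id": comment["id"], "body": body}
    return {"action": "create", "comment_id": None, "body": body}


comments = [
    {"id": 1, "body": "LGTM"},
    {"id": 2, "body": MARKER + "\nold report"},
]
print(plan_upsert(comments, "new report")["action"])      # update
print(plan_upsert(comments[:1], "new report")["action"])  # create
```

Because the marker travels inside the comment body, no state has to be stored between workflow runs, which is what keeps force-pushes from producing extra comments.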

**The metrics it reports right now (as shown in the PR comment):**

| Metric | Base | PR | delta |
|--------|------|----|-------|
| compile hello.d (instr) | 422.9 M | 457.9 M | +8.27% |
| compile hello.d -O (instr) | 446.1 M | 481.1 M | +7.84% |
| dmd binary size (stripped) | 11.91 MB | 11.91 MB | 0.00% |
| hello binary size | 0.72 MB | 0.72 MB | 0.00% |
| peak RSS (compile hello.d) | 56 MB | 55 MB | -2.17% |


I tested the whole path end to end on my fork with a deliberate 
busy loop in `compiler/src/main.d`: the bot reported +8.27% / 
+7.84% instructions consistently across reruns, while binary size 
stayed flat.
Code - https://github.com/abulgit/dmd/pull/39

**What's next**

1. We are currently building both binaries with DMD 2.112.0 as 
the host compiler. Dennis suggested moving to `ldc2 -O3` with PGO 
so that the binary we measure matches a real release build and 
the optimizer doesn't make a harmless PR look like a regression. 
He also suggested doing this early, in case valgrind has any 
trouble with ldc2.
2. After that, we will add more realistic workloads, such as 
Phobos.
3. Then we will build the dashboard that shows the historical 
data.

That's the plan we have right now, and I'll try to post weekly 
updates here as work progresses.
Feel free to leave any feedback or suggestions!
Jun 12
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Friday, 12 June 2026 at 18:58:03 UTC, Abul Hossain Khan wrote:
 Hi everyone,
 I am working on the Performance Regression Publisher project 
 under the mentorship of Dennis.

 [...]
Have you thought about using Linux’s perf tool? It has interesting stats about performance counters. Valgrind is a sensible tool, but being intrusive, it may distort the profile a little bit.
Jun 15
Abul Hossain Khan <abulkhan19175 gmail.com> writes:
 Have you thought about using Linux’s perf tool it has 
 interesting stats about performance counters. Valgrind is 
 sensible tool but being intrusive it may distort profile a 
 little bit.
We started with Valgrind because it's giving us quite good and stable results so far, but we'll definitely look into perf as well, especially once we switch to ldc2 + PGO and can compare the approaches.
Jun 16
Abul Hossain Khan <abulkhan19175 gmail.com> writes:
Hi everyone,

Last time I posted, we had the basic pipeline working: the bot 
builds DMD at the base and head commits, runs Cachegrind, and 
posts a sticky comment. At that point, everything was being built 
with DMD 2.112.0 as the host compiler.



Now we are building with `ldc2-1.42.0` as the host compiler, so 
the binary we measure more closely resembles a real release 
build. I got that working, and the numbers improved quite a bit.

| Metric                     | DMD Host | LDC2 Host | Improvement |
| -------------------------- | -------: | --------: | ----------: |
| compile hello.d (instr)    |  422.8 M |   257.6 M |      -39.1% |
| compile hello.d -O (instr) |  445.9 M |   277.9 M |      -37.7% |
| dmd binary size (stripped) | 11.93 MB |   6.96 MB |      -41.7% |
| hello binary size          |  0.72 MB |   0.72 MB |           - |
| peak RSS                   |    56 MB |     49 MB |      -12.5% |

Switching the host compiler from DMD to LDC2 reduced instruction 
counts by about 38-39%, cut the compiler binary size by about 
42%, and reduced peak memory usage by 12.5%. The results are also 
very consistent across reruns, and the workflow runs faster as 
well.
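
For anyone who wants to check the percentages, the arithmetic is just the relative change against the base value. Below is a small Python sketch of that math (the function name is made up; the harness itself does this in `stats.d`), fed with the rounded values from the table above.

```python
def percent_delta(base: float, new: float) -> float:
    """Relative change from base to new, in percent (negative means smaller)."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return (new - base) / base * 100.0


# (metric, DMD host value, LDC2 host value), copied from the table above
rows = [
    ("compile hello.d (instr)", 422.8, 257.6),
    ("compile hello.d -O (instr)", 445.9, 277.9),
    ("dmd binary size (stripped)", 11.93, 6.96),
    ("peak RSS", 56, 49),
]

for name, dmd_host, ldc_host in rows:
    print(f"{name}: {percent_delta(dmd_host, ldc_host):+.1f}%")
```

The bot presumably computes deltas from the raw counts rather than from the rounded display values, so the last digit can differ slightly from what a hand calculation on the displayed numbers gives.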

My mentor also suggested adding PGO (Profile-Guided 
Optimization) to the LDC2 build for even better results. I 
didn't know much about PGO at first, but Dennis helped me 
understand it and shared his local PGO-enabled build script with 
me. I think I've almost figured it out, so that will be the next 
thing I work on.
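
For readers new to PGO, the general shape is: build an instrumented binary, run a training workload, merge the raw profile, then rebuild using it. The Python sketch below only assembles the command lines for a generic LDC project and does not run anything. It is not Dennis's script and not how DMD itself is built; the file names, the `pgo_steps` helper, and the `--benchmark` argument are hypothetical. The `-fprofile-instr-generate`, `-fprofile-instr-use`, and `ldc-profdata merge` pieces are the ones LDC documents for PGO.

```python
def pgo_steps(sources, training_args, out="app",
              raw="pgo.profraw", merged="pgo.profdata"):
    """Return the four command lines of a generic LDC PGO build (not executed)."""
    instrumented = out + "-instrumented"
    return [
        # 1. Build an instrumented binary that writes a raw profile when run
        ["ldc2", "-O3", "-fprofile-instr-generate=" + raw,
         "-of=" + instrumented, *sources],
        # 2. Run a representative training workload to collect the profile
        ["./" + instrumented, *training_args],
        # 3. Merge the raw profile into the format the compiler consumes
        ["ldc-profdata", "merge", "-output=" + merged, raw],
        # 4. Rebuild, letting the optimizer use the collected profile
        ["ldc2", "-O3", "-fprofile-instr-use=" + merged,
         "-of=" + out, *sources],
    ]


for step in pgo_steps(["app.d"], ["--benchmark"]):
    print(" ".join(step))
```

The training workload matters: the profile only helps on code paths it actually exercised, so it should resemble the compiles being measured.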



* Finish the PGO integration ASAP.
* Add more realistic workloads, such as Phobos.
* Then start working on the dashboard to display historical 
results.



**Semester exam**

I will have my semester exams from June 27 to July 3, so I will 
be unavailable for about a week. Hopefully that's okay. I'll make 
up for it afterward if needed, and I will make sure to finish the 
project on time (or even ahead of schedule).

Also, I want to thank my mentor, Dennis. He has been incredibly 
supportive throughout.

As always, feel free to leave any feedback or suggestions!
Jun 25
monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 25 June 2026 at 17:51:10 UTC, Abul Hossain Khan 
wrote:
 Hi everyone,

 Last time I posted, we had the basic pipeline working: the bot 
 builds DMD at the base and head commits, runs Cachegrind, and 
 posts a sticky comment. At that point, everything was being 
 built with DMD 2.112.0 as the host compiler.

 [...]
binary size of dmd is much less important than compile speed
Jun 25
FinalEvilution <FinalEvilution gmail.com> writes:
On Thursday, 25 June 2026 at 17:51:10 UTC, Abul Hossain Khan 
wrote:

 Now as my mentor also suggested adding PGO (Profile Guided 
 Optimization) with the LDC2 build for even better results. I 
 didn't know much about PGO at first, but Dennis helped me 
 understand it and also shared his local build script with PGO 
 enabled with me. I think I've almost figured it out, so that 
 will be the next thing I work on.
+1 for PGO. As far as I'm concerned, everyone who is seriously optimizing for performance should be using PGO. Modifying a build script to run the benchmark suite (if you don't have one, what are you optimizing?) and compile a second time is a pretty simple change, and I've seen up to a 30% decrease in run time from just a makefile edit.
 **Semester exam**
 Yes I will have my semester exams from June 27 to July 3, so I 
 will be unavailable for about a week.
You're going to ace it.
 As always, feel free to leave any feedback or suggestions!
If memory serves, perf has the ability to diff the performance of two separate profiles on a per-function basis. Might be handy.
Jun 26