digitalmars.D - DMD Performance Regression Publisher [GSoC 2026]
- Abul Hossain Khan (51/51) Jun 12 Hi everyone,
- Dmitry Olshansky (4/8) Jun 15 Have you thought about using Linux’s perf tool it has interesting
- Abul Hossain Khan (4/8) Jun 16 We started with Valgrind because it's giving us quite good and
- Abul Hossain Khan (46/46) Jun 25 Hi everyone,
- monkyyy (3/9) Jun 25 binary size of dmd is much less important then compile speed
- FinalEvilution (12/22) Jun 26 +1 for PGO.
Hi everyone,

I am working on the Performance Regression Publisher project under the mentorship of Dennis.

My progress so far: the initial end-to-end pipeline has been built and is working on my fork. The bot builds DMD at a PR's merge-base and at its head, measures a small set of metrics under cachegrind, and posts a single sticky comment with the diff.

**What's done**

The harness is in `tools/perfrunner/` and is written in D (dub project):

- `app.d`: the CLI; takes the two already-built dmd binaries plus metadata and writes `results.json`.
- `cachegrind.d`: runs the compile under cachegrind and reads the instruction count.
- `metrics.d`: the five metrics below.
- `report.d` / `stats.d`: the schema-v1 JSON and the percent-delta math.
- `workloads/hello.d`: the single workload for now; I will add more soon.

Around it, `.github/workflows/perf.yml` runs on every PR (and on pushes to master), builds both refs with the existing `ci/run.sh`, runs the harness, and hands the result to `.github/scripts/perf_comment.py`, which upserts one sticky comment so force-pushes don't spam the thread.

**The metrics it reports right now**

The PR comment looks like this:

| Metric | Base | PR | Delta |
|--------|------|----|-------|
| compile hello.d (instr) | 422.9 M | 457.9 M | +8.27% |
| compile hello.d -O (instr) | 446.1 M | 481.1 M | +7.84% |
| dmd binary size (stripped) | 11.91 MB | 11.91 MB | 0.00% |
| hello binary size | 0.72 MB | 0.72 MB | 0.00% |
| peak RSS (compile hello.d) | 56 MB | 55 MB | -2.17% |

I tested the whole path end to end on my fork with a deliberate busy loop in `compiler/src/main.d`. The bot reported +8.27% / +7.84% instructions consistently across reruns, while the binary sizes stayed flat.

Code: https://github.com/abulgit/dmd/pull/39

**What's next**

1. We are currently building both refs with DMD 2.112.0 as the host compiler. Dennis suggested moving to `ldc2 -O3` with PGO so that the binary we measure matches a real release build and the optimizer doesn't make a harmless PR look like a regression. He also suggested doing this early, in case valgrind has any trouble with ldc2.
2. After that, we will add more realistic workloads, such as Phobos.
3. Then we will build the dashboard that shows the historical data.

That's the plan we have right now, and I'll try to post weekly updates here as work progresses. Feel free to leave any feedback or suggestions!
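The measurement step described above (read an instruction count from cachegrind, then compute a percent delta between base and PR) can be sketched roughly as follows. This is an illustrative Python sketch, not the project's D code in `cachegrind.d` or `stats.d`; the function names and the sample text are hypothetical. It relies only on the `events:` and `summary:` lines that cachegrind writes to its output file.

```python
import io


def read_instruction_count(cachegrind_out):
    """Return the total Ir (instructions executed) from a cachegrind output file.

    `cachegrind_out` is any iterable of text lines, e.g. an open file.
    The `events:` line names the columns and the `summary:` line holds the totals.
    """
    events = None
    for line in cachegrind_out:
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("summary:"):
            totals = line.split()[1:]
            if events is None or "Ir" not in events:
                raise ValueError("no Ir event listed before the summary line")
            return int(totals[events.index("Ir")])
    raise ValueError("no summary line found")


def percent_delta(base, pr):
    """Relative change from base to pr, in percent (positive means pr is larger)."""
    return (pr - base) / base * 100.0


def format_row(metric, base, pr, unit):
    """Render one Markdown table row in the style of the PR comment."""
    delta = percent_delta(base, pr)
    return f"| {metric} | {base:.1f} {unit} | {pr:.1f} {unit} | {delta:+.2f}% |"


# Hand-written stand-in for a cachegrind output file, not real output.
SAMPLE = """cmd: ./dmd hello.d
events: Ir
summary: 457900000
"""

if __name__ == "__main__":
    pr_instr = read_instruction_count(io.StringIO(SAMPLE))
    print(format_row("compile hello.d (instr)", 422.9, pr_instr / 1e6, "M"))
```

Feeding the rounded values from the table (422.9 M and 457.9 M) into `percent_delta` gives roughly 8.28 rather than the reported +8.27%, which is the kind of small difference you would expect if the bot computes from unrounded counts.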
Jun 12
On Friday, 12 June 2026 at 18:58:03 UTC, Abul Hossain Khan wrote:
> Hi everyone, I am working on the Performance Regression Publisher project under the mentorship of Dennis. [...]

Have you thought about using Linux’s perf tool? It has interesting stats about performance counters. Valgrind is a sensible tool, but being intrusive, it may distort the profile a little bit.
Jun 15
> Have you thought about using Linux’s perf tool? It has interesting stats about performance counters. Valgrind is a sensible tool, but being intrusive, it may distort the profile a little bit.

We started with Valgrind because it has given us good, stable results so far, but we'll definitely look into perf as well, especially once we switch to ldc2 + PGO and can compare the approaches.
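For comparing the two approaches, one option is to collect the same instruction count from `perf stat` in its machine-readable mode, for example `perf stat -x, -e instructions:u -- ./dmd hello.d`, which writes comma-separated counter lines to stderr. The sketch below parses such output; the function name and the sample line are hypothetical, and whether the harness will use perf this way is not something the thread states.

```python
def parse_perf_stat_csv(text, event="instructions"):
    """Extract one counter value from `perf stat -x,` output.

    Without -I or per-CPU options, each line starts with:
    counter value, unit (often empty), event name, ...
    """
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue  # blank lines or unrelated messages
        value, name = fields[0], fields[2]
        # Event names may carry a modifier such as ":u" (user space only).
        if name.split(":")[0] != event:
            continue
        if not value.isdigit():
            # perf prints markers like "<not counted>" instead of a number.
            raise ValueError(f"{event} has no numeric value: {value!r}")
        return int(value)
    raise ValueError(f"event {event!r} not found in perf output")


# Hand-written sample line in the documented field order, not real output.
SAMPLE = "257600000,,instructions:u,1000000,100.00,,\n"

if __name__ == "__main__":
    print(parse_perf_stat_csv(SAMPLE))
```

Unlike cachegrind's simulated count, hardware counters vary slightly from run to run, so a comparison would probably need several repetitions per binary.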
Jun 16
Hi everyone,

Last time I posted, we had the basic pipeline working: the bot builds DMD at the base and head commits, runs Cachegrind, and posts a sticky comment. At that point, everything was being built with DMD 2.112.0 as the host compiler.

Now we are building with `ldc2-1.42.0` as the host compiler, so the binary we measure more closely resembles a real release build. I got that working, and the numbers improved quite a bit.

| Metric                     | DMD Host |   LDC2 Host |      Change |
| -------------------------- | -------: | ----------: | ----------: |
| compile hello.d (instr)    |  422.8 M |     257.6 M |      -39.1% |
| compile hello.d -O (instr) |  445.9 M |     277.9 M |      -37.7% |
| dmd binary size (stripped) | 11.93 MB |     6.96 MB |      -41.7% |
| hello binary size          |  0.72 MB |     0.72 MB |           - |
| peak RSS                   |    56 MB |       49 MB |      -12.5% |

Switching from DMD as the host compiler to LDC2 reduced instruction counts by about 38-39%, cut the compiler binary size by about 42%, and reduced peak memory usage by 12.5%. The results are also very consistent across reruns, and the workflow runs faster as well.

My mentor also suggested adding PGO (Profile Guided Optimization) to the LDC2 build for even better results. I didn't know much about PGO at first, but Dennis helped me understand it and also shared his local PGO-enabled build script with me. I think I've almost figured it out, so that will be the next thing I work on.

* Finish the PGO integration ASAP.
* Add more realistic workloads, such as Phobos, and then start working on the dashboard to display historical results.

**Semester exam**

Yes, I will have my semester exams from June 27 to July 3, so I will be unavailable for about a week. Hopefully that's okay. I'll make up for it afterward if needed, and I will make sure to finish the project on time (or even ahead of schedule).

Also, I want to thank my mentor, Dennis. He has been incredibly supportive throughout.

As always, feel free to leave any feedback or suggestions!
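The sticky-comment step recapped at the start of this update can be sketched as a small decision function: look for an earlier bot comment carrying a hidden marker, and either update it or create a new one. This is a hedged Python sketch, not the contents of `perf_comment.py`; the marker string and function name are hypothetical. The two endpoint paths are the GitHub REST API routes for updating and creating issue comments (PR comments use the issues API).

```python
# Hypothetical hidden marker; an HTML comment is invisible in rendered Markdown.
MARKER = "<!-- perf-regression-bot -->"


def plan_upsert(owner, repo, issue_number, existing_comments, body):
    """Decide which GitHub REST call keeps exactly one bot comment on the PR.

    `existing_comments` is a list of dicts with at least "id" and "body",
    as returned by listing the comments of an issue or PR.
    Returns (http_method, path, payload).
    """
    payload = {"body": f"{MARKER}\n{body}"}
    for comment in existing_comments:
        if MARKER in (comment.get("body") or ""):
            # An earlier bot comment exists: edit it in place.
            path = f"/repos/{owner}/{repo}/issues/comments/{comment['id']}"
            return ("PATCH", path, payload)
    # No earlier bot comment: create the first one.
    path = f"/repos/{owner}/{repo}/issues/{issue_number}/comments"
    return ("POST", path, payload)


if __name__ == "__main__":
    comments = [
        {"id": 1, "body": "LGTM"},
        {"id": 2, "body": MARKER + "\nold results"},
    ]
    print(plan_upsert("abulgit", "dmd", 39, comments, "new results")[:2])
```

Keeping the decision separate from the HTTP call makes it easy to test without network access; the actual request would still need an authenticated client.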
Jun 25
On Thursday, 25 June 2026 at 17:51:10 UTC, Abul Hossain Khan wrote:
> Hi everyone, Last time I posted, we had the basic pipeline working: the bot builds DMD at the base and head commits, runs Cachegrind, and posts a sticky comment. At that point, everything was being built with DMD 2.112.0 as the host compiler. [...]

Binary size of dmd is much less important than compile speed.
Jun 25
On Thursday, 25 June 2026 at 17:51:10 UTC, Abul Hossain Khan wrote:
> My mentor also suggested adding PGO (Profile Guided Optimization) to the LDC2 build for even better results. I didn't know much about PGO at first, but Dennis helped me understand it and also shared his local PGO-enabled build script with me. I think I've almost figured it out, so that will be the next thing I work on.

+1 for PGO. As far as I'm concerned, everyone who is seriously optimizing for performance should be using PGO. Modifying a build script to run the benchmark suite (if you don't have one, what are you optimizing?) and compile a second time is a pretty simple change, but I've seen up to a 30% decrease in run time, for just a makefile edit.

> **Semester exam** Yes, I will have my semester exams from June 27 to July 3, so I will be unavailable for about a week.

You're going to ace it.

> As always, feel free to leave any feedback or suggestions!

If memory serves, perf has the ability to diff the performance of two separate profiles on a per-function basis. Might be handy.
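The "run the benchmark and compile a second time" flow described above can be written down as four commands using LDC's PGO flags (`-fprofile-instr-generate`, `ldc-profdata merge`, `-fprofile-instr-use`). The Python sketch below only assembles the command lines for a single-file program; the file names and the helper are hypothetical, and how DMD's own build would pass these flags is not shown in this thread.

```python
import subprocess


def pgo_commands(source, workload_args, out="app"):
    """Return the command lines for a two-pass PGO build with LDC.

    Pass 1 builds an instrumented binary, the workload run writes a raw
    profile, ldc-profdata merges it, and pass 2 rebuilds using the profile.
    """
    raw = "profile.raw"      # hypothetical file names
    merged = "profile.data"
    instrumented = f"{out}-instr"
    return [
        ["ldc2", "-O3", f"-fprofile-instr-generate={raw}", source, f"-of={instrumented}"],
        [f"./{instrumented}", *workload_args],
        ["ldc-profdata", "merge", f"-output={merged}", raw],
        ["ldc2", "-O3", f"-fprofile-instr-use={merged}", source, f"-of={out}"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan instead of running it; call run_all(...) to execute.
    for cmd in pgo_commands("app.d", ["--benchmark"]):
        print(" ".join(cmd))
```

The profile is only as good as the workload that produced it, so the training run should resemble the compiles being measured.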
Jun 26









Abul Hossain Khan <abulkhan19175 gmail.com> 