digitalmars.D - Profile-Guided Optimization (PGO) support in D ecosystem

reply Alexander Zaitsev <zamazan4ik tut.by> writes:
Hi!

I am investigating the state of Profile-Guided Optimization (PGO)
across different ecosystems. All my current results (many
benchmarks, PGO-related information, and more) are available
at https://github.com/zamazan4ik/awesome-pgo . I am also interested
in the state of PGO in the D ecosystem, which is why I am here.

I have researched PGO in D a little, but compared to C++ almost
no information is available in the official documentation (or I
just don't know where to look for it, hah). I have the following
questions, and I am interested in the answers for each D compiler
(DMD, GDC, LDC):

1. What is the most up-to-date place for PGO documentation? So far
I have found only
[this](https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation)
for LDC. What about DMD and GDC?
2. Does any D compiler support
[Sampling PGO](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers)
(also known as [AutoFDO](https://github.com/google/autofdo))? If
sampling PGO is not supported, do you plan to support it in the
future? Sampling PGO can be important for us because it makes it
much easier to gather PGO profiles directly from the production
environment without hurting production performance much.
3. Do the D compilers support
[other](https://aaupov.github.io/blog/2023/07/09/pgo) PGO modes,
such as CSIR PGO? If not, do you plan to support them in the
future?
4. What performance improvements did you get from enabling LTO +
PGO for the D compilers? Could you please share the numbers for
each compiler? With this information it is much easier to decide
whether to rebuild a D compiler locally with PGO (we build locally
due to strict security requirements), since we can estimate the
benefit of PGO for the D compiler from actual benchmarks by the
compiler developers.
5. Is there any documentation on how to build DMD and GDC with
LTO+PGO? I am looking for something like what is
[done](https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization)
in the ClickHouse documentation (or for Clang or Rustc).
6. Am I right that the officially released D compiler binaries
are already LTO + PGO optimized? According to the
[script](https://github.com/ldc-developers/ldc/blob/master/.github/workflows/main.yml),
this is true at least for LDC. What about the other compilers?



Similar questions about LDC in the upstream: 
https://github.com/ldc-developers/ldc/discussions/4524

Thanks a lot for the help!
Nov 10 2023
next sibling parent reply Sergey <kornburn yandex.ru> writes:
On Friday, 10 November 2023 at 12:47:56 UTC, Alexander Zaitsev 
wrote:
 Hi!


 Similar questions about LDC in the upstream: 
 https://github.com/ldc-developers/ldc/discussions/4524

 Thanks a lot for the help!
Hi,

Some internal details about PGO are in these slides:
https://archive.fosdem.org/2017/schedule/event/ldc_d_optimization/attachments/slides/1819/export/events/attachments/ldc_d_optimization/slides/1819/FOSDEM_2017.pdf

and in blog posts from an LDC dev:
http://johanengelen.github.io/ldc/2016/11/10/Link-Time-Optimization-LDC.html
http://johanengelen.github.io/ldc/2016/04/13/PGO-in-LDC-virtual-calls.html

For an example of PGO + LTO usage and the resulting performance improvements, I think one of the most helpful sources is this repo:
https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md
https://github.com/eBay/tsv-utils/blob/master/docs/lto-pgo-study.md
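
For orientation, the instrumentation-based PGO cycle that the linked LDC material describes can be sketched roughly as below. This is a minimal sketch and not taken from any of the posts: `app.d`, `app.profraw`, `app.profdata` and the `run` helper are placeholder names chosen for illustration, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of an instrumentation-based PGO + LTO cycle with LDC.
# app.d, app.profraw and app.profdata are placeholder names (assumptions).
# Commands are only printed unless DRY_RUN=0 is set in the environment.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@" || exit 1
    fi
}

ldc_pgo_cycle() {
    src=$1
    # 1. Build an instrumented binary that writes raw counters to app.profraw.
    run ldc2 -O3 -fprofile-instr-generate=app.profraw -of=app_instrumented "$src"
    # 2. Run it on a representative workload to collect the profile.
    run ./app_instrumented
    # 3. Convert the raw counters into the indexed format the compiler reads.
    run ldc-profdata merge -output=app.profdata app.profraw
    # 4. Rebuild using the profile, with full LTO enabled.
    run ldc2 -O3 -flto=full -fprofile-instr-use=app.profdata -of=app "$src"
}

ldc_pgo_cycle app.d
```

How much this gains depends heavily on how representative the training run in step 2 is; the tsv-utils study linked above is a good place to see real numbers.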
Nov 10 2023
parent Johan <j j.nl> writes:
On Friday, 10 November 2023 at 13:09:01 UTC, Sergey wrote:
 [...]
This list is a fantastic "overview", Sergey, thanks! Please add it to the discussion at https://github.com/ldc-developers/ldc/discussions/4524 so it does not get lost as easily.

Cheers,
  Johan
Nov 11 2023
prev sibling next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 10 November 2023 at 12:47:56 UTC, Alexander Zaitsev 
wrote:
 Hi!

 I am investigating the Profile-Guided Optimization (PGO) state 
 across the ecosystem - all my current results (with a lot of 
 benchmarks, PGO-related information, and much more) are 
 available at https://github.com/zamazan4ik/awesome-pgo . I am 
 interested in PGO state in the D ecosystem too - that's why I 
 am here.

 I had been researching a little about PGO in D but compared to 
 C++ almost no information is available in the official 
 documentation (or I just don't know where to search it, hah). I 
 have the following questions (for each D compiler: DMD, GDC, 
 LDC - am interested in all of them):

 Thanks a lot for the help!
I have only used PGO with LDC. If I remember correctly, I posted something about it in the forums. Let me see if I can find it.

I think it was this:
https://forum.dlang.org/post/ajorqeooyccwuwpvteue@forum.dlang.org
Nov 11 2023
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Friday, 10 November 2023 at 12:47:56 UTC, Alexander Zaitsev 
wrote:
 Hi!
IIRC, Jon wrote a bit about LTO and PGO (with benchmarks somewhere) for tsv-utils: https://github.com/eBay/tsv-utils/
 1. What is the most up-to-date place for PGO documentation?
 Right now I found only
 [this](https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation)
 for LDC. What about DMD and GDC?
For GDC, look up any GCC documentation or how-tos on using `-fprofile-generate=` and `-fprofile-use=`.
 5. Is there any documentation on how to build DMD and GDC with
 LTO+PGO? I am looking for something like what is
 [done](https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization)
 in the ClickHouse documentation (or for Clang or Rustc).
LTO and PGO aren't features of the language, but rather of the compiler infrastructure.
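
To make the GCC-style workflow concrete, here is a minimal sketch of how those two options could be used with gdc. It is an illustration under assumptions, not something from this post: `app.d`, the `pgo-data` directory and the `run` helper are placeholder names, and the script only prints the commands unless `DRY_RUN=0` is set. Both builds use the same output name on purpose, since GCC derives the profile file names from the names of the files being built.

```shell
#!/bin/sh
# Sketch of the GCC-style PGO cycle applied to gdc.
# app.d and the pgo-data directory are placeholder names (assumptions).
# Commands are only printed unless DRY_RUN=0 is set in the environment.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@" || exit 1
    fi
}

gdc_pgo_cycle() {
    src=$1
    profile_dir=$2
    # 1. Instrumented build: counters are written under $profile_dir at exit.
    run gdc -O2 -fprofile-generate="$profile_dir" -o app "$src"
    # 2. Training run on a representative workload.
    run ./app
    # 3. Optimized rebuild that reads the collected counters; -flto adds LTO.
    run gdc -O2 -flto -fprofile-use="$profile_dir" -o app "$src"
}

gdc_pgo_cycle app.d pgo-data
```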
Nov 12 2023
prev sibling parent reply max haughton <maxhaton gmail.com> writes:
On Friday, 10 November 2023 at 12:47:56 UTC, Alexander Zaitsev 
wrote:
 Hi!

 I am investigating the Profile-Guided Optimization (PGO) state 
 across the ecosystem - all my current results (with a lot of 
 benchmarks, PGO-related information, and much more) are 
 available at https://github.com/zamazan4ik/awesome-pgo . I am 
 interested in PGO state in the D ecosystem too - that's why I 
 am here.

 [...]
If you search for PGO in the dmd repo you will find that I implemented a PGO build for the compiler a while ago. I'm not sure if it's enabled for releases, but we use it internally at Symmetry IIRC.

One thing of note is that GCC has a feature called AutoFDO, which is quite interesting. I think LLVM might have a similar concept, but I'm not sure. LLVM also has a tool called BOLT, which does the same kind of thing, only after compilation.
Nov 12 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Sunday, 12 November 2023 at 18:03:19 UTC, max haughton wrote:
 If you search for PGO in the dmd repo you will find that I 
 implemented a pgo build for the compiler a while ago.
The PGO code is there, but it seems to fail at compiling the "dshell_prebuilt.d" file, and because of this it aborts collecting the profiling data prematurely:

```
[...]
Built dmd with PGO instrumentation
Compiling dmd testsuite to generate PGO data
Executing: ldmd2 -m64 -of/tmp/dmd/compiler/test/test_results/d_do_test /tmp/dmd/compiler/test/tools/d_do_test.d -fPIC -I/tmp/dmd/compiler/test/tools -i -version=NoMain
Executing: ldmd2 -m64 -of/tmp/dmd/compiler/test/test_results/unit_test_runner /tmp/dmd/compiler/test/tools/unit_test_runner.d -fPIC /tmp/dmd/compiler/test/tools/paths
Executing: /tmp/dmd/generated/linux/release/64/dmd -conf= -m64 -of/tmp/dmd/compiler/test/test_results/dshell_prebuilt.o -c /tmp/dmd/compiler/test/tools/dshell_prebuilt/dshell_prebuilt.d -fPIC
Executing: ldmd2 -m64 -of/tmp/dmd/compiler/test/test_results/sanitize_json /tmp/dmd/compiler/test/tools/sanitize_json.d -fPIC
/tmp/dmd/compiler/test/tools/dshell_prebuilt/dshell_prebuilt.d(6): Error: unable to read module `stdlib`
/tmp/dmd/compiler/test/tools/dshell_prebuilt/dshell_prebuilt.d(6): Expected 'core/stdc/stdlib.d' or 'core/stdc/stdlib/package.d' in one of the following import paths:
import path[0] = /tmp/dmd/compiler/test/../../druntime/import
import path[1] = /tmp/dmd/compiler/test/../../../phobos
failed to build '/tmp/dmd/compiler/test/test_results/dshell_prebuilt.o'
dmd tests failed! This will not end the PGO build because some data may have been gathered
Merging PGO data
[...]
```

The compiler is still built successfully, but its performance is not optimal this way. I patched up "dshell_prebuilt.d" to strip everything out of it. With that change, the error disappears, the whole DMD test suite seems to compile, and the compiler gets a noticeable performance boost.
 I'm not sure if it's enabled for releases
This doesn't seem to be the case at least for the https://downloads.dlang.org/releases/2.x/2.106.0/dmd.2.106.0.linux.tar.xz tarball. LTO is enabled, but apparently without PGO.
 but we use it internally at Symmetry IIRC.
It's good to know that DMD with LTO+PGO is already successfully used in production at Symmetry. Would it make sense to also enable this optimization for everyone else?
Dec 17 2023
next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Monday, 18 December 2023 at 01:57:17 UTC, Siarhei Siamashka 
wrote:
 On Sunday, 12 November 2023 at 18:03:19 UTC, max haughton wrote:
 [...]
The PGO code is there, but it seems to fail at compiling "dshell_prebuilt.d" file. And aborts collecting the profiling data prematurely because of this: [...]
Does druntime need building?
Dec 17 2023
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 18 December 2023 at 02:30:21 UTC, max haughton wrote:
 On Monday, 18 December 2023 at 01:57:17 UTC, Siarhei Siamashka 
 wrote:
 On Sunday, 12 November 2023 at 18:03:19 UTC, max haughton 
 wrote:
 [...]
The PGO code is there, but it seems to fail at compiling "dshell_prebuilt.d" file. And aborts collecting the profiling data prematurely because of this: [...]
Does druntime need building?
That's a good question. As the author of this code, you probably have a much better idea about how it's supposed to work. I tried to come up with some scriptable step-by-step build instructions:

```
DMD_TAG=v2.106.0
LDMD=ldmd2-1.32.0

git clone --depth 1 --branch "${DMD_TAG}" https://github.com/dlang/dmd.git || exit 1
git clone --depth 1 --branch "${DMD_TAG}" https://github.com/dlang/phobos.git || exit 1

# Build the LTO-only compiler first, then Phobos and druntime with it
cd dmd
make -j4 -f posix.mak HOST_DMD=$LDMD ENABLE_RELEASE=1 ENABLE_LTO=1 || exit 1
cd ../phobos
make -j4 -f posix.mak || exit 1
cd ../dmd
cp generated/linux/release/64/dmd "../dmd_${DMD_TAG}_lto"

# Erase the non-PGO build output so that it does not clash with the PGO build
rm -rf generated
rdmd compiler/src/build.d OS="linux" BUILD="release" MODEL="64" \
    HOST_DMD="$LDMD" CXX="c++" AUTO_BOOTSTRAP="" DOCDIR="" STDDOC="" \
    DOC_OUTPUT_DIR="" MAKE="make" VERBOSE="" ENABLE_RELEASE="1" \
    ENABLE_DEBUG="" ENABLE_ASSERTS="" ENABLE_LTO="1" ENABLE_UNITTEST="" \
    ENABLE_PROFILE="" ENABLE_COVERAGE="" DFLAGS="" dmd-pgo || exit 1
cp generated/linux/release/64/dmd "../dmd_${DMD_TAG}_lto+pgo"

# Both copies were placed in the parent directory, so list them from there
cd ..
ls -l dmd_*
```

Running it results in the following:

```
-rwxr-xr-x 1 ssvb ssvb 7626560 Dec 18 13:30 dmd_v2.106.0_lto
-rwxr-xr-x 1 ssvb ssvb 7994232 Dec 18 13:38 dmd_v2.106.0_lto+pgo
```

The `dmd_v2.106.0_lto` file roughly matches the size and performance characteristics of the `dmd` executable from the https://downloads.dlang.org/releases/2.x/2.106.0/dmd.2.106.0.linux.tar.xz release tarball, and `dmd_v2.106.0_lto+pgo` is its faster PGO-enabled upgrade. I can observe at least a 10% compilation time reduction when using the PGO-enabled `dmd`.

It's rather messy, but this somehow works. There are many questions though. For example, should the "dmd-pgo" target be accessible from the makefile without invoking "rdmd compiler/src/build.d" directly? Is sharing the same directory "generated/linux/release" for the produced non-PGO and PGO binaries actually okay? Is the dmd testsuite a good training set, or would collecting profiling data during Phobos compilation be better?

The "dshell_prebuilt.d" glitch happens if the PGO-enabled DMD is built before Phobos & druntime, and this makes everything fragile and non-intuitive. So the LTO-enabled DMD needs to be built first, then we need to use it to compile druntime, and finally the "generated/linux/release" directory has to be erased before the PGO build is started so that the two builds do not clash.

Either way, providing faster PGO-enabled binary releases of DMD would make it more competitive in the compilation speed race against LDC: https://forum.dlang.org/post/pugqkvthbicqaigemijj@forum.dlang.org :-)
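
A compilation time difference like the one mentioned above can be checked with a small timing loop. The following is only a sketch under assumptions: it expects the two binaries named as in the `ls -l` output to sit in the current directory, each able to find druntime and Phobos (for example through a `dmd.conf` next to it), and `hello.d` is a placeholder source file. The function names are made up for illustration, and `date +%s%N` relies on GNU date.

```shell
#!/bin/sh
# Sketch: compare wall-clock compile time of two dmd binaries.
# Binary names follow the build output in this thread; hello.d and the
# function names are placeholders (assumptions). Requires GNU date for %N.

# Percentage by which new_ms is lower than base_ms (integer, rounded down).
reduction_percent() {
    base_ms=$1
    new_ms=$2
    echo $(( (base_ms - new_ms) * 100 / base_ms ))
}

# Total milliseconds for $2 compile-only runs of compiler $1 on source $3.
time_compiles() {
    compiler=$1
    runs=$2
    src=$3
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$compiler" -c -of=/tmp/pgo_bench.o "$src" || return 1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Only measure when both compilers and the test source are present.
if [ -x ./dmd_v2.106.0_lto ] && [ -x ./dmd_v2.106.0_lto+pgo ] && [ -f hello.d ]; then
    base=$(time_compiles ./dmd_v2.106.0_lto 20 hello.d) || exit 1
    pgo=$(time_compiles ./dmd_v2.106.0_lto+pgo 20 hello.d) || exit 1
    echo "LTO: ${base} ms, LTO+PGO: ${pgo} ms, reduction: $(reduction_percent "$base" "$pgo")%"
fi
```

A single small file mostly measures startup and Phobos import processing, so timing a larger project build would give a more meaningful comparison.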
Dec 18 2023
parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 18 December 2023 at 12:24:39 UTC, Siarhei Siamashka 
wrote:
 [...]
 It's rather messy, but this somehow works. There are many 
 questions though.
 [...]
Max, do you have any comments on the described procedure? Are people from Symmetry using the current unmodified `build.d` from the DMD repository for building the compiler with PGO support? Or have you done some extra customizations since then?

The PGO issue is not resolved until the official DMD binary releases actually adopt it.
Dec 21 2023
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 18 December 2023 at 01:57:17 UTC, Siarhei Siamashka 
wrote:
 This doesn't seem to be the case at least for the 
 https://downloads.dlang.org/releases/2.x/2.106.0/dmd.2.106.0.linux.tar.xz
tarball. LTO is enabled, but apparently without PGO.
I formally submitted an issue about this at https://issues.dlang.org/show_bug.cgi?id=24287 so that the https://github.com/dlang/installer maintainers can take some action to improve the current situation.
Dec 18 2023