
digitalmars.D.announce - TSV Utilities release with LTO and PGO enabled

reply Jon Degenhardt <jond noreply.com> writes:
I just released a new version of eBay's TSV Utilities. The cool 
thing about this release is not changes to the toolkit itself, 
but that it was possible to build everything using LDC's support 
for Link Time Optimization (LTO) and Profile Guided Optimization (PGO). 
This includes running the optimizations on both the application 
code and the D standard libraries (druntime and phobos). Further, 
it was all doable on Travis-CI (Linux and MacOS), including 
building release binaries available from the GitHub release page.
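For anyone wanting to try a similar build, the steps might look roughly like the sketch below. This is not the tsv-utils build script: flag names are from recent LDC releases (the `-lto` default libs ship with LDC >= 1.5), and `app.d` and `training_data.tsv` are placeholder names.

```sh
# 1. Build with ThinLTO plus PGO instrumentation. The -lto variants of
#    the default libs extend LTO across druntime/phobos as well.
ldc2 -O3 -release -flto=thin \
     -defaultlib=phobos2-ldc-lto,druntime-ldc-lto \
     -fprofile-instr-generate=profile.raw \
     app.d -of=app_instrumented

# 2. Run the instrumented binary on representative training data
#    to collect the raw profile.
./app_instrumented training_data.tsv > /dev/null

# 3. Merge the raw profile into the indexed format the compiler reads.
ldc-profdata merge -output=profile.profdata profile.raw

# 4. Rebuild with the profile applied.
ldc2 -O3 -release -flto=thin \
     -defaultlib=phobos2-ldc-lto,druntime-ldc-lto \
     -fprofile-instr-use=profile.profdata \
     app.d -of=app
```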

Combined, LTO and PGO resulted in performance improvements 
greater than 25% on three of my standard six benchmarks, and five 
of the six improved at least 8%.

Release info: 
https://github.com/eBay/tsv-utils-dlang/releases/tag/v1.1.16
Jan 14
parent reply Martin Nowak <code dawg.eu> writes:
On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt wrote:
 Combined, LTO and PGO resulted in performance improvements 
 greater than 25% on three of my standard six benchmarks, and 
 five of the six improved at least 8%.
Yay, I'm usually seeing double digit improvements for PGO alone, and single digit improvements for LTO. Meaning PGO has more effect even though LTO seems to be the more hyped one. Have you bothered benchmarking them separately?
Jan 15
parent reply Jon Degenhardt <jond noreply.com> writes:
On Tuesday, 16 January 2018 at 00:19:24 UTC, Martin Nowak wrote:
 On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt 
 wrote:
 Combined, LTO and PGO resulted in performance improvements 
 greater than 25% on three of my standard six benchmarks, and 
 five of the six improved at least 8%.
 Yay, I'm usually seeing double digit improvements for PGO 
 alone, and single digit improvements for LTO. Meaning PGO has 
 more effect even though LTO seems to be the more hyped one. 
 Have you bothered benchmarking them separately?
Last spring I made a few quick tests of both separately. That was just against the app code, without druntime/phobos. I saw some benefit from LTO, mainly on one of the tools, and not much from PGO. More recently I tried LTO standalone and LTO plus PGO, both against app code and druntime/phobos, but not PGO standalone. The LTO benchmarks are here: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-meetup-14dec2017.pdf. I haven't published the LTO + PGO benchmarks.

The takeaway from my tests is that LTO and PGO will benefit different apps differently, perhaps in ways not easily predicted. One of my tools benefited primarily from PGO, two primarily from LTO, and one materially from both. So it is worth trying both. In each case, the big win was from optimizing across app code and libs (druntime/phobos in my case).

It'd be interesting to see if other apps show similar behavior, either with phobos/druntime or other libraries, perhaps libraries from dub dependencies.
Jan 15
parent reply Johan Engelen <j j.nl> writes:
On Tuesday, 16 January 2018 at 02:45:39 UTC, Jon Degenhardt wrote:
 On Tuesday, 16 January 2018 at 00:19:24 UTC, Martin Nowak wrote:
 On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt 
 wrote:
 Combined, LTO and PGO resulted in performance improvements 
 greater than 25% on three of my standard six benchmarks, and 
 five of the six improved at least 8%.
 Yay, I'm usually seeing double digit improvements for PGO 
 alone, and single digit improvements for LTO. Meaning PGO has 
 more effect even though LTO seems to be the more hyped one. 
 Have you bothered benchmarking them separately?
 Last spring I made a few quick tests of both separately. That 
 was just against the app code, without druntime/phobos. Saw 
 some benefit from LTO, mainly one of the tools, and not much 
 from PGO.
Because PGO optimizes for the given profile, it would help a lot if you clarified how you do your PGO benchmarking: what kind of test load profile you used for optimization, and what test load you use for the time measurement. Regardless, it's fun to hear your test results :-)

Johan
Jan 16
parent reply Jon Degenhardt <jond noreply.com> writes:
On Tuesday, 16 January 2018 at 22:04:52 UTC, Johan Engelen wrote:
 Because PGO optimizes for the given profile, it would help a 
 lot if you clarified how you do your PGO benchmarking. What 
 kind of test load profile you used for optimization and what 
 test load you use for the time measurement.
The profiling used is checked into the repo and run as part of a PGO build, so it is available for inspection. The benchmarks used for the deltas are also documented; they are the ones used in the benchmark comparison to similar tools done in March 2017. That report is in the repo (https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md). However, it's hard to imagine anyone perusing the repo for this stuff, so I'll try to summarize what I did below.

Benchmarks - Six different tests of rather different but common operations run on large data files. The six tests were chosen because for each I was able to find at least three other tools, written in native compiled languages, with similar functionality. There are other valuable benchmarks, but I haven't published them.

Profiling - Profiling was developed separately for each tool. For each I generated several data files with data representative of typical use cases: generally numeric or text data in several forms and distributions. The data was unrelated to the data used in the benchmarks, which is from publicly available machine learning data sets. However, personal judgement was used in the generation of the data sets, so it's not free from bias.

After generating the data, I generated a set of run options specific to each tool. As an example, tsv-filter selects data file lines based on various numeric and text criteria (e.g. less-than). There are a bit over 50 comparison operations, plus a few meta operations. The profiling runs ensure all the operations are run at least once, but that the most important ones are overweighted. The ldc.profile.resetAll call was used to exclude all the initial setup code (command line argument processing). This was nice because it meant the data files could be small relative to real-world sets, and it runs fast enough to do as part of the build step (i.e. on Travis-CI). Look at https://github.com/eBay/tsv-utils-dlang/tree/master/tsv-filter/profile_data to see a concrete example (tsv-filter). In that directory are five data files and a shell script that runs the commands and collects the data.

This was done for four of the tools, covering five of the benchmarks. I skipped one of the tools (tsv-join), as it's harder to come up with a concise set of profile operations for it. I then ran the standard benchmarks I usually report on in various D venues.

Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to ensure the benchmark data and tests were separate from the training data/tests. For these reasons, my confidence is good that the results are reasonable and well founded. --Jon
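The resetAll trick mentioned above might look like this in a program's main. This is a hypothetical sketch, not code from the tsv-utils source: Options, parseOptions and processLines are invented stand-ins. It assumes compilation with ldc2 and -fprofile-instr-generate, which is what links in the profile runtime behind the ldc.profile module.

```d
import ldc.profile : resetAll;

struct Options { string inputFile; }

Options parseOptions(string[] args)
{
    // One-time setup: command line processing, option validation.
    return Options(args.length > 1 ? args[1] : "");
}

void processLines(Options opts)
{
    // The hot data-processing loop goes here.
}

void main(string[] args)
{
    auto opts = parseOptions(args);

    // Discard all profile counts gathered during setup, so the
    // resulting profile reflects only the hot path below. This is
    // why small training data files suffice.
    resetAll();

    processLines(opts);
}
```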
Jan 16
parent reply Johan Engelen <j j.nl> writes:
On Wednesday, 17 January 2018 at 04:37:04 UTC, Jon Degenhardt 
wrote:
 Clearly personal judgment played a role. However, the tools are 
 reasonably task focused, and I did take basic steps to ensure 
 the benchmark data and tests were separate from the training 
 data/tests. For these reasons, my confidence is good that the 
 results are reasonable and well founded.
Great, thanks for the details, I agree. Hope it's useful for others to see these details.

(btw, did you also check the performance gains when using the profile of the benchmark itself, to learn about the upper-bound of PGO for your program?)

I'll merge the IR PGO addition into LDC master soon. Don't know what difference it'll make.

-Johan
Jan 17
parent Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 17 January 2018 at 21:49:52 UTC, Johan Engelen 
wrote:
 On Wednesday, 17 January 2018 at 04:37:04 UTC, Jon Degenhardt 
 wrote:
 Clearly personal judgment played a role. However, the tools 
 are reasonably task focused, and I did take basic steps to 
 ensure the benchmark data and tests were separate from the 
 training data/tests. For these reasons, my confidence is good 
 that the results are reasonable and well founded.
Great, thanks for the details, I agree. Hope it's useful for others to see these details.
Thanks Johan, much appreciated. :)
 (btw, did you also check the performance gains when using the 
 profile of the benchmark itself, to learn about the upper-bound 
 of PGO for your program?)

 I'll merge the IR PGO addition into LDC master soon. Don't know 
 what difference it'll make.
No, I didn't do an upper-bounds check; that's a good idea. I plan to test the IR-based PGO when it's available, and I'll run an upper-bounds check as part of it.
Jan 17