I have some static code that I've been tuning as much as I can, trying all sorts of things. The fastest I could get was 9.5 seconds. After I built with -prof-gen, ran the program several times, and then rebuilt with -prof-use, the code is now down to 8.2 seconds. Wow.
Is there any way I can find out what it did? Any hints/reports? I've looked through the assembler, but I would like some hints on how to write the source code that way in the first place.
With knowledge of the program's runtime execution behavior, PGO (Profile Guided Optimization) is able to do a number of optimizations, such as:
- Better data & code layout
  - Frequently accessed code placed next to each other
  - Better instruction cache usage & fetching
- Improved branch prediction
  - Good for branchy apps/large switch blocks (e.g. moving the most frequently taken branch higher up in the block)
- Better function inlining (e.g. inline hot functions, not cold ones)
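As a rough idea of what some of these transformations look like when written by hand, here is a minimal C sketch. It is not what the compiler actually did to your code; it only assumes a loop where one branch is taken almost every time. The names sum_abs, handle_negative, LIKELY and UNLIKELY are made up for illustration, and __builtin_expect and the cold/noinline attributes are GCC-style extensions (assumed to be available; the macros fall back to plain expressions otherwise).

```c
#include <stddef.h>

/* Branch hints: GCC-style builtin, with a no-op fallback elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD_FN     __attribute__((noinline, cold))
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#define COLD_FN
#endif

/* Hypothetical cold path: kept out of line so it does not take up
   instruction-cache space next to the hot loop. */
COLD_FN static long handle_negative(long v)
{
    return -v;
}

/* Sums absolute values. For illustration we assume negative inputs
   are rare, so the non-negative case is marked as the likely branch. */
long sum_abs(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (LIKELY(data[i] >= 0)) {
            total += data[i];                   /* hot path */
        } else {
            total += handle_negative(data[i]);  /* cold path */
        }
    }
    return total;
}
```

Calling sum_abs on an array that is mostly non-negative exercises the hot path almost exclusively. With PGO the compiler derives the same hot/cold information from the profile, so hints like these are mainly useful when you cannot ship a profile-guided build.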
You could generate a PGO optimization report (using the -opt-report and -opt-report-phase options) and see what kinds of transformations took place. Use "pgo" for the opt-report phase. Use -opt-report-help to get a list of the optimization phases that you can get a report on.
You could also add -O3, and then -ipo, to the PGO options and see if you get more performance.