Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

PGOize Intel Compiler

foxtran
New Contributor I

Dear all,

Compiling large projects takes a long time. In my case, a Release build takes about 2-3 hours on multicore machines. Because of module dependencies, changing the core part requires rebuilding the whole app, so one write-build-test iteration may take several hours. One way to speed up compilation is to apply Profile-Guided Optimization (PGO) to the compiler itself.
According to Homebrew's tests of clang (https://github.com/Homebrew/homebrew-core/pull/79454#issuecomment-869055079), PGO cut the time to build clang itself from 19000 s to 15400 s (~19% less) and the time to build Python3 from 195 s to 156 s (20% less).
You can find more successful examples of PGO in the following work-in-progress article: https://github.com/zamazan4ik/awesome-pgo/blob/main/article/article.md#attention-anchor

For Clang/LLVM, there are instructions on how to build clang itself with PGO: https://llvm.org/docs/HowToBuildWithPGO.html#introduction

It would be nice to apply PGO to the ifx/icx compilers themselves. It could also serve as a good test set for PGO in the Intel compilers and as an excellent advertisement for using PGO in end-user applications.

12 Replies
zamazan4ik
Beginner

Many more PGO benchmark results, with actual performance numbers, can be found at https://github.com/zamazan4ik/awesome-pgo/blob/main/README.md#pgo-showcases

andrew_4619
Honored Contributor III

Submodules are the answer to large build cascades caused by modules. It is a lot of work, but worth the effort in my case.

foxtran
New Contributor I

Dear Andrew,

Unfortunately, submodules will not help when you are updating the API. Also, the app's compilation is already well parallelized, up to 250 threads.

jimdempseyatthecove
Honored Contributor III

@foxtran 

In this situation (and pre-submodules), what I've found can help is to separate your module's data section from its code (contains) section. If the API of a module's code section does not change but the code does, this method reduces the number of files that require recompilation. Note that this technique is similar to the use of submodules.

Also, do not use a conglomeration module (one that contains USE ... of all the other modules).
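
One possible reading of the data/code separation described above, as a minimal sketch. All names here (grid_data, grid_ops, init_density, total_mass) are hypothetical and not taken from the project under discussion. The data lives in its own module, and the procedures live in a second module that uses it. A source file that needs only the data uses grid_data alone, so editing procedure bodies in grid_ops should not force that file to be recompiled, assuming the build system tracks dependencies per module.

```fortran
! Hypothetical example: in a real project each module would sit in its own file.

module grid_data                      ! data only; expected to change rarely
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: n_cells = 4
  real(dp), allocatable :: density(:)
end module grid_data

module grid_ops                       ! procedures only; expected to change often
  use grid_data, only: dp, n_cells, density
  implicit none
  private                             ! do not re-export the data module's names
  public :: init_density, total_mass
contains
  subroutine init_density(value)
    real(dp), intent(in) :: value
    if (allocated(density)) deallocate(density)
    allocate(density(n_cells))
    density = value                   ! fill every cell with the same value
  end subroutine init_density

  function total_mass() result(mass)
    real(dp) :: mass
    mass = sum(density)
  end function total_mass
end module grid_ops

program demo
  use grid_data, only: dp             ! a data-only client would stop here
  use grid_ops, only: init_density, total_mass
  implicit none
  call init_density(2.0_dp)
  print '(a,f6.2)', 'total mass = ', total_mass()
end program demo
```

With this layout, only files that call init_density or total_mass depend on grid_ops; files that merely read density or n_cells depend on grid_data.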

 

Jim Dempsey

foxtran
New Contributor I

@jimdempseyatthecove 

Unfortunately, I'm updating the API, so a full recompilation is required. I know about the side effects of using modules and submodules, and this is currently not a solution. Sure, it decreases the number of files that must be recompiled, but that number is still huge, which is why I'm asking about a general speed-up of the compiler.

Igor

jimdempseyatthecove
Honored Contributor III

>>For my case, it takes about 2-3 hours in Release mode

How long does this take in Debug mode (iow with IPO disabled)?

If it is much shorter in Debug mode, then disable multi-file IPO (you can keep single-file IPO).

Your situation still sounds like you are using a conglomeration module (or its equivalent), e.g.

    use YourLibraryWithAllTheAPIs

 

Note that while this may be what you intend to distribute, to reduce turnaround time during development I would suggest replacing that statement with

    use SpecificModule1

    use SpecificModule2

    ....

Hopefully, you are editing an API that is not common to many/all of the source files.

What you distribute to your end users is YourLibraryWithAllTheAPIs.

Jim Dempsey

foxtran
New Contributor I

>> How long does this take in Debug mode (iow with IPO disabled)?

With -O1 it takes about the same time: 2-3 hours. With -O0 it takes days due to CMPLRLLVM-53405. Also, compiling in debug mode is of little use to me, since it only checks that the code compiles, not the timings.

>> disable multi-file ipo

I did not enable IPO yet.

>> conglomeration module

Or constants like PI in a widely used global module. Please do not try to read my mind.

>> Hopefully, you are editing an API that is not common to many/all of the source files.
>>>> development of core part

>> What you distribute to your end users is the YourLibraryWithAllTheAPIs

Then each user would need to compile my library, but not many users have machines like mine. It would be painful for them.

Igor

andrew_4619
Honored Contributor III

As a point of interest, how big is this code? The compile times seem very long compared to what I am used to, which suggests either that the code is huge or that it has some unusual attributes.

foxtran
New Contributor I
andrew_4619
Honored Contributor III

Then your compile times of 3 hours are quite fast for the size of the task. I can see that the compile cascade makes development very slow. I am not convinced that an efficiency gain that makes compiling 20% faster helps that much, IMO. The use of submodules made my typical build many times faster, like 1000% faster.

I noticed you wrote: "Unfortunately, I'm updating the API, so a full recompilation is required. I know about the side effects of using modules and submodules, and this is currently not a solution. Sure, it decreases the number of files that must be recompiled, but that number is still huge, which is why I'm asking about a general speed-up of the compiler."

I don't understand this: if a typical build drops to, say, 1/3rd of the files, that is massively faster (about 3x), is it not? I am guessing, as I do not know how this code is organised and structured, and I also recognize that it is a big job to restructure. What I do know from experience is that the use of modules and interrelated modules in a large code is a disaster in terms of build efficiency. My development rules are:

1) Never use modules, only submodules.

2) Never USE modules in the interface module of a submodule (the exception being data modules that only have constants such as kinds, PI, etc. that never change).

3) Only expose interfaces for module procedures that are referenced from outside the module.

4) For new development branches, create a new dedicated submodule and, during the development phase only, use it at the lowest level where it is called. When it is finalised, the routines can be distributed into the code base wherever is most appropriate.

There are some more things, but that is the main thrust. As a result, most of my builds are only a tiny fraction of the code base. And yes, I will say it again: that is probably a huge task, but a first step using some automated tools to convert modules into submodules is a start.
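
The rules above can be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: kinds, solver_api, solver_impl, and step are hypothetical names. The parent module holds only the interface and uses nothing but a constants module; the implementation sits in a submodule. Editing the submodule body leaves the parent's interface untouched, so callers of solver_api generally do not need to be recompiled, provided the build system only rebuilds on interface changes.

```fortran
! Hypothetical example: in a real project each unit would sit in its own file.

module kinds                          ! constants only, never changing (rule 2)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
end module kinds

module solver_api                     ! interface module: no procedure bodies
  use kinds, only: dp
  implicit none
  private
  public :: step                      ! expose only what callers need (rule 3)
  interface
    module function step(x, dt) result(x_new)
      real(dp), intent(in) :: x, dt
      real(dp) :: x_new
    end function step
  end interface
end module solver_api

submodule (solver_api) solver_impl    ! implementation: edit freely
contains
  module function step(x, dt) result(x_new)
    real(dp), intent(in) :: x, dt
    real(dp) :: x_new
    x_new = x + dt * (-x)             ! one explicit Euler step for dx/dt = -x
  end function step
end submodule solver_impl

program demo
  use kinds, only: dp
  use solver_api, only: step
  implicit none
  print '(a,f8.4)', 'x after one step = ', step(1.0_dp, 0.1_dp)
end program demo
```

Changing the signature of step still touches solver_api and therefore every caller, which is the API-update case raised earlier in the thread.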

 

foxtran
New Contributor I

@Barbara_P_Intel, could you please consider applying PGO to the Intel compilers?


The main motivation is to speed up all compilation: both first-time compilation and incremental builds. First-time compilation is especially important for Continuous Integration, where full builds are typical. Incremental builds, which are typical for developers, also benefit, though less noticeably.

The good idea of using submodules, suggested by @jimdempseyatthecove and @andrew_4619, only decreases the number of files that must be rebuilt in incremental builds; it does not help CI or end users who need to compile from scratch. In that case, PGO can give an additional compilation speed-up.

Barbara_P_Intel
Employee

I can certainly submit building the compiler with PGO as a feature request. I'll check to see whether it gets any traction among all the other requests.

If you run across any more issues like CMPLRLLVM-53405, please report them. The analysis of that reproducer shows the time is related to register allocation.

Other developers are also reporting long compile times and the compiler team is addressing them.

 
