I depend a lot on SSE/AVX auto-vectorization and it seems that /Qipo disables it. These are relevant parameters I'm using:
/arch:SSE2 /QxSSE2 /Qvec-report /QaxAVX /Qftz
The compiler reports lots of loops being vectorized. But if I add /Qipo, it states that the messages will be generated by the linker (makes sense), but the linker reports nothing... (I'm not passing /Qvec-report to the link step, though; that didn't seem logical anymore.)
You would still need /Qvec-report on the link step to see the messages. In past compiler versions, IPO might still have suppressed them.
I was concerned about issuing redundant or conflicting arch options, but that doesn't appear to be the problem.
If you specify both the /Qax and /arch options, the compiler will not generate Intel-specific instructions. Try using just /QxSSE2 (SSE2 is the default, BTW) and use "/Qopt-report-phase:vec /Qopt-report-file:stdout" to see the vectorization output.
Actually, I didn't check, because I gave up on this feature. I'm compiling huge source files anyway (a kind of alternative approach where most things are included in a single source file; it usually makes compilation faster and provides these optimizations automatically). Anyway, /Qipo gave no performance improvement. I also tried profile-guided optimization, and that actually made performance worse...
With IPO it's a two-step process for the compiler: first it generates intermediate language (IL) in the object files (mock objects), and at link time it is invoked again to figure out the options used before, merge all the IL from the object files, and analyze it for IPO opportunities. This means the link step will take a while, because the entire program is being examined, so the build time can be greater. To avoid long build times, try to use IPO on performance-critical files/libs only rather than on all the files.
If there are a lot of inlining and better register-usage opportunities in the code, it might boost performance, with some trade-off in compile time.
With PGO it really depends on whether the application has a large number of small, performance-critical sections of code that are executed very frequently, which the compiler then tries to optimize accordingly.
Also, with the advent of newer processors with a rich set of vector extensions in the instruction set, you should see whether vectorizing with AVX/AVX2 etc. (depending on what the system you're running on supports) exploits enough data parallelism to increase performance. You can generate optimization reports for those phases (ipo, vec, hlo, etc.) and check what optimizations, if any, were made.
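As a rough build-line sketch of requesting those reports (the target switch /QxAVX and the file name are placeholders for whatever your system supports; swap the phase for ipo or hlo to see those reports):

```shell
:: Hypothetical Windows build line for the Intel C++ compiler:
:: AVX code generation plus a vectorization report on stdout
icl /QxAVX /Qipo /Qopt-report-phase:vec /Qopt-report-file:stdout main.c
```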