I upgraded my Intel C++ and Fortran compilers last week (September 26-30, 2016) to Intel 17 (version 17.0.0) and immediately ran into what appears to be a compiler bug caused by improper inlining. My code did not have this problem with 16.0.3, which I had been using previously. I have not yet been able to produce a simple reproducer, so I can only describe the symptoms. The code fails in what appears to be code inlined two or three levels deep: when the compiler attempts a sufficiently deep inline expansion, I get a run-time segmentation violation. The author of this section of the code (which is part of a huge multi-developer project) relies on very aggressive inlining, and this same code has been used intensively since Intel version 15. I experimented with different compilation options and found that replacing the ifort argument "-ip" with "-no-ip -fno-inline" makes the code run properly again. I tracked down the offending subroutine, which is simply a single loop. When I removed it as a sub-subroutine (i.e. a contained routine inside another subroutine) and put the loop directly in place of the original call, the code also ran correctly again. This is the basis for my suspicion that the compiler is doing something wrong when it tries to inline this code. My only guess is that the inlining corrupted some memory access, perhaps through a stack overflow. If I can create a simple reproducer I will post it to this forum, but for now I just wanted to get this bug documented somewhere.
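To make the two code shapes concrete, here is a minimal sketch. It is not the project code and is not a reproducer of the failure; the module, routine, and variable names (scale_mod, scale_nested, apply_loop, scale_flat) are hypothetical and chosen only for illustration. The first routine calls a contained routine that holds a single loop, and the second has the loop written in place of the call.

```fortran
module scale_mod
  implicit none
contains

  ! Shape that triggered the problem: an outer routine calls a
  ! contained (internal) routine whose body is a single loop.
  subroutine scale_nested(a, factor)
    real, intent(inout) :: a(:)
    real, intent(in)    :: factor
    call apply_loop()
  contains
    subroutine apply_loop()
      integer :: i
      ! a and factor are reached by host association
      do i = 1, size(a)
        a(i) = a(i) * factor
      end do
    end subroutine apply_loop
  end subroutine scale_nested

  ! Workaround shape: the loop replaces the call, so there is
  ! nothing left for the compiler to inline.
  subroutine scale_flat(a, factor)
    real, intent(inout) :: a(:)
    real, intent(in)    :: factor
    integer :: i
    do i = 1, size(a)
      a(i) = a(i) * factor
    end do
  end subroutine scale_flat

end module scale_mod

program compare_shapes
  use scale_mod, only: scale_nested, scale_flat
  implicit none
  real :: x(4), y(4)

  x = [1.0, 2.0, 3.0, 4.0]
  y = x
  call scale_nested(x, 2.0)
  call scale_flat(y, 2.0)

  ! Both shapes should give identical results with a correct compiler.
  if (all(x == y)) then
    print '(a)', 'match'
  else
    print '(a)', 'mismatch'
  end if
end program compare_shapes
```

Building a test like this once with "-ip" and once with "-no-ip -fno-inline" is one way to compare the two settings, although a toy this small may well not show the failure.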
If you are not already doing so, try compiling with -recursive (-qopenmp includes -recursive by default).
This alters how local variables are placed. You are not intending to actually call the subroutine recursively; the intention is to direct the compiler to place and manage local data in a recursion-safe manner.
Note that over-aggressive inlining can be counter-productive to performance. CPUs tend to have a limited amount of L1 instruction cache, and overly aggressive inlining and loop unrolling will subvert L1 instruction cache usage. I suggest you run a test: the old compiler that works with aggressive inlining against the new compiler without it. This will get you by until a fix is made. Then you will have to re-evaluate the benefit, if any, of more aggressive inlining and loop unrolling.
Among the problems I've encountered with -ip over a range of recent ifort releases is breakage in the protect_parens setting. I submitted an IPS problem report on that. As you mentioned, on Linux (but not Windows) it might be avoided by setting -fno-inline-functions, or (for traditional code without module procedures) by splitting the sources with fsplit.
I've also seen a procedure fail to receive its arguments correctly until -ip was turned off. I don't have permission to submit the reproducer. There were also cases showing extreme register pressure, where any use of overly aggressive options would push the code generation over the limit of correctness (and, believe it or not, -prec-div could count as too aggressive, depending on how the source was arranged).
As you may have guessed, these problems have aggravated the difficulty of obtaining reproducible behavior among Linux, Windows command line, and Windows GUI builds.
Inlining may expose problems with incorrect declarations; for example, we had one where over-running a character string caused problems only when -ip was set. The -warn interfaces option might catch some of these.
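As a hedged illustration of a declaration style that avoids this kind of over-run (this is not the code from the case above; string_utils and fill_with are hypothetical names), the sketch below puts the routine in a module so that the caller sees an explicit interface, and uses an assumed-length character dummy so that the callee works with the actual length passed in rather than a length it declares for itself.

```fortran
module string_utils
  implicit none
contains

  ! len=* means the dummy takes its length from the actual argument,
  ! so the loop cannot run past the end of the caller's string.
  subroutine fill_with(buf, ch)
    character(len=*), intent(inout) :: buf
    character(len=1), intent(in)    :: ch
    integer :: i
    do i = 1, len(buf)
      buf(i:i) = ch
    end do
  end subroutine fill_with

end module string_utils

program fill_demo
  use string_utils, only: fill_with
  implicit none
  character(len=4) :: short

  short = 'abcd'
  call fill_with(short, 'x')
  print '(a)', short   ! prints xxxx
end program fill_demo
```

With an external procedure and no explicit interface, a callee that declared a longer fixed length than the caller supplied could write past the end of the string without any diagnostic, which is the kind of mismatch that interface checking is meant to flag.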