This has been an issue for years. Are there any plans to address the ifort bug mentioned here?
This has been an ongoing issue for a few years. The only work-around to date has been to compile with less aggressive settings (-O1 instead of -O3). At the very least, can we get an idea of the performance hit we're taking (if any) by using the -O1 flag? Thanks!
MKL does not rely on `-O1` for conditional numerical reproducibility either. If you have numerical issues with aggressive vectorization in icc, you should consider setting options more consistent with what you might use for gcc, such as `-fp-model source`.
Thanks for the quick reply! That makes sense, but I have a couple of follow-up questions if you would oblige me (I'm a data scientist, not a software engineer, so I'm trying to understand the full scope of what I'm dealing with). Strictly from a performance point of view, say we have two distinct installations of MKL + NumPy + SciPy: the first with NumPy and SciPy compiled at `-O1`, the second compiled at `-O3`. If we run the same program on both setups, will the `-O3` installation give us 1) faster performance, and/or 2) greater numeric precision?
If so, what kind of difference are we talking here for both speed and precision?
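On the precision side, a minimal Python sketch (not the ifort bug itself, just the general mechanism) of why optimization level can change numerics at all: floating-point addition is not associative, so a vectorized `-O3` reduction that regroups terms can legitimately produce a slightly different result than an `-O1` scalar loop, with neither answer being "more precise" in general.

```python
# Floating-point addition is not associative, so regrouping a sum --
# exactly what a vectorized reduction does -- can change the last bits.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # scalar-order accumulation
right = a + (b + c)   # regrouped, as a vectorized reduction might compute it

print(left == right)      # False
print(abs(left - right))  # ~1e-16, on the order of machine epsilon
```

The differences are typically at the level of the last one or two bits per operation; whether they grow into something visible depends on the conditioning of the algorithm, which is why ill-conditioned code like ODR is where such failures surface first.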
Thanks again for your help!
As Tim has said, it is best to determine the exact optimization causing the test failures in ODR and address that specific issue directly, rather than lowering the optimization level across the board, which disables a whole battery of optimizations and code transformations.
Unfortunately we have not made progress in identifying this specific Intel Fortran Compiler optimization/code-transformation step yet.
However, while building SciPy for Intel (R) Distribution for Python*, we were able to use `-O3` for all of SciPy, lowering the optimization level only for the odr module.
For details, please download the conda tarball of SciPy from the Intel channel, https://anaconda.org/intel/scipy/files . The archive contains an info/recipe folder, which includes our patches. In particular, in scipy/odr/setup.py, we added:
```diff
diff --git a/scipy/odr/setup.py b/scipy/odr/setup.py
index 9974dfa..aad4efe 100644
--- a/scipy/odr/setup.py
+++ b/scipy/odr/setup.py
@@ -22,7 +22,7 @@ def configuration(parent_package='', top_path=None):
         libodr_files.append('d_lpkbls.f')

     odrpack_src = [join('odrpack', x) for x in libodr_files]
-    config.add_library('odrpack', sources=odrpack_src)
+    config.add_library('odrpack', sources=odrpack_src, extra_f77_compile_args=['-O1'])

     sources = ['__odrpack.c']
     libraries = ['odrpack'] + blas_info.pop('libraries', [])
```
while replacing `-O1` with `-O3` in NumPy's distutils settings for the Intel (R) Fortran compiler.
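For reference, a hedged sketch of what that distutils change amounts to. The real file is numpy/distutils/fcompiler/intel.py and its exact contents vary by NumPy version; the stub classes below only illustrate the kind of change, overriding `get_flags_opt` (the hook numpy.distutils compiler classes use to pick optimization flags):

```python
# Hypothetical stand-ins -- the real classes live in
# numpy/distutils/fcompiler/intel.py and differ across NumPy versions.
class FCompilerStub:
    def get_flags_opt(self):
        return ['-O1']  # conservative default that works around the ifort bug

class PatchedIntelFCompilerStub(FCompilerStub):
    def get_flags_opt(self):
        return ['-O3']  # restored once scipy/odr itself is built at -O1

print(PatchedIntelFCompilerStub().get_flags_opt())  # ['-O3']
```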
This made tests pass on 64-bit Linux, but on Mac OS and on Windows further reductions of the optimization level were necessary: in scipy/sparse/linalg/isolve, the `_iterative` extension had to be lowered to `-O1`, and in `scipy/linalg/setup.py`, the `_fblas` extension to `-O2`.
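A hypothetical sketch of those per-extension overrides in numpy.distutils style, analogous to the odrpack change shown earlier. The file paths mirror the stock SciPy setup.py files, but the surrounding configuration code and source lists are elided:

```python
# Hypothetical config fragments -- the surrounding configuration() functions
# and source lists from the real SciPy setup.py files are elided.

# scipy/sparse/linalg/isolve/setup.py -- build _iterative at -O1
config.add_extension('_iterative',
                     sources=iterative_sources,       # elided here
                     extra_f77_compile_args=['-O1'])

# scipy/linalg/setup.py -- build _fblas at -O2
config.add_extension('_fblas',
                     sources=fblas_sources,           # elided here
                     extra_f77_compile_args=['-O2'])
```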
While tests no longer fail when we use `-O3`, vectorization is still inhibited by the use of `-fp-model strict`.