Hi,
I have just received my copy of the Intel Fortran compiler (Linux) as an Open Source Contributor. The first idea I had was to compare it to the gfortran compiler. As a benchmark I tried the following code:
https://github.com/marcobruns/fortran_performance_for_neural_networks/blob/master/fortran_matmul.f90
With gfortran the compiled code took 24.5 s to execute, and with ifort it took 219.4 s! The BLAS version of the code (same repository as above) ran in nearly identical times with both compilers, close to 3.5 s.
Why does it take so much longer when compiled with ifort?
I am using the following compiler versions:
gfortran: GNU Fortran (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Intel Fortran: Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.3.199 Build 20190206
Thank you very much in advance for any kind of constructive criticism.
Best Wishes
Marco
The ifort compiler option '-parallel' does a great job on your code. With '-O3 -fast -parallel' I could reduce the execution time from 21 s to 1.4 s compared to '-O3 -fast' alone (on my CPU, PSXE 2019 u3).
I think '-qopt-matmul' can also be used. In that case one also has to specify '-mkl:parallel'. '-O3 -parallel' triggers '-qopt-matmul'.
PS: It might be a good idea to use modules instead of interfaces in your code.
Dear Marco,
Indeed, I can confirm your timings: for gfortran 5.4 they are very similar to ifort v19.0.3, namely ca. 290-300 s, while for gfortran 9.0.1 I get 50 s. I know that between v6 and v7 of gcc/gfortran there was some massive work on performance and optimization (which also caused a couple of optimization regressions), so that might be a reason. I didn't look into the details to find out what exactly is going on.
Cheers,
JRR
Bruns, Marco wrote:
".. Why does it take so much longer when compiled with ifort. ..
Thank you very much in advance for any kind of constructive criticism."
@Bruns, Marco,
You may also want to submit a support request about this at Intel Support Center: https://supporttickets.intel.com/servicecenter?lang=en-US
My hunch is that they will ask for details of both builds, especially the compiler options and optimization settings, so it is worth sharing them here as well.
See how the compiler options used in these comparisons are listed: https://www.fortran.uk/fortran-compiler-comparisons/polyhedron-benchmarks-linux64-on-intel/
Are you using the same set of options?
ifort perf1.f90 -o perf -O3 -parallel
./perf
mat1 created!
mat2 created!
matrix multiplication took: 1.472000
What options were used with the gfortran build?
FWIW
The sample code is serial. However, depending on the libraries linked, the parallel version of the MKL matmul may be called, and if so, the first call has the additional overhead of instantiating the OpenMP thread pool (or other thread pool if this has changed).
For smaller arrays, you can link in the non-threaded MKL library.
Jim Dempsey
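One way to check for the first-call overhead described above is to time two consecutive MATMUL calls separately. The following is only a minimal sketch: the program name, the matrix size n = 500 and the output format are assumptions for illustration, not taken from the original benchmark. If a thread pool is created on the first call, its cost should appear only in the first timing.

```fortran
! Sketch: time the first and second MATMUL calls separately, so that any
! one-time start-up cost (e.g. creating a thread pool) is visible only in
! the first number. Program name and matrix size are hypothetical.
program matmul_warmup
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
  integer, parameter :: n = 500           ! assumed size, not from the thread
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  integer(int64) :: t0, t1, rate
  integer :: pass

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  call system_clock(count_rate=rate)      ! clock ticks per second

  do pass = 1, 2
    call system_clock(t0)
    c = matmul(a, b)
    call system_clock(t1)
    ! Printing a checksum keeps the optimizer from discarding the product.
    print '(a,i0,a,f10.4,a,es12.4)', 'call ', pass, ': ', &
          real(t1 - t0, dp) / real(rate, dp), ' s, checksum ', sum(c)
  end do
end program matmul_warmup
```

If both timings are close, start-up overhead is probably not what dominates the measurement.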
FYI
'-qopt-matmul' without '-parallel' and with '-mkl:sequential' produces link errors. The documentation is not clear about this. Intuitively, I would expect this combination to link (it fails with PSXE 2017 up to 2019u3 on Windows; the Linux version seems to behave differently, since PSXE 2017u6 links). Might this be a bug?
In an older thread of mine I encountered a similar issue for PSXE 2015/2016, which was solved in PSXE 2016 u3 (https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/606632). However, MKL/BLAS offload of matmul was not performed in that case.
Maybe it would be a nice feature to let the compiler choose between the intrinsic matmul and MKL/BLAS via '-mkl=sequential', for cases where you strictly need sequential code?
I haven't checked the latest ifort, but past versions did not vectorize matmul effectively until you set -O3 (which implies -qopt-matmul, which you may or may not want, either at -O2 or -O3). Surely gfortran performance also depends on your compiler settings, but not in the same way.
-qopt-matmul is implemented by linking to an internal entry point in the MKL library. If you wished to set -O3 for good single-thread performance of MATMUL and did not want to link MKL, you would turn off opt-matmul explicitly. opt-matmul is probably required for threaded MATMUL; -qparallel would imply -qopt-matmul -mkl.
Past versions of ifort have MKL_DIRECT options to optimize MATMUL for moderate size problems. Unless the release notes indicate a change, I would expect the latest version to work with the documentation of earlier versions. Perhaps the latest version has done away with the need to consider these options.
After installing PSXE 2019u3 on my GNU/Linux system, I can confirm that linking against mkl:sequential works fine (ifort -qopt-matmul -mkl:sequential matmultest.f90). The same fails on Windows for the PSXE 2019 family (ifort /Qopt-matmul /Qmkl:sequential matmultest.f90), while /Qmkl:parallel works fine. I don't know if it's an issue with my system. Nevertheless, I will open a ticket.
Hi Johannes,
Thank you very much for your input (and I would also like to thank everybody else who replied to my question).
johannes k. wrote:
"The ifort compiler option '-parallel' does a great job at your code. With that option ('-O3 -fast -parallel') I could reduce execution time from 21 sec. to 1.4 sec. compared to '-O3 -fast' only (for my cpu, PSXE 2019 u3)."
Sorry for not mentioning it, since it is vital information: I used no compiler options for optimization. The commands I used to compile the code that produced my results are:
ifort fortran_matmul.f90 -o fortran_matmul
gfortran fortran_matmul.f90 -o fortran_matmul
Following your answer, I will definitely look into the compiler options for ifort (and also for gfortran), since there are obviously some promising options available to speed up the execution of the code.
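For anyone who wants to repeat the comparison with different options, here is a minimal timing sketch. It is not the code from the linked repository: the file name matmul_timing.f90, the matrix size and the output text are assumptions. The ifort build lines in the comments use options quoted in this thread; the gfortran '-O3' line is an assumed counterpart, not something reported above.

```fortran
! Hypothetical benchmark sketch (not the code from the linked repository).
! Possible build lines:
!   ifort    matmul_timing.f90 -o matmul_timing                (no options, as in the original post)
!   ifort    matmul_timing.f90 -o matmul_timing -O3 -parallel  (options suggested in this thread)
!   gfortran matmul_timing.f90 -o matmul_timing                (no options, as in the original post)
!   gfortran matmul_timing.f90 -o matmul_timing -O3            (assumed counterpart)
program matmul_timing
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
  integer, parameter :: n = 500           ! assumed size, not from the thread
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  integer(int64) :: t0, t1, rate

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  call system_clock(t0, rate)             ! start tick and ticks per second
  c = matmul(a, b)
  call system_clock(t1)

  print '(a,f10.3,a)', 'matrix multiplication took: ', &
        real(t1 - t0, dp) / real(rate, dp), ' s'
  ! Printing a checksum keeps the optimizer from discarding the product.
  print '(a,es14.6)', 'checksum: ', sum(c)
end program matmul_timing
```

Building the same source with each option set and posting the exact command lines next to the timings makes results like the ones in this thread much easier to compare.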