I've been trying for a couple of weeks to find the best compilation flags for my CFD code, but I think I may be doing something wrong, because the code behaves weirdly.
My code is written in Fortran (77/95) and I use OpenMPI for parallelisation. According to Intel's page for my CPU (http://ark.intel.com/products/52576/Intel-Xeon-Processor-X5690-12M-Cache-3_46-GHz-6_40-GTs-Intel-QPI), SSE4.2 can be used for optimization, so I compiled OpenMPI with the following command:
./configure --prefix=/opt/OpenMpi_intel_Opt_static/ CC=icc CXX=icpc FC=ifort CFLAGS="-msse4.2 -axsse4.2" CXXFLAGS="-msse4.2 -axsse4.2" FFLAGS="-msse4.2 -axsse4.2" FCFLAGS="-msse4.2 -axsse4.2" LDFLAGS="-msse4.2 -axsse4.2" --with-platform=optimized --disable-shared --enable-static
based on (https://software.intel.com/en-us/articles/performance-tools-for-software-developers-building-open-mpi-with-the-intel-compilers).
My code is then compiled with the following:
mpif90 -c -axsse4.3 -O3 files.f
mpif90 -o prog all.o -axsse4.2 -O3
It seems that, depending on the size of my arrays (in my case the number of array entries equals my domain size), I get either a good result or just "NaN". When I get "NaN", the following remark is shown during compilation:
MAIN__ has been targeted for automatic cpu dispatch ....
If I delete the "-axsse4.2" flag it works fine, but it takes longer!
Is there another way to optimize my code or change the compilation flags in order to decrease the runtime?
EDIT: the -axsse4.3 above is a typo; I meant -axsse4.2.
This question looks better suited to the Intel Linux Fortran forum. But it's difficult to see what you're trying to accomplish with this mix of flags. Why not use -msse4.2 throughout? There's no sse4.3 option for the Intel compilers; if the compiler doesn't reject it, we can't guess what will happen.
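To make "use -msse4.2 throughout" concrete, here is a minimal sketch that keeps the instruction-set flag in one variable so the compile and link steps cannot drift apart. The variable names and the build_cmds helper are my own, not anything from your build, and the script only prints the two commands from your post instead of running them, so it works without mpif90 installed.

```shell
#!/bin/sh
# Hypothetical sketch: one variable holds the instruction-set flag,
# so the compile step and the link step always use the same one.
ARCH_FLAGS="-msse4.2"   # assumption: plain -msse4.2, no -ax auto-dispatch
OPT_FLAGS="-O3"

# build_cmds is an illustrative helper; it only prints the commands.
# File names (files.f, all.o, prog) are taken from the question.
build_cmds() {
  echo "mpif90 -c $ARCH_FLAGS $OPT_FLAGS files.f"
  echo "mpif90 -o prog all.o $ARCH_FLAGS $OPT_FLAGS"
}

build_cmds
```

Once the printed lines look right, you could run them directly or paste them into your makefile. If the NaN results go away with this setup, that would point at the mixed -ax flags rather than at your code, though I can't promise that from here.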