Compiler: Intel 16.0.3
OS: OS X 10.10.5
We are experiencing much slower compile times on our OS X machines after upgrading from Intel compiler 13.0.3 to 16.0.3. The Intel flags used to compile are as follows:
icpc -c -wd858,1572,1569,279 -O3 -inline-level=2 -falign-functions=16 -ansi-alias -xSSSE3 -w -m64 -std=c++11 -DQT_NO_DEBUG -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED
We did the same upgrade on our Linux and Windows machines and did not see the same slowdowns. For comparison, our previous build took ~25 minutes, compared to ~70 minutes with the new compiler. Looking at Activity Monitor, it appears that the compiler is unable to multi-thread and is stuck using one thread, even though we specify the number of cores we want to use.
Are you saying that there is no performance difference between "make -j1" and "make -j8" when using the Intel C++ compiler? For us, there is a huge difference. We are currently evaluating the latest version 17 compiler and so far everything seems fine. So maybe try going to version 17.
Do you have the same problem when compiling with llvm?
If I recall correctly, V13's default -O3 did not include interprocedural optimizations (I do not think IPO was available then), whereas V16's -O3 defaults to including IPO. Try using the compiler flag that disables IPO. Note that there are two levels of IPO: single-file and multi-file.
You may want to vary the IPO settings (none, single, multi) per object file.
Yes, there is no difference between -j1 and -j8. As for IPO, we experience the problem even in debug builds, where we use -O0 rather than -O3. We suspect this might be related to licensing: we have two machines that build on OS X and two floating licenses for the OS X Intel compiler. Unfortunately, we do not have a node-locked license for OS X to test this theory. One thing to note is that the correct number of icpc instances is created when we start the compile, but they never seem to get any CPU time.
Enter "Flex" into the Powered by Google search near the top of this web page. There may be some useful information as to how to reduce the latency.
Thanks Jim, but I don't think it's latency, since none of our other machines have this issue. It appears that only one instance of icpc obtains a license at a time, which doesn't match the behaviour of previous versions or that of icpc on Mac. We have created a small test to demonstrate this. Below are the results of a test in which we compare clang to icpc with the hostname in the license file and to icpc with the IP address in the license file (as suggested in the Fortran forum: https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/532464).
Test project using "-j5" with 4 cores:
Clang: 9 seconds
With hostname in the USE_SERVER.lic file: 1.05 minutes
With IP address in the USE_SERVER.lic file: 45 seconds
These results are from a compile without any optimization flags. Clang is 5 times faster (9 seconds versus 45 seconds) than icpc using an IP address in the license file. This is expected, since clang uses all the available cores and icpc does not. Another thing to note is that we have builders on Windows and Linux connecting to the same license server that do not experience these extreme slowdowns.
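For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch of a timing harness. It is not the test project used above. The names `run_batch` and `speedup` are hypothetical, and the workload is a stand-in `sleep` command so the script runs anywhere; you would replace it with your real icpc or clang command lines. A speedup close to 1.0 would suggest the jobs are effectively running one at a time, though on its own it does not show why.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(commands, jobs):
    """Run every command, at most `jobs` at a time; return wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() waits for all commands and re-raises any failure.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return time.perf_counter() - start


def speedup(commands, jobs):
    """Ratio of serial wall time to wall time with `jobs` concurrent commands."""
    serial = run_batch(commands, 1)
    parallel = run_batch(commands, jobs)
    return serial / parallel


if __name__ == "__main__":
    # Placeholder workload (an assumption): each "compile" just sleeps briefly.
    # Swap in real command lists, e.g. ["icpc", "-c", "file.cpp", "-o", "file.o"].
    placeholder = [sys.executable, "-c", "import time; time.sleep(0.3)"]
    commands = [placeholder] * 4
    print(f"speedup with 4 jobs: {speedup(commands, 4):.2f}x")
```

Running the same command list once per compiler (and once per license file variant) gives numbers that can be compared directly with the ones above.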
Are you using a floating license?
Please clean the license folder so that it contains only one valid license.
One more question: how did you compile the source files? Is it like this: icpc [-options] -c f1.cpp f2.cpp f3.cpp ..... fn.cpp?
Yes, we are using a floating license; we have a license for 2 users. The license folder contains only 1 license file. An example of how we compile is below:
icpc -c -wd858,1572,1569,279 -qopenmp -mkl=parallel -std=c++11 -O3 -inline-level=2 -falign-functions=16 -ansi-alias -xSSSE3 -w -m64 -w -fpic -DQT_NO_DEBUG_OUTPUT -DQT_NO_DEBUG_OUTPUT -DQT_NO_DEBUG -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED -I/opt/qt-4.8.7/mkspecs/macx-icc -I. -I/opt/qt-4.8.7/lib/QtCore.framework/Headers -I/opt/qt-4.8.7/include/QtCore -I/opt/qt-4.8.7/lib/QtGui.framework/Headers -I/opt/qt-4.8.7/include/QtGui -I/opt/qt-4.8.7/include -I/usr/local/include/vtk-6.3 -I/usr/local/boost_1_55_0/include -I/opt/intel/compilers_and_libraries_2016.3.170/mac/tbb/include -I/opt/hdf5-1.8.17/include -I/opt/ffmpeg/include -I/usr/local/include -I/opt/flexnet/188.8.131.52/machind -I/opt/tetgen/include -I../tools -I../geometrykernel/include -I../math/include -I../mesh/include -I../utils/include -IGeneratedFiles/release/x64 -F/opt/qt-4.8.7/lib -o release/x64/test.o src/product/test.cpp