Compiler: Intel 16.0.3
OS: OS X 10.10.5
Hello,
We are experiencing much slower compile times on our OS X machines after upgrading from the Intel compiler 13.0.3 to 16.0.3. The Intel flags used to compile are as follows:
icpc -c -wd858,1572,1569,279 -O3 -inline-level=2 -falign-functions=16 -ansi-alias -xSSSE3 -w -m64 -std=c++11 -DQT_NO_DEBUG -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED
We did the same upgrade on our Linux and Windows machines and did not see the same slowdowns. For comparison, our previous build took ~25 minutes, versus ~70 minutes with the new compiler. Looking at Activity Monitor, it appears that the compiler is unable to run in parallel and is stuck using one thread, despite our specifying the number of cores we want to use.
Hi Cole,
Are you saying that there is no performance difference between "make -j1" and "make -j8" when using the Intel C++ compiler? For us, there is a huge difference. We are currently evaluating the latest version 17 compiler and so far everything seems fine. So maybe try going to version 17.
Do you have the same problem when compiling with llvm?
If I recall right, V13 default -O3 did not include interprocedural optimizations (I do not think IPO was available then).
Whereas V16 -O3 defaults to include IPO. Use the compiler flag to disable IPO. Note, there are two levels of IPO: single-file, and multi-file.
You may want to vary the IPO settings (none, single, multi) per object file.
Jim Dempsey
Yes, there is no difference between -j1 and -j8. As for the IPO, we experience the problem even in debug builds, where we do not use the -O3 flag (we use -O0 for debug). We suspect this might be related to licensing: we have two machines that build on OS X, and we have two floating licenses for the OS X Intel compiler. Unfortunately, we do not have a node-locked license for OS X to test this theory. One thing to note is that the correct number of icpc instances is created when we start the compile, but they never seem to get any CPU time.
Enter "Flex" into the Powered by Google search near the top of this web page. There may be some useful information as to how to reduce the latency.
Jim Dempsey
Thanks Jim, but I don't think it's latency, since none of our other machines have this issue. It appears that only one instance of icpc is obtaining a license at a time, which doesn't match the behaviour of previous versions or that of icpc on Mac. We have created a small test to show this. Below are the results of a test in which we compare clang to icpc with the hostname in the license file, and to icpc with the IP address in the license file (as suggested in the Fortran forum: https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/532464).
Test project using "-j5" with 4 cores:
Clang: 9 seconds
With hostname in the USE_SERVER.lic file: 1.05 minutes
With IP address in the USE_SERVER.lic file: 45 seconds
These results are from a compile without any optimization flags. Clang is about 5 times faster than icpc using an IP address in the license file (9 seconds versus 45 seconds). This is expected, since clang uses all the available cores and icpc does not. Another thing to note is that we have builders on Windows and Linux connecting to the same license server that do not experience these extreme slowdowns.
Cole
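For anyone who wants to reproduce this kind of comparison, one way to check whether several compiler processes actually overlap is to run the same command N times one after another, then N times at once, and compare wall-clock times. The following is only a minimal sketch in Python, not the test project used above: the function names (`run_once`, `time_batch`, `parallel_speedup`) are made up for illustration, and the stand-in command merely sleeps, so it would need to be replaced with a real compile command line.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run one command to completion and return its exit code."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def time_batch(cmd, copies, workers):
    """Run `copies` instances of `cmd`, at most `workers` at a time.

    Returns the wall-clock time in seconds for the whole batch.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run_once, [cmd] * copies))
    elapsed = time.perf_counter() - start
    failed = sum(1 for code in codes if code != 0)
    if failed:
        raise RuntimeError(f"{failed} of {copies} runs failed")
    return elapsed


def parallel_speedup(cmd, copies=4):
    """Time a serial batch and a fully parallel batch of the same command."""
    serial = time_batch(cmd, copies, workers=1)
    parallel = time_batch(cmd, copies, workers=copies)
    return serial, parallel, serial / parallel


if __name__ == "__main__":
    # Stand-in command (an assumption for this sketch): sleeps briefly.
    # Replace it with the real compile command, given as a list of arguments.
    stand_in = [sys.executable, "-c", "import time; time.sleep(0.3)"]
    serial, parallel, speedup = parallel_speedup(stand_in, copies=4)
    print(f"serial:   {serial:.2f} s")
    print(f"parallel: {parallel:.2f} s")
    print(f"speedup:  {speedup:.2f}x")
```

A speedup close to 1 would suggest that the processes are being serialized somewhere (for example, while waiting on a license checkout), whereas a speedup approaching the number of copies would suggest that they really do run side by side. The sketch cannot tell which resource causes the serialization; it only shows whether the runs overlap.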
Bumping this issue, since our scenario should be very easy for Intel support to recreate, and I have already shown that it is not a latency issue.
Are you using a floating license?
Please clean the license folder so that it contains only one valid license.
One more question: how did you compile the source files? Is it like this: icpc [-options] -c f1.cpp f2.cpp f3.cpp ..... fn.cpp?
Thanks,
Jennifer
Yes, we are using a floating license; we have a license for 2 users. The folder containing the license contains only 1 license file. An example of how we compile is below:
icpc -c -wd858,1572,1569,279 -qopenmp -mkl=parallel -std=c++11 -O3 -inline-level=2 -falign-functions=16 -ansi-alias -xSSSE3 -w -m64 -w -fpic -DQT_NO_DEBUG_OUTPUT -DQT_NO_DEBUG_OUTPUT -DQT_NO_DEBUG -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED -I/opt/qt-4.8.7/mkspecs/macx-icc -I. -I/opt/qt-4.8.7/lib/QtCore.framework/Headers -I/opt/qt-4.8.7/include/QtCore -I/opt/qt-4.8.7/lib/QtGui.framework/Headers -I/opt/qt-4.8.7/include/QtGui -I/opt/qt-4.8.7/include -I/usr/local/include/vtk-6.3 -I/usr/local/boost_1_55_0/include -I/opt/intel/compilers_and_libraries_2016.3.170/mac/tbb/include -I/opt/hdf5-1.8.17/include -I/opt/ffmpeg/include -I/usr/local/include -I/opt/flexnet/11.14.0.2/machind -I/opt/tetgen/include -I../tools -I../geometrykernel/include -I../math/include -I../mesh/include -I../utils/include -IGeneratedFiles/release/x64 -F/opt/qt-4.8.7/lib -o release/x64/test.o src/product/test.cpp
