I have read several documents, presentations and white papers about Intel Parallel Studio XE and I have some questions about Intel C++ Compiler, IPP and OpenCV.
1. http://software.intel.com/sites/default/files/6bygraph-100912sm.png In this benchmark table, Intel's C++ compiler seems to produce more efficient machine code than the others (the Microsoft compiler and gcc). Is the main factor in this performance increase the data parallelism obtained through loop vectorization?
2. Has anybody compiled OpenCV with icc? I would like to learn about the performance difference between two builds of OpenCV: (a) OpenCV built with icc, integrating IPP, and (b) OpenCV built with gcc, integrating IPP. In particular, are there enough code regions (OpenCV's own code, not IPP function calls) where Intel's compiler can improve performance through vectorization, so that the overall performance of the OpenCV library increases?
3. As you may know, there is an ocl module in OpenCV, and it matured in OpenCV version 2.4.4. If I call functions from the ocl module, can I use Intel VTune Amplifier as a performance profiler?
VTune should work well as a performance profiler if you have built the code of interest with debug symbols and optimization, e.g. icc -O -g or gcc -O -ftree-vectorize -g -funroll-loops ....
Personal preferences on compile options enter strongly into comparisons of performance among compilers. Special optimization packages which compilers may use to optimize SPEC performance under the special SPEC rules aren't likely to be applicable to "real" applications, thus SPEC performance ratios aren't so meaningful.
Note that icc -ansi-alias -xHost -O2 is roughly equivalent to gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param max-unroll-times=2 -march=native, so you will not often see comparisons that make an effort to set equivalent options.
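If you want to try such a comparison yourself, a small test kernel is an easy place to start. The following is only a sketch: scale_add and time_scale_add are made-up names for illustration, not OpenCV or IPP functions, and the compile lines in the comments simply repeat the options quoted above with -g added for profiling. Whether a particular compiler actually vectorizes the loop depends on its version and on the target CPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Possible compile lines, taken from the options discussed above (with -g
// added so a profiler can map samples back to source):
//   icc -ansi-alias -xHost -O2 -g -c kernel.cpp
//   gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops \
//       --param max-unroll-times=2 -march=native -g -c kernel.cpp

// Hypothetical kernel (not from OpenCV): y[i] += a * x[i].
// Unit stride and no loop-carried dependence make it a candidate for
// auto-vectorization, although the compiler may still have to account for
// x and y overlapping.
void scale_add(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Runs the kernel `repeats` times over the given vectors and returns the
// elapsed wall-clock time in seconds.
double time_scale_add(float a, const std::vector<float>& x,
                      std::vector<float>& y, int repeats) {
    assert(x.size() == y.size());
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        scale_add(a, x.data(), y.data(), x.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Call time_scale_add from a small driver with vectors large enough to give a measurable time, build the same source with each compiler, and compare the timings. A single loop like this says little about a whole library such as OpenCV; it only shows how much the choice of options matters for one vectorizable region.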