Running OpenMP on the Xeon Phi

Samuel_H_2 · ‎08-09-2013

I wrote a program that measures the floating-point throughput (FLOPS) of the Xeon Phi. It is meant to run natively on the Xeon Phi. I used OpenMP and compiled the program with icc as follows:

$ icc -openmp -mmic -vec-report=3 -O3 helloflops3.c -o helloflops3

However, when I ran it on the Xeon Phi, I found no speedup between running the program with 1 thread and with 240 threads. The program reports that it is using 240 threads, but it is no faster. I ran the program as follows:

$ export OMP_NUM_THREADS=240

$ export OMP_KMP_AFFINITY=scatter

$ ./helloflops3

The results are 16.729 GFLOP/s for 1 thread and 8.286 GFLOP/s for 240 threads. I also compiled a Fortran version of the code with ifort, and that works exactly as intended: I get 1992 GFLOP/s for 240 threads and 16.734 GFLOP/s for 1 thread. I am currently using the latest version of Intel(R) C++ Composer XE for Linux, which I got for free. I don't think the free license is the cause, because my copy of Intel(R) Fortran Composer XE is also free. My OS is Red Hat 6.4. Is there something I am missing?

Sumedh_N_Intel · ‎08-12-2013

Could you share a reproducer? Also, it would be great if you could provide the exact versions of the compilers and the output of micinfo.
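
A reproducer for this kind of report can be quite small. The sketch below is an assumption about what one might look like, not code from helloflops3.c; the helper names count_active_threads and scaled_sum are hypothetical. The first checks how many threads really execute a parallel region, and the second does a small reduction so that the parallel loop has work the compiler cannot discard. It should build with the same -openmp option used in the original post, and it still compiles (running on one thread) when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>   /* only needed when OpenMP is enabled */
#endif

/* Hypothetical helper (not from helloflops3.c): counts how many threads
   actually execute a parallel region. Returns 1 when the file is compiled
   without OpenMP support, because the pragma is then ignored. */
int count_active_threads(void)
{
    int count = 0;
    #pragma omp parallel reduction(+:count)
    {
        count += 1;   /* each thread in the team contributes exactly one */
    }
    return count;
}

/* Hypothetical helper: spreads n multiply-add iterations across the team
   and returns the sum of a * i for i = 0 .. n-1. */
double scaled_sum(int n, double a)
{
    double sum = 0.0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += a * (double)i;
    }
    return sum;
}
```

Calling count_active_threads() from main and printing the result shows whether the thread count requested through OMP_NUM_THREADS is actually being used. If it returns 240 but the timed loop is no faster, the problem is more likely in how the work is divided among threads than in the runtime setup.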