I wrote a program that measures floating-point throughput (FLOPS) on the Xeon Phi. It is meant to run natively on the Xeon Phi. I used OpenMP and compiled the program with icc as follows:
$ icc -openmp -mmic -vec-report=3 -O3 helloflops3.c -o helloflops3
However, when I ran it on the Xeon Phi, I found no speedup from running with 240 threads instead of 1 thread. The program reports that it is using 240 threads, but it does not run any faster. I ran the program as follows:
$ export OMP_NUM_THREADS=240
$ export OMP_KMP_AFFINITY=scatter
$ ./helloflops3
The result is 16.729 GFLOPS for 1 thread and 8.286 GFLOPS for 240 threads. I also compiled a Fortran version of the code with ifort, and that works exactly as intended: I get 1992 GFLOPS for 240 threads and 16.734 GFLOPS for 1 thread. I am currently using the latest version of Intel(R) C++ Composer XE for Linux, which I got for free. I don't think the free license is the cause, because my Intel(R) Fortran Composer XE is also a free copy. My OS is Red Hat 6.4. Is there something I am missing?
Could you share a reproducer? Also, it would be great if you could provide the exact versions of the compilers and the output of micinfo.
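For anyone preparing such a reproducer, a minimal sketch might look like the following. It is not the poster's helloflops3.c: the function names `now_seconds`, `run_kernel`, and `report` are hypothetical, and the kernel is a simple dependent multiply-add chain that will not approach peak throughput. Its only purpose is to show how many threads actually enter the parallel region and whether the rate scales with them.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; falls back to CPU time when built without OpenMP. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/*
 * Hypothetical kernel (an assumption, not the original benchmark): every
 * thread runs `iters` multiply-adds on private data. Returns the number of
 * threads that actually entered the parallel region and stores the total
 * flop count in *flops.
 */
long run_kernel(long iters, double *flops, double *checksum)
{
    long threads = 0;
    double sum = 0.0;

    assert(iters > 0 && flops != NULL && checksum != NULL);

    #pragma omp parallel reduction(+:threads, sum)
    {
        double a = 1.0;
        const double b = 1.0000001, c = 1.0e-9;
        for (long i = 0; i < iters; i++)
            a = a * b + c;      /* 2 flops per iteration */
        threads += 1;           /* one count per participating thread */
        sum += a;               /* keeps the loop from being optimized away */
    }

    *flops = 2.0 * (double)iters * (double)threads;
    *checksum = sum;
    return threads;
}

/* Prints the thread count and rate; call this from main() in a real reproducer. */
void report(long iters)
{
    double flops, checksum;
    double t0 = now_seconds();
    long threads = run_kernel(iters, &flops, &checksum);
    double dt = now_seconds() - t0;

    printf("threads=%ld checksum=%g\n", threads, checksum);
    if (dt > 0.0)
        printf("GFLOP/s=%.3f\n", flops / dt / 1.0e9);
}
```

Calling `report()` from a small `main()` and building it with the same flags as in the question would show whether the printed thread count and rate change between `OMP_NUM_THREADS=1` and `OMP_NUM_THREADS=240`. If the thread count is 1 in both cases, the parallel region is probably not being compiled or executed as expected.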