I copied the example 'helloflops3offload' from the book 'Intel Xeon Phi Coprocessor High-Performance Programming'.
When I compile it with the optimization option -O3, the test takes 2.6 seconds to complete, but when I change the optimization option to -O0, it takes 2670 seconds.
Is this a bug?
MPSS: 3.6.1
icc: 2017
OS: CentOS 6.7
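For scale, the two timings reported above correspond to a slowdown of roughly a thousand times:

```latex
\frac{t_{\text{-O0}}}{t_{\text{-O3}}} = \frac{2670\ \mathrm{s}}{2.6\ \mathrm{s}} \approx 1027 \approx 10^{3}
```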
Umm, wouldn't you expect the code to run more slowly when you tell the compiler not to optimize?
My view here is that you should be impressed that the compiler can improve the code by a factor of 100x, not surprised that it produces slow code when you tell it to.
(This is like the chap who goes to the doctor and says "When I poke a stick in my eye it hurts", to which the doctor replies "Well, don't do that, then.")
The difference between -O0 and -O3 is very large, about a thousand times (1000x).
So you think this is expected?
OK, I just did not expect the difference between -O0 and -O3 to be this large.
Thanks.
You made the measurements and know what you changed and precisely how you did them. If you're confident in your technique and that the only change was the compiler flag, it's hard to argue that that isn't the cause.
The full test code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#include <sys/time.h>

double dtime()
{
    double tseconds = 0.0;
    struct timeval mytime;
    gettimeofday(&mytime, (struct timezone*)0);
    tseconds = (double)(mytime.tv_sec + mytime.tv_usec*1.0e-6);
    return tseconds;
}

#define FLOPS_ARRAY_SIZE (1024*1024)
#define MAXFLOPS_ITERS 100000000
#define LOOP_COUNT 128
#define FLOPSPERCALC 2

__declspec (target(mic)) float fa[FLOPS_ARRAY_SIZE] __attribute__((aligned(64)));
__declspec (target(mic)) float fb[FLOPS_ARRAY_SIZE] __attribute__((aligned(64)));

int main(int argc, char *argv[])
{
    int i,j,k;
    int numthreads = 2;
    double tstart, tstop, ttime;
    double gflops = 0.0;
    float a = 1.1;

    // query the number of OpenMP threads available on the coprocessor
    #pragma offload target (mic)
    #pragma omp parallel
    #pragma omp master
    numthreads = omp_get_num_threads();

    printf("Initializing\r\n");
    // initialize both arrays on the host
    #pragma omp parallel for
    for(i=0; i<FLOPS_ARRAY_SIZE; i++)
    {
        fa[i] = (float)i + 0.1;
        fb[i] = (float)i + 0.2;
    }
    printf("Starting Compute on %d threads\r\n", numthreads);

    tstart = dtime();
    // on the coprocessor, each thread repeatedly adds its own LOOP_COUNT-element slice of fb into fa
    #pragma offload target (mic)
    #pragma omp parallel for private(j,k)
    for(i=0; i<numthreads; i++)
    {
        int offset = i*LOOP_COUNT;
        for(j=0; j<MAXFLOPS_ITERS; j++)
        {
            for(k=0; k<LOOP_COUNT; k++)
            {
                fa[k+offset] = fa[k+offset] + fb[k+offset];
            }
        }
    }
    tstop = dtime();

    // FLOPSPERCALC is 2, although this loop body performs one addition per element
    gflops = (double)(1.0e-9 * numthreads * LOOP_COUNT * MAXFLOPS_ITERS * FLOPSPERCALC);
    ttime = tstop - tstart;
    if((ttime) > 0.0)
    {
        printf("GFlops = %10.3lf, Secs = %10.3lf, GFlops per sec = %10.3lf\r\n", gflops, ttime, gflops/ttime);
    }
    return (0);
}
The only change between the two builds is the optimization option:
icc -qopenmp -O0 test.cpp
and
icc -qopenmp -O3 test.cpp
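The effect of the optimization level can also be observed without a coprocessor. The following is a minimal host-only sketch, assuming an ordinary x86 Linux machine: it drops the offload pragmas and __declspec(target(mic)), keeps one thread's slice of the inner kernel, and reproduces the GFLOP count that the test reports. The function names flops_kernel and counted_gflop are hypothetical, introduced only for this illustration.

```c
/* Host-only sketch of one thread's work in the test above.
 * Assumption: no offload, no coprocessor; plain C99 on a POSIX system. */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, same approach as dtime() in the test. */
double dtime(void)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return (double)t.tv_sec + (double)t.tv_usec * 1.0e-6;
}

/* Hypothetical helper: add fb into fa over loop_count elements,
 * repeated iters times, exactly like the test's j/k loops for one thread.
 * Pass fa + offset and fb + offset to select a thread's slice. */
void flops_kernel(float *fa, const float *fb, int loop_count, long iters)
{
    assert(fa != NULL && fb != NULL && loop_count > 0);
    for (long j = 0; j < iters; j++)
        for (int k = 0; k < loop_count; k++)
            fa[k] = fa[k] + fb[k];
}

/* Hypothetical helper: the GFLOP count computed by the test,
 * 1e-9 * numthreads * LOOP_COUNT * MAXFLOPS_ITERS * FLOPSPERCALC. */
double counted_gflop(int numthreads, int loop_count, long iters, int flops_per_calc)
{
    return 1.0e-9 * numthreads * loop_count * (double)iters * flops_per_calc;
}
```

With the original parameters, counted_gflop(1, 128, 100000000L, 2) gives about 25.6 GFLOP per thread. Timing a call such as flops_kernel(fa, fb, 128, 10000000L) between two dtime() calls in a small main, and building once with -O0 and once with -O3, should show a large gap on the host as well, though the exact ratio will depend on the CPU, the compiler, and whether the -O3 build vectorizes the inner loop.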