
Why Do the Same Code and Similar Data Give Different Execution Times?

Jiawen_L_

Hi everyone,

I found that when I run axpy (y = a * x + y) on two separate sets of similar data, I get completely different execution times, as shown below. The attached file is the sample code for axpy.

My assumption is that the first offload using the inout pragma has to spend time preparing (preconfiguring, warming up) the Xeon Phi coprocessor. If so, is there an official explanation for this odd behavior? If not, what is the reason? Is there a way to reduce or avoid this overhead? This is really important for benchmarking, because this never happens on NVIDIA or Intel GPUs/CPUs.

[liu@fornax Test_offomp]$ ./a.out
Total time for inout1 combined   = 0.39732003 sec
Total time for inout2 combined   = 0.01132083 sec

Best wishes,

Jiawen

jimdempseyatthecove

For offload code, the first offload has to copy the MIC code of the application to the coprocessor and then instantiate the OpenMP thread pool.

The second and later offloads do not have to copy in the MIC code again, and they can reuse the existing OpenMP thread pool.

There is an option to specify that the MIC code can be pre-loaded at program start time.

Generally, when timing your code, you either disregard the first call of your timed region or, prior to the timed region, run an offload region that is not timed (this is needed only once, at application start).
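The warm-up approach can be sketched roughly as follows. This is a minimal illustration, not the code from the attached file: the function names (wall_seconds, warm_up_offload, axpy_offload, timed_axpy) are hypothetical, and the pragmas assume the Intel compiler's offload syntax for Xeon Phi. A compiler without offload support will normally ignore the pragmas and run everything on the host. The pre-load option mentioned above is probably the OFFLOAD_INIT=on_start environment variable of the Intel offload runtime, but treat that as an assumption to check against your compiler documentation.

```c
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Untimed, empty offload (hypothetical helper). On a Xeon Phi build the
   first offload pays for loading the MIC code and creating the OpenMP
   thread pool, so run this once before any timed region. */
static void warm_up_offload(void)
{
    #pragma offload target(mic)
    {
        #pragma omp parallel
        {
            /* intentionally empty: only forces initialization */
        }
    }
}

/* y = a * x + y, offloaded with x copied in and y copied in and out. */
static void axpy_offload(int n, float a, float *x, float *y)
{
    #pragma offload target(mic) in(x : length(n)) inout(y : length(n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}

/* Time a single axpy offload; returns elapsed wall-clock seconds. */
static double timed_axpy(int n, float a, float *x, float *y)
{
    double t0 = wall_seconds();
    axpy_offload(n, a, x, y);
    return wall_seconds() - t0;
}
```

Call warm_up_offload() once at application start, then use timed_axpy() for each measured run. With the one-time setup paid for in advance, the first and second measurements should be comparable.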

Jim Dempsey
