GFlops on MIC

Timocafe
Beginner

Hello Intel,

I wrote a kind of meta compiler that generates SIMD code for multiple platforms (x86, Power, Phi, etc.). I will present this work at ISC 2014 in June. I am now preparing for the Supercomputing conference, where I would like to present results on the Phi platform, but I have run into an issue.

I am trying to calculate the GFLOP/s of my application. A first approach would be to count the number of operations and divide by the elapsed time, as is usually done for the dgemm benchmark. Unfortunately, I have thousands of lines ...

I read in an Intel post: http://software.intel.com/en-us/articles/best-know-method-estimating-flops-for-workloads-running-on-the-intel-xeon-phi-coprocessor

According to it, I can get GFLOP/s by dividing the VPU_ELEMENTS_ACTIVE counter by the execution time. It is a rough estimate, but good enough for a first approach.

I checked this on my code: for float I get a counter value of 45560136680 for a time of 0.013, and for double 91938275814 for 0.0281985.

So I get roughly the same GFLOP/s for float and double, although I should get twice as many FLOPS for float.
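The comparison can be checked with a few lines of arithmetic. This is only a minimal sketch: it assumes the quoted times are in seconds (the post gives no unit) and that each counted element is one floating-point operation. The function name `giga_elements_per_second` is made up for illustration.

```python
# Counter values and elapsed times quoted in the post.
# Assumption: the times are in seconds (the post does not state a unit).
runs = {
    "float": (45_560_136_680, 0.013),
    "double": (91_938_275_814, 0.0281985),
}


def giga_elements_per_second(vpu_elements_active, seconds):
    """Rough GFLOP/s estimate: active vector elements divided by elapsed time."""
    return vpu_elements_active / seconds / 1e9


rates = {name: giga_elements_per_second(*run) for name, run in runs.items()}
ratio = rates["float"] / rates["double"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} G elements/s")
print(f"float/double ratio: {ratio:.2f}")
```

With the quoted values the two rates come out at about 3505 and 3260, a ratio of about 1.07 rather than the expected 2.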

Another Intel post, https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding, says: "We would like to be able to measure efficiency in terms of floating-point operations per second, as that can easily be compared to the peak floating-point performance of the machine. However, the Intel Xeon Phi coprocessor does not have events to count floating-point operations."

So is it possible to get this GFLOP/s figure or not?

Best,

++t

jimdempseyatthecove
Honored Contributor III

From the first article, keep in mind: The hardware event counters treat the FMA instructions as a single floating-point operation instead of two operations. This should be taken into consideration if your workload contains a large number of FMA operations.

Therefore, VPU_ELEMENTS_ACTIVE may not be representative of the true flop count.
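To put a number on the FMA effect, here is a rough sketch of how far off the counter could be. It assumes every counted element belongs to a floating-point instruction, which the counter itself does not guarantee. The functions `flop_bounds` and `corrected_flops`, and the `fma_fraction` input, are hypothetical and would have to come from your own knowledge of the generated code.

```python
def flop_bounds(vpu_elements_active):
    """Return (lower, upper) bounds on the FLOP count behind a counter value.

    Lower: no FMA at all, so each counted element is one FLOP.
    Upper: every counted element came from an FMA, which is two FLOPs.
    Assumption: all counted elements come from floating-point instructions.
    """
    return vpu_elements_active, 2 * vpu_elements_active


def corrected_flops(vpu_elements_active, fma_fraction):
    """Scale the counter by the share of counted elements that are FMAs.

    fma_fraction is a hypothetical input in [0, 1]; the hardware does not
    report it.
    """
    if not 0.0 <= fma_fraction <= 1.0:
        raise ValueError("fma_fraction must be between 0 and 1")
    return vpu_elements_active * (1.0 + fma_fraction)


low, high = flop_bounds(1_000)
half_fma = corrected_flops(1_000, 0.5)
print(low, high, half_fma)
```

In other words, for an FMA-heavy kernel the raw counter can underestimate the flop count by up to a factor of two.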

I think the better approach is:

Since you are writing a meta compiler, it could conceivably be built with a conditional compile option that accumulates a flop count at each SIMD instruction generated. This would be equivalent to the VPU_ELEMENTS_ACTIVE collection, but with a correction for FMA and anything else that you find pertinent.

You then produce two programs, one with the flop counter enabled and one without. The two programs are run with identical input data and functionality. The counter-enabled run tells you the number of floating-point operations, and the counter-disabled run gives the runtime. From the two results you can compute an adjusted flops value.
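A toy illustration of the two-run idea, written in Python for brevity rather than in the language of the real meta compiler. Everything here is hypothetical: the `COUNT_FLOPS` flag stands in for the conditional compile option, `simd_op` emulates one generated SIMD instruction over a list of lanes, and `kernel` is a made-up workload.

```python
import time

COUNT_FLOPS = True  # hypothetical switch, stands in for a conditional compile option
flop_count = 0

# FLOPs per active lane for each emulated instruction (FMA counts as two).
FLOPS_PER_LANE = {"add": 1, "mul": 1, "fma": 2}


def simd_op(kind, a, b, c=None):
    """Emulate one generated SIMD instruction; each list element is a lane."""
    global flop_count
    if COUNT_FLOPS:
        flop_count += FLOPS_PER_LANE[kind] * len(a)
    if kind == "add":
        return [x + y for x, y in zip(a, b)]
    if kind == "mul":
        return [x * y for x, y in zip(a, b)]
    if kind == "fma":
        return [x * y + z for x, y, z in zip(a, b, c)]
    raise ValueError(f"unknown instruction kind: {kind}")


def kernel(a, b, c):
    """Made-up workload: one multiply followed by one FMA."""
    t = simd_op("mul", a, b)
    return simd_op("fma", t, b, c)


def run(count_flops, a, b, c):
    """Run the kernel once; return (result, flops counted, elapsed seconds)."""
    global COUNT_FLOPS, flop_count
    COUNT_FLOPS, flop_count = count_flops, 0
    start = time.perf_counter()
    result = kernel(a, b, c)
    elapsed = time.perf_counter() - start
    return result, flop_count, elapsed


lanes = 16
a, b, c = [1.0] * lanes, [2.0] * lanes, [3.0] * lanes

_, flops, _ = run(True, a, b, c)        # counting run: keep only the flop count
out, _, seconds = run(False, a, b, c)   # timing run: no counter overhead

if seconds > 0:
    print(f"{flops} flops in {seconds:.2e} s -> {flops / seconds / 1e9:.6f} GFLOP/s")
```

The flop count is taken from the instrumented run and the time from the uninstrumented one, so the counting overhead never enters the measured runtime.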

Jim Dempsey
