
Question about measuring GFLOPS and AVX performance

GHui
Novice
980 Views

I want to measure GFLOPS and AVX performance. The PCM tools do not seem to support this. What else can I do to get GFLOPS and AVX figures?

Any help will be appreciated.

0 Kudos
9 Replies
Bernard
Valued Contributor I
980 Views

Do you want to measure program performance?

0 Kudos
McCalpinJohn
Honored Contributor III
980 Views

If you want to measure actual floating point arithmetic execution rate you are mostly out of luck.  The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".     

The degree of over-counting depends primarily on average latency between issuing the instruction and the availability of the data that the instruction uses (either register arguments or memory arguments).    If all the data is in the L1 cache, then there is almost no over-counting. If the data is in the L2 cache then you can get slight over-counting (10%-20%, but variable), and if all the data is in memory the counts can be as much as 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.

See more discussion at https://software.intel.com/en-us/forums/topic/499193 and https://software.intel.com/en-us/forums/topic/531796
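
The arithmetic implied above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys are hypothetical labels (not real PMU event names), the per-instruction weights assume no fused multiply-add, the sample counts are made up rather than measured, and the over-counting factors are the rough ranges quoted above (about 1.1x to 1.2x for L2-resident data, 6x to 10x for memory-resident data).

```python
# Floating-point operations contributed by one instruction of each class,
# assuming no fused multiply-add. The keys are hypothetical labels chosen
# for this sketch, not real PMU event names.
FLOPS_PER_INSTRUCTION = {
    "scalar_double": 1,
    "packed_128_double": 2,   # 128 bits / 64-bit elements
    "packed_256_double": 4,   # 256 bits / 64-bit elements
    "scalar_single": 1,
    "packed_128_single": 4,   # 128 bits / 32-bit elements
    "packed_256_single": 8,   # 256 bits / 32-bit elements
}


def estimate_gflops(counts, elapsed_seconds):
    """Convert raw instruction counts into a GFLOPS figure.

    Because the counters may over-count, the result is an upper bound
    on the rate that was actually achieved.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_flops = sum(FLOPS_PER_INSTRUCTION[name] * n for name, n in counts.items())
    return total_flops / elapsed_seconds / 1e9


def plausible_range(raw_gflops, min_factor, max_factor):
    """Bracket the true rate, given an assumed over-counting factor range."""
    if not 1.0 <= min_factor <= max_factor:
        raise ValueError("expected 1.0 <= min_factor <= max_factor")
    return raw_gflops / max_factor, raw_gflops / min_factor


# Illustrative numbers only, not measurements.
example_counts = {"scalar_double": 1_000_000_000, "packed_256_double": 500_000_000}
raw = estimate_gflops(example_counts, elapsed_seconds=2.0)
low, high = plausible_range(raw, min_factor=6.0, max_factor=10.0)  # memory-bound case
print(f"raw estimate: {raw} GFLOPS, plausible true rate: {low} to {high} GFLOPS")
```

The width of the resulting range for memory-bound code shows why the raw counter value alone says little about the true execution rate.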

 

0 Kudos
GHui
Novice
980 Views

iliyapolak wrote:

Do you want to measure program performance?

Yes, is there some way to do that?

0 Kudos
GHui
Novice
980 Views

John D. McCalpin wrote:

If you want to measure actual floating point arithmetic execution rate you are mostly out of luck.  The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".     

The degree of over-counting depends primarily on average latency between issuing the instruction and the availability of the data that the instruction uses (either register arguments or memory arguments).    If all the data is in the L1 cache, then there is almost no over-counting. If the data is in the L2 cache then you can get slight over-counting (10%-20%, but variable), and if all the data is in memory the counts can be as much as 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.

See more discussion at https://software.intel.com/en-us/forums/topic/499193 and https://software.intel.com/en-us/forums/topic/531796

 

I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html, but they do not list events for FLOPS or vector operations on Haswell.

0 Kudos
GHui
Novice
980 Views

Sorry, because of a slow internet connection, I clicked the submit button more than once.

0 Kudos
Bernard
Valued Contributor I
980 Views

GHui wrote:

iliyapolak wrote:

Do you want to measure program performance?

Yes, is there some way to do that?

You can use VTune to do that. Start by choosing the Lightweight Hotspots analysis, then move deeper with the more advanced analysis types.

0 Kudos
Bernard
Valued Contributor I
980 Views

>>>I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents at http://www.intel.com/content/www/us/en/processors/architectures-software.... But they do not list events for FLOPS or vector operations on Haswell.>>>

Check the following article on FP performance analysis: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs

0 Kudos