Beginner

Question about measuring GFLOPS and AVX performance

I want to measure GFLOPS and AVX performance. The PCM tools do not seem to support this. What else can I do to measure GFLOPS and AVX usage?

Any help will be appreciated.
9 Replies
Black Belt

Do you want to measure program performance?

Black Belt

If you want to measure the actual floating-point arithmetic execution rate, you are mostly out of luck. The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".

The degree of over-counting depends primarily on the average latency between issuing the instruction and the availability of the data that the instruction uses (either register arguments or memory arguments). If all the data is in the L1 cache, there is almost no over-counting. If the data is in the L2 cache, you can get slight over-counting (10%-20%, but variable), and if all the data is in memory, the counts can be as much as 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.

See more discussion at https://software.intel.com/en-us/forums/topic/499193 and https://software.intel.com/en-us/forums/topic/531796
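
To make the ranges above concrete, here is a minimal sketch that turns a raw counter value into a plausible range of completed floating-point instructions. The factor table is only a restatement of the rough figures quoted in this post, and the names `OVERCOUNT_FACTOR` and `plausible_actual_count` are hypothetical, chosen for illustration. Real workloads mix cache levels, so treat the result as a bound, not a correction.

```python
# Over-count factor ranges (counted / actual), restating the rough figures above.
# The keys and the function name are hypothetical, for illustration only.
OVERCOUNT_FACTOR = {
    "L1": (1.0, 1.0),       # almost no over-counting
    "L2": (1.1, 1.2),       # slight over-counting, 10%-20% (variable)
    "memory": (1.0, 10.0),  # anywhere up to the 6x-10x worst case
}


def plausible_actual_count(raw_count, data_location):
    """Return (low, high) bounds on the number of completed FP instructions."""
    smallest_factor, largest_factor = OVERCOUNT_FACTOR[data_location]
    # Dividing by the largest factor gives the lowest plausible actual count.
    return raw_count / largest_factor, raw_count / smallest_factor


if __name__ == "__main__":
    for location in ("L1", "L2", "memory"):
        low, high = plausible_actual_count(1_200_000, location)
        print(f"{location:>6}: between {low:,.0f} and {high:,.0f} completed instructions")
```

For memory-bound code the range spans an order of magnitude, which is why the counters are of little use there.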

 

Beginner

iliyapolak wrote:

Do you want to measure program performance?

Yes, is there some way to do that?

Beginner

John D. McCalpin wrote:

If you want to measure the actual floating-point arithmetic execution rate, you are mostly out of luck. The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".

The degree of over-counting depends primarily on the average latency between issuing the instruction and the availability of the data that the instruction uses (either register arguments or memory arguments). If all the data is in the L1 cache, there is almost no over-counting. If the data is in the L2 cache, you can get slight over-counting (10%-20%, but variable), and if all the data is in memory, the counts can be as much as 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.

See more discussion at https://software.intel.com/en-us/forums/topic/499193 and https://software.intel.com/en-us/forums/topic/531796

 

I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html, but they do not list events for FLOPs or vector operations on Haswell.

Beginner

Sorry, because of a slow internet connection, I clicked the submit button more than once.

Black Belt

GHui wrote:

Quote:

iliyapolak wrote:

Do you want to measure program performance?

Yes, is there some way to do that?

You can use VTune to do that. Start by choosing the Lightweight Hotspots analysis, then move on to the more advanced analysis types.

Black Belt

>>>I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents at http://www.intel.com/content/www/us/en/processors/architectures-software.... But they do not list events for FLOPs or vector operations on Haswell.>>>

Check the following article on FP performance analysis: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs
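
As a rough sketch of the arithmetic behind this kind of estimate: once you have instruction counts per class (scalar, 128-bit, 256-bit) and the elapsed time, the FLOP total is a weighted sum by vector width. The dictionary keys below are hypothetical labels, not real event names, and the sketch assumes one arithmetic operation per vector element per instruction, so fused multiply-add would need a factor of 2. The input counts are subject to the over-counting discussed in this thread, so the result is an upper estimate.

```python
# Floating-point operations per instruction, by instruction class.
# Keys are hypothetical labels for illustration, not hardware event names.
FLOPS_PER_INSTRUCTION = {
    "scalar_single": 1,
    "scalar_double": 1,
    "packed_128_single": 4,  # 4 x 32-bit elements in 128 bits
    "packed_128_double": 2,  # 2 x 64-bit elements in 128 bits
    "packed_256_single": 8,  # 8 x 32-bit elements in 256 bits
    "packed_256_double": 4,  # 4 x 64-bit elements in 256 bits
}


def total_flops(instruction_counts):
    """Weighted sum of instruction counts, one operation per vector element."""
    return sum(FLOPS_PER_INSTRUCTION[name] * count
               for name, count in instruction_counts.items())


def estimate_gflops(instruction_counts, elapsed_seconds):
    """Estimated GFLOPS over the measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_flops(instruction_counts) / elapsed_seconds / 1e9


def avx_fraction(instruction_counts):
    """Fraction of the estimated FLOPs that came from 256-bit instructions."""
    total = total_flops(instruction_counts)
    if total == 0:
        return 0.0
    wide = {name: count for name, count in instruction_counts.items()
            if name.startswith("packed_256")}
    return total_flops(wide) / total


if __name__ == "__main__":
    counts = {"scalar_double": 1e9, "packed_256_double": 1e9}
    print(f"{estimate_gflops(counts, 1.0):.2f} GFLOPS, "
          f"{avx_fraction(counts):.0%} of FLOPs from 256-bit instructions")
```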
