I want to measure GFLOPS and AVX performance, but the PCM tools do not seem to support this. What else can I do to get GFLOPS and AVX figures?
Any help will be appreciated.
Do you want to measure program performance?
If you want to measure actual floating point arithmetic execution rate you are mostly out of luck. The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".
The degree of over-counting depends primarily on average latency between issuing the instruction and the availability of the data that the instruction uses (either register arguments or memory arguments). If all the data is in the L1 cache, then there is almost no over-counting. If the data is in the L2 cache then you can get slight over-counting (10%-20%, but variable), and if all the data is in memory the counts can be as much as 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.
See more discussion at https://software.intel.com/en-us/forums/topic/499193 and https://software.intel.com/en-us/forums/topic/531796
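For reference, on Sandy Bridge and Ivy Bridge the usual way to turn these counts into a FLOP total is to weight each instruction class by the number of double-precision elements it processes: 1 for scalar, 2 for 128-bit packed, 4 for 256-bit packed. Below is a minimal C sketch of that arithmetic, assuming you have already collected the raw counts with some tool. The struct and function names are hypothetical, and the event names in the comments are approximately as Intel lists them for Sandy Bridge; spellings differ between tools, so verify them for your CPU. Because of the over-counting described above, treat the result as a likely overestimate rather than an exact rate.

```c
#include <assert.h>
#include <stdint.h>

/* Raw double-precision FP instruction counts (hypothetical struct).
 * Event names are approximately those in Intel's Sandy Bridge lists;
 * exact spelling varies between tools. */
typedef struct {
    uint64_t scalar_dp;    /* FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE: 1 double each */
    uint64_t packed128_dp; /* FP_COMP_OPS_EXE.SSE_FP_PACKED_DOUBLE: 2 doubles each */
    uint64_t packed256_dp; /* SIMD_FP_256.PACKED_DOUBLE: 4 doubles each */
} dp_fp_counts;

/* Weight each instruction count by the number of doubles it processes. */
static double dp_flops(const dp_fp_counts *c)
{
    return (double)c->scalar_dp
         + 2.0 * (double)c->packed128_dp
         + 4.0 * (double)c->packed256_dp;
}

/* GFLOPS over `seconds` of wall time. Since these events over-count,
 * the result is likely an overestimate, most of all for memory-bound code. */
static double dp_gflops(const dp_fp_counts *c, double seconds)
{
    assert(seconds > 0.0);
    return dp_flops(c) / seconds / 1e9;
}
```

Single precision works the same way, with weights 1, 4 and 8 for scalar, 128-bit and 256-bit instructions.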
iliyapolak wrote:
Do you want to measure program performance?
Yes. Is there some way to do that?
John D. McCalpin wrote:
If you want to measure actual floating point arithmetic execution rate you are mostly out of luck. The performance counters that measure floating-point arithmetic instructions (scalar, 128-bit vector, and 256-bit vector) on Sandy Bridge, Ivy Bridge, and Haswell are known to "over-count".
The degree of over-counting depends primarily on average latency between issuing the instruction and the availability of the data that the instruction uses (either register arguments or memory arguments). If all the data is in the L1 cache, then there is almost no over-counting. If the data is in the L2 cache then you can get slight over-counting (10%-20%, but variable), and if all the data is in memory the counts can be as much as 6x to 10x higher than the actual number of completed floating-point arithmetic instructions.
See more discussion at https://software.intel.com/en-us/forums/topic/499193 and https://software.intel.com/en-us/forums/topic/531796
I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell, and I can accept slight over-counting. I have checked the documents at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html, but they do not list any FLOP or vector-instruction events for Haswell.
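Where no usable FP events exist (as reported here for Haswell), one fallback is to skip the counters entirely: time a kernel whose FLOP count you know analytically and divide. A minimal C sketch, assuming a POSIX system with clock_gettime; daxpy_gflops and the other helpers are hypothetical names, and DAXPY (y = a*x + y, 2 FLOPs per element) is only an example kernel. This gives the delivered rate for that kernel, not a FLOP count for an arbitrary application.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One multiply and one add per element: 2 FLOPs. */
static void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Runs daxpy `reps` times and returns GFLOPS from the analytic FLOP
 * count (2 * n * reps); no hardware FP counters are involved. */
static double daxpy_gflops(size_t n, int reps, const double *x, double *y)
{
    assert(n > 0 && reps > 0);
    double t0 = now_seconds();
    for (int r = 0; r < reps; r++)
        daxpy(n, 1.0001, x, y);   /* arbitrary example scalar */
    double elapsed = now_seconds() - t0;
    assert(elapsed > 0.0);
    return 2.0 * (double)n * (double)reps / elapsed / 1e9;
}
```

If the compiler fuses the multiply and add into an FMA on Haswell, each element still performs 2 FLOPs, so the analytic count does not change.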
GHui wrote:
Quote:
iliyapolak wrote:
Do you want to measure program performance?
Yes. Is there some way to do that?
You can use VTune to do that. Start with a Lightweight Hotspots analysis, then move on to the more advanced analysis types to dig deeper.
>>>I have measured it on Sandy Bridge and Ivy Bridge, but not on Haswell. I can accept slight over-counting. I have checked the documents from http://www.intel.com/content/www/us/en/processors/architectures-software.... But they do not list events about flops and vector on Haswell.>>>
Check the following article about FP performance analysis: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs
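The basic idea behind estimating FLOPs from event-based sampling is that the PMU records one sample every "sample-after value" (SAV) events, so an event's total is roughly samples * SAV; the per-event totals can then be weighted by FLOPs per instruction using the same scalar/128-bit/256-bit scheme. A minimal C sketch of that arithmetic, not a transcription of the linked article; the struct and function names are hypothetical, and sampling adds statistical error on top of the over-counting discussed earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One sampled FP event (hypothetical struct). */
typedef struct {
    const char *event;     /* event name, for reporting only */
    uint64_t samples;      /* number of samples collected for this event */
    uint64_t sav;          /* sample-after value used during collection */
    unsigned flops_per_op; /* e.g. 4 for a 256-bit packed-double op */
} ebs_event;

/* Estimated total FLOPs: sum over events of samples * SAV * FLOPs/op. */
static double ebs_flops(const ebs_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += (double)ev[i].samples * (double)ev[i].sav
                 * (double)ev[i].flops_per_op;
    return total;
}
```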