SIMD_FP_256
Review the count to check whether the results are within expectations.
Event Name Extension | Mask | Definition | Description | Counter | Counter (HT off) |
---|---|---|---|---|---|
PACKED_SINGLE | 0x01 | | This event counts the number of AVX-256 computational FP single-precision uops issued during the cycle. Note: Packed AVX-256 can be counted as one, and will count for SIMD FP 128. | 0,1,2,3 | 0,1,2,3,4,5,6,7 |
PACKED_DOUBLE | 0x02 | | This event counts the number of AVX-256 computational FP double-precision uops issued during the cycle. Note: Packed AVX-256 can be counted as one, and will count for SIMD FP 128. | 0,1,2,3 | 0,1,2,3,4,5,6,7 |
What is the equivalent of SIMD_FP_256.PACKED_DOUBLE on Haswell?
It appears that all of the floating-point performance counters (with the exception of Event 0xCA, "Floating Point Assists") have been removed from the Haswell-based products.
These counters are known to systematically overcount in Sandy Bridge and Ivy Bridge processors whenever the input registers are not ready (e.g., due to cache misses). I have seen overcounting by anywhere from ~3% to 10x, depending on the average latency for loads feeding into the FP instructions.
We still use these counters on our 6400-node Sandy Bridge system to monitor whether codes are using SSE or AVX, how well the codes vectorize, and whether they are running with 32-bit or 64-bit floating-point arithmetic. The accuracy is good enough for this classification process, and if we deploy a large Haswell-based system we will have to employ a different approach to get this information.
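As a rough illustration of the classification described above, here is a minimal Python sketch that turns a set of counter readings into an AVX share, a packed (vectorized) share, and a double-precision share. The dictionary keys are illustrative labels rather than exact hardware event names, the helper `classify_fp_mix` and the sample counts are made up, and the weights assume every packed uop operates on a completely filled vector. Given the overcounting mentioned above, the output should be read as a classification hint, not an exact FLOP count.

```python
# Elements processed per uop, assuming every packed uop uses a full vector.
# Keys are illustrative labels (assumption), not exact hardware event names.
ELEMENTS_PER_UOP = {
    "sse_scalar_single": 1,
    "sse_scalar_double": 1,
    "sse_packed_single": 4,      # 128-bit register / 32-bit elements
    "sse_packed_double": 2,      # 128-bit register / 64-bit elements
    "avx256_packed_single": 8,   # e.g. SIMD_FP_256.PACKED_SINGLE
    "avx256_packed_double": 4,   # e.g. SIMD_FP_256.PACKED_DOUBLE
}


def classify_fp_mix(counts):
    """Estimate FLOPs and the AVX / packed / double-precision shares."""
    flops = {
        name: counts.get(name, 0) * width
        for name, width in ELEMENTS_PER_UOP.items()
    }
    total = sum(flops.values())
    if total == 0:
        return {"total_flops": 0, "avx_fraction": 0.0,
                "packed_fraction": 0.0, "double_fraction": 0.0}
    avx = sum(v for k, v in flops.items() if k.startswith("avx256"))
    packed = sum(v for k, v in flops.items() if "packed" in k)
    double = sum(v for k, v in flops.items() if k.endswith("double"))
    return {
        "total_flops": total,
        "avx_fraction": avx / total,
        "packed_fraction": packed / total,
        "double_fraction": double / total,
    }


# Made-up counter readings, for illustration only.
sample = {"sse_scalar_double": 100, "avx256_packed_double": 225}
print(classify_fp_mix(sample))
```

Dividing `total_flops` by the elapsed time in seconds (and by 1e9) would give an approximate GFLOPS figure under the same assumptions.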
Intel is certainly aware of the accuracy issues with these counters and is likely to fix the existing problems in some future products. Section 19.2 of Volume 3 of the SW Developer's Guide (document 324384-053, January 2015) shows that Broadwell gets a few FP events back:
- Event 0x14, Umask 0x01: ARITH.FPU_DIV_ACTIVE -- cycles that the divide unit is active
- Event 0xC0, Umask 0x02: INST_RETIRED.X87 -- x87 Floating-Point operations that are retired without generating exceptions.
I have not heard any definitive statements on when improved support for floating-point counts will make it into shipping products.
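For anyone who wants to try the Broadwell events listed above, the sketch below shows how an event number and umask are commonly packed into a raw event code: the event select occupies bits 0-7 and the unit mask bits 8-15. The helper names are hypothetical, the `rNNNN` spelling assumes the Linux `perf` raw-event syntax (a tooling assumption, not something stated in this thread), and other fields such as cmask, edge, and invert are deliberately left out.

```python
def raw_event_code(event, umask):
    """Pack event select (bits 0-7) and unit mask (bits 8-15) into one value."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in one byte")
    return (umask << 8) | event


def perf_raw_name(event, umask):
    """Format the code as a raw event string such as 'r02c0' (assumed perf syntax)."""
    return f"r{raw_event_code(event, umask):04x}"


# Event/umask pairs taken from the list above.
BROADWELL_FP_EVENTS = {
    "ARITH.FPU_DIV_ACTIVE": (0x14, 0x01),
    "INST_RETIRED.X87": (0xC0, 0x02),
}

for name, (event, umask) in BROADWELL_FP_EVENTS.items():
    print(f"{name}: {perf_raw_name(event, umask)}")
# prints:
# ARITH.FPU_DIV_ACTIVE: r0114
# INST_RETIRED.X87: r02c0
```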
>>>As far as I understand during execution of packed AVX instructions the vector can be filled just partly. Is there a way to determine whether a vector was completely filled or not>>>
I presume that you are referring to the XMMx/YMMx registers. In this case you can check with a debugger whether a specific register is filled with 4 or 8 scalars.
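As a reference point when inspecting registers, this small sketch computes how many scalars a completely filled register holds, assuming 128-bit XMM and 256-bit YMM registers and 32-bit single / 64-bit double elements. The function name `lanes` is just an illustrative helper.

```python
def lanes(register_bits, element_bits):
    """Number of scalars that fit in a completely filled vector register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


REGISTER_BITS = {"XMM": 128, "YMM": 256}
ELEMENT_BITS = {"single": 32, "double": 64}

for reg, reg_bits in REGISTER_BITS.items():
    for kind, elem_bits in ELEMENT_BITS.items():
        print(f"{reg} holds {lanes(reg_bits, elem_bits)} {kind}-precision scalars")
```

A YMM register therefore holds 4 doubles or 8 singles when full; fewer meaningful values in the register suggests a partially filled vector.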
>>>and if we deploy a large Haswell-based system we will have to employ a different approach to get this information.
Do you have any idea how to get FLOPS on the Haswell architecture?
>>>Do you have any idea how to get FLOPS on the Haswell architecture?>>>
Do you mean to count how many GFLOPS were executed?
>>Do you mean to count how many GFLOPS were executed?
Yes, to count the GFLOPS of the application, and the number of single-precision and double-precision FLOPs that were executed.
I think that John answered your question.