I want to measure CPU stalls using VTune. Which event should I use, and how do I use it? I have found two events:
- UOPS_RETIRED.STALLED_CYCLES
- UOPS_RETIRED.STALLED
I would rather go with a full analysis of back-end and front-end pipeline stalls. That analysis already gives you a breakdown of the stalled uops at the various stages of pipelined execution.
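Whichever stall event is chosen, its raw count is usually read relative to the total number of unhalted core cycles. Below is a minimal Python sketch of that ratio. It assumes that UOPS_RETIRED.STALLED_CYCLES counts cycles in which no uops retired, and that a total-cycles event such as CPU_CLK_UNHALTED.THREAD is collected alongside it; check both definitions in the event reference for your processor. The counts in the example are hypothetical. A retirement-based ratio tells you how often the core stalled, not why, which is why the front-end/back-end breakdown is still needed to attribute stalls to memory access.

```python
def stall_ratio(stalled_cycles: int, total_cycles: int) -> float:
    """Fraction of unhalted core cycles that were stalled.

    stalled_cycles: count from a stall event, e.g. UOPS_RETIRED.STALLED_CYCLES
                    (assumed here to count cycles with no retired uops)
    total_cycles:   count from a total-cycles event, e.g. CPU_CLK_UNHALTED.THREAD
                    (assumed to be collected in the same run)
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= stalled_cycles <= total_cycles:
        raise ValueError("stalled_cycles must lie in [0, total_cycles]")
    return stalled_cycles / total_cycles


# Hypothetical counter values, for illustration only.
ratio = stall_ratio(stalled_cycles=3_000_000, total_cycles=10_000_000)
print(f"stalled in {ratio:.0%} of cycles")
```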
Back-end and front-end pipeline stalls? What are they?
I just want to get the CPU stalls caused by memory access.
First of all, you may want to read the <Intel® 64 and IA-32 Architectures Optimization Reference Manual> if you have time (it can be downloaded from the Intel website). To save time, you can also read the <Tuning Guides and Performance Analysis Papers> for your processor type.
iliyapolak is right that you have to consider stalls from both the front end and the back end. For example, stalls can come from instruction fetch because of an instruction cache miss, the IDQ being full, a branch misprediction, the ROB being full, the Register Alias Table (RAT) being unable to allocate into the Reservation Station, or the RS being unable to dispatch uops to the execution units because the required ports are busy. At the back end, the load/store buffers may cause delays, as may data cache misses (and TLB misses), misaligned data accesses, 4K aliasing, etc. Uop retirement (in the ROB) can also cause delays, since uops must wait for prior instructions to retire.
To find out which stall-related events your platform supports, you can use "amplxe-runss -event-list | grep STALL". However, some events without "STALL" in their names, for example L2_MISS, also indicate a penalty, so you need to read the tuning guide or the optimization reference manual.
Regards, Peter
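Because `grep STALL` misses penalty events such as L2_MISS, it can help to filter the event list for several keywords at once. The Python sketch below does that on text already captured from the event-list command. It assumes a hypothetical output format of one event name per line, optionally followed by a description; the real listing may be formatted differently, and the sample event names and descriptions are illustrative only.

```python
from typing import Iterable, List

# Hypothetical event-list output; real names and descriptions depend on the platform.
SAMPLE = """\
UOPS_RETIRED.STALLED_CYCLES   (description)
UOPS_RETIRED.STALLED          (description)
L2_MISS                       (description)
INST_RETIRED.ANY              (description)
"""


def find_events(event_list_text: str, keywords: Iterable[str] = ("STALL",)) -> List[str]:
    """Return event names whose name contains any keyword (case-insensitive).

    Like `grep STALL`, but matches only the event name (first token on each
    line) and accepts several keywords, e.g. ("STALL", "MISS").
    """
    keys = [k.upper() for k in keywords]
    matches = []
    for line in event_list_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        name = stripped.split()[0]
        if any(k in name.upper() for k in keys):
            matches.append(name)
    return matches


print(find_events(SAMPLE))                    # stall events only
print(find_events(SAMPLE, ("STALL", "MISS")))  # also picks up cache-miss events
```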
@lina
Peter explained that pretty well.
I would also add that the SIMD floating-point stack (the FADD and FMUL units) can stall while executing interdependent FP code, for example on Port 0.
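The cost of interdependent FP code can be estimated with a simple latency/throughput model. The Python sketch below assumes a single pipelined FP unit that accepts one new operation per cycle and has a fixed latency, and it ignores pipeline fill and drain. The latency of 3 cycles in the example is illustrative only, not a figure for any particular processor. Splitting one dependency chain into several independent chains (for example, several partial sums) lets the unit overlap work instead of waiting on each result.

```python
import math


def estimate_cycles(n_ops: int, n_chains: int, latency: int) -> int:
    """Rough cycle count for n_ops FP ops split into n_chains dependency chains.

    Assumes one pipelined unit that can start one op per cycle, where each op
    in a chain must wait `latency` cycles for the previous op's result.
    """
    if n_ops < 0 or n_chains < 1 or latency < 1:
        raise ValueError("need n_ops >= 0, n_chains >= 1, latency >= 1")
    ops_per_chain = math.ceil(n_ops / n_chains)
    latency_bound = ops_per_chain * latency  # longest chain waits on its own results
    throughput_bound = n_ops                 # at most one op starts per cycle
    return max(latency_bound, throughput_bound)


# Illustrative latency of 3 cycles (assumption, not a processor spec).
for chains in (1, 3, 10):
    print(chains, estimate_cycles(100, chains, latency=3))
```

With a single chain the unit sits idle for most of each latency period; once the number of independent chains reaches the latency, the model becomes throughput-bound.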