Ganesh_P

Beginner

10-17-2013
07:50 PM

Average number of uops per instruction

Hi,

I have been trying to use performance counters to study program behaviour. During data collection, I observed that some counters report instruction counts while others report uop counts. Since I am only looking for a higher-level study at this point, **is there any document that gives the average number of uops per instruction for different architectures (Nehalem, Sandy Bridge, Ivy Bridge, etc.) that I can use as a reference for my calculations?** I found some documents with uop details (fused/unfused counts) for each instruction on these architectures, but couldn't find any higher-level averages (perhaps from execution profile studies).

Thanks,

Ganesh

6 Replies

Bernard

Black Belt

10-18-2013
12:13 PM

You can look at Agner Fog's instruction tables. I suppose that every machine code instruction is decoded into some fixed number of uops, and that the number can differ between architectures. I also suppose that you are referring to the micro-op-related events.

You can look at:

RS_UOPS_DISPATCHED

RS_UOPS_DISPATCHED.CYCLES_ANY

RS_UOPS_DISPATCHED.CYCLES_NONE

UOPS_RETIRED.ANY

Ganesh_P

Beginner

10-22-2013
06:04 AM

Thanks, iliyapolak, for the note. Yes, I have Agner Fog's uop details for each instruction per architecture. Below are a couple of issues I am trying to resolve.

* I have been trying to find the total number of uops retired, but I am seeing a discrepancy in the documentation: **UOPS_RETIRED** is described as "**Cycles** Uops are being retired" and the mask **:ANY** as "Uops retired". When I read these counters, the values are not consistent between runs (whereas **INSTRUCTION_RETIRED** is consistent). So I am not sure whether the UOPS* counters represent actual uop counts or cycle counts. Any pointers to clarify this would be very helpful.

* Since the above approach was not convincing, I was looking for a document that suggests an average number of uops per instruction for a particular architecture, so that I can derive the uop count from the INSTRUCTION_RETIRED count.

-Ganesh
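
One way to put a number on the run-to-run inconsistency described above is to compare the relative spread of each counter across repeated runs. The following is a minimal sketch in Python; the counter values are made up for illustration and do not come from any real measurement, and `relative_spread` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def relative_spread(samples):
    """Return the population standard deviation divided by the mean."""
    return pstdev(samples) / mean(samples)


# Hypothetical counter readings from five runs of the same workload.
instructions = [1_000_000, 1_000_010, 999_990, 1_000_005, 999_995]
uops = [1_450_000, 1_520_000, 1_480_000, 1_610_000, 1_390_000]

inst_spread = relative_spread(instructions)
uops_spread = relative_spread(uops)

print(f"instructions retired: {inst_spread:.4%} relative spread")
print(f"uops retired:         {uops_spread:.4%} relative spread")
```

A spread that is large for the uop counter but tiny for the instruction counter would match the observation in the post, though it does not by itself say whether the event is counting uops or cycles.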

Bernard

Black Belt

10-22-2013
11:10 AM

Sorry, but I cannot understand your question. A particular machine code instruction will be decoded into a fixed number of uops, and what you are seeing is the number of retired uops, which may depend on CPU front-end and back-end performance.

>>> I was looking for any document which suggests the average number of uops per instruction>>>

AFAIK there is no average number of uops per machine code instruction. Think of a uop as a kind of control signal that probably encodes a primitive operation of the ALU or FPU, and may also contain control bits and register numbers.

Ganesh_P

Beginner

10-27-2013
08:55 PM

Sorry for the ambiguity in my comment. In some architectures UOPS_RETIRED is described as "**Cycles** Uops are being retired" with the mask **:ANY** as "Uops retired", while in others it is described as "Uops retired" with the mask :ALL as "All uops that actually retired (Precise Event)" and the mask :ANY as "All uops that actually retired (Precise Event)". So I was confused about whether these events represent cycles or the number of uops retired.

I was asking about the average number of uops per instruction because I have read in some documents and case studies that the average uops_retired/instruction_retired ratio is about 1.6. So I am trying to find reliable documentation that gives this number for different architectures.

-Ganesh
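
The ratio mentioned above is simply one counter divided by the other, so it can be measured for a given workload rather than looked up. A minimal sketch in Python, assuming both counts were collected over the same interval; the input numbers are hypothetical and chosen only to reproduce the 1.6 figure quoted in the post, and both function names are illustrative.

```python
def uops_per_instruction(uops_retired, instructions_retired):
    """Average number of retired uops per retired instruction."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return uops_retired / instructions_retired


def estimate_uops(instructions_retired, ratio):
    """Rough uop estimate from an instruction count and an assumed ratio."""
    return round(instructions_retired * ratio)


# Hypothetical counter readings, not real measurements.
ratio = uops_per_instruction(1_600_000, 1_000_000)
estimate = estimate_uops(2_500_000, ratio)

print(f"uops per instruction: {ratio:.2f}")
print(f"estimated uops for 2,500,000 instructions: {estimate}")
```

Because the ratio depends on the instruction mix, an estimate derived this way is only as good as the assumption that the second workload resembles the one the ratio was measured on.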

Bernard

Black Belt

10-28-2013
12:56 AM

Hi

It could be the ratio of retired uops to the machine code instructions being decoded by the front end. In that case you can think of it as the average number of retired uops per retired instruction.

Patrick_F_Intel1

Employee

01-24-2014
08:29 AM

Hello Ganesh,

I know this is really late but just in case it helps...

I think the 'cycles uops are being retired' vs 'uops retired' confusion comes from using different events. Almost every event (I think...) can be encoded with options that change what it counts, and uops_retired is one such event. If I set a bit (I forget which bit) in the event mask, I get a count of "for how many cycles was at least 1 uop retired". If I leave that bit 0, the event just counts the number of uops retired. You can also specify things like 'count how many cycles retired 2 or more uops', or 3 or more uops, etc. Some analysis techniques use this to measure the efficiency of the code and front end, to see how close the code gets to the maximum number of uops retired per cycle.

Hope this helps.

Pat
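
The encoding option described above can be illustrated with the counter-mask (CMASK) field of the architectural event-select register. The sketch below, in Python, packs the fields using the bit layout documented for IA32_PERFEVTSELx in the Intel SDM; `perfevtsel` is a hypothetical helper, and the event code 0xC2 with unit mask 0x01 for uops retired is an assumption that should be checked against the event list for the specific microarchitecture.

```python
def perfevtsel(event, umask, cmask=0, inv=False, edge=False,
               usr=True, os=True, enable=True):
    """Pack fields into an IA32_PERFEVTSELx-style raw value."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event                 # bits 7:0   event select
    value |= umask << 8           # bits 15:8  unit mask
    value |= int(usr) << 16       # bit 16     count in user mode
    value |= int(os) << 17        # bit 17     count in kernel mode
    value |= int(edge) << 18      # bit 18     edge detect
    value |= int(enable) << 22    # bit 22     enable counter
    value |= int(inv) << 23       # bit 23     invert counter-mask comparison
    value |= cmask << 24          # bits 31:24 counter mask
    return value


# Assumed encoding for uops retired; verify per microarchitecture.
UOPS_RETIRED_EVENT = 0xC2
UOPS_RETIRED_UMASK = 0x01

count_uops = perfevtsel(UOPS_RETIRED_EVENT, UOPS_RETIRED_UMASK)
cycles_any = perfevtsel(UOPS_RETIRED_EVENT, UOPS_RETIRED_UMASK, cmask=1)
cycles_2plus = perfevtsel(UOPS_RETIRED_EVENT, UOPS_RETIRED_UMASK, cmask=2)

print(hex(count_uops))    # counts uops retired
print(hex(cycles_any))    # counts cycles with at least 1 uop retired
print(hex(cycles_2plus))  # counts cycles with at least 2 uops retired
```

With CMASK left at 0 the counter increments by the number of uops retired each cycle; with a nonzero CMASK it increments by one in each cycle where at least that many uops retired, which would explain why the same event name appears with both a "uops" and a "cycles" description.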
