Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Average number of uops per instruction

Ganesh_P
Beginner
1,552 Views

Hi,

I have been trying to use performance counters to study program behaviour. During my data collection, I observed that some of the counters give an instruction count while others give a uops count. Since I am only looking for a higher-level study at this point, is there any document which gives the average number of uops per instruction for different architectures (like Nehalem, Sandy Bridge, Ivy Bridge, etc.) which I can use as a reference for my calculation? I found some documents with uop details (fused/unfused counts) for each instruction on these architectures, but couldn't find any higher-level average counts (perhaps from some execution profile studies).

Thanks,

Ganesh

6 Replies
Bernard
Valued Contributor I

You can look at Agner Fog's instruction tables. I suppose that every machine code instruction is decoded into some fixed number of uop(s); any difference would be architectural. I also suppose that you are referring to events related to the average number of micro-ops.

You can look at:

RS_UOPS_DISPATCHED

RS_UOPS_DISPATCHED.CYCLES_ANY

RS_UOPS_DISPATCHED.CYCLES_NONE

UOPS_RETIRED.ANY
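
As a rough illustration of how per-instruction uop counts could be turned into an average, here is a minimal Python sketch. The instruction classes, their uop counts and the instruction mix are made-up placeholders, not values from Agner Fog's tables; real numbers would have to come from the tables for the specific microarchitecture and from a profile of the actual program.

```python
def average_uops_per_instruction(mix, uops_table):
    """Weighted average of uops per instruction.

    mix:        dict mapping instruction class -> dynamic execution count
    uops_table: dict mapping instruction class -> uops per instruction
    """
    total_instructions = sum(mix.values())
    if total_instructions == 0:
        raise ValueError("instruction mix is empty")
    # Each class contributes (how often it ran) * (uops it decodes into).
    total_uops = sum(count * uops_table[name] for name, count in mix.items())
    return total_uops / total_instructions


# Hypothetical numbers for illustration only (not taken from any real table).
example_uops = {"simple_alu": 1, "load_op": 2, "stack_op": 2, "microcoded": 10}
example_mix = {"simple_alu": 600, "load_op": 250, "stack_op": 100, "microcoded": 50}

if __name__ == "__main__":
    print(average_uops_per_instruction(example_mix, example_uops))
```

The result depends entirely on the instruction mix, which is one reason a single average per architecture is hard to pin down.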


Ganesh_P
Beginner

Thanks iliyapolak for the note. Yes, I have Agner Fog's uop details for each instruction per architecture. Below are a couple of issues I am trying to resolve.

* I have actually been trying to find the total number of uops retired, but I am seeing a discrepancy in the documents, where UOPS_RETIRED is described as "Cycles Uops are being retired" and the mask :ANY as "Uops retired". However, when I tried reading these counters, the values were not consistent between runs (whereas INSTRUCTION_RETIRED is consistent). So I am not sure whether the UOPS* counters represent the actual number of uops or a cycle count. Any pointers to clarify this would be very helpful.

* Since the above approach was not convincing, I was looking for any document which suggests the average number of uops per instruction for a particular architecture, so that I can try to derive the uops count from the INSTRUCTION_RETIRED count.

-Ganesh

Bernard
Valued Contributor I

Sorry, but I cannot understand your question. A particular machine code instruction will be decoded into a fixed number of uop(s), and what you are seeing is the number of retired uops, which may depend on CPU front-end and back-end performance.

>>> I was looking for any document which suggests the average number of uops per instruction>>>

As far as I know, there is no average number of uops per machine code instruction. Think of a uop as some kind of control signal which probably encodes a primitive operation of the ALU or FPU, and maybe also contains control bits and register numbers.

Ganesh_P
Beginner

Sorry for the ambiguity in my comment. For some architectures, UOPS_RETIRED is described as "Cycles Uops are being retired" and the mask :ANY as "Uops retired", while for others it is described as "Uops retired", with the mask :ALL as "All uops that actually retired (Precise Event)" and the mask :ANY as "All uops that actually retired (Precise Event)". So I was confused about whether these events represent cycles or the number of uops retired.

I was asking about the average number of uops per instruction because I have read in some documents and case studies that the average uops_retired/instruction_retired ratio is about 1.6. So I am just trying to find reliable documentation which gives this number for different architectures.

-Ganesh

Bernard
Valued Contributor I

Hi

It could be the ratio of retired uops to the machine code instructions being decoded by the front end. So in this case you can think of it as the average number of retired uops per retired instruction.
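
To make that ratio concrete, here is a minimal Python sketch that computes retired uops per retired instruction from counter readings and reports how much the ratio varies between runs. The function names and the sample counts are hypothetical; real values would come from whatever tool is used to read UOPS_RETIRED and the retired-instruction counter.

```python
from statistics import mean


def uops_per_instruction(uops_retired, instructions_retired):
    """Ratio of retired uops to retired instructions for one run."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return uops_retired / instructions_retired


def summarize_runs(runs):
    """runs: list of (uops_retired, instructions_retired) pairs, one per run.

    Returns (average ratio, relative spread), where the relative spread is
    (max ratio - min ratio) / average ratio.
    """
    ratios = [uops_per_instruction(u, i) for u, i in runs]
    avg = mean(ratios)
    spread = (max(ratios) - min(ratios)) / avg
    return avg, spread


# Hypothetical counter readings from three runs of the same program.
sample_runs = [
    (1_600_000, 1_000_000),
    (1_640_000, 1_000_000),
    (1_560_000, 1_000_000),
]

if __name__ == "__main__":
    avg, spread = summarize_runs(sample_runs)
    print(f"average uops/instruction: {avg:.3f}, relative spread: {spread:.1%}")
```

A large spread between runs would suggest checking whether the event is really configured to count uops rather than cycles.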

Patrick_F_Intel1
Employee

Hello Ganesh,

I know this is really late but just in case it helps...

I think that the 'cycles uops are being retired' vs 'uops retired' confusion comes from using different encodings of the same event. Almost every event (I think...) can be encoded with options that change what it counts. uops_retired is one such event. If I set a bit (I forget which bit) in the event mask, I can get a count of "for how many cycles was at least 1 uop retired". If I leave that bit at 0, the event just counts the number of uops retired. You could also specify things like 'count how many cycles we retired 2 or more uops', or 3 or more uops, etc. Some analysis techniques use this as a measure of the efficiency of the code and the front end, to see how close the code is to the maximum number of uops retired per cycle.

Hope this helps.
Pat
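
For readers who want to see what such an encoding might look like, here is a minimal Python sketch, not an authoritative reference. It packs the event select (bits 0-7), unit mask (bits 8-15), edge detect (bit 18), invert (bit 23) and counter mask (bits 24-31) fields of an IA32_PERFEVTSELx-style value; a counter mask of k makes the counter increment once per cycle in which at least k events occurred. The event code 0xC2 and unit mask 0x01 used for UOPS_RETIRED are assumptions and should be checked against the event list for the specific microarchitecture. The helper names and the sample cycle counts are hypothetical.

```python
def encode_event(event, umask, cmask=0, inv=False, edge=False):
    """Pack event fields into an IA32_PERFEVTSELx-style value.

    Only the event, umask, edge, inv and cmask fields are set; the
    USR/OS/EN bits are left for the driver or tool to fill in.
    """
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if edge:
        value |= 1 << 18
    if inv:
        value |= 1 << 23
    return value


# Assumed encoding for UOPS_RETIRED; verify for your microarchitecture.
UOPS_RETIRED_EVENT = 0xC2
UOPS_RETIRED_UMASK = 0x01


def cycles_with_exactly(cycles_at_least):
    """cycles_at_least[k-1] holds cycles in which >= k uops retired (k = 1..n).

    Returns a dict k -> cycles with exactly k uops retired; the last bucket
    means "n or more" unless n is the maximum retirement width.
    """
    exact = {}
    n = len(cycles_at_least)
    for k in range(1, n + 1):
        following = cycles_at_least[k] if k < n else 0
        exact[k] = cycles_at_least[k - 1] - following
    return exact


if __name__ == "__main__":
    # cmask=0 counts uops; cmask=1 counts cycles with at least one uop retired.
    print(hex(encode_event(UOPS_RETIRED_EVENT, UOPS_RETIRED_UMASK)))
    print(hex(encode_event(UOPS_RETIRED_EVENT, UOPS_RETIRED_UMASK, cmask=1)))
    # Hypothetical readings taken with cmask = 1, 2, 3, 4.
    print(cycles_with_exactly([700, 400, 200, 50]))
```

If the thresholds run all the way up to the maximum retirement width, the threshold counts add up to the total number of uops retired, which gives a simple cross-check between the 'cycles' and 'uops' forms of the event.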
