Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

How can I understand the results from the PMU?

Chenjie_Y_
Beginner
351 Views

Hi, all.

My CPU is a Core-architecture part (T7100). In the datasheet I found an event, FP_COMP_OPS_EXE, for monitoring floating-point micro-ops.

I wrote a very simple benchmark, test.c, to test this counter:

int main(void)
{
    float i;
    i = i + 0.01;
}

Then I compiled it with: gcc -o test.out test.c
Then I used perf to monitor it. The command is (in r0010, the umask 00 is followed by the event number 10): perf stat -e r0010 ./test.out &

And I got this result:

 Performance counter stats for './test.out':

             1,398 raw 0x10                                                   

       0.001437684 seconds time elapsed


My question is how to interpret the number 1,398. My code contains only one FADD operation. Does that mean the FADD is translated into 1,398 micro-ops, or do I misunderstand the meaning of micro-ops?

Thank you.

0 Kudos
3 Replies
Patrick_F_Intel1
Employee
351 Views

Hello Chenjie,

The monitoring utility 'perf' doesn't start monitoring at your main(). It starts monitoring before your program is loaded. So it counts (probably) some uops in perf, some uops due to loading your program, some uops due to initializing everything for your program and then, after all that, the instructions in your program. And then the uops for cleaning up after your program and returning to perf. Since your program includes floating point, Linux may (I'm 99.999% sure) also set up extended save/restore registers to hold the SSE2 state in case of context switches.

Lastly, you need to look at the disassembly of the binary to see what your program is actually doing. It may or may not be doing what you think... especially since you don't return any value or print anything out... the compiler may (as an optimization) just be executing a return. And since you are using an uninitialized variable 'i', you might be getting exceptions.

You could try inserting a loop to see if there is a base number of uops that always gets executed (say when the loop count==0) and a number of uops that increases in proportion to the loop count. That would probably provide more insight.

Pat

0 Kudos
Chenjie_Y_
Beginner
351 Views

Dear Patrick, thank you. I get your idea.

0 Kudos