
Performance monitoring of mulss and imul on SMT


I am trying to understand execution port utilization on Sandy Bridge while running multiplication instructions.

I am running three versions of a multiplication benchmark on two sibling SMT threads:

Case 1: both sibling threads run floating-point multiplication (mulss, port 0).
Case 2: both sibling threads run integer multiplication (imul, port 1).
Case 3: one sibling thread runs mulss (port 0) and the other runs imul (port 1).

When I measure the utilization of ports 0 and 1 with UOPS_DISPATCHED_PORT, the numbers for both ports are similar in case 1 and case 3. I expected port 1 to be more heavily utilized in case 3 than in case 1, because imul executes on port 1.

UOPS_DISPATCHED_PORT:PORT_1 counts cycles per thread. Does that mean it observes only the thread it is measured on and cannot report anything about the sibling SMT thread?
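
For reference, here is a minimal sketch of how the port utilization figures can be derived from raw counts. It assumes utilization is the per-thread UOPS_DISPATCHED_PORT count divided by that thread's unhalted cycles (for example CPU_CLK_UNHALTED.THREAD); that choice of denominator is an assumption, not something confirmed in this thread. The function names and all counter values are hypothetical placeholders, not measured data.

```python
def port_utilization(port_cycles, unhalted_cycles):
    """Fraction of a thread's unhalted cycles in which a uop was dispatched on a port."""
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return port_cycles / unhalted_cycles


def summarize(samples):
    """Map each labelled sample to its per-port utilization, one entry per logical thread."""
    result = {}
    for label, counts in samples.items():
        cycles = counts["cycles"]
        result[label] = {
            port: port_utilization(value, cycles)
            for port, value in counts.items()
            if port != "cycles"
        }
    return result


# Hypothetical counts for case 3, collected separately on each sibling thread.
# These numbers are made up purely to show the calculation.
case3 = {
    "thread 0 (mulss)": {"PORT_0": 900_000, "PORT_1": 50_000, "cycles": 1_000_000},
    "thread 1 (imul)": {"PORT_0": 40_000, "PORT_1": 880_000, "cycles": 1_000_000},
}

if __name__ == "__main__":
    for label, ports in summarize(case3).items():
        line = ", ".join(f"{port}={util:.1%}" for port, util in ports.items())
        print(f"{label}: {line}")
```

Keeping the two sibling threads as separate entries makes it easy to see whether a port looks idle on one logical thread while it is busy on the other.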
