Why does _spin_lock have such a high CPI in the VTune report?

mfcking
Beginner
Hello,
I used VTune 3.0 to sample the spin lock activity invoked by the e1000 Gigabit driver and the Linux kernel 2.6.12. I found that the CPI of _spin_lock is almost 27, even though _spin_lock has a 100% L2 cache hit rate.
I checked the assembly code of _spin_lock in Linux and it uses the LOCK prefix. According to the IA-32 optimization manual, the LOCK prefix does not lock the FSB once the referenced data is found in the L2 cache of the local CPU. However, the manual also goes on to say that "Locked instructions are inherently slow, whether the data to be locked is found in the L2 cache or not."
I still do not understand what makes the CPI of _spin_lock so high.
Thanks a lot,
L.Y.
_spin_lock code in Linux:
1:  lock; decb slp    # atomically decrement
    jns 3f            # if the sign bit is clear, jump forward to 3
2:  cmpb $0,slp       # spin: compare to 0
    pause             # spin: wait
    jle 2b            # spin: go back to 2 if <= 0 (locked)
    jmp 1b            # unlocked; go back to 1 to try to lock again
3:                    # we have acquired the lock
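
For readers who find the assembly terse, here is a minimal single-threaded model of how the lock byte changes, written in Python purely for illustration. It is not a working lock and it says nothing about timing. It assumes that `slp` holds 1 when the lock is free, which is what the decrement-and-test-sign sequence implies; the function names are made up for this sketch.

```python
def to_int8(value):
    """Wrap an integer to a signed 8-bit value, as a byte decrement would."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def try_acquire(slp):
    """Model 'lock; decb slp' followed by 'jns 3f'.

    Returns (new_slp, acquired). The lock is acquired when the
    decremented byte is non-negative, i.e. the sign flag is clear.
    """
    new_slp = to_int8(slp - 1)
    return new_slp, new_slp >= 0


def must_keep_spinning(slp):
    """Model 'cmpb $0,slp' followed by 'jle 2b': spin while slp <= 0."""
    return slp <= 0


# Assumed convention: 1 means free, 0 or negative means held.
slp = 1
slp, first_acquired = try_acquire(slp)    # 1 -> 0, sign clear: lock taken
slp, second_acquired = try_acquire(slp)   # 0 -> -1, sign set: must spin
waiting = must_keep_spinning(slp)         # True while the holder keeps the lock

slp = 1                                   # assumed release: holder stores 1
released = not must_keep_spinning(slp)    # the waiter may retry from label 1
```

Note that the waiter spins on the plain `cmpb`, not on the locked decrement, so the expensive locked instruction only runs once per acquisition attempt.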

Message Edited by mfcking@yahoo.com on 07-11-2005 03:09 PM

6 Replies
jeffrey-gallagher
Just curious here, L.Y. Do you have calibration enabled or disabled in your sampling session? If you aren't sure, it's hard to guess because calibration is off by default for some events, and on by default for others.
If it is disabled, enable it and report back here with what you see and the difference, if any.
cheers
jdg
For more:
$ man sampling
But in case this rings a bell, use "-cal yes" to turn it on, "-cal no" to turn it off in the syntax.
Boaz_T_Intel
Employee
Another interesting question is whether you are running this on a multi-CPU machine. What about HT?
If there is any form of parallelism, two threads accessing the same variable, or even different variables on the same cache line, can cause a large number of L2 cache misses.
Boaz.
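
To make the "different variables on the same cache line" case concrete, here is a small sketch that checks whether two addresses fall into the same cache line. The 64-byte line size and the three addresses are assumptions chosen for illustration; the real line size depends on the processor.

```python
CACHE_LINE_BYTES = 64  # assumed line size; check the target CPU


def cache_line_index(address, line_bytes=CACHE_LINE_BYTES):
    """Return the index of the cache line that contains `address`."""
    return address // line_bytes


def share_cache_line(addr_a, addr_b, line_bytes=CACHE_LINE_BYTES):
    """True if both addresses live in the same cache line."""
    return cache_line_index(addr_a, line_bytes) == cache_line_index(addr_b, line_bytes)


# Hypothetical addresses of three lock bytes.
lock_a = 0x1000
lock_b = 0x1004   # 4 bytes after lock_a: same line
lock_c = 0x1040   # 64 bytes after lock_a: next line

a_and_b_collide = share_cache_line(lock_a, lock_b)
a_and_c_collide = share_cache_line(lock_a, lock_c)
```

In this sketch `lock_a` and `lock_b` collide while `lock_a` and `lock_c` do not. When two CPUs write to colliding locks, the line bounces between their caches even though the locks are logically unrelated.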
mfcking
Beginner
Yes, I ran my tests on an SMP machine (2 Xeons) with HT disabled.

Message Edited by mfcking@yahoo.com on 07-12-2005 09:19 AM

mfcking
Beginner

Hi JDG,

I enabled calibration for all the events and the result is even worse (the CPI is now 29, versus 27 without calibration):

Function: "_spin_lock"
Clockticks per Instructions Retired (CPI) (261): "29.153"
2nd-Level Cache Load Hit Rate (261): "100.000"

Thanks,

L.Y.

Message Edited by mfcking@yahoo.com on 07-12-2005 01:01 PM

TimP
Honored Contributor III
I'm trying to understand whether you think that a high CPI in a spin lock loop is good or bad. The usual goal would be to have the spin lock spend its time as efficiently as possible (issuing as few instructions as it can), which clearly means a high CPI. This would be particularly true if the spin lock loop could be competing for resources with another thread, which would then be able to do useful work with a lower CPI than if it were competing against a spin loop that issues more instructions.
mfcking
Beginner
Hi Tim,
Yeah, it seems a single locked instruction in _spin_lock can cost 70 clock cycles. If we add in the clock cycles of the other instructions in _spin_lock, a CPI of 29 is a reasonable result.
Thanks,
Liang
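
The arithmetic behind that conclusion can be sketched as follows. CPI is total clock cycles divided by instructions retired. The 70-cycle cost for the locked decrement is the figure quoted above; the 1-cycle costs for every other instruction are placeholders, not measurements, and `pause` in particular is designed to introduce a delay, so its real cost is likely higher.

```python
def average_cpi(instructions):
    """Return total cycles divided by the number of instructions.

    `instructions` is a list of (name, cycles) pairs, one entry per
    retired instruction.
    """
    total_cycles = sum(cycles for _, cycles in instructions)
    return total_cycles / len(instructions)


# Lock is free: only the locked decrement and the taken jns are retired.
uncontended = [("lock; decb", 70), ("jns", 1)]

# Lock is busy once: failed decrement, one pass through the spin loop,
# then a second, successful decrement.
one_spin = [
    ("lock; decb", 70), ("jns", 1),
    ("cmpb", 1), ("pause", 1), ("jle", 1), ("jmp", 1),
    ("lock; decb", 70), ("jns", 1),
]

cpi_uncontended = average_cpi(uncontended)  # (70 + 1) / 2 = 35.5
cpi_one_spin = average_cpi(one_spin)        # 146 / 8 = 18.25
```

Under these assumed costs the function-level CPI lands somewhere between roughly 18 and 36 depending on how often the loop spins, so a measured value near 29 is in the expected range.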

Message Edited by mfcking@yahoo.com on 07-13-2005 10:44 AM
