Software Tuning, Performance Optimization & Platform Monitoring

Impact of split locks when running on Skylake processors


Hi Experts, 


I am observing a huge performance impact from a locked operation that is split across two cache lines, even though the code where the locked operation happens does not show many cycles spent there.

With some simplification, in the case I have been looking at, there are two groups of threads. The first group does memory-intensive calculations such as checksums and consumes most of the CPU cycles. The second group consists of relatively light threads, consuming ~1-2% of cycles, which perform an atomic operation (lock or) as part of their code path. After a code modification, the operand of the atomic operation shifted and now spans two cache lines. The performance of the checksum threads dropped as a result.
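To illustrate how a small layout change can push an atomic operand across a cache-line boundary, here is a minimal sketch. The struct, its field names, and the helper functions are hypothetical and are not taken from the actual code; it assumes a 64-byte cache line, a GCC/Clang-style packed attribute, and that the structure itself starts on a cache-line boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumed line size */

/* Hypothetical layout: because the struct is packed, the one-byte tag
 * plus 60 bytes of other data place `flags` at offset 61, so its
 * 8 bytes occupy offsets 61..68 and spill into the next cache line. */
struct __attribute__((packed)) shared_state {
    uint8_t  tag;
    uint8_t  other[60];
    uint64_t flags; /* target of the atomic OR */
};

/* True if the byte range [addr, addr + size) touches two cache lines. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}

/* Checks the hypothetical layout, assuming the struct is line-aligned. */
static bool flags_would_split(void)
{
    return crosses_cache_line(offsetof(struct shared_state, flags),
                              sizeof(uint64_t));
}

/* Debug guard that could be placed in front of a locked operation. */
static void assert_not_split(const void *p, size_t size)
{
    assert(!crosses_cache_line((uintptr_t)p, size));
}
```

A guard like assert_not_split() in a debug build would flag this kind of regression before it shows up in the counters.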


VTune's microarchitecture analysis showed that the SQ (super queue) and memory became a bottleneck for the checksum part.

Collected separately, SQ_MISC.SPLIT_LOCK showed quite a high rate of the event, attributed to the "lock or .." instruction.


Can anyone explain how the split lock may be implemented and how that affects the caches?





$ perf stat -e sq_misc.split_lock -a -- sleep 3

Performance counter stats for 'system wide':

            419881      sq_misc.split_lock 


$ perf stat -e sq_misc.split_lock -a -- sleep 3

Performance counter stats for 'system wide':

            12      sq_misc.split_lock 



1 Reply

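A split lock occurs when a locked (atomic) access spans two cache lines, and it is detrimental to performance. Because the operand cannot be locked within a single cache line, the processor falls back to a bus lock, which stalls memory accesses from the other cores for its duration; this is why unrelated memory-intensive threads slow down. To avoid this, make sure the operand of the locked access does not cross a 64-byte boundary.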
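One possible way to enforce this in C11 is to align the atomic variable explicitly and check the alignment at compile time. This is only a sketch: struct shared_flags, g_flags, and set_flag() are hypothetical names, and the 64-byte line size is assumed from the boundary mentioned above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumed line size */

/* Hypothetical shared structure: aligning the atomic member to the
 * cache-line size guarantees its 8 bytes sit inside a single line. */
struct shared_flags {
    alignas(CACHE_LINE_SIZE) _Atomic uint64_t bits;
};

/* Fail the build if a later change weakens the alignment. */
_Static_assert(alignof(struct shared_flags) == CACHE_LINE_SIZE,
               "shared_flags must be cache-line aligned");

static struct shared_flags g_flags; /* zero-initialized static storage */

/* Atomically ORs `mask` into the flags and returns the previous value.
 * The runtime assert catches misaligned pointers, for example objects
 * carved out of a packed buffer or an unaligned allocation. */
static uint64_t set_flag(struct shared_flags *f, uint64_t mask)
{
    assert(((uintptr_t)&f->bits % CACHE_LINE_SIZE) == 0);
    return atomic_fetch_or(&f->bits, mask);
}
```

Natural alignment of the 8-byte operand would already be enough to avoid the split; full cache-line alignment is used here only because it also keeps the flag away from unrelated data in the same line. Heap-allocated instances would need an aligned allocator such as aligned_alloc() for the guarantee to hold.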
