Software Tuning, Performance Optimization & Platform Monitoring

Impact of SPLIT LOCK when running on SkyLake processors

Rustem

Hi Experts,

I am seeing a large performance impact from a locked operation that is split across two cache lines, even though the code containing the locked operation itself accounts for very few cycles.

With some simplification, the case I have been looking at has two groups of threads. The first does memory-intensive calculations such as checksums and consumes most of the CPU cycles. The second consists of relatively light threads, consuming ~1-2% of cycles, which perform an atomic operation (lock or) as part of their code path. After a code modification, the operand of the atomic operation shifted and now spans two cache lines. Checksum performance dropped as a result.

VTune's uarch analysis showed that the SQ and memory became the bottleneck for the checksum part.
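
For readers trying to picture how a layout change can push an atomic operand across a line boundary, here is a minimal sketch. It is not the code from this case: the struct name, the 60-byte header, and the helper crosses_cache_line() are hypothetical, the packed attribute is GCC/Clang-specific, and a 64-byte line size is assumed. It only computes offsets and never executes a split locked access.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed cache line size in bytes */

/* Hypothetical layout: a packed 60-byte header leaves the 8-byte
 * flags word at offset 60, so it occupies bytes 60..67. */
struct __attribute__((packed)) packed_state {
    uint8_t  header[60];
    uint64_t flags;      /* would be the target of the atomic OR */
};

/* Returns 1 if the byte range [addr, addr + size) touches two cache
 * lines, 0 otherwise. addr may be a real address or an offset from a
 * cache-line-aligned base. size must be at least 1. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}
```

Assuming the struct itself starts on a 64-byte boundary, crosses_cache_line(offsetof(struct packed_state, flags), sizeof(uint64_t)) returns 1 for this layout, while an 8-byte field at offset 56 would return 0. A check like this on the operand address of the locked instruction is one way to confirm which access is being split.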

Collected separately, SQ_MISC.SPLIT_LOCK showed quite a high event rate, attributed to the locked instruction "lock orq .."

Can anyone explain how the split lock may be implemented and how that affects the caches?

Thanks,

Rustem

BAD (locked operand split across two cache lines)

$ perf stat -e sq_misc.split_lock -a -- sleep 3

Performance counter stats for 'system wide':

            419881      sq_misc.split_lock 

GOOD (locked operand within a single cache line)

$ perf stat -e sq_misc.split_lock -a -- sleep 3

Performance counter stats for 'system wide':

            12      sq_misc.split_lock 

A_T_Intel

A split lock occurs when a locked (atomic) access spans two cache lines, and it is detrimental to performance. To avoid it, make sure the operand of the locked instruction does not cross a 64-byte boundary.
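
One way to follow this advice in C11 is to force the atomic field onto its own cache-line boundary and let the compiler verify the layout. The sketch below is illustrative only: struct aligned_state and set_flag() are hypothetical names, and the 64-byte line size is an assumption taken from the 64B figure above. On x86-64, atomic_fetch_or is typically compiled to a LOCK-prefixed read-modify-write, so keeping its operand inside one line should let the processor lock just that line instead of falling back to the much more expensive split-lock handling.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical layout: alignas moves flags to the start of the next
 * cache line, so the 8-byte operand can never straddle two lines. */
struct aligned_state {
    uint8_t header[60];
    alignas(CACHE_LINE) _Atomic uint64_t flags;
};

/* Fail the build if a later layout change breaks the guarantee. */
_Static_assert(offsetof(struct aligned_state, flags) % CACHE_LINE == 0,
               "flags must start on a cache line boundary");
_Static_assert(sizeof(_Atomic uint64_t) <= CACHE_LINE,
               "flags must fit in one cache line");

/* Atomically ORs bit into flags and returns the previous value. */
static inline uint64_t set_flag(struct aligned_state *s, uint64_t bit)
{
    return atomic_fetch_or(&s->flags, bit);
}
```

The cost is padding: the struct grows to 128 bytes here. If that matters, natural alignment of the field (8 bytes for a 64-bit operand) is already enough to keep it within one line, so removing the packing that caused the misalignment is the cheaper fix.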
