Can someone explain why the time and memory accesses are showing up on the wrong line in the Assembly window?
Is this a common phenomenon, or is it specific to situations where the lock prefix is used?
VTune 2018 Update 1 (build 535340), Windows 10 Enterprise x64.
In its most common mode of operation, VTune takes a sample when a performance counter overflows. There is always a delay between the execution of the instruction whose event increment overflows the counter and the processing of the resulting interrupt. During this delay, the program continues executing instructions. When the interrupt handler begins processing, it reads a program counter that points to an instruction after the one that caused the overflow, so the sample is attributed to that later instruction. This phenomenon is called "skid", and is discussed in several sections of Chapter 18 of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual (document 325384).
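To make the effect of skid concrete, here is a minimal toy model in Python. It is only a sketch and not how VTune or the hardware works internally: the function name `attribute_samples`, the six-instruction program, and the fixed skid of one instruction are all assumptions made for illustration (real skid varies from sample to sample).

```python
from collections import Counter

def attribute_samples(event_indices, skid, program_length):
    """Return a Counter mapping instruction index -> number of samples.

    event_indices: index of the instruction that caused each counter overflow.
    skid: hypothetical fixed number of instructions that execute before the
          interrupt handler reads the program counter (an assumption; real
          skid is variable).
    """
    samples = Counter()
    for idx in event_indices:
        # The handler sees a program counter that has already moved on.
        reported = min(idx + skid, program_length - 1)
        samples[reported] += 1
    return samples

# Toy instruction sequence; index 2 stands in for the expensive instruction.
program = ["mov", "add", "lock xadd", "cmp", "jne", "ret"]
overflows = [2] * 10  # every overflow is caused by instruction 2

no_skid = attribute_samples(overflows, skid=0, program_length=len(program))
with_skid = attribute_samples(overflows, skid=1, program_length=len(program))

for i, name in enumerate(program):
    print(f"{i}: {name:10s} true={no_skid[i]:2d} reported={with_skid[i]:2d}")
```

In this model all ten samples truly belong to the instruction at index 2, but with a skid of one they are all reported against the following instruction, which is the kind of "wrong line" attribution described in the question.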
This topic is also discussed in the VTune documentation, for example
Intel supports a "Precise Event-Based Sampling" (PEBS) infrastructure that provides reduced (and typically more predictable) skid for a subset of the performance counter events. This infrastructure and the associated events are also discussed in Chapters 18 and 19 of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual. VTune knows how to use PEBS events, but I am not sure how easy it is for a user to determine whether VTune is using "reduced-skid" events or not. The first VTune reference above discusses this.