3D X-Point programming versus OS context switching

David_Z_2 · ‎05-15-2017

Three related questions regarding Intel's 3D X-Point memory and the behavior of SFENCE and CLWB:
1) If one uses SFENCE but not CLWB, what synchronization/ordering guarantees does 3D X-Point technology make?
2) Given that the OS can move a thread between cores, what, if anything, does CLWB guarantee about the ordering of writes made by a thread before a context switch to another core?
3) In short, is CLWB a performance optimization or a semantic obligation?
Thanks in advance!

SergeyKostrov · ‎05-19-2017

>>...3D X-Point programming...
Do you mean some API for 3D X-Point programming? Or NVM programming?

David_Z_2 · ‎05-19-2017

I'm trying to understand this Intel blog post about the PCOMMIT instruction being deprecated and what is still required to use nonvolatile memory correctly. I think question #3 gets to the heart of the matter: is my code obligated to use CLWB or not? (My intuition says that CLWB is an optimization, not an obligation, but I felt that I should check first.)

SergeyKostrov · ‎05-23-2017

>>...what is still required to use nonvolatile memory correctly...
The article you mentioned describes it clearly: CLWB needs to be used before SFENCE. I would treat that as a rule that guarantees the data is stored correctly (its integrity is not violated) in case of a power failure.
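For illustration, the flush-then-fence sequence described above might look roughly like the C sketch below. The helper name `flush_then_fence` and the 64-byte line size are assumptions made for this example, not anything taken from the article or from libpmem. The sketch uses `_mm_clflush` as a stand-in because it is available on every x86-64 CPU; on hardware with CLWB one would presumably substitute `_mm_clwb` from `<immintrin.h>` and compile with `-mclwb`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush, and _mm_sfence via xmmintrin.h */
#define HAVE_X86_FLUSH 1
#else
#define HAVE_X86_FLUSH 0
#endif

#define CACHE_LINE 64    /* assumed cache-line size in bytes */

/* Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len), then issue a store fence.
 * Returns the number of cache lines visited. */
static size_t flush_then_fence(const void *addr, size_t len)
{
    assert(addr != NULL || len == 0);
    if (len == 0)
        return 0;

    /* Round the start address down to a cache-line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
#if HAVE_X86_FLUSH
        /* Stand-in for _mm_clwb: CLFLUSH also evicts the line,
         * whereas CLWB may leave it in the cache. */
        _mm_clflush((const void *)p);
#endif
        lines++;
    }

#if HAVE_X86_FLUSH
    _mm_sfence();            /* order the flushes before later stores */
#else
    __sync_synchronize();    /* non-x86 fallback: barrier only, no flush */
#endif
    return lines;
}
```

A caller would store to the persistent region first and then call `flush_then_fence(ptr, nbytes)` on the range it just wrote.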

David_Z_2 · ‎05-23-2017

The problem is, in a word, preemption. The OS is free to move threads around between CPUs, and there is no guarantee that the CPU that executed CLWB is the same CPU that executes SFENCE. The fact that Intel provides "libpmem", a user-space library for nonvolatile memory programming, strongly implies that nonvolatile memory programming is preemption-safe. Given that operating systems do NOT flush the caches between context switches (because the performance would be awful), one is left to conclude that either CLWB is a performance hint or Intel's libpmem is broken by design (and nonvolatile memory only works inside preemption-free kernel code).

I really doubt that libpmem is broken by design, which is why I want to confirm that CLWB is just a performance hint, not a semantic obligation.

SergeyKostrov · ‎05-23-2017

>>...The OS is free to move threads around between CPUs, and there is no guarantee that the CPU that executed CLWB is the same CPU
>>that executes SFENCE...
Two simple techniques could solve that problem (or almost solve it):
- Set the thread's affinity to the processing unit that is currently executing the store procedure
- Raise the priority of the thread to Real Time, and then lower it again as soon as the store is completed
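As a rough sketch of the first technique, assuming Linux with glibc, a thread could pin itself to whichever CPU it is currently running on for the duration of the store sequence and then restore its previous affinity mask. The names `run_pinned` and `example_store` are hypothetical and exist only for this illustration; whether such pinning is actually required is exactly what is in question in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE      /* needed for sched_getcpu and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: pin the calling thread to its current CPU,
 * run fn(arg), then restore the original affinity mask.
 * Returns 0 on success, -1 on failure. */
static int run_pinned(void (*fn)(void *), void *arg)
{
    cpu_set_t saved, pinned;

    assert(fn != NULL);

    /* Remember the current mask so it can be restored afterwards. */
    if (sched_getaffinity(0, sizeof saved, &saved) != 0)
        return -1;

    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;

    /* Restrict the thread to the single CPU it was just running on. */
    CPU_ZERO(&pinned);
    CPU_SET(cpu, &pinned);
    if (sched_setaffinity(0, sizeof pinned, &pinned) != 0)
        return -1;

    fn(arg);   /* the store, flush, and fence sequence would go here */

    /* Let the scheduler migrate the thread again. */
    return sched_setaffinity(0, sizeof saved, &saved);
}

/* Hypothetical stand-in for the store procedure. */
static void example_store(void *arg)
{
    *(int *)arg = 42;
}
```

Calling `run_pinned(example_store, &value)` runs the callback without migration between CPUs. Note that pinning prevents migration only; the thread can still be preempted and resumed on the same CPU.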

David_Z_2 · ‎05-23-2017

Intel's libpmem is an open source abstraction library that uses CLWB, and it does not manipulate scheduler priorities, nor does it even discuss an obligation for clients to do so. This also implies that CLWB is a performance hint and not a semantic obligation.

In short, I think the Intel blog post I linked to above isn't strictly correct based on all of the evidence presented in this thread, and I'm looking for clarification.