TSX problem(s)

Kelvin_C_ · ‎04-22-2019

Consider the following scenario:

There are two threads:
One is the secure thread, which tries to enter an RTM region and monitors anyone who wants to access that region.
Two is the attack thread, a plain thread that uses no TSX instructions and tries to access the RTM region.

I have several questions about TSX from my research:

1) What happens internally if the attack thread accesses the RTM memory?

I assumed the attack thread's execution would be redirected to the secure thread's fallback address, running in the attack thread's context. In practice, however, when I read the thread ID in the fallback path, I got the secure thread's ID. Does the processor record which thread is executing the RTM region?

How does it work internally?

2) Is there any way to find out which instruction pointer accessed the region and caused the conflict?

Best

jimdempseyatthecove · ‎04-23-2019

The flow of the attack thread is unchanged. The flow rate may vary. IOW the cache line may have to get reloaded in order to perform the write.

The secure thread will encounter aborts due to conflicts and, as a result, will experience a severe performance reduction while passing through the abort handler (e.g. retrying or using other means to work around the conflict).

*** This assumes g_Count2 is accessible by both threads.
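For reference, the only information the abort path receives is the status word that `_xbegin()` returns (the value the hardware leaves in EAX). A minimal sketch of decoding it follows. The constants mirror the `_XABORT_*` macros in `<immintrin.h>` but are redefined under hypothetical names so that the snippet compiles without RTM support; `describe_abort` and `xabort_code` are illustrative helpers, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit layout of the RTM abort status (EAX after an abort). These values
// mirror the _XABORT_* macros; the k* names are hypothetical.
constexpr std::uint32_t kAbortExplicit = 1u << 0;  // caused by XABORT
constexpr std::uint32_t kAbortRetry    = 1u << 1;  // a retry may succeed
constexpr std::uint32_t kAbortConflict = 1u << 2;  // another logical processor conflicted
constexpr std::uint32_t kAbortCapacity = 1u << 3;  // internal buffer overflowed
constexpr std::uint32_t kAbortDebug    = 1u << 4;  // debug breakpoint hit
constexpr std::uint32_t kAbortNested   = 1u << 5;  // abort happened in a nested region

// Turns the status word into a readable list of the flags that are set.
std::string describe_abort(std::uint32_t status) {
    std::string out;
    auto add = [&out](const char* name) {
        if (!out.empty()) out += '|';
        out += name;
    };
    if (status & kAbortExplicit) add("explicit");
    if (status & kAbortRetry)    add("retry");
    if (status & kAbortConflict) add("conflict");
    if (status & kAbortCapacity) add("capacity");
    if (status & kAbortDebug)    add("debug");
    if (status & kAbortNested)   add("nested");
    if (out.empty()) out = "unspecified";  // no cause flag was reported
    return out;
}

// Extracts the 8-bit immediate passed to XABORT (bits 31:24 of the status).
std::uint32_t xabort_code(std::uint32_t status) {
    return (status >> 24) & 0xFFu;
}
```

For example, `describe_abort(kAbortConflict | kAbortRetry)` yields `retry|conflict`. Note that the status word has no field for the conflicting address, instruction pointer, or thread.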

For future reference, use the {...} button in the toolbar to paste the text of your code (select C++ format). This way readers can import your sample code for testing and evaluation. Also, ensure the provided code is complete (g_Count2 not provided).

Jim Dempsey

Kelvin_C_ · ‎04-23-2019

Hi Jim,

Thanks for your answer.

Do you know why the secure thread's execution path changes, even while that thread is sitting in an infinite loop?

As I understand it, the processor has some kind of "tracking" mechanism that monitors at run time whether any other access has occurred, and then raises an exception (the fallback).

If my assumption makes sense, how does the processor remember which thread should receive the exception, so that the whole system stays stable?

jimdempseyatthecove · ‎04-24-2019

https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-intel-transactional-synchronization-extensions-intel-tsx-overview

A processor can perform a transactional abort for numerous reasons. A primary cause is due to conflicting accesses between the transactionally executing logical processor and another logical processor. Such conflicting accesses may prevent a successful transactional execution. Memory addresses read from within a transactional region constitute the read-set of the transactional region and addresses written to within the transactional region constitute the write-set of the transactional region. Intel® TSX maintains the read- and write-sets at the granularity of a cache line. A conflicting access occurs if another logical processor either reads a location that is part of the transactional region’s write-set or writes a location that is a part of either the read- or write-set of the transactional region.
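The conflict rule quoted above can be sketched as a toy software model. This is only an illustration of the rule, not of how the hardware tracks lines; the 64-byte line size is an assumption (typical for Intel x86), and `TxSets` with its members is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Assumption: 64-byte cache lines.
constexpr std::uintptr_t kTxLineBytes = 64;

// Hypothetical model of the lines tracked by one transactional region.
struct TxSets {
    std::unordered_set<std::uintptr_t> read_lines;
    std::unordered_set<std::uintptr_t> write_lines;

    // Maps a byte address to the index of the cache line containing it.
    static std::uintptr_t line_of(std::uintptr_t addr) {
        return addr / kTxLineBytes;
    }

    // Accesses made inside the transaction grow the read- or write-set.
    void tx_read(std::uintptr_t addr)  { read_lines.insert(line_of(addr)); }
    void tx_write(std::uintptr_t addr) { write_lines.insert(line_of(addr)); }

    // Returns true if an access by ANOTHER logical processor conflicts:
    // a remote write conflicts with either set, a remote read only with
    // the write-set.
    bool conflicts(std::uintptr_t addr, bool is_write) const {
        const std::uintptr_t line = line_of(addr);
        const bool in_read_set  = read_lines.count(line) != 0;
        const bool in_write_set = write_lines.count(line) != 0;
        return is_write ? (in_read_set || in_write_set) : in_write_set;
    }
};
```

Because tracking is per cache line, a remote write to any byte of a tracked line counts as a conflict, even if the transaction never touched that particular byte.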

Jim Dempsey

andysem · ‎04-26-2019

Somewhat off-topic: I wonder why a read at the location of the transaction's write-set should cause a transaction abort. The read could have returned the original contents of the memory, not modified by the transaction; that would not invalidate the atomicity of the transaction. Is this a limitation of the current implementation in Intel CPUs (e.g. if the two logical cores share cache and the cache architecture does not support storing the original cache line alongside the modified one)?

McCalpinJohn · ‎04-26-2019

Atomicity of transactions spanning multiple cache lines has many, many subtle cases that can lead to unexpected race conditions. Some of these are "architectural" (in the sense of the processor's published ordering model) and some are "implementation dependent" (e.g., may be possible or not possible depending on the size of reorder queues). This latter case is one of the reasons transactional memory was created -- humans are not good at reasoning about ordering, so many codes are just "tweaked" until they "work" -- only to break unexpectedly when a new processor implementation with greater out-of-order capability enables triggering the underlying race condition.

I don't know if it is an issue here, but if the target line starts off in E or M state (writable), a load to that address will downgrade the line to S state (not writable). The subsequent store will then have to perform its coherence transaction over again, which can introduce additional ordering complications (at least on some implementations that I have worked on).

Another possible implementation approach is to defer the read, rather than aborting the transaction. Coherence protocols all have the ability to defer reads when they hit a cache line in a "transient state". A similar approach could be used here -- NACK (or stall) the read, hoping that the transaction completes before the read is retried (so the read will get the new values). To prevent deadlock/livelock you probably need to limit the number of read deferrals to a small number -- possibly one per transaction.

This is not easy stuff -- as evidenced by the number of iterations it took Intel to get a working version....

Kelvin_C_ · ‎04-28-2019

I'm just curious whether it is possible to identify the source of the abort, e.g. the context, instruction pointer, or thread.

jimdempseyatthecove · ‎04-30-2019

VTune may be of assistance. You will have to ignore the activity on the thread attempting the transaction.

You may need to make at least two sampling runs. One with the TSX protected thread running and one with it paused or inactive. Then compare the activity of the other threads. You might see a difference in the cache activity.

A second route that is similar is to condition your TSX code such that by setting a flag variable your code avoids entering the TSX region, but instead takes the abort path that resolves the conflict with use of traditional means (mutex, or other means). Then using VTune, examine the activity differences in cache usage between when you are in your protected region and when you are not.
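A minimal sketch of that flag-controlled structure, under stated assumptions: all names (`g_use_tsx`, `g_lock_held`, `increment`, and so on) are hypothetical, a simple spin lock stands in for the "traditional means", and the RTM path is compiled only when the compiler defines `__RTM__` (e.g. with `-mrtm` on GCC/Clang). The flag should be set only on CPUs where CPUID reports RTM, since `_xbegin()` faults otherwise.

```cpp
#include <atomic>
#include <cassert>
#if defined(__RTM__)
#include <immintrin.h>  // _xbegin, _xend, _xabort
#endif

std::atomic<bool> g_use_tsx{false};    // flip between profiling runs
std::atomic<bool> g_lock_held{false};  // spin lock used by the fallback path
long g_shared_counter = 0;             // stand-in for the protected data
int g_fallback_entries = 0;            // how often the fallback path ran

// Tries the update inside an RTM region. Returns false if the region was
// not attempted (flag off, or no RTM at compile time) or if it aborted.
bool try_transactional_increment() {
#if defined(__RTM__)
    if (g_use_tsx.load(std::memory_order_relaxed) &&
        _xbegin() == _XBEGIN_STARTED) {
        // Reading the lock puts it in the read-set, so a thread that takes
        // the lock on the fallback path aborts this transaction.
        if (g_lock_held.load(std::memory_order_relaxed)) _xabort(0xFF);
        ++g_shared_counter;
        _xend();
        return true;
    }
#endif
    return false;
}

// Fallback path: the same update under a conventional lock.
void locked_increment() {
    while (g_lock_held.exchange(true, std::memory_order_acquire)) {
        // spin until the lock is free
    }
    ++g_shared_counter;
    ++g_fallback_entries;
    g_lock_held.store(false, std::memory_order_release);
}

// Entry point: transactional when enabled and successful, locked otherwise.
void increment() {
    if (!try_transactional_increment()) locked_increment();
}
```

With `g_use_tsx` left at `false`, every call goes through `locked_increment()`, which gives the non-transactional baseline for the comparison described above.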

I am not an expert on this and do not know whether the cache-adverse characteristic of "false sharing" (where a write to a cache line that lies at some modulus address of another cache line causes the eviction of that other line) would also cause a transaction abort. Should this be the case, transaction aborts will be aggravated.

Keep in mind when writing your code that it is important to keep all other activity away from the cache lines used by your protected region. This generally requires inserting padding into your data structures (e.g. separating the fill pointer from the empty pointer in a ring buffer, so that a TSX-protected push does not abort a TSX-protected pop, which in turn aborts the push, and so on, i.e. a deadlock of transaction aborts).
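One way to get that separation in C++ is `alignas`, sketched below. The 64-byte line size is an assumption, and `RingIndices`, `fill_index`, `empty_index`, and `same_cache_line` are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kPadLineBytes = 64;

// Each index is aligned to its own cache line, so the compiler inserts the
// padding that keeps push-side and pop-side state on different lines.
struct RingIndices {
    alignas(kPadLineBytes) std::atomic<std::size_t> fill_index{0};   // written by push
    alignas(kPadLineBytes) std::atomic<std::size_t> empty_index{0};  // written by pop
};

static_assert(alignof(RingIndices) == kPadLineBytes,
              "struct must start on a cache-line boundary");
static_assert(sizeof(RingIndices) == 2 * kPadLineBytes,
              "each index must occupy its own cache line");

// Helper for checking a layout: true if both addresses fall in one line.
bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kPadLineBytes ==
           reinterpret_cast<std::uintptr_t>(b) / kPadLineBytes;
}
```

The `static_assert` lines make the build fail if a later change to the struct puts both indices back on the same line.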

Jim Dempsey

Crownie__Daniel · ‎05-14-2019

I found the solution, thanks to you! Thanks!