Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

TSX - XACQUIRE

jimdempseyatthecove
Honored Contributor III

James Reinders (if you read this),

Could you comment on a potential unintended beneficial side effect of TSX?

At first glance this may seem odd, but consider a single-threaded application that enters a transaction (XACQUIRE or XBEGIN). From that point on until the exit (XRELEASE or XEND), all memory reads and writes (up to your buffering capacity) are effectively "sticky" with respect to the internal copies. IOW these locations behave similar to non-evictable L1 cache locations (be they in L1 or elsewhere). Further, the writes and read/modify/writes are write combining with respect to the internal storage as opposed to in RAM via the memory bus. What I am suggesting is that TSX can be used to improve the performance of code that has no thread contention. Is this a valid assessment?

Jim Dempsey

Dmitry_Vyukov
Valued Contributor I

Hi Jim,

>these locations behave similar to non-evictable L1 cache locations

I don't think processors evict without a reason. So if there is no reason to evict, the behavior is the same with and without TSX. If there is a reason, you get either an eviction (preferable) or a transaction abort (not preferable).
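
To make the eviction-versus-abort point concrete, here is a toy model in Python. It is only a sketch, not a description of real hardware: it assumes a tiny fully associative LRU cache, and it assumes that every line touched inside a transaction is pinned until commit, so running out of room aborts the transaction. The names `run_plain`, `run_transactional`, and `capacity` are made up for illustration.

```python
from collections import OrderedDict


def run_plain(accesses, capacity):
    """Touch cache lines in order with LRU replacement; return the eviction count."""
    cache = OrderedDict()  # line id -> None, ordered oldest to newest
    evictions = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)  # refresh the LRU position on a hit
            continue
        if len(cache) == capacity:
            cache.popitem(last=False)  # evict the least recently used line
            evictions += 1
        cache[line] = None
    return evictions


def run_transactional(accesses, capacity):
    """Run the same access stream inside one (toy) transaction.

    Assumption: every touched line is pinned until commit, so needing a
    victim when all lines are pinned aborts the transaction.
    Returns "commit" or "abort".
    """
    pinned = set()
    for line in accesses:
        if line in pinned:
            continue
        if len(pinned) == capacity:
            return "abort"  # no room left and nothing may be evicted
        pinned.add(line)
    return "commit"
```

With a working set that fits, for example `[0, 1, 0, 1, 2]` and `capacity=4`, both variants behave the same: no evictions, and the transaction commits. With `[0, 1, 2, 3, 4, 0]` the plain run simply evicts and carries on, while the transactional run aborts and all of its work would have to be redone.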

>the writes and read/modify/writes are write combining with respect to the internal storage as opposed to in RAM via the memory bus

With the WB memory type (the default) you get this for free. All writes are effectively "write-combining" (though at a higher level than intra-core write combining, which avoids even accessing the caches).

jimdempseyatthecove
Honored Contributor III
>>

>the writes and read/modify/writes are write combining with respect to the internal storage as opposed to in RAM via the memory bus

With the WB memory type (the default) you get this for free. All writes are effectively "write-combining" (though at a higher level than intra-core write combining, which avoids even accessing the caches).

<<
When not in an HLE or TSX protected region, the WB memory subsystem must preserve (the appearance of) memory write order. While this permits multiple locations within a cache line to be written together, it does so on the provision that an intervening cache line was not written.

It appears from the documentation that HLE and TSX will extend this to all aggregated cache lines being written as a locked batch, and thus can extend the appearance of memory write order to the number of cache lines permitted within the protected region.

Consider

IntCacheLineA[0]++
IntCacheLineA[1]++
IntCacheLineX++ (separated by larger than memory load)
IntCacheLineA[0]++
IntCacheLineA[1]++

In a non-HLE/TSX protected area, the two IntCacheLineA[] updates could be combined and written together, followed by a write to IntCacheLineX, then followed by one write of the subsequent two updates to IntCacheLineA[]. IOW 3 writes.

XACQUIRE thisSection
IntCacheLineA[0]++
IntCacheLineA[1]++
IntCacheLineX++ (separated by larger than memory load)
IntCacheLineA[0]++
IntCacheLineA[1]++
XRELEASE thisSection

In the above TSX protected region, the lock/unlock of thisSection is elided and two writes are performed (as a locked batch write).
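
The 3-versus-2 count can be reproduced with a small Python sketch. It only counts writes under the two models described in this post and says nothing about how a real store buffer or cache behaves. `STORES`, `ordered_writes`, and `batched_writes` are illustrative names; each entry in `STORES` is the cache line touched by one increment.

```python
from itertools import groupby

# Cache line touched by each of the five increments in the example.
STORES = ["A", "A", "X", "A", "A"]


def ordered_writes(stores):
    """Count visible writes when only adjacent stores to the same line may combine."""
    # groupby yields one group per run of identical neighbours.
    return sum(1 for _ in groupby(stores))


def batched_writes(stores):
    """Count visible writes when every dirty line is published once at commit."""
    return len(set(stores))
```

Under these assumptions `ordered_writes(STORES)` gives 3 (A, X, A) and `batched_writes(STORES)` gives 2 (A and X, once each).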

Jim Dempsey
Dmitry_Vyukov
Valued Contributor I
> When not in an HLE or TSX protected region, the WB memory subsystem must preserve (the appearance of) memory write order. While this permits multiple locations within a cache line to be written together, it does so on the provision that an intervening cache line was not written.
What do you mean here by "not written"? Not written from cache to main memory or not written from processor to cache?
jimdempseyatthecove
Honored Contributor III
>What do you mean here by "not written"? Not written from cache to main memory or not written from processor to cache?

Not written from the cache that HLE/TSX uses (whether this is L1, L2, L3 or some other specialized cache is not disclosed) to the cache and/or RAM that the rest of the processor(s) use.

My guess is

On entry to the TSX region, the cache line containing the semaphore is read and saved as the "at entry state".

The first read of a cache line pulls it in from the closest of (TSX cache, if separate from L1), L1, L2, L3, RAM, (NUMA hops).

The first write of a cache line (which may require a first read too) writes to a TSX cache (may be L1 but not necessarily L1).

Any subsequent reads/writes are performed with the TSX cache (may be L1 but not necessarily L1).

Successful release:

bus lock
The cache line containing the semaphore, if different from the "at entry state", is written; if not different, it is not written. Note: the L1 cache (if used) may show the line as dirty (when not written).
All protected dirty cache lines are written
bus unlock

Where bus lock is effective for any cache line or memory bus outside the level of the TSX cache system
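
The guessed sequence above can be written down as a toy model in Python. This is only a sketch of the guess in this post, not of any documented hardware mechanism: shared memory is a plain dict of line id to value, the "TSX cache" is a private dict, and the bus lock is reduced to a comment. `ToyTransaction` and its method names are hypothetical.

```python
class ToyTransaction:
    """Toy model of the guessed protocol; memory is a dict of line id -> value."""

    def __init__(self, memory, lock_line):
        self.memory = memory
        self.lock_line = lock_line
        self.entry_state = memory[lock_line]  # "at entry state" of the semaphore line
        self.buffer = {}                      # lines written inside the region

    def read(self, line):
        # A line already written in the region is read from the private buffer;
        # otherwise the value comes from shared memory.
        return self.buffer.get(line, self.memory[line])

    def write(self, line, value):
        # Writes stay private until release.
        self.buffer[line] = value

    def release(self):
        """Publish the buffer as one batch; return the lines actually written."""
        written = []
        # A real implementation would need this loop to be atomic
        # (the "bus lock" / "bus unlock" steps in the guess).
        for line, value in self.buffer.items():
            if line == self.lock_line and value == self.entry_state:
                continue  # semaphore unchanged since entry: skip the write
            self.memory[line] = value
            written.append(line)
        self.buffer.clear()
        return written
```

For the earlier example, start from `memory = {"lock": 0, "A": 0, "X": 0}`, write `"lock"` to 1, increment `"A"`, `"X"`, and `"A"` again through `read`/`write`, then write `"lock"` back to 0. Under this model `release()` publishes only `"A"` and `"X"`, and the semaphore line is never written because it matches its entry state.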

Jim Dempsey
