James Reinders (if you read this),
Could you comment on a potential unintended beneficial side effect of TSX?
At first glance this may seem odd, but consider a single-threaded application that enters a transaction (XACQUIRE or XBEGIN). From that point until the exit (XRELEASE or XEND), all memory reads and writes (up to your buffering capacity) are effectively "sticky" with respect to the internal copies. IOW, these locations behave similar to non-evictable L1 cache locations (be they in L1 or elsewhere). Further, the writes and read/modify/writes are write combining with respect to the internal storage as opposed to in RAM via the memory bus. What I am suggesting is that TSX can be used to improve the performance of code that has no thread contention. Is this a valid assessment?
Jim Dempsey
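To make the question concrete, here is a minimal sketch of how a single-threaded region could be wrapped in a transaction so that the two paths can be timed against each other. It uses the RTM intrinsics `_xbegin` and `_xend` (XBEGIN/XEND are the RTM instructions, XACQUIRE/XRELEASE are the HLE prefixes). The function names `cpu_has_rtm`, `try_transactional_update` and `update_region` are hypothetical, the retry count of 3 is an arbitrary assumption, and the code assumes GCC or Clang. It does not show that the transactional path is faster; it only gives both paths the same observable result so they can be measured.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#define X86_BUILD 1
#endif

/* Returns 1 if CPUID leaf 7, subleaf 0, EBX bit 11 (RTM) is set. */
int cpu_has_rtm(void) {
#ifdef X86_BUILD
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 7) return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 11) & 1u);
#else
    return 0;
#endif
}

#ifdef X86_BUILD
/* One attempt at running the updates inside an RTM transaction.
 * Returns 1 on commit, 0 on abort (memory is rolled back on abort). */
__attribute__((target("rtm")))
static int try_transactional_update(int *a, int *x) {
    if (_xbegin() == _XBEGIN_STARTED) {
        a[0]++; a[1]++;
        (*x)++;
        a[0]++; a[1]++;
        _xend();
        return 1;
    }
    return 0;
}
#endif

/* Hypothetical wrapper: try the transaction a few times, then fall back
 * to the plain path. Returns 1 if the transactional path committed. */
int update_region(int *a, int *x) {
#ifdef X86_BUILD
    if (cpu_has_rtm()) {
        for (int attempt = 0; attempt < 3; attempt++)
            if (try_transactional_update(a, x)) return 1;
    }
#endif
    /* Plain path: same updates, no transaction. */
    a[0]++; a[1]++;
    (*x)++;
    a[0]++; a[1]++;
    return 0;
}
```

The fallback is not optional: a transaction can abort at any time, and on processors where TSX is absent or disabled the plain path is the only one taken.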
Hi Jim,
>these locations behave similar to non-evictable L1 cache locations
I think that processors do not evict without a reason. So if there is no reason to evict, the behavior is the same with and without TSX. If there is a reason, one gets either an eviction (preferable) or a transaction abort (not preferable).
>the writes and read/modify/writes are write combining with respect to the internal storage as opposed to in RAM via the memory bus
With the WB memory type (the default) you get this for free. All writes are effectively "write-combining" (although at a higher level than intra-core write combining, which avoids even accessing the caches).
>the writes and read/modify/writes are write combining with respect to the internal storage as opposed to in RAM via the memory bus
>With the WB memory type (the default) you get this for free. All writes are effectively "write-combining" (although at a higher level than intra-core write combining, which avoids even accessing the caches).
When not in an HLE or TSX protected region, the WB memory subsystem must preserve (the appearance of) memory write order. While this permits multiple locations within a cache line to be written together, it does so only on the provision that an intervening cache line was not written.
It appears from the documentation that HLE and TSX extend this to all aggregated cache lines being written as a locked batch, and thus can extend the appearance of memory write order to the number of cache lines permitted within the protected region.
Consider
IntCacheLineA[0]++
IntCacheLineA[1]++
IntCacheLineX++ (separated by larger than memory load)
IntCacheLineA[0]++
IntCacheLineA[1]++
In an area not protected by HLE/TSX, the first two IntCacheLineA[] updates could be combined and written together, followed by the write to IntCacheLineX, followed by one write of the subsequent two updates to IntCacheLineA. IOW, 3 writes.
XACQUIRE thisSection
IntCacheLineA[0]++
IntCacheLineA[1]++
IntCacheLineX++ (separated by larger than memory load)
IntCacheLineA[0]++
IntCacheLineA[1]++
XRELEASE thisSection
In the above TSX protected region, the lock/unlock of thisSection is elided and two writes are performed (as a locked batch write).
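The 3-versus-2 accounting above can be checked with a toy model. This is only a sketch of the counting argument as stated in this post, not of real hardware: it assumes that outside a protected region only consecutive stores to the same cache line combine, and that inside a protected region every dirty line is written once at commit. The function names `writes_ordered` and `writes_batched` are hypothetical, and cache lines are represented by small integer ids.

```c
#include <stddef.h>

/* Toy model of the unprotected case: consecutive stores to the same
 * line combine, and a store to a different line starts a new write.
 * The result is the number of runs of equal line ids. */
size_t writes_ordered(const int *lines, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || lines[i] != lines[i - 1]) count++;
    }
    return count;
}

/* Toy model of the protected case: all stores are held until commit,
 * so each distinct line is written exactly once. */
size_t writes_batched(const int *lines, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (lines[j] == lines[i]) { seen = 1; break; }
        }
        if (!seen) count++;
    }
    return count;
}
```

With line A as id 0 and line X as id 1, the sequence in the example is {0, 0, 1, 0, 0}: the ordered model gives 3 writes and the batched model gives 2.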
Jim Dempsey
Not written from the cache that HLE/TSX uses (whether this is L1, L2, L3 or some other specialized cache is not disclosed) to the cache and/or RAM that the rest of the processor(s) use.
My guess is:
On entry to the TSX region, the cache line containing the semaphore is read and saved as the "at entry state".
The first read of a cache line pulls it in from the closest of (TSX cache, if separate from L1), L1, L2, L3, RAM, (NUMA hops).
The first write of a cache line (which may require a first read too) writes to a TSX cache (may be L1 but not necessarily L1).
Any subsequent reads/writes are performed with the TSX cache (may be L1 but not necessarily L1).
Successful release:
bus lock
The cache line containing the semaphore is written if it differs from the "at entry state"; if it does not differ, it is not written. Note: the L1 cache (if used) may show the line as dirty (when not written).
All protected dirty cache lines are written
bus unlock
Here the bus lock is effective for any cache line or memory bus outside the level of the TSX cache system.
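The guessed sequence can be written down as a small single-threaded software model. This is a sketch of the guess above only and says nothing about how the hardware is actually built. All names (`txn_t`, `txn_begin`, `txn_read`, `txn_write`, `txn_commit`, `TXN_CAPACITY`) are hypothetical, the model tracks individual `int` locations rather than cache lines, and the bus lock/unlock steps are not modeled because there is only one thread.

```c
#include <stddef.h>

#define TXN_CAPACITY 8  /* hypothetical buffering limit */

typedef struct {
    int   *addr[TXN_CAPACITY];   /* locations written in the region */
    int    value[TXN_CAPACITY];  /* buffered values, not yet visible */
    size_t count;
    int   *sem;                  /* the lock word (semaphore) */
    int    sem_at_entry;         /* its "at entry state" */
} txn_t;

/* Entry: read the semaphore and save its value. */
void txn_begin(txn_t *t, int *sem) {
    t->count = 0;
    t->sem = sem;
    t->sem_at_entry = *sem;
}

static int txn_find(const txn_t *t, const int *addr) {
    for (size_t i = 0; i < t->count; i++)
        if (t->addr[i] == addr) return (int)i;
    return -1;
}

/* Reads are served from the buffer if the location was written before. */
int txn_read(const txn_t *t, const int *addr) {
    int i = txn_find(t, addr);
    return i >= 0 ? t->value[i] : *addr;
}

/* Writes go to the buffer only. Returns 0 if capacity is exceeded,
 * which stands in for a transaction abort. */
int txn_write(txn_t *t, int *addr, int value) {
    int i = txn_find(t, addr);
    if (i < 0) {
        if (t->count == TXN_CAPACITY) return 0;
        i = (int)t->count++;
        t->addr[i] = addr;
    }
    t->value[i] = value;
    return 1;
}

/* Release: write every buffered location once, skipping the semaphore
 * if it is back at its entry value. Returns the number of stores made. */
size_t txn_commit(txn_t *t) {
    size_t stores = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (t->addr[i] == t->sem && t->value[i] == t->sem_at_entry) continue;
        *t->addr[i] = t->value[i];
        stores++;
    }
    t->count = 0;
    return stores;
}
```

In this model, acquiring and releasing the semaphore inside the region leaves it at its entry value, so the commit stores only the data locations, which matches the "if not different is not written" step.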
Jim Dempsey