- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I hope this forum is the right place to ask this question, please forgive me if it is not.
I am trying to run some benchmarks to measure the read and write sets available in RTM on a Haswell machine. However, the results I get are quite surprising, since they are larger than the L1 and L2 caches: I find the maximum write set is about 280 KB and the read set is about 512 KB....which should not be possible according to the Intel specification (the write set should not exceed L1 cache capacity, 64KB, and the read set should not exceed L2 cache capacity, 256KB).
I must be doing something wrong, but I cannot tell what exactly. The principle of my benchmark is quite straightforward: I allocate a big array (100MB), and I try to access a specific size from inside a transaction. I increase the accessed size until the transaction fails with a capacity abort.
Can someone provide some insight about this behaviour?
Thanks,
Sylvain
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your reply.
I already know the cache hierarachy of my machine: L1 caches are 32KB and private (one per core) for both data and instructions. L2 cache is 256KB and also private (one per core) but it is shared for data and instructions. L3 cache is shared among all the 4 cores and is 8MB.
What I do not understand is that, according to Intel specification, the TSX write set should not exceed L1 cache size and the TSX read set should not exceed L2 cache size. However, my benchmark gives me results that are bigger than those sizes, so I am trying to get some information about the actual sizes of TSX read and write sets here, or how to measure them correctly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Does intel specification clarify the used cache level? Based on what do you say that write set is in L1 and read set is in L2? Why write set cannot be implemented in L2 while read set cannot use L3?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A supplement, optimization manual clarifies that both read and write are traced in the L1 cache, so your problem is really strange. Can you share your new findings?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
could you provide a specific document reference to the supplement?
AFAIK, the latest document version is 248966-028 (July 2013).
EDIT: Just re-read that doc 12.1.1 says that the processor tracks addresses in L1 cache
Thx,
Rolf
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Right. It's in section 12.1.1 in version 248966-028.
"The processor tracks both the read-set addresses and the write-set addresses in the first level data cache (L1 cache) of the processor."
For read set, there is another implementation-specific second level structure, which is not necessarily in L2.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
yes, just saw that. sorry to take up your time. should have read again before asking.
Thanks,
Rolf
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page