Sorry, maybe this is not the right place to ask a question about the RTM feature provided by Haswell, but I don't know where I should post my question. I ran a simple test case with the xbegin and xend instructions provided by Haswell. The test case is single-threaded and just touches 20K contiguous memory bytes in the RTM-protected region (which is much smaller than the L1 cache size). When using the SDE, the test completes without any abort event. But when I run it on a real Haswell machine, it incurs a number of capacity aborts and only succeeds after a number of retries. I want to ask: on the real machine, what kind of event can cause a capacity abort other than a cache miss?
The right place for this question is the Intel® AVX and CPU Instructions forum at http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions
Our best guess is that you are exceeding the associativity of the L1 cache. However, as a general principle, we've found that micro-benchmarks of TSX are generally unhelpful in understanding how it will work with real codes (because it's very hard to get a micro-benchmark that represents a real code sufficiently accurately), and that a better approach is to try TSX in your real code, and then follow the guidelines in chapter 12 of the optimization guide at http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimizat...
Thanks for your reply. :-) I used the SDE (Intel's emulator) to run the program; it doesn't incur any L1 cache misses. I found that even if I add a single variable increment (i++) in the RTM-protected region, it still aborts sometimes.
TSX never guarantees completion. Transactional code that has no memory accesses can still abort either every time it executes (for instance if there is an unfriendly instruction), or "randomly" (for instance if an interrupt arrives). This sort of complexity is why measuring the behaviour of real code is more useful than playing with micro-benchmarks. (Aside from the fact that it's the performance of real code that matters :-)).
As Jim mentioned, on a real machine interrupts and rare microarchitectural conditions may cause infrequent random aborts (at noise level). The SDE does not emulate those aborts (and it does not need to, because they are very rare). Playing with microbenchmarks is fun, but it is more productive to analyse the behavior of TSX eliding locks and critical sections in a real-world application.