Hi all,
Currently, I'm running an experiment that splits a single transaction into multiple small sub-transactions in 'labyrinth' from the STAMP benchmark suite.
The original TX and the multiple sub-TX example codes are as follows:

[Original TX code]

TM_BEGIN()
func2(1, n-1)
TM_END()

[Multiple sub-TX code (ignoring program consistency)]

long start = 1, end;
long chunk;
while (start < n - 1) {
    end = ((start + chunk) < n - 1) ? (start + chunk) : (n - 1);
    TM_BEGIN()
    func2(start, end);
    TM_END()
    start = end;
}
When the chunk size is n-1, this is exactly the same as the original code, and when the chunk size is less than 8, it runs well with TSX.
This is the TX statistics information when the chunk size is 4:
tx-start: 48474
tx-abort: 2837
tx-explicit: 1775
tx-conflict: 670
tx-capacity: 306
tx-other: 86
But the problem arises when the chunk size is 8. The result shows weird counts:
tx-start: 41805
tx-abort: 19969
tx-explicit: 599
tx-conflict: 107
tx-capacity: 19154
tx-other: 109
It shows too high an abort ratio, especially in capacity aborts!!
As far as I know, capacity aborts occur when transactional writes are evicted from the L1D cache due to lack of capacity.
But the above result shows that a lot of capacity aborts occur even with a small TX chunk size.
Does anyone know why this happens?
Please help me solve this problem.
More detailed code is given below:

func1 {
    long n = getSize(pointVectorPtr);
    long start = 1, end;
    long chunk = 8;
    while (start < n - 1) {
        end = ((start + chunk) < n - 1) ? (start + chunk) : (n - 1);
        func2(pointVectorPtr, start, end);
        start = end;
    }
}

func2(pointVectorPtr, start, end) {
    tsx_begin();
    for (i = start; i < end; i++) {
        long *gridPointPtr = (long *)pointVectorPtr->elements;
        long value = (long)(*gridPointPtr);
        if (value != -1) {
            TM_RESTART();
        }
        *gridPointPtr = -1;
    }
    tsx_end();
}
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Hi,
According to this paper, the scalability of "labyrinth" with HTM is expected to be low due to the very large thread-local memory footprint accessed within a transaction (up to 14 MByte).
Thanks,
Roman
Also: is the data modified within the transaction 4 KByte aligned? If so, the program experiences cache-associativity misses that lead to TSX aborts.
A solution is to avoid 4 KByte alignment.
Thanks,
Roman