Hi all,
Currently, I'm running an experiment that splits a single transaction into multiple small sub-transactions in 'labyrinth' from the STAMP benchmark suite.
The original TX and the multiple sub-TX example codes are as follows:

[Original TX code]

TM_BEGIN()
func2(1, n-1)
TM_END()

[Multiple sub-TX code (ignoring program consistency)]

long start = 1, end;
long chunk;
while (start < n - 1) {
    end = ((start + chunk) < n - 1) ? (start + chunk) : (n - 1);
    TM_BEGIN()
    func2(start, end);
    TM_END()
    start = end;
}
When the chunk size is n-1, this is exactly the same as the original code, and when the chunk size is less than 8, it runs well with TSX.
This is the TX statistics information when the chunk size is 4:
tx-start: 48474
tx-abort: 2837
tx-explicit: 1775
tx-conflict: 670
tx-capacity: 306
tx-other: 86
But the problem arises when the chunk size is 8. The result shows weird counts:
tx-start: 41805
tx-abort: 19969
tx-explicit: 599
tx-conflict: 107
tx-capacity: 19154
tx-other: 109
It shows too high an abort ratio, especially in capacity aborts!!
As far as I know, capacity aborts occur when transactional writes are evicted from the L1D cache due to lack of capacity.
But the above result shows that a lot of capacity aborts occur even with a small TX chunk size.
Does anyone know why this happens?
Please help me solve this problem.
More detailed code is given below:

func1 {
    long n = getSize(pointVectorPtr);
    long start = 1, end;
    long chunk = 8;
    while (start < n - 1) {
        end = ((start + chunk) < n - 1) ? (start + chunk) : (n - 1);
        func2(pointVectorPtr, start, end);
        start = end;
    }
}

func2(pointVectorPtr, start, end) {
    tsx_begin();
    for (i = start; i < end; i++) {
        long *gridPointPtr = (long *)pointVectorPtr->elements;
        long value = (long)(*gridPointPtr);
        if (value != -1) {
            TM_RESTART();
        }
        *gridPointPtr = -1;
    }
    tsx_end();
}
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Hi,
According to this paper, the scalability of "labyrinth" with HTM is expected to be low due to the very large thread-local memory footprint accessed within a transaction (up to 14 MByte).
Thanks,
Roman
Also: is the data modified within the transaction 4 KByte aligned? If so, the program experiences cache-associativity misses that lead to TSX aborts.
A solution is to avoid 4 KByte alignment.
Thanks,
Roman