Finding TSX abort root cause.

Fernando__Pradeep · 03-09-2019

Hi All,

I am using a TSX/RTM-enabled benchmark (vacation, from the STAMP benchmark suite) for my project, and I run it single-threaded.

I run the same code on two different Intel microarchitectures: Broadwell and Cascade Lake.

It runs well on Broadwell, but on Cascade Lake it has an unusually high abort rate: over 50% of the started transactions are aborted.

I followed article [1] and tried to debug the root cause with Linux 'perf'. Perf reports most of the aborts on Cascade Lake as 'TX NEITHER' (see below).

I cannot figure out the root cause of the aborts or how to stop them. The bulk of them are not attributed to capacity or conflicts. I have included partial output of 'perf report' below. I would appreciate any help with this issue.

thanks,

--Pradeep

Partial output of perf report on Cascade Lake:

Samples: 61K of event 'cpu/tx-abort/ppu', Event count (approx.): 1894098
Children Self Symbol Transaction
+ 155.23% 77.62% [.] TMlookup TX NEITHER
+ 89.76% 0.39% [.] client_run TX NEITHER
+ 88.98% 0.00% [.] 0x30ce258d4c544155 TX NEITHER
+ 88.98% 0.00% [.] __libc_start_main TX NEITHER
+ 88.98% 0.00% [.] main TX NEITHER
+ 88.98% 0.00% [.] thread_start TX NEITHER
+ 88.98% 0.00% [.] threadWait TX NEITHER
+ 14.67% 7.33% [.] TMlookup TX ASYNC CAP-WRITE
+ 9.11% 0.04% [.] client_run TX ASYNC CAP-WRITE
+ 9.04% 0.00% [.] 0x30ce258d4c544155 TX ASYNC CAP-WRITE
+ 9.04% 0.00% [.] __libc_start_main TX ASYNC CAP-WRITE
+ 9.04% 0.00% [.] main TX ASYNC CAP-WRITE
+ 9.04% 0.00% [.] thread_start TX ASYNC CAP-WRITE
+ 9.04% 0.00% [.] threadWait TX ASYNC CAP-WRITE
+ 4.39% 2.20% [.] compareKeysDefault TX NEITHER
+ 3.83% 1.92% [.] _int_malloc TX SYNC

Partial output of perf report on Broadwell

Samples: 40K of event 'cpu/tx-abort/pp', Event count (approx.): 675270
Children Self Symbol Transaction Weight
+ 0.02% 0.00% [.] client_run TX ASYNC CAP-WRITE 16137
+ 0.01% 0.00% [.] client_run TX ASYNC CAP-WRITE 17998
+ 0.01% 0.00% [.] client_run TX ASYNC CAP-WRITE 16472
+ 0.01% 0.00% [.] client_run TX ASYNC CAP-WRITE 10599
+ 0.01% 0.01% [.] TMlookup TX SYNC 5712
+ 0.01% 0.00% [.] client_run TX ASYNC CAP-WRITE 14041
+ 0.01% 0.01% [.] TMlookup TX SYNC 25436
+ 0.01% 0.01% [.] TMlookup TX SYNC 22064
+ 0.01% 0.01% [.] TMlookup TX SYNC 18558
+ 0.01% 0.01% [.] TMlookup TX SYNC 15536
+ 0.01% 0.01% [.] TMlookup TX SYNC 14896
+ 0.01% 0.01% [.] TMlookup TX SYNC 11092
+ 0.01% 0.01% [.] TMlookup TX SYNC 10696

[1] https://software.intel.com/en-us/blogs/2013/05/03/intelr-transactional-synchronization-extensions-intelr-tsx-profiling-with-linux-0
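A note for anyone reading the Transaction column in the reports above: as far as I can tell from the perf sources, the column is a rendering of the PERF_TXN_* flag bits attached to each sample, and NEITHER is printed when a sample is marked as a transaction but carries neither the SYNC nor the ASYNC flag. The sketch below reproduces that rendering under those assumptions. The flag values are copied by hand from the PERF_TXN_* enum in linux/perf_event.h, and `describe_txn` is a hypothetical helper, not a perf API, so please check both against your own kernel and perf version.

```cpp
#include <cassert>   // only needed for the usage checks mentioned below
#include <cstdint>
#include <string>

// Assumption: these mirror the PERF_TXN_* enum in linux/perf_event.h.
// They are redefined here so the sketch compiles without kernel headers.
constexpr std::uint64_t TXN_ELISION        = 1u << 0;
constexpr std::uint64_t TXN_TRANSACTION    = 1u << 1;
constexpr std::uint64_t TXN_SYNC           = 1u << 2;
constexpr std::uint64_t TXN_ASYNC          = 1u << 3;
constexpr std::uint64_t TXN_RETRY          = 1u << 4;
constexpr std::uint64_t TXN_CONFLICT       = 1u << 5;
constexpr std::uint64_t TXN_CAPACITY_WRITE = 1u << 6;
constexpr std::uint64_t TXN_CAPACITY_READ  = 1u << 7;

// Hypothetical helper: approximates how 'perf report --sort transaction'
// turns the flag word into the text shown in the Transaction column.
std::string describe_txn(std::uint64_t flags) {
    std::string out;
    if (flags & TXN_ELISION)        out += "EL ";
    if (flags & TXN_TRANSACTION)    out += "TX ";
    if (flags & TXN_SYNC)           out += "SYNC ";
    if (flags & TXN_ASYNC)          out += "ASYNC ";
    if (flags & TXN_RETRY)          out += "RETRY ";
    if (flags & TXN_CONFLICT)       out += "CON ";
    if (flags & TXN_CAPACITY_WRITE) out += "CAP-WRITE ";
    if (flags & TXN_CAPACITY_READ)  out += "CAP-READ ";
    // A transactional sample flagged as neither SYNC nor ASYNC.
    if ((flags & TXN_TRANSACTION) && !(flags & (TXN_SYNC | TXN_ASYNC)))
        out += "NEITHER ";
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```

Under this reading, `describe_txn(TXN_TRANSACTION)` gives "TX NEITHER", so a "TX NEITHER" row would be a sample with no SYNC, ASYNC, RETRY, CON or CAP flag set at all, rather than a distinct abort code.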

Alexander_V_2 · 01-28-2020

This sounds similar to the problem we are having while investigating RTM aborts. It would be good to get an expert to answer! We used VTune to get the abort status, so I am not 100% sure how VTune's "reasons" map to what you got from perf. The VTune screenshot is at the bottom of this post.

We run a simple test with one tiny transaction per iteration of each thread's for loop. The transaction has two instructions in it. Here is the pthread function:

void test (const int volume, int threadNum, int memory)
{
   for (int i = 0; i < volume; i++)
   {
       sleep_until(system_clock::now() + milliseconds(10));
       tmp = (char*) calloc (memory, sizeof(char));
       auto volatile v = *tmp;
       auto volatile vt = threadNum;
       unsigned status = _xbegin();
       if (status == _XBEGIN_STARTED)
       {
           tmp[memory-2] = threadNum;
           _xend();
       }
       if (status != _XBEGIN_STARTED)
       {
           aborts++;
       }
   }
   return;
}
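The function above only counts aborts and throws away the value returned by `_xbegin()`. A minimal sketch of decoding that status word into the cause bits Intel documents for RTM is shown below; it might help to line the software-visible status up with what perf or VTune reports. The bit positions match the `_XABORT_*` macros in immintrin.h, but they are redefined locally so the snippet builds without `-mrtm`, and `describe_abort` is a hypothetical helper name.

```cpp
#include <cassert>   // only needed for the usage checks mentioned below
#include <string>

// RTM abort status bits (EAX after an abort). Same values as the
// _XABORT_* macros in immintrin.h, redefined here for portability.
constexpr unsigned ABORT_EXPLICIT = 1u << 0;  // aborted by _xabort()
constexpr unsigned ABORT_RETRY    = 1u << 1;  // a retry may succeed
constexpr unsigned ABORT_CONFLICT = 1u << 2;  // memory conflict
constexpr unsigned ABORT_CAPACITY = 1u << 3;  // internal buffer overflow
constexpr unsigned ABORT_DEBUG    = 1u << 4;  // debug breakpoint hit
constexpr unsigned ABORT_NESTED   = 1u << 5;  // abort in a nested transaction
constexpr unsigned XBEGIN_STARTED = ~0u;      // same value as _XBEGIN_STARTED

// Hypothetical helper: turns an _xbegin() return value into readable text.
std::string describe_abort(unsigned status) {
    if (status == XBEGIN_STARTED) return "STARTED";
    std::string out;
    if (status & ABORT_EXPLICIT) {
        // Bits 31:24 carry the 8-bit argument passed to _xabort().
        out += "EXPLICIT(code=" + std::to_string((status >> 24) & 0xffu) + ") ";
    }
    if (status & ABORT_RETRY)    out += "RETRY ";
    if (status & ABORT_CONFLICT) out += "CONFLICT ";
    if (status & ABORT_CAPACITY) out += "CAPACITY ";
    if (status & ABORT_DEBUG)    out += "DEBUG ";
    if (status & ABORT_NESTED)   out += "NESTED ";
    if (out.empty()) return "UNSPECIFIED";  // no documented cause bit set
    out.pop_back();                         // drop the trailing space
    return out;
}
```

In the test loop, the `aborts++` branch could keep a per-thread histogram keyed by `describe_abort(status)` instead of a single counter. A large share of "UNSPECIFIED" entries would mean the hardware gave no cause bit, which is at least consistent with seeing aborts that are neither conflict nor capacity.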