Intel® ISA Extensions

Finding TSX abort root cause.

Fernando__Pradeep

Hi All,

I am using a TSX/RTM-enabled benchmark (vacation, from the STAMP benchmark suite) for my project, and I run it single-threaded.

I run the same code on two Intel microarchitectures: Broadwell and Cascade Lake.

It runs well on Broadwell, but on Cascade Lake the abort rate is unusually high: over 50% of the started transactions are aborted.

Following article [1], I tried to find the root cause with Linux perf. perf reports that most of the aborts on Cascade Lake are tagged 'TX NEITHER' (see below).

I cannot figure out the root cause of the aborts or how to stop them. Most of them are not attributed to capacity or conflicts. I have included partial output of 'perf report' for both machines. I would appreciate any help with this issue.

thanks,

--Pradeep

 

Partial output of perf report on Cascade Lake:

Samples: 61K of event 'cpu/tx-abort/ppu', Event count (approx.): 1894098                                                                                                                                                                       
  Children      Self  Symbol                           Transaction                                                                                                                                                                             
+  155.23%    77.62%  [.] TMlookup                     TX NEITHER
+   89.76%     0.39%  [.] client_run                   TX NEITHER
+   88.98%     0.00%  [.] 0x30ce258d4c544155           TX NEITHER
+   88.98%     0.00%  [.] __libc_start_main            TX NEITHER
+   88.98%     0.00%  [.] main                         TX NEITHER
+   88.98%     0.00%  [.] thread_start                 TX NEITHER
+   88.98%     0.00%  [.] threadWait                   TX NEITHER
+   14.67%     7.33%  [.] TMlookup                     TX ASYNC CAP-WRITE
+    9.11%     0.04%  [.] client_run                   TX ASYNC CAP-WRITE
+    9.04%     0.00%  [.] 0x30ce258d4c544155           TX ASYNC CAP-WRITE
+    9.04%     0.00%  [.] __libc_start_main            TX ASYNC CAP-WRITE
+    9.04%     0.00%  [.] main                         TX ASYNC CAP-WRITE
+    9.04%     0.00%  [.] thread_start                 TX ASYNC CAP-WRITE
+    9.04%     0.00%  [.] threadWait                   TX ASYNC CAP-WRITE
+    4.39%     2.20%  [.] compareKeysDefault           TX NEITHER
+    3.83%     1.92%  [.] _int_malloc                  TX SYNC

 

Partial output of perf report on Broadwell

Samples: 40K of event 'cpu/tx-abort/pp', Event count (approx.): 675270                                                 
  Children      Self  Symbol                           Transaction                       Weight                        
+    0.02%     0.00%  [.] client_run                   TX ASYNC CAP-WRITE                16137
+    0.01%     0.00%  [.] client_run                   TX ASYNC CAP-WRITE                17998
+    0.01%     0.00%  [.] client_run                   TX ASYNC CAP-WRITE                16472
+    0.01%     0.00%  [.] client_run                   TX ASYNC CAP-WRITE                10599
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           5712
+    0.01%     0.00%  [.] client_run                   TX ASYNC CAP-WRITE                14041
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           25436
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           22064
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           18558
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           15536
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           14896
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           11092
+    0.01%     0.01%  [.] TMlookup                     TX SYNC                           10696
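
As a complementary check to the perf labels above, the status value returned by _xbegin() can be decoded inside the program itself. The sketch below is not a mapping of perf's 'TX NEITHER' label. It only decodes the documented RTM abort status bits, and the constant and function names (kAbortConflict, describe_abort, and so on) are hypothetical, defined locally so the snippet builds without <immintrin.h>. A status with none of these bits set is reported as "unclassified", which is what aborts such as faults or interrupts inside the transaction typically produce.

```cpp
#include <cassert>
#include <string>

// RTM abort status bits as returned in EAX after an abort.
// Hypothetical local names; the matching <immintrin.h> macros are in the comments.
constexpr unsigned kStarted       = ~0u;      // _XBEGIN_STARTED
constexpr unsigned kAbortExplicit = 1u << 0;  // _XABORT_EXPLICIT
constexpr unsigned kAbortRetry    = 1u << 1;  // _XABORT_RETRY
constexpr unsigned kAbortConflict = 1u << 2;  // _XABORT_CONFLICT
constexpr unsigned kAbortCapacity = 1u << 3;  // _XABORT_CAPACITY
constexpr unsigned kAbortDebug    = 1u << 4;  // _XABORT_DEBUG
constexpr unsigned kAbortNested   = 1u << 5;  // _XABORT_NESTED

// Turns an abort status into a readable list of causes, e.g. "retry|conflict".
std::string describe_abort(unsigned status) {
    assert(status != kStarted && "status must come from an aborted transaction");
    std::string out;
    auto add = [&](unsigned bit, const char* name) {
        if (status & bit) {
            if (!out.empty()) out += '|';
            out += name;
        }
    };
    add(kAbortExplicit, "explicit");
    add(kAbortRetry,    "retry");
    add(kAbortConflict, "conflict");
    add(kAbortCapacity, "capacity");
    add(kAbortDebug,    "debug");
    add(kAbortNested,   "nested");
    // No cause bit set: the hardware gave no specific reason.
    return out.empty() ? "unclassified" : out;
}
```

In a benchmark, describe_abort(status) could be called on the value returned by _xbegin() whenever it differs from _XBEGIN_STARTED, and the strings tallied per call site to compare against the perf breakdown.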
 

[1] https://software.intel.com/en-us/blogs/2013/05/03/intelr-transactional-synchronization-extensions-intelr-tsx-profiling-with-linux-0

 

Alexander_V_2

This sounds similar to a problem we are having while investigating RTM aborts. It would be good to get an expert to answer! We used VTune to get the abort status, so I am not 100% sure how the VTune "reasons" map to what you got from perf. The VTune screenshot is at the bottom of this post.

We run a simple test with one tiny transaction per iteration of each thread's for loop. The transaction has two instructions in it. Here is the pthread function:

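#include <immintrin.h>   // _xbegin, _xend, _XBEGIN_STARTED (build with -mrtm)
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <thread>

using namespace std::chrono;
using std::this_thread::sleep_until;

// Not declared in the original snippet; assumed to be a counter shared by all threads.
std::atomic<long> aborts{0};

void test (const int volume, int threadNum, int memory)
{
    for (int i = 0; i < volume; i++)
    {
        sleep_until(system_clock::now() + milliseconds(10));
        // tmp was not declared in the original snippet; assumed to be a per-iteration buffer.
        // memory must be at least 2 for the write below to stay in bounds.
        char *tmp = (char*) calloc (memory, sizeof(char));
        auto volatile v = *tmp;        // reads only the first byte of the buffer
        auto volatile vt = threadNum;
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED)
        {
            tmp[memory-2] = threadNum; // the only store inside the transaction
            _xend();
        }
        else
        {
            aborts++;                  // status holds the abort reason bits
        }
        free(tmp);                     // the original snippet never released the buffer
    }
}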
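
One possibility worth ruling out in a test like this (an assumption on my part, not a confirmed diagnosis): the read of *tmp touches only the first byte, so the store to tmp[memory-2] may be the first write to its page, and a page fault inside a transaction aborts it. A minimal sketch of writing to every page before the transaction starts is below. The name pretouch and the 4096-byte default page size are hypothetical; a real program would query the page size from the system.

```cpp
#include <cstddef>
#include <cstdlib>

// Writes one byte in every page spanned by [buf, buf + len) so that each page is
// mapped and writable before a transaction touches it. Contents are preserved.
// page_size defaults to 4096 as an assumption; query the OS for the real value.
// Returns the number of page-stride writes performed by the loop.
std::size_t pretouch(char* buf, std::size_t len, std::size_t page_size = 4096) {
    volatile char* p = buf;
    std::size_t touched = 0;
    for (std::size_t off = 0; off < len; off += page_size) {
        p[off] = p[off];   // read then write back the same value
        ++touched;
    }
    if (len > 0) {
        // buf need not be page-aligned, so the last byte may sit on a page
        // that the strided loop did not reach.
        p[len - 1] = p[len - 1];
    }
    return touched;
}
```

In the test function above, calling pretouch(tmp, memory) right after calloc and then comparing the abort count with and without it would show whether first-touch faults contribute to the aborts.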
