Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

RTM: Finding memory address at which transaction was aborted

William_Leiserson

If a page is marked Copy-on-Write, and I try to write to it inside of a transaction, the transaction aborts.  If I know the address at which it aborted, there is a trivial fix:

int v = addr[0];
__sync_bool_compare_and_swap(addr, v, v); // Force CoW.

And then retry the transaction.  Again, this only works if I know the cacheline on which the transaction was aborted.  Is there a way to find this?  Is it in a performance counter or something?

1 Solution
Roman_D_Intel
Employee

There is no perf counter or register that holds the memory address whose access caused the abort. I think the best you can do is to retry the transaction body under a global lock instead of using TSX. In the TSX abort handler you can check the abort status (in the EAX register) to see whether the abort was persistent (RETRY bit = 0).

Roman
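
As an illustration of the status check described above (not part of the original reply), here is a minimal sketch in C. It assumes the abort status uses the bit layout that `<immintrin.h>` documents for `_xbegin()`. The constants are redefined locally so the snippet compiles without RTM support, and the helper name `abort_is_persistent` is hypothetical.

```c
#include <stdbool.h>

/* Local copies of the RTM status values from <immintrin.h>, redefined here
   so the sketch builds on machines and compilers without RTM support. */
#define XBEGIN_STARTED   (~0u)      /* _XBEGIN_STARTED: transaction is running */
#define XABORT_EXPLICIT  (1u << 0)  /* _XABORT_EXPLICIT: aborted via XABORT */
#define XABORT_RETRY     (1u << 1)  /* _XABORT_RETRY: a retry may succeed */
#define XABORT_CONFLICT  (1u << 2)  /* _XABORT_CONFLICT: data conflict */
#define XABORT_CAPACITY  (1u << 3)  /* _XABORT_CAPACITY: buffer overflow */

/* Hypothetical helper: true when the hardware gave no hint that retrying
   could succeed, so the caller should fall back to the global lock. */
bool abort_is_persistent(unsigned status)
{
    if (status == XBEGIN_STARTED)
        return false;               /* not an abort at all */
    return (status & XABORT_RETRY) == 0;
}
```

A fault inside the transaction, such as the copy-on-write case in the question, would typically be reported with the RETRY bit clear, which is the case this helper treats as persistent.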

 

11 Replies
jimdempseyatthecove
Honored Contributor III

Three options that I can think of:

a) Before issuing each instruction that may cause a transaction abort, write the address it is about to touch into a memory location that won't abort. This follows somewhat the same philosophy as try/catch. Should the transaction abort, perform __sync_fetch_and_add(loc,0); // RMW

b) Before entering the transaction region (and inside your retry code), perform something like:

     __sync_fetch_and_add(locA,0); // RMW
     __sync_fetch_and_add(locB,0);
     ...
     ... start transaction (if abort, go back/redo the __sync_fetch_and_add calls, then retry transaction)

c) Start the transaction, then on abort perform the __sync_fetch_and_add calls of b) and retry the transaction.

You will have to determine which of a), b) or c) is most efficient.

Jim Dempsey
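
Options b) and c) might be sketched as follows. This is an illustration rather than Jim's code: the helper names are hypothetical, and it uses the compare-and-swap form from the original question instead of a fetch-and-add of zero, because a compiler may lower an atomic add of zero to a fence plus a plain load, which would not necessarily write to the page.

```c
#include <stddef.h>

/* Rewrite *loc with its current value. The locked compare-and-swap issues a
   write to the cache line, so a copy-on-write fault is resolved here,
   outside any transaction. The stored value does not change. */
void touch_for_write(int *loc)
{
    int v = *loc;
    __sync_bool_compare_and_swap(loc, v, v);
}

/* Pre-touch every location the transaction is expected to write. Call this
   before starting the transaction (option b) or after an abort (option c). */
void pretouch_all(int *const *locs, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        touch_for_write(locs[i]);
}
```

Both variants only help when the set of written locations is known outside the transaction, which is the limitation raised later in the thread.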

Roman_D_Intel
Employee

Hi,

Before writing to a page, you can print its address using tsx_printf (its output escapes the transaction, so it also works for transactions that abort). You need a processor with the Skylake architecture for that.

Roman

William_Leiserson

Hi Jim and Roman,

(a) may be a workable possibility, but it won't catch all cases.  Same with tsx_printf (which is very clever, btw; I just read your blog post about it).

Let me provide some context:  I'm developing a compiler that allows the code:

xbegin;
// Do some transactional stuff.
xcommit;

If the transaction aborts, it will simply try again until it succeeds.  However, the "do some transactional stuff" may call out to C or C++, which contains code that the compiler didn't generate and can't hook.  This is expected to be a common scenario in the language.  So it could be walking a graph or some such, and the memory is non-trivial to find outside of the transaction.
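
To make the retry policy concrete, here is a sketch of a bounded retry loop with a fallback path, since a transaction that writes to a copy-on-write page would otherwise abort on every attempt. Everything in it is hypothetical scaffolding: `attempt_fn` stands in for the compiler-generated `xbegin ... xcommit` region and returns the abort status (or `TXN_COMMITTED`), and `fallback_fn` stands in for running the same body under a global lock. The two stubs at the end exist only to exercise the loop.

```c
#include <stdbool.h>

#define TXN_COMMITTED (~0u)      /* hypothetical "no abort" return value */
#define XABORT_RETRY  (1u << 1)  /* same bit as _XABORT_RETRY in <immintrin.h> */

typedef unsigned (*attempt_fn)(void *ctx);  /* one transactional attempt */
typedef void (*fallback_fn)(void *ctx);     /* same body under a global lock */

/* Returns true if the body committed transactionally, false if the
   fallback path had to run it. */
bool run_transaction(attempt_fn attempt, fallback_fn fallback,
                     void *ctx, int max_attempts)
{
    for (int i = 0; i < max_attempts; ++i) {
        unsigned status = attempt(ctx);
        if (status == TXN_COMMITTED)
            return true;
        if ((status & XABORT_RETRY) == 0)
            break;               /* persistent abort: retrying will not help */
    }
    fallback(ctx);
    return false;
}

/* Demonstration stubs only: an attempt that always aborts persistently,
   an attempt that commits at once, and a fallback that counts its calls. */
unsigned always_faults(void *ctx) { (void)ctx; return 0u; }
unsigned commits_at_once(void *ctx) { (void)ctx; return TXN_COMMITTED; }
void count_fallback(void *ctx) { ++*(int *)ctx; }
```

With `always_faults`, the loop gives up after the first attempt and runs the fallback exactly once, instead of spinning forever.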

jimdempseyatthecove
Honored Contributor III

Roman,

tsx_printf is an interesting hack. I would have done something different.

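#include <stdio.h>
#include <stdarg.h>
#include <stdbool.h>
#include <string.h>

#define TSX_PRINTF_BUF_PADD 1024
#define TSX_PRINTF_BUF_LEN (1024*1024)
// Ring buffer plus padding, so one message may run past the logical end.
char tsx_printf_str[TSX_PRINTF_BUF_LEN+TSX_PRINTF_BUF_PADD];
int  tsx_printf_fill = 0;
bool tsx_printf_wrapped = false;

int tsx_printf(const char* format, ...)
{
    va_list list;
    va_start(list, format);
    int ret = vsnprintf(tsx_printf_str+tsx_printf_fill, TSX_PRINTF_BUF_PADD, format, list);
    va_end(list);
    if(ret < 0)
        return ret; // formatting error, nothing stored
    if(ret >= TSX_PRINTF_BUF_PADD)
        ret = TSX_PRINTF_BUF_PADD-1; // vsnprintf truncated the message
    if((tsx_printf_fill += ret) >= TSX_PRINTF_BUF_LEN)
    {
        // Wrap: move the bytes that spilled into the padding to the front.
        int spill = tsx_printf_fill - TSX_PRINTF_BUF_LEN;
        tsx_printf_wrapped = true;
        memmove(tsx_printf_str, tsx_printf_str+TSX_PRINTF_BUF_LEN, spill);
        tsx_printf_fill = spill;
    }
    return ret;
}

void tsx_printf_dump(void)
{
    if(tsx_printf_wrapped)
    {
        // Oldest surviving text: from just past the fill point (which may
        // hold a terminator) to the logical end of the buffer.
        fwrite(&tsx_printf_str[tsx_printf_fill+1], 1,
               TSX_PRINTF_BUF_LEN-tsx_printf_fill-1, stdout);
    }
    // Newest text: from the start of the buffer to the fill point.
    fwrite(tsx_printf_str, 1, tsx_printf_fill, stdout);
    tsx_printf_fill = 0;
    tsx_printf_wrapped = false;
}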

Jim Dempsey

William_Leiserson

Okay, yeah.  I was afraid of that.  Thanks Roman.

Roman_D_Intel
Employee

Jim,

Changes to your tsx_printf memory buffer will be lost if the transaction aborts (e.g. if the abort happens after the tsx_printf call). Intel Processor Trace records instruction control flow even in aborted transactions. My tsx_printf (mis)uses it, which allows the output data to survive aborts.

Best regards,

Roman

William_Leiserson

jimdempseyatthecove wrote:

a) Before issuing each instruction that may cause a transaction abort, write the address it is about to touch into a memory location that won't abort. This follows somewhat the same philosophy as try/catch. Should the transaction abort, perform __sync_fetch_and_add(loc,0); // RMW

Hi Jim,

I've been thinking more about this.  Is there a way to specify memory that I don't want added to the transaction, or is there pre-defined memory that I can use?  Do you have a link with information on how this would work?

jimdempseyatthecove
Honored Contributor III

Roman, oops. You are right.

William, inside a transaction every memory access is transactional; you cannot exclude individual locations. The instruction trace hack, by contrast, is not backed out on abort. You could potentially use the instruction trace information in your transaction abort handler. This should be documented in a systems programmer's manual. Perhaps Roman could provide a link. Converting the address into a source code line number would be up to you to figure out.

Jim Dempsey

William_Leiserson

Hey Jim,

This was what I thought.  I must have misunderstood your first comment.

Thanks!

Roman_D_Intel
Employee

Setting up processor trace recording and reading the results directly from hardware is only possible in the kernel (ring 0) and not from user space. I am not sure if it is practical for this use case (compiler).

Roman
