Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

lambda capture [this]

jimdempseyatthecove
Honored Contributor III

EDIT: Please disregard this post. JD

V 17.0.132 Build 20161005

I am writing some template metafunctions and had made great progress... up until now.

I have an object (struct), with a member function. I wish to construct a lambda capture of either:

auto x = [=]() { this->memvar = 0; };
auto x = [this]() { memvar = 0; };

In both cases, the this pointer is corrupted.

I've also tried, with no success:

myType* This = this;
auto x = [&]() { This->memvar = 0; };
auto x = [=]() { This->memvar = 0; };

I haven't done this, but I suspect I can create a static member function:

struct myType
{
  int memvar; // member type assumed here for illustration

  static void foo(myType* This) { This->memvar = 0; }

  // Then in some other non-static member function:
  void other()
  {
    myType* This = this;
    auto x = [=](){ foo(This); };
  }
};

*** Note: I am not using "auto x ="; rather, I am passing the lambda object on to a different template.
I do reach the code inside the lambda when the other template eventually gets there. The disassembly looks OK, but the value of this ([this]) and This ([=]) is corrupt.

Any suggestions for cleaner code would be appreciated.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

I did a little more inspection. I am building a 64-bit application. It appears that the this pointer is passed as a 32-bit number into a 64-bit location in the lambda capture, without sign extension.

[Attached image: lambda this.jpg]

In both the [this] and [&] cases, the low 32 bits of the 64-bit pointer were passed, but the high 32 bits (formerly 0000) were not.

If I edit the this->this pointer to remove the 200 "junk", the program works as expected.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

I managed to hack in a fix (at least it seems to work) for Linux 64-bit; I haven't tried Windows.

// Check windows
#if _WIN32 || _WIN64
#if _WIN64
#define ENVIRONMENT64
#else
#define ENVIRONMENT32
#endif
#endif

// Check GCC
#if __GNUC__
#if __x86_64__ || __ppc64__
#define ENVIRONMENT64
#else
#define ENVIRONMENT32
#endif
#endif

#if defined(ENVIRONMENT64)
 // Intel V17.0.1 had a bug: when a lambda capture was performed
 // within a class/struct member function on a 64-bit system,
 // only the low 32 bits of the this pointer were captured.
 // When the bug is fixed, remove or comment out this section
 // Use these macros only with respect to lambda captures within
 // member functions
 // place LAMBDA_THIS_FIX_MASK in the source file requiring the fix
 // place LAMBDA_THIS_FIX_CAPTURE prior to capture
 // place LAMBDA_THIS_FIX inside the captured function as first statement
 // ***
 // *** This is a simplified fix that has a potential bug ***
 // ***
 // *** In a multi-threaded system:
 // ***
 // ***  when all objects of each class/struct using this
 // ***  reside below 4GB or above 4GB then this is always safe.
 // ***
 // ***  when different objects reside on different sides of the 4GB
 // ***  boundary, then a race condition exists between the time of
 // ***  LAMBDA_THIS_FIX_CAPTURE and time of LAMBDA_THIS_FIX
 // ***  (for the same class/struct).
 // ***  This can be fixed with additional coding effort
 #define USE_LAMBDA_THIS_FIX
 #if defined(USE_LAMBDA_THIS_FIX)
  #define LAMBDA_THIS_FIX_MASK static __int64 lambda_this_fix_mask;
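  // The LL suffixes make both branches 64-bit, so -1 yields an all-ones mask
  // (with int and unsigned int operands the conditional would have type unsigned int).
  #define LAMBDA_THIS_FIX_CAPTURE lambda_this_fix_mask = ((reinterpret_cast<__int64>(this)) & 0xFFFFFFFF00000000) ? -1LL : 0xFFFFFFFFLL;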
//  #define LAMBDA_THIS_FIX { intptr_t x = lambda_this_fix_mask; __asm { and rcx, x } } volatile;
  #define LAMBDA_THIS_FIX __asm__ __volatile__ ( \
    "mov -0x10(%%rbp),%%rbx\n\t" \
    "and %%rax,(%%rbx)\n\t" \
    : : "a"(lambda_this_fix_mask) : "rbx", "cc", "memory" );
 #else
  #define LAMBDA_THIS_FIX_MASK
  #define LAMBDA_THIS_FIX_CAPTURE
  #define LAMBDA_THIS_FIX
 #endif
#else
 #define LAMBDA_THIS_FIX_MASK
 #define LAMBDA_THIS_FIX_CAPTURE
 #define LAMBDA_THIS_FIX
#endif

Place that into a header (after __int64 is defined).

In each source file that contains class/struct member functions with lambdas referencing this, insert LAMBDA_THIS_FIX_MASK outside the context of the class/struct. This creates a static __int64 variable which will hold the mask.

// parallel_matrix_multiplySliceTransposeTagTeam.cpp
// -pstt

#include "MatrixMultiply.h"

LAMBDA_THIS_FIX_MASK

In the member functions, just prior to the capture, insert LAMBDA_THIS_FIX_CAPTURE. This determines which side of the 4GB boundary the current this pointer is on, and the resulting mask is shared by every class/struct in the source file that uses the fix. I tried making the mask a static member variable of the class/struct instead, but on my system the linker could not resolve SomeClass::lambda_this_fix_mask. That is a different issue that does not affect me at this time.

Then, in the captured lambda function, insert LAMBDA_THIS_FIX as the first statement:

struct TagTeamsSystemPSTT_NUMA
{
   ...
 bool init(double** _A, double** _B, double** _R, intptr_t _size)
 {
           ...
  LAMBDA_THIS_FIX_CAPTURE
  parallel_invoke( M0$, // stay in this NUMA node
   [&]
   { // first task
    LAMBDA_THIS_FIX
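
On the unresolved SomeClass::lambda_this_fix_mask mentioned above: one likely cause, assuming the member was only declared inside the class, is that a static data member also needs a definition at namespace scope in exactly one source file. The sketch below shows that pattern. capture_mask() is a hypothetical helper that mirrors what LAMBDA_THIS_FIX_CAPTURE is meant to compute, and std::int64_t stands in for __int64; neither is taken from the original code.

```cpp
#include <cassert>
#include <cstdint>

struct SomeClass {
    // Declaration only: this line does not allocate storage for the mask.
    static std::int64_t lambda_this_fix_mask;

    // Hypothetical helper: all-ones mask if this object lives at or above
    // 4GB, otherwise a mask that keeps only the low 32 bits.
    void capture_mask() {
        const auto addr = static_cast<std::uint64_t>(
            reinterpret_cast<std::uintptr_t>(this));
        lambda_this_fix_mask = (addr >> 32) != 0 ? -1LL : 0xFFFFFFFFLL;
    }
};

// Definition: must appear in exactly one .cpp file. Without it, the linker
// reports SomeClass::lambda_this_fix_mask as an unresolved symbol.
std::int64_t SomeClass::lambda_this_fix_mask = 0;
```

With a C++17 compiler, declaring the member as `inline static` inside the class removes the need for the separate definition.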

 

Jim Dempsey

 

jimdempseyatthecove
Honored Contributor III

BTW, the call

parallel_invoke( M0$, ...

uses QuickThread templates. This parallel tasking system supports cache level and NUMA node placement tokens.

The lambda capture this pointer fix should work with TBB, OpenMP, boost, etc... (at least on Linux x64 w/Intel C++ compiler).

I hope this gets fixed relatively soon.

Jim Dempsey

Judith_W_Intel
Employee

 

Hi Jim,

If you could provide us a small reproducer which shows the bug that would help. I assume the symptom is a runtime seg fault?

thanks

Judy

jimdempseyatthecove
Honored Contributor III

Judith,

Please disregard this post. After a long debugging session, it turned out to be an error on my part in my templates, where I was passing the lambda objects "as objects". IOW, a copy of the object (functor) was pushed onto the stack, and I was taking the address of that copy (on the stack) instead of the address of the original lambda object. The code happened to work as long as the stack wasn't reused between the instantiation and the dispatch into the thunk. I am in the process of fixing up my templates to correct for my faux pas.

Jim Dempsey

Judith_W_Intel
Employee

 

ok thanks for the update!

jimdempseyatthecove
Honored Contributor III

FWIW, the erroneous template (abbreviated) was:

template<typename lambdaT>
void foo(lambdaT fn) {...}

As opposed to:

template<typename lambdaT>
void foo(lambdaT&& fn) {...}

In the first case, a copy of the lambda object is pushed onto the stack. As long as that copy didn't get obliterated, everything worked out.

In the second case, the address of the lambda object is placed on the stack, but used inside the {...} as if by reference.
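
To make the lifetime issue concrete, here is a minimal sketch under stated assumptions, not the QuickThread implementation. TaskQueue, enqueue, and run_all are hypothetical names standing in for a tasking template that runs the lambda later. The erroneous pattern appears only as a comment, because executing it is undefined behavior. The safe variant shown here goes one step further than passing by reference: it moves or copies the callable into storage the queue owns, so it does not depend on the caller's lambda object staying alive.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical deferred-task queue (an assumption for illustration).
struct TaskQueue {
    std::vector<std::function<void()>> tasks;

    // Erroneous pattern (not compiled): fn is a by-value parameter, so &fn
    // points at a stack copy that is destroyed when the function returns.
    //   template <typename F> void enqueue_bad(F fn) { saved_ptr = &fn; }

    // Safer: accept a forwarding reference and store the callable itself,
    // so the stored task outlives this call.
    template <typename F>
    void enqueue(F&& fn) {
        tasks.emplace_back(std::forward<F>(fn));
    }

    // Run and discard every stored task.
    void run_all() {
        for (auto& task : tasks) task();
        tasks.clear();
    }
};

struct myType {
    int memvar = 42;  // initial value chosen only to make the change visible

    void other(TaskQueue& q) {
        // Captures the this pointer by value; the lambda runs later.
        q.enqueue([this] { memvar = 0; });
    }
};
```

Storing only a reference or pointer to the lambda, as in the `lambdaT&&` version, is valid only while the caller's lambda object is still alive, for example when the call blocks until all tasks have finished. The myType object itself must also outlive the deferred call, since the lambda holds its this pointer.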

Jim Dempsey
