Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Lambda not working

Nav
New Contributor I
Hi,
I've installed gcc version 4.4.2 on my Fedora system. I got the below code from this forum, but when I compile, I get:
g++ -ltbb -o lambda1 lambda1.cpp

lambda1.cpp: In function 'void par_ms(int, int, int*)':
lambda1.cpp:43: error: expected primary-expression before '[' token
lambda1.cpp:43: error: expected primary-expression before ']' token
lambda1.cpp:44: error: expected primary-expression before '[' token
lambda1.cpp:44: error: expected primary-expression before ']' token

I thought lambdas were supported in the latest version of gcc. Please help.

[cpp]#include <iostream>
#include <vector>
#include "tbb/tbb.h"

#define N 9999999

using namespace std;
using namespace tbb;

void merge(int beg, int mid, int end, int *A)
{
    vector<int> tmp;
    int i = beg;
    int j = mid;

    while ( ( i < mid ) && ( j < end ) )
    {
        if ( A[i] < A[j] ) { tmp.push_back( A[i] ); i++; } else { tmp.push_back( A[j] ); j++; }
    }

    while ( i < mid )
    {
        tmp.push_back( A[i] );
        i++;
    }

    while ( j < end )
    {
        tmp.push_back( A[j] );
        j++;
    }

    for ( int t = 0; t < (int) tmp.size(); t++ ) { A[ beg + t ] = tmp[t]; }
}

void par_ms(int beg, int end, int *A)
{
    if ( beg + 1 == end ) { return; }

    int mid = beg + (end - beg)/2;

    parallel_invoke(              // sort the two halves in parallel
        [&](){ par_ms(beg, mid, A); },
        [&](){ par_ms(mid, end, A); }
    );

    merge( beg, mid, end, A );

    return;
}

int main()
{
    task_scheduler_init init(-1);   // -1 lets TBB choose the number of threads

    int *A = new int[N];            // heap allocation; an array this size would overflow the stack

    for ( int i = 0; i < N; i++ ) { A[i] = N - i; }

    par_ms(0, N, A);

    for ( int i = 0; i < 10; i++ )   { cout << i << " " << A[i] << endl; }
    for ( int i = N-10; i < N; i++ ) { cout << i << " " << A[i] << endl; }

    delete [] A;
    return 0;
}//main
[/cpp]

29 Replies
Nav
New Contributor I

Robert,

While I agree that lambdas ease the programming, they do come at an expense. From my (limited) experience, lambdas have two issues:

a) they have a little higher overhead than an explicit functor with args or ->context, therefore the body of the lambda function must perform more work in order for the extra cost to be amortized.

b) When using [&] with objects that have reference counters, you cannot turn off the IncReferences()/DecReferences() calls, meaning these must now include locks (runs slower). Pointers passed from outside the scope of the lambda to inside the scope of the lambda might be more suitable. IOW, create the additional reference(s) and pointers to these references outside the scope of the parallel_xxx with the [&] lambda, and use the pointers inside the lambda function.

Jim Dempsey

I'm a bit surprised, because I thought that lambdas would be converted into something like "inline code" at compile time. But that was just an assumption. It's interesting to hear your viewpoint.
It'd be great to know what percentage increase in overhead there is.
robert-reed
Valued Contributor II
From my (limited) experience, lambdas have two issues:

a) they have a little higher overhead than an explicit functor with args or ->context, therefore the body of the lambda function must perform more work in order for the extra cost to be amortized.

b) When using [&] with objects that have reference counters, you cannot turn off the IncReferences()/DecReferences() calls, meaning these must now include locks (runs slower). Pointers passed from outside the scope of the lambda to inside the scope of the lambda might be more suitable. IOW, create the additional reference(s) and pointers to these references outside the scope of the parallel_xxx with the [&] lambda, and use the pointers inside the lambda function.

An interesting observation, Jim. This is the first I've heard of such an issue. It sounds like something we should gather evidence for, or at least independently verify. Unfortunately, at the moment my blog is on a bit of a hiatus as I work through some strange compiler optimization anomalies I encountered back in December, working with the compiler team, and I have a couple of other high-priority tasks that will keep me from trying anything for at least a week or two. Has anyone else observed this phenomenon?

To Nav I'd reply that lambdas are a little more complicated than you might suspect. Implementing them just as inline code would not provide the flexibility to allow independent threads to call them (an entry point is needed for that). The context (the [&] stuff) enables dynamic binding of local independent variables for the associated function, and the result represents a full closure, though there might be some functional programming purists in the audience who would dispute that.
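
For illustration, the compiler effectively turns a [&] lambda into a small unnamed function object that stores references to the captured locals and exposes an operator() as the entry point. A hand-written sketch of the idea, for one of the lambdas in the original post (this is only an approximation, not the code any particular compiler actually generates):

[cpp]void par_ms(int beg, int end, int *A);   // the function from the first post

// Rough hand-written equivalent of  [&](){ par_ms(beg, mid, A); }
struct ParMsClosure {
    int  &beg;   // captured by reference
    int  &mid;
    int *&A;

    ParMsClosure(int &b, int &m, int *&a) : beg(b), mid(m), A(a) {}

    // The entry point another thread can call, e.g. through parallel_invoke.
    void operator()() const { par_ms(beg, mid, A); }
};
[/cpp]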
RafSchietekat
Valued Contributor III
#18 "Yes, roughly linear, though the linear scale factor may vary depending on the local context around the kernel."
How much would you typically capture (average, extreme), and what would you think is typical in general? In your opinion, do the Amdahl parameters indicate an acceptable temporary solution if g++ is required?
robert-reed
Valued Contributor II
Quoting - Raf Schietekat
#18 "Yes, roughly linear, though the linear scale factor may vary depending on the local context around the kernel."
How much would you typically capture (average, extreme), and what would you think is typical in general? In your opinion, do the Amdahl parameters indicate an acceptable temporary solution if g++ is required?


In the conversion case I'm thinking of, there were particular parts of the solver that might have a dozen or more arguments for the class constructor in order to provide the linkages for extracting the kernel. They included arrays and objects whose member functions were used in the kernel. I think if I were designing the code with a separate function object in mind, I could do it without so many wiggly ends to reconnect, but in the case of converting an existing serial solver to such a parallel implementation with a subgoal of disrupting the original code as little as possible, the hairy ends were left for all to see. Now, with Jim's observations about lambda slowdowns as a caution that I hope proves not to be an issue, I'd just use lambda capture and dispense with the data-linkage management process altogether.

I don't think the Amdahl parameters apply much to the parallel conversion process unless you can dole out the various serial kernels to a coterie of developers to get data conversion scaling ;-). For the sole converter, it would still be a process of taking them on one at a time, your basic n * m, or maybe n * E(i)n where n is the number of kernels to convert and E(i)n is the "embeddedness" or complexity of the kernel in situ, in other words, some measure of the extraction complexity for a particular kernel in question.
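
To make that contrast concrete, the plumbing looks roughly like this (every name below is made up for illustration; none of it is taken from the actual solver):

[cpp]#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

struct Mesh { double weight(int) const { return 1.0; } };  // stand-in for a solver object

// Explicit functor: every array and object the kernel touches has to be
// threaded through the constructor and stored in a "holder" member.
class KernelBody {
    const double *coeff;
    double       *result;
    const Mesh   &mesh;
public:
    KernelBody(const double *c, double *r, const Mesh &m)
        : coeff(c), result(r), mesh(m) {}
    void operator()(const tbb::blocked_range<int> &range) const {
        for (int i = range.begin(); i != range.end(); ++i)
            result[i] = coeff[i] * mesh.weight(i);
    }
};

void run(const double *coeff, double *result, const Mesh &mesh, int n)
{
    tbb::parallel_for(tbb::blocked_range<int>(0, n),
                      KernelBody(coeff, result, mesh));

    // The lambda version of the same call: the capture list does the plumbing.
    // tbb::parallel_for(tbb::blocked_range<int>(0, n),
    //     [&](const tbb::blocked_range<int> &r) {
    //         for (int i = r.begin(); i != r.end(); ++i)
    //             result[i] = coeff[i] * mesh.weight(i);
    //     });
}
[/cpp]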

RafSchietekat
Valued Contributor III
"a dozen or more arguments"
Ouch!

"I don't think the Amdahl parameters apply much to the parallel conversion process unless you can dole out the various serial kernels to a coterie of developers to get data conversion scaling ;-)."
The speed-up/slow-down formula doesn't just apply to parallelism, of course. But my real question was whether my suggestion seemed realistic to you. Perhaps not, with a dozen or more arguments required again and again and again?
robert-reed
Valued Contributor II
Quoting - Raf Schietekat
"I don't think the Amdahl parameters apply much to the parallel conversion process unless you can dole out the various serial kernels to a coterie of developers to get data conversion scaling ;-)."
The speed-up/slow-down formula doesn't just apply to parallelism, of course. But my real question was whether my suggestion seemed realistic to you. Perhaps not, with a dozen or more arguments required again and again and again?

No, not again and again and again. The linkages would be established for each kernel with a specialized constructor call that initialized "holders" for the components in the class; the number required varied with the requirements of each kernel. Replication of instances of this class happened behind the scenes as the threads scheduled to share the work made copies of the object.

Is it realistic to use explicit functors if you don't have access to lambdas? Yes. I've done it before and I'm likely to do it again as I work with TBB in areas that may not have lambda support yet (like with constraints to use g++ pre-4.5). And the difficulty level of doing that will be determined by the context complexity of each of the kernels.
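
As a sketch of what that looks like for the code in the original post, the parallel_invoke call could be written with an explicit functor on a pre-4.5 g++ along these lines (merge and the rest of the program are as in the first post):

[cpp]#include "tbb/parallel_invoke.h"

void par_ms(int beg, int end, int *A);          // forward declaration, needed by the functor
void merge(int beg, int mid, int end, int *A);  // as defined in the first post

// Explicit functor standing in for  [&](){ par_ms(beg, mid, A); }
struct ParMsCall {
    int beg, end;
    int *A;
    ParMsCall(int b, int e, int *a) : beg(b), end(e), A(a) {}
    void operator()() const { par_ms(beg, end, A); }
};

// par_ms rewritten without lambdas:
void par_ms(int beg, int end, int *A)
{
    if ( beg + 1 == end ) { return; }

    int mid = beg + (end - beg)/2;

    tbb::parallel_invoke( ParMsCall(beg, mid, A), ParMsCall(mid, end, A) );

    merge( beg, mid, end, A );
}
[/cpp]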
ARCH_R_Intel
Employee

There was some uncertainty expressed earlier on whether even gcc 4.5 supported lambdas. David Raila reported to me that lambda expressions do indeed work with gcc 4.5. Below are his notes.

[bash][raila@upcrc-win01 llvm]$ gcc-4.5 -v
Using built-in specs.
COLLECT_GCC=gcc-4.5
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --program-suffix=-4.5 --disable-libgcj --enable-languages=c++ : (reconfigured) 
Thread model: posix
gcc version 4.5.0 20100105 (experimental) (GCC) 

Compile with:  gcc-4.5 -std=c++0x


#include <cstdio>

#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

using namespace tbb;

template <typename F>
void
doForeach( F f, int start, int end)
{
  for (int i=start; i < end; i++)
    f(i);
}

//
// lambda f
//
auto f = [](int i){ printf("Hello3 Lambdas %d\n", i); };

int main() {
        // serial:
        doForeach(f, 0, 10);
        // parallel
        parallel_for(0, 10, f);

        return 0;
}
[/bash]
Nav
New Contributor I
Yay! Thanks for posting!
Om_S_Intel
Employee
The test case works with gcc here as well. You can use the following command:

$ gcc -c -std=c++0x tstcase.cpp

$ gcc -v

Using built-in specs.

COLLECT_GCC=/usr/bin/gcc

COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.5.1/lto-wrapper

Target: x86_64-redhat-linux

Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux

Thread model: posix

gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC)
