Intel® oneAPI Threading Building Blocks

How to control the performance of parallel_for?

Jakcie_Jin
Beginner
Hi, I used parallel_for to run the same task on each element of a vector, but it did not give any speedup, and I wonder why. Is it because of the number of threads or the grain size?
I have tested the code with different numbers of threads. There is no obvious difference in running time when using anywhere from 1 to 8 threads.

Any advice? Thanks. One parallel_for function is defined as follows:

struct OneMove
{
    short TheNode;
    short aNode;
    short route1;
    short route2;
    float tempObj;
};

// Note: the template arguments and the inner loop header were lost when the
// post was rendered; <OneMove>, <short> and the bound on j are inferred from context.
template <typename myType>
class PSwap
{
public:
    myType* sol;
    concurrent_vector<OneMove>* pMoveList;

    void operator() (const blocked_range<short>& r) const
    {
        short TheNode,routeNum1,routeNum2,aNode;
        float tempSolObj;
        for(short i=r.begin();i!=r.end();++i)
        {
            // find the route number of the candidate node
            TheNode=i;
            routeNum1=sol->RouteNumList[TheNode];
            vector<short> nearNeighborList=sol->pinst->get_PNL(TheNode,sol->listSize);
            // find a node in a candidate route to swap with TheNode
            for(unsigned int j=1;j<nearNeighborList.size();++j)
            {
                aNode=nearNeighborList[j];
                if(aNode==0)
                    continue;
                routeNum2=sol->RouteNumList[aNode];
                // skip nodes that are in the same route
                if(routeNum2==routeNum1)
                    continue;
                // evaluate the objective of swapping the two nodes
                // tempObj=obj+alpha*VioC+beta*VioD
                tempSolObj=sol->evaluateOneSwapeMove(TheNode,aNode,routeNum1,routeNum2);

                OneMove move;
                move.TheNode =TheNode;
                move.aNode =aNode;
                move.route1 =routeNum1;
                move.route2 =routeNum2;
                move.tempObj =tempSolObj;
                // every evaluated move goes straight into the shared concurrent_vector
                (*pMoveList).push_back(move);
            }
        }
    }

    PSwap(myType* s,concurrent_vector<OneMove>* pl):
        sol(s),pMoveList(pl) {}

};

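// Note: template arguments were lost in the post; <OneMove>, <myType> and <short> are inferred from context.
template <typename myType>
static void DoPSwap(myType* s, concurrent_vector<OneMove>* pl, int size)
{
    PSwap<myType> pswap(s,pl);
    parallel_for(blocked_range<short>(1,size),pswap,auto_partitioner());
}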
jimdempseyatthecove
Black Belt
Can you post your code, or at least a small test case that illustrates your problem?
Alexey_K_Intel3
Employee

One idea: look at TBB convex_hull example to see how concurrent_vector can be used in parallel loops. The idea is to collect local data in a normal vector, without locks, and then copy all at once into concurrent_vector. In our experience with convex_hull, this method was more efficient compared to push_backs. Check whether it improves performance of your code above.
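
A minimal sketch of that pattern, not the convex_hull code itself. So that it compiles without TBB, std::thread stands in for parallel_for and a mutex-guarded std::vector stands in for the concurrent_vector; collectMoves and evaluatePair are hypothetical names made up for this illustration, and evaluatePair is only a placeholder for the real swap evaluation. The point is the shape of the loop body: each worker fills a private buffer with no synchronization, then appends it to the shared container exactly once.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Same layout as the OneMove struct in the question.
struct OneMove {
    short TheNode;
    short aNode;
    short route1;
    short route2;
    float tempObj;
};

// Hypothetical placeholder for the real swap evaluation.
static float evaluatePair(short a, short b)
{
    return 0.5f * static_cast<float>(a) + static_cast<float>(b);
}

// Pairs every node in [1, nodeCount) with every candidate in [1, candidateCount),
// skipping identical indices, and returns all resulting moves.
std::vector<OneMove> collectMoves(int nodeCount, int candidateCount, int workerCount)
{
    std::vector<OneMove> shared;   // stand-in for the concurrent_vector
    std::mutex sharedMutex;
    std::vector<std::thread> workers;

    if (workerCount < 1) workerCount = 1;
    const int total = nodeCount > 1 ? nodeCount - 1 : 0;
    const int chunk = (total + workerCount - 1) / workerCount;

    for (int w = 0; w < workerCount; ++w) {
        const int begin = 1 + w * chunk;
        const int end = std::min(begin + chunk, nodeCount);
        if (begin >= end) break;   // no work left for further workers

        workers.emplace_back([&shared, &sharedMutex, candidateCount, begin, end] {
            std::vector<OneMove> local;   // private buffer, no locking needed
            for (int i = begin; i < end; ++i) {
                for (int j = 1; j < candidateCount; ++j) {
                    if (i == j) continue;
                    const short a = static_cast<short>(i);
                    const short b = static_cast<short>(j);
                    local.push_back(OneMove{a, b, 0, 0, evaluatePair(a, b)});
                }
            }
            // One bulk append per worker instead of one shared push_back per move.
            std::lock_guard<std::mutex> lock(sharedMutex);
            shared.insert(shared.end(), local.begin(), local.end());
        });
    }
    for (std::thread& t : workers) t.join();
    return shared;
}
```

For example, collectMoves(5, 4, 3) returns 9 moves whatever the worker count, although their order depends on which worker finishes first. In the TBB version the private vector would live inside operator(), and the single append could use the iterator-range overload of concurrent_vector::grow_by rather than a lock; treat that as a suggestion to try and measure, not as a guaranteed win.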
