I have tested the code with different numbers of threads, and there seems to be no obvious difference in running time when using anywhere from 1 to 8 threads.
Any advice? Thanks. One of the parallel_for body functions is defined as follows:
void operator() (const blocked_range
// find the routeNum of the candidate route
// find a node to move theNode in a candidate route
for(unsigned int j=1;j
//abolish those in the same route
//evaluate the obj of swap the two nodes
static void DoPSwap(myType* s, concurrent_vector
One idea: look at the TBB convex_hull example to see how concurrent_vector can be used in parallel loops. The idea is for each task to collect its results in an ordinary local vector, which needs no locks, and then copy them all at once into the concurrent_vector. In our experience with convex_hull, this was more efficient than calling push_back on the concurrent_vector for every element. Check whether it improves the performance of your code above.
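To make the suggestion concrete, here is a rough sketch of the "collect locally, then copy once" pattern. It is not the convex_hull code and not your DoPSwap: it uses only the standard library so it can be compiled without TBB, with a mutex-guarded std::vector standing in for concurrent_vector and plain std::thread standing in for parallel_for. The names collect_batched and keep_candidate are made up for illustration, and the "keep even numbers" test is just a placeholder for whatever your loop body evaluates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-candidate evaluation in the post:
// here a candidate is kept simply when it is even.
inline bool keep_candidate(int x) { return x % 2 == 0; }

// Each worker scans its own slice of the input, gathers hits in a private
// vector (no locks, no sharing), then appends them to the shared result
// in a single bulk operation.
std::vector<int> collect_batched(const std::vector<int>& input, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<int> shared;   // stand-in for the concurrent_vector
    std::mutex shared_mutex;   // taken once per worker, not once per element
    std::vector<std::thread> workers;
    const std::size_t chunk = (input.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(input.size(), t * chunk);
        const std::size_t end = std::min(input.size(), begin + chunk);
        workers.emplace_back([&, begin, end] {
            std::vector<int> local;  // private to this worker
            for (std::size_t i = begin; i < end; ++i)
                if (keep_candidate(input[i])) local.push_back(input[i]);
            // One synchronized bulk append instead of many shared push_backs.
            std::lock_guard<std::mutex> lock(shared_mutex);
            shared.insert(shared.end(), local.begin(), local.end());
        });
    }
    for (auto& w : workers) w.join();
    return shared;
}
```

Calling collect_batched(values, 4) returns every kept element, in an order that depends on which worker finishes first. In a TBB version the local vector would live inside operator() for each blocked_range, and the final copy would go into the concurrent_vector in one step (as far as I recall, grow_by has an overload taking an iterator range for this, but please check it against your TBB version).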