Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Parallel Option

valerio19
Beginner
Dear all,

I have a question regarding the option "Qparallel".

It generates multithreaded code for loops.

My question is: is this option more efficient, or is it better to parallelize the loop using OpenMP? Can I use both at the same time?

Thanks
4 Replies
TimP
Honored Contributor III
A few weeks ago I would have been prepared to answer this more dogmatically, but now I've had experience both ways.
You should be able to get the best of both, as the current /Qparallel uses OpenMP in a compatible way behind the scenes. Don't count on that remaining true for future compiler versions.
An OpenMP region should prevent the compiler from attempting auto-parallelization inside that region, while auto-parallelization can still work outside OpenMP regions.
So it's a valid strategy to start with /Qparallel and work on regions which need improvement with OpenMP.

My recent case where /Qparallel was more efficient than OpenMP arose where the private list was so big that the Intel compiler failed. The GNU compiler ran correctly but got no advantage from OpenMP. It appears that efficient OpenMP may require careful programming to minimize the number of privates. On the other hand, if you can figure out the optimum way to parallelize with OpenMP, you may do better than /Qparallel.
If you organize your code correctly for OpenMP, it may also work better with /Qparallel when you turn off OpenMP.
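As a rough illustration of mixing the two, here is a minimal sketch. The function names scale_add and dot are made up for this example, and whether the auto-parallelizer actually threads the first loop depends on the compiler version and its cost model. The first loop is left unannotated as a candidate for /Qparallel; the second is annotated by hand, and the pragma is simply ignored when OpenMP is switched off.

```cpp
#include <vector>

// Hypothetical example: iterations are independent, so this loop is a
// candidate for the auto-parallelizer. It carries no OpenMP directive.
// Assumes a and b are at least as long as out.
void scale_add(std::vector<double>& out, const std::vector<double>& a,
               const std::vector<double>& b, double k)
{
    const long n = static_cast<long>(out.size());
    for (long i = 0; i < n; ++i)
        out[i] = a[i] + k * b[i];
}

// Hypothetical example: a reduction annotated by hand. Built without
// OpenMP, the pragma is ignored and the loop runs serially (or is left
// to the auto-parallelizer). Assumes b is at least as long as a.
double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Keeping the loop bodies free of cross-iteration dependences is what lets the same source work either way.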
valerio19
Beginner
With auto-parallelization, is it possible to set the number of threads, for example by means of an environment variable?
Om_S_Intel
Employee
Yes. You may use the same environment variable for OpenMP and /Qparallel to set the number of threads. This is OMP_NUM_THREADS.
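The threading runtime reads OMP_NUM_THREADS by itself, so no code is needed for the setting to take effect. If you want your program to log what was requested, a small helper along these lines might do; requested_thread_count is a made-up name, and the sketch only parses the environment variable rather than querying the runtime.

```cpp
#include <cstdlib>

// Hypothetical helper: returns the thread count requested through
// OMP_NUM_THREADS, or 0 when the variable is unset or does not start
// with a positive integer. If a comma-separated list is given, only
// the first value is read.
int requested_thread_count()
{
    const char* text = std::getenv("OMP_NUM_THREADS");
    if (text == nullptr)
        return 0;

    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (end == text || value <= 0 || value > 1000000)
        return 0;  // not a number, or outside a sane range

    return static_cast<int>(value);
}
```

On Windows you would typically set the variable in the command prompt before launching the program, for example with set OMP_NUM_THREADS=4.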
jimdempseyatthecove
Honored Contributor III
>>... where the private list was so big that Intel compiler failed.

At this point you should consider encapsulating the body of the loop in a function whose local variables are what used to be the privates. If your shared list is large, this may mean a large number of arguments on the call (you win some and lose some).

A different tactic would be to create one or two structs that hold the private (and shared) variables, instantiate them, and then PRIVATE/SHARED the containers.

Your code would change to:

struct privates_t
{
    ...
};
struct shared_t
{
    ...
};

privates_t privates;
shared_t shares;
...
#pragma omp parallel for private(privates), shared(shares)
for (int i = 1; i < whatever; ++i)
{
    privates.X = shares.Y * ...;
    ...
}

To me, encapsulating the body of the loop is cleaner and less error-prone.
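For comparison, the encapsulation approach might look something like the sketch below. The names process_element and process_all and the arithmetic inside are invented for illustration. The point is only that the temporaries are ordinary locals of the called function, so each thread gets its own copies and no private(...) list is needed.

```cpp
#include <vector>

// Hypothetical loop body. Every temporary is a local variable here, so
// it is automatically private to whichever thread makes the call.
static double process_element(double y, double scale)
{
    const double t1 = y * scale;        // formerly a private
    const double t2 = t1 * t1 + 1.0;    // formerly a private
    return t2 - y;
}

// The parallel loop only passes the shared inputs as arguments.
// Assumes y is at least as long as x.
void process_all(std::vector<double>& x, const std::vector<double>& y,
                 double scale)
{
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        x[i] = process_element(y[i], scale);
}
```

The cost is the one mentioned above: anything shared that the body needs has to be passed in, so a long shared list turns into a long argument list.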

Jim Dempsey

