Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Nested OpenMP parallel regions with ifort

Gene_Wagenbreth
Beginner
Does ifort support OpenMP nested parallel regions?

If so, what version do I need? I am porting a code to multiple machines, and I need to know whether I have to purchase a new compiler for particular systems.

Thank You
Gene W
3 Replies
Vladimir_P_1234567890
Hello,

The latest compiler should support the OpenMP 3.0 specification. You can try an evaluation version of the compiler to find out whether it works for you.
--Vladimir
Michael_K_Intel2
Employee
Hi Gene,

Intel Composer XE does support nested parallelism. You might need to enable nesting explicitly by calling omp_set_nested(). Please see the OpenMP specification for how to use this runtime call.
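A minimal sketch of enabling nesting, written in C for brevity (the Fortran runtime call has the same name: `call omp_set_nested(.true.)`). The function name `count_nested_threads` and the team sizes are illustrative assumptions, not part of the original post. Note that OpenMP 5.0 deprecated omp_set_nested() in favour of omp_set_max_active_levels(), so the sketch only calls the older routine when the compiler reports a pre-5.0 OpenMP version.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: opens a two-level nested parallel region and
   returns how many innermost threads actually ran. Without OpenMP
   (or with nesting disabled) fewer threads than outer*inner run. */
int count_nested_threads(int outer, int inner)
{
    int total = 0;

#ifdef _OPENMP
#if _OPENMP < 201811
    omp_set_nested(1);              /* pre-5.0 way to enable nesting */
#endif
    omp_set_max_active_levels(2);   /* allow two active nested levels */
    omp_set_dynamic(0);             /* ask for the requested team sizes */
#endif

    #pragma omp parallel num_threads(outer)
    {
        /* Each outer thread becomes the master of its own inner team. */
        #pragma omp parallel num_threads(inner)
        {
            #pragma omp atomic
            total++;
        }
    }
    return total;
}
```

Calling `count_nested_threads(2, 4)` on a runtime that grants every request should report 8; the runtime is still free to deliver fewer threads, for example when a thread limit is in effect. Nesting can usually also be controlled from the environment with OMP_NESTED or OMP_MAX_ACTIVE_LEVELS.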

Please be advised that nested parallelism is supported but can cause trouble when actually used in an application, because nested parallelism in OpenMP has some flaws that need to be taken care of. The main problem is that a nested region also needs to create threads for parallel execution. Given, for instance, an outer region that already runs with 8 threads, you need at least 16 cores in your machine to have 2 threads running in each nested parallel region. Having more than one level of parallel regions essentially exposes you to exponential growth of thread counts, or limits the nested parallel regions to only one thread. You have the choice :-).
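The arithmetic behind that growth can be sketched in a few lines of C. The helper name `threads_needed` is an assumption made for illustration; it simply multiplies the team size requested at each nesting level.

```c
/* Hypothetical helper: total number of threads alive in the innermost
   region when level i requests team_sizes[i] threads per team. */
long threads_needed(const int *team_sizes, int levels)
{
    long total = 1;
    for (int i = 0; i < levels; i++)
        total *= team_sizes[i];   /* every thread spawns its own team */
    return total;
}
```

With an outer team of 8 and inner teams of 2, the product is 16, which matches the core count mentioned above. Three levels of 8 threads each would already ask for 512 threads.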

My advice would be to check whether the OpenMP tasking model works for you. Tasking does not exhibit the same nesting issues as parallel regions. The idea is that you create a single parallel region right at the beginning of your application. Parallelism then comes from the tasks you create and fire up for execution. The tasks will then be scheduled to run on the thread team that you've created.
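A minimal sketch of that pattern in C (the same directives exist in Fortran as `!$omp parallel`, `!$omp single` and `!$omp task`). The names `work` and `run_tasks` are hypothetical placeholders for the application's real computation.

```c
/* Hypothetical unit of work; stands in for the real computation. */
static long work(int i)
{
    return (long)i * i;
}

/* One parallel region creates the thread team; a single thread then
   generates tasks, and the whole team executes them. */
long run_tasks(int n)
{
    long sum = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < n; i++) {
                #pragma omp task firstprivate(i) shared(sum)
                {
                    long r = work(i);
                    #pragma omp atomic
                    sum += r;
                }
            }
        }   /* implicit barrier: all tasks have finished here */
    }
    return sum;
}
```

A task may itself create further tasks, and they all run on the one existing team, so the thread count stays fixed no matter how deeply the work is nested.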

Cheers,
-michael
jimdempseyatthecove
Honored Contributor III
You can also start an outer parallel region with fewer than the full complement of threads. For example, pick team sizes whose product equals the thread count.

On a system with 8 hardware threads you could choose:

8 threads without nesting
2 outer level threads, 4 inner (only 2 levels nested)
4 outer level, 2 inner (only 2 levels nested)
2 outer level, 2 middle level, 2 inner level (3 nested levels)
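The options above are the ordered ways of writing 8 as a product of team sizes of at least 2. A short C sketch can list them for any hardware thread count; the function names `split_levels` and `count_splits` are assumptions made for this illustration.

```c
#include <stdio.h>

/* Recursively splits `remaining` into team sizes >= 2, printing each
   complete split (outermost level first); returns the number found. */
static int split_levels(int remaining, int *sizes, int depth)
{
    if (remaining == 1) {
        for (int i = 0; i < depth; i++) {
            if (i > 0)
                printf(" x ");
            printf("%d", sizes[i]);
        }
        printf("  (%d level%s)\n", depth, depth == 1 ? "" : "s");
        return 1;
    }

    int found = 0;
    for (int d = 2; d <= remaining; d++) {
        if (remaining % d == 0) {
            sizes[depth] = d;   /* team size chosen for this level */
            found += split_levels(remaining / d, sizes, depth + 1);
        }
    }
    return found;
}

/* Hypothetical entry point: counts the splits for hw_threads. */
int count_splits(int hw_threads)
{
    int sizes[32];              /* an int has at most 31 factors >= 2 */
    return split_levels(hw_threads, sizes, 0);
}
```

For 8 hardware threads this finds the same four configurations listed above. Each printed split could then be applied with one `num_threads` clause per nesting level.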

If some of these threads are performing I/O (in other words, they stall), then consider oversubscription, depending on the frequency and duration of the I/O stalls.

Jim Dempsey