Intel® Fortran Compiler

Only one cpu available for child process after OpenMP library routine call

JantN

One of my Fortran programs calls another Fortran program using a CALL SYSTEM statement. I run the main program in a Slurm queue on Linux, with 32 CPUs per task, a single task per node, and a single node. OMP_GET_NUM_PROCS() returns 32 in the main program. In the called (child) program, OMP_GET_NUM_PROCS() also returns 32, provided the main program launches it before calling any OpenMP routine. If the main program calls OMP_GET_NUM_PROCS() first and then launches the child program, OMP_GET_NUM_PROCS() in the child returns 1.

The call of the underlying program is not in a parallel region. The maximum number of threads is 10 and is not affected by calling OpenMP routines.

What is happening here? Is the OpenMP library assigning all cores to the main program? Is it modifying the environment variables passed on to the child process? Is there an environment variable associated with OMP_GET_NUM_PROCS?

More importantly, what can I do to avoid it?

Thanks.

jimdempseyatthecove

The O/S permits a process, say your master process, to specify a subset of the system "logical processors" (hardware threads) on which to run its code.

Further, that process can elect to use a subset of those system-provided hardware threads (the process affinity mask).

Further, the process can create additional threads. Those threads initially assume the affinity of the spawning thread (usually the master thread of the main process); subsequently, a newly spawned thread can restrict itself to a subset of the process's (sub)set of hardware threads.

Now, when any thread of a process spawns a child process, the child (depending on the O/S) inherits by default the environment, including the affinity mask, of the spawning thread, which is not necessarily the affinity of the process as a whole.
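The inheritance described above can be observed directly on Linux. The following is a minimal sketch in Python rather than Fortran, only because the standard library exposes the affinity calls; it is not what the OpenMP runtime does internally. The helper names child_cpu_count and pinned_child_cpu_count are made up for this illustration. The parent pins its calling thread to a single CPU, spawns a child, and the child reports how many CPUs it is allowed to use.

```python
import os
import subprocess
import sys

# One-line child program: report how many CPUs the child may run on.
CHILD_CODE = "import os; print(len(os.sched_getaffinity(0)))"


def child_cpu_count():
    """Spawn a child process and return the size of its affinity mask."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def pinned_child_cpu_count(cpu):
    """Pin the calling thread to one CPU, spawn a child, then restore the mask."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})  # on Linux, pid 0 means the calling thread
    try:
        return child_cpu_count()
    finally:
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    first_cpu = min(os.sched_getaffinity(0))
    print("child before pinning:", child_cpu_count())
    print("child while parent is pinned:", pinned_child_cpu_count(first_cpu))
```

If the thread that issues CALL SYSTEM has been pinned in the same way (for example by an affinity setting applied when the OpenMP runtime initializes), the child would be expected to start with that narrowed mask.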

One way to fix this is to NOT use affinity settings in the parent process that pin the OpenMP thread issuing the CALL SYSTEM to fewer hardware threads than you want the child to use.

A different option (to explore) is, prior to the first parallel region in the spawned process, to (attempt to) query the system affinity and then set the process affinity to the system affinity. Note that you may be inhibited from doing this.
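A rough sketch of that second option, again in Python on Linux and under stated assumptions: the function name widen_affinity is hypothetical, and CPU ids are assumed to run from 0 to os.cpu_count() - 1. The call asks for every CPU; the kernel keeps only those the process is actually allowed to use (for example, the CPUs granted by the batch system), and the request can fail outright, in which case the mask is left as it was.

```python
import os


def widen_affinity():
    """Try to reset the calling thread's affinity to every CPU the OS allows.

    Returns the resulting set of CPU ids. If the request is refused,
    the mask is left unchanged and the current mask is returned.
    """
    # Assumption: CPU ids are contiguous, 0 .. cpu_count - 1.
    all_cpus = set(range(os.cpu_count() or 1))
    try:
        os.sched_setaffinity(0, all_cpus)
    except OSError:
        # The O/S may inhibit the change; keep whatever mask we have.
        pass
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("CPUs before:", sorted(os.sched_getaffinity(0)))
    print("CPUs after: ", sorted(widen_affinity()))
```

In a Fortran child program, the equivalent would presumably be a call to the C library's sched_setaffinity through ISO_C_BINDING before the first parallel region, so that the OpenMP runtime sees the widened mask when it initializes.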

Jim Dempsey

JantN

Thank you, Jim, for the explanation and suggestions.

I removed "export KMP_AFFINITY=verbose,scatter" from the parent process's environment and the problem disappeared.
