We use Intel MPI by default on our clusters, and we've recently run into a problem which, I think, is caused by the way it passes environment variables through to the tasks.
If you export a shell function in bash:
function hw() { echo "Hello world"; }
export -f hw
That is available to child processes, and can be seen with env:
$ env | grep -A1 hw
BASH_FUNC_hw()=() { echo 'Hello world!'
}
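A child bash process can call it directly, which is a quick way to confirm the export has worked:
$ bash -c hw
Hello world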
Using `mpirun` with this set, without any modifications to the environment, gives the following error:
bash: hw: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_hw'
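For reference, nothing special is needed to trigger this; a minimal sketch of the sort of launch involved (-np 2 and /bin/true are just placeholders):
$ mpirun -np 2 /bin/true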
We see this in 2018.3 and 2019.4, the most recent versions we have installed on our clusters. I tried to get a better handle on what it was doing with strace, but I couldn't pin down where it goes wrong. As I said above, I'm guessing the way the environment variables are passed through wasn't written with these functions in mind and can't handle them.
Has anyone else seen this problem? I'd guess the ability to export shell functions doesn't get much use, but the module command in Environment Modules 4.4.x does it, so I would have thought it would come up elsewhere.
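For anyone wanting to check whether their environment is affected: exported functions show up in env under the BASH_FUNC_ prefix, so a quick sketch of a check is:
$ env | grep '^BASH_FUNC_'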
- Tags:
- Cluster Computing
- General Support
- Intel® Cluster Ready
- Message Passing Interface (MPI)
- Parallel Computing
Hi Ian,
Thanks for reaching out to us.
Could you please share the following:
1) The MPI code that uses the shell function
2) The mpirun command you are using.
Thanks
Prasanth
Dwadasi, Prasanth (Intel) wrote:
Hi Ian,
Thanks for reaching out to us.
Could you please share the following:
1) The MPI code that uses the shell function
2) The mpirun command you are using.
Thanks
Prasanth
This happens when running both MPI and non-MPI code with mpirun: the error appears whether the executable I'm running is something really simple like "/bin/true" or a simple MPI pi-calculating example code we use for training. Neither uses the shell functions internally, and both execute successfully despite the error messages.
This is with the mpirun command on 64-bit Linux (Red Hat Enterprise Linux 7.6) from either Intel MPI 2018.3.222 or 2019.4.243. I haven't tried any other versions.
Full paths to those if relevant:
/shared/ucl/apps/intel/2018.Update3/impi/2018.3.222/intel64/bin/mpirun
/shared/ucl/apps/intel/2019.Update4/impi/2019.4.243/intel64/bin/mpirun
Running as:
mpirun -machinefile $TMPDIR/machines -np 80 /bin/true
which in turn is actually running, for example:
mpiexec.hydra -machinefile /tmpdir/job/30889.undefined/sge_machinefile_uccaiki.34978 -machinefile /tmpdir/job/30889.undefined/machines -np 80 /bin/true
(The fact that it's using two machinefiles is almost certainly an artefact of some automatic SGE integration I wasn't aware of before I started digging into this, and I assume it shouldn't affect the issue I'm describing.)
Hi Ian,
We tried to reproduce this but did not see any of the errors you mentioned.
We tried both of the IMPI versions you mentioned.
Could you provide the log output from a run after exporting these two variables:
I_MPI_DEBUG=5
FI_LOG_LEVEL=debug
Also, if possible, could you please share the machine file you are using.
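For example, a quick sketch of how that could be captured (reusing the mpirun line from earlier in the thread; the log file name is arbitrary):
export I_MPI_DEBUG=5
export FI_LOG_LEVEL=debug
mpirun -machinefile $TMPDIR/machines -np 80 /bin/true 2>&1 | tee impi_debug.log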
Thanks
Prasanth
Hi Ian,
Are you still facing the issue? Could you check with the latest IMPI 2019.6 version?
Let us know if you have any other queries.
Thanks
Prasanth
Dwadasi, Prasanth (Intel) wrote:
Hi Ian,
Are you still facing the issue? Could you check with the latest IMPI 2019.6 version?
Let us know if you have any other queries.
Thanks
Prasanth
Apologies, I've been prevented from working on this due to industrial action and other concerns.
The environment variables you suggested did not provide any additional output, sadly.
We do not currently have IMPI 2019.6 installed on the cluster I am testing this on, so this will have to be done first.
To be honest, though, it's possible this is a problem with how the startup process integrates with our scheduler, in which case upgrading wouldn't help. Because the whole SSH chain breaks if you try to ptrace it, there's a limited amount of inspection I can do there.
I'll see what else I can do.
Hi Ian,
We are forwarding this issue to the respective engineering team.
We will get back to you soon.
Thanks
Prasanth