Intel® MPI Library

Error importing shell functions

Ian_K_2
Beginner

We use Intel MPI by default on our clusters, and we've recently run into a problem that I think is caused by the way it passes environment variables through to the tasks.

If you export a shell function in bash:

function hw() { echo 'Hello world!'; }
export -f hw

That is available to child processes, and can be seen with env:

$ env | grep -A1 hw
BASH_FUNC_hw()=() {  echo 'Hello world!'
}

Using `mpirun` with this function exported, and without any other modifications to the environment, gives the following error:

bash: hw: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_hw'
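
For completeness, the whole reproduction fits in a few lines; nothing about it is specific to our codes, and /bin/true is just a convenient no-op:

# Export a trivial shell function, then launch any executable under mpirun.
function hw() { echo 'Hello world!'; }
export -f hw
# The launched program never calls hw; the error presumably comes from bash
# re-importing the forwarded environment on the remote side.
mpirun -np 2 /bin/true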

We see this in 2018.3 and 2019.4, the most recent versions we have installed on our clusters. I tried to get a better handle on what it was doing with strace, but I couldn't manage to pin down where it goes wrong -- as I said above, I'm guessing the way the environment variables are passed through wasn't written with exported functions in mind and can't handle them.
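
For reference, this is roughly the kind of tracing I was attempting (-v makes strace print the full environment passed to each execve):

# Follow child processes, log every execve, then look for the exported
# function in the environments being handed down the launch chain.
strace -f -v -e trace=execve -o mpirun.strace mpirun -np 2 /bin/true
grep BASH_FUNC mpirun.strace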

Has anyone else seen this problem? I'd guess the ability to export shell functions doesn't get much use, but the module command in Environment Modules 4.4.x does it, so I would have thought it would come up elsewhere.
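
To illustrate the Environment Modules case: the module function is exported under the same BASH_FUNC_ naming as the hw example above, so it's easy to check whether it's sitting in the environment that mpirun will forward:

# List any exported shell functions in the current environment; with
# Environment Modules 4.4.x loaded, BASH_FUNC_module() should be among them.
env | grep '^BASH_FUNC_'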

PrasanthD_intel
Moderator

Hi Ian,

Thanks for reaching out to us.

Could you please share the following:

1) The MPI code which uses the shell function.
2) The mpirun command you are using.

Thanks,
Prasanth

Ian_K_2
Beginner

This happens when running both MPI and non-MPI code with mpirun: the error appears whether the executable is something trivial like /bin/true or a simple MPI pi-calculating example we use for training. Neither uses the shell function internally, and both still execute successfully despite the error.

This is with the mpirun command on 64-bit Linux (Red Hat Enterprise Linux 7.6) from either Intel MPI 2018.3.222 or 2019.4.243; I haven't tried any other versions.

Full paths to those if relevant:

/shared/ucl/apps/intel/2018.Update3/impi/2018.3.222/intel64/bin/mpirun
/shared/ucl/apps/intel/2019.Update4/impi/2019.4.243/intel64/bin/mpirun

Running as:

mpirun -machinefile $TMPDIR/machines -np 80 /bin/true

which in turn is actually running, for example:

mpiexec.hydra -machinefile /tmpdir/job/30889.undefined/sge_machinefile_uccaiki.34978 -machinefile /tmpdir/job/30889.undefined/machines -np 80 /bin/true

(The fact that it's using two machinefiles is almost certainly an artefact of some automatic SGE integration I wasn't aware of before trying to track this down, and I assume it shouldn't affect the issue I'm describing.)
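
If it's useful, I can also dump what Hydra itself is doing; assuming I_MPI_HYDRA_DEBUG behaves as documented, something along these lines should show the launch commands it generates under SGE:

# Sketch only: print Hydra's debug output for the same run and keep the
# first part, where the generated launch/bootstrap commands appear.
I_MPI_HYDRA_DEBUG=1 mpirun -machinefile $TMPDIR/machines -np 80 /bin/true 2>&1 | head -n 50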

PrasanthD_intel
Moderator

Hi Ian,

We tried to reproduce this but have not seen the errors you mentioned.

We tried with both of the IMPI versions you listed.

Could you provide the log output from a run after exporting these two variables:

I_MPI_DEBUG=5

FI_LOG_LEVEL=debug
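
For example, something along these lines (the run line and log file name are just placeholders for your own setup):

# Enable Intel MPI and libfabric debug output, then rerun the failing case
# and capture everything it prints.
export I_MPI_DEBUG=5
export FI_LOG_LEVEL=debug
mpirun -machinefile $TMPDIR/machines -np 80 /bin/true 2>&1 | tee impi_debug.log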

Also, if possible, could you please share the machine file you are using.

Thanks,
Prasanth
PrasanthD_intel
Moderator

Hi Ian,

Are you still facing the issue? Could you check with the latest IMPI 2019.6 release?

Let us know if you have any other queries.

Thanks,
Prasanth

Ian_K_2
Beginner

Apologies, I've been prevented from working on this due to industrial action and other concerns.

The environment variables you suggested did not provide any additional output, sadly.

We do not currently have IMPI 2019.6 installed on the cluster I am testing on, so that will have to be installed first.

To be honest, though, it's possible this is a problem with how the startup process integrates with our scheduler, in which case a newer version wouldn't help. Because the whole SSH chain breaks if you try to ptrace it, there's a limited amount of inspection I can do there.
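
One thing I can still check without tracing anything is what environment actually arrives at the remote ranks, along these lines:

# Run env itself as the "MPI program" and see whether the exported function
# (or a mangled copy of it) reaches the remote hosts at all.
mpirun -machinefile $TMPDIR/machines -np 2 /usr/bin/env | grep BASH_FUNC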

I'll see what else I can do.

PrasanthD_intel
Moderator

Hi Ian,

We are forwarding this issue to the respective engineering team.

We will get back to you soon.

Thanks,
Prasanth
