Ian_K_2
Beginner
92 Views

Error importing shell functions

We use Intel MPI by default on our clusters, and we've recently run into a problem that I think is caused by the way it passes environment variables through to the tasks.

If you export a shell function in bash:

function hw() { echo 'Hello world!'; }
export -f hw

The function is then available to child processes, and its definition can be seen with env:

$ env | grep -A1 hw
BASH_FUNC_hw()=() {  echo 'Hello world!'
}
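
As a quick check that the export itself works independently of MPI, the sketch below defines and exports the function in one bash process and calls it from a child bash. It is wrapped in `bash -c` so that it behaves the same whatever shell it is pasted into; the variable name `child_output` is only for illustration.

```shell
# Define and export hw in one bash, then call it from a child bash.
# The child only knows about hw through the environment it inherits.
child_output=$(bash -c 'hw() { echo "Hello world"; }; export -f hw; bash -c hw')
printf '%s\n' "$child_output"
```

If the child prints the greeting, the function definition survived an ordinary fork/exec, which points at the launcher rather than at bash.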

Using `mpirun` with this set, without any modifications to the environment, gives the following error:

bash: hw: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_hw'

We see this in 2018.3 and 2019.4, the most recent versions we have installed on our clusters. I tried to get a better handle on what it was doing with strace, but I couldn't find where it was going wrong. As I said above, my guess is that the code that passes environment variables through wasn't written with exported functions in mind and can't handle them.
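
One hypothesis that would fit the "unexpected end of file" message, though nothing in this thread confirms it, is that only the first line of the two-line function value reaches the shell on the other side. The sketch below simulates that truncation locally, without mpirun. It assumes a post-Shellshock bash that stores exported functions in `BASH_FUNC_*` variables; the variable names `entry`, `name`, `first_line` and `import_err` are illustrative.

```shell
# Ask bash how it encodes an exported function. grep keeps only the first
# line of the two-line value, which is the truncation being simulated.
entry=$(bash -c 'hw() { echo "Hello world"; }; export -f hw; env | grep "^BASH_FUNC_hw"')

# Split "NAME=VALUE" at the first "=". The exact spelling of NAME depends on
# the bash build (for example BASH_FUNC_hw() or BASH_FUNC_hw%%).
name=${entry%%=*}
first_line=${entry#*=}

# Start a new bash with the truncated definition and capture its stderr.
import_err=$(env "$name=$first_line" bash -c true 2>&1)
printf '%s\n' "$import_err"
```

If the captured text resembles the two error lines above, a truncation at the newline somewhere in the launch chain becomes a plausible candidate, but this does not show where it would happen.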

Has anyone else seen this problem? I'd guess the ability to export shell functions doesn't get much use, but the module command in Environment Modules 4.4.x does it, so I would have thought it would come up elsewhere.

PrasanthD_intel
Moderator

Hi Ian,

Thanks for reaching out to us.

Could you please share the following:
    1) The MPI code which uses the shell function
    2) The mpirun command you are using

Thanks
Prasanth

Ian_K_2
Beginner

This happens with both MPI and non-MPI code run under mpirun: it makes no difference whether the executable is something trivial like "/bin/true" or the simple MPI pi-calculating example we use for training. Neither uses the shell functions internally, and both execute successfully.

This is with the mpirun command on 64-bit Linux (Red Hat Enterprise Linux 7.6) from either Intel MPI 2018.3.222 or 2019.4.243. I haven't tried any other versions.

Full paths to those if relevant:

/shared/ucl/apps/intel/2018.Update3/impi/2018.3.222/intel64/bin/mpirun
/shared/ucl/apps/intel/2019.Update4/impi/2019.4.243/intel64/bin/mpirun

Running as:

mpirun -machinefile $TMPDIR/machines -np 80 /bin/true

which in turn is actually running, for example:

mpiexec.hydra -machinefile /tmpdir/job/30889.undefined/sge_machinefile_uccaiki.34978 -machinefile /tmpdir/job/30889.undefined/machines -np 80 /bin/true

(The fact that it's using two machinefiles is almost certainly an artefact of some automatic SGE integration I wasn't aware of before trying to find this out, and I assume shouldn't affect the issue I'm talking about.)
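
To test whether exported functions are really the trigger, the same command can be run from an environment that has had them removed. The sketch below is one way to do that, not something tried in this thread: a fresh bash imports whatever functions the environment carries, clears their export flag, and then execs the real command. The names `strip_and_exec` and `run_without_exported_functions` are hypothetical.

```shell
# Inner script: runs in a fresh bash, un-exports every function that bash
# imported from the environment, then replaces itself with the real command.
strip_and_exec='for fn in $(compgen -A function); do export -n -f "$fn"; done; exec "$@"'

run_without_exported_functions() {
    # The second "bash" fills $0 of the inner script; "$@" is the command to run.
    bash -c "$strip_and_exec" bash "$@"
}

# Hypothetical usage with the command line from this post:
# run_without_exported_functions mpirun -machinefile "$TMPDIR/machines" -np 80 /bin/true
```

Note that anything launched this way also loses functions it might want, such as the module function from Environment Modules, so this is better suited to diagnosis than to production job scripts.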

PrasanthD_intel
Moderator

Hi Ian,

We tried to reproduce this but did not see the errors you mentioned.

We tested with both of the Intel MPI versions you mentioned.

Could you provide the log output from a run after exporting these two variables:

I_MPI_DEBUG=5

FI_LOG_LEVEL=debug
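
A minimal sketch of how such a log could be captured, assuming the two variables above are set only for the one command and that stdout and stderr are both wanted in the log. The function name `run_with_mpi_debug`, the `MPI_DEBUG_LOG` variable and the default log file name are illustrative, not part of Intel MPI.

```shell
run_with_mpi_debug() {
    # Set the two debug variables for this command only, merge stderr into
    # stdout, and keep a copy of everything in a log file.
    I_MPI_DEBUG=5 FI_LOG_LEVEL=debug "$@" 2>&1 | tee "${MPI_DEBUG_LOG:-mpirun_debug.log}"
}

# Hypothetical usage with the command line from earlier in the thread:
# run_with_mpi_debug mpirun -machinefile "$TMPDIR/machines" -np 80 /bin/true
```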

Also, if possible, could you please share the machine file you are using?

Thanks
Prasanth

PrasanthD_intel
Moderator

Hi Ian,

Are you still facing the issue? Could you check with the latest version, IMPI 2019.6?

Let us know if you have any other queries.

Thanks
Prasanth

Ian_K_2
Beginner

Apologies, I've been prevented from working on this due to industrial action and other concerns.

The environment variables you suggested did not provide any additional output, sadly.

We do not currently have IMPI 2019.6 installed on the cluster I am testing this on, so it will have to be installed first.

To be honest, though, it's possible this is a problem with how the startup process integrates with our scheduler, which would mean this wouldn't help. Because the whole SSH chain breaks if you try to ptrace it, there's a limited amount of inspection I can do there.

I'll see what else I can do.

PrasanthD_intel
Moderator

Hi Ian,

We are forwarding this issue to the respective engineering team.

We will get back to you soon.

Thanks
Prasanth