
Problems with shell in MPSS 3.3

Holger_A_
Beginner

Hello,

I have a problem with the recent MPSS 3.3 version. For some reason, busybox does not seem to be used, and bash is not working properly. Let me explain what I experienced.

On a freshly booted MIC, the symlink /bin/sh points to bash:

h_zimm01@sl270-02-mic7:~$ ls -l /bin/sh
lrwxrwxrwx    1 root     root             9 Jan  1  1970 /bin/sh -> /bin/bash

This leads to the very strange behavior that the directory a shell is started in is undefined:

[h_zimm01@sl270-02 ~]$ ssh mic7 pwd
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Invalid argument
pwd: error retrieving current directory: getcwd: cannot access parent directories: Socket operation on non-socket

This has nasty effects when you want to start programs on the MIC from the host. With an interactive login, it works as expected:

[h_zimm01@sl270-02 ~]$ ssh mic7

h_zimm01@sl270-02-mic7:~$ pwd
/home/h/h_zimm01

I was able to work around this problem by pointing the symlink /bin/sh to busybox and changing /etc/passwd so that the user's default shell is /bin/sh:

[h_zimm01@sl270-01 ~]$ ssh mic0

[h_zimm01@sl270-01-mic0 h_zimm01]$ ls -l /bin/sh
lrwxrwxrwx 1 root root 7 Sep 11 17:23 /bin/sh -> busybox
[h_zimm01@sl270-01-mic0 h_zimm01]$ grep h_zimm01 /etc/passwd
h_zimm01:x:154525:200:Holger A:/home/h/h_zimm01:/bin/sh

OK, up to this point I could get things running, but now comes my unsolved problem. When I for example log in to the MIC and try to run a code, the necessary libraries must be in /lib64 and executables like mpirun in /bin. Even setting the LD_LIBRARY_PATH does not have an effect.

I really do not want to copy all my libraries to /lib64. Since we are running a cluster, I cannot predict which libraries our users will need. So I want to copy all MPI libraries and so on to a path on the host, for example /micfs/lib, and export it via NFS to the MICs. But without LD_LIBRARY_PATH being recognized, this won't work.

How can I get LD_LIBRARY_PATH to be honored (in busybox)? Or is there a better solution for the getcwd problem described above?

Thank you for your help.

 

4 Replies
Frances_R_Intel
Employee

The shell /bin/bash should exist on the coprocessor as a distinct executable:

faroth@knightscorner5-mic0:~$ ls -l /bin/bash
-rwxr-xr-x 1 root root 934016 Jul 10 18:59 /bin/bash

I don't think that is where your problem lies and I wouldn't recommend resetting /bin/sh to point to busybox. Try the following to see if you can execute pwd when bash is running (it doesn't run when you say only 'ssh mic0 pwd'):

faroth@knightscorner5:~$ ssh mic0 ps
   PID TTY          TIME CMD
 10894 ?        00:00:00 sshd
 10895 ?        00:00:00 ps
faroth@knightscorner5:~$ ssh mic0 bash -c 'ps;pwd'
   PID TTY          TIME CMD
 10898 ?        00:00:00 sshd
 10899 ?        00:00:00 bash
 10900 ?        00:00:00 ps
 /home/faroth

 

Is /home being NFS mounted? If so, can you tell me what options are being used to mount it? The behavior you are seeing implies that, for some reason, the files are not mounted when you execute 'ssh mic0 pwd'.
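One way to collect the mount options asked about above is to read the mounts table on the coprocessor. This is only a sketch: the helper name mount_options_for is made up for illustration, and it assumes /proc/mounts is available and that the mount point of interest is /home.

```shell
# Hypothetical helper: print filesystem type and mount options for one
# mount point, reading a table in /proc/mounts format
# (device, mount point, fstype, options, dump, pass).
mount_options_for() {
    # $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
    awk -v mp="$1" '$2 == mp { print $3, $4 }' "${2:-/proc/mounts}"
}

# Usage on the coprocessor (not run here):
#   mount_options_for /home
```

Running it once in an interactive login and once via a non-interactive ssh command would show whether the two sessions see the same mount.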

Setting LD_LIBRARY_PATH on the coprocessor should set the search path for the relocatable libraries as you expect. If you are running an offload program on the host, you would want to set MIC_LD_LIBRARY_PATH on the host to force the path to be sent to the coprocessor. But if you are setting the environment variable on the coprocessor or if you are passing it as an environment variable to mpirun, LD_LIBRARY_PATH should work. Is the variable being exported properly? Does it work if you leave /bin/sh pointing to /bin/bash?
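The two cases described above can be sketched roughly as follows. The host name mic0, the directory /micfs/lib, the program name ./my_app, and the helper build_remote_cmd are illustrative assumptions, not MPSS settings.

```shell
# Hypothetical values; adjust to your cluster.
MIC_HOST=${MIC_HOST:-mic0}
MIC_LIB_DIR=${MIC_LIB_DIR:-/micfs/lib}

# Case 1: offload program started on the host. MIC_LD_LIBRARY_PATH is set
# on the host so that the path is sent to the coprocessor.
MIC_LD_LIBRARY_PATH=$MIC_LIB_DIR
export MIC_LD_LIBRARY_PATH

# Case 2: native program started over ssh. Build a command line that sets
# LD_LIBRARY_PATH on the coprocessor side for that one command.
# The ${VAR:+...} part is left unexpanded on purpose so that the remote
# shell appends its own existing value, if it has one.
# Note: arguments are joined with spaces and are not re-quoted.
build_remote_cmd() {
    # $1 = library directory, remaining arguments = program and its options
    lib_dir=$1
    shift
    printf 'LD_LIBRARY_PATH=%s${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} %s' "$lib_dir" "$*"
}

# Usage (not run here):
#   ssh "$MIC_HOST" "$(build_remote_cmd "$MIC_LIB_DIR" ./my_app)"
```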

Holger_A_
Beginner

Hi Frances, yes, my home is NFS mounted, but not by an automounter, so I doubt that the problem stems from the home directory being mounted too late. Fortunately, I could trace the problem to the NFS mount: when the home directory resides locally on the coprocessor, the problem disappears. Next I will try to mount the home via the native BeeGFS client. Unfortunately, it does not run with MPSS 3.3 at the moment, but the BeeGFS support team is working on this. I will report back if this brings a solution.

TaylorIoTKidd
New Contributor I

Holger,

Did you fix your issue?

Regards
--
Taylor
 

Evan_P_Intel
Employee

Holger A. wrote:

When I for example log in to the MIC and try to run a code, the necessary libraries must be in /lib64 and executables like mpirun in /bin. Even setting the LD_LIBRARY_PATH does not have an effect.

I'm not sure it remains relevant, but I thought I would comment on this portion.

Customizing PATH is difficult when /bin/sh is a symlink to busybox because busybox implements a (merely) POSIX-compatible Almquist shell derivative; unlike GNU bash interactive shells, a POSIX-compatible interactive shell (which is what ssh creates) doesn't have a predefined script name (e.g. ~/.bashrc) it sources on startup. Instead, it sources nothing unless the environment variable ENV is set to a filename; arranging for ENV to be defined on the Xeon Phi side is its own problem.
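As an illustration of the ENV mechanism described above, here is a minimal sketch of a startup file for a POSIX shell. The file name ~/.shrc, the directory /micfs/bin, and the helper write_sh_startup_file are all hypothetical; how ENV gets defined for ssh sessions on the coprocessor remains the open problem mentioned above.

```shell
# Hypothetical helper: write a small startup file that prepends a
# directory to PATH. The literal text $PATH is written to the file so
# that it is expanded later, when a shell sources the file.
write_sh_startup_file() {
    # $1 = file to create, $2 = directory to prepend to PATH
    printf 'PATH=%s:$PATH\nexport PATH\n' "$2" > "$1"
}

# Usage sketch (not run here):
#   write_sh_startup_file "$HOME/.shrc" /micfs/bin
#   ENV=$HOME/.shrc sh -i    # an interactive POSIX sh sources the file named by ENV
```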

The unusual behavior of LD_LIBRARY_PATH is also a consequence of symlinking /bin/sh to busybox, which is a SUID executable in MPSS 3.3. (Certain busybox subcommands, like "ping," require root to work as expected; the rest quickly drop root privilege.) Linux's dynamic linker is documented (man ld.so) to "ignore" the environment variable LD_LIBRARY_PATH for SUID programs; in truth, this variable (among others) is outright removed from the environment before the SUID program's main() begins execution—for GLIBC 2.15 (not exactly what MPSS uses, but close) the code in question is here, the list of variables here—unless the program's UID and EUID are equal (e.g., it's root invoking the program).
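To check whether a given system is affected, one can test whether the binary behind /bin/sh carries the set-user-ID bit. This is a rough sketch with made-up helper names (is_setuid, report_shell_binary), assuming readlink -f is available on the coprocessor.

```shell
# True when the file exists and has its set-user-ID bit set.
is_setuid() {
    [ -u "$1" ]
}

# Hypothetical helper: resolve a path such as /bin/sh through its
# symlinks and report whether the final target is setuid.
report_shell_binary() {
    target=$(readlink -f "$1") || return 2
    if is_setuid "$target"; then
        echo "$1 resolves to $target (setuid): LD_LIBRARY_PATH is likely dropped for non-root users"
    else
        echo "$1 resolves to $target (not setuid)"
    fi
}

# Usage on the coprocessor (not run here):
#   report_shell_binary /bin/sh
```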

This means that all busybox programs will behave as if LD_LIBRARY_PATH wasn't set in their environment when run by non-root users, and won't pass that variable on to any child processes. So if you run a program which is (or runs) a shell script which runs another program, it is impossible to arrange for LD_LIBRARY_PATH to be set in the latter program's environment without modifying the shell script to export it directly. It also means that LD_LIBRARY_PATH can appear to be unset even when it is, in fact, set—because /usr/bin/env is also symlinked to busybox:

root$ su - nobody -c "/bin/busybox sh"
No directory, logging in with HOME=/
/ $ touch /tmp/libc.so.6  # create a broken libc in /tmp
/ $ ps U uucp
   PID TTY      STAT   TIME COMMAND
/ $ export LD_LIBRARY_PATH=/tmp
/ $ env | fgrep LD  # appears to be unset in environment
/ $ ps U uucp       # yet obviously *is* set
ps: error while loading shared libraries: /tmp/libc.so.6: file too short

Along these same lines, note that ldd is a shell script.
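For the workaround mentioned above, where the script itself has to export the variable, a minimal wrapper could look like the sketch below. The function name run_with_libpath and the directory /micfs/lib are illustrative assumptions.

```shell
# Hypothetical wrapper: set and export LD_LIBRARY_PATH inside the shell,
# then run the given command. Because the export happens after the shell
# has started, the child process receives the variable even if the
# dynamic linker removed it from the shell's own startup environment.
# The body is a subshell, so the caller's environment is left unchanged.
run_with_libpath() (
    # $1 = library directory, remaining arguments = command to run
    lib_dir=$1
    shift
    LD_LIBRARY_PATH=$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
    "$@"
)

# Usage (not run here):
#   run_with_libpath /micfs/lib ./my_app
```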

Notwithstanding the fact that I agree there's likely a better solution than using busybox for /bin/sh, there is another way to modify the dynamic linker's library search path besides resorting to LD_LIBRARY_PATH—instead, you can add your new paths to /etc/ld.so.conf. (This config file can be used to configure the dynamic linker in the same way as on distributions like RHEL, SLE, Debian, etc.)
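A sketch of the /etc/ld.so.conf approach follows. The helper add_ld_path and the directory /micfs/lib are illustrative. On glibc systems the entries normally take effect once ldconfig has rebuilt /etc/ld.so.cache; that ldconfig is present on the coprocessor image is an assumption here.

```shell
# Hypothetical helper: append a directory to an ld.so.conf-style file
# unless an identical line is already present.
add_ld_path() {
    # $1 = config file, $2 = library directory
    conf=$1
    dir=$2
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    if ! grep -qxF "$dir" "$conf" 2>/dev/null; then
        printf '%s\n' "$dir" >> "$conf"
    fi
}

# Usage as root on the coprocessor (not run here):
#   add_ld_path /etc/ld.so.conf /micfs/lib
#   ldconfig    # rebuild the linker cache so the new path is searched
```

Unlike LD_LIBRARY_PATH, this setting does not depend on the environment, so it should not be affected by the SUID behavior described above.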
