
Config not taking reliably

William_H_
Beginner

I'm trying to configure some MIC cards to work in our cluster.  My problem is that the changes aren't consistently applied.

I have a script that uses micctrl to apply changes to the systems, and most of these changes seem to be reflected in /etc/mpss/mic0.conf and /etc/mpss/mic1.conf. One of the changes I make is to specify the host keys using micctrl --hostkeys. As I'm giving both MICs the same keys, I don't specify which MIC to use. This seems to add the keys to the host-specific overlays in /var/mpss/mic[01].

When I boot the MIC using micctrl, it will sometimes come up as configured, but sometimes the config clearly hasn't been applied (as evidenced by ssh-keyscan returning a different host key and my being unable to log in to the MIC over ssh).
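
The host key check described above can be scripted rather than eyeballed. The following is only a sketch: it assumes the public half of the intended host key is available in a file on the host (the path in the usage comment is hypothetical), that the card answers as mic0, and that ssh-keyscan prints lines of the form "host keytype base64key". The function name key_matches is made up for illustration.

```shell
#!/bin/sh
# key_matches EXPECTED_PUB_FILE SCANNED_LINE
# EXPECTED_PUB_FILE holds a public key as "keytype base64key [comment]".
# SCANNED_LINE is one ssh-keyscan output line: "host keytype base64key".
# Returns 0 only when key type and key data are identical.
key_matches() {
    local expected scanned
    expected=$(awk '{print $1, $2; exit}' "$1")
    scanned=$(printf '%s\n' "$2" | awk '{print $2, $3; exit}')
    [ -n "$expected" ] && [ "$expected" = "$scanned" ]
}

# Usage (hypothetical path, not run here):
#   line=$(ssh-keyscan -t rsa mic0 2>/dev/null | grep -v '^#')
#   if key_matches /path/to/ssh_host_rsa_key.pub "$line"; then
#       echo "mic0 presents the intended host key"
#   else
#       echo "mic0 presents a different host key"
#   fi
```

Comparing the key data itself avoids depending on fingerprint formats, which differ between OpenSSH versions.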

I have done this on a freshly installed host so am confident there is no legacy config lying around on the host.

If I run micctrl --reboot on a MIC, it will sometimes change from unconfigured to configured or vice versa. While I have considered using this as a workaround (just keep rebooting until it has the correct host key), I would prefer not to engage in voodoo sysadmin, and I can't be sure it won't just get stuck in a loop.
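
If the reboot-until-it-takes workaround is used anyway, the "stuck in a loop" risk can be removed by capping the number of attempts. This is a generic sketch, not a tested MPSS procedure: retry_until, check_mic0 and reboot_mic0 are hypothetical names, and the use of micctrl -w to wait for the card is an assumption that should be checked against micctrl --help on the installed version.

```shell
#!/bin/sh
# retry_until MAX_TRIES CHECK ACTION
# Runs CHECK; while it fails, runs ACTION and checks again.
# Returns 0 once CHECK succeeds, 1 if CHECK still fails after MAX_TRIES
# actions, 2 if ACTION itself fails.
retry_until() {
    local max=$1 check=$2 action=$3 n=0
    until "$check"; do
        [ "$n" -ge "$max" ] && return 1
        "$action" || return 2
        n=$((n + 1))
    done
    return 0
}

# Usage (hypothetical helpers, not run here):
#   check_mic0()  { ssh -o BatchMode=yes mic0 true; }
#   reboot_mic0() { micctrl --reboot mic0 && micctrl -w mic0; }  # -w: assumption
#   retry_until 5 check_mic0 reboot_mic0 || echo "mic0 still misconfigured"
```

With a cap of 5, the card is rebooted at most five times before the script gives up and reports the failure.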

I have tried various combinations of:

micctrl --resetconfig and micctrl --updateramfs.  

Doing everything with micctrl

Doing everything by modifying files under /var/mpss/mic0 and /var/mpss/mic1 directly, using micctrl only for booting/shutdown, resetconfig and updateramfs.

Regardless of what I do, I get the same inconsistent results (occasionally I break the config in some other way as well :)

mic1 seems more likely than mic0 to have the intended config but this is not 100%.

#rpm -q --whatprovides /usr/sbin/micctrl
mpss-daemon-3.2.1-1.glibc2.12.2.x86_64

I note that if I run zcat /var/mpss/mic1.image.gz | cpio -i, the resulting files don't seem to contain any of the required config, even when the MIC in question does. Indeed, their contents appear to be identical to each other and to the base filesystem /usr/share/mpss/boot/initramfs-knightscorner.cpio.gz (diff -ur finds no differences, although it does produce output for device files and dangling symlinks).
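
A quicker way to make the same comparison, without extracting anything, is to checksum the decompressed streams. This is a sketch under the assumption that gzip and sha256sum are available on the host; same_payload is a made-up helper name. The .gz files themselves are not compared byte for byte, because gzip headers can carry a timestamp and file name that differ even when the payload is identical.

```shell
#!/bin/sh
# same_payload A.gz B.gz
# Returns 0 if both files decompress to identical data, 1 if the data
# differs, 2 if either file is not a valid gzip stream.
same_payload() {
    gzip -t "$1" 2>/dev/null && gzip -t "$2" 2>/dev/null || return 2
    local a b
    a=$(gzip -dc "$1" | sha256sum)
    b=$(gzip -dc "$2" | sha256sum)
    [ "$a" = "$b" ]
}

# Usage (paths taken from the post above, not run here):
#   same_payload /var/mpss/mic1.image.gz \
#       /usr/share/mpss/boot/initramfs-knightscorner.cpio.gz \
#       && echo "mic1 image is identical to the base filesystem"
```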

Frances_R_Intel
Employee

Do you have OFED-3.5-2-MIC-BETA installed? I suspect that you are having the same problem as https://software.intel.com/en-us/forums/topic/514700.

When the coprocessor reboots, it first makes a RAM file system and copies over a base root directory that allows it to bring up some daemons that are required before the boot can continue. It then makes a new RAM file system and copies over the complete root directory. The file containing the complete root directory is remade with each reboot. There is a problem with OFED-3.5-2-MIC-BETA that can cause this file with the complete root directory to become corrupted intermittently.

The way to tell if this is your problem is to look for the error "Initramfs unpacking failed" in the system log on the host. The way to get around this problem is to use OFED-1.5.4.1 for now or, as you have suggested, just reboot until it takes, which is not a great solution but is doable. There is a short script in that other forum post if you want to do that.
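
As a rough illustration of the log check (this is not the script from the other forum post), the error string can be searched for with grep. The sketch assumes the host's system log is a plain text file; /var/log/messages is a common location on RHEL-style hosts but is an assumption here, and initramfs_unpack_failed is a made-up function name.

```shell
#!/bin/sh
# initramfs_unpack_failed LOGFILE
# Returns 0 if LOGFILE contains the error string quoted above, 1 if not.
initramfs_unpack_failed() {
    grep -qF 'Initramfs unpacking failed' "$1"
}

# Usage (log path is an assumption, not run here):
#   if initramfs_unpack_failed /var/log/messages; then
#       echo "corrupted root image seen in the log"
#       grep -F 'Initramfs unpacking failed' /var/log/messages | tail -n 5
#   fi
```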
