
Suddenly Cannot SSH into MIC Cards

Hey Everyone,
I am investigating why our two MIC cards suddenly became unreachable at some point in the last week or so (it is hard to tell exactly when, as they are only used semi-frequently). As of right now, I can neither ping nor SSH into either card. For some reason, mic0 and mic1 no longer have IPv4 (inet) addresses.

I have included the micinfo output below. Both cards are reported as online by micctrl --status:

mic0: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic1: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)

Any help would be greatly appreciated.

Warm Regards,
Joe

MicInfo Utility Log
Created Thu Aug 16 14:37:10 2018


	System Info
		HOST OS			: Linux
		OS Version		: 2.6.32-431.20.5.el6.x86_64
		Driver Version		: 3.5.2-1
		MPSS Version		: 3.5.2
		Host Physical Memory	: 132124 MB

Device No: 0, Device Name: mic0

	Version
		Flash Version 		 : 2.1.02.0391
		SMC Firmware Version	 : 1.17.6900
		SMC Boot Loader Version	 : 1.8.4326
		uOS Version 		 : 2.6.38.8+mpss3.5.2
		Device Serial Number 	 : ADKC32101046

	Board
		Vendor ID 		 : 0x8086
		Device ID 		 : 0x2250
		Subsystem ID 		 : 0x2500
		Coprocessor Stepping ID	 : 3
		PCIe Width 		 : x16
		PCIe Speed 		 : 5 GT/s
		PCIe Max payload size	 : 256 bytes
		PCIe Max read req size	 : 512 bytes
		Coprocessor Model	 : 0x01
		Coprocessor Model Ext	 : 0x00
		Coprocessor Type	 : 0x00
		Coprocessor Family	 : 0x0b
		Coprocessor Family Ext	 : 0x00
		Coprocessor Stepping 	 : B1
		Board SKU 		 : B1PRQ-5110P/5120D
		ECC Mode 		 : Enabled
		SMC HW Revision 	 : Product 225W Passive CS

	Cores
		Total No of Active Cores : 60
		Voltage 		 : 1051000 uV
		Frequency		 : 1052631 kHz

	Thermal
		Fan Speed Control 	 : N/A
		Fan RPM 		 : N/A
		Fan PWM 		 : N/A
		Die Temp		 : 42 C

	GDDR
		GDDR Vendor		 : Elpida
		GDDR Version		 : 0x1
		GDDR Density		 : 2048 Mb
		GDDR Size		 : 7936 MB
		GDDR Technology		 : GDDR5 
		GDDR Speed		 : 5.000000 GT/s 
		GDDR Frequency		 : 2500000 kHz
		GDDR Voltage		 : 1501000 uV

Device No: 1, Device Name: mic1

	Version
		Flash Version 		 : 2.1.02.0391
		SMC Firmware Version	 : 1.17.6900
		SMC Boot Loader Version	 : 1.8.4326
		uOS Version 		 : 2.6.38.8+mpss3.5.2
		Device Serial Number 	 : ADKC32100888

	Board
		Vendor ID 		 : 0x8086
		Device ID 		 : 0x2250
		Subsystem ID 		 : 0x2500
		Coprocessor Stepping ID	 : 3
		PCIe Width 		 : x16
		PCIe Speed 		 : 5 GT/s
		PCIe Max payload size	 : 256 bytes
		PCIe Max read req size	 : 512 bytes
		Coprocessor Model	 : 0x01
		Coprocessor Model Ext	 : 0x00
		Coprocessor Type	 : 0x00
		Coprocessor Family	 : 0x0b
		Coprocessor Family Ext	 : 0x00
		Coprocessor Stepping 	 : B1
		Board SKU 		 : B1PRQ-5110P/5120D
		ECC Mode 		 : Enabled
		SMC HW Revision 	 : Product 225W Passive CS

	Cores
		Total No of Active Cores : 60
		Voltage 		 : 1036000 uV
		Frequency		 : 1052631 kHz

	Thermal
		Fan Speed Control 	 : N/A
		Fan RPM 		 : N/A
		Fan PWM 		 : N/A
		Die Temp		 : 41 C

	GDDR
		GDDR Vendor		 : Elpida
		GDDR Version		 : 0x1
		GDDR Density		 : 2048 Mb
		GDDR Size		 : 7936 MB
		GDDR Technology		 : GDDR5 
		GDDR Speed		 : 5.000000 GT/s 
		GDDR Frequency		 : 2500000 kHz
		GDDR Voltage		 : 1501000 uV

 

2 Replies

Have you tried restarting the MPSS daemon? I've seen this issue with my KNC cards in the past, where they would no longer be reachable. It always happened when they were left unused for several days or weeks, and a 'service mpss restart' usually did the trick.

Also, you're using quite an old version of the MPSS stack. I'd recommend upgrading to the latest version, 3.8.4; since upgrading, I have not experienced the network dropouts for almost a year now.
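
As a rough sketch of the check-then-restart approach described above: the helper names has_ipv4 and check_mic_ifaces are made up for illustration, and the interface names mic0 and mic1 are taken from the original post. It assumes the host has the iproute2 `ip` tool and that `ip -4 addr show` prints an "inet" line only when an IPv4 address is assigned.

```shell
#!/bin/sh
# Sketch only: has_ipv4 and check_mic_ifaces are hypothetical helper names.

# Succeeds if the `ip -4 addr show` output on stdin contains an "inet" line
# (IPv4 only; "inet6" lines do not match).
has_ipv4() {
    grep -Eq '^[[:space:]]*inet[[:space:]]'
}

# Reports, for each interface given, whether it has an IPv4 address.
# Returns non-zero if at least one interface has none.
check_mic_ifaces() {
    missing=0
    for dev in "$@"; do
        if ip -4 addr show dev "$dev" 2>/dev/null | has_ipv4; then
            echo "$dev: IPv4 address present"
        else
            echo "$dev: no IPv4 address"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (as root on the host):
#   check_mic_ifaces mic0 mic1 || service mpss restart
```

Running the check first makes it easier to tell afterwards whether the restart actually brought the addresses back.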

 


You can try the command sudo micctrl -R to reboot the cards.
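
After a reboot it may be worth confirming that both cards come back. The following is a minimal sketch, not an official procedure: cards_not_online is a hypothetical helper, and it assumes the status lines look like the micctrl --status output quoted in the original post (for example "mic0: online (mode: ...)").

```shell
#!/bin/sh
# Sketch only: cards_not_online is a hypothetical helper name.

# Reads `micctrl --status` output on stdin and prints the name of every
# card whose state is anything other than "online".
cards_not_online() {
    awk -F': *' '/^mic[0-9]+:/ {
        split($2, state, " ")
        if (state[1] != "online") print $1
    }'
}

# Usage (as root on the host):
#   micctrl -R
#   micctrl -w     # assumption: waits until the cards have finished booting
#   micctrl --status | cards_not_online
```

Note that in the original post both cards already report "online" while being unreachable, so an empty result here does not by itself prove the network is back; a ping or SSH test is still needed.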

 
