Hi Xeon Phi experts:
When I try to connect to the Phi (7120P) with ssh, it hangs until it times out, and 'ping -c 3 mic0' shows 100% packet loss.
The only problem I had during setup was when executing:
~# systemctl stop mpssd
~# micctrl --initdefaults -vvv
[Info] mic0: Using existing /etc/mpss/default.conf
[Info] mic0: Using existing /etc/mpss/mic0.conf
[Info] mic0: File System Base /usr/share/mpss/boot/initramfs-knightscorner.cpio.gz
[Info] mic0: MIC Family x100
[Info] mic0: MPSSVersion 3.x
[Info] mic0: Common files at /var/mpss/common
[Info] mic0: Unique files at /var/mpss/mic0
[Info] mic0: Hostname CTHULHU-mic0
[Network] mic0: ifdown mic0
[Error] Failed to rename temporary file /etc/netctl/interfaces
[Filesys] mic0: Created /etc/netctl/static-mic0
[Network] mic0: ifup mic0
[Filesys] mic0: Update /var/mpss/mic0/etc/network/interfaces
[Info] mic0: Removing conflicting existing /etc/hosts entry: 172.31.1.1 CTHULHU-mic0 mic0 #Generated-by-micctrl
[Filesys] mic0: Update /etc/hosts with 172.31.1.1 CTHULHU-mic0
[Info] mic0: Verbose mode Disabled
[Info] mic0: Linux OS image /usr/share/mpss/boot/bzImage-knightscorner
             System Map /usr/share/mpss/boot/bzImage-knightscorner
[Info] mic0: Boot On Start Enabled
[Info] mic0: Shutdown Timeout 300
[Info] mic0: MIC Crash Dump at /var/crash/mic size 16
[Error] mic0: Create failed for /etc/ssh/ rsa1 keys: Unknown error 255
[Info] mic0: ExtraCommandLine 'highres=off noautogroup'
[Info] mic0: RootDevice RamFS /var/mpss/mic0.image.gz
[Info] mic0: Console hvc0
[Info] mic0: PowerManagement cpufreq_on;corec6_on;pc3_on;pc6_on
[Info] mic0: Cgroup memory=disabled
[Info] mic0: [Parse] /etc/mpss/mic0.conf
[Info] mic0: [Parse] Configuration version 1.1
[Info] mic0: [Parse] /etc/mpss/default.conf
[Filesys] mic0: Update /var/mpss/mic0/etc/hosts
I have tried deleting the file, but it makes no difference.
Everything passes in miccheck:
~$ python2 miccheck.py
MicCheck 3.6.1-r1
Copyright (c) 2015, Intel Corporation.
Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... pass
  Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
  Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
  Test 5 (mic0): Check ras daemon is available in device ... pass
  Test 6 (mic0): Check running flash version is correct ... pass
  Test 7 (mic0): Check running SMC firmware version is correct ... pass
Status: OK
And lspci -vvv shows the expected output (LinkSta reports Width x8 because my current CPU only has 28 PCIe lanes):
# lspci -s 04:00.0 -vvv
04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev 20)
    Subsystem: Intel Corporation Xeon Phi coprocessor SE10/7120 series
    Physical Slot: 4-1
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 52
    NUMA node: 0
    Region 0: [virtual] Memory at 13800000000 (64-bit, prefetchable) [size=16G]
    Region 4: Memory at fb400000 (64-bit, non-prefetchable) [size=128K]
    Capabilities: [44] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [4c] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s (ok), Width x8 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [98] MSI-X: Enable+ Count=16 Masked-
        Vector table: BAR=4 offset=00017000
        PBA: BAR=4 offset=00018000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: mic
    Kernel modules: mic_host
ip link shows that mic0 is in state DOWN, which looks suspicious:
2: mic0: <BROADCAST> mtu 64512 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4c:79:ba:44:04:83 brd ff:ff:ff:ff:ff:ff
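A DOWN host-side interface would by itself explain the ping loss, so it may help to check that state in a script. The sketch below is only an illustration: link_state is a hypothetical helper name, and it assumes the one-line format printed by `ip -o link show`. The commented commands at the end are untested guesses for this setup, not a confirmed fix.

```shell
# link_state: read one or more lines of `ip -o link show` output on stdin
# and print the operational state (UP, DOWN, UNKNOWN, ...) of each.
# Hypothetical helper, not part of MPSS or iproute2.
link_state() {
    sed -n 's/.* state \([A-Z]*\).*/\1/p'
}

# Typical use on the host (left as comments because they need the mic0 device):
#   ip -o link show mic0 | link_state
# If that prints DOWN, bringing the link up by hand is one thing to try (as root):
#   ip link set mic0 up
#   ip addr show mic0     # then check whether a host-side address is assigned
```

If the interface comes up but has no address, the netctl profile that micctrl wrote (/etc/netctl/static-mic0 in the log above) is probably the next place to look.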
And here is the beginning of dmesg | grep mic:
[ 1.842992] mic: loading out-of-tree module taints kernel.
[ 1.847430] mic 0000:04:00.0: enabling device (0100 -> 0102)
[ 1.847654] mic0: Transition from state ready to resetting
[ 11.848714] mic_probe 4:0:0 as board #0
[ 11.848731] mic: number of devices detected 1
[ 12.874044] mic0: Resetting (Post Code 12)
[ 12.874047] mic0: Transition from state resetting to ready
[ 16.261422] mic0: Transition from state ready to booting
[ 16.261443] mic image: /usr/share/mpss/boot/bzImage-knightscorner
[ 34.112744] mic0: Transition from state booting to online
[ 1151.785868] mic0: Transition from state online to shutdown
[ 1173.888858] mic0: Transition from state shutdown to resetting
[ 1175.915456] mic0: Resetting (Post Code 3C)
[ 1176.928771] mic0: Resetting (Post Code 3d)
[ 1177.942096] mic0: Resetting (Post Code 3d)
[ 1178.955412] mic0: Resetting (Post Code 3d)
[ 1179.968696] mic0: Resetting (Post Code 3d)
[ 1180.982047] mic0: Resetting (Post Code 3E)
[ 1181.995368] mic0: Resetting (Post Code 3E)
[ 1183.008682] mic0: Resetting (Post Code 3E)
[ 1184.022000] mic0: Resetting (Post Code 09)
[ 1185.035322] mic0: Resetting (Post Code 09)
[ 1186.048633] mic0: Resetting (Post Code 10)
[ 1187.061958] mic0: Resetting (Post Code 12)
[ 1187.061961] mic0: Transition from state resetting to ready
[ 1523.197850] mic0: Transition from state ready to booting
[ 1523.197871] mic image: /usr/share/mpss/boot/bzImage-knightscorner
[ 1557.373556] mic0: Transition from state booting to online
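The state changes are easier to follow once the post-code lines are filtered out. A minimal sketch, assuming the message format shown above and a card named mic0; mic_transitions is a hypothetical helper name:

```shell
# mic_transitions: read dmesg output on stdin and print one line per mic0
# state change as "<timestamp> <old state> -> <new state>".
# Hypothetical helper; it only matches the "Transition from state" lines.
mic_transitions() {
    sed -n 's/^\[ *\([0-9.]*\)] mic0: Transition from state \([a-z]*\) to \([a-z]*\).*/\1 \2 -> \3/p'
}

# Typical use on the host:
#   dmesg | mic_transitions
```

For the log above this would show the card reaching "online" twice, with a shutdown and reset in between. That matches the driver considering the card booted even though it is unreachable over the network.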
I can't really figure out the problem. Is it my network configuration? And if so what should I do?
I use Arch Linux (Manjaro) with Linux 4.9 and MPSS from the AUR.
MB: AsRock Taichi x99
CPU: i7 6800K
The lines about not being able to read/update the directory /etc/ssh are worrisome.
Unfortunately, Arch Linux is not supported. Here's what I get on a CentOS 6 system:
micctrl --initdefaults -vvv
[Filesys] mic0: Created directory /etc/mpss
[Filesys] mic0: Created /etc/mpss/default.conf
[Filesys] mic0: Created /etc/mpss/mic0.conf version 1.1
[Info] mic0: File System Base /usr/share/mpss/boot/initramfs-knightscorner.cpio.gz
[Info] mic0: MIC Family x100
[Info] mic0: MPSSVersion 3.x
[Info] mic0: Common files at /var/mpss/common
[Info] mic0: Unique files at /var/mpss/mic0
[Info] mic0: Hostname pleedo-mic0.nikhef.nl
[Filesys] mic0: Update MacAddrs in /etc/mpss/mic0.conf
[Info] mic0: Network Static Pair MIC 172.31.1.1 Host 172.31.1.254
[Filesys] mic0: Updated /etc/sysconfig/network-scripts/ifcfg-mic0
[Network] mic0: ifup mic0
[Filesys] Update file /etc/resolv.conf
[Filesys] mic0: Update /var/mpss/mic0/etc/network/interfaces
[Info] mic0: Using existing /etc/hosts entry: 172.31.1.1 pleedo-mic0.nikhef.nl mic0
[Filesys] mic0: Update Network in /etc/mpss/mic0.conf
[Info] mic0: Verbose mode Disabled
[Info] mic0: Linux OS image /usr/share/mpss/boot/bzImage-knightscorner
             System Map /usr/share/mpss/boot/bzImage-knightscorner
[Info] mic0: Boot On Start Enabled
[Info] mic0: Shutdown Timeout 300
[Info] mic0: MIC Crash Dump at /var/crash/mic size 16
[Info] mic0: ExtraCommandLine 'highres=off noautogroup'
[Filesys] mic0: Update RootDevice in /etc/mpss/mic0.conf
[Info] mic0: RootDevice RAMFS /var/mpss/mic0.image.gz
[Info] mic0: Console hvc0
[Info] mic0: PowerManagement cpufreq_on;corec6_off;pc3_on;pc6_off
[Info] mic0: Cgroup memory=disabled
[Info] mic0: [Parse] /etc/mpss/mic0.conf
[Info] mic0: [Parse] Configuration version 1.1
[Info] mic0: [Parse] /etc/mpss/default.conf
[Filesys] mic0: Update /var/mpss/mic0/etc/hosts
Here's what I would do:
- start the MPSS daemon
- check for the device /dev/ttyMIC0
- use a tool like minicom to connect to the console of the MIC and log in
- check the network settings on the MIC
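The first three steps above could be sketched roughly as follows. This is an illustration only: console_hint is a hypothetical helper, the mpssd unit name is taken from the systemctl command earlier in the thread, and serial settings such as the baud rate are not covered here.

```shell
# console_hint [DEV]: if DEV (default /dev/ttyMIC0) exists as a character
# device, print a minicom command line for it; otherwise complain and fail.
# Hypothetical helper, not part of MPSS.
console_hint() {
    dev=${1:-/dev/ttyMIC0}
    if [ -c "$dev" ]; then
        echo "minicom -D $dev"
    else
        echo "$dev not found; is mpssd running and the card booted?" >&2
        return 1
    fi
}

# Typical use on the host, as root:
#   systemctl start mpssd
#   console_hint            # prints the minicom command if the device exists
```

Once logged in on the card, the interface configuration can be compared with the 172.31.1.1 address that micctrl reported for mic0.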
Good luck!
Thanks for the suggestion!
I have just solved the first problem, the one about "/etc/netctl/interfaces".
It turned out that every invocation of --initdefaults created a new file in "/etc/netctl" named inter******, where the trailing characters are random letters and digits, e.g. "inter0NmyNk".
I manually renamed the latest one to "interfaces" and deleted the rest, and the error disappeared.
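For anyone hitting the same error, the manual cleanup described above could be scripted roughly like this. It is a sketch under assumptions: promote_newest_temp is a hypothetical name, the leftover files are assumed to match inter* as described in this post, and it should be tried on a copy of the directory before being run as root on the real one.

```shell
# promote_newest_temp DIR: rename the most recently modified DIR/inter* temp
# file to DIR/interfaces and delete the other leftover temp files.
# Returns 1 if no temp file is found. Hypothetical helper, not part of MPSS.
promote_newest_temp() {
    dir=$1
    newest=
    # First pass: find the newest temp file, ignoring an existing "interfaces".
    for f in "$dir"/inter*; do
        [ -f "$f" ] || continue
        [ "$f" = "$dir/interfaces" ] && continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] || return 1
    # Second pass: remove every other temp file.
    for f in "$dir"/inter*; do
        [ -f "$f" ] || continue
        [ "$f" = "$dir/interfaces" ] && continue
        [ "$f" = "$newest" ] && continue
        rm -f -- "$f"
    done
    mv -f -- "$newest" "$dir/interfaces"
}

# Typical use, as root, after stopping mpssd:
#   promote_newest_temp /etc/netctl
```

Note that this overwrites any existing "interfaces" file in the directory, which is what the manual rename did as well.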
Maybe something similar is happening for "/etc/ssh". I'll investigate and report back if I am successful.
P.S. I know Arch Linux is not supported, and technically neither is the motherboard. But there is always a chance that someone has seen this before; the Unix(-like) OSes are far more alike than different.
