Are you having problems with your hardware (cannot see your Intel(R) Xeon Phi(tm) coprocessor, or it is only sporadically accessible) or with getting the Intel(R) Manycore Platform Software Stack (Intel(R) MPSS) to run reliably?
Attached to this post are PDF flowcharts (available for both Linux and Windows) that explain how to troubleshoot the problem and show what information to collect if you need to escalate your issue to your OEM provider or Intel.
We hope this is useful to you! Please let us know if you find a boundary condition that the flowcharts do not handle properly.
I have experienced a problem that looks like a bug in the 64-bit memory-to-stack push instruction.
I am porting the Glasgow Pascal compiler to the MIC and have run into an error that looks very like a bad implementation of PUSH.
It appears that a push instruction of the form:
push QWORD[ r8* 8+ label]
actually pushes the quadword at
push QWORD[ r8* 8+ label140ba08d9aadf+8]
Here are the relevant source lines, along with the assembler lines that they translate into.
First, we have a call to a run-time library function written in C, using C parameter passing.
;writeln( shiftindex[d,0]);
; note that shiftindex is declared as array[0..4,0..1] of integer
mov rcx, 5 ; field width info
mov rdx, 12 ; field width info
mov bl,BYTE ptr [ rbp+ -49]
movsx r8, bl
imul r8, 8
movsx rsi, dword ptr [ r8+ label140ba08d9aadf] ; the parameter for the value to be printed
movsx rdi, dword ptr [ unit$system$base+ -24] ; the file it will be sent to
.ifndef definedprintint
definedprintint=1
.extern printint
.endif
call printint;#imported
;--------
; this correctly prints out the 0th element of the row of the array shiftindex
Now we call a Pascal function, passing a row of the array by value; a push instruction places the row on the stack.
;compareImagePair (shiftindex
; d is a byte
; #297
mov bl,BYTE ptr [ rbp+ -49]
movsx r8, bl
push QWORD[ r8* 8+ label140ba08d9aadf]
call label140ba08d9abe3
; this passes to the function the (d+1)th element of the array shiftindex
; in other words, the push instruction fetches the wrong element from the array
; as compared to the mov instruction used earlier
Printout from programme
First the contents of the shiftindex array
0 0
0 -1
0 1
-1 0
1 0
d shiftindex[d,0]
3 -1
What we get inside the function compareImagePair when we print the parameter:
dirvec = 1 0
I have now concluded that this is a bug in the assembler distributed with the MIC. If you replace the line
push QWORD[ r8* 8+ label140ba08d9aadf]
with
push QWORD ptr [ r8* 8+ label140ba08d9aadf]
it fetches the correct value rather than the value 8 bytes beyond the correct address.
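For readers following the arithmetic, the reported behaviour can be modelled in a few lines of Python. This is only a sketch under stated assumptions: each row of shiftindex is taken to occupy 8 bytes (two 4-byte integers, consistent with the r8*8 scaling in the listing), and `stray_disp` is a hypothetical parameter standing in for the extra 8 bytes that appear when `ptr` is omitted. It does not claim to explain why the assembler adds them.

```python
# Rows of shiftindex as printed earlier in this thread (array[0..4, 0..1] of integer).
SHIFTINDEX = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]
ROW_BYTES = 8  # assumption: two 4-byte integers per row, matching the r8*8 scale


def row_fetched(d, stray_disp=0):
    """Row read by a quadword access at label + d*8 + stray_disp (hypothetical model)."""
    offset = d * ROW_BYTES + stray_disp
    if offset % ROW_BYTES:
        raise ValueError("offset is not on a row boundary")
    return SHIFTINDEX[offset // ROW_BYTES]


d = 3
print("intended operand:", row_fetched(d))
print("with stray +8:   ", row_fetched(d, stray_disp=8))
```

With d = 3, the model returns the row (-1, 0) for the intended operand and the row (1, 0) once the extra 8 bytes are added. The second result agrees with the "dirvec = 1 0" printed inside compareImagePair.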
Thanks for isolating this bug. It has been reported to the team that owns the assembler.
It appears that the x86_64 assembler does the same thing.
The error is that "QWORD ptr" must be used here, as Paul realized. The fact that QWORD alone is accepted may itself be a bug, which we need to discuss internally. What happens if AT&T syntax is used?
I am trying to install a Xeon Phi card in a Supermicro server (http://www.supermicro.com/products/superblade/module/sbi-7127rg.cfm). According to the flowchart, I need to "Enable support for mapping >4GB MMIO in the host BIOS". However, I cannot find an MMIO setup option in the BIOS, even after upgrading to the latest version. Could anyone please give me some suggestions?
Hi Yue,
From a quick look at your link, it appears to be an older server that might not support Xeon Phi at all. You might check with Supermicro whether this server can host a Xeon Phi.
Otherwise, it might also be the case that your BIOS has this option enabled by default. Is your card detected at all?
See also http://www.supermicro.com/support/resources/gpu/ and http://www.supermicro.com/products/superblade/module/SBI-7127RG-E.cfm. From those pages, it looks like you'd need the "-E" version.
Belinda,
I've installed MPSS for Windows on a Windows 7 Pro x64 system. I can get the two Xeon Phi 5510P cards up and running: firmware updates work, the cards boot, micinfo shows both cards as ready, and I can ping both cards. MicSmc-gui.exe shows both cards idling along, ...
I can compile my first project, "hello_world", selected from the tutorials coi folder. The project compiles and runs fine up to the point where it wants to launch the native-side app hello_world_sink_mic, which is not built by the solution (as a separate project).
Launching an Intel Parallel Studio XE 2013 command prompt for use with Visual Studio 2012, navigating to the demo folder (under C:\Program Files\...), and issuing
icl -Qmic hello_world_sink.cpp -o hello_world_sink_mic
I receive an error stating that stdio.h cannot be found and that I should check the MPSS environment variables.
If I remove -Qmic (not what I want, as this compiles a host app), I get an error writing the .obj file (because the folder is under C:\Program Files\...).
If I copy the MPSS folder elsewhere (not under a protected folder):
compiling with -Qmic fails with stdio.h not found
compiling without -Qmic succeeds.
In other words, -Qmic expects a different set of environment variables (with respect to INCLUDE).
How do I properly set the environment variable(s) for compiling the coprocessor side (-Qmic) of the demo programs under Windows?
Jim Dempsey
We installed a 3120A card in a Windows 7 box; the card is blinking blue. We installed MPSS 3.1.2, but the card is not displayed in Device Manager. The "micctrl -s" command results in this error:
Error manipulating coprocessor: Intel(R) Xeon Phi(TM) coprocessor driver is not loaded or you have insufficient access
Hi Alex, can you check the following? (This is based on similar forum posts from earlier this month.)
1. Physically inspect the card installation: is the card inserted properly, and are all power connectors on the card plugged in properly?
2. If you are working with a NUMA machine, where some of the PCI slots are enabled and others disabled, make sure the coprocessor is installed in an enabled PCI slot.
Thank you, BELINDA! Switching to another PCI slot worked!
The other problem that we are having is with NFS-exporting GPFS shares. Since the GPFS drivers and client software do not support MIC, we NFS-export the drives from each host to its MICs. This is very unreliable, though: sometimes the MICs will not mount the drives, citing "stale NFS filehandle" as the cause, which is untrue. It seems related to the order of the mounts in /etc/fstab, as the first one will mount and the second won't.
Ideally we'd like GPFS binaries for MIC, as this is a kludge anyway. In the current state we can't really say to users that the systems are ready to use.
(We'd also really like MPSS to support OFED 2.x, since that is what the rest of the machine is using. Only the nodes with MICs in are on 1.5.x, and that's entirely due to needing it to support the IPoIB software provided with MPSS.)
Hi Zaniyah,
Is your MIC stack trace from a consistently failing coprocessor? And can you send the tarball that gets created by the micdebug.sh script?
As for GPFS -- are you in a position to ask IBM about their plans to support GPFS with Intel Xeon Phi coprocessors? You can even tell them that there is now a Lustre client (it was recently released; we'll provide a writeup on that soon).
I will make sure to pass on your comments about wanting OFED 2.x support.
Hi Belinda,
I have installed mpss-3.2 to use a Xeon Phi for the first time, but I cannot find out which version of the flash is installed, and I cannot update it.
I cannot start the mpss service either.
Here is the output of several commands:
sudo micinfo
MicInfo Utility Log
Copyright 2011-2013 Intel Corporation All Rights Reserved.
Created Thu Mar 27 08:18:47 2014
System Info
HOST OS : Linux
OS Version : 2.6.32-431.5.1.el6.x86_64
Driver Version : 3.2-1
MPSS Version : 3.2
Host Physical Memory : 65918 MB
Device No: 0, Device Name: mic0
Version
Flash Version : NotAvailable
SMC Firmware Version : NotAvailable
SMC Boot Loader Version : NotAvailable
uOS Version : NotAvailable
Device Serial Number : NotAvailable
...

sudo micflash -update -device all -smcbootloader
No image path specified - Searching: /usr/share/mpss/flash
mic0: No valid image found

micsmc
DEBUG: ***** MicSettings(parent)::fileName(): "/home/vivi/.config/Intel Corp/MicSmcGUI.ini"
DEBUG: ***** SessionSettings(parent)::fileName(): "/home/vivi/.config/Intel Corp/MicSmcGUI.ini"
Avertissement : mic0 : Connexion avec le périphérique perdue !
Infos Web mic0 : Connexion avec le périphérique rétablie.
Avertissement : mic0 : Connexion avec le périphérique perdue !
Infos Web mic0 : Connexion avec le périphérique rétablie.
Avertissement : mic0 : Connexion avec le périphérique perdue !

sudo service mpss start
Starting Intel(R) MPSS: [ÉCHOUÉ]
Could you help me find out what the trouble is?
Thanks in advance.
Virginie
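When collecting information for an escalation, it can help to list which micinfo fields came back as NotAvailable. The following is a minimal Python sketch, not an Intel tool: it assumes micinfo prints one "Name : Value" field per line, and the names `unavailable_fields` and `SAMPLE` are made up for illustration. The sample text is taken from the output posted above.

```python
import re

# Sample taken from the micinfo output posted in this thread.
SAMPLE = """\
Device No: 0, Device Name: mic0
Version
Flash Version : NotAvailable
SMC Firmware Version : NotAvailable
SMC Boot Loader Version : NotAvailable
uOS Version : NotAvailable
Device Serial Number : NotAvailable
"""

# A field line whose value is exactly NotAvailable.
FIELD_RE = re.compile(r"^\s*(.+?)\s*:\s*NotAvailable\s*$")


def unavailable_fields(micinfo_text):
    """Return the names of fields that micinfo reported as NotAvailable."""
    names = []
    for line in micinfo_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


for name in unavailable_fields(SAMPLE):
    print("not available:", name)
```

If every version field is listed, as in the sample, the host tools are probably unable to query the card at all. In that case, the micflash and service failures are more likely a symptom than the cause.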
I thought the problem might be caused by the kernel version.
So I rebooted with the original kernel version and reinstalled MPSS. But here is the result of micctrl --initdefaults:
micctrl --initdefaults
micctrl(segv_handler+0x18) [0x4070c8]
/lib64/libpthread.so.0() [0x34cb40f710]
/usr/lib64/libmpssconfig.so.0.0.1(_add_miclist_not_present+0xb8) [0x7f5c84a35b98]
/usr/lib64/libmpssconfig.so.0.0.1(mpss_get_miclist+0x4d) [0x7f5c84a35e7d]
micctrl(create_miclist+0x1cd) [0x42123d]
micctrl(parse_config_args+0x370) [0x40db60]
micctrl(main+0x236) [0x40df56]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x34cb01ed1d]
micctrl() [0x406d29]
Hi Virginie, can you send us the output of /usr/bin/micdebug.sh? Just attach the tarball to this thread. That would be most helpful.
Hi Virginie,
The 'lspci -vvv' output on your host shows some odd things for the coprocessor (look for Co-processor in the output).
Here is what it shows for you:
------
04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: mic
------
Here is what it should normally show: (as an example)
-----
01:00.0 Co-processor: Intel Corporation Device 2250 (rev 11)
Subsystem: Intel Corporation Device 2500
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 32
Region 0: Memory at 380c00000000 (64-bit, prefetchable) [size=8G]
Region 4: Memory at fb700000 (64-bit, non-prefetchable) [size=128K]
Capabilities: [44] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [4c] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <4us, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [98] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=4 offset=00017000
PBA: BAR=4 offset=00018000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: mic
----
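The comparison above can be automated roughly as follows. This is a hedged Python sketch rather than part of MPSS: it assumes the text layout of lspci output shown in this thread, and the function name `check_coprocessors` and the two warning signs it looks for ("rev ff" on the device line and an "Unknown header type" line) are simply the symptoms quoted above, not an exhaustive health check.

```python
import re

# A device line starts with a PCI slot such as 04:00.0 (optionally with a domain prefix).
SLOT_RE = re.compile(
    r"^\s*((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+([^:]+):\s*(.*)$",
    re.IGNORECASE,
)


def check_coprocessors(lspci_text):
    """Map each Co-processor slot to a list of warning signs found in its entry."""
    results = {}
    slot = None
    for line in lspci_text.splitlines():
        match = SLOT_RE.match(line)
        if match:
            # Start of a new device entry; track it only if it is a coprocessor.
            is_coproc = match.group(2).strip().lower() == "co-processor"
            slot = match.group(1) if is_coproc else None
            if slot is not None:
                results[slot] = []
                if "(rev ff)" in match.group(3):
                    results[slot].append("revision reads as ff")
            continue
        if slot is not None and "Unknown header type" in line:
            results[slot].append(line.strip(" !"))
    return results


BAD = """\
04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: mic
"""
GOOD = """\
01:00.0 Co-processor: Intel Corporation Device 2250 (rev 11)
Subsystem: Intel Corporation Device 2500
Kernel driver in use: mic
"""

print(check_coprocessors(BAD))
print(check_coprocessors(GOOD))
```

An empty list for a slot means neither symptom was found. It does not prove that the card is healthy.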
I found someone else in this forum who has similar hardware to yours:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: P9X79 WS
He uses CentOS 6.4 (you have 6.5) and an older MPSS, 3.1.x (you have 3.2).
Let me ask a couple of questions:
- Is this the first time you've installed this coprocessor? (That seems to be the case based on what you've said before.)
- Have you tried plugging the coprocessor into any other slot in your system?
- Did you change anything in your system's BIOS? (You need to enable BIOS support for memory-mapped I/O address ranges above 4GB; have you done so?)
- We may have to look further into the BIOS. I have some BIOS update files from the person mentioned above, who has his ASUS board functioning, and I could forward these to you. The version he has working is P9x79-WS-ASUS-4306.CA. What is yours?
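On a system where the card enumerates, one rough way to see whether memory-mapped I/O above 4GB is in effect is to look at the "Region" lines in the lspci output, as in the working example above. The sketch below is illustrative Python only: it assumes the "Region N: Memory at <hex address>" layout shown in this thread, and the function name `regions_above_4g` is hypothetical.

```python
import re

FOUR_GIB = 1 << 32
REGION_RE = re.compile(r"Region (\d+): Memory at ([0-9a-fA-F]+) \((\d+)-bit")


def regions_above_4g(lspci_text):
    """Return (region number, address, mapped_above_4GiB) for each memory region."""
    found = []
    for match in REGION_RE.finditer(lspci_text):
        address = int(match.group(2), 16)
        found.append((int(match.group(1)), address, address >= FOUR_GIB))
    return found


# Region lines copied from the working example earlier in this post.
SAMPLE = """\
Region 0: Memory at 380c00000000 (64-bit, prefetchable) [size=8G]
Region 4: Memory at fb700000 (64-bit, non-prefetchable) [size=128K]
"""

for number, address, above in regions_above_4g(SAMPLE):
    print(f"Region {number}: 0x{address:x} above 4 GiB: {above}")
```

In the working example, the 8G region sits above the 4 GiB boundary while the small 128K region sits below it.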