I am trying to use the Xeon Phi card with a Xen guest VM (HVM guest) with PCI passthrough (supported by Intel's VT-d). The following is my setup:
SuperMicro SYS-1017GR-TF (has the X9SRG-F motherboard which is known to support the Phi with the latest BIOS)
Dom0: Debian Wheezy 64-bit
HVM Guest: CentOS 6.3 with stock kernel (2.6.32-279 x86_64), Xeon Phi is passed through with the Xen pciback driver
In this configuration, the VM will not boot. I'm assuming it has something to do with the special BIOS requirements that the Phi card has for memory addressing, though I'm not sure how that translates to a Xen VM.
Any thoughts or advice are greatly appreciated.
We do have some recently published documentation here on how to get Xen going: http://software.intel.com/en-us/articles/getting-xen-working-for-intelr-xeon-phitm-coprocessor
Although I am not sure this will solve your problem, it's worth seeing whether the patches make any difference (provided you can apply them to your distro).
Belinda,
Thank you for your quick response! The patches you have in that article worked perfectly and I'm currently getting the Phi card up and running in a VM.
Do you know the status of those patches getting incorporated into the Xen and Qemu projects?
I may have spoken slightly too soon. :)
I have passed the Phi through to the guest VM and it is showing up with "lspci" on the guest in the same way that it does when running on a system without Xen. The mic.ko kernel module loads successfully but when I run "micctrl --initdefaults" I get an error message... "No MIC cards found in the system. The MIC driver has been determined to be loaded. Use the lspci utility to verify the cards are installed".
Any ideas why micctrl doesn't seem to think there are any cards?
Can you provide the output of '/opt/intel/mic/bin/micsmc -a'?
(you can dump it into a text file and attach it to your response)
Here it is, it's short so I just pasted it here:
Critical: mic0: Failed to Initialize API, returned with 0x2
Critical: mic0: Failed to Initialize API, returned with 0x2
Fatal: Attempt to query devices to determine the number of Intel® Xeon Phi™ Coprocessors in your system failed.
Please make sure your devices are initialized and running.
No devices detected in target system: exiting program.
Also, here's the output of lspci -vv for the Phi card, just for reference:
00:05.0 Co-processor: Intel Corporation Device 2250 (rev 11)
    Subsystem: Intel Corporation Device 2500
    Physical Slot: 5
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 10
    Region 0: Memory at <ignored> (64-bit, prefetchable)
    Region 4: Memory at f3060000 (64-bit, non-prefetchable) [size=128K]
    Capabilities: [44] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [4c] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <4us, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [98] MSI-X: Enable- Count=16 Masked-
        Vector table: BAR=4 offset=00017000
        PBA: BAR=4 offset=00018000
Let me know if there's anything else I can give you to help. Thanks!
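One detail in the lspci output above may be worth a closer look: Region 0 is reported as "<ignored>" while Region 4 has a real address. As far as I understand, lspci prints "<ignored>" or "<unassigned>" when the kernel is not using an address for that BAR, so the driver may not be able to map it. Below is a minimal Python sketch for spotting such regions in saved lspci -vv output. The function names are my own, and the sample text is just the two Region lines from this thread.

```python
import re

# The two "Region" lines from the lspci -vv output quoted in this thread.
SAMPLE = """\
Region 0: Memory at <ignored> (64-bit, prefetchable)
Region 4: Memory at f3060000 (64-bit, non-prefetchable) [size=128K]
"""

# Matches lines such as "Region 4: Memory at f3060000 (64-bit, ...) [size=128K]".
REGION_RE = re.compile(
    r"Region (\d+): Memory at (\S+) \(([^)]*)\)(?: \[size=(\w+)\])?"
)


def parse_regions(text):
    """Return one dict per 'Region N: Memory at ...' line found in text."""
    regions = []
    for line in text.splitlines():
        m = REGION_RE.search(line)
        if m is None:
            continue
        index, address, attrs, size = m.groups()
        regions.append({
            "index": int(index),
            "address": address,
            "attrs": attrs,
            "size": size,  # None when lspci prints no size
            # Assumption: these two placeholders mean the BAR has no usable address.
            "mapped": address not in ("<ignored>", "<unassigned>"),
        })
    return regions


def unmapped_regions(text):
    """Return the BAR indexes that have no usable address."""
    return [r["index"] for r in parse_regions(text) if not r["mapped"]]


if __name__ == "__main__":
    print(unmapped_regions(SAMPLE))  # prints [0]
```

If Region 0 also shows up as unmapped in the full guest output, comparing it against the same region on dom0 (where it should have a size and address) would show whether the guest is failing to place the large prefetchable BAR.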
A couple of additional data requests:
1. What Xen tree and Xen version are you using?
2. What Qemu version are you using?
3. Can you include the Xen and Qemu-generated logfiles?
4. Output of 'lspci -s B:D.F -vvv -xxx' from your Xen guest (so the actual command for you is probably: lspci -s 00:05.0 -vvv -xxx)
(we may need to see if we can reproduce this internally)
1. I pulled Xen from the master branch of their git repository (making it 4.3.0 unstable I guess) ... the commit ending in 1c61028.
2. Qemu was also pulled from the master branch of the git repo, it was tagged 1.5.0-rc0, commit ending in 15d23fb.
3. I've attached all the log files I found that seemed pertinent. If there is something in particular that I haven't included, let me know how to get it and I'll provide it to you.
4. I ran the lspci command on both the dom0 and the VM and have attached output of both (looked the same at first glance but I included both anyway).
I really appreciate your help on this. Thanks!
Our internal development team was able to get things working with the following configuration:
* Xen
- xen unstable: http://xenbits.xen.org/hg/xen-unstable.hg
- xen version: xen unstable 25970
* Qemu
- qemu upstream: git.qemu.org/qemu.git
- qemu version: aabc8530c7ba2be89e21463f051056ad7c255e6e
Would it be possible for you to try these specific versions and see whether the problem persists?
I suggest rebooting the virtual machine or the computer. After the mic driver has loaded successfully, you can read the log or the proc maps as a reference.
Belinda,
I'll give those versions a try today and let you know how I make out. Thanks again.
Belinda,
After acquiring and building the versions you listed, I'm having trouble even starting the VM. xl gives the following error when I attempt to create the VM:
libxl: error: libxl_create.c:432:libxl__domain_make: domain creation fail
libxl: error: libxl_create.c:660:initiate_domain_create: cannot make domain: -3
libxl: error: libxl.c:1378:libxl__destroy_domid: non-existant domain -1
libxl: error: libxl.c:1342:domain_destroy_callback: unable to destroy guest with domid 4294967295
libxl: error: libxl_create.c:1225:domcreate_destruction_cb: unable to destroy domain 4294967295 following failed creation
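A side note on the numbers in that log: the domid 4294967295 in the last two lines is most likely the same -1 from the third line, just printed as an unsigned 32-bit integer. If so, the destroy errors are fallout from the one failed creation rather than separate problems. A small Python sketch of the arithmetic (the helper names are mine, not anything from libxl):

```python
import ctypes


def as_uint32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    print(as_uint32(-1))         # prints 4294967295
    print(as_int32(4294967295))  # prints -1
```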
This is my VM config file, for reference:
name = 'mbcphi-centos'
builder = 'hvm'
memory = 10240
disk = [ 'file:/storage/centos.img,xvda,w' ]
vif = [ 'model=e1000,mac=00:16:3e:72:79:aa,bridge=xenbr0']
boot = 'c'
localtime = 1
vnc=1
pci = [ '03:00.0' ]
device_model_version = 'qemu-xen'
device_model_override = '/root/new/qemu/x86_64-softmmu/qemu-system-x86_64'
The Xen and qemu builds looked successful, but I haven't had this error with the other versions I've tried.
Are you using the same version for xen and libxl? That is, when you build xen, do you also rebuild libxl?
When something goes wrong, it's best to share the xen log from the "xl dmesg" command.
Xudong,
I have been rebuilding the entire xen source, including the libraries and toolstack. Now that you mention it though, there are several different versions of libxl hanging out in the /usr/lib area, so I may be running into some conflicts. I am going to try removing all of them and reinstalling xen to see what happens. I will include xl dmesg output afterwards.
Justin
I cleared out all of the libxl and libxen* libraries from my system and reinstalled the version of Xen called out in Belinda's post above. After doing that, I'm having the same problem as before, with micctrl and micsmc claiming that I do not have a Phi card installed (mic driver is loaded successfully, but no cards are found). I've attached the output of xl dmesg in case that's helpful.
From the comment above, it sounds like you can boot the guest, right? So the only remaining problem is that the device doesn't work in the guest?
Can you provide the guest dmesg log?
Also, can you try removing the emulated NIC device by deleting the "vif =" line, and attach the guest dmesg log again if the problem persists?
Yes, the guest does boot up, and the device is present as shown by lspci in the guest VM. The mic driver loads successfully, but the mic utilities claim there is no Phi card present. I've attached dmesg output from the guest, one file is dmesg with the NIC present, the other is dmesg without the NIC present.
Xudong,
Any thoughts on this? Just to clarify, the Phi card was not "seen" by the mic utilities in either case, with or without the NIC present.
One thought I had... would the virtual BDF number be significant? In dom0, the Phi card is 03:00.0 but in the VM, it is presented as 00:05.0. I thought I remembered seeing a way to tell Xen to keep the same BDF number for the guest, but I can't seem to find that now. Do you think that would affect anything?
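For anyone comparing the two addresses, here is a minimal Python sketch that splits a BDF string such as 03:00.0 or 00:05.0 into its bus, device and function numbers. It assumes the usual lspci notation of hexadecimal [domain:]bus:device.function; the Bdf type and parse_bdf function are hypothetical names for illustration only.

```python
import re
from typing import NamedTuple


class Bdf(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# [domain:]bus:device.function, all hexadecimal, as printed by lspci.
BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text):
    """Parse a string such as '03:00.0' or '0000:03:00.0' into a Bdf."""
    m = BDF_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a BDF address: {text!r}")
    domain, bus, device, function = m.groups()
    parsed = Bdf(
        int(domain or "0", 16),  # domain defaults to 0 when omitted
        int(bus, 16),
        int(device, 16),
        int(function, 16),
    )
    if parsed.device > 0x1F:
        raise ValueError(f"device number out of range: {text!r}")
    return parsed


if __name__ == "__main__":
    host = parse_bdf("03:00.0")   # address seen in dom0
    guest = parse_bdf("00:05.0")  # address seen in the VM
    print(host)
    print(guest)
    print("same slot:", host == guest)
```

In this thread the two differ in both bus (3 vs 0) and device (0 vs 5), which is consistent with the guest seeing an emulated topology rather than the physical one.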
Was just troubleshooting some more and started digging into some of the source code for the mic utilities to try to see how they determine whether there are any cards present. It looks like the micinfo tool is basically just checking /sys/class/mic/ for the presence of mic0, mic1, etc. but on my VM, the only things in that directory are a ctrl and scif softlink, no mic0.
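The check described above can be reproduced outside the tools with a few lines of Python. This is only a sketch of my reading of the behavior (look for entries named mic0, mic1, ... in the sysfs directory), not the actual micinfo code; list_mic_devices is a made-up name, and the demo uses a temporary directory that mimics what the VM shows.

```python
import os
import re
import tempfile

MIC_NAME_RE = re.compile(r"^mic\d+$")


def list_mic_devices(sysfs_dir="/sys/class/mic"):
    """Return entries named mic0, mic1, ... in sysfs_dir, in numeric order."""
    try:
        entries = os.listdir(sysfs_dir)
    except FileNotFoundError:
        # Directory is absent, e.g. the mic module is not loaded.
        return []
    cards = [name for name in entries if MIC_NAME_RE.match(name)]
    return sorted(cards, key=lambda name: int(name[3:]))


if __name__ == "__main__":
    # Mimic the VM in this thread: only "ctrl" and "scif" are present.
    with tempfile.TemporaryDirectory() as fake_sysfs:
        for name in ("ctrl", "scif"):
            open(os.path.join(fake_sysfs, name), "w").close()
        print(list_mic_devices(fake_sysfs))  # prints []
    # On a real system, call list_mic_devices() with no argument.
```

An empty result with the module loaded would match the "No MIC cards found" message, and would point at the driver's probe step rather than at the user-space utilities.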
I'm assuming that the contents of the sysfs directory are populated by the mic kernel module, is that correct? If so, any idea how the mic driver determines what cards are present? I'm starting to pick through the source of mic.ko, but I don't have a lot of driver code experience to back me up.
Any help is greatly appreciated.
Although the device is assigned to the guest, its BDF is emulated by qemu and is not the same as the real device's BDF.
Can you try booting the guest with 2GB of memory, check the NIC device, and attach the guest dmesg?
Xudong,
I apologize for not getting back to you sooner.
I have tried a few different configurations and found that the VM will not boot in some of them with 2GB of RAM. The configurations are detailed below:
- 2GB of RAM, with Xeon Phi passthrough, with NIC enabled: Does not boot
- 2GB of RAM, without Xeon Phi, with NIC: Boots fine (dmesg output attached)
- 10GB of RAM, with Xeon Phi, with NIC: Boots fine (dmesg output attached)
- 2GB of RAM, with Xeon Phi, without NIC: Does not boot
I'm not sure why the VM will not boot with only 2GB of RAM with the Phi card installed. Please let me know if there's any more information I can get for you.
I really appreciate all the help you've provided.
