
bootup Phi in Xen/Dom0 failed with kernel crash

Martin_C_1
Beginner
827 Views
Hello.

I have an unusual (and unsupported) config:
- CentOS 5.10
- kernel 3.10 (patched by Citrix - XenServer Creedence Release Candidate)
- Xen from git
- mpss-3.4.2 (some *.rpm installed without dependencies), kernel modules rebuilt from mpss-modules-3.4.2-1.src.rpm, libraries and base management compiled from sources (mpss-micmgmt-3.4.2.tar.bz2, mpss-daemon-3.4.2.tar.bz2, mpss-metadata-3.4.2.tar.bz2)

If I boot WITHOUT Xen, all is OK (Phi boots, ssh works). If I boot WITH Xen and try to boot the Phi (e.g. "mpss.redhat start"), the Dom0 kernel crashes (see below).

Is Phi tested in Dom0? (Not in DomU - https://software.intel.com/en-us/articles/getting-xen-working-for-intelr-xeon-phitm-coprocessor)

Thanks for any answers and happy new year,
Martin

PS: Is there some script (or .srpm/.spec) for mpss-src-3.4.2.tar to get .rpm?

---------------------------------------------------------
Dec 31 15:28:39 xen kernel: [ 671.595539] mic0: Transition from state ready to booting
Dec 31 15:28:39 xen kernel: [ 671.617345] mic image: /usr/share/mpss/boot/bzImage-knightscorner
Dec 31 15:28:39 xen kernel: [ 671.638972] MIC 0 Booting
Dec 31 15:28:48 xen kernel: [ 681.084473] Waiting for MIC 0 boot 5
Dec 31 15:28:53 xen kernel: [ 686.105140] Waiting for MIC 0 boot 10
Dec 31 15:28:58 xen kernel: [ 691.125740] Waiting for MIC 0 boot 15
Dec 31 15:28:58 xen kernel: [ 691.216977] Unknown message 0x0n scifdev->sd_state 0x1 scifdev->sd_node 0x1
Dec 31 15:28:58 xen kernel: [ 691.238007] ------------[ cut here ]------------
Dec 31 15:28:58 xen kernel: [ 691.257871] kernel BUG at /root/rpmbuild/BUILD/mpss-modules-3.4.2/micscif/micscif_nodeqp.c:2377!
Dec 31 15:28:58 xen kernel: [ 691.297410] invalid opcode: 0000 [#1] SMP
Dec 31 15:28:59 xen kernel: [ 691.316850] Modules linked in: dm_round_robin cls_u32 sch_htb lockd sunrpc bridge stp llc ib_iser(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh rdma_ucm(O) rdma_cm(O) iw_cm(O) ib_addr(O) ipv6 ib_ipoib(O) ib_cm(O) ib_uverbs(O) ib_umad(O) ocrdma(O) be2net(O) mlx4_en(O) mlx4_ib(O) ib_sa(O) mlx4_core(O) ib_mthca(O) ib_mad(O) ib_core(O) video backlight sbs sbshc hed acpi_ipmi ipmi_msghandler nvram mic(O) sg hid_generic usbhid hid nvidia(PO) mxm_wmi compat(O) igb i2c_algo_bit ptp pps_core wmi tpm_tis tpm tpm_bios i2c_i801 ehci_pci lpc_ich mfd_core crc32_pclmul shpchp dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage isci libsas scsi_transport_sas ahci libahci libata sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd [last unloaded: mlx4_core]
Dec 31 15:28:59 xen kernel: [ 691.511812] CPU: 2 PID: 3878 Comm: kworker/u48:11 Tainted: P W O 3.10.0+2 #1
Dec 31 15:28:59 xen kernel: [ 691.549268] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
Dec 31 15:28:59 xen kernel: [ 691.568281] Workqueue: SCIF INTR 1 micscif_intr_bh_handler [mic]
Dec 31 15:28:59 xen kernel: [ 691.587080] task: ffff880054d42e20 ti: ffff8800408d4000 task.ti: ffff8800408d4000
Dec 31 15:28:59 xen kernel: [ 691.623894] RIP: e030:[] [] scif_msg_unknown+0x20/0x30 [mic]
Dec 31 15:28:59 xen kernel: [ 691.660942] RSP: e02b:ffff8800408d5cd8 EFLAGS: 00010296
Dec 31 15:28:59 xen kernel: [ 691.679537] RAX: 000000000000003f RBX: ffff8800408d5d78 RCX: ffff88007d64f9f0
Dec 31 15:28:59 xen kernel: [ 691.698391] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800408d0218
Dec 31 15:28:59 xen kernel: [ 691.717089] RBP: ffff8800408d5cd8 R08: 0000000000000000 R09: ffff88005e26d220
Dec 31 15:28:59 xen kernel: [ 691.735795] R10: 0000000000000000 R11: ffffffffffffffff R12: ffffffffa0d1c330
Dec 31 15:28:59 xen kernel: [ 691.754872] R13: ffff88010200ac00 R14: ffff88005897c600 R15: ffff88005897c648
Dec 31 15:28:59 xen kernel: [ 691.774301] FS: 0000000000000000(0000) GS:ffff88007d640000(0000) knlGS:ffff88007d640000
Dec 31 15:28:59 xen kernel: [ 691.812588] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 31 15:28:59 xen kernel: [ 691.831861] CR2: ffffffffff600000 CR3: 0000000045081000 CR4: 0000000000042660
Dec 31 15:28:59 xen kernel: [ 691.851097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 31 15:28:59 xen kernel: [ 691.870225] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 31 15:28:59 xen kernel: [ 691.888861] Stack:
Dec 31 15:28:59 xen kernel: [ 691.906857] ffff8800408d5cf8 ffffffffa0cf25b7 0000000000000000 ffffffffa0d1c330
Dec 31 15:28:59 xen kernel: [ 691.943284] ffff8800408d5e08 ffffffffa0cf44b2 0000000100000002 0000000000000000
Dec 31 15:28:59 xen kernel: [ 691.979591] ffff8800408d5d48 ffffffffa0d1c4c8 ffffffffa0d1c4c8 ffffffffa0d1c4c8
Dec 31 15:28:59 xen kernel: [ 692.015812] Call Trace:
Dec 31 15:28:59 xen kernel: [ 692.033264] [] micscif_nodeqp_msg_handler+0x57/0x60 [mic]
Dec 31 15:28:59 xen kernel: [ 692.051271] [] micscif_nodeqp_intrhandler+0x2a2/0x410 [mic]
Dec 31 15:28:59 xen kernel: [ 692.085436] [] ? __schedule+0x75f/0x800
Dec 31 15:28:59 xen kernel: [ 692.102662] [] ? __queue_delayed_work+0x128/0x140
Dec 31 15:28:59 xen kernel: [ 692.119776] [] micscif_intr_bh_handler+0x5b/0x70 [mic]
Dec 31 15:28:59 xen kernel: [ 692.136769] [] process_one_work+0x238/0x390
Dec 31 15:28:59 xen kernel: [ 692.153386] [] worker_thread+0x1d9/0x2c0
Dec 31 15:28:59 xen kernel: [ 692.169590] [] ? manage_workers+0x1f0/0x1f0
Dec 31 15:28:59 xen kernel: [ 692.185457] [] kthread+0xc3/0xd0
Dec 31 15:28:59 xen kernel: [ 692.201095] [] ? xen_end_context_switch+0x1e/0x30
Dec 31 15:28:59 xen kernel: [ 692.216679] [] ? flush_kthread_worker+0xd0/0xd0
Dec 31 15:28:59 xen kernel: [ 692.232294] [] ret_from_fork+0x7c/0xb0
Dec 31 15:29:00 xen kernel: [ 692.247980] [] ? flush_kthread_worker+0xd0/0xd0
Dec 31 15:29:00 xen kernel: [ 692.263614] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 0f b7 0f 8b 57 04 31 c0 8b 76 08 48 c7 c7 88 04 d1 a0 e8 20 32 36 e0 <0f> 0b eb fe 66 0f 1f 44 00 00 66 0f 1f 44 00 00 55 48 89 e5 48
Dec 31 15:29:00 xen kernel: [ 692.311953] RIP [] scif_msg_unknown+0x20/0x30 [mic]
Dec 31 15:29:00 xen kernel: [ 692.327995] RSP
Dec 31 15:29:00 xen kernel: [ 692.370751] ---[ end trace 0fc9b14212e11d7f ]---
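
Editorial note: for triage, the useful facts in a trace like the one above are the `kernel BUG at` location and the call-trace frames that are not marked `?` (the `?` frames are ones the kernel's unwinder was unsure about). The following is a minimal sketch in Python that pulls those out of pasted log text. The function name `summarize_oops` and the regular expressions are illustrative assumptions, not part of MPSS or any kernel tooling, and they only target the log layout seen in this post (where the bracketed addresses arrived empty).

```python
import re

# "kernel BUG at <file>:<line>!" as printed by the BUG() macro.
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")

# A frame such as "micscif_nodeqp_msg_handler+0x57/0x60 [mic]".
# A leading "? " marks a frame the unwinder was not sure about.
FRAME_RE = re.compile(
    r"(?P<unsure>\? )?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def summarize_oops(text):
    """Return the BUG location and the reliable (symbol, module) frames."""
    bug = BUG_RE.search(text)
    frames = []
    for m in FRAME_RE.finditer(text):
        if m.group("unsure"):
            continue  # skip "? symbol+off/len" entries
        frames.append((m.group("sym"), m.group("mod")))
    return {
        "bug_file": bug.group("path") if bug else None,
        "bug_line": int(bug.group("line")) if bug else None,
        "frames": frames,
    }
```

Run against the log in this post, such a helper would point at `micscif_nodeqp.c:2377` and show that the crashing frames all belong to the `mic` module, entered from the SCIF interrupt workqueue.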
2 Replies
Frances_R_Intel
Employee

I am far from a Xen expert (in fact, I am far from a Xen anything) but I don't think booting the coprocessor from Dom0 is what you want to do. Xen, when used with the coprocessor, does not divide the coprocessor between domains; it allows you to have a DomU that has a full coprocessor assigned to it. (I think if you have more than one card, you can have different DomU's each with their own card. This would make sense but, as I said, I am no Xen expert.) If the coprocessor is assigned to a DomU for its sole use, then the DomU should have complete control of the coprocessor, including booting it. Does this make sense, or have I, in my naivete, said something that makes no sense? Is there a scenario in which the Dom0 should have ultimate control of the coprocessor, with or without a DomU, given that a DomU is given complete control over a coprocessor?
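
Editorial note: since the question turns on whether MPSS is being started in Dom0 or in a DomU, it can help to confirm which role the running kernel actually has before starting the service. The sketch below is in Python and assumes the usual Xen-on-Linux conventions: `/sys/hypervisor/type` reads `xen` under Xen, and `/proc/xen/capabilities` contains `control_d` only in the control domain. The function names are hypothetical, and an HVM guest without Xen platform drivers may not expose these files at all.

```python
from pathlib import Path


def xen_role(hypervisor_type, capabilities):
    """Classify the running kernel from the contents of two Xen files.

    hypervisor_type: contents of /sys/hypervisor/type ("" if absent)
    capabilities:    contents of /proc/xen/capabilities ("" if absent)
    """
    if hypervisor_type.strip() != "xen":
        return "not-xen"
    # Assumption: only the control domain advertises "control_d".
    return "dom0" if "control_d" in capabilities else "domU"


def read_text_or_empty(path):
    """Read a small file, treating a missing or unreadable file as empty."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    role = xen_role(
        read_text_or_empty("/sys/hypervisor/type"),
        read_text_or_empty("/proc/xen/capabilities"),
    )
    print(role)
```

A startup wrapper could refuse to run "mpss.redhat start" when this reports `dom0`, given that only the DomU setup is described in the linked article.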

And, as to the other issues -

There shouldn't be any unsatisfied dependencies when you install MPSS. Do you have a list of what was "missing"? Were they things that were installed but had no entry in the RPM database of installed packages?

And I will ask the developers if they can pass on the spec files for the source code in mpss-src-3.4.2.tar.

Martin_C_1
Beginner
Hello in 2015.

Dom0 - I had a job for the Phi in Dom0: I need to encode many VDI screens to JPEGs. But the Phi seems to be totally unusable for this workload :-( (https://software.intel.com/en-us/forums/topic/537804)

Dependencies - Intel MPSS dropped support for CentOS 5; e.g. the binaries and libs require GLIBCXX versions 3.4.9 + 3.4.11 ... and GLIBC versions 2.7 + 2.9 ..., but CentOS 5 has only GLIBCXX_3.4.8 and GLIBC_2.5. I am not able to leave this old system because of the Nvidia binary drivers.

Thanks for the answers,
Martin
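
Editorial note: the dependency problem described here is a symbol-version mismatch, and it can be checked mechanically. The following is a minimal sketch in Python, assuming version tags of the form `NAME_x.y[.z]` as quoted in the post; the function names are hypothetical. It also assumes that a library's newest tag implies all older ones, which is how glibc and libstdc++ versions normally accumulate, so it compares each requirement against the highest tag the system provides.

```python
def parse_version(tag):
    """Split 'GLIBCXX_3.4.11' into ('GLIBCXX', (3, 4, 11))."""
    name, _, ver = tag.rpartition("_")
    return name, tuple(int(part) for part in ver.split("."))


def missing_versions(required, provided):
    """Return the required tags that the provided tags cannot satisfy."""
    newest = {}
    for tag in provided:
        name, ver = parse_version(tag)
        # Keep only the highest version seen for each family.
        newest[name] = max(newest.get(name, ver), ver)
    missing = []
    for tag in required:
        name, ver = parse_version(tag)
        if name not in newest or ver > newest[name]:
            missing.append(tag)
    return missing


# Values quoted in the post above.
required = ["GLIBCXX_3.4.9", "GLIBCXX_3.4.11", "GLIBC_2.7", "GLIBC_2.9"]
centos5 = ["GLIBCXX_3.4.8", "GLIBC_2.5"]
print(missing_versions(required, centos5))
```

With the values from the post, every required tag is newer than what CentOS 5 ships, so all four are reported as missing. Non-numeric tags such as `GLIBC_PRIVATE` are outside what this sketch handles.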