Software Archive
Read-only legacy content

Trouble installing Xeon Phi 7120 coprocessor on CentOS 7.3

GAl-G
Beginner

Hi all,

We purchased two Intel Xeon Phi 7120P co-processors and are attempting to set one of them up with a CentOS 7.3 machine. 

Disclaimer: I don't know what hardware stepping or firmware these cards have. We were told they were an early stepping and do not come with logos.

First steps: Insert the card, connect both power cables, and verify the card appears in lspci:

     01:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev 10)
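To script the lspci check, here is a minimal Python sketch that picks Xeon Phi coprocessors out of plain `lspci` output. It assumes the default one-line-per-device format shown above (`slot class: description`). The function name `xeon_phi_devices` is hypothetical.

```python
import subprocess


def xeon_phi_devices(lspci_output):
    """Return (slot, description) pairs for Xeon Phi coprocessors.

    Assumes plain `lspci` lines of the form 'slot class: description'.
    """
    found = []
    for line in lspci_output.splitlines():
        slot, _, rest = line.strip().partition(" ")
        cls, sep, desc = rest.partition(": ")
        # Keep only devices whose class is "Co-processor" and whose
        # description mentions Xeon Phi.
        if sep and cls == "Co-processor" and "Xeon Phi" in desc:
            found.append((slot, desc))
    return found


if __name__ == "__main__":
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout
    for slot, desc in xeon_phi_devices(out):
        print(slot, desc)
```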

The very first steps seemed to go a bit wrong with MPSS 3.8 (the only version we tried):

cp: cannot stat ‘./modules/*3.10.0-514.2.2.el7.x86_64*.rpm’: No such file or directory

This is odd. I manually copied the file "mpss-modules-dev-3.10.0-514.el7.x86_64-3.8-1.x86_64.rpm" to the main directory and continued with the installation:
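The two kernel strings differ. The `cp` glob asks for `3.10.0-514.2.2.el7.x86_64`, which looks like the running kernel release (what `uname -r` prints), while the copied module RPM names `3.10.0-514.el7.x86_64`, the base CentOS 7.3 kernel. A minimal sketch to compare the two, assuming the RPM naming pattern seen here (`mpss-modules[-dev]-<kernel release>-<mpss version>-<rpm release>.x86_64.rpm`); the helper names are hypothetical:

```python
import platform
import re

# Assumed naming pattern (taken from the file name in this thread):
# mpss-modules[-dev]-<kernel release>-<mpss version>-<rpm release>.x86_64.rpm
MODULE_RPM = re.compile(r"^mpss-modules(?:-dev)?-(.+?\.x86_64)-")


def built_for_kernel(rpm_name):
    """Return the kernel release embedded in an MPSS module RPM name, or None."""
    m = MODULE_RPM.match(rpm_name)
    return m.group(1) if m else None


def matches_running_kernel(rpm_name, running_release):
    """True only if the RPM was built for exactly this kernel release."""
    return built_for_kernel(rpm_name) == running_release


if __name__ == "__main__":
    rpm = "mpss-modules-dev-3.10.0-514.el7.x86_64-3.8-1.x86_64.rpm"
    running = platform.release()  # same string as `uname -r` on Linux
    print(built_for_kernel(rpm), running, matches_running_kernel(rpm, running))
```

With the values in this post, the RPM reports `3.10.0-514.el7.x86_64`, which is not equal to `3.10.0-514.2.2.el7.x86_64`. That would be consistent with the install step finding no module package for the running kernel.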

$ sudo yum install *.rpm

Total size: 697 M
Installed size: 697 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test


Transaction check error:
  file /usr/bin/coitrace from install of mpss-coi-3.8-1.glibc2.12.x86_64 conflicts with file from package xppsl-coi-1.3.3-151.x86_64
  file /usr/bin/micnativeloadex from install of mpss-coi-3.8-1.glibc2.12.x86_64 conflicts with file from package xppsl-coi-1.3.3-151.x86_64
  file /usr/lib64/libcoi_host.so.0 from install of mpss-coi-3.8-1.glibc2.12.x86_64 conflicts with file from package xppsl-coi-1.3.3-151.x86_64
  file /usr/lib64/libcoitracelib.so.0 from install of mpss-coi-3.8-1.glibc2.12.x86_64 conflicts with file from package xppsl-coi-1.3.3-151.x86_64

Error Summary
-------------
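All four conflicts above point at a single installed package, xppsl-coi. For longer yum transaction errors, a small sketch can summarize which installed packages block which files. It assumes the exact message wording shown above, and `conflicting_packages` is a hypothetical name:

```python
import re
import sys

# Assumes yum's wording as shown in the transaction check error above.
CONFLICT = re.compile(
    r"file (?P<path>\S+) from install of (?P<new>\S+) "
    r"conflicts with file from package (?P<old>\S+)"
)


def conflicting_packages(yum_output):
    """Map each already-installed package to the file paths it blocks."""
    blockers = {}
    for line in yum_output.splitlines():
        m = CONFLICT.search(line)
        if m:
            blockers.setdefault(m.group("old"), []).append(m.group("path"))
    return blockers


if __name__ == "__main__":
    # Usage: pipe the saved yum output into this script on stdin.
    for pkg, paths in conflicting_packages(sys.stdin.read()).items():
        print(pkg)
        for path in paths:
            print("   ", path)
```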

Needless to say, this is also strange. I had never installed MPSS before, and searching for it turns up nothing. However, searching for "xppsl" returns this:

$ rpm -qa | grep xppsl
xppsl-systools-sb-1.3.3-151.x86_64
xppsl-hwloc-gui-1.11.2-1.3.3.151.x86_64
xppsl-memkind-1.3.3-151.x86_64
xppsl-memkind-devel-1.3.3-151.x86_64
xppsl-hwloc-sbin-1.11.2-1.3.3.151.x86_64
xppsl-hwloc-libs-1.11.2-1.3.3.151.x86_64
xppsl-micperf-1.3.3-151.x86_64
kernel-3.10.0-327.13.1.el7.xppsl_1.3.3.151.x86_64
xppsl-coi-1.3.3-151.x86_64
xppsl-coi-device-1.3.3-151.x86_64
kernel-devel-3.10.0-327.13.1.el7.xppsl_1.3.3.151.x86_64
xppsl-hwloc-1.11.2-1.3.3.151.x86_64
xppsl-hwloc-devel-1.11.2-1.3.3.151.x86_64
xppsl-cpuid-1.3.3-151.x86_64
xppsl-threadrunner-1.3.3-151.x86_64
xppsl-mcelog-1.3.3-151.x86_64

 

These errors did not appear in a yum check prior to installation. I am the only administrator of this system. 

Disclaimer 2: I have devtoolset-3 enabled, an external repo for gcc and binutils. I highly doubt this is the issue here. I don't know why I have the xppsl packages or where they came from (they appear to have been installed automatically with the OS), but I am hesitant to remove them without checking with you first.

Further googling of this mystery found this page: https://software.intel.com/en-us/articles/xeon-phi-software#lx1-5rel 

That link has nothing to do with the 7120P and seems instead to pertain to my main system processor (Intel 7210). What's going on here? What should I do?

Thanks!

Gabe

JJK
New Contributor III

Leave it up to Intel Marketing to come up with confusing product names ;)

You've got a first-generation Xeon Phi 7120. For this card you must install only the mpss software stack, and you cannot also install the xppsl software. Remove all xppsl RPMs and reinstall the mpss 3.8 software stack.

The xppsl packages (and the devtoolset-3 stuff) are for the second generation of Xeon Phi, with product numbers such as 7210.

 

GAl-G
Beginner

But my server processor is indeed called "7210" and is a "Knights Landing" standalone server CPU. Removing xppsl resulted in a kernel panic and CentOS would not boot. :)

Couldn't Intel simply rename the conflicting files? It obviously makes sense to add Phi nodes to a Phi system (and we were told, sensibly, that any server with a modern CPU and a BIOS that supports 64-bit PCI-E memory mapping would work with the Phi add-on card). My server has no problems with anything else, including wonderful NVIDIA GTX cards, all sorts of dongles, a 4K monitor, etc. Isn't it strange that the most natural expansion of all (a Phi co-processor) would "conflict" like this?

JJK
New Contributor III

Ah, you're attempting to do what I still want to try with my 7210 machine :)

However, my local Intel rep once told me that this is not supported at all and that they've even included some code to prevent people from doing this. You'd have to talk to Intel to get an answer on the "why" question.

One thing you could try is to run a stock CentOS 7.3 (or Red Hat 7.3) kernel on the 7210, then remove all xppsl packages (this should be possible), then install the mpss 3.8 software stack. This is the approach I envisage taking when I try it with my 7210 machine.
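The ordering matters: first boot a stock kernel, then remove xppsl. A minimal sketch encodes that as a check. It assumes the installed list comes from `rpm -qa` and that an xppsl kernel carries "xppsl" in its release string, as `kernel-3.10.0-327.13.1.el7.xppsl_1.3.3.151.x86_64` does above. The function and script names are hypothetical:

```python
import platform
import sys


def xppsl_removal_plan(installed, running_release):
    """Return the xppsl packages to remove, sorted.

    installed: package strings as printed by `rpm -qa`.
    running_release: kernel release as printed by `uname -r`.
    Raises RuntimeError if the running kernel is itself an xppsl build,
    because its own package would then be on the removal list.
    """
    if "xppsl" in running_release:
        raise RuntimeError(
            f"running kernel {running_release} is an xppsl build; "
            "boot a stock CentOS kernel before removing xppsl packages"
        )
    return sorted(p for p in installed if "xppsl" in p)


if __name__ == "__main__":
    # Usage (hypothetical script name): rpm -qa | python3 xppsl_plan.py
    for pkg in xppsl_removal_plan(sys.stdin.read().split(), platform.release()):
        print(pkg)
```

The printed names could then be reviewed and passed to `yum remove`. The script only prints a list and changes nothing on the system.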

 

GAl-G
Beginner

Haha, yep, seems a lot of people want to try it!

Interesting suggestion about using an OS that's unaware that it's running on a 7210. I have a few concerns, though: my understanding is that xppsl allows the processor to enter turbo mode and use dynamic voltage control, and that without it the processor can do neither and becomes more likely to throttle. I also thought some kernel-level tweaks were necessary for it to talk to the MCDRAM in certain modes (which ones? HBM-as-cache?).

If these fears are unfounded and the processor works just fine without xppsl, then absolutely. Can anyone weigh in on what xppsl is specifically used for from a processing-performance standpoint? (I don't need extra goodies, benchmarks, perf monitors, or even HBM-specific memory control; I just need to retain the chip's performance.)
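One way to check empirically after booting a stock kernel is to read the standard Linux sysfs files. These show whether a cpufreq scaling driver is active and whether MCDRAM is exposed as CPU-less NUMA nodes, which is how it typically appears in flat or hybrid memory mode. It will not appear this way in cache mode. A minimal sketch, assuming the usual `/sys/devices/system/cpu` and `/sys/devices/system/node` layout. It does not answer what xppsl adds beyond this, and the function names are hypothetical:

```python
from pathlib import Path


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it is absent."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None


def cpuless_nodes(cpulists):
    """Given {node name: cpulist string}, return the nodes that have no CPUs."""
    return sorted(name for name, cpus in cpulists.items() if not cpus)


def report(root="/sys"):
    cpu0 = Path(root, "devices/system/cpu/cpu0/cpufreq")
    nodes = {
        n.name: read_sysfs(n / "cpulist") or ""
        for n in Path(root, "devices/system/node").glob("node[0-9]*")
    }
    return {
        "cpufreq_driver": read_sysfs(cpu0 / "scaling_driver"),
        "cpuinfo_max_freq_khz": read_sysfs(cpu0 / "cpuinfo_max_freq"),
        "cpuless_numa_nodes": cpuless_nodes(nodes),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

Running it once under the xppsl kernel and once under a stock kernel would give a side-by-side comparison on the same machine.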

But again, great potential hack. Think I should try booting into safe mode and using yum to remove just xppsl-coi first?
