Hi,
I can't get my MIC card working. I followed the instructions in the readme to install MPSS on a fresh CentOS 6.3 install (which should be essentially the same as RHEL 6.3). The errors I get are not consistent, which makes debugging hard, but right now, starting MPSS with
# service mpss start
fails with this in /var/log/mpssd:
Mon Mar 25 12:10:48 2013: mic0: log_buf_addr: ffffffff832332d0
Mon Mar 25 12:10:48 2013: mic0: log_buf_len: ffffffff81724c70
Mon Mar 25 12:10:48 2013: mic0: Current state "reset failed" cannot boot card
Mon Mar 25 12:10:50 2013: Wait for download requests
The output of miccheck doesn't look good either:
[root@semperphi ~]# /opt/intel/mic/bin/miccheck
miccheck 2.1.5889-14, created 18:10:54 Feb 28 2013
Copyright 2011-2013 Intel Corporation All rights reserved
Test 1 Ensure installation matches manifest : OK
Test 2 Ensure host driver is loaded : OK
Test 3 Ensure driver matches manifest : OK
Test 4 Detect all listed devices : OK
MIC 0 Test 1 Find the device : OK
MIC 0 Test 2 Check the POST code via PCI : FAILED
MIC 0 Test 2> Current POST code is �� (not FF) for MIC 0
MIC 0 Test 3 Connect to the device : SKIPPED
MIC 0 Test 3> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test 3> The device is not online
MIC 0 Test 4 Check for normal mode : SKIPPED
MIC 0 Test 4> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test 4> The device is not online
MIC 0 Test 5 Check the POST code via SCIF : SKIPPED
MIC 0 Test 5> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test 5> The device is not online
MIC 0 Test 6 Send data to the device : SKIPPED
MIC 0 Test 6> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 6> The device is not in normal mode
MIC 0 Test 7 Compare the PCI configuration : OK
MIC 0 Test 8 Ensure Flash version matches manifest : SKIPPED
MIC 0 Test 8> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 8> The device is not in normal mode
Status: The POST code was not "FF"
The output of micinfo:
[root@semperphi ~]# /opt/intel/mic/bin/micinfo
MicInfo Utility Log
Created Mon Mar 25 12:13:53 2013
System Info
HOST OS : Linux
OS Version : 2.6.32-279.el6.x86_64
Driver Version : 5889-14
MPSS Version : 2.1.5889-14
Host Physical Memory : 16300 MB
Device No: 0, Device Name: mic0
Version
Flash Version : NotAvailable
SMC Boot Loader Version : NotAvailable
uOS Version : NotAvailable
Device Serial Number : NotAvailable
Board
Vendor ID : ffff
Device ID : ffff
Subsystem ID : ffff
Coprocessor Stepping ID : f
PCIe Width : x63
PCIe Speed : Unknown
PCIe Max payload size : 16384 bytes
PCIe Max read req size : 16384 bytes
Coprocessor Model : 0x0f
Coprocessor Model Ext : 0x0f
Coprocessor Type : 0x03
Coprocessor Family : 0x0f
Coprocessor Family Ext : 0x0ff
Coprocessor Stepping : B1
Board SKU : NotAvailable
ECC Mode : NotAvailable
SMC HW Revision : NotAvailable
Cores
Total No of Active Cores : NotAvailable
Voltage : NotAvailable
Frequency : NotAvailable
Thermal
Fan Speed Control : NotAvailable
SMC Firmware Version : NotAvailable
FSC Strap : NotAvailable
Fan RPM : NotAvailable
Fan PWM : NotAvailable
Die Temp : NotAvailable
GDDR
GDDR Vendor : NotAvailable
GDDR Version : NotAvailable
GDDR Density : NotAvailable
GDDR Size : NotAvailable
GDDR Technology : NotAvailable
GDDR Speed : NotAvailable
GDDR Frequency : NotAvailable
GDDR Voltage : NotAvailable
Do you have any ideas about what steps I can take to start debugging this?
Simon
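Every Board field in the micinfo output above is consistent with register reads that returned all ones: Vendor ID, Device ID and Subsystem ID are ffff and the stepping ID is f. The odd PCIe values fit too, if the fields are decoded from 0xffff per the PCIe specification: the negotiated link width is the 6-bit field at bits 9:4 of the Link Status register (all ones gives 63, hence "x63"), and Max_Payload_Size and Max_Read_Request_Size are 3-bit fields at bits 7:5 and 14:12 of the Device Control register, each encoding 128 << n bytes (all ones gives 16384). PCI configuration reads that get no valid response from the device return all ones, so this pattern suggests the card was not answering on the bus when micinfo ran. The sketch below reproduces the arithmetic; it assumes micinfo decodes these fields straight from the standard registers, and the function names are hypothetical.

```shell
# Decode PCIe register fields from a 16-bit value such as 0xffff (an all-ones read).
# Bit positions follow the PCIe specification; that micinfo decodes them
# the same way is an assumption.

# Link Status register: bits 9:4 hold the negotiated link width.
link_width() {
    echo $(( ($1 >> 4) & 0x3f ))
}

# Device Control register: Max_Payload_Size is bits 7:5.
# The 3-bit field n encodes a size of 128 << n bytes.
max_payload_bytes() {
    echo $(( 128 << (($1 >> 5) & 0x7) ))
}

# Device Control register: Max_Read_Request_Size is bits 14:12, same encoding.
max_read_req_bytes() {
    echo $(( 128 << (($1 >> 12) & 0x7) ))
}
```

For example, `link_width 0xffff` prints 63 and `max_payload_bytes 0xffff` prints 16384, matching the "x63" and "16384 bytes" lines, while a constructed Link Status value such as 0x0103 decodes to a width of 16.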
Just a quick thought to start with. Do you have large BAR (Base Address Register) support enabled in your BIOS? It must allow BARs to be mapped above 4 gigabytes.
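To check whether the BIOS setting actually took effect, one option is to look at where the coprocessor's BARs were mapped in `lspci -v` output: a BAR mapped above 4 GB shows a "Memory at" address larger than ffffffff. A minimal sketch, assuming lspci's usual "Region N: Memory at <hex> ..." line format; the helper name `bar_above_4g` is hypothetical.

```shell
# Succeed if an lspci "Memory at <hex>" line shows a BAR at or above 4 GiB.
# Return 1 if the BAR is below 4 GiB, and 2 if the line carries no hex address
# (for example "Memory at <unassigned>").
bar_above_4g() {
    addr=$(printf '%s\n' "$1" | sed -n 's/.*Memory at \([0-9a-fA-F][0-9a-fA-F]*\).*/\1/p')
    [ -n "$addr" ] || return 2
    [ $(( 0x$addr >= 0x100000000 )) -eq 1 ]
}
```

For example, find the coprocessor's slot in plain `lspci` output, then run `lspci -v -s <slot> | grep 'Memory at'` and pass each line to `bar_above_4g`. A return value of 2 on a "Memory at <unassigned>" line would itself hint that the BAR was never assigned.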
Good question; I forgot to mention it. I enabled the following option in the PCIe section of the BIOS: "Above 4G Decoding (Available if the system supports 64-bit PCI decoding)". It looks like the same thing under a different name.
Could you try rebooting your system, or power cycling it if you are using any kind of virtual machine? I suspect that your problem will go away.
Sometimes when a new MPSS is installed (particularly if everything isn't cleanly shut down before the old MPSS is uninstalled and the new one installed), it takes a couple of reboots to shake things out. Looking at your information again, I think this is the best fit for your symptoms.
Not only does 5889 require host reboots at more points than the instructions mention; after a host reboot on my box, I also have to restart the network service even though it is shown as running.
Not sure if the OP got this resolved. I faced a similar issue, and it was caused by the mpssflash/mpssinfo tools not being installed properly. Installing them and making sure they were on the PATH got me past this error.
I rebooted the host several times; it doesn't help.
I am not sure how the MPSS tools could have been installed improperly. I followed the installation procedure to the letter, and they are in /opt/intel/mic/bin, which is in my PATH.
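A quick way to confirm both points from the shell is to check that /opt/intel/mic/bin is an entry of PATH and that each tool resolves. This matters for the shell that actually starts MPSS: `sudo` on RHEL/CentOS typically resets PATH to its `secure_path`, so a directory on your own PATH may not be on root's. A minimal sketch; the helper names are hypothetical, and the tool names to check are taken from this thread.

```shell
# Report where each named command resolves on PATH; fail if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if loc=$(command -v "$tool"); then
            printf '%s -> %s\n' "$tool" "$loc"
        else
            printf '%s: not found on PATH\n' "$tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Succeed if directory $1 is an entry of the colon-separated list $2 (default: $PATH).
dir_in_path() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, run `dir_in_path /opt/intel/mic/bin && check_tools miccheck micinfo` both as your user and as root.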
We can submit this to our internal teams for investigation, but we need a couple more things from you:
1) The host kernel log after MPSS shows “reset failed”.
2) Can you confirm that you successfully updated the flash/SMC of the card during installation?
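To pull just the relevant kernel messages rather than the whole log, a filter like the following may help. It assumes the host writes kernel messages to /var/log/messages (the CentOS 6 default) and that the driver's messages look like the mic0 lines quoted later in this thread; other MPSS versions may word them differently, and the helper name is hypothetical.

```shell
# Filter syslog or dmesg text on stdin down to one card's state transitions
# and reset failures. $1 is the card number (default 0).
mic_state_lines() {
    n=${1:-0}
    grep -E "mic${n}: (Transition|Resetting)|MIC ${n} RESETFAIL"
}
```

For example: `mic_state_lines 0 < /var/log/messages`, or `dmesg | mic_state_lines 0`.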
> 1) the host kernel log after mpss shows “reset failed”
After a failed "service mpss start", these are the relevant entries in the system log:
Apr 3 07:25:58 semperphi kernel: mic0: Transition from state reset failed to resetting
Apr 3 07:26:00 semperphi kernel: mic0: Resetting (Post Code ��)
Apr 3 07:26:00 semperphi kernel: mic0: Transition from state resetting to reset failed
Apr 3 07:26:00 semperphi kernel: MIC 0 RESETFAIL postcode �� -1
I guess the �� characters that show up are values that were written directly as binary and not converted to text before reaching the log. In hex, they are "ef bf bd ef bf bd". Here are the relevant entries in /var/log/mpssd:
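The bytes ef bf bd are the UTF-8 encoding of U+FFFD, the Unicode REPLACEMENT CHARACTER, so "ef bf bd ef bf bd" is two replacement characters rather than the raw POST code bytes. Somewhere between the driver and the text being inspected, two bytes that were not valid UTF-8 were replaced, and their original values are no longer in that text. One plausible guess, unconfirmed, is two 0xff bytes, which would fit the all-ones values micinfo reports. For comparison, the healthy POST code "FF" would be the ASCII bytes 46 46. A small helper to dump a string's bytes (the name is hypothetical):

```shell
# Print the bytes of a string as contiguous lowercase hex, e.g. "efbfbd".
hexbytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}
```

To see what is actually stored on disk, rather than what a terminal or copy/paste produced, the same `od -An -tx1` can be applied directly to the file, for example `grep RESETFAIL /var/log/messages | tail -n 1 | od -An -tx1`. If the file holds bytes other than ef bf bd, the replacement happened during display or copying.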
Wed Apr 3 07:26:42 2013: MPSS Daemon start
Wed Apr 3 07:26:42 2013: Configuration version 0.4
Wed Apr 3 07:26:42 2013: mic0: Command line: "quiet root=ramfs console=hvc0 highres=off clocksource=micetc micpm=cpufreq_on;corec6_off;pc3_on;pc6_on"
Wed Apr 3 07:26:42 2013: mic0: log_buf_addr: ffffffff832332d0
Wed Apr 3 07:26:42 2013: mic0: log_buf_len: ffffffff81724c70
Wed Apr 3 07:26:42 2013: mic0: Current state "reset failed" cannot boot card
Wed Apr 3 07:26:44 2013: Wait for download requests
> 2) Can you confirm that you have successfully updated the flash/smc of the card during installation?
I did, although it took me a few tries to get a "success" message. Before that, I got some errors that I can't remember now.
Thanks,
Simon
The system comes from Supermicro. The board is a Supermicro X9DRG-QF (http://www.supermicro.com/manuals/motherboard/C606_602/MNL-1309.pdf).
[root@semperphi ~]# dmidecode -s bios-version
1.1
Here is the full dmidecode output in case it is useful: http://pastebin.com/vN9r0xgU
Thanks!
Simon
Additionally, it would help if you could collect some data so we can understand what's going on. The MPSS team has provided the attached debug script to collect the necessary information. Could you run it after a reset failure and send us the resulting zip file? You should be able to attach it to this forum thread.
(Note: I gave the micdebug script a .txt extension so that the forum software would let me attach it here.)
# chmod +x micdebug.txt
# sudo sh ./micdebug.txt
The system comes from Supermicro. The motherboard is a Supermicro X9DRG-QF (http://www.supermicro.com/manuals/motherboard/C606_602/MNL-1309.pdf).
The results are here: http://nova.polymtl.ca/~simark/micdebug-03042013-105411.zip
Attaching them directly to the post triggered the spam filter for some reason.
Simon
Simon, thank you for all of the information; the team is researching it now. Interestingly, this is the second report of this problem against the same platform/OEM, and we are now trying to reproduce it in our labs.
Oh, interesting. I look forward to seeing the results.
Hi Simon, our internal team has not been able to reproduce your issue, so we're back to asking you lots of questions to see if we can figure this out. Can you run minicom on /dev/ttyMIC<n> and send us the output?
Hi Belinda,
I can't seem to connect to /dev/ttyMIC0.
[root@semperphi ~]# minicom /dev/ttyMIC0
minicom: WARNING: configuration file not found, using defaults
Device /dev/modem access failed: No such file or directory.
Am I doing something wrong?
Simon, it seems that you need to configure minicom first. Here are the steps (I tested them on a SLES 11 system):
1) Run: sudo minicom -s
2) Go to "Serial Port Setup".
3) Choose option "A - Serial Device" and change the serial device to /dev/ttyMIC0.
4) Hit <Enter> twice.
5) Go to "Save setup as..".
6) When the prompt 'Give name to save this configuration?' appears, enter a name of your choice, for example: mic0 <Enter>
7) Select "Exit from Minicom".
8) Start a typescript session to capture what comes next: script /tmp/minicom.out
9) Start minicom with the saved configuration: minicom mic0
   <output scrolls on screen>
10) End the minicom session: <CTRL>-A, then X
11) End the typescript session: <CTRL>-D
Then send us the newly created output file (/tmp/minicom.out).
Oops, I assumed the usage was simply "minicom <device>". Nope.
Now minicom hangs on "Initializing modem" forever...
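Since minicom stops at its modem initialization step, it may be worth skipping that step entirely. minicom's `-o` option skips the initialization, `-D` sets the device directly (so no saved configuration is needed), and `-C` writes a capture file, which replaces the separate `script` session. This may avoid the hang if it comes from the init string; if opening the device itself blocks, it will not help. A sketch, assuming the card's console is /dev/ttyMIC0 as above; the helper name `mic_console` is hypothetical.

```shell
# Open a MIC console in minicom without modem initialization,
# logging everything received to a capture file.
#   -o  skip minicom's initialization
#   -D  use this device instead of the one in the configuration
#   -C  open a capture file at startup
mic_console() {
    dev=${1:-/dev/ttyMIC0}
    out=${2:-/tmp/minicom.out}
    if [ ! -c "$dev" ]; then
        echo "$dev is not a character device; is the MIC host driver loaded?" >&2
        return 1
    fi
    minicom -o -D "$dev" -C "$out"
}
```

Exit as before with <CTRL>-A, then X; the capture file then holds whatever the card printed.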
OK, thank you for trying minicom. Let's try something else. First, disable the watchdog:
echo 0 > /sys/class/mic/scif/watchdog_enabled
Then use the following steps to dump the micro-OS kernel log buffer:
1) Mount debugfs on the host: mount -t debugfs none /sys/kernel/debug
2) Dump the buffer: sudo cat /sys/kernel/debug/mic_debug/mic0/log_buf > <some file of your choice>
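The steps above can be wrapped so that debugfs is only mounted when it is not already mounted, and so that a missing log buffer (for example, if the driver does not expose mic_debug) produces a clear error instead of an empty file. A sketch, assuming the paths given above; the function names and the default output file are hypothetical.

```shell
# Mount debugfs at /sys/kernel/debug unless it is already mounted there.
ensure_debugfs() {
    grep -q ' /sys/kernel/debug debugfs ' /proc/mounts ||
        mount -t debugfs none /sys/kernel/debug
}

# Copy a card's uOS log buffer to a file.
#   $1: source (default: the mic0 log_buf path from the steps above)
#   $2: destination file (hypothetical default)
dump_mic_log() {
    src=${1:-/sys/kernel/debug/mic_debug/mic0/log_buf}
    out=${2:-/tmp/mic0_log_buf.txt}
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted?)" >&2
        return 1
    fi
    cat "$src" > "$out"
}
```

Usage, as root: `echo 0 > /sys/class/mic/scif/watchdog_enabled`, then `ensure_debugfs && dump_mic_log`.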
