MIC reset failed

Simon_M_1
Beginner
8,294 Views

Hi,

It seems like I can't get my MIC card working. I followed the instructions in the readme to install MPSS on a fresh CentOS 6.3 install (which is pretty much the same as RHEL 6.3, I think). The errors I get are not consistent, which makes debugging quite hard, but right now, starting mpss using

# service mpss start

fails with this in /var/log/mpssd:

Mon Mar 25 12:10:48 2013: mic0: log_buf_addr: ffffffff832332d0
Mon Mar 25 12:10:48 2013: mic0: log_buf_len: ffffffff81724c70
Mon Mar 25 12:10:48 2013: mic0: Current state "reset failed" cannot boot card
Mon Mar 25 12:10:50 2013: Wait for download requests

The output of miccheck doesn't look good either:

[root@semperphi ~]# /opt/intel/mic/bin/miccheck

miccheck 2.1.5889-14, created 18:10:54 Feb 28 2013
Copyright 2011-2013 Intel Corporation All rights reserved

Test 1 Ensure installation matches manifest : OK
Test 2 Ensure host driver is loaded : OK
Test 3 Ensure driver matches manifest : OK
Test 4 Detect all listed devices : OK
MIC 0 Test 1 Find the device : OK
MIC 0 Test 2 Check the POST code via PCI : FAILED
MIC 0 Test 2> Current POST code is �� (not FF) for MIC 0
MIC 0 Test 3 Connect to the device : SKIPPED
MIC 0 Test 3> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test 3> The device is not online
MIC 0 Test 4 Check for normal mode : SKIPPED
MIC 0 Test 4> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test 4> The device is not online
MIC 0 Test 5 Check the POST code via SCIF : SKIPPED
MIC 0 Test 5> Prerequisite 'Ensure the device is online' failed:
MIC 0 Test 5> The device is not online
MIC 0 Test 6 Send data to the device : SKIPPED
MIC 0 Test 6> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 6> The device is not in normal mode
MIC 0 Test 7 Compare the PCI configuration : OK
MIC 0 Test 8 Ensure Flash version matches manifest : SKIPPED
MIC 0 Test 8> Prerequisite 'Check for normal mode' failed:
MIC 0 Test 8> The device is not in normal mode
Status: The POST code was not "FF"

The output of micinfo:

[root@semperphi ~]# /opt/intel/mic/bin/micinfo
MicInfo Utility Log

Created Mon Mar 25 12:13:53 2013


System Info
HOST OS : Linux
OS Version : 2.6.32-279.el6.x86_64
Driver Version : 5889-14
MPSS Version : 2.1.5889-14
Host Physical Memory : 16300 MB

Device No: 0, Device Name: mic0

Version
Flash Version : NotAvailable
SMC Boot Loader Version : NotAvailable
uOS Version : NotAvailable
Device Serial Number : NotAvailable

Board
Vendor ID : ffff
Device ID : ffff
Subsystem ID : ffff
Coprocessor Stepping ID : f
PCIe Width : x63
PCIe Speed : Unknown
PCIe Max payload size : 16384 bytes
PCIe Max read req size : 16384 bytes
Coprocessor Model : 0x0f
Coprocessor Model Ext : 0x0f
Coprocessor Type : 0x03
Coprocessor Family : 0x0f
Coprocessor Family Ext : 0x0ff
Coprocessor Stepping : B1
Board SKU : NotAvailable
ECC Mode : NotAvailable
SMC HW Revision : NotAvailable

Cores
Total No of Active Cores : NotAvailable
Voltage : NotAvailable
Frequency : NotAvailable

Thermal
Fan Speed Control : NotAvailable
SMC Firmware Version : NotAvailable
FSC Strap : NotAvailable
Fan RPM : NotAvailable
Fan PWM : NotAvailable
Die Temp : NotAvailable

GDDR
GDDR Vendor : NotAvailable
GDDR Version : NotAvailable
GDDR Density : NotAvailable
GDDR Size : NotAvailable
GDDR Technology : NotAvailable
GDDR Speed : NotAvailable
GDDR Frequency : NotAvailable
GDDR Voltage : NotAvailable

Do you have any ideas about what steps I can take to start debugging this?

Simon
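
A note on the micinfo output above: on PCI, a configuration read that gets no response from the device comes back as all ones, so IDs of ffff usually mean the host is not reading real values from the card. Below is a minimal sketch for flagging such fields in saved micinfo output. The function name `all_ones_ids` is hypothetical, and the sketch assumes the `Name : value` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read micinfo output on stdin and print the name of
# each ID field (Vendor/Device/Subsystem) whose value consists only of 'f'.
all_ones_ids() {
    awk -F':' '
        /^[ \t]*(Vendor|Device|Subsystem) ID/ {
            name = $1; value = $2
            sub(/^[ \t]+/, "", name)     # trim leading whitespace
            sub(/[ \t]+$/, "", name)     # trim trailing whitespace
            gsub(/[ \t]/, "", value)     # drop padding around the value
            if (value ~ /^f+$/) print name
        }'
}
```

Possible usage: `/opt/intel/mic/bin/micinfo | all_ones_ids`. Any field it prints is worth treating as "not readable" rather than as a real ID.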

32 Replies
Frances_R_Intel
Employee
5,927 Views

Just a quick thought to start with. Do you have large BAR (Base Address Registers) support enabled in your BIOS? It must be greater than 4 gigabytes.

Simon_M_1
Beginner
5,927 Views

Good question, I forgot to mention it. I enabled the following option in the PCIe section of the BIOS: "Above 4G Decoding (Available if the system supports 64-bit PCI decoding)". It looks like the same thing said differently.

Frances_R_Intel
Employee
5,927 Views

Could you try rebooting your system? Or power cycling it if you are using any kind of virtual machine? I suspect that your problem will go away.

Sometimes when a new MPSS is installed (particularly if everything isn't cleanly shut down before the old MPSS is uninstalled and the new one installed), it takes a couple of reboots to shake things out. Looking at your information again, I think that this is the best fit for your symptoms.

TimP
Honored Contributor III
5,927 Views

Not only does 5889 require host reboots at additional points beyond those mentioned in the instructions, but after a host reboot on my box it is also necessary to restart the network service, even though it is shown as running.

Jaiber_J_Intel
Employee
5,927 Views

Not sure if the OP got it resolved. I faced a similar issue, and it was due to the mpssflash/mpssinfo apps not being installed properly. Installing them and making sure they were in the PATH got me beyond this error.

Simon_M_1
Beginner
5,927 Views

I rebooted the host several times; it doesn't help.

I am not sure how the mpss tools could have been installed improperly. I followed the installation procedure to the letter, and they are in /opt/intel/mic/bin, which is in my PATH.

BelindaLiviero
Employee
5,927 Views

We can submit this to our internal teams for investigation, but we need a couple more things from you:

1) the host kernel log after mpss shows “reset failed”

2) Can you confirm that you have successfully updated the flash/smc of the card during installation?

Simon_M_1
Beginner
5,927 Views

> 1) the host kernel log after mpss shows “reset failed”

After a failed "service mpss start", relevant entries in the system log:

Apr 3 07:25:58 semperphi kernel: mic0: Transition from state reset failed to resetting
Apr 3 07:26:00 semperphi kernel: mic0: Resetting (Post Code ��)
Apr 3 07:26:00 semperphi kernel: mic0: Transition from state resetting to reset failed
Apr 3 07:26:00 semperphi kernel: MIC 0 RESETFAIL postcode �� -1

I guess the ?? that show up are values that were written directly as binary and not converted to text before going into the log. In hex, their value is "ef bf bd ef bf bd". Here are the relevant entries in /var/log/mpssd:

Wed Apr 3 07:26:42 2013: MPSS Daemon start
Wed Apr 3 07:26:42 2013: Configuration version 0.4
Wed Apr 3 07:26:42 2013: mic0: Command line: "quiet root=ramfs console=hvc0 highres=off clocksource=micetc micpm=cpufreq_on;corec6_off;pc3_on;pc6_on"
Wed Apr 3 07:26:42 2013: mic0: log_buf_addr: ffffffff832332d0
Wed Apr 3 07:26:42 2013: mic0: log_buf_len: ffffffff81724c70
Wed Apr 3 07:26:42 2013: mic0: Current state "reset failed" cannot boot card
Wed Apr 3 07:26:44 2013: Wait for download requests

> 2) Can you confirm that you have successfully updated the flash/smc of the card during installation?

I did it, although it took me a few tries before getting a "success" message. Before that, I got some errors that I can't remember.

Thanks,

Simon
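
A side note on the hex value quoted above: the byte sequence ef bf bd is the UTF-8 encoding of U+FFFD, the Unicode replacement character. That suggests (as an assumption, not something the logs confirm) that some step between the driver and the log file already replaced an undecodable byte, so the real POST code value may not be recoverable from this text. A minimal sketch that reproduces the bytes; `hex_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as one continuous lowercase hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# U+FFFD encoded as UTF-8, written with octal escapes for portability.
printf '\357\277\275' | hex_bytes   # prints efbfbd
echo
```

Running the same helper on the affected log line (for example `grep RESETFAIL /var/log/messages | hex_bytes`, with the log path being an assumption for this system) would show whether the file itself contains the replacement character or raw bytes.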

BelindaLiviero
Employee
5,927 Views

Simon, thanks for the info. One more thing we need: what is your system board's BIOS version? At the # prompt, type "dmidecode -s bios-version". (Also, can you say what kind of system/model you are running?)

Simon_M_1
Beginner
5,927 Views

The system comes from Supermicro. The board is a Supermicro X9DRG-QF (http://www.supermicro.com/manuals/motherboard/C606_602/MNL-1309.pdf).

[root@semperphi ~]# dmidecode -s bios-version
1.1

Here is the full dmidecode output in case it is useful: http://pastebin.com/vN9r0xgU

Thanks!

Simon

BelindaLiviero
Employee
5,927 Views

Additionally, it would be helpful if you could collect some data to help us understand what's going on. The MPSS team has offered the attached debug script to collect the necessary info. Could you run it after a reset failure and send us the resulting zip file? (You should be able to attach it to this forum thread.)

(Note: I gave the micdebug script a .txt extension so that this forum software would allow me to attach it here.)

# chmod +x micdebug.txt

# sudo sh ./micdebug.txt

Simon_M_1
Beginner
5,927 Views

The system comes from Supermicro. The motherboard is a Supermicro X9DRG-QF (http://www.supermicro.com/manuals/motherboard/C606_602/MNL-1309.pdf).

The results are here: http://nova.polymtl.ca/~simark/micdebug-03042013-105411.zip

Attaching them directly to the post was triggering the spam filter somehow.

Simon

BelindaLiviero
Employee
5,927 Views

Simon, thank you for all of the information. The team is researching this now. Interestingly, this is the second report of this problem against the same platform/OEM; we are now trying to reproduce it in our labs.

Simon_M_1
Beginner
5,927 Views

Oh, interesting. I look forward to seeing the results.

BelindaLiviero
Employee
5,927 Views

Hi Simon: our internal team has not been able to reproduce your issue, so we are back to asking you lots of questions to see if we can figure this out. Can you run minicom on /dev/ttyMIC<n> and send the output from that?

Simon_M_1
Beginner
5,927 Views

Hi Belinda,

I can't seem to connect to /dev/ttyMIC0.

[root@semperphi ~]# minicom /dev/ttyMIC0
minicom: WARNING: configuration file not found, using defaults
Device /dev/modem access failed: No such file or directory.

Am I doing something wrong?

BelindaLiviero
Employee
5,927 Views

Simon, it seems that you need to configure minicom to make this work. Here is the list of steps (I tested these on a SLES11 system).

     sudo minicom -s
     Go to "Serial Port Setup"

     Choose option: A - Serial Device
     Edit Serial Device to /dev/ttyMIC0
     Hit <Enter> twice

     Go to "Save setup as.."
     When the input prompt 'Give name to save this configuration?' shows up,
         save the <ConfigName> to the name you prefer.  For example: mic0 <Enter>

     Select "Exit from Minicom"

Then, set up a typescript session to capture what comes next:    script /tmp/minicom.out

Start minicom:      minicom mic0

<stuff scrolls on screen>

Terminate the minicom session: <CTRL>-A-X
Terminate the typescript session:   <CTRL>-D

Then send the newly created output file.
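
As a possible shortcut for the steps above: many minicom builds accept `-D` to name the serial device and `-C` to write a capture file, which would avoid both the saved configuration and the separate `script` session. This is a minimal sketch, assuming those options are available on your system. `mic_console_cmd` is a hypothetical helper that only prints the command, after checking that the device node actually exists.

```shell
#!/bin/sh
# Hypothetical helper: print a minicom command line for a MIC serial device,
# or fail early if the device node is missing.
# $1 = device node, $2 = capture file (defaults to /tmp/minicom.out)
mic_console_cmd() {
    dev="$1"
    capture="${2:-/tmp/minicom.out}"
    if [ ! -c "$dev" ]; then
        echo "error: $dev is not a character device" >&2
        return 1
    fi
    # Assumes this minicom build supports -D (device) and -C (capture file).
    echo "minicom -D $dev -C $capture"
}
```

For example, `mic_console_cmd /dev/ttyMIC0` prints the command to run if the node is present. If it reports that the node is missing, the problem lies before minicom, and changing the minicom setup is unlikely to help.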

Simon_M_1
Beginner
5,927 Views

Oops, I assumed the usage was simply "minicom <device>". Nope.

Now minicom hangs on "Initializing modem" forever...

BelindaLiviero
Employee
5,682 Views

OK, thank you for trying minicom. Let's try something else:

echo 0 > /sys/class/mic/scif/watchdog_enabled

Then use the following steps to show the micro-OS kernel log buffer.

Mount debugfs on the host:    mount -t debugfs none /sys/kernel/debug

Dump the buffer:   sudo cat /sys/kernel/debug/mic_debug/mic0/log_buf > <some file of your choice>
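
The steps above can be collected into one small script. This is a minimal sketch, assuming it runs as root and that the sysfs and debugfs paths are exactly the ones given in this post. The function names `debugfs_mounted` and `dump_mic_log` are hypothetical, and the mount check is an addition so that debugfs is not mounted twice.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if debugfs is mounted at mount point $1.
# $2 = mounts table to read (defaults to /proc/mounts).
debugfs_mounted() {
    mounts="${2:-/proc/mounts}"
    awk -v mp="$1" '$2 == mp && $3 == "debugfs" { found = 1 }
                    END { exit !found }' "$mounts"
}

# Hypothetical helper: write the card's kernel log buffer to file $1.
# Must run as root; paths are taken from the post above.
dump_mic_log() {
    out="$1"
    echo 0 > /sys/class/mic/scif/watchdog_enabled || return 1
    debugfs_mounted /sys/kernel/debug ||
        mount -t debugfs none /sys/kernel/debug || return 1
    cat /sys/kernel/debug/mic_debug/mic0/log_buf > "$out"
}
```

Possible usage as root: `dump_mic_log /tmp/mic0_log_buf.txt`. Running the whole thing as root also sidesteps a common surprise with `sudo cat ... > file`, where the redirection is performed by the calling shell rather than by root.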
