MFSYS25 Switch module problem

sadmi7

Hi,

I have a network problem with my Intel MFSYS22 (Current Build Version: 6.10.100.20120307.34729) Switch module.

Product Name: Gigabit Ethernet Switch
Manufacturer: Intel Corporation
Manufacture Date: 2008-02-03 10:40:26 PM
Part Number: D70739-404
Serial Number: BZTC80500085
Asset Tag: (blank)
Up Time: 0 days 0:46:35
Product ID: INTEL_SM
HW Version: D70739-404
Boot Version: 1.0.0.6
SW Version: 1.0.0.27

Once or twice a day it loses network connectivity on both external and internal ports. Its status becomes "Unmanageable" until I remove it from the chassis and put it back.

This issue started about a month ago. At first it lost network connectivity about once every 10 days; now it happens twice a day.

How can I resolve this issue?

24 Replies
DG9

The fix is simple: replace the module. The catch is that in Russia it now costs from $2000.

The problem is common, which suggests a flaw on Intel's side. Maybe some component on the board fails?
GDoig

Solution!

I had a problem similar to this, and found a solution!

At around 5am, my MFSYS25 sent me 3 error messages: Server 1 failed, Server 2 failed and Server 3 failed! The next morning, I started problem solving with a hard reboot. Long story short, there were no Server Blade failures; it appears my switch had failed.

Others have described this problem: everything internal seems fine. On boot-up, the external ports 'flash green' as if an external network connection is being established, then the external port lights go off and nothing can be done to establish an external network connection.

Next, I replaced my switch with a BRAND NEW switch module. Same problem! I tried multiple reboots and longer waits with the system switched off. No change ALL DAY. I assumed something internal (in the rack/case) had failed! I was ready to read the last rites, but then I tried one more thing:

I only have 3 server blades and three external network connections. So I created one VLAN for each external-to-internal connection and, one by one, turned Spanning Tree off for each external port.

It is possible that the Spanning Tree algorithm is 'complex' and requires hardware that is not used when only a simple direct connection is needed.
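The layout described above can be sketched as a small plan, assuming the pairing is one-to-one (one external port and one server blade per VLAN). The port names and VLAN IDs below are hypothetical and are not taken from the MFSYS25 interface; this is only an illustration of why such a layout should not need Spanning Tree on the switch itself: no VLAN contains two external ports, so the switch cannot forward a frame from one uplink back out of another.

```python
# Hypothetical port names; the real switch module labels its ports differently.
EXTERNAL_PORTS = ["ext1", "ext2", "ext3"]
SERVER_PORTS = ["server1", "server2", "server3"]


def build_vlan_plan(external_ports, server_ports, first_vlan_id=10):
    """Pair each external port with one server port in its own VLAN."""
    if len(external_ports) != len(server_ports):
        raise ValueError("need the same number of external and server ports")
    plan = {}
    for offset, (ext, srv) in enumerate(zip(external_ports, server_ports)):
        plan[first_vlan_id + offset] = {
            "ports": {ext, srv},
            # Spanning Tree is turned off on the external port of each pair.
            "spanning_tree_on_external": False,
        }
    return plan


def vlans_bridging_external_ports(plan, external_ports):
    """Return the VLAN IDs that contain more than one external port.

    Such a VLAN could forward frames between two uplinks and so could
    take part in a loop through the upstream network.
    """
    external = set(external_ports)
    return [
        vlan_id
        for vlan_id, cfg in plan.items()
        if len(cfg["ports"] & external) > 1
    ]


plan = build_vlan_plan(EXTERNAL_PORTS, SERVER_PORTS)
for vlan_id, cfg in sorted(plan.items()):
    print(vlan_id, sorted(cfg["ports"]))
print("VLANs that bridge uplinks:", vlans_bridging_external_ports(plan, EXTERNAL_PORTS))
```

If the check returns an empty list, each uplink only ever reaches its own blade. That says nothing about loops elsewhere in the external network, which the upstream switches still have to handle.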

Anyhoo, it overcame my failure and now the system has been up for a day (fingers crossed, touch wood).

Hope this works for you too!!

gdoig

SKusd

Same problem here. How did you connect to the switch to make those changes? My management module cannot connect to the switches.

GDoig

After I reset my switch as described above, I had one more event where my switch 'went down' in July. The switch was unmanageable, as previously described. To re-establish contact with the switch, I removed it, rebooted the system, re-inserted it, etc. Eventually, the switch became 'manageable' again.

Whilst I was trying to re-establish control over the switch, the network admin contacted me (I'm on a small 500-person segment of a large 4,000-person corporation's head office network) and told me my server had created a feedback storm on his network!! I thought this was odd, because my server has been stable on the network for about 6 years. Personally, I think he installed a new bridge / component that caused the storm with my server (not vice versa!!). In good admin style, I got NO useful information back from him: no logs to show where the storm originated, or anything else. My own logs couldn't tell me the inciting cause either, but to be proactive, I set 'Storm Control' under advanced options for my network switch.

The threshold I set is probably VERY low, but I don't see extremely high traffic on my server, so it is fine.
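For anyone wondering what a storm-control threshold does in general: the switch counts storm-type frames (typically broadcast, multicast or unknown unicast) on a port and drops whatever exceeds the configured rate. The sketch below is only a software illustration of that idea using a simple one-second counting window. The class name, the frames-per-second unit and the threshold value are assumptions for the example; the post does not state the value actually set, and the real switch does this in hardware.

```python
class StormControl:
    """Illustrative per-port limiter: forward at most N storm-type frames per second."""

    def __init__(self, max_frames_per_second):
        self.max_frames_per_second = max_frames_per_second
        self._window = None  # the whole second currently being counted
        self._count = 0      # frames seen so far in that second

    def allow(self, timestamp):
        """Return True if a frame arriving at `timestamp` (in seconds) is forwarded."""
        window = int(timestamp)
        if window != self._window:
            # A new one-second window has started, so reset the counter.
            self._window = window
            self._count = 0
        self._count += 1
        return self._count <= self.max_frames_per_second


# Hypothetical threshold of 3 frames per second, chosen only for the demo.
limiter = StormControl(max_frames_per_second=3)
arrivals = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0]
decisions = [limiter.allow(t) for t in arrivals]
print(decisions)
```

With a low threshold, normal traffic passes untouched as long as it stays under the limit, while a burst caused by a loop is cut off at the port instead of being forwarded onward.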

Since July, with this change in addition to the earlier changes that reduced the complexity of the 'spanning tree' setup, I have not had a single switch event.

It is possible that some newer, complex network switches/bridges attempt to communicate with the MFSYS25 switch to 'learn' its network topology. Since it can look quite complex with spanning tree etc., perhaps this communication does cause a network storm (a feedback loop between the two pieces of equipment that eventually takes down one or the other)....

Anyway, it is now 100% stable for over 3 months!

Regards,

gdoig
