We are facing a serious problem with RSS (Receive Side Scaling):
We have 16 GE ports with very high packet load (close to the maximum GE bandwidth) in a single computer. Since a single CPU core cannot cope with this load, we activated RSS with 4 queues. In principle this works well: we saw the CPU load for the driver distributed across 4 cores, each at about 40% of its capacity.
But this works only once. After a reboot (and after many further reboots and tests), nearly all of the load is back on core 0. The remaining three cores receive about 5% each, and only rarely as much as 20%. As a result, GE performance drops dramatically and the entire system becomes sluggish. Further tests suggest that core utilization becomes balanced only when the RSS queue setting is changed while the system is running; after a reboot it no longer works correctly.
I read about RSS on MSDN, but there seems to be no way for a user to influence core utilization other than setting the first core to use and the maximum number of cores (changing the first core leads to even more severe performance degradation). Is there a way to change the RSS hashing, or anything else in the Intel driver, so that core utilization is always balanced?
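As background to the hashing question, here is a minimal Python sketch of how RSS is generally described as choosing a queue: a Toeplitz hash over the flow's addresses and ports, whose low bits index an indirection table of queue numbers. It is not taken from the Intel driver. The key is the sample key from Microsoft's RSS hash verification documentation, and the table size of 128, the round-robin table fill, and the example flows are assumptions made purely for illustration.

```python
import ipaddress
import struct

# Sample 40-byte secret key from Microsoft's RSS hash verification docs.
# Assumption: the key a given driver actually programs may be different.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key windows selected by the set bits of data."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        # Test bit i of the input, most significant bit first.
        if data[i // 8] & (0x80 >> (i % 8)):
            # 32-bit window of the key starting at bit offset i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src_ip, dst_ip, src_port, dst_port,
              num_queues=4, table_size=128):
    """Map an IPv4/TCP flow to a queue index (hypothetical table layout)."""
    # Assumption: indirection table filled round-robin with queue numbers.
    table = [i % num_queues for i in range(table_size)]
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))
    h = toeplitz_hash(RSS_KEY, data)
    # The low bits of the hash select the indirection table entry.
    return table[h & (table_size - 1)]


if __name__ == "__main__":
    # Hypothetical flows, one per line: which of the 4 queues would each hit?
    for port in range(5000, 5008):
        print(port, rss_queue("192.168.1.10", "192.168.1.20", port, 80))
```

In this model the queue depends only on the flow's addresses and ports, not on how much traffic the flow carries, so a handful of heavy flows can land on the same queue even when the hash itself is well behaved.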
Dual Xeon E5-2620, 32GB, I340-T4, HotLava Shasta 12G6 (82576EB)
Windows 7 Pro 64-bit SP1, Intel Drivers v17.4
9K jumbo frames, 16 ports teamed into 8 pairs, each as a static LAG