I assembled a system using a P4304CR2LFKN chassis, two Xeon E5-2620 CPUs, 32GB (4x8GB) RAM and four GTX 690 graphics cards. It's running Windows 7 Pro SP1 x64 with the latest drivers from Intel and NVIDIA. However, I have several problems:
1) The only BIOS version that works at all is 01.02.2002, dated 5/1/2012. With anything newer, the BIOS reports "0146 PCI out of resources error" on startup and Windows does not see any GPUs, regardless of the value I set for IO space allocation above 4GB.
2) The BIOS is not accessible using an add-on graphics card: both the analog and digital outputs show random noise during POST. I have to remove all add-on cards and use the onboard video in order to enter BIOS setup and configure settings.
3) When I install more than two (i.e. three or four) graphics cards (six to eight GPUs in total), applications that use 3D acceleration or CUDA stop working. With two graphics cards (four GPUs), applications such as Octane Render run fine, but adding another card either produces blank screens or makes the application crash on startup. Additionally, with three or four cards installed, shutting down results in a reboot rather than a power-off, and after the reboot Windows displays a message saying there was a blue screen crash.
How do I get this system to work with eight GPUs (4xGTX690) for running CUDA workloads? I don't need SLI, just CUDA.
Re: P4304CR2LFKN and graphics cards
I assume you have a W2600CR motherboard...
I have one of these with a W2600CR, 2x E5-2687W, 128 GB RAM, 1x Quadro 4000, 1x RS25DB080, 8x 1 TB HDDs and 1x OCZ PCIe Revodrive 120GB.
I updated all BIOS, BMC and ME firmware when setting up, using Intel Deployment Assistant and the latest downloads from the Intel Support website, "Support for the Intel® Workstation Board W2600CR Family" (http://www.intel.com/p/en_US/support/highlights/server/wb-w2600cr). (Yes, you also need the latest Deployment Assistant ISO.) More often than not you need the latest drivers, BIOS etc. with new Intel gear.
From memory, you must have both the onboard video and dual monitor video disabled in the BIOS settings to get a PCIe graphics card to show the POST screens (see the Intel service guide g61542002_p4000ip_w4000cr_sg_r1_2.pdf, approx. pages 142-160).
Have you tried booting with just one GTX card in PCIe slot 1? If that works and things fail when you add more cards, you could be running out of option ROM space... You may need to switch to EFI boot.
When installing the PCIe Revodrive card with the onboard SCU controller active, I ran out of option ROM space; this could be what is happening with your GTX CUDA cards. I tried disabling the onboard SATA/SCU controller and using the RS25DB080 instead, but there was still not enough option ROM space with the Revodrive.
In the end I had to turn on and use EFI optimised boot to free up the option ROM space! www.eightforums.com (http://www.eightforums.com/) has tutorials on building an EFI-bootable Win7 SP1 USB stick or DVD and installing it on an EFI-optimised system (yes, there are a few tricks to it).
P.S. You cannot EFI boot from the onboard SCU controller in RSTe or ESRT2 mode! You must use SATA controller port 1 or 2, or an add-in RAID card, for the boot drive, hence the RS25DB080 in my case.
P.P.S. And remember that some of the PCIe slots on these motherboards are x4/x8/x16 multiplexed (they share lanes), so be careful where you place any other PCIe cards when using all four x16 slots...
Maybe a Quadro plus a Tesla would be a better setup for what you seem to be doing with the 4x GTX 690s and CUDA, as they're designed to coexist and co-process.
The P4300CR series machines are great workstations but a little tricky to set up...
Hope this helps.
Regarding upgrading the BIOS etc.: did you try resetting the BIOS to defaults using the jumper after upgrading? Sometimes an upgraded BIOS can be confused by old settings and fail...
And maybe a more robust OS such as Windows Server 2012, set up with the "Desktop Experience" feature, would cope better with so many processors, processes and threads. You could try the trial edition, but don't forget to get the genuine NVIDIA drivers; I believe the Microsoft ones are crippled/limited.
Hope this helps,
1. Disable unused onboard devices and/or their option ROMs (OpROMs) in the BIOS, for example NIC ROMs, SATA/SAS controllers, etc.
2. Go to BIOS => Advanced => PCI Configuration and set:
   Maximize Memory below 4GB: Disabled
   Memory Mapped I/O above 4GB: Enabled
Okay, I flashed the latest firmware package (BIOS version 01.06.0002R4151, etc.), reset the BIOS to defaults and followed the advice above, but nothing helps. I can get one GTX 690 to work, but with two or more I get error 146 (PCI out of resources) every time, no matter what I do. I set Maximize Memory below 4GB to Disabled and Memory Mapped I/O above 4GB to Enabled, tried Memory Mapped I/O Size values of 8G, 32G, 128G and 1024G, disabled both onboard NICs, the onboard storage controller and serial port B, and enabled EFI optimized boot; nothing helps. When the BIOS reports error 146, I cannot disable the onboard video controller, and if I click past the error, the OS loads but can't see the add-on graphics cards.
Today I was able to get the system to boot with all four cards and the latest BIOS version (01.06.0002, package dated 6/1/2013) by disabling the onboard SATA/RAID controller and one of the NICs, then installing the cards one at a time, booting up and powering down after each one. With all four cards installed I can't turn the disk controller back on (it results in an "Nmi error - system halted" message right after POST), but I can live without it. The new BIOS also fixed the colorful noise I got instead of the POST screens. However, the original OpenGL problem remains. By trial and error I determined that I can have at most five GPUs active at any one time, with the other three disabled in Device Manager. In that configuration OpenGL applications run successfully, but if I enable six or more GPUs I get errors, crashing apps, blank screens, etc.