It appears the above content is garbled and I can't modify or delete it...
Is there any total speed limit for the X99/C610 chipset's ten SATA ports? I can only get them running at a maximum of 1000MB/s.
I recently ran into an issue with the total throughput of the SATA ports on the X99 platform. The CPU is an E5-2678 v3 and the motherboard is the Huananzhi X99-AD3 (https://www.amazon.com/HELLOLAND-HUANANZHI-Motherboard-Supported-Supports/dp/B082W24TBR). The motherboard has ten SATA ports: six are connected to PCH SATA controller 1 and the other four are connected to PCH SATA controller 2. I connected 10 hard disk drives to it, and each drive can reach a write speed of 200MB/s.
If there were no speed limit, the total speed with all the disks writing at the same time should be around 2000MB/s. However, when I tested all 10 SATA disks together, the total speed was only around 1000MB/s.
Here are the test results using fio (https://fio.readthedocs.io/en/latest/fio_doc.html). I wiped all the disks and did a fresh install of the latest Windows 10 20H2 on the first drive. Then I ran the following command to test the sequential write speed on all the drives.
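To make the expectation explicit, here is a small sketch of the arithmetic, assuming the drives scale linearly when written in parallel. The function names are hypothetical and the figures are the rounded ones quoted in this post.

```python
def expected_aggregate_mb_s(drive_count: int, per_drive_mb_s: int) -> int:
    """Ideal combined throughput if every drive runs at full speed in parallel."""
    return drive_count * per_drive_mb_s


def achieved_fraction(measured_mb_s: int, expected_mb_s: int) -> float:
    """Share of the ideal throughput that was actually observed."""
    return measured_mb_s / expected_mb_s


# Rounded figures from the post: 10 drives, about 200 MB/s each, about 1000 MB/s observed.
expected = expected_aggregate_mb_s(10, 200)
fraction = achieved_fraction(1000, expected)

print(f"expected: {expected} MB/s, achieved fraction: {fraction:.0%}")
```

With these rounded numbers, the observed total is about half of the ideal figure.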
PS C:\Windows\system32> fio --name=seqwrite --rw=write --bs=32K --size=20G --filename=C\:\fiotestdir\seqwrite2.bin:D\:\fiotestdir\seqwrite2.bin:E\:\fiotestdir\seqwrite2.bin:F\:\fiotestdir\seqwrite2.bin:G\:\fiotestdir\seqwrite2.bin:H\:\fiotestdir\seqwrite2.bin:I\:\fiotestdir\seqwrite2.bin:J\:\fiotestdir\seqwrite2.bin:K\:\fiotestdir\seqwrite2.bin:L\:\fiotestdir\seqwrite2.bin
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
seqwrite: (g=0): rw=write, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=windowsaio, iodepth=1
fio-3.25
Starting 1 thread
seqwrite: Laying out IO files (10 files / total 20480MiB)
Jobs: 1 (f=10): [f(1)][100.0%][w=997MiB/s][w=31.9k IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=12268: Sun Dec 27 05:40:03 2020
  write: IOPS=31.3k, BW=977MiB/s (1024MB/s)(20.0GiB/20967msec); 0 zone resets
    slat (usec): min=16, max=138408, avg=21.84, stdev=172.83
    clat (nsec): min=148, max=85615k, avg=9187.61, stdev=174856.93
     lat (usec): min=21, max=138424, avg=31.02, stdev=250.55
    clat percentiles (nsec):
     | 1.00th=[ 7328], 5.00th=[ 8096], 10.00th=[ 8160], 20.00th=[ 8256],
     | 30.00th=[ 8256], 40.00th=[ 8384], 50.00th=[ 8384], 60.00th=[ 8384],
     | 70.00th=[ 8512], 80.00th=[ 8512], 90.00th=[ 9024], 95.00th=,
     | 99.00th=, 99.50th=, 99.90th=, 99.95th=,
     | 99.99th=
   bw ( KiB/s): min=197343, max=1092580, per=100.00%, avg=1001021.78, stdev=146515.51, samples=41
   iops : min= 6166, max=34143, avg=31281.51, stdev=4578.67, samples=41
  lat (nsec) : 250=0.10%, 500=0.28%, 750=0.01%, 1000=0.01%
  lat (usec) : 2=0.01%, 4=0.01%, 10=92.84%, 20=6.57%, 50=0.17%
  lat (usec) : 100=0.02%, 250=0.01%
  lat (msec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  cpu : usr=4.77%, sys=66.77%, ctx=0, majf=0, minf=0
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,655360,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
  WRITE: bw=977MiB/s (1024MB/s), 977MiB/s-977MiB/s (1024MB/s-1024MB/s), io=20.0GiB (21.5GB), run=20967-20967msec
PS C:\Windows\system32>
I then tested again using a different block size, but the speed was still around 1000MB/s.
PS C:\Windows\system32> fio --name=seqwrite --rw=write --bs=512K --size=100G --filename=C\:\fiotestdir\seqwrite3.bin:D\:\fiotestdir\seqwrite3.bin:E\:\fiotestdir\seqwrite3.bin:F\:\fiotestdir\seqwrite3.bin:G\:\fiotestdir\seqwrite3.bin:H\:\fiotestdir\seqwrite3.bin:I\:\fiotestdir\seqwrite3.bin:J\:\fiotestdir\seqwrite3.bin:K\:\fiotestdir\seqwrite3.bin:L\:\fiotestdir\seqwrite3.bin
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
seqwrite: (g=0): rw=write, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=windowsaio, iodepth=1
fio-3.25
Starting 1 thread
seqwrite: Laying out IO files (10 files / total 102400MiB)
Jobs: 1 (f=10): [W(1)][100.0%][w=1052MiB/s][w=2104 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=11812: Sun Dec 27 05:44:40 2020
  write: IOPS=2116, BW=1058MiB/s (1110MB/s)(100GiB/96760msec); 0 zone resets
    slat (usec): min=19, max=71987, avg=170.04, stdev=266.64
    clat (nsec): min=204, max=140887k, avg=301193.02, stdev=1073267.43
     lat (usec): min=135, max=141029, avg=471.24, stdev=1069.42
    clat percentiles (usec):
     | 1.00th=[ 5], 5.00th=[ 8], 10.00th=[ 8], 20.00th=[ 8],
     | 30.00th=[ 9], 40.00th=[ 9], 50.00th=[ 9], 60.00th=[ 9],
     | 70.00th=[ 9], 80.00th=[ 12], 90.00th=[ 1549], 95.00th=[ 2278],
     | 99.00th=[ 3064], 99.50th=[ 3359], 99.90th=[ 4883], 99.95th=[ 6390],
     | 99.99th=
   bw ( MiB/s): min= 390, max= 2784, per=100.00%, avg=1059.48, stdev=199.60, samples=189
   iops : min= 781, max= 5568, avg=2118.45, stdev=399.18, samples=189
  lat (nsec) : 250=0.01%, 500=0.44%, 750=0.05%, 1000=0.01%
  lat (usec) : 2=0.05%, 4=0.34%, 10=75.14%, 20=9.25%, 50=0.12%
  lat (usec) : 100=0.02%, 250=0.01%, 500=0.24%, 750=0.53%, 1000=1.00%
  lat (msec) : 2=5.78%, 4=6.87%, 10=0.14%, 20=0.01%, 50=0.01%
  lat (msec) : 100=0.01%, 250=0.01%
  cpu : usr=2.07%, sys=36.17%, ctx=0, majf=0, minf=0
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,204800,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
  WRITE: bw=1058MiB/s (1110MB/s), 1058MiB/s-1058MiB/s (1110MB/s-1110MB/s), io=100GiB (107GB), run=96760-96760msec
PS C:\Windows\system32>
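Both runs above report "Starting 1 thread" and iodepth=1, and neither command passes --direct=1, so it may be worth ruling out the benchmark setup itself before blaming the chipset. The sketch below only builds a fio command line with one job per drive, unbuffered I/O, and a deeper queue. The helper name, the queue depth of 32, the 10G per-drive size, and the file name seqwrite_direct.bin are all assumptions for illustration, not values from the tests above.

```python
from typing import List


def build_fio_args(drive_letters: str,
                   block_size: str = "512K",
                   size_per_drive: str = "10G",   # assumed value; size applies per job
                   io_depth: int = 32) -> List[str]:
    """Build a fio argument list with one sequential-write job per drive."""
    # Options placed under the "global" section are inherited by every job.
    args = [
        "fio",
        "--name=global",
        "--rw=write",
        f"--bs={block_size}",
        f"--size={size_per_drive}",
        "--ioengine=windowsaio",
        "--direct=1",              # bypass the OS file cache
        f"--iodepth={io_depth}",   # keep several requests in flight per drive
        "--thread",
        "--group_reporting",       # report one aggregate result for all jobs
    ]
    for letter in drive_letters:
        # Each --name starts a new job, so every drive gets its own thread.
        args.append(f"--name=seqwrite-{letter}")
        # Same escaped-colon path style as the commands above; file name is hypothetical.
        args.append(f"--filename={letter}\\:\\fiotestdir\\seqwrite_direct.bin")
    return args


fio_args = build_fio_args("CDEFGHIJKL")
print(" ".join(fio_args))
```

If the aggregate figure from such a run still sits near 1000MB/s, the benchmark configuration is a less likely explanation and the hardware path becomes the more plausible suspect.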
I also tested on Linux and the result was almost the same: the speed could not reach 2000MB/s. I have not tried a different X99 motherboard or CPU. I searched and found that there is a potential bottleneck, DMI 2.0, between the CPU and the PCH, but its bandwidth is 2GB/s. I then searched the "Intel® C610 Series Chipset and Intel® X99 Chipset. Platform Controller Hub (PCH) Datasheet", but it does not appear to document a total speed limit for the SATA ports.
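For reference, the 2GB/s figure can be reproduced from the link parameters. This is a rough sketch, assuming DMI 2.0 behaves like a four-lane 5 GT/s link with 8b/10b encoding and that each SATA port runs at 6 Gb/s with the same encoding. It gives raw per-direction ceilings only and ignores packet and protocol overhead.

```python
def link_payload_mb_s(transfers_per_s: int, lanes: int) -> int:
    """Payload bandwidth in MB/s for a serial link using 8b/10b encoding."""
    payload_bits_per_s = transfers_per_s * lanes * 8 // 10  # 8 data bits per 10 line bits
    return payload_bits_per_s // 8 // 1_000_000             # bits -> bytes -> MB


# Assumed link parameters (not taken from the datasheet quoted in this post).
dmi2_mb_s = link_payload_mb_s(5_000_000_000, lanes=4)       # DMI 2.0, one direction
sata3_port_mb_s = link_payload_mb_s(6_000_000_000, lanes=1)  # one SATA 6 Gb/s port

drive_demand_mb_s = 10 * 200  # ten drives at about 200 MB/s each, as in the post

print(f"DMI 2.0 ceiling: {dmi2_mb_s} MB/s per direction")
print(f"SATA port ceiling: {sata3_port_mb_s} MB/s")
print(f"Headroom on DMI: {dmi2_mb_s - drive_demand_mb_s} MB/s")
```

Under these assumptions the ten drives alone would ask for the entire raw DMI ceiling, leaving nothing for overhead or other traffic.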
So what's the cause for the speed limit?
If there is anything limiting speed, it is the bottleneck at the DMI bus. You need to remember that the DMI bus is used for a *lot* more things than just SATA - there's PCIe, USB, GFX steering, SPI, I2C, SMBus, LPC (SIO/EC), HDA (Audio), ME, LAN, WiFi (including CNVi), etc.