I've got a RAIDZ3 pool of nine 10TB drives in a Dell PowerEdge R730xd running as a mixed-OS virtual machine host. Two pools exist:
rpool - the mirrored SSD pool, used mostly for VM root disks.
dpool - the larger RAIDZ3 HDD pool, used for data.
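For context, dpool's layout is roughly equivalent to the create command below (a sketch only; sda through sdi are placeholder device names, the real ones are in the gist linked at the bottom):
zpool create dpool raidz3 sda sdb sdc sdd sde sdf sdg sdh sdi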
On principle, as the server is outside of ProSupport, I've elected not to use hardware RAID.
The performance of dpool is dreadful. I suspect one or more of the following (see the checks sketched after the list):
- The controller could be misconfigured or overloaded for this mode of operation.
- A suboptimal disk layout - tuning guides recommend giving ZFS whole disks rather than partitions.
- The block size of written data is out of alignment with the physical block size of the disks (i.e. an ashift mismatch).
- The pool needs an SSD read cache (L2ARC).
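For reference, these are the settings I'd be checking against those suspicions (sketched commands, assuming standard OpenZFS-on-Linux tooling; adjust pool and device names as needed):
zpool get ashift dpool                        # should normally be 12 for 4K-sector drives
zfs get recordsize,compression,atime,primarycache dpool
zpool status -v dpool                         # shows whether vdevs are whole disks or partitions
lsblk -o NAME,PHY-SEC,LOG-SEC                 # logical vs physical sector size per drive
arcstat 5                                     # ARC hit rate while a test is running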
Are there any quick fixes to get read performance up? Failing that, I'd like to rebuild this array with better performance tuning, so I'm also looking for the best way to do a full clone of all the data first. There are filesystem backups I can fall back on if need be.
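My rough plan for the clone, assuming I can attach a second pool (newpool is a placeholder name here), is a recursive snapshot plus a replication stream, something like:
zfs snapshot -r dpool@migrate
zfs send -R dpool@migrate | zfs receive -Fdu newpool
(-R should carry the dataset layout, snapshots and properties across; happy to be told if there's a better approach.)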
Below is a representative fio run; further debug output is in the gist linked at the bottom:
fio --filename=/dpool/datatest --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=4
fio-3.25
Starting 1 process
test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=416KiB/s][r=104 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1812194: Sat Mar 19 16:44:08 2022
read: IOPS=96, BW=386KiB/s (395kB/s)(113MiB/300008msec)
clat (usec): min=10, max=130802, avg=10360.59, stdev=7791.93
lat (usec): min=11, max=130803, avg=10361.10, stdev=7791.95
clat percentiles (usec):
| 1.00th=[ 63], 5.00th=[ 82], 10.00th=[ 6718], 20.00th=[ 8029],
| 30.00th=[ 8717], 40.00th=[ 9110], 50.00th=[ 9503], 60.00th=[ 9765],
| 70.00th=[ 10028], 80.00th=[ 10159], 90.00th=[ 13698], 95.00th=[ 21627],
| 99.00th=[ 45351], 99.50th=[ 55837], 99.90th=[100140], 99.95th=[105382],
| 99.99th=[116917]
bw ( KiB/s): min= 64, max= 544, per=99.78%, avg=385.81, stdev=106.22, samples=599
iops : min= 16, max= 136, avg=96.45, stdev=26.55, samples=599
lat (usec) : 20=0.09%, 50=0.32%, 100=5.95%, 250=0.27%, 500=0.01%
lat (msec) : 2=0.01%, 4=0.05%, 10=64.61%, 20=23.10%, 50=4.79%
lat (msec) : 100=0.71%, 250=0.10%
cpu : usr=0.11%, sys=1.69%, ctx=27210, majf=0, minf=112
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=28940,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4
Run status group 0 (all jobs):
READ: bw=386KiB/s (395kB/s), 386KiB/s-386KiB/s (395kB/s-395kB/s), io=113MiB (119MB), run=300008-300008msec
The disks are configured as non-RAID devices on the server controller, and passed through as normal disks to Linux.
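If the controller or the individual drives are suspect, these are the per-disk checks I can run (a sketch; /dev/sda and sda stand in for one of the pool members):
hdparm -W /dev/sda                      # is the drive's write cache enabled?
cat /sys/block/sda/queue/scheduler      # noop/none is the usual recommendation for ZFS-managed disks
smartctl -i /dev/sda                    # model, firmware, logical/physical sector sizes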
Debug dumps here: https://gist.github.com/pbrooks/3eca3e78ecada637c57962c2682f4a69

