This may sound silly, but is there a way to intentionally reduce the speed of a rebuild in a Linux software RAID? (Basically, reducing the throughput of all of the disks so that they're not maxed out.)

The RAID itself is just a bunch of drives connected via external SATA to a Slackware 13.37 box running software RAID (everything controlled by mdadm). The drives aren't of the highest quality (it's a budget home system) and I'd just like the peace of mind that I'm not pushing them too much.

Maybe there's a way to pause and unpause the rebuild, which I can script to happen from time to time?

David
  • Can you explain why you want this behavior? As far as I'm concerned, a RAID builds once when you set it up and rebuilds when you add or replace a disk. – Braiam Jul 30 '13 at 02:11
  • @Braiam: As mentioned in the question, it's *mostly* about peace of mind. The drives are consumer-grade Western Digital drives with firmware patches to remove the TLER problem with software RAID. There are *occasional* false failures. Sometimes the devices have the same number of RAID events and can just be re-assembled, sometimes they don't and the false failure needs to be re-built. I'd just like to re-build slowly to avoid another false failure while it's in progress. – David Jul 30 '13 at 02:14
  • No worries, I was just trying to determine how silly your reason was ;) – Braiam Jul 30 '13 at 02:30
  • I'm facing a similar issue, as my Synology NAS shuts down from time to time when there's heavy disk usage. The bad thing is that a RAID resync puts a lot of stress on the disks, so the NAS reboots, then runs the resync, then reboots... It never ends. So thanks a lot for the question, and the answer! – Yvan Sep 04 '14 at 11:42
  • One example is that I had to do some quick work, but my RAID array decided to recheck itself. Obviously this maxed out the drives and made work impossible. I realize that it may have been a read error, but the work was more important than the check at the moment. – Apache Mar 09 '17 at 06:35
  • A good reason to pause it or slow it down is to finish a fresh installation more quickly. – Joachim Wagner Apr 05 '18 at 14:06
  • The problem is, the RAID rebuild isn't consistent. It runs in bursts. For example, if you set the minimum speed limit to 10,000, it will sit idle for around 20 seconds, then run at full speed for 20 seconds to catch up. Then it will sit idle again. Rinse and repeat. Whoever wrote the rebuild scheduling didn't do a very good job. – Brain2000 Jul 13 '20 at 05:34
  • @Braiam (1) When the MD-RAID1 holding the OS is being rebuilt, the OS is terribly slow (probably due to excessive seeks). (2) When I had to press Reset after a shutdown hung, the RAID was marked unclean and was rebuilt. To make things worse, "external:imsm" does not have a bitmap, so it rebuilt the whole disk (2TB in my case, "finish=236.9min speed=120296K/sec"). – U. Windl Aug 04 '23 at 21:09

You can pause a rebuild with this:

echo "idle" > /sys/block/md0/md/sync_action

This assumes md0 is your md device. However, the kernel's md driver (not mdadm itself) will restart the rebuild on some "event", and it isn't clear what that event would be. I suspect a read or write to the array will kick off the rebuild again, so this command often has no visible effect: the rebuild stops and then immediately restarts. If you have multiple md devices, this will also cause md to start rebuilding the next one that needs it.
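
The question asks for a scriptable pause/unpause, so here is a minimal, untested sketch. It relies on the kernel's md documentation, which describes `frozen` as a second value for `sync_action`. Writing `idle` interrupts the current sync but lets md start a new one when something triggers it. Writing `frozen` interrupts it and blocks new ones until `idle` is written again, which should avoid the immediate restart described above. All function names and the `MD_SYSFS` override are hypothetical. The override exists only so the functions can be pointed at a scratch directory.

Whether a resumed rebuild continues from its checkpoint or starts over may depend on the kernel version and on whether the array has a write-intent bitmap. The loop is aimed at resync/recovery; a manually started `check` is probably not restarted automatically once interrupted.

```shell
#!/bin/sh
# Sketch: pause and resume an md rebuild through sysfs (run as root).
# MD_SYSFS is a hypothetical override so the script can be tested
# against a scratch directory; by default it targets the real sysfs.
MD_SYSFS="${MD_SYSFS:-/sys/block}"

# Path of the sync_action control file for one md device, e.g. md0.
md_action_file() {
    printf '%s/%s/md/sync_action\n' "$MD_SYSFS" "$1"
}

# Current state word: resync, recover, check, repair, idle or frozen.
md_state() {
    cat "$(md_action_file "$1")"
}

# Interrupt the rebuild and keep md from starting a new one.
md_pause() {
    echo frozen > "$(md_action_file "$1")"
}

# Clear the frozen state so md may restart the rebuild.
md_resume() {
    echo idle > "$(md_action_file "$1")"
}

# Let the rebuild run for RUN seconds, rest for REST seconds, and
# repeat until the device reports idle at the end of a run period.
md_duty_cycle() {
    dev=$1 run=$2 rest=$3
    while :; do
        sleep "$run"
        [ "$(md_state "$dev")" = idle ] && break
        md_pause "$dev"
        sleep "$rest"
        md_resume "$dev"
    done
}
```

For example, `md_duty_cycle md0 600 300` would alternate 10 minutes of rebuilding with 5 minutes of rest. The state is checked only after a full run period, which gives md time to restart the rebuild after `md_resume` before the loop decides it has finished.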

To throttle the rebuild, you can use:

echo 5000 > /proc/sys/dev/raid/speed_limit_max

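This limits the maximum rebuild throughput to 5000 KiB/s, roughly 5 MB/s (megabytes, not megabits). The value is a per-device rate, so an array with more disks moves more data in total. You can see the current resync speed with: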

cat /proc/mdstat
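
A small sketch of the throttling and monitoring steps above. Besides the system-wide `/proc/sys/dev/raid/speed_limit_max`, the kernel's md documentation describes a per-array `sync_speed_max` file under `/sys/block/mdX/md/`. That file also accepts the word `system` to fall back to the global value. As far as I know, the kernel only checks the ceiling once the rebuild is already faster than `speed_limit_min`, so keep the max above the min. The function names and the `RAID_PROC`, `MD_SYSFS` and `MDSTAT` overrides are hypothetical and exist only for testing against scratch files.

```shell
#!/bin/sh
# Sketch: throttle and monitor md rebuilds. Limits are KiB/s per device.
# RAID_PROC, MD_SYSFS and MDSTAT are hypothetical overrides so the
# functions can be tested against scratch files; defaults are real paths.
RAID_PROC="${RAID_PROC:-/proc/sys/dev/raid}"
MD_SYSFS="${MD_SYSFS:-/sys/block}"
MDSTAT="${MDSTAT:-/proc/mdstat}"

# System-wide ceiling for every array, e.g. md_limit_all 5000.
md_limit_all() {
    echo "$1" > "$RAID_PROC/speed_limit_max"
}

# Ceiling for one array only, e.g. md_limit_one md0 5000.
# Writing "system" instead of a number reverts to the global value.
md_limit_one() {
    echo "$2" > "$MD_SYSFS/$1/md/sync_speed_max"
}

# Print "array speed" (KiB/s) for each array /proc/mdstat shows syncing.
md_speeds() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }          # remember the array name
        match($0, /speed=[0-9]+K\/sec/) {   # e.g. speed=120296K/sec
            print dev, substr($0, RSTART + 6, RLENGTH - 11)
        }
    ' "$MDSTAT"
}
```

The per-array file is useful when only one array should be slowed down. `md_speeds` just extracts the `speed=` field that `cat /proc/mdstat` already shows, which makes it easy to log over time.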
Paul
  • I *really* like that `speed_limit_max` option. (Which I'm also now seeing confirmed in my local `man` page for `md`, which is awesome.) The first option would probably work for me since the file system on the array is un-mounted while the rebuild is in progress. But the `speed_limit_max` option seems more straightforward. – David Jul 30 '13 at 02:18
  • How do you keep these changes permanent? – John Tate Dec 02 '18 at 09:03
  • To make the change permanent, the kernel value has to be set again on every boot. Create the file `/etc/sysctl.d/maxresync.conf` containing the single line `dev.raid.speed_limit_max = 10000`, then load it with `sysctl -p /etc/sysctl.d/maxresync.conf` (plain `sysctl -p` only reads `/etc/sysctl.conf`). After that, and again after the next boot, verify with `cat /proc/sys/dev/raid/speed_limit_max`. – Milan Kerslager Mar 17 '19 at 22:18
  • The speed limit options do not work like they should. The rebuild will burst every 20 seconds with the values entered, as opposed to running within the speed limits all the time. For example, if you enter a min/max speed limit of 10,000, it will burst 10,000 * 20 = 200MB every 20 seconds, instead of sending a constant stream of 10MB every second. – Brain2000 Jul 13 '20 at 05:35
  • @Brain2000 I'd argue that the burst makes sense for spinning disks, as it means fewer head seeks than trying to read tiny bits of data from the same position all the time. – Xmister Apr 09 '21 at 19:46
  • @Xmister that is true if you want the drive to rebuild faster. But when you have it in production and need it to service customer requests the resync code does not work as expected. I looked at the source code and realized it is using a 20 second sample and throttle rate. Meaning you can go 20 seconds completely unthrottled. It is bad for both spindles and SSDs. It should probably be reduced to a one second granularity. – Brain2000 Apr 11 '21 at 19:15
  • As long as more than 1 byte is transferred at a time, there will always be a time interval during which the speed limit is exceeded when the speed limit is below the normal speed of the hardware. 20s or 1s is a choice. The throttle parameters should be configurable, or at least the behaviour documented. For example, if you know that the limit is enforced only over 20s intervals, you can limit the speed to 10 MB for every second by setting the limit 20 times lower, i.e. 500 KB/s. In other words, set it as low as necessary to keep your customers happy. – Joachim Wagner Aug 15 '21 at 11:03
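
A worked version of the burst arithmetic in the last few comments. It assumes, as Brain2000 reports from reading the source, that the limit is enforced as an average over a window of about 20 seconds. Under that assumption, up to limit × window KiB can move in a single catch-up burst. The 20-second window is the commenters' figure, not a value verified here, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the burst arithmetic from the comments (KiB and seconds).
# WINDOW is the ~20 s averaging window the commenters report; it is an
# assumption here, not a value read from the kernel.
WINDOW="${WINDOW:-20}"

# Most data md may move in one catch-up burst for a given limit.
md_burst_kib() {
    echo $(( $1 * WINDOW ))
}

# Limit that keeps a single burst at or below TARGET KiB
# (the "set it 20 times lower" rule). Integer division truncates.
md_burst_safe_limit() {
    echo $(( $1 / WINDOW ))
}
```

With the default window, `md_burst_kib 10000` prints 200000 (KiB, about 200 MB, Brain2000's figure). `md_burst_safe_limit 10000` prints 500, matching Joachim Wagner's 500 KB/s for bursts of about 10 MB.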