25

SMART self-tests that I start with smartmontools never finish. I always get "Interrupted (host reset)" on a variety of systems and disks: Debian on x86 and ARM, OS X on x64, with both external and internal drives. It happens even in captive mode with completely empty disks (zeroed with dd).

What am I doing wrong?

bot47

4 Answers

19

When the drive does not handle any input/output activity during the test, it may go to standby, which raises the Interrupted (host reset) condition. Try to read from the disk at suitable intervals:

while true; do dd if=/dev/disk1 of=/dev/null count=1; sleep 60; done

(Replace /dev/disk1 with the appropriate device. This reads one sector from that device every 60 seconds until you press Ctrl-C.)

This helped in my environment: OS X 10.6.8, WD Elements USB-connected drive, SAT-SMART-driver 0.8.

A captive test should theoretically keep the drive online. However, the hardware command sent by smartctl may time out before the test completes, causing the kernel to reset the link, which leads to the same situation as above (bug #303).

See this thread on the smartmontools-support mailing list for further details. I acknowledge Christian Franke for the insight given here.

Tobu
sve.g
  • 1
    Other possible interruptions (http://serverfault.com/a/584055/): a bad cable can cause timeouts, and the kernel will trigger a reset. I'm less sure it's necessary to stop smartd. Any timeouts and interruptions will appear in dmesg/kern.log/`journalctl -fk`. – Tobu Nov 13 '14 at 14:40
  • 1
    Wow, that's nuts! Confirming - after dropping a HGST HDN726060ALE610 from a zpool mirror, it was stuck at 10% for 36 hours (it'll finish faster without other activity, RIGHT?). Five minutes of these tiny dd reads caused it to finish. Skepticism dismissed. – Bill McGonigle Jun 03 '19 at 22:13
  • When I ran my test on a 14TB Elements USB drive without the `dd`, it ended with 90% still to go. Now it has been at 10% to go for four hours already. Temperature is at 49°C. I just changed the `dd` to `watch` from GreatEmerald's answer, but I am not confident it will change anything. – bomben Dec 01 '20 at 00:24
9

A variation on Ari's answer is to use watch, because the smartctl output may in fact be interesting to keep track of the status:

sudo watch -d -n 60 smartctl -a /dev/sdx

This refreshes the output of smartctl -a every 60 seconds, so you can see how much of the self-test remains. The -d flag highlights what changed between updates, which makes it easier to spot that the test is indeed progressing.
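If only the progress figure matters, it can be pulled out of the smartctl output. This is a minimal sketch, assuming the ATA-style status text "NN% of test remaining." that smartctl prints while a self-test is running. The function name selftest_remaining is hypothetical, and other drive types (SCSI, NVMe) may report progress differently:

```shell
# Hypothetical helper: print the percentage of the self-test still remaining,
# read from smartctl output on stdin. Prints nothing if no such line is found.
# Assumes ATA-style output containing a line like "90% of test remaining."
selftest_remaining() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)% of test remaining.*/\1/p'
}

# Usage (as root; replace /dev/sdx with your device):
#   smartctl -a /dev/sdx | selftest_remaining
```

An empty result means no matching line was found, which could mean the test has ended or that the drive reports its status in a different format.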

GreatEmerald
5

I tried Tobu's solution, but in my case the external USB drive still went into sleep mode some time after the test started, which interrupted it. It seems dd ended up reading from the kernel cache rather than from the disk, so the disk stayed idle long enough to enter sleep mode. I noticed that calling smartctl to ask for the status always "woke up" the disk, so this version of the same idea did the trick for me:

sudo bash -c 'while true; do smartctl -a /dev/sdb > /dev/null; sleep 60; done'

After 5 hours the external USB disk is still spinning. For the first time I could see a smartctl long test finish in an external disk.

I believe this solution also has the advantage that the disk heads are not moved unnecessarily every minute. The long test finished almost exactly in the predicted time (the keep-awake script did not add time to the run).

Ari
2

The captive test may not work if it takes more than 20 seconds.

Source: ticket #303, titled "In smart test captive mode, extend the timeout as described by the ATA device".

Pierre.Vriens
Sergey V