dmesg reports the following:

[78909.100057] ata5.00: failed command: WRITE DMA
[78909.100063] ata5.00: cmd ca/00:08:80:08:00/00:00:00:00:00/e0 tag 11 dma 4096 out
                        res 51/04:00:88:08:00/00:00:00:00:00/e0 Emask 0x1 (device error)
[78909.100067] ata5.00: status: { DRDY ERR }
[78909.100069] ata5.00: error: { ABRT }

lsscsi reports

....
[4:0:0:0]    disk    ATA      Maxtor 6H500F0   1DD0  /dev/sdc
[5:0:0:0]    disk    ATA      Maxtor 6H500F0   1DD0  /dev/sdd
[6:0:0:0]    disk    ATA      Maxtor 6H500F0   1DD0  /dev/sde 
....

and ls /dev/disk/by-path/ reports

....
lrwxrwxrwx 1 root root   9 Oct  7 18:22 pci-0000:00:1f.2-ata-4 -> ../../sdd
lrwxrwxrwx 1 root root  10 Oct  7 18:22 pci-0000:00:1f.2-ata-4-part1 -> ../../sdd1
lrwxrwxrwx 1 root root   9 Oct  7 18:22 pci-0000:00:1f.2-ata-5 -> ../../sde
lrwxrwxrwx 1 root root  10 Oct  7 18:22 pci-0000:00:1f.2-ata-5-part1 -> ../../sde1
lrwxrwxrwx 1 root root   9 Oct  7 18:22 pci-0000:00:1f.2-ata-6 -> ../../sdf
lrwxrwxrwx 1 root root  10 Oct  7 18:22 pci-0000:00:1f.2-ata-6-part1 -> ../../sdf1
....

The failed disk is either /dev/sdd or /dev/sde, but since both disks are the same make and model, it is hard for me to determine which serial number under /dev/disk/by-id is the one to remove.

Now the question is: dmesg refers to ata5.00, so which listing is correct, lsscsi or /dev/disk/by-path/? Or (more likely) are both correct? How do I determine which disk is actually failing?

EDIT: This question is about finding out which /dev/sdX the ataX.XX in dmesg refers to.
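
For reference, one commonly suggested way to relate the two is to resolve each /sys/block/sdX symlink and look for an ataN component in the resulting sysfs path. The following is only a minimal sketch under that assumption: it assumes a kernel whose libata exposes the port as an `ataN` directory in the device path, and the helper names (`ata_port_from_sysfs_path`, `map_ata_ports`) are made up for illustration. Disks that are not attached through libata (USB, for example) are simply skipped.

```python
import glob
import os
import re

# Matches a single path component such as "ata5" in a resolved sysfs path.
ATA_COMPONENT = re.compile(r"^ata(\d+)$")


def ata_port_from_sysfs_path(resolved_path):
    """Return the ATA port number found in a resolved sysfs path, or None."""
    for component in resolved_path.split("/"):
        match = ATA_COMPONENT.match(component)
        if match:
            return int(match.group(1))
    return None


def map_ata_ports(sys_block="/sys/block"):
    """Map each /dev/sdX to the ataN port found in its resolved sysfs path."""
    mapping = {}
    for entry in sorted(glob.glob(os.path.join(sys_block, "sd*"))):
        # /sys/block/sdX is a symlink into /sys/devices/...; follow it.
        resolved = os.path.realpath(entry)
        port = ata_port_from_sysfs_path(resolved)
        if port is not None:
            mapping["/dev/" + os.path.basename(entry)] = port
    return mapping


if __name__ == "__main__":
    for device, port in map_ata_ports().items():
        print(f"ata{port} -> {device}")
```

If this assumption holds on the machine in question, the device printed next to ata5 would be the one that the ata5.00 lines in dmesg refer to, and its serial number can then be looked up under /dev/disk/by-id.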

Waxhead
  • This looks incomplete. Besides the syslog lines, what other symptoms of "failure" are there? Are you sure you're reporting all of the salient lines from the syslog (dmesg output)? See https://superuser.com/questions/641219/possibly-a-dying-hard-drive-but-reads-writes-work-unsure-about-log-entries Have you looked at or run any SMART tests? – sawdust Oct 09 '17 at 19:28
  • @sawdust: Yes, this is incomplete. The question is about identifying the drive and not about the disk error. dmesg shows ataX.XX, while lsscsi and /dev/disk/by-path show different ata numbers. If I need to look at the disk serial number in /dev/disk/by-path/, I need to find the correct disk. – Waxhead Oct 09 '17 at 21:57
  • You can use `smartctl` from `smartmontools` to obtain the [SMART](https://en.wikipedia.org/wiki/S.M.A.R.T.) status of your disks. – xenoid Oct 10 '17 at 09:35
  • I'm with @xenoid. The first thing to do is to run `smartctl --all /dev/sdX` for all the drives and look at the variables related to errors ("offline uncorrectable", "pending" etc), and at the drive's error log. The second step is to run `smartctl -t offline /dev/sdX` for each drive and then recheck the error values and logs. Note that the drive performing self tests has its normal performance degraded. – kostix Oct 10 '17 at 11:58
  • For the future, your best bet is to set up `smartmontools` for periodic disk checks. On all my machines, I have short self-tests enabled at night and long self-tests on weekend nights, scheduled so that each drive has its own time slot and no two drives run a long self-test at the same time. The tooling can send you e-mails when one of the critical parameters (such as the number of offline uncorrectable sectors) is seen to increase. – kostix Oct 10 '17 at 12:00
  • The best option would be to use smartmontools. More information on how it can be used can be found here: https://www.howtoforge.com/checking-hard-disk-sanity-with-smartmontools-debian-ubuntu – batistuta09 Oct 12 '17 at 14:49
  • I am perfectly aware of smartmontools. The question is about how to relate ataX.XX to the corresponding /dev/sdX. – Waxhead Oct 18 '17 at 16:24
