
I have a 1 TB HDD in my laptop; it is about 2 years old. Recently I started noticing random hang-ups and freezes, so I checked my HDD's health. The first time I checked there were 502 bad sectors; the count kept increasing every day, and over 3 days it jumped to 702. Is this a bad sign? Does it mean the drive might fail soon?

[Screenshot: SMART attributes of the drive]

UPD

After installing Speccy: the SMART status shows Warning, but every attribute is Good, and the Reallocated Sectors Count has increased to 750.

UPD

It increased to 807

  • Next time it freezes, try checking Event Viewer for any meaningful message. If the reallocated sector count keeps increasing, your HDD will soon go BAD. However, it's a matter of luck. An HDD with a few reallocated sectors (but a stable count) can last a few more years, whereas one with a constantly increasing count can go bad in no time. – patkim Oct 16 '22 at 12:17
  • Given the rapid increase I wouldn't bet on "the drive is good", although I don't understand why the normalised value is stuck at 100... If it's a "primary drive" you'd better replace it now. At least I would... If it's a "secondary drive" (e.g. for backups) you can wait and see what happens. – PierU Oct 16 '22 at 12:36
  • @PierU, it is the primary and only drive I have. I am as confused as you are. – Javokhir Matnazarov Oct 16 '22 at 12:39
  • It's the rate vs. time and the RAW value that is worrying. Even though the drive may have plenty of sectors to spare, the rate and number IMO indicate a more serious problem with the drive, and if it was mine I'd replace it ASAP. – Joep van Steen Oct 16 '22 at 14:50
  • If this were my drive and I cared about the data on it, I would NOT RUN A SINGLE TEST at this point until I had duplicated the drive elsewhere using a utility like `dd` on Linux or something similar on Windows. Doing comprehensive r/w sector scans on a failing drive can, and probably will, make the problem worse if there is indeed a problem. If you care about your data, back it up now. I too have been doing this for over 30 years (many of us have). – Señor CMasMas Oct 17 '22 at 05:16
  • Does Toshiba have its own HDD utility software that reports SMART attributes? At least the interpretation of the attributes would be more reliable. – PierU Oct 17 '22 at 06:03

4 Answers


I treat SMART like this (based on 20 years experience in data recovery):

  • If SMART says all is well, do not for one minute think your drive cannot die the next minute.

  • If SMART however gives reason for caution, act as if the drive will die soon.

IOW, drives die without SMART ever warning, but some SMART values can actually help us determine whether something is going on with a drive. Examples are the values for reallocated sectors and sectors pending reallocation. These are sectors the drive could not read, no matter what it tried.

AFAIK the RAW value for these attributes is simply a count. If we see 0xFF we know 255 reallocations took place, simple as that. Some manufacturers may employ more complex RAW values for certain attributes (example), but not for these in my experience.
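
To make the "simply a count" reading concrete: tools such as CrystalDiskInfo display the RAW value in hex, so under this interpretation converting it to decimal gives the count. A minimal Python sketch, assuming the attribute really is a plain counter (which, as argued elsewhere in this thread, is not guaranteed for every vendor):

```python
def raw_to_count(raw_hex: str) -> int:
    """Read a SMART RAW value shown in hex as a plain integer count.

    Assumption: the vendor stores this attribute as a simple counter.
    """
    return int(raw_hex, 16)


print(raw_to_count("FF"))            # 255, the example above
print(raw_to_count("0000000002EE"))  # 750, matching the count in the question
```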

In the question we not only see a lot of reallocations (although "a lot" is arbitrary, and some say 700 or so reallocations isn't many), we also see them increase rapidly. IMHO both the number and the rate are alarming, which is why I believe the drive is dying.
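
The rate is easy to quantify from the readings in the question (502 at the first check, 702 three days later). A small Python sketch of the average daily increase; the later readings of 750 and 807 are left out because the question doesn't say when they were taken:

```python
def daily_increase(first: int, last: int, days: float) -> float:
    """Average increase in the reallocated sector count per day."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (last - first) / days


rate = daily_increase(502, 702, 3)
print(f"{rate:.1f} reallocations per day")  # 66.7 reallocations per day
```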

If we consider a patient with a wound, it may take some time for blood pressure to drop below critical values. But if we observe at the same time that the patient is losing a lot of blood, we're not going to wait until his blood pressure is below the critical value; we act immediately and try to prevent the situation from getting worse.

Each time a drive encounters a sector it cannot read and whose data it cannot correct with ECC, it initiates a so-called error recovery procedure (ERP). The OS can do nothing but sit these out, so they can cause apparent hang-ups. These procedures take at least seconds for each sector and may take up to 20 seconds.

You will push a dying drive closer to the edge simply by reading from it. So you had better make each read count and not waste reads on disk surface scans. A data recovery engineer would hook such a drive up to a specialized hardware disk imager and skip bad sectors as much as possible. The closest we can get to this specialized hardware is probably the open source tool HDDSuperClone.

So if you need the data from this drive my advice would be to clone it ASAP using this software. If you don't I'd replace it.
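
The core idea behind imaging a failing drive with tools like HDDSuperClone or ddrescue is to grab the easy sectors first and jump past trouble instead of hammering it. The Python sketch below only illustrates that first-pass "skip on error" idea against a simulated drive; it is not how HDDSuperClone is actually implemented, and `read_sector`, `fake_read` and the default skip size of 64 sectors are hypothetical:

```python
from typing import Callable, Optional


def clone_with_skips(
    read_sector: Callable[[int], Optional[bytes]],
    total_sectors: int,
    skip: int = 64,
) -> tuple[dict[int, bytes], list[tuple[int, int]]]:
    """First imaging pass: copy readable sectors, jump ahead after an error.

    read_sector(lba) returns the sector's bytes, or None if the read failed.
    Returns the recovered sectors and the (start, end) ranges that were
    skipped (end exclusive), to be retried later by a gentler pass.
    """
    if skip < 1:
        raise ValueError("skip must be at least 1")
    image: dict[int, bytes] = {}
    skipped: list[tuple[int, int]] = []
    lba = 0
    while lba < total_sectors:
        data = read_sector(lba)
        if data is None:
            end = min(lba + skip, total_sectors)
            skipped.append((lba, end))  # remember the gap for a later pass
            lba = end                   # do not keep retrying the bad area
        else:
            image[lba] = data
            lba += 1
    return image, skipped


# Simulated 100-sector drive with a small bad area at LBAs 10-12.
BAD_LBAS = {10, 11, 12}


def fake_read(lba: int) -> Optional[bytes]:
    return None if lba in BAD_LBAS else bytes(512)


image, gaps = clone_with_skips(fake_read, total_sectors=100, skip=8)
print(gaps)        # [(10, 18)]
print(len(image))  # 92
```

The good sectors 13 to 17 end up in the skipped range too. That is the trade-off of a first pass: a few readable sectors are postponed in exchange for far fewer reads in the damaged area.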

EDIT: It seems this may be an SMR drive. Once SMR drives actually fail, they're often problematic to recover data from, even for a data recovery lab.

Joep van Steen
  • I would also mention ddrescue, which is excellent for this job. It's Linux-based, though. And some drives used to "shut down" during clones; Seagate Barracudas were terrible for this. As soon as they hit a bad spot, they would simply return 0xFF on every subsequent read. So I used a USB relay and an HDD dock with ddrescue to automatically power the drive down and back up whenever a bad sector was detected, and then skip to a new section. Worth its weight in gold. :) – Appleoddity Oct 17 '22 at 20:32
  • @Appleoddity, I agree on ddrescue but HDDSuperClone is a step up. Note that HDDSuperClone is Linux based too and can also work with relays and YKUSH devices to power cycle drives. – Joep van Steen Oct 17 '22 at 21:28

There is perhaps too much discussion here about interpreting your SMART values, rather than about what is necessary to determine the condition of your disk.

In 30 years of diagnosing bad hard disks rarely ever did I use SMART values to do it. Indeed, it didn't even exist back then. That's not to say there isn't value with the SMART stats, but you can see the kind of confusion it causes. The problem with SMART is that each manufacturer implements it in different ways, and it's nearly impossible to know for sure what you're looking at unless the manufacturer's own specification is documented, or their own tool interprets them.

My suggestion is that you perform a surface scan of the disk. A surface scan, sometimes referred to as a "long test" in some tools, will physically read (and optionally write) every physical sector on the disk, helping to determine the condition of the drive. This type of test is primarily used on hard disks and has little value on SSDs, although I was once able to detect a bad SSD with a surface scan when it was otherwise passing other diagnostic tools.

First, a warning. If your drive has bad sectors, this test will find them. If your drive is failing, it may have hundreds of bad sectors. Running this test on a failing drive will make it worse. Data recovery and backup is priority one! Rescue your data FIRST with a backup or clone of the disk BEFORE running surface scans on it.

Now, there are many tools that can perform these tests. Most manufacturers release their own tools for this type of test, and there are other commercial and freeware products available. Back when I did this on a regular basis, the tool I used most recently was HDD Regenerator. Before that we used a tool called SpinRite. In addition there are Seagate SeaTools, Western Digital Data Lifeguard tools, and many others. However, I fall back on a freeware tool called HDDScan for easy Windows-based tests, especially for end users.

So, how do you use HDDScan to determine if your drive is good or bad?

Get the tool and start the test:

  • Navigate to https://hddscan.com and download the .zip file of the tool.
  • The tool does not require installation on your computer. Instead, decompress the downloaded .zip file with your favorite tool, or right-click the downloaded .zip file, choose 'Extract All', and follow the instructions.
  • You should now have the decompressed HDDScan files in a folder on your computer.
  • Double-click the HDDScan.exe file to run the application. HDDScan needs administrative access to your computer, and you will be prompted to allow the application to make changes to your computer.
  • Accept the license agreement.
  • The first page you are presented with is the drive and test selection page. Choose your drive from the dropdown, click the 'TESTS' button, and then choose the 'VERIFY' test.
  • The next page lets you choose the sector range you want to test. It defaults to the whole disk, so you can click the right arrow to continue. Before starting the test, be sure to close all other applications on your computer to obtain more accurate results and avoid transient reads/writes that can throw off the sector read times.
  • The test will start immediately, and you will see the task in the task list view.
  • Double-click the running task to open the live view. You can also pause and stop the test here.

How do you interpret these results?

First, allow the test to complete. It will take a significant amount of time. However, you should monitor it occasionally. As mentioned earlier, this test will find bad sectors if they exist. And if it starts finding a lot of them (>10), you can stop the test. The drive is failing. There is no sense in continuing to beat it up.

When the test is complete you can review the stats. The test status window has three tabs: Graph, Map, and Report.

The Graph Tab. This tab displays the testing speed in KB/s over the course of the test. We expect the drive to maintain a fairly consistent read speed throughout the test. Transient spikes may not indicate a problem and could be artifacts of other disk access by Windows during the scan. It is also worth noting that because the outer tracks of a platter pass under the head faster than the inner tracks, you may see a gradual ramp up or ramp down in speed over the course of the test.

Extended stretches of decreased read speed are a clear sign that the drive may be having trouble reading the disk surfaces.
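
One way to make "extended stretches of decreased read speed" concrete is to flag runs of consecutive samples that stay well below the drive's typical speed while ignoring one-off dips. A Python sketch; the 50% drop factor and the minimum run length of 3 samples are arbitrary assumptions, not HDDScan settings, and a drive with a very steep outer-to-inner speed ramp could trip them too:

```python
from statistics import median
from typing import Optional


def slow_stretches(
    speeds_kbps: list[float], drop: float = 0.5, min_len: int = 3
) -> list[tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, where the read speed
    stays below drop * median speed for at least min_len consecutive samples."""
    limit = median(speeds_kbps) * drop
    runs: list[tuple[int, int]] = []
    start: Optional[int] = None
    for i, speed in enumerate(speeds_kbps):
        if speed < limit:
            if start is None:
                start = i  # a slow stretch begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None  # a normal-speed sample ends the stretch
    if start is not None and len(speeds_kbps) - start >= min_len:
        runs.append((start, len(speeds_kbps)))  # stretch runs to the end
    return runs


speeds = [100] * 5 + [20] * 4 + [100] * 3 + [10] + [100] * 2
print(slow_stretches(speeds))  # [(5, 9)]: the lone dip at index 12 is ignored
```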

The Map Tab. This view is perhaps the most useful. Here you see a live view of the status of each sector that is read including the time it took to read the sector, and if any bad sectors are detected. In this view, we are primarily interested in the statistics on the right hand side.

[Screenshot: HDDScan Map tab with read-time statistics]

This chart gives you the number of sectors read at certain speeds during the test. By far most of your reads should take less than 10ms on a properly working drive. By default, any sector that takes longer than 50ms to read creates a log entry on the 'Report' tab. Sectors that take longer than 50ms to read are not necessarily bad. Again, because this is a running Windows system, your drive may be actively used during the test, which will affect read speeds. However, if you begin to see a large number of sectors, especially consecutive ones, taking over 150ms or, worse, over 500ms, this is a pretty clear indicator the drive is having trouble reading that area of the disk.

Finally, the number of 'Bads' is the number of bad sectors detected. These are sectors the drive was unable to read, and the data in them is most likely lost. While slow read times can indicate issues, the bad sector count is a clear indicator that the drive has physical damage to its disk surfaces.
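
The read-time statistics on the Map tab are essentially a histogram. The Python sketch below bins a list of per-sector read times using the thresholds mentioned above (10, 50, 150 and 500 ms) and counts unreadable sectors separately. The exact bins HDDScan uses may differ, and representing a failed read as `None` is an assumption of this sketch:

```python
from collections import Counter
from typing import Iterable, Optional

EDGES_MS = (10, 50, 150, 500)  # thresholds discussed above; HDDScan's bins may differ


def bucket(read_ms: Optional[float]) -> str:
    """Name the bin a single sector read falls into."""
    if read_ms is None:
        return "bad"  # the sector could not be read at all
    for edge in EDGES_MS:
        if read_ms < edge:
            return f"<{edge} ms"
    return f">={EDGES_MS[-1]} ms"


def summarize(read_times_ms: Iterable[Optional[float]]) -> Counter:
    """Count how many sector reads fall into each bin."""
    return Counter(bucket(t) for t in read_times_ms)


sample = [3.1, 4.0, 8.7, 62.0, 180.5, 720.0, None]
counts = summarize(sample)
for name in ["<10 ms", "<50 ms", "<150 ms", "<500 ms", ">=500 ms", "bad"]:
    print(f"{name:>9}: {counts[name]}")
```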

The Report Tab.

This tab displays a log of all events of interest. Whether that be bad sectors or sectors which took an unusually long time to read, this log will show you a summary of things you might need to be concerned about on the drive.

What indicates a bad drive?

There is some room for personal experience and preference here. However, a general rule of thumb is that slow-reading sectors (>150ms) and bad sectors indicate physical issues on the drive. Still, it is difficult to set hard-and-fast rules here. Drives have a pool of spare sectors specifically for handling bad sectors; they automatically lock out bad sectors and remap them to spare ones. To a point, these minor failures are expected and handled by the drive without any user intervention. I have reason to think (but am not sure) that if ANY bad sectors show up in this test, the drive has already exhausted its pool of spare sectors. So deciding when to replace a drive sometimes comes down to your level of risk tolerance.

Here is how I interpret test results.

  • If one or two bad sectors are detected, I get concerned that the drive is starting to fail. But I also understand that the drive may have experienced a single event (such as a drop or bang) that damaged that specific area, and it may well continue to operate just fine. This is especially true if they are 2 or 3 consecutive bad sectors. However, there have been numerous times when this type of test found 1 or 2 bad sectors and, even after the drive was returned to service, it failed shortly after. So this is a scenario where you need to decide what your risk tolerance is, back up often, and possibly keep monitoring the drive rather than replace it.
  • If the drive has numerous bad sectors, say 10 or more, especially if they are spread across the drive, the drive is failing. It's time to replace it.
  • If the drive has numerous (more than 10) slow-reading sectors (>150ms), especially consecutive ones, this can indicate problems on the drive. If no bad sectors are found, I would lean towards continuing to monitor it. However, when coupled with bad sectors, these slow-reading areas are nearly as clear an indicator of physical damage as the bad sectors themselves and should be counted the same.
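
The rules of thumb above can be written down as a small function, mainly to make the thresholds explicit. This is a Python sketch of one person's heuristics, not a standard. How to treat 3 to 9 bad sectors is not spelled out above, so lumping them in with "one or two" is an assumption:

```python
def assess_scan(bad_sectors: int, slow_sectors: int) -> str:
    """Apply the rules of thumb above to one surface-scan result.

    bad_sectors:  sectors the scan could not read at all
    slow_sectors: sectors that took longer than 150 ms to read
    """
    if bad_sectors > 0:
        # Alongside bad sectors, slow areas are counted like bad sectors.
        effective = bad_sectors + slow_sectors
        if effective >= 10:
            return "failing: replace the drive"
        return "may be starting to fail: back up often, monitor, weigh your risk tolerance"
    if slow_sectors > 10:
        return "slow areas but no bad sectors: keep monitoring"
    return "no problems found by this scan"


for bad, slow in [(0, 0), (0, 25), (2, 0), (2, 9), (15, 0)]:
    print(bad, slow, "->", assess_scan(bad, slow))
```

Under the stricter "if it was MY drive" rule that follows, any `bad_sectors > 0` would simply mean "replace".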

Ultimately, if it was MY drive, and any bad sectors are detected I would replace the drive immediately. Drives that are in tip top shape never have slow reading or bad sectors show up in these tests. In fact, many manufacturer diagnostic tools will fail the drive if ANY bad sectors are detected.

Finally, if you are really interested in SMART values, it would be a neat experiment to record them before this test and look at them again afterwards. This test forces the drive to read every available sector, so if there are any problems, SMART should detect them.
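
Comparing the two snapshots is just a diff of attribute values. A Python sketch, assuming you have copied the RAW values by hand (or exported them from your SMART tool) into dictionaries; the attribute names follow smartctl's naming and the numbers are made up for illustration:

```python
from typing import Optional


def smart_diff(
    before: dict[str, int], after: dict[str, int]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {attribute: (old, new)} for every attribute whose value changed.

    An attribute present in only one snapshot shows None for the other.
    """
    changed = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


# Hypothetical readings taken before and after a surface scan.
before = {"Reallocated_Sector_Ct": 120, "Current_Pending_Sector": 8, "Power_On_Hours": 9000}
after = {"Reallocated_Sector_Ct": 126, "Current_Pending_Sector": 2, "Power_On_Hours": 9003}
for name, (old, new) in smart_diff(before, after).items():
    print(f"{name}: {old} -> {new}")
```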

Appleoddity
  • *GREAT +1 write up*. The only thing I would change is telling OP to duplicate the drive before running ANY of these tests on it. Probably where you put the "if it was MY drive" part and mentioning that in the beginning before OPs eyes glaze over. Whacking the drive trying to get it to spin up or attempting to scrape a drive after the failures is no fun.. :) Just my opinion. – Señor CMasMas Oct 17 '22 at 05:24
  • When performing a "surface scan", what about the physical sectors that have already been reallocated? I mean, can any software address a physical sector by bypassing the controller's relocation map? If not, the sector will simply be reported as "good". – PierU Oct 17 '22 at 05:54
  • My suggestion is you CLONE the drive ASAP using something like HDDSuperClone, rather than torture it with disk scans. This is mistake no. 1 we see in data recovery labs, people that keep trying and torturing their dying disk using disk scans, chkdsk, etc.. – Joep van Steen Oct 17 '22 at 07:44
  • @PierU You can't scan already reallocated sectors unless you have access to pro equipment like Acelab PC3000. – Joep van Steen Oct 17 '22 at 07:46
  • @JoepvanSteen OK, so that is defeating (at least partly) the purpose of a surface scan in the particular case of this question. A surface scan may report only a few bad sectors, but not counting the already reallocated ones... – PierU Oct 17 '22 at 09:18
  • @PierU - Very good question. I mentioned in the answer that I have reason to believe that the drive has already exhausted its spare pool of sectors by the time these tools start seeing bad sectors. However, I imagine there must be tools which can disable this remapping function. Again, I'm not sure. That is why I evolved toward simply replacing any drive which shows bad sectors on a surface test and, indeed, some tools fail a disk if they detect any bad sectors. Yes, a drive that is in an early stage of failure may go unnoticed with a surface scan, but it's been an extremely reliable test for me. – Appleoddity Oct 17 '22 at 20:20
  • @SeñorCMasMas Yes, I agree with you fully. I just wanted to make a small disclaimer and keep from making this answer even larger and more complex. :) I'll move the warning. – Appleoddity Oct 17 '22 at 20:21
  • WOW! Apparently people have strong opinions about this. Your ZERO score is the #1 score right now! :^P – Señor CMasMas Oct 18 '22 at 15:11

Below are two answers, before and after seeing the SMART attributes. One describes the disk as dying soon, and the second as not perfect but still in a working condition.

This has precipitated an argument here between people who believe in SMART attributes and people who don't.

As usual, the truth is probably in-between. The disk should be watched for further degradation, but at the moment there is no indication that it will fail in the very near future.

A product like Speccy, which analyzes the SMART attributes, is preferable to one that just reports the raw data and leaves it to us to argue about.

Regarding the number of Reallocated sectors: this is not the same as, and is less serious than, Unrecoverable sectors. Modern disks are fabricated with thousands of spare sectors, meaning they are made to recover from such problems. The point of no return arrives when the spare sectors are exhausted and no more sectors can be remapped. A disk that starts showing Unrecoverable sectors, with their number growing, should be replaced.


Yes, the disk is failing. About a hundred bad sectors per day is extremely worrying.

Save your data before it fails completely.

Please add a screenshot of where you see the number of bad sectors, so I can be sure of my prognosis.


According to the screenshot, your disk is in very good shape, with no errors at all.

You were misled by the SMART attributes. The values 100, 200 and sometimes 253 are normalized values that mean "no errors". These are the initial values of most SMART indicators, and errors cause them to go down to zero. The Raw values are mostly to be ignored - they are usually divided into bit-fields, so treating them as integers is meaningless.
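
For reference, this is roughly how a normalized value is compared with its threshold, as a Python sketch. It assumes the common convention that an attribute is flagged once its normalized value drops to or below a non-zero threshold, and that a threshold of 0 never trips; the example values are hypothetical. It looks only at the normalized value, which is exactly the point under dispute in this thread:

```python
def attribute_status(value: int, worst: int, threshold: int) -> str:
    """Classify a SMART attribute from its normalized value, worst and threshold."""
    if threshold == 0:
        return "ok (assumed: a threshold of 0 never trips)"
    if value <= threshold:
        return "failing now"
    if worst <= threshold:
        return "failed in the past"
    return "ok"


print(attribute_status(100, 100, 10))  # ok
print(attribute_status(8, 8, 10))      # failing now
```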

DO NOT replace the disk - there is nothing wrong with it.

harrymc
  • I edited the message and added the screenshot – Javokhir Matnazarov Oct 16 '22 at 09:02
  • Thank you. If it was failing, I was going to replace it with a DRAM-less SATA SSD. – Javokhir Matnazarov Oct 16 '22 at 09:14
  • To avoid such mistakes, you could use a product such as [Speccy](https://www.ccleaner.com/speccy), which will also analyze the SMART values and mark each one with Good or Bad. – harrymc Oct 16 '22 at 09:36
  • I beg to differ. Relying on the generally worthless normalized SMART values is not sane. This disk is dying and the raw values for most counter-type attributes are not bit fields on just about any drive. There are 702 remapped sectors and 702 separate remapping events indicated. Your update made this answer incorrect. I suggest you remove it. – Daniel B Oct 16 '22 at 10:15
  • @DanielB: There are two approaches on our site: people who believe in normalized values and people who believe in raw values. I would prefer the poster to install Speccy to analyze these values for us and give the final verdict. It would be nice to have the judgement of a professional company like CCleaner here. I would then delete the wrong half from my answer. – harrymc Oct 16 '22 at 10:19
  • I will install Speccy and check – Javokhir Matnazarov Oct 16 '22 at 10:47
  • After installing, I checked the hard drive. The status is `Warning` but every attribute is `good` – Javokhir Matnazarov Oct 16 '22 at 10:50
  • This means that the disk is still good and can be used, but you need to check it from time to time and mind your backups (which is always good idea). If you prefer not to use a disk with a Warning and to get a new disk, this disk can still be used as a secondary disk, as it can still last for quite some time. – harrymc Oct 16 '22 at 11:04
  • @JavokhirMatnazarov the drive is failing at a rapid rate. It’s already in caution state. Sorry harrymc, but this answer is not correct, at all. Even the OP has indicated the hang ups and issues it is already causing. Replace this drive asap - it is bad. – Appleoddity Oct 16 '22 at 13:16
  • @JavokhirMatnazarov: Ignore the people who do not believe in SMART. Keep an eye on the disk and see whether the number of Reallocated sectors stabilizes. It's possible that the firmware has done a scan of the disk and has fixed weak sectors by remapping them to stronger ones. If, say, the number exceeds 1500, this enters the danger zone and the disk should be replaced. If it stabilizes, then the disk is not in the prime of its life, but you may keep using it. (Answers based on SMART always get downvoted by some people, even if, funnily enough, "Reallocated sectors" is a SMART attribute). – harrymc Oct 17 '22 at 07:43

Your reallocated sector count is not 750. For SMART values, 100 is "normal", lower is worse, and below the indicated threshold means "failing". The raw value is not standardized at all.

In some hard disks, the raw value for this attribute really is the number of reallocated sectors, but it can also be a compound value in which different bitfields mean different things (that's also why the value is displayed in hex).
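
To illustrate what a compound RAW value would mean: SMART RAW values are 48 bits wide, and a vendor could, for example, pack several 16-bit fields into them. The Python sketch below splits a RAW value into three 16-bit words, lowest first. Which layout, if any, a particular drive uses is vendor-specific, so the split itself is an assumption, and the example value is made up:

```python
def raw_words(raw: int) -> list[int]:
    """Split a 48-bit SMART RAW value into three 16-bit words, lowest word first.

    Hypothetical layout for illustration; real layouts are vendor-specific.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("RAW values are 48-bit")
    return [(raw >> shift) & 0xFFFF for shift in (0, 16, 32)]


raw = 0x000100020003  # made-up value, as a tool would show it in hex
print(raw, raw_words(raw))  # 4295098371 [3, 2, 1]
```

Read as one integer, the same bits would look like a count of over four billion, which is why it matters whether the vendor documents the layout.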

So if no value is going below 100, and in particular if there are no other values going down that indicate read errors etc. which are the reason sectors get reallocated, you don't have to worry at all. The warning is lying to you.

I started noticing random hang up and freezes,

Could have other reasons, like a cable not correctly seated. Did you investigate the system errors that lead to the hang up/freeze?

dirkt
  • but both Speccy and CrystalDiskInfo show reallocated sector count 750. When I converted raw value to decimal, it says 750 – Javokhir Matnazarov Oct 16 '22 at 11:10
  • Cable errors do not result in sectors going bad and the need for them to be replaced. Bad sectors can perfectly explain freezes by themselves. Since we have many bad sectors and freezes it seems obvious that those two are related in this case. – Joep van Steen Oct 16 '22 at 14:53
  • @JavokhirMatnazarov they both wrongly assume that the raw value always represents the actual number of sectors; that's why they say "750". But that's a huge value, and there is no way the normalized value would stay at "100". So that assumption is wrong for your hard disk. – dirkt Oct 16 '22 at 17:00
  • @JoepvanSteen No, cable errors don't result in bad sectors, but they can result in random hangups and freezes. And we **don't** have many bad sectors, as explained. – dirkt Oct 16 '22 at 17:01
  • My point is, bad sectors can explain hang ups pretty well and since we have plenty of those it seems the most likely explanation. Yes, 750 is a lot of bad sectors and since OP saw number increase by hundreds in just a few days, assuming something serious is going on is the analysis that makes most sense. – Joep van Steen Oct 16 '22 at 21:16
  • @JavokhirMatnazarov. Then explain the value for reallocation events. – Joep van Steen Oct 16 '22 at 21:21
  • And there is an easy way to find out what it is: look for the actual errors when the hang-up/freeze happens. Transport error? Then likely a cable, as I wrote in the answer. Also, reallocated sectors actually **don't** cause hang-ups; they get reallocated quickly and are then read normally. **Pending** reallocation sectors can potentially cause hang-ups if the hard disk still tries to read them and they error out. – dirkt Oct 16 '22 at 21:28
  • Reallocated sectors were 'bad' at some point. We don't know what triggered the reallocation; drives can reallocate on read if ERP is successful. ERP takes long enough (for every single sector) that it can still explain hangs. But I see your point. Also, these numbers and the rate at which they occur are worrying. – Joep van Steen Oct 16 '22 at 21:33
  • Your read error rate is zero, your seek error rate is zero, both would go up if ERP was necessary. Again: You don't have any bad sectors, nor do you have reallocated sectors. It's a wrong interpretation of the raw value by some tool. A real bad harddisk looks e.g. [like this](https://unix.stackexchange.com/questions/588251/how-to-interpret-these-smartctl-results-possible-solutions). – dirkt Oct 16 '22 at 21:40
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/139902/discussion-between-joep-van-steen-and-dirkt). – Joep van Steen Oct 16 '22 at 21:42
  • @dirkt Read_Error_Rate and Seek_Error_Rate at zero definitely don't mean that no error happened. This is typically the kind of raw attribute that is completely uninterpretable if not documented by the vendor. Even on a healthy drive, having zero read/seek error events during the life of the disk is extremely unlikely. – PierU Oct 17 '22 at 06:03