2

This is a weird request, but I had a hard drive that I initially ran badblocks on and then stopped partway through, so it started out with part of the drive covered in 0xAA and another part covered in 0x55. I then put an NTFS filesystem on it, leaving the empty regions filled with this garbage, and files written to it later overwrote parts of those regions.

Later the drive died, with many chunks of data missing throughout the entire drive.

What's left is a raw image of the NTFS partition, stored on a btrfs filesystem. I could probably just delete it, but first I want to make sure there aren't any important files on it that I can recover.

The drive image is taking up a lot more space than necessary, because all of those 0xAA and 0x55 bytes can't be stored as "holes". Likewise, the NTFS recovery program DMDE lists a lot of "files" that contain nothing but 0xAA and 0x55.

Is there some way to go through and find any blocks/chunks/chains that are entirely 0xAA or 0x55 and blank them to 0x00 so they take up zero space on the btrfs volume? They aren't zero, but they don't contain any information either.

endolith
  • I don't have any direct experience, so no answer, but have a look at this Q&A (https://superuser.com/questions/274972/how-to-pad-a-file-with-ff-using-dd): they use `dd` on a source file, pipe (`|`) it to `tr`, which translates the values, and redirect (`>`) stdout to a destination file. Possibly very slow. – Yorik May 27 '21 at 18:37
  • Adjust [this answer](https://superuser.com/a/1510940/432690). You need to generate a bbe script that (instead of removing blocks of zeros) translates blocks of 0xAA or 0x55 to zeros. Can you do this? – Kamil Maciorowski May 27 '21 at 18:42
  • Several filesystem tools have a utility to zero free space, but I don't know if there is one for NTFS. If this were an SSD, I'd suggest telling the filesystem to do a trim. – user10489 May 27 '21 at 19:12
  • @user10489 `fstrim` easily makes my images of NTFS sparse. The problem here is the filesystem is not healthy. Note the OP uses a recovery program, so it's not about copying files the filesystem knows about. Zeroing allegedly free space in whatever way will overwrite lost files the OP hopes to recover later. – Kamil Maciorowski May 27 '21 at 19:20
  • Another idea: use `tr` to translate every 0xAA and 0x55 to 0x00 and save a translated copy. Some of 0xAA or 0x55 are meaningful and should not be translated. So use [this answer](https://unix.stackexchange.com/a/478548/108618) to detect sectors of zeros in the translated copy, but instead of `--fill-mode=+` use `--fill-mode=?` to zero sectors that are zeros in the translated copy; *but do this on the original*. This way you will zero sectors containing nothing more than 0xAA, 0x55 and 0x00. – Kamil Maciorowski May 27 '21 at 19:24
  • @kamil-maciorowski: Good point. fstrim might do bad things on a damaged filesystem. But arbitrarily changing 0x55/0xAA to 0x00 might hit a data (or worse, a metadata) block that just happens to contain these values. And if you use tr to do this, as several have suggested, it will change individual bytes, not blocks, which will certainly corrupt it. – user10489 May 27 '21 at 19:32
  • @user10489 That's why I wrote "some of 0xAA or 0x55 are meaningful and should not be translated" and then introduced an idea of detecting full sectors of zeros in the translated copy and modifying the original according to the result. – Kamil Maciorowski May 27 '21 at 19:35
  • A very simple alternative would be to enable btrfs compression. Bytes that repeat very often can easily be compressed and thus reduce the used effective space. – Robert May 28 '21 at 13:23
  • @Robert The file already has the compressed flag, but "[if the first portion of data being compressed is not smaller than the original, the compression of the file is disabled](https://btrfs.wiki.kernel.org/index.php/Compression#incompressible)" Anyway it's still useful for not recognizing the junk as files to be recovered – endolith May 28 '21 at 14:14

2 Answers

1

I realized I could just write my own Python program to do this:

filename = 'NTFS_3TB.img'
chunk_size = 512  # one 512-byte sector at a time
with open(filename, 'r+b') as f:
    while True:
        chunk = f.read(chunk_size)
        if chunk == b'':  # end of file
            break
        if chunk == b'\x55'*chunk_size:
            start = f.tell() - chunk_size  # offset of the chunk just read
            print(f'5: {start}')
            f.seek(start)
            f.write(b'\x00'*chunk_size)
        if chunk == b'\xaa'*chunk_size:
            start = f.tell() - chunk_size
            print(f'A: {start}')
            f.seek(start)
            f.write(b'\x00'*chunk_size)

I looked through the file with a hex editor and confirmed that the chunk size was correct, stepped through a few iterations and watched the chunks being changed in the hex editor, etc., to make sure it wasn't wiping the wrong data.

More efficient version:

filename = 'NTFS_3TB.img'
chunk_size = 512
all_5s = b'\x55'*chunk_size
all_As = b'\xaa'*chunk_size
all_0s = b'\x00'*chunk_size
start = 236039143424  # Resume position from last run
try:
    with open(filename, 'r+b') as f:
        f.seek(start)
        while True:
            chunk = f.read(chunk_size)
            if chunk == b'':  # end of file
                break
            start = f.tell() - len(chunk)  # offset of the chunk just read
            if chunk == all_5s or chunk == all_As:
                f.seek(start)
                f.write(all_0s)
finally:
    # Print where we got to, so an interrupted run can be resumed
    print(f'Position: {start}')
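
A possible refinement, suggested in the comments below: instead of overwriting the matched chunks with zeros and digging holes afterwards with `fallocate --dig-holes`, the chunks could be hole-punched as they are found, so they immediately stop taking up space on btrfs. This is only a sketch (untested; it assumes a 64-bit Linux system with glibc, and calls fallocate(2) through ctypes because the os module doesn't expose hole punching directly):

import ctypes
import os

# Constants from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01   # don't change the file size
FALLOC_FL_PUNCH_HOLE = 0x02  # deallocate the byte range

libc = ctypes.CDLL('libc.so.6', use_errno=True)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_int64, ctypes.c_int64]

def punch_hole(f, offset, length):
    """Deallocate `length` bytes at `offset` without changing the file size."""
    ret = libc.fallocate(f.fileno(),
                         FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, length)
    if ret != 0:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))

# ...then inside the loop, instead of f.seek(start) / f.write(all_0s):
#     punch_hole(f, start, chunk_size)

As with the versions above, it's worth stepping through a few chunks in a hex editor first to make sure the right offsets are being punched.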
endolith
  • Probably not the most efficient way to do the comparison, but if you don't care about speed, this should work. – user10489 May 28 '21 at 11:44
  • @user10489 Yeah it's very slow – endolith May 28 '21 at 12:09
  • If this were C, I'd say replace the `==` with a `memcmp`, which is optimized for this kind of thing. A less fundamental change would be to generate the blocks to compare once, outside the loop: make three precomputed blocks, one for each of the three values, and use those instead of generating them each time. – user10489 May 28 '21 at 12:17
  • @user10489 Good points! I would imagine the overhead from reading in small chunks at a time is the biggest time loser though? (And printing) – endolith May 28 '21 at 13:05
  • I'm alternating between running this script for a while, then running `fallocate --dig-holes`, then back again – endolith May 28 '21 at 14:16
  • You could probably replace the writing of zeros with a fallocate/truncate that releases the block instead of rewriting it. This would also improve performance, but it's a bit more complexity to add to the code, so test carefully!! – user10489 May 28 '21 at 23:18
0

I don't think a tool exists to do this safely.

If you had a healthy mounted filesystem, fstrim would free all unused blocks.

If you use something like tr to arbitrarily translate 0xAA and 0x55 values, it will operate on individual bytes rather than whole blocks and will likely corrupt valid data. Additionally, tr was originally designed for ASCII files and may work badly on binary files.

Even if you only translated whole blocks containing only 0xAA and 0x55 values, you might accidentally clear valid data or metadata blocks.

What you probably want is something that checks free blocks in the filesystem to see whether they contain only a single repeated value, and then uses fstrim on each such block.

My approach to this would be to:

  1. mount the filesystem read-only (if possible) and copy off everything you can
  2. use a file scavenger to get everything else
  3. use checksums and binary compares to remove duplicates from step 2 that are also in step 1 (see the sketch below)
  4. scan the results of step 2 and remove the obvious junk

Note that step 1 might get a lot of corrupted files containing zeroed out bad blocks.
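
For step 3, this is roughly what I mean (a sketch only; the from_mount/ and from_scavenger/ directory names are placeholders for wherever steps 1 and 2 put their output): checksum every file from step 1, then delete any file from step 2 whose checksum and contents match one of them.

import filecmp
import hashlib
import os

def sha256sum(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(1 << 20), b''):
            h.update(block)
    return h.hexdigest()

# Files copied from the read-only mount (step 1)
known = {}
for root, _, names in os.walk('from_mount'):
    for name in names:
        path = os.path.join(root, name)
        known.setdefault(sha256sum(path), path)

# Files produced by the scavenger (step 2): delete exact duplicates
for root, _, names in os.walk('from_scavenger'):
    for name in names:
        path = os.path.join(root, name)
        match = known.get(sha256sum(path))
        # Binary compare before deleting, in case of a hash collision
        if match and filecmp.cmp(path, match, shallow=False):
            os.remove(path)

Hashing first keeps the number of full binary compares small.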

user10489
  • "`tr` was originally designed for ascii files and may work badly on binary files" – Even if it was *originally* designed like this, an OS able to mount btrfs should use `tr` that follows the [POSIX specification](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html) that says "the standard input can be any type of file". POSIX-compliant `tr` handles binary data just fine. – Kamil Maciorowski May 27 '21 at 19:54
  • tr is absolutely the wrong tool for this. It's a great way to corrupt all your data. It's not like 0x55 is an invalid value in arbitrary data, and tr doesn't understand blocks. – user10489 May 27 '21 at 19:57
  • I'm not saying it's a good tool for *this*. I'm disagreeing with your statement it may work badly on binary files. – Kamil Maciorowski May 27 '21 at 20:00
  • "you might accidentally clear valid data or metadata blocks." I highly doubt that. And the filesystem is completely hosed already anyway. – endolith May 28 '21 at 01:45
  • I've looked at enough hex dumps of filesystem metadata to know it is possible. – user10489 May 28 '21 at 11:41