Combining two corrupted movie files into a working file

Question

I have two copies of a movie, both of which have some corruption, but in different places. Both movies play but skip different frames at different positions.

Is there a way to combine these movie files (such as with ffmpeg) taking "good frames" from both files and combining them into a new file, thus keeping at least most of the movie?

Is there maybe a way to export only readable frames from both movies and supplement one with the working frames of the other one?

Also (or alternatively), can I re-encode a movie file from a corrupt one, but leaving only readable frames in?

This sounds to me like a perfect job for an AI: read the video, figure out which frames are corrupt, then compute those missing frames from what the AI knows (or assumes). Maybe that'll be there one day.

I'm working on macOS Ventura 13.4.1.

You need to clip the movie into segments (`ffmpeg -ss [skip seconds] -i [input] -t [duration] ...`) and then concatenate ([link](https://stackoverflow.com/questions/7333232/how-to-concatenate-two-mp4-files-using-ffmpeg)) the files back together — Brian, Jul 27 '23 at 23:13
Thanks @Brian. Does that mean I'll have to note down the frames that are corrupt, cut out the working parts and then put them all together this way? This seems quite laborious and impractical when the movie has over 130,000 frames. — Alex Ixeras, Jul 27 '23 at 23:53
What I meant was: is it a manual process to note down where to chunk the movie, or is there a way to automate this? — Alex Ixeras, Jul 28 '23 at 00:00
Or use any of the free video editors, such as Shotcut or OpenShot. Some are just front-ends for ffmpeg — DrMoishe Pippik, Jul 28 '23 at 03:04
"some corruption" is too vague for more precise answers to appear. Include more details about the problem, please. — Destroy666, Jul 30 '23 at 00:51
(1) "two copies of a movie" – Are they files that used to be identical, but suffered something like bit rot in different fragments? Or are they completely different files (possibly different codecs, resolution, bitrate etc.) that only happen to stem from the same recording? Note in the latter case differences may be quite severe: aspect ratio, fps (and thus the total number of frames), interlacing. (2) What about audio? (3) What exact container(s), codec(s) and their settings are we dealing with? — Kamil Maciorowski, Aug 25 '23 at 07:09
@KamilMaciorowski (1) Yes, they used to be identical but both suffered corruption. (2) Both have audio where the frames are working. (3) I actually have several such duplicated but broken files: (a) mpg: MPEG-1/2 Video (mpgv), standard 420YUV with MPEG Audio layer 1/2 (mpga); (b) avi: Xvid MPEG-4 Video (XVID), 420YUV with MPEG Audio layer 3 (mp3) — Alex Ixeras, Aug 25 '23 at 13:09
Where did the files come from? If the corruption happened in filesystem(s) that return read errors (instead of silently giving corrupt data) then use [this answer](https://superuser.com/a/1405862/432690) to recover as much of the first file as possible, then switch to the second file (with the same target and mapfile). Chances are `ddrescue` will build the error-free original. This does not rely on files being videos, but *it absolutely requires the filesystem(s) to give you read errors*. If the filesystem(s) don't, then indeed you need something that understands video files. — Kamil Maciorowski, Aug 25 '23 at 16:49
I `rsync`ed the files from one hard drive to another (bigger) one. During that transfer both the original drive and the destination drive must have experienced some corruption. I've checked the destination drive and it reports _consistency errors_. I've tried recovering some files with `ddrescue`, saving the copy on a separate drive, but it didn't restore the files. — Alex Ixeras, Aug 28 '23 at 05:30
(1) "I've checked the destination drive and it reports consistency errors." – How did you check? Did you check and possibly (partially) "repair" the filesystem as a whole? What I need to know is if you get read errors *when you try to read the files in question* (e.g. with `cat >/dev/null`). (2) "I've tried recovering some files with `ddrescue`, saving the copy on a separate drive, but it didn't restore the files." – It's totally unclear whether you told the tool to read the whole block device or separate files from their filesystem(s). // I'm going to write an answer and we will see if it helps. — Kamil Maciorowski, Aug 28 '23 at 05:53
Thanks for patiently guiding me through this. (1) I've used commercial software (Drive Genius) to check, but macOS's Disk Utility reports similar errors. Both programs say they can't repair the drive. I've run `cat /path/with/corrupted/files/* >/dev/null` but did not get any read errors. (2) I tried recovering separate files from the file system (specifying individual files using `ddrescue`), but was not successful in doing so. — Alex Ixeras, Aug 28 '23 at 07:08

Kamil Maciorowski · Answer 1 · 2023-08-28T09:05:46.300

Preliminary notes

This answer introduces a method to merge two or more corrupted files that used to be identical, but suffered something like bit rot in different fragments.
The method is general in the sense it's not limited to video files.
The question mentions two copies. This answer is general and assumes two or more copies, possibly (but not necessarily) on different filesystems.
There's a condition related to the filesystem(s) holding the files.
There is no guarantee the method will work (see "Conclusions" at the end).

Condition

The method requires the filesystem(s) holding the files to throw read errors when you try to read the corrupted parts. One copy from a filesystem that does not throw read errors may be used as the last resort, but at least one copy from a filesystem that does throw read errors must be used earlier.

Why the condition and what it implies

Note that a video file that used to play well may be corrupted in two different ways:

Some fragments have changed for whatever reason, some bits have been flipped, but the filesystem considers the file healthy. The corruption manifests itself only when its internal structures are interpreted by a program that understands them and expects them to follow some specification. In your case of video files the program may be a player that will tell you some frames are corrupted; in case of PDF files the program may be a PDF reader; etc. But for programs that don't care about the internal structure, a file corrupted this way will look healthy: cat >/dev/null will read the whole of it, cp will copy it, base64 will encode it, md5sum will compute some sum, od will show you octal bytes; all these without any error.
The corrupted fragments cannot be read at all, either because the underlying block device cannot read some data (badly scratched CD or DVD; HDD with bad sectors) or because the filesystem somehow knows some data from the block device is bad (e.g. Btrfs verifies checksums on read). You get read error(s). Note in case of a video file your player may (or may not) ignore the error(s) and skip the frame(s), so effectively it may treat this kind of corruption like the other kind (described above); but tools like cat or cp will stop on the first error.
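A minimal shell demo of the first kind of corruption, using a made-up scratch file and a hypothetical function name (demo_silent_corruption): one byte of a copy is overwritten in place, cat reads the damaged copy without any error, and only a comparison against a known-good copy (cmp) notices the change. It is an illustration of the point above, not a way to detect corruption in real videos.

```shell
#!/bin/sh
# Hypothetical demo of "internal" corruption: damage one byte in a copy of a
# scratch file, then show that cat reads it without error while cmp, which
# compares against a known-good copy, reports that the copies differ.
demo_silent_corruption() {
  dir=$(mktemp -d) || return 1
  printf 'pretend-this-is-video-data' > "$dir/good"
  cp "$dir/good" "$dir/bad"
  # Overwrite the byte at offset 7 in place; conv=notrunc keeps the rest of the file.
  printf 'X' | dd of="$dir/bad" bs=1 seek=7 conv=notrunc 2>/dev/null
  if cat "$dir/bad" >/dev/null; then
    echo "cat: no read error"
  fi
  if ! cmp -s "$dir/good" "$dir/bad"; then
    echo "cmp: copies differ"
  fi
  rm -rf "$dir"
}
```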

For this answer to work, we want all the corruption to be accompanied with read errors. A simple way to tell if there is at least one read error is to read the file: cat the_file >/dev/null. If this command does not throw a read error for a file you know is ("internally") corrupted then the file can only be used as the last resort (this will be explained later). If this command throws a read error then you may hope all the corruption within the file will be accompanied with read errors.
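The cat test can be run over all copies at once. The sketch below is a hypothetical helper (classify_copies is a made-up name) that sorts copies into those that throw a read error and those that don't, relying on cat failing on a read error as described above. Missing or unreadable paths are reported separately so they are not mistaken for read errors.

```shell
#!/bin/sh
# Hypothetical helper: read every copy end to end with cat.
# Prints one tab-separated line per copy:
#   read-error  <path>  cat failed while reading: candidate for the main ddrescue runs
#   no-error    <path>  cat read everything: usable only as a last resort
#   missing     <path>  not a readable regular file, so cat was not attempted
classify_copies() {
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      printf 'missing\t%s\n' "$f"
    elif cat -- "$f" >/dev/null 2>&1; then
      printf 'no-error\t%s\n' "$f"
    else
      printf 'read-error\t%s\n' "$f"
    fi
  done
}
```

For example: classify_copies /Volumes/DriveA/movie.avi /Volumes/DriveB/movie.avi (placeholder paths).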

Note it may happen some corruption within a file is accompanied with read errors, while corruption in other place(s) of the file is "internal" and not accompanied with read errors. Here we assume that if there is a read error then we are lucky and all the corruption is accompanied with read errors.

Procedure

Another answer of mine provides a command to read as much of a file (for which the filesystem throws read error(s)) as possible. Pick a copy for which the filesystem throws read error(s) and run the command:

ddrescue /path/to/copy1 /where/to/save/result mapfile

where result and mapfile don't exist yet; the tool will create these files. Watch the output; it should tell you there are some read errors. After the command finishes you can inspect the mapfile and see that the list of data blocks contains line(s) with status - (or use ddrescueview).
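Inspecting the mapfile can also be scripted. This sketch assumes the usual GNU ddrescue mapfile layout: lines starting with # are comments, the first other line is the current-status line, and every following line is a data block written as pos size status, with pos and size in hexadecimal. The function name list_bad_blocks is hypothetical; it prints each block with status - and the total number of bad bytes.

```shell
#!/bin/sh
# Hypothetical helper: list the data blocks that a GNU ddrescue mapfile marks
# with status "-" (bad sector) and the total number of bad bytes.
# Assumed layout: "#" comment lines, one current-status line, then one
# "pos size status" line per data block, with pos and size in hexadecimal.
list_bad_blocks() {
  map=$1
  awk '
    /^#/ || NF == 0 { next }                # skip comments and blank lines
    !status_seen { status_seen = 1; next }  # skip the current-status line
    $3 == "-" { print $1, $2 }              # bad-sector block: pos and size
  ' "$map" | {
    total=0
    while read -r pos size; do
      # Shell arithmetic accepts 0x... hexadecimal constants.
      printf 'bad block at %s, %s bytes\n' "$pos" "$((size))"
      total=$((total + size))
    done
    printf 'total bad bytes: %s\n' "$total"
  }
}
```

Running list_bad_blocks mapfile after each ddrescue run shows how much is still missing.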

Then proceed to the next copy for which the filesystem throws read error(s):

ddrescue /path/to/copy2 /where/to/save/result mapfile

where result and mapfile are the same files as in the previous run of ddrescue. The tool will try to fill in the gaps, i.e. after reading the mapfile it will know what parts of copy1 did not get to the result (because of read errors) and it will try to read them from copy2.

Again, watch the output and inspect the mapfile. It may happen some parts we need from copy2 cannot be read; if so, try reading from copy3, copy4 and so on. All these copies should be from filesystem(s) that report read errors.

Unless one (or more) of the underlying block devices is about to die on the hardware level, the sequence in which you read the copies does not matter. If you suspect some block device(s) are about to fail then a sane approach will be to read from healthy devices first and maybe you won't need to read from the unhealthy one(s) at all.

Hopefully ddrescue will eventually tell you that 100% of the file has been rescued, and you will confirm this by inspecting the mapfile, where you will find just one line on the list of data blocks and its status will be +. If this happens then you can be certain that each fragment of result has been read from a copy for which this particular fragment does not throw a read error; so result should be a proper file (this assumes there was no corruption not accompanied with read error(s)).
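The whole procedure can be wrapped in a loop. This is a sketch, not the exact commands given above: the function names are hypothetical, the copies are read in the order given (so list the healthiest device first), and the loop stops once the mapfile meets the success criterion just described, a single data block with status +. One deliberate difference: it passes -r1 (one retry pass), because as far as I know GNU ddrescue does not re-read blocks already marked - unless retry passes are requested; check the manual of your ddrescue version.

```shell
#!/bin/sh
# Hypothetical wrapper: feed several copies of the same file to GNU ddrescue,
# always with the same output file and mapfile, so each run only tries the
# parts earlier runs could not read. Copies are read in the order given
# (healthiest device first); the loop stops once the mapfile shows a single
# data block with status "+".
# Assumption: -r1 (one retry pass) is needed for ddrescue to re-read blocks
# already marked "-"; verify against the manual for your ddrescue version.

fully_rescued() {
  # Skip "#" comments and blank lines, skip the current-status line (the first
  # remaining line), then require exactly one data block, with status "+".
  awk '
    /^#/ || NF == 0 { next }
    !status_seen { status_seen = 1; next }
    { blocks++; last = $3 }
    END { if (blocks == 1 && last == "+") exit 0; exit 1 }
  ' "$1"
}

rescue_from_copies() {
  result=$1; map=$2; shift 2
  for copy in "$@"; do
    echo "reading from $copy"
    ddrescue -r1 "$copy" "$result" "$map"
    if [ -f "$map" ] && fully_rescued "$map"; then
      echo "mapfile: the whole file has been rescued"
      return 0
    fi
  done
  echo "gaps remain after trying every copy; inspect the mapfile" >&2
  return 1
}
```

For example: rescue_from_copies result mapfile /path/to/copy1 /path/to/copy2 /path/to/copy3.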

What if `ddrescue` rescues less than 100%?

If after reading from all the copies (for which the corruption is accompanied with read errors) there are still gap(s) to fill (status - in the mapfile), there are a few options:

If there are no more copies (i.e. no copies for which the corruption is not accompanied with read errors) then you won't be able to fill the gaps at all. You have tried all the copies and they all reported read errors in the same fragment(s), so there is nowhere to get the missing data from.
If there is just one copy for which the corruption is not accompanied with read errors then use the copy with ddrescue as the last resort. The tool will fill the gaps and maybe the data from the last copy will turn out not to be "internally" corrupted in these exact fragments and result will become a proper file.
If there are two or more copies for which the corruption is not accompanied with read errors then you can use each of them separately with ddrescue as the last resort. "Separately" means you should try each such file with separate copies of the result and the mapfile. For each copy of the result the tool will fill the gaps from a different file and maybe in at least one case the data from the file will turn out not to be "internally" corrupted in these exact fragments and result will become a proper file.
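The last-resort step can be sketched the same way. The helper below (last_resort_attempts, a hypothetical name) makes a separate copy of the partial result and mapfile for every copy that does not throw read errors, and lets ddrescue fill the gaps from that one source only, leaving the original result and mapfile untouched. As before, -r1 is an assumption so that blocks already marked - get read again.

```shell
#!/bin/sh
# Hypothetical last-resort step: try each copy that does NOT throw read errors
# on its own copy of the partial result and mapfile, so every attempt fills the
# remaining gaps from a single source and the originals stay untouched.
# Assumption: -r1 (one retry pass) makes ddrescue re-read blocks marked "-".
last_resort_attempts() {
  result=$1; map=$2; shift 2
  n=0
  for clean in "$@"; do
    n=$((n + 1))
    cp "$result" "$result.attempt$n" || return 1
    cp "$map" "$map.attempt$n" || return 1
    ddrescue -r1 "$clean" "$result.attempt$n" "$map.attempt$n"
    echo "attempt $n: $clean -> $result.attempt$n (mapfile $map.attempt$n)"
  done
}
```

Each resulting result.attemptN then has to be checked on its own, e.g. by playing it.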

What if there is corruption not accompanied with read errors?

Some possible scenarios:

Files for which corruption is not accompanied with read errors are everything you have.
ddrescue read all the files for which corruption is accompanied with read errors, but there are still gaps; and you have two or more files for which there are no read errors, but none of them alone is enough to fill all the gaps and produce a proper file.
The assumption that one read error for a file implies that all the corruption in the file is accompanied with read errors is wrong.

In these and similar cases you need a tool that understands the internal structure of the file and is able to detect corruption not accompanied with read errors. In your case of video files this will be some video editing software; so it will be something you explicitly asked about. This answer does not cover this case.

Conclusions

Depending on the circumstances the method may or may not work.

If you can read all the copies without read errors then the method cannot work.
If at least one copy gives you read error(s) then there's a chance all its corruption is accompanied with read errors and you will be able to fill the gaps by reading from another copy (or copies); the method may work.
The method may seem to work (i.e. ddrescue may tell you it recovered 100% of the file) while it really doesn't (i.e. the file is still "internally" corrupted, e.g. a video file still contains invalid frames). This is because there may be corruption not accompanied with read errors. We assumed it's not the case, but in practice it may be.
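Because a 100% result may still be "internally" corrupted, a program that decodes the video can at least detect some of that. A hedged sketch: ffmpeg with -v error decodes the whole file to the null muxer (-f null -) and prints only errors, so an empty log means no decoder errors were reported. That is not a guarantee the file is intact, and decode_check is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check for "internal" corruption that ddrescue cannot see:
# decode the whole file with ffmpeg, discard the decoded output (-f null -),
# and keep only messages of level "error" or worse (-v error).
# -nostdin stops ffmpeg from reading stdin, which matters inside loops.
# An empty log is not proof that the file is intact.
decode_check() {
  log=$(ffmpeg -nostdin -v error -i "$1" -f null - 2>&1)
  if [ -z "$log" ]; then
    echo "no decoder errors reported for $1"
  else
    printf '%s\n' "$log" > "$1.decode-errors.txt"
    echo "errors reported for $1; see $1.decode-errors.txt"
    return 1
  fi
}
```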

If at least one copy gives you read error(s) then you should try the method and maybe it will work.

Thanks for the very detailed and understandable answer! I know I'll be referring to it once I come across this scenario. (Unfortunately?) None of my (video) files have read errors. — Alex Ixeras, Aug 30 '23 at 11:33
