I have a bunch of mp3 files, some of which got cut short (due to an unfinished download or something similar). I want to separate the good files from the bad ones (i.e. the ones that cut off in the middle).

Answers to related questions list a few possibilities that use metadata to check whether a file's length matches what it is supposed to be (and to find and correct other errors). Several tools can do this, such as mp3val, mp3check and mp3diags (all of which were available to me via apt-get from the Ubuntu repositories and looked promising and easy to use), as well as checkmate (which I didn't try).

However, none of these worked in my case, because apparently every file had metadata errors. That left me manually listening to the end of each file to hear whether it trailed off correctly or was obviously cut short. Is there any way to do something like this automatically (listen for an abrupt sound cutoff) for a large number of files?

I found at least one way (which I'll post as an answer). This whole approach rests on an assumption about the mp3 files in question, namely that they are expected to end in silence. I expect that holds for most people with this general problem, so it seemed a useful route to examine and post for others who may have the same issue.

argentum2f

1 Answer

I found that the command-line application sox (which was already installed on my Ubuntu machine, perhaps as a dependency of one of my audio apps) can report statistics about 'loudness' (amplitude, etc.). It can also trim files and do lots of other things. Most importantly, it lets you chain effects together, such as trimming the file to the last half second or so and then looking at the amplitude stats. For example:

sox file.mp3 -n trim -0.5 stat

The 'Maximum amplitude' value reported should be close to zero if the file ends in silence, and noticeably above zero (up to 1) if it does not. Adding a few more commands (in Linux/bash) pulls out just this number:

sox file.mp3 -n trim -0.5 stat 2>&1 | grep 'Maximum amplitude' | sed 's/.* //g'

The 2>&1 is needed because sox's stat effect writes its report to stderr; redirecting stderr to stdout lets grep see it, and grep then also drops the warning and error lines from sox that I didn't care about. Finally, to make a determination, you can compare this number to a reasonable threshold (e.g. 0.1 seems to work well in my case). Doing this automatically on a large number of files can look something like this:

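for f in *.mp3; do    # glob directly; parsing ls breaks on names with spaces
    echo "$f:"
    # Maximum amplitude of the last 0.5 s (sox prints its stats on stderr)
    end_amp=$(sox "$f" -n trim -0.5 stat 2>&1 | grep 'Maximum amplitude' | sed 's/.* //')
    python -c "print('bad' if $end_amp > 0.1 else 'good')"
done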

This ran in a few seconds for me on 300+ files, and it can easily be modified to move or delete the bad files as desired. I used python for the floating-point comparison because bash arithmetic only handles integers. It failed on a few files (I think these were almost completely empty, so sox got nothing; such files could be caught with a file-size threshold), but it worked on most.
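
As a possible alternative to calling python, awk can do the float comparison, and the files where sox returns nothing can be reported explicitly instead of producing an error. The following is a minimal sketch rather than what I actually ran: the function names end_amplitude and classify_mp3, the 1000-byte minimum size, and the good/bad/unknown labels are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names, the 1000-byte size floor and the labels are
# illustrative assumptions, not part of sox.

# Print the 'Maximum amplitude' of the last 0.5 s of a file (empty on failure).
end_amplitude() {
    sox "$1" -n trim -0.5 stat 2>&1 | awk '/Maximum amplitude/ {print $NF}'
}

# Print good, bad, or unknown for one file.
# $1 = file, $2 = amplitude threshold (default 0.1), $3 = minimum size in bytes
classify_mp3() {
    local file="$1" threshold="${2:-0.1}" min_bytes="${3:-1000}" size amp
    [ -f "$file" ] || { echo unknown; return; }
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -lt "$min_bytes" ]; then
        echo bad            # nearly empty file: treat as a failed download
        return
    fi
    amp=$(end_amplitude "$file")
    if [ -z "$amp" ]; then
        echo unknown        # sox reported no amplitude for this file
        return
    fi
    # awk does the floating-point comparison that bash cannot do natively
    awk -v a="$amp" -v t="$threshold" 'BEGIN { print ((a + 0 > t + 0) ? "bad" : "good") }'
}
```

With these defined, something like for f in *.mp3; do echo "$f: $(classify_mp3 "$f")"; done prints one verdict per file.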

Here it is as a one-liner to copy and paste into the terminal. Run it in the folder where the bad files are suspected:

for f in *.mp3; do echo "$f:"; a=$(sox "$f" -n trim -0.5 stat 2>&1 | grep 'Maximum amplitude' | sed 's/.* //'); python -c "print('bad' if $a > 0.1 else 'good')"; done
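
Moving the suspect files aside, rather than just printing a verdict, could look like the sketch below. The function name move_truncated, the default directory name truncated, and the choice to treat a missing sox result as suspect are assumptions for illustration. It moves instead of deleting, so nothing is lost if the threshold misjudges a file.

```shell
#!/usr/bin/env bash
# Sketch only: move_truncated and the default directory name 'truncated'
# are hypothetical names, not part of sox.

# Move every .mp3 in the current directory whose last 0.5 s is louder than
# the threshold into a subdirectory.
# $1 = destination directory (default: truncated), $2 = threshold (default: 0.1)
move_truncated() {
    local dest="${1:-truncated}" threshold="${2:-0.1}" f amp
    mkdir -p "$dest" || return 1
    for f in ./*.mp3; do
        [ -f "$f" ] || continue     # skip the unexpanded pattern if nothing matches
        amp=$(sox "$f" -n trim -0.5 stat 2>&1 | awk '/Maximum amplitude/ {print $NF}')
        # An empty amp (sox failed) is treated as suspect here: an assumption.
        if [ -z "$amp" ] || awk -v a="$amp" -v t="$threshold" 'BEGIN { exit !(a + 0 > t + 0) }'; then
            echo "moving $f"
            mv -- "$f" "$dest/"
        fi
    done
}
```

Calling move_truncated with no arguments in the folder of mp3 files uses the defaults; move_truncated suspect 0.05 would use a directory named suspect and a stricter threshold.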
argentum2f