2

Given the md5sum of a file, I want to know if anywhere else in the directory tree is another file with the same md5sum (but maybe under a different name). How can I do that in bash?

P.S.: To emphasize, this should work for the entire tree below a given directory, i.e. must work recursively not just in the current directory.

3 Answers

1

Using find to recursively test all files:

find . -type f -exec \
bash -c 'md5sum "$0" | grep -q "^2690d194b68463c5a6dd53d32ba573c7 " && echo "$0"' {} \;

Here, md5sum prints the MD5 sum followed by the file name. You need to grep that output for the MD5 sum you are looking for, since md5sum has no switch to print the sum alone.

You can check the MD5 sum more easily with md5 if you're on BSD or OS X, where -q prints only the sum:

find . -type f -exec \
bash -c '[ "$(md5 -q "$0")" = 2690d194b68463c5a6dd53d32ba573c7 ] && echo "$0"' {} \;
slhck
  • Thanks slhck, this looks very interesting, but apparently my md5 command has no -q option, and neither does my md5sum command. I am using Xubuntu. Also, what is the {} \ for? –  Oct 02 '13 at 06:06
  • Sorry, I had the wrong BSD `md5` tool here. On Linux you need `md5sum`. I'll correct my post when I'm back on a computer. The `{}` is replaced by the path of each file found, which gets passed to `bash` as `$0`. The `\;` simply ends the `-exec` command. – slhck Oct 02 '13 at 06:34
1

The other solutions are good, but I want to propose one that spawns fewer processes, which should be significantly faster for many small files. If your find supports `-exec … +` (GNU find does):

find /path/to/tree -type f -exec md5sum \{\} + | sed -nre 's/^md5-to-search-for  //p'

or, if your find lacks `-exec … +`, via xargs (this needs `-print0` and `xargs -0` support):

find /path/to/tree -type f -print0 | xargs -r -0 -- md5sum | sed -nre 's/^md5-to-search-for  //p'
David Foerster
0

Borrowing some of the solution from slhck, I came up with:

find . -type f -print0 | while IFS= read -r -d '' f
do
  md5sum "$f" | grep -- "$1"
done

Here, $1 is the first argument to the script, i.e. the MD5 sum to search for. To check for a missing argument, start the script with:

if [ -z "$1" ]
then
  echo "No argument supplied" >&2
  exit 1
fi
tbrixen
  • This breaks if files contain whitespace in their path. To iterate over files, you should use `find` with `-exec` or globbing (e.g. `**`). – slhck Oct 02 '13 at 06:31
  • Another option would be to use `-print0` in your `find` command, and `xargs -0`. So, in this case, `find . -type f -print0 | xargs -0 md5 | grep (your MD5 code)` – Kent Oct 02 '13 at 06:36
  • I like this solution a lot. However it gives me tons of messages on stderr: "md5sum: somefilename: No such file or directory". I wonder if there's a way to suppress that? –  Oct 03 '13 at 05:50
  • @gojira If you get a "no such file or directory" error, that is probably because the file names contain whitespace, and brixenDK's command breaks on this (due to the `for f in …`). If one file was named `foo bar`, for example, it would try to compute the MD5 sum of `foo` and of `bar`, neither of which exists. For a good explanation of why this happens and how to avoid it, see: http://mywiki.wooledge.org/ParsingLs – slhck Oct 03 '13 at 06:54
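The word-splitting problem described in these comments can be seen with two small counting functions. This is only an illustration: the names `count_split` and `count_safe` are made up, and the default `IFS` is assumed.

```bash
#!/usr/bin/env bash
# count_split DIR: number of words an unquoted $(find ...) loop iterates
# over. Unsafe on purpose: the output is split on spaces and newlines.
count_split() {
    local n=0 f
    for f in $(find "$1" -type f); do
        n=$((n + 1))
    done
    echo "$n"
}

# count_safe DIR: number of files a NUL-delimited read loop iterates over.
count_safe() {
    local n=0 f
    while IFS= read -r -d '' f; do
        n=$((n + 1))
    done < <(find "$1" -type f -print0)
    echo "$n"
}
```

In a directory (whose own path has no spaces) holding a single file named `foo bar`, `count_split` reports 2 while `count_safe` reports 1, which is why the `for` loop ends up asking md5sum for files that do not exist.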