As part of an exercise to reduce file duplication, management would like a report of all duplicate binary files, including images. There appear to be more than 100,000 of these, and given their size they have a real impact on backups. Is there a way to locate duplicate files, produce a report, and then run through a process of deleting or archiving them? Ideally the application should work on both Windows and Linux.
Have you tried some of the suggestions from [this SU question](http://superuser.com/questions/8223/which-duplicate-files-and-folders-finders-exist-for-windows?rq=1)? It seems that [doubles](http://doubles.sourceforge.net/) would fit your needs. – nixda Aug 10 '13 at 08:35
2 Answers
I did this under Linux (for my music) by running md5sum on all the files, sorting the output, counting the occurrences of each unique MD5 string, and, wherever a hash occurred more than once, matching it back to the associated files and printing them out. I must say that I think the fdupes answer above is probably the better one, but my solution uses only what's available on a stock Linux install.
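A minimal sketch of that pipeline, assuming GNU coreutils (`md5sum`, `sort`, `uniq`) and `find`; the target directory is taken as the first argument, which is an assumption of this sketch rather than part of the original answer:

```shell
#!/bin/sh
# Hash every regular file, sort so identical hashes are adjacent,
# then keep only groups where the hash (first 32 chars) repeats.
DIR="${1:-.}"

find "$DIR" -type f -exec md5sum {} + \
  | sort \
  | uniq -w32 --all-repeated=separate
```

The `-w32` flag tells GNU `uniq` to compare only the first 32 characters of each line (the MD5 hex digest), and `--all-repeated=separate` prints every member of each duplicate group, with blank lines between groups, which serves as the report of duplicates.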
davidgo