227

I have two directories that should contain the same files and have the same directory structure.

I think that something is missing in one of these directories.

Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?

Braiam
AndreaNobili

17 Answers

198

You can use the diff command just as you would use it for files:

diff <directory1> <directory2>

If you also want to compare subdirectories and the files inside them, use the -r option:

diff -r <directory1> <directory2>
Alex R.
  • 3
    Didn't know `diff` works for directories as well(man diff confirmed that), but this doesn't recursively check for changes in subdirectories inside subdirectories. – jobin Feb 16 '14 at 17:04
  • 2
    @Jobin That's strange... For me, it does work. – Alex R. Feb 16 '14 at 17:07
  • 1
    I have something like this: `a/b/c/d/a`, `x/b/c/d/b`. See what `diff a x` gives you. – jobin Feb 16 '14 at 17:09
  • 4
    You have to use the `-r` option. That (`diff -r a x`) gives me: `Only in a/b/c/d: a. only in x/b/c/d: b.` – Alex R. Feb 16 '14 at 17:11
  • Cool! It works! +1. Diff just got more powerful(for me)! :) – jobin Feb 16 '14 at 17:12
  • 8
    diff shows me the differences inside files, but not whether one directory contains a file that the other one does not! I don't need to know the differences within files; I need to know whether a file exists in one directory and not in the other. – AndreaNobili Feb 16 '14 at 17:17
  • is this a function that allows us to see what files are the same between two folders – BenKoshy Feb 12 '16 at 06:18
  • 3
    @AndreaNobili, GNU diff shows `Only in directory1/path` for files in only one of the folders. – joeytwiddle Mar 06 '16 at 04:11
  • I'm looking for a way to diff two dirs and also include file attributes (timestamps, permissions etc). any ideas? – cavalcade Oct 01 '16 at 03:31
  • To additionally get information about files which are equal both by filename *and* contents, use `-s`. Example: `diff -s directory-a directory-b` – Abdull Oct 27 '21 at 19:00
155

A good way to do this comparison is to use find with md5sum, then a diff.

Example

Use find to list all the files in the directory, calculate the MD5 hash of each file, and write the result, sorted by filename, to a file:

find /dir1/ -type f -exec md5sum {} + | sort -k 2 > dir1.txt

Do the same for the other directory:

find /dir2/ -type f -exec md5sum {} + | sort -k 2 > dir2.txt

Then compare the two resulting files with diff:

diff -u dir1.txt dir2.txt

Or as a single command using process substitution:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2) <(find /dir2/ -type f -exec md5sum {} + | sort -k 2)

If you want to see only the changes:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ")

The cut command prints only the hash (first field) to be compared by diff. Otherwise diff will print every line as the directory paths differ even when the hash is the same.

But you won't know which file changed...

For that, you can try something like

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /') <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')

This strategy is very useful when the two directories to be compared are not on the same machine and you need to make sure that the files are equal in both directories.
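The same hash-and-compare idea can be sketched in Python, which sidesteps the problem of not knowing which file changed because each hash stays keyed by its path relative to the directory root. This is only a minimal sketch, not part of the commands above: the function names hash_tree and compare_manifests are made up for illustration, and it assumes every file is readable.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path relative to root to the MD5 hex digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as fh:
                # Read in chunks so large files need not fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def compare_manifests(m1, m2):
    """Return (only_in_first, only_in_second, changed) as sorted lists of relative paths."""
    only1 = sorted(m1.keys() - m2.keys())
    only2 = sorted(m2.keys() - m1.keys())
    changed = sorted(p for p in m1.keys() & m2.keys() if m1[p] != m2[p])
    return only1, only2, changed
```

Calling compare_manifests(hash_tree("/dir1"), hash_tree("/dir2")) returns the three lists. For directories on different machines, the manifest could be built on each side and only the small dictionary transferred.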

Another good way to do the job is to use Git's diff command (this may cause problems when files have different permissions, because every file is then listed in the output):

git diff --no-index dir1/ dir2/
Zanna
Adail Junior
63

Although it is not specific to bash, you can do it using diff with --brief and --recursive:

$ diff -rq dir1 dir2 
Only in dir2: file2
Only in dir1: file1

The diff man page documents both options:

-q, --brief
report only when files differ

-r, --recursive
recursively compare any subdirectories found

Braiam
28

Maybe one option is to run rsync two times:

rsync -rtOvcs --progress -n /dir1/ /dir2/

With the previous line, you will get files that are in dir1 and are different (or missing) in dir2.

rsync -rtOvcs --progress -n /dir2/ /dir1/

This does the same in the other direction: it lists files that are in dir2 and are different (or missing) in dir1.

#from the rsync --help :
-n, --dry-run               perform a trial run with no changes made

-r, --recursive             recurse into directories
-t, --times                 preserve modification times
-O, --omit-dir-times        omit directories from --times
-v, --verbose               increase verbosity
    --progress              show progress during transfer
-c, --checksum              skip based on checksum, not mod-time & size
-s, --protect-args          no space-splitting; only wildcard special-chars

You can remove the -n option to actually apply the changes, that is, to copy the listed files to the second folder.

In case you do that, maybe a good option is to use -u, to avoid overwriting newer files.

-u, --update                skip files that are newer on the receiver

A one-liner:

rsync -rtOvcsu --progress -n  /dir1/ /dir2/ && rsync -rtOvcsu --progress -n /dir2/ /dir1/
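If you run this comparison often, the two dry runs can be wrapped in a small Python script. The following is only a sketch under assumptions: it uses the same flags as the first pair of commands above (without -u), it requires rsync to be installed, and the function names rsync_dry_run_cmd and compare_both_ways are invented for illustration.

```python
import subprocess


def rsync_dry_run_cmd(src, dst):
    """Build the dry-run rsync command shown above for one direction."""
    # A trailing slash on the source makes rsync compare the directory's
    # contents rather than the directory itself.
    return ["rsync", "-rtOvcs", "--progress", "-n",
            src.rstrip("/") + "/", dst.rstrip("/") + "/"]


def compare_both_ways(dir1, dir2):
    """Run the dry run in both directions and return rsync's output for each."""
    outputs = []
    for src, dst in ((dir1, dir2), (dir2, dir1)):
        # check=True stops at the first failing rsync, much like && in the one-liner.
        result = subprocess.run(rsync_dry_run_cmd(src, dst),
                                capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs
```

Because -n is always included, nothing is copied; compare_both_ways("/dir1", "/dir2") only returns what rsync would have transferred in each direction.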
Ferroao
17

Here is an alternative, to compare just filenames, and not their contents:

diff <(cd folder1 && find . | sort) <(cd folder2 && find . | sort)

This is an easy way to list missing files, but of course it won't detect files with the same name but different contents!
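For comparison, here is a rough Python equivalent of the name-only check. It is a sketch rather than a drop-in replacement: the function names relative_paths and missing_entries are hypothetical, and like the find version it looks only at names, never at contents. Since paths are handled as whole strings, names containing spaces or newlines need no special treatment.

```python
import os


def relative_paths(root):
    """Return the set of paths (files and directories) under root, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def missing_entries(dir1, dir2):
    """Return (only_in_dir1, only_in_dir2) as sorted lists of relative paths."""
    p1, p2 = relative_paths(dir1), relative_paths(dir2)
    return sorted(p1 - p2), sorted(p2 - p1)
```

For example, missing_entries("folder1", "folder2") returns one list of entries found only under folder1 and another of entries found only under folder2.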

(Personally I use my own diffdirs script, but that is part of a larger library.)

joeytwiddle
  • 3
    You'd better use process substitution, not temp files... – mniip Feb 16 '14 at 18:03
  • 3
    Note that this does not support file names with certain special characters. In that case you might want to use zero delimiters, which AFAIK `diff` does not support as of now. But `comm` has supported them since http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=f3b4def577c4eee22f83b72d1310aa1d9155908d so once that comes to a coreutils near you, you can do `comm -z <(cd folder1 && find -print0 | sort -z) <(cd folder2 && find -print0 | sort -z)` (whose output you might have to convert further into the format you need using the `--output-delimiter` parameter and additional tools). – phk Mar 05 '16 at 21:52
12

I would like to suggest a great tool that I have just discovered: Meld.

It works well, and everything you can do with the diff command on a Linux-based system can be replicated there with a nice graphical interface!

For instance, the comparison of directories is straightforward:

[Screenshot: directories comparison]

and the comparison of files is made easier as well:

[Screenshot: files comparison]

It integrates nicely with some version control systems (for instance Git) and can be used as a merge tool. See the complete documentation on its website.

Leos313
  • 1
    Great recommendation. I use Meld all the time for text file comparison, but had forgotten that it could do directories as well. My only gripe is that the UI doesn't resize in a way that lets me see long paths completely. – John T Jul 15 '21 at 10:47
5

If you want to make each file expandable and collapsible, you can pipe the output of diff -r into Vim.

First let's give Vim a folding rule:

mkdir -p ~/.vim/ftplugin
echo "set foldexpr=getline(v:lnum)=~'^diff.*'?'>1':1 foldmethod=expr fdc=2" >> ~/.vim/ftplugin/diff.vim

Now just:

diff -r dir1 dir2 | vim - -R

You can hit zo and zc to open and close folds. To get out of Vim, hit :q<Enter>

The -R is optional, but I find it useful alongside - because it stops Vim from bugging you to save the buffer when you quit.

joeytwiddle
5

Inspired by Sergiy's reply, I wrote my own Python script to compare two directories.

Unlike many other solutions it doesn't compare contents of the files. Also it doesn't go inside subdirectories which are missing in one of the directories. So the output is quite concise and the script works fast with large directories.

#!/usr/bin/env python3

import os, sys

def compare_dirs(d1: "old directory name", d2: "new directory name"):
    def print_local(a, msg):
        print('DIR ' if a[2] else 'FILE', a[1], msg)
    # ensure validity
    for d in [d1,d2]:
        if not os.path.isdir(d):
            raise ValueError("not a directory: " + d)
    # get relative path
    l1 = [(x,os.path.join(d1,x)) for x in os.listdir(d1)]
    l2 = [(x,os.path.join(d2,x)) for x in os.listdir(d2)]
    # determine type: directory or file?
    l1 = sorted([(x,y,os.path.isdir(y)) for x,y in l1])
    l2 = sorted([(x,y,os.path.isdir(y)) for x,y in l2])
    i1 = i2 = 0
    common_dirs = []
    while i1<len(l1) and i2<len(l2):
        if l1[i1][0] == l2[i2][0]:      # same name
            if l1[i1][2] == l2[i2][2]:  # same type
                if l1[i1][2]:           # remember this folder for recursion
                    common_dirs.append((l1[i1][1], l2[i2][1]))
            else:
                print_local(l1[i1],'type changed')
            i1 += 1
            i2 += 1
        elif l1[i1][0]<l2[i2][0]:
            print_local(l1[i1],'removed')
            i1 += 1
        elif l1[i1][0]>l2[i2][0]:
            print_local(l2[i2],'added')
            i2 += 1
    while i1<len(l1):
        print_local(l1[i1],'removed')
        i1 += 1
    while i2<len(l2):
        print_local(l2[i2],'added')
        i2 += 1
    # compare subfolders recursively
    for sd1,sd2 in common_dirs:
        compare_dirs(sd1, sd2)

if __name__=="__main__":
    compare_dirs(sys.argv[1], sys.argv[2])

If you save it to a file named compare_dirs.py, you can run it with Python3.x:

python3 compare_dirs.py dir1 dir2

Sample output:

user@laptop:~$ python3 compare_dirs.py old/ new/
DIR  old/out/flavor-domino removed
DIR  new/out/flavor-maxim2 added
DIR  old/target/vendor/flavor-domino removed
DIR  new/target/vendor/flavor-maxim2 added
FILE old/tmp/.kconfig-flavor_domino removed
FILE new/tmp/.kconfig-flavor_maxim2 added
DIR  new/tools/tools/LiveSuit_For_Linux64 added

P.S. If you need to compare file sizes and file hashes for potential changes, I published an updated script here: https://gist.github.com/amakukha/f489cbde2afd32817f8e866cf4abe779

Andriy Makukha
  • 1
    Thanks, I added an optional third param regexp to skip/ignore https://gist.github.com/mscalora/e86e2bbfd3c24a7c1784f3d692b1c684 to make just what I needed like: `cmpdirs dir1 dir2 '/\.git/'` – Mike Feb 18 '18 at 22:15
4

This is a fairly easy task to achieve in Python:

python -c 'import os,sys;d1=os.listdir(sys.argv[1]);d2=os.listdir(sys.argv[2]);d1.sort();d2.sort();x="SAME" if d1 == d2 else "DIFF";print(x)' DIR1 DIR2

Substitute actual values for DIR1 and DIR2.

Here's a sample run:

$ python -c 'import os,sys;d1=os.listdir(sys.argv[1]);d2=os.listdir(sys.argv[2]);d1.sort();d2.sort();x="SAME" if d1 == d2 else "DIFF";print(x)' Desktop/ Desktop
SAME
$ python -c 'import os,sys;d1=os.listdir(sys.argv[1]);d2=os.listdir(sys.argv[2]);d1.sort();d2.sort();x="SAME" if d1 == d2 else "DIFF";print(x)' Desktop/ Pictures/
DIFF

For readability, here's an actual script instead of a one-liner:

#!/usr/bin/env python
import os, sys

d1 = os.listdir(sys.argv[1])
d2 = os.listdir(sys.argv[2])
d1.sort()
d2.sort()

if d1 == d2:
    print("SAME")
else:
    print("DIFF")
Sergiy Kolodyazhnyy
  • 2
    Note that the [`os.listdir`](https://docs.python.org/2/library/os.html#os.listdir) doesn't give any specific order. So the lists might have the same things in different order and the comparison would fail. – muru Nov 14 '16 at 06:15
  • 1
    @muru good point, I'll include sorting to that – Sergiy Kolodyazhnyy Nov 14 '16 at 06:17
2

Adail Junior's nice answer may take a long time to run if you have hundreds of thousands of files! So here is another way to do it. Say you want to compare all the filenames of folder A with all the filenames of folder B. Step 1, cd to folder A and do:

find . | sort -k 2 > listA.txt

Step 2, cd to folder B and do:

find . | sort -k 2 > listB.txt

Step 3, take the diff of listA.txt and listB.txt

I tried that on folders containing half a million txt files, and in less than 30 seconds I had the diff on my screen, whereas computing the md5sums and then piping and then appending can be very time consuming. Note also that the original question asks about comparing filenames (not their content!) to check whether files are missing from one of the folders being compared. Thanks

pebox11
1

As already noted, you can also use the comm command, e.g. this way:

comm -3 <(ls -1 dir1) <(ls -1 dir2)

This compares the contents of the 2 directories, showing only 2 columns, each with files unique to that directory.
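The two-column idea can also be sketched in Python with set differences. This is an illustration only, with a made-up function name (unique_names). Note one difference from the command above: os.listdir also returns hidden dotfiles, which plain ls -1 omits.

```python
import os


def unique_names(dir1, dir2):
    """Return (only_in_dir1, only_in_dir2) for the top-level entries of two directories."""
    names1, names2 = set(os.listdir(dir1)), set(os.listdir(dir2))
    return sorted(names1 - names2), sorted(names2 - names1)
```

Like the comm command, this looks only at the top level of each directory and does not descend into subdirectories.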

muru
1

On a slow file system, diff might take a while, but I have had good experiences with rsync, as it works well incrementally:

rsync --recursive --progress --delete --links --dry-run

Aliased as rdiff, this is an example run:

> rdiff test/ testuser
sending incremental file list
deleting .sudo_as_admin_successful
.bash_history
.bash_logout
.bashrc
.profile

It obviously only lists files without diffing them, but I find that tremendously useful already.

xeruf
0

I'll add to this list a Node.js alternative that I wrote some time ago.

dir-compare

npm install dir-compare -g
dircompare dir1 dir2
gliviu
0

You could use this tool:

https://github.com/jfabaf/comparefolders/

I developed it a few years ago because I had the same problem.

It compares the MD5 hashes of files, so the names of the files don't matter.

jfabaf
0

The answers using "batteries included" Python miss one such battery, the filecmp module:

https://docs.python.org/3/library/filecmp.html

Sample solution from Python's docs:

#!/usr/bin/env python

from filecmp import dircmp


def print_diff_files(dcmp):
    for name in dcmp.diff_files:
        print(f"diff_file {name} found in {dcmp.left} and {dcmp.right}")

    for sub_dcmp in dcmp.subdirs.values():
        print_diff_files(sub_dcmp)


dcmp = dircmp("dir1", "dir2")
print_diff_files(dcmp)
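The docs sample above reports files whose contents differ. Since the question is about missing files, a variation using the left_only and right_only attributes of dircmp may be closer to what is needed. This is a sketch only: the function name collect_missing is hypothetical, and keep in mind that dircmp skips the names listed in filecmp.DEFAULT_IGNORES unless told otherwise.

```python
from filecmp import dircmp


def collect_missing(dcmp):
    """Return (directory, name) pairs for entries present on only one side."""
    found = [(dcmp.left, name) for name in dcmp.left_only]
    found += [(dcmp.right, name) for name in dcmp.right_only]
    # Recurse into subdirectories that exist on both sides.
    for sub_dcmp in dcmp.subdirs.values():
        found += collect_missing(sub_dcmp)
    return found
```

For example, looping over collect_missing(dircmp("dir1", "dir2")) and printing each pair shows every entry together with the directory that contains it. A directory that exists on only one side is reported once, without listing its contents.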
murla
0

Unison

The text mode program unison and GUI program unison-gtk can be installed with

sudo apt update
sudo apt install unison

Unison is dedicated to synchronizing directory trees within and between computers.

  • There is a comparison
  • You can inspect the result and decide if/how you want to modify the default action (which updates to the newest status)
  • Finally files are transferred according to the selected actions

See man unison, where you can find explanations of the options:

This manual page briefly documents Unison, and was written for the Debian GNU/Linux distribution because the original program does not have a manual page. For a full description, please refer to the inbuilt documentation or the manuals in /usr/share/doc/unison/. The unison-2.48.4-gtk binary has similar command-line options, but allows the user to select and create profiles and configure options from within the program.

Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.

Unison offers several advantages over various synchronization methods such as CVS, Coda, rsync, Intellisync, etc. Unison can run on and synchronize between Windows and many UNIX platforms. Unison requires no root privileges, system access or kernel changes to function. Unison can synchronize changes to files and directories in both directions, on the same machine, or across a network using ssh or a direct socket connection.

Transfers are optimised using a version of the rsync protocol, making it ideal for slower links. Unison has a clear and precise specification, and is resilient to failure due to its careful handling of the replicas and its private structures.

The two roots can be specified using a URI or a path. The URI must follow the convention:

protocol://[user@][host][:port][/path]. The protocol part can be `file, socket, ssh or rsh`.

There is a learning curve, but it is worth the effort :-)

sudodus
0

I don't see this in the answers, but if you want to compare filenames while ignoring extensions, so that hello.png matches hello.zip, you can use nested for loops as I did.

for f in *.zip; do for f2 in ./dirtwo/*.png; do
    # Your logic here
    :
done; done

For example, my full code is:

for f in *.zip; do 
    for f2 in ./ArcadeBezels/*.png; do
        if [ "${f:0:-4}" = "${f2:15:-4}" ]; then 
        # I'm sure something more readable can be found but this strips the last 4 chars (file ext) of both filenames and the first 15 chars of second filename to remove "./ArcadeBezels/"
            rm "$f";
        fi;
    done;
done
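As a possible alternative to the nested loops, the same extension-insensitive matching can be sketched in Python with set intersection, which avoids comparing every pair of files. The function names stems and matching_stems are invented for this illustration, and the sketch only lists the matches; it deletes nothing.

```python
from pathlib import Path


def stems(directory, suffix):
    """Return the set of file names in directory with the given suffix removed."""
    return {p.stem for p in Path(directory).glob(f"*{suffix}")}


def matching_stems(dir1, suffix1, dir2, suffix2):
    """Return the sorted base names that appear in both directories."""
    return sorted(stems(dir1, suffix1) & stems(dir2, suffix2))
```

For the example above, matching_stems(".", ".zip", "./ArcadeBezels", ".png") would list the base names that have both a .zip file and a .png file.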
Leathan