
I have two directories, each with thousands of files, which contain more or less the same files.

How can I copy all files from dirA to dirB that are not yet in dirB, and, if a file already exists in dirB, overwrite it only if it's smaller than the one in dirA?

I know there are a lot of examples for different timestamps or different file sizes, but I only want to overwrite if the destination file is smaller, and under no circumstances if it's bigger.

Background of my problem:
I've rendered a dynmap on my Minecraft server, but some of the tiles are missing or corrupted. So I did the rendering again on another machine with a faster CPU and copied all the newly rendered files (~50 GB, ~6,000,000 PNGs of ~4-10 KB each) to my server. After that I noticed that there are corrupted files in my new render as well.

[screenshots comparing tiles — left: old render, right: new render; in the first pair the old tile is corrupted, in the second pair the new one]

Therefore I don't want to overwrite all files, only those where the source is bigger (the corrupted files carry less data and are smaller).

das Keks
  • Use `cp` in combination with `cmp`, or better use `rsync`, which has all the options you want – Alex Feb 03 '17 at 12:34
  • What option do I have to use with rsync? I didn't find anything for larger files, only newer or different size. That's why I asked. – das Keks Feb 03 '17 at 12:47
  • Use `stat` on the files in both locations to get the file sizes, and then copy if they satisfy your conditions – Alex Feb 03 '17 at 13:02
  • Well, it was a challenge; I looked for the `rsync` options you need but failed to find the right one, so I went with a simple way – Alex Feb 03 '17 at 23:28

5 Answers


My problem was similar: I wanted to synchronize files from a remote folder to a local one, but only copy the remote files which were bigger than the corresponding local files.

My workaround with rsync was the following, which is in fact a bash one-liner:

for x in $(ls -1 home/me/local/folder/*)
do
    eachsize=$(stat -c "%s" "$x")    # size of the local copy in bytes
    # transfer the remote file only if it is strictly larger than the local one
    rsync -avz --progress --min-size=$((eachsize + 1)) "remote:/home/you/folder/${x##*/}" .
done

I think you get the point: since the filenames are the same between the two folders, I go through each file in the local folder, record its size, and use that size as the threshold which decides whether rsync should copy the remote file of the same name.

user32916
  • Don’t use `ls` like that; just do `for x in home/me/local/folder/*`. – G-Man Says 'Reinstate Monica' Sep 04 '17 at 13:01
  • You are right; it was just to make my point. – user32916 Sep 04 '17 at 15:42
  • Testing this approach, I think whether to use `ls` like that depends on how you want empty folders handled: in that case `for x in $(ls folder/*)` will execute the loop zero times, whereas `for x in folder/*` will execute the loop exactly once (against the unexpanded `folder/*` literal as if it were a filename). – humbletim Nov 24 '19 at 17:03
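
Putting those comments together, here is a minimal sketch of the same loop without `ls`, assuming bash and GNU `stat` (`nullglob` covers the empty-folder case humbletim raises):

shopt -s nullglob                    # an empty folder now yields zero iterations
for x in home/me/local/folder/*; do
    eachsize=$(stat -c '%s' "$x")    # size of the local copy in bytes
    rsync -avz --progress --min-size=$((eachsize + 1)) "remote:/home/you/folder/${x##*/}" .
done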

You can use the `rsync` command.

Syntax:

-a = archive mode
-v = increase verbosity
-z = compress file data during the transfer
--progress = show progress during transfer

rsync -avz --progress <source path> <destination path>

You can use `--delete` to delete extraneous files from the destination directory:

rsync -avz --delete --progress <source path> <destination path>

So your command will be:

rsync -avz --delete --progress dirA dirB
  • Doesn't the -a flag copy all files which have a newer timestamp or a different file size? It's important that only smaller files are overwritten. – das Keks Feb 03 '17 at 12:57
  • This command will not overwrite anything; it will only copy changed files and new files which are not available under the destination directory. – Pankaj Jackson Feb 03 '17 at 21:17
  • Changed files will be overwritten in the destination, regardless of the size of the destination file. I tested it with some data, and the -a option is not what I need. – das Keks Feb 03 '17 at 21:31
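
As the last comment notes, testing matters here. A quick way to see what such an rsync command would overwrite before running it on real data, using only flags documented in `man rsync`:

# -n (--dry-run) makes a trial run without changing anything;
# -i (--itemize-changes) prints why each listed file would be transferred
rsync -avzni dirA dirB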

It may be a dirty way, but I hope it is what you are looking for.

#!/bin/bash

### Purpose:
# Copy a huge number of files from the source to the destination directory,
# but only if the destination file is smaller than the one in the source
# directory (or does not exist there yet)
###

src='./d1' # Source directory
dst='./d2' # Destination directory

icp() {
  f="${1}";
  # Directories: recreate them under the destination, then stop
  [ -d "$f" ] && {
    [ ! -d "${dst}${f#$src}" ] && mkdir -p "${dst}${f#$src}";
    return
  }

  # File missing in the destination: plain copy
  [ ! -f "${dst}/${f#$src/}" ] && { cp -a "${f}" "${dst}/${f#$src/}"; return; }
  # Otherwise copy only if the destination file is smaller than the source
  fsizeSrc=$( stat -c %s "$f" )
  fsizeDst=$( stat -c %s "${dst}/${f#$src/}" )
  [ ${fsizeDst} -lt ${fsizeSrc} ] && cp -a "${f}" "${dst}/${f#$src/}"
}

# export the function and variables so the bash spawned by find can see them
export -f icp
export src
export dst

find "${src}" -exec bash -c 'icp "$0"' {} \;
Alex
  • Thanks. I tested it with some test data and it works as I need it. But when I want to execute it on my real data I have a problem, because the directory contains too many files (about 6,000,000): `ls: argument list too long`. – das Keks Feb 04 '17 at 09:29
  • This is an operating system limit (you can get it for your system with `getconf ARG_MAX`). You probably have pretty long file names or a very deep directory structure there, so when `find` feeds `ls` such names it exceeds the maximum allowed command-line length. I modified the script a little to eliminate the `ls` command; could you try this new version? – Alex Feb 04 '17 at 09:46
  • If the script chokes again, you may try to reduce the full path by mounting it at some short path. For example `sudo mkdir -m 777 /a`, then mount the source directory on `/a` with `sudo mount --bind /pretty/long/prefix/to/source/directory /a`, then use `/a` in my script. When you're done, unmount `/a` with `sudo umount /a`. – Alex Feb 04 '17 at 10:09
  • I think it's not the path length, since the longest path (including the file name) is about 80 characters. Could it be the list which is passed to the for loop that is too long? I think this question targets something similar: http://unix.stackexchange.com/questions/128559/solving-mv-argument-list-too-long – das Keks Feb 04 '17 at 10:59
  • Maybe `diff --brief -r dir1/ dir2/` is a good approach, and then do something for each line of the output. I'll try to construct something like this in the evening. – das Keks Feb 04 '17 at 11:20
  • @dasKeks You are absolutely right that a list of such a huge number of files won't fit in a for loop. I rewrote the script completely, so it won't choke. – Alex Feb 04 '17 at 11:40
  • What's the `-f` option meant to do? I get an error `export: Illegal option -f`. – das Keks Feb 04 '17 at 11:58
  • It is a `bash` option. Change `#!/bin/sh` to `#!/bin/bash` as I did in the updated script. – Alex Feb 04 '17 at 12:12
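
A side note on speed for this scale: with roughly 6,000,000 files, `find ... -exec bash -c 'icp "$0"' {} \;` starts one `bash` per file. A sketch of a variant that batches many files into each shell invocation, reusing the exported `icp` function from the script above:

# '+' makes find pass many paths to a single bash; the loop calls icp on each.
# '_' fills $0 so that the file names land in $1, $2, ...
find "${src}" -exec bash -c 'for f; do icp "$f"; done' _ {} +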

rsync -va --append dirA dirB

I just got this from `man rsync`; you can find more of what you want there.

  • --append: append data onto shorter files
  • -v: verbose
  • -a: archive mode; same as -rlptgoD (no -H)

`--append` was very useful for me when the `cp -ru a b` process was interrupted several times and the file times had also been changed by `chown user:user -R *`. haha :)
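
One caveat worth flagging: `--append` assumes the data already in the shorter destination file is a correct prefix of the source file, which may not hold for corrupted tiles. `man rsync` also documents `--append-verify`, which checksums the existing destination data before appending; a sketch using the same paths:

# like --append, but first verify the existing destination data with a checksum
rsync -va --append-verify dirA dirB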

michael wang

I've modified this to something like:

# Copy src to dest if src is larger.
function copy_if_larger() {
  local src="$1"
  local dest="$2"

  [ ! -f "$src" ] && return
  [ ! -f "$dest" ] && return

  local srcSize=$( stat -c %s "$src" )
  local dstSize=$( stat -c %s "$dest" )

  # overwrite only if the destination file is smaller than the source
  [ ${dstSize} -lt ${srcSize} ] && {
    cp -a "$src" "$dest"
  }
  return
}

Then I wrote another function to adjust the paths of the files that I want to copy and feed them into the `copy_if_larger` function.

function do_copy_if_larger() {
  # trim the source path prefix (hard-coded length) to get the relative path
  local suffix=$(echo "$1" | cut -c 10-)
  copy_if_larger "$1" "/dest/path/$suffix"
}

# make the functions visible to the subshell.
export -f copy_if_larger
export -f do_copy_if_larger

# copy all larger jpeg files over to /dest/path;
# -print0/-0 keeps file names with spaces intact
find . -name '*.jpg' -print0 | xargs -0 -n 1 bash -c 'do_copy_if_larger "$@"' _
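
The `cut -c 10-` above silently assumes the source path prefix is exactly nine characters long. A sketch of the same trim using bash parameter expansion instead of counting characters, where `./src/dir/` is a hypothetical placeholder for the real prefix:

function do_copy_if_larger() {
  # './src/dir/' is a placeholder; substitute the actual source prefix
  local suffix="${1#./src/dir/}"
  copy_if_larger "$1" "/dest/path/$suffix"
}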
Lar