1

I have nested folders with a bunch of files inside that are hardlinked to each other. I would like to break the hardlinks (convert them into separate files), but then immediately convert each pair into a reflink (so they have different inodes but use the same section of disk).

find -type f -links +1

will find all the hardlinks, while a command like

cp --reflink=always my_file.bin my_file_copy.bin

will copy a file without using any more disk space, creating it as a reflink.

How do I combine these to go through a whole set of nested folders and convert each hardlink into a reflink, replacing them with the same filename?
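Before converting anything, it can help to preview which names share an inode. The following is a minimal sketch, not part of the original question: it assumes GNU find (for -printf), and the function name list_hardlink_groups is made up for illustration. Inode numbers are only comparable within a single filesystem.

```shell
# Hypothetical helper: print "inode link-count path" for every regular file
# with more than one link, sorted numerically so that paths sharing an inode
# end up on adjacent lines. Assumes GNU find (-printf is not POSIX).
list_hardlink_groups() {
    find "${1:-.}" -type f -links +1 -printf '%i %n %p\n' | sort -n
}
```

Running list_hardlink_groups . in the top folder prints one line per hardlinked name and changes nothing.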

endolith

2 Answers

3

You tagged the question ubuntu, so I assume you are not limited to strictly POSIX tools and their POSIX options.

find . -type f -links +1 -execdir sh -c '
    tmp="$(TMPDIR=. mktemp)" &&
    cp -p --reflink=always -- "$1" "$tmp" &&
    mv -f -- "$tmp" "$1"
' find-sh {} \; -print

Notes:

  • This converts hardlinks to reflinks in place, i.e. the hardlink my_file.bin becomes the reflink my_file.bin. There will be no my_file_copy.bin. (I mention this in case you wanted to create a reflink my_file_copy.bin while leaving the hardlink my_file.bin intact. The question is not entirely clear on this point, because it introduces my_file_copy.bin.)
  • If mktemp or cp fails then mv will not be performed. In any case you shouldn't lose the original content, unless some other process modifies the temporary file.
  • Because find tests files one by one, it will never convert all the hardlinks to a given inode. If find processes every hardlink, then by the time it reaches the last one the link count has dropped to 1, so -links +1 is false and the original inode survives under that name. This means that if the original file is open and being modified in place (without changing the inode number), the modification will survive somewhere (though it's hard to tell in advance which hardlink will be processed last and keep its inode number). The situation where an open file becomes completely unlinked, is modified in that state and then vanishes from the filesystem as soon as it's closed shouldn't happen.
  • If cp or mv fails then the temporary file will be left behind. You may want to capture stderr to a file (2>some_file) and investigate later.
  • -print acts only if the shell code succeeds. It's there just so you can see that something is happening.
  • find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
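The note about the temporary file being left behind after a failed cp or mv could be handled with a small variation. This is only a sketch under assumptions, not the answer's own code: the function name hardlinks_to_reflinks and its two parameters (start directory, cp --reflink mode) are hypothetical, and GNU find, cp and mktemp are assumed.

```shell
# Hypothetical wrapper around the find command above.
#   $1: directory to start from (default: .)
#   $2: value for cp --reflink (default: always, as in the answer;
#       "auto" lets cp fall back to a plain copy where reflinks are unsupported)
hardlinks_to_reflinks() {
    dir=${1:-.}
    mode=${2:-always}
    find "$dir" -type f -links +1 -execdir sh -c '
        tmp="$(TMPDIR=. mktemp)" || exit 1
        if cp -p --reflink="$1" -- "$2" "$tmp" && mv -f -- "$tmp" "$2"; then
            exit 0
        fi
        rm -f -- "$tmp"   # clean up the temporary file on failure
        exit 1
    ' find-sh "$mode" {} \; -print
}
```

Calling hardlinks_to_reflinks . should behave like the original command, except that a failed conversion removes its temporary file instead of leaving it next to the original.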
Kamil Maciorowski
-1

Edit: As pointed out by Kamil, don't do the for x in $(find ...). Using the find -execdir sh -c format is the proper way to use find output. I'll leave my answer here however.

You can write a small Bash script or directly write a for loop in your bash shell:

$ for filename in $(find -type f -links +1); do echo "I found this file: ${filename}"; done

This example takes each whitespace-separated word of the find output (each filename, as long as no name contains whitespace or glob characters) and places it in a ${filename} variable that you can then use. Here we just print I found this file: ${filename} for each one, but you can replace that with your copy command, which would probably look something like this:

$ for filename in $(find -type f -links +1); do echo "Copying ${filename} to ${filename}_copy.bin"; cp --reflink=always "${filename}" "${filename}_copy.bin"; done

Or, if you would rather put this in a Bash script so it's easier to read and work with, create a file copy_script.sh with these contents:

#!/bin/bash
for filename in $(find -type f -links +1); do
    echo "Copying ${filename} to ${filename}_copy.bin"
    cp --reflink=always "${filename}" "${filename}_copy.bin"
done

Then save and run with $ bash ./copy_script.sh
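For comparison, the same copy-alongside idea can be written without word-splitting the find output, by letting find hand the filenames to sh directly. This is a hedged sketch, not the code from this answer: the function name copy_hardlinked_files and its mode parameter are made up for illustration, GNU cp is assumed, and skipping an existing ${filename}_copy.bin is an added precaution.

```shell
# Hypothetical helper: make a reflink copy next to each multiply-linked file.
#   $1: directory to start from (default: .)
#   $2: value for cp --reflink (default: always)
# Filenames are passed as arguments, so spaces and glob characters are safe.
copy_hardlinked_files() {
    dir=${1:-.}
    mode=${2:-always}
    find "$dir" -type f -links +1 -exec sh -c '
        mode=$1; shift
        for filename in "$@"; do
            target="${filename}_copy.bin"
            if [ -e "$target" ]; then
                echo "Skipping $filename: $target already exists" >&2
                continue
            fi
            echo "Copying $filename to $target"
            cp --reflink="$mode" -- "$filename" "$target"
        done
    ' find-sh "$mode" {} +
}
```

Run it as copy_hardlinked_files . from the top folder. Note that this keeps the hardlinks and adds copies; it does not replace the originals.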

staehle
  • 2
    (1) `for f in $(find …)` is the [Bash pitfall number 1](https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29). (2) Later you're not [quoting right](https://unix.stackexchange.com/a/131767/108618). A filename containing e.g. a literal `*` can make your code misbehave even more. (3) If `${filename}_copy.bin` already exists then its content will be lost. // For these reasons I'm downvoting the answer, the code is just too buggy. Additionally you're not converting the original, you're creating a copy alongside. – Kamil Maciorowski Jan 16 '21 at 21:23
  • (1) Ah, you are correct. I've been working for too long with bash scripts with automated filenames without spaces in them. I've edited my answer to reflect that. (2) Edited answer to fix quoting, though again, only related to filenames with spaces. (3) This I'll disagree with, not because you're wrong, but because I was just using the commands as stated by the OP. – staehle Jan 16 '21 at 21:33
  • Now I see the question can be interpreted in a way that allows your approach to (3). – Kamil Maciorowski Jan 16 '21 at 21:37