How to compare two (vague) file lists and print the duplicates?

Question

I have two file lists, backup.txt and backup2.txt. Some of the entries don't match exactly, which makes it difficult to find the duplicates with diff or uniq.

Example:

:::backup.txt:::
auser_backup
auser_backup2
buser_backup
cuser_backup

:::backup2.txt:::
auser.backup
auser.backup.2
buser
cuser

I was wondering if there is a way to compare these vaguely similar file lists, where auser_backup and auser.backup along with auser_backup2 and auser.backup.2 would be counted as duplicates.

Maybe there's another step to rename all the entries so that the formats are correct? I'm kind of at a loss.

You mean *all four* starting with `auser` seen as one duplicate (well, "quadroplicate")? — Jacob Vlijm, Jan 21 '15 at 10:24

score 1 · Answer 1 · answered Jan 21 '15 at 11:34

You're going to have to pre-process the files to "fix" the irregularities:

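# Normalize names: put a dot before a trailing number, then turn every dot into an underscore.
fixfile() { sed -r 's/([[:alpha:]])([[:digit:]]+)$/\1.\2/; s/\./_/g' "$1"; }
# Print only the lines common to both normalized, sorted lists (needs bash for <(...)).
comm -12 <(fixfile backup.txt | sort) <(fixfile backup2.txt | sort)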

auser_backup
auser_backup_2
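
The output above shows the normalized names, not the original spellings from either list. If you also want to see which original entries were matched, here is a minimal sketch using awk. The function names `show_pairs` and `norm` are made up for this illustration, and the plain ranges `[A-Za-z]` and `[0-9]` stand in for `[[:alpha:]]` and `[[:digit:]]` on the assumption that your names are ASCII (not every awk supports the POSIX bracket classes).

```shell
# show_pairs FILE1 FILE2  (hypothetical helper, not part of the answer above)
# Prints "name-in-FILE1<TAB>name-in-FILE2" for every pair of entries that
# become identical after normalization.
show_pairs() {
    awk '
        # Same two rewrites as the sed command: insert a dot between a
        # letter and a trailing number, then turn every dot into "_".
        function norm(s) {
            if (match(s, /[A-Za-z][0-9]+$/))
                s = substr(s, 1, RSTART) "." substr(s, RSTART + 1)
            gsub(/\./, "_", s)
            return s
        }
        $0 == ""  { next }                        # skip blank lines
        NR == FNR { first[norm($0)] = $0; next }  # first file: remember originals
        (norm($0) in first) { print first[norm($0)] "\t" $0 }
    ' "$1" "$2"
}
```

With the sample lists from the question, `show_pairs backup.txt backup2.txt` should print `auser_backup` next to `auser.backup` and `auser_backup2` next to `auser.backup.2`, separated by a tab. If two entries in the first file normalize to the same name, this sketch only remembers the last one.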

glenn jackman

That is neat, avoids lots of while loop. – NGRhodes Jan 21 '15 at 12:18
