
I have two file lists, backup.txt and backup2.txt. Some of the entries don't match exactly, which makes it difficult to find the duplicates with diff or uniq.

Example:

:::backup.txt:::
auser_backup
auser_backup2
buser_backup
cuser_backup

:::backup2.txt:::
auser.backup
auser.backup.2
buser
cuser

I was wondering if there is a way to compare these vaguely similar file lists, where auser_backup and auser.backup along with auser_backup2 and auser.backup.2 would be counted as duplicates.

Maybe there's another step to rename all the entries so that the formats are correct? I'm kind of at a loss.

Fabby
mktoaster

1 Answer


You're going to have to pre-process the files to "fix" the irregularities:

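# Normalise each name: put a dot before a trailing number (backup2 -> backup.2), then turn every dot into an underscore
fixfile() { sed -r 's/([[:alpha:]])([[:digit:]]+)$/\1.\2/; s/\./_/g' "$1"; }
# Print only the normalised names that appear in both lists (the <(...) syntax needs bash)
comm -12 <(fixfile backup.txt | sort) <(fixfile backup2.txt | sort)

Output:

auser_backup
auser_backup_2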
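
If you also want to see which original entries from each list were matched, not just the normalised names, one possible extension is to keep the original name next to its normalised key and join the two lists on that key. This is only a sketch: `keyed` and `pair_duplicates` are names made up for this example, it assumes a `sed` that accepts `-E` (GNU and BSD both do), and it uses a temporary file instead of `<(...)` so it should also run under plain `sh`.

```shell
# Same normalisation as fixfile above: "name2" -> "name.2", then "." -> "_".
fixfile() { sed -E 's/([[:alpha:]])([[:digit:]]+)$/\1.\2/; s/\./_/g' "$1"; }

# A literal tab, used as the field separator below.
tab=$(printf '\t')

# Hypothetical helper: print "normalised<TAB>original" per line, sorted by the key.
keyed() { fixfile "$1" | paste - "$1" | LC_ALL=C sort -t "$tab" -k1,1; }

# Hypothetical helper: print "key<TAB>name in file 1<TAB>name in file 2"
# for every normalised key that occurs in both files.
pair_duplicates() {
  tmp=$(mktemp)
  keyed "$1" > "$tmp"
  keyed "$2" | LC_ALL=C join -t "$tab" "$tmp" -
  rm -f "$tmp"
}

# Demo on the sample data from the question, in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' auser_backup auser_backup2 buser_backup cuser_backup > "$workdir/backup.txt"
printf '%s\n' auser.backup auser.backup.2 buser cuser > "$workdir/backup2.txt"

pairs=$(pair_duplicates "$workdir/backup.txt" "$workdir/backup2.txt")
printf '%s\n' "$pairs"
rm -rf "$workdir"
```

On the sample lists this should print two tab-separated rows: auser_backup with auser.backup, and auser_backup2 with auser.backup.2. Entries such as buser_backup and buser are not paired, because the normalisation only handles separators and trailing numbers, not a missing _backup suffix.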
glenn jackman