
If I have a file A containing a list of fields:

2017-04-23
2017-04-30
2017-05-07
2017-05-14
2017-05-21
2017-05-28
2017-06-04
2017-06-11
2017-06-18
2017-06-25

And another file B containing a list of fields:

2017-04-23
2017-04-30
2017-05-07
2017-05-14
2017-05-21
2017-05-28
2017-06-04
2017-06-11
2017-06-18
2017-06-25
2017-07-02
2017-07-09
2017-07-16
2017-07-23

How can I quickly compare these two files to find all fields in file B that are not present in file A?

This is not a regular diff, where I want to see the relative differences between the files. It is more like a hash comparison in which each line is an entry in a map. I want a list of all lines in file B that are not present in file A so that I can remove them, since each line in file A represents a directory that is to be preserved.

I am looking for a Bash/CoreUtils solution.
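The map-based comparison described above can be sketched in pure Bash with an associative array. This is a minimal sketch, assuming Bash 4 or later (for `local -A`); the function name `only_in_b` is hypothetical. Each key is prefixed with `x` because Bash rejects an empty string as an associative-array subscript, so a blank line would otherwise cause an error.

```bash
#!/usr/bin/env bash
# Print every line of the second file that is not a line of the first file.
# Hypothetical helper; requires Bash 4+ for associative arrays.
only_in_b() {
    local keep_file=$1 candidate_file=$2
    local -A keep=()
    local line
    # Load each line of the first file as a key; the "x" prefix keeps
    # blank lines from becoming an (invalid) empty subscript.
    # "|| [[ -n $line ]]" also handles a final line with no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        keep["x$line"]=1
    done < "$keep_file"
    # Print lines of the second file whose key is not in the map.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -n ${keep["x$line"]+set} ]] || printf '%s\n' "$line"
    done < "$candidate_file"
}

# When run as a script with two arguments: only_in_b A B
if [ "$#" -eq 2 ]; then
    only_in_b "$1" "$2"
fi
```

Because it reads B line by line, this prints the missing lines in the order they appear in B.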

Kamil Maciorowski
Zhro

3 Answers


If your files are sorted, you can use comm:

$ comm -13 A B
2017-07-02
2017-07-09
2017-07-16
2017-07-23

with options:

  • -1 : suppress column 1 (lines unique to FILE1)
  • -3 : suppress column 3 (lines that appear in both files)
Gohu
  • And if they're not sorted, you can sort them with process substitution: `<(sort filename)` – Barmar Jan 12 '18 at 17:30
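Barmar's suggestion for unsorted input can be sketched as follows, assuming Bash (process substitution is not available in POSIX `sh`). Running both `sort` and `comm` with `LC_ALL=C` is an extra precaution not taken from the answer: `comm` expects its input sorted in the collation order it uses itself, and forcing the C locale makes the two agree. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print lines present in the second file but not in the first,
# for inputs that may be unsorted. Hypothetical helper; needs Bash.
lines_only_in_second() {
    # -1 hides lines unique to the first file, -3 hides common lines,
    # leaving only lines unique to the second file.
    LC_ALL=C comm -13 <(LC_ALL=C sort -- "$1") <(LC_ALL=C sort -- "$2")
}

# When run as a script with two arguments: lines_only_in_second A B
if [ "$#" -eq 2 ]; then
    lines_only_in_second "$1" "$2"
fi
```

The output comes out in sorted order rather than in the original order of B.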

grep is the right tool for the job, although it's neither Bash nor from CoreUtils:

grep -Fxvf A B

All these options are POSIX compliant. From man 1 grep:

-f pattern_file

Read one or more patterns from the file named by the pathname pattern_file. Patterns in pattern_file shall be terminated by a <newline>. A null pattern can be specified by an empty line in pattern_file. Unless the -E or -F option is also specified, each pattern shall be treated as a BRE, as described in the Base Definitions volume of POSIX.1-2008, Section 9.3, Basic Regular Expressions.

-F

Match using fixed strings. Treat each pattern specified as a string instead of a regular expression. If an input line contains any of the patterns as a contiguous sequence of bytes, the line shall be matched. A null string shall match every line.

-v

Select lines not matching any of the specified patterns. If the -v option is not specified, selected lines shall be those that match any of the specified patterns.

-x

Consider only input lines that use all characters in the line excluding the terminating <newline> to match an entire fixed string or regular expression to be matching lines.
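A short sketch wrapping `grep -Fxvf A B` in a function (the name `lines_not_in` is hypothetical). Unlike `comm`, this needs no sorting and prints the result in the order the lines appear in B. The `-x` flag is what stops a line of A from suppressing a longer line of B that merely contains it.

```bash
#!/usr/bin/env bash
# Print lines of the second file that do not exactly match any line of the first.
#   -F  treat each line of the first file as a fixed string, not a regex
#   -x  require the whole input line to match (no substring hits)
#   -v  invert: keep lines that match none of the patterns
#   -f  read the patterns from the first file
lines_not_in() {
    grep -Fxvf "$1" -- "$2"
}

# When run as a script with two arguments: lines_not_in A B
if [ "$#" -eq 2 ]; then
    lines_not_in "$1" "$2"
fi
```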

Kamil Maciorowski

Another way, with some pipes:

cat A B | sort | uniq -u

Edit: UUOC (useless use of cat). There is no need for cat:

sort A B | uniq -u
Paulo
  • This treats `A` and `B` equally, while in the original problem these files are not interchangeable. What if there is a line in `A` which is not in `B`? – Kamil Maciorowski Jan 10 '18 at 15:46
  • @Kamil Yes, you're right. I misunderstood the question, this will print all lines not duplicated on both files, which is not what OP wants. – Paulo Jan 10 '18 at 15:53
  • Fix: `sort A A B | uniq -u`. :) – Kamil Maciorowski Jan 10 '18 at 15:56
  • It works :) Nice fix. But there is another problem with my solution: the output will appear sorted, which may be a problem for the OP's purposes. – Paulo Jan 10 '18 at 16:03
  • Did you read the last paragraph of the question? He just wants to get a list of directories to remove, it doesn't sound like order matters. Also, his input files appear to be sorted. – Barmar Jan 12 '18 at 17:32
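Kamil's fix can be checked with a small sketch (the function name is hypothetical). Listing A twice means every line of A occurs at least twice in the combined input, so `uniq -u` can never print it, even when B lacks that line. One caveat beyond those raised above: a line that appears more than once in B is also dropped, because it is no longer unique.

```bash
#!/usr/bin/env bash
# Print lines that occur only in the second file, using sort and uniq -u.
# The first file is listed twice so none of its lines can be unique.
# Caveats: the output is sorted, and a line repeated within the second
# file is dropped as well.
only_in_second_uniq() {
    sort -- "$1" "$1" "$2" | uniq -u
}

# When run as a script with two arguments: only_in_second_uniq A B
if [ "$#" -eq 2 ]; then
    only_in_second_uniq "$1" "$2"
fi
```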