
How can I use GNU sort and uniq to put the most frequent lines on top, instead of sorting numerically or alphanumerically? Example list.txt:

1
2
2
2
3
3

Since '2' occurs 3 times, it should be on top, followed by '3' (twice) and '1' (once), like this:

$ cat list.txt | "some sort/uniq magic combo"
2
3
1
719016



Like this:

cat list.txt | sort | uniq -c | sort -rn

The -c option prefixes each unique line with the number of times it occurs, and sort -rn then sorts on that count numerically (-n) in reverse, largest-first order (-r).
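To see what each stage contributes, here is a minimal sketch that runs the pipeline on the example list.txt and shows the intermediate output. It assumes GNU coreutils, whose uniq -c right-aligns the count in a 7-character field followed by one space; other uniq implementations may pad differently.

```shell
# Recreate the example input from the question.
printf '%s\n' 1 2 2 2 3 3 > list.txt

# Step 1: uniq -c only merges *adjacent* duplicate lines, so sort first.
sort list.txt | uniq -c
#       1 1
#       3 2
#       2 3

# Step 2: sort numerically (-n) on the leading count, largest first (-r).
sort list.txt | uniq -c | sort -rn
#       3 2
#       2 3
#       1 1
```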

If you want to remove the count after sorting, do this:

cat list.txt | sort | uniq -c | sort -rn | awk '{ print $2; }'
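Note that awk's $2 is only the first whitespace-separated word after the count, so lines that contain spaces would be cut short. A sketch that deletes just the count prefix instead, assuming GNU uniq -c output (leading spaces, the count, one space); phrases.txt is a hypothetical example file:

```shell
# Hypothetical input where some lines contain spaces.
printf '%s\n' 'a b' 'a b' 'c' > phrases.txt

# Count, sort by count, then strip only the leading "<spaces><count><space>"
# prefix, leaving the rest of each line (including inner spaces) intact.
sort phrases.txt | uniq -c | sort -rn | sed 's/^ *[0-9]* //'
# a b
# c
```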
Doug Harris
  • I've been doing this for ages, and for moderate-size tasks it works well. However, every so often I find myself with gigabytes of log data to go through, and sorting that requires a lot of disk space for duplicate lines that are thrown away in the next step. There are better algorithms, but I don't know of good, simple command-line tools for solving this problem at a larger scale. – mc0e Feb 18 '15 at 15:48
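One common way to avoid sorting the full input (a sketch, not something proposed in the original thread) is to count lines in an awk associative array in a single pass and then sort only the distinct lines. It assumes the set of distinct lines fits in memory; list.txt is the question's example file.

```shell
# Recreate the example input.
printf '%s\n' 1 2 2 2 3 3 > list.txt

# Single pass: count[$0] tallies each distinct line in memory, so the full
# input is never sorted. Only the distinct lines (usually far fewer) are
# sorted afterwards, by count, largest first.
awk '{ count[$0]++ } END { for (line in count) print count[line], line }' list.txt |
  sort -rn
# 3 2
# 2 3
# 1 1
```

To drop the counts, append | cut -d' ' -f2- to the pipeline; because awk separates the count from the line with a single space, everything after it, including inner spaces, is kept.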