GNU sort/uniq: sorting by number of occurrences

Question

How can I use GNU sort and uniq to have the most common occurrences on top instead of numerical or alphanumerical sorting? Example list.txt:

Since '2' occurs 3 times, it should be on top, followed by '3' and '1', like this:

$ cat list.txt | "some sort/uniq magic combo"
2
3
1

score 4 · Accepted Answer · answered Jan 24 '12 at 16:10

Like this:

cat list.txt | sort | uniq -c | sort -rn

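The -c option prefixes each unique line with the number of times it occurs, and the final sort -rn orders the result by that count, numerically and in descending order. The first sort is needed because uniq only collapses duplicate lines that are adjacent.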
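As a rough illustration of that pipeline, here is a sketch using made-up sample data (the question's actual list.txt is not shown, so the six values below are an assumption). It passes the file directly to sort, which should behave the same as piping it through cat:

```shell
# Hypothetical sample data: '2' three times, '3' twice, '1' once.
tmp=$(mktemp)
printf '%s\n' 1 2 3 2 3 2 > "$tmp"

# sort groups identical lines, uniq -c counts each group,
# sort -rn puts the highest count first.
ranked=$(sort "$tmp" | uniq -c | sort -rn)
printf '%s\n' "$ranked"

rm -f "$tmp"
```

Each output line is a count followed by the original line, with the count padded by leading spaces, so '2' should come first here, then '3', then '1'.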

If you want to remove the counts after sorting, pipe the result through awk:

cat list.txt | sort | uniq -c | sort -rn | awk '{ print $2; }'
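One caveat: print $2 keeps only the first whitespace-separated word of each line, so lines that contain spaces would be cut short. A possible variant, sketched here with hypothetical log-style input, strips just the count prefix with sed and leaves the rest of the line untouched:

```shell
# Hypothetical input whose lines contain spaces.
stripped=$(printf '%s\n' 'GET /index.html' 'GET /about.html' 'GET /index.html' |
  sort | uniq -c | sort -rn |
  # Remove the leading padding, the count, and the single space after it.
  sed 's/^ *[0-9][0-9]* //')
printf '%s\n' "$stripped"
```

For single-word lines like those in the question, both approaches should give the same result.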

answered Jan 24 '12 at 16:10

Doug Harris

I've been doing this for ages, and for moderate-size tasks it works well. However, every so often I find myself with gigabytes of log data to go through, and sorting that requires a lot of disk space for duplicate lines that you throw away in the next step. There are better algorithms, but I don't know of good, simple command-line tools for solving this problem at a larger scale. – mc0e Feb 18 '15 at 15:48
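One commonly used alternative for large inputs, offered here as a sketch rather than as part of the accepted answer, is to count lines in a single pass with an awk associative array and sort only the much smaller list of distinct lines. This assumes the set of distinct lines fits in memory; the sample data is again hypothetical:

```shell
# Count each distinct line in one pass, then sort only the summary.
counted=$(printf '%s\n' 1 2 3 2 3 2 |
  awk '{ count[$0]++ }                       # tally every line as it is read
       END { for (line in count) print count[line], line }' |
  sort -rn)
printf '%s\n' "$counted"
```

The trade-off is memory instead of disk: awk holds one counter per distinct line, so this helps most when the data has many repeats, as log files often do.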
