10

I'm learning Bash, and today I'm studying the grep command.

If I run:

$ ps -fU user | grep thunderbird

the terminal shows:

user  17410     1  0 10:09 ?        00:00:20 /usr/lib/thunderbird/thunderbird
user  18990 15896  0 12:25 pts/1    00:00:00 grep --color=auto thunderbird

But if I run:

$ ps -fU user | grep [t]hunderbird

the terminal shows:

user  17410     1  0 10:09 ?        00:00:20 /usr/lib/thunderbird/thunderbird

Why? I read the guide, but I don't understand.

linofex

4 Answers

14

There are two issues here. First, when you run ps | grep ..., the grep process is also shown in the output of ps. With -f, the ps output includes the arguments a process was launched with, not only the process's name. This means that if you run ps -f | grep foo while a process called foo is running, there will be two ps results matching foo: the foo process and the grep itself, since its command line contains foo. This is why you get two lines when running ps -f | grep thunderbird.

Now, the [ ] is a regular expression construct which defines a list of characters, a character class. For example, [abc] will match a or b or c. When you run ps -f | grep [t]hunderbird, that class only contains a single character so is equivalent to thunderbird without the brackets. However, the grep process was launched with [t]hunderbird as an argument this time, and not thunderbird. Therefore, its line in the output of ps will contain [t]hunderbird. It will look like this:

terdon   23101 10991  0 16:53 pts/3    00:00:00 grep --color [t]hunderbird

This means that the grep process is not matched when you run ps -f | grep [t]hunderbird, since its line contains the literal text [t]hunderbird and not thunderbird.
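To see this without depending on what happens to be running, here is a minimal sketch that feeds grep two lines modeled on the ps output in the question. The text is sample data, not live ps output, and the variable names are made up for illustration.

```shell
# Sample text standing in for `ps -fU user` output; not read from a live system.
sample='user  17410     1  0 10:09 ?        00:00:20 /usr/lib/thunderbird/thunderbird
user  18990 15896  0 12:25 pts/1    00:00:00 grep --color=auto [t]hunderbird'

# The pattern is quoted so the shell passes the brackets to grep untouched.
matched=$(printf '%s\n' "$sample" | grep '[t]hunderbird')
match_count=$(printf '%s\n' "$sample" | grep -c '[t]hunderbird' || true)

printf 'lines matched: %s\n' "$match_count"
printf '%s\n' "$matched"
```

Only the thunderbird line should come back, because the second line contains t]hunderbird rather than thunderbird.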

This is a common trick to avoid matching the grep process itself when running ps | grep. Another alternative is to run ps -f | grep foo | grep -v grep to exclude the grep. The best approach, however, is to use a program specifically designed for this, pgrep:

$ pgrep -l thunderbird
11330 thunderbird
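As a rough illustration of the grep -v grep alternative mentioned above, the sketch below runs it over sample text rather than live ps output. The third line, a process called grep-indexer, is invented for the example. It shows one reason that filter is blunt: it drops every line containing grep, not just the one belonging to the pipeline.

```shell
# Sample text standing in for `ps -f` output; the third process is hypothetical.
sample='user  17410     1  0 10:09 ?        00:00:20 /usr/lib/thunderbird/thunderbird
user  18990 15896  0 12:25 pts/1    00:00:00 grep --color=auto thunderbird
user  19002     1  0 12:26 ?        00:00:01 /usr/bin/grep-indexer --profile thunderbird'

# The first grep keeps all three lines; the second removes every line containing "grep".
kept=$(printf '%s\n' "$sample" | grep thunderbird | grep -v grep)
kept_count=$(printf '%s\n' "$kept" | grep -c . || true)

printf 'lines kept: %s\n' "$kept_count"
printf '%s\n' "$kept"
```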
terdon
  • terdon, as @Serg has removed his answer I am putting it here.... timing is related to when the process will appear in the process table and hence in the output....I read this from an answer of Gilles on U&L and later experimented it myself..I am on my phone so could not show it right now.. – heemayl Nov 07 '15 at 14:58
  • 1
    @heemayl timing might affect whether or not the `grep` process is launched before or after the `ps` one so whether it will be present in the output of `ps`. It has nothing to do with the issue here which is that the _string_ `[t]hunderbird` does not match the regex `[t]hunderbird`. – terdon Nov 07 '15 at 15:00
  • that's what I am trying to tell :).. if `ps` does not have the pipeline's `grep`, it will never be on the output.. – heemayl Nov 07 '15 at 15:02
  • 1
    @heemayl yes, I know, but that's not why the `[t]hunderbird` is not matching the `grep` process. `ps -f | grep [t]hunderbird` will _never_ match the `grep` process since the regex doesn't match it. Timing is irrelevant, it won't match it even if it is there. – terdon Nov 07 '15 at 15:04
  • aye guru..but my point is if the `grep` itself does not appear in the `ps` by then, no need for `[]`.. only `grep thunderbird` would suffice..no? – heemayl Nov 07 '15 at 15:07
  • 1
    @heemayl yes, but I have never seen that happen. If you read it in one of Gilles's answers, I'm sure it does, but I have never seen it in the wild. Perhaps Gilles was talking about an embedded system or another flavor of *nix. I'm pretty sure that you'll never see this in a Linux box. – terdon Nov 07 '15 at 15:08
  • @heemayl I think you are referring to [this answer](http://unix.stackexchange.com/a/37597/22222) which states that the commands are run at the same time. Yes, one might start before the other but they run concurrently. If so, I don't know if it's even possible to have `ps` get the list of running processes before `grep` is launched. I very much doubt it. – terdon Nov 07 '15 at 15:11
  • sorry not that one..I know the processes start concurrently but it was about how fast `ps` can parse the process table & have something that started concurrently.... – heemayl Nov 07 '15 at 15:15
  • anyway I am gonna edit my answer & put this as a highly unlikely case as you have rightly pointed out.. – heemayl Nov 07 '15 at 15:16
  • 1
    @heemayl OK, I tested this and in 10000 tests, I got one case where the `grep` wasn't in the output of `ps`. So yes, it does happen but very, very rarely. As I said though, this is irrelevant here, the difference is in what `grep` is searching for. – terdon Nov 07 '15 at 17:26
  • Thank you for your reply. Yes I know how ' |' means. I read all answers. I think that I have understood. In the second case the process is [t]hunderbird, so grep find only the process 'thunderbird'. It is true? – linofex Nov 07 '15 at 17:37
  • Yeah..that's fairly conclusive..I have edited my answer already :) – heemayl Nov 07 '15 at 17:40
  • @linofex sort of. In the second case, the _grep_ process is `grep [t]hunderbird` so the grep doesn't match it. Try running `grep [t]hunderbird` (just like that, no arguments, hit enter and it will stay there doing nothing). Then, open another terminal and run `ps -af | grep grep`. – terdon Nov 07 '15 at 17:40
  • `ps -f | grep foo | grtep -v grep` should be `ps -f | grep foo | grep -v grep`. I only found 3 characters to change in the entire answer and it won't allow an edit with less than 6 characters. – Joe Nov 12 '15 at 12:40
6

In the first case you're looking for any process whose line contains the word thunderbird. There are two: thunderbird and the grep command itself.

In the second you're still looking for a t character followed by hunderbird, because [t] means "match any one of the characters listed in the square brackets", and there is just the one, the letter t. But this time your two processes are

user  17410     1  0 10:09 ?        00:00:20 /usr/lib/thunderbird/thunderbird
user  18990 15896  0 12:25 pts/1    00:00:00 grep --color=auto [t]hunderbird

The second process does not match because the regexp [t]hunderbird does not match the literal string [t]hunderbird: in that string the t is followed by ], not by h, so the ] prevents the match.
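One way to check this for yourself, sketched on a literal string rather than live ps output: count the matches when the pattern is read as a regular expression, then again with grep's -F option, which treats the pattern as a fixed string. The variable names are illustrative only.

```shell
# The command line of the second grep, as it would appear in the ps output.
cmdline='grep --color=auto [t]hunderbird'

# As a regular expression, [t] means "the single character t", so the pattern
# needs the text "thunderbird"; the string has "t]hunderbird" instead.
regex_hits=$(printf '%s\n' "$cmdline" | grep -c '[t]hunderbird' || true)

# With -F the pattern is taken literally, brackets included.
fixed_hits=$(printf '%s\n' "$cmdline" | grep -c -F '[t]hunderbird' || true)

printf 'as regex: %s matching line(s)\n' "$regex_hits"
printf 'as fixed string: %s matching line(s)\n' "$fixed_hits"
```

The regex count should be zero and the fixed-string count one, which is the whole trick in miniature.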

Sergiy Kolodyazhnyy
Robert Longson
  • 1
    Just a **TL;DR** for the lazy readers: treat the output of the second `ps` as text, and realize that the second grep matches `t` followed by `hunderbird`, not `[`, `t`, `]`, and `hunderbird`, which is what the second process looks like as a string – Sergiy Kolodyazhnyy Nov 07 '15 at 15:12
  • @Serg perhaps you should check the comments below terdon's answer, regarding the timing issue.. – heemayl Nov 07 '15 at 15:56
3

Firstly, whether grep --color=auto thunderbird appears in the process table, and hence in the output of ps, depends on timing: how busy your system is and how quickly ps reads the process table relative to grep, which is started concurrently. A missing grep line is highly unlikely, though, so we can assume that grep will appear in the output of ps.

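Now, [] is regular-expression syntax that (unless followed by a repetition operator) matches exactly one of the characters inside the brackets. So when you use grep '[t]hunderbird', grep treats [t] as matching only t. The pattern therefore matches thunderbird, but not the grep process's own command line, which contains the literal text [t]hunderbird. As a result, the grep process does not appear in the output.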

When using grep thunderbird, if the grep process makes its way into the process table, we will find it in the output, because its own command line, grep thunderbird, contains the text we are searching for.

Also note that grep is aliased to grep --color=auto here, which is why --color=auto shows up in the ps output too.
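The quotes in grep '[t]hunderbird' are worth keeping. The sketch below assumes bash or a POSIX sh with default options (no nullglob or failglob; zsh behaves differently) and uses a throwaway directory. It shows that an unquoted [t]hunderbird is a filename glob first: if a file named thunderbird exists in the current directory, the shell replaces the word before grep ever sees it. The variable names are illustrative.

```shell
# Work in a throwaway directory so nothing in the real working directory is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# No matching file yet: the unquoted word is passed through unchanged.
unquoted_before=$(printf '%s' [t]hunderbird)

# Create a file whose name the glob [t]hunderbird matches.
touch thunderbird

# Now the shell rewrites the unquoted word; the quoted one is left alone.
unquoted_after=$(printf '%s' [t]hunderbird)
quoted_after=$(printf '%s' '[t]hunderbird')

printf 'unquoted, no file:   %s\n' "$unquoted_before"
printf 'unquoted, with file: %s\n' "$unquoted_after"
printf 'quoted, with file:   %s\n' "$quoted_after"

# Clean up the temporary directory.
cd / && rm -rf "$tmpdir"
```

In the "with file" case an unquoted pattern would reach grep as plain thunderbird, and the grep process would show up in the ps output again.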

heemayl
1

The [] in grep matches any single character from the set inside the brackets. Take grep [tb]all my_file.txt

It matches the same lines as these two commands combined:

grep tall my_file.txt
grep ball my_file.txt

grep runs only once, though: the bracket expression accepts either t or b as the character before all, and each matching line is printed a single time.

For example: If we want to look for a word ABC or BBC in a file we can use the following grep command:

grep [AB]BC file_name 

Here the [] makes grep accept either A or B as the first character, so the pattern matches both ABC and BBC.
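A small sketch of the same idea on sample input, so it can be run without any file (the three words and the variable names are made up for the example):

```shell
# Three sample lines; only the first two have A or B in front of "BC".
words='ABC
BBC
CBC'

# A single grep invocation; the bracket expression matches one character, A or B.
found=$(printf '%s\n' "$words" | grep '[AB]BC')
printf '%s\n' "$found"
```

The pattern is quoted here so the shell does not try to expand the brackets as a filename glob.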

Bidyut