I need to extract log data from many terabytes of log files. The data I need starts and ends with patterns I can identify, but the content in between can be anywhere from 10 to 100+ lines.

Example:

Start
# lots of lines here
End

Currently, I run grep -A 50 "Start", which gives me the Start line and the 50 lines after it. In almost all cases, however, that is either more or less than I need: more means the resulting report file grows gigabytes larger than it needs to be, and less means I don't get the information I need.

Is there a way to extract exactly what I need, using standard Unix / Linux tools?

1 Answer

Try it with awk:

awk '/^Start/,/^End/' file

or if you prefer sed:

sed -n '/Start/,/End/p' file
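
To see what these two range commands select, here is a minimal sketch run against a made-up sample file. The file name `sample.log` and its contents are assumptions for illustration only, not taken from the real logs.

```shell
# Build a small hypothetical log file to try the commands on.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
noise before
Start
line 1
line 2
End
noise after
Start
line 3
End
EOF

# Range pattern: print from a line matching Start through the next line matching End.
awk_out=$(awk '/^Start/,/^End/' "$tmpdir/sample.log")
sed_out=$(sed -n '/Start/,/End/p' "$tmpdir/sample.log")

printf '%s\n' "$awk_out"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Both commands print the two Start..End blocks back to back and drop the surrounding noise. Note that if a Start line is never followed by an End line, the range stays open and everything up to the end of the input is printed.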
Simon
  • @SeanPatrickFloyd you're welcome. I added as well a solution with sed. – Simon Apr 08 '13 at 14:36
  • Nice code there. Do you mind explaining how `sed` works in this case? I tried to figure it out by myself checking the man page on ss64 and doing some trial and error experiments, but I still don't get it. :) Thanks. –  Apr 08 '13 at 14:47
  • @Radoo **sed -n** -> Suppresses the default output. **'** -> beginning of the filter command **/Start/** -> regular expression **,** -> separator **/End/** -> regular expression **p** -> Print. Copy the pattern space to the standard output. **'** -> end of the filter **file** -> file name – Simon Apr 08 '13 at 14:59
  • unfortunately, the log files are gzipped, so I have to use a two step version: `zgrep -n -A 140 "Start" http*/p*/some.log.pattern.*.gz | sed -n '/Start/,/End/p' > /tmp/output.txt` but it does work like a charm. Thanks! – Sean Patrick Floyd Apr 08 '13 at 15:17
  • @Simon Well, that much I figured out, but I couldn't find anything about that separator `,`. The help only mentions '/regexp/substitution/'. What kind of sorcery is that? Where can I find a thorough explanation of how sed works? :) –  Apr 08 '13 at 15:23
  • @Radoo: check the "Addresses" section of [the man page](http://linux.die.net/man/1/sed) -- this sed command uses two addresses (both of which are regular expressions) to select ranges of lines to apply the "p" command to. – Gordon Davisson Apr 08 '13 at 15:39
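
The two-address form described in the comments works on any input stream, so for gzipped logs one possible alternative is to decompress to standard output and pipe that straight into sed. This is only a sketch under assumptions: the file names `a.log` and `b.log` and their contents are hypothetical, and it is not the `zgrep` pipeline quoted in the comments.

```shell
# Create two small hypothetical gzipped logs to experiment with.
tmpdir=$(mktemp -d)
printf '%s\n' 'noise' 'Start' 'payload' 'End' 'noise' > "$tmpdir/a.log"
printf '%s\n' 'Start' 'more payload' 'End' 'trailing' > "$tmpdir/b.log"
gzip "$tmpdir/a.log" "$tmpdir/b.log"

# gzip -dc writes the decompressed contents to stdout, so sed sees one plain
# text stream and applies the /Start/,/End/ address range to it.
gz_out=$(gzip -dc "$tmpdir"/*.log.gz | sed -n '/Start/,/End/p')

printf '%s\n' "$gz_out"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Because the files are concatenated into a single stream, a Start with no matching End in one file would keep the range open into the next file. Processing each file in its own loop iteration avoids that.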