Sunday, April 01, 2007

How to Search Logs Using grep, Part 1

Here is something I could write a book about, or at least a few good chapters on. grep is one of the key tools in the traditional Unix arsenal for tearing through text files and finding exactly what you want, very, very quickly. It doesn't take long to master if you have the right tools.

First, you're going to need to understand how to use pipes. If you aren't familiar with pipes or don't use them regularly, it would definitely be worth your while to dig into them. If folks would be interested in a complete pipe tutorial here, or know of a good one online, please comment. That said, I will give a short overview.

Suppose we have a text file called "data.txt" with the following contents:

delta is 4th
alpha is 1st
gamma is 3rd
beta is 2nd
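If you want to follow along, one convenient way to create data.txt is with printf. When printf gets the format '%s\n', it writes each remaining argument on its own line. This is only a shortcut. Typing the four lines into any text editor works just as well.

```shell
# Create data.txt with the four example lines, one per line.
# printf reuses the '%s\n' format for every argument after it.
printf '%s\n' 'delta is 4th' 'alpha is 1st' 'gamma is 3rd' 'beta is 2nd' > data.txt

# Display the file; it should match the listing above.
cat data.txt
```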

The following command displays the contents of that file in your terminal:

cat data.txt
delta is 4th
alpha is 1st
gamma is 3rd
beta is 2nd

What "cat data.txt" did was read the file line by line and output it to your terminal, line by line. "Line by line" is the key term here. The proper name for "output to the terminal" is standard output (often shortened to standard out, or stdout). We will use that going forward.

Suppose I wanted to do something useful with this data. I can combine the "cat" command with the "sort" command. "sort" reads everything you give it, line by line, until it reaches the end of its input, and then prints those lines in sorted order. Suppose we type the following command on its own:

sort

It just sits there, doing nothing. It's waiting for some data to come in on the terminal (or, better termed, standard input). That's pretty useless most of the time! But remember that "cat" will read a file and send it, line by line, to the terminal? Well, using a pipe, we can take those lines from cat and feed them into sort.

cat data.txt | sort
alpha is 1st
beta is 2nd
delta is 4th
gamma is 3rd

Now we have something useful! What we have done above can be described by this statement: "Take the output of cat data.txt and pipe it through sort". Many, many commands in Unix (and Linux, Mac OS X, etc.) act like "sort" did and accept input line by line. By stringing together commands that write to standard output and commands that read from standard input, you can do some very powerful things. grep is one of those commands.
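sort doesn't care which command is feeding its standard input. All it needs is a stream of lines that eventually ends. In the sketch below, printf takes the place of cat as the upstream command. If you run a bare "sort" at an interactive terminal instead, you can type the lines yourself and press Ctrl-D to signal the end of input.

```shell
# Any command that writes lines to standard output can feed sort.
# Here printf stands in for "cat data.txt"; sort reads until end of input, then prints.
printf '%s\n' 'delta is 4th' 'alpha is 1st' 'gamma is 3rd' 'beta is 2nd' | sort
```

This prints the lines in the order alpha, beta, delta, gamma. sort compares the text alphabetically and knows nothing about the Greek-letter order or the "1st/2nd" suffixes.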


Now that we have the basics of pipes squared away, we can get into some more interesting and useful stuff. grep can be described as a program that reads from standard input, tests each line against a pattern, and writes to standard output the lines that match this pattern. It can do a lot more, but this is a good working definition to start. Here's an example:

cat data.txt | grep gamma
gamma is 3rd

What we've done here is tell "cat" to read every line of the file "data.txt" and pipe it into grep. grep took each line that came in and checked whether the pattern "gamma" appeared on that line. When it did, grep displayed the line. What happens if no lines match the pattern?

cat data.txt | grep epsilon

grep only outputs the lines that match. If no lines match, then nothing is sent to standard output.
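grep prints nothing when nothing matches, but it still reports the result through its exit status. The status is 0 when at least one line matched, 1 when no line matched, and 2 on an error such as a missing file. Scripts can test that status directly. The minimal sketch below recreates data.txt so that it runs on its own.

```shell
# Recreate the example file so this snippet is self-contained.
printf '%s\n' 'delta is 4th' 'alpha is 1st' 'gamma is 3rd' 'beta is 2nd' > data.txt

# "if" tests the exit status of the pipeline, which is grep's status here.
# grep exits 0 if some line matched, 1 if none did, 2 on an error.
if cat data.txt | grep epsilon; then
  echo "at least one line matched"
else
  echo "no lines matched"   # this branch runs: no line contains "epsilon"
fi
```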

Note that grep reads in lines from standard input and outputs lines to standard output. That means it can be both a consumer and a provider of lines for other commands that can process standard input. That is huge... More on that later.
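To make the working definition of grep more concrete, here is a toy stand-in written as a shell function. The name minigrep is hypothetical. It only does plain substring matching, whereas real grep matches regular expressions and has many more options. Still, it shows the same loop: read a line from standard input, test it against the pattern, and write it to standard output if it matches. Because minigrep reads stdin and writes stdout, it can sit in the middle of a pipeline just like grep.

```shell
# minigrep: hypothetical toy for illustration, NOT a replacement for grep.
# Reads lines from standard input and prints those that contain $1 as a plain substring.
minigrep() {
  # IFS= and -r keep each line exactly as written; the || test handles a final line with no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *"$1"*) printf '%s\n' "$line" ;;   # quoting "$1" makes the pattern literal text
    esac
  done
}

# Consumer and provider at once: minigrep reads from printf and feeds sort.
printf '%s\n' 'delta is 4th' 'alpha is 1st' 'gamma is 3rd' 'beta is 2nd' | minigrep l | sort
```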

Let's try a more complex example with the same file.

cat data.txt | grep l
delta is 4th
alpha is 1st

Great, we matched every line with an "l" (the letter l) in it and sent it to standard output. The lines come out in file order rather than sorted, though, so let's sort them after they come out of grep.

cat data.txt | grep l | sort
alpha is 1st
delta is 4th

So we had "cat" read data.txt line by line, piped it through grep looking for "l" and piped the results through sort. You can chain commands like this indefinitely as long as they're reading from standard in and outputting to standard out.

Let's try something else:

cat data.txt | grep l | grep p
alpha is 1st

grep can read another grep's output!

Let's work on some logs now. Suppose I have an Apache access log and I'd like to see all of the lines that record a hit to a certain URL. Let's try this:

cat /var/log/httpd/access.log | grep "GET /signup.jsp"
- - [01/Apr/2007:18:19:45 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20070309 Firefox/"
- - [01/Apr/2007:18:22:48 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070312 Firefox/"
- - [01/Apr/2007:18:23:08 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070312 Firefox/"

Great. Now we have searched the entire log and kept only the hits to that particular URL. What if I wanted to know who came in on a Mac?

cat /var/log/httpd/access.log | grep "GET /signup.jsp" | grep "Mac OS X"
- - [01/Apr/2007:18:19:45 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20070309 Firefox/"

That covers basic grepping. To review, you can chain as many grep commands as you like. This allows you to filter the output of one grep command with a more specific pattern.
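If you don't have a web server log handy, you can try the same two-step filter on a small made-up log. In the sketch below, the file name sample_access.log and every log line are hypothetical. The IP addresses come from the ranges reserved for documentation, and the user-agent strings are shortened. The location of a real access log varies from system to system.

```shell
# Build a small, hypothetical Apache-style access log (made-up lines, documentation IPs).
cat > sample_access.log <<'EOF'
198.51.100.7 - - [01/Apr/2007:18:19:45 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US)"
203.0.113.9 - - [01/Apr/2007:18:21:02 -0700] "GET /index.jsp HTTP/1.1" 200 1532 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"
203.0.113.9 - - [01/Apr/2007:18:22:48 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"
EOF

# Step 1: keep only hits to the signup page (the 1st and 3rd lines).
cat sample_access.log | grep "GET /signup.jsp"

# Step 2: narrow those down to visitors on Mac OS X (the 1st line only).
cat sample_access.log | grep "GET /signup.jsp" | grep "Mac OS X"
```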

grep has some more useful options as well:

grep -v pattern

The -v option inverts the search: grep still looks for "pattern", but shows you only the lines that DON'T match it. This is useful for ignoring lines. For example, suppose you wanted to see all the hits to the signup.jsp page on your website that did NOT come from your company's firewall (say it's for the sake of argument).

cat /var/log/httpd/access.log | grep "GET /signup.jsp" | grep -v
- - [01/Apr/2007:18:22:48 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070312 Firefox/"
- - [01/Apr/2007:18:23:08 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070312 Firefox/"

Just for fun, let's add the "wc" (word count) command. With the -l option, wc counts lines instead of words.

cat /var/log/httpd/access.log | grep "GET /signup.jsp" | grep -v | wc -l
2

So, we catted our access.log, piped it through grep for our signup URL, piped those results through grep to filter out lines containing our IP address, and piped that through word count to show the number of lines in the result. We got two log lines that matched.
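Here is a runnable version of the whole pipeline, run against a small hypothetical log. In it, 198.51.100.7 stands in for the company firewall. Both that address and the log lines are made up for illustration. The -F option tells grep to treat the address as a fixed string, so the dots in it are not regular-expression wildcards.

```shell
# Hypothetical log: 198.51.100.7 plays the role of the company firewall.
cat > sample_access.log <<'EOF'
198.51.100.7 - - [01/Apr/2007:18:19:45 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US)"
203.0.113.9 - - [01/Apr/2007:18:22:48 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"
203.0.113.20 - - [01/Apr/2007:18:23:08 -0700] "GET /signup.jsp HTTP/1.1" 200 4664 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"
203.0.113.9 - - [01/Apr/2007:18:24:10 -0700] "GET /index.jsp HTTP/1.1" 200 1532 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"
EOF

FIREWALL_IP="198.51.100.7"   # hypothetical firewall address

# Signup hits that did NOT come from the firewall, counted by wc -l (prints 2 here).
cat sample_access.log | grep "GET /signup.jsp" | grep -v -F "$FIREWALL_IP" | wc -l
```

Some versions of wc pad the count with leading spaces. The number itself is the same.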

This really is the tip of the iceberg for grep and what it can do for you in processing your logs. I will follow up with part two in the coming days, where I will cover more complex patterns and some shortcuts. There are easier ways to do all of these examples, but this should help you understand how it works and give you the tools to start using it today.


iCehaNgeR's hAcK NoteS said...

Nice article. I have a question: is there a way we can specify multiple patterns? Like, grep -v foo,boo

kjusupov said...

grep -vE 'one|two|three'

Athletic773 said...

1 - actually it is more efficient to let grep read the file itself instead of piping it in through cat. ie:
grep THISTEXT filename
does the same thing, without the extra cat process, as:
cat filename | grep THISTEXT [don't do!]

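2 - In response to the question, you can give grep an OR operator. In grep's default (basic) regular expressions, GNU grep spells OR as \| with a backslash in front of the |. The backslash is part of the pattern grep sees; inside double quotes the shell passes it through unchanged. ie:
grep "thistext\|thattext" filename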

Anonymous said...

Nice simple explanation, thanks. Helped me filter a rather large log file that will now be heading in Cisco's direction!!

Jimmy said...

Hi iCehaNgeR's hAcK NoteS, you can use the egrep command, for example: egrep '(foo|boo)'. You can also use the -v param to invert the match.
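The commenters' answers to the multiple-pattern question can be tried side by side. The minimal sketch below uses a made-up file, words.txt, and passes the file name straight to grep, as Athletic773 suggests. Note that "grep -v foo,boo" would treat "foo,boo" as a single literal pattern. -E switches on extended regular expressions, where | means OR. Repeating -e gives grep several patterns at once. The \| form is a GNU grep extension and may not work with other greps. egrep is the older name for grep -E, and recent GNU grep releases warn when it is used.

```shell
# Hypothetical sample input, one word per line.
printf '%s\n' foo bar boo baz > words.txt

# Each command prints the lines matching NEITHER foo NOR boo: bar, then baz.
grep -v -E 'foo|boo' words.txt      # extended regex: | means OR
grep -v -e foo -e boo words.txt     # several -e patterns; matching any one of them counts
grep -v 'foo\|boo' words.txt        # \| in a basic regex is a GNU grep extension
```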