Saturday, April 25, 2009

Find a pattern in a file using the grep, egrep or fgrep utility

The grep, egrep and fgrep utilities search files for a pattern and print the lines that match. They can search several files at once.

Enclose the search pattern in single quotes so that the shell does not interpret any special characters in it.

The syntax of grep is,
grep pattern file_1 file_2 ..... file_n

Let's experiment with two files named blog_info.txt and myself.txt. The contents of these two files are shown below.

# vi blog_info.txt
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.

# vi myself.txt
Assalamu Alaikum - which means peace be upon you.
I am Arju. I work as DBA Consultant.
I also help people to learn oracle online.
I completed by B.Sc from IUT.
I try to update my blog regularly.

Let's now search for the keyword Arju in both of these files.

# grep 'Arju' blog_info.txt myself.txt
blog_info.txt:This blog contains day to day tasks of Arju.
myself.txt:I am Arju. I work as DBA Consultant.

The grep command accepts several options. Some of them are described below.

1)-h : When you search more than one file, as in the output above, each matching line is preceded by the name of the file it came from. If you don't want the file name to be displayed, use the -h option.

# grep -h 'Arju' blog_info.txt myself.txt
This blog contains day to day tasks of Arju.
I am Arju. I work as DBA Consultant.

2)-w : Let's search for 'is' in both files. Without any option, every line that contains the string 'is' anywhere is printed, so lines with words such as "This" or "visiting" also appear.

# grep 'is' blog_info.txt myself.txt
blog_info.txt:This blog contains day to day tasks of Arju.
blog_info.txt:Most of the contents of this blog is oracle+php+shell scripts.
blog_info.txt:Thanks for visiting this blog.

To restrict the search to lines that contain 'is' as a whole word, use grep with the -w option.

# grep -w 'is' blog_info.txt myself.txt
blog_info.txt:Most of the contents of this blog is oracle+php+shell scripts.

3)-b : If you want to print the byte offset within the file of each matching line, use the -b option.

# grep -b 'is' blog_info.txt myself.txt
blog_info.txt:45:This blog contains day to day tasks of Arju.
blog_info.txt:90:Most of the contents of this blog is oracle+php+shell scripts.
blog_info.txt:153:Thanks for visiting this blog.

In the first line 45 is printed because the matching line starts at byte offset 45, counting from 0. That is the position of the T of "This", the word that contains "is". The first line of the file occupies bytes 0 to 44, including its newline.
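
The offset can be checked by hand. The following is a minimal sketch, assuming a grep that supports -b (GNU and BSD grep do, but the option is not in POSIX). The scratch directory and the variable names are only illustrative.

```shell
# Recreate blog_info.txt from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF

# Size of the first line in bytes, including its trailing newline.
first_line_bytes=$(head -n 1 blog_info.txt | wc -c | tr -d ' ')

# Offset that grep -b reports for the line beginning with "This".
offset=$(grep -b 'This' blog_info.txt | cut -d: -f1)

echo "first line: $first_line_bytes bytes, offset of second line: $offset"
```

Both numbers come out as 45, which is why the second line of the file is reported at offset 45.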

4)-c : If you use -c option, then it displays only a count of the number of matched lines and not the lines themselves.

# grep -c 'is' blog_info.txt myself.txt
blog_info.txt:3
myself.txt:0

So three lines of blog_info.txt contain "is", while no line of myself.txt contains it. Note that -c counts matching lines, not individual occurrences of the pattern.
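
If you need the number of occurrences rather than the number of lines, one common approach is to print every match on its own line and count those. This is a sketch that assumes a grep with the -o option (available in GNU and BSD grep, not required by POSIX). The variable names are illustrative.

```shell
# Recreate blog_info.txt from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF

# -c counts the lines that contain at least one match.
lines=$(grep -c 'is' blog_info.txt)

# -o prints each match on a separate line, so wc -l counts occurrences.
occurrences=$(grep -o 'is' blog_info.txt | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $occurrences"
```

Here the line count is 3 but the occurrence count is 5, because two of the lines contain "is" twice.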

5)-e : With the -e option you can specify one or more patterns for grep to search for. You may give each pattern with a separate -e option, or separate the patterns with newlines inside a single pattern argument. For example, the following two commands are equivalent:
grep -e pattern_1 -e pattern_2 file
grep -e 'pattern_1
pattern_2' file

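For example, to search for either "Arju" or "IUT." in both files you can issue the command below. Note that the unescaped dot in 'IUT.' is a regular expression metacharacter that matches any single character; here it happens to match the full stop.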

# grep -e 'Arju' -e 'IUT.' blog_info.txt myself.txt
blog_info.txt:This blog contains day to day tasks of Arju.
myself.txt:I am Arju. I work as DBA Consultant.
myself.txt:I completed by B.Sc from IUT.
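
The newline-separated form can be tried in the same way. This sketch recreates the two sample files in a scratch directory and compares both forms; the variable names are illustrative.

```shell
# Recreate the two sample files from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF
cat > myself.txt <<'EOF'
Assalamu Alaikum - which means peace be upon you.
I am Arju. I work as DBA Consultant.
I also help people to learn oracle online.
I completed by B.Sc from IUT.
I try to update my blog regularly.
EOF

# Two patterns in one argument, separated by a literal newline.
newline_form=$(grep -e 'Arju
IUT' blog_info.txt myself.txt)

# The same two patterns, each with its own -e.
two_e_form=$(grep -e 'Arju' -e 'IUT' blog_info.txt myself.txt)

printf '%s\n' "$newline_form"
```

Both variables hold the same three matching lines.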

6)-f : The -f patternfile option reads one or more patterns from patternfile, one pattern per line.
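
As a minimal sketch, the pattern file below is named patterns.txt; that name is an assumption made for illustration, and any file name works.

```shell
# Recreate the two sample files from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF
cat > myself.txt <<'EOF'
Assalamu Alaikum - which means peace be upon you.
I am Arju. I work as DBA Consultant.
I also help people to learn oracle online.
I completed by B.Sc from IUT.
I try to update my blog regularly.
EOF

# One pattern per line in the pattern file (hypothetical name).
printf '%s\n' 'Arju' 'IUT' > patterns.txt

# A line is printed if it matches any pattern in the file.
matches=$(grep -f patterns.txt blog_info.txt myself.txt)
printf '%s\n' "$matches"
```

Take care not to leave an empty line in the pattern file, because an empty pattern matches every line.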


7)-i : The -i option tells grep to ignore case. With -i, the keywords "blog" and "BlOg" are treated as the same.
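
The effect is easy to see by counting matches with and without the option. A small sketch using blog_info.txt, with illustrative variable names:

```shell
# Recreate blog_info.txt from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF

# Case-sensitive: only the lowercase "arju" inside the URL matches.
sensitive=$(grep -c 'arju' blog_info.txt)

# Case-insensitive: the line ending in "Arju." matches as well.
insensitive=$(grep -c -i 'arju' blog_info.txt)

echo "without -i: $sensitive line, with -i: $insensitive lines"
```

Without -i one line matches; with -i two lines match.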

8)-l : The -l option prints only the names of the files that contain matching lines, not the lines themselves.
# grep -l 'IUT' blog_info.txt myself.txt
myself.txt

Only myself.txt is printed, because it is the only file that contains "IUT".

9)-n : The -n option precedes each line with the line number where it was found.
# grep -n 'IUT' myself.txt
4:I completed by B.Sc from IUT.

The 4 in front of the line shows that the match was found on line 4 of the file.

10)-q : The -q option suppresses all output and only sets the exit status: 0 if a match was found, 1 if not.
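
This makes -q convenient in shell conditions. A minimal sketch, with illustrative variable names:

```shell
# Recreate the two sample files from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF
cat > myself.txt <<'EOF'
Assalamu Alaikum - which means peace be upon you.
I am Arju. I work as DBA Consultant.
I also help people to learn oracle online.
I completed by B.Sc from IUT.
I try to update my blog regularly.
EOF

# grep -q prints nothing; the if statement tests its exit status.
if grep -q 'IUT' myself.txt; then in_myself=yes; else in_myself=no; fi
if grep -q 'IUT' blog_info.txt; then in_blog=yes; else in_blog=no; fi

echo "IUT in myself.txt: $in_myself, in blog_info.txt: $in_blog"
```

The first test succeeds and the second fails, and grep itself writes nothing to the terminal.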

11)-s : The -s option suppresses the display of any error messages for nonexistent or unreadable files.
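
A short sketch of -s follows. The file name no_such_file.txt is hypothetical and is deliberately not created; the variable names are illustrative.

```shell
# Recreate myself.txt from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > myself.txt <<'EOF'
Assalamu Alaikum - which means peace be upon you.
I am Arju. I work as DBA Consultant.
I also help people to learn oracle online.
I completed by B.Sc from IUT.
I try to update my blog regularly.
EOF

# Capture only what grep writes to standard error; -s should keep it empty.
errors=$(grep -s 'Arju' myself.txt no_such_file.txt 2>&1 >/dev/null || true)

# Matches from the readable file are still printed as usual.
output=$(grep -s 'Arju' myself.txt no_such_file.txt || true)

printf 'stderr: [%s]\n' "$errors"
printf '%s\n' "$output"
```

Only the message is suppressed. The exit status can still signal that a file could not be read, so scripts that check it should not treat a nonzero status as "no match" alone.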

12)-U[b|B|l|L]: The -U option forces the specified files to be treated as Unicode files. By default, these utilities assume that Unicode characters are little-endian. If a byte-order marker is present, it is used to determine the byte order of the characters. You can force Unicode characters to be treated as big-endian by specifying -Ub or -UB. Similarly, you can force them to be treated as little-endian by specifying -Ul or -UL. Note that this option is implementation-specific and not part of POSIX. In GNU grep, for instance, -U means that files are treated as binary, so check the manual page of your grep before relying on it.

13)-v : The -v option displays all lines not matching a pattern.
For example, to print the lines that do not contain the letter "I", issue,

# grep -v 'I' myself.txt
Assalamu Alaikum - which means peace be upon you.

14)-x : The -x option selects only those lines that the pattern matches in their entirety.

# grep -x 'I completed by B.Sc from IUT.' myself.txt
I completed by B.Sc from IUT.

fgrep searches files for one or more pattern arguments, but does not use regular expressions. It does direct string comparison to find matching lines of text in the input.

egrep works similarly, but uses extended regular expression matching. If you include special characters in patterns typed on the command line, escape them by enclosing them in apostrophes to prevent inadvertent misinterpretation by the shell or command interpreter. To match a character that is special to egrep, a backslash (\) should be put in front of the character. It is usually easier to use fgrep if you don't need special pattern matching.
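
The difference shows up with the string oracle+php from blog_info.txt, because + is special in extended regular expressions. The sketch below uses grep -F and grep -E, which POSIX specifies as the equivalents of fgrep and egrep; on systems that still ship the older names, they can be substituted. The variable names are illustrative.

```shell
# Recreate blog_info.txt from this post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > blog_info.txt <<'EOF'
Welcome to http://arjudba.blogspot.com blog.
This blog contains day to day tasks of Arju.
Most of the contents of this blog is oracle+php+shell scripts.
Thanks for visiting this blog.
EOF

# Fixed string (fgrep behavior): + is an ordinary character.
fixed=$(grep -c -F 'oracle+php' blog_info.txt)

# Extended regex (egrep behavior): e+ means "one or more e", so the
# pattern looks for oracl, then e's, then php, and finds nothing.
# grep exits with status 1 on no match, hence the "|| true".
extended=$(grep -c -E 'oracle+php' blog_info.txt || true)

# Escaping the + with a backslash makes it literal again.
escaped=$(grep -c -E 'oracle\+php' blog_info.txt)

echo "fixed: $fixed, extended: $extended, escaped: $escaped"
```

The fixed-string and escaped searches each find one line, while the unescaped extended search finds none.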
