Friday, April 24, 2009

Data manipulation using the awk utility in Linux

With the awk utility you can scan each record of a file for a pattern and then process the matching records as you want. Suppose you have a file student_grade.txt like the one below. The first field is
the student id and the second field is the grade. The two fields are separated by a tab.

# cat student_grade.txt
024401 4.98
024402 4.95
024403 4.95
024404 4.50
024405 4.95

Now we want to find the ids of the students whose grade is 4.95. With the awk utility we can do this: we search each record for the grade 4.95 and then print the 1st field of every record that matches.

The general syntax of the awk utility is,
awk 'pattern {action}' file_name

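Now we can extract the student ids whose grade is 4.95 by the command below. The backslash makes the dot match a literal dot; an unescaped dot would match any character.
# awk '/4\.95/{print $1}' student_grade.txt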
024402
024403
024405

awk searches each record (line) of the file student_grade.txt for the pattern, and for every record that matches, the action "print $1" prints the first field.
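
Because a /pattern/ is matched against the whole record, it cannot tell which field the text came from. As a minimal sketch, assuming the file is tab-separated as described above, the second field can be compared directly instead. The printf line only recreates the sample data, and the variable name ids is chosen here for illustration.

```shell
# Recreate the sample file (tab-separated) so the example is self-contained.
printf '024401\t4.98\n024402\t4.95\n024403\t4.95\n024404\t4.50\n024405\t4.95\n' > student_grade.txt

# -F'\t' sets the field separator to a tab.
# $2 == "4.95" selects records whose second field is exactly 4.95.
ids=$(awk -F'\t' '$2 == "4.95" {print $1}' student_grade.txt)
echo "$ids"
```

This form is a little longer, but it will not select a record just because 4.95 happens to appear somewhere else on the line.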

Meta characters used in awk
To search for a pattern in awk you can use various meta characters. The list of meta characters along with their meaning is given below.

1). : (Dot) Match any single character
2)* : Match zero or more of the preceding character
3)^ : Match beginning of line
4)$ : Match end of line
5)\ : Escape the following character
6)[ ] : Match any one of the listed characters
7){ } : Match a given number (or range) of repetitions of the preceding item
8)+ : Match one or more of the preceding item
9)? : Match zero or one of the preceding item
10)| : Separate alternative choices to match
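
A few of these meta characters can be tried against the same sample file. This is only a sketch: the printf line recreates the data from above, and the variable names are made up for illustration. The { } form is left out because some older awk versions do not support it.

```shell
# Recreate the sample file (tab-separated) so the example is self-contained.
printf '024401\t4.98\n024402\t4.95\n024403\t4.95\n024404\t4.50\n024405\t4.95\n' > student_grade.txt

# ^ anchors at the start of the record; [1-3] matches one character from the list.
first_three=$(awk '/^02440[1-3]/ {print $2}' student_grade.txt)

# \. is a literal dot; ( | ) separates alternatives; $ anchors at the end of the record.
alt=$(awk '/4\.(98|50)$/ {print $1}' student_grade.txt)

# ? makes the preceding 9 optional, so both 4.95 and 4.50 match.
opt=$(awk '/4\.9?5/ {print $1}' student_grade.txt)

echo "$first_three"
echo "$alt"
echo "$opt"
```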

Predefined variables in awk
1)FILENAME : Name of current input file
2)RS : Input record separator character (Default is new line)
3)OFS : Output field separator string (Default is a single space)
4)ORS : Output record separator string (Default is new line)
5)NF : Number of fields in the current input record
6)NR : Number of input records read so far
7)OFMT : Output format for numbers (Default is %.6g)
8)FS : Field separator character (Default is a single space, which splits on runs of blanks and tabs)
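
To see several of these variables at work, here is a small sketch using the same sample file. The printf line recreates the data, and the variable name report and the label "total" are chosen here for illustration only.

```shell
# Recreate the sample file (tab-separated) so the example is self-contained.
printf '024401\t4.98\n024402\t4.95\n024403\t4.95\n024404\t4.50\n024405\t4.95\n' > student_grade.txt

# BEGIN runs before any input is read: split input on tabs, join output with commas.
# For each record print the file name, the record number, the field count and the id.
# END runs after the last record, when NR holds the total number of records.
report=$(awk 'BEGIN { FS = "\t"; OFS = "," }
              { print FILENAME, NR, NF, $1 }
              END { print "total", NR }' student_grade.txt)
echo "$report"
```

The first line of the output is student_grade.txt,1,2,024401 and the last line is total,5.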

Related Documents
http://arjudba.blogspot.com/2009/04/translate-or-replace-characters-using.html
