Friday, April 24, 2009

Remove duplicate successive lines using the uniq utility

With the uniq utility you can remove duplicate lines, provided they are on successive lines. If a file contains runs of successive identical lines, uniq discards all but one line from each run. Suppose a file contains the following lines.

# vi student_data.txt
Roll number is 024401
His Name is Rafi
His Name is Rafi
He is 24 years old.
Roll number is 024401

Running uniq on the file as below yields the following result.

# uniq student_data.txt
Roll number is 024401
His Name is Rafi
He is 24 years old.
Roll number is 024401

Note that the file contains two duplicated lines: "Roll number is 024401" and "His Name is Rafi". In the uniq output, only the repeated "His Name is Rafi" line is dropped, because its two copies are on successive lines. Both copies of "Roll number is 024401" are kept because, although identical, they are not successive. So the uniq utility removes adjacent identical lines only.
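
The adjacency rule can also be checked directly. The following is a minimal sketch, assuming a uniq that supports the standard -d and -c options and a system that provides mktemp; the temporary directory and the variable names are only for illustration and are not part of the original example.

```shell
# Recreate the sample file in a temporary directory (illustrative path).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Roll number is 024401' \
  'His Name is Rafi' \
  'His Name is Rafi' \
  'He is 24 years old.' \
  'Roll number is 024401' > "$tmpdir/student_data.txt"

# -d prints one copy of each line that is repeated on adjacent lines.
adjacent_dups=$(uniq -d "$tmpdir/student_data.txt")
echo "$adjacent_dups"

# -c prefixes each output line with the length of its run of adjacent copies.
uniq -c "$tmpdir/student_data.txt"
```

With the sample data, -d reports only "His Name is Rafi", since the two "Roll number is 024401" lines are never next to each other. The exact spacing of the counts printed by -c may differ between implementations.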

Combined with the "sort" command, uniq can remove all duplicate lines in a file, whether or not they are successive, because sorting places identical lines next to each other. Note that the result is in sorted order, not in the original line order. The following example removes all duplicate lines from student_data.txt and saves the result as sort_student.txt.

# sort student_data.txt | uniq > sort_student.txt

# cat sort_student.txt
He is 24 years old.
His Name is Rafi
Roll number is 024401
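
Two common variations on this pipeline are sketched below, under the assumption that sort supports the standard -u option and that awk is available; neither is taken from the original example, and the temporary directory and variable names are illustrative. The first gives the same result as sort piped to uniq in a single command. The second removes all duplicates while keeping the first occurrence of each line in its original position.

```shell
# Recreate the sample file in a temporary directory (illustrative path).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Roll number is 024401' \
  'His Name is Rafi' \
  'His Name is Rafi' \
  'He is 24 years old.' \
  'Roll number is 024401' > "$tmpdir/student_data.txt"

# sort -u sorts and drops duplicate lines in one step.
sort -u "$tmpdir/student_data.txt" > "$tmpdir/sort_student.txt"
cat "$tmpdir/sort_student.txt"

# awk prints a line only the first time it is seen, so the
# original order is preserved without sorting.
ordered=$(awk '!seen[$0]++' "$tmpdir/student_data.txt")
echo "$ordered"
```

The awk version keeps every distinct line in memory, so for very large files the sort-based approach may be the safer choice.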

Related Documents
http://arjudba.blogspot.com/2009/04/edit-file-on-linux-using-sed-utility.html
