Linux Shell – How To Remove Duplicate Text Lines

I need to sort data from a log file, but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

You need to use shell pipes along with the following two Linux command line utilities to sort and remove duplicate text lines:

  1. sort command – Sort lines of text files on Linux and Unix-like systems.
  2. uniq command – Report or omit repeated lines on Linux or Unix.

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

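Use the following syntax. Note that uniq -u prints only the lines that are not repeated, so every copy of a duplicated line is dropped. Use plain uniq if you want to keep one copy of each line:
sort {file-name} | uniq -u
sort file.log | uniq -u
sort file.log | uniq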


Here is a sample test file called garbage.txt displayed using the cat command:
cat garbage.txt
Sample outputs:

this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog

Removing duplicate lines from a text file on Linux

Type the following command to get rid of every line that occurs more than once (both copies of "this is a test" are dropped):
$ sort garbage.txt | uniq -u
Sample output:

food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire

Where,

  • -u : print only the lines that are not repeated in the sorted input; every copy of a duplicated line is dropped.
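To make the difference between the uniq modes concrete, here is a small sketch that recreates the sample file in a scratch directory and runs three variants side by side. The directory and the variable names (workdir, only_unique, deduped, repeated) are placeholders chosen for this illustration only.

```shell
#!/bin/sh
# Recreate the sample file from this tutorial in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/garbage.txt" <<'EOF'
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog
EOF

# uniq -u: only lines that occur exactly once (both copies of the duplicate vanish).
only_unique=$(sort "$workdir/garbage.txt" | uniq -u)

# Plain uniq: one copy of every line, adjacent repeats collapsed.
deduped=$(sort "$workdir/garbage.txt" | uniq)

# uniq -d: one copy of each line that occurs more than once.
repeated=$(sort "$workdir/garbage.txt" | uniq -d)

printf '%s\n' "--- uniq -u ---" "$only_unique"
printf '%s\n' "--- uniq ---" "$deduped"
printf '%s\n' "--- uniq -d ---" "$repeated"

# Clean up the scratch directory.
rm -rf "$workdir"
```

If the goal is one copy of every line, plain uniq (or sort -u) is usually what you want; uniq -u is the right choice only when repeated lines should disappear completely.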

Sort file contents on Linux

Let us say you have a file named users.txt:
cat users.txt
Sample outputs:

Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar 31/12/84
Marlena Summer 13/05/76
Wendy Lee 04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72

To sort the file, run:
sort users.txt
Next sort by last name (the second field), run:
sort -k 2 users.txt
Want to sort in reverse order? Try:
sort -r users.txt
You can eliminate any duplicate entries in a file while ordering the file, run:
sort -k 2 -u users.txt
sort -u users.txt
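Recent versions of GNU sort generally do not accept the old +N field syntax by default, so the sketch below uses the POSIX -k option instead. It also shows a pitfall worth knowing: when -u is combined with a key, two lines count as duplicates if their keys match, even when the rest of the line differs. The scratch directory and variable names are placeholders for this illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
#!/bin/sh
# Recreate users.txt from this tutorial in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/users.txt" <<'EOF'
Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar 31/12/84
Marlena Summer 13/05/76
Wendy Lee 04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72
EOF

# -k 2: the sort key runs from field 2 (last name) to the end of the line.
# With -u, only the repeated "Vivek Gite" line is dropped.
by_last_name=$(LC_ALL=C sort -k 2 -u "$workdir/users.txt")

# -k 2,2: the key is the last name only. With -u, every line sharing a
# last name is treated as a duplicate, so one "Gite" and one "Lee" survive.
one_per_last_name=$(LC_ALL=C sort -k 2,2 -u "$workdir/users.txt")

printf '%s\n' "--- sort -k 2 -u ---" "$by_last_name"
printf '%s\n' "--- sort -k 2,2 -u ---" "$one_per_last_name"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The first command keeps six lines, the second only four, so pick the key range according to what should count as a duplicate.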

Without any options, sort compares entire lines in the file and outputs them in the collation order of the current locale (plain ASCII byte order when LC_ALL=C is set). You can control output with options.
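The effect of the locale on ordering is easy to check. In this minimal sketch the three words and the variable name c_order are made up for illustration. Under LC_ALL=C the comparison is by byte value, so an uppercase letter sorts before any lowercase one; the result of the second command depends on how your system is configured.

```shell
#!/bin/sh
# Byte order: 'B' (66) sorts before 'a' (97) and 'c' (99).
c_order=$(printf '%s\n' apple Banana cherry | LC_ALL=C sort)
printf '%s\n' "--- LC_ALL=C ---" "$c_order"

# Current locale: many locales ignore case at first and print apple before Banana.
printf '%s\n' "--- current locale ---"
printf '%s\n' apple Banana cherry | sort
```

Setting LC_ALL=C in scripts is a common way to get the same order on every machine.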

How to remove duplicate lines on Linux with uniq command

Consider the following file:
cat -n telphone.txt
Sample outputs:

     1  99884123
     2  97993431
     3  81234000
     4  02041467
     5  77985508
     6  97993431
     7  77985509
     8  77985509

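The following uniq command removes the 8th line (an adjacent repeat of line 7) and places the result in a file called output.txt. Line 6 stays, because its duplicate on line 2 is not adjacent and uniq only compares neighbouring lines: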
uniq telphone.txt output.txt
Verify it:
cat -n output.txt
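Because uniq only compares neighbouring lines, an unsorted file can still contain repeats afterwards. If the original line order matters and sorting is not an option, one widely used alternative (not part of the sort/uniq approach shown in this tutorial) is an awk one-liner that keeps the first occurrence of every line. The sketch below compares the two; the scratch directory and variable names are placeholders.

```shell
#!/bin/sh
# Recreate telphone.txt from this tutorial in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 99884123 97993431 81234000 02041467 \
              77985508 97993431 77985509 77985509 > "$workdir/telphone.txt"

# uniq alone: only the adjacent repeat (line 8) is removed;
# 97993431 still appears twice because lines 2 and 6 are not neighbours.
adjacent_only=$(uniq "$workdir/telphone.txt")

# awk: seen[$0]++ is 0 the first time a line is met, so !seen[$0]++ is
# true exactly once per distinct line. Input order is preserved.
first_seen=$(awk '!seen[$0]++' "$workdir/telphone.txt")

printf '%s\n' "--- uniq ---" "$adjacent_only"
printf '%s\n' "--- awk ---" "$first_seen"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The awk version holds every distinct line in memory, so for very large files the sort-based pipeline may be the better fit.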

How to remove duplicate lines in a .txt file and save result to the new file

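Try either of the following. The first keeps one copy of each line. The second keeps only the lines that were never repeated and also prints them on the screen:
sort input_file | uniq > output_file
sort input_file | uniq -u | tee output_file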
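Two related tasks come up often when saving results: finding out how many times each line occurs before removing anything, and writing the deduplicated result back to the same file. The sketch below shows one way to do both. The sample data, scratch directory, and variable names are invented for this illustration.

```shell
#!/bin/sh
# Small made-up input file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' b a b c a b > "$workdir/input_file"

# Count how often each line occurs, most frequent first.
counts=$(sort "$workdir/input_file" | uniq -c | sort -rn)
printf '%s\n' "--- counts ---" "$counts"

# Deduplicate in place. The -o option lets sort write to its own input file;
# a plain "> input_file" redirect would truncate the file before sort reads it.
sort -u -o "$workdir/input_file" "$workdir/input_file"
result=$(cat "$workdir/input_file")
printf '%s\n' "--- deduplicated file ---" "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Keep a backup when overwriting a file in place, since the original order and the duplicates cannot be recovered afterwards.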

Conclusion

The sort command is used to order the lines of a text file and uniq filters duplicate adjacent lines from a text file. These commands have many more useful options. I suggest you read the man pages by typing the following man command:
man sort
man uniq

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting.
