In this post we will see how to delete repeated words. People tend to write fast and and, when reviewing their writing, find the same word repeated side by side. Notice that I wrote “and” twice in the previous sentence. This happens because the mind runs ahead of the hand while we write. In a big file it is hard to spot duplicate words by reading, and skimming can make you skip over them. A better approach is to use a tool such as sed, Perl, or Python with the help of regular expressions.

I have a file abc.txt with the following data.

cat abc.txt
Output:

This is is how it works buddy
What else else you want

Remove the repeated words with sed as shown below.

sed -ri 's/(.* )\1/\1/g' abc.txt

cat abc.txt

Output:

This is how it works buddy
What else you want
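Because the command above rewrites abc.txt in place, it can help to try it on a scratch copy and keep a backup. The following is a minimal sketch, assuming a sed that accepts -E (equivalent to -r in GNU sed) and a suffix attached to -i. The workdir variable and the .bak suffix are my own choices for illustration and are not part of the original command.

```shell
# Recreate the sample file from this post in a scratch directory
# (workdir is a hypothetical name used only for this sketch).
workdir=$(mktemp -d)
printf '%s\n' 'This is is how it works buddy' 'What else else you want' > "$workdir/abc.txt"

# 1) Preview: without -i, sed prints the result and leaves the file untouched.
sed -E 's/(.* )\1/\1/g' "$workdir/abc.txt"

# 2) Edit in place, keeping the original content as abc.txt.bak.
sed -E -i.bak 's/(.* )\1/\1/g' "$workdir/abc.txt"

# Show both versions side by side for a quick manual check.
cat "$workdir/abc.txt.bak" "$workdir/abc.txt"
```

If the result is not what you expected, the untouched text is still available in abc.txt.bak.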

Let me explain the sed command we used.

-r enables extended regular expressions, where plain ( ) parentheses group part of a pattern.
-i writes the changes back to the original file. Be careful with this option: unless you keep a backup, you cannot get the original file back once it is modified.
(.* ) matches any run of characters ending in a space, and \1 requires the same text to follow immediately. This concept is called a back-reference: \1 holds whatever the first pair of ( ) matched. In the replacement, \1 writes a single copy back, so the doubled text matched by (.* )\1 collapses to one occurrence. Note that the pattern is not anchored to word boundaries, so it also matches when the end of one word happens to equal the word that follows it.
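The back-reference idea can be tightened so that only whole words count as repeats. With the (.* )\1 pattern, a line such as "This is how it works" becomes "This how it works", because the "is " at the end of "This " is followed by another "is ". The sketch below is one possible variant, not the command from this post: it assumes GNU sed, whose \b (word boundary) and \w (word character) escapes are extensions, and dedupe_words is a hypothetical helper name. It collapses a word that appears twice in a row, separated by a single space.

```shell
# Hypothetical helper (not from the original post): collapse a word that is
# immediately repeated. Reads stdin and writes stdout, so no file is modified.
# Assumes GNU sed: \b and \w are GNU extensions.
dedupe_words() {
  # \b(\w+)  a whole word, captured as group 1
  #  \1\b    one space, then the same whole word again
  sed -E 's/\b(\w+) \1\b/\1/g'
}

# "This is" is left alone, while "is is" and "else else" are collapsed.
printf '%s\n' 'This is how it works' \
              'This is is how it works buddy' \
              'What else else you want' | dedupe_words
```

Writing it as a filter keeps the original file safe; once the output looks right, the same expression can be used with -i.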

 

Mr Surendra Anne is from Vijayawada, Andhra Pradesh, India. He is a Linux/open source supporter who believes in hard work, a down-to-earth person who likes to share knowledge with others, loves dogs, and likes photography. He works as a DevOps Engineer with Taggle systems, an IoT automatic water metering company in Sydney. You can contact him at surendra (@) linuxnix dot com.