[SED]: Remove repeated/duplicate words from a file in Linux

In this post we will see how to delete repeated words. When we write fast, we tend to type the same word twice in a row, and and we find these repeats side by side only when we review the text. Notice that I wrote “and” twice in that sentence. This happens because the mind runs ahead of the hand while writing. In a big file it is hard to read everything looking for duplicate words, and skimming the text can even cause us to skip some of them. A better approach is to use tools such as SED, Perl or Python with the help of regular expressions.
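Before changing anything, it can help to just list the lines that contain a doubled word. The following is a minimal sketch, assuming GNU grep (the `\b` and `\w` escapes and the `\1` back reference in `-E` mode are GNU extensions); the sample text is made up for illustration.

```shell
# Sample text; the third line has no doubled word and should not be listed.
sample='This is is how it works buddy
What else else you want
This is how it works'

# -n prints line numbers, -E enables extended regular expressions.
# \b is a word boundary, \w a word character, \1 repeats the first group.
printf '%s\n' "$sample" | grep -nE '\b(\w+) \1\b'
# 1:This is is how it works buddy
# 2:What else else you want
```

This only reports the lines; nothing in the input is modified.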

I have a file abc.txt with the following data.

cat abc.txt
Output:

This is is how it works buddy
What else else you want

Remove the repeated words with SED as shown below.

sed -ri 's/(.* )\1/\1/g' abc.txt

cat abc.txt

Output:

This is how it works buddy
What else you want

Let me explain the sed command we used.

-r enables Extended Regular Expressions, which support grouping with () parentheses.
-i edits the original file in place. Be careful with this option, as you cannot get your original file back once it is modified, unless you give a backup suffix such as -i.bak.
(.* ) matches any group of characters ending with a space, and \1 requires that the same set of characters follows immediately. This concept is called a back reference: \1 stores the characters matched by the first (). The pair (.* )\1 is then replaced by \1 alone, so only one copy is kept. Note that the group is not limited to whole words, so in “This is” the “is ” at the end of “This” also counts as a repeat.
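The options above can be tried safely with a minimal sketch, assuming GNU sed. It works in a temporary directory, uses `-i.bak` so that sed keeps a backup of the original, and also defines a stricter pattern that only collapses whole repeated words. The helper name `dedupe_words` and the temporary file are illustrative and not part of the original command.

```shell
# Throwaway directory so that no real file is touched.
tmpdir=$(mktemp -d)
printf 'This is is how it works buddy\nWhat else else you want\n' > "$tmpdir/abc.txt"

# Same command as above, but -i.bak keeps the original as abc.txt.bak.
sed -r -i.bak 's/(.* )\1/\1/g' "$tmpdir/abc.txt"
cat "$tmpdir/abc.txt"
# This is how it works buddy
# What else you want

# Stricter variant (hypothetical helper): \b and \w are GNU extensions that
# tie the match to whole words, so "This is" is left alone.
dedupe_words() {
  sed -r 's/\b(\w+)( \1\b)+/\1/g'
}

# On text that is already clean, the two patterns behave differently.
printf 'This is how it works\n' | sed -r 's/(.* )\1/\1/g'   # This how it works
printf 'This is how it works\n' | dedupe_words              # This is how it works
```

Because the original pattern can match inside words, running it a second time on an already corrected file may remove words that were never duplicated; the word-boundary variant avoids that, and it also collapses three or more repeats in one pass.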

Surendra Anne

Mr Surendra Anne is from Vijayawada, Andhra Pradesh, India. He is a Linux/open source supporter who believes in hard work, a down-to-earth person who likes to share knowledge with others, loves dogs and likes photography. He works as a DevOps engineer with Taggle Systems, an IoT automatic water metering company in Sydney. You can contact him at surendra (@) linuxnix dot com.
