
[SED]: Remove repeated/duplicate words from a file in Linux

Posted by Surendra Anne | May 4, 2013 | Programming, SED

In this post we will see how to delete repeated words. There is a human tendency to write fast and and, when we review our writing, we find repeated words side by side. If you look closely, I have written “and” twice in the previous sentence. This happens because the mind processes a word before we actually write it. It is hard to read through an entire file looking for duplicate words when the file is big, and skimming can even cause us to skip some of them. A better approach is to use tools like SED and Perl/Python with the help of regular expressions.

I have a file abc.txt with the following data.

cat abc.txt

Output:

This is is how it works buddy
What else else you want

Remove repeated words with SED as given below.

sed -ri 's/(.* )\1/\1/g' abc.txt
cat abc.txt

Output:

This is how it works buddy
What else you want

Let me explain the sed command we used.

The -r option enables Extended Regular Expressions, which support grouping with () parentheses.

The -i option writes the changes to the original file in place. Be careful with this option, as you cannot get your original file back once it is modified.

(.* ) matches any group of characters ending in a space, which must be followed by the same set of characters, represented by \1. This concept is called a back reference, where \1 stores the first set of characters enclosed in the first...
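Since Python is mentioned above as an alternative tool, here is a minimal sketch of the same back-reference idea in Python. It is not taken from the original post: the function name remove_repeated_words and the exact pattern are assumptions made for illustration. The pattern adds word boundaries (\b), because a pattern like (.* )\1 is not anchored to whole words and appears to also match inside text such as "This is how", where the "is " at the end of "This " is followed by the word "is ".

```python
import re

# Hypothetical pattern (not from the original post):
#   \b(\w+)          one whole word, captured as group 1
#   (?:[ \t]+\1\b)+  one or more repeats of that same word on the same line
DUPLICATE_WORD = re.compile(r"\b(\w+)(?:[ \t]+\1\b)+", re.IGNORECASE)


def remove_repeated_words(text: str) -> str:
    """Collapse runs of the same word (e.g. "is is") into one occurrence."""
    return DUPLICATE_WORD.sub(r"\1", text)


sample = "This is is how it works buddy\nWhat else else you want"
print(remove_repeated_words(sample))
# This is how it works buddy
# What else you want
```

Unlike sed -i, this sketch only returns the cleaned text and does not modify any file, so the original data stays intact until you choose to write the result back.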
