
To extract the lines common to 2 files
=========================================
grep -Fxf file1 file2
awk 'FNR==NR{A[$0];next}$0 in A' file1 file2
comm -12 <( sort file1 ) <( sort file2 )
How to replace and remove a few junk characters from a specific field?
=====================================================================
awk -v OFS="\t" '{$1=$1;sub(/%.*\)/, "", $4)}1' filename
perl -wnla -e '$F[3] =~ s/%.*\)// if $#F > 2; print join "\t", @F;' filename
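As a quick illustration (the sample line below is made up), consider a tab-separated record whose 4th field carries junk such as "85%foo)":
$ printf 'a\tb\tc\t85%%foo)rest\n' | awk -v OFS="\t" '{$1=$1;sub(/%.*\)/, "", $4)}1'
a	b	c	85rest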
Adding the file name as the first line of the file
===================================================
ls | while IFS="" read -r file
do
read hasline1 < "$file"
if [ "$hasline1" != "$file" ]
then
(echo "$file"; cat "$file") > /tmp/tempfile.$$ &&
cp /tmp/tempfile.$$ "$file"
fi
done
Replace with last date update
==============================
sort -k1,1rn file > a.txt
date=$(head -n1 a.txt | awk -F'|' '{print $1}')
awk -F'|' -v date="$date" '{print date "|" $2 "|" $3}' a.txt
$ awk 'FNR==NR{if($1 > m) m = $1 ; next}{$1 = m}1' FS='|' OFS='|' file file
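A minimal illustration of the last one-liner (the sample data is made up), assuming a pipe-delimited file whose first field is a numeric date: the file is read twice; the first pass finds the largest date in m, and the second pass overwrites the first field of every record with it.
$ cat file
20120310|abc|1
20120315|def|2
$ awk 'FNR==NR{if($1 > m) m = $1 ; next}{$1 = m}1' FS='|' OFS='|' file file
20120315|abc|1
20120315|def|2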
Removing PATTERN from txt without removing lines and general text formatting
=============================================================================
sed 's/ -- //g' file.txt > newfile.txt
Just redirect stderr to /dev/null to get rid of the error messages
====================================================================
find / -name merge.txt 2>/dev/null
If you want to remove lines ended by a ^M character:
=====================================================
sed '/^M$/d' /path/to/infile > /path/to/outfile
delete only the character itself, but not the line:
===================================================
sed 's/^M//g' /path/to/infile > /path/to/outfile
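The ^M above is a literal carriage return, typed as Ctrl-V followed by Ctrl-M. If typing it is inconvenient, GNU sed (this is GNU-specific) also understands the \r escape:
sed 's/\r$//' /path/to/infile > /path/to/outfile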

Sometimes, we might have a requirement wherein we need to display the file contents with line numbers. In this article, we will see the different ways in which we can achieve this line numbering of a file.
Let us take a sample file, say file1, with the following contents:
$ cat file1

Ashwath, Mangalore
Abirami, Chennai
Swetha, Karwar
1. Unix has a specific command, nl, whose very purpose is to do line numbering.
This command is rarely known or used:
$ nl file1
1 Ashwath, Mangalore
2 Abirami, Chennai
3 Swetha, Karwar
2. The simplest of all is using the cat command. The cat command has an option "-n"
which does exactly this line numbering.
$ cat -n file1
1 Ashwath, Mangalore
2 Abirami, Chennai
3 Swetha, Karwar
3. awk can also be used to get the line numbering, using the awk special variable NR:
$ awk '{print NR, $0}' file1
1 Ashwath, Mangalore
2 Abirami, Chennai
3 Swetha, Karwar

NR contains the line number of the line currently being processed by awk, and hence the above output.
4. perl has a special variable, $., which is the same as awk's NR: it gives the line number at any point in the file.
$ perl -ne '{print $.," ", $_}' file1
1 Ashwath, Mangalore
2 Abirami, Chennai
3 Swetha, Karwar

5. sed command can be tweaked a bit to get the line numbering:
$ sed '=' file1 | sed 'N;s/\n/ /'
1 Ashwath, Mangalore
2 Abirami, Chennai
3 Swetha, Karwar

To understand the above sed command, you need to execute the two commands separately. The '=' operator prints a line number for every line. However, the line number is printed on one line and the contents on the next. In order to join the number and the file contents, we use the 'N' function of sed, which appends the next line to the pattern space. We then substitute the embedded newline character with a space, and hence the result as above.
Happy Numbering!!!!

cut is a very frequently used command for file parsing. It is very useful in splitting columns on files with or without a delimiter. In this article, we will see how to use the cut command on files having a delimiter.
Let us consider a sample file, say file1, with a comma as the delimiter, as shown below:
$ cat file1
Rakesh,Father,35,Manager
Niti,Mother,30,Group Lead
Shlok,Son,5,Student
The first column indicates the name, the second the relationship, the third the age, and the last one the profession.
cut command has 2 main options to work on files with delimiters:
-f - To indicate which field(s) to cut.
-d - To indicate the delimiter on the basis of which the cut command will cut the fields.
Let us now try to work with this command with a few examples:
1. To get the list of names alone from the file, which is the first column:
$ cut -d, -f 1 file1
Rakesh
Niti
Shlok
The option "-d" followed by a comma indicates to cut the file on the basis of the comma. "-f" followed by 1 indicates to retrieve field 1 from file1, and hence we got the names alone.
2. To get the relationship alone, i.e., the 2nd field:
$ cut -d, -f 2 file1
Father
Mother
Son
3. To get 2 fields, say Name and Age:
$ cut -d, -f 1,3 file1
Rakesh,35
Niti,30
Shlok,5
Giving 1,3 means to retrieve the first and third fields, which happen to be name and age respectively.
4. To get the name, relationship and age, excluding the profession, i.e., the 1st to 3rd fields:
$ cut -d, -f 1-3 file1
Rakesh,Father,35
Niti,Mother,30
Shlok,Son,5

The option 1-3 means from the first field till the third field. Whenever we need a range of fields to be retrieved, we use the '-' option.
The same result can also be retrieved in other ways:
$ cut -d, -f -3 file1
Rakesh,Father,35
Niti,Mother,30
Shlok,Son,5
This is the best of the 3 methods to retrieve a range of fields. The option "-3" means from the beginning, i.e., the first field, till the third field. And hence we get fields 1, 2 and 3.
5. To retrieve all the fields except the name field, i.e., to retrieve from field 2 to field 4:
$ cut -d, -f 2- file1
Father,35,Manager
Mother,30,Group Lead
Son,5,Student
Similar to the last result, "2-" means from the second field till the end, which is the 4th field. Whenever the beginning of the range is not specified, it defaults to 1; similarly, when the end of the range is not given, it defaults to the last field. The same result could have been achieved using the option "2-4" as well.
Let us consider the same input file with a space as the delimiter:
$ cat file1
Rakesh Father 35 Manager
Niti Mother 30 GL
Shlok Son 5 Student
The same options and commands used above hold good, but for the delimiter specified. When comma is the delimiter, we can give it right after the -d option. However, for the space as delimiter, we need to quote the delimiter as shown below. In fact, we can always quote the delimiter to be on the safer side.
6. To retrieve the first field from a space delimited file:
$ cut -d" " -f 1 file1
Rakesh
Niti
Shlok
Let us consider the same file separated by tab space:
$ cat file1
Rakesh	Father	35	Manager
Niti	Mother	30	GL
Shlok	Son	5	Student

To actually confirm the file is indeed separated by tabs, use the "-t" option with the cat command:
$ cat -t file1
Rakesh^IFather^I35^IManager
Niti^IMother^I30^IGL
Shlok^ISon^I5^IStudent
The ^I indicates a tab space.
7. To retrieve the first field from this tab-separated file, how do we specify the tab with the "-d" option?
$ cut -f 1 file1
Rakesh
Niti
Shlok
Surprised!! The default delimiter of the cut command is the tab, and hence when we have a file which is tab-separated, we need not specify the "-d" option at all. The "-f" option can be used directly to retrieve the fields.
Happy Cutting!!!
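One related option worth knowing, available in GNU cut only: the delimiter can be changed on output with --output-delimiter. Using the comma-separated file1 from earlier:
$ cut -d, -f 1,3 --output-delimiter=: file1
Rakesh:35
Niti:30
Shlok:5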

In this article, we will see the different ways to add a header record or a trailer record to a file.
Let us consider a file, file1.
$ cat file1
apple
orange
grapes
banana

1. To add a header record using sed:


$ sed '1i FRUITS' file1
FRUITS
apple
orange
grapes
banana
The '1i' in sed inserts(i) the line FRUITS before the first line(1) of the file. The above command displays the file contents along with the header without updating the file. To update the original file itself, use the -i option of sed.
2. To add a header record to a file using awk:
$ awk 'BEGIN{print "FRUITS"}1' file1
FRUITS
apple
orange
grapes
banana
The BEGIN block in awk makes the statement FRUITS get printed before processing the file, and hence the header appears in the output. The 1 is to indicate to print every line of the file.
3. To add a trailer record to a file using sed:
$ sed '$a END OF FRUITS' file1
apple
orange
grapes
banana
END OF FRUITS
The $a makes sed append(a) the statement END OF FRUITS after the last line($) of the file.
4. To add a trailer record to a file using awk:
$ awk '1;END{print "END OF FRUITS"}' file1
apple
orange
grapes
banana
END OF FRUITS
The END block makes the print statement run only after the file has been processed. The 1 is to print every line; 1 actually means true.
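The two awk idioms combine naturally, adding both the header and the trailer in a single pass:
$ awk 'BEGIN{print "FRUITS"}1;END{print "END OF FRUITS"}' file1
FRUITS
apple
orange
grapes
banana
END OF FRUITS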

We discussed the positional parameter $*. There is one more positional parameter, $@, whose definition is also the same as $*. Let us see in this article the exact difference between the parameters $* and $@.
First, let us write a simple shell script to understand $@:
$ cat cmd
#!/usr/bin/bash
echo "The total no of args are: $#"
echo "The \$* is: $*"
echo "The \$@ is: $@"
On running the above program with some command line arguments:
$ ./cmd 1 2 3
The total no of args are: 3
The $* is: 1 2 3
The $@ is: 1 2 3

As shown in the above output, both $* and $@ behave the same. Both contain the command line arguments given.
Let us now write a script which exactly shows the difference:
$ cat cmd
#!/usr/bin/bash
echo "Printing \$* "
for i in $*
do
echo i is: $i
done
echo "Printing \$@ "
for i in "$@"
do
echo i is: $i
done
Now, on running the above script:
$ ./cmd a b "c d" e
Printing $*
i is: a
i is: b
i is: c
i is: d
i is: e
Printing $@
i is: a
i is: b
i is: c d
i is: e
In the above example, we write a for loop and display the arguments one by one using $* and $@. Notice the difference: when we pass a command line argument in double quotes("c d"), $* does not consider it a single entity, and splits it. However, $@ considers it a single entity, and hence the 3rd echo statement shows "c d" together. This is the difference between $* and $@.
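Strictly speaking, the difference shows up only when the parameter is double-quoted, as "$@" is in the loop above. When double-quoted, "$*" joins all the arguments into a single word using the first character of IFS, whereas "$@" keeps every argument separate. A small sketch (the script name cmd2 is just for illustration):
$ cat cmd2
#!/bin/bash
IFS=-
echo "Joined: $*"
$ ./cmd2 a b c
Joined: a-b-c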

How to remove / delete the leading and trailing spaces in a file? How to replace a group of spaces with a single space?
Let us consider a file with the below content:
$ cat file
Linux 25
Fedora 40
Suse 36
CentOS 50
LinuxMint 15
Using the -e option of cat, the trailing spaces can be noticed easily (the $ symbol indicates the end of line):
$ cat -e file
Linux 25$
Fedora 40$
Suse 36   $
CentOS 50$
LinuxMint 15$
Let us see the different ways to remove these spaces:
1. awk command:
$ awk '$1=$1' file
Linux 25
Fedora 40
Suse 36
CentOS 50
LinuxMint 15
awk has a property wherein just by editing a field, all the extra whitespace gets removed automatically. Nothing changes by assigning $1 to $1 ('$1=$1'); at the same time, a dummy edit has happened, which makes awk rebuild the record and remove the extra whitespace.
2. sed command:
$ sed 's/^ *//;s/ *$//;s/  */ /g' file
Linux 25
Fedora 40
Suse 36
CentOS 50
LinuxMint 15
Using multiple substitutions(3) in sed, the spaces are removed. The 1st command removes the leading spaces, the second removes the trailing spaces and the last replaces a group of spaces with a single space. The source file itself can be updated by using the -i option of sed.

3. Perl solution:
$ perl -plne 's/^\s*//;s/\s*$//;s/\s+/ /g;' file
Linux 25
Fedora 40
Suse 36
CentOS 50
LinuxMint 15
This is almost the same as the sed solution. Like sed, the source file itself can be updated by adding the -i option to the above command.
4. Bash solution:
$ while read f1 f2
> do
> echo $f1 $f2
> done < file

Linux 25
Fedora 40
Suse 36
CentOS 50
LinuxMint 15
Using the while loop, the 2 columns are read into the variables f1 and f2. By just echoing the variables back, the spaces get removed automatically.
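One more trick worth noting: since xargs splits each input line on whitespace and echo rejoins the arguments with single spaces, it trims and squeezes in one go (beware, though, that xargs interprets quote characters in its input):
$ xargs -L1 < file
Linux 25
Fedora 40
Suse 36
CentOS 50
LinuxMint 15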

How to remove / delete duplicate records / lines from a file?


Let us consider a file with the following content. The duplicate record is 'Linux', with 2 entries:
$ cat file
Unix
Linux
Solaris
AIX
Linux
1. Using sort and uniq:
$ sort file | uniq
AIX
Linux
Solaris
Unix
uniq command retains only the unique records from a file. In other words, uniq removes duplicates. However, the uniq command needs a sorted file as input.
2. Only the sort command without uniq command:
$ sort -u file
AIX
Linux
Solaris
Unix
sort with the -u option removes all the duplicate records, and hence uniq is not needed at all.
Without changing the order of contents:
The above 2 methods change the order of the file. The unique records may not be in the order in which they appear in the file. The below 2 methods will print the file without duplicates, in the same order in which the records were present in the file.
3. Using awk:
$ awk '!a[$0]++' file
Unix
Linux
Solaris
AIX
This is very tricky. awk uses an associative array to remove duplicates here. When a pattern appears for the 1st time, the count for the pattern is incremented. Since it is a post-fix increment, the value tested is still 0, and the negation of 0, which is 'True', makes the pattern get printed. When the same pattern appears again, the count is now 1, hence the negation is 'False', and the pattern does not get printed.
4. Perl solution:
$ perl -lne '$x=$_;if(!grep(/^$x$/,@arr)){print; push @arr,$_ ;}' file
Unix
Linux
Solaris
AIX
Every time before printing a pattern, the pattern is checked against the array "arr". If not present, the pattern is printed and also pushed into the array "arr", so that the pattern does not get printed the next time.
5. A shell script to remove duplicates:
$ cat dupl.sh
#!/bin/bash
TEMP="temp"`date '+%d%m%Y%H%M%S'`
touch $TEMP
while read line
do
grep -q "$line" $TEMP || echo $line >> $TEMP
done < $1
cat $TEMP
\rm $TEMP
The input file contents are read using the while loop. Within the loop, every pattern is written to a temporary file if the pattern is not already present in it. Hence, the temporary file contains a copy of the original file without duplicates.
On running the above script:
$ ./dupl.sh file
Unix
Linux
Solaris
AIX
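One caveat in the script above: grep -q "$line" does a regular expression, substring match, so a line 'Unix' would also suppress a later line such as 'Unix OS'. For literal, whole-line matching, the -F and -x flags of grep help; a safer variant of the loop body would be:
grep -qxF -- "$line" $TEMP || printf '%s\n' "$line" >> $TEMP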

How to change the delimiter of a file from comma to colon?


Let us consider a file with the following contents:
$ cat file

Unix,10,A
Linux,30,B
Solaris,40,C
HPUX,20,D
Ubuntu,50,E
1. sed solution:
$ sed 's/,/:/g' file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
Using the sed substitution(s) command, all(g) commas are replaced with colons.
2. awk solution:
$ awk '$1=$1' FS="," OFS=":" file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
FS and OFS are awk special variables meaning Input Field Separator and Output Field Separator respectively. FS is set to comma, which is the input field separator; OFS is the output field separator, which is colon. $1=$1 actually does nothing. For awk to rewrite the record with the new delimiter, there should be some change in the data, and hence this dummy assignment.
3. awk using gsub function:
$ awk 'gsub(",",":")' file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
gsub function in awk is for global substitution. Global, in the sense, to substitute all occurrences. awk provides one more function for substitution: sub. The difference between sub and gsub is that sub replaces or substitutes only the first occurrence, whereas gsub substitutes all occurrences.

4. tr solution:
$ tr ',' ':' < file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
tr can be used for multiple things: to delete, squeeze or replace specific characters. In this case, it is used to replace the commas with colons.
5. Perl solution to change the delimiter:

$ perl -pe 's/,/:/g' file


Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
Same explanation as the sed solution above.
6. One more perl way:
$ perl -F, -ane 'print join ":",@F;' file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
In this, the elements of the line are autosplit(-a) and stored in the default array(@F). Using join, the array elements are joined using the colon and printed.
7. Shell script to change the delimiter of a file:
$ while read line
> do
> echo ${line//,/:}
> done < file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
Using the shell substitution syntax, all the commas are replaced with colons. '${line/,/:}' would replace only the 1st match; the extra slash in '${line//,/:}' replaces all the matches.
Note: This method will work in bash and ksh93 or higher, not in all shells.
8. Shell script using IFS to change the delimiter of file:
$ while IFS=, read f1 f2 f3
> do
> echo $f1:$f2:$f3
> done < file
Unix:10:A
Linux:30:B
Solaris:40:C
HPUX:20:D
Ubuntu:50:E
IFS(Internal Field Separator) is a shell environment variable which holds the delimiter. The default is whitespace. Using IFS in the while loop, the individual columns can be read into separate variables, and while printing, the colon is placed between them.

sort command is used to sort a file, arranging the records in a particular order. By default, the sort command sorts a file assuming the contents are ASCII. Using options of the sort command, it can also be used to sort numerically. Let us discuss it with some examples:
File with ASCII data:
Let us consider a file with the following contents:
$ cat file
Unix
Linux
Solaris
AIX
Linux
HPUX
1. sort simply sorts the file in alphabetical order:
$ sort file
AIX
HPUX
Linux
Linux
Solaris
Unix
All records are sorted alphabetically.
2. sort removes the duplicates using the -u option:
$ sort -u file
AIX
HPUX
Linux
Solaris
Unix
The duplicate 'Linux' record got removed. The '-u' option removes all the duplicate records in the file. Even if the file had had 10 'Linux' records, with the -u option, only the first record is retained.
File with numbers:
Let us consider a file with numbers:
$ cat file
20
19
5
49
200
3. The default sort 'might' give an incorrect result on a file containing numbers:
$ sort file
19
20
200
49
5

In the above result, 200 got placed immediately below 20, not at the end, which is incorrect. This is because sort did an ASCII sort. If the file had not contained '200', the default sort would have given the proper result. However, it is incorrect to sort a numerical file in this way since the sorting logic is wrong.
4. To sort a file numerically:
$ sort -n file
5
19
20
49
200
-n option can sort the decimal numbers as well.
5. sort file numerically in reverse order:
$ sort -nr file
200
49
20
19
5
'r' option does a reverse sort.
Multiple Files:
Let us consider examples with multiple files, say file1 and file2, containing numbers:
$ cat file1
20
19
5
49
200
$ cat file2
25
18
5
48
200
6. sort can sort multiple files as well.
$ sort -n file1 file2
5
5
18
19
20
25
48
49
200
200

The result of sort with multiple files will be a sorted and merged output of the multiple files.
7. Sort, merge and remove duplicates:
$ sort -nu file1 file2
5
18
19
20
25
48
49
200
-u option becomes more handy in the case of multiple files. With this, the output is now sorted, merged, and without duplicate records.
Files with multiple fields and delimiter:
Let us consider a file with multiple fields:
$ cat file
Linux,20
Unix,30
AIX,25
Linux,25
Solaris,10
HPUX,100
8. sorting a file containing multiple fields:
$ sort file
AIX,25
HPUX,100
Linux,20
Linux,25
Solaris,10
Unix,30
As shown above, the file got sorted on the 1st field, by default.
9. sort file on the basis of 1st field:
$ sort -t"," -k1,1 file
AIX,25
HPUX,100
Linux,20
Linux,25
Solaris,10
Unix,30
This is being more explicit. The '-t' option is used to provide the delimiter in the case of files with a delimiter. '-k' is used to specify the keys on the basis of which the sorting has to be done. The format of '-k' is '-km,n', where m is the starting key and n is the ending key. In other words, sort can be used to sort on a range of fields. In our case, since the sorting is on the 1st field alone, we specify '1,1'. Similarly, if the sorting is to be done on the basis of the first 3 fields, it will be '-k 1,3'.

Note: For a file which has fields delimited by a space or a tab, there is no need to specify the "-t" option, since whitespace is the default delimiter in sort.
10. sorting file on the basis of the 2nd field:
$ sort -t"," -k2,2 file
Solaris,10
HPUX,100
Linux,20
AIX,25
Linux,25
Unix,30
11. sorting file on the basis of 2nd field , numerically:
$ sort -t"," -k2n,2 file
Solaris,10
Linux,20
AIX,25
Linux,25
Unix,30
HPUX,100
12. Remove duplicates from the file based on 1st field:
$ sort -t"," -k1,1 -u file
AIX,25
HPUX,100
Linux,20
Solaris,10
Unix,30
The duplicate Linux record got removed. Keep in mind, the command "sort -u file" would not have worked here, because the two 'Linux' records are not the same; the values differ. However, in the above, sort is told to remove the duplicates based on the 1st key, and hence the duplicate 'Linux' record got removed. According to sort, in a group of records with the same key, all except the first one are considered duplicates.
13. Sort the file numerically on the 2nd field in reverse order:
$ sort -t"," -k2nr,2 file
HPUX,100
Unix,30
AIX,25
Linux,25
Linux,20
Solaris,10
14. sort the file alphabetically on the 1st field, numerically on the 2nd field:
$ sort -t"," -k1,1 -k2n,2 file
AIX,25
HPUX,100
Linux,20
Linux,25
Solaris,10
Unix,30
15. To sort a file based on the 1st and 2nd fields, and numerically on the 3rd field, on a file containing 5 columns:
$ sort -t"," -k1,2 -k3n,3 file

In this article, we will see the different ways in which we can print or display the last line, or the trailer record, of a file in Linux.
Let us consider a file with the following contents:
$ cat file
Unix
Solaris
Linux
1. tail is the most common command used. The tail command prints the last part of a file. -1 specifies to print one line from the last part.
$ tail -1 file
Linux
2. The END block in awk makes it even easier. The END block is reached once the entire file is parsed. Hence, on reaching END, the special variable $0 will be holding the last line of the file.
$ awk 'END{print}' file
Linux
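Note: not every awk implementation guarantees that $0 still holds the last line inside END; where it does not, a portable variant saves the line explicitly:
$ awk '{x=$0} END{print x}' file
Linux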
3. In sed, $ indicates the last line, and $p tells sed to print(p) the last line($) only.
$ sed -n '$p' file
Linux
4. Another option in sed is to delete(d) all the lines other than(!) the last line($), which in turn prints only the last line.
$ sed '$!d' file
Linux
5. In perl, every line being processed is saved in a variable. The explanation is the same as for awk. The only difference here is that the scope of the variable $_ is only in the main body, not in the END block. Hence, every line is stored in a variable, and the same variable is used in END.
$ perl -ne '$x=$_;END{print $x;}' file
Linux
6. tac command prints a file in reverse. By printing the first line of the tac output using head, we get the last line of the file. Not the best of options, but an option nevertheless.
$ tac file | head -1
Linux
7. A solution using a proper shell script. The file is processed using the while loop, where each line is read in one iteration. The line is assigned to a variable x, and outside the loop, x is printed, which will contain the last line of the file.
#!/bin/bash
while read line
do
x=$line
done < file
echo $x
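If the last line might contain leading whitespace or backslashes that must be preserved, a slightly more careful variant of the same loop:
#!/bin/bash
while IFS= read -r line
do
x=$line
done < file
printf '%s\n' "$x"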

A grep, awk or sed command is used to print the line matching a particular pattern. However, at times, we need to print a few more lines following the lines matching the pattern. In this article, we will see the different ways in which we can get this done. The first part explains ways to print one line following the pattern along with the pattern; then a few examples print only the line following the pattern; and at the end, we have ways to print multiple lines following the pattern.
Let us consider a file with the following contents as shown below:
$ cat file
Unix
Linux
Solaris
AIX
SCO
1. The simplest is using the grep command. In GNU grep, there is an option -A which prints lines following the pattern.
$ grep -A1 Linux file
Linux
Solaris
In the above example, -A1 will print one line following the pattern along with the line matching the pattern. To print 2 lines after the pattern, it is -A2.
2. sed has the N command which will read the next line into the pattern space.
$ sed -n '/Linux/{N;p}' file
Linux
Solaris
First, the line containing the pattern /Linux/ is found. The commands within the braces run on the line matching the pattern. {N;p} means read the next line into the pattern space and print the pattern space, which now contains the current and the next lines. Similarly, to print 2 lines, you can simply put {N;N;p}. Example 7 onwards explains printing multiple lines following the pattern.
3. awk has the getline command which reads the next line from the file.
$ awk '/Linux/{print;getline;print;}' file
Linux
Solaris
Once the line containing the pattern Linux is found, it is printed. The getline command reads the next line into $0. Hence, the second print statement prints the next line of the file.
4. In this, the same thing is achieved using only one print statement.
$ awk '/Linux/{getline x;print $0 RS x;}' file
Linux
Solaris
getline x reads the next line into the variable x. x is used in order to prevent getline from overwriting the current line present in $0. The print statement prints the current line($0), the record separator(RS) which is the newline, and the next line which is in x.
5. To print only the line following the pattern, without the line matching the pattern:
$ sed -n '/Linux/{n;p}' file
Solaris
The n command reads the next line into the pattern space, thereby overwriting the current line. On printing the pattern space using the p command, we get the next line printed.
6. Same using awk:
$ awk '/Linux/{getline;print;}' file
Solaris
Multiple lines after the pattern:
GNU grep may not be available in every Unix box. Excluding the grep option, the above examples are good only for printing a line or two following the pattern. Say, if you have to print some 10 lines after the pattern, the command will get clumsy. Let us now see how to print n lines following the pattern, along with the pattern:
7. To print multiple(2) lines following the pattern using awk:
$ awk '/Linux/{x=NR+2}(NR<=x){print}' file
Linux
Solaris
AIX
To print 5 lines after the pattern, simply replace the number 2 with 5. This example is a little tricky: once the pattern Linux is found, x is calculated, which is the current line number(NR) plus 2. So, we print lines from the current line till the line number(NR) reaches x.
8. To print 2 lines following the pattern without the line matching the pattern:
$ awk '/Linux/{x=NR+2;next}(NR<=x){print}' file
Solaris
AIX
The next command makes the current line, which is the one matching the pattern, get skipped. In this way, we can exclude the line matching the pattern from getting printed.
9. To print 2 lines following the pattern, along with the pattern matching line, in another way:
$ x=`grep -n Linux file | cut -f1 -d:`
$ awk -v ln=$x 'NR>=ln && NR<=ln+2' file
Using the grep and cut commands, the line number of the pattern in the file is found. By passing the shell variable to awk, we make it print only those lines whose line number is between x and x+2.
10. One more way, using a sed and awk combination. First, we calculate the from and to line numbers in the variables x and y. Using sed's printing of a range of lines, we get the same result. sed can not only deal with numbers, it can work with variables as well:
$ x=`awk '/Linux/{print NR}' file`
$ y=`awk '/Linux/{print NR+2}' file`
$ sed -n "$x,$y p" file
OR
$ x=`awk '/Linux/{print NR+2}' file`
$ sed -n "/Linux/,$x p" file
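One more compact awk idiom for the same requirement, printing the matching line and the n lines after it (here n=2) without pre-computing line numbers: on a match, a counter is set, and lines are printed while the counter counts down.
$ awk '/Linux/{c=3} c&&c--' file
Linux
Solaris
AIX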

In this article, we will see how to find files using the file name, a part of the file name, or the file extension.
1. To find all the files ending with .c :
$ find . -name "*.c"
The '.' indicates the current directory. The default behavior of find makes this command search in the current directory and also in all the sub-directories under the current directory for the files ending with .c.
2. To search for all the .c files only in the current directory:
$ find . -maxdepth 1 -name "*.c"
The maxdepth switch is used to control the number of levels to descend. A maxdepth value of 1 indicates to descend only one level, which in other words is only the current directory alone.
3. To find all the .c and .h files:
$ find . -name "*.[ch]"

[ch] will match either c or h. And hence .c and .h both will be matched.
4. To find all the .cpp and .h files:
$ find . \( -name "*.cpp" -o -name "*.h" \)
In the earlier example, we could use [ch] since both were one character. The same cannot be done in this example. Hence, we need to give two separate conditions on the 'name' switch using the OR(-o) operator. The backslash(\) is given to escape the brackets.
5. To find all the files whose names begin with 'f' :
$ find . -name "f*"
6. To find all the files whose names begin with 'f' and end with '.c' :
$ find . -name "f*.c"
7. To find all the files except the .c files:
$ find . ! -name "*.c"
The '!' is used to negate the condition.
8. To find all the files other than the .c and .h files:
$ find . ! \( -name "*.c" -o -name "*.h" \)
Since the condition inside the brackets tells find to match .c or .h files, the negation makes it search for anything other than .c and .h files.
9. To find all the files:
$ find . -type f
The type switch can be used to find files of a specific type. Without the type switch in the above example, the find command would have given files and directories as well. '-type f' indicates to find the files alone. To find the directories alone, we can use '-type d'.
10. To find all the .c files which contain the word 'stdio':
$ find . -name "*.c" -exec grep -l stdio '{}' \;
Using the exec switch, we can execute further commands on the output of the find command. Here, we grep for the word stdio on all the files resulting from the find command and display the matching file names alone. '{}' is a placeholder for the output of the find command.
The same can also be achieved as below:
$ find . -name "*.c" | xargs grep -l stdio
11. To find all the .c files which do not contain the word 'stdio':
$ for i in `find . -name "*.c"`
do
grep -q stdio $i || echo $i
done

This cannot be achieved with the find command alone like the earlier examples. We loop over the files found by the find command, and print those file names which do not contain the word stdio. The -q option of grep suppresses the default grep output from getting printed, since only the file names should get printed.
12. The find command output always displays the file names with the relative file path. To get the file names alone, without the relative path:
$ find . -name "*.c" | sed 's^.*/^^'
The sed command removes everything till the last slash, and hence only the filename remains.
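If the find on your box is GNU find, the same can be done without sed, using the -printf action ('%f' prints just the basename of each match):
$ find . -name "*.c" -printf '%f\n'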

cp command is used to copy files in Unix. There are some options in cp which add a lot of value to this command. In this article, we will see some of the rarely used options of the cp command which make cp really powerful.
1. Copy a file
cp command takes a source file and a destination file as arguments. Here, we are copying file1 to file2.
$ cp file1 file2
$ ls -l file*
-rw-r--r-- 1 guru users 163 Apr 11 15:36 file1
-rw-r--r-- 1 guru users 163 Apr 11 15:40 file2
The same thing can also be achieved using the below command. It might look tricky if you are seeing it for the first time.
$ cp file{1,2}
When this command runs, the shell expands "file{1,2}" to "file1 file2", and hence it becomes the right format as needed by the cp command. To understand it better, simply put an "echo" ahead of the cp command:
$ echo cp file{1,2}
cp file1 file2
2. Copy a file named "file" to "file1":
Copying file1 to file2 is fine. Using the same shortcut as above, how do we copy "file" to "file1"?
$ cp file{,1}
$ ls -l file*
-rw-rw-r-- 1 guru users 163 Apr 10 13:50 file
-rw-r--r-- 1 guru users 163 Apr 11 15:36 file1
If this command is not clear, as done earlier, put an echo before the cp and check it.
On carefully noticing the above file listing, 2 things will be clear:

cp command creates the new file with the current timestamp.
The file permissions are the default file permissions, not the permissions of the original file.
3. How to copy a file preserving the timestamp of the original file? In other words, how to copy a file with the same timestamp as that of the original file?
$ cp --preserve=timestamp file file1
$ ls -l file*
-rw-rw-r-- 1 guru users 163 Apr 10 13:50 file
-rw-r--r-- 1 guru users 163 Apr 10 13:50 file1
GNU cp provides the --preserve option, using which the timestamp of the original file can be retained. 3 properties of a file are of importance while copying a file:
Time stamp
File Modes
Ownership
4. Copy a file retaining the modes of the source file:
Like timestamp, cp can also retain the mode of the files and ownership as well.
The same --preserve switch does it.
$ cp --preserve=mode file file1
$ ls -l file*
-rw-rw-r-- 1 guru users 163 Apr 10 13:50 file
-rw-rw-r-- 1 guru users 163 Apr 11 15:35 file1
The mode of the "file" has been retained in "file1" as well.
5. Copy a file preserving timestamp, mode and ownership:
To copy a file with all the 3 properties of the original file, use the "-p" option. "-p" is a refined form of "--preserve" wherein all the 3 properties(timestamp, mode and ownership) are copied as is from the original file.
$ cp -p file file1
$ ls -l file*
-rw-rw-r-- 1 guru users 163 Apr 10 13:50 file
-rw-rw-r-- 1 guru users 163 Apr 10 13:50 file1
6. Create a link using the cp command:
ln is the command used to create soft links and hard links. The cp command can also create soft links and hard links.
$ cp -l file file1
$ ls -li file*
9962525 -rw-rw-r-- 2 guru users 163 Apr 10 13:50 file
9962525 -rw-rw-r-- 2 guru users 163 Apr 10 13:50 file1
The "-l" option of the cp command does not copy the file; instead, it creates a hard link of the source file. The same inode numbers of file and file1 prove it.
7. Create a soft link using the cp command:
cp can create soft links using the "-s" option.
$ cp -s file file1
$ ls -l file*

-rw-rw-r-- 1 guru users 163 Apr 10 13:50 file
lrwxrwxrwx 1 guru users 4 Apr 11 14:59 file1 -> file
8. Copy only if the source file is updated:
Sometimes, we copy a file where the need is to copy only if the source file has been updated (by timestamp). Keep in mind, cp is an external command; some extra code would be needed to check whether the source file was updated of late. cp provides the "-u" option for this purpose. With -u, the file will be copied only if the source file is newer than the destination file.
$ date
Wed Apr 11 16:33:42 IST 2012
$ cp -u file1 file2
$ ls -l file*
-rw-r--r-- 1 guru users 163 Apr 11 15:36 file1
-rw-r--r-- 1 guru users 163 Apr 11 15:40 file2
As shown above, cp did not copy the file since file2 is newer than file1.
9. Let us just touch the file file1 and try the cp.
$ touch file1
$ cp -u file1 file2
$ ls -l file*
-rw-r--r-- 1 guru users 163 Apr 11 16:31 file1
-rw-r--r-- 1 guru users 163 Apr 11 16:31 file2
As seen above, file2 is copied, since file1 has a timestamp later than file2.
10. Copy a file with a pre-defined timestamp:
Say we want the copied file to have a specific timestamp of our choice. cp does not provide an option for this. Instead, we can copy the file normally, and then use the touch command to set the timestamp of our choice.
$ cp file1 file2
$ touch -t 1204091524 file2
$ ls -l file2
-rw-r--r-- 1 guru users 163 Apr 9 15:24 file2
As above, the touch command can be used to change the timestamp of the file to any date using the "-t" option. The timestamp of the file file2 has now been changed to "1204091524", which stands for 2012(12), April(04), 9th(09), 15 hours(15) and 24 minutes(24).

date command in UNIX is used more often to create file names than to check the system date. For any of our processes, if we are in need of a unique file name, then one of the best options is to have the date and time as part of the file name.
Let us see the usage of date command:
1. date command by default gives the system date and time.

$ date
Tue Mar 27 15:42:24 IST 2012
2. The date command has many format specifiers with which we can extract the components of the date separately.
$ date '+%Y%m%d'
20120327
For example, the above command gives the 'year'(%Y), 'month'(%m) and 'date'(%d). Say, if we want the year component alone from the date command:
$ date '+%Y'
3. Separator characters can also be added within the format, as shown with the "-" below.
$ date '+%Y-%m-%d'
2012-03-27
4. This one gives the year, month, date, hours, minutes and seconds.
$ date '+%Y%m%d%H%M%S'
20120327154527
5. A typical example of how we use the date command in shell scripts to prepare a file name:
$ DT=`date '+%Y%m%d%H%M%S'`
$ FILE=VD_DAT_${DT}
$ echo $FILE
VD_DAT_20120327154618
First, the date command output is taken in a variable DT.
$ FILE=VD_DAT_${DT}.txt
$ echo $FILE
VD_DAT_20120327154618.txt
GNU Date:
If the date command on your system is GNU date, the options you get make your life a breeze. Many difficult date computations are easily made with the "-d" option. Systems with GNU date will always be at an advantage.
1. To get the date minus 10 mins. The "-d" switch allows us to go back by minutes, hours, days, months and years too.
$ date -d "10 mins ago"
Tue Mar 27 15:39:09 IST 2012
2. To get the date time 2 hours back:
$ date -d "2 hours ago"
Tue Mar 27 13:52:14 IST 2012
3. Along the same lines, to get yesterday's date:
$ date -d "1 day ago"
Mon Mar 26 15:52:30 IST 2012

4. A combination of all the above, meaning we can use multiple components such as days, months, hours, etc.:
$ date -d "2 years 1 month 2 hours 30 mins ago"
Sat Feb 27 13:12:24 IST 2010
5. Similarly, we can have dates in the future as well. All that needs to be done is to remove the word "ago". Say, to get a datetime 40 mins in advance:
$ date -d "40 mins"
Mon Apr 2 15:53:25 IST 2012
All this command does is add 40 mins to the current date.
6. Similarly, to get current date + 1day OR to get tomorrow's date:
$ date -d "1 day"
Tue Apr 3 15:21:15 IST 2012
Script to do arithmetic on Dates:
Even with GNU date, the kind of date calculations you can do is pretty limited. Calculations like finding the difference between 2 dates are not possible using the date command. One of the most popular scripts on the Unix.com forums is one which does arithmetic on dates, or in other words, date calculations. Some of the important things which the date calculation script does are:
Difference between 2 dates
Adding a number to a date
Converting a date to day of the week
Display the name of the day for a given date
Convert a date to a julian day number.
The script for this date calculation is: datecalc. The beauty of this script is that it is written purely using internal commands, without using a single external command.

Looking at file contents is one such activity which a UNIX developer might be doing even in sleep. In this article, we will see the umpteen different ways in which we can display or print the contents of a file.
Let us consider a file named "a", with the following contents:
a
b
c
1. The first is the most common cat command:
$ cat a
a
b
c
Another variant of "cat" will be:
$ cat < a
2. paste command: we might have used it earlier to paste contents from multiple files. But when paste is used without any options on a single file, it does print out the contents.
$ paste a
a
b
c
3. grep command, as we know, will search for a text or regular expression in a file. What if our regular expression matches anything(.*)? Of course, the whole file comes out.
$ grep '.*' a
a
b
c
4. The good old cut command. "1-" indicates from the 1st column till the end; in other words, cut and display the entire file.
$ cut -c 1- a
a
b
c
5. Using the famous while loop of the shell:
$ while read line
>do
> echo $line
>done < a
a
b
c
6. xargs command. xargs takes input from standard input. By default, it suppresses the newline characters. The -L1 option is to retain the newline after every(1) line.
$ xargs -L1 < a
a
b
c
7. sed without any command inside the single quote just prints the file.
$ sed '' a
a
b
c
8. sed's -n option is to suppress the default printing, and 'p' is to print. So, we suppress the default print, and print every line explicitly.
$ sed -n 'p' a
a
b
c
9. Even more explicit: 1,$p tells to print the lines starting from 1 till the end($) of the file.
$ sed -n '1,$p' a
a
b
c
10. awk solution. 1 means true, which is to print by default. Hence, every line encountered by awk is simply printed.
$ awk '1' a
a
b
c
11. The print command of awk prints the line read, which is by default $0.
$ awk '{print;}' a
a
b
c
12. awk saves the line read from the file in the special variable $0. Hence, print $0 prints the line. In fact, in the earlier case, simply putting 'print' internally means 'print $0'.
$ awk '{print $0;}' a
a
b
c
13. perl solution. The -e option is to supply the program on the command line. The -n option is to loop over the contents of the file, and -p tells perl to print every line read. Hence, -pne with an empty program simply prints every line read.
$ perl -pne '' a
a
b
c
14. perl explicitly tries to print using the print command.
$ perl -ne 'print;' a
a
b
c
15. perl saves the line read in the $_ special variable. Hence, on printing $_, every line gets printed, and hence the entire file. In fact, when we simply say 'print', it implies 'print $_'.
$ perl -ne 'print $_;' a
a
b
c

Happy file printing!!!

How to find the duplicate records / lines from a file in Linux?


$ sort file | uniq -d
You will do well to know how to print a few lines BEFORE a pattern match.
$ grep -B2 Linux file
$ sed -n '1,/Linux/p' file | tail -3
$ tac file | grep -A2 Linux | tac
Different ways to print the next few lines after pattern match
$ grep -A1 Linux file
$ sed -n '/Linux/{N;p}' file
$ sed -n '/Linux/{n;p}' file
Different ways to split the file contents
$ sed 's/,/\n/g' a
$ tr ',' '\n' < a
$ awk '$1=$1' RS=, a

Different ways to do numbering of a file contents
$ nl file1
$ cat -n file1
$ awk '{print NR, $0}' file1
$ sed '=' file1 | sed 'N;s/\n/ /'

Different ways to add header and trailer line to a file


$ sed '1i FRUITS' file1
$ awk 'BEGIN{print "FRUITS"}1' file1
$ sed '$a END OF FRUITS' file1
$ awk '1;END{print "END OF FRUITS"}' file1
Different ways to delete ^M character in a file
$ cat -v file1
$ tr -d '^M' <file1
$ sed 's/^M//g' file1
$ awk 'sub(/^M/,"");1' file1
Different ways to print non-empty lines in a file
grep -v '^$' file
sed '/^$/D' file
sed -n '/^$/!p' file
awk NF file
awk 'length' file
