Filters Final
Mirunalini.P
SSNCE
Introduction to Filters
Head and Tail Commands
Cut and Paste Commands
Sorting
References
A filter is any command that gets its input from the standard input stream, manipulates the input, and then sends the result to the standard output stream.
Some filters can also receive data directly from a file.
more - passes all data from input to output, pausing at the end of each screen of data.
There are 12 more simple filters.
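The filter idea is easiest to see in a pipeline. The sketch below assumes a POSIX shell with the standard sort, uniq, and wc; the fruit names and the file fruits.txt are made-up sample data created in a temporary directory.

```shell
#!/bin/sh
# Scratch directory, removed when the script exits.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C   # plain byte-order sorting

# Each command reads standard input and writes standard output,
# so the output of one filter becomes the input of the next.
printf 'pear\napple\npear\nfig\n' | sort | uniq

# Some filters can also take a file name instead of standard input.
printf 'pear\napple\npear\nfig\n' > fruits.txt
wc -l < fruits.txt   # wc reads the file as its standard input
wc -l fruits.txt     # wc opens the file itself and also prints the name
```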
Filter   Action
cat      Passes all the data from input to output
cmp      Compares two files
comm     Identifies common lines in two files
cut      Passes only specified columns
diff     Identifies differences between two files or between common files in two directories
head     Passes the specified number of lines at the beginning of the data
paste    Combines columns
sort     Arranges data in sequence
tail     Passes the specified number of lines at the end of the data
Table 1: Common filters
Filter   Action
tr       Translates one or more characters as specified
uniq     Deletes adjacent duplicate lines
wc       Counts characters, words, or lines
grep     Passes only specified lines
sed      Passes edited lines
awk      Passes edited lines; parses lines
Table 2: Common filters
Given one or more input files, the cat command writes them one after another to standard output.
The result is that all the input files are combined and become one output.
$ cat file1 file2 file3
This is file1.
This is file2.
This is file3.
The contents are concatenated and displayed.
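A minimal, non-interactive version of the example above, assuming a POSIX shell. The three files are created with printf in a temporary directory, so their one-line contents simply mirror the output shown; the file name all is hypothetical.

```shell
#!/bin/sh
# Scratch directory, removed when the script exits.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Three one-line sample files, named as in the example above.
printf 'This is file1.\n' > file1
printf 'This is file2.\n' > file2
printf 'This is file3.\n' > file3

# cat writes the files one after another to standard output.
cat file1 file2 file3

# Redirecting standard output saves the combined text as a new file.
cat file1 file2 file3 > all
```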
$ cat computers
A broad range of industrial and consumer products use
computers as control systems. Simple special-purpose
devices like microwave ovens and remote controls are
included, as are factory devices like industrial robots
and computer-aided design, as well as general-purpose
devices like personal computers and mobile devices like
smartphones.
$ cat > goodStudents
Now is the time
For all good students
To come to the aid
of their college.
# to append to the file
$ cat >> goodStudents
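Interactive typing is hard to reproduce in a script, so this sketch (assuming a POSIX shell) feeds cat from here-documents instead. At a terminal you would type the lines and press Ctrl+D to end the input.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# ">" creates goodStudents, or empties it first if it already exists.
cat > goodStudents <<'EOF'
Now is the time
For all good students
EOF

# ">>" adds the new lines after the existing ones instead of overwriting.
cat >> goodStudents <<'EOF'
To come to the aid
of their college.
EOF

cat goodStudents
```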
Mirunalini.P (SSNCE) Unix October 18, 2023 12 / 60
cat Options
There are six options available with cat, in four categories:
Visual characters
Buffered output
Missing files
Numbered lines
Visual Characters: When output is displayed, unprintable characters such as ASCII control characters cannot be seen because they have no visual graphic.
The visual option -v allows us to see control characters, with the exception of the tab, newline, and form feed characters.
With -ve, a dollar sign ($) is also printed at the end of each line.
With -vt, tabs appear as ^I.
With -vet, both apply, and as with -v, nonprintable characters are shown prefixed with a caret (^).
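To try the visual options, pipe in a line that contains a tab and a carriage return (Ctrl+M). This sketch assumes GNU or BSD cat, whose -e and -t flags behave as described above; the sample line is made up.

```shell
#!/bin/sh
# Sample line: "a", TAB, "b", carriage return (Ctrl+M), newline.
printf 'a\tb\r\n' | cat        # tab and carriage return are not visible
printf 'a\tb\r\n' | cat -v     # carriage return shown as ^M; tab left as is
printf 'a\tb\r\n' | cat -vt    # tab also shown, as ^I
printf 'a\tb\r\n' | cat -vet   # adds $ at the end of each line
```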
cat Options - Example
Buffered Output:
When output is buffered, it is kept in the computer's memory until the system has time to write it to a file.
Normally, cat output is buffered.
We can force the output to be written to the file immediately by specifying the unbuffered option (-u).
This slows down the system. (GNU cat accepts -u but ignores it.)
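Missing Files: If a file named on the command line is missing, cat writes an error message to standard error and continues with the remaining files; redirecting standard error to /dev/null hides the message.
cat f1.txt f3.txt 2>/dev/null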
1
2
3
4
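The missing-files case can be demonstrated with a file that does not exist. In this sketch the setup is assumed: f1.txt is absent and f3.txt holds the numbers 1 to 4, matching the output above.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical setup: f3.txt holds 1-4 and f1.txt does not exist.
printf '1\n2\n3\n4\n' > f3.txt

# cat complains about f1.txt on standard error, then still prints f3.txt.
cat f1.txt f3.txt

# 2>/dev/null throws the complaint away; the exit status still reports it.
cat f1.txt f3.txt 2>/dev/null
echo "exit status: $?"
```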
cat Options
Numbered lines:
The numbered lines option (-n) numbers each line in each file as
the line is written to standard output.
If more than one file is being written, line numbers will continue
sequentially from the first line of the first file to the last line of the
last file.
Example:
$ cat -n goodStudents catExample
1   Now is the time
2   For all good students
3   To come to the aid
4   of their college.
5   There is a tab between the numbers on the next line
6   1   2   3   4   5
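A runnable sketch of the numbering rule, assuming a POSIX shell; the two files hold shortened, made-up versions of goodStudents and catExample. Both GNU and BSD cat print each number right-aligned, followed by a tab.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'Now is the time\nFor all good students\n' > goodStudents
printf 'There is a tab between the numbers on the next line\n1\t2\t3\t4\t5\n' > catExample

# Lines 1-2 come from goodStudents; numbering continues with 3-4 for catExample.
cat -n goodStudents catExample
```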
$ head -2 goodStudents
Now is the time
For all good students
==> goodStudents <==
Now is the time
For all good students
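head -n 2 is the POSIX spelling of the traditional head -2. This sketch, with file contents assumed from the cat examples, shows the ==> name <== header that head prints before each file when it is given more than one.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'Now is the time\nFor all good students\nTo come to the aid\n' > goodStudents
printf 'There is a tab between the numbers\non the next line\n' > catExample

# One file: just its first two lines.
head -n 2 goodStudents

# Several files: a "==> name <==" header precedes each file's lines.
head -n 2 goodStudents catExample
```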
The tail command outputs data from the end of the file.
The basic syntax is: tail options inputfile
Although only one file can be referenced, it has several options, as shown in the table below:
head -5 f3.txt
1
2
3
4
5
head -5 f3.txt | tail -3
3
4
5
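A sketch of the head and tail combination, assuming f3.txt holds the numbers 1 to 10 one per line (only its first five lines appear above). tail -n +K, which starts output at line K, is a standard alternative for skipping the beginning of a file.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical f3.txt: the numbers 1 to 10, one per line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > f3.txt

head -n 5 f3.txt               # first five lines: 1-5
head -n 5 f3.txt | tail -n 3   # last three of those: 3-5
tail -n 3 f3.txt               # last three lines of the file: 8-10
tail -n +8 f3.txt              # from line 8 to the end: also 8-10
```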
Paste Command
If the files have different numbers of lines, paste writes the extra lines from the longer file (here the first), each followed by the separation delimiter (a tab) and a newline, until all data have been output.
paste odd.txt even.txt
1 one 2 4 6
2 two 8 10 12
3 three 14 16 18
4 four
5 five
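A sketch that reproduces the uneven paste above, assuming a POSIX shell. The contents of odd.txt and even.txt are inferred from the output, not taken from the original files.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Inferred contents: odd.txt has 5 lines, even.txt has 3.
printf '1 one\n2 two\n3 three\n4 four\n5 five\n' > odd.txt
printf '2 4 6\n8 10 12\n14 16 18\n' > even.txt

# Each output line is a line of odd.txt, a tab, and the matching line of even.txt.
# After even.txt runs out, lines 4 and 5 end with the tab and nothing after it.
paste odd.txt even.txt
```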
In a field sort, we need to define which field or fields are used for the sort.
Field specifiers are a set of two numbers that together identify the
first and last field in a sort key.
+number1 -number2
number1 specifies the number of fields to be skipped to get to the
beginning of the sort field
number2 specifies the number of fields to be skipped, relative to the beginning of the line, to get to the end of the sort key.
+0 -1 => 0 fields are skipped, and the key ends one field from the start of the line (the 1st field)
+2 -4 => sorted based on the 3rd and 4th fields
+3 => from the 4th field to the end of the line
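Fields should be separated by a single space or tab. Extra spaces are not ignored: with an explicit delimiter (-t), two consecutive delimiters enclose a null (empty) field, and with the default blank separation the extra blanks become part of the following field.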
sort +0 -1 r2.dat   (sort based on the 1st field)
Bangalore KA 795623 895671 568712
Chennai TN 893457 589765 896713
Faridabad HR 589657 895756 785123
Imphal MN 785965 586972 698713
Ujjain MP 528958 896257 268985
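The +pos1 -pos2 notation is the traditional one; POSIX sort uses -k instead, where +a -b corresponds to -k (a+1),b with fields counted from 1. So +0 -1 becomes -k 1,1, +2 -4 becomes -k 3,4, and +3 becomes -k 4. The sketch assumes r2.dat contains the five lines shown above in some unsorted order.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C   # byte-order comparison, independent of the locale

# Hypothetical r2.dat: the five lines above, in an arbitrary order.
printf '%s\n' \
  'Chennai TN 893457 589765 896713' \
  'Ujjain MP 528958 896257 268985' \
  'Bangalore KA 795623 895671 568712' \
  'Imphal MN 785965 586972 698713' \
  'Faridabad HR 589657 895756 785123' > r2.dat

sort -k 1,1 r2.dat   # same key as +0 -1: field 1 only
sort -k 2,2 r2.dat   # same key as +1 -2: field 2, the state code
```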
Delimiter option (-t): Specifies an alternate delimiter.
$ cat > amp
Chennai &TN
Bangalore &KA
Faridabad &HR
Imphal &MN
$ sort -t '&' amp
Bangalore &KA
Chennai &TN
Faridabad &HR
Imphal &MN
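The -t delimiter becomes more useful together with a key. This sketch rebuilds the amp file from the example and assumes a POSIX shell and sort; field 2 is the state code after each &.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C

printf '%s\n' 'Chennai &TN' 'Bangalore &KA' 'Faridabad &HR' 'Imphal &MN' > amp

# The & must be quoted; unquoted, the shell treats it as "run in background".
sort -t '&' amp           # no key given, so whole lines are compared
sort -t '&' -k 2,2 amp    # key is field 2, the text after the &
```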
Merge Files: Combines multiple ordered files into one file.
$ cat > male
Arun
Sai
$ cat > female
Brindha
Seetha
sort -m male female
Arun
Brindha
Sai
Seetha
If the input files are not sorted, sort -m still merges them as far as it can, but the merged output is not guaranteed to be in sorted order.
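A sketch of the merge example, assuming a POSIX shell and that both input files are already sorted, as -m requires for a correctly ordered result.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C

printf 'Arun\nSai\n' > male          # already sorted
printf 'Brindha\nSeetha\n' > female  # already sorted

# -m interleaves the sorted inputs; it does not sort them first.
sort -m male female
```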
Unique Sort Fields: Eliminates all but one line when the sort fields are identical.
sort +2 -3 -u state.txt
Bihar B EI 400
Uttarpradesh MP NI 900
chennai TN SI 856
Figure 1: Caption
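A sketch of a unique sort on field 3, using -k 3,3 as the POSIX form of +2 -3. The file state.txt is hypothetical: the three lines above plus two made-up lines that repeat the EI and NI codes. POSIX does not say which of the equal lines is kept, so only the keys should be relied on.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C

# Hypothetical state.txt: field 3 is a region code; EI and NI each appear twice.
printf '%s\n' \
  'Bihar B EI 400' \
  'Uttarpradesh MP NI 900' \
  'chennai TN SI 856' \
  'Odisha OD EI 300' \
  'Punjab PB NI 700' > state.txt

# Key is field 3 only (the POSIX form of +2 -3); -u keeps one line per key.
sort -k 3,3 -u state.txt
```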
Multi Pass Sorting
Defines two (or more) different fields for sorting.
List the fields one after another; a later field is used only when lines are equal on the earlier ones.
Phoenix AZ 899
Sanfrancisco CA 890
Sandiego CA 85796
Dallas TX 89576
Sanantonio TX 7859
Table 6: City Table
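Multi-pass sorting with -k, applied to the city table. The sketch assumes the table is a plain file with single spaces between fields; the first key is the state (as text) and the second is the number in field 3 (as a number), which only breaks ties between cities in the same state.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C

# The city table as a plain file: city, state, number.
printf '%s\n' 'Phoenix AZ 899' 'Sanfrancisco CA 890' 'Sandiego CA 85796' \
  'Dallas TX 89576' 'Sanantonio TX 7859' > cities

# Pass 1: field 2 as text. Pass 2: field 3 as a number, used only for ties.
sort -k 2,2 -k 3,3n cities
```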
Delete Characters:
To delete matching characters in the translation, we use the delete option (-d).
Note that the delete option does not use string2.
tr -d "aeiouAEIOU"
It is very easy to use TRANSLATE
t s vry sy t s TRNSLT
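Both the delete example and plain translation can be run directly, since tr reads only standard input. This sketch assumes a POSIX shell; the [:lower:] and [:upper:] character classes are the portable way to map lowercase to uppercase.

```shell
#!/bin/sh
export LC_ALL=C

# Delete every vowel; no second string is needed with -d.
echo 'It is very easy to use TRANSLATE' | tr -d 'aeiouAEIOU'

# Plain translation: each character of string1 maps to the one at the same position in string2.
echo 'hello' | tr '[:lower:]' '[:upper:]'
```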
Squeeze Output:
The squeeze option (-s) deletes consecutive occurrences of the same character in the output.
After translation, if the output contains a run of "d"s, all but one are deleted.
tr -s "ie" "dd"
The fiend did dastardly deeds
Thd fdnd d dastardly ds
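Running the squeeze example shows the two steps, translation and then squeezing. The second command, with made-up input, shows -s with a single string, where nothing is translated.

```shell
#!/bin/sh
# i and e are first translated to d; -s then squeezes each run of
# repeated d (the characters of the last string) down to one d.
echo 'The fiend did dastardly deeds' | tr -s 'ie' 'dd'

# With only one string, -s squeezes repeats of those characters without translating.
echo 'too    many   spaces' | tr -s ' '
```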
Input: sam_text.txt
5 completely duplicate lines
5 completely duplicate lines
Not a duplicate -- next duplicates first 5
5 completely duplicate lines
Last 3 fields duplicate: one two three
Last 3 fields duplicate: one two three
The next 3 lines are duplicate after char 5
uniq sam_text.txt
Output:
5 completely duplicate lines
Not a duplicate -- next duplicates first 5
5 completely duplicate lines
Last 3 fields duplicate: one two three
The next 3 lines are duplicate after char 5
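uniq only compares each line with the one before it, so duplicates must be adjacent to be removed. A minimal sketch with made-up input, assuming a POSIX shell; sorting first is the usual way to remove all duplicates.

```shell
#!/bin/sh
export LC_ALL=C

# Only the adjacent "a a" pair collapses; the last "a" survives.
printf 'a\na\nb\na\n' | uniq

# Sorting first brings all copies together, so each line appears once.
printf 'a\na\nb\na\n' | sort | uniq
```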
Files with Duplicate Lines
cat unique1.txt
5 completely duplicate lines
5 completely duplicate lines
Not a duplicate -- next duplicates first 5
5 completely duplicate lines
Last 3 fields duplicate: one two three
Last 3 fields duplicate: one two three
The next 3 lines are duplicate after char 5
abcde duplicate to end
fghij duplicate to end
uniq -u unique1.txt
Not a duplicate -- next duplicates first 5
5 completely duplicate lines
The next 3 lines are duplicate after char 5
abcde duplicate to end
fghij duplicate to end
uniq -c unique1.txt
2 5 completely duplicate lines
1 Not a duplicate -- next duplicates first 5
1 5 completely duplicate lines
2 Last 3 fields duplicate: one two three
1 The next 3 lines are duplicate after char 5
1 abcde duplicate to end
1 fghij duplicate to end
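A small sketch of -u and -c on made-up input (dup.txt is hypothetical), assuming a POSIX shell. The width of the count column printed by -c differs between implementations, so awk is used here to print just the count and the line.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Adjacent runs: x x | y | x | z z
printf '%s\n' x x y x z z > dup.txt

uniq -u dup.txt                              # lines not repeated next to themselves: y, x
uniq -c dup.txt | awk '{print $1 ": " $2}'   # run length, then the line
```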
wc TheRaven
116 994 5782 TheRaven
(lines, words, characters)
Options:
wc -c: number of characters (strictly, the number of bytes)
wc -l: number of lines
wc -w: number of words
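A sketch with a two-line sample file (hypothetical sample.txt: 2 lines, 3 words, 14 bytes), assuming a POSIX shell. When wc reads standard input instead of a named file, it prints only the numbers, and BSD wc pads them with spaces, so tr -d ' ' is handy in scripts.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'one two\nthree\n' > sample.txt   # 2 lines, 3 words, 14 bytes

wc sample.txt                     # lines, words, characters, then the file name
wc -l < sample.txt | tr -d ' '    # just the line count
wc -w < sample.txt | tr -d ' '    # just the word count
wc -c < sample.txt | tr -d ' '    # just the byte count
```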
There are three UNIX commands that can be used to compare the
contents of two files:
compare (cmp), difference (diff), and common (comm).
The cmp command examines two files byte by byte.
Its syntax is:
cmp options file1 file2
cmp without Options
When the cmp command is executed without any options, it stops at
the first byte that is different.
The byte number of the first difference is reported.
cat cmpfile2
123456
as9u
cmp cmpfile1 cmpfile2
cmpfile1 cmpfile2 differ: char 8, line 2
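A sketch that reproduces the char 8, line 2 result above. cmpfile2 matches the file shown; cmpfile1 is hypothetical, differing only in the first character of line 2. GNU cmp words the report as byte 8, line 2, while BSD cmp says char 8, line 2.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# cmpfile2 as shown above; cmpfile1 is hypothetical.
printf '123456\nbs9u\n' > cmpfile1
printf '123456\nas9u\n' > cmpfile2

# Bytes 1-7 are "123456" plus the newline; byte 8 (line 2) is the first difference.
cmp cmpfile1 cmpfile2
echo "exit status: $?"    # 0 = identical, 1 = different

# Identical files: no output and exit status 0.
cmp cmpfile2 cmpfile2 && echo 'same'
```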
$ cat > diff1
one same
two same
x and y
same
x
$ cat > diff2
one same
two same
y and x
same
not x
extra l1
$ diff diff1 diff2
3c3
< x and y
---
> y and x
5c5,6
< x
---
> not x
> extra l1
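The same comparison as a runnable sketch, assuming a POSIX shell. In the output, 3c3 means line 3 of the first file must be changed into line 3 of the second, and 5c5,6 means line 5 becomes lines 5 and 6; lines marked < come from diff1 and lines marked > from diff2.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' 'one same' 'two same' 'x and y' 'same' 'x' > diff1
printf '%s\n' 'one same' 'two same' 'y and x' 'same' 'not x' 'extra l1' > diff2

# Prints the change commands (3c3, 5c5,6) with the affected lines.
diff diff1 diff2
echo "exit status: $?"   # 1 means the files differ
```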
The comm command finds lines that are identical in two files.
It compares the files line by line and displays the results in three columns:
The left column contains lines found only in file1.
The center column contains lines found only in file2.
The right column contains lines found in both files.
one same
two same
difference comm2
different comm1
same at line 4
same at line 5
comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
not in comm2
same at line 7
same at line 8
not in comm1
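comm expects both files to be sorted; the "comm: file 1 is not in sorted order" warnings above come from GNU comm when that is not the case. This sketch uses made-up, sorted files (reusing the names comm1 and comm2) to show the three columns and the -1, -2, -3 options that suppress them.

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
export LC_ALL=C   # both inputs must be sorted in the same collating order

printf '%s\n' apple banana cherry > comm1
printf '%s\n' banana cherry date > comm2

# Column 1: only in comm1; column 2 (one tab): only in comm2;
# column 3 (two tabs): in both files.
comm comm1 comm2

comm -12 comm1 comm2   # suppress columns 1 and 2: lines common to both
comm -23 comm1 comm2   # suppress columns 2 and 3: lines only in comm1
```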