Advanced Filter 1
You often need to search a file for a pattern, either to see the lines containing (or not containing) it or to have it replaced by something else. This chapter discusses two important filters that are specially suited for these tasks: grep and sed. grep takes care of all search requirements you may have, and does the job well. sed goes further and can even manipulate the individual characters in a line. In fact, sed can do several things, some of them quite well.
This chapter also takes up regular expressions, one of the fascinating features of UNIX. You've already had a taste of these expressions when using the search capabilities of vi and emacs. But it's in this chapter that you'll see them in all their manifestations. The rules for framing the patterns are well defined, and in no time you'll be able to devise compact expressions that perform amazing matches. In fact, it's common to find just a single line of grep or sed code replacing several lines of C code.
The system administrator must be adept in understanding and framing regular expressions. Learning to use them along with grep and sed serves as a suitable prelude to learning perl, which uses many of their features. These features are discussed here but assumed in the chapter on perl.
Objectives
• Output lines containing a simple string with grep. (15.2)
• Use grep's options to display a count of matching lines, line numbers, and lines not containing a pattern. (15.3)
• Use a regular expression to search for multiple similar strings. (15.4)
• Use egrep and fgrep with multiple patterns. (15.5)
• Learn how egrep uses a special regular expression that can group and delimit multiple patterns. (15.6)
• Use sed to select and edit lines. (15.7 to 15.10)
• Replace one pattern with another in sed, using regular expressions where necessary. (15.11)
• Use the interval and tagged regular expressions to enhance the power of grep and sed. (15.12)
15.1 The Sample Database
In this chapter and the ones dealing with filters and shell programming, you'll often be referring to the file emp.lst. Sometimes, you'll also be using other files derived from it. It's a good idea to have a close look at the file now and understand its organization:
$ cat emp.lst
2233|charles harris  |g.m.     |sales     |12/12/52| 90000
9876|bill johnson    |director |production|03/12/50|130000
5678|robert dylan    |d.g.m.   |marketing |04/19/43| 85000
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
1006|gordon lightfoot|director |sales     |09/03/38|140000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
4290|neil o'bryan    |executive|production|09/07/50| 65000
2476|jackie wodehouse|manager  |sales     |05/01/59|110000
6521|derryk o'brien  |director |marketing |09/26/45|125000
3212|bill wilcocks   |d.g.m.   |accounts  |12/12/55| 85000
3564|ronie trueman   |executive|personnel |07/06/47| 75000
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
The first five lines of this file were used as the file shortlist in the section on sort (9.12). The significance of the fields has also been explained there, but we'll recount it just the same. This is a text file containing 15 lines of a personnel database. There are six fields (empid, name, designation, department, date of birth and salary), with the | as the field delimiter. This character has a special meaning to the shell, so we must remember to escape it whenever we specify the delimiter.
grep interprets lightfoot as a filename and obviously fails to open such a file. However, its search continues with the next argument, i.e., emp.lst. To search for gordon lightfoot as a single pattern, quote it, so that the shell passes both words to grep as one argument.
What happens if the pattern itself contains a single quote, as in neil o'bryan? The Bourne, Korn and bash shells treat grep 'neil o'bryan' emp.lst as an incomplete command and issue the secondary prompt string (>). The same command run in the C shell even causes an error:
% grep 'neil o'bryan' emp.lst
Unmatched '.
The pattern itself contains a single quote. The shell looks for an even number of quotes to determine the boundaries of noninterference. We should have remembered that single quotes don't protect single quotes; only double quotes do. So grep "neil o'bryan" emp.lst is the correct form.
Though quotes are redundant in single-word fixed strings, it's better to enforce their use. It sets up a good habit with no adverse consequences. You can then use regular expressions inside them.
You need to quote the pattern in grep if it contains more than one word or special characters that can otherwise be interpreted by the shell. You can generally use either single or double quotes, but if you need command substitution or variable evaluation inside the pattern, you must use double quotes.
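The quoting rules above can be checked with a short shell sketch. It assumes a scratch directory holding two lines of emp.lst (recreated only so the script is self-contained) and shows what grep receives with and without quotes.

```shell
#!/bin/sh
# Sketch: how quoting decides what grep receives as its pattern.
# Assumption: two lines of emp.lst are recreated in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
1006|gordon lightfoot|director |sales     |09/03/38|140000
4290|neil o'bryan    |executive|production|09/07/50| 65000
EOF

# Unquoted: the shell splits the words, so grep searches for "gordon"
# in two files, "lightfoot" (which doesn't exist) and emp.lst.
grep gordon lightfoot emp.lst

# Quoted: grep receives "gordon lightfoot" as one pattern.
multi=$(grep 'gordon lightfoot' emp.lst)

# A pattern containing a single quote must be enclosed in double quotes.
neil=$(grep "neil o'bryan" emp.lst)

echo "$multi"
echo "$neil"
```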
Note
There's more to it here than meets the eye. A command like grep president emp.lst fails because the string president can't be located in the file. Though the feature of scanning a file for a pattern is available in both sed and awk, these commands are not considered to fail if they can't locate a pattern in their input.
Don't, however, draw the wrong conclusion from the above behavioral pattern of grep. The silent return of the shell prompt is no evidence of failure. In fact, the silent behavior of cmp denotes success. Success or failure is determined by the value of a special variable ($?) that gets set when a command has finished execution. You'll see in the chapter on shell programming (18.5.1) how this variable is applied on the command line and in the shell's programming constructs.
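Here is a minimal sketch of the exit-status behavior just described. It assumes a one-line stand-in for emp.lst and a file named nosuchfile that doesn't exist. grep reports 0 for a match, 1 for no match and a value greater than 1 for an error, while sed still returns 0 when its pattern isn't found.

```shell
#!/bin/sh
# Sketch: grep's success or failure is reported in $?, not on the screen.
# Assumption: a one-line stand-in for emp.lst in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '9876|bill johnson    |director |production|03/12/50|130000' > emp.lst

grep director emp.lst > /dev/null
found=$?              # 0: at least one line matched

grep president emp.lst
notfound=$?           # 1: no match, and grep printed nothing

grep director nosuchfile 2> /dev/null
failed=$?             # greater than 1: an error (here, a missing file)

sed -n '/president/p' emp.lst
sed_status=$?         # 0: sed doesn't treat an unmatched pattern as failure

echo "grep: $found $notfound $failed  sed: $sed_status"
```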
Option          Significance
-c              Displays count of lines containing the pattern
-l              Displays list of filenames only
-n              Displays line numbers along with lines
-v              Doesn't display lines matching expression
-i              Ignores case when matching
-h              Omits filenames when handling multiple files
-w              Matches complete word (grep only)
-e pat          Also matches pattern pat beginning with a - (hyphen)
-e pat          As above, but can be used multiple times (Linux and some UNIX versions)
-E              Treats pattern as an egrep regular expression (Linux and Solaris-xpg4)
-F              Matches pattern in fgrep-style (Linux and Solaris-xpg4)
-n (a number)   Displays line and n lines above and below, e.g., -1 (Linux only)
-A n            Displays line and n lines after matching lines (Linux only)
-B n            Displays line and n lines before matching lines (Linux only)
-f file         Takes patterns from file, one per line (Linux only)
The -c (count) option is one of the few grep options that don't display the lines at all. It displays only the number of lines that contain the pattern. If you use it with multiple files, the filename is prefixed to each file's line count.
Sometimes, you need a single count from all these files so that you can use it in script logic. You have already handled a similar situation before (8.8.1). The -h option drops the filenames from the output, but grep still reports one count per file. For a single total, concatenate the files and pipe them to grep -c.
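The difference between per-file counts and a single total can be sketched as follows. It assumes two hypothetical files, emp1.lst and emp2.lst, built from lines of emp.lst. Note that -h is a common extension rather than a POSIX grep option.

```shell
#!/bin/sh
# Sketch: per-file counts with -c, filenames dropped with -h, and a single total.
# Assumption: emp1.lst and emp2.lst are hypothetical files made from emp.lst lines.
cd "$(mktemp -d)" || exit 1
cat > emp1.lst <<'EOF'
9876|bill johnson    |director |production|03/12/50|130000
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
EOF
cat > emp2.lst <<'EOF'
1006|gordon lightfoot|director |sales     |09/03/38|140000
6521|derryk o'brien  |director |marketing |09/26/45|125000
EOF

grep -c director emp1.lst emp2.lst                  # emp1.lst:2 and emp2.lst:2
per_file=$(grep -ch director emp1.lst emp2.lst)     # 2 and 2, without filenames
total=$(cat emp1.lst emp2.lst | grep -c director)   # one combined count
echo "total: $total"
```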
Displaying Line Numbers (-n) The -n (number) option displays the line numbers of the lines containing the pattern, along with the lines. The line numbers are shown at the beginning of each line, separated from the actual line by a :. If you use this option with multiple filenames, you have two additional fields, the filename and the line number:
$ grep -n 'marketing' emp?.lst | head -2
emp1.lst:2:5678|robert dylan    |d.g.m.   |marketing |04/19/43| 85000
emp1.lst:6:6521|derryk o'brien  |director |marketing |09/26/45|125000
Deleting Lines (-v) The -v (inverse) option selects all but the lines containing the pattern. Thus, by redirecting grep's output, you can create a file otherlist containing all but the directors' lines. This is a useful option for "deleting" lines, but you can use it effectively only by applying redirection. The lines haven't been deleted from the original file as such; they are only absent from the separate file otherlist.
Note
The -v option removes lines from grep's output, but doesn't actually change the argument file.
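The two options just described can be tried on a copy of emp.lst, recreated here as listed in 15.1 (an assumption for self-containment). -n puts the line number first. -v with redirection builds otherlist and leaves emp.lst untouched.

```shell
#!/bin/sh
# Sketch: -n prefixes line numbers; -v plus redirection "deletes" lines.
# Assumption: emp.lst is recreated here as listed in 15.1.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2233|charles harris  |g.m.     |sales     |12/12/52| 90000
9876|bill johnson    |director |production|03/12/50|130000
5678|robert dylan    |d.g.m.   |marketing |04/19/43| 85000
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
1006|gordon lightfoot|director |sales     |09/03/38|140000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
4290|neil o'bryan    |executive|production|09/07/50| 65000
2476|jackie wodehouse|manager  |sales     |05/01/59|110000
6521|derryk o'brien  |director |marketing |09/26/45|125000
3212|bill wilcocks   |d.g.m.   |accounts  |12/12/55| 85000
3564|ronie trueman   |executive|personnel |07/06/47| 75000
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
EOF

grep -n 'marketing' emp.lst | head -2        # line numbers come first
first_line=$(grep -n 'marketing' emp.lst | head -1 | cut -d: -f1)

grep -v 'director' emp.lst > otherlist       # all but the directors
kept=$(grep -c '' otherlist)                 # '' matches every line
still=$(grep -c 'director' emp.lst)          # the original file is unchanged
echo "otherlist has $kept lines; emp.lst still has $still directors"
```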
Displaying Filenames (-l) The -l (list) option displays only the names of files where a pattern has been found. So, if you have forgotten the filename where you last saw something, just use this option to find out which one has it. This is the second option that doesn't display the lines.
Ignoring Case (-i) When you look for a name but are not sure of the case, grep offers the -i (ignore) option. This makes the match case-insensitive:
$ grep -i 'WILCOX' emp.lst
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
This locates the name wilcox. However, a simple string like this can't match the name wilcocks, which also exists in the file but is spelled slightly differently. grep supports very sophisticated techniques of pattern matching, and this is the ideal point for regular expressions to make their entry.
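A small sketch of -l and -i, assuming two hypothetical one-line files emp1.lst and emp2.lst. Only the file containing wilcox is listed, and the uppercase pattern matches only when -i is used.

```shell
#!/bin/sh
# Sketch: -l lists only the files that match; -i ignores case.
# Assumption: emp1.lst and emp2.lst are hypothetical one-line files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '2345|james wilcox    |g.m.     |marketing |03/12/45|110000' > emp1.lst
printf '%s\n' '5423|barry wood      |chairman |admin     |08/30/56|160000' > emp2.lst

files=$(grep -l 'wilcox' emp1.lst emp2.lst)   # emp1.lst only
exact=$(grep 'WILCOX' emp1.lst)               # empty: the case must match
anycase=$(grep -i 'WILCOX' emp1.lst)          # found despite the case difference
echo "$files"
echo "$anycase"
```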
Patterns Beginning with a - (-e) What happens when you look for a pattern that begins with a hyphen, such as -mtime? grep treats -mtime as an option of its own, and finds it "illegal." To locate such patterns, you must use the -e option, which tells grep that the next argument is the pattern.
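The sketch below searches a hypothetical file, cmds.txt, for -mtime. It also shows an alternative the text doesn't mention: the standard -- marker, which ends option processing so that the next argument is taken as the pattern.

```shell
#!/bin/sh
# Sketch: searching for a pattern that begins with a hyphen.
# Assumption: cmds.txt is a hypothetical file holding two command lines.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'find . -mtime -2 -print' 'ls -l' > cmds.txt

with_e=$(grep -e -mtime cmds.txt)        # -e marks the next argument as the pattern
# Not from the text: "--" ends the options, so -mtime becomes the pattern.
with_dashes=$(grep -- -mtime cmds.txt)
echo "$with_e"
```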
Some More Options
Matching Multiple Patterns (-e and -f) As just mentioned, the -e option has an additional use in Linux. By placing -e before each pattern, you can match multiple patterns in a single invocation of the command. The -f option instead takes the patterns from a file, one per line.
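Both forms can be sketched on four lines of emp.lst. The pattern file pat.lst is a hypothetical name used only for illustration. Each form selects the woodhouse and woodcock lines but not barry wood.

```shell
#!/bin/sh
# Sketch: several patterns in one grep run, via repeated -e or a pattern file.
# Assumption: four lines of emp.lst; pat.lst is a hypothetical pattern file.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
EOF

by_e=$(grep -c -e 'woodhouse' -e 'woodcock' emp.lst)
printf '%s\n' woodhouse woodcock > pat.lst    # one pattern per line
by_f=$(grep -c -f pat.lst emp.lst)
echo "$by_e $by_f"
```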
Printing the Neighborhood GNU grep has a nifty option that locates not only the matching line, but also a certain number of lines above and below it. For instance, you may want to know what went before and after the foreach statement that you used in a perl script:
$ grep -1 "foreach" count.pl                       One line above and below
print ("Region List\n") ;
foreach $r_code (sort (keys(%regionlist))) {
    print ("$r_code : $region{$r_code} : $regionlist{$r_code}\n")
The command locates the string foreach and displays one line on either side of it. Isn't this feature useful? Using this numeric option, you can locate a section of code by supplying a unique string that exists in the middle of the code segment.
You can be selective and display, along with the matched line, a certain number of lines either above or below it. This requires the -A and -B options:
grep -A 5 "do loop" update.sql                    5 lines after matching line
grep -B 3 "do loop" update.sql                    3 lines before matching line
It's easier to identify the context of a matched line when the immediate neighborhood is also presented. These options are also useful when searching sorted files, where the previous user-id or the next date of birth is often the important thing to look for.
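The context options can be sketched on five consecutive lines of emp.lst. The sketch assumes GNU or BSD grep, because -A, -B and -C are not part of POSIX grep. It uses -C 1, the spelled-out form of the -1 option shown above.

```shell
#!/bin/sh
# Sketch: printing the neighborhood of a match with -C, -A and -B.
# Assumption: five consecutive lines of emp.lst; GNU or BSD grep.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
1006|gordon lightfoot|director |sales     |09/03/38|140000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
EOF

# Keep just the emp-ids so the selected lines are easy to compare.
around=$(grep -C 1 'barry wood' emp.lst | cut -c1-4 | tr '\n' ' ')
after=$(grep -A 2 'barry wood' emp.lst | cut -c1-4 | tr '\n' ' ')
before=$(grep -B 1 'barry wood' emp.lst | cut -c1-4 | tr '\n' ' ')
echo "around: $around"
echo "after:  $after"
echo "before: $before"
```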
15.4 Regular Expressions: Round One
View the file emp.lst (15.1) once again, and you'll find some names spelled in a similar manner, like trueman and truman, and wilcocks and wilcox. You'll often want to locate both truman and trueman without using grep twice. A regular expression lets a single pattern match such a group of similar strings.
15.4.1 The Character Class
TABLE 15.2 The Regular Expression Characters Used by grep, sed and perl
Like the shell's wild cards, a regular expression also uses a character class that encloses a group of characters within a pair of rectangular brackets [ ]. The match is then performed for a single character in the group. Thus, the expression
[od]
matches either an o or a d. You can also use ranges, both for alphabets and numerals.
Thus, the pattern
[a-zA-Z0-9]
matches a single alphanumeric character. This property can now be used to match woodhouse and wodehouse. These two patterns differ in their third and fourth character positions: od in one and de in the other. To match these two strings, we'll have to use the model [od][de], which in fact matches all these four patterns:
od oe dd de
The first and fourth are relevant to the present problem. Using the character class, the regular expression required to match woodhouse and wodehouse should be this:
wo[od][de]house
Let's use this regular expression with grep:
$ grep "wo[od][de]house" emp.lst
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
2476|jackie wodehouse|manager  |sales     |05/01/59|110000
A single pattern has located two similar strings; that's what regular expressions are all about.
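The sketch below assumes a hypothetical file of similar spellings. It confirms that wo[od][de]house picks out both names. Using -x, which requires the whole line to match, it also confirms that [od][de] accepts exactly the four two-character strings listed above.

```shell
#!/bin/sh
# Sketch: a character class matches exactly one character from the group.
# Assumption: names.txt is a hypothetical file of similar spellings.
cd "$(mktemp -d)" || exit 1
printf '%s\n' woodhouse wodehouse woodcock > names.txt

both=$(grep -c 'wo[od][de]house' names.txt)     # woodhouse and wodehouse
# -x: the whole line must match, so only two-character lines qualify.
# od, oe, dd and de match [od][de]; "do" doesn't.
pairs=$(printf '%s\n' od oe dd de do | grep -cx '[od][de]')
echo "$both $pairs"
```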
When ranges are used, the character on the left side of the - must be lower (in the ASCII collating sequence) than the one on the right. The character class [X-c] is, therefore, quite legitimate, as X has a lower ASCII value than c. However, that doesn't mean you can match an alphabetic character in either case with the expression [A-z] because, between Z and a, there are a number of other nonalphabetic characters as well (the caret, for example).
Negating a Class Regular expressions use the ^ (caret) to negate the character class, while the shell uses the ! (bang). When the character class begins with this character, all characters other than the ones grouped in the class are matched. So a single nonalphabetic character is represented by this expression:
[^a-zA-Z]
Note
The character class is similar to the shell's wild cards, except that negation of the class is done by a ^ (caret), while in the shell it's done by the ! (bang).
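Ranges, negation and the [A-z] pitfall can be sketched with a few piped strings. The sketch sets LC_ALL=C so that ranges follow the ASCII sequence the text assumes. In other locales, ranges may be interpreted differently.

```shell
#!/bin/sh
# Sketch: ranges, negated classes, and why [A-z] is not "any letter".
# Assumption: LC_ALL=C makes ranges follow the ASCII sequence.
LC_ALL=C
export LC_ALL

alnum=$(printf '%s\n' 'a' '7' '%' | grep -c '[a-zA-Z0-9]')         # 'a' and '7'
nonalpha=$(printf '%s\n' 'abc' 'ab1' 'A^B' | grep -c '[^a-zA-Z]')  # lines holding a nonletter
caret_az=$(printf '%s\n' '^' | grep -c '[A-z]')     # the caret lies between Z and a
caret_ok=$(printf '%s\n' '^' | grep -c '[A-Za-z]')  # 0: no match, as intended
echo "$alnum $nonalpha $caret_az $caret_ok"
```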
15.4.2 The *
The * (asterisk) refers to the immediately preceding character. However, its interpretation is the trickiest of the lot. Keep in mind that it bears absolutely no resemblance to the * used by wild cards or DOS. Rather, it matches zero or more occurrences of the previous character. In other words, the previous character can occur many times, or not at all. The pattern
e*
matches the single character e and any number of es. Because the previous character may not occur at all, it also matches a null string. Thus, apart from this null string, it also matches the following strings:
e ee eee eeee .....
Mark the words "zero or more occurrences of the previous character" that are used to describe the significance of the *. Don't make the mistake of using e* to match a string beginning with e; use ee* instead. Recall that the * used by wild cards doesn't relate to the previous character at all.
The pattern true*man, which locates both trueman and truman, is general enough to include other patterns. It would have also matched trueeman had there been such a pattern in the file.
Using both the character class and the *, we can now match wilcocks and wilcox with a single expression, wilco[cx]k*s*.
The expression k*s* means that k and s may not occur at all (or may occur any number of times); that's why the expression used with grep also matches wilcox, which doesn't contain these two characters at its end. You can feel the power of regular expressions here, and how easily they exceed the capabilities of wild cards.
The Regular Expression .* The dot, which matches a single character, used along with the * (.*) constitutes a very useful regular expression. It signifies any number of characters, or none. Say, for instance, you are looking for the name p. woodhouse, but are not sure whether it actually exists in the file as p.j. woodhouse. No problem; just embed the .* in the search string:
$ grep "p.*woodhouse" emp.lst
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
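The * and .* examples of this section can be checked together on five lines of emp.lst, recreated for the sketch. The last count shows a consequence of the null-string match: e* selects every line.

```shell
#!/bin/sh
# Sketch: the * (zero or more of the previous character) and .* (anything).
# Assumption: five lines of emp.lst recreated in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
3564|ronie trueman   |executive|personnel |07/06/47| 75000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
3212|bill wilcocks   |d.g.m.   |accounts  |12/12/55| 85000
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
EOF

tm=$(grep -c 'true*man' emp.lst)        # trueman and truman
wil=$(grep -c 'wilco[cx]k*s*' emp.lst)  # wilcocks and wilcox
pj=$(grep -c 'p.*woodhouse' emp.lst)    # p.j. woodhouse
every=$(grep -c 'e*' emp.lst)           # e* matches the null string: all 5 lines
echo "$tm $wil $pj $every"
```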
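Now suppose you want to select the lines where the emp-id begins with a 2. Can you use 2... as the expression? This won't do because the character 2, followed by three characters, can occur anywhere in the line. You must indicate to grep that the pattern occurs at the beginning of the line, and the ^ does it easily: ^2 matches a 2 only at the beginning of a line.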
Similarly, to select those lines where the salary lies between 70,000 and 89,999 dollars, you have to use the $ (nothing to do with the currency) at the end of the pattern, as in [78]....$, which ties the pattern to the last five characters of the line.
ls -l | grep "^d"                       Shows only the directories
It's indeed strange that in an operating system known for its commitment to brevity and options, you have to type such a long sequence simply to list the directories! You should convert this into an alias (17.4) or a shell function (19.10) so that it is always available for you to use.
Here's how grep can add power to the ls -l command. This pipeline locates all files which have write permission for the group:
$ ls -l | grep '^.....w'
This sequence matches a w at the sixth column location of the ls -l output, the one which indicates the presence or absence of write permission for the group.
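The anchoring examples can be sketched as follows. The sketch assumes emp.lst as listed in 15.1 and a made-up ls -l listing, ls.out. The listing stands in for real ls output, which depends on your files and umask. The salary pattern relies on salary being the last field, right-justified in six columns.

```shell
#!/bin/sh
# Sketch: anchoring with ^ (start of line) and $ (end of line).
# Assumptions: emp.lst as listed in 15.1; ls.out is a made-up ls -l listing.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2233|charles harris  |g.m.     |sales     |12/12/52| 90000
9876|bill johnson    |director |production|03/12/50|130000
5678|robert dylan    |d.g.m.   |marketing |04/19/43| 85000
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
1006|gordon lightfoot|director |sales     |09/03/38|140000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
4290|neil o'bryan    |executive|production|09/07/50| 65000
2476|jackie wodehouse|manager  |sales     |05/01/59|110000
6521|derryk o'brien  |director |marketing |09/26/45|125000
3212|bill wilcocks   |d.g.m.   |accounts  |12/12/55| 85000
3564|ronie trueman   |executive|personnel |07/06/47| 75000
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
EOF
cat > ls.out <<'EOF'
drwxr-xr-x 2 user group 4096 Jan  1 10:00 bin
-rw-rw-r-- 1 user group  120 Jan  1 10:00 notes
-rw-r--r-- 1 user group   80 Jan  1 10:00 readme
EOF

ids=$(grep -c '^2' emp.lst)          # emp-id begins with 2
pay=$(grep -c '[78]....$' emp.lst)   # salary from 70000 to 89999
dirs=$(grep -c '^d' ls.out)          # directories only
gw=$(grep '^.....w' ls.out)          # group write permission (sixth column)
echo "$ids $pay $dirs"
echo "$gw"
```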
Note
The caret has a triple role to play in regular expressions. When placed at the beginning of a character class (e.g., [^a-z]), it negates every character of the class. When placed outside it, and at the beginning of the expression (e.g., ^2...), the pattern is matched at the beginning of the line. At any other location (e.g., a^b), it matches itself literally.
15.4.5 When Metacharacters Lose Their Meaning
It's possible that some of these special characters actually exist as part of the text. If a literal match has to be made for any of them, the "magic" of the characters should be turned off. Sometimes, that is done automatically if the characters violate the regular expression rules. Like the caret, the meaning of these characters can change depending on the place they occupy in the expression.
The - loses its meaning inside the character class if it's not enclosed on either side by a suitable character, or when placed outside the class. The . and * lose their meanings when placed inside the character class. The * is also matched literally if it's the first character of the expression. For instance, when you use grep "*", you are in fact looking for an asterisk.
Sometimes, you may need to escape these characters, say, when looking for the pattern g*. In that case, grep "g*" won't do, because g* also matches the null string and hence every line; you have to use the \ for escaping, as in grep "g\*". Similarly, to look for a [, you should use \[, and to look for the literal pattern .*, you should use \.\*.
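The escaping rules can be sketched on a hypothetical four-line file, meta.txt. On some newer GNU grep releases, the leading-* search also prints a warning on standard error. It still matches the asterisk literally.

```shell
#!/bin/sh
# Sketch: turning off the "magic" of *, [ and . with a backslash.
# Assumption: meta.txt is a hypothetical four-line file.
cd "$(mktemp -d)" || exit 1
cat > meta.txt <<'EOF'
g* is a pattern
nothing here
array[0]
use .* freely
EOF

loose=$(grep -c 'g*' meta.txt)      # 4: g* also matches the null string
star=$(grep -c 'g\*' meta.txt)      # 1: a literal g followed by *
lead=$(grep -c '*' meta.txt)        # 2: a leading * is taken literally
bracket=$(grep -c '\[' meta.txt)    # 1: a literal [
dotstar=$(grep -c '\.\*' meta.txt)  # 1: the literal string .*
echo "$loose $star $lead $bracket $dotstar"
```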
Regular expressions are found everywhere in the UNIX system. You have already used them with vi and emacs, and now with grep. Apart from these, some of the most powerful UNIX commands, like egrep, sed, awk, perl and expr, also use regular expressions. You must understand them because they hold the key to the mastery of the UNIX system.
We'll introduce some more metacharacters used by regular expressions later in this chapter and when we take up perl. To understand some of them, you need to know the egrep command first.
Note
Always keep in mind that a regular expression tries to match the string nearest the beginning of the line, and that the match is made for the longest possible string. Thus, if you use the expression 03.*05, it will match 03 and 05 as close to the left and right of the line, respectively, as possible. This point acquires significance when these expressions are used for substitution.
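The leftmost-longest rule can be made visible with sed, anticipating the substitution command covered in 15.11. The sketch brackets whatever 03.*05 matches in an arbitrary test string. In sed's replacement string, & stands for the matched text. The match starts at the first 03 and runs to the last 05.

```shell
#!/bin/sh
# Sketch: leftmost-longest matching, made visible by bracketing the match.
# The input string is arbitrary; & in the replacement is the matched text.
out=$(echo 'x03y05z03w05v' | sed 's/03.*05/[&]/')
echo "$out"
```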
15.5 egrep and fgrep: The Other Members
How do you locate both woodhouse and woodcock in the file, something GNU grep achieves by using multiple -e options? This is easily done with egrep, which uses an extended set of regular expressions. Delimit the two expressions with the | and the job is done: egrep 'woodhouse|woodcock' emp.lst.
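fgrep also accepts multiple patterns on the command line, each placed on a separate line within the quotes. C shell users should escape the newline character by using a \ at the end of the first line. fgrep doesn't use any regular expression character, including the | used by egrep. If the pattern to search for is a simple string or a group of them, fgrep is recommended. It is arguably faster too, the reason why it's known as fast grep.
Both egrep and fgrep can also take patterns from a file with the -f option. For egrep, the patterns in the file are delimited by the |:
$ cat pat.lst
admin|accounts|sales
And here's how the file should look if fgrep were to use it: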
$ cat pat.lst
admin
accounts
sales
To look for these three patterns, the egrep and fgrep commands should now be used in these ways:
egrep -f pat.lst emp.lst
fgrep -f pat.lst emp.lst
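A minimal sketch of both pattern-file styles, using grep -E and grep -F in place of egrep and fgrep. Recent GNU grep releases may warn that egrep and fgrep are obsolescent. The sketch keeps the two versions of pat.lst in separately named files (pat_e.lst and pat_f.lst, hypothetical names) so both can exist at once.

```shell
#!/bin/sh
# Sketch: taking patterns from a file, in egrep style and in fgrep style.
# Assumptions: grep -E / grep -F stand in for egrep / fgrep; five lines of emp.lst.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2233|charles harris  |g.m.     |sales     |12/12/52| 90000
9876|bill johnson    |director |production|03/12/50|130000
5423|barry wood      |chairman |admin     |08/30/56|160000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
EOF

printf '%s\n' 'admin|accounts|sales' > pat_e.lst   # one line; | delimits patterns
printf '%s\n' admin accounts sales > pat_f.lst     # one pattern per line
e_count=$(grep -E -c -f pat_e.lst emp.lst)
f_count=$(grep -F -c -f pat_f.lst emp.lst)
echo "$e_count $f_count"
```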
15.6.1 The + and ?
egrep's extended set includes two special characters, + and ?. They are often used in place of the * to restrict the matching scope. They signify the following:
+ Matches one or more occurrences of the previous character.
? Matches zero or one occurrence of the previous character.
What all this means is that b+ matches b, bb, bbb, etc.; unlike b*, it doesn't match nothing. The expression b? matches either a single instance of b or nothing. These characters restrict the scope of the match as compared to the *.
In trueman and truman, the two similar names in emp.lst (15.1), note that the character e either occurs once or not at all. So e? is the expression to use here, as in egrep "true?man" emp.lst.
TABLE 15.3 The Extended Regular Expression Set Used by egrep and awk
Expression      Matches
ch+             One or more occurrences of character ch
g+              At least one g
ch?             Zero or one occurrence of character ch
g?              Nothing or one g
exp1|exp2       Expression exp1 or exp2
GIF|JPEG        GIF or JPEG
(x1|x2)x3       Expression x1x3 or x2x3
$ egrep 'wood(house|cock)' emp.lst
2365|john woodcock   |director |personnel |05/11/47|120000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
You can now combine the other regular expression characters that we used with grep to form a rather complex sequence.
Linux
grep -E also accepts egrep's extended regular expressions, and the -F option makes grep behave like fgrep.
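The extended characters can be sketched through grep -E, assuming four lines of emp.lst recreated for the purpose. The first two counts use short piped strings. With -x, the whole line must match.

```shell
#!/bin/sh
# Sketch: the extended set (+, ?, | and grouping) through grep -E.
# Assumption: four lines of emp.lst recreated in a scratch directory.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2365|john woodcock   |director |personnel |05/11/47|120000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
3564|ronie trueman   |executive|personnel |07/06/47| 75000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
EOF

plus=$(printf '%s\n' a ab abb | grep -E -c 'ab+')   # ab, abb: at least one b
opt=$(printf '%s\n' a ab abb | grep -E -cx 'ab?')   # a, ab: at most one b
tm=$(grep -E -c 'true?man' emp.lst)                 # trueman and truman
wood=$(grep -E -c 'wood(house|cock)' emp.lst)       # woodhouse and woodcock
echo "$plus $opt $tm $wood"
```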
Before proceeding further, C shell users must note that when a sed command is continued on the next line by pressing the [Enter] key, the shell generates an error and complains of an "unmatched" quote. As a general rule, escape all lines except the last with a \ to generate the ? prompt. (Some systems, like Solaris, don't display this prompt.) The situations where such escaping is required are pointed out sometimes, but not always. In some cases, the command in that form simply won't work.
Note
Chapter 17 emphasizes that the Korn and bash shells are far superior to the C shell. You'll find sed easier to use if you choose either Korn or bash for your login shell, or at least the Bourne shell. If you still don't want to change your login shell, at least use a different shell for running the sed commands and the awk programs discussed in the next chapter. Simply execute sh, ksh or bash, whichever is available on your system, and continue working normally. At the end of your session, run exit to return to your login shell.
The address and action are normally enclosed within a pair of single quotes. As you have already learned, you should use double quotes only when parameter evaluation or command substitution is embedded in a sed instruction.
Reversing Line Selection Criteria (!) You can use sed's negation operator (!) with any action. So selecting the first two lines is the same as not selecting lines 3 through the end: sed -n '3,$!p' emp.lst prints just the first two lines.
Selecting Lines from the Middle sed can also select lines from the middle of a file, something that's not possible with either head or tail (acting alone). An address range like 9,11 with the p action, for instance, prints lines 9 through 11.
You can place all these instructions in a single line too, but each instruction has to be preceded by the -e option:
sed -n -e '1,2p' -e '7,9p' -e '$p' emp.lst                  Same as above
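The addressing forms above can be checked on a stand-in file of 15 numbered lines (the same length as emp.lst), generated by the sketch itself so the selected lines are easy to recognize.

```shell
#!/bin/sh
# Sketch: sed's line addresses, the $ address, negation (!) and multiple -e.
# Assumption: a stand-in file of 15 numbered lines (emp.lst also has 15).
cd "$(mktemp -d)" || exit 1
i=1
while [ "$i" -le 15 ]; do
    echo "line$i"
    i=$((i + 1))
done > lines.txt

first_two=$(sed -n '1,2p' lines.txt)
not_rest=$(sed -n '3,$!p' lines.txt)          # not lines 3 to the end: same result
middle=$(sed -n '9,11p' lines.txt | tr '\n' ' ')
last=$(sed -n '$p' lines.txt)
combined=$(sed -n -e '1,2p' -e '7,9p' -e '$p' lines.txt | tr '\n' ' ')
echo "$middle"
echo "$combined"
```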
$ sed -n '/From: /p' $HOME/mbox
From: janis joplin <joplinj@altavista.net>
From: charles king <charlesk@rocketmail.com>
From: Monica Johnson <Monicaj@Web6000.com>
From: The Economist <business@lists.economist.com>
Both awk and perl also support this form of addressing. Ideally, you should only be looking for From: at the beginning of a line. sed also accepts regular expressions of the type we used with grep. The following command lines should refresh your memory:
sed -n '/^From: /p' $HOME/mbox               ^ matches at beginning of line
sed -n '/wilco[cx]k*s*/p' emp.lst            Both wilcox and wilcocks
sed -n "/o'br[iy][ae]n/p                     Either o'brien, o'bryan or lennon
/lennon/p" emp.lst
Note that we had to use double quotes in the third example because the pattern itself contains a single quote. Double quotes protect single quotes in just the same way that single quotes protect double quotes.
C shell users should note that you must add a \ at the end of the first line in the third example above. Otherwise, the shell will generate the error message Unmatched ". as it always does whenever it sees a line containing an unclosed double or single quote.
In a previous example (15.4.4), we used ls and grep in a pipeline to list files which have write permission for the group. We can do that with sed as well:
ls -l | sed -n '/^.....w/p'
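Context addressing can be sketched with a made-up three-line mailbox and six lines of emp.lst. Anchoring with ^ drops the Reply-From: line that an unanchored /From: / also picks up. The double-quoted two-line script works unchanged in sh.

```shell
#!/bin/sh
# Sketch: context addresses in sed, with and without regular expressions.
# Assumptions: mbox is a made-up three-line mailbox; six lines of emp.lst.
cd "$(mktemp -d)" || exit 1
cat > mbox <<'EOF'
From: janis joplin <joplinj@altavista.net>
Subject: meeting
Reply-From: nobody
EOF
cat > emp.lst <<'EOF'
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
4290|neil o'bryan    |executive|production|09/07/50| 65000
6521|derryk o'brien  |director |marketing |09/26/45|125000
3212|bill wilcocks   |d.g.m.   |accounts  |12/12/55| 85000
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
5423|barry wood      |chairman |admin     |08/30/56|160000
EOF

anywhere=$(sed -n '/From: /p' mbox | grep -c '')    # 2: also Reply-From:
at_start=$(sed -n '/^From: /p' mbox | grep -c '')   # 1: only the real header
wil=$(sed -n '/wilco[cx]k*s*/p' emp.lst | cut -c1-4 | tr '\n' ' ')
# Double quotes because the pattern contains a single quote.
names=$(sed -n "/o'br[iy][ae]n/p
/lennon/p" emp.lst | cut -c1-4 | tr '\n' ' ')
echo "$anywhere $at_start"
echo "$wil"
echo "$names"
```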
You can actually key in as many lines as you wish, but you have to precede the [Enter] key in each line except the last with a \.
Double-Spacing Text What is the consequence of not using an address with these commands? The inserted or changed text is then placed after or before every line of the file. The following command:
sed 'i\
' foo
inserts a blank line before each line of the file is printed. This is another way of double-spacing text (9.4.1). The difference between i and a is that i inserts text before the addressed line, while a does the same after the line.
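Here is a sketch of the i and a difference. It uses a visible marker, --, in place of the blank line of the double-spacing example so the effect is easy to see and check. foo is a made-up two-line file. With no address, the text goes before (i) or after (a) every line.

```shell
#!/bin/sh
# Sketch: i inserts text before each selected line, a appends it after.
# Assumption: the marker "--" replaces the blank line for visibility.
cd "$(mktemp -d)" || exit 1
printf '%s\n' one two > foo

before=$(sed 'i\
--' foo | tr '\n' ' ')
after=$(sed 'a\
--' foo | tr '\n' ' ')
echo "$before"
echo "$after"
```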
These commands won't work in the C shell in the way described here. You have to use two \s for lines that already have one \, and one \ when there is none. The previous command will work this way in the C shell:
sed 'i\\                                Two \s here
\                                       and one here
' foo
This is an awkward form of usage and is not intuitive at all. The sed, awk and perl commands should be run in other shells.
Reading in a File (r) The r command lets you read in a file at a certain location of the file. This is how you can insert a form's details from an external file template.html after the <FORM> tag: use /<FORM>/ as the address, followed by the r command and the filename.
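Here is a minimal sketch of r. It assumes small made-up versions of form.html and template.html. The template's contents appear right after the line matching the address.

```shell
#!/bin/sh
# Sketch: r reads a file in after every line matching the address.
# Assumption: form.html and template.html are small made-up files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '<html>' '<FORM>' '</FORM>' '</html>' > form.html
printf '%s\n' '<INPUT NAME="q">' > template.html

out=$(sed '/<FORM>/r template.html' form.html | tr '\n' ' ')
echo "$out"
```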
The w (write) command writes the selected lines to a separate file. You can save the lines contained within the <FORM> and </FORM> tags in a separate file by using the address /<FORM>/,/<\/FORM>/ with the command w forms.html.
Every <FORM> tag in an HTML file has a corresponding </FORM> tag. The / in </FORM> needs escaping, as / is sed's pattern delimiter. The form contents are thus extracted and saved in forms.html. To go further, you can save all form segments from all HTML files in a single file:
sed '/<FORM>/,/<\/FORM>/w forms.html' *.html
sed's power doesn't stop here. Since it accepts more than one address, you can perform a full context splitting of its input. You can search for three patterns and store the matched lines in three separate files, all in one shot. Besides writing the selected lines to the separate files, sed also displays all of its input on the standard output. If you prefer silent behavior, use the -n option.
Tip
When there are numerous editing instructions to perform, use the -f option to take the instructions from a file. For the example above, you can use sed -f instr.fil *.html, where instr.fil contains the instructions in this format:
/<FORM>/,/<\/FORM>/w forms.html
/<FRAME>/,/<\/FRAME>/w frames.html
/<TABLE>/,/<\/TABLE>/w tables.html
You can specify some more instructions with the -e option on the command line and let sed take the rest from the file.
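The w command and an instruction file can be sketched on a made-up HTML file, page.html. The file name instr.fil follows the tip above. With -n, sed writes the selected ranges to their files and prints nothing itself.

```shell
#!/bin/sh
# Sketch: w writes selected line ranges to files; -f reads instructions
# from a file. Assumption: page.html is a made-up HTML file.
cd "$(mktemp -d)" || exit 1
cat > page.html <<'EOF'
<html>
<FORM>
<INPUT NAME="q">
</FORM>
<TABLE>
<TR><TD>1</TD></TR>
</TABLE>
</html>
EOF

# The / inside </FORM> is escaped because / delimits the pattern.
sed -n '/<FORM>/,/<\/FORM>/w forms.html' page.html
form_lines=$(grep -c '' forms.html)

cat > instr.fil <<'EOF'
/<FORM>/,/<\/FORM>/w forms.html
/<TABLE>/,/<\/TABLE>/w tables.html
EOF
sed -n -f instr.fil page.html       # writes both files, prints nothing
table_lines=$(grep -c '' tables.html)
echo "$form_lines $table_lines"
```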
15.11 Substitution
sed's strongest feature is undoubtedly substitution, achieved with its s (substitute) command. It lets you replace a pattern in its input with something else. You have encountered the syntax in vi before (4.16):
[address]s/expression1/string2/flag
Here, expression1 (which can also be a regular expression) is replaced by string2 in all lines specified by the [address]. Unlike in vi, however, if the address is not specified, the substitution is performed for all lines containing expression1. This is how you replace the | with a colon:
$ sed 's/|/:/' emp.lst | head -2
2233:charles harris  |g.m.     |sales     |12/12/52| 90000
9876:bill johnson    |director |production|03/12/50|130000
But notice what happened. Just the first (left-most) instance of the | in a line has been replaced. You need to use the g (global) flag to replace all the pipes, as in sed 's/|/:/g' emp.lst.
To verify the replacements, sed's output can be compared with the original file, as in sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l. (cmp's -l option produces a list of each unmatched character.) When we count these lines, the count effectively tells us that 75 pipes have been replaced. A count of 0 would mean that no substitution had been performed.
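The effect of the g flag and the cmp-based count can be sketched on emp.lst as listed in 15.1. The file has five pipes in each of its 15 lines. Because : and | are each one byte, every replaced pipe shows up as exactly one byte that cmp -l reports.

```shell
#!/bin/sh
# Sketch: s with and without g, and counting replacements with cmp -l.
# Assumption: emp.lst is recreated here as listed in 15.1.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2233|charles harris  |g.m.     |sales     |12/12/52| 90000
9876|bill johnson    |director |production|03/12/50|130000
5678|robert dylan    |d.g.m.   |marketing |04/19/43| 85000
2365|john woodcock   |director |personnel |05/11/47|120000
5423|barry wood      |chairman |admin     |08/30/56|160000
1006|gordon lightfoot|director |sales     |09/03/38|140000
6213|michael lennon  |g.m.     |accounts  |06/05/62|105000
1265|p.j. woodhouse  |manager  |sales     |09/12/63| 90000
4290|neil o'bryan    |executive|production|09/07/50| 65000
2476|jackie wodehouse|manager  |sales     |05/01/59|110000
6521|derryk o'brien  |director |marketing |09/26/45|125000
3212|bill wilcocks   |d.g.m.   |accounts  |12/12/55| 85000
3564|ronie trueman   |executive|personnel |07/06/47| 75000
2345|james wilcox    |g.m.     |marketing |03/12/45|110000
0110|julie truman    |g.m.     |marketing |12/31/40| 95000
EOF

first=$(sed 's/|/:/' emp.lst | head -1)     # only the left-most | replaced
every=$(sed 's/|/:/g' emp.lst | head -1)    # all five replaced
c_first=$(printf '%s' "$first" | tr -cd ':' | wc -c | tr -d ' ')
c_every=$(printf '%s' "$every" | tr -cd ':' | wc -c | tr -d ' ')
# cmp -l lists each differing byte; wc -l counts them.
changed=$(sed 's/|/:/g' emp.lst | cmp -l - emp.lst | wc -l | tr -d ' ')
echo "$first"
echo "$every"
echo "replacements: $changed"
```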
Performing Multiple Substitutions You can perform multiple substitutions with one invocation of sed. Simply press [Enter] at the end of each instruction, and then close the quote at the end:
$ sed 's/<I>/<EM>/g
> s/<B>/<STRONG>/g                 For csh, add a \ at the end of every line except the last
> s/<U>/<EM>/g' form.html
sed is a stream editor; it works on a data stream. This means that an instruction processes the output of the previous one. This is something users often forget, and so they don't get the sequence right. Note that the following sequence finally converts all <I> tags (along with any <EM> tags already present) to <STRONG>:
$ sed 's/<I>/<EM>/g
> s/<EM>/<STRONG>/g' form.html
When there is a group of instructions to execute, you should place them in a file instead, and then use sed with the -f option.
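Instruction order can be sketched on a made-up one-line form.html. As in the text's example, only opening tags are changed, so the closing tags stay as they are. The second run shows the second instruction acting on the first one's output.

```shell
#!/bin/sh
# Sketch: several substitutions in one sed run, applied in order.
# Assumption: form.html is a made-up one-line file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '<I>a</I> <B>b</B> <U>c</U>' > form.html

three=$(sed 's/<I>/<EM>/g
s/<B>/<STRONG>/g
s/<U>/<EM>/g' form.html)
# The second instruction sees the first one's output: <I> ends up as <STRONG>.
chained=$(sed 's/<I>/<EM>/g
s/<EM>/<STRONG>/g' form.html)
echo "$three"
echo "$chained"
```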
Note
When a g is used at the end of a substitution instruction, the change is performed globally along the line. Without it, only the left-most occurrence is substituted.
Compressing Multiple Spaces How do you delete the trailing spaces from the second, third and fourth fields of the employee database? The regular expression in the source string needs to signify zero or more occurrences of a space followed by a |, that is, ' *|', which is then replaced by a single | with the g flag.
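Here is a sketch of that substitution on two padded lines of emp.lst. The space before the salary survives because it follows a | rather than preceding one.

```shell
#!/bin/sh
# Sketch: ' *|' (zero or more spaces, then a |) replaced globally by |.
# Assumption: two lines of emp.lst with their padding spaces.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
2233|charles harris  |g.m.     |sales     |12/12/52| 90000
9876|bill johnson    |director |production|03/12/50|130000
EOF

squeezed=$(sed 's/ *|/|/g' emp.lst | head -1)
echo "$squeezed"
```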
The second form suggests that sed "remembers" the scanned pattern and stores it in // (two slashes). The //, representing an empty (or null) regular expression, is interpreted to mean that the search and substituted patterns are the same. We'll call it the remembered pattern.
However, when you use // in the target string, it means you are removing the pattern totally; for instance, s/|//g removes every | from a line.
The address /director/ in the third form appears to be redundant. However, you must understand this form too, because it widens the scope of substitution. You may want to replace a string in all lines containing a different string; the address and the source pattern then differ.
The &, known as the repeated pattern, expands to the entire source string. Apart from the numbered tag (15.12.2), the & is the only other special character you can use in the replacement string. All other characters are treated literally.
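The forms discussed above can be compared in one sketch on three lines of emp.lst. The replacement word member is hypothetical, chosen only for illustration. The three equivalent forms produce identical output. An address that differs from the source pattern narrows the change, an empty target deletes the pattern, and & reinserts the matched text.

```shell
#!/bin/sh
# Sketch: three equivalent substitutions, an empty target, and &.
# Assumptions: three lines of emp.lst; "member" is a hypothetical replacement.
cd "$(mktemp -d)" || exit 1
cat > emp.lst <<'EOF'
9876|bill johnson    |director |production|03/12/50|130000
5678|robert dylan    |d.g.m.   |marketing |04/19/43| 85000
6521|derryk o'brien  |director |marketing |09/26/45|125000
EOF

one=$(sed 's/director/member/' emp.lst)
two=$(sed '/director/s//member/' emp.lst)            # // reuses /director/
three=$(sed '/director/s/director/member/' emp.lst)
# Address and source pattern differ: only directors in marketing change.
mkt=$(sed -n '/marketing/s/director/member/p' emp.lst)
nopipes=$(sed 's/|//g' emp.lst | head -1)           # empty target removes every |
amp=$(sed -n 's/director/executive &/p' emp.lst | head -1)  # & is the matched text
echo "$mkt"
echo "$nopipes"
echo "$amp"
```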