Chapter 4 - Regular Expression

Unit 4: Regular Expressions, the grep Family of Commands and sed
Quite often a Unix user needs to search for one or more records in a database
or one or more lines in a text file. Such a search could be for finding or extracting
1. a file, by its filename, from among a large number of filenames,
2. a line containing a specific word or phrase in a document,
3. a record based on a certain data item such as designation or name,
4. a selected portion of the output of a program, and so on.
REGULAR EXPRESSIONS
The term regular expression comes from theoretical computer science. In its
simplest form, it is defined as a language for specifying patterns that match a
sequence of characters. These patterns are made up of one or more of the following.
1. Normal characters that match exactly the same character in the input.
2. Character classes that match any single character in the class.
3. Certain other special characters (metacharacters) that specify the way in which
parts of an expression are to be matched against the input.
Metacharacters and their Meaning
^ (the caret or circumflex character): This metacharacter is used to search for
and extract lines or records that begin with a specific pattern.
$ (the dollar character): This metacharacter is used to search for and extract lines
or records that end with a specific pattern.
. (the dot character): The dot is used to match any single character, except a
newline character.
* (the asterisk character): The asterisk stands for zero or more occurrences of the
preceding character, so it is used to match a repeated character. For example,
ab*c matches ac, abc, abbc and so on.
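The four metacharacters above can be tried out with grep. The following is a minimal sketch, assuming a small sample file whose contents are made up purely for illustration; the variable names are likewise hypothetical.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'unix is fun' 'learn unix' 'cat' 'cot' 'ct' 'coot' > "$sample"

begins=$(grep '^unix' "$sample")   # lines that begin with "unix"
ends=$(grep 'unix$' "$sample")     # lines that end with "unix"
dot=$(grep '^c.t$' "$sample")      # c, exactly one character of any kind, t
star=$(grep '^co*t$' "$sample")    # c, zero or more o characters, t

printf '%s\n' "$begins" "$ends" "$dot" "$star"
rm -f "$sample"
```

Note that the dot requires exactly one character (so ct is not matched by c.t), while o* also accepts no o at all (so ct is matched by co*t).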
Character Class
There are situations when it is necessary to match a character from within a set of
characters. In Unix this set of characters, out of which only one character is
matched, is referred to as a character class. The set of characters is written
within a pair of square brackets, [ and ]. For example, if the user wants
to extract all lines that contain (anywhere on the line) a pattern that begins with
chap and ends with any one of the digits 1, 2, 3 or 4, then the search pattern will be
chap[1234]. The same search pattern can also be written as chap[1-4]. The hyphen (-)
indicates a range of characters in the set. Here [1-4] means any one of the
characters in the set {1,2,3,4}.
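As a small sketch of the chap[1234] example, assuming an illustrative sample file (its contents and the variable names are hypothetical), both spellings of the class select the same lines:

```shell
# Sample file with made-up contents, for illustration only.
sample=$(mktemp)
printf '%s\n' 'see chap1 first' 'chap4 is short' 'chap5 is long' 'chapter notes' > "$sample"

listed=$(grep 'chap[1234]' "$sample")   # class written out member by member
ranged=$(grep 'chap[1-4]' "$sample")    # the same class written as a range

printf '%s\n' "$listed"
rm -f "$sample"
```

The lines with chap5 and chapter are skipped because the character after chap is not in the class.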
Searching for Patterns Having Metacharacters
Sometimes it is necessary to search for and extract lines that themselves contain
metacharacters. This can be done by de-specializing the metacharacters that appear
in the search pattern. The metacharacter \ (backslash) is used to de-specialize, or
remove the special meaning associated with, the character that immediately follows
it. For example, to search for and extract all lines that contain the $ character,
the regular expression has to be '\$'. The single quotes keep the shell from
interpreting the backslash or the $ before grep sees them.
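The effect of the backslash can be sketched as follows, assuming a made-up sample file; the escaped dot is included only as a second illustration of the same rule.

```shell
# Sample file with made-up contents, for illustration only.
sample=$(mktemp)
printf '%s\n' 'price: $5' 'no currency here' 'total is 3.14' 'total is 3x14' > "$sample"

dollar=$(grep '\$' "$sample")     # lines containing a literal $ character
dotlit=$(grep '3\.14' "$sample")  # \. matches only a literal dot
dotany=$(grep '3.14' "$sample")   # unescaped dot matches any single character

printf '%s\n' "$dollar" "$dotlit"
rm -f "$sample"
```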
THE grep FAMILY
This family consists of three commands—grep, egrep (extended grep) and fgrep
(fixed grep).
The grep Command
This command is used to search for, select and print specified records or lines from
an input file. The name grep comes from the ed editor command g/re/p, which stands
for globally search for a regular expression and print.
$ grep [options] pattern [filename1] [filename2] …
grep Options: grep has a number of options, such as the inverse option -v, the ignore
option -i, the filename option -l, the line number option -n and the count option -c.
The inverse option: -v Generally grep searches for lines or records containing a
pattern, and prints them out. With this option it prints the lines that do not
contain the pattern instead.
The ignore option: -i Normally, grep distinguishes between uppercase and lowercase
letters. This option (ignore case) searches for the pattern without considering
case.
The filename option: -l When this option is used, only the names of the files in
which the required pattern is present are printed.
The count option: -c This option prints, for each file given as an argument, the
number of lines or records that contain the pattern.
The line number option: -n This option prints each selected line or record
preceded by its line number.
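The options listed above can be sketched on two small files. The file names a.txt and b.txt, their contents and the variable names are assumptions made for this illustration only.

```shell
# Two sample files in a scratch directory (hypothetical names and contents).
dir=$(mktemp -d)
printf '%s\n' 'Unix shell' 'unix tools' 'windows' > "$dir/a.txt"
printf '%s\n' 'macos' 'solaris' > "$dir/b.txt"

inverse=$(grep -v 'unix' "$dir/a.txt")            # lines NOT containing unix
nocase=$(grep -i 'unix' "$dir/a.txt")             # match regardless of case
count=$(grep -c 'unix' "$dir/a.txt")              # number of matching lines
numbered=$(grep -n 'unix' "$dir/a.txt")           # matching lines with line numbers
files=$(cd "$dir" && grep -l 'unix' a.txt b.txt)  # names of files with a match

printf '%s\n' "$inverse" "$nocase" "$count" "$numbered" "$files"
rm -rf "$dir"
```

Because grep is case sensitive by default, the line Unix shell counts as a non-match for the pattern unix until -i is given.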
THE egrep COMMAND
egrep stands for extended grep. It is so called because it has two additional
metacharacters: the plus (+) character, which matches one or more occurrences of
the preceding character, and the question mark (?) character, which matches zero
or one occurrence of the preceding character. This command is the most powerful
member of the grep command family. Its foremost advantage is that multiple
search patterns can be handled very easily: the pipe (|) character is used to
separate alternative patterns.
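The extended metacharacters can be sketched as below. The example uses grep -E, which is the POSIX spelling of egrep and accepts the same patterns; the sample contents and variable names are made up for illustration.

```shell
# Sample file with made-up contents, for illustration only.
sample=$(mktemp)
printf '%s\n' 'ct' 'cat' 'caat' 'color' 'colour' 'dog' > "$sample"

plus=$(grep -E '^ca+t$' "$sample")      # one or more a characters
quest=$(grep -E '^colou?r$' "$sample")  # the u is optional (zero or one)
alt=$(grep -E 'cat|dog' "$sample")      # either of two alternative patterns

printf '%s\n' "$plus" "$quest" "$alt"
rm -f "$sample"
```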
THE fgrep COMMAND
fgrep stands for fixed grep or fixed character grep. This command uses only
fixed-character patterns. In other words, it does not allow the use of regular
expressions; every character in the pattern is taken literally. Because this command
works only with fixed patterns and does not have to interpret any regular
expression, it is traditionally regarded as the fastest of the pattern-searching
programs, which makes it suitable for searching large files. An important
feature of this command is that, like egrep, it also accepts multiple
search patterns.
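A short sketch of fixed-string matching follows. It uses grep -F, the POSIX spelling of fgrep, and supplies multiple patterns with repeated -e options, which is one of the ways to pass several patterns; the sample contents and variable names are hypothetical.

```shell
# Sample file with made-up contents, for illustration only.
sample=$(mktemp)
printf '%s\n' 'a.b' 'axb' 'cost is $5' 'plain' > "$sample"

fixed=$(grep -F 'a.b' "$sample")               # the dot is an ordinary character
regex=$(grep 'a.b' "$sample")                  # the dot matches any character
multi=$(grep -F -e 'a.b' -e '$5' "$sample")    # two fixed patterns at once

printf '%s\n' "$fixed" "$regex" "$multi"
rm -f "$sample"
```

With -F neither the dot nor the $ needs a backslash, since no character has a special meaning.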
THE STREAM EDITOR—sed
sed is an acronym for stream editor. It is an extremely powerful editor with which
one can make quick and easy changes to the contents of a file without opening it in
an interactive editor such as vi or emacs.
The general format of a sed command is as follows.
$ sed options 'address_actionlist' filelist
Here the action part of the address_actionlist specifies the action or actions to be
taken, and the address part identifies the line (record) or lines (records) on which
these actions are to be taken. The filelist holds zero or more filenames from which
lines are picked up one by one, processed and sent to the standard output, normally
the monitor. When no filename is given, sed reads the standard input.
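The parts of the general format can be sketched as follows, assuming a four-line sample file with made-up contents. The address may be a line number, a range of line numbers or a pattern between slashes.

```shell
# Sample file with made-up contents, for illustration only.
sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' > "$sample"

by_range=$(sed '2,3d' "$sample")       # address 2,3 (a range), action d
by_pattern=$(sed '/four/d' "$sample")  # address /four/ (a pattern), action d
with_opt=$(sed -n '2p' "$sample")      # option -n, address 2, action p

printf '%s\n' "$by_range" "$by_pattern" "$with_opt"
rm -f "$sample"
```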
Operational Mechanism of sed
sed reads in one line at a time, holds it in a memory area called the pattern space
and acts on it as specified in the sed command. It then reads in the next line, acts
on it in the same manner, and so on. By default, all the processed lines are sent to
the standard output, normally the monitor. sed's operational mechanism is shown in
Fig. 6.3. This processing does not affect the original contents of the file in any
way. If required, the processed output can be written to a separate file.
As shown in Fig. 6.2, every line/record read from the input file is held in the
pattern space and all the commands are applied to it, one by one. Because sed reads
in and works on one line at a time, one can alter very large files without invoking
an editor or worrying about memory or disk-space requirements.
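That the input file is left untouched, and that the processed lines can be captured in a separate file, can be checked with a sketch like the following. The file contents and variable names are assumptions for illustration; shell redirection is used here to save the output.

```shell
# Source and destination files (hypothetical contents, for illustration only).
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' 'alpha' 'beta' > "$src"

# Each line passes through the pattern space; results go to standard output,
# which is redirected into a separate file.
sed 's/alpha/ALPHA/' "$src" > "$dst"

original=$(cat "$src")    # still the unmodified input
processed=$(cat "$dst")   # the edited copy

printf '%s\n' "$original" "$processed"
rm -f "$src" "$dst"
```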
The q Command (Quitting sed): When this command is used, all the lines up to
and including the addressed line of the input file are processed, and sed
then quits.
The d Command (Deleting Lines): Unnecessary lines or records can be deleted
by using the delete command d.
The p Command and the -n Option (Printing Lines): Required lines or
records can be printed by using the p command. Because sed already prints every
line by default, p is normally combined with the -n option, which suppresses the
default output so that only the selected lines appear.
The s Command (Substitution): This is one of the most widely used commands.
Substitutions are made using the s command, which replaces text matching a
pattern with a replacement string.
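The q, d, p and s commands described so far can be sketched on a small file. The sample contents and variable names are made up for illustration.

```shell
# Sample file with made-up contents, for illustration only.
sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' > "$sample"

quit=$(sed '2q' "$sample")          # print up to and including line 2, then quit
deleted=$(sed '/two/d' "$sample")   # delete every line containing "two"
printed=$(sed -n '3p' "$sample")    # -n suppresses default output; p prints line 3
replaced=$(sed 's/o/0/' "$sample")  # replace the first o on each line with 0

printf '%s\n' "$quit" "$deleted" "$printed" "$replaced"
rm -f "$sample"
```

Without -n, the p command would print line 3 twice: once by default and once because of p.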
The a Command (Appending): One or more lines of text can be appended after
the addressed line or record by using the append command a.
The i Command (Inserting Text): Using this command, one can insert
text before the addressed line of the input.
The c Command (Changing Text): Using this command, one can replace one
or more lines or records of the input with new text.
The w Command (Writing Files): One can write the selected lines to a
separate file by using the write command w.
The r Command (Reading a File): The contents of a given file can be read in
and placed after the addressed line by using the read command r.
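The remaining commands can be sketched as below. The file names in.txt, extra.txt and out.txt, their contents and the variable names are assumptions made for this illustration. The a, i and c commands are written in the portable form, with a backslash and the new text on the following line.

```shell
# Scratch directory with hypothetical file names and contents.
dir=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' > "$dir/in.txt"
printf '%s\n' 'from extra file' > "$dir/extra.txt"

# a: append text after line 2
appended=$(sed '2a\
after two' "$dir/in.txt")

# i: insert text before line 1
inserted=$(sed '1i\
before one' "$dir/in.txt")

# c: change (replace) line 2
changed=$(sed '2c\
TWO' "$dir/in.txt")

# w: write lines containing "t" to out.txt; -n keeps them off standard output
sed -n "/t/w $dir/out.txt" "$dir/in.txt"
written=$(cat "$dir/out.txt")

# r: read extra.txt and place its contents after line 2
merged=$(sed "2r $dir/extra.txt" "$dir/in.txt")

printf '%s\n' "$appended" "$inserted" "$changed" "$written" "$merged"
rm -rf "$dir"
```

In every case in.txt itself is unchanged; only the output stream, or the file named after w, receives the result.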
