
Unix Tools - Topic 6

awk - Report Generator and Data Filter

Objectives

§ Understand the uses of awk and its commands, and be able to read scripts.
§ Be able to use awk from the command line and from an awk script.
§ Be able to use the most commonly used features including:
• patterns, actions and operators;
• BEGIN and END patterns;
• programming control structures;
• built-in variables;
• associative arrays.

Monash University ©Ronald Pose 2006 CSE2391 / CSE3391

awk - Report Generator and Data Filter

§ awk takes its name from its original authors: Aho, Weinberger and Kernighan, and dates back to the late 1970s.
§ There are many newer versions, including nawk and GNU gawk. We will only consider the older awk.
§ awk typically produces an output stream, either from input file(s) or from standard input, ie as a filter.
§ The awk program determines how the input is modified.
§ Input lines are called records.
§ Records can contain one or more fields.
§ Fields are separated by field separators: spaces and tabs by default.
§ Actions are written in a modified C syntax, providing a proper but restricted programming language.
§ All variables are stored as strings.

awk Programs

§ Programs have the general form:
    pattern { action }
• If pattern matches current line, action is applied to line.
• If pattern omitted, action applies to each line.
§ Following prints lines from file starting with the pattern hello:
    awk '/^hello/ { print }' file
§ Default action is print line. Above is same as:
    awk '/^hello/' file
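
A minimal sketch of the pattern { action } form, run from the shell. The three input lines are made up for illustration. Both commands should select the same record, since print is the default action.

```sh
# Hypothetical three-line input; only the first line starts with "hello".
input='hello world
say hello
goodbye'

# Pattern with an explicit action.
with_action=$(printf '%s\n' "$input" | awk '/^hello/ { print }')
# Pattern only: the default action prints the matching record.
pattern_only=$(printf '%s\n' "$input" | awk '/^hello/')

printf '%s\n' "$with_action" "$pattern_only"
```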

Built-in Variables

§ Variables are available for access to field, record, file and control information:
    RS          input record separator - "\n" by default
    NR          record number
    FS          field separator - " \t" by default
    NF          number of fields in current record
    $0          current line
    $1          first field, $2 second field, etc
    $i          ith field, eg $NF is last field
    OFS         output field separator - space by default
    ORS         output record separator - "\n" by default
    FILENAME    name of current file

Built-in Variables (cont)

§ Following prints the third field of records starting with hello and prints Some fields and the first and last fields for those records containing there:
    awk '/^hello/ { print $3 }
    /there/ { print "Some fields", $1, $NF }'
• For records containing there, the , character between each argument forces a space (ie OFS) to be printed between each.
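
The built-in variables can be seen at work in a short sketch. The input records are invented for illustration. The second command assumes only that assigning OFS in BEGIN changes what the comma in print emits.

```sh
# Hypothetical input: two records with different numbers of fields.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$out"

# Changing OFS changes the separator that the comma in print produces.
out_ofs=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print $1, $NF }')
printf '%s\n' "$out_ofs"
```

The first command prints the record number, the field count, and the first and last fields of each record.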


Command Options

§ Above program, ie patterns and actions, can be placed in an awk script. Eg hellothere contains:
    /^hello/ { print $3 }
    /there/ { print "Last field", $NF }
• Specify script by:
    awk -f hellothere
§ From here on, examples are given as scripts.
§ Can change field separator FS to say : character by:
    awk -F: '{ print $3 }'
§ Comments follow # character, but not in a string like "Not # here".
§ Can continue code on the next line by ending current line at an appropriate place with \ character.

Operators

§ Patterns and actions can contain operators:
    + - * / ++ -- %               arithmetic
    = += -= *= /= %=              assignment
    < <= > >= == != && || !       relational and logical
    ~ !~                          match (contains), not match (does not contain)
§ ~ operator is true if first operand matches (or contains) the second operand which is a regular expression.
§ !~ operator is true if first operand does not match (or does not contain) the second operand which is a regular expression.
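
As a rough illustration of -F together with the ~ and !~ operators, the following sketch filters some invented colon-separated records (name, role, city). The data and the shell variable names are assumptions made for the example.

```sh
# Hypothetical colon-separated records: name, role, city.
data='ann:admin:Perth
bob:user:Sydney
cat:user:Perth'

# -F: sets FS to ":"; ~ keeps records whose 3rd field starts with P.
matched=$(printf '%s\n' "$data" | awk -F: '$3 ~ /^P/ { print $1 }')
# !~ keeps records whose 2nd field does not contain "user".
unmatched=$(printf '%s\n' "$data" | awk -F: '$2 !~ /user/ { print $1 }')

printf '%s\n' "$matched" "$unmatched"
```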

Operators (cont)

§ Patterns can be regular expressions as in previous examples. Or can be ranges such as:
    /start/,/finish/ { print }
§ Can have expressions such as printing records 11 to 20 and every record that is multiple of 3 thereafter but indented:
    NR > 10 && NR <= 20 { print NR ":" $0 }
    NR > 20 && NR%3 == 0 { print "\t" $0 }
• Above also illustrates concatenation of strings. Record number is concatenated (joined) with : and the record prior to output.
§ Can use ~ operator. Eg print records whose 3rd field ends in ing:
    $3 ~ /ing$/ { print "3rd field", $3 }
§ And !~ operator. Eg print records whose 3rd field is not contained in the 1st field:
    NF >= 3 && $1 !~ $3 { print }

BEGIN and END Patterns

§ Two special patterns are available to enable action before reading the first record and after reading the last record.
§ Following begin.awk script initializes field separator to be space and : (ie a set of delimiters) and prints a heading:
    BEGIN {
        FS = "[ :]"             # bracket expression: space or :
        print "Stud ID\tMark"
    }
    { print $1 " \t" $2 }
§ Following end.awk script illustrates adding the values in the 2nd field and printing the total and average on completion:
    { total += $2 }
    END {
        print "Total: " total "\nAverage: " total/NR
    }
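
A small sketch combining a BEGIN heading, a per-record action and an END summary in one program. The student IDs and marks are invented for illustration.

```sh
# Hypothetical marks file: student ID and mark, space separated.
marks='1001 70
1002 85
1003 55'

out=$(printf '%s\n' "$marks" | awk '
BEGIN { print "Stud ID\tMark" }              # runs before the first record
      { print $1 "\t" $2; total += $2 }      # runs for every record
END   { print "Total: " total; print "Average: " total/NR }')

printf '%s\n' "$out"
```

In the END action NR still holds the number of records read, so total/NR is the average mark.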

Actions

§ Syntax for awk actions is similar to C, with a few exceptions.
§ Newline at end of line signifies end of statement.
• Use \ at end of action line to continue on next line.
• Can have more than one statement on a line. Use ; to separate.
§ All variables are strings.
• No declarations of variables.
• First use defines variable, like in sh.
• Uninitialized value of "" which is interpreted as 0 for arithmetic.
§ Associative arrays available (see later).
§ ~ and !~ operators available for matching in expressions.
§ Control structures include:
    if-else, for, while, do-while, break, continue.
§ Also next command: finishes with current record, starts with next record at first pattern.

Actions (cont)

§ Consider operating on a file week12 containing the hours worked daily by casual employees in the previous (12th) full week of the year. A summary of hours is required. week12 contains sets of : separated records with name, date, hours and minutes and is sorted by name. Eg:
    Barney Rubble:21/3/2004:4:30
    Barney Rubble:22/3/2004:7:30
    Barney Rubble:23/3/2004:4:15
    Barney Rubble:25/3/2004:4:30
    Barney Rubble:27/3/2004:7:0
    Fred Flintstone:21/3/2004:7:30
    Fred Flintstone:23/3/2004:4:45
    Fred Flintstone:24/3/2004:8:15
    Fred Flintstone:25/3/2004:4:0
    Fred Flintstone:26/3/2004:7:35
    Fred Flintstone:27/3/2004:7:30
Actions (cont)

§ The weekly.awk script produces a heading and a line for each employee containing name and total hours and minutes worked. It is run by the following, which also saves output to week12Report:
    awk -f weekly.awk week12 > week12Report

Actions (cont)

    BEGIN { FS = ":" }
    NR == 1 { print FILENAME }          # heading; FILENAME is not set yet in BEGIN
    $1 != name {                        # new name
        if (NR != 1)                    # not first line
            print name, hours ":" minutes
        name = $1; hours = $3; minutes = $4
        next                            # record counted; skip the rule below
    }
    $1 == name {                        # same name - totals
        hours += $3; minutes += $4
        if (minutes >= 60) {
            hours++; minutes -= 60
        }
    }
    # last person's totals
    END { print name, hours ":" minutes }
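
The control-break idea behind the weekly summary can be tried on a cut-down input. This is a sketch under assumptions: the four records are a subset in the week12 format, the heading is omitted, and next is used so that the first record for each name is not also added by the accumulating rule.

```sh
# Hypothetical subset of week12: name:date:hours:minutes, sorted by name.
week12='Barney Rubble:21/3/2004:4:30
Barney Rubble:22/3/2004:7:30
Fred Flintstone:21/3/2004:7:30
Fred Flintstone:23/3/2004:4:45'

out=$(printf '%s\n' "$week12" | awk -F: '
$1 != name {                       # first record for a new name
    if (NR != 1) print name, hours ":" minutes
    name = $1; hours = $3; minutes = $4
    next                           # skip the accumulating rule below
}
{                                  # further records for the same name
    hours += $3; minutes += $4
    if (minutes >= 60) { hours++; minutes -= 60 }
}
END { print name, hours ":" minutes }')

printf '%s\n' "$out"
```

Each total is printed only when the name changes, or in END for the last person, which is why the input must be sorted by name.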

Associative Arrays

§ Most programming languages index arrays by integer values. awk has no integer type since all variables are strings. Hence its 1-dimensional arrays are indexed by strings.
• Indexing by a string requires a look-up of the string index (as opposed to addition of offset times integer index).
• The look-up process associates a string index with a particular array element. Hence called associative array.
• Can be thought of as keeping a list of array_name.string_index variables.

Associative Array Examples

§ Consider file that contains space separated names and internal phone numbers. It is appended to each time a person changes phone number. Eg:
    Carol Wright 3873
    Brad 1432
    Carol Wright 2832
    Goh Bee Leng 3241

Associative Array Examples (cont)

§ The following phone.awk script produces a current phone list:
    BEGIN { print "Phone List" }
    {
        name = ""
        for (i=1; i<NF; i++)
            name = name " " $i
        phone[name] = $NF
    }
    END {
        for (name in phone)
            print name, phone[name]
    }

Associative Array Examples (cont)

§ Notice that the name strings indexing into array phone are not known in advance and only a single pass of input file is possible.
§ for (name in phone) iterates through each different index string used, enabling access without the programmer remembering these.
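
A sketch of the phone list idea on the sample data. Two details here are assumptions rather than part of the slides: the name is built starting from $1 to avoid a leading space, and the output is piped through sort because the order in which for (name in phone) visits indexes is not defined.

```sh
# Sample records: one or more name words followed by a phone number.
phones='Carol Wright 3873
Brad 1432
Carol Wright 2832
Goh Bee Leng 3241'

out=$(printf '%s\n' "$phones" | awk '
{
    name = $1
    for (i = 2; i < NF; i++)       # join any remaining name words
        name = name " " $i
    phone[name] = $NF              # a later record overwrites an earlier one
}
END {
    for (name in phone)
        print name, phone[name]
}' | sort)

printf '%s\n' "$out"
```

Carol Wright appears once, with the number from her most recent record.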


Associative Array Examples (cont)

§ Consider file that contains names and internal phone numbers and room numbers. It is appended to each time a person changes rooms or phone number. The following illustrates the four cases that can occur:
    Carol Wright 3873 3.24
    Brad 1.31
    Carol Wright 2832
    Goh Bee Leng 2.34 3241
So:
• the file is space separated;
• names can be one or more words;
• room numbers contain . character; and
• phone numbers have four digits.
§ The following list.awk produces a current list and allows for missing room and phone numbers:

Associative Array Examples (cont)

    BEGIN { print "\tStaff List" }
    {
        name = ""
        for (i=1; i<=NF; i++)
            if ($i ~ /\./)
                room[name] = $i
            else if ($i ~ /^[0-9][0-9][0-9][0-9]$/)
                phone[name] = $i
            else
                name = name " " $i
    }
    # continued next slide …
Associative Array Examples (cont)

    END {
        for (name in room)
            if (phone[name] == "")
                phone[name] = "-"
        for (name in phone)
            if (room[name] == "")
                room[name] = "-"
        for (name in phone)
            printf "%-20s Rm %4s Ph %4s\n", name, \
                room[name], phone[name]
    }
§ Illustrates
• Continuation with \ character.
• printf command is available. Personal preference:
    printf("%-20s Rm %4s Ph %4s\n", name, \
        room[name], phone[name])

Built-in Functions

§ awk has a limited number of built-in functions, including string functions:
    length(str)             return length of str; length() of current record
    index(str1,str2)        return position of str2 in str1, or 0
    substr(str,pos,len)     substring of str starting at pos of length len
    substr(str,pos)         substring of str starting at pos
    split(str,arr,delim)    count of split str into subfields stored in arr delimited by delim
    match(str,regexp)       return position of pattern regexp in str
    sprintf(fmt,list)       return string formatted by fmt for values in list
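
The string functions can be checked quickly from a BEGIN action, which needs no input file. The strings used are arbitrary examples. Note that index takes the string to search first and the string to look for second.

```sh
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # number of characters in s
    print index(s, "world")          # position of "world" within s
    print substr(s, 1, 5)            # 5 characters starting at position 1
    print substr(s, 7)               # from position 7 to the end
    n = split("842,84.20,44.00", parts, ",")
    print n, parts[1], parts[3]      # count, then first and last pieces
}')

printf '%s\n' "$out"
```

Positions count from 1, so index returns 0 only when the second string does not occur.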

Built-in Functions (cont)

§ awk has a limited number of built-in functions, including mathematical functions:
    int(number)     integer part of number
    cos(x)          cosine of x radians
    sin(x)          sine of x radians
    log(x)          natural log of x
    exp(x)          e to the power x
    sqrt(x)         square root of x

Built-in Function Examples

§ Consider a file that has long lines. Any line longer than 40 characters is to be continued on the next line, with the 41st and 42nd characters output as " \" to indicate continuation.
cont.awk performs task:
    { line = $0
      while (length(line) > 40) {
          head = substr(line, 1, 40)
          print head " \\"
          line = substr(line, 41)
      }
      print line
    }
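
A sketch of the line-continuation technique applied to a single made-up 52-character line. The first 40 characters should come out followed by " \", and the remaining 12 on the next line.

```sh
# Hypothetical 52-character input line (a-z followed by A-Z).
long='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

out=$(printf '%s\n' "$long" | awk '
{
    line = $0
    while (length(line) > 40) {
        print substr(line, 1, 40) " \\"    # 40 characters plus the marker
        line = substr(line, 41)            # keep what is left over
    }
    print line
}')

printf '%s\n' "$out"
```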
Built-in Function Examples (cont)

§ Smarter approach would be to break line at a space at or before column 40 (omitting the extra space):
    { line = $0
      while (length(line) > 40) {
          for (i=40; i>=1; i--)
              if (substr(line, i, 1) == " ")
                  break
          if (i == 0)
              i = 40
          head = substr(line, 1, i)
          print head "\\"
          line = substr(line, i+1)
      }
      print line
    }

Built-in Function Examples (cont)

§ Consider a file that has three : separated fields:
• 1st field is a description of a task.
• 2nd field contains comma separated subfields of prices.
• 3rd field is date or period when the task is scheduled.
Eg:
    Back hoe trenches and refill:842,84.20,44.00:20-22 Mar
    Supply and spread 20m loam:475,47.50,550,75.50:Fri 31 Mar
§ Following subfields.awk script will replace costs with their total:


Built-in Function Examples (cont)

    BEGIN { FS = ":"; OFS = ":" }
    {
        total = 0
        field = $2
        comma = index(field, ",")
        while (comma > 0) {
            total += substr(field, 1, comma-1)
            field = substr(field, comma+1)
            comma = index(field, ",")
        }
        total += field
        print $1, total, $3
    }

Built-in Function Examples (cont)

§ Easier to use split as in split.awk with printf to force output of cents:
    BEGIN { FS = ":" }
    { total = 0
      count = split($2, parts, ",")
      for (i=1; i<=count; i++)
          total += parts[i]
      printf("%s:%.2f:%s\n", $1, total, $3)
    }
• split breaks into separate strings removing the delimiter.
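
The split approach can be run against the two sample task records as a quick check. This is a sketch: -F: on the command line stands in for the BEGIN assignment of FS, and the shell variable names are assumptions.

```sh
# The two sample records: description:prices:schedule.
tasks='Back hoe trenches and refill:842,84.20,44.00:20-22 Mar
Supply and spread 20m loam:475,47.50,550,75.50:Fri 31 Mar'

out=$(printf '%s\n' "$tasks" | awk -F: '
{
    total = 0
    count = split($2, parts, ",")      # parts[1..count] hold the prices
    for (i = 1; i <= count; i++)
        total += parts[i]
    printf("%s:%.2f:%s\n", $1, total, $3)
}')

printf '%s\n' "$out"
```

The %.2f format always shows two decimal places, so a whole-dollar total still prints its cents.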

nawk and gawk

§ nawk and gawk have many more features including:
    nextfile            start processing next input file
    sub(rexp,s,str)     substitute s for rexp in str
    gsub(rexp,s,str)    globally substitute s for rexp in str
    tolower(str)        return lower case str
    toupper(str)        return upper case str
    rand()              return random number x where 0 <= x < 1
    srand(exp)          seed random number generator
    atan2(y,x)          arctangent in radians of y/x

nawk and gawk (cont)

§ Can also define functions:
    function name(arg1,...)
    { statements; return x }
• Parameters, arg1 etc, are local variables.
• All other variables are global.
§ Can execute commands outside. Eg:
    system("date >today")
§ Can read from files. Eg:
    getline dayandtime < "today"
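
A brief sketch of a user-defined function together with toupper and gsub. It assumes the installed awk is nawk, gawk or another awk with these newer features. The function name capitalise and the input names are made up for the example.

```sh
out=$(printf 'fred flintstone\nbarney rubble\n' | awk '
# Hypothetical helper: upper-case the first letter of a word.
function capitalise(word) {
    return toupper(substr(word, 1, 1)) substr(word, 2)
}
{
    line = capitalise($1) " " capitalise($2)
    gsub(/e/, "E", line)         # replace every e in line
    print line
}')

printf '%s\n' "$out"
```

The parameter word is local to capitalise, while line is an ordinary global variable.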


nawk and gawk (cont)


§ Can output to files and programs via pipe. Eg:
/find these/ { print >> "found" }
/and these/ { print $2 >> "found" }
/different/ { print $1, $2, $3 > "diffs" }
/out/ {
for (i=1; i<=NF; i++)
print $i | "sort -u"
}
§ Possible to have multiple "subscripts" to arrays.
§ Much more than above to nawk and gawk.
§ Have illustrated the power of the awk family of programs.

§ Want more power? Use perl or python or even C, etc! :-)
