Awk - Report Generator and Data Filter: Unix Tools - Topic 6 Objectives
Monash University ©Ronald Pose 2006 CSE2391 / CSE3391
Built-in Variables
§ Variables are available for access to field, record, file and control information:
    RS        input record separator - "\n" by default
    NR        record number
    FS        field separator - " " by default (runs of spaces and tabs separate fields)
    NF        number of fields in current record
    $0        current record (the whole line by default)
    $1        first field, $2 second field, etc
    $i        ith field, eg $NF is last field
    OFS       output field separator - space by default
    ORS       output record separator - "\n" by default
    FILENAME  name of current file

Built-in Variables (cont)
§ Following prints the third field of records starting with hello, and prints Some fields followed by the first and last fields for those records containing there:
    awk '/^hello/ { print $3 }
    /there/ { print "Some fields", $1, $NF }'
• For records containing there, the , character between the arguments causes the output field separator (OFS, a space by default) to be printed between them.
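As a rough illustration of the variables listed above, the following sketch prints the record number, the field count, and the first and last fields of each record, with OFS changed to a dash. The helper name show_fields and the sample input are made up for this example.

```shell
# show_fields is a hypothetical helper name; it reads records from stdin.
show_fields() {
    awk 'BEGIN { OFS = "-" }          # printed wherever print has a comma
         { print NR, NF, $1, $NF }    # record number, field count, first and last field'
}

printf 'hello there world\none two\n' | show_fields
```

Each comma in the print statement is replaced by the current value of OFS, so changing OFS in BEGIN changes the separator for every output line.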
Operators (cont)
§ Patterns can be regular expressions as in previous examples. Or they can be ranges such as:
    /start/,/finish/ { print }
§ Can have expressions, such as printing records 11 to 20 and every record whose number is a multiple of 3 thereafter, but indented:
    NR > 10 && NR <= 20 { print NR ":" $0 }
    NR > 20 && NR%3 == 0 { print "\t" $0 }
• Above also illustrates concatenation of strings. The record number is concatenated (joined) with : and the record prior to output.
§ Can use ~ operator. Eg print records whose 3rd field ends in ing:
    $3 ~ /ing$/ { print "3rd field", $3 }
§ And !~ operator. Eg print records whose 3rd field is not contained in the 1st field (the 3rd field is used as a regular expression):
    NF >= 3 && $1 !~ $3 { print }

BEGIN and END Patterns
§ Two special patterns are available to enable action before reading the first record and after reading the last record.
§ Following begin.awk script initializes the field separator to be space or : (ie a set of delimiters, written as a bracket expression because a multi-character FS is treated as a regular expression) and prints a heading:
    BEGIN {
        FS = "[ :]"
        print "Stud ID\tMark"
    }
    { print $1 " \t" $2 }
§ Following end.awk script illustrates adding the values in the 2nd field and printing the total and average on completion:
    { total += $2 }
    END {
        print "Total: " total "\nAverage: " total/NR
    }
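The range pattern and the string concatenation described under Operators can be tried on a few lines of input. This is only a sketch: the helper name between_markers and the sample lines are invented for illustration.

```shell
# between_markers is a hypothetical helper name; it reads records from stdin.
between_markers() {
    # The range starts at a record matching /start/ and ends, inclusively,
    # at the next record matching /finish/.
    awk '/start/,/finish/ { print NR ":" $0 }'
}

printf 'a\nstart\nb\nfinish\nc\n' | between_markers
```

Records outside the range are skipped, and NR still counts every record read, so the numbers printed are the original line numbers.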
Actions (cont)
§ The weekly.awk script produces a heading and a line for each employee containing the name and the total hours and minutes worked. It is run by the following, which also saves the output to week12Report:
    awk -f weekly.awk week12 > week12Report
§ The weekly.awk script:
    BEGIN { FS = ":" }
    NR == 1 { print FILENAME }          # heading; FILENAME is not yet set in BEGIN
    $1 != name {                        # new name
        if (NR != 1)                    # not first line
            print name, hours ":" minutes
        name = $1; hours = 0; minutes = 0   # the next rule adds this record
    }
    $1 == name {                        # same name - totals
        hours += $3; minutes += $4
        if (minutes >= 60) {
            hours++; minutes -= 60
        }
    }
    # last person's totals
    END { print name, hours ":" minutes }
Associative Array Examples (cont)
§ The following phone.awk script produces a current phone list:
    BEGIN { print "Phone List" }
    {
        name = ""
        for (i=1; i<NF; i++)
            name = name " " $i
        phone[name] = $NF
    }
    END {
        for (name in phone)
            print name, phone[name]
    }
§ Notice that the name strings indexing into the array phone are not known in advance, and that only a single pass over the input file is possible.
§ for (name in phone) iterates through each different index string used, enabling access without the programmer having to remember them.
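The phone list idea can be tried with a few made-up records. The sketch below is a variant rather than the script above: it starts the name from $1 so that no leading space is stored, and it pipes the output through sort because the order in which for (name in phone) visits indices is not specified. The helper name phone_list and the sample data are assumptions.

```shell
# phone_list is a hypothetical helper name; input lines are "name words... number".
phone_list() {
    awk '{
             name = $1
             for (i = 2; i < NF; i++)      # join the remaining name words
                 name = name " " $i
             phone[name] = $NF             # a later record overwrites an earlier one
         }
         END { for (name in phone) print name, phone[name] }' | sort
}

printf 'Ann Lee 1234\nBob 5678\nAnn Lee 4321\n' | phone_list
```

Because the array is indexed by the name string, the second record for Ann Lee replaces the first, which is what makes the list "current".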
Associative Array Examples (cont)
    END {
        for (name in room)
            if ( phone[name] == "")
                phone[name] = "-"
        for (name in phone)
            if ( room[name] == "")
                room[name] = "-"
        for (name in phone)
            printf "%-20s Rm %4s Ph %4s\n", name, \
                room[name], phone[name]
    }
§ Illustrates
• Continuation with \ character.
• printf command is available. Personal preference:
    printf("%-20s Rm %4s Ph %4s\n", name, \
        room[name], phone[name])

Built-in Functions
§ awk has a limited number of built-in functions, including string functions:
    length(str)           return length of str; length() gives length of current record
    index(str1,str2)      return position of str2 in str1, or 0 if not found
    substr(str,pos,len)   substring of str starting at pos of length len
    substr(str,pos)       substring of str starting at pos
    split(str,arr,delim)  split str into subfields delimited by delim, stored in arr; return the count
    match(str,regexp)     return position of pattern regexp in str, or 0 if no match
    sprintf(fmt,list)     return string formatted by fmt for values in list
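A quick way to check the argument order and return values of the string functions is to call them from a BEGIN block, which needs no input file. This is a small sketch; the helper name string_demo and the sample strings are made up.

```shell
# string_demo is a hypothetical helper name; it takes no input.
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        print length(s)                     # number of characters in s
        print index(s, "world")             # position of "world" within s
        print substr(s, 1, 5)               # 5 characters starting at position 1
        print substr(s, 7)                  # from position 7 to the end
        n = split("842,84.20,44.00", parts, ",")
        print n, parts[2]                   # count, then the second subfield
        print sprintf("%s=%d", "n", n)      # formatted string is returned, not printed
    }'
}

string_demo
```

Positions in awk strings start at 1, which is why index can use 0 to mean "not found".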
Built-in Function Examples (cont)
§ Smarter approach would be to break the line at a space at or before column 40 (omitting the extra space from the start of the next line):
    { line = $0
      while (length(line) > 40) {
          for (i=40; i>=1; i--)
              if (substr(line, i, 1) == " ")
                  break;
          if (i == 0)          # no space found: break at column 40
              i = 40
          head = substr(line, 1, i)
          print head "\\"
          line = substr(line, i+1)
      }
      print line
    }
§ Consider a file that has three : separated fields:
• 1st field is a description of a task.
• 2nd field contains comma separated subfields of prices.
• 3rd field is the date or period when the task is scheduled.
Eg:
    Back hoe trenches and refill:842,84.20,44.00:20-22 Mar
    Supply and spread 20m loam:475,47.50,550,75.50:Fri 31 Mar
§ Following subfields.awk script will replace costs with their total:
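The subfields.awk script itself is not reproduced here. One way such a script could work, offered only as a sketch under the assumption that the total should be printed with two decimal places, is to split the 2nd field on commas, sum the pieces, and assign the sum back to $2. The helper name total_costs is hypothetical.

```shell
# total_costs is a hypothetical helper name; input is "task:price,price,...:date".
total_costs() {
    awk 'BEGIN { FS = ":"; OFS = ":" }
         {
             n = split($2, cost, ",")      # cost[1..n] hold the individual prices
             sum = 0
             for (i = 1; i <= n; i++)
                 sum += cost[i]
             $2 = sprintf("%.2f", sum)     # assigning a field rebuilds $0 using OFS
             print
         }'
}

printf 'Back hoe trenches and refill:842,84.20,44.00:20-22 Mar\n' | total_costs
```

Setting OFS to : matters here: once $2 is assigned, awk rebuilds the record from its fields joined by OFS, so the output keeps the same layout as the input.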
nawk and gawk
§ nawk and gawk have many more features including:
    nextfile          start processing next input file
    sub(rexp,s,str)   substitute s for first match of rexp in str
    gsub(rexp,s,str)  globally substitute s for rexp in str
    tolower(str)      return lower case str
    toupper(str)      return upper case str
    rand()            return random number x where 0 <= x < 1
    srand(exp)        seed random number generator
    atan2(y,x)        arctangent in radians of y/x

nawk and gawk (cont)
§ Can also define functions:
    function name(arg1,...)
    { statements; return x }
• Parameters, arg1 etc, are local variables.
• All other variables are global.
§ Can execute external commands. Eg:
    system("date >today")
§ Can read from files. Eg:
    getline dayandtime < "today"
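A user-defined function and the substitution and case functions can be combined in a few lines. The sketch below assumes the awk on the system is a nawk-compatible implementation (as most current ones are); the helper name tidy, the function name capitalise and the sample input are invented.

```shell
# tidy is a hypothetical helper name; it reads records from stdin.
tidy() {
    awk 'function capitalise(word) {      # word is local; other variables are global
             return toupper(substr(word, 1, 1)) tolower(substr(word, 2))
         }
         {
             gsub(/-/, " ")                # replace every dash in the current record
             print capitalise($1), $2
         }'
}

printf 'hELLO-there\n' | tidy
```

When gsub is called without a third argument it works on $0, and the record is split into fields again afterwards, which is why $1 and $2 can be used straight after the substitution.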