Download as ps, pdf, or txt
Download as ps, pdf, or txt
You are on page 1of 39

1 Awk - A Tutoria and Introduction - by Bruce Barnett

06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm


Awk

Last updated - Tue Dec 28 20:29:22 EST 2010
Part oI the Unix tutorias And then there' s My bog
Table of Content
Why earn AWK?
Basic Structure
Executing an AWKscript
Which she to use with AWK?
Dynamic Variabes
The Essentia Syntax oI AWK
Arithmetic Expressions
Unary arithmetic operators
The Autoincrement and Autodecrement Operators
Assignment Operators
Conditiona expressions
Reguar Expressions
And/Or/Not
Commands
AWKBuit-in Variabes
FS - The Input Fied Separator Variabe
OFS - The Output Fied Separator Variabe
NF - The Number oI Fieds Variabe
NR - The Number oI Records Variabe
RS - The Record Separator Variabe
ORS - The Output Record Separator Variabe
FILENAME - The Current Fiename Variabe
Associative Arrays
Muti-dimensiona Arrays
Exampe oI using AWK' s Associative Arrays
Output oI the script
Picture PerIect PRINTF Output
PRINTF - Iormatting output
Escape Sequences
Format SpeciIiers
Width - speciIying minimum Iied siz e
LeIt JustiIication
The Fied Precision Vaue
Expicit Fie output
AWKNumerica Functions
Trigonometric Functions
Exponents. ogs and square roots
Truncating Integers
"Random Numbers
The Lotto script
String Functions
The Length Iunction
The Index Function
The Substr Iunction
GAWK's Toower and Toupper Iunction
The Spit Iunction
NAWK's string Iunctions
The Match Iunction
The System Iunction
The Getine Iunction
The systime Iunction
The StrItime Iunction
User DeIined Functions
AWKpatterns
2 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
Formatting AWKprograms
Environment Variabes
ARGC - Number or arguments (NAWK/GAWK)
ARGV - Array oI arguments (NAWK/GAWK)
ARGIND- Argument Index (GAWK ony)
FNR (NAWK/GAWK)
OFMT (NAWK/GAWK)
RSTART. RLENGTHand match (NAWK/GAWK)
SUBSEP - Muti-dimensiona array separator (NAWK/GAWK)
ENVIRON - environment variabes (GAWK ony)
IGNORECASE (GAWK ony)
CONVFMT - conversion Iormat (GAWKony)
ERRNO - system errors (GAWK ony)
FIELDWIDTHS - Iixed width Iieds (GAWKony)
AWK. NAWK. GAWK. or PERL
Copyright 2001. 2004 Bruce Barnett and Genera Eectric Company
A rights reserved
You are aowed to print copies oI this tutoria Ior your persona use . and ink to this page. but you are not aowed to make eectronic copies.
or redistribute this tutoria in any Iorm without permission.
Origina version written in 1994 and pubished in the Sun Observer
Updated: Tue Mar 9 11:07:08 EST 2004
Updated: Fri Feb 16 05:32:38 EST 2007
Updated: Wed Apr 16 20:55:07 EDT 2008
Awk is an extremey versatie programming anguage Ior working on Iies. We' teach you iust enough to understand the exampes in this
page. pus a smidgen.
The exampes given beow have the extensions oI the executing script as part oI the Iiename. Once you downoad it. and make it executabe .
you can rename it anything you want.
Why learn AWK?
In the past I have covered grep and sed. This section discusses AWK. another cornerstone oI UNIX she programming. There are three
variations oI AWK:
AWK- the origina Irom AT&T
NAWK - A newer. improved version Irom AT&T
GAWK - The Free SoItware Ioundation' s version
Originay. I didn' t pan to discuss NAWK. but severa UNIX vendors have repaced AWKwith NAWK. and there are severa
incompatibiities between the two. It woud be crue oI me to not warn you about the diIIerences. So I wi highight those when I come to them.
It is important to know than a oI AWK's Ieatures are in NAWK and GAWK. Most. iI not a. oI NAWK's Ieatures are in GAWK. NAWK
ships as part oI Soaris. GAWK does not. However. many sites on the Internet have the sources Ireey avaiabe. II you user Linux. you have
GAWK. But in genera. assume that I am taking about the cassic AWKuness otherwise noted.
Why is AWKso important? It is an exceent Iiter and report writer. Many UNIX utiities generates rows and coumns oI inIormation. AWK is
an exceent too Ior processing these rows and coumns. and is easier to use AWKthan most conventiona programming anguages. It can be
considered to be a pseudo-C interpretor. as it understands the same arithmatic operators as C. AWKaso has string manipuation Iunctions. so
it can search Ior particuar strings and modiIy the output. AWKaso has associative arrays. which are incredibe useIu. and is a Ieature most
computing anguages ack. Associative arrays can make a compex probem a trivia exercise .
I won't exhaustivey cover AWK. That is. I wi cover the essentia parts. and avoid the many variants oI AWK. It might be too conIusing to
discuss three diIIerent versions oI AWK. I won't cover the GNU version oI AWK caed "gawk." Simiary. I wi not discuss the new AT&T
AWKcaed "nawk. " The new AWKcomes on the Sun system. and you may Iind it superior to the od AWKin many ways. In particuar. it
has better diagnostics. and won' t print out the inIamous "baiing out near ine .. ." message the origina AWKis prone to do. Instead. "nawk"
prints out the ine it didn't understand. and highights the bad parts with arrows. GAWK does this as we. and this reay heps a ot. II you
Iind yourseI needing a Ieature that is very diIIicut or impossibe to do in AWK. I suggest you either use NAWK. or GAWK. or convert your
AWKscript into PERL using the "a 2p" conversion program which comes with PERL. PERL is a marveous anguage . and I use it a the time .
but I do not pan to cover PERL in these tutorias. Having made my intention cear. I can continue with a cear conscience.
Many UNIX utiities have strange names. AWKis one oI those utiities. It is not an abbreviation Ior awkward. In Iact. it is an eegant and simpe
anguage . The work "AWK" is derived Irom the initias oI the anguage 's three deveopers: A. Aho. B. W. Kernighan and P. Weinberger.
Baic Structure
3 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
The essentia organiz ation oI an AWKprogram Ioows the Iorm:
pattern action }
The pattern speciIies when the action is perIormed. Like most UNIX utiities. AWKis ine oriented . That is. the pattern speciIies a test that is
perIormed with each ine read as input. II the condition is true . then the action is taken. The deIaut pattern is something that matches every
ine. This is the bank or nu pattern. Two other important patterns are speciIied by the keywords "BEGIN" and "END." As you might expect.
these two words speciIy actions to be taken beIore any ines are read. and aIter the ast ine is read. The AWKprogram beow:
8L61u { prnt "1^81" |
{ prnt |
Lu0 { prnt "10" |
adds one ine beIore and one ine aIter the input Iie. This isn' t very useIu. but with a simpe change. we can make this into a typica AWK
program:
BEGIN print "Fie \tOwner". " }
print $8. "\t". $3}
END print " - DONE -" }
I' improve the script in the next sections. but we' ca it "FieOwner." But et' s not put it into a script or Iie yet. I wi cover that part in a bit.
Hang on and Ioow with me so you get the Iavor oI AWK.
The characters "\t" Indicates a tab character so the output ines up on even boundries. The "$8" and "$ 3" have a meaning simiar to a she
script. Instead oI the eighth and third argument. they mean the eighth and third Iied oI the input ine. You can think oI a Iied as a coumn.
and the action you speciIy operates on each ine or row read in.
There are two diIIerences between AWKand a she processing the characters within doube quotes. AWKunderstands specia characters
Ioow the "\" character ike "t". The Bourne and C UNIX shes do not. Aso. unike the she (and PERL) AWKdoes not evauate variabes
within strings. To expain. the second ine coud not be written ike this:
print "$8\t$3" }
That exampe woud print "$8 $3." Inside the quotes. the doar sign is not a specia character. Outside . it corresponds to a Iied. What do I
mean by the third and eight Iied? Consider the Soaris "/usr/bin/s -" command . which has eight coumns oI inIormation. The System V
version (Simiar to the Linux version). "/usr/5bin/s -. " has 9 coumns. The third coumn is the owner. and the eighth (or nineth) coumn in the
name oI the Iie. This AWKprogram can be used to process the output oI the "s -" command. printing out the Iiename. then the owner. Ior
each Iie. I' show you how.
Update: On a inux system. change "$8" to "$9".
One more point about the use oI a doar sign. In scripting anguages ike Per and the various shes. a doar sign means the word Ioowing is
the name oI the variabe. Awk is diIIerent. The doar sign means that we are reIering to a Iied or coumn in the current ine. When switching
between Per and AWK you must remener that "$" has a diIIerent meaning. So the Ioowing piece oI code prints two "Iieds" to standard out.
The Iirst Iied printed is the number "5". the second is the IiIth Iied (or coumn) on the input ine .
BEGIN x5 }
print x. $x}
Executing an AWK cript
So et's start writing our Iirst AWKscript. There are a coupe oI ways to do this.
Assuming the Iirst script is caed "FieOwner. " the invocation woud be
s - ' FieOwner
This might generate the Ioowing iI there were ony two Iies in the current directory:
Fie Owner
a. Iie barnett
another.Iie barnett
- DONE -
There are two probems with this script. Both probems are easy to Iix. but I' hod oII on this unti I cover the basics.
The script itseI can be written in many ways. The C she version woud ook ike this:
#!1bn1csh -1
# lnux users have to change $8 to $9
4 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
awk '
8L61u { prnt "I1e\t0wner" | \
{ prnt $8, "\t", $2| \
Lu0 { prnt " - 00uL -" | \
'
Cick here to get Iie: awkexampe1. csh
As you can see in the above script. each ine oI the AWKscript must have a backsash iI it is not the ast ine oI the script. This is necessary as
the C she doesn't. by deIaut. aow strings I have a ong ist oI compaints about using the C she. See Top Ten reasons not to use the C
she
The Bourne she (as does most shes) aows quoted strings to span severa ines:
#!/bin/sh
# Linux users have to change $8 to $9
awk '
BEGIN print "Fie \tOwner" }
print $8. "\t". $3}
END print " - DONE -" }
'
Cick here to get Iie: awkexampe1. sh
The third Iorm is to store the commands in a Iie. and execute
awk -I Iiename
Since AWKis aso an interpretor. you can save yourseI a step and make the Iie executabe by add one ine in the beginning oI the Iie:
#!/bin/awk -I
BEGIN print "Fie \tOwner" }
print $8. "\t". $3}
END print " - DONE -" }
Cick here to get Iie: awkexampe1. awk
Change the permission with the chod command . (i.e . "chmod x awkexampe1.awk"). and the script becomes a new command . Notice the
"-I" option Ioowing '#!/bin/awk "above . which is aso used in the third Iormat where you use AWKto execute the Iie directy. i.e. "awk -I
Iiename". The "-I" option speciIies the AWKIie containing the instructions. As you can see. AWK considers ines that start with a "#" to be a
comment. iust ike the she. To be precise. anything Irom the "#" to the end oI the ine is a comment (uness its inside an AWKstring.
However. I aways comment my AWKscripts with the "#" at the start oI the ine. Ior reasons I' discuss ater.
Which Iormat shoud you use? I preIer the ast Iormat when possibe. It' s shorter and simper. It' s aso easier to debug probems. II you need to
use a she. and want to avoid using too many Iies. you can combine them as we did in the Iirst and second exampe .
Which hell to ue with AWK?
The Iormat oI AWKis not Iree-Iorm. You cannot put new ine breaks iust anywhere . They must go in particuar ocations. To be precise. in the
origina AWKyou can insert a new ine character aIter the cury braces. and at the end oI a command. but not esewhere. II you wanted to
break a ong ine into two ines at any other pace. you had to use a backsash:
#!/bin/awk -I
BEGIN print "Fie \tOwner" }
print $8. "\t". \
$3}
END print " - DONE -" }
Cick here to get Iie: awkexampe2. awk
The Bourne she version woud be
#!/bin/sh
awk '
BEGIN print "Fie \tOwner" }
print $8. "\t". \
$3}
END print "done"}
'
5 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
Cick here to get Iie: awkexampe2. sh
whie the C she woud be
#!/bin/csh -I
awk '
BEGIN print "Fie \tOwner" }\
print $8. "\t". \\
$3}\
END print "done"}\
'
Cick here to get Iie: awkexampe2. csh
As you can see. this demonstrates how awkward the C she is when encosing an AWKscript. Not ony are back sashes needed Ior every
ine. some ines need two. (Note - this is true when using od awk (e. g. on Soaris) because the print statement had to be on one ine. Newer
AWK's are more Iexibe where newines can be added .) Many peope wi warn you about the C she. Some oI the probems are subte. and
you may never see them. Try to incude an AWK or sed script within a C she script. and the back sashes wi drive you craz y. This is what
convinced me to earn the Bourne she years ago. when I was starting out. I strongy recommend you use the Bourne she Ior any AWK or sed
script. II you don't use the Bourne she. then you shoud earn it. As a minimum. earn how to set variabes. which by some strange
coincidence is the subiect oI the next section.
Dynamic Variable
Since you can make a script an AWKexecutabe by mentioning "#!/bin/awk -I" on the Iirst ine. incuding an AWKscript inside a she script
isn' t needed uness you want to either eiminate the need Ior an extra Iie. or iI you want to pass a variabe to the insides oI an AWKscript.
Since this is a common probem. now is as good a time to expain the technique. I' do this by showing a simpe AWKprogram that wi ony
print one coumn. NOTE: there will be a bug in the firt verion. The number oI the coumn wi be speciIied by the Iirst argument. The Iirst
version oI the program. which we wi ca "Coumn. " ooks ike this:
#!/bin/sh
#NOTE - this script does not work!
coumn$1
awk 'print $coumn}'
Cick here to get Iie (but be aware that it doesn't work): Coumn1. sh
A suggested use is:
s - ' Coumn 3
This woud print the third coumn Irom the ls command. which woud be the owner oI the Iie. You can change this into a utiity that counts
how many Iies are owned by each user by adding
s - ' Coumn 3 ' uniq -c ' sort -nr
Only one problem: the cript doen't work. The vaue oI the "coumn" variabe is not seen by AWK. Change "awk" to "echo" to check.
You need to turn oII the quoting when the variabe is seen. This can be done by ending the quoting. and restarting it aIter the variabe:
#!/bin/sh
coumn$1
awk 'print $' $coumn'}'
Cick here to get Iie: Coumn2.sh
This is a very important concept. and throws experienced programmers a curve ba. In many computer anguages. a string has a start quote.
and end quote. and the contents in between. II you want to incude a specia character inside the quote. you must prevent the character Irom
having the typica meaning. In the C anguage. this is down by putting a backsash beIore the character. In other anguages. there is a specia
combination oI characters to to this. In the C and Bourne she. the quote is iust a switch. It turns the interpretation mode on or oII. There is
reay no such concept as "start oI string" and "end oI string." The quotes togge a switch inside the interpretor. The quote character is not
passed on to the appication. This is why there are two pairs oI quotes above. Notice there are two doar signs. The Iirst one is quoted. and is
seen by AWK. The second one is not quoted. so the she evauates the variabe. and repaces "$coumn" by the vaue. II you don' t
understand. either change "awk" to "echo. " or change the Iirst ine to read "#!/bin/sh -x."
Some improvements are needed . however. The Bourne she has a mechanism to provide a vaue Ior a variabe iI the vaue isn' t set. or is set
and the vaue is an empty string. This is done by using the Iormat:
$variable:-defaultvalue}
6 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
This is shown beow. where the deIaut coumn wi be one:
#!/bin/sh
coumn$1:-1}
awk 'print $' $coumn'}'
Cick here to get Iie: Coumn3.sh
We can save a ine by combining these two steps:
#!/bin/sh
awk 'print $' $1:-1}'}'
Cick here to get Iie: Coumn4.sh
It is hard to read. but it is compact. There is one other method that can be used. II you execute an AWKcommand and incude on the
command ine
variablevalue
this variabe wi be set when the AWKscript starts. An exampe oI this use woud be:
#!/bin/sh
awk 'print $c }' c$1:-1}
Cick here to get Iie: Coumn5.sh
This ast variation does not have the probems with quoting the previous exampe had. You shoud master the earier exampe . however.
because you can use it with any script or command. The second method is specia to AWK. Modern AWK' s have other options as we. See
the comp.unix. she FAQ.
The Eential Syntax of AWK
Earier I discussed ways to start an AWKscript. This section wi discuss the various grammatica eements oI AWK.
Arithmetic Expreion
There are severa arithmetic operators. simiar to C. These are the binary operators. which operate on two variabes:
+--------------------------------------------+
| ^Wk 1ab1e 1 |
| 8nary 0perators |
|0perator 1ype Meanng |
+--------------------------------------------+
|+ ^rthmetc ^ddton |
|- ^rthmetc ubtracton |
|* ^rthmetc Mu1tp1caton |
|1 ^rthmetc 0vson |
|x ^rthmetc Modu1o |
|<space> trng Concatenaton |
+--------------------------------------------+
Using variabes with the vaue oI "7" and "3. " AWKreturns the Ioowing resuts Ior each operator when using the print command:
+---------------------+
|Lxpresson 8esu1t |
+---------------------+
|7+2 10 |
|7-2 4 |
|7*2 21 |
|712 2.22222 |
|7x2 1 |
|7 2 72 |
+---------------------+
There are a Iew points to make. The moduus operator Iinds the remainder aIter an integer divide. The print command output a Ioating point
number on the divide. but an integer Ior the rest. The string concatenate operator is conIusing. since it isn' t even visibe. Pace a space between
two variabes and the strings are concatenated together. This aso shows that numbers are converted automaticay into strings when needed.
Unike C. AWKdoesn't have "types" oI variabes. There is one type ony. and it can be a string or number. The conversion rues are simpe. A
number can easiy be converted into a string. When a string is converted into a number. AWK wi do so. The string "123" wi be converted
into the number 123. However. the string "123X" wi be converted into the number 0. (NAWKwi behave diIIerenty. and converts the string
into integer 123. which is Iound in the beginning oI the string).
7 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
Unary arithmetic operator
The "" and "-" operators can be used beIore variabes and numbers. II X equas 4. then the statement:
print -x;
wi print "-4."
The Autoincrement and Autodecrement Operator
AWKaso supports the "" and "--" operators oI C. Both increment or decrement the variabes by one . The operator can ony be used with a
singe variabe. and can be beIore or aIter the variabe. The preIix Iorm modiIies the vaue. and then uses the resut. whie the postIix Iorm
gets the resuts oI the variabe. and aIterwards modiIies the variabe. As an exampe. iI X has the vaue oI 3. then the AWKstatement
print x. " ". x;
woud print the numbers 3 and 5. These operators are aso assignment operators. and can be used by themseves on a ine :
x;
--y;
Aignment Operator
Variabes can be assigned new vaues with the assignment operators. You know about "" and "--." The other assignment statement is simpy:
variable aritheticexpression
Certain operators have precedence over others; parenthesis can be used to contro grouping. The statement
x12*3 4;
is the same as
x (1 (2 * 3)) "4";
Both print out "74."
Notice spaces can be added Ior readabiity. AWK. ike C. has specia assignment operators. which combine a cacuation with an assignment.
Instead oI saying
xx2;
you can more concisey say:
x2;
The compete ist Ioows:
+-----------------------------------------+
| ^Wk 1ab1e 2 |
| ^ssgnment 0perators |
|0perator Meanng |
|+= ^dd resu1t to varab1e |
|-= ubtract resu1t 1rom varab1e |
|*= Mu1tp1y varab1e by resu1t |
|1= 0vde varab1e by resu1t |
|x= ^pp1y modu1o to varab1e |
+-----------------------------------------+
Conditional expreion
The second type oI expression in AWKis the conditiona expression. This is used Ior certain tests. ike the if or while. Booean conditions
evauate to true or Iase. In AWK. there is a deIinite diIIerence between a booean condition. and an arithmetic expression. You cannot
convert a booean condition to an integer or string. You can. however. use an arithmetic expression as a conditiona expression. A vaue oI 0
is Iase. whie anything ese is true. UndeIined variabes has the vaue oI 0. Unike AWK. NAWK ets you use booeans as integers.
Arithmetic vaues can aso be converted into booean conditions by using reationa operators:
+---------------------------------------+
| ^Wk 1ab1e 2 |
8 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
| 8e1atona1 0perators |
|0perator Meanng |
+---------------------------------------+
|== 1s equa1 |
|!= 1s not equa1 to |
|> 1s greater than |
|>= 1s greater than or equa1 to |
|< 1s 1ess than |
|<= 1s 1ess than or equa1 to |
+---------------------------------------+
These operators are the same as the C operators. They can be used to compare numbers or strings. With respect to strings. ower case etters
are greater than upper case etters.
Regular Expreion
Two operators are used to compare strings to reguar expressions:
+-----------------------------+
| ^Wk 1ab1e 4 |
|8egu1ar Lxpresson 0perators |
|0perator Meanng |
+-----------------------------+
|~ Matches |
|!~ 0oesn't match |
+-----------------------------+
The order in this case is particuar. The reguar expression must be encosed by sashes. and comes aIter the operator. AWK supports
extended reguar expressions. so the Ioowing are exampes oI vaid tests:
word !~ /START/
awrencewek ~ /(one 'two' three)/
And/Or/Not
There are two booean operators that can be used with conditiona expressions. That is. you can combine two conditiona expressions with the
"or" or "and" operators: "&&" and "''." There is aso the unary not operator: "!."
Command
There are ony a Iew commands in AWK. The ist and syntax Ioows:
iI ( conditional ) stateent | ese stateent |
whie ( conditional ) stateent
Ior ( expression ; conditional ; expression ) stateent
Ior ( variable in arrav ) stateent
break
continue
| stateent | ...}
variableexpression
print | expression-list | | ~ expression |
printI forat | . expression-list | | ~ expression |
next
exit
At this point. you can use AWKas a anguage Ior simpe cacuations; II you wanted to cacuate something. and not read any ines Ior input.
you coud use the BEGIN keyword discussed earier. combined with a exit command:
#!1bn1awk -1
8L61u {
# rnt the squares 1rom 1 to 10 the 1rst way
=1,
wh1e { <= 10) {
prnt1 "1he square o1 ", , " s ", *,
= +1,
|
# do t agan, usng more concse code
1or {=1, <= 10, ++) {
prnt1 "1he square o1 ", , " s ", *,
|
9 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
# now end
ext,
|
Cick here to get Iie: awkprintsquares.awk
The Ioowing asks Ior a number. and then squares it:
#!1bn1awk -1
8L61u {
prnt "type a number",
|
{
prnt "1he square o1 ", $1, " s ", $1*$1,
prnt "type another number",
|
Lu0 {
prnt "0one"
|
Cick here to get Iie: awkaskIorsquare .awk
The above isn' t a good Iiter. because it asks Ior input each time . II you pipe the output oI another program into it. you woud generate a ot oI
meaningess prompts.
Here is a Iiter that you shoud Iind useIu. It counts ines. totas up the numbers in the Iirst coumn. and cacuates the average. Pipe "wc -c *"
into it. and it wi count Iies. and te you the average number oI words per Iie. as we as the tota words and the number oI Iies.
#!1bn1awk -1
8L61u {
# uow many 1nes
1nes=0,
tota1=0,
|
{
# ths code s executed once 1or each 1ne
# ncrease the number o1 11es
1nes++,
# ncrease the tota1 sze, whch s 1e1d #1
tota1+=$1,
|
Lu0 {
# end, now output the tota1
prnt 1nes " 1nes read",
prnt "tota1 s ", tota1,
1 {1nes > 0 ) {
prnt "average s ", tota111nes,
| e1se {
prnt "average s 0",
|
|
Cick here to get Iie: average.awk
You can pipe the output oI "s -s" into this Iiter to count the number oI Iies. the tota siz e . and the average siz e . There is a sight probem with
this script. as it incudes the output oI "s" that reports the tota. This causes the number oI Iies to be oII by one . Changing
ines;
to
iI ($ 1 ! "tota" ) ines;
wi Iix this probem. Note the code which prevents a divide by z ero. This is common in we-written scripts. I aso initiaiz e the variabes to
z ero. This is not necessary. but it is a good habit.
AWK Built-in Variable
I have mentioned two kinds oI variabes: positiona and user deIined. A user deIined variabe is one you create. A positiona variabe is not a
specia variabe. but a Iunction triggered by the doar sign. ThereIore
print $1;
and
10 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
X1;
print $X;
do the same thing: print the Iirst Iied on the ine. There are two more points about positiona variabes that are very useIu. The variabe "$0"
reIers to the entire ine that AWK reads in. That is. iI you had eight Iieds in a ine.
print $0;
is simiar to
print $1. $ 2. $3. $4. $5. $6. $ 7. $8
This wi change the spacing between the Iieds; otherwise. they behave the same . You can modiIy positiona variabes. The Ioowing
commands
$2"";
print;
deetes the second Iied. II you had Iour Iieds. and wanted to print out the second and Iourth Iied. there are two ways. This is the Iirst:
#!1bn1awk -1
{
$1="",
$2="",
prnt,
|
and the second
#!1bn1awk -1
{
prnt $2, $4,
|
These perIorm simiary. but not identicay. The number oI spaces between the vaues vary. There are two reasons Ior this. The actua number
oI Iieds does not change. Setting a positiona variabe to an empty string does not deete the variabe. It's sti there. but the contents has been
deeted. The other reason is the way AWKoutputs the entire ine . There is a Iied separator that speciIies what character to put between the
Iieds on output. The Iirst exampe outputs Iour Iieds. whie the second outputs two. In-between each Iied is a space . This is easier to expain
iI the characters between Iieds coud be modiIied to be made more visibe. We. it can. AWKprovides specia variabes Ior iust that purpose.
FS - The Input Field Separator Variable
AWKcan be used to parse many system administration Iies. However. many oI these Iies do not have whitespace as a separator. as an
exampe. the password Iie uses coons. You can easiy change the Iied separator character to be a coon using the "-F" command ine option.
The Ioowing command wi print out accounts that don' t have passwords:
awk -F: 'iI ($2 "") print $ 1 ": no password!"}' /etc/passwd
There is a way to do this without the command ine option. The variabe "FS" can be set ike any variabe. and has the same Iunction as the "-
F" command ine option. The Ioowing is a script that has the same Iunction as the one above.
#!1bn1awk -1
8L61u {
I=":",
|
{
1 { $2 == "" ) {
prnt $1 ": no password!",
|
|
Cick here to get Iie: awknopasswd.awk
The second Iorm can be used to create a UNIX utiity. which I wi name "chkpasswd. " and executed ike this:
chkpasswd /etc /passwd
The command "chkpasswd -F:" cannot be used. because AWK wi never see this argument. A interpreter scripts accept one and ony one
argument. which is immediatey aIter the "#!/bin/awk" string. In this case. the singe argument is "-I." Another diIIerence between the
command ine option and the interna variabe is the abiity to set the input Iied separator to be more than one character. II you speciIy
FS": ";
11 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
then AWKwi spit a ine into Iieds wherever it sees those two characters. in that exact order. You cannot do this on the command ine .
There is a third advantage the interna variabe has over the command ine option: you can change the Iied separator character as many times
as you want whie reading a Iie. We. at most once Ior each ine. You can even change it depending on the ine you read. Suppose you had
the Ioowing Iie which contains the numbers 1 through 7 in three diIIerent Iormats. Lines 4 through 6 have coon separated Iieds. whie the
others separated by spaces.
ONE 1 I
TWO 2 II
#START
THREE:3:III
FOUR:4:IV
FIVE:5:V
#STOP
SIX 6 VI
SEVEN 7 VII
The AWK program can easiy switch between these Iormats:
#!1bn1awk -1
{
1 {$1 == "#1^81") {
I=":",
| e1se 1 {$1 == "#10") {
I=" ",
| e1se {
#prnt the 8oman number n co1umn 2
prnt $2
|
|
Cick here to get Iie: awkexampe3. awk
Note the Iied separator variabe retains its vaue unti it is expicity changed. You don't have to reset it Ior each ine . Sounds simpe. right?
However. I have a trick question Ior you. What happens iI you change the Iied separator whie reading a ine? That is. suppose you had the
Ioowing ine
One Two:Three :4 Five
and you executed the Ioowing script:
#!1bn1awk -1
{
prnt $2
I=":"
prnt $2
|
What woud be printed? "Three" or "Two:Three :4?" We. the script woud print out "Two:Three:4" twice. However. iI you deeted the Iirst
print statement. it woud print out "Three" once! I thought this was very strange at Iirst. but aIter puing out some hair. kicking the deck. and
yeing at museI and everyone who had anything to do with the deveopment oI UNIX. it is intuitivey obvious. You iust have to be thinking
ike a proIessiona programmer to reaiz e it is intuitive. I sha expain. and prevent you Irom causing yourseI physica harm.
II you change the Iied separator before you read the ine . the change affect what you read. II you change it after you read the ine . it wi
not redeIine the variabes. You woudn' t want a variabe to change on you as a side -eIIect oI another action. A programming anguage with
hidden side eIIects is broken. and shoud not be trusted . AWKaows you to redeIine the Iied separator either beIore or aIter you read the
ine. and does the right thing each time. Once you read the variabe. the variabe wi not change uness you change it. Bravo!
To iustrate this Iurther. here is another version oI the previous code that changes the Iied separator dynamicay. In this case . AWKdoes it
by examining Iied "$0. " which is the entire ine . When the ine contains a coon. the Iied separator is a coon. otherwise. it is a space :
#!1bn1awk -1
{
1 { $0 ~ 1:1 ) {
I=":",
| e1se {
I=" ",
|
#prnt the thrd 1e1d, whatever 1ormat
prnt $2
|
Cick here to get Iie: awkexampe4. awk
12 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
This exampe eiminates the need to have the specia "#START" and "#STOP" ines in the input.
OFS - The Output Field Separator Variable
There is an important diIIerence between
print $2 $ 3
and
print $2. $ 3
The Iirst exampe prints out one Iied. and the second prints out two Iieds. In the Iirst case . the two positiona parameters are concatenated
together and output without a space . In the second case. AWKprints two Iieds. and paces the output Iied separator between them. Normay
this is a space. but you can change this by modiIying the variabe "OFS. "
II you wanted to copy the password Iie. but deete the encrypted password. you coud use AWK:
#!1bn1awk -1
8L61u {
I=":",
0I=":",
|
{
$2="",
prnt
|
Cick here to get Iie: deetepasswd .awk
Give this script the password Iie. and it wi deete the password. but eave everything ese the same . You can make the output Iied separator
any number oI characters. You are not imited to a singe character.
NF - The Number of Field Variable
It is useIu to know how many Iieds are on a ine. You may want to have your script change its operation based on the number oI Iieds. As an
exampe. the command "s -" may generate eight or nine Iieds. depending on which version you are executing. The System Vversion. "/usr/
bin/s -" generates nine Iieds. which is equivaent to the Berkeey "/usr/ucb/s -g" command . II you wanted to print the owner and Iiename
then the Ioowing AWKscript woud work with either version oI "s:"
#!1bn1awk -1
# parse the output o1 "1s -1"
# prnt owner and 11ename
# remember - 8erke1ey 1s -1 has 8 1e1ds, ystem v has 9
{
1 {uI == 8) {
prnt $2, $8,
| e1se 1 {uI == 9) {
prnt $2, $9,
|
|
Cick here to get Iie: ownergroup.awk
Don't Iorget the variabe can be prepended with a "$." This aows you to print the ast Iied oI any coumn
#!/bin/awk -I
print $NF; }
Cick here to get Iie: printastIied .awk
One warning about AWK. There is a imit oI 99 Iieds in a singe ine . PERL does not have any such imitations.
NR - The Number of Record Variable
13 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
Another useIu variabe is "NR. " This tes you the number oI records. or the ine number. You can use AWKto ony examine certain ines.
This exampe prints ines aIter the Iirst 100 ines. and puts a ine number beIore each ine aIter 100:
#!1bn1awk -1
{ 1 {u8 >= 100) {
prnt u8, $0,
|
Cick here to get Iie: awkexampe5. awk
RS - The Record Separator Variable
Normay. AWKreads one ine at a time. and breaks up the ine into Iieds. You can set the "RS" variabe to change AWK' s deIinition oI a
"ine." II you set it to an empty string. then AWKwi read the entire Iie into memory. You can combine this with changing the "FS" variabe.
This exampe treats each ine as a Iied. and prints out the second and third ine:
#!1bn1awk -1
8L61u {
# change the record separator 1rom new1ne to nothng
8=""
# change the 1e1d separator 1rom whtespace to new1ne
I="\n"
|
{
# prnt the second and thrd 1ne o1 the 11e
prnt $2, $2,
|
Cick here to get Iie: awkexampe6. awk
The two ines are printed with a space between. Aso this wi ony work iI the input Iie is ess than 100 ines. thereIore this technique is
imited. You can use it to break words up. one word per ine. using this:
#!1bn1awk -1
8L61u {
8=" ",
|
{
prnt ,
|
Cick here to get Iie: onewordperine .awk
but this ony works iI a oI the words are separated by a space . II there is a tab or punctuation inside. it woud not.
ORS - The Output Record Separator Variable
The deIaut output record separator is a newine. ike the input. This can be set to be a newine and carriage return. iI you need to generate a
text Iie Ior a non-UNIX system.
#!1bn1awk -1
# ths 11ter adds a carrage return to a11 1nes
# be1ore the new1ne character
8L61u {
08="\r\n"
|
{ prnt |
Cick here to get Iie: addcr. awk
FILENAME - The Current Filename Variable
The ast variabe known to reguar AWK is "FILENAME. " which tes you the name oI the Iie being read .
#!1bn1awk -1
# reports whch 11e s beng read
8L61u {
1="",
14 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
|
{ 1 {1 != I1lLu^ML) {
prnt "readng", I1lLu^ML,
1=I1lLu^ML,
|
prnt,
|
Cick here to get Iie: awkexampe6a .awk
This can be used iI severa Iies need to be parsed by AWK. Normay you use standard input to provide AWKwith inIormation. You can aso
speciIy the Iienames on the command ine . II the above script was caed "testIiter. " and iI you executed it with
testIiter Iie1 Iie2 Iie3
It woud print out the Iiename beIore each change. An aternate way to speciIy this on the command ine is
testIiter Iie1 - Iie3 Iie2
In this case . the second Iie wi be caed "-. " which is the conventiona name Ior standard input. I have used this when I want to put some
inIormation beIore and aIter a Iiter operation. The preIix and postIix Iies specia data beIore and aIter the rea data. By checking the
Iiename. you can parse the inIormation diIIerenty. This is aso useIu to report syntax errors in particuar Iies:
#!1bn1awk -1
{
1 {uI == 6) {
# do the rght thng
| e1se {
1 {I1lLu^ML == "-" ) {
prnt "Yu1^x L8808, Wrong number o1 1e1ds,",
"n 101u, 1ne #:", u8, "1ne: ", $0,
| e1se {
prnt "Yu1^x L8808, Wrong number o1 1e1ds,",
"I1ename: ", I1lLu^ML, "1ne # ", u8,"1ne: ", $0,
|
|
|
Cick here to get Iie: awkexampe7. awk
Aociative Array
I have used doz ens oI diIIerent programming anguages over the ast 20 years. and AWKis the Iirst anguage I Iound that has associative
arrays. This term may be meaningess to you. but beieve me. these arrays are invauabe . and simpiIy programming enormousy. Let me
describe a probem. and show you how associative arrays can be used Ior reduce coding time . giving you more time to expore another stupid
probem you don't want to dea with in the Iirst pace.
Let's suppose you have a directory overIowing with Iies. and you want to Iind out how many Iies are owned by each user. and perhaps how
much disk space each user owns. You reay want someone to bame ; it' s hard to te who owns what Iie. A Iiter that processes the output oI
ls woud work:
s - ' Iiter
But this doesn't te you how much space each user is using. It aso doesn't work Ior a arge directory tree. This requires find and xargs:
Iind . -type I -print ' xargs s - ' Iiter
The third coumn oI "s" is the username. The Iiter has to count how many times it sees each user. The typica program woud have an array
oI usernames and another array that counts how many times each username has been seen. The index to both arrays are the same; you use
one array to Iind the index. and the second to keep track oI the count. I' show you one way to do it in AWK--the wrong way:
#!1bn1awk -1
# bad examp1e o1 ^Wk programmng
# ths counts how many 11es each user owns.
8L61u {
number_o1_users=0,
|
{
# must make sure you on1y examne 1nes wth 8 or more 1e1ds
1 {uI>7) {
user=0,
# 1ook 1or the user n our 1st o1 users
1or {=1, <=number_o1_users, ++) {
# s the user known?
1 {username]1 == $2) {
# 1ound t - remember where the user s
15 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
user=,
|
|
1 {user == 0) {
# 1ound a new user
username]++number_o1_users1=$2,
user=number_o1_users,
|
# ncrease number o1 counts
count]user1++,
|
|
Lu0 {
1or {=1, <=number_o1_users, ++) {
prnt count]1, username]1
|
|
Cick here to get Iie: awkexampe8. awk
I don' t want you to read this script. I tod you it's the wrong way to do it. II you were a C programmer. and didn't know AWK. you woud
probaby use a technique ike the one above . Here is the same program. except this exampe that uses AWK's associative arrays. The
important point is to notice the diIIerence in siz e between these two versions:
#!1bn1awk -1
{
username]$21++,
|
Lu0 {
1or { n username) {
prnt username]1, ,
|
|
Cick here to get Iie: countusers0.awk
This is shorter. simper. and uch easier to understand--Once you understand exacty what an associative array is. The concept is simpe.
Instead oI using a number to Iind an entry in an array. use anvthing vou want. An associative array in an array whose index is a string. A
arrays in AWKare associative. In this case. the index into the array is the third Iied oI the "s" command . which is the username. II the user
is "bin. " the main oop increments the count per user by eIIectivey executing
username|"bin"|;
UNIX guru's may geeIuy report that the 8 ine AWK script can be repaced by:
awk 'print $3}' ' sort ' uniq -c ' sort -nr
True. However. this can' t count the tota disk space Ior each user. We need to add some more inteigence to the AWKscript. and need the
right Ioundation to proceed . There is aso a sight bug in the AWK program. II you wanted a "quick and dirty" soution. the above woud be
Iine. II you wanted to make it more robust. you have to hande unusua conditions. II you gave this program an empty Iie Ior input. you woud
get the error:
awk: username is not an array
Aso. iI you piped the output oI "s -" to it. the ine that speciIied the tota woud increment a non-existing user. There are two techniques
used to eiminate this error. The Iirst one ony counts vaid input:
#!1bn1awk -1
{
1 {uI>7) {
username]$21++,
|
|
Lu0 {
1or { n username) {
prnt username]1, ,
|
|
Cick here to get Iie: countusers1.awk
This Iixes the probem oI counting the ine with the tota. However. it sti generates an error when an empty Iie is read as input. To Iix this
probem. a common technique is to make sure the array aways exists. and has a specia marker vaue which speciIies that the entry is invaid.
Then when reporting the resuts. ignore the invaid entry.
16 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
#!1bn1awk -1
8L61u {
username]""1=0,
|
{
username]$21++,
|
Lu0 {
1or { n username) {
1 { != "") {
prnt username]1, ,
|
|
|
Cick here to get Iie: countusers2.awk
This happens to Iix the other probem. Appy this technique and you wi make your AWKprograms more robust and easier Ior others to use.
Multi-dimenional Array
Some peope ask iI AWKcan hande muti-dimensiona arrays. It can. However. you don' t use conventiona two-dimensiona arrays. Instead
you use associative arrays. (Did I even mention how useIu associative arrays are?) Remember. you can put anything in the index oI an
associative array. It requires a diIIerent way to think about probems. but once you understand. you won't be abe to ive without it. A you
have to do is to create an index that combines two other indices. Suppose you wanted to eIIectivey execute
a|1. 2| y;
This is invaid in AWK. However. the Ioowing is perIecty Iine:
a|1 ". " 2| y;
Remember: the AWKstring concatenation operator is the space . It combines the three strings into the singe string "1. 2." Then it uses it as an
index into the array. That' s a there is to it. There is one minor probem with associative arrays. especiay iI you use the for command to
output each eement: you have no contro over the order oI output. You can create an agorithm to generate the indices to an associative array.
and contro the order this way. However. this is diIIicut to do. Since UNIX provides an exceent sort utiity. more programmers separate the
inIormation processing Irom the sorting. I' show you what I mean.
Example of uing AWK' Aociative Array
I oIten Iind myseI using certain techniques repeatedy in AWK. This exampe wi demonstrate these techniques. and iustrate the power and
eegance oI AWK. The program is simpe and common. The disk is Iu. Who's gonna be bamed? I iust hope you use this power wisey.
Remember. you may be the one who Iied up the disk.
Having resoved my mora diemma . by pacing the burden squarey on your shouders. I wi describe the program in detai. I wi aso discuss
severa tips you wi Iind useIu in arge AWKprograms. First. initiaiz e a arrays used in a for oop. There wi be Iour arrays Ior this purpose.
Initiaiz ation is easy:
ucount|""|0;
gcount|""|0;
ugcount|""|0;
acount|""|0;
The second tip is to pick a convention Ior arrays. Seecting the names oI the arrays. and the indices Ior each array is very important. In a
compex program. it can become conIusing to remember which array contains what. I suggest you ceary identiIy the indices and contents oI
each array. To demonstrate. I wi use a "count" to indicate the number oI Iies. and "sum" to indicate the sum oI the Iie siz es. In addition.
the part beIore the "" speciIies the index used Ior the array. which wi be either "u" Ior user. "g" Ior group. "ug" Ior the user and group
combination. and "a" Ior the tota Ior a Iies. In other programs. I have used names ike
usernametodirectory|username|directory;
Foow a convention ike this. and it wi be hard Ior you to Iorget the purpose oI each associative array. Even when a quick hack comes back
to haunt you three years ater. I've been there.
The third suggestion is to make sure your input is in the correct Iorm. It's generay a good idea to be pessimistic . but I wi add a simpe but
suIIicient test in this exampe.
1 {uI != 10) {
# gnore
| e1se {
17 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
etc.
I paced the test and error cause up Iront. so the rest oI the code won' t be cuttered. AWKdoesn' t have user deIined Iunctions. NAWK.
GAWK and PERL do.
The next piece oI advice Ior compex AWKscripts is to deIine a name Ior each Iied used. In this case. we want the user. group and siz e in
disk bocks. We coud use the Iie siz e in bytes. but the bock siz e corresponds to the bocks on the disk. a more accurate measurement oI
space . Disk bocks can be Iound by using "s -s." This adds a coumn. so the username becomes the Iourth coumn. etc. ThereIore the script
wi contain:
siz e $1;
user$4;
group$5;
This wi aow us to easiy adapt to changes in input. We coud use "$1" throughout the script. but iI we changed the number oI Iieds. which
the "-s" option does. we' d have to change each Iied reIerence. You don't want to go through an AWKscript. and change a the "$1" to "$2. "
and aso change the "$2" to "$3" because those are reay the "$1" that you iust changed to "$2." OI coure this is conIusing. That's why it' s
a good idea to assign names to the Iieds. I've been there too.
Next the AWKscript wi count how many times each combination oI users and groups occur. That is. I am going to construct a two-part index
that contains the username and groupname . This wi et me count up the number oI times each user/group combination occurs. and how much
disk space is used.
Consider this: how woud you cacuate the tota Ior iust a user. or Ior iust a group? You coud rewrite the script. Or you coud take the user/
group totas. and tota them with a second script.
You coud do it. but it's not the AWKway to do it. II you had to examine a baz iion Iies. and it takes a ong time to run that script. it woud
be a waste to repeat this task. It' s aso ineIIicient to require two scripts when one can do everything. The proper way to sove this probem is to
extract as much inIormation as possibe in one pass through the Iies. ThereIore this script wi Iind the number and siz e Ior each category:
Each user
Each group
Each user/group combination
A users and groups
This is why I have 4 arrays to count up the number oI Iies. I don't reay need 4 arrays. as I can use the Iormat oI the index to determine
which array is which. But this does maake the program easier to understand Ior now. The next tip is subte. but you wi see how useIu it is. I
mentioned the indices into the array can be anything. II possibe. seect a Iormat that aows you to merge inIormation Irom severa arrays. I
reaiz e this makes no sense right now. but hang in there. A wi become cear soon. I wi do this by constructing a universa index oI the Iorm
user~ group~
This index wi be used Ior a arrays. There is a space between the two vaues. This covers the tota Ior the user/group combination. What
about the other three arrays? I wi use a "*" to indicate the tota Ior a users or groups. ThereIore the index Ior a Iies woud be "* *" whie
the index Ior a oI the Iie owned by user daeon woud be "daemon *." The heart oI the script totas up the number and siz e oI each Iie.
putting the inIormation into the right category. I wi use 8 arrays; 4 Ior Iie siz es. and 4 Ior counts:
ucount|user " *"|;
gcount|"* " group|;
ugcount|user " " group|;
acount|"* *"|;
usiz e|user " *"|siz e;
gsiz e|"* " group|siz e ;
ugsiz e |user " " group|siz e;
asiz e |"* *"|siz e ;
This particuar universa index wi make sorting easier. as you wi see . Aso important is to sort the inIormation in an order that is useIu. You
can try to Iorce a particuar output order in AWK. but why work at this. when it's a one ine command Ior sort? The diIIicut part is Iinding the
right way to sort the inIormation. This script wi sort inIormation using the siz e oI the category as the Iirst sort Iied. The argest tota wi be
the one Ior a Iies. so this wi be one oI the Iirst ines output. However. there may be severa ties Ior the argest number. and care must be
used. The second Iied wi be the number oI Iies. This wi hep break a tie. Sti. I want the totas and sub-totas to be isted beIore the
individua user/group combinations. The third and Iourth Iieds wi be generated by the index oI the array. This is the tricky part I warned you
about. The script wi output one string. but the sort utiity wi not know this. Instead. it wi treat it as two Iieds. This wi uniIy the resuts. and
inIormation Irom a 4 arrays wi ook ike one array. The sort oI the third and Iourth Iieds wi be dictionary order. and not numeric. unike
the Iirst two Iieds. The "*" was used so these sub-tota Iieds wi be isted beIore the individua user/group combination.
The arrays wi be printed using the Ioowing Iormat:
1or { n u_count) {
1 { != "") {
prnt u_sze]1, u_count]1, ,
18 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
|
|
0
I ony showed you one array. but a Iour are printed the same way. That's the essence oI the script. The resuts is sorted. and I converted the
space into a tab Ior cosmetic reasons.
Output of the cript
I changed my directory to /usr/ucb. used the script in that directory. The Ioowing is the output:
sze count user group
2172 81 * *
2172 81 root *
2972 75 * sta11
2972 75 root sta11
88 2 * daemon
88 2 root daemon
64 2 * kmem
64 2 root kmem
48 1 * tty
48 1 root tty
This says there are 81 Iies in this directory. which takes up 3173 disk bocks. A oI the Iies are owned by root. 2973 disk bocks beong to
group staII. There are 3 Iies with group daemon. which takes up 88 disk bocks.
As you can see. the Iirst ine oI inIormation is the tota Ior a users and groups. The second ine is the sub-tota Ior the user "root." The third
ine is the sub-tota Ior the group "staII." ThereIore the order oI the sort is useIu. with the sub-totas beIore the individua entries. You coud
write a simpe AWKor grep script to obtain inIormation Irom iust one user or one group. and the inIormation wi be easy to sort.
There is ony one probem. The /usr/ucb directory on my system ony uses 1849 bocks; at east that's what du reports. Where 's the
discrepancy? The script does not understand hard inks. This may not be a probem on most disks. because many users do not use hard inks.
Sti. it does generate inaccurate resuts. In this case. the program vi is aso e. ex. edit. view. and 2 other names. The program ony exists once.
but has 7 names. You can te because the ink count (Iied 2) reports 7. This causes the Iie to be counted 7 times. which causes an inaccurate
tota. The Iix is to ony count mutipe inks once. Examining the ink count wi determine iI a Iie has mutipe inks. However. how can you
prevent counting a ink twice? There is an easy soution: a oI these Iies have the same inode number. You can Iind this number with the -i
option to ls. To save memory. we ony have to remember the inodes oI Iies that have mutipe inks. This means we have to add another
coumn to the input. and have to renumber a oI the Iied reIerences. It' s a good thing there are ony three. Adding a new Iied wi be easy.
because I Ioowed my own advice .
The Iina script shoud be easy to Ioow. I have used variations oI this hundreds oI times and Iind it demonstrates the power oI AWKas we
as provide insight to a powerIu programming paradigm. AWKsoves these types oI probems easier than most anguages. But you have to use
AWKthe right way.
Note - this version was written Ior a Soaris box. You have to veriIy iI ls is generating the right number oI arguments. The -g argument may
need to be deeted. and the check Ior the number oI Iies may have to be modiIied . UpdatedI added a Linux version beow - to be
downoaded.
This is a Iuy working version oI the program. that accuratey counts disk space . appears beow:
#!1bn1sh
1nd . -type 1 -prnt | xargs 1usr1bn11s -s1g |
awk '
8L61u {
# nta1ze a11 arrays used n 1or 1oop
u_count]""1=0,
g_count]""1=0,
ug_count]""1=0,
a11_count]""1=0,
|
{
# va1date your nput
1 {uI != 11) {
# gnore
| e1se {
# assgn 1e1d names
node=$1,
sze=$2,
1nkcount=$4,
user=$5,
group=$6,
# shou1d 1 count ths 11e?
dot=0,
1 {1nkcount == 1) {
# on1y one copy - count t
dot++,
| e1se {
# a hard 1nk - on1y count 1rst one
19 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
seen]node1++,
1 {seen]node1 == 1) {
dot++,
|
|
# 1 dot s true, then count the 11e
1 {dot ) {
# tota1 up counts n one pass
# use descrpton array names
# use array ndex that un1es the arrays
# 1rst the counts 1or the number o1 11es
u_count]user " *"1++,
g_count]"* " group1++,
ug_count]user " " group1++,
a11_count]"* *"1++,
# then the tota1 dsk space used
u_sze]user " *"1+=sze,
g_sze]"* " group1+=sze,
ug_sze]user " " group1+=sze,
a11_sze]"* *"1+=sze,
|
|
|
Lu0 {
# output n a 1orm that can be sorted
1or { n u_count) {
1 { != "") {
prnt u_sze]1, u_count]1, ,
|
|
1or { n g_count) {
1 { != "") {
prnt g_sze]1, g_count]1, ,
|
|
1or { n ug_count) {
1 { != "") {
prnt ug_sze]1, ug_count]1, ,
|
|
1or { n a11_count) {
1 { != "") {
prnt a11_sze]1, a11_count]1, ,
|
|
| ' |
# numerc sort - bggest numbers 1rst
# sort 1e1ds 0 and 1 1rst {sort starts wth 0)
# 1o11owed by dctonary sort on 1e1ds 2 + 2
sort +0nr -2 +2d |
# add header
{echo "sze count user group",cat -) |
# convert space to tab - makes t nce output
# the second set o1 quotes contans a sng1e tab character
tr ' ' ' '
# done - 1 hope you 1ke t
Cick here to get Iie: countusers3.awk
Remember when I said I didn' t need to use 4 diIIerent arrays? I can use iust one. This is more conIusing. but more concise
#!1bn1sh
1nd . -type 1 -prnt | xargs 1usr1bn11s -s1g |
awk '
8L61u {
# nta1ze a11 arrays used n 1or 1oop
count]""1=0,
|
{
# va1date your nput
1 {uI != 11) {
# gnore
| e1se {
# assgn 1e1d names
node=$1,
sze=$2,
1nkcount=$4,
user=$5,
group=$6,
# shou1d 1 count ths 11e?
20 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
dot=0,
1 {1nkcount == 1) {
# on1y one copy - count t
dot++,
| e1se {
# a hard 1nk - on1y count 1rst one
seen]node1++,
1 {seen]node1 == 1) {
dot++,
|
|
# 1 dot s true, then count the 11e
1 {dot ) {
# tota1 up counts n one pass
# use descrpton array names
# use array ndex that un1es the arrays
# 1rst the counts 1or the number o1 11es
count]user " *"1++,
count]"* " group1++,
count]user " " group1++,
count]"* *"1++,
# then the tota1 dsk space used
sze]user " *"1+=sze,
sze]"* " group1+=sze,
sze]user " " group1+=sze,
sze]"* *"1+=sze,
|
|
|
Lu0 {
# output n a 1orm that can be sorted
1or { n count) {
1 { != "") {
prnt sze]1, count]1, ,
|
|
| ' |
# numerc sort - bggest numbers 1rst
# sort 1e1ds 0 and 1 1rst {sort starts wth 0)
# 1o11owed by dctonary sort on 1e1ds 2 + 2
sort +0nr -2 +2d |
# add header
{echo "sze count user group",cat -) |
# convert space to tab - makes t nce output
# the second set o1 quotes contans a sng1e tab character
tr ' ' ' '
# done - 1 hope you 1ke t
Cick here to get Iie: countusers.awk
Here is a version that works with modern Linux systems. but assumes you have we-behaved Iienames (without spaces. etc. ): countusers
new.awk
Picture Perfect PRINTF Output
So Iar. I described severa simpe scripts that provide useIu inIormation. in a somewhat ugy output Iormat. Coumns might not ine up
propery. and it is oIten hard to Iind patterns or trends without this unity. As you use AWKmore . you wi be desirous oI crisp. cean
Iormatting. To achieve this. you must master the printf Iunction.
PRINTF - formatting output
The printf is very simiar to the C Iunction with the same name. C programmers shoud have no probem using printf Iunction.
Printf has one oI these syntactica Iorms:
printI ( Iormat);
printI ( Iormat. arguments.. .);
printI ( Iormat) ~expression;
printI ( Iormat. arguments.. .) ~ expression;
The parenthesis and semicoon are optiona. I ony use the Iirst Iormat to be consistent with other nearby printf statements. A print statement
woud do the same thing. Printf reveas it's rea power when Iormatting commands are used.
21 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
The Iirst argument to the printf Iunction is the Iormat. This is a string. or variabe whose vaue is a string. This string. ike a strings. can contain
specia escape sequences to print contro characters.
Ecape Sequence
The character "\" is used to "escape " or mark specia characters. The ist oI these characters is in tabe beow:
+-------------------------------------------------------+
| ^Wk 1ab1e 5 |
| Lscape equences |
|equence 0escrpton |
+-------------------------------------------------------+
|\a ^C11 be11 {u^Wk on1y) |
|\b 8ackspace |
|\1 Iorm1eed |
|\n uew1ne |
|\r Carrage 8eturn |
|\t uorzonta1 tab |
|\v vertca1 tab {u^Wk on1y) |
|\ddd Character {1 to 2 octa1 dgts) {u^Wk on1y) |
|\xdd Character {hexadecma1) {u^Wk on1y) |
| ^ny character c |
+-------------------------------------------------------+
It' s diIIicut to expain the diIIerences without being wordy. HopeIuy I' provide enough exampes to demonstrate the diIIerences.
With NAWK. you can print three tab characters using these three diIIerent representations:
printI("\t\11\x9\n");
A tab character is decima 9. octa 11. or hexadecima 09. See the man page ascii(7) Ior more inIormation. Simiary. you can print three
doube-quote characters (decima 34. hexadecima 22. or octa 42 ) using
printI("\"\x22\42\n");
You shoud notice a diIIerence between the printf Iunction and the print Iunction. Print terminates the ine with the ORS character. and
divides each Iied with the OFS separator. Printf does nothing uness you speciIy the action. ThereIore you wi Irequenty end each ine with
the newine character "\n. " and you must speciIy the separating characters expicity.
Format Specifier
The power oI the printf statement ies in the Iormat speciIiers. which aways start with the character "." The Iormat speciIiers are described
in tabe 6:
+----------------------------------------+
| ^Wk 1ab1e 6 |
| Iormat pec1ers |
|pec1er Meanng |
+----------------------------------------+
|c ^C11 Character |
|d 0ecma1 nteger |
|e I1oatng ont number |
| {engneerng 1ormat) |
|1 I1oatng ont number |
| {1xed pont 1ormat) |
|g 1he shorter o1 e or 1, |
| wth tra1ng zeros removed |
|o 0cta1 |
|s trng |
|x uexadecma1 |
|x ltera1 x |
+----------------------------------------+
Again. I' cover the diIIerences quicky. Tabe 3 iustrates the diIIerences. The Iirst ine states "printI(c\n". 100.0)"" prints a "d ."
+--------------------------------+
| ^Wk 1ab1e 7 |
| Lxamp1e o1 1ormat conversons |
|Iormat va1ue 8esu1ts |
+--------------------------------+
|xc 100.0 d |
|xc "100.0" 1 {u^Wk?) |
|xc 42 " |
|xd 100.0 100 |
|xe 100.0 1.000000e+02 |
|x1 100.0 100.000000 |
|xg 100.0 100 |
|xo 100.0 144 |
|xs 100.0 100.0 |
|xs "121" 121 |
22 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
|xd "121" 0 {^Wk) |
|xd "121" 12 {u^Wk) |
|xx 100.0 64 |
+--------------------------------+
This tabe reveas some diIIerences between AWKand NAWK. When a string with numbers and etters are coverted into an integer. AWK
wi return a z ero. whie NAWK wi convert as much as possibe. The second exampe . marked with "NAWK?" wi return "d" on some
earier versions oI NAWK. whie ater versions wi return "1."
Using Iormat speciIiers. there is another way to print a doube quote with NAWK. This demonstrates Octa. Decima and Hexadecima
conversion. As you can see. it isn' t symmetrica. Decima conversions are done diIIerenty.
printI("sssc \n". "\"". "\x22". "\42". 34);
Between the "" and the Iormat character can be Iour optiona pieces oI inIormation. It heps to visuaiz e these Iieds as:
sign~z ero~width~.precision~Iormat
I' discuss each one separatey.
Width - pecifying minimum field ize
II there is a number aIter the ". " this speciIies the minimum number oI characters to print. This is the width Iied. Spaces are added so the
number oI printed characters equa this number. Note that this is the minimum Iied siz e . II the Iied becomes to arge . it wi grow. so
inIormation wi not be ost. Spaces are added to the eIt.
This Iormat aows you to ine up coumns perIecty. Consider the Ioowing Iormat:
printI("std\n". s. d);
II the string "s" is onger than 8 characters. the coumns won't ine up. Instead. use
printI("20sd \n". s. d);
As ong as the string is ess than 20 characters. the number wi start on the 21st coumn. II the string is too ong. then the two Iieds wi run
together. making it hard to read. You may want to consider pacing a singe space between the Iieds. to make sure you wi aways have one
space between the Iieds. This is very important iI you want to pipe the output to another program.
Adding inIormationa headers makes the output more readabe. Be aware that changing the Iormat oI the data may make it diIIicut to get the
coumns aigned perIecty. Consider the Ioowing script:
#!1usr1bn1awk -1
8L61u {
prnt1{"trng uumber\n"),
|
{
prnt1{"x10s x6d\n", $1, $2),
|
Cick here to get Iie: awkexampe9. awk
It woud be awkward (Iorgive the choice oI words) to add a new coumn and retain the same aignment. More compicated Iormats woud
require a ot oI tria and error. You have to adiust the Iirst printf to agree with the second printf statement. I suggest
#!1usr1bn1awk -1
8L61u {
prnt1{"x10s x6sn", "trng", "uumber"),
|
{
prnt1{"x10s x6d\n", $1, $2),
|
Cick here to get Iie: awkexampe10.awk
or even better
#!1usr1bn1awk -1
8L61u {
1ormat1 ="x10s x6sn",
1ormat2 ="x10s x6dn",
prnt1{1ormat1, "trng", "uumber"),
|
23 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
{
prnt1{1ormat2, $1, $2),
|
Cick here to get Iie: awkexampe11.awk
The ast exampe . by using string variabes Ior Iormatting. aows you to keep a oI the Iormats together. This may not seem ike it's very
useIu. but when you have mutipe Iormats and mutipe coumns. it' s very useIu to have a set oI tempates ike the above. II you have to add
an extra space to make things ine up. it's much easier to Iind and correct the probem with a set oI Iormat strings that are together. and the
exact same width. CHainging the Iirst coumne Irom 10 characters to 11 is easy.
Left 1utification
The ast exampe paces spaces beIore each Iied to make sure the minimum Iied width is met. What do you do iI you want the spaces on the
right? Add a negative sign beIore the width:
printI("-10s -6d\n". $1. $ 2);
This wi move the printing characters to the eIt. with spaces added to the right.
The Field Preciion Value
The precision Iied. which is the number between the decima and the Iormat character. is more compex. Most peope use it with the Ioating
point Iormat (I). but surprisingy. it can be used with any Iormat character. With the octa. decima or hexadecima Iormat. it speciIies the
minimum number oI characters. Zeros are added to met this requirement. With the e and I Iormats. it speciIies the number oI digits aIter
the decima point. The e "e00" is not incuded in the precision. The g Iormat combines the characteristics oI the d and I Iormats. The
precision speciIies the number oI digits dispayed . beIore and aIter the decima point. The precision Iied has no eIIect on the c Iied. The
s Iormat has an unusua. but useIu eIIect: it speciIies the maximum number oI signiIicant characters to print.
II the Iirst number aIter the ". " or aIter the "-. " is a z ero. then the system adds z eros when padding. This incudes a Iormat types.
incuding strings and the c character Iormat. This means "010d" and ".10d " both adds eading z eros. giving a minimum oI 10 digits.
The Iormat "10.10d" is thereIore redundant. Tabe 8 gives some exampes:
+--------------------------------------------+
| ^Wk 1ab1e 8 |
| Lxamp1es o1 comp1ex 1ormattng |
|Iormat varab1e 8esu1ts |
+--------------------------------------------+
|xc 100 "d" |
|x10c 100 " d" |
|x010c 100 "000000000d" |
+--------------------------------------------+
|xd 10 "10" |
|x10d 10 " 10" |
|x10.4d 10.122456789 " 0010" |
|x10.8d 10.122456789 " 00000010" |
|x.8d 10.122456789 "00000010" |
|x010d 10.122456789 "0000000010" |
+--------------------------------------------+
|xe 987.1224567890 "9.871225e+02" |
|x10.4e 987.1224567890 "9.8712e+02" |
|x10.8e 987.1224567890 "9.87122457e+02" |
+--------------------------------------------+
|x1 987.1224567890 "987.122457" |
|x10.41 987.1224567890 " 987.1225" |
|x010.41 987.1224567890 "00987.1225" |
|x10.81 987.1224567890 "987.12245679" |
+--------------------------------------------+
|xg 987.1224567890 "987.122" |
|x10g 987.1224567890 " 987.122" |
|x10.4g 987.1224567890 " 987.1" |
|x010.4g 987.1224567890 "00000987.1" |
|x.8g 987.1224567890 "987.12246" |
+--------------------------------------------+
|xo 987.1224567890 "1722" |
|x10o 987.1224567890 " 1722" |
|x010o 987.1224567890 "0000001722" |
|x.8o 987.1224567890 "00001722" |
+--------------------------------------------+
|xs 987.122 "987.122" |
|x10s 987.122 " 987.122" |
|x10.4s 987.122 " 987." |
|x010.8s 987.122 "000987.122" |
+--------------------------------------------+
|xx 987.1224567890 "2db" |
|x10x 987.1224567890 " 2db" |
|x010x 987.1224567890 "00000002db" |
24 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
|x.8x 987.1224567890 "000002db" |
+--------------------------------------------+
There is one more topic needed to compete this esson on printf.
Explicit File output
Instead oI sending output to standard output. you can send output to a named Iie. The Iormat is
printI("string\n") ~ "/tmp/Iie";
You can append to an existing Iie. by using "~~ :"
printI("string\n") ~~ "/tmp/Iie";
Like the she. the doube ange brackets indicates output is appended to the Iie. instead oI written to an empty Iie. Appending to the Iie
does not deete the od contents. However. there is a subte diIIerence between AWKand the she.
Consider the she program:
#!1bn1sh
wh1e x=`1ne`
do
echo got $x >>1tmp1a
echo got $x >1tmp1b
done
This wi read standard input. and copy the standard input to Iies "/tmp/a" and "/tmp/b." Fie "/tmp/a" wi grow arger. as inIormation is
aways appended to the Iie. Fie "/tmp/b. " however. wi ony contain one ine . This happens because each time the she see the "~" or "~~"
characters. it opens the Iie Ior writing. choosing the truncate/create or appending option at that time .
Now consider the equivaent AWKprogram:
#!1usr1bn1awk -1
{
prnt $0 >>"1tmp1a"
prnt $0 >"1tmp1b"
|
This behaves diIIerenty. AWKchooses the create/append option the Iirst time a Iie is opened Ior writing. AIterwards. the use oI "~" or "~~"
is ignored . Unike the she. AWKcopies a oI standard input to Iie "/tmp/b. "
Instead oI a string. some versions oI AWK aow you to speciIy an expression:
# |note to seI| check this one - it might not work
printI("string\n") ~ FILENAME ".out";
The Ioowing uses a string concatenation expression to iustrate this:
#!1usr1bn1awk -1
Lu0 {
1or {=0,<20,++) {
prnt1{"=xd\n", ) > "1tmp1a" ,
|
|
Cick here to get Iie: awkexampe12.awk
This script never Iinishes. because AWKcan have 10 additiona Iies open. and NAWK can have 20. II you Iind this to be a probem. ook
into PERL.
I hope this gives you the ski to make your AWKoutput picture perIect.
AWK Numerical Function
In previous tutorias. I have shown how useIu AWKis in manipuating inIormation. and generating reports. When you add a Iew Iunctions.
AWKbecomes even more. mmm. Iunctiona.
There are three types oI Iunctions: numeric. string and whatever' s eIt. Tabe9 ists a oI the numeric Iunctions:
+----------------------------------+
| ^Wk 1ab1e 9 |
+----------------------------------+
25 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
| uumerc Iunctons |
|uame Iuncton varant |
+----------------------------------+
|cos cosne ^Wk |
|exp Lxponent ^Wk |
|nt 1nteger ^Wk |
|1og logarthm ^Wk |
|sn ne ^Wk |
|sqrt quare 8oot ^Wk |
|atan2 ^rctangent u^Wk |
|rand 8andom u^Wk |
|srand eed 8andom u^Wk |
+----------------------------------+
Trigonometric Function
Oh ioy. I bet miions. iI not doz ens. oI my readers have been waiting Ior me to discuss trigonometry. Personay. I don' t use trigonometry
much at work. except when I go oII on a tangent.
Sorry about that. I don't know what came over me . I don' t usuay resort to puns. I' write a note to myseI. and aIter I sine the note. I' have
my boss cosine it.
Now top that! I hate arguing with myseI. I aways ose. Thinking about math I earned in the year 2 B.C. (BeIore Computers) seems to
cause Iashbacks oI high schoo. pimpes. and (shudder) times best eIt Iorgotten. The stress oI remembering those days must have made me
Iorget the standards I normay set Ior myseI. Besides. no-one appreciates obtuse humor anyway. even iI I Iind acute way to say it.
I better change the subiect Iast. Combining humor and computers is a very serious matter.
Here is a NAWK script that cacuates the trigonometric Iunctions Ior a degrees between 0 and 360. It aso shows why there is no tangent.
secant or cosecant Iunction. (They aren't necessary). II you read the script. you wi earn oI some subte diIIerences between AWKand
NAWK. A this in a thin veneer oI demonstrating why we earned trigonometry in the Iirst pace. What more can you ask Ior? Oh. in case you
are wondering. I wrote this in the month oI December.
#!1usr1bn1nawk -1
#
# ^ smatterng o1 trgonometry...
#
# 1hs ^Wk scrpt p1ots the va1ues 1rom 0 to 260
# 1or the basc trgonometry 1unctons
# but 1rst - a revew:
#
# {uote to the edtor - the 1o11owng dagram assumes
# a 1xed wdth 1ont, 1ke Courer.
# otherwse, the dagram 1ooks very stupd, nstead o1 s1ght1y stupd)
#
# ^ssume the 1o11owng rght trang1e
#
# ^ng1e Y
#
# |
# |
# |
# a | c
# |
# |
# +------- ^ng1e x
# b
#
# snce the trang1e s a rght ang1e, then
# x+Y=90
#
# 8asc 1rgonometrc Iunctons. 11 you know the 1ength
# o1 2 sdes, and the ang1es, you can 1nd the 1ength o1 the thrd sde.
# ^1so - 1 you know the 1ength o1 the sdes, you can ca1cu1ate
# the ang1es.
#
# 1he 1ormu1as are
#
# sne{x) = a1c
# cosne{x) = b1c
# tangent{x) = a1b
#
# recproca1 1unctons
# cotangent{x) = b1a
# secant{x) = c1b
# cosecant{x) = c1a
#
# Lxamp1e 1)
# 1 an ang1e s 20, and the hypotenuse {c) s 10, then
# a = sne{20) * 10 = 5
# b = cosne{20) * 10 = 8.66
#
# 1he second examp1e w11 be more rea1stc:
#
26 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
# uppose you are 1ookng 1or a Chrstmas tree, and
# wh1e ta1kng to your 1am1y, you smack nto a tree
# because your head was turned, and your kds were argung over who
# was gong to put the 1rst ornament on the tree.
#
# ^s you come to, you rea1ze your 1eet are touchng the trunk o1 the tree,
# and your eyes are 6 1eet 1rom the bottom o1 your 1rostbtten toes.
# Wh1e countng the stars that spn around your head, you a1so rea1ze
# the top o1 the tree s 1ocated at a 65 degree ang1e, re1atve to your eyes.
# You sudden1y rea1ze the tree s 12.84 1eet hgh! ^1ter a11,
# tangent{65 degrees) * 6 1eet = 12.84 1eet
# ^11 rght, t sn't rea1stc. uot many peop1e memorze the
# tangent tab1e, or can estmate ang1es that accurate1y.
# 1 was te11ng the truth about the stars spnnng around the head, however.
#
8L61u {
# assgn a va1ue 1or p.
1=2.14159,
# se1ect an "Ld u11van" number - rea11y rea11y bg
816=999999,
# pck two 1ormats
# keep them c1ose together, so when one co1umn s made 1arger
# the other co1umn can be adjusted to be the same wdth
1mt1="x7s x8s x8s x8s x10s x10s x10s x10sn",
# prnt out the tt1e o1 each co1umn
1mt2="x7d x8.21 x8.21 x8.21 x10.21 x10.21 x10.21 x10.21n",
# o1d ^Wk wants a backs1ash at the end o1 the next 1ne
# to contnue the prnt statement
# new ^Wk a11ows you to break the 1ne nto two, a1ter a comma
prnt1{1mt1,"0egrees","8adans","Cosne","ne",
"1angent","Cotangent","ecant", "Cosecant"),
1or {=0,<=260,++) {
# convert degrees to radans
r = * {1 1 180 ),
# n new ^Wk, the backs1ashes are optona1
# n 0l0 ^Wk, they are requred
prnt1{1mt2, , r,
# cosne o1 r
cos{r),
# sne o1 r
sn{r),
#
# 1 ran nto a prob1em when dvdng by zero.
# o 1 had to test 1or ths case.
#
# o1d ^Wk 1nds the next 1ne too comp1cated
# 1 don't mnd addng a backs1ash, but rewrtng the
# next three 1nes seems pont1ess 1or a smp1e 1esson.
# 1hs scrpt w11 on1y work wth new ^Wk, now - sgh...
# 0n the p1us sde,
# 1 don't need to add those back s1ashes anymore
#
# tangent o1 r
{cos{r) == 0) ? 816 : sn{r)1cos{r),
# cotangent o1 r
{sn{r) == 0) ? 816 : cos{r)1sn{r),
# secant o1 r
{cos{r) == 0) ? 816 : 11cos{r),
# cosecant o1 r
{sn{r) == 0) ? 816 : 11sn{r)),
|
# put an ext here, so that standard nput sn't needed.
ext,
|
Cick here to get Iie: trigonometry.awk
NAWK aso has the arctangent Iunction. This is useIu Ior some graphics work. as
arc tangent(a /b) ange (in radians)
ThereIore iI you have the X and Y ocations. the arctangent oI the ratio wi te you the ange. The atan2() Iunction returns a vaue Irom
negative pi to positive pi.
Exponent. log and quare root
27 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
The Ioowing script uses three other arithmetic Iunctions: log. exp. and sqrt. I wanted to show how these can be used together. so I divided
the og oI a number by two. which is another way to Iind a square root. I then compared the vaue oI the exponent oI that new og to the buit-
in square root Iunction. I then cacuated the diIIerence between the two. and converted the diIIerence into a positive number.
#!1bn1awk -1
# demonstrate use o1 exp{), 1og{) and sqrt n ^Wk
# e.g. what s the d11erence between usng 1ogarthms and regu1ar arthmetc
# note - exp and 1og are natura1 1og 1unctons - not base 10
#
8L61u {
# what s the about o1 error that w11 be reported?
L8808=0.000000000001,
# 1oop a 1ong wh1e
1or {=1,<=2147482647,++) {
# 1nd 1og o1
1og=1og{),
# what s square root o1 ?
# dvde the 1og by 2
1ogsquareroot=1og12,
# convert 1og o1 back
squareroot=exp{1ogsquareroot),
# 1nd the d11erence between the 1ogarthmc ca1cu1aton
# and the bu1t n ca1cu1aton
d11=sqrt{)-squareroot,
# make d11erence postve
1 {d11 < 0) {
d11*=-1,
|
1 {d11 > L8808) {
prnt1{"x10d, squareroot: x16.81, error: x16.141\n",
, squareroot, d11),
|
|
ext,
|
Cick here to get Iie: awkexampe13.awk
Yawn. This exampe isn't too exciting. except to those who enioy nitpicking. Expect the program to reach 3 miion beIore you see any errors.
I' give you a more exciting sampe soon.
Truncating Integer
A version oI AWKcontain the int Iunction. This truncates a number. making it an integer. It can be used to round numbers by adding 0.5:
printI("rounding 8.4I gives 8dn". x. int(x0.5));
"Random Number
NAWK has Iunctions that can generate random numbers. The Iunction rand returns a random number between 0 and 1. Here is an exampe
that cacuates a miion random numbers between 0 and 100. and counts how oIten each number was used:
#!1usr1bn1nawk -1
# o1d ^Wk doesn't have rand{) and srand{)
# on1y new ^Wk has them
# how random s the random 1uncton?
8L61u {
# srand{),
=0,
wh1e {++<1000000) {
x=nt{rand{)*100 + 0.5),
y]x1++,
|
1or {=0,<=100,++) {
prnt1{"xdtxd\n",y]1,),
|
ext,
|
Cick here to get Iie: random.awk
II you execute this script severa times. you wi get the exact same resuts. Experienced programmers know random number generators aren't
reay random. uness they use specia hardware. These numbers are pseudo-random. and cacuated using some agorithm. Since the
agorithm is Iixed. the numbers are repeatabe uness the numbers are seeded with a unique vaue. This is done using the srand Iunction
above. which is commented out. Typicay the random number generator is not given a specia seed unti the bugs have been worked out oI the
28 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
program. There 's nothing more Irustrating than a bug that occurs randomy. The srand Iunction may be given an argument. II not. it uses the
current time and day to generate a seed Ior the random number generator.
The Lotto cript
I promised a more useIu script. This may be what you are waiting Ior. It reads two numbers. and generates a ist oI random numbers. I ca
the script "otto.awk."
#!1usr1bn1nawk -1
8L61u {
# ^ssume we want 6 random numbers between 1 and 26
# We cou1d get ths n1ormaton by readng standard nput,
# but ths examp1e w11 use a 1xed set o1 parameters.
#
# Irst, nta1ze the seed
srand{),
# uow many numbers are needed?
uuM=6,
# what s the mnmum number
M1u=1,
# and the maxmum?
M^x=26,
# uow many numbers w11 we 1nd? start wth 0
uumber=0,
wh1e {uumber < uuM) {
r=nt{{{rand{) *{1+M^x-M1u))+M1u)),
# have 1 seen ths number be1ore?
1 {array]r1 == 0) {
# no, 1 have not
uumber++,
array]r1++,
|
|
# now output a11 numbers, n order
1or {=M1u,<=M^x,++) {
# s t marked n the array?
1 {array]1) {
# yes
prnt1{"xd ",),
|
|
prnt1{"\n"),
ext,
|
Cick here to get Iie: otto.awk
II you do win a ottery. send me a postcard .
String Function
Besides numeric Iunctions. there are two other types oI Iunction: strings and the whatchamacaits. First. a ist oI the string Iunctions:
+-------------------------------------------------+
| ^Wk 1ab1e 10 |
| trng Iunctons |
|uame varant |
+-------------------------------------------------+
|ndex{strng,search) ^Wk, u^Wk, 6^Wk |
|1ength{strng) ^Wk, u^Wk, 6^Wk |
|sp1t{strng,array,separator) ^Wk, u^Wk, 6^Wk |
|substr{strng,poston) ^Wk, u^Wk, 6^Wk |
|substr{strng,poston,max) ^Wk, u^Wk, 6^Wk |
|sub{regex,rep1acement) u^Wk, 6^Wk |
|sub{regex,rep1acement,strng) u^Wk, 6^Wk |
|gsub{regex,rep1acement) u^Wk, 6^Wk |
|gsub{regex,rep1acement,strng) u^Wk, 6^Wk |
|match{strng,regex) u^Wk, 6^Wk |
|to1ower{strng) 6^Wk |
|toupper{strng) 6^Wk |
+-------------------------------------------------+
Most peope Iirst use AWKto perIorm simpe cacuations. Associative arrays and trigonometric Iunctions are somewhat esoteric Ieatures. that
new users embrace with the eagerness oI a chain smoker in a Iireworks Iactory. I suspect most users add some simpe string Iunctions to their
repertoire once they want to add a itte more sophistication to their AWK scripts. I hope this coumn gives you enough inIormation to inspire
your next eIIort.
There are Iour string Iunctions in the origina AWK: index(). length(). split(). and substr(). These Iunctions are quite versatie .
29 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
The Length function
What can I say? The length() Iunction cacuates the ength oI a string. I oIten use it to make sure my input is correct. II you wanted to ignore
empty ines. check the ength oI the each ine beIore processing it with
iI (ength($0) ~ 1)
. . .
}
You can easiy use it to print a ines onger than a certain ength. etc. The Ioowing command centers a ines shorter than 80 characters:
#!1bn1awk -1
{
1 {1ength{$0) < 80) {
pre1x = "",
1or { = 1,<{80-1ength{$0))12,++)
pre1x = pre1x " ",
prnt pre1x $0,
| e1se {
prnt,
|
|
Cick here to get Iie: center.awk
The Index Function
II you want to search Ior a specia character. the index() Iunction wi search Ior speciIic characters inside a string. To Iind a comma. the code
might ook ike this:
sentence"This is a short. meaningess sentence.";
iI (index(sentence. ". ") ~ 0)
printI("Found a comma in position \d\n". index(sentence. ". "));
}
The Iunction returns a positive vaue when the substring is Iound . The number speciIied the ocation oI the substring.
II the substring consists oI 2 or more characters. a oI these characters must be Iound . in the same order. Ior a non-z ero return vaue. Like the
length() Iunction. this is useIu Ior checking Ior proper input conditions.
The Subtr function
The substr() Iunction can extract a portion oI a string. One common use is to spit a string into two parts based on a specia character. II you
wanted to process some mai addresses. the Ioowing code Iragment might do the iob:
#!1bn1awk -1
{
# 1e1d 1 s the e-ma1 address - perhaps
1 {{x=ndex{$1,"")) > 0) {
username = substr{$1,1,x-1),
hostname = substr{$1,x+1,1ength{$1)),
# the above s the same as
# hostname = substr{$1,x+1),
prnt1{"username = xs, hostname = xs\n", username, hostname),
|
|
Cick here to get Iie: emai.awk
The substr() Iunction takes two or three arguments. The Iirst is the string. the second is the position. The optiona third argument is the ength oI
the string to extract. II the third argument is missing. the rest oI the string is used.
The substr Iunction can be used in many non-obvious ways. As an exampe . it can be used to convert upper case etters to ower case.
#!1usr1bn1awk -1
# convert upper case 1etters to 1ower case
8L61u {
lC="abcde1ghjk1mnopqrstuvwxyz",
uC="^8C0LI6u1JklMu0081uvWxY2",
|
{
out="",
# 1ook at each character
30 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
1or{=1,<=1ength{$0),++) {
# get the character to be checked
char=substr{$0,,1),
# s t an upper case 1etter?
j=ndex{uC,char),
1 {j > 0 ) {
# 1ound t
out = out substr{lC,j,1),
| e1se {
out = out char,
|
|
prnt1{"xs\n", out),
|
Cick here to get Iie: uppertoower.awk
GAWK' Tolower and Toupper function
GAWK has the toupper() and tolower() Iunctions. Ior convenient conversions oI case . These Iunctions take strings. so you can reduce the
above script to a singe ine :
#!1usr11oca11bn1gawk -1
{
prnt to1ower{$0),
|
Cick here to get Iie: uppertoower.gawk
The Split function
Another way to spit up a string is to use the split() Iunction. It takes three arguments: the string. an array. and the separator. The Iunction
returns the number oI pieces Iound. Here is an exampe :
#!1usr1bn1awk -1
8L61u {
# ths scrpt breaks up the sentence nto words, usng
# a space as the character separatng the words
strng="1hs s a strng, s t not?",
search=" ",
n=sp1t{strng,array,search),
1or {=1,<=n,++) {
prnt1{"Word]xd1=xs\n",,array]1),
|
ext,
|
Cick here to get Iie: awkexampe14.sh
The third argument is typicay a singe character. II a onger string is used. ony the Iirst etter is used as a separator.
NAWK' tring function
NAWK (and GAWK) have additiona string Iunctions. which add a primitive SED-ike Iunctionaity: sub(). atch(). and gsub().
Sub() perIorms a string substitution. ike sed. To repace "od" with "new" in a string. use
sub(/od/. "new". string)
II the third argument is missing. $0 is assumed to be string searched . The Iunction returns 1 iI a substitution occurs. and 0 iI not. II no sashes
are given in the Iirst argument. the Iirst argument is assumed to be a variabe containing a reguar expression. The sub() ony changes the Iirst
occurrence. The gsub() Iunction is simiar to the g option in sed: a occurrence are converted . and not iust the Iirst. That is. iI the patter occurs
more than once per ine (or string). the substitution wi be perIormed once Ior each Iound pattern. The Ioowing script:
#!1usr1bn1nawk -1
8L61u {
strng = "^nother samp1e o1 an examp1e sentence",
pattern="]^a1n",
1 {gsub{pattern,"^u",strng)) {
prnt1{"ubsttuton occurred: xs\n", strng),
31 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
|
ext,
|
Cick here to get Iie: awkexampe15.awk
print the Ioowing when executed:
Substitution occurred: ANother sampe oI AN exampe sentence
As you can see. the pattern can be a reguar expression.
The Match function
As the above demonstrates. the sub() and gsub() returns a positive vaue iI a match is Iound . However. it has a side -eIIect oI changing the
string tested . II you don' t wish this. you can copy the string to another variabe. and test the spare variabe. NAWKaso provides the atch()
Iunction. II atch() Iinds the reguar expression. it sets two specia variabes that indicate where the reguar expression begins and ends. Here
is an exampe that does this:
#!1usr1bn1nawk -1
# demonstrate the match 1uncton
8L61u {
regex="]a-z^-20-91+",
|
{
1 {match{$0,regex)) {
# 81^81 s where the pattern starts
# 8lLu61u s the 1ength o1 the pattern
be1ore = substr{$0,1,81^81-1),
pattern = substr{$0,81^81,8lLu61u),
a1ter = substr{$0,81^81+8lLu61u),
prnt1{"xs<xs>xs\n", be1ore, pattern, a1ter),
|
|
Cick here to get Iie: awkexampe16.awk
Lasty. there are the whatchamacait Iunctions. I coud use the word "misceaneous. " but it' s too hard to spe. Darn it. I had to ook it up
anyway.
+-----------------------------------------------+
| ^Wk 1ab1e 11 |
| Msce11aneous Iunctons |
|uame varant |
+-----------------------------------------------+
|get1ne ^Wk, u^Wk, 6^Wk |
|get1ne <11e u^Wk, 6^Wk |
|get1ne varab1e u^Wk, 6^Wk |
|get1ne varab1e <11e u^Wk, 6^Wk |
|"command" | get1ne u^Wk, 6^Wk |
|"command" | get1ne varab1e u^Wk, 6^Wk |
|system{command) u^Wk, 6^Wk |
|c1ose{command) u^Wk, 6^Wk |
|systme{) 6^Wk |
|str1tme{strng) 6^Wk |
|str1tme{strng, tmestamp) 6^Wk |
+-----------------------------------------------+
The Sytem function
NAWK has a Iunction svste() that can execute any program. It returns the exit status oI the program.
iI (system("/bin/rm iunk") ! 0)
print "command didn' t work";
The command can be a string. so you can dynamicay create commands based on input. Note that the output isn't sent to the NAWK program.
You coud send it to a Iie. and open that Iie Ior reading. There is another soution. however.
The Getline function
AWKhas a command that aows you to Iorce a new ine. It doesn' t take any arguments. It returns a 1. iI successIu. a 0 iI end -oI-Iie is
reached . and a -1 iI an error occurs. As a side eIIect. the ine containing the input changes. This next script Iiters the input. and iI a
32 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
backsash occurs at the end oI the ine . it reads the next ine in. eiminating the backsash as we as the need Ior it.
#!1usr1bn1awk -1
# 1ook 1or a as the 1ast character.
# 1 1ound, read the next 1ne and append
{
1ne = $0,
wh1e {substr{1ne,1ength{1ne),1) == "\\") {
# chop o11 the 1ast character
1ne = substr{1ne,1,1ength{1ne)-1),
=get1ne,
1 { > 0) {
1ne = 1ne $0,
| e1se {
prnt1{"mssng contnuaton on 1ne xd\n", u8),
|
|
prnt 1ne,
|
Cick here to get Iie: awkexampe17.awk
Instead oI reading into the standard variabes. you can speciIy the variabe to set:
getine aine
print aine ;
NAWK and GAWK aow the getline Iunction to be given an optiona Iiename or string containing a Iiename. An exampe oI a primitive Iie
preprocessor. that ooks Ior ines oI the Iormat
#incude Iiename
and substitutes that ine Ior the contents oI the Iie:
#!1usr1bn1nawk -1
{
# a prmtve nc1ude preprocessor
1 {{$1 == "#nc1ude") && {uI == 2)) {
# 1ound the name o1 the 11e
11ename = $2,
wh1e { = get1ne < 11ename ) {
prnt,
|
| e1se {
prnt,
|
|
Cick here to get Iie: incude.nawk
NAWK's getine can aso read Irom a pipe. II you have a program that generates singe ine. you can use
"command" ' getine ;
print $0;
or
"command" ' getine abc ;
print abc;
II you have more than one ine . you can oop through the resuts:
wh1e {"command" | get1ne) {
cmd]++1 = $0,
|
1or { n cmd) {
prnt1{"xs=xs\n", , cmd]1),
|
Ony one pipe can be open at a time . II you want to open another pipe. you must execute
cose("command ");
This is necessary even iI the end oI Iie is reached .
33 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
The ytime function
The svstie() Iunction returns the current time oI day as the number oI seconds since Midnight. January 1. 1970. It is useIu Ior measuring how
ong portions oI your GAWK code takes to execute .
#!1usr11oca11bn1gawk -1
# how 1ong does t take to do a 1ew 1oops?
8L61u {
l00=100,
# do the test twce
start=systme{),
1or {=0,<l00,++) {
|
end = systme{),
# ca1cu1ate how 1ong t takes to do a dummy test
do_nothng = end-start,
# now do the test agan wth the *1M081^u1* code nsde
start=systme{),
1or {=0,<l00,++) {
# uow 1ong does ths take?
wh1e {"date" | get1ne) {
date = $0,
|
c1ose{"date"),
|
end = systme{),
newtme = {end - start) - do_nothng,
1 {newtme <= 0) {
prnt1{"xd 1oops were not enough to test, ncrease t\n",
l00),
ext,
| e1se {
prnt1{"xd 1oops took x6.41 seconds to execute\n",
l00, newtme),
prnt1{"1hat's x10.81 seconds per 1oop\n",
{newtme)1l00),
# snce the c1ock has an accuracy o1 +1- one second, what s the error
prnt1{"accuracy o1 ths measurement = x6.21xx\n",
{11{newtme))*100),
|
ext,
|
Cick here to get Iie: awkexampe17.gawk
The Strftime function
GAWK has a specia Iunction Ior creating strings based on the current time . It's based on the strftie(3c ) Iunction. II you are Iamiiar with the
"" Iormats oI the date(1) command . you have a good head -start on understanding what the strftie command is used Ior. The svstie()
Iunction returns the current date in seconds. Not very useIu iI you want to create a string based on the time . Whie you coud convert the
seconds into days. months. years. etc .. it woud be easier to execute "date" and pipe the resuts into a string. (See the previous script Ior an
exampe). GAWK has another soution that eiminates the need Ior an externa program.
The Iunction takes one or two arguments. The Iirst argument is a string that speciIied the Iormat. This string contains reguar characters and
specia characters. Specia characters start with a backsash or the percent character. The backsash characters with the backsash preIix are
the same I covered earier. In addition. the strftie() Iunction deIines doz ens oI combinations. a oI which start with "." The Ioowing tabe
ists these specia sequences:
+---------------------------------------------------------------+
| ^Wk 1ab1e 12 |
| 6^Wk's str1tme 1ormats |
+---------------------------------------------------------------+
|xa 1he 1oca1e's abbrevated weekday name |
|x^ 1he 1oca1e's 1u11 weekday name |
|xb 1he 1oca1e's abbrevated month name |
|x8 1he 1oca1e's 1u11 month name |
|xc 1he 1oca1e's "approprate" date and tme representaton |
|xd 1he day o1 the month as a decma1 number {01--21) |
|xu 1he hour {24-hour c1ock) as a decma1 number {00--22) |
|x1 1he hour {12-hour c1ock) as a decma1 number {01--12) |
|xj 1he day o1 the year as a decma1 number {001--266) |
|xm 1he month as a decma1 number {01--12) |
|xM 1he mnute as a decma1 number {00--59) |
|xp 1he 1oca1e's equva1ent o1 the ^M1M |
|x 1he second as a decma1 number {00--61). |
|xu 1he week number o1 the year {unday s 1rst day o1 week) |
|xw 1he weekday as a decma1 number {0--6). unday s day 0 |
34 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
|xW 1he week number o1 the year {Monday s 1rst day o1 week) |
|xx 1he 1oca1e's "approprate" date representaton |
|xx 1he 1oca1e's "approprate" tme representaton |
|xy 1he year wthout century as a decma1 number {00--99) |
|xY 1he year wth century as a decma1 number |
|x2 1he tme zone name or abbrevaton |
|xx ^ 1tera1 x. |
+---------------------------------------------------------------+
Depending on your operating system. and instaation. you may aso have the Ioowing Iormats:
+-----------------------------------------------------------------------+
| ^Wk 1ab1e 12 |
| 0ptona1 6^Wk str1tme 1ormats |
+-----------------------------------------------------------------------+
|x0 Lquva1ent to spec1yng xm1xd1xy |
|xe 1he day o1 the month, padded wth a b1ank 1 t s on1y one dgt |
|xh Lquva1ent to xb, above |
|xn ^ new1ne character {^C11 lI) |
|xr Lquva1ent to spec1yng x1:xM:x xp |
|x8 Lquva1ent to spec1yng xu:xM |
|x1 Lquva1ent to spec1yng xu:xM:x |
|xt ^ 1^8 character |
|xk 1he hour as a decma1 number {0-22) |
|x1 1he hour {12-hour c1ock) as a decma1 number {1-12) |
|xC 1he century, as a number between 00 and 99 |
|xu s rep1aced by the weekday as a decma1 number ]Monday == 11 |
|xv s rep1aced by the week number o1 the year {usng 10 8601) |
|xv 1he date n vM 1ormat {e.g. 20-Juu-1991) |
+-----------------------------------------------------------------------+
One useIu Iormat is
strItime ("ymdHMS")
This constructs a string that contains the year. month. day. hour. minute and second in a Iormat that aows convenient sorting. II you ran this
at noon on Christmas. 1994. it woud generate the string
941225120000
Here is the GAWK equivaent oI the date command :
#! 1usr11oca11bn1gawk -1
#
8L61u {
1ormat = "xa xb xe xu:xM:x x2 xY",
prnt str1tme{1ormat),
|
Cick here to get Iie: date.gawk
You wi note that there is no exit command in the begin statement. II I was using AWK. an exit statement is necessary. Otherwise. it woud
never terminate . II there is no action deIined Ior each ine read. NAWK and GAWK do not need an exit statement.
II you provide a second argument to the strftie() Iunction. it uses that argument as the timestamp. instead oI the current system' s time . This is
useIu Ior cacuating Iuture times. The Ioowing script cacuates the time one week aIter the current time:
#!1usr11oca11bn1gawk -1
8L61u {
# get current tme
ts = systme{),
# the tme s n seconds, so
one_day = 24 * 60 * 60,
next_week = ts + {7 * one_day),
1ormat = "xa xb xe xu:xM:x x2 xY",
prnt str1tme{1ormat, next_week),
ext,
|
Cick here to get Iie: oneweekater. gawk
Uer Defined Function
35 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
Finay. NAWK and GAWK support user deIined Iunctions. This Iunction demonstrates a way to print error messages. incuding the Iiename
and ine number. iI appropriate:
#!1usr1bn1nawk -1
{
1 {uI != 4) {
error{"Lxpected 4 1e1ds"),
| e1se {
prnt,
|
|
1uncton error { message ) {
1 {I1lLu^ML != "-") {
prnt1{"xs: ", I1lLu^ML) > "1dev1tty",
|
prnt1{"1ne # xd, xs, 1ne: xs\n", u8, message, $0) >> "1dev1tty",
|
Cick here to get Iie: awkexampe18.nawk
AWK pattern
In my Iirst tutoria on AWK. I described the AWKstatement as having the Iorm
pattern commands}
I have ony used two patterns so Iar: the specia words BEGIN and END. Other patterns are possibe. yet I haven't used any. There are severa
reasons Ior this. The Iirst is that these patterns aren' t necessary. You can dupicate them using an if statement. ThereIore this is an "advanced
Ieature." Patterns. or perhaps the better word is conditions. tend to make an AWK program obscure to a beginner. You can think oI them as an
advanced topic. one that shoud be attempted aIter becoming Iamiiar with the basics.
A pattern or condition is simpy an abbreviated test. II the condition is true. the action is perIormed. A reationa tests can be used as a
pattern. The "head -10" command . which prints the Iirst 10 ines and stops. can be dupicated with
iI (NR 10 ) print}}
Changing the if statement to a condition shortens the code:
NR 10 print}
Besides reationa tests. you can aso use containment tests. i. e. do strings contain reguar expressions? Printing a ines that contain the word
"specia" can be written as
iI ($0 ~ /specia/) print}}
or more brieIy
$0 ~ /specia/ print}
This type oI test is so common. the authors oI AWK aow a third. shorter Iormat:
/specia/ print}
These tests can be combined with the AND (&&) and OR ([[) commands. as we as the NOT (') operator. Parenthesis can aso be added iI
you are in doubt. or to make your intention cear.
The Ioowing condition prints the ine iI it contains the word "whoe " or coumns 1 and 2 contain "part1" and "part2" respectivey.
($0 ~ /whoe /) '' (($1 ~ /part1/) && ($2 ~ /part2/)) print}
This can be shortened to
/whoe / '' $1 ~ /part1/ && $2 ~ /part2/ print}
There is one case where adding parenthesis hurts. The condition
/whoe / print}
works. but
(/whoe /) print}
36 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
does not. II parenthesis are used. it is necessary to expicity speciIy the test:
($0 ~ /whoe ) print}
A murky situation arises when a simpe variabe is used as a condition. Since the variabe NF speciIies the number oI Iieds on a ine. one
might think the statement
NF print}
woud print a ines with one oI more Iieds. This is an iega command Ior AWK. because AWKdoes not accept variabes as conditions. To
prevent a syntax error. I had to change it to
NF ! 0 print}
I expected NAWK to work. but on some SunOS systems it reIused to print any ines at a. On newer Soaris systems it did behave propery.
Again. changing it to the onger Iorm worked Ior a variations. GAWK. ike the newer version oI NAWK. worked propery. AIter this
experience. I decided to eave other. exotic variations aone. Ceary this is unexpored territory. I coud write a script that prints the Iirst 20
ines. except iI there were exacty three Iieds. uness it was ine 10. by using
NF 3 ? NR 10 : NR 20 print }
But I won't. Obscurity. ike puns. is oIten unappreciated.
There is one more common and useIu pattern I have not yet described. It is the comma separated pattern. A common exampe has the Iorm:
/start/. /stop/ print}
This Iorm deIines. in one ine . the condition to turn the action on. and the condition to turn the action oII. That is. when a ine containing
"start" is seen. it is printed . Every ine aIterwards is aso printed. unti a ine containing "stop" is seen. This one is aso printed. but the ine
aIter. and a Ioowing ines. are not printed. This triggering on and oII can be repeated many times. The equivaent code . using the if
command . is:
{
1 {$0 ~ 1start1) {
trggered=1,
|
1 {trggered) {
prnt,
1 {$0 ~ 1stop1) {
trggered=0,
|
|
|
The conditions do not have to be reguar expressions. Reationa tests can aso be used. The Ioowing prints a ines between 20 and 40:
(NR20). (NR40) print}
You can mix reationa and containment tests. The Ioowing prints every ine unti a "stop" is seen:
(NR1). /stop/ print}
There is one more area oI conIusion about patterns: each one is independent oI the others. You can have severa patterns in a script; none
inIuence the other patterns. II the Ioowing script is executed :
NR10 print}
(NR5). (NR15) print}
/xxx/ print}
(NR1). /NeVerMatchThiS/ print}
and the input Iie's ine 10 contains "xxx. " it woud be printed 4 times. as each condition is true. You can think oI each condition as
cumuative. The exception is the specia BEGIN and END conditions. In the origina AWK. you can ony have one oI each. In NAWKand
GAWK. you can have severa BEGIN or ENDactions.
Formatting AWK program
Many readers have questions my stye oI AWKprogramming. In particuar. they ask me why I incude code ike this:
# Print coumn 3
print $3;
when I coud use
37 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
print $3 # print coumn 3
AIter a. they reason. the semicoon is unnecessary. and comments do not have to start on the Iirst coumn. This is true. Sti I avoid this.
Years ago. when I started writing AWKprograms. I woud Iind myseI conIused when the nesting oI conditions were too deep. II I moved a
compex if statement inside another if statement. my aignment oI braces became incorrect. It can be very diIIicut to repair this condition.
especiay with arge scripts. Nowadays I use eacs to do the Iormatting Ior me. but 10 years ago I didn' t have this option. My soution was to
use the program cb. which is a "C beautiIier. " By incuding optiona semicoons. and starting comments on the Iirst coumn oI the ine. I coud
send my AWKscript through this Iiter. and propery aign a oI the code.
Environment Variable
I've described the 7 specia variabes in AWK. and brieIy mentioned some others NAWK and GAWK. The compete ist Ioows:
+--------------------------------+
| ^Wk 1ab1e 14 |
|varab1e ^Wk u^Wk 6^Wk |
+--------------------------------+
|I Yes Yes Yes |
|uI Yes Yes Yes |
|8 Yes Yes Yes |
|u8 Yes Yes Yes |
|I1lLu^ML Yes Yes Yes |
|0I Yes Yes Yes |
|08 Yes Yes Yes |
+--------------------------------+
|^86C Yes Yes |
|^86v Yes Yes |
|^861u0 Yes |
|Iu8 Yes Yes |
|0IM1 Yes Yes |
|81^81 Yes Yes |
|8lLu61u Yes Yes |
|u8L Yes Yes |
|Luv180u Yes |
|16u08LC^L Yes |
|C0uvIM1 Yes |
|L88u0 Yes |
|I1Ll0W101u Yes |
+--------------------------------+
Since I' ve aready discussed many oI these. I' ony cover those that I missed earier.
ARGC - Number or argument (NAWK/GAWK)
The variabe ARGC speciIies the number oI arguments on the command ine. It aways has the vaue oI one or more. as it counts its own
program name as the Iirst argument. II you speciIy an AWK Iiename using the "-I" option. it is not counted as an argument. II you incude a
variabe on the command ine using the Iorm beow. NAWK does not count as an argument.
nawk -I Iie.awk x17
GAWK does. but that is because GAWK requires the "-v" option beIore each assignment:
gawk -I Iie.awk -v x17
GAWK variabes initiaiz ed this way do not aIIect the ARGC variabe.
ARGV - Array of argument (NAWK/GAWK)
The ARGV array is the ist oI arguments (or Iies) passed as command ine arguments.
ARGIND - Argument Index (GAWK only)
The ARGIND speciIies the current index into the ARGV array. and thereIore ARGV[ARGIND] is aways the current Iiename. ThereIore
FILENAME ARGV|ARGIND|
is aways true. It can be modiIied . aowing you to skip over Iies. etc.
FNR (NAWK/GAWK)
38 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
The FNR variabe contains the number oI ines read. but is reset Ior each Iie read. The NRvariabe accumuates Ior a Iies read. ThereIore iI
you execute an awk script with two Iies as arguments. with each containing 10 ines:
nawk ' print NR}' Iie Iie2
nawk ' print FNR}' Iie Iie2
the Iirst program woud print the numbers 1 through 20. whie the second woud print the numbers 1 through 10 twice. once Ior each Iie.
OFMT (NAWK/GAWK)
The OFMT variabe SpeciIies the deIaut Iormat Ior numbers. The deIaut vaue is ".6g."
RSTART. RLENGTH and match (NAWK/GAWK)
I've aready mentioned the RSTART and RLENGTH variabes. AIter the atch() Iunction is caed. these variabes contain the ocation in the
string oI the search pattern. RLENGTH contains the ength oI this match.
SUBSEP - Multi-dimenional array eparator (NAWK/GAWK)
Earier I described how you can construct muti-dimensiona arrays in AWK. These are constructed by concatenating two indexes together
with a specia character between them. II I use an ampersand as the specia character. I can access the vaue at ocation X. Y by the
reIerence
array| X "&" Y |
NAWK (and GAWK) has this Ieature buit in. That is. you can speciIy the array eement
array|X. Y|
It automaticay constructs the string. pacing a specia character between the indexes. This character is the non-printing character "034." You
can contro the vaue oI this character. to make sure your strings do not contain the same character.
ENVIRON - environment variable (GAWK only)
The ENVIRON array contains the environment variabes the current process. You can print your current search path using
print ENVIRON|"PATH"|
IGNORECASE (GAWK only)
The IGNORECASE variabe is normay z ero. II you set it to non-z ero. then a pattern matches ignore case. ThereIore the Ioowing is
equivaent to "grep -i match:"
BEGIN IGNORECASE1;}
/match/ print}
CONVFMT - converion format (GAWK only)
The CONVFMT variabe is used to speciIy the Iormat when converting a number to a string. The deIaut vaue is ".6g." One way to truncate
integers is to convert an integer to a string. and convert the string to an integer - modiIying the actions with CONVFMT:
a 12;
b a "";
CONVFMT "2. 2I";
c a "";
Variabes b and c are both strings. but the Iirst one wi have the vaue "12.00" whie the second wi have the vaue "12. "
ERRNO - ytemerror (GAWK only)
The ERRNO variabe describes the error. as a string. aIter a ca to the getline command Iais.
39 Awk - A Tutoria and Introduction - by Bruce Barnett
06/24/2011 07:50:19 PM http://www.grymoire.com/Unix/Awk.htm
FIELDWIDTHS - fixed width field (GAWK only)
The FIELDWIDTHS variabe is used when processing Iixed width input. II you wanted to read a Iie that had 3 coumns oI data; the Iirst one
is 5 characters wide . the second 4. and the third 7. you coud use substr to spit the ine apart. The technique . using FIELDWIDTHS. woud
be:
BEGIN FIELDWIDTHS"5 4 7";}
printI("The three Iieds are s s s\n". $1. $2. $3);}
AWK. NAWK. GAWK. or PERL
This concudes my series oI tutorias on AWKand the NAWKand GAWK variants. I originay intended to avoid extensive coverage oI
NAWK and GAWK. I am a PERL user myseI. and sti beieve PERL is worth earning. yet I don't intend to cover this very compex
anguage . Understanding AWKis very useIu when you start to earn PERL. and a part oI me Iet that teaching you extensive NAWK and
GAWK Ieatures might encourage you to bypass PERL.
I Iound myseI going into more depth than I panned. and I hope you Iound this useIu. I Iound out a ot myseI. especiay when I discussed
topics not covered in other books. Which reminds me oI some cosing advice: iI you don't understand how something works. experiment and
see what you can discover. Good uck. and happy AWKing...
Other oI my Unix she tutorias can be Iound here. Other she tutorias can be Iound at Heiner's SHELLdorado and Chris F. A. Johnson's
Unix She Page
I'd ike to thank the Ioowing Ior Ieedback:
8anjt ngh
8chard Jans 8eckert
Chrstan uaarmann
kuang ue
6uenter knau1
Chay Wes1ey
1odd ^ndrews
This docuent was translated bv troff2htl v0.21 on Septeber 22. 2001 and then anuallv edited to ake it copliant with.

You might also like