
Shell Scripting & awk

2013 GlobalLogic Inc.

CONFIDENTIAL

Agenda

Shell scripting

Standard file descriptors & IO redirection


File test operators
Functions
Executing SQLs/FTP from a shell-script

awk

Programming model
Records and fields
Pattern matching
Functions
System/built-in variables
Data manipulation and report Generation

Training schedule

Day-1


Shell scripting
awk


Shell scripting

Standard file descriptors & IO redirection


Executing SQLs/FTP from a shell-script
File test operators
Functions


UNIX standard files : IO-redirection

Redirection of input and output: >, >>, <, <<

Standard streams, their (C) names and file descriptors:
Input: stdin - 0
Output: stdout - 1
Error: stderr - 2
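A minimal sketch of the three descriptors in use, assuming a POSIX shell. The function name emit and the temporary file names are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# emit writes one line to stdout (fd 1) and one line to stderr (fd 2).
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmpdir=$(mktemp -d)

# Send fd 1 and fd 2 to different files.
emit > "$tmpdir/out" 2> "$tmpdir/err"

# Send both to the same file: redirect fd 1 first, then duplicate fd 2 onto it.
emit > "$tmpdir/both" 2>&1

out_text=$(cat "$tmpdir/out")
err_text=$(cat "$tmpdir/err")
both_lines=$(wc -l < "$tmpdir/both" | tr -d ' ')

echo "out file : $out_text"
echo "err file : $err_text"
echo "both file: $both_lines lines"

rm -rf "$tmpdir"
```

The order matters in the last redirection: 2>&1 copies whatever fd 1 points to at that moment, so it has to come after > file.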

UNIX standard files [ contd.]

Ex.1
cat x1 vs cat < x1
cat x1 > x2 vs cat < x1 > x2

UNIX standard files [ contd.]

Ex.2
1. Log in as a non-root user and go to the root directory: cd /
2. Find everything: find .
3. Again, find everything, but redirect output to a file:
find . > $HOME/x1
What is being shown on screen? Why?
4. Redirect o/p & error to different files:
find . > $HOME/x1 2>$HOME/x2
5. Send o/p & error to the same file:
find . > $HOME/x1 2>&1

UNIX Shell Scripting

Here document (heredoc/hereis/here-script)


Special-purpose code block that feeds text to a command's stdin
Can be spread over multiple lines
The closing delimiter goes alone on the last line, starting in the first column
Example: send mail with a message spread over multiple lines:

mail -s "Hello" yogesh.mahajan@globallogic.com <<!
Test command
!
See more examples: http://tldp.org/LDP/abs/html/here-docs.html
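A small sketch of how the delimiter affects expansion, assuming a POSIX shell. It uses cat instead of mail so that it runs without a mail setup; the variable names are hypothetical.

```shell
#!/bin/sh
name="world"

# Unquoted delimiter: variables and command substitutions are expanded.
expanded=$(cat <<EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat <<'EOF'
Hello, $name
EOF
)

echo "$expanded"
echo "$literal"
```

Quoting the delimiter is the usual choice when the body contains $ signs or backticks that must reach the command untouched, e.g. SQL or awk code.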

UNIX FTP examples

FTP
ftp -n -i ftp.FreeBSD.org <<END_SCRIPT
user anonymous
pass MyScretPassword
ls
bye
END_SCRIPT
Receive input from a file:
ftp -n -i my.ftp.server < FileContainingCommands

Redirect output to a file for further analysis


ftp -n -i ftp.FreeBSD.org <<END_SCRIPT >TMPLOG 2>&1
user anonymous
pass MyScretPassword
ls
bye
END_SCRIPT

UNIX FTP example

cat 07ExampleFTP.sh
#!/bin/sh
# Usage:
# 07ExampleFTP.sh machine file
# set -x
SOURCE=$1
FILE=$2
GETHOST="uname -n"
BFILE=`basename $FILE`
ftp -n $SOURCE <<EndFTP
ascii
user anonymous $USER@`$GETHOST`
get $FILE /tmp/$BFILE
EndFTP

UNIX Shell Scripting

Ex: Redirect output, as well as errors, to the location where output is being directed:
isql $DATBASE_NAME <<END_EXEC >/home/ymahajan/Log 2>&1
select
request_id,
trade_date
from
trade_table
where
request_id = "$REQUEST_ID";
END_EXEC

UNIX SQL handling

Executing SQL queries:
sqlite3 TrainingDB "select * from orders;"
Write query over multiple lines:
sqlite3 TrainingDB <<!
select * from orders;
!
Redirect output to a file:
sqlite3 TrainingDB > Output.txt <<END_SQL
select * from t1;
END_SQL
cat Output.txt
Or process output without using a file:
sqlite3 TrainingDB <<! | cut -d '|' -f 2
select * from t1;
!

UNIX SQL handling [contd.]

Multiple SQLs:
sqlite3 TrainingDB <<END_SQL
.mode column
.header on
select DATA AS DATA from t1;
select * from orders;
END_SQL
A single column:
sqlite3 TrainingDB <<END_SQL
.mode column
.header on
select DATA AS DATA
from t1;
END_SQL

UNIX SQL handling [contd. 2]

Process output in this single column:
sqlite3 TrainingDB <<END_SQL | sed -e "/^$/d" -e "/MyOwnData/d"
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL

Assign this to some variable, and process it:
SomeVariable=`sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d"
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
`
echo $SomeVariable
for eachVar in $SomeVariable; do echo "xxxxx $eachVar" ;done;

UNIX SQL handling [contd. 3]

Redirect output to some file:
SomeVariable=`sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
`
cat TMPLOG
Don't type it every time; use the queries from an existing file:
sqlite3 TrainingDB < SQLInput

UNIX SQL handling [contd. 4]

tee:
sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" > TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
--- vs ---
sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL

UNIX SQL handling [contd. 5]

Error Handling: Use redirection operators to direct output to a file for further analysis:
SomeVariable=`sqlite3 TrainingDB <<END_SQL 2>&1 | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG
.mode column
.header on
select ColNotPresent AS MyOwnData
from t1;
END_SQL
`
echo $SomeVariable;
grep -ie error TMPLOG
if [ $? = 0 ]
then
echo "$0: INFORMIX SQL ERROR: Script $0 failed in $Query" | tee -a $OPC >> $LOG
exit $retcd
else
echo "$0: QUERY $Query SUCCESSFUL" >> $LOG
fi
WHY did we NOT compare $? to 0/1 directly after the SQL query, instead of running grep on TMPLOG and
then checking the result of that?
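One likely answer to the question above, sketched under the assumption of a POSIX shell without any pipefail-style option: after a pipeline, $? holds the exit status of the last command in it (here sed or tee), not that of the SQL client. The function failing_query is a hypothetical stand-in for a query that fails.

```shell
#!/bin/sh
logfile=$(mktemp)

# failing_query is a hypothetical stand-in for a SQL client that
# prints an error message and exits with a non-zero status.
failing_query() {
    echo "Error: no such column: ColNotPresent" >&2
    return 1
}

# $? after a pipeline is the status of the LAST command (tee), which succeeds.
failing_query 2>&1 | tee "$logfile" > /dev/null
pipeline_status=$?

# So the failure has to be detected from the captured output instead.
if grep -i -e error "$logfile" > /dev/null; then
    query_failed=yes
else
    query_failed=no
fi

echo "pipeline status: $pipeline_status, query failed: $query_failed"
rm -f "$logfile"
```

The pipeline reports success even though the first command failed, which is why the slide inspects the log rather than the exit status.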

The tee command

tee [-a] file: reads from standard i/p and writes to standard o/p & files
The tee command can be used to send standard output to the screen and to a file simultaneously.
make | tee build.log
Runs the make command and stores its output to build.log.
make install | tee -a build.log
Runs the make install command and appends its output to build.log.
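A runnable sketch of the same two forms, assuming a POSIX shell; echo stands in for make, and the log file is a temporary file created only for the illustration.

```shell
#!/bin/sh
log=$(mktemp)

# First run: tee overwrites the log and still passes the text through to stdout.
first=$(echo "build step" | tee "$log")

# Second run: -a appends to the log instead of truncating it.
second=$(echo "install step" | tee -a "$log")

log_lines=$(wc -l < "$log" | tr -d ' ')
echo "stdout saw: $first / $second"
echo "log has $log_lines lines"
rm -f "$log"
```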

Utilities automate tasks


Run a task in background:
&
nohup

Sending mails from UNIX


mail -s "Hello" yogesh.mahajan@globallogic.com <<!
Test message `date`
From: $USER
Server: $HOSTNAME
!

File test operators

File attribute comparison

[ parameter FILE ] OR test parameter FILE
Ex. test -x x.awk && echo Executable || echo NOT executable
--OR--
if [ -x x.awk ]; then echo Executable; else echo NOT executable; fi
Returns true if...
-e        file exists
-a        file exists: identical to -e, but has been "deprecated"
-f        file is a regular file, i.e. not a directory or device file
-s        file is not zero size
-d        file is a directory
-b        file is a block device
-c        file is a character device
-p        file is a pipe
-h/-L     file is a symbolic link
File test operators

-S          socket
-t          terminal device, e.g. whether stdin [ -t 0 ] or stdout [ -t 1 ] in a given script is a terminal
-r/-w/-x    file has read / write / execute permission
-g          set-group-id (sgid) flag set on file or directory. If true on a directory, any file created in it will have the directory's group ID
-u          set-user-id (suid) flag set on file
-k          sticky bit set (the t at the end of the permissions in ls -l o/p). The save-text-mode flag is a special type of file permission: if set on a file, the file will be kept in cache memory; if set on a directory, write permission will be restricted
-O          you are owner of file
-G          group-id of file same as yours
-N          file modified since it was last read
f1 -nt f2   file f1 is newer than f2
f1 -ot f2   file f1 is older than f2
f1 -ef f2   files f1 and f2 are hard links to the same file
!           "not": reverses the sense of the tests above (returns true if condition absent)

File test operators

File existence:
if [ -f /home/user11/Yogesh ]; then echo "File exists"; else echo "File NOT present"; fi;

Directory existence:
if [ -d /home/user11/Yogesh ]; then echo 'This is not a file' ;fi;

Executable file:
if [ -x /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Wow, I can run this!' ;fi;

Writeable file:
if [ -w /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Warning! This file could be over-written!!' ;fi;

File-test operators
Logical Operators:
! : NOT
-a : AND
-o : OR

Examples:
File empty or not?
if [ -f /home/user11/Yogesh.data -a \
-s /home/user11/Yogesh.data ];
then echo 'Some data in file' ;
fi;

Risk of losing data?
if [ -f /home/user11/Yogesh.data -a \
-s /home/user11/Yogesh.data -a \
-w /home/user11/Yogesh.data ];
then echo 'Danger of a non-empty file being written over!' ;
fi;
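A self-contained sketch of these tests on throw-away files, assuming a POSIX shell. The function name describe and the file names are hypothetical. It chains two separate [ ] tests with && in place of -a, which is a common alternative form.

```shell
#!/bin/sh
workdir=$(mktemp -d)
datafile="$workdir/sample.data"     # hypothetical file names
emptyfile="$workdir/empty.data"

echo "some data" > "$datafile"
: > "$emptyfile"                    # create a zero-length file

# describe prints a one-word classification for the path given as $1.
describe() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ] && [ -s "$1" ]; then
        echo "non-empty"
    else
        echo "empty"
    fi
}

r_data=$(describe "$datafile")
r_empty=$(describe "$emptyfile")
r_dir=$(describe "$workdir")
r_none=$(describe "$workdir/nope")

echo "$r_data $r_empty $r_dir $r_none"
rm -rf "$workdir"
```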

Functions
To be used within the script
Functions, or procedures

A function may return a value in one of four different ways:

Change the state of a variable or variables
Use the exit command to end the shell script
Use the return command to end the function, and return the supplied value to the calling section of the shell script
echo output to stdout, which will be caught by the caller just as c=`expr $a + $b` is caught

Can be defined within a file or inside a project library as well.
There is _NO_ scoping other than the parameters ($1, $2, $@ etc.)

Functions: scoping
Declare as:
function_name () {
list of commands
}
Invoke as:
function_name
function_name 1 b 3 other-arguments

Return a value as:
return 10
Evaluate return code with $?
iRC=$?
if [ $iRC -ge 0 ] ...
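A short sketch of three of the ways a function can hand a result back to its caller, assuming a POSIX shell. The function names bump, is_even and add are hypothetical.

```shell
#!/bin/sh
counter=0

# 1. Change the state of a (global) variable.
bump() {
    counter=$((counter + 1))
}

# 2. Use return to hand back a small integer status (0-255), read via $?.
is_even() {
    [ $(($1 % 2)) -eq 0 ] && return 0
    return 1
}

# 3. Echo the result to stdout so the caller can capture it.
add() {
    echo $(($1 + $2))
}

bump; bump

even_rc=0; odd_rc=0
is_even 10 || even_rc=$?
is_even 7  || odd_rc=$?

sum=$(add 2 3)

echo "counter=$counter even_rc=$even_rc odd_rc=$odd_rc sum=$sum"
```

Since return only carries a status in the range 0-255, echoing and capturing the output is the usual way to pass back strings or larger numbers.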

Checklist Shell scripting

01 Standard file descriptors & IO redirection
02 Executing SQLs/FTP from a shell-script
03 File test operators
04 Shell-scripting: functions

awk


awk programming model


Records and fields
Pattern matching
Functions
System/built-in variables
Data manipulation and report Generation


awk programming model


Designed for text processing and typically used for data extraction
Data-driven, interpreted
awk views the input stream as a collection of records
Records are made up of fields
A field is a word of one or more non-whitespace characters
Fields are separated by one or more whitespace characters
An awk program consists of pairs of patterns and braced actions
All patterns are examined for every input record
Fields can be accessed as $1, $2 etc.; $0 is the whole record
A program consists of a main input loop, which is executed over all the records
A typical awk program looks like this:
BEGIN { }
{ }
END { }
or, more generally, a series of pattern-action pairs applied to Input_file:
pattern { action }
pattern { action }

awk programming model


BEGIN: what happens before processing, e.g. the initialization part
Main input loop: the processing
END: what happens after processing, e.g. print some concluding stats

Main input loop:
- You don't write the loop yourself, as you would in C, e.g. while(!EOF) do {readline()}.
- Instructions are written as a series of pattern/action procedures
- Multiple BEGIN/END/main-loop blocks are possible; they are executed in order of appearance

awk programming model - examples


cat file
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600
Restaurent,300
<empty line>

awk '$0 ~ /Rent/{print}' file                      # Rent,900
awk '/Rent/{print}' file                           # same result
awk '/Rent/' file                                  # same result; print is the default action
awk -F, '$1 ~ /Rent/' file                         # search only in the first field
awk -F, '$1 == "Medicine"{print $2}' file          # 200 and 600, on separate lines
awk '/Rent|cine/' file                             # 3 lines, for Medicine and Rent
awk '!/Medicine/' file                             # the non-Medicine lines
awk -F, '$2>500' file                              # where did I spend more than 500?
awk 'NR==3|| NR==5' file                           # 3rd and 5th lines
awk 'NR!=1' file                                   # skip the header
awk 'BEGIN{IGNORECASE=1} /Rent/' file              # Rent + Restaurent as well! (IGNORECASE is gawk-specific)
awk '/Rent/{print} /cine/{print}' file             # + Medicine
awk 'BEGIN{IGNORECASE=1;print("--START--")} /Rent/{print} /cine/;END{print ("--END--")}' file
                                                   # ... + report heading / footer!
awk 'NR>2{ print x} {x=$0}' file                   # skip the first and the last line... what/how?
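A runnable sketch of three of the one-liners above, assuming a POSIX shell with any awk. It recreates the sample data in a temporary file (without the trailing empty line), so the file name is hypothetical.

```shell
#!/bin/sh
data=$(mktemp)
cat > "$data" <<'EOF'
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600
Restaurent,300
EOF

# Records whose second field exceeds 500.
big=$(awk -F, '$2 > 500' "$data")

# Second field of records whose first field is exactly "Medicine".
medicine=$(awk -F, '$1 == "Medicine" { print $2 }' "$data")

# Everything except the first and the last record: each record is printed
# one step late (from x), so the first is skipped and the last never appears.
middle=$(awk 'NR > 2 { print x } { x = $0 }' "$data")

echo "$big"
echo "$medicine"
echo "$middle"
rm -f "$data"
```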

awk programming model BEGIN and END


Implementation of wc -l in awk (run as awk -f awkScriptName inputFileName)
BEGIN { lines = 0 }
{ lines = lines + 1 }
END { print lines }
Implementation of cat -n in awk (run as awk -f awkScriptName inputFileName)
BEGIN { linenum = 0 }
{
linenum = linenum + 1
print linenum "\t" $0
}
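The two programs can be tried inline as well. This is a minimal sketch assuming a POSIX shell; the three-line input file is made up for the illustration.

```shell
#!/bin/sh
input=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$input"

# Line count: increment on every record, print once at the end.
count=$(awk 'BEGIN { lines = 0 } { lines = lines + 1 } END { print lines }' "$input")

# Numbered listing: prefix each record with its number and a tab.
numbered=$(awk 'BEGIN { linenum = 0 } { linenum = linenum + 1; print linenum "\t" $0 }' "$input")

echo "$count"
echo "$numbered"
rm -f "$input"
```

The BEGIN initializations are optional, since awk variables start out as zero or the empty string; the built-in NR would give the same count.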


awk programming model - examples


awk -f awkscript02.awk file
Run the script from a file
awk has a C-like input language, so you'll see printf(), if, while and for with much the same syntax as in C

awk programming model basic awk programs


cat awkscript01.awk
BEGIN{
IGNORECASE=1
print("--START--")
}

/Rent/{print} /cine/;

END{
print ("--END--")
}

cat awkscript02.awk
BEGIN{
IGNORECASE=1
print("--START--")

Records and fields


Input is structured and not just an endless string of characters.

- Fields are delimited by spaces or tabs by default
echo a b c d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'       prints c
echo a,b,c,d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'       prints a null string
$0: whole record; $1 / $2: first/second field etc.; $NF: last field

- You can change the field separator with the -F option on the command line
echo a,b,c,d | awk -F, 'BEGIN { one = 1; two = 2 } { print $(one + two) }'
- -f vs -F:
awk -F, -f awkScriptFile.awk inputDataFile.dat
A better option is to specify it in BEGIN:
BEGIN { FS = "," }
- FS = "\t"        a single tab as the field separator
- FS = "\t+"       one or more tabs
- FS = "[':\t]"    any one of these three: ', : or tab
- awk -F 'word[0-9][0-9][0-9]' file       fields separated by "word" followed by 3 digits
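A sketch of the separator variants, assuming a POSIX shell; the input strings are invented for the illustration.

```shell
#!/bin/sh
# Default FS: fields are split on runs of blanks.
f_default=$(echo "a b c d" | awk '{ print $3 }')

# Comma via -F on the command line.
f_comma=$(echo "a,b,c,d" | awk -F, '{ print $3 }')

# FS set in BEGIN; a bracket expression accepts any one of the listed characters.
f_class=$(printf 'a:b\tc\n' | awk 'BEGIN { FS = "[:\t]" } { print $3 }')

# A regular expression as separator: "word" followed by three digits.
f_regex=$(echo "xword123yword456z" | awk -F 'word[0-9][0-9][0-9]' '{ print $2 }')

echo "$f_default $f_comma $f_class $f_regex"
```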

Records and Fields


- RS: how records are separated; the default value is \n
- It can be changed:
awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
or from the command line, like this (i.e. even before it starts processing BBS-list!):
awk '{ print $0 }' RS="/" BBS-list
- NR: number of records read so far (a running total across all files if there are multiple input files); FNR resets for each file
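A small sketch of RS and of NR versus FNR, assuming a POSIX shell; the inputs are made up and do not come from BBS-list.

```shell
#!/bin/sh
# RS = "/": each slash-delimited chunk becomes one record.
records=$(printf 'a/b/c' | awk 'BEGIN { RS = "/" } { print NR ": " $0 }')

# NR keeps counting across files, FNR restarts at 1 for each file.
f1=$(mktemp); f2=$(mktemp)
printf 'x\ny\n' > "$f1"
printf 'z\n' > "$f2"
counts=$(awk '{ print NR, FNR }' "$f1" "$f2")

echo "$records"
echo "$counts"
rm -f "$f1" "$f2"
```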


awk pattern and actions


Kinds of Patterns
- /regular expression/: matches when the text of the input record fits the regular expression.
- Expression: a single expression; matches when its value is non-zero (if a number) or non-null (if a string)
- Range pattern, e.g. pat1, pat2: a pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches pat1 and the final record that matches pat2.
- BEGIN / END: special patterns to supply start-up or clean-up actions
- Empty: the empty pattern matches every input record

awk pattern and actions: Regular expressions


It matches when the text of the input record fits the regular expression.
awk '/foo/ { print $2 }' BBS-list
awk '$1 ~ /J/' inventory-shipped
awk '$1 !~ /J/' inventory-shipped
tolower($1) ~ /foo/ { ... }
Regexp operators such as ^ (start), $ (end), . (any one char), [] (char list), * (0 or more), + (1 or more) etc. can be used.


awk pattern and actions: Expressions and Range


A single expression matches when its value is non-zero (if a number) or non-null (if a string)
awk '$1 == "foo" { print $2 }' BBS-list       first field is exactly foo
awk '$1 ~ /foo/ { print $2 }' BBS-list        first field contains foo
awk '/2400/ && /foo/' BBS-list                2400 and foo, both should be present
awk '/2400/ || /foo/' BBS-list                either of these two
awk '! /foo/' BBS-list                        all lines except those containing foo

Range pattern, pat1, pat2: a pair of patterns separated by a comma, specifying a range of records.
The range includes both the initial record that matches pat1 and the final record that matches pat2.
awk '$1 == "on", $1 == "off"'
Everything b/w on and off, inclusive
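A sketch of a range pattern and a negated pattern on made-up input, assuming a POSIX shell.

```shell
#!/bin/sh
# Range pattern: starts at the record where $1 is "on",
# ends (inclusive) at the record where $1 is "off".
ranged=$(printf 'a\non\nb\nc\noff\nd\n' | awk '$1 == "on", $1 == "off"')

# Negated regular expression: every record that does not contain "foo".
without_foo=$(printf 'foo 1\nbar 2\nfood 3\nbaz 4\n' | awk '! /foo/')

echo "$ranged"
echo "$without_foo"
```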


awk pattern and actions: BEGIN-END and Empty pattern

BEGIN / END - Special patterns to supply start-up or clean-up actions
awk 'BEGIN { print "Analysis of \"foo\"" }
/foo/ { ++n }
END { print "\"foo\" appears " n " times." }' BBS-list
Empty: the action runs for every input record
awk '{ print $1 }' BBS-list


awk functions
Built-in functions:
C-like operations, and operators.
Arithmetic functions
int(), sqrt(), sin(), cos(), exp(), atan2(), rand(), srand()
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_91.html#SEC94
String functions
index(), length(), match(), split(), sprintf(), sub(), gsub(), substr(), tolower(), toupper()
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html

awk '{x = sqrt($2+$3); printf("%f,%f,%f,%f\n", x, $2, $3, $2+$3)}' file2

awk '{x = sqrt($2+$3); printf("%s %.2f %d %d %d\n", substr($1,3,3), x, $2, $3, $2+$3)}' file2

awk functions

datafile
1.2 3.4 5.6 7.8
9.10 11.12 -13.14 15.16
17.18 19.20 21.22 23.24

User-defined functions:
Define:
function myprint(num)
{
printf "%6.3g\n", num
}

File rev.awk
function rev(str)
{
if (str == "")
return ""
return (rev(substr(str, 2)) substr(str, 1, 1))
}
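A sketch showing how the two functions above could be called, assuming a POSIX shell; the inputs "hello" and 3.14159 are invented for the illustration.

```shell
#!/bin/sh
# rev() reverses a string recursively: reverse the tail, then append the first character.
reversed=$(echo "hello" | awk '
function rev(str)
{
    if (str == "")
        return ""
    return (rev(substr(str, 2)) substr(str, 1, 1))
}
{ print rev($0) }')

# myprint() formats a number with 3 significant digits in a field 6 characters wide.
formatted=$(echo "3.14159" | awk '
function myprint(num)
{
    printf "%6.3g\n", num
}
{ myprint($1) }')

echo "$reversed"
echo "[$formatted]"
```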


awk Operators
Arithmetic operators: ^, **, -, +, *, /, %
Comparison operators: <, <=, >, >=, ==, !=, ~, !~, in
String concatenation: no explicit operator; simply write strings next to each other, e.g. print "Field number one: " $1
Assignment: =
Increment/decrement: ++, -- (both prefix and postfix)
Regexp operators:
\       Suppress the special meaning of a character, e.g. \$ matches a literal $ rather than the end of a line
^       Beginning of a string
$       End of a string
.       (Period) Any single character
()      Group regexps together, e.g. @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}
*       Repeat zero or more times, e.g. ph* matches one p followed by 0 or more h: p, ph, phhh
+       Repeat at least once, e.g. ph+ matches one p followed by 1 or more h: ph, phh etc., but not p
?       Match once or not at all, e.g. fe?d matches fd or fed, but not feed
{n} / {n,} / {n,m}   Match exactly n / n or more / n to m times, e.g. wh{3}y matches whhhy; wh{1,2}y matches why or whhy; wh{1,}y matches why, whhy, whhhy etc.
[]      Bracket expression: matches any one of the listed characters, e.g. [Yog] matches any one of Y, o or g
[^]     Complemented bracket expression, e.g. [^Yog] matches any one character other than Y, o or g
|       Alternation operator, e.g. ^P|[aeiouy] matches a string that either starts with P or contains any of aeiouy


awk built-in variables


Field variables: $1, $2, $3, and so on ($0 represents the entire record).
NR:        Current count of input records (lines) read so far
NF:        Number of fields in the current input record; $NF is the last field
FILENAME:  Name of the current input file
FS:        "Field separator" for input records; the default is "white space" (one or more spaces/tabs). FS can be reassigned to change the field separator.
RS:        "Record separator"; newline is the default
OFS:       Output field separator, used between o/p fields when awk prints them; the default is a "space" character
ORS:       Output record separator, used after o/p records when awk prints them; the default is a "newline" character
OFMT:      Format for numeric output. The default format is "%.6g".
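A sketch of several of these variables on made-up input, assuming a POSIX shell.

```shell
#!/bin/sh
# NR and NF per record, and $NF for the last field.
summary=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')

# Assigning to a field rebuilds $0 using OFS; ORS ends each printed record.
rejoined=$(echo "a b c" | awk 'BEGIN { OFS = "-"; ORS = "!\n" } { $1 = $1; print }')

# OFMT controls how print formats non-integer numbers.
formatted=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159 }')

echo "$summary"
echo "$rejoined"
echo "$formatted"
```

The $1 = $1 assignment changes nothing in the data; it only forces awk to rebuild the record with the new OFS.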


awk Data Manipulation


Input file: a flat file containing records and fields, available for string manipulation and arithmetic operations.
e.g. consider file2:
If the 5th column is "+", then subtract 5000 from column 2 and add 2000 to column 3
If the 5th column is "-", then add 5000 to column 3 and subtract 2000 from column 2
awk '$5 == "+" {$2-=5000;$3+=2000}; $5 == "-"{$3+=5000;$2-=2000};{print}' file2
awk -f awkscript04.awk file2
cat awkscript04.awk
BEGIN { print("---START-----") }
{
if($5 == "+"){
$2-=5000;
$3+=2000
}
if($5 == "-") {
$3+=5000;
$2-=2000
}
print
}
END { print ("---END-----") }
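A runnable sketch of the same rule, assuming a POSIX shell. The two input rows are hypothetical and merely follow the five-column layout described above; they are not taken from file2.

```shell
#!/bin/sh
sample=$(mktemp)
# Hypothetical rows: name, start, end, label, sign.
cat > "$sample" <<'EOF'
chrA 10000 20000 GENE1 +
chrB 30000 40000 GENE2 -
EOF

adjusted=$(awk '
$5 == "+" { $2 -= 5000; $3 += 2000 }
$5 == "-" { $3 += 5000; $2 -= 2000 }
{ print }' "$sample")

echo "$adjusted"
rm -f "$sample"
```

Because fields are modified, awk rebuilds each record with OFS (a single space by default), so any original tab alignment is not preserved unless OFS is set accordingly.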

file2

cat file2
#track
chr11   61731756   61735132   FTH1
chr12   6643584    6647537    GAPDH
chr11   18415935   18429765   LDHA
chr12   21788274   21810728   LDHB
chr22   24236564   24237409   MIF      +
chr4    6641817    6644470    MRFAP1
chr15   72491369   72523727   PKM
chr10   73576054   73611082   PSAP
chr2    85132762   85133799   TMSB10
chr13   45911303   45915297   TPT1
(5th column: + or -, one value per row)

Ref. Stack Exchange:
http://unix.stackexchange.com/questions/127471/using-awk-for-data-manipulation
Data transformation and report generation language


Data manipulation and retrieval of information from text files

Initialize variables before reading a file:
awk -f progfile a=1 f1 f2 a=2 f3
sets a to 1 before reading input from f1 and sets a to 2 before reading input from f3

The -v option lets you assign a value to a variable before the awk program begins running (that is, before the
BEGIN action). For example, in
awk -v v1=10 -f prog datafile
v1 is set to 10 before the BEGIN action runs.
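A sketch contrasting the two ways of assigning variables, assuming a POSIX shell; the file contents are invented for the illustration.

```shell
#!/bin/sh
# -v assigns before BEGIN runs, so BEGIN can already see the value.
with_v=$(awk -v v1=10 'BEGIN { print v1 }')

# A command-line assignment is processed only when awk reaches it in the
# argument list: BEGIN sees an empty value, the first file sees 1, the second 2.
f1=$(mktemp); f2=$(mktemp)
echo "first" > "$f1"
echo "second" > "$f2"
trace=$(awk 'BEGIN { print "[" a "]" } { print a, $0 }' a=1 "$f1" a=2 "$f2")

echo "$with_v"
echo "$trace"
rm -f "$f1" "$f2"
```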


Report Generation-I
- Get employee names and salary:
awk '{print $2, $5}' employee.txt

employee.txt
100 Thomas Manager   Sales      $5,000
200 Jason  Developer Technology $5,500
300 Sanjay Sysadmin  Technology $7,000
400 Nisha  Manager   Marketing  $9,500
500 Randy  DBA       Technology $6,000

- or something more report-like:

cat report01.awk
BEGIN {
printf("Salary Report\n");
printf("EName\tSalary\n");
printf("=====\t=======\n")
}
# %s rather than %d: $5 is text such as "$5,000", which %d would print as 0
{ printf("%s\t%s\n",$2,$5) }
END {
printf("--END OF REPORT--\n")
}

awk -f report01.awk employee.txt
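A sketch of working with the salary column, assuming a POSIX shell and the employee.txt layout shown above (one employee per line, salary as the fifth field). The variable names sal, sum and dept are hypothetical.

```shell
#!/bin/sh
employees=$(mktemp)
cat > "$employees" <<'EOF'
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
EOF

# Strip "$" and "," from the salary so it can be summed as a number.
total=$(awk '{ sal = $5; gsub(/[$,]/, "", sal); sum += sal } END { print sum }' "$employees")

# Names and salaries for one department, chosen with -v.
tech=$(awk -v dept=Technology '$4 == dept { printf("%s\t%s\n", $2, $5) }' "$employees")

echo "Total salary: $total"
echo "$tech"
rm -f "$employees"
```

Without the gsub() step, awk would treat "$5,000" as a non-numeric string and add 0 for every row.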


Report Generation - II
- An HTML report:
awk -f report02.awk -v v1=Technology employee.txt > abc.html
cat report02.awk:
BEGIN {
title="Salary Report by awk"
print "<html>\n<title>"title"</title><body bgcolor=\"#aabbcc\">"
print "\n<table border=1><th colspan=3 align=center>Salary Report"
print "for "v1" department</th>";
print "<tr><td>#</td><td>EName</td><td>Salary</td>"
totalSal=0
count=0
}
{
#if($4=="Technology")
if($4==v1) {
count++
print "<tr><td>"count"</td><td>"$2"</td><td>"$5"</td>"
# strip "$" and "," so that the salary adds up as a number
sal=$5; gsub(/[$,]/, "", sal)
totalSal+=sal
}
}

Assignment
Create an HTML report for the states data; input: states.dat (file shared w/ all)
Print name of state / UT, capital, and year (in which the capital was established).
Skip the header and footer (first and last row) of the file
Background of UTs should be red
Names of UTs contain the words "union territory"; since you're using highlighting, don't print that part
States: set background colour to blue
At the bottom, print counts of:
States
UTs
Email the report as an attachment to yourself

Nice job! Now try this as well; it would be fun!

- Count and list of states / UTs sharing capitals, e.g. PB, HR, CH
- Has there been any city which was the capital of more than one state at different times (e.g. Shimla has been
capital of PB and HP; also Kolkata, Mumbai)
- The oldest and the newest capitals
- whatever else catches your fancy!

Checklist - awk

01 awk programming model
02 Records and fields
03 Pattern matching
04 Functions
05 System/built-in variables
06 Data manipulation and report Generation

Questions?
Thank you

Yogesh.Mahajan@GlobalLogic.Com
