
Final Presentation

AWK

Wednesday (February 23) 11:00-2:00
Carlson Learning Center 76-1275
Please review the submission guidelines
First create a PostScript file of your report
Then use the ps2pdf utility on the center machines to generate a PDF file
Also please include the PDF file of your final report
Create a gzipped tar file of your entire project

Tip of the Day

You have misplaced a file and want to find it quickly
Use the find command

find command example

% find ~ -name final_exam.txt

The above command searches for the file final_exam.txt in all subdirectories under your home directory
When found, it prints the full path name of the file

Another find command example

% find . -name '*.pro' -ls

The above command searches for all your IDL files (*.pro) in all subdirectories under your current working directory
When found, it prints an ls listing of each file
The pattern is quoted so that the shell passes it to find unexpanded

Yet another find command example

% find / -name '*junk*' -exec rm {} \;

The above command searches the entire directory tree for all files whose names contain the pattern junk
When found, each file is deleted
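The effect of quoting the name pattern can be tried safely on a scratch directory. The following is a minimal sketch in Bourne shell (sh) syntax rather than the csh prompt used on the slides; the directory, the file names, and the function name find_idl_files are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch tree, created only for this demonstration.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
touch "$demo_dir/plot.pro" "$demo_dir/sub/fit.pro" "$demo_dir/notes.txt"

# Quoting '*.pro' hands the pattern to find itself; unquoted, the shell
# may expand it against the current directory before find ever runs.
find_idl_files() {
    find "$1" -name '*.pro' | sort
}

find_idl_files "$demo_dir"
```

Only the two .pro files are listed; notes.txt is skipped.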

AWK (Aho, Weinberger, Kernighan)

AWK is a text-processing utility that can efficiently process and extract text data with minimal programming

How Does AWK Work?

AWK is based on the concept of pattern matching
Think of AWK as a filter program
It looks for key patterns and processes the records that match them

Example #1

I have a column of numbers (input.dat) that requires conversion, e.g., square root, Centigrade to Fahrenheit, etc.
1
2
...
100

Solutions
You can write an IDL or C program to do this
Transfer the data over to a spreadsheet
Or write a one-line awk program

Syntax of AWK
/pattern/ {action}

Simplest AWK program

% gawk '{print $0}' input.dat
This simply prints out (echoes) the input file
The single quotes keep the shell from interpreting the braces and the $ signs

To take the square root

% gawk '{print sqrt($0)}' input.dat

If you have two columns of data and you want to add them up
% gawk '{print $1+$2}' input.dat
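The one-line programs above can be tried without a data file by piping a few numbers in. This is a small sketch in Bourne shell (sh) syntax; it assumes a plain awk is installed (gawk behaves the same way here), and the function names sqrt_column and add_columns are made up for illustration.

```shell
#!/bin/sh
# Square root of every line read from standard input.
sqrt_column() {
    awk '{ print sqrt($0) }'
}

# Sum of the first two fields of every line.
add_columns() {
    awk '{ print $1 + $2 }'
}

printf '1\n4\n100\n' | sqrt_column     # prints 1, 2 and 10, one per line
printf '1 2\n10 20\n' | add_columns    # prints 3 and 30, one per line
```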

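What about patterns?

Matching

Shell wildcards, as used in file name patterns:
* - matches any string of characters
? - matches any single character
[0-9] - matches a single character that is a digit
[A-Z] - matches a single character that is an upper case letter
Inside an awk /pattern/, which is a regular expression, . matches any single character and * repeats the preceding item; [0-9] and [A-Z] mean the same as above

/pattern/ - matches records that contain the pattern
/^pattern/ - the pattern must start at the beginning of a line
/pattern$/ - the pattern must end at the end of a line
$1 ~ /pattern/ - matches when the first field contains the pattern
$1 !~ /pattern/ - matches when the first field does NOT contain the pattern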
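The field-matching operators can be compared side by side on a few sample lines. A minimal sketch in Bourne shell (sh) syntax with a plain awk; the sample lines and the function names are invented for illustration, and LC_ALL=C is set so that [A-Z] is interpreted as the ASCII range.

```shell
#!/bin/sh
# Print lines whose first field starts with an upper case letter.
first_field_upper() {
    LC_ALL=C awk '$1 ~ /^[A-Z]/ { print $0 }'
}

# Print lines whose first field does NOT contain a digit.
first_field_no_digit() {
    LC_ALL=C awk '$1 !~ /[0-9]/ { print $0 }'
}

printf 'Water 0.05\nsoil 0.21\nB1 7\n' | first_field_upper      # Water and B1 lines
printf 'Water 0.05\nsoil 0.21\nB1 7\n' | first_field_no_digit   # Water and soil lines
```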

Meaning of the fields

$0 - represents the entire input line
$1 - represents the first field
$2 - represents the second field
Etc.
NF - number of fields in the current record
NR - number of the current record
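To see these variables change from record to record, each line can be printed with its record number and field count. A small sketch in Bourne shell (sh) syntax with a plain awk; the sample input and the function name show_record_info are made up, and $NF (the last field) is standard awk although it is not listed above.

```shell
#!/bin/sh
# For every input line print: record number, number of fields, last field.
show_record_info() {
    awk '{ print NR, NF, $NF }'
}

printf 'B1 0.4 12\nP1 0.7\n' | show_record_info
# prints:
# 1 3 12
# 2 2 0.7
```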

Suppose you had headers on the top of your file which you wanted to ignore
% gawk '/[0-9]/ {print $0}' input.dat
This prints only the lines that contain a digit

Removing comments #
% gawk '$0 !~ /^#/' input.dat
The above works for # at the beginning of a line
% gawk '$0 !~ /^ *#/' input.dat
Better pattern: also works when the # is preceded by spaces
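The difference between the two comment patterns shows up on a line whose # is indented. A minimal sketch in Bourne shell (sh) syntax with a plain awk; the sample lines and function names are invented for illustration.

```shell
#!/bin/sh
# Drop only lines that start with # in the first column.
strip_comments_simple() {
    awk '$0 !~ /^#/'
}

# Drop lines whose first non-space character is #.
strip_comments_better() {
    awk '$0 !~ /^ *#/'
}

sample='# header
   # indented comment
400.350 0.0509975'

printf '%s\n' "$sample" | strip_comments_simple   # indented comment survives
printf '%s\n' "$sample" | strip_comments_better   # only the data line remains
```

A pattern with no action prints every matching record, which is why neither program needs an explicit print.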

Real Life Problem 1: ASD Spectra Conversion

[Figure: MISI image example at 2000' AGL, 4' pixel. Legend: MISI flight area, Boston Whaler, canoe, kayak. Labels: Pier Team, ASD Truth Panels, radiometer, thermistors, secchi depth, water samples]
[Table: Water Quality Samples, with columns ID, Chlor, SS, CDOM and rows B1, P1, P2; values not recoverable]

Conversion of wavelength units from nanometers to microns for a spectral file (water.ref)

400.350 0.0509975
410.170 0.0502359
419.990 0.0474999
...
683.900 0.0215759
693.440 0.0214323
702.980 0.0213168

Conversion AWK script

% gawk '{print $1/1000.0, $2}' water.ref > water.ref.microns

What if you have multiple files

Water_0001.ref
Water_0002.ref
...
Soil_0001.ref
Soil_0002.ref
...
Cement_1000.ref

How do we repeatedly apply the AWK script

We would use the C shell (csh) foreach statement.
The form of the foreach statement
% foreach shell_variable (word_list)
unix_statements
unix_statements
...
unix_statements
end
The word list is often a file name pattern such as water*.ref

Processing only the water files

% foreach i (water*.ref)
foreach? echo Processing $i
foreach? gawk '{print $1/1000.0, $2}' $i > $i.microns
foreach? end
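For readers whose login shell is not csh, the same loop can be sketched in Bourne shell (sh) syntax, where for ... do ... done plays the role of foreach ... end. The scratch directory, its files, and the function name convert_water_files are assumptions made for this illustration; plain awk is used in place of gawk.

```shell
#!/bin/sh
# Convert every water*.ref file in directory $1 from nanometers to microns.
convert_water_files() {
    for i in "$1"/water*.ref; do
        [ -e "$i" ] || continue          # the pattern matched nothing
        echo "Processing $i"
        awk '{ print $1 / 1000.0, $2 }' "$i" > "$i.microns"
    done
}

# Hypothetical scratch data, created only for this demonstration.
work_dir=$(mktemp -d)
printf '400.350 0.0509975\n' > "$work_dir/water_0001.ref"
printf '683.900 0.0215759\n' > "$work_dir/water_0002.ref"
printf '500.000 0.3000000\n' > "$work_dir/soil_0001.ref"

convert_water_files "$work_dir"
cat "$work_dir/water_0001.ref.microns"   # prints: 0.40035 0.0509975
```

The soil file is left alone because it does not match the water*.ref pattern. Note that awk prints the computed wavelength without trailing zeros, while the untouched second field keeps its original text.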

Shell Filename Modifiers

h - Remove a trailing pathname component, leaving only the head.
r - Remove a trailing suffix of the form .xxx, leaving the basename.
e - Remove all but the trailing suffix.
t - Remove all leading pathname components, leaving the tail.

Renaming a set of files

Suppose you had a set of files
Water_0001.ref.microns
Water_0002.ref.microns
...
Water_0100.ref.microns
You want to rename them back to
Water_0001.ref
Water_0002.ref
...

Sample output of the shell modifiers


% set a=/usr/tmp/water_00001.ref.microns
% echo $a
/usr/tmp/water_00001.ref.microns
% echo $a:h
/usr/tmp
% echo $a:r
/usr/tmp/water_00001.ref
% echo $a:e
microns
% echo $a:t
water_00001.ref.microns
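The :h, :r, :e and :t modifiers are specific to csh. As a rough comparison, the same four pieces can be obtained in Bourne shell (sh) with dirname, basename and parameter expansion. This is a sketch under that assumption, using the same sample path as above.

```shell
#!/bin/sh
a=/usr/tmp/water_00001.ref.microns

dirname "$a"        # like $a:h  ->  /usr/tmp
echo "${a%.*}"      # like $a:r  ->  /usr/tmp/water_00001.ref
echo "${a##*.}"     # like $a:e  ->  microns
basename "$a"       # like $a:t  ->  water_00001.ref.microns
```

${a%.*} removes the shortest trailing match of .*, and ${a##*.} removes the longest leading match of *., which is why they isolate the last suffix.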

We need tools to extract file name components

Given the sample file
water_0001.ref.microns

Need to extract the file name extension(s)
.ref.microns
.microns

Need to extract the file name base
water_0001

Renaming the water files


% foreach i (water*.microns)
foreach? echo Renaming $i to $i:r
foreach? mv $i $i:r
foreach? end
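The renaming loop can be rehearsed on empty scratch files before it is run on real data. A minimal sketch in Bourne shell (sh) syntax, where ${i%.*} stands in for the csh $i:r modifier; the scratch directory, its files, and the function name rename_back are made up for illustration.

```shell
#!/bin/sh
# Strip the last suffix from every water*.microns file in directory $1.
rename_back() {
    for i in "$1"/water*.microns; do
        [ -e "$i" ] || continue          # the pattern matched nothing
        echo "Renaming $i to ${i%.*}"
        mv "$i" "${i%.*}"
    done
}

# Hypothetical scratch files, created only for this demonstration.
scratch=$(mktemp -d)
touch "$scratch/water_0001.ref.microns" "$scratch/water_0002.ref.microns"

rename_back "$scratch"
ls "$scratch"    # lists water_0001.ref and water_0002.ref
```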

foreach statement can extract elements of a shell variable

% set a='0.0 0.1 0.2'
% foreach i ($a)
foreach? echo $i
foreach? end
0.0
0.1
0.2

Real Life Problem 2: MODTRAN Output

How do you extract a single value out of a 40 page output?

What do we want?
H2O value

What do we know?

We know that the value we want has the table name H2O in the first field.

[MODTRAN output excerpt: MODTRAN 3.5 Version 1.1 Jan 97, CARD 1 through CARD 2C input echo, GNDALT = 0.31500, MODEL ATMOSPHERE NO. 7, MODEL 0 / 7 USER INPUT DATA]

Z      P        T       REL H  H2O        CLD AMT    RAIN RATE  AEROSOL
(KM)   (MB)     (K)     (%)    (GM M-3)   (GM M-3)   (MM HR-1)  TYPE   PROFILE
0.315  984.200  305.45  2.20   7.545E-01  0.000E+00  0.000E+00  RURAL  RURAL
0.554  958.100  300.35  2.60   6.765E-01  0.000E+00  0.000E+00  RURAL

Using grep to help analyze pattern

% grep H2O output.tp6
[grep output: the header lines of several tables that mention H2O, with column names such as Z, P, T, REL H, H2O, O3, CO2, CO, CH4, N2O, O2, NH3, NO, NO2, SO2, HNO3, CLD AMT, RAIN RATE, AEROSOL]

Need to Identify Unique Pattern Property

Several H2Os in the file
Desired record has H2O in the first column
Need to specify first column-only matches
$1 ~ /H2O/

Need to skip to the value and extract the value

Based on the following pattern

H2O         O3          CO2         CO          CH4         N2O
(                            ATM CM                            )
2.2208E+02  1.3433E-01  2.6589E+02  8.2446E-02  1.1924E+00  2.2553E-01

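We need to skip to the third line and get the first field

This can be accomplished by the getline command
Each getline reads the next input line, so three calls move from the H2O header line to the line of values

Action is a unit conversion of water vapor value

print ($1*18.015/22413.83)
18.015 is the molar mass of water in g/mol and 22413.83 is the molar volume of an ideal gas at standard conditions in cm^3/mol

Putting it all together

gawk '$1 ~ /H2O/ { getline; getline; getline; \
print ($1*18.015/22413.83) }' input_modtran.dat

Can be made into a shell script (get_water_vapor.csh)

#!/bin/csh
gawk '$1 ~ /H2O/ { getline; getline; getline; \
print ($1*18.015/22413.83) }' $1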
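The getline technique can be checked on a few lines before it is pointed at a full MODTRAN output file. The sketch below uses Bourne shell (sh) syntax and a plain awk; the sample file is hypothetical, and in particular the assumption that the values sit exactly three lines below the H2O header (a units line, then a blank line, then the numbers) is made up to match the three getline calls. The function name water_vapor and the printf format are also illustrative choices.

```shell
#!/bin/sh
# Hypothetical excerpt; the real file layout may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 Z  P  T  REL H  H2O  CLD AMT
   H2O         O3          CO2
   (         ATM CM          )

 2.2208E+02  1.3433E-01  2.6589E+02
EOF

# Match H2O in the FIRST field only, step down three lines,
# then convert the first value on that line.
water_vapor() {
    awk '$1 ~ /H2O/ { getline; getline; getline
                      printf "%.4f\n", $1 * 18.015 / 22413.83 }' "$1"
}

water_vapor "$sample"    # prints 0.1785
```

The first sample line also contains H2O, but in the fifth field, so the $1 test skips it.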

From within IDL

IDL> spawn, 'get_water_vapor.csh input.dat', results

What is this file?

400.350 0.0509975
410.170 0.0502359
419.990 0.0474999
...
683.900 0.0215759
693.440 0.0214323
702.980 0.0213168

Stripping Out Comments in IDL

Commented File
# Water reflectance data file
# ASD Reflectance May 20, 1999 11:31 PM Local Time
# Wavelength [Nanometers] Reflectance [unitless]
400.350 0.0509975
410.170 0.0502359
419.990 0.0474999
...
702.980 0.0213168

Comment Stripping Routine

pro strip_out_comments, input_file_name, output_file_name
  ; Copy input to output, dropping blank lines and # comments
  openr, input_file, input_file_name, /get_lun
  openw, output_file, output_file_name, /get_lun
  original_string = ''
  while ( NOT EOF(input_file) ) do begin
    readf, input_file, original_string
    ; Trim leading and trailing blanks
    input_string = strtrim( original_string, 2 )
    comment_position = strpos( input_string, '#' )
    if ( comment_position eq -1 and input_string ne '' ) then begin
      ; No comment on a non-empty line: keep the whole line
      printf, output_file, input_string
    endif else if ( comment_position gt 0 ) then begin
      ; Trailing comment: keep only the text before the #
      printf, output_file, strmid( input_string, 0, comment_position )
    endif
  endwhile
  free_lun, input_file, output_file
end
