Download as pdf or txt
Download as pdf or txt
You are on page 1of 244

SAS® Programming 3:

Advanced Techniques

Course Notes
SAS® Programming 3: Advanced Techniques Course Notes was developed by Michele Ensor and
Susan Farmer. Additional contributions were made by Bruce Dawless, Anke Fleischer, Mary Harding,
Marty Hultgren, Johnny Johnson, Eva-Maria Kegelmann, John McCall, Josee Ranger-Lacroix,
Christine Riddiough, Theresa Stemler, Stacey Syphus, Su Chee Tay, and Carlos Torres Ardila.
Instructional design, editing, and production support was provided by the Learning Design and
Development team.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.

SAS® Programming 3: Advanced Techniques Course Notes

Copyright © 2019 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise,
without the prior written permission of the publisher, SAS Institute Inc.

Book code E71390, course code LWPG3M6/PG3M6, prepared date 15May2019. LWPG3M6_001

ISBN 978-1-64295-095-3
For Your Information iii

Table of Contents

Lesson 1 Getting Started ............................................................................................ 1-1

1.1 Setting Up for This Course ........................................................................................ 1-3

1.2 DATA Step Review..................................................................................................... 1-9


Demonstration: Reviewing DATA Step Syntax and Processing ........................ 1-14

1.3 Solutions .................................................................................................................. 1-18


Solutions to Activities and Questions ................................................................ 1-18

Lesson 2 Using Advanced Functions ........................................................................ 2-1

2.1 Using a Variety of Advanced Functions ..................................................................... 2-3


Demonstration: Retrieving Previous Values with the LAG Function ................... 2-7

Demonstration: Counting and Finding Words with Character Functions .......... 2-16

Practice.............................................................................................................. 2-19

2.2 Performing Pattern Matching with Perl Regular Expressions ................................. 2-22

Demonstration: Validating Data with the PRXMATCH Function........................ 2-32

Demonstration: Standardizing Data with the PRXCHANGE Function .............. 2-40

Practice.............................................................................................................. 2-43

2.3 Solutions .................................................................................................................. 2-46


Solutions to Practices ........................................................................................ 2-46

Solutions to Activities and Questions ................................................................ 2-49

Lesson 3 Defining and Processing Arrays ............................................................... 3-1

3.1 Defining and Referencing One-Dimensional Arrays ................................................. 3-3

3.2 Doing More with One-Dimensional Arrays .............................................................. 3-13


Demonstration: Processing One-Dimensional Arrays: Part 1 ........................... 3-17

Demonstration: Processing One-Dimensional Arrays: Part 2 ........................... 3-26

Practice.............................................................................................................. 3-29
iv For Your Information

3.3 Defining and Referencing Two-Dimensional Arrays................................................ 3-32

Demonstration: Processing Two-Dimensional Arrays: Part 1 ............................ 3-40

Demonstration: Processing Two-Dimensional Arrays: Part 2 ............................ 3-46

Practice.............................................................................................................. 3-50

3.4 Solutions .................................................................................................................. 3-53


Solutions to Practices ........................................................................................ 3-53

Solutions to Activities and Questions ................................................................ 3-56

Lesson 4 Defining and Processing Hash Objects ................................................... 4-1

4.1 Declaring Hash Objects ............................................................................................. 4-3

4.2 Defining Hash Objects ............................................................................................. 4-16


Demonstration: Declaring and Defining a Hash Object ..................................... 4-19

4.3 Finding Key Values in a Hash Object ...................................................................... 4-23


Demonstration: Performing a Table Lookup with the FIND Method .................. 4-26

Practice.............................................................................................................. 4-29

4.4 Writing a Hash Object to a Table ............................................................................. 4-32


Demonstration: Creating a Table with the ADD and OUTPUT Methods ........... 4-36

Practice.............................................................................................................. 4-40

4.5 Using Hash Iterator Objects .................................................................................... 4-43

Demonstration: Reading Data in Forward and Reverse Direction .................... 4-48

Practice.............................................................................................................. 4-52

4.6 Solutions .................................................................................................................. 4-55

Solutions to Practices ........................................................................................ 4-55

Solutions to Activities and Questions ................................................................ 4-63

Lesson 5 Using Utility Procedures ............................................................................ 5-1

5.1 Creating Picture Formats with the FORMAT Procedure ........................................... 5-3
Demonstration: Specifying a Template for Datetime Values ............................... 5-8

Demonstration: Specifying a Template for Large Numbers ............................... 5-17


For Your Information v

Practice.............................................................................................................. 5-20

5.2 Creating Functions with the FCMP Procedure ........................................................ 5-23


Demonstration: Creating Functions Containing One Argument ........................ 5-28

Demonstration: Creating Functions Containing Multiple Arguments ................. 5-33

Practice.............................................................................................................. 5-38

5.3 Solutions .................................................................................................................. 5-41

Solutions to Practices ........................................................................................ 5-41


Solutions to Activities and Questions ................................................................ 5-44
vi For Your Information

To learn more…
For information about other courses in the curriculum, contact the
SAS Education Division at 1-800-333-7660, or send e-mail to
training@sas.com. You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course
Catalog.

For a list of SAS books (including e-books) that relate to the topics
covered in this course notes, visit https://www.sas.com/sas/books.html or
call 1-800-727-0025. US customers receive free shipping to US
addresses.
Lesson 1 Getting Started
1.1 Setting Up for This Course .................................................................................................... 1-3

1.2 DATA Step Review................................................................................................................... 1-9


Demonstration: Reviewing DATA Step Syntax and Processing ........................................... 1-14

1.3 Solutions ................................................................................................................................ 1-18


Solutions to Activities and Questions ................................................................................... 1-18
1-2 Lesson 1 Getting Started

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Setting Up for This Course 1-3

1.1 Setting Up for This Course

Course Overview

/[XYZ]-\d{3}/
advanced pattern-matching
Y-456

Advanced
Techniques

custom

$1,500,000 $1.5M

3
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Practicing in This Course

international European US Census World Bank


US National
storm and tourism and population population
Park data
weather data trade data data data

4
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-4 Lesson 1 Getting Started

Practicing in This Course


Demonstration Performed by your instructor as an example for you
to observe
Activity Short practice opportunities for you to work in SAS,
either independently or with the guidance of your
instructor
Practice Extended practice opportunities for you to work
independently

5
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Choosing a Practice Level


Level 1 Solve basic problems with step-by-step
guidance.
Level 2 Solve intermediate problems with defined
goals.
Challenge Solve complex problems using SAS Help
Choose one
and documentation resources. practice to do in
class based on your
interest and skill
level.

6
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Setting Up for This Course 1-5

SAS Programming Interfaces

SAS Enterprise
SAS Studio Guide
SAS windowing
environment

You can use the interface of your choice, but some demonstrations
in this course use features specifically in SAS Enterprise Guide.
7
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Accessing the Course Files

course
files
activities
Make note of
data the location of
your course
demos files folder.

output

practices

8
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-6 Lesson 1 Getting Started

Accessing the Course Files


Programs in the
course activities, demos, and
files practices folders
activities follow this naming
convention.
data

demos

output
p304d01.sas
practices Programming 3, Lesson 4, demo 1

9
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Creating the Course Data

course
files
activities

data
cre8data.sas
demos

output

practices

10
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Setting Up for This Course 1-7

1.01 Activity (Required)


1. Open the cre8data.sas program located in the course files folder.
2. Find the %LET statement. %let path=s:/workshop;

3. If the path does not represent your course files folder, modify the path.
4. Run the program and verify that a report that lists the created SAS
tables is generated.
5. How many SAS tables were created?

11
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

1.02 Activity (Required)


1. Open the libname.sas program located in the course files folder.
This is an example. Your
%let path=s:/workshop; paths might be different.
%let pathout=s:/workshop/output;

libname pg3 's:/workshop/data' filelockwait=20;

2. Run the program.


3. Check the SAS log to verify that library pg3 was successfully assigned.

Note: The libname.sas program must be resubmitted anytime that you


restart your SAS session.

13
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

The FILELOCKWAIT= option in the LIBNAME statement specifies the number of seconds that SAS
will wait for a locked table to become available. This option helps prevent lock errors when a table is
accessed consecutively within a step such as PROC FCMP.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-8 Lesson 1 Getting Started

Extending Your Learning

Use your Extended


Learning page to
download course files
and access additional
helpful resources!

14
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 DATA Step Review 1-9

1.2 DATA Step Review

DATA Step

DATA output-table;
The DATA step is a
. . . other statements . . . powerful tool to
create, clean, and
prepare your data!
RUN;

16
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

DATA Step Processing

Compilation Execution
establish data read, manipulate,
attributes and rules and write data
for execution
What happens
behind the
scenes when a
DATA step runs?

17
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-10 Lesson 1 Getting Started

DATA Step Processing: Compilation

Compilation PDV
Continent Country Pop2000 Pop2017
1) Check for syntax errors. $ 13 $ 18 N8 N8
2) Create the program data
vector (PDV), which includes
all columns and attributes. The PDV is the
3) Establish the specifications magic behind the
for processing data in the DATA step’s processing
PDV during execution. power!
4) Create the descriptor
portion of the output table.

18
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

DATA Step Processing: Execution

Execution data output-table;


set input-table;
1) Initialize the PDV. ...other statements...
run;
2) Read a row from the input
Implicit OUTPUT;
table into the PDV.
Implicit RETURN;
3) Sequentially process
statements and update values
in the PDV. Automatic looping makes
4) At the end of the step, write processing data easy!
the contents of the PDV to the
output table.
5) Return to the top of the DATA
step.
19
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 DATA Step Review 1-11

DATA Step Processing in Action

You can watch execution


happen one statement at a
time in the Enterprise
Guide DATA step debugger.

20
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Viewing Execution in the Log


writes all columns
and values in the PDV
to the log
PUTLOG _ALL_;
writes selected
PUTLOG column=; columns and values in Use the PUTLOG
the PDV to the log statement to write
PUTLOG "message"; information about
writes a text string execution to the log.
to the log

21
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-12 Lesson 1 Getting Started

Scenario 1
pg3.population_top25countries (25 rows)

What is the population


growth percentage per
country between the years
2000 and 2017?
work.PctGrowth18Yrs (25 rows)

22
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Scenario 1: DATA Step Review


data work.PctGrowth18Yrs;
length Continent $ 13 Country $ 18;
set pg3.population_top25countries;
Country=scan(CountryCodeName,2,'-');
PctGrowth18Yrs=(Pop2017-Pop2000)/Pop2000*100;
drop CountryCodeName;
format Pop2000 Pop2017 comma16. PctGrowth18Yrs 5.1;
run;

PDV
Continent Country CountryCodeName Pop2000 Pop2017 PctGrowthAvgYr PctGrowth18Yrs
_N_
$ 13 $ 18 $ 22 N8 N8 N8 N8
D
Asia China CHN-China 1,262,645,000 1,386,395,000 0.5 9.8 1

23
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 DATA Step Review 1-13

Scenario 2
pg3.population_top25countries (25 rows)

What is the population


growth percentage per
continent between the
years 2000 and 2017?
work.PctGrowth18Yrs_Cont (5 rows)

24
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-14 Lesson 1 Getting Started

Reviewing DATA Step Syntax and Processing

Scenario
Use one DATA step to calculate the population growth percentage per country between the years
2000 and 2017. Use a second DATA step to calculate the population growth percentage per
continent between the years 2000 and 2017.

Files
• p301d01.sas
• population_top25countries – a SAS table containing the 2000 and 2017 populations for the 25
biggest countries based on the 2017 population

Syntax

DATA output-table;
. . . other statements . . .
RUN;

Notes
• The DATA step is a powerful tool to create, clean, and prepare your data.
• When a DATA step runs, it goes through compilation and execution. Compilation establishes data
attributes and rules for execution. Execution reads, manipulates, and writes data.
• Specifically, compilation checks for syntax errors, creates the program data vector (PDV),
establishes the specifications for processing data in the PDV during execution, and creates the
descriptor portion of the output table.
• Specifically, execution initializes the PDV, reads a row from the input table into the PDV,
sequentially processes statements and update values in the PDV, writes the contents of the PDV
to the output table, and returns to the top of the DATA step.
• The Enterprise Guide DATA step debugger enables you to watch execution happen one statement
at a time.
• The PUTLOG statement can write information about execution to the log.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 DATA Step Review 1-15

Demo
Note: The debugger portion of this demo must be performed in Enterprise Guide.
1. Open the p301d01.sas program in the demos folder and find the Demo section. As needed
based on the audience, review the DATA step syntax for Scenario 1.
*Scenario 1;
data work.PctGrowth18Yrs;
length Continent $ 13 Country $ 18;
set pg3.population_top25countries;
Country=scan(CountryCodeName,2,'-');
PctGrowth18Yrs=(Pop2017-Pop2000)/Pop2000*100;
drop CountryCodeName;
format Pop2000 Pop2017 comma16. PctGrowth18Yrs 5.1;
run;

2. In Enterprise Guide, click the Toggle DATA Step Debugger toolbar button to enable
debugging in the program. Click the Debugger icon next to the DATA statement. The DATA Step
Debugger window appears. Notice the columns in the PDV. Click the Step execution to the
next line toolbar button to execute through the statements. Continue to step through the
statements to understand the processing of the DATA step. Close the DATA Step Debugger
window.
3. Run the Scenario 1 DATA step. View the output table and the log.

4. As needed based on the audience, review the syntax for Scenario 2.


*Scenario 2;
proc sort data=pg3.population_top25countries
out=work.continent_sorted;
by Continent descending Pop2017;
run;

data work.PctGrowth18Yrs_Cont;
set work.continent_sorted;
by Continent;
if first.Continent=1 then do;
Count=0; Pop2000Total=0; Pop2017Total=0;
end;
Count+1;
Pop2000Total+Pop2000;
Pop2017Total+Pop2017;
if last.Continent=1 then do;
PctGrowth18Yrs_Cont=

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-16 Lesson 1 Getting Started

(Pop2017Total-Pop2000Total)/Pop2000Total*100;
output;
end;
format Pop2000Total Pop2017Total comma16.
PctGrowth18Yrs_Cont 5.1;
keep Continent Count Pop2000Total
Pop2017Total PctGrowth18Yrs_Cont;
run;
5. In Enterprise Guide, use the DATA step debugger to understand the processing of the DATA
step.
6. Run the Scenario 2 steps. View the output tables and the log.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.2 DATA Step Review 1-17

Scenario 3
pg3.population_top25countries (25 rows)

What is the predicted


population for the years
2018 to 2022 based on
the average yearly growth
percentage?
work.PredictedPopulation (125 rows)

26
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

1.03 Activity

View the How many rows are in the output table?


output table. What is the value of Year?

View the How many rows are in the output table?


output table. What is the range of values for Year?
View the How many times did SAS iterate through the DATA step
SAS log. based on the PUTLOG statement?

27
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1-18 Lesson 1 Getting Started

1.3 Solutions
Solutions to Activities and Questions

1.01 Activity – Correct Answer

44 SAS tables
were created.

12
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

1.03 Activity – Correct Answer


Without OUTPUT statement:
View the How many rows are in the output table? 25
output table. What is the value of Year? 2023
With OUTPUT statement:
View the How many rows are in the output table? 125
output table. What is the range of values for Year? 2018 to 2022
View the How many times did SAS iterate through the DATA step
SAS log. based on the PUTLOG statement? 25

In what year will the predicted population of India exceed China? 2021

28
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Lesson 2 Using Advanced
Functions
2.1 Using a Variety of Advanced Functions ............................................................................... 2-3
Demonstration: Retrieving Previous Values with the LAG Function ...................................... 2-7
Demonstration: Counting and Finding Words with Character Functions ............................. 2-16
Practice ................................................................................................................................. 2-19

2.2 Performing Pattern Matching with Perl Regular Expressions ......................................... 2-22
Demonstration: Validating Data with the PRXMATCH Function .......................................... 2-32
Demonstration: Standardizing Data with the PRXCHANGE Function ................................. 2-40
Practice ................................................................................................................................. 2-43

2.3 Solutions ................................................................................................................................ 2-46


Solutions to Practices ........................................................................................................... 2-46
Solutions to Activities and Questions ................................................................................... 2-49
2-2 Lesson 2 Using Advanced Functions

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-3

2.1 Using a Variety of Advanced


Functions

Commonly Used Functions

Character Date and Time


LENGTH FIND CAT WEEKDAY YEAR DATEPART
SCAN ANYALPHA CATS DAY YRDIF TIMEPART
SUBSTR ANYDIGIT CATX MONTH INTCK MDY
TRANWRD ANYPUNCT LOWCASE QTR INTNX TODAY
STRIP COMPBL PROPCASE
TRIM COMPRESS UPCASE
Descriptive Statistics

Truncation Special SUM RANGE LARGEST


MEAN MIN N
ROUND CEIL INPUT PUT MEDIAN MAX NMISS
INT FLOOR
3
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Documentation for Functions

4
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-4 Lesson 2 Using Advanced Functions

2.01 Activity
1. Go to the Programming: SAS 9.4 and SAS Viya documentation.
2. Under DATA Step, select Function and Call Routines.
3. Under Dictionary of Functions and CALL Routines, select FIND
Function.
4. How many arguments are required for the FIND function?
5. How many modifiers are available for the FIND function?

5
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Link
• Programming: SAS 9.4 and SAS Viya

Additional Functions

FINDC
FINDW
COUNT
COUNTC LAG
COUNTW

7
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-5

Scenario
Need to create.

Which city has the


bigger difference
in daily average
temperature
between
consecutive days?

8
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

LAG Function

LAG<n>(column)
LAG or
first previous value
LAG1

The LAG function LAG2 second previous value


retrieves the
previous value of a
column the last LAG3 third previous value
time that the LAG
function executed.
...

9
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-6 Lesson 2 Using Advanced Functions

Retrieving Previous Values


FirstPrevDay =lag1(TavgC);
SecondPrevDay=lag2(TavgC);
ThirdPrevDay =lag3(TavgC);
FourthPrevDay=lag4(TavgC);

10
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p302d01

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-7

Retrieving Previous Values with the LAG Function

Scenario
Use the LAG function to retrieve the previous day’s average temperature to calculate the difference
between the current day’s and the previous day’s average temperature.

Files
• p302d01.sas
• weather_china_daily2017 – a SAS table that contains the daily precipitation totals in centimeters
and the average daily Celsius temperatures for Beijing and Shanghai, China

Syntax

LAG<n>(column)

Notes
• The LAGn function retrieves the previous value of a column the last time that the LAG function
executed.
• The value of n represents the number of previous values to go back. For example, LAG1 is one
previous value and LAG2 is two previous values.

Demo
1. Open the p302d01.sas program in the demos folder and find the Demo section. Highlight and
run the DATA step. View the output table and notice that the first 365 rows contain the daily
average temperatures for Beijing and the last 365 rows contain the daily average temperatures
for Shanghai.
2. Uncomment the two assignment statements.
TavgCPrevDay=lag1(TavgC);
TempIncrease=TavgC-TavgCPrevDay;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-8 Lesson 2 Using Advanced Functions

3. Run the DATA step and view the output table. Notice the values of TavgCPrevDay and
TempIncrease. Specifically, look at the value of TavgCPrevDay for the first row of Shanghai
data (row 366). The last temperature for Beijing is being used as the previous value for the first
temperature of Shanghai.

4. Add a BY statement and a conditional statement to correct the previous value anytime that there
is a switch to a new city.
by City;
TavgCPrevDay=lag1(TavgC);
if first.City=1 then TavgCPrevDay=.;
TempIncrease=TavgC-TavgCPrevDay;
5. Run the DATA step and view the output table. Confirm that the first row of Shanghai data
(row 366) contains a missing value for the previous temperature.
6. Run the ODS statements, the PROC MEANS step, and the PROC SGPLOT step to determine
the biggest difference in daily average temperature between consecutive days.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-9

In the HTML results, place your cursor on data points to see a tooltip of Date and TempIncrese.
• The biggest decrease in temperature (-8.9) occurred in Beijing on 10/2/2017.
• The biggest increase in temperature (7.2) occurred in Beijing on 6/7/2017.
• The biggest decrease in temperature (-10.5) occurred in Shanghai on 2/20/2017.
• The biggest increase in temperature (8.9) occurred in Shanghai on 2/19/2017.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-10 Lesson 2 Using Advanced Functions

Calculating a Moving Average

Need to create.

Open3MnthAvg=mean(Open,Open1MnthBack,Open2MnthBack);

12
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

2.02 Activity
Open p302a02.sas from the activities folder and perform the following
tasks:
1. Complete the assignment statement for Open1MnthBack, which
is equal to the Open value from one previous month.
2. Complete the assignment statement for Open2MnthBack, which
is equal to the Open value from two previous months.
3. Run the program and view the results.
4. What is the three-month average (Open3MnthAvg) for 02MAR2010?

13
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-11

Best Practice
Open1MnthBack=lag1(Open);
Open2MnthBack=lag2(Open);
if _N_ ge 3 then
Open3MnthAvg=mean(Open,Open1MnthBack,Open2MnthBack);

if _N_ ge 3 then
Open3MnthAvg=mean(Open,lag1(Open),lag2(Open));

Create a lagged value in an


assignment statement before using
it in a conditional statement.

15
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

The LAG function stores previous values in a queue when the LAG function is executed. If the LAG
function is not executed due to a false condition, the previous value is not stored. By using the LAG
function outside of a conditional statement, the previous value is stored.
http://support.sas.com/kb/24/665.html

2.03 Activity

if _N_ ge 3 then
Open3MnthAvg=mean(Open,lag1(Open),lag2(Open));

16
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-12 Lesson 2 Using Advanced Functions

Scenario
pg3. tornado_2017narrative
On average, how many
words are written in
narratives about tornadoes?

On average, how often


is EF referenced within a
narrative?

18
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Tornado EF Scale

Scale Winds Damage


EF-0 65-85 mph Light

The Enhanced Fujita EF-1 86-110 mph Moderate


Scale, or EF Scale, is EF-2 111-135 mph Considerable
used to assign a EF-3 136-165 mph Severe
tornado a rating EF-4 166-200 mph Devastating
based on estimated
EF-5 >200 mph Incredible
wind speeds and
related damage.
The values in our table might
or might not contain a hyphen.

19
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

For more information about the EF Scale, check out https://www.spc.noaa.gov/efscale/ and
https://www.weather.gov/oun/efscale.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-13

COUNT Functions
Counts the number of
times that a specified
substring appears within
COUNT(string, substring <, modifier(s)>)
a character string.
Counts the number of
characters in a string that
COUNTC(string, character-list <, modifier(s)>)
appear or do not appear
in a list of characters.

Counts the number of


words in a character COUNTW(string <, delimiter(s)> <, modifier(s)>)
string.

20
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

COUNT and COUNTW Functions

NumEF=count(Narrative,'EF');

NumWord=countw(Narrative,' ');

This argument specifies the blank as the


delimiter that separates words. Multiple
delimiters can be specified. If no delimiters
are specified, a default list is used.
blank ! $ % & () * + , - . / ; < ^ |

21
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-14 Lesson 2 Using Advanced Functions

Scenario
pg3. tornado_2017narrative

What word tends to


follow the EF value?

22
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

FIND Functions

Returns the starting


FIND(string, substring
position where a substring
<, modifier(s)> <, start-position>)
is found in a string.

Returns the starting


position where a character FINDC(string, character-list
from a list of characters <, modifier(s)> <, start-position>)
is found in a string.
Returns the starting
position of a word in FINDW(string, word, <, delimiter(s)>
a string or the number <, modifier(s)> <, start-position>)
of the word in a string.
23
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-15

FINDW Function
Many characters are being specified
as delimiters that separate words.

EFWordNum=findw(Narrative,'EF','012345- .,','e');

The e or E modifier counts the words that


are scanned until the specified word is
found, instead of determining the starting
position of the specified word in the string.

24
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

2.04 Activity
1. Go to the Programming: SAS 9.4 and SAS Viya documentation.
2. Under DATA Step, select Function and Call Routines.
3. Under Dictionary of Functions and CALL Routines, select FINDW
Function.
4. Does the FINDW function contain more modifiers than the FIND
function (2.01 Activity) or fewer?

25
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Link
• Programming: SAS 9.4 and SAS Viya

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-16 Lesson 2 Using Advanced Functions

Counting and Finding Words with Character Functions

Scenario
Use variations of the COUNT functions to count the number of words and the number of times the
string EF appears in a tornado narrative. Also, use the FINDW function to return the number of the
string EF within a tornado narrative so that the word following EF can be determined.

Files
• p302d02.sas
• tornado_2017narrative – a SAS table that contains narratives for tornados occurring in the
United States during 2017

Synta

COUNT(string, substring <, modifier(s)>)


COUNTC(string, character-list <, modifier(s)>)
COUNTW(string <, delimiter(s)> <, modifier(s)>)
FIND(string, substring <, modifier(s)> <, start-position>)
FINDC(string, character-list <, modifier(s)> <, start-position>)
FINDW(string, word, <, delimiter(s)> <, modifier(s)> <, start-position>)

Notes
• The COUNT function counts the number of times that a specified substring appears within a
character string.
• The COUNTC function counts the number of characters in a string that appear or do not appear in
a list of characters.
• The COUNTW function counts the number of words in a character string.
• The FIND function returns the starting position where a substring is found in a string.
• The FINDC function returns the starting position where a character from a list of characters is
found in a string.
• The FINDW function returns the starting position of a word in a string or the number of the word in
a string.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-17

Demo
1. Open the p302d02.sas program in the demos folder and find the Demo section. Highlight and
run the DATA step and the PROC PRINT step. View the results and notice the values of
Narrative.
2. Uncomment the two assignment statements relating to the COUNT functions. Run the DATA step
and the PROC PRINT step. View the results. Verify the values of NumEF and NumWord.
NumEF=count(Narrative,'EF');
NumWord=countw(Narrative,' ');

3. Uncomment the two assignment statements relating to the FIND functions. Run the DATA step
and the PROC PRINT step. View the results. Notice that EFWordNum is equal to EFStartPos
anytime the first occurrence of EF is followed by a hyphen and that EFWordNum is equal to 0
anytime the first occurrence of EF is followed by a number.
EFStartPos=find(Narrative,'EF');
EFWordNum=findw(Narrative,'EF');

4. Modify the EFWordNum assignment statement to add a third argument that includes a set of
delimiters that separates words. Run the DATA step and the PROC PRINT step. View the
results. Notice that EFWordNum is now equal to EFStartPos for all rows except row 240.
EFWordNum=findw(Narrative,'EF','012345- .,');
5. Modify the EFWordNum assignment statement to add a fourth argument that returns the
number of the word instead of the starting position. Run the DATA step and the PROC PRINT
step. View the results. Notice that EFWordNum is now the number of the word, so the number is
smaller than EFStartPos.
EFWordNum=findw(Narrative,'EF','012345- .,','e');

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-18 Lesson 2 Using Advanced Functions

6. Uncomment the conditional statement, which, if true, scans the narrative for the word after the
EF word. Run the DATA step and the PROC PRINT step. View the results. Verify that AfterEF
contains the word after the first occurrence of the EF word.
if EFWordNum>0 then
AfterEF=scan(Narrative,EFWordNum+1,'012345- .,');

7. Run the PROC FREQ step and the PROC MEANS step. View the results.

• On average, EF is referenced 0.88 times within a narrative with a range of 0 to 6 times.


• On average, 101.7 words are written in a narrative with a range of 3 to 676 words.
• Tornado is the word that tends to follow the EF value.
8. Self-study: Refer to program p302d03 for examples of using FIND and FINDW functions with
a DO loop to find all occurrences of EF within a narrative.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-19

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
1. Using the LAG Function
The pg3.np_2016traffic table contains monthly traffic counts for national parks for the year
2016. There are 12 rows for each value of ParkCode, and each row gives the traffic count for
months 1 through 12. Calculate the change in traffic count between consecutive months for each
park.
a. Open the p302p01.sas program in the practices folder. Run the program to view the 2016
traffic count data.
b. In an assignment statement, use the LAG function to create the column PrevMthTC, which
is the previous value of TrafficCount.
c. In another assignment statement, create the column OneMthChange, which is TrafficCount
minus PrevMthTC.
d. Run the program and view the results. Notice that the rows for the month value of 1 (starting
with row 13) have Prev1MthTC equal to the previous park’s traffic count for month 12.

e. Before the OneMthChange assignment statement, add an IF/THEN statement that changes
the PrevMthTC value to missing for the first occurrence of a park code.
f. Run the program and verify the results. What is the value of OneMthChange for row 13?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-20 Lesson 2 Using Advanced Functions

Level 2
2. Using the COUNT and FINDW Functions
The pg3.np_grandcanyon table contains comments regarding Grand Canyon National Park.
The canyon consists of the North Rim and the South Rim. Determine how many times the word
South appears in each comment. Also, retrieve the word after the first occurrence of the word
South in each comment.
a. Open the p302p02.sas program in the practices folder. Run the program to view the Grand
Canyon comments.
b. Use the COUNT function to create a column named NumSouth that is equal to the number
of times the word South appears in each comment. Use the modifier i to ignore case.
c. Subset the data to include only the rows that contain a comment with the word South.
d. Run the program and verify that 19 rows contain a comment with the word South.
e. Use the FINDW function to create a column named SouthWordPos that is equal to the word
number for the first occurrence of the word South in each comment.
1) Specify the space and period as the delimiters that separate words.
2) Use the modifier i to ignore case and the modifier e to return the word number instead
of the starting position.
f. Use the SCAN function to create a column named AfterSouth that is equal to the word after
the first occurrence of the word South. Use the same delimiters in the SCAN function as the
FINDW function.
g. Run the program and verify the results. What word occurs most often after the word South?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.1 Using a Variety of Advanced Functions 2-21

Challenge
3. Using the COUNTC Function
The pg3.np_mammals table contains the scientific and common names of mammals found
in national parks. The table lists one scientific name for each mammal and one or more common
names. Common names are separated by commas, forward slashes, or asterisks. Create
a separate row for each common name.
a. Open the p302p03.sas program in the practices folder. Run the program to view the
scientific and common names of national park mammals.

b. Use the COUNTC function to create a column named SpecCharNum that is equal to the
number of times that a special character (comma, forward slash, or asterisk) is found in the
Common_Names column per each mammal.
c. If SpecCharNum is equal to zero (only one common name), assign the Name column
the value of Common_Names and write it to output.
d. If SpecCharNum is greater than zero (multiple common names), assign a Name column
with each common name from the Common_Names column and write to output.
1) Use a DO loop incrementing from a start value of 1 to a stop value of SpecCharNum
plus 1.
2) Use the SCAN function to capture each common name.
e. Run the program and verify the results. How many values of Name are present for
the species Vulpes vulpes (which starts on row 10)?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-22 Lesson 2 Using Advanced Functions

2.2 Performing Pattern Matching with Perl


Regular Expressions

Pattern Matching

/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/

A regular expression
is a sequence of
strings that define a
search pattern.

30
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

This regular expression is for matching valid US and Canada phone numbers.

Regular Expressions

concept of regular regular expressions using


expressions introduced Perl syntax included in SAS

early 1950s late 1980s early 2000s

programming language
Perl developed

31
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-23

Perl Regular Expressions

PRXCHANGE PRXPOSN
PRX
CALL Functions CALL
PRXCHANGE and PRXPOSN
Call Routines
CALL CALL
PRXDEBUG PRXMATCH PRXPARSE PRXSUBSTR

CALL CALL
PRXPAREN PRXNEXT
PRXFREE

32
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Refer to demo p302d07 for examples of all the PRX functions and call routines.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-24 Lesson 2 Using Advanced Functions

Metacharacters
Metacharacter Behavior
/…/ Forward slash is starting and ending delimiter.
(…) Parentheses are for grouping.
| Vertical line is for OR situation.
\d Matches a digit (0-9).
\D Matches a non-digit such as letter or special character.
\s Matches a whitespace character such as space or tab.
\w Matches a word character (a-z, A-Z, 0-9, or underscore).
. Matches any character.
[…] Matches a character in the brackets.
[^…] Matches a character not in the brackets.
33
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Examples of metacharacters:

dog|cat|fox matches “dog” or “cat” or “fox”


|
d|fog matches “dog” or “fog”

\d \d\d\d\d matches any four-digits (0-9) such as “0123” or “6387”

\D \D\D\D\D matches any four non-digits such as “WxYz” or “AVG%”

\sBob\s matches “ Bob “


\s
\D\D\D\D\s\d matches “Apt# 5”

\w \w\w\w\w\w\w\w matches any seven-word characters such as “F_Last2”

mi.e matches “mike” or “mice”


.
..i.e matches “white” or “smile”

[dmn]ice matches “dice” or “mice” or “nice”


[…]
\d[6789]\d matches “162” or “574” or “685” or “999”

[^mn]ice matches “dice” but not “mice” or “nice”


[^…]
\d[^6789]\d matches “152“ or “608” but not “574“ or “999”

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-25

Metacharacters
Metacharacter Behavior
^ Matches the beginning of the string.
$ Matches the end of the string.
\b Matches a word boundary (most special characters).
\B Matches a non-word boundary (letter, digit, or underscore).
* Matches the preceding character 0 or more times.
+ Matches the preceding character 1 or more times.
? Matches the preceding character 0 or 1 times.
{n} Matches exactly n times.
\ Overrides the next metacharacter such as a ( or ?.
\n Matches the n capture buffer.
34
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Examples of metacharacters:

^ ^ter matches “terrific” but not “test”

$ ter$ matches “winter” but not “winner”

\b Corp\b matches “Corp.” or “Corp XYZ” but not “Corporation” or “Corp123” or “Corp_A”

\B Corp\B matches “Corporation” or ”Corp123” or “Corp_A” but not “Corp.” or “Corp XYZ”

* dog* (g can appear 0 or more times) matches “do” or “dog” or “doggg”

+ dog+ (g can appear 1 or more times) matches “dog” or “doggg” but not “do”

? dog? (g can appear 0 or 1 times) matches “do” or “dog”

\w{7} matches any seven-word characters such as “F_Last2”


{n} dog{3} matches the letters “do” plus letter “g” three times such as “doggg”
(dog){3} matches the letters “dog” three times such as “dogdogdog”

\ \(\d+\) matches a value with a “(“, one or more digits, and a “)” such as “(123)”

(\dA\d)\1 matches “1A21A2” (first grouping is repeated) but not “1A23A4”


\n (B\dB)(\dA\d)\1 matches “B9B1A2B9B” (first grouping is repeated)
(B\dB)(\dA\d)\2 matches “B9B1A21A2” (second grouping is repeated)

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-26 Lesson 2 Using Advanced Functions

Metacharacters Example

A Perl regular expression must


start and end with a delimiter.

/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/

group 1 group 2 group 3

hyphen hyphen

This regular expression is for matching valid


US and Canada phone numbers.
35
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

In the United States and Canada, phone numbers are fixed length, with a total of 10 digits. A phone
number contains a three-digit area code, a three-digit prefix, and a four-digit line number. A 1 at the
beginning of the phone number represents the country code.
• The area code designates a specific geographic region such as a city or part of a state
or province. Area codes cannot start with the digits of 0 or 1.
• The prefix narrows the location of the phone number even further. Prefixes cannot start with
the digits of 0 or 1.
• The line number directs a call to a specific phone line within the location.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-27

Explanation of Perl regular expression:


/ Beginning delimiter for regular expression
( Marks the start of group 1
[2-9] Matches a digit, 2-9
\d Matches a digit, 0-9
\d Matches a digit, 0-9
) Marks the end of group 1
- Matches the - symbol
( Marks the start of group 2
[2-9] Matches a digit, 2-9
\d Matches a digit, 0-9
\d Matches a digit, 0-9
) Marks the end of group 2
- Matches the - symbol
( Marks the start of group 3
\d{4} Matches a digit, 0-9, four times
) Marks the end of group 3
/ Ending delimiter for regular expression

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-28 Lesson 2 Using Advanced Functions

2.05 Activity
Which Perl regular expression will not find all tornados with a value of EF3,
EF-3, EF4, or EF-4?

a. '/(EF3|EF-3|EF4|EF-4)/'
b. '/(EF-?3|EF-?4)/'
c. '/EF-?(3|4)/' (…) Parentheses are for grouping.

| Vertical line is for OR situation.


d. '/EF-?[34]/'
? Matches the preceding character 0 or 1 times.
e. '/EF.[34]/'
[…] Matches a character in the brackets.

. Matches any character.

36
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

PRXMATCH Function

PRXMATCH(Perl-regular-expression, source)

The PRXMATCH function


searches for a pattern match and
returns the position at which the
pattern is found. A value of zero
is returned if no match is found.

38
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-29

PRXMATCH Function

PRXMATCH(Perl-regular-expression, source)

We will look at constant


three methods for
specifying the Perl column
regular expression.
pattern ID number

39
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Validating Data

Loc=prxmatch('/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/',Phone);

Valid phone values are the position


values that are greater than 0.

40
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-30 Lesson 2 Using Advanced Functions

PRXMATCH Function Using a Constant

Loc=prxmatch('/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/',Phone);

A Perl regular expression expressed as a


constant must be enclosed in quotation marks.
If the expression is a constant, the expression
is compiled only once.

41
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

PRXMATCH Function Using a Column

Exp='/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/o';
Loc=prxmatch(Exp,Phone);

The Perl regular If the Perl regular expression


expression can be is specified as a column, end
expressed as a column. the expression with the letter o
or O. This causes the expression
to be compiled once and not
recompiled for each call to the
PRX function.

42
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-31

PRXPARSE Function

pattern-ID-number = PRXPARSE(Perl-regular-expression);

The PRXPARSE function


returns a pattern identifier
number that is used by
other PRX functions and
call routines.

43
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

PRXMATCH Function Using a Pattern ID Number


Exp='/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/o';

Pid=prxparse(Exp);

Loc=prxmatch(Pid,Phone);

The Pid value would be different for each row if the option
o was not used at the end of the Perl regular expression.
44
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

The pattern identifier number is not required with the PRXMATCH function. The pattern identifier
number is required by some of the PRX functions (PRXPAREN and PRXPOSN) and the majority of
the PRX call routines.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-32 Lesson 2 Using Advanced Functions

Validating Data with the PRXMATCH Function

Scenario
Use the PRXMATCH function to find phone numbers that match specified patterns.

Files
• p302d04.sas
• phonenumbers_us – a SAS table that contains nine rows of potentially valid US phone numbers

Syntax

pattern-ID-number=PRXPARSE(Perl-regular-expression);
PRXMATCH(Perl-regular-expression | pattern-ID-number, source)

Notes
• The PRXMATCH function searches for a pattern match and returns the position at which
the pattern is found. A value of zero is returned if no match is found.
• The Perl regular expression can be expressed as a constant, a column, or a pattern identifier
number.
• A Perl regular expression expressed as a constant must be enclosed in quotation marks.
If the expression is a constant, the expression is compiled only once.
• If the Perl regular expression is specified as a column, end the expression with the letter o or O.
This causes the expression to be compiled once and not recompiled for each call to the PRX
function.
• The PRXPARSE function returns a pattern identifier number that is used by other PRX functions
and call routines. PRXMATCH can use the pattern identifier number but it is not required.

Demo
1. Open the p302d04.sas program in the demos folder and find the Demo section. In the first
DATA step, notice the incomplete assignment statement.
Loc=prxmatch('/ /',Phone);
2. Add a Perl regular expression to the first argument of the PRXMATCH function to find valid
phone numbers.
Loc=prxmatch('/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/',Phone);
3. Highlight and run the first DATA step and PROC PRINT step. Verify that the Loc value
represents the starting location of the 10-digit phone number. Rows 2, 3, and 7 should have
a Loc value greater than 0.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-33

4. Copy and paste the Loc assignment statement. Modify the statement to create a column named
LocStartEnd and to find only values that start and end with the 10-digit number (no leading or
trailing text).
LocStartEnd=
prxmatch('/^([2-9]\d\d)-([2-9]\d\d)-(\d{4})$/',strip(Phone));
5. Highlight and run the first DATA step and PROC PRINT step. Verify that only row 2 has a
LocStartEnd value greater than 0.
6. Copy and paste the Loc assignment statement. Modify the statement to create a column named
LocParen. Alter the expression to find area codes in parentheses. In addition, instead of the first
hyphen, there might or might not be a space. There is no longer a hyphen after the area code.
LocParen=
prxmatch('/\(([2-9]\d\d)\)\s*([2-9]\d\d)-(\d{4})/',Phone);
7. Highlight and run the first DATA step and PROC PRINT step. Verify that only rows 8 and 9 have
a LocParen value greater than 0.
8. Add a subsetting IF statement to subset the rows where a pattern was matched. Highlight and
run the first DATA step and PROC PRINT step. Verify that only five rows are in the results.
if Loc ne 0 or LocStartEnd ne 0 or LocParen ne 0;

9. In the last DATA step, notice the CALL PRXDEBUG routine. Run the DATA step and view the
SAS log. Notice the Compiling line after each iteration.
call prxdebug(1); /* Sends debugging output to the SAS log. */

Iteration: _N_=1
Compiling REx `([2-9]\d\d)-([2-9]\d\d)-(\d{4})'
. . .
Iteration: _N_=2
. . .
Compiling REx `([2-9]\d\d)-([2-9]\d\d)-(\d{4})'

10. Add the O option to the end of the Perl regular expression. Run the DATA step and view the SAS
log. Notice that the Compiling line is now only after the first iteration.
Exp='/([2-9]\d\d)-([2-9]\d\d)-(\d{4})/o';

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-34 Lesson 2 Using Advanced Functions

2.06 Activity
Open p302a06.sas from the activities folder and perform the following
tasks:
1. Run the PROC PRINT step and view the results.
2. Modify the WHERE statement in the PROC PRINT step to find
all the values of Narrative that contain EF3, EF-3, EF4, or EF-4.
where prxmatch('/ /',Narrative)>0;

Add the appropriate syntax


to the Perl regular expression.

3. Run the PROC PRINT step. View the results and the SAS log.
How many rows were read based on the WHERE statement?
46
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

PRXCHANGE Function

PRXCHANGE(Perl-regular-expression, times, source)

The second argument is the number


of times to search and replace.
The PRXCHANGE The value -1 specifies to search and replace
function performs until the end of the source is reached.
a substitution for
a pattern match.

48
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-35

Standardizing Data

49
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-36 Lesson 2 Using Advanced Functions

Perl Regular Expressions for Substitution

starting and ending delimiter

's/ AP / AIRPORT /'

Start the expression with a The middle delimiter separates the


lowercase s to signify pattern that you are searching (before)
substitution instead of and the pattern that you will use
matching. for substitution (after).

50
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Explanation of the Perl regular expression:


s Specifies a substitution regular expression
/ Beginning delimiter for regular expression
spaceAPspace Matches a space followed by the letters AP and a space
/ Middle delimiter that separates the search pattern from the substitution
pattern
spaceAIRPORTspace Substitutes a space followed by the letters AIRPORT and a space
/ Ending delimiter for regular expression

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-37

Perl Regular Expressions for Substitution

's/ INT( |L |L. )/ INTERNATIONAL /i'

ignores letter case for the


pattern being searched

51
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-38 Lesson 2 Using Advanced Functions

Explanation of the Perl regular expression:


s Specifies a substitution regular expression
/ Beginning delimiter for regular expression
spaceINT Matches a space followed by the letters INT
( Marks the start of the group
space Matches a space after the letters INT
| Or operator
Lspace Matches the letter L and a space after the letters INT
| Or operator
L.space Matches the letter L, a period, and a space after the letters INT
) Marks the end of the group
/ Middle delimiter, which separates the search pattern from the
substitution pattern
spaceINTERNATIONALspace Substitutes with a space, the letters INTERNATIONAL, and a
space
/ Ending delimiter for regular expression
i Ignores letter case for the pattern being searched

Capture Buffers for Substitution


's/(-?\d+\.\d*)(@)(-?\d+\.\d*)/$3$2$1/'

$1 $2 $3 The capture buffers


are being rearranged.

Each grouping (set of parentheses)


creates a capture buffer. Each capture
buffer is referenced by a dollar sign
followed by the sequential number.

52
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-39

Every set of parentheses used in a regular expression creates a capture buffer, a slot in a memory
buffer. Each slot is created in sequential order of the parentheses pairs and is referenced using the
dollar sign followed by the sequential number.
Explanation of Perl regular expression:
s Specifies a substitution regular expression
/ Beginning delimiter for regular expression
( Marks the start of group 1
-? Matches a negative sign, zero times or one time
\d+ Matches a digit, one or more times
\. Marks the period as a literal character that will be matched
\d* Matches a digit, zero or more times
) Marks the end of group 1
( Marks the start of group 2
@ Matches the @ symbol
) Marks the end of group 2
( Marks the start of group 3
-? Matches a negative sign, zero times or one time
\d+ Matches a digit, one or more times
\. Marks the period as a literal character that will be matched
\d* Matches a digit, zero or more times
) Marks the end of group 3
/ Middle delimiter that separates the search pattern from the substitution pattern
$3 Substitutes with capture buffer 3 (group 3)
$2 Substitutes with capture buffer 2 (group 2)
$1 Substitutes with capture buffer 1 (group 1)
/ Ending delimiter for regular expression

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-40 Lesson 2 Using Advanced Functions

Standardizing Data with the PRXCHANGE Function

Scenario
Use the PRXCHANGE function to standardize words in the names of weather stations and to switch
the order of longitude and latitude values for the location of weather stations.

Files
• p302d05.sas
• weather_usstationshourly – a SAS table that contains information about the United States
weather stations that collect data hourly

Syntax

PRXCHANGE(Perl-regular-expression, times, source)

Notes
• The PRXCHANGE function performs a substitution for a pattern match.
• The second argument of the PRXCHANGE function is the number of times to search and replace.
The value -1 specifies to search and replace until the end of source is reached.
• When using the PRXCHANGE function, the Perl regular expression starts with a lowercase s to
signify substitution instead of matching. The middle delimiter of the expression separates the
pattern that you are searching (before) and the pattern that you will use for substitution (after).
• Each grouping (set of parentheses) creates a capture buffer. Each capture buffer is referenced
with a dollar sign followed by the sequential number.

Demo
1. Open the p302d05.sas program in the demos folder and find the Demo section. In the DATA
step, notice the incomplete syntax for the first assignment statement for Name_New and the
complete syntax for the second assignment statement for Name_New.
Name_New=prxchange('s/ / /',-1,Name);
Name_New=prxchange('s/ INT( |L |L. )/ INTERNATIONAL /i',
-1,Name_New);
2. In the first assignment statement for Name_New, modify the Perl regular expression to replace
the letters AP with the word AIRPORT for all occurrences.
Name_New=prxchange('s/ AP / AIRPORT /',-1,Name);

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-41

Alternatively, you could use \b (word boundary) in place of the leading and trailing spaces around
the string AP.
Name_New=prxchange('s/\bAP\b/AIRPORT/',-1,Name);
3. Run the DATA step and the PROC step. View the results and verify that the Name_New column
contains the standardized values of AIRPORT and INTERNATIONAL.

4. Uncomment the LatLong assignment statement in the DATA step and the VAR statement in the
PROC PRINT step. Modify the expression in the assignment statement to specify the
substitution of the third capture buffer followed by the second and first buffers.
LatLong=prxchange('s/(-*\d+\.\d*)(@)(-*\d+\.\d*)/$3$2$1/',
-1,LongLat);
5. Run the DATA step and the PROC step. View the results and verify that the latitude value now
appears before the longitude value in the LatLong column.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-42 Lesson 2 Using Advanced Functions

2.07 Activity
Open p302a07.sas from the activities folder and perform the following
tasks:
1. Run the program and view the results. Notice that the Loc column
contains the first position where EF- is found in the Narrative column.
2. Uncomment the Narrative_New assignment statement.
3. Modify the first argument of the PRXCHANGE function to find
the pattern of EF- and substitute it with the value EF.
4. Modify the second argument of the PRXCHANGE function so that
all occurrences of the pattern are substituted.
5. Run the program and verify that the Narrative_New column no longer
contains the string EF- for every Loc value greater than 0.
6. For row 7, how many EF- values were
54 substituted by EF?
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Beyond SAS Programming 3


What if you want to ...

… learn more about … learn more about


advanced functions? Perl regular expressions?
• Unstructured Data Analysis: Entity
• A Survey of Some of the Most Useful SAS® Resolution and Regular Expressions in SAS®
Functions by Ron Cody (paper) by Matthew Windham (book)

• Don't Be a SAS® Dinosaur: Modernizing • An Introduction to Perl Regular Expressions


Programs with Base SAS 9.2 Enhancements in SAS® 9 by Ron Cody (paper)
by Warren Repole Jr. (paper)
• Doing More with Perl Regular Expressions
on the Extended Learning page

56
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

You can find direct links to these resources in the Course Links section on the Extended Learning
page.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-43

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
4. Using the PRXMATCH and PRXCHANGE Functions
The pg3.np_acres table contains acreage amounts for national parks. The ParkName column
contains the descriptive name for each park. Find the national preserves by locating all rows with
a ParkName value that contains the string N PRES, N PRESERVE, NPRES, or NPRE followed
by a space. Within the ParkName values, modify the national preserve string to be displayed
with the string of NPRES.
a. Open the p302p04.sas program in the practices folder. Run the program to view the
ParkName values.
b. In an assignment statement, use the PRXMATCH function to create the column Position,
which is equal to the starting position of a string that represents national preserves. All
national preserves will contain one of the following strings followed by a space (\s): N PRES,
N PRESERVE, NPRES, or NPRE.

prxmatch('/ . . . /', source);

c. Add a subsetting IF statement to include only the rows where the Position values are
greater than zero.
d. Run the program and verify that nine rows are returned for national preserves.
e. In an assignment statement, use the PRXCHANGE function to change the strings N PRES,
N PRESERVE, or NPRES to be NPRE. Store the changed values in a column named
NewName.

prxchange('s/ . . . / . . . /', times, source);

f. Run the program and verify that NPRE is displayed for all nine rows in the column. How many
national preserves are in the state of Alaska (AK)?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-44 Lesson 2 Using Advanced Functions

Level 2
5. Using the PRXCHANGE Function with Capture Buffers
The sashelp.baseball data set contains salary and performance information for Major League
Baseball players (excluding pitchers) who played at least one game in both the 1986 and 1987
seasons. The Name column contains the player’s name in the form LastName, FirstName (that
is, Mattingly, Don). For each player, rearrange the order of the player’s name to be in the form
FirstName LastName (that is, Don Mattingly).
a. Open the p302p05.sas program in the practices folder. Run the program to view the Name
values.
b. In an assignment statement, create a column named FirstLastName based on using the
PRXCHANGE function to rearrange the order of the Name column.

prxchange('s/ . . . / . . . /', times, source);


• Use three sets of parentheses to create three capture buffers that represent the pattern
of the Name column.
• The first set of parentheses represents the last name. The last name can contain an
embedded blank (for example, Van Slyke) or a special character (for example, O’Brien).
• The second set of parentheses represents the comma and space.
• The third set of parentheses represents the first name. The first name can contain
an embedded blank (that is, Billy Jo).
• Use the reference to the capture buffers to rearrange the order of the capture buffers
so that the player’s name is in the form of FirstName LastName.
c. Run the program and verify the FirstLastName column.
• What is the value of FirstLastName for row 41?
• What is the value of FirstLastName for row 236?
• What is the byte size of the FirstLastName column?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.2 Performing Pattern Matching with Perl Regular Expressions 2-45

Challenge
6. Using the CALL PRXSUBSTR Routine
The pg3.np_unstructured_codes table contains a single column (Column1) whose contents
include park codes and names. Extract the first occurrence of the park codes. Park codes are 3
to 10 uppercase letters.

a. Open the p302p06.sas program in the practices folder. Run the program to view the
Column1 values.
b. Create a column named Expression that is equal to a Perl regular expression that includes
at least three uppercase letters followed by zero or more uppercase letters. Be sure to end
the expression with the letter o so that it is compiled only once.
c. Create a column named PatternID that uses the PRXPARSE function on the Expression
column to return a pattern identifier number.
d. Use the CALL PRXSUBSTR routine to create a MyStart column and a MyLength column.
The routine looks for the expression (PatternID) in Column1.

CALL PRXSUBSTR(pattern-identifier-number, source, position <,length>);

Note: Use the SAS documentation (SAS Functions and CALL Routines: Reference) to
learn about the CALL PRXSUBSTR routine.
e. Run the program and verify the MyStart and MyLength columns.

f. Use the SUBSTR function to create a column named ParkCode based on the MyStart and
MyLength columns.
g. Run the program and verify the ParkCode values. Are the majority of ParkCode values
three or four letters?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-46 Lesson 2 Using Advanced Functions

2.3 Solutions
Solutions to Practices
1. Using the LAG Function
data work.ParkTraffic2016;
set pg3.np_2016traffic;
by ParkCode;
PrevMthTC=lag1(TrafficCount);
if first.ParkCode=1 then PrevMthTC=.;
OneMthChange=TrafficCount-PrevMthTC;
run;

title '2016 National Park Traffic Counts';


proc print data=work.ParkTraffic2016;
run;
What is the value of OneMthChange for row 13? . (numeric missing)
2. Using the COUNT and FINDW Functions
data work.SouthRim;
set pg3.np_grandcanyon;
NumSouth=count(Comments,'South','i');
if NumSouth>0;
SouthWordPos=findw(Comments,'South',' .','ei');
AfterSouth=scan(Comments,SouthWordPos+1,' .');
run;

title 'Grand Canyon Comments Regarding South Rim';


proc print data=work.SouthRim;
run;
title;
What word occurs most often after the word South? Entrance
3. Using the COUNTC Function
data work.Mammal_Names;
set pg3.np_mammals(keep=Scientific_Name Common_Names);
SpecCharNum=countc(Common_Names,',/*');
if SpecCharNum=0 then do;
Name=Common_Names;
output;
end;
else do i=1 to SpecCharNum+1;
Name=scan(Common_Names,i,',/*');
output;
end;
run;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Solutions 2-47

title 'National Park Mammals';


proc print data=work.Mammal_Names;
run;
title;
How many values of Name are present for the species Vulpes vulpes (starts on row 10)? six
4. Using the PRXMATCH and PRXCHANGE Functions
data work.NationalPreserves;
set pg3.np_acres;
Position=
prxmatch('/N PRES\s|N PRESERVE\s|NPRES\s|NPRE\s/',ParkName);
if Position ne 0;
NewName=
prxchange('s/N PRES\s|N PRESERVE\s|NPRES\s/NPRE /',
1,ParkName);
run;

title 'National Preserves (NPRE)';


proc print data=work.NationalPreserves;
run;
title;
How many national preserves are in the state of Alaska (AK)? four
5. Using the PRXCHANGE Function with Capture Buffers
data work.BaseballPlayers;
set sashelp.baseball(keep=Name);
FirstLastName=
prxchange('s/(\w+\D*\w*)(, )(\w+\s*\w*)/$3 $1/',-1,Name);
run;

title 'Names of Baseball Players';


proc print data=work.BaseballPlayers;
run;
title;
What is the value of FirstLastName for row 41? Billy Jo Robidoux
What is the value of FirstLastName for row 236? Pete O’Brien
What is the byte size of the FirstLastName column? 200

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-48 Lesson 2 Using Advanced Functions

6. Using the CALL PRXSUBSTR Routine


data work.ParkCodes;
set pg3.np_unstructured_codes;
Expression='/[A-Z]{3}[A-Z]*/o';
PatternId=prxparse(Expression);
call prxsubstr(PatternID,Column1,MyStart,MyLength);
ParkCode=substr(Column1,MyStart,MyLength);
run;

title 'Park Codes from Unstructured Column';


proc print data=work.ParkCodes;
run;
title;
Are the majority of ParkCode values three or four letters? four letters

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Solutions 2-49

Solutions to Activities and Questions

2.01 Activity – Correct Answer


The FIND function
returns a number that
represents the first
Two arguments are required. character position
where substring is
found in string.

Two modifiers (i and t) are available.

6
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

2.02 Activity – Correct Answer


data work.stockmovingaverage;
set pg3.stocks_ABC(drop=Close);
Open1MnthBack=lag1(Open);
Open2MnthBack=lag2(Open);
Open3MnthAvg=mean(Open,Open1MnthBack,Open2MnthBack);
format Open3MnthAvg 8.2;
run;

What is the three-month average (Open3MnthAvg) for 02MAR2010? 102.90

The third row is the first row


that calculates an average
based on three values.

14
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-50 Lesson 2 Using Advanced Functions

2.03 Activity – Correct Answer


if _N_ ge 3 then
Open3MnthAvg=mean(Open,lag1(Open),lag2(Open));

Is Open3MnthAvg equal to 102.90 for 02MAR2010? No

The LAG function stores previous values in a queue when the


LAG function is executed. If the LAG function is not executed
due to a false condition, the previous value is not stored.

17
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

2.04 Activity – Correct Answer

The FINDW function contains more


modifiers (20+) than the FIND function.

26
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2.3 Solutions 2-51

2.05 Activity – Correct Answer


Which Perl regular expression will not find all tornados with a value of EF3,
EF-3, EF4, or EF-4?

a. '/(EF3|EF-3|EF4|EF-4)/'
b. '/(EF-?3|EF-?4)/'
c. '/EF-?(3|4)/' (…) Parentheses are for grouping.

| Vertical line is for OR situation.


d. '/EF-?[34]/'
? Matches the preceding character 0 or 1 times.
e. '/EF.[34]/'
[…] Matches a character in the brackets.

. Matches any character.

37
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

2.06 Activity – Correct Answer


where prxmatch('/(EF3|EF-3|EF4|EF-4)/',Narrative)>0;

where prxmatch('/(EF-?3|EF-?4)/',Narrative)>0;

where prxmatch('/EF-?(3|4)/',Narrative)>0;

where prxmatch('/EF-?[34]/',Narrative)>0;

NOTE: There were 21 observations read . . .


47
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-52 Lesson 2 Using Advanced Functions

2.07 Activity – Correct Answer


data work.tornadoEF; The default byte size of a
set pg3.tornado_2017narrative; column created based on a
length Narrative_New $ 4242; PRX function is 200 bytes.
Pos=prxmatch('/EF-/',Narrative);
Narrative_New=prxchange('s/EF-/EF/',-1,Narrative);
run;

Three values were


substituted for row 7.
55
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Lesson 3 Defining and Processing
Arrays
3.1 Defining and Referencing One-Dimensional Arrays ........................................................... 3-3

3.2 Doing More with One-Dimensional Arrays ......................................................................... 3-13


Demonstration: Processing One-Dimensional Arrays: Part 1 .............................................. 3-17
Demonstration: Processing One-Dimensional Arrays: Part 2 .............................................. 3-26
Practice ................................................................................................................................. 3-29

3.3 Defining and Referencing Two-Dimensional Arrays ......................................................... 3-32


Demonstration: Processing Two-Dimensional Arrays: Part 1 .............................................. 3-40
Demonstration: Processing Two-Dimensional Arrays: Part 2 .............................................. 3-46
Practice ................................................................................................................................. 3-50

3.4 Solutions ................................................................................................................................ 3-53


Solutions to Practices ........................................................................................................... 3-53
Solutions to Activities and Questions ................................................................................... 3-56
3-2 Lesson 3 Defining and Processing Arrays

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Defining and Referencing One-Dimensional Arrays 3-3

3.1 Defining and Referencing One-


Dimensional Arrays

Processing Repetitive Code: Example 1

if Weight='High' then HighCount+1;


if BlPres='High' then HighCount+1;
if Pulse='High' then HighCount+1;
if Chol='High' then HighCount+1;
if Glucose='High' then HighCount+1;

3
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p303d01

Processing Repetitive Code: Example 1

do i = 1 to 5;

if ? ='High' then HighCount+1;

end;
How can I refer to the five
columns with one reference?

4
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-4 Lesson 3 Defining and Processing Arrays

Array Processing

processing repetitive code Programmers use arrays


to simplify programs by
performing similar tasks
on related columns.
A SAS array provides a
way to reference a group
of columns for processing
in the DATA step. rotating data

performing table lookups


5
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

What Is an Array?

array XYZ

An array is a temporary 1 2 3
grouping of SAS PDV
columns that are
col_A col_B col_C col_D col_E
arranged in a particular
order and identified by
an array name.

6
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Defining and Referencing One-Dimensional Arrays 3-5

Incorporating an Array

define
ARRAY array-name[number-of-elements] <array-elements> ;
array

An array is
incorporated into a
reference DATA step by first
array-name[element-number]
array defining the array
and then referencing
the array.

7
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Defining a One-Dimensional Array

ARRAY array-name[number-of-elements] <array-elements> ;

An ARRAY statement specifies


the name of the array and the
number of elements (columns)
in the array. In addition, the
array elements can be specified.

8
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-6 Lesson 3 Defining and Processing Arrays

Defining a One-Dimensional Array

ARRAY array-name[number-of-elements] <array-elements> ;

The number of elements must be


enclosed in either parentheses (),
braces {}, or brackets [].

The array elements must be of the


same data type: character or numeric.

9
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Number of Elements
health

1st 2nd 3rd 4th 5th


element element element element element
PDV
Name Weight BlPres Pulse Chol Glucose HighCount

array health[5] Weight BlPres Pulse Chol Glucose;

This number indicates a one-dimensional array with


a lower bound of 1 and an upper bound of 5.
10
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Defining and Referencing One-Dimensional Arrays 3-7

Array Elements
health

1st 2nd 3rd 4th 5th


element element element element element
PDV
Name Weight BlPres Pulse Chol Glucose HighCount

array health[5] Weight BlPres Pulse Chol Glucose;

Array elements can be specified in any order and do


not have to be positioned consecutively in the PDV.
11
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Array Elements
health

1st 2nd 3rd 4th 5th


element element element element element
PDV
Name Weight BlPres Pulse Chol Glucose HighCount

array health[5] Weight--Glucose;

Array elements can be specified using column lists. The double


hyphen specifies all columns ordered as they are in the PDV.
12
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-8 Lesson 3 Defining and Processing Arrays

The following are types of column lists that can be used to reference elements in the ARRAY
statement:

SAS Column Lists

Specifies all columns from x1 to xn inclusive. You can begin with any
Numbered
x1-xn number and end with any number as long as you do not violate the rules
range lists
for user-supplied column names and the numbers are consecutive.

Specifies all columns ordered as they are in the program data vector,
x--a
from x to a inclusive.
Name range
lists x-numeric-a Specifies all numeric columns from x to a inclusive.

x-character-a Specifies all character columns from x to a inclusive.

Name prefix Specifies all the columns that begin with REV, such as REVJAN,
REV:
lists REVFEB, and REVMAR.

Specifies all columns that are already defined in the current


_ALL_
DATA step.

Special SAS Specifies all numeric columns that are already defined in the current
_NUMERIC_
name lists DATA step.

Specifies all character columns that are already defined in the current
_CHARACTER_
DATA step.

Referencing a One-Dimensional Array

array-name[element-number]

To reference an array, you


specify the array name
followed by the number of
the desired element in
either parentheses (),
braces {}, or brackets [].

13
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Defining and Referencing One-Dimensional Arrays 3-9

Referencing a One-Dimensional Array

An array reference is
simply another name
for referencing a
column in a DATA step.
health

1st 2nd 3rd 4th 5th


element element element element element

PDV health[1] health[2] health[3] health[4] health[5]


Name Weight BlPres Pulse Chol Glucose HighCount

14
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Referencing an Array in a DO Loop

An array reference
is typically used
within a DO loop.
DO index-column = 1 to number-of-elements;

. . . array-name[index-column] . . .

END;

The index column is used to


reference the element number.
15
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-10 Lesson 3 Defining and Processing Arrays

Processing Repetitive Code: Example 1

array health[5] Weight--Glucose;

do i = 1 to 5;

if health[i]='High' then HighCount+1;

end;

16
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Processing Repetitive Code: Example 2


monthly average Fahrenheit temperatures for the year 2017

Use an array to convert Fahrenheit temperatures to Celsius.


Celsius = (Fahrenheit - 32) * 5 / 9;

monthly average Celsius temperatures for the year 2017


17
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Defining and Referencing One-Dimensional Arrays 3-11

3.01 Activity

array Temperature[#] Temp1-Temp12;


do Month=1 to #;

Temperature[???]=(Temperature[???]-32)*5/9;

18
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Unknown Number of Array Elements


data work.DublinMadrid2018(drop=Month);
set pg3.weather_dublinmadrid_monthly2018
(keep=City Temp:);
array Temperature[*] Temp:;
Temp: specifies all
columns that
begin with Temp.
run; Use an asterisk so that
SAS automatically sizes
the array to the number
of elements specified.

20
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-12 Lesson 3 Defining and Processing Arrays

Unknown Number of Array Elements


data work.DublinMadrid2018(drop=Month);
set pg3.weather_dublinmadrid_monthly2018
(keep=City Temp:);
array Temperature[*] Temp:;
do Month=1 to dim(Temperature);
Temperature[Month]=(Temperature[Month]-32)*5/9;
end;
format Temp: 6.1;
run; Use the DIM
function to return
the number of
elements in an array.
DIM(array-name)

21
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

3.02 Activity

array Temperature[12] Temp1-Temp12;

do Month=1 to 12; DIM(array-name)

22
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-13

3.2 Doing More with One-Dimensional


Arrays

How do the
temperatures compare
Based on my weather between the two cities?
preferences, should I
spend the holiday in
Dublin, Ireland, or
Which quarter has
Madrid, Spain?
the lowest
percentage of
precipitation?

Do I like the first


quarter or third
quarter weather
better?

25
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

continued...
Scenario
Quarter 1
(months 1, 2, and 3)

Quarter 3
(months 7, 8, and 9)

Use arrays to add Celsius


temperature columns to a
quarter 1 table and a
quarter 3 table.

26
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-14 Lesson 3 Defining and Processing Arrays

Scenario

In addition, use arrays to


calculate the quarterly
percent of precipitation.

27
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Referencing Existing Columns with an Array

array Farenht[3] Temp1-Temp3;

PDV Farenht[1] Farenht[2] Farenht[3]


City Temp1 Temp2 Temp3
$ 24 N8 N8 N8

existing columns read


from input table

28
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-15

Creating Numeric Columns with an Array

array Celsius[3] TempC1-TempC3;

PDV Farenht[1] Farenht[2] Farenht[3] Celsius[1] Celsius[2] Celsius[3]


City Temp1 Temp2 Temp3 TempC1 TempC2 TempC3
$ 24 N8 N8 N8 N8 N8 N8

new columns
being created

29
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Not Specifying Array Elements

array Celsius[3];

PDV Farenht[1] Farenht[2] Farenht[3] Celsius[1] Celsius[2] Celsius[3]


City Temp1 Temp2 Temp3 Celsius1 Celsius2 Celsius3
$ 24 N8 N8 N8 N8 N8 N8

If array elements are not specified in the ARRAY statement, SAS


creates column names by combining the array name and a number.

30
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-16 Lesson 3 Defining and Processing Arrays

Specifying Lower and Upper Bounds

upper lower upper


bound bound bound

array Celsius[3]; array Celsius[1:3];

implicitly specifies a bound explicitly specifies a bound


range of 1 to n, where n is range of the lower bound to
the number of elements the upper bound

31
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Specifying Lower and Upper Bounds

array Farenht[7:9] Temp7-Temp9;


array Celsius[7:9] TempC7-TempC9;

PDV Farenht[7] Farenht[8] Farenht[9] Celsius[7] Celsius[8] Celsius[9]


City Temp7 Temp8 Temp9 TempC7 TempC8 TempC9
$ 24 N8 N8 N8 N8 N8 N8

Specifying lower and upper bounds is beneficial if you want


to start the lower bound at another value other than 1.

32
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-17

Processing One-Dimensional Arrays: Part 1

Scenario
Modify one DATA step to calculate the Celsius temperatures based on the Fahrenheit temperatures
for months 1 through 3 and months 7 through 9. Modify an additional DATA step to calculate the
quarterly percentages of precipitation based on the total yearly precipitation.

Files
• p303d01.sas
• weather_dublinmadrid_monthly2017 – a SAS table that contains the average monthly
Fahrenheit temperatures (Temp1-Temp12) and the total quarterly precipitation values in inches
(PrecipQ1-PrecipQ4) for Dublin, Ireland, and Madrid, Spain

Syntax

ARRAY array-name[number-of-elements | *] <array-elements>;

array-name[element-number | lower-bound:upper-bound]

DIM(array-name)

Notes
• A SAS array provides a way to reference a group of columns for processing in the DATA step.
Programmers use arrays to simplify programs by performing similar tasks on related columns.
• An array is a temporary grouping of related SAS columns that are arranged in a particular order
and identified by an array name.
• An array is incorporated into a DATA step by first defining the array and then referencing the array.
• An ARRAY statement specifies the name of the array and the number of elements (columns) in the
array. In addition, the array elements can be specified.
• The number of elements must be enclosed in either parentheses (), braces {}, or brackets [].
• The array elements must be of the same data type: character or numeric.
• To reference an array, you specify the array name followed by the number of the desired element
in either parentheses (), braces {}, or brackets []. An array reference is simply another name for
referencing a column in a DATA step.
• An array reference is typically used within a DO loop.
• Use an asterisk in place of the number of array elements to have SAS automatically size the array
to the number of elements specified.
• Use the DIM function to return the number of elements in an array.
• In a one-dimensional array, elements are referenced by their position using a single number.
• The ARRAY statement can define an array based on existing columns read from the input table.
• The ARRAY statement can define an array based on new columns that are being created.
By default, the new columns are numeric.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-18 Lesson 3 Defining and Processing Arrays

• If array elements are not specified in an ARRAY statement, SAS uses the name of the array
and adds a numeric suffix to create the specified number of columns.
• When defining an array with the number of elements, the lower bound is 1 and the upper bound
is the number of elements. If you want a lower bound other than 1, you can specify a number as
the lower bound, followed by a colon and a number for the upper bound.

Demo
1. Open the p303d01.sas program in the demos folder and find the Demo section. In the first
DATA step, notice the two ARRAY statements. The array Farenht references existing columns
of Fahrenheit temperatures read from the input table. The array Celsius references new numeric
columns that are being created.
*First DATA Step;
data work.tempQ1(drop=Month);
set pg3.weather_dublinmadrid_monthly2017
(keep=City Temp1-Temp3);
array Farenht[3] Temp1-Temp3;
array Celsius[3] TempC1-TempC3;

format TempC1-TempC3 6.1;


run;
2. Add the following DO loop to the first DATA step after the ARRAY statements. The assignment
statement calculates the Celsius temperatures.
do Month=1 to 3;
Celsius[Month]=(Farenht[Month]-32)*5/9;
end;
3. Highlight and run the DATA step. Verify that your output table tempQ1 contains the three existing
Fahrenheit temperatures and the three new Celsius temperatures for months 1 through 3.

4. Modify the first DATA step to create the table tempQ3 that contains the Celsius temperatures for
months 7 through 9.
data work.tempQ3(drop=Month);
set pg3.weather_dublinmadrid_Monthly2017
(keep=City Temp7-Temp9);
array Farenht[7:9] Temp7-Temp9;
array Celsius[7:9] TempC7-TempC9;
do Month=7 to 9;
Celsius[Month]=(Farenht[Month]-32)*5/9;
end;
format TempC7-TempC9 6.1;
run;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-19

5. Highlight and run the DATA step. Verify that your output table tempQ3 contains the three existing
Fahrenheit temperatures and the three new Celsius temperatures for months 7 through 9.

6. Open the pg3.weather_dublinmadrid_monthly2017 table. Notice that the table contains four
quarterly precipitation columns in inches (PrecipQ1-PrecipQ4) in addition to the Fahrenheit
temperature columns.
7. In the second DATA step, notice the two array statements. The array P references existing
columns of quarterly precipitation read from the input table. The array Pct references new
numeric columns that are being created.
*Second DATA Step;
data work.precip(drop=i);
set pg3.weather_dublinmadrid_Monthly2017
(keep=City PrecipQ1-PrecipQ4);
array P[4] PrecipQ1-PrecipQ4;

array Pct[4] PrecipPctQ1-PrecipPctQ4;


do i=1 to 4;

end;
format PrecipPctQ1-PrecipPctQ4 percent8.1;
run;
8. After the first ARRAY statement, add an assignment statement to calculate the total yearly
precipitation by summing the four quarterly precipitation columns.
PrecipTotal=sum(of PrecipQ1-PrecipQ4);
An alternative for specifying the columns in the SUM function is to reference all elements of the
P array by using an asterisk in an array reference.
PrecipTotal=sum(of P[*]);
9. In the DO loop, add an assignment statement to calculate the quarterly percent of precipitation
based on the quarterly precipitation divided by the total yearly precipitation.
Pct[i]=P[i]/PrecipTotal;
10. Highlight and run the DATA step. Verify that your output table precip contains the four existing
quarterly precipitation columns and the four new percentages of quarterly precipitation along with
the total yearly precipitation.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-20 Lesson 3 Defining and Processing Arrays

Rotating Data: Example 1

34
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Rotating Data: Example 1


array P[4] PrecipQ1-PrecipQ4;

do Quarter=1 to 4;

Precip=P[Quarter];

output;

end;

35
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p303d02

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-21

Rotating Data: Example 2

1 in = 2.54 cm

36
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

continued...
3.03 Activity
Open p303a03.sas from the activities folder and perform the following tasks:
1. Run the DATA step, which does not include ARRAY syntax, and verify that
the new table contains 20 rows of rotated data.
2. Modify the DATA step to use ARRAY syntax to rotate the data.
Quarter=1; Precip=PrecipQ1*2.54; output;
Quarter=2; Precip=PrecipQ2*2.54; output; Delete 12
Quarter=3; Precip=PrecipQ3*2.54; output; statements.
Quarter=4; Precip=PrecipQ4*2.54; output;

array P[4] PrecipQ1-PrecipQ4;


do Quarter=1 to 4;
Precip=P[Quarter]*2.54; Add 5
output; statements.
end;
37
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-22 Lesson 3 Defining and Processing Arrays

3.03 Activity
3. Run the DATA step, which now includes ARRAY syntax, and verify that
the new table contains 20 rows of rotated data.
4. Run the PROC SGPLOT step to create the desired bar chart.
5. What is the highest average quarterly precipitation in centimeters
for Dublin?

38
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Scenario

Compare each quarterly


precipitation value to the
average quarterly
precipitation value to create
a quarterly status column.

8.94 is greater than 7.65, so


StatusQ1 is equal to Above.
40
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-23

Specifying Initial Values for Array Elements

ARRAY array-name[number-of-elements] < $ length array-elements (initial-values) > ;

array PAvg[4] PAvgQ1-PAvgQ4 (7.65 , 6.26 , 7.56 , 9.12);

PDV PAvg[1] PAvg[2] PAvg[3] PAvg[4]


City Year PAvgQ1 PAvgQ2 PAvgQ3 PAvgQ4
...
$ 24 N8 N8 R N8 R N8 R N8 R
7.65 6.26 7.56 9.12

41
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Initial values are matched to elements by position. The initial values must be defined in one set of
parentheses and can be separated with either a comma or a blank space. Initial values are
automatically retained. Character initial values must be enclosed in quotation marks.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-24 Lesson 3 Defining and Processing Arrays

Specifying Temporary Array Elements

ARRAY array-name[number-of-elements] < $ length array-elements (initial-values) > ;

array PAvg[4] _temporary_ (7.65 , 6.26 , 7.56 , 9.12);

PDV PAvg[1] PAvg[2] PAvg[3] PAvg[4]


City Year
... D N8 R D N8 R D N8 R D N8 R
$ 24 N8
7.65 6.26 7.56 9.12

42
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Temporary data elements do not have column names and must be referenced by the array name
and the element number. They do not appear in the output table and are automatically retained.
Temporary data elements are useful when the only purpose for creating an array is to perform
a calculation. You can improve performance by using temporary data elements. You cannot use
an asterisk to refer to all elements when using temporary data elements.

Creating Character Columns with an Array

ARRAY array-name[number-of-elements] < $ length array-elements (initial-values) > ;

array Status[4] $ 5 StatusQ1-StatusQ4;

PDV Status[1] Status[2] Status[3] Status[4]


City Year StatusQ1 StatusQ2 StatusQ3 StatusQ4
...
$ 24 N8 $5 $5 $5 $5

43
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-25

3.04 Multiple Answer Question


Which ARRAY statements have incorrect syntax for a one-dimensional array?

ARRAY array-name[number-of-elements] < $ length array-elements (initial-values) > ;


The items in the ARRAY statement must be specified in this order.
a. array student[3] $ 12;
b. array yr[2013:2018] Year2013-Year2018;
c. array bike[5] Week1 Week2 Week3 Week4;
d. array increase[4] (1.05, 1.09, 1.07, 1.10);
e. array hobby[3] _temporary_ $ 10 ('running',
'reading', 'traveling');

44
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-26 Lesson 3 Defining and Processing Arrays

Processing One-Dimensional Arrays: Part 2

Scenario
Modify a DATA step to create a quarterly status column based on a comparison of the quarterly
precipitation for one year to the average quarterly precipitation column for five years.

Files
• p303d02.sas
• pg3.weather_dublinmadrid_monthly5yr – a SAS table that contains the average monthly
Fahrenheit temperatures (Temp1-Temp12) and the total quarterly precipitation values in inches
(PrecipQ1-PrecipQ4) for Dublin, Ireland, and Madrid, Spain for a five-year period (2013-2017)

Syntax

ARRAY array-name[number-of-elements | *]
<$ length array-elements | _temporary_ (initial-values)>;

Notes
• In an ARRAY statement, initial values can be specified for corresponding elements in the array.
The initial values must be specified in one set of parentheses, and the values can be separated
with a comma or blank space. The initial values are automatically retained.
• In the ARRAY statement, the list of array elements can be replaced with _TEMPORARY_.
Temporary data elements do not have column names, so they must be referenced by the array.
Also, temporary data elements do not appear in the output table.
• If the ARRAY statement is responsible for creating new character columns, the dollar sign is
needed to specify that the elements are to be character. You can also specify a length for the
character columns. If no length is specified, the default length is 8 bytes.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-27

Demo
1. Open the p303d02.sas program in the demos folder and find the Demo section. In the DATA
step, notice the three ARRAY statements. The array P references existing columns of quarterly
precipitation read from the input table. The array PAvg creates and references new numeric
columns with initial values representing quarterly averages more than five years. The array
Status references new character columns that are being created with a byte size of 5.
data work.DublinPrecipStatus(drop=i);
set pg3.weather_dublinmadrid_monthly5yr
(keep=City Year PrecipQ1-PrecipQ4);
where City='Dublin';
array P[4] PrecipQ1-PrecipQ4;
array PAvg[4] PAvgQ1-PAvgQ4 (7.65 , 6.26 , 7.56 , 9.12);
array Status[4] $ 5 StatusQ1-StatusQ4;
do i=1 to 4;

end;

run;
2. In the DO loop, add three conditional statements to create the values for the status columns
based on the comparison of the precipitation columns with the average precipitation columns.
if P[i] > PAvg[i] then Status[i]='Above';
else if P[i] < PAvg[i] then Status[i]='Below';
else if P[i] = PAvg[i] then Status[i]='Same';
3. Run the DATA step and view the output table. Notice the redundant rows for the PAvg1 through
PAvg4 columns.
4. Add a DROP statement to eliminate the average precipitation columns.
drop PAvgQ1-PAvgQ4;
5. Run the DATA step. Verify that the output table contains the four precipitation columns and the
four status columns but not the four average precipitation columns.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-28 Lesson 3 Defining and Processing Arrays

6. Alternatively, delete the DROP statement and replace the syntax of PAvgQ1-PAvgQ4 with
_TEMPORARY_ in the ARRAY statement for the PAvg array.
data work.DublinPrecipStatus(drop=i);
set pg3.weather_dublinmadrid_monthly5yr
(keep=City Year PrecipQ1-PrecipQ4);
where City='Dublin';
array P[4] PrecipQ1-PrecipQ4;
array PAvg[4] _temporary_ (7.65 , 6.26 , 7.56 , 9.12);
array Status[4] $ 5 StatusQ1-StatusQ4;
do i=1 to 4;
if P[i] > PAvg[i] then Status[i]='Above';
else if P[i] < PAvg[i] then Status[i]='Below';
else if P[i] = PAvg[i] then Status[i]='Same';
end;
run;
7. Run the DATA step and verify that the output table contains the same data as before the
alternative changes.
8. Self-study: The section at the end of the demo program is an example of storing the initial values
in a macro variable and then referencing the macro variable in the ARRAY statement.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-29

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
1. Using One-Dimensional Arrays on Numeric Data
The pg3.eu_occ table contains monthly occupancy rates broken down by type of property
(Hotel, ShortStay, and Camp) for European countries from January 2004 through September
2017. Calculate the percentage that each type of property represents of the total occupancy for
each month and year by country.
a. Open the p303p01.sas program in the practices folder. Run the program to view the
European occupancy data.
b. Add an ARRAY statement to create an array named OccType that references the three
existing columns of property type: Hotel, ShortStay, and Camp.
c. Add another ARRAY statement to create an array named OccPct that creates the numeric
columns HotelPct, ShortStayPct, and CampPct.
d. Add a DO loop with the index column Num. Use a start value of 1 and an end value of 3.
e. Within the DO loop, add an assignment statement using array references to calculate the
percentage of occupancy rate. The percentage (HotelPct, ShortStayPct, and CampPct)
is equal to the property type (Hotel, ShortStay, and Camp) divided by the total occupancy
(OccTotal).
f. Run the program and verify the results.

g. Add to the FORMAT statement to format HotelPct, ShortStayPct, and CampPct using
the PERCENT8.1 format.
h. Add a DROP statement to eliminate the Num column from the output table.
i. Run the program and verify the results. In general, which percentages are the highest (hotel,
short stay, or camp)?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-30 Lesson 3 Defining and Processing Arrays

Level 2
2. Using One-Dimensional Arrays on Character Data
The pg3.test_answers table contains employee answers (A through E) to 10 test questions.
Calculate the test score for each employee by comparing their answers to the correct answers.
a. Open the p303p02.sas program in the practices folder. Run the program to view the
employee’s answers.
b. Create an array named EmpAnswer that references the employee’s answers to the 10 test
questions.
c. Create a temporary array named CorAnswer that references the correct answers to the 10
questions as shown below.
Question: 1 2 3 4 5 6 7 8 9 10
Answer: A C C B E E D B B A
d. Within a DO loop, use a conditional IF/THEN statement to compare the employee answer to
the correct answer for the 10 questions. If the values compare, add 1 to the column Score.
e. Eliminate the index column from the output table.
f. Run the program and verify the results. How many employees have a perfect score?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Doing More with One-Dimensional Arrays 3-31

Challenge
3. Calculating the Differences between Array Elements
The pg3.storm_range table contains four recorded wind speeds (Wind1, Wind2, Wind3, and
Wind4) for storms from the 1980 season through the 2016 season. Using arrays, calculate the
difference in Wind1 and Wind2, Wind2 and Wind3, and Wind3 and Wind4.
a. Open the p303p03.sas program in the practices folder. The current DATA step is calculating
the three desired wind differences. Run the program to view the wind differences.
b. Modify the DATA step to calculate the wind differences using two arrays and a DO loop.
Note: An expression can be used to determine the element number in an array reference.
c. Run the program and verify the results. Based on the PROC MEANS results, what is the
maximum value of Diff12, Diff23, and Diff34?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-32 Lesson 3 Defining and Processing Arrays

3.3 Defining and Referencing Two-


Dimensional Arrays

Two-Dimensional Arrays

In a two-dimensional
array XYZ
array, elements are
referenced using two
dimensions: a row 1 2 3 4
number and a
column number. XYZ[1,1] XYZ[1,2] XYZ[2,1] XYZ[2,2]
col_A col_B col_C col_D col_E

49
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Performing a Table Lookup: Example 1


pg3. weather_dublin_daily5yr

The weather_dublin_daily5yr
table contains the daily
average temperatures for the
years 2013 through 2017.

(where=(day(Date)=15 and
month(Date) le 3 and
year(Date) in (2013,2014))
keep=Date TempDailyAvg)

50
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-33

Performing a Table Lookup: Example 1


pg3. weather_dublin_monthly5yr

The weather_dublin_monthly5yr
table contains the monthly
average temperatures for the
years 2013 through 2017.
JAN MAR

FEB

(where=(Year in (2013,2014))
keep=Year Temp1-Temp3)

51
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Performing a Table Lookup: Example 1


pg3. weather_dublin_daily5yr pg3. weather_dublin_monthly5yr

+
Combine the two tables to calculate the
difference between the daily average temperature
and the monthly average temperature.
52
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-34 Lesson 3 Defining and Processing Arrays

How many steps are needed for this


Discussion example if you want to use a DATA
step with a MERGE statement?

Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

DATA Step MERGE Versus DATA Step ARRAY


PROC rotate the monthly
TRANSPOSE table and create
step TempMonthlyAvg
• load the monthly
read the rotated data into an array
DATA step • read the daily table and
monthly table to DATA step
with SET create Month create Year and Month
with SET
• create TempMonthlyAvg
and by looking up monthly
read the daily ARRAY
DATA step values in the array based
table to create
with SET on Year and Month
Year and Month
• calculate Difference

DATA step merge tables and


with MERGE calculate Difference

55
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p303d03

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-35

Here is a DATA step MERGE solution for this example. The merge solution is multiple steps versus
the array solution is one DATA step. Refer to demo p303d03 for the merge solution and demo
p303d04 for the array solution.
proc transpose data=pg3.weather_dublin_monthly5yr
(where=(Year in (2013,2014))
keep=Year Temp1-Temp3)
out=work.dublin_monthly5yr_rotated
(rename=(COL1=TempMonthlyAvg))
name=MonthChar;
by Year;
var Temp1-Temp3;
run;

data work.dublin_monthly5yr;
set work.dublin_monthly5yr_rotated;
Month=input(substr(MonthChar,5),2.);
drop MonthChar;
run;

data work.dublin_daily5yr;
set pg3.weather_dublin_daily5yr
(where=(day(Date)=15 and
month(Date) le 3 and
year(Date) in (2013,2014))
keep=Date TempDailyAvg);
Year=year(Date);
Month=month(Date);
run;

data work.dublin_daily_monthly;
merge work.dublin_daily5yr
work.dublin_monthly5yr;
by Year Month;
Difference=TempDailyAvg-TempMonthlyAvg;
drop Year Month;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-36 Lesson 3 Defining and Processing Arrays

Two-Dimensional Array Process


pg3. weather_dublin_daily5yr pg3. weather_dublin_monthly5yr

+
1. Load the monthly temperatures into a two-dimensional array.

56
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Two-Dimensional Array Process


pg3. weather_dublin_daily5yr pg3. weather_dublin_monthly5yr

+
2. Read the daily temperature one row at a time.

57
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-37

Two-Dimensional Array Process


pg3. weather_dublin_daily5yr pg3. weather_dublin_monthly5yr

+
3. Look up the monthly temperature based on the daily date.

58
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Two-Dimensional Array Process


pg3. weather_dublin_daily5yr pg3. weather_dublin_monthly5yr

+
4. Calculate the difference between the daily and monthly temperatures.

59
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-38 Lesson 3 Defining and Processing Arrays

Two-Dimensional Array with Initial Values


Initial values are specified in row order
(that is, values from the first row and
rows columns
then values from the second row).

array Avg[2,3] (40.9, 40.7, 38.6, 42.5, 42.6, 45.4);

col 1 col 2 col 3

row 1 1,1 1,2 1,3

row 2 2,1 2,2 2,3

60
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Two-Dimensional Array with Initial Values


rows columns

array Avg[2,3] (40.9, 40.7, 38.6, 42.5, 42.6, 45.4);

PDV Avg[1,1] Avg[1,2] Avg[1,3] Avg[2,1] Avg[2,2] Avg[2,3]


... R Avg1 R Avg2 R Avg3 R Avg4 R Avg5 R Avg6
40.9 40.7 38.6 42.5 42.6 45.4

61
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-39

Two-Dimensional Array with Initial Values


rows columns

array Avg[2013:2014,3] (40.9, 40.7, 38.6, 42.5,42.6, 45.4);

PDV Avg[2013,1] Avg[2013,2] Avg[2013,3] Avg[2014,1] Avg[2014,2] Avg[2014,3]


... R Avg1 R Avg2 R Avg3 R Avg4 R Avg5 R Avg6
40.9 40.7 38.6 42.5 42.6 45.4

62
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-40 Lesson 3 Defining and Processing Arrays

Processing Two-Dimensional Arrays: Part 1

Scenario
Use the DATA step to calculate the difference in the average daily temperature from the average
monthly temperature. The average monthly temperatures are stored in a two-dimensional array
using initial values.

Files
• p303d04.sas
• weather_dublin_daily5yr – a SAS table that contains the average temperature and total
precipitation per day for the five-year period of 2013 through 2017 for Dublin, Ireland

Notes
• In a two-dimensional array, elements are referenced using two numbers. The first number
corresponds to the row, and the second number corresponds to the column.
• Initial values are loaded into a two-dimensional array by specifying the values one row at a time.

Demo
Note: The debugger portion of this demo must be performed in Enterprise Guide.
1. Open the p303d04.sas program in the demos folder and find the Demo section. Notice the
ARRAY statement that creates the two-dimensional array Avg. This array has two rows defined
with year values of 2013 and 2014 and three columns defined with month values of 1, 2, and 3.
The array creates six new columns, Avg1-Avg6. For simplicity purposes, the SET statement
contains a WHERE= data set option limiting the daily average temperatures to the 15th day of
January, February, or March for the years 2013 and 2014.
data work.DublinDaily;
array Avg[2013:2014,3] (40.9, 40.7, 38.6,
42.5, 42.6, 45.4);
set pg3.weather_dublin_daily5yr
(where=(day(Date)=15 and
month(Date) le 3 and
year(Date) in (2013,2014))
keep=Date TempDailyAvg);
Y=year(Date);
M=month(Date);
TempMonthlyAvg=Avg[Y,M];
Difference=TempDailyAvg-TempMonthlyAvg;
run;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-41

2. In Enterprise Guide, click the Toggle DATA Step Debugger toolbar button to enable
debugging in the program. Click the Debugger icon next to the DATA statement. The DATA Step
Debugger window appears. Notice that at the beginning of execution the six average columns
are populated with the initial values from the ARRAY statement.

3. Click the Step execution to the next line toolbar button to execute through the
statements. As the SET statement executes, Date and TempDailyAvg are populated. The Y
and M columns are populated from the first two assignment statements. TempMonthlyAvg is
populated by the third assignment statement, which locates the desired monthly average
temperature in the array using Y and M for the lookup. Continue clicking through the six
iterations of the DATA step and notice that each iteration uses the appropriate value from the
array.
4. Close the DATA Step Debugger window.
5. Run the DATA step. Notice that the output table contains the six average columns.
6. Add _temporary_ to the ARRAY statement prior to the initial values.
array Avg[2013:2014,3] _temporary_
(40.9, 40.7, 38.6,
42.5, 42.6, 45.4);
7. Click the Debugger icon next to the DATA statement to open the DATA Step Debugger window.
Notice that the six average columns do not appear in the debugger because the columns are
defined as temporary. Close the DATA Step Debugger window.
8. Run the DATA step. Verify that the output table contains the desired output.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-42 Lesson 3 Defining and Processing Arrays

9. As an alternative, eliminate the assignment statements for the Y and M columns. In the
assignment statement for the TempMonthlyAvg column, use expressions for the rows and
columns within the reference to the Avg array.
TempMonthlyAvg=Avg[year(Date),month(Date)];
10. Run the DATA step. Verify that the output table contains the desired output.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-43

Scenario

Determine the dates in


January and February of
2015 through 2017 when
the daily precipitation
exceeded 0.3 inches and
exceeded 20% of the
monthly precipitation.

64
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Two-Dimensional Array Process


pg3. weather_dublin_daily5yr pg3. weather_dublin_monthly5yr

where Precip > 0.3;


65
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-44 Lesson 3 Defining and Processing Arrays

3.05 Activity
Open p303a05.sas from the activities folder and perform the following tasks:
1. Add an ARRAY statement to create a two-dimensional array.
• Name the array PMT.
• The row dimension should reference the values 2015, 2016, and 2017.
• The column dimension should reference the values 1 to 2.
• The array elements should be temporary.
• Use the following as the six initial values:

2. Run the program and view the results.


3. How many dates have daily precipitation greater than 0.3 inches
and greater than 20% of the monthly precipitation?
66
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Two-Dimensional Array Loaded from Table


5 year rows 12 month columns

array Avg[2013:2017,12] _temporary_;

pg3. weather_dublin_monthly5yr

68
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-45

Two-Dimensional Array Loaded from Table


5 year rows 12 month columns

array Avg[2013:2017,12] _temporary_;

PDV Avg[2013,1] Avg[2013,2] Avg[2013,3] Avg[2017,10] Avg[2017,11] Avg[2017,12]


... D R D R D R ... D R D R D R
40.9 40.7 38.6 53.5 44.0 41.6

60 elements

69
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Two-Dimensional Array Loaded from Table

Load the array array Avg[2013:2017,12] _temporary_;


Read 5 rows
only the first
if _N_=1 then do Year=2013 to 2017; from the table.
time through
the DATA step. set pg3.weather_dublin_monthly5yr
(keep=Temp1-Temp12);
array T[12] Temp1-Temp12;

do Month=1 to 12; Read 12 columns


Avg[Year,Month]=T[Month]; per row.
end;

end;

70
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-46 Lesson 3 Defining and Processing Arrays

Processing Two-Dimensional Arrays: Part 2

Scenario
Use the DATA step to calculate the difference in the average daily temperature from the average
monthly temperature. The average monthly temperatures are loaded into a two-dimensional array
from a SAS table.

Files
• p303d05.sas
• weather_dublin_monthly5yr – a SAS table that contains the average monthly temperatures and
total quarterly precipitation values per year for the five-year period of 2013 through 2017 for
Dublin, Ireland

• weather_dublin_daily5yr – a SAS table that contains the average temperature and total
precipitation per day for the five-year period of 2013 through 2017 for Dublin, Ireland

Notes
• Use the syntax IF _N_=1 THEN to load only the average monthly temperature data into an array
the first time through the DATA step.
• Use a SET statement inside a DO loop to read in each row of the average monthly temperature
data to load the two-dimensional array.
• A nested DO loop is needed to load multiple columns of data for each row.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-47

Demo
Note: The debugger portion of this demo must be performed in Enterprise Guide.
1. Open the p303d05.sas program in the demos folder and find the Demo section. Notice the
ARRAY statement that creates the two-dimensional array Avg. This array is defined using 5 rows
of year values and 12 columns of month values. The array creates 60 new columns, Avg1
through Avg60. For simplicity, the second SET statement contains a WHERE= data set option
that limits the daily average temperatures to the 15th day of the month.
data work.DublinDaily;
array Avg[2013:2017,12];
if _N_=1 then do Yr=2013 to 2017;
set pg3.weather_dublin_monthly5yr(keep=Temp1-Temp12);
Avg[Yr,1]=Temp1;
Avg[Yr,2]=Temp2;
Avg[Yr,3]=Temp3;
Avg[Yr,4]=Temp4;
Avg[Yr,5]=Temp5;
Avg[Yr,6]=Temp6;
Avg[Yr,7]=Temp7;
Avg[Yr,8]=Temp8;
Avg[Yr,9]=Temp9;
Avg[Yr,10]=Temp10;
Avg[Yr,11]=Temp11;
Avg[Yr,12]=Temp12;
end;
set pg3.weather_dublin_daily5yr(where=(day(Date)=15)
keep=Date TempDailyAvg);
Y=year(Date);
M=month(Date);
TempMonthlyAvg=avg[Y,M];
Difference=TempDailyAvg-TempMonthlyAvg;
keep Date TempDailyAvg TempMonthlyAvg Difference;
run;
2. In Enterprise Guide, use the DATA Step Debugger.

a. Click the Toggle DATA Step Debugger toolbar button to enable debugging in the
program. Click the Debugger icon next to the DATA statement. The DATA Step Debugger
window appears. Notice that at the beginning of execution there are 60 average columns
with missing values.

b. Click the Step execution to the next line toolbar button to execute through the
statements that are loading the two-dimensional array. Observe that the values of Yr, Temp1
through Temp12, Month, and Avg1 through Avg60 change.
c. To speed through the loading of the array, put a watch on the Avg60 column. Click the
Start/continue debugger execution toolbar button . This continues the execution until
the Avg60 column has been changed.

d. Click the Step execution to the next line toolbar button . Observe that the values of
Date, TempDailyAvg, Y, M, TempMonthlyAvg, and Difference change.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-48 Lesson 3 Defining and Processing Arrays

e. Continue clicking the Step execution to the next line toolbar button until _N_ changes
to 2 (the second iteration of the DATA step).
f. Notice the values of Avg1 through Avg60 are reset to missing. Because these values are
assigned by the 12 assignment statements, they are reset to missing at the beginning of
each iteration.
g. Close the DATA Step Debugger window.
3. Add _temporary_ to the two-dimensional ARRAY statement, which retains the loaded values
and eliminates the columns from the output table.
array Avg[2013:2017,12] _temporary_;
4. Modify the IF-THEN block to eliminate the repetitive assignment statements.
if _N_=1 then do Yr=2013 to 2017;
set pg3.weather_dublin_monthly5yr(keep=Temp1-Temp12);
array T[12] Temp1-Temp12;
do Month=1 to 12;
Avg[Yr,Month]=T[Month];
end;
end;
5. Run the DATA step. Verify that the output table contains the desired output.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-49

Array Advantages and Disadvantages

Advantages Disadvantages
❑ Simplifies programs for ❑ Size of array defined at
processing repetitive code, compile time
rotating data, and performing ❑ Cannot combine character
table lookup and numeric elements in an
❑ Fast processing time because array
the array is stored in memory ❑ Must use a numeric value to
❑ Potentially eliminates the reference an element
need for multiple steps ❑ Returns only a single value
❑ Ability to use non-sorted data when referencing an element

72
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Beyond SAS Programming 3


What if you want to ...

. . . access SAS documentation . . . review array topics from a


and examples for arrays? foundation perspective?
• Arrays Made Easy: An Introduction to
• Array Processing in SAS® Language Arrays and Array Processing by Steve
Reference: Concepts First and Teresa Schudrowitz (paper)

• ARRAY Statement and Array Reference in • Adventures in Arrays: A Beginning


DATA Step Statements: Reference Tutorial by Jane Stroupe (paper)

• Using Arrays in SAS® Programming


(technical paper)

73
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

You can find direct links to these resources in the Course Links section on the Extended Learning
page.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-50 Lesson 3 Defining and Processing Arrays

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
4. Using a Two-Dimensional Array Based on Initial Values
The pg3.storm_stats table contains statistics such as MaxWindMPH for storms from seasons
1980 to 2016. For the storms in season 1980 and 1981, calculate the difference in a storm’s
MaxWindMPH compared to the quarterly maximum wind speed per season as shown below.
Quarter
1 2 3 4
1980 132 121 190 138
Season
1981 127 109 138 127
a. Open the p303p04.sas program in the practices folder. Run the program to view the
MaxWindMPH for storms from seasons 1980 and 1981.
b. Add an ARRAY statement to create a two-dimensional array named MWtable.
1) The row dimension is based on two rows with a lower boundary of 1980 and an upper
boundary of 1981.
2) The column dimension is based on four columns corresponding to the four quarters.
3) The array needs to be temporary.
4) The array elements should include the initial values as shown in the table above.
c. Add an assignment to create a column named MaxWindSQ. The value of this column will be
retrieved from the MWtable array using Season for the row dimension and Qtr for the
column dimension.
d. Add MaxWindSQ to the VAR statement in the PROC PRINT step. Run the program and
verify the results.

e. Add an assignment statement to create a column named Difference that is equal to


MaxWindMPH minus MaxWindSQ.
f. Add Difference to the VAR statement in the PROC PRINT step. Run the program and verify
the results. What is the value of Difference for the 1980 storm named Lester (row 55)?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.3 Defining and Referencing Two-Dimensional Arrays 3-51

Level 2
5. Using a Two-Dimensional Array Based on Loading a Table
The pg3.storm_stats table contains statistics such as MaxWindMPH for storms from seasons
1980 through 2016. The pg3.storm_maxwindseasqtr table contains quarterly maximum wind
speeds (MaxWindQ1 to MaxWindQ4) for the same seasons. Calculate the difference in a
storm’s MaxWindMPH value compared to the quarterly maximum wind speed per season.
a. Open the p303p05.sas program in the practices folder. Run the program to view
MaxWindMPH for storms from season 1980 to 2016.
b. Add an ARRAY statement to create a two-dimensional array named MWtable.
1) The row dimension is based on 37 rows with a lower boundary of 1980 and an upper
boundary of 2016.
2) The column dimension is based on four columns corresponding to the four quarters.
3) The array needs to be temporary.
c. Load the two-dimensional array with the pg3.storm_maxwindseasqtr table.
1) Load the array only during the first iteration of the DATA step.
2) Use an outer DO loop to iterate through each season.
3) Create a one-dimensional array referencing MaxWindQ1-MaxWindQ4.
4) Use an inner DO loop to iterate through the quarters.
5) Load the two-dimensional array by referencing the one-dimensional array.
d. Add an assignment to create a column named MaxWindSQ. The value of this column will be
retrieved from the MWtable array using Season for the row dimension and Qtr for the
column dimension.
e. Add an assignment statement to create a column named Difference that is equal to
MaxWindMPH minus MaxWindSQ.
f. Add MaxWindSQ and Difference to the VAR statement in the PROC PRINT step.
g. Eliminate the index columns and MaxWindQ1-MaxWindQ4. Run the program and verify the
results. What is the value of Difference for the 2016 storm named Zena (row 3038)?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-52 Lesson 3 Defining and Processing Arrays

Challenge
6. Using a Three-Dimensional Array
The pg3.product_warehouse table contains the warehouse location for all products. The
warehouse location is per each ProductLine (values from 21 to 24), ProductCatID (values from
0 to 8), and ProductLocID (values from 0 to 9). The pg3.product_list table contains the
ProductID and ProductName values for all products. ProductID consists of 12 digits. The digits
in positions 1 and 2 represent ProductLine. The digits in positions 3 and 4 represent
ProductCatID. The digit in position 12 represents ProductLocID. Look up the warehouse
location using digits from ProductID.
a. Open the p303p06.sas program in the practices folder. Run the program to view the list
of products.
b. Add an ARRAY statement to create a three-dimensional array named W.
1) The first dimension is based on four ProductLine values with a lower boundary of 21
and an upper boundary of 24.
2) The second dimension is based on nine ProductCatID values with a lower boundary
of 0 and an upper boundary of 8.
3) The third dimension is based on 10 ProductLocID values with a lower boundary
of 0 and an upper boundary of 9.
4) The array is based on character elements with a byte size of 5.
5) The array needs to be temporary.
c. Load the three-dimensional array with the pg3.product_warehouse table.
1) Load the array only during the first iteration of the DATA step.
2) Use a DO loop to iterate through each row of the input table (116 observations).
3) Load the three-dimensional array by referencing the Warehouse column.
b. Add an assignment statement to create a column named WarehouseLoc. The value of this
column will be retrieved from the W array using ProductLine for the first dimension,
ProductCatID for the second dimension, and ProductLocID for the third dimension.
c. Add a KEEP statement to include ProductID, ProductName, and WarehouseLoc in the
output table.
d. Run the program and verify the results. What is the warehouse location for the product
Children’s Mitten (row 10)?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.4 Solutions 3-53

3.4 Solutions
Solutions to Practices
1. Using One-Dimensional Arrays on Numeric Data
data work.MonthlyOcc;
set pg3.eu_occ(drop=Geo);
OccTotal=sum(Hotel,ShortStay,Camp);
array OccType[3] Hotel ShortStay Camp;
array OccPct[3] HotelPct ShortStayPct CampPct;
do Num=1 to 3;
OccPct[Num]=OccType[Num]/OccTotal;
end;
format Hotel ShortStay Camp OccTotal comma16.
HotelPct ShortStayPct CampPct percent8.1;
drop Num;
run;

title 'Percentage of Occupancy by Type';


proc print data=work.MonthlyOcc;
run;
title;
In general, which percentages are higher (hotel, short stay, or camp)? HotelPct
2. Using One-Dimensional Arrays on Character Data
data work.TestScores;
set pg3.test_answers;
Score=0;
array EmpAnswer[10] Q1-Q10;
array CorAnswer[10] $ 1 _temporary_
('A','C','C','B','E','E','D','B','B','A');
do i=1 to 10;
if EmpAnswer[i]=CorAnswer[i] then Score+1;
end;
drop i;
run;

title 'Employee Test Results';


proc print data=work.TestScores;
run;
title;
How many employees have a perfect score? Four

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-54 Lesson 3 Defining and Processing Arrays

3. Calculating the Differences between Array Elements


data work.WindDifference;
set pg3.storm_range;
array Wind[4] Wind1-Wind4;
array Diff[3] Diff12 Diff23 Diff34;
do i=1 to 3;
Diff[i]=Wind[i]-Wind[i+1];
end;
run;

title 'Storm Wind Differences (first 10 rows)';


proc print data=work.WindDifference(obs=10);
var Name Basin StartYear Wind1-Wind4 Diff12 Diff23 Diff34;
run;
title;

title 'Summary of Storm Wind Differences';


proc means data=work.WindDifference maxdec=1;
var Diff12 Diff23 Diff34;
run;
title;
Based on the PROC MEANS results, what is the maximum value of Diff12, Diff23, and Diff34?
• The maximum value of Diff12 is 20.
• The maximum value of Diff23 is 30.
• The maximum value of Diff34 is 30.
4. Using a Two-Dimensional Array Based on Initial Values
data work.MaxWind;
set pg3.storm_stats;
where Season between 1980 and 1981;
Qtr=qtr(StartDate);
array MWtable[1980:1981,4] _temporary_ (132,121,190,138,
127,109,138,127);
MaxWindSQ=MWTable[Season,Qtr];
Difference=MaxWindMPH-MaxWindSQ;
run;

title 'Maximum Winds for Storms Between 1980 and 1981';


proc print data=work.MaxWind;
var Season Qtr Name MaxWindMPH MaxWindSQ Difference;
run;
title;
What is the value of Difference for the 1980 storm named Lester (row 55)? -150

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.4 Solutions 3-55

5. Using a Two-Dimensional Array Based on Loading a Table


data work.MaxWind;
array MWTable[1980:2016,4] _temporary_;
if _N_=1 then do S=1980 to 2016;
set pg3.storm_maxwindseasqtr;
array MWQ[4] MaxWindQ1-MaxWindQ4;
do Q=1 to 4;
MWTable[S,Q]=MWQ[Q];
end;
end;
set pg3.storm_stats;
Qtr=qtr(StartDate);
MaxWindSQ=MWTable[Season,Qtr];
Difference=MaxWindMPH-MaxWindSQ;
drop S Q MaxWindQ1-MaxWindQ4;
run;

title 'Maximum Winds for Storms Between 1980 and 2016';


proc print data=work.MaxWind;
var Season Qtr Name MaxWindMPH MaxWindSQ Difference;
run;
title;
What is the value of Difference for the 2016 storm named Zena (row 3038)? -74
6. Using a Three-Dimensional Array
data work.warehouses;
array W[21:24,0:8,0:9] $ 5 _temporary_;
if _N_=1 then do i=1 to 116;
set pg3.product_warehouse;
W[ProductLine,ProductCatID,ProductLocID]=Warehouse;
end;
set pg3.product_list;
ProdID=put(ProductID,12.);
ProductLine=input(substr(ProdID,1,2),2.);
ProductCatID=input(substr(ProdID,3,2),2.);
ProductLocID=input(substr(ProdID,12,1),1.);
WarehouseLoc=W[ProductLine,ProductCatID,ProductLocID];
keep ProductID ProductName WarehouseLoc;
run;

title 'Warehouse Location for Products';


proc print data=work.warehouses;
run;
title;
What is the warehouse location for the product Children’s Mitten (row 10)? A2122

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-56 Lesson 3 Defining and Processing Arrays

Solutions to Activities and Questions

3.01 Activity – Correct Answer


data work.DublinMadrid2017(drop=Month);
set pg3.weather_dublinmadrid_monthly2017
(keep=City Temp1-Temp12);
array Temperature[12] Temp1-Temp12;
do Month=1 to 12;
Temperature[Month]=(Temperature[Month]-32)*5/9;
end;
format Temp1-Temp12 6.1;
run;

What if I don’t know how many


columns make up the array?

19
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

3.02 Activity – Correct Answer


data work.DublinMadrid2018(drop=Month);
set pg3.weather_dublinmadrid_monthly2018
(keep=City Temp:);
array Temperature[*] Temp:;
do Month=1 to dim(Temperature);
Temperature[Month]=(Temperature[Month]-32)*5/9;
end;
format Temp: 6.1;
run;

Five temperature columns are


in the array for the 2018 data.
23
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.4 Solutions 3-57

3.03 Activity – Correct Answer


data work.DublinPrecipRotate;
set pg3.weather_dublinmadrid_monthly5yr
(keep=City Year PrecipQ1-PrecipQ4);
where City='Dublin';
array P[4] PrecipQ1-PrecipQ4;
do Quarter=1 to 4;
Precip=P[Quarter]*2.54;
output;
end;
format Precip 6.2;
drop PrecipQ1-PrecipQ4;
run;

What is the highest average quarterly precipitation in centimeters for Dublin?


23.16 cm for quarter 4
39
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

3.04 Multiple Answer Question – Correct Answer

ARRAY array-name[number-of-elements] < $ length array-elements (initial-values) > ;


The items in the ARRAY statement must be specified in this order.

45
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3-58 Lesson 3 Defining and Processing Arrays

3.05 Activity – Correct Answer


data work.DublinPrecipPct(drop=Y M);
Array PMT[2015:2017,2] _temporary_ (2.29, 1.04,
4.15, 2.34,
0.90, 2.44);
set pg3.weather_dublin_daily5yr(keep=Date Precip);
Where month(Date) le 2 and year(Date) ge 2015
and Precip > 0.3;
Y=year(Date);
M=month(Date);
PrecipMonthlyTotal=PMT[Y,M];
PrecipMonthlyPct=Precip/PrecipMonthlyTotal;
format PrecipMonthlyPct percent8.1;
run;

How many dates have daily precipitation greater than 0.3 inches and greater
than 20% of the monthly precipitation? Four
67
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Lesson 4 Defining and Processing
Hash Objects
4.1 Declaring Hash Objects.......................................................................................................... 4-3

4.2 Defining Hash Objects.......................................................................................................... 4-16


Demonstration: Declaring and Defining a Hash Object ....................................................... 4-19

4.3 Finding Key Values in a Hash Object ................................................................................. 4-23


Demonstration: Performing a Table Lookup with the FIND Method .................................... 4-26
Practice ................................................................................................................................. 4-29

4.4 Writing a Hash Object to a Table ......................................................................................... 4-32


Demonstration: Creating a Table with the ADD and OUTPUT Methods .............................. 4-36
Practice ................................................................................................................................. 4-40

4.5 Using Hash Iterator Objects................................................................................................. 4-43


Demonstration: Reading Data in Forward and Reverse Direction ....................................... 4-48
Practice ................................................................................................................................. 4-52

4.6 Solutions ................................................................................................................................ 4-55


Solutions to Practices ........................................................................................................... 4-55
Solutions to Activities and Questions ................................................................................... 4-63
4-2 Lesson 4 Defining and Processing Hash Objects

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-3

4.1 Declaring Hash Objects

Performing a Table Lookup: Example 1

Calculate the percentage of US


city population within the state
population, storing the results by
descending city population.

PctPop=CityPop2017/StatePop2017;

3
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Performing a Table Lookup: Example 1


pg3.population_usstates (50 rows) pg3.population_uscities (19,500 rows)

+
Before calculating the population
percentage, the state populations
must be combined with the city
populations.
StateName=stnamel(StateCode);
4
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-4 Lesson 4 Defining and Processing Hash Objects

How many steps are needed for this


Discussion example if you want to use a DATA
step with a MERGE statement?

Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

DATA Step MERGE Versus DATA Step Hash Object


DATA step read cities table to
with SET create StateName

PROC SORT sort cities table • load states table into


by StateName DATA step hash object
step
with SET • read cities table and
PROC SORT sort states table create StateName
and
step by StateName • look up StateName in
Hash hash object and if found
DATA step Object return states data
merge tables to
with MERGE calculate PctPop • calculate PctPop

PROC SORT sort new table by


step descending CityPop2017

7
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p304d01

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-5

Here is a DATA step MERGE solution for this example. This MERGE statement solution involves
multiple DATA and PROC SORT steps versus the hash object solution involves only one DATA step.
Refer to demo p304d01 for the MERGE solution and the hash object solution.
data work.uscities;
set pg3.population_uscities;
StateName=stnamel(StateCode);
run;

proc sort data=work.uscities;


by StateName;
run;

proc sort data=pg3.population_usstates


out=work.usstates;
by StateName;
run;

data work.StateCityPopulation;
merge work.usstates work.uscities(in=C);
by StateName;
if C;
PctPop=CityPop2017/StatePop2017;
format StatePop2017 comma14. PctPop percent8.1;
run;

proc sort data=work.StateCityPopulation;


by descending CityPop2017;
run;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-6 Lesson 4 Defining and Processing Hash Objects

What Is a Hash Object?

KEY KEY DATA DATA DATA


col_A col_B col_C col_D col_E

A hash object is an
in-memory table
that contains key
and data
components.

8
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

What Is a Hash Object?

KEY KEY DATA DATA DATA


You can quickly and efficiently col_A col_B col_C col_D col_E
store, search, and retrieve data
based on the key components.

When the key component values


are found, the data component
values are copied to the PDV.

9
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-7

Key Components

KEY KEY DATA DATA DATA


• You must have at least one key col_A col_B col_C col_D col_E
component. Multiple key
components are supported.
• Key components can be
numeric, character, or both.
• Key components must be
defined as PDV columns.
• The data does not need to be
sorted by the key components.
• By default, each row of key
components must be unique.

10
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Data Components

KEY KEY DATA DATA DATA


• There can be multiple data col_A col_B col_C col_D col_E
components per each row of
key components.
• Data components can be
numeric, character, or both.
• Data components must be
defined as PDV columns.

11
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-8 Lesson 4 Defining and Processing Hash Objects

What Is a Hash Object?

data ... ;

... A hash object


exists only for A hash object is
run; the duration of created and sized
the DATA step. dynamically at
execution time.

12
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
States table
California Sacramento 39536653
Texas Austin 28304596
... ... ...

PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
. . . 1

13
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-9

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
States table
California Sacramento 39536653
Texas Austin 28304596
... ... ...

Remember that key and data components


must also be defined as columns in the PDV.
PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
. . . 1

14
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
Texas Austin 28304596
... ... ...

Cities table

PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
. Los Angeles CA 3999759 . 2

15
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-10 Lesson 4 Defining and Processing Hash Objects

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
Texas Austin 28304596
... ... ...

StateName=stnamel(StateCode);
PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
California . Los Angeles CA 3999759 . 2

16
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

The STNAMEL function converts a two-character state postal code (or worldwide GSA geographic
code for US territories) to the corresponding state name in mixed case. If the STNAMEL function
returns a value to a column that has not yet been assigned a length, by default, the column is
assigned a length of 20.

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
Texas Austin 28304596
... ... ...
The hash object is efficiently
searched by the key component.

PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
California . Los Angeles CA 3999759 . 2

17
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

The hash object is stored as AVL trees (self-balancing binary search trees), which allows for efficient
searching.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-11

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
Texas Austin 28304596
... ... ...

PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
California Sacramento 39536653 Los Angeles CA 3999759 . 2

18
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Hash Object Process: Example 1


Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
Texas Austin 28304596
... ... ...

PctPop=CityPop2017/StatePop2017;
PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 PctPop _N_
California Sacramento 39536653 Los Angeles CA 3999759 0.4344 2

19
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-12 Lesson 4 Defining and Processing Hash Objects

Hash Object Documentation

Hash objects are


documented under
DATA Step
Component Objects.

20
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

4.01 Activity
1. Go to the Programming: SAS 9.4 and SAS
Viya documentation.
2. Under DATA Step, select Component
Objects.
3. Under Dictionary of Hash and Hash Iterator
Object Language Elements, select the
DECLARE statement.
4. How many argument_tag:values are valid
for the DECLARE statement?

21
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Link
• Programming: SAS 9.4 and SAS Viya

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-13

Declaring a Hash Object

DECLARE object object-name(<argument_tag-1: value-1, … >);

hash
The DECLARE
statement creates
an instance of and
initializes data for
an object.

23
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Declaring a Hash Object

declare hash States( );

name of the hash object

24
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-14 Lesson 4 Defining and Processing Hash Objects

Declaring a Hash Object

(DATASET: 'data-set-name <(data-set-option)>')

declare hash States(data set:'pg3.population_usstates');

name of the table to load into the hash object

declare hash States(dataset:'pg3.population_usstates


(where=(StatePop2017>20000000))');

data set option


25
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

When specifying the DATASET argument, the table name can be specified as a character literal in
quotation marks, a character column, or a character expression. Refer to demo p304d01 for
examples of the three methods.
Character literal:
declare hash States(dataset: 'pg3.population_usstates');
Character column:
declare hash States(dataset: tablename);
Character expression:
declare hash States(dataset: cats('pg3.population_',location));
If the table contains duplicate keys, the default is to load the first instance in the hash object.
Subsequent instances are ignored. Use the DUPLICATE argument to store the last instance in the
hash object or write an error message to the SAS log. Use the MULTIDATA argument to allow
multiple data items for a key.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.1 Declaring Hash Objects 4-15

Hash Object Process


declare
hash DECLARE object object-name(<argument_tag-1: value-1, … >);
object

object-name.DEFINEKEY('key-1' < , … 'key-n' >);


define
hash object-name.DEFINEDATA('data-1' < , … 'data-n' >);
object
object-name.DEFINEDONE( );

use
hash object-name.FIND(<KEY: value-1, … KEY: value-n>)
object
26
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-16 Lesson 4 Defining and Processing Hash Objects

4.2 Defining Hash Objects

Hash Object Methods


object-name.definekey()

object-name.definedata()
Hash object
methods are object-name.definedone()
operations
performed on a
hash object using object-name.find()
dot notation.
object-name.add()

object-name.output()

28
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Defining Hash Objects 4-17

Defining a Hash Object

object-name.DEFINEKEY('key-1' < , … 'key-n' >);

object-name.DEFINEDATA('data-1' < , … 'data-n' >);

object-name.DEFINEDONE( );
To define a hash
object, three
DEFINE methods
are needed.

29
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

• The DEFINEKEY method defines the column (or columns) that make up the key component
(or components). A column can be specified as a character literal in quotation marks, a character
column, or a character expression.
• The DEFINEDATA method defines the column (or columns) that make up the data component
(or components). A column can be specified as a character literal in quotation marks, a character
column, or a character expression.
• The DEFINEDONE method indicates that all key and data components are complete. If a
DATASET argument is used in the DECLARE statement, the table is loaded in to the hash object
at this time.
Multiple DEFINEKEY and DEFINEDATA statements can be used for one hash object.

object-name.DEFINEKEY('key-1');
object-name.DEFINEKEY('key-n');
object-name.DEFINEDATA('data-1');
object-name.DEFINEDATA('data-n');

ALL: 'YES' can be used with the DEFINEKEY or DEFINEDATA statements to specify that all table
columns are used as key or data components when the table is loaded into the hash object.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-18 Lesson 4 Defining and Processing Hash Objects

Defining a Hash Object


length StateName $ 20 Capital $ 14 StatePop2017 8;

if _N_=1 then do; declare and define the hash object only one time

declare hash States(dataset: 'pg3.population_usstates');

States.definekey('StateName'); defines the key components

States.definedata('Capital','StatePop2017'); defines the data


components
States.definedone();
loads the table into the hash object
call missing(StateName, Capital, StatePop2017);

end;

30
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Defining Key and Data Components as PDV Columns


length StateName $ 20 Capital $ 14 StatePop2017 8;

if _N_=1 then do; defines the key and data components as PDV columns
declare hash States(dataset: 'pg3.population_usstates');

States.definekey('StateName');

States.definedata('Capital','StatePop2017');

States.definedone();

call missing(StateName, Capital, StatePop2017);

end; sets the key and data columns to missing

31
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Defining Hash Objects 4-19

Declaring and Defining a Hash Object

Scenario
In a DATA step, declare and define a hash object containing US states data.

Files
• p304d02.sas
• population_usstates – a SAS table containing the 50 states and capitals of the United States
sorted in descending order of the 2017 state population

Syntax

DECLARE object object-name(<argument_tag-1: value-1, … >);


object-name.DEFINEKEY('key-1' < , … 'key-n' >);
object-name.DEFINEDATA('data-1' < , … 'data-n' >);
object-name.DEFINEDONE( );

Notes
• A hash object is a table in memory that contains key and data components. A hash object
is available only to the DATA step that creates it, and it is defined at execution.
• You can quickly and efficiently store, search, and retrieve data based on the key components.
When the key components are found, the data components are copied to the PDV.
• You must have at least one key component. Multiple key components are supported. Key
components can be numeric, character, or both, and must be defined as PDV columns. The data
does not need to be sorted by the key components. By default, each row of key components must
be unique.
• There can be multiple data components per each row of key components. Data components can
be numeric, character, or both, and must be defined as PDV columns.
• A solution involving multiple steps can often be replaced with one DATA step that contains a hash
object.
• The DECLARE statement creates an instance of and initializes data for an object.
• Hash object methods are operations performed on a hash object using dot notation.
• To define a hash object, three DEFINE methods (DEFINEKEY, DEFINEDATA, and DEFINEDONE)
are needed.
• The LENGTH and CALL MISSING statements can be used to define key and data components
as PDV columns and initially assign missing values to them.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-20 Lesson 4 Defining and Processing Hash Objects

Demo
1. Open the p304d02.sas program in the demos folder and find the Demo section. Notice the
syntax to declare and define the States hash object with one key component and two data
components.
data work.StateCityPopulation;
if _N_=1 then do;
declare hash States(dataset: 'pg3.population_usstates');
States.definekey('StateName');
States.definedata('Capital','StatePop2017');
States.definedone();
end;
run;
2. Run the program. View the error in the SAS log concerning the undeclared key symbol
StateName. This error appears because the key component has not been defined as a column
in the PDV.
Note: You can open the DATA Step Debugger in SAS Enterprise Guide to see that none
of the hash object components have been defined in the PDV.
3. After the DATA statement, add a LENGTH statement to define StateName as character with
a byte size of 20.
length StateName $ 20;
4. Run the program. View the error in the SAS log concerning the undeclared data symbol Capital.
This error appears because the data component has not been defined as a column in the PDV.
5. Add to the LENGTH statement to define Capital as character with a byte size of 14 and
StatePop2017 as numeric with a byte size of 8.
length StateName $ 20 Capital $ 14 StatePop2017 8;
6. Run the program. View the uninitialized notes in the SAS log. These notes appear because SAS
does not see any syntax that is assigning values to the three columns in the PDV.
NOTE: Variable StateName is uninitialized.
NOTE: Variable Capital is uninitialized.
NOTE: Variable StatePop2017 is uninitialized.

7. To eliminate the uninitialized notes, add a CALL MISSING statement inside the DO block to
assign a missing value to the three columns during the first iteration of the DATA step.
length StateName $ 20 Capital $ 14 StatePop2017 8;
if _N_=1 then do;
declare hash States(dataset: 'pg3.population_usstates');
States.definekey('StateName');
States.definedata('Capital','StatePop2017');
States.definedone();
call missing(StateName, Capital, StatePop2017);
end;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.2 Defining Hash Objects 4-21

8. Run the program. Verify that 50 rows were read to load the hash object and that a new table has
been created with 1 row of empty data.
NOTE: There were 50 observations read from the data set
PG3.POPULATION_USSTATES.
NOTE: The data set WORK.STATECITYPOPULATION has 1 observations
and 3 variables.
9. As an alternative, delete the LENGTH and CALL MISSING statements. Add a conditional SET
statement to the beginning of the DO block. This statement is never executed, but it makes
a spot in the PDV for every column that is in the table.
if _N_=1 then do;
if 0 then set pg3.population_usstates;
declare hash States(dataset: 'pg3.population_usstates');
States.definekey('StateName');
States.definedata('Capital','StatePop2017');
States.definedone();
end;
10. Run the program. Verify that 50 rows were read to load the hash object and that a new table has
been created with 1 row of empty data.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-22 Lesson 4 Defining and Processing Hash Objects

Performing a Table Lookup: Example 2


pg3.weather_ustop5_monthly2017 (60 rows) pg3.weather_ustop5_daily2017 (1,825 rows)

33
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

4.02 Activity
Open p304a02.sas from the activities folder and perform the following tasks:
1. Add two statements to the DATA step for the Monthly hash object:
▪ Add a DEFINEKEY method referencing the keys of City and Month.

object-name.DEFINEKEY('key-1', 'key-2');

▪ Add a DEFINEDATA method referencing the data of TempMonAvg


and PrecipMonSum.

object-name.DEFINEDATA('data-1', 'data-2');

2. Run the DATA step and confirm no errors in your SAS log. How many rows
were read from the input table into the hash object?
34
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Finding Key Values in a Hash Object 4-23

4.3 Finding Key Values in a Hash Object

FIND Method

object-name.FIND(<KEY: value-1, … KEY: value-n>)

• returns a return code of zero if the key value


The FIND (or values) is found (a match)
method searches • returns a return code of a nonzero value
the hash object if the key value (or values) is not found
for the current (a non-match)
key value or
values.

37
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

FIND Method

If the value is
found, the FIND
method copies the
data component
values to the PDV.

38
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-24 Lesson 4 Defining and Processing Hash Objects

Match
Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
The key value
Texas Austin 28304596
is found.
... ... ...

RC=States.find(key:StateName);
PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 RC PctPop _N_
California Sacramento 39536653 Los Angeles CA 3999759 0 0.4344 2

39
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

If the key component value (or values) is found in the hash object, data values from the hash object
are copied to the PDV.

Non-Match
Hash Object STATES (3 cols and 50 rows)
KEY DATA DATA
StateName Capital StatePop2017
California Sacramento 39536653
The key value
Texas Austin 28304596
is not found.
... ... ...

RC=States.find(key:StateName);
PDV
StateName Capital StatePop2017 CityName StateCode CityPop2017 RC PctPop _N_
District of
. Washington DC 693972 160038 . 20
Columbia

40
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

If the key component value (or values) is not found, no data values from the hash object are copied
to the PDV.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Finding Key Values in a Hash Object 4-25

KEY Argument
The KEY argument is not needed
States.find() if the PDV column has the same
name as the key component.

The KEY argument is required if


States.find(key:StateName) the PDV column has a different
name than the key component.

An expression can be
States.find(key:stnamel(StateCode)) specified as the value to
the KEY argument.

41
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Multiple KEY Arguments

Monthly.definekey('City','Month');

Monthly.find(key:City,key:month(Date));

If your hash object has multiple key


components, your KEY arguments must
be specified in the order in which the
key components are defined.

42
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p304d03

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-26 Lesson 4 Defining and Processing Hash Objects

Performing a Table Lookup with the FIND Method

Scenario
In a DATA step, read city population values from a table and find the appropriate state population
values in a hash object. Then calculate the percentage of city populations within a state.

Files
• p304d03.sas
• population_usstates – a SAS table containing the 50 states and capitals of the United States
sorted in descending order of the 2017 state population

• population_uscities – a SAS table containing 19,500 United States cities sorted in descending
order of the 2017 city population

Syntax

object-name.FIND(<KEY:value-1, … KEY:value-n>)

Notes
• The FIND method produces a return code that indicates whether the key component is found
in the hash object. A zero value indicates success (a match). A nonzero value indicates failure
(a non-match).
• If the key component is found in the hash object, data values from the hash object are copied to
the PDV. If the key component is not found, no data values from the hash object are copied to the
PDV.
• In the FIND method, the KEY argument is not needed if the PDV column has the same name as
the key component. The KEY argument is required if the PDV column has a different name than
the key component. An expression can be specified as the value to the KEY argument.
• If your hash object has multiple key components, your KEY arguments must be specified in the
order in which the key components are defined.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Finding Key Values in a Hash Object 4-27

Demo
1. Open the p304d03.sas program in the demos folder and find the Demo section. Notice that
rows are being read from the pg3.population_uscities table and that StateCode is being used
to create StateName.
data work.StateCityPopulation;
length StateName $ 20 Capital $ 14 StatePop2017 8;
if _N_=1 then do;
declare hash States(dataset: 'pg3.population_usstates');
States.definekey('StateName');
States.definedata('Capital','StatePop2017');
States.definedone();
call missing(StateName, Capital, StatePop2017);
end;
set pg3.population_uscities;
StateName=stnamel(StateCode);

*PctPop=CityPop2017/StatePop2017;
*format StatePop2017 comma14. PctPop percent8.1;
run;
2. Highlight and run the DATA step. View the output table and notice that the values of Capital
and StatePop2017 are missing. There is no syntax in the DATA step for the StateName value
to be searched in the hash object.
3. Add an RC assignment statement after the StateName assignment statement to find the values
of StateName in the hash object.
RC=States.find(key:StateName);
4. Uncomment the PctPop assignment statement and the FORMAT statement.
5. Run the DATA step. Observe that the value of RC is 0 for all rows except row 20. The
StateName value District of Columbia does not have a match in the hash object. Notice the
values of Capital and StatePop2017 are missing for row 20. The values were reinitialized to
missing at the beginning of the iteration, and no values were retrieved from the hash object.

6. As an alternative, delete the LENGTH and CALL MISSING statements. Add a conditional SET
statement to the beginning of the DO block.
if 0 then set pg3.population_usstates;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-28 Lesson 4 Defining and Processing Hash Objects

7. Run the DATA step and notice the difference for row 20. The Capital and StatePop2017 are
retained due to the nature of the SET statement. Therefore, you are seeing the previous values
for these two columns.

8. Add the following statement after the RC assignment statement to set the values of Capital and
StatePop2017 to missing when a match is not found. Run the program and confirm the results.
if RC ne 0 then call missing(Capital, StatePop2017);
9. Modify the DATA step to create an additional table that contains the population data for only the
capital cities.
data work.StateCityPopulation work.CapitalPopulation;
if _N_=1 then do;
if 0 then set pg3.population_usstates;
declare hash States(dataset: 'pg3.population_usstates');
States.definekey('StateName');
States.definedata('Capital','StatePop2017');
States.definedone();
end;
set pg3.population_uscities;
StateName=stnamel(StateCode);
RC=States.find(key:StateName);
if RC ne 0 then call missing(Capital, StatePop2017);
PctPop=CityPop2017/StatePop2017;
output work.StateCityPopulation;
if Capital=CityName then output work.CapitalPopulation;
format StatePop2017 comma14. PctPop percent8.1;
run;
10. Run the DATA step. Verify that work.StateCityPopulation contains 19,500 rows and
work.CapitalPopulation contains 50 rows. What would need to be added to the program if we
wanted work.CapitalPopulation in sorted order by descending PctPop values?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Finding Key Values in a Hash Object 4-29

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
1. Performing a Table Lookup Using One Key
Create a hash object based on the pg3.np_codelookup table. This table contains national park
information. Read the columns ParkCode, State, and GrossAcres from the pg3.np_acres2
table. This table contains acreage amounts for the national parks. Look up the uppercase value
of ParkCode in the hash object to retrieve values of ParkName and Type.
a. Open the p304p01.sas program in the practices folder. Review the DATA step syntax.
1) What is the name of the hash object being created?
2) What table is loading the hash object?
3) What is the name of the key component?
4) What are the names of the data components?
b. Add an assignment statement to the DATA step to create the column RC, which is equal to
the return code from finding the ParkCode value in the hash object.

RC = object-name.FIND(<KEY: value>);

c. Run the DATA step. View the log and the output table.
1) Based on the log, how many rows from the pg3.np_codelookup table were read into the
hash object?
2) Based on the log, how many rows were read from the pg3.np_acres2 table?
3) Based on the log, how many rows are in the output table?
4) Based on the output table, how many ParkCode values are not found in the hash object
(RC not equal to 0)?
d. Add a subsetting IF statement to output only the RC values that are equal to 0 (matches).
e. Add a DROP statement to eliminate the RC column.
f. Run the program and verify the results. How many data rows are in the results?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-30 Lesson 4 Defining and Processing Hash Objects

Level 2
2. Performing a Table Lookup Using Three Keys
Create a hash object based on the pg3.storm_range table. This table contains four wind
measurements for each combination of StartYear, Name, and Basin. Read the columns from
the pg3.storm_summary_cat345 table. This table contains information such as MaxWindMPH
and MinPressure for each combination of StartDate, Name, and Basin for category 3, 4, and 5
storms. Look up the appropriate values in the hash object to retrieve the four wind
measurements.
a. Open the p304p02.sas program in the practices folder. Add statements to the DATA step to
create a hash object named Storm.
1) Add a DECLARE statement to load the hash object Storm with the table
pg3.storm_range.

DECLARE HASH object-name(dataset: 'table');

2) Use the DEFINEKEY method to specify the key components of StartYear, Name, and
Basin.

object-name.DEFINEKEY('key-1', 'key-2', 'key-3');

3) Use the DEFINEDATA method to specify the data components of Wind1, Wind2, Wind3,
and Wind4.

object-name.DEFINEDATA('data-1', 'data-2', 'data-3', 'data-4');

4) Use the DEFINEDONE method to complete the hash object.

object-name.DEFINEDONE( );

b. Add an assignment statement to create the column ReturnCode, which is equal to the return
code from finding the key values in the hash object. You will need to use the YEAR function
on StartDate for the first key value followed by the key values of Name and Basin.

ReturnCode = object-name.FIND(KEY: value-1, KEY: value-2, KEY: value-3);

Note: Key values must be specified in the FIND method in the same order as specified with
the DEFINEKEY method.
c. Run the DATA step. View the log and the results.
1) How many rows from the pg3. storm_range table were read into the hash object?
2) How many rows were read from the pg3. storm_summary_cat345 table?
3) How many rows are in the output table?
4) How many key values are not found in the hash object (ReturnCode not equal to 0)?
5) Why is the StartYear column set to missing values?
d. Modify the assignment statement to be a subsetting IF statement to output only the FIND
values that are equal to 0 (matches).
e. Drop the StartYear column.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.3 Finding Key Values in a Hash Object 4-31

f. Run the program and verify the results. How many data rows are in the results?

Challenge
3. Performing a Table Lookup with Two Hash Objects and a Table
Create a hash object based on the pg3.storm_range table. This table contains four wind
measurements for each combination of StartYear, Name, and Basin. Create another hash
object based on the pg3.storm_basincodes table. This table contains the BasinName value for
each value of Basin. Read the columns from the pg3.storm_summary_cat345 table. This table
contains information such as MaxWindMPH and MinPressure for each combination of
StartDate, Name, and Basin for category 3, 4, and 5 storms. Look up the appropriate values in
the hash objects to retrieve the four wind measurements and the basin name.
a. Open the p304p03.sas program in the practices folder. Review the existing DATA step
syntax that is creating the Storm hash object and finding values in the hash object.
b. Declare and define an additional hash object.
1) Name the hash object BasinDesc and load the hash object with the table
pg3.storm_basincodes.
2) Specify a key component of Basin and a data component of BasinName.
c. In the IF 0 THEN DO block, include a SET statement for the pg3.storm_basincodes table.
d. After the ReturnCode1 assignment statement, add an assignment statement to create a
ReturnCode2 column. This column is equal to the return code from finding the Basin values
in the BasinDesc hash object.
e. Add conditional statements to write the output to the work.storm_cat345_facts table if both
return codes are equal to zero. Otherwise, write the output to the work.nonmatches table.
f. Run the program and verify the results. How many rows are in the work.nonmatches table?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-32 Lesson 4 Defining and Processing Hash Objects

4.4 Writing a Hash Object to a Table

Scenario

This table needs to be ordered


by descending PctPop values.

46
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Writing a Hash Object to a Table


if Capital=CityName then output work.CapitalPopulation;

Table needs to be ordered


by descending PctPop.

if Capital=CityName then ... Add data to a new hash object.

Output the hash object to a table by descending PctPop after all cities are read.

47
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.4 Writing a Hash Object to a Table 4-33

ADD Method

object-name.ADD()

A zero value indicates success.


A nonzero value indicates failure.
The ADD method adds
data in the PDV columns
to the corresponding key
and data components
within a hash object.

48
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

OUTPUT Method

object-name.OUTPUT(DATASET: 'data-set-name <(data-set-option)>')

A zero value indicates success.


A nonzero value indicates failure.

The OUTPUT method


creates a table containing
the data components of the
hash object.

49
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-34 Lesson 4 Defining and Processing Hash Objects

Ordering with the OUTPUT Method

specifies how the data is returned in the key-


value order when the OUTPUT method is used

declare hash CapitalPopSort(ordered: 'descending');

(ORDERED: 'option' <, . . . >)

50
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

ORDERED: 'option' – Specifies whether or how the data is returned in key-value order if you use the
hash object with a hash iterator object or if you use the hash object OUTPUT method.
The 'option' can be one of the following values:
• 'ascending' | 'a' | 'yes' | 'y' – Data is returned in ascending key-value order.
• 'descending' | 'd' – Data is returned in descending key-value order.
• 'no' | 'n' – Data is returned in some undefined order (default).
The option value is not case sensitive.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.4 Writing a Hash Object to a Table 4-35

Key Components as Data Components

Only the data components are output when using the OUTPUT
method. Key components can be defined as part of the data
component to be included in the output table.

declare hash CapitalPopSort(ordered: 'descending');

CapitalPopSort.definekey('PctPop');

CapitalPopSort.definedata('PctPop',
'CityName','CityPop2017',
'StateName','StatePop2017');

CapitalPopSort.definedone();

51
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Controlling Output with the OUTPUT Method

Creates a temporary column that contains


an end-of-file indicator. The column is
initialized to 0 and set to 1 when SET reads
the last row of the input table.

set pg3.population_uscities end=lastrow; END=column

if lastrow=1 then
CapitalPopSort.output(dataset: 'work.CapitalPopSort');

52
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-36 Lesson 4 Defining and Processing Hash Objects

Creating a Table with the ADD and OUTPUT Methods

Scenario
Modify the DATA step to add the capital population data to a hash object and write the hash object
output to a table based on descending PctPop values.

Files
• p304d04.sas
• population_usstates – a SAS table containing the 50 states and capitals of the United States
sorted in descending order of the 2017 state population

• population_uscities – a SAS table containing 19,500 United States cities sorted in descending
order of the 2017 city population

Syntax

object-name.ADD()
object-name.OUTPUT(DATASET: 'data-set-name<(data-set-option)>')
DECLARE object object-name(<ORDERED: 'option' <, . . . >);
SET table END=column;

Notes
• The ADD method adds data in the PDV column to the corresponding key and data components
within a hash object. A zero value indicates success. A nonzero value indicates failure.
• The OUTPUT method creates a table containing the data component of the hash object. A zero
value indicates success. A nonzero value indicates failure.
• The ORDERED argument in the DECLARE statement specifies how the data is returned in the
key-value order when the OUTPUT method is used.
• If a key component is desired in the output table when the OUTPUT method is used, specify the
key component also as a data component.
• The END= option in the SET statement creates and names a temporary column that contains an
end-of-file indicator. The column is initialized to zero and set to 1 when SET reads the last row
of the table.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.4 Writing a Hash Object to a Table 4-37

Demo
1. Open the p304d04.sas program in the demos folder and find the Demo section. Run the
program. Verify that work.StateCityPopulation contains 19,500 rows and
work.CapitalPopulation contains 50 rows.
2. Delete work.CapitalPopulation from the DATA statement.
3. Remove the /* and */ from around the statements that are creating the hash object
CapitalPopSort.
declare hash CapitalPopSort(ordered: 'descending');
CapitalPopSort.definekey('PctPop');
CapitalPopSort.definedata('PctPop',
'CityName','CityPop2017',
'StateName','StatePop2017');
CapitalPopSort.definedone();
4. Add the END= option to the SET statement for the cities table.
set pg3.population_uscities end=lastrow;
5. Modify the conditional statement to add data to the CapitalPopSort hash object instead of
writing the output to a table.
if Capital=CityName then CapitalPopSort.add();
6. Add the following statement to output the CapitalPopSort hash object to a table.
if lastrow=1 then
CapitalPopSort.output(dataset: 'work.CapitalPopSort');
7. Run the program. Verify that work.StateCityPopulation contains 19,500 rows and
work.CapitalPopulation contains 50 rows sorted by descending PctPop. Are there any
duplicate values of PctPop in work.CapitalPopSort?
8. It appears that there are duplicate values of PctPop in work.CapitalPopSort. For example,
rows 23 and 24 show a PctPop value of 4.3% and rows 25 and 26 show a PctPop value of
4.1%.
9. Comment out the format for PctPop in the FORMAT statement. Run the program. Verify that
there are no duplicate values for PctPop in work.CapitalPopSort.
format StatePop2017 comma14. /* PctPop percent8.1 */;
10. Modify the calculation of PctPop to round the value, which will now produce duplicate values.
Run the program. Notice the error messages in the SAS log, ERROR: Duplicate key. Nine error
messages exist due to the duplicate values for 4.3, 4.1, 3.4, 2.6, 1.2, 0.9 (2), 0.7, and 0.6.
PctPop=round(CityPop2017/StatePop2017*100,.1);

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-38 Lesson 4 Defining and Processing Hash Objects

Unique Key Components


Hash Object CapitalPopSort
KEY DATA DATA DATA
PctPop PctPop CityName ...
By default, the
4.4 4.4 Madison ...
key component
(or components) 4.3 4.3 Topeka ...
must be unique. ERROR 4.3 4.3 Juneau ...
4.1 4.1 Montgomery ...
ERROR 4.1 4.1 Salem ...
4.0 4.0 Santa Fe ...
... ... ... ...

PctPop=round(CityPop2017/StatePop2017*100,.1);

54
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Duplicate Key Components

DECLARE object object-name (MULTIDATA: 'YES' <, . . . >);

Use the MULTIDATA


argument in the DECLARE
statement to allow for
duplicate key component
values.

55
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

MULTIDATA: 'option' – Specifies whether multiple data items are allowed for each key.
The 'option' can be one of the following values:
• 'YES' | 'Y' – Multiple data items are allowed for each key.
• 'NO' | 'N' – Only one data item is allowed for each key (default).
The option value is not case sensitive.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.4 Writing a Hash Object to a Table 4-39

4.03 Activity

declare hash CapitalPopSort(ordered: 'descending');

56
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-40 Lesson 4 Defining and Processing Hash Objects

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
4. Creating a Sorted Table from a Hash Object
The solution to practice 1 involves looking up the values of ParkCode in a hash object to retrieve
values of ParkName and Type. The output table work.acreage contains national park
information in the default sorted order of ParkCode. Modify the starter program to create an
additional output table, work.acreage_sort, which contains the same data as work.acreage but
in sorted order by descending GrossAcres.
a. Open the p304p04.sas program in the practices folder. This program contains the solution
to practice 1. Review the DATA step syntax. Run the program and view the results.
b. Declare and define a hash object named Acreage. Specify an ORDERED argument
of DESCENDING and a MULTIDATA argument of YES. Define a key component of
GrossAcres and data components of ParkCode, ParkName, Type, State, and
GrossAcres.

DECLARE HASH object-name(ORDERED: 'value', MULTIDATA: 'value');

c. Change the subsetting IF statement for RC to be a conditional IF-THEN statement. If RC


is equal to 0, then add the data to the Acreage hash object.

object-name.ADD( )

d. Add an END= option to the SET statement to create a column named Last.
e. Add an IF-THEN statement before the DROP statement to output the hash object Acreage
to a table named work.acreage_sort if the last row has been read from the input table.

object-name.OUTPUT( DATASET: 'table_name')

f. Add a PROC PRINT step for the table work.acreage_sort to create the following results.
What is the smallest value of GrossAcres?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.4 Writing a Hash Object to a Table 4-41

Level 2
5. Eliminating a PROC SORT Step by Creating a Sorted Table from a Hash Object
The solution to practice 2 involves looking up the values of StartYear, Name, and Basin
to retrieve four wind measurement values. The final output table, work.storm_cat345_facts,
contains storm information in default sorted order of Season and Name. Modify the starter
program to create an additional output table, work.cat345_sort, that contains the data in sorted
order by descending MaxWindMPH, Season, and Name.
a. Open the p304p05.sas program in the practices folder. Run the DATA step and verify that
the output table is sorted by ascending Season and Name.
b. Run the PROC SORT and PROC PRINT steps. Verify that the results are sorted by
descending MaxWindMPH, Season, and Name.
c. In the DATA step, declare and define a hash object named StormSort. Specify an
ORDERED argument of DESCENDING and a MULTIDATA argument of YES.

DECLARE HASH object-name(ORDERED: 'value', MULTIDATA: 'value');

1) Define key components for the StormSort hash object based on the columns specified
in the BY statement of the PROC SORT step.
2) Define data components for the StormSort hash object based on the columns specified
in the KEEP= option in the PROC SORT statement.
d. Change the subsetting IF statement to be a conditional IF-THEN statement. If the FIND
method is equal to 0, then add the data to the StormSort hash object.

object-name.ADD( )

e. Add an END= option to the SET statement to create a column named Last.
f. Add an IF-THEN statement before the KEEP statement to output the hash object StormSort
to a table named work.cat345_sort if the last row has been read from the input table.

object-name.OUTPUT( DATASET: 'table_name')

g. Delete the PROC SORT step.


h. Run the program and verify the PROC PRINT results. How many storms have a maximum
wind speed of 173 mph?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-42 Lesson 4 Defining and Processing Hash Objects

Challenge
6. Ignoring Duplicates Keys when Loading a Hash Object
The pg3.storm_final table contains statistics such as MaxWindMPH for the storms from
seasons 1980 to 2017. Load the table into a hash object to find the unique combinations of
Season and MaxWindMPH. Use the data from the hash object in other hash objects to
determine the lowest and highest values of MaxWindMPH per season.
a. Open the p304p06.sas program in the practices folder. Review the DATA step syntax,
specifically the syntax for the SeasWind hash.
b. Run the DATA step and view the work.SeasonWind table. This table contains the unique
combinations of the key components (Season and MaxWindMPH). The default is to store
the first unique combination of keys and ignore all subsequent duplicate combinations.

c. Uncomment the syntax block that is creating the Seas1 hash object. Review the syntax.
d. Run the DATA step and view the work.SeasonWindLow table. This table contains the first
row of each unique Season value, which contains the lowest MaxWindMPH.

e. Uncomment the syntax block that is creating the Seas2 hash object. Review the syntax.
f. Add an argument to the DECLARE statement that will store the last duplicate key row instead
of the first duplicate key row.
Note: Use the SAS documentation (SAS® Component Objects: Reference) to determine
the needed DECLARE statement argument.
g. Run the DATA step and view the work.SeasonWindHigh table. This table contains the last
row of each unique Season value, which will contain the highest MaxWindMPH. What is the
highest MaxWindMPH for season 2017?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.5 Using Hash Iterator Objects 4-43

4.5 Using Hash Iterator Objects

Reading Data in Forward and Reverse Order


pg3.population_uscapitals (50 rows)

Create a table of capitals


that have the five highest
city population values.

Create a table of capitals


that have the five lowest
city population values.

60
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

What Is a Hash Iterator Object?

KEY DATA DATA


col_A col_C col_D
A hash iterator
object is associated FIRST
with a hash object NEXT
and allows for data NEXT
retrieval in either
forward or reverse PREV
key order. PREV
LAST

61
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-44 Lesson 4 Defining and Processing Hash Objects

Hash Object to Hash Iterator Object

Hash Object Hiter Object


Capitals table Capitals C

declare hash Capitals(dataset: 'pg3.population_uscapitals',


ordered: 'ascending', multidata: 'yes');

declare hiter C('Capitals'); A hash iterator object is always


associated with a hash object. When you
declare a hash iterator object, the hash object
name must be enclosed in quotation marks.

62
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Hash Object to Hash Iterator Object

ordered
Hash Object Hiter Object
Capitals table Capitals C

declare hash Capitals(dataset: 'pg3.population_uscapitals',


ordered: 'ascending', multidata: 'yes');

declare hiter C('Capitals'); The ORDERED argument on the hash


object specifies how the data is returned
in the key-value order when the OUTPUT
method or a hash iterator object is used.

63
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.5 Using Hash Iterator Objects 4-45

Declaring and Defining the Hash Object


Hash Object Capitals (6 columns and 50 rows)
KEY DATA DATA DATA DATA DATA
CityPop2017 PctPop CityName CityPop2017 StateName StatePop2017
350395 24.5 Honolulu 350395 Hawaii
Not ordered1427538
by
Capitals table ascending
1626078 23.2 Phoenix 1626078 Arizona 7016270
CityPop2017
... ... ... ... ... ...

declare hash Capitals(dataset: 'pg3.population_uscapitals',


ordered: 'ascending', multidata: 'yes’);
Capitals.definekey('CityPop2017');
Capitals.definedata('PctPop','CityName','CityPop2017',
'StateName','StatePop2017');
Capitals.definedone();
64
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Declaring the Hash Iterator Object


Hash Iterator C (6 columns and 50 rows)
KEY DATA DATA DATA DATA DATA
CityPop2017 PctPop CityName CityPop2017 StateName StatePop2017
Hash Object 7484 1.2 Montpelier 7484 Vermont 623657
Capitals Ordered by869666
14004 1.6 Pierre 14004 South Dakota
ascending
18594 1.4 Augusta 18594 Maine 1335907
CityPop2017
27621 0.6 Frankfort 27621 Kentucky 454189
... ... ... ... ... ...

declare hiter C('Capitals');

65
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-46 Lesson 4 Defining and Processing Hash Objects

Declaring the Hash Iterator Object


if _N_=1 then do;
if 0 then set pg3.population_uscapitals;
declare hash Capitals(dataset: 'pg3.population_uscapitals',
ordered: 'ascending',
multidata: 'yes');
Capitals.definekey('CityPop2017');
Capitals.definedata('PctPop','CityName','CityPop2017',
'StateName','StatePop2017');
Capitals.definedone();
declare hiter C('Capitals');
end;

66
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.5 Using Hash Iterator Objects 4-47

Hash Iterator Object Methods

object-name.first()
Hash iterator object
methods are object-name.next()
operations performed
on a hash iterator object-name.last()
object using dot
notation.
object-name.prev()

67
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

• The FIRST method returns the data component of the first item in the hash iterator object that is
based on a hash object.
• The NEXT method returns the data component of the next item in the hash iterator object that is
based on a hash object. If you call the NEXT method without calling the FIRST method, then the
NEXT method starts at the first item.
• The LAST method returns the data component of the last item in the hash iterator object that is
based on a hash object.
• The PREV method returns the data component of the previous item in the hash iterator object that
is based on a hash object.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-48 Lesson 4 Defining and Processing Hash Objects

Reading Data in Forward and Reverse Direction

Scenario
In a DATA step, use a hash iterator object to reference the capital data in ascending order by city
population. Use this hash iterator object to create a table containing the capitals with the five lowest
city population values and a table containing the capitals with the five highest city population values.

Files
• p304d05.sas
• population_uscapitals – a SAS table containing the 2017 city and state populations for the 50
state capitals of the United States sorted in descending order of the percentage of city population
within the state

Syntax

DECLARE HITER hash-iterator-object-name('hash-object');


object-name.FIRST()
object-name.NEXT()
object-name.LAST()
object-name.PREV()

Notes
• A hash iterator object is associated with a hash object and allows for data retrieval in either
forward or reverse key order.
• When declaring a hash iterator object, the hash object name must be enclosed in quotation marks.
• Using the ORDERED argument in the DECLARE statement specifies how the data is returned in
the key-value order when the OUTPUT method or a hash iterator object is used.
• Use the DECLARE statement to create a hash iterator object that references a hash object.
• Use the hash iterator object methods of FIRST and NEXT to read data in the forward direction.
• Use the hash iterator object methods of LAST and PREV to read data in the reverse direction.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.5 Using Hash Iterator Objects 4-49

Demo
1. Open the p304d05.sas program in the demos folder and find the Demo section. Notice the
syntax for declaring and defining the hash object Capitals.
data work.LowCapitalPop(drop=High)
work.HighCapitalPop(drop=Low);
if _N_=1 then do;
if 0 then set pg3.population_uscapitals;
declare hash Capitals(dataset: 'pg3.population_uscapitals',
ordered: 'ascending',
multidata: 'yes');
Capitals.definekey('CityPop2017');
Capitals.definedata('PctPop','CityName','CityPop2017',
'StateName','StatePop2017');
Capitals.definedone();

end;
2. Add a DECLARE statement after the Capitals hash object has been defined to create a hash
iterator object named C.
declare hiter C('Capitals');
3. Notice the remaining syntax of the DATA step. The first DO loop iterates five times. The first time
reads in the first item of the hash iterator object, and the remaining iterations read the next item
of the hash iterator object. The last DO loop also iterates five times. The first time reads in the
last item of the hash iterator object, and the remaining iterations read the previous item of the
hash iterator object.
do Low=1 to 5;
if Low=1 then C.first();
else C.next();
output work.LowCapitalPop;
end;
do High=1 to 5;
if High=1 then C.last();
else C.prev();
output work.HighCapitalPop;
end;
run;
4. Run the program. Verify that the LowCapitalPop table contains low city populations and the
HighCapitalPop contains high city populations.
work.LowCapitalPop

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-50 Lesson 4 Defining and Processing Hash Objects

work.HighCapital Pop

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.5 Using Hash Iterator Objects 4-51

Hash Object Advantages and Disadvantages

Advantages Disadvantages
❑ Simplifies programs by ❑ Different syntax and concept
eliminating the need for ❑ Memory requirement
multiple steps
❑ Fast processing time because
stored in memory
❑ Created at execution time so
dynamic sizing
❑ Character and numeric data
can be included

69
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Beyond SAS Programming 3


What if you want to ...

. . . learn more
. . . reference a two-
about how a hash . . . read a book about the
page summary of
object is stored or power of hash objects?
hash object syntax?
searched? • SAS® Hash Object Programming
Made Easy by Michele M. Burlew • Hash Object Tip Sheet
• A peek under the hood in
DATA Step Hash Objects
• Data Management Solutions
as Programming Tools by
Using SAS® Hash Table
Paul M. Dorfman and
Operations: A Business
Koen Vyverman
Intelligence Case Study by Paul
Dorfman and Don Henderson

70
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

You can find direct links to these resources in the Course Links section on the Extended Learning
page.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-52 Lesson 4 Defining and Processing Hash Objects

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
7. Reading Data in Forward and Reverse Direction
The pg3.np_acres table contains acreage amounts for national parks. Use a hash iterator to
create the following two tables:
• work.LowAcres, which contains the 10 parks with the lowest number of acres.
• work.HighAcres, which contains the 10 parks with the highest number of acres.
a. Open the p304p07.sas program in the practices folder. Review the DATA step syntax that is
creating the Acres hash object.
b. Add a DECLARE statement to create the hash iterator named A, which is associated with
the Acres hash object.
c. In the first DO loop, read the first 10 rows of the hash iterator, which will be the national parks
with the lowest number of acres. Output each row to work.LowAcres.
d. In the last DO loop, read the last 10 rows of the hash iterator, which will be the national parks
with the highest number of acres. Output each row to work.HighAcres.
e. Run the program and verify the results. What is the GrossAcres value for the national park
with the 10th highest acreage?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.5 Using Hash Iterator Objects 4-53

Level 2
8. Reading Data in Forward and Reverse Directions
The pg3.storm_final table contains storm statistics such as MaxWindMPH for seasons 1980
through 2017. Use a hash iterator to create the following two tables:
• work.LowWind, which contains the five storms with the lowest maximum wind speeds.
• work.HighWind, which contains the five storms with the highest maximum wind speeds.
a. Open the p304p08.sas program in the practices folder. Review the DATA step syntax that is
defining the Storm hash object.
b. Add a DECLARE statement to declare the hash object named Storm. Load the hash object
with the pg3.storm_final table, excluding the MaxWindMPH values that are missing.
Specify the ORDERED argument with a value of ASCENDING and the MULTIDATA
argument with a value of YES.
c. Add a DECLARE statement to create the hash iterator named Stm, which is associated with
the Storm hash object.
d. In the first DO loop, read the first five rows of the hash iterator, which will be the storms with
the lowest maximum winds. Output each row to work.LowWind.
e. In the last DO loop, read the last five rows of the hash iterator, which will be the storms with
the highest maximum winds. Output each row to work.HighWind.
f. Run the program and verify the results. Based on the SAS log, how many observations were
read from the data set and loaded into the hash object?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-54 Lesson 4 Defining and Processing Hash Objects

Challenge
9. Reading Data in a Forward Direction Based on a Starting Key
The pg3.storm_final table contains storm statistics such as MaxWindMPH for seasons 1980
through 2017. Use a hash iterator to find the category 5 storm with the lowest maximum wind
speed. Then output the 20 category 5 storms with the lowest maximum wind speeds. Category 5
storms have winds greater than or equal to 157 mph.
a. Open the p304p09.sas program in the practices folder. Review the DATA step syntax that is
defining the Storm hash object and the Stm hash iterator object.
b. Within the DO UNTIL loop, add the following two statements:
1) Add an assignment statement that creates the column RC based on whether the
Cat5Speed value is found in the hash iterator. If the value is found, a zero value is
returned.
Note: Use the SAS documentation (SAS® Component Objects: Reference) to learn the
syntax that is needed for the SETCUR method, which specifies a starting key
value for a hash iterator.
2) Add a sum statement that increments the Cat5Speed column by a value of 1.
c. Within the last DO loop, add the following two statements:
1) Add a statement to output the existing category 5 storm.
2) Add a statement with the appropriate hash iterator method that goes to the next row in
the hash iterator.
d. Run the program and verify the results. What is the range of the MaxWindMPH values for
the 20 weakest category 5 storms?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.6 Solutions 4-55

4.6 Solutions
Solutions to Practices
1. Performing a Table Lookup Using One Key
data work.acreage;
length ParkCode $ 4 ParkName $ 115 Type $ 28;
if _N_=1 then do;
declare hash ParkDesc(dataset:'pg3.np_codelookup');
ParkDesc.definekey('ParkCode');
ParkDesc.definedata('ParkName','Type');
ParkDesc.definedone();
call missing(ParkCode,ParkName,Type);
end;
set pg3.np_acres2;
ParkCode=upcase(ParkCode);
run;
1) What is the name of the hash object being created? ParkDesc
2) What table is loading the hash object? pg3.np_codelookup
3) What is the name of the key component? ParkCode
4) What are the names of the data components? ParkName and Type
data work.acreage;
length ParkCode $ 4 ParkName $ 115 Type $ 28;
if _N_=1 then do;
declare hash ParkDesc(dataset:'pg3.np_codelookup');
ParkDesc.definekey('ParkCode');
ParkDesc.definedata('ParkName','Type');
ParkDesc.definedone();
call missing(ParkCode,ParkName,Type);
end;
set pg3.np_acres2;
ParkCode=upcase(ParkCode);
RC=ParkDesc.find(key:ParkCode); *RC=ParkDesc.find();
run;
1) Based on the log, how many rows from the pg3.np_codelookup table were read into
the hash object? 713
2) Based on the log, how many rows were read from the pg3.np_acres2 table? 368
3) Based on the log, how many rows are in the output table? 368
4) Based on the output table, how many ParkCode values are not found in the hash object
(RC not equal to 0)? 2 (rows 10 and 30)

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-56 Lesson 4 Defining and Processing Hash Objects

data work.acreage;
length ParkCode $ 4 ParkName $ 115 Type $ 28;
if _N_=1 then do;
declare hash ParkDesc(dataset:'pg3.np_codelookup');
ParkDesc.definekey('ParkCode');
ParkDesc.definedata('ParkName','Type');
ParkDesc.definedone();
call missing(ParkCode,ParkName,Type);
end;
set pg3.np_acres2;
ParkCode=upcase(ParkCode);
RC=ParkDesc.find(key:ParkCode); *RC=ParkDesc.find();
if RC=0;
drop RC;
run;

title 'Gross Acres for National Parks';


proc print data=work.acreage;
run;
title;
How many data rows are in the results? 366
2. Performing a Table Lookup Using Three Keys
data work.storm_cat345_facts;
if _N_=1 then do;
if 0 then set pg3.storm_range;
declare hash Storm(dataset:'pg3.storm_range');
Storm.definekey('StartYear','Name','Basin');
Storm.definedata('Wind1','Wind2','Wind3','Wind4');
Storm.definedone();
end;
set pg3.storm_summary_cat345;
ReturnCode=Storm.find(key:year(StartDate),key:Name,key:Basin);
run;
1) How many rows from the pg3. storm_range table were read into the hash object? 2959
2) How many rows were read from the pg3. storm_summary_cat345 table? 570
3) How many rows are in the output table? 570
4) How many key values are not found in the hash object (ReturnCode not equal to 0)?
Two (rows 505 and 539)
5) Why is the StartYear column set to missing values?
StartYear is a key component. Key components are not copied to the PDV. Only
data components are copied to the PDV. Basin and Name are also key components,
but those values are not missing because the values are read from
pg3.storm_summary_cat345.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.6 Solutions 4-57

data work.storm_cat345_facts;
if _N_=1 then do;
if 0 then set pg3.storm_range;
declare hash Storm(dataset:'pg3.storm_range');
Storm.definekey('StartYear','Name','Basin');
Storm.definedata('Wind1','Wind2','Wind3','Wind4');
Storm.definedone();
end;
set pg3.storm_summary_cat345;
if Storm.find(key:year(StartDate),key:Name,key:Basin)=0;
drop StartYear;
run;

title 'Storm Statistics for Category 3, 4, and 5';


proc print data=work.storm_cat345_facts;
run;
title;
How many data rows are in the results? 568
3. Performing a Table Lookup with Two Hash Objects and a Table
data work.storm_cat345_facts work.nonmatches;
if _N_=1 then do;
if 0 then do;
set pg3.storm_range;
set pg3.storm_basincodes;
end;
declare hash Storm(dataset:'pg3.storm_range');
Storm.definekey('StartYear','Name','Basin');
Storm.definedata('Wind1','Wind2','Wind3','Wind4');
Storm.definedone();
declare hash BasinDesc(dataset:'pg3.storm_basincodes');
BasinDesc.definekey('Basin');
BasinDesc.definedata('BasinName');
BasinDesc.definedone();
end;
set pg3.storm_summary_cat345;
ReturnCode1=Storm.find(key:year(StartDate),key:Name,key:Basin);
ReturnCode2=BasinDesc.find(key:Basin);
if ReturnCode1=0 and ReturnCode2=0 then
output work.storm_cat345_facts;
else output work.nonmatches;
drop StartYear;
run;

title 'Storm Statistics with Basin Names for Category 3, 4, and 5';
proc print data=work.storm_cat345_facts;
run;
title;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-58 Lesson 4 Defining and Processing Hash Objects

title 'Non-Matches';
proc print data=work.nonmatches;
run;
title;
How many rows are in the work.nonmatches table? 11 (9 non-matches for Basin and 2 non-
matches for StartYear, Name, and Basin)
4. Creating a Sorted Table from a Hash Object
data work.acreage;
length ParkCode $ 4 ParkName $ 115 Type $ 28;
if _N_=1 then do;
declare hash ParkDesc(dataset:'pg3.np_codelookup');
ParkDesc.definekey('ParkCode');
ParkDesc.definedata('ParkName','Type');
ParkDesc.definedone();
call missing(ParkCode,ParkName,Type);
declare hash Acreage(ordered:'descending', multidata:'yes');
Acreage.definekey('GrossAcres');
Acreage.definedata('ParkCode','ParkName','Type',
'State','GrossAcres');
Acreage.definedone();
end;
set pg3.np_acres2 end=Last;
ParkCode=upcase(ParkCode);
RC=ParkDesc.find(key:ParkCode);
if RC=0 then Acreage.add();
if Last=1 then Acreage.output(dataset:'work.acreage_sort');
drop RC;
run;

title 'Gross Acres for National Parks Sorted by ParkCode';


proc print data=work.acreage;
run;
title;

title 'Gross Acres for National Parks Sorted by Gross Acres';


proc print data=work.acreage_sort;
run;
title;
What is the smallest value of GrossAcres? 0.02 acres

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.6 Solutions 4-59

5. Eliminating a PROC SORT Step by Creating a Sorted Table from a Hash Object
data work.storm_cat345_facts;
if _N_=1 then do;
if 0 then set pg3.storm_range;
declare hash Storm(dataset:'pg3.storm_range');
Storm.definekey('StartYear','Name','Basin');
Storm.definedata('Wind1','Wind2','Wind3','Wind4');
Storm.definedone();
declare hash StormSort(ordered:'descending',
multidata:'yes');
StormSort.definekey('MaxWindMPH','Season','Name');
StormSort.definedata('Season','Name','Wind1','Wind2',
'Wind3','Wind4','MaxWindMPH');
StormSort.definedone();
end;
set pg3.storm_summary_cat345 end=Last;
if Storm.find(key:year(StartDate),key:Name,key:Basin)=0
then StormSort.add();
if Last=1 then StormSort.output(dataset:'work.cat345_sort');
keep Name Basin Wind1-Wind4 Season MaxWindMPH StartDate;
run;

title1 'Storm Statistics for Category 3, 4, and 5';


title2 'sorted by descending (MaxWindMPH, Season, and Name)';
proc print data=work.cat345_sort;
run;
title;
How many storms have a maximum wind speed of 173 mph? six (rows 9 -14)
6. Ignoring Duplicate Keys when Loading a Hash Object
data _null_;
if 0 then set pg3.storm_final(keep=Season MaxWindMPH);
declare hash SeasWind(dataset:'pg3.storm_final
(where=(MaxWindMPH ne .))',
ordered:'ascending');
SeasWind.definekey('Season','MaxWindMPH');
SeasWind.definedone();
*output each unique combination of Season and MaxWindMPH;
SeasWind.output(dataset:'work.SeasonWind');

declare hash Seas1(dataset:'work.SeasonWind',


ordered:'ascending');
Seas1.definekey('Season');
Seas1.definedata('Season','MaxWindMPH');
Seas1.definedone();
*output the lowest MaxWindMPH per Season;
Seas1.output(dataset:'work.SeasonWindLow');

declare hash Seas2(dataset:'work.SeasonWind',

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-60 Lesson 4 Defining and Processing Hash Objects

ordered:'ascending',
duplicate:'r');
Seas2.definekey('Season');
Seas2.definedata('Season','MaxWindMPH');
Seas2.definedone();
*output the highest MaxWindMPH per Season;
Seas2.output(dataset:'work.SeasonWindHigh');

stop;
run;
What is the highest MaxWindMPH for season 2017? 185
7. Reading Data in Forward and Reverse Direction
data work.LowAcres work.HighAcres;
if _N_=1 then do;
if 0 then set pg3.np_acres(keep=ParkName GrossAcres);
declare hash Acres(dataset:'pg3.np_acres',
ordered:'ascending', multidata:'yes');
Acres.definekey('GrossAcres');
Acres.definedata('ParkName','GrossAcres');
Acres.definedone();
/* declare a hash iterator */
declare hiter A('Acres');
end;
do i=1 to 10;
/* retrieve parks with the lowest number of acres */
if i=1 then A.first();
else A.next();
output work.LowAcres;
end;
do i=1 to 10;
/* retrieve parks with the highest number of acres */
if i=1 then A.last();
else A.prev();
output work.HighAcres;
end;
drop i;
run;

title 'National Parks with Lowest Acreage';


proc print data=work.LowAcres;
run;
title;

title 'National Parks with Highest Acreage';


proc print data=work.HighAcres;
run;
title;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.6 Solutions 4-61

What is the GrossAcres value for the national park with the 10th highest acreage? 3,281,789.43
8. Reading Data in Forward and Reverse Direction
data work.LowWind work.HighWind;
if _N_=1 then do;
if 0 then set pg3.storm_final
(keep=Season Name BasinName MaxWindMPH);
/* declare a hash object */
declare hash Storm
(dataset:'pg3.storm_final(where=(MaxWindMPH ne .))',
ordered:'ascending', multidata:'yes');
Storm.definekey('MaxWindMPH');
Storm.definedata('Season','Name','BasinName','MaxWindMPH');
Storm.definedone();
/* declare a hash iterator */
declare hiter Stm('Storm');
end;
do i=1 to 5;
/* retrieve storms with the lowest maximum winds */
if i=1 then Stm.first();
else Stm.next();
output work.LowWind;
end;
do i=1 to 5;
/* retrieve storms with the highest maximum winds */
if i=1 then Stm.last();
else Stm.prev();
output work.HighWind;
end;
drop i;
run;

title 'Storms with Lowest Maximum Winds';


proc print data=work.LowWind;
run;
title;

title 'Storms with Highest Maximum Winds';


proc print data=work.HighWind;
run;
title;
Based on the SAS log, how many observations were read from the data set and loaded into the
hash object? 3071

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-62 Lesson 4 Defining and Processing Hash Objects

9. Reading Data in a Forward Direction Based on a Starting Key


data work.category5;
if _N_=1 then do;
if 0 then set pg3.storm_final
(keep=Season Name BasinName MaxWindMPH);
declare hash Storm(dataset:'pg3.storm_final',
ordered:'ascending', multidata:'yes');
Storm.definekey('MaxWindMPH');
Storm.definedata('Season','Name','BasinName','MaxWindMPH');
Storm.definedone();
declare hiter Stm('Storm');
end;
Cat5Speed=157;
do until(rc=0);
/* add an assignment statement and a sum statement */
rc=Stm.setcur(key:Cat5Speed);
Cat5Speed+1;
end;
do i=1 to 20;
/* output and retrieve the weakest category 5 storms */
output;
Stm.next();
end;
run;

title 'Twenty Weakest Category 5 Storms';


proc print data=work.category5;
run;
title;
What is the range of the MaxWindMPH values for the 20 weakest category 5 storms?
161 to 173 mph

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4.6 Solutions 4-63

Solutions to Activities and Questions

4.01 Activity – Correct Answer


In SAS 9.4M6, there are seven valid argument
and value tags valid for the DECLARE statement:
• dataset: 'dataset_name <(datasetoption)>'
• duplicate: 'option'
• hashexp: n
• keysum:'variable-name'
• multidata: 'option'
• ordered: 'option'
• suminc: 'variable-name'

22
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

4.02 Activity – Correct Answer


length City $ 11 Month TempMonAvg PrecipMonSum 8;
if _N_=1 then do;
declare hash Monthly(dataset:
'pg3.weather_monthly2017_ustop5');
Monthly.definekey('City','Month');
Monthly.definedata('TempMonAvg','PrecipMonSum');
Monthly.definedone();
call missing (City, Month, TempMonAvg, PrecipMonSum);
end;

NOTE: There were 60 observations read from the data set


PG3.WEATHER_MONTHLY2017_USTOP5.
NOTE: The data set WORK.TOP5TEMPPRECIP has 1 observations
and 4 variables.

35
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
4-64 Lesson 4 Defining and Processing Hash Objects

4.03 Activity – Correct Answer


declare hash CapitalPopSort(ordered: 'descending',
multidata: 'YES');

Do you see duplicate PctPop values? Yes PctPop Rows


4.3 23 and 24
4.1 25 and 26
3.4 29 and 30
2.6 34 and 35
1.2 40 and 41
0.9 42, 43, and 44
0.7 45 and 46
0.6 47 and 48
57
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
Lesson 5 Using Utility Procedures
5.1 Creating Picture Formats with the FORMAT Procedure ..................................................... 5-3
Demonstration: Specifying a Template for Datetime Values .................................................. 5-8
Demonstration: Specifying a Template for Large Numbers ................................................. 5-17
Practice ................................................................................................................................. 5-20

5.2 Creating Functions with the FCMP Procedure .................................................................. 5-23


Demonstration: Creating Functions Containing One Argument ........................................... 5-28
Demonstration: Creating Functions Containing Multiple Arguments ................................... 5-33
Practice ................................................................................................................................. 5-38

5.3 Solutions ................................................................................................................................ 5-41


Solutions to Practices ........................................................................................................... 5-41
Solutions to Activities and Questions ................................................................................... 5-44
5-2 Lesson 5 Using Utility Procedures

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-3

5.1 Creating Picture Formats with the


FORMAT Procedure

Creating and Using Custom Formats


proc format;
value $genfmt 'F' = 'Female'
'M' = 'Male'
create other = 'Miscoded';
format value hrange low-<58 = 'Below Average'
58-60 = 'Average'
60<-high = 'Above Average';
run;

proc print data=pg3.class_birthdate noobs;


apply var Name Gender Height;
format format Gender $genfmt. Height hrange.;
run;

p305d01
3
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

FORMAT Procedure with the VALUE Statement

PROC FORMAT;
VALUE format-name value-or-range-1 = 'formatted-value'
value-or-range-2 = 'formatted-value'
...;
RUN;
You use the VALUE statement
to create a format that
specifies character strings to
display for values.

4
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-4 Lesson 5 Using Utility Procedures

Scenarios

My date is formatted The percent format


as 02JAN2019 but I multiplies my number
want Wed-2-Jan-2019. by 100, but I do not
want that.
I need my currency
value to look like
$1.5M instead of
$1,500,000.

5
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

FORMAT Procedure with the PICTURE Statement

PROC FORMAT;
PICTURE format-name (format-options)
value-or-range-1 = 'template-value' (template-options)
value-or-range-2 = 'template-value' (template-options)
...;
RUN; Use the PICTURE statement
to create a format that
specifies a template for
displaying numeric values.

6
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-5

Custom Date, Time, and Datetime Formats

PM 9:13 :05

Wednesday
02JAN2019
StartDate StartTime
Wed-2-Jan-2019 H:21 M:13 S:05

StartDateTime
2019.01.02 @ 9:13:05 PM

7
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Custom Date, Time, and Datetime Formats


specifies the default length
of the formatted value

PROC FORMAT; specifies special


PICTURE format-name (DEFAULT=length) characters that
define a template
value-or-range-1 = 'directives'
for the display of
(DATATYPE=DATE | TIME | DATETIME) ; date, time, or
RUN; datetime values

enables the use of directives in the template


to format date, time, or datetime values
8
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

If the DEFAULT= option is not used, the length of the formatted value is based on the length of the
longest template.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-6 Lesson 5 Using Utility Procedures

Date Directives
%A Weekday name (full) Wednesday
%a Weekday name (first three letters) Wed
%d Day of month (one or two digits) 2
%0d Day of month (two digits) 02
%B Month name (full) January
%3B Month name (first three letters) Jan
%0m Month number (two digits) 01
%Y Year (four digits) 2019
%0y Year (two digits) 19

21551 %a-%d-%3B-%Y Wed-2-Jan-2019


9
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Time Directives
%H or %0H Hour, 24-hour clock (one or two digits) 21
%I or %0I Hour, 12-hour clock (one or two digits) 9 or 09
%M or %0M Minute (one or two digits) 13
%S or %0S Second (one or two digits) 5 or 05
%p AM or PM PM

Note: To add a leading zero before a single digit, insert a 0 before the directive.

76385 H:%H M:%M S:%S H:21 M:13 S:5

76385 H:%0H M:%0M S:%0S H:21 M:13 S:05

10
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-7

5.01 Multiple Choice Question


Which directives create the following value? 2019.01.02 @ 9:13:05 PM

a. %Y.%m.%d @ %H:%M:%S %p
b. %y.%m.%d @ %I:%0M:%0S %p
c. %Y.%0m.%0d @ %I:%0M:%0S %p
d. %0y.%0m.%0d @ %H:%0M:%0S %p

11
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Custom Date, Time, and Datetime Formats


proc format;

picture MyDate (default=15) StartDate


low-high = '%a-%d-%3B-%Y'
(datatype=date); Wed-2-Jan-2019

picture MyTime (default=14) StartTime


low-high = 'H:%0H M:%0M S:%0S'
(datatype=time); H:21 M:13 S:05

picture MyDateTime (default=24)


low-high = '%Y.%0m.%0d @ %I:%0M:%0S %p'
(datatype=datetime);
StartDateTime
run;
2019.01.02 @ 9:13:05 PM
13
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

The DATATYPE= option must be specified for each value-range set.


proc format;
picture MyDate (default=15)
low-'31DEC1999'd = '%3B-%Y' (datatype=date)
'01JAN2000'd-high = '%a-%d-%3B-%Y' (datatype=date);
run;

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-8 Lesson 5 Using Utility Procedures

Specifying a Template for Datetime Values

Scenario
Specify a template for SAS datetime values by using the PICTURE statement within the FORMAT
procedure.

Files
• p305d01.sas
• storm_detail – a SAS table that contains multiple rows per storm for the 1980 through 2016 storm
seasons. Each row represents one measurement for each six hours of a storm.

Syntax

PROC FORMAT;
PICTURE format-name (DEFAULT=length)
value-or-range-1 = 'directives'
(DATATYPE=DATE | TIME | DATETIME);
RUN;

Notes
• The PICTURE statement of the FORMAT procedure is used to create a format that specifies a
template for displaying numeric values.
• The DEFAULT= option specifies the default length of the formatted value.
• Directives are special characters that are used in the template to format date, time, or datetime
values.
• The DATATYPE= option enables the use of directives to format date, time, or datetime values.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-9

Demo
1. Open the p305d01.sas program in the demos folder and find the Demo section. Highlight and
run the TITLE statements and the PROC PRINT step. Notice the formatted value of ISO_time.
title 'Storms with Category 5 Winds';
proc print data=pg3.storm_detail noobs;
var Name ISO_time Wind Pressure Region;
where Wind>155;
format ISO_time datetime19.;
run;
title;

2. Notice the syntax of the PROC FORMAT step. The date and time directives are specifying a one-
or two-digit day of month, a three-letter proper case month name, a four-digit year, a colon, a
one- or two-digit 24-clock hour, and the letter H (for example, 7Aug1980:18H).
proc format;
picture dateh
low-high='%d%3B%Y:%HH';
run;
3. Modify the FORMAT statement in the PROC PRINT step to use the custom format created in the
PROC FORMAT step.
format ISO_time dateh.;
4. Highlight and run the demo program. Notice that the ISO_time values are not formatted properly
because SAS did not recognize the date and time directives.
5. Add the DATATYPE= option to the PICTURE statement after the directives.
picture dateh
low-high='%d%3B%Y:%HH'
(datatype=datetime);
6. Highlight and run the program. Notice that the ISO_time values are formatted but are being
truncated. A length of 11 is being used by default because that is the length of the template.
7. Add the DEFAULT= option to the PICTURE statement after the name of the format.
picture dateh (default=13)
low-high='%d%3B%Y:%HH'
(datatype=datetime);

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-10 Lesson 5 Using Utility Procedures

8. Highlight and run the demo program. Verify the formatted values of ISO_time.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-11

5.02 Activity
Open p305a02.sas from the activities folder and perform the following
tasks:
1. Run the program and notice the formatting of BeginDate.
2. Add a PICTURE statement to create another date format.
• Name the format MyQtr.
• Be sure to specify a default length.
• All date values should be formatted to look like this:
Hint: Use the %q and %Y directives.
• Be sure to declare the data type.
3. Modify the FORMAT statement in the PROC FREQ step to use the new
format. Run the program. Which quarter has the most tornadoes?
15
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Custom Numeric Formats

17
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-12 Lesson 5 Using Utility Procedures

Custom Numeric Formats


rounds the value to
the nearest integer

PROC FORMAT;
PICTURE format-name (ROUND DEFAULT=length)
value-or-range-1 = 'digit selectors with non-numeric characters'
(MULT|MULTIPLIER=n PREFIX='prefix') ;
RUN;

specifies a number to specifies a character string to place


multiply the value by in front of the formatted value
18
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Custom Percent Format

picture mypct (round) low-high='009.9%' (multiplier=10);

19
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-13

Digit Selectors with Non-numeric Characters


Digit selectors of 0 do
not print leading zeros. Non-numeric characters
are printed as specified.

Digit selectors '009.9%'


(0 through 9)
define positions
for numeric
values. Nonzero digit selectors
print leading zeros.
The digit selector of 9 is
commonly used as the
nonzero digit selector.
20
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

If the template contains digit selectors, then a digit selector must be the first character in the
template. Use the PREFIX= option to specify a character string to place in front of the formatted
value.

MULTIPLIER= Option

picture mypct (round) low-high='009.9%' (multiplier=10);

The default multiplier is 10n, where


n is the number of digits after the
first decimal point in the template.
The MULTIPLIER= option
specifies a number to
multiply the value by
before rounding and
formatting.

21
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-14 Lesson 5 Using Utility Procedures

ROUND Option

picture mypct (round) low-high='009.9%' (multiplier=10);

The decimal portion is truncated


without the ROUND option.
The ROUND option rounds
the value to the nearest
integer after multiplying
and before formatting.

22
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Custom Percent Format

picture mypct (round) low-high='009.9%' (multiplier=10);

Initial multiplied
Value by 10 Step 1: Apply the multiplier
0.3165 3.165 to the numeric value.
4.1139 41.139
6.656 66.56
11.7089 117.089
18.9873 189.873
100 1000
23
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p305d02

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-15

Custom Percent Format


Step 2: If specified, round the numeric value to the nearest integer.

picture mypct (round) low-high='009.9%' (multiplier=10);

24
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p305d02

Custom Percent Format


Step 3: Apply the numeric value into the template beginning on the right.

picture mypct (round) low-high='009.9%' (multiplier=10);

Initial multiplied without with Final Value


Value by 10 ROUND ROUND with ROUND
0.3165 3.165 3 3 0.3%
4.1139 41.139 41 41 4.1%
6.656 66.56 66 67 6.7%
11.7089 117.089 117 117 11.7%
18.9873 189.873 189 190 19.0%
100 1000 1000 1000 100.0%
25
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p305d02

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-16 Lesson 5 Using Utility Procedures

5.03 Question
picture mypct low-high='009.99%' (multiplier=100);

What are the formatted values of PERCENT based on the PICTURE statement?

26
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p305a03

Custom Large Number Format


picture dollar_KM(round default=7)
low-<1000='009' (prefix='$' mult=1)
1000-<1000000='009.9K' (prefix='$' mult=.01)
1000000-high='009.9M' (prefix='$' mult=.00001);

28
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p305d02

MULT= is an alias for MULTIPLIER=.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-17

Specifying a Template for Large Numbers

Scenario
Specify a template that displays large numbers using the abbreviation K for thousands and M for
millions by using the PICTURE statement within the FORMAT procedure.

Files
• p305d02.sas
• tornado_2017 – a SAS table that contains information about tornadoes in the United States
during the year 2017

Syntax

PROC FORMAT;
PICTURE format-name (ROUND DEFAULT=length)
value-or-range-1 = 'digit selectors with non-numeric characters'
(MULT|MULTIPLIER=n PREFIX= 'prefix');
RUN;

Notes
• The MULTIPLIER= option specifies a number to multiply the value by before rounding and
formatting. The default multiplier is 10n, where n is the number of digits after the first decimal point
in the template.
• The ROUND option rounds the value to the nearest integer after multiplying and before formatting.
The decimal portion is truncated without the ROUND option.
• Digit selectors (0 through 9) define positions for numeric values. Digit selectors of 0 do not print
leading zeros. Nonzero digit selectors print leading zeros. The digit selector 9 is commonly used
as the nonzero digit selector. If a picture format contains digit selectors, then a digit selector must
be the first character in the template.
• The PREFIX= option specifies a character string to place in front of the formatted value.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-18 Lesson 5 Using Utility Procedures

Demo
1. Open the p305d02.sas program in the demos folder and find the Demo section. Highlight and
run the PROC SORT and PROC PRINT steps. Notice the formatted value of PropertyDamage.
proc sort data=pg3.tornado_2017 out=work.tornado_2017;
by descending PropertyDamage;
run;

title1 '2017 US Tornadoes';


title2 'by Descending Property Damage';
proc print data=work.tornado_2017;
var State BeginDate Scale Deaths Injuries PropertyDamage;
format PropertyDamage dollar20.;
run;
title;

2. Notice the syntax of the PROC FORMAT step. The number of digits for large numbers will be
reduced.
proc format;
picture dollar_km (round)
low-<1000='009' (prefix='$' mult=1)
1000-<1000000='009.9K' (prefix='$' mult=.01)
1000000-high='009.9M' (prefix='$' mult=.00001);
run;
3. Modify the FORMAT statement in the PROC PRINT step to use the custom format created in the
PROC FORMAT step.
format PropertyDamage dollar_km.;
4. Highlight and run the demo program. Notice that the PropertyDamage value for the first row is
being displayed without the dollar sign. A length of 6 is being used by default because that is the
length of the longest template.
5. Add the DEFAULT= option to the PICTURE statement after the name of the format.
picture dollar_km (round default=7)
low-<1000='009' (prefix='$' mult=1)
1000-<1000000='009.9K' (prefix='$' mult=.01)
1000000-high='009.9M' (prefix='$' mult=.00001);

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-19

6. Highlight and run the demo program. Verify the formatted values of PropertyDamage.

7. Modify the PICTURE statement to eliminate digits after the decimal point.
picture dollar_km (round default=7)
low-<1000='009' (prefix='$' mult=1)
1000-<1000000='009K' (prefix='$' mult=.001)
1000000-high='009M' (prefix='$' mult=.000001);
8. Highlight and run the demo program. Verify the formatted values of PropertyDamage.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-20 Lesson 5 Using Utility Procedures

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
1. Specifying a Template Based on Date Directives
The pg3.storm_final table contains storm information such as StartDate and EndDate for
storms from the 1980 through 2017 storm seasons. Create a custom date format with the
following layout: three-letter-weekday.full-month-name.two-digit-day-of-month.two-digit-year
(for example, Sat.September.09.17).
a. Open the p305p01.sas program in the practices folder. Run the program. Notice the
formatted values of StartDate and EndDate based on the WORDDATE format.
b. In the PROC FORMAT step, add a PICTURE statement to create a custom date format.
1) Name the format NewDate.
2) Set an appropriate default length that accommodates the formatted date values, such as
Sat.September.09.17.
3) Specify a date range from LOW to HIGH.
4) Use the appropriate date directives to produce a template with the following layout:
three-letter-weekday.full-month-name.two-digit-day-of-month.two-digit-year.
5) Specify the appropriate DATATYPE= option.
c. Modify the FORMAT statement in the PROC PRINT step to use the custom date format for
the StartDate and EndDate columns.
d. Run the program and verify the results. What is the formatted StartDate value for the last
storm that occurred in the year 2016?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.1 Creating Picture Formats with the FORMAT Procedure 5-21

Level 2
2. Specifying a Template Based on Digit Selectors
The pg3.stocks table contains stock market data for the first weekday of the month for the years
2010 through 2017. Create custom formats to display the DailyChange value with a suffix of
USD and the Volume value with a suffix of shares.
a. Open the p305p02.sas program in the practices folder. Run the DATA step and PROC
PRINT step. View the results. Notice how the values of VolumeChar and DailyChangeChar
are left-justified because they are character columns based on the CATX function. Because
of this left justification, the commas, decimal points, and words are not aligned.

b. In the PROC FORMAT step, add a PICTURE statement to create a custom format in addition
to the current PICTURE statement that creates a format named shares.
1) Name the format usd.
2) Set the default length to 11.
3) Specify a numeric range of LOW through less than 0 to account for negative values. Use
a template of '009.99 USD' with a prefix of a negative sign.
4) Specify a numeric range of 0 to HIGH to account for positive values. Use a template of
'009.99 USD'.
c. Add a FORMAT statement to the DATA step to format Volume with the shares format and
DailyChange with the usd format.
d. Run the program and view the results. Notice how the values of Volume and DailyChange
are right-justified because they are numeric columns. Because of the right justification, the
commas, decimal points, and words line up. Do the DailyChange values match the
DailyChangeChar values in the results, other than in regard to justification?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-22 Lesson 5 Using Utility Procedures

Challenge
3. Specifying a Template Based on Digit Selectors and Characters
The work.savings table contains the deposits made to a savings account during the first half of
2019. Create a custom format to display the deposit amounts less than 1000 as the actual digits
with a dollar sign but the amounts of 1000 or greater as a character string of $1,000+.
a. Open the p305p03.sas program in the practices folder. Run the program and view the
formatted values for the Deposits column.
b. In the PROC FORMAT step, add a PICTURE statement to create a custom format.
1) Name the format MyDep.
2) Display values from 0 to less than 1000 as the actual digits with a prefix of a dollar sign.
3) Display values of 1000 or greater as the character string of $1,000+.
Note: Use the SAS documentation (Base SAS® Procedures Guide) to determine the
PICTURE statement option that treats numbers as message characters rather
than digit selectors.
c. Modify the FORMAT statement in the PROC PRINT step to use the custom format MyDep
for the Deposits column.
d. Run the program and verify the results. What PICTURE statement option is treating the
numbers as message characters instead of digit selectors for rows 4 and 6?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-23

5.2 Creating Functions with the FCMP


Procedure

Scenario

FtoC function

Create a custom function


to convert Fahrenheit
temperatures to Celsius
temperatures.

TavgC=FtoC(Tavg);

32
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Custom Functions

TavgC=round((Tavg-32)*5/9,.01);

An advantage of creating
TavgC=FtoC(Tavg);
custom functions is to
simplify programs that
use redundant syntax.

33
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-24 Lesson 5 Using Utility Procedures

FCMP Procedure

PROC FCMP OUTLIB=libref.table.package;


FUNCTION function-name(arguments) <$> <length>;
. . . programming statements . . .
RETURN(expression);
The Function Compiler
ENDSUB;
(FCMP) procedure
RUN;
enables you to create
custom functions using
DATA step syntax.

34
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Creating a Custom Function


The OUTLIB= option specifies the table and
package that store the compiled function.

proc fcmp outlib=pg3.funcs.weather;

The weather package is a collection of


routines that have unique names and
are stored in the pg3.funcs table.

35
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-25

Creating a Custom Function

proc fcmp outlib=pg3.funcs.weather;

function FtoC(TempF); The FUNCTION statement


specifies the function name and
The FtoC custom the function arguments, as well
function has one as whether the function returns
numeric argument a character or numeric value.
named TempF and
returns a numeric value.

36
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Creating a Custom Function

proc fcmp outlib=pg3.funcs.weather;

function FtoC(TempF);

TempC=round((TempF-32)*5/9,.01);

The body of the function consists of DATA step syntax that


uses the values of the function arguments to create a
value that can be returned by the function.

37
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-26 Lesson 5 Using Utility Procedures

Creating a Custom Function

proc fcmp outlib=pg3.funcs.weather;

function FtoC(TempF);

TempC=round((TempF-32)*5/9,.01);

return(TempC); The RETURN statement


specifies the value that is
returned from the function.

38
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Creating a Custom Function

proc fcmp outlib=pg3.funcs.weather;

function FtoC(TempF);

TempC=round((TempF-32)*5/9,.01);

return(TempC);

endsub; The ENDSUB statement ends


the syntax for the function.
run;

39
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-27

Creating and Using a Custom Function


proc fcmp outlib=pg3.funcs.weather;
function FtoC(TempF);
create TempC=round((TempF-32)*5/9,.01);
function return(TempC);
endsub;
run;

options cmplib=pg3.funcs;

use data work.NewYork;


function set pg3.NewYorkDaily2017;
TavgC=FtoC(Tavg);
run;

40
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Using a Custom Function

CMPLIB=libref.table | (libref.table-1 … libref.table-n)

The CMPLIB= option


specifies the table or
options cmplib=pg3.funcs; tables that SAS searches
for a package that contains
use data work.NewYork; the desired function.
function set pg3.NewYorkDaily2017;
TavgC=FtoC(Tavg);
run;

41
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Function names must be unique within a package. However, different packages can have functions
with the same names. To select a specific function when there is ambiguity, use the package name
and a period as the prefix to the function name. For example, to use the weather version of FtoC,
use weather.FtoC.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-28 Lesson 5 Using Utility Procedures

Creating Functions Containing One Argument

Scenario
Create two functions with the FCMP procedure that return a numeric value: one function to convert
Fahrenheit temperatures to Celsius and a second function to convert Celsius temperatures to
Fahrenheit.

Files
• p305d03.sas
• weather_ny_daily2017 – a SAS table that contains the average temperature and total
precipitation per day in 2017 for New York, United States

• weather_sydney_daily2017 – a SAS table that contains the average temperature and total
precipitation per day in 2017 for Sydney, Australia

Syntax

PROC FCMP OUTLIB=libref.table.package;


FUNCTION function-name(arguments) <$> <length>;
. . . programming statements . . .
RETURN(expression);
ENDSUB;
RUN;

OPTIONS CMPLIB=libref.table | (libref.table-1 … libref.table-n);

Notes
• The Function Compiler (FCMP) procedure enables you to create custom functions using DATA
step syntax.
• The OUTLIB= option in the PROC FCMP statement specifies the table and package that store the
function.
• The FUNCTION statement specifies the function name and the function arguments, as well as
whether the function returns a character or numeric value.
• The RETURN statement specifies the value that is returned from the function.
• The ENDSUB statement ends the syntax for the function.
• The CMPLIB= option in the OPTIONS statement specifies the table where SAS will locate the
custom function.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-29

Demo
1. Open the p305d03.sas program in the demos folder and find the Demo section. Highlight and
run the PROC FCMP step. View the output table storing the new function.
proc fcmp outlib=pg3.funcs.weather;
function FtoC(TempF);
TempC=round((TempF-32)*5/9,.01);
return(TempC);
endsub;
run;

2. Highlight and run the DATA step. View the error in the SAS log that is related to SAS not finding
the FtoC function.
data work.NewYork;
set pg3.weather_ny_daily2017;
TavgC=FtoC(Tavg);
run;
3. Highlight and run the OPTIONS statement and the DATA step. View the new column TavgC in
the NewYork table.
options cmplib=pg3.funcs;

data work.NewYork;
set pg3.weather_ny_daily2017;
TavgC=FtoC(Tavg);
run;

4. Open pg3.weather_sydney_daily2017. Notice that the Tavg values are in the unit of Celsius.
5. Add syntax to the PROC FCMP step to create the CtoF function.
function CtoF(TempC);
TempF=round(TempC*9/5+32,.01);
return(TempF);
endsub;
6. Highlight and run the PROC FCMP step. Confirm that there are no errors. View the output table
storing the new function.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-30 Lesson 5 Using Utility Procedures

Note: The following warning appears in the log because of rerunning the syntax for the FtoC
function.
WARNING: Function 'FtoC' was defined in a previous package. Function 'FtoC' as defined in the
current program will be used as default when the package is not specified.

7. Copy and paste the DATA step for the New York data. Modify the new DATA step to use the
Sydney data with the CtoF function.
data work.Sydney;
set pg3.weather_sydney_daily2017;
TavgF=CtoF(TAvg);
run;
8. Highlight and run the new DATA step. View the new column TavgF in the Sydney table.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-31

Additional Scenario

INtoCM function

Create a custom
function to convert
values in inches to
values in centimeters.

43
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

5.04 Activity
Open p305a04.sas from the activities folder and perform the following
tasks:
1. Create another custom function in the PROC FCMP step.
• Add a FUNCTION statement to create a function named INtoCM that
has a numeric argument of Pin.
• Add the following assignment statement: Pcm=Pin*2.54;
• Add a RETURN statement to return the value of Pcm.
• Add an ENDSUB statement.
2. Run the program and verify that the PrecipCM values are 2.54 times
bigger than the Precip values. Does the PROC SQL step use the custom
functions successfully?
44
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-32 Lesson 5 Using Utility Procedures

Multiple Arguments

Use a comma to separate Place a dollar sign after the argument


multiple arguments. name if specifying a character argument.

function CnvTemp(Temp, Unit $);


if upcase(Unit)='F'
then NewTemp=round((Temp-32)*5/9,.01);
else if upcase(Unit)='C'
then NewTemp=round(Temp*9/5+32,.01);
return(NewTemp);
endsub;
Create a custom function to
convert Fahrenheit to Celsius or
Celsius to Fahrenheit, depending
on the initial unit.
46
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Returning a Character Value

Place a dollar sign after the list of arguments if


the function is to return a character value.

function CharTemp(Temp, Unit $, FinalUnit $) $ 10;

A length can be specified for the character value. If a


length is not specified, the length defaults to 8.

47
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-33

Creating Functions Containing Multiple Arguments

Scenario
Create two functions with the FCMP procedure. The first function will convert temperatures to the
desired unit and return the temperatures as numeric values. The second function will convert
temperatures to the desired unit and return the temperatures along with the unit as character values.

Files
• p305d04.sas
• weather_sydney_ny_monthly2017 – a SAS table that contains the average temperature and
total precipitation per month in 2017 for New York and Sydney

Syntax

PROC FCMP OUTLIB=libref.table.package;


FUNCTION function-name(arguments) <$> <length>;
. . . programming statements . . .
RETURN(expression);
ENDSUB;
RUN;

OPTIONS CMPLIB=libref.table | (libref.table-1 … libref.table-n);

Notes
• Character arguments are specified in the FUNCTION statement by placing a dollar sign after the
argument name.
• A dollar sign after the arguments specifies that the function returns a character value. The length
of the character value can be specified after the dollar sign.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-34 Lesson 5 Using Utility Procedures

Demo
1. Open the p305d04.sas program in the demos folder and find the Demo section. Notice the
syntax for the PROC FCMP step. The Unit argument specifies whether the temperature is
currently a Fahrenheit or Celsius value. If the current temperature is Fahrenheit, it is converted to
Celsius. Conversely, if the current temperature is Celsius, it is converted to Fahrenheit.
proc fcmp outlib=pg3.funcs.temps;
function CnvTemp(Temp, Unit $);
if upcase(Unit)='F'
then NewTemp=round((Temp-32)*5/9,.01);
else if upcase(Unit)='C'
then NewTemp=round(Temp*9/5+32,.01);
. . .
return(NewTemp);
endsub;
run;
2. Highlight and run the PROC FCMP step, the OPTIONS statement, and the DATA step. View the
new column TavgConverted in the SydneyNewYork table. The temperature is in Fahrenheit for
the Sydney rows and Celsius for the New York rows.

3. Add a third argument to the CnvTemp function that specifies the desired final unit.
function CnvTemp(Temp, Unit $, FinalUnit $);
4. Delete the two conditional statements for the CnvTemp function and uncomment the four
conditional statements.
proc fcmp outlib=pg3.funcs.temps;
function CnvTemp(Temp, Unit $, FinalUnit $);
if upcase(Unit)='F' and upcase(FinalUnit)='C'
then NewTemp=round((Temp-32)*5/9,.01);
else if upcase(Unit)='F' and upcase(FinalUnit)='F'
then NewTemp=Temp;
else if upcase(Unit)='C' and upcase(FinalUnit)='F'
then NewTemp=round(Temp*9/5+32,.01);
else if upcase(Unit)='C' and upcase(FinalUnit)='C'
then NewTemp=Temp;
return(NewTemp);
endsub;
run;
5. In the DATA step, delete the TavgConverted assignment statement and uncomment the TavgF
and TavgC assignment statements.
TavgF=CnvTemp(TAvg,Tunit,'F');
TavgC=CnvTemp(TAvg,Tunit,'C');

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-35

6. Run the demo program that includes the SGPLOT procedure. View the new columns TavgF and
TavgC in the SydneyNewYork table. Also view the graph of the TavgF column.

7. In the PROC FCMP step, copy and paste the syntax for creating the CnvTemp function within
the step. Modify the syntax to create a function that returns a character value that concatenates
the letter F or C after the temperature value.
function CharTemp(Temp, Unit $, FinalUnit $) $ 20;
if upcase(Unit)='F' and upcase(FinalUnit)='C'
then NewTemp=round((Temp-32)*5/9,.01);
else if upcase(Unit)='F' and upcase(FinalUnit)='F'
then NewTemp=Temp;
else if upcase(Unit)='C' and upcase(FinalUnit)='F'
then NewTemp=round(Temp*9/5+32,.01);
else if upcase(Unit)='C' and upcase(FinalUnit)='C'
then NewTemp=Temp;
return(catx(' ',put(NewTemp,8.2),FinalUnit));
endsub;
8. In the DATA step, add the following two assignment statements:
TavgFchar=CharTemp(TAvg,Tunit,'F');
TavgCchar=CharTemp(TAvg,Tunit,'C');

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-36 Lesson 5 Using Utility Procedures

9. Highlight and run the PROC FCMP step, the OPTIONS statement, and the DATA step. View the
new columns TavgFchar and TavgCchar in the SydneyNewYork table.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-37

Beyond SAS Programming 3


What if you want to ...

… learn more about … learn more about


picture formats? the FCMP procedure?

• Getting in to the Picture (Format) by • Using PROC FCMP to the Fullest: Getting
Andrew H. Karp (paper) Started and Doing More by Arthur L.
Carpenter (paper)
• PICTURE Perfect: In depth look at the
PICTURE format by Carry W. Croghan • Using Base SAS® to Extend the SAS®
(paper) System by Mark L. Jordan (paper)

49
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

You can find direct links to these resources in the Course Links section on the Extended Learning
page.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-38 Lesson 5 Using Utility Procedures

Practice

If you restarted your SAS session, open and submit the libname.sas program in the course files.

Level 1
4. Creating a Custom Function That Returns a Numeric Value
The pg3.class_tests table contains student scores (ranging from 1 to 10) for four tests and a
final exam. Create a function that calculates each student’s final score. The final score is the
average of six scores: the four tests and the final exam, which is counted twice.
a. Open the p305p04.sas program in the practices folder. Highlight and run the DATA step and
the PROC PRINT step to view the student scores.
b. Create a custom function that calculates each student’s final score.
1) Add a PROC FCMP statement that has an OUTLIB= option to store the custom function
in pg3.myfunctions.class.
2) Add a FUNCTION statement to name the function CalcScore. The function should
contain five arguments: T1, T2, T3, T4, and F.
3) Add the following assignment statement:
FScore=round(sum(of T1-T4, 2*F)/6,.01);
4) Add a RETURN statement to return the value of FScore.
5) Add an ENDSUB statement and a RUN statement.
c. Highlight and run the PROC FCMP step. View the log and confirm that the function is saved.
NOTE: Function CalcScore saved to pg3.myfunctions.class.

d. After the PROC FCMP step and before the DATA step, add a global OPTIONS statement
with the CMPLIB= option to specify the table location of the function, pg3.myfunctions.
e. In the DATA step, add an assignment to create FinalScore. This column is equal to the
CalcScore function with the arguments Test1, Test2, Test3, Test4, and Final.
f. Run the program and verify the results. How many students have a final score greater than
9.00?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.2 Creating Functions with the FCMP Procedure 5-39

Level 2
5. Creating a Custom Function That Returns a Character Value
The sashelp.baseball table contains baseball statistics per each player. The Name column
contains the player’s name with the last name appearing before the first name (for example,
Davis, Alan). Create a function that switches the order of the first and last names.
a. Open the p305p05.sas program in the practices folder.
b. Add a PROC FCMP step to create a custom function.
1) Store the function in a package named baseball within the pg3.myfunctions table.
2) Name the function flip. The function should contain a character argument named
LastFirst and return a character value with an appropriate length.
3) Use the following RETURN statement:
return(catx(' ',scan(LastFirst,2,','),
scan(LastFirst,1,',')));
c. Highlight and run the PROC FCMP step. View the log and confirm that the function is saved.
d. Add an OPTIONS statement with the CMPLIB= option to specify the table that SAS searches
for the baseball package.
e. In the DATA step, add an assignment to create Player. This column is equal to the flip
function with the argument of Name.
f. Run the program and verify the results. What is the name of the baseball player in row 21?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-40 Lesson 5 Using Utility Procedures

Challenge
6. Creating a Custom CALL Routine
The pg3.eu_occ table contains monthly occupancy rates for European countries from January
2004 through September 2017. The column YearMon contains values with the character layout
of a four-digit year, the letter M, and a two-digit month (for example, 2017M09). Create a custom
CALL routine that extracts the year and month from the character layout to store a numeric SAS
date representing the end of the given month for the given year in a specified column.
a. Open the p305p06.sas program in the practices folder. Run the program to view the
European occupancy data and the missing data for the NewDate column.
b. Add a PROC FCMP step to create a custom CALL routine.
Note: Use the SAS documentation (Base SAS® Procedures Guide) to determine the
needed syntax for creating a CALL routine with the FCMP procedure.
1) Store the routine in a package named dates within the pg3.myfunctions table.
2) In the SUBROUTINE statement, name the routine C2Ndate. The routine should contain
a character argument named CDate and a numeric argument named NDate.
3) In the OUTARGS statement, specify the NDate argument as the column for the
subroutine to update.
4) Use the following statements in the routine:
Year=input(substr(CDate,1,4),4.);
Month=input(substr(CDate,6,2),2.);
NDate=intnx('month',mdy(Month,1,Year),0,'end');
c. Highlight and run the PROC FCMP step. View the log and confirm that C2Ndate is saved.
d. Add an OPTIONS statement that contains the CMPLIB= option to specify the table that SAS
searches for the dates package.
e. In the DATA step, add a CALL statement to reference the C2Ndate routine with arguments of
YearMon and NewDate.
f. Run the program and verify the results. What are the YearMon and NewDate values for row
20?

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Solutions 5-41

5.3 Solutions
Solutions to Practices
1. Specifying a Template Based on Date Directives
proc format;
picture NewDate (default=19)
low-high = '%a.%B.%0d.%0y' (datatype=date);
run;

proc sort data=pg3.storm_final out=work.storm_final;


by descending StartDate;
run;

title 'Detail Storm Report by Descending Start Date';


proc print data=work.storm_final;
var Name BasinName StartDate EndDate MaxWindMPH MinPressure;
format StartDate EndDate NewDate.;
run;
title;
What is the formatted StartDate value for the last storm that occurred in the year 2016?
Tue.December.20.16
2. Specifying a Template Based on Digit Selectors
proc format;
picture shares
low-high='000,000,009 shares';
picture usd (default=11)
low-<0='009.99 USD' (prefix='-')
0-high='009.99 USD';
run;

data work.stock_report;
set pg3.stocks(drop=High Low);
VolumeChar=catx(' ',put(Volume,comma18.),'shares');
DailyChange=Close-Open;
DailyChangeChar=catx(' ',DailyChange,'USD');
format Volume shares. DailyChange usd.;
run;

title 'Monthly Stock Market Data for 2010 to 2017';


proc print data=work.stock_report;
run;
title;
Do the DailyChange values match the DailyChangeChar values in the results, other than in
regard to justification? Yes

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-42 Lesson 5 Using Utility Procedures

3. Specifying a Template Based on Digit Selectors and Characters


data work.savings;
input Date date9. Deposits;
datalines;
07JAN2019 199
04FEB2019 325
04MAR2019 557
01APR2019 1200
06MAY2019 215
03JUN2019 22200
;
run;

proc format;
picture MyDep
0-<1000 = '009' (prefix='$')
1000-high = '$1,000+' (noedit);
run;

title 'Monthly Deposits for Savings';


proc print data=work.savings;
format Date mmddyy10. Deposits MyDep.;
run;
title;
What PICTURE statement option is treating the numbers as message characters instead of digit
selectors for rows 4 and 6? NOEDIT
4. Creating a Custom Function That Returns a Numeric Value
proc fcmp outlib=pg3.myfunctions.class;
function CalcScore(T1, T2, T3, T4, F);
FScore=round(sum(of T1-T4, 2*F)/6,.01);
return(FScore);
endsub;
run;

options cmplib=pg3.myfunctions;

data work.scores;
set pg3.class_tests;
FinalScore=CalcScore(of Test1-Test4, Final);
run;

title 'Student Scores';


proc print data=work.scores;
run;
title;
How many students have a final score greater than 9.00? Two

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Solutions 5-43

5. Creating a Custom Function That Returns a Character Value


proc fcmp outlib=pg3.myfunctions.baseball;
function flip(LastFirst $) $ 18;
return(catx(' ',scan(LastFirst,2,','),
scan(LastFirst,1,',')));
endsub;
run;

options cmplib=pg3.myfunctions;

data work.FlipNames;
set sashelp.baseball(keep=Name Team);
Player=flip(Name);
drop Name;
run;

title 'Baseball Players and Teams';


proc print data=work.FlipNames;
var Player Team;
run;
title;
What is the name of the baseball player in row 21? Barry Bonds
6. Creating a Custom CALL Routine
proc fcmp outlib=pg3.myfunctions.dates;
subroutine C2Ndate(CDate $, NDate);
outargs NDate;
Year=input(substr(CDate,1,4),4.);
Month=input(substr(CDate,6,2),2.);
NDate=intnx('month',mdy(Month,1,Year),0,'end');
endsub;
run;

options cmplib=pg3.myfunctions;

data work.DateChange;
set pg3.eu_occ;
call missing(NewDate);
call C2Ndate(YearMon,NewDate);
format NewDate date9.;
run;

title 'End of Month Occupancies';


proc print data=work.DateChange;
run;
title;
What are the YearMon and NewDate values for row 20? 2016M02 and 29FEB2016

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-44 Lesson 5 Using Utility Procedures

Solutions to Activities and Questions

5.01 Multiple Choice Question – Correct Answer


Which directives create the following value? 2019.01.02 @ 9:13:05 PM

a. %Y.%m.%d @ %H:%M:%S %p
b. %y.%m.%d @ %I:%0M:%0S %p
c. %Y.%0m.%0d @ %I:%0M:%0S %p
d. %0y.%0m.%0d @ %H:%0M:%0S %p

12
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

5.02 Activity – Correct Answer


proc format;
. . .
picture MyQtr (default=13)
low-high='Qtr %q of %Y' (datatype=date);
run;

title 'Frequency of 2017 US Tornadoes';


proc freq data=pg3.tornado_2017 order=freq;
table BeginDate;
format BeginDate MyQtr.;
run;
title;

Quarter 1 has the


most tornadoes.
16
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5.3 Solutions 5-45

5.03 Question – Correct Answer


picture mypct low-high='009.99%' (multiplier=100);

What are the formatted values of PERCENT based on the PICTURE statement?

27
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved. p305a03

5.04 Activity – Correct Answer


proc fcmp outlib=pg3.funcs.weather;
. . .
function INtoCM(Pin);
Pcm=Pin*2.54;
return(Pcm);
endsub;
run;

options cmplib=pg3.funcs;

proc sql;
select Date, Tavg, Tunit, FtoC(Tavg) as TavgC,
Precip, Punit, INtoCM(Precip) as PrecipCM
from pg3.weather_ny_daily2017;
quit;
Yes, the PROC SQL step
45
can use custom functions.
Copy ri ght © S AS Insti tute Inc. Al l ri ghts reserved.

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
5-46 Lesson 5 Using Utility Procedures

Copyright © 2019, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.

You might also like