Professional Documents
Culture Documents
SAS Base Programming Reference Sheet
SAS Base Programming Reference Sheet
SAS Base Programming Reference Sheet
Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
SAS Base Programming Reference Sheet
Procedures for Data Manipulation The DATA Step: Controlling Variable Output
• PROC SORT: sorts the rows in a table on one or more character • DROP= / KEEP = options can be added to a table on the DATA
or numeric columns. A PROC SORT is required before any step statement or SET statement.
that uses a BY statement • DROP/ KEEP statements can be added within the data step
• BY specifies the columns used in the sort • Columns kept or dropped will be flagged in the PDV
• OUT = specifies an output table • Dropping a column on the SET statement makes a column
• NODUPKEY keeps only the first row for each unique unavailable for processing in the data step
value of the columns(s) listed in the by statement DATA work.expensive (KEEP = price item_name);
• DUPOUT creates an output table containing SET work.shopping (DROP = city state);
duplicates KEEP store_name;
• DESCENDING sorts column from 9 to 0 or Z to A RUN;
PROC SORT DATA = input-table <OUT = output-
table> <NODUPKEY> <DUPOUT = output-table>; The DATA Step: Processing Data in Groups
BY <DESCENDING> col-name(s) </options>; • Process data in groups after sorting data first
RUN; • First.bycol is 1 for the first row within a group and 0 otherwise.
• Last.bycol is 1 for the last row within a group and 0 otherwise
• PROC TRANSPOSE: is used to restructure a table
• VAR lists column(s) to be transposed BY col-names(s);
• ID creates a separate column for each value of the ID FIRST.bycol <expression>;
Variable and can only be one column. LAST.bycol<expression>;
• BY transpose data within groups. Unique • Accumulating columns require modifying SAS’ default
combinations of BY values creates one row in the behavior to retain all PDV values with each iteration.
output table • Accumulating columns are often used in conjunction with
• PREFIX provides a prefix for each value of the ID FIRST./LAST. logic allowing BY GROUP calculations & totals
column • Column is the new variable holding the accumulating total
• NAME names the column that identifies the source
column containing the transposed values Column + expression;
PROC TRANSPOSE DATA = input-table OUT = The DATA Step: Conditional Processing and Loops
output-table <PREFIX = column> <NAME = column>;
<BY col-name(s); > • Conditionally process data using IF/ELSE IF/ ELSE statements
<ID column;> • SAS will check the expressions sequentially until one is true
<VAR columns(s);> • IF statements can create new variables or new data sets
RUN; if price >100 then newVar = “Expensive”;
else if price <100 and price >0 then newVAR = “Cheap”
Preparing Data: The DATA Step else if price = 0 then newVAR = “FREE”;
else newVAR = “Priceless”;
DATA output-dataset; • Execute multiple statements by using a DO statement
set input-dataset;
Run; if price >100 then do;
newVar = “Expensive”;
• The data step is processed in two phases: output work.expensive;
• Compilation: creates the PDV, establishes data end;
attributed and rules for execution
• Process repetitive code using DO LOOPS
• Execution: SAS reads, manipulates and write data
• The optional OUTPUT statement will output a row for each
• DATA Steps create two default variables:
iteration of the loop
• _N_ counts the number of iterations through the
data step when processing DATA output-table;
• _ERROR_ is initialized at 0. If an error is SET input-table;
encountered, the value is set to 1 DO indexcolumn = start TO stop <BY increment> ;
• Explicit Output statements can be used to control when . . . repetitive code . . .
and where each row is written. <OUTPUT;>
• Multiple datasets can be created in one data step END;
RUN;
DATA work.cheap work.expensive;
set work.shopping; • A DO UNTIL executes until a condition is true, and the
if price >100 then output work.expensive; condition is checked at the bottom of the DO loop. A DO
else output work.cheap UNTIL loop always executes at least one time.
Run; • A DO WHILE executes while a condition is true, and the
condition is checked at the top of the DO loop. A DO WHILE
Program Explanation loop does not iterate even once if the condition is initially false.
The data step creates to output tables CHEAP and
EXPENSIVE based on the input dataset WORK.SHOPPING. DO WHILE | UNTIL expression;
If an observation has PRICE greater than 100, then the . . . repetitive code . . .
observation is assigned to the EXPENSIVE dataset. END;
Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
SAS Base Programming Reference Sheet
The DATA Step: Combining Data • Common Date Functions:
• Combine tables by concatenating them (stacking), or matching NOTE: SAS Dates are numeric values calculated as the number
them based on a variable of days since JAN 1, 1960.
• Concatenating: Function What is does
• SAS reads all the rows from the first table listed on the
set statement and writes them to the output table. MDY(month, day, year) Creates a SAS Date Value
Then from the second table, and so on
TODAY() Returns the current date as a numeric SAS
• Columns with the same name are aligned
date value
• Columns not in all tables are included
• The RENAME = option can rename columns in input YEAR(date-var); Returns Year/Month/Day/QTR of the sas
tables, so they align in the output table MONTH(date-var) date value input
• Additional DATA step statements can be used after the DAY(date-var)
set statement to manipulate data QTR(date-var)
DATA output-dataset; INTCX(interval, start- Increments a date/time/datetime value by
set input-dataset1 input-dataset2 from, increment) a given time interval
(rename=(currentName = newName));
Run; • Common Character Functions
Function What is does
• Merging tables
• All tables in the MERGE statement must be sorted by TRIM(string) Removes trailing blanks
the column(s) listed in the BY statement
• The MERGE statement combines rows where the BY- STRIP(string) Removes all leading and trailing blanks
Column values match
SCAN(string, count, Returns the nth word from a string
• Identify matching and no matching rows by using the
<char-list, <modifier>>)
IN= dataset option. IN variable values are 0 or 1.
• 0 table did NOT include the by-column value. PROPCASE(string) Changes the casing of the string.
• 1 table did include the by-column value UPCASE(string) Commonly used in statements of equality
• Use a subsetting IF or IF-THEN logic to handle LOWCASE(string)
matching & nonmatching rows
SUBSTR(string, start- Extracts a substring from the argument
DATA output-dataset;
from, length)
MERGE input-dataset1 <(in = VAR1)>
input-dataset2 <(in = VAR2)>;
Customizing SAS Output: Labels and Formats
BY by-column(s);
< IF var1 = 1 and var2 = 1;> /* all matching rows */ • Labels and Formats can be applied in the DATA step and
RUN; assigned as permanent attributes. These statements can also be
used in reporting procedures as temporary attributes. (e.g. they
The DATA Step: Functions need to be specified in each procedure)
• SAS has functions to handle character, numeric and date • Labels can be used to provide more descriptive column
columns. headers. A label can include any text up to 256 characters.
new-var = function(argument1, argument2,…); • Add labels to more than one column in a single label statement
• Convert numeric values to character using the PUT function LABEL col-name1 = “Label Text 1”
col-name2 = “Label Text 2” ;
char-var = put(numeric-var, informat);
• Formats are used to change the way values are displayed in
• Convert character values to numeric using the INPUT function data and reports.
Numeric-var = input(char-var, format); • Formats do not change the underlying data values.
• Add formats to more than one column in a single statement
• SAS has functions to handle character, numeric and date
columns. FORMAT date-var mmddyy10. num-var dollar13.2;
• Common Numeric Functions: • Create your own custom formats using the PROC FORMAT
Function What is does procedure
• VALUE statement specifies the criteria for creating one
RAND(‘distribution, Generates random numbers from a custom format.
paramter1,...) selected distribution • Multiple VALUE statements can be used within the
PROC FORMAT step.
ROUND(number, Rounds number to the nearest
<rounding unit>) rounding unit (.01, .001, etc) PROC FORMAT;
VALUE format-name <$>
LARGEST(k, value1, Returns the Kth largest non missing value-or-range-1 = 'formatted-value’
value2, …) value value-or-range-2 = 'formatted-value’
...;
SUM(argument1, Sums all non missing arguments
RUN;
argument2,…)
Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
SAS Base Programming Reference Sheet
Filtering Data Exporting Data
• WHERE Statements filter rows and can be used in both the • Export a SAS dataset to variable file types (XLSX, TXT,CSV,
DATA step and PROC steps. etc) using a PROC EXPORT step
WHERE expression; • PROC EXPORT must be used to export
unstructured data type. (e.g. CSV files)
• If the expression is true, rows are read, if false, they are not. • DMBS = the database management system which
• WHERE statements can only work with columns that exist on specifies the type of data to export. (e.g. CSV,
an input dataset, not ones that are calculated during DLM, JMP, TAB)
manipulation.
• Character values are case sensitive and must be in quotes “ ” PROC EXPORT DATA=input-table
EX: WHERE Car_Make = “Honda” will select OUTFILE="output-file“
different rows than WHERE car_make = “HONDA” <DBMS = identifier REPLACE> ;
• Numeric values are not in quotes and can only include digits, RUN;
decimal points and/or negative signs
• Compound conditions can be created using AND/OR • Alternatively use a LIBNAME statement to export data.
• Logic can be reversed with the NOT keyword • A LIBNAME statement can only be used if the output data
• Use the SAS Date constant when filtering with dates: type has an accessible SAS Engine (e.g. XLSX, JSON, XML).
“ddMONyyyy”d • Ensure a LIBNAME libref CLEAR statement is used at the
end to close the connection to the excel workbook.
WHERE Operators:
LIBNAME myXL XLSX “C:/documents/Shopping.XLSX”;
= or EQ,
DATA myXL.shopping;
^=, ~= or NE
SET work.shopping;
> or GT
RUN;
>= or GE
LIBNAME myXL CLEAR
< or LT
<= or LE
IN Operator Exporting Reports
WHERE col-name in (value1, value2,…);
• The SAS Output Delivery System (ODS) can send reports to
WHERE col-name NOT in (value1, value2,…);
various file types to display reports including CSV, PowerPoint,
Special Operators
RTF, and PDF.
WHERE col-name IS MISSING
• Each output type holds the same basic structure to open and
WHERE col-name IS NOT MISSING
close a file. Additional statements are available and optionable
WHERE col-name IS NULL
based on the file type.
WHERE col-name BETWEEN value1 AND
value2 ODS <destination> < destination specifications>;
WHERE col-name LIKE “value%” /* SAS Code that produces output */
WHERE col-name LIKE “value_” ODS destination CLOSE;
• A subsetting IF statement can be used on any variable that • Additional options to excel files include:
exists in the PDV. (e.g. variables on the input data set and • Adding a style
new variables created) • Adding a worksheet label
• The expression used in the IF statement is written with most ODS EXCEL FILE="filename.xlsx"
of the same operators as a WHERE expression. STYLE=style
OPTIONS(SHEET_NAME='label') ;
/*Implicit output */
/* SAS code that produces output on first
IF expression;
worksheet */
IF expression THEN output;
ODS EXCEL OPTIONS(SHEET_NAME=‘label’);
/* SAS code that produces output on second
/* Explicit output to specific table*/
worksheet */
IF expression THEN output libref.output-dataset-name;
ODS EXCEL CLOSE;
MACRO variables • PDF outputs can include a Table of Contents (PDFTOC) and
Procedure labels in the bookmarks.
• A macro variable stores a value that can be submitted into a
ODS PDF FILE="filename.xlsx"
SAS program
STYLE=style
• If a macro variable is referenced inside quotation marks,
STARTPAGE = NO PDFTOC= 1;
then double quotation marks must be used
ODS PROCLABEL “label”;
• Assign a value to a macro variable using a %LET statement
/* SAS code that produces output */
• The ampersand “&” must be used when calling a macro
ODS PDF CLOSE;
variable. The & triggers the macro facility
Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved