SAS Base Programming Reference Sheet

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

SAS Base Programming Reference Sheet

SAS Programming Fundamentals Procedures for Exploring and Analyzing Data


• A program can create a log, results, and output data • PROC CONTENTS: Prints descriptor portion of a data set
• Programs are comprised of Data and Procedure steps
PROC CONTENTS DATA = data-set-name;
• Steps end with a run; statement (sometimes quit;)
RUN;
• Each step is a series a statements
• A statement begins with a keyword (e.g. data) and end with a • PROC PRINT: lists all columns and rows in the input table by
semicolon “;” default.
• Assignment statements do not begin with a keyword • OBS = option limits the number of rows listed
• VAR statement limits and orders columns listed
Example Program Program Explanation • Use _NUMERIC_, _CHARACTER_, and _ALL_
Data myclass; Creates a new dataset myclass in keywords to specify variables of a certain type
set sashelp.class; the work library. Calculates a new (numeric or character) or all types
heightcm = height*2.54; variable heightcm • WHERE statement filter the data
Run;
• BY statement groups output data
Prints a view of the myclass dataset
Proc print data = myclass; in the results window displaying • FORMAT statement applies a temporary* format in the
var age heightcm; only 2 variables: age, heightcm. output
Run; • LABEL statement applies a temporary* label to the
There are 7 statements. variable names in the output
* Permanent characteristics are defined in the data step
Global Statements PROC PRINT DATA = data-set-name <label> (OBS = n);
• OPTIONS
Options …;
…; sets SAS system options <VAR col-name(s);>
EX: Options Validvarname = V7;
<WHERE expression;>
<BY col-names(s);>
• TITLE
Title <options> “...”; text…”;
<options>“…title set up to 10 titles <FORMAT col-name(s) <$> format name. ;>
EX: TITLE “Student Ages and Heights”; <LABEL col-name = “Label”; >
TITLE2 just= l “Classroom 105”;
RUN;
• FOOTNOTE <options> “…..” ; sets up to 10 footnotes
Footnote “ ….”;
EX: FOOTNOTE “Data refreshed annually”;
• PROC MEANS: generates simple summary statistics for each
numeric column in the input data by default unless the VAR
• LIBNAME
Libname libref <engine>“………”;
libref<engine> “….”; sets a shortcut reference to statement is used
data of a specific type in a specific location • CLASS specifies variables to group data before
o libref: name to call a library. 8-character max length calculating statistics
o engine: contains predefined set of rules for reading data. • WAYS specifies number of ways to make unique
Base is the default engine reading SAS datasets combinations of class variables
o “…..” physical name of the library recognized by the • OUTPUT provides the option to create an output table
system and specific output statistics
EX: LIBNAME mylib “/C:/documents/project/data”; • OUT = names the output table to be created
LIBNAME myXL xlsx “/C:/documents/class.xlsx”;
PROC MEANS DATA = data-set-name;
• LIBNAME
Libnamelibref CLEAR
clear; ends connection to the data source <WHERE expression;>
Commenting Code <VAR col-name(s);>
<CLASS col-names(s);>
• Comments can be added to prevent text in the program from
<WAYS n;>
executing
<OUTPUT OUT = output-table <statistic =col-name>
• There are 2 comment styles
RUN;
• Comments are not executable statements
• PROC UNIVARIATE: Generates summary statistics and more
/* insert commented text here */ detailed statistics about distribution and extreme values for each
* Insert commented text here ; numeric variable by default
PROC UNIVARIATE DATA = data-set-name;
Accessing Data
<VAR col-name(s);>
• SAS can read and understand structured (e.g. xlsx) and <WHERE expression;>
unstructured (e.g. csv) data types RUN;
• Structured data can be read via a LIBNAME statement or
• PROC FREQ: Creates a frequency table for each variable in the
PROC IMPORT step
input table by default.
• Unstructured data requires PROC IMPORT to define rules
• TABLES limits the variables analyzed
Importing data using Proc Import
• <options> customize outputs by limiting columns (i.e.
Example Program Program Explanation nocum), modifying output style (i.e. crosslist, listing) or
Proc import datafile = “myfile. csv” Import a CSV file (unstructured generating graphs.
dbms = csv out = mylib.example data) that outputs a SAS dataset • Create a crosstabulation report by adding an asterisk (*)
replace; example in a user defined between two variable names on the TABLES Statement
guessingrows = 100; permanent library mylib.
Run; Optionally add a guessingrows PROC FREQ DATA = data-set-name;
statement to read n rows to <TABLES col-name(s) </options>;>
determine variable attributes. <WHERE expression; >
RUN;

Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
SAS Base Programming Reference Sheet
Procedures for Data Manipulation The DATA Step: Controlling Variable Output
• PROC SORT: sorts the rows in a table on one or more character • DROP= / KEEP = options can be added to a table on the DATA
or numeric columns. A PROC SORT is required before any step statement or SET statement.
that uses a BY statement • DROP/ KEEP statements can be added within the data step
• BY specifies the columns used in the sort • Columns kept or dropped will be flagged in the PDV
• OUT = specifies an output table • Dropping a column on the SET statement makes a column
• NODUPKEY keeps only the first row for each unique unavailable for processing in the data step
value of the columns(s) listed in the by statement DATA work.expensive (KEEP = price item_name);
• DUPOUT creates an output table containing SET work.shopping (DROP = city state);
duplicates KEEP store_name;
• DESCENDING sorts column from 9 to 0 or Z to A RUN;
PROC SORT DATA = input-table <OUT = output-
table> <NODUPKEY> <DUPOUT = output-table>; The DATA Step: Processing Data in Groups
BY <DESCENDING> col-name(s) </options>; • Process data in groups after sorting data first
RUN; • First.bycol is 1 for the first row within a group and 0 otherwise.
• Last.bycol is 1 for the last row within a group and 0 otherwise
• PROC TRANSPOSE: is used to restructure a table
• VAR lists column(s) to be transposed BY col-names(s);
• ID creates a separate column for each value of the ID FIRST.bycol <expression>;
Variable and can only be one column. LAST.bycol<expression>;
• BY transpose data within groups. Unique • Accumulating columns require modifying SAS’ default
combinations of BY values creates one row in the behavior to retain all PDV values with each iteration.
output table • Accumulating columns are often used in conjunction with
• PREFIX provides a prefix for each value of the ID FIRST./LAST. logic allowing BY GROUP calculations & totals
column • Column is the new variable holding the accumulating total
• NAME names the column that identifies the source
column containing the transposed values Column + expression;

PROC TRANSPOSE DATA = input-table OUT = The DATA Step: Conditional Processing and Loops
output-table <PREFIX = column> <NAME = column>;
<BY col-name(s); > • Conditionally process data using IF/ELSE IF/ ELSE statements
<ID column;> • SAS will check the expressions sequentially until one is true
<VAR columns(s);> • IF statements can create new variables or new data sets
RUN; if price >100 then newVar = “Expensive”;
else if price <100 and price >0 then newVAR = “Cheap”
Preparing Data: The DATA Step else if price = 0 then newVAR = “FREE”;
else newVAR = “Priceless”;
DATA output-dataset; • Execute multiple statements by using a DO statement
set input-dataset;
Run; if price >100 then do;
newVar = “Expensive”;
• The data step is processed in two phases: output work.expensive;
• Compilation: creates the PDV, establishes data end;
attributed and rules for execution
• Process repetitive code using DO LOOPS
• Execution: SAS reads, manipulates and write data
• The optional OUTPUT statement will output a row for each
• DATA Steps create two default variables:
iteration of the loop
• _N_ counts the number of iterations through the
data step when processing DATA output-table;
• _ERROR_ is initialized at 0. If an error is SET input-table;
encountered, the value is set to 1 DO indexcolumn = start TO stop <BY increment> ;
• Explicit Output statements can be used to control when . . . repetitive code . . .
and where each row is written. <OUTPUT;>
• Multiple datasets can be created in one data step END;
RUN;
DATA work.cheap work.expensive;
set work.shopping; • A DO UNTIL executes until a condition is true, and the
if price >100 then output work.expensive; condition is checked at the bottom of the DO loop. A DO
else output work.cheap UNTIL loop always executes at least one time.
Run; • A DO WHILE executes while a condition is true, and the
condition is checked at the top of the DO loop. A DO WHILE
Program Explanation loop does not iterate even once if the condition is initially false.
The data step creates to output tables CHEAP and
EXPENSIVE based on the input dataset WORK.SHOPPING. DO WHILE | UNTIL expression;
If an observation has PRICE greater than 100, then the . . . repetitive code . . .
observation is assigned to the EXPENSIVE dataset. END;

Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
SAS Base Programming Reference Sheet
The DATA Step: Combining Data • Common Date Functions:
• Combine tables by concatenating them (stacking), or matching NOTE: SAS Dates are numeric values calculated as the number
them based on a variable of days since JAN 1, 1960.
• Concatenating: Function What is does
• SAS reads all the rows from the first table listed on the
set statement and writes them to the output table. MDY(month, day, year) Creates a SAS Date Value
Then from the second table, and so on
TODAY() Returns the current date as a numeric SAS
• Columns with the same name are aligned
date value
• Columns not in all tables are included
• The RENAME = option can rename columns in input YEAR(date-var); Returns Year/Month/Day/QTR of the sas
tables, so they align in the output table MONTH(date-var) date value input
• Additional DATA step statements can be used after the DAY(date-var)
set statement to manipulate data QTR(date-var)
DATA output-dataset; INTCX(interval, start- Increments a date/time/datetime value by
set input-dataset1 input-dataset2 from, increment) a given time interval
(rename=(currentName = newName));
Run; • Common Character Functions
Function What is does
• Merging tables
• All tables in the MERGE statement must be sorted by TRIM(string) Removes trailing blanks
the column(s) listed in the BY statement
• The MERGE statement combines rows where the BY- STRIP(string) Removes all leading and trailing blanks
Column values match
SCAN(string, count, Returns the nth word from a string
• Identify matching and no matching rows by using the
<char-list, <modifier>>)
IN= dataset option. IN variable values are 0 or 1.
• 0  table did NOT include the by-column value. PROPCASE(string) Changes the casing of the string.
• 1  table did include the by-column value UPCASE(string) Commonly used in statements of equality
• Use a subsetting IF or IF-THEN logic to handle LOWCASE(string)
matching & nonmatching rows
SUBSTR(string, start- Extracts a substring from the argument
DATA output-dataset;
from, length)
MERGE input-dataset1 <(in = VAR1)>
input-dataset2 <(in = VAR2)>;
Customizing SAS Output: Labels and Formats
BY by-column(s);
< IF var1 = 1 and var2 = 1;> /* all matching rows */ • Labels and Formats can be applied in the DATA step and
RUN; assigned as permanent attributes. These statements can also be
used in reporting procedures as temporary attributes. (e.g. they
The DATA Step: Functions need to be specified in each procedure)
• SAS has functions to handle character, numeric and date • Labels can be used to provide more descriptive column
columns. headers. A label can include any text up to 256 characters.
new-var = function(argument1, argument2,…); • Add labels to more than one column in a single label statement

• Convert numeric values to character using the PUT function LABEL col-name1 = “Label Text 1”
col-name2 = “Label Text 2” ;
char-var = put(numeric-var, informat);
• Formats are used to change the way values are displayed in
• Convert character values to numeric using the INPUT function data and reports.
Numeric-var = input(char-var, format); • Formats do not change the underlying data values.
• Add formats to more than one column in a single statement
• SAS has functions to handle character, numeric and date
columns. FORMAT date-var mmddyy10. num-var dollar13.2;
• Common Numeric Functions: • Create your own custom formats using the PROC FORMAT
Function What is does procedure
• VALUE statement specifies the criteria for creating one
RAND(‘distribution, Generates random numbers from a custom format.
paramter1,...) selected distribution • Multiple VALUE statements can be used within the
PROC FORMAT step.
ROUND(number, Rounds number to the nearest
<rounding unit>) rounding unit (.01, .001, etc) PROC FORMAT;
VALUE format-name <$>
LARGEST(k, value1, Returns the Kth largest non missing value-or-range-1 = 'formatted-value’
value2, …) value value-or-range-2 = 'formatted-value’
...;
SUM(argument1, Sums all non missing arguments
RUN;
argument2,…)

Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved
SAS Base Programming Reference Sheet
Filtering Data Exporting Data
• WHERE Statements filter rows and can be used in both the • Export a SAS dataset to variable file types (XLSX, TXT,CSV,
DATA step and PROC steps. etc) using a PROC EXPORT step
WHERE expression; • PROC EXPORT must be used to export
unstructured data type. (e.g. CSV files)
• If the expression is true, rows are read, if false, they are not. • DMBS = the database management system which
• WHERE statements can only work with columns that exist on specifies the type of data to export. (e.g. CSV,
an input dataset, not ones that are calculated during DLM, JMP, TAB)
manipulation.
• Character values are case sensitive and must be in quotes “ ” PROC EXPORT DATA=input-table
EX: WHERE Car_Make = “Honda” will select OUTFILE="output-file“
different rows than WHERE car_make = “HONDA” <DBMS = identifier REPLACE> ;
• Numeric values are not in quotes and can only include digits, RUN;
decimal points and/or negative signs
• Compound conditions can be created using AND/OR • Alternatively use a LIBNAME statement to export data.
• Logic can be reversed with the NOT keyword • A LIBNAME statement can only be used if the output data
• Use the SAS Date constant when filtering with dates: type has an accessible SAS Engine (e.g. XLSX, JSON, XML).
“ddMONyyyy”d • Ensure a LIBNAME libref CLEAR statement is used at the
end to close the connection to the excel workbook.
WHERE Operators:
LIBNAME myXL XLSX “C:/documents/Shopping.XLSX”;
= or EQ,
DATA myXL.shopping;
^=, ~= or NE
SET work.shopping;
> or GT
RUN;
>= or GE
LIBNAME myXL CLEAR
< or LT
<= or LE
IN Operator Exporting Reports
WHERE col-name in (value1, value2,…);
• The SAS Output Delivery System (ODS) can send reports to
WHERE col-name NOT in (value1, value2,…);
various file types to display reports including CSV, PowerPoint,
Special Operators
RTF, and PDF.
WHERE col-name IS MISSING
• Each output type holds the same basic structure to open and
WHERE col-name IS NOT MISSING
close a file. Additional statements are available and optionable
WHERE col-name IS NULL
based on the file type.
WHERE col-name BETWEEN value1 AND
value2 ODS <destination> < destination specifications>;
WHERE col-name LIKE “value%” /* SAS Code that produces output */
WHERE col-name LIKE “value_” ODS destination CLOSE;

• A subsetting IF statement can be used on any variable that • Additional options to excel files include:
exists in the PDV. (e.g. variables on the input data set and • Adding a style
new variables created) • Adding a worksheet label
• The expression used in the IF statement is written with most ODS EXCEL FILE="filename.xlsx"
of the same operators as a WHERE expression. STYLE=style
OPTIONS(SHEET_NAME='label') ;
/*Implicit output */
/* SAS code that produces output on first
IF expression;
worksheet */
IF expression THEN output;
ODS EXCEL OPTIONS(SHEET_NAME=‘label’);
/* SAS code that produces output on second
/* Explicit output to specific table*/
worksheet */
IF expression THEN output libref.output-dataset-name;
ODS EXCEL CLOSE;

MACRO variables • PDF outputs can include a Table of Contents (PDFTOC) and
Procedure labels in the bookmarks.
• A macro variable stores a value that can be submitted into a
ODS PDF FILE="filename.xlsx"
SAS program
STYLE=style
• If a macro variable is referenced inside quotation marks,
STARTPAGE = NO PDFTOC= 1;
then double quotation marks must be used
ODS PROCLABEL “label”;
• Assign a value to a macro variable using a %LET statement
/* SAS code that produces output */
• The ampersand “&” must be used when calling a macro
ODS PDF CLOSE;
variable. The & triggers the macro facility

%LET macro-variable = value; Additional Information


WHERE numvar = &macrovar;
WHERE charvar = “&macrovar”; • For more information on SAS programming techniques,
visit go.documentation.sas.com

Copyright © 2022 SAS Institute Inc. Cary, NC, USA. All rights reserved

You might also like