SAS Programming
No part of this book should be referenced or copied without the prior permission of the
company.
3. Import and Export in SAS and Associations between Two Variables
SAS is an integrated system of software solutions that enables you to perform the following tasks:
Moreover, the SAS language is user-friendly and can be used on other platforms, such as Mainframe and Statistica (the latest version is 10). We can therefore write Base SAS code at the back end while the interface or front end is Statistica or Mainframe output. SPSS and Statistica code tends to be cumbersome, whereas SAS code is user-friendly. Oracle or Excel can also be used in analytics, but the advantage of SAS is that a short keyword can generate extensive output, so it offers more flexibility than Oracle. SQL can also be written directly in SAS through the SQL procedure (PROC SQL).
Log Window
Temporary Library
It is the temporary storage location of a SAS data file. Files stored here last only for the current SAS session. Work is the temporary library in SAS. When the session ends, the data files stored in the temporary library are automatically deleted.
Data employee;
Set local.emp;
Run;
In the above code, the employee data set is stored in the temporary library Work.
Permanent Library:
It is the permanent storage location of data files. Data sets stored in any permanent SAS library are available for use in subsequent SAS sessions. A data set stored in a permanent library remains there until we delete it physically. To store files permanently in a SAS data library, specify a library name other than the default library name Work.
Three Permanent Libraries provided by SAS are:
Local
SASuser
SAShelp
Here, Day1 is a library reference name (libref). libname is the keyword that assigns the libref Day1 to the folder called Orange Tree in the specified path:
C:\Documents and Settings\admin\Desktop\Orange Tree
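The LIBNAME statement being described is not shown in the text. A minimal sketch of what it would look like, assuming the folder exists at the path quoted above:

```sas
/* Assign the libref Day1 to the Orange Tree folder.            */
/* The path is the one quoted in the text; adjust it as needed. */
libname Day1 'C:\Documents and Settings\admin\Desktop\Orange Tree';
```

After this statement runs, any data set written as Day1.something is stored in that folder and remains available in later sessions.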
Columns in SAS
Rows in SAS
A two-level name is used to reference a permanent SAS file in SAS programs.
There are two parts in a two-level name:
1. Libref name
2. Filename
Libref is the name of the SAS data library that contains the file.
Filename is the name of the file itself.
A period separates the libref and filename. Example: Clinic.Admit is the two-level name for the SAS data set Admit, which is stored in the library named Clinic.
One-Level name
A one-level name (the filename only) can be used to reference a file in a temporary SAS library.
When a one-level name is used, the default libref Work is assumed. Example: the one-level name Test references the SAS data set named Test that is stored in the temporary SAS library Work.
Data Step
DATA steps typically create or modify SAS data sets, and they can also be used to produce custom-designed reports. For example, a DATA step can:
Compute values
Produce new SAS data sets by subsetting, merging, and updating existing data sets
Proc Step
PROC steps are pre-written routines that enable us to analyze and process the data in a SAS data set and to present the data in the form of a report. PROC steps sometimes create new SAS data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data.
DATA <dataset name>;
INPUT <variablename1> [$] <variablename2> [$];
DATALINES;
<data values>
;
RUN;
After typing in the data values, add a semicolon on its own line to indicate the end of the data values.
Data Day1.employee;
Length city$10 ;
Input City$ Id$ Sal Doj;
InFormat sal dollar10. doj ddmmyy10. ;
Format sal dollar10. doj ddmmyy10. ;
datalines;
Bangalore T101 $20,000 19/09/1979
Delhi T101 $23,000 13/01/1983
Kolkata Y109 $24,000 12/09/2001
Chennai I111 $29,000 10/10/2010
;run;
In the above code we are creating a new data set, employee. The Length statement is used to increase the length of the variable (column heading) city, as the default length is 8. If the length is not increased, the value Bangalore (9 characters) is truncated and its final letter e will not appear in the employee data set.
Input is the keyword to declare the column headings, i.e. city, id, sal and doj. The dollar ($) sign is used with the variables city and id because they are character variables.
Informat is the keyword to read values with special characters such as $, / and commas. Format is the keyword to write them in the proper format, for example salary as $23,000.
Datalines is the keyword to declare the observations under the variables.
Now suppose we want to add a label (descriptive text) to the variable sal for better understanding. We can also change the name of a variable permanently with the rename keyword.
Below is the code for label and rename.
Data Day1.employee;
Length city$10 ;
Input City$ Id$ Sal Doj;
Informat sal dollar10. doj ddmmyy10. ;
Format sal dollar10. doj ddmmyy10. ;
label sal="salary of the employees";
Rename doj = date_of_joining;
datalines;
Bangalore T101 $20,000 19/09/1979
Delhi T101 $23,000 13/01/1983
Kolkata Y109 $24,000 12/09/2001
;run;
Proc Contents
Proc contents lists the structure of the specified SAS data set. The information includes
the names and types (numeric or character) of the variables in the data set. The most
common form of usage is
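The usage example is missing from the text. A minimal sketch, assuming the SASHELP.CLASS sample data set that ships with SAS (substitute your own libref and data set name):

```sas
/* List the structure (variable names, types, lengths) of a data set */
proc contents data=sashelp.class;
run;
```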
Proc Print
The PRINT procedure prints the observations in a SAS data set, using all or some of the variables. You can create a variety of reports ranging from a simple listing to a highly customized report that groups the data and calculates totals and subtotals for numeric variables.
This example selects two variables for the report. VAR is the keyword that selects the variables to include in the report and the order in which they appear.
ID identifies each observation by the values of the listed variables instead of by observation number.
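The example itself is not reproduced in the text. A minimal sketch of a report that selects two variables, assuming the SASHELP.CLASS sample data set (its variables Name, Age and Height are used here purely for illustration):

```sas
/* Print two selected variables, identifying each row by Name */
proc print data=sashelp.class;
   id name;          /* shown in place of the Obs column      */
   var age height;   /* variables to print, in this order     */
run;
```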
Points to remember
Syntax rules
SAS statements are not case sensitive. You may use upper or lower case.
When you refer to an external file name, it is case-sensitive (interaction with the UNIX operating system).
Commands can extend over several lines as long as words are not split.
SAS name rules (for data sets and variables): up to 32 characters long; must start with a letter or an underscore (_). Avoid special characters.
Uses of ODS
ODS can also create output in a variety of formats, such as HTML, PDF and RTF.
As stated earlier, it can also create output data sets, which can generally also be created within most SAS procedures.
Example:
RTF Files
This creates prettier output that can be read by MS Word and other word processing
programs. The general syntax is:
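The syntax is missing from the text. A minimal sketch of the usual open/close pattern, assuming the SASHELP.CLASS sample data set and a hypothetical output file name:

```sas
/* Route procedure output to an RTF file (file name is hypothetical) */
ods rtf file='class_report.rtf';

proc print data=sashelp.class;
run;

/* Close the destination so the file is written and released */
ods rtf close;
```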
PDF Files
This creates prettier output that can be read by Adobe Acrobat Reader. One caveat is that you need the Adobe Acrobat Distiller. The general syntax is:
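The syntax is missing from the text. A minimal sketch, mirroring the RTF pattern and again assuming the SASHELP.CLASS sample data set and a hypothetical file name:

```sas
/* Route procedure output to a PDF file (file name is hypothetical) */
ods pdf file='class_report.pdf';

proc print data=sashelp.class;
run;

/* Close the destination so the PDF is finalized */
ods pdf close;
```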
Restrictions on observations
You can control how many records you process or print when testing your programs. If you just want to print a few observations in a data set or test a new program, you can limit the observations with the OBS= data set option.
The (obs=50) option limits printing to the first 50 observations in the data set. OBS specifies the last observation of the SAS data set that will be printed. It does NOT specify how many observations should be printed. If you want to begin printing at the 20th observation and end at the 50th observation, you can use the firstobs option:
Proc print data=day1.class (firstobs = 20 obs = 50);
Run;
On a SET statement, (obs=50) limits processing to the first 50 observations when creating a data set such as day1.class1 from day1.class. You can also use the firstobs option to begin processing at a place other than the first observation. If you combine firstobs with the obs option, remember that obs tells SAS the LAST observation to process, not how many observations to process, so be sure that obs is greater than firstobs (firstobs = 1000 obs = 1500).
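As a sketch of the same options in a DATA step, assuming the SASHELP.CLASS sample data set (19 observations) and a hypothetical output name:

```sas
/* Read observations 5 through 10 only: obs= is the LAST      */
/* observation read, not a count, so six rows are kept.       */
data work.class_subset;
   set sashelp.class (firstobs=5 obs=10);
run;
```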
Points To Remember
Valid in: DATA step and PROC steps
Category: Observation Control
Default: MAX
Restriction: Use with input data sets only
Restriction: Cannot use with PROC SQL views
Restrictions On Variables
Within the SAS System, the DROP|KEEP concept is used to control the availability of variables within a SAS step, or to control which variables are written to SAS data files. Both words ("KEEP" and "DROP") are part of the same concept, and wherever the syntax of the SAS System allows the use of one of them, it allows the other as well.
The KEEP concept specifies which variables to make available or select. When the KEEP concept is employed, only variables explicitly mentioned on the list following the word "KEEP" are available/selected.
The DROP concept, on the other hand, specifies which variables not to have available/selected. When the DROP concept is employed, those variables not explicitly mentioned on the list following the word "DROP" are the only ones available/selected.
KEEP and DROP statements are used often to control the number of variables (fields)
read into and output into the datasets. During the data processing we create several
variables but need to save only select ones in the final dataset.
If you want to restrict the number of columns in output data set, use the following
method. This will ensure that output dataset is created with required variables only.
Data day1.class1;
Set day1.class;
Keep var1 var2 var3;
Run;
If you are reading a big dataset into SAS and require only a few variables from it, use
the following statements in the program.
Data day1.class1 ;
Set day1.class (keep = var1 var2 var3);
Run;
In the first case, SAS reads the entire data set class, even though you only intend to use three variables. In the second case, SAS reads from disk only the three variables you intend to keep. Please note that we have to use such efficient methods to restrict the data read into the system in order to optimize system resources such as SASWORK and shared drives. In the same way, DROP can be specified based on the data requirement of the user.
Conditional statements
Conditional statements are used to restrict the output based on certain conditions.
1) Where
2) If else
3) When
Use a WHERE statement to select observations that meet a particular condition from a SAS data set. The WHERE statement subsets the input data by specifying certain conditions that each observation must meet before it is available for processing. The condition that you define in a WHERE statement is an arithmetic or logical expression that generally consists of a sequence of operands and operators. To compare character values, you must enclose them in single or double quotation marks, and the values must match exactly, including capitalization. Using the WHERE statement might improve the efficiency of your SAS programs because SAS is not required to read all the observations in the input data set. Note: You can use only one WHERE statement in a Data or a Proc step.
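A minimal sketch of a WHERE statement in a PROC step, assuming the SASHELP.CLASS sample data set, where Sex is a character variable and Age is numeric:

```sas
/* Print only the observations that satisfy both conditions.   */
/* The character value is quoted and must match case exactly.  */
proc print data=sashelp.class;
   where sex = 'F' and age >= 13;
run;
```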
Comparison Operators
Logical Operators
Logical operators, also called Boolean operators, are usually used in expressions to link
sequences of comparisons. The logical operators are shown below:
Contains Operator: The CONTAINS or question mark (?) operator selects observations
that include the string specified in the WHERE expression. This operator is available for
character variables only. The position of the string in the variable does not matter;
however, the operator distinguishes between uppercase and lowercase characters
when making comparisons. The following examples select observations containing the
values Mobay and Brisbayne for the variable COMPANY, but they do not select the
observation containing Choco:
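The examples are missing from the text. A minimal sketch using a small inline data set (work.companies is a hypothetical name) that holds the three COMPANY values mentioned above:

```sas
/* Build a tiny test data set with the three company names */
data work.companies;
   length company $12;
   input company $;
   datalines;
Mobay
Brisbayne
Choco
;
run;

/* Both forms select Mobay and Brisbayne but not Choco */
proc print data=work.companies;
   where company contains 'bay';
run;

proc print data=work.companies;
   where company ? 'bay';
run;
```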
Examples:
where salary between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;
You can combine the NOT operator with the BETWEEN-AND operator to select values that fall outside the range.
Like Operator: The LIKE operator selects observations by comparing the values of a character variable to a specified pattern, which is referred to as pattern matching. The LIKE operator is case sensitive. There are two special characters available for specifying a pattern:
Percent sign (%): specifies that any number of characters can occupy that position. The following WHERE expression selects all employees with a name that starts with the letter N. The names can be of any length.
where lastname like 'N%';
Underscore (_): matches just one character in the value for each underscore
character.
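A short sketch of both special characters, assuming the SASHELP.CLASS sample data set and its character variable Name:

```sas
/* Percent sign: any name that starts with J, of any length */
proc print data=sashelp.class;
   where name like 'J%';
run;

/* Underscore: exactly four characters, J, any one character, */
/* n, then any one character                                  */
proc print data=sashelp.class;
   where name like 'J_n_';
run;
```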
However, the easier way would be to use the IN operator, which says you want any
state in the list: where state in ('NC','TX');
In addition, you can use the NOT logical operator to exclude a list. For example, where
state not in ('CA', 'TN', 'MA');
AND OPERATOR: If both conditions linked by the AND are true, then the expression is
true. If either condition linked by the AND is false then the entire statement is false.
The above code will extract only brand Nestle with calories greater than 200.
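The code this sentence refers to is not reproduced in the text. A minimal sketch of what it might look like, assuming a data set day1.candy with a character variable brand and a numeric variable calories (the output name work.candy_nestle is hypothetical):

```sas
/* Keep only Nestle products with more than 200 calories. */
/* Both conditions must be true for a row to be selected. */
data work.candy_nestle;
   set day1.candy;
   where brand = 'Nestle' and calories > 200;
run;
```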
SAS evaluates the expression in an IF-THEN statement to produce a result that is either nonzero, zero, or missing. A nonzero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.
If the conditions that are specified in the IF clause are met, the IF-THEN statement executes a SAS statement for observations that are read from a SAS data set, for records in an external file, or for computed values. An optional ELSE statement gives an alternative action if the THEN clause is not executed. The ELSE statement, if used, must immediately follow the IF-THEN statement.
Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements. Using IF-THEN statements with the ELSE statement causes SAS to execute IF-THEN statements until it encounters the first true statement. Subsequent IF-THEN statements are not evaluated.
Also known as conditional statements, these are very important when subsetting data
or processing observations conditionally. The ELSE part is not required as we have seen
earlier with statements like
if x < 0 then delete;
if name='Smith';
In the form IF <condition> THEN <statement1> ELSE <statement2> SAS evaluates for
each observation the logical condition. If the condition is true, it executes statement1,
if it is false statement2. Important to note is that only a single statement follows the
THEN and ELSE clause. Example:
Data day1.cars_new;
Set day1.cars;
If country='USA' then status='New Car';
Else if country='Japan' then status='Old Car';
Else status='Others';
Run;
In the above code we read each observation, create a new variable status, and assign its value conditionally according to the country.
WHEN statement
Specifies one or more expressions that are evaluated and compared with the value of the expression in the SELECT statement.
If a WHEN expression is found that is equal to that value, the evaluation of the remaining WHEN statements stops and the statement associated with the matching WHEN is executed. If no such expression is found, the statement in the OTHERWISE statement is executed.
OTHERWISE statement
Specifies the statement to be executed when every test of the preceding WHEN statements fails. If the OTHERWISE statement is omitted and no WHEN condition is met, SAS issues an error message and stops executing the DATA step.
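A minimal sketch of a SELECT group in a DATA step, assuming the SASHELP.CLASS sample data set; the variable age_group, its labels and the output name are hypothetical:

```sas
/* Assign a group label based on the value of Age */
data work.class_groups;
   set sashelp.class;
   length age_group $8;
   select (age);
      when (11, 12)     age_group = 'Pre-teen';
      when (13, 14, 15) age_group = 'Teen';
      otherwise         age_group = 'Older';   /* all other ages */
   end;
run;
```

Including OTHERWISE means that an unexpected value of Age does not stop the step with an error.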
In the above code, the data set candy_sales_summary is split into two, i.e. day1.choco and day1.nuts. All observations in the category Candy are written to day1.choco and all other categories go to day1.nuts. In effect, the IF statement is used to subset the candy_sales_summary data set.
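The code being described is not reproduced in the text. A minimal sketch consistent with the description, assuming day1.candy_sales_summary has a character variable category:

```sas
/* Name both output data sets on the DATA statement, then */
/* route each observation with an explicit OUTPUT.        */
data day1.choco day1.nuts;
   set day1.candy_sales_summary;
   if category = 'Candy' then output day1.choco;
   else output day1.nuts;
run;
```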
In the above code we arranged the candy data set in ascending order by the variable calories. The sorted data is stored in the candy_sorted data set because we used the out keyword; without it, the original data set would be replaced by the sorted version.
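The sort step itself is missing from the text. A minimal sketch consistent with the description, assuming day1.candy has a numeric variable calories:

```sas
/* Sort by calories (ascending is the default) and write  */
/* the result to a separate data set named candy_sorted.  */
proc sort data=day1.candy out=candy_sorted;
   by calories;
run;
```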
The NODUP option checks for and eliminates duplicate observations. If you specify this
option, PROC SORT compares all variable values for each observation to those for the
previous observation that was written to the output data set. If an exact match is
found, the observation is not written to the output data set. The NODUPKEY option
checks for and eliminates observations with duplicate BY variable values. If you specify
this option, PROC SORT compares all BY variable values for each observation to those
for the previous observation written to the output data set. If an exact match using the
BY variable values is found, the observation is not written to the output data set.
Notice that with the NODUPKEY option, PROC SORT compares all BY variable values, while the NODUP option compares all the variables in the data set that is being sorted. An easy way to remember the difference between these options is to keep in mind the word key in NODUPKEY. It evaluates the key, or BY variable values, that you specify. One thing to beware of with both options is that they both compare against the previous observation written to the output data set. So, if the observations that you want eliminated are not adjacent in the data set after the sort, they will not be eliminated.
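A small sketch contrasting the two options on a hypothetical inline data set (work.visits and the output names are illustrative). It uses NODUPRECS, the full name of the option for which NODUP is an alias:

```sas
/* Four rows: one exact duplicate, three rows sharing id A1 */
data work.visits;
   input id $ visit;
   datalines;
A1 1
A1 1
A1 2
B2 1
;
run;

/* Exact duplicate rows removed: 3 rows remain.              */
/* Sorting by every variable makes duplicates adjacent.      */
proc sort data=work.visits out=work.no_exact_dups noduprecs;
   by id visit;
run;

/* One row per BY value: 2 rows remain (first A1, then B2)   */
proc sort data=work.visits out=work.one_per_id nodupkey;
   by id;
run;
```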
Proc Import
The IMPORT procedure reads data from an external data source and writes it to a SAS
data set. External data sources can include:
Excel
CSV
TXT
Delimited files contain columns of data values that are separated by a delimiter, such
as a blank or a comma.
IMPORT Procedure
The syntax for the IMPORT procedure is shown here briefly but is described in detail in the SAS Procedures Guide.
This code works for any delimited file, whether the delimiter is a semicolon, space, underscore, etc.
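The code is missing from the text. A minimal sketch for a semicolon-delimited file; the file path and the output data set name are hypothetical:

```sas
/* Import a delimited text file into a SAS data set */
proc import datafile='C:\data\candy.txt'   /* hypothetical path */
            out=work.candy_raw
            dbms=dlm
            replace;
   delimiter=';';     /* change to ' ', '_' and so on as needed */
   getnames=yes;      /* first row holds the variable names     */
run;
```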
EXPORT Procedure
The syntax for the EXPORT procedure is shown here briefly but is described in detail in the SAS Procedures Guide.
The EXPORT procedure reads data from a SAS data set and exports it to an external data source. PROC EXPORT also controls the results with options and statements that are specific to the output data source.
The following example exports a SAS data set named MYFILE.CLASS and creates a delimited external file called CLASS. Notice that the DELIMITER= statement specifies the ampersand (&) delimiter to separate the columns in the new file.
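The example is missing from the text. A minimal sketch consistent with the description, assuming the libref MYFILE is already assigned; the output path is hypothetical:

```sas
/* Export MYFILE.CLASS to an ampersand-delimited file */
proc export data=myfile.class
            outfile='C:\data\class'   /* hypothetical path */
            dbms=dlm
            replace;
   delimiter='&';
run;
```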
Proc Freq
The FREQ procedure produces one-way to n-way frequency and contingency (cross tabulation) tables. For two-way tables, PROC FREQ computes tests and measures of association. For n-way tables, PROC FREQ provides stratified analysis by computing statistics across, as well as within, strata.
For one-way frequency tables, PROC FREQ computes goodness-of-fit tests for equal proportions or specified null proportions. For one-way tables, PROC FREQ also provides confidence limits and tests for binomial proportions, including tests for noninferiority and equivalence. For contingency tables, PROC FREQ can compute various statistics to examine the relationships between two classification variables. For some pairs of variables, you might want to examine the existence or strength of any association between the variables. To determine if an association exists, chi-square tests are computed.
Example:
Proc freq data =day1.candy;
Tables brand;
Run;
In a PROC FREQ step, the TABLES statement specifies the frequency and cross tabulation tables to produce. A request is composed of one variable name or several variable names that are separated by asterisks. To request a one-way frequency table, use a single variable. To request a two-way cross tabulation table, use an asterisk between two variables. To request a multi-way table (an n-way table, where n>2), separate the desired variables with asterisks. The unique values of these variables form rows, columns, and strata of the table.
Example:
Proc freq data =day1.candy_sales_summary;
Tables category * subcategory;
Run;
The TABLES statement requests one-way to n-way frequency and cross tabulation tables and statistics for those tables. If you omit the TABLES statement, PROC FREQ generates one-way frequency tables for all data set variables that are not listed in the other statements. The above code generates a two-way frequency distribution for the variables category and subcategory. In the output table we get the frequency, percent, row percent and column percent. The upper left-hand corner of the table contains a legend of what the numbers inside each cell represent.
List - prints two-way to n-way tables in a list format rather than as cross tabulation tables.
Proc Format
Reports contain values of different types. The values must be formatted so that they are easy to understand and properly presented. Formats are applied to the values after the data is read from the data set and before it is sent to the print destination. The FORMAT procedure allows formatting of numeric and non-numeric values by using different statements after the PROC FORMAT statement in a PROC step.
Example:
Proc format;
Value $gender 'M'='Male'
'F'='Female';
Run;
The output above is what we get if we do not invoke the user-defined format, i.e. gender.
In the above code we called the user-defined format gender. That is why the observations under the sex variable are shown as Male and Female.
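The code that applies the format is not reproduced in the text. A minimal sketch, assuming the SASHELP.CLASS sample data set, whose Sex variable holds the values M and F:

```sas
/* Define the character format */
proc format;
   value $gender 'M' = 'Male'
                 'F' = 'Female';
run;

/* Invoke it with a FORMAT statement; note the trailing period */
proc print data=sashelp.class;
   var name sex;
   format sex $gender.;
run;
```

Removing the FORMAT statement prints the stored values M and F instead.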
The following procedures generate descriptive statistics for numeric variables:
Proc Means
Proc Summary
Proc univariate
Proc Tabulate
Proc means
The MEANS procedure provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations. For example, PROC MEANS calculates descriptive statistics based on moments.
By default, PROC MEANS displays output. You can also use the OUTPUT statement to store the statistics in a SAS data set. By default the MEANS procedure generates five descriptive statistics: N (the number of nonmissing values), mean, standard deviation, minimum value, and maximum value. The syntax of the PROC MEANS statement is:
Example:
Proc means data=day1.candy_sales_summary;
Var Sale_amount;
Run;
OUTPUT OUT = <dataset name>: the statistics are written to a SAS data set.
A few examples:
Proc means data = day1.candy_sales_summary;
Class category subcategory;
Var sale_amount;
Run;
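A minimal sketch of the OUTPUT statement, assuming the SASHELP.CLASS sample data set; the output data set and the statistic variable names are hypothetical:

```sas
/* Store the mean and standard deviation of Height by Sex in */
/* a data set instead of printing them (NOPRINT).            */
proc means data=sashelp.class noprint;
   class sex;
   var height;
   output out=work.height_stats mean=mean_height std=sd_height;
run;
```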
Proc Summary
The SUMMARY procedure allows the user to obtain statistical analyses on data from a permanent or temporary (Work) SAS data set. The purpose of the procedure may be summarized as follows:
To produce a SAS data set for use with subsequent data steps or procedures.
The VAR statement lists the numeric variables for which summary statistics are desired; without it, PROC SUMMARY produces only a count of observations. Because PROC SUMMARY does not print results by default, an OUTPUT statement is needed to capture them. Here the OUTPUT OUT= option creates a new data set named candy_summary. Whenever we create an output data set from a PROC SUMMARY or PROC MEANS step, we get two additional variables, _TYPE_ and _FREQ_. _TYPE_ identifies the combination of CLASS variables used for each summary row, and _FREQ_ holds the number of observations contributing to it.
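The step being described is not reproduced in the text. A minimal sketch, assuming day1.candy_sales_summary has a character variable category and a numeric variable sale_amount; the statistic name total_sales is hypothetical:

```sas
/* Summarize sales by category into the data set candy_summary */
proc summary data=day1.candy_sales_summary;
   class category;
   var sale_amount;
   output out=candy_summary sum=total_sales;
run;

/* Inspect the result, including _TYPE_ and _FREQ_ */
proc print data=candy_summary;
run;
```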
Proc Univariate
Median value
Mode value
Percentiles as follows-
1,5,10,25,50,75,90,95,98,99
Quartiles
Range
Mean
Variance
Standard Deviation
Minimum Value
Maximum value
Sum of Squares
Skewness
Kurtosis
Number of observations equal to zero, less than zero and greater than zero
The above code will generate all descriptive statistics for the numeric variable Height.
Proc univariate data=day1.class;
Var height;
Histogram Height;
Run;
The above code generates the descriptive statistics for the numeric variable height. In addition, it produces a histogram for the variable height. You can also use the following statements to request plots:
the CLASS statement together with any of these plot statements for creating comparative plots
Proc Tabulate
The TABULATE procedure in SAS provides a flexible platform to generate tabular reports. The simplest possible table in TABULATE has to have three things: a PROC TABULATE statement, a TABLE statement, and a CLASS or VAR statement. In this example, we will use a VAR statement. Later examples will show the CLASS statement.
The PROC TABULATE statement looks like this:
The second part of the procedure is the TABLE statement. It describes which variables
to use and how to arrange the variables. When there is only one variable, you get a
one-dimensional table.
If you run this code as is, you will get an error message because TABULATE cannot figure out whether the variable HEIGHT is intended as an analysis variable, which is used to compute statistics, or a classification variable, which is used to define categories in the table. In this case, we want to use height as the analysis variable, so it is declared in a VAR statement.
With the VAR statement in place, the code generates the output. It has a single column, with the header HEIGHT to identify the variable, and the header SUM to identify the statistic. There is just a single table cell, which contains the value for the sum of HEIGHT for all of the observations in the data set Class.
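The complete step is not reproduced in the text. A minimal sketch, assuming the SASHELP.CLASS sample data set in place of the document's Class data set:

```sas
/* Simplest table: one analysis variable, default statistic SUM */
proc tabulate data=sashelp.class;
   var height;     /* declares HEIGHT as an analysis variable */
   table height;   /* one-dimensional table, a single cell    */
run;
```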
To specify the statistic for a PROC TABULATE table, you modify the TABLE statement.
You list the statistic right after the variable name. To tell TABULATE that the statistic
MEAN should be applied to the variable HEIGHT, you use an asterisk to link the variable
name to the statistic keyword. The asterisk is a TABULATE operator. Just as you use an
asterisk as an operator when you want to multiply 2 by 3 (2*3), you use an asterisk
when you want to apply a statistic to a variable.
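A sketch of the modified TABLE statement, again assuming the SASHELP.CLASS sample data set:

```sas
/* Request the mean of HEIGHT instead of the default sum */
proc tabulate data=sashelp.class;
   var height;
   table height*mean;   /* asterisk links variable and statistic */
run;
```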
The output with the new statistic is shown below. Note that the variable name at the top of the column heading has remained unchanged. However, the statistic name that is shown in the second line of the heading now says Mean. In addition, the value shown in the table cell has changed from the sum to the mean.
The resulting table is shown below. Now the column headings have changed. The variable name Height and the statistic name Mean are still there, but under the statistic label there are now two columns. Each column is headed by the variable label SEX and the category Female or Male. The values shown in the table cells now represent subgroup means.
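The code that produces the subgroup table is not reproduced in the text. A minimal sketch that would give this layout, assuming the SASHELP.CLASS sample data set (the Female and Male headings would additionally require a format on Sex, which stores the values F and M):

```sas
/* Add a classification variable to split the mean by Sex */
proc tabulate data=sashelp.class;
   class sex;                 /* defines the categories        */
   var height;                /* analysis variable             */
   table height*mean*sex;     /* one column per value of Sex   */
run;
```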
The SAS System Software provides a wealth of tools for users who need to work with data collected in the time domain. These tools include functions which create a SAS date, time or datetime variable from either raw data or from variables in an existing SAS data set.
Other functions extract parts from a SAS date variable, such as the month, day or week, or year.
A second set of tools, SAS date/time formats, modify the external representation of a SAS date or time variable. As with other SAS System formats, a date, time or datetime format displays the values of the variable according to a specified width and form. Use of date, time or datetime formats is essential when creating applications or programs in the SAS System portraying the values of variables collected in time. SAS date/time informats are able to convert raw data into a date, time or datetime variable. They read fields (i.e., variables) in either raw data files or SAS data sets.
Extracting parts from a SAS Date Variable
Several SAS functions are available to obtain information about the values of a SAS date variable. These include:
Data day1.candy_date;
Set day1.candy_sales_summary;
format Sysdate ddmmyy10.;
Sysdate=today();
Day=day(date);
Month=month(date);
Year=year(date);
Quarter=qtr(date);
Week_day=weekday(date);
Week_num=week(date);
Run;
A common application of SAS System date and time capabilities is to determine how
long a period has elapsed between two points in time. This can be accomplished by
one of two methods:
arithmetic operation (usually subtraction and/or division) between two SAS date,
time or datetime variables or between a SAS date, time, or datetime variable and
a constant term
INTCK Function
A popular and powerful SAS function, INTCK, is available to determine the number of time periods which have been crossed between two SAS date, time or datetime variables. The form of this function is: INTCK(interval,from,to)
Where:
interval = character constant or variable name representing the time period of interest, enclosed in single quotes
from = SAS date, time or datetime value identifying the start of a time span
to = SAS date, time or datetime value identifying the end of the time span
Data day1.candy_date1;
Set day1.candy_sales_summary;
Year=intck('Year', date, today());
Run;
The above code will calculate the interval between two date values (date and the current date) in terms of years. The result is always an integer, because INTCK counts the number of year boundaries crossed.
'ACT/ACT'
uses the actual number of days between dates in calculating the number of years. SAS calculates this value as the number of days that fall in 365-day years divided by 365 plus the number of days that fall in 366-day years divided by 366.
'ACT/360'
uses the actual number of days between dates in calculating the number of years. SAS calculates this value as the number of days divided by 360, regardless of the actual number of days in each year.
'ACT/365'
uses the actual number of days between dates in calculating the number of years. SAS calculates this value as the number of days divided by 365, regardless of the actual number of days in each year.
Datdif - The DATDIF function returns the number of days between two dates. The arguments required are the start_date, end_date and basis. The end_date specifies the date to subtract from (i.e. the more recent of the two dates) and the start_date specifies the date to be subtracted (i.e. the less recent of the two dates). The basis is a character constant or variable that specifies the number of days in a month and year that SAS should assume when calculating the difference. 'ACT/ACT' uses the actual number of days and has the alias 'Actual'. '30/360' assumes a 30-day month and a 360-day year.
Data day1.candy_date2;
Set day1.candy_sales_summary;
format Sysdate ddmmyy10. ;
Sysdate=today();
Day_diff=datdif(date,sysdate,'ACT/ACT');
Year_Diff=yrdif(date,sysdate,'ACT/ACT');
Run;
MDY Function
The MDY function takes three numeric constants, variables or expressions representing the month, day and year, and returns a date value made up of the month, day and year supplied as arguments. The month argument has to lie in the range 1-12, and the day argument in the range 1-31. The year argument can have 2 or 4 digits; in the first case, the century is determined by the YEARCUTOFF option.
In the above code the date value is spread across three columns, the month variable, day variable and year variable, and we join them using MDY to get the complete date value under one column, i.e. New_date.
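The code being described is not reproduced in the text. A minimal sketch using a small inline data set (work.dates is a hypothetical name; the two rows reuse dates from the employee example earlier in the document):

```sas
/* Combine separate month, day and year columns into one date */
data work.dates;
   input month day year;
   new_date = mdy(month, day, year);
   format new_date ddmmyy10.;   /* display as dd/mm/yyyy */
   datalines;
9 19 1979
1 13 1983
;
run;
```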
The DATEPART( ) and TIMEPART( ) functions are used to extract the date and time values, respectively, from a SAS datetime value. You provide the datetime stamp as the argument, and the function returns the desired part of it.
SYNTAX:
DATEPART(sasdate_time_value);
TIMEPART(sasdate_time_value);
data temp;
date_time = "19DEC2010:20:10:10"dt;
date_part = datepart(date_time);
time_part = timepart(date_time);
run;
UPCASE Function
Purpose: To change all letters to uppercase.
Syntax: UPCASE(character-value)
character-value is any SAS character expression
LOWCASE Function
To change all letters to lowercase.
Syntax: LOWCASE(character-value)
character-value is any SAS character expression.
Note: The corresponding function UPCASE changes lowercase to uppercase.
Data day1.crime_new;
Set day1.crime;
State_up=upcase(Staten);
State_low=lowcase(staten);
State_prop=propcase(staten);
Run;
In the above code the case of the variable Staten is changed from proper case to uppercase and stored under a new variable named State_up. The other functions work in the same way.
Find Function:
Data day1.candy_find;
Set day1.candy;
Position=find(Product,'Chocolate','I');
Run;
This code checks whether the variable product contains the string chocolate and, if so, at what position. It returns a numeric value giving the position at which chocolate first appears, or 0 if it is not found. Here the search ignores case because of the modifier I.
Index Function
Data day1.candy_index;
Set day1.candy_sales_summary;
Position=index(product,'Chocolate');
Run;
Substr Function
To extract part of a string. When the SUBSTR function is used on the left side of the equal sign, it can place specified characters into an existing string.
Syntax: SUBSTR(character-value, start <,length>)
character-value is any SAS character expression. start is the starting position within the string. length, if specified, is the number of characters to include in the substring. If this argument is omitted, the SUBSTR function will return all the characters from the start position to the end of the string.
Data day1.candy_cust1;
Set day1.candy_customers;
New_region=substr(region,1,3);
Run;
This code extracts the first three characters of the region variable and stores them
in the new variable New_region.
character-value is any SAS character expression. start is the starting position in a string
where you want to place the new characters. length is the number of characters to
be placed in that string. If length is omitted, all the characters on the right-hand side of
the equal sign replace the characters in character-value.
Data day1.candy_cust2;
Set day1.candy_customers;
Substr(region,1,3)='abc';
Run;
Scan Function
Extracts a specified word from a character expression, where word is defined as the
characters separated by a set of specified delimiters.
character-value is any SAS character expression. n-word is the nth "word" in the string.
If n is greater than the number of words, the SCAN function returns a value that con-
tains no characters. If n is negative, the character value is scanned from right to left.
A value of zero is invalid.
Data day1.candy_cust2;
Set day1.candy_customers;
First=scan(Name,1,' ');
Middle=scan(Name,2,' ');
Last=scan(Name,3,' ');
Run;
This code extracts the character value of the variable Name part by part, using the
blank as the delimiter. For example, suppose the value of Name is bulls eye emporium.
After applying the SCAN function, the first part (bulls) is stored in First, eye is
stored in Middle, and emporium is stored in Last. The string is now divided into three
parts, i.e. First, Middle and Last. To join them back under one common variable (column
heading) we apply the CATX function.
CATX Function
To concatenate (join) two or more character strings, stripping both leading and trailing
blanks and inserting one or more separator characters between the strings.
Data day1.candy_cust3;
Set day1.candy_cust2;
New_name=catx(' ',first,middle,last);
Run;
This code joins the three substrings back into one complete string.
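SCAN and CATX can be tried together in a single step. A minimal sketch, assuming the sample value bulls eye emporium from the text; the negative word number and the hyphen separator are illustrative choices:

```sas
data _null_;
  name   = 'bulls eye emporium';
  first  = scan(name, 1, ' ');        /* first word, counting from the left */
  last   = scan(name, -1, ' ');       /* negative n counts from the right */
  joined = catx('-', first, last);    /* strips blanks, inserts the separator */
  put first= last= joined=;           /* first=bulls last=emporium joined=bulls-emporium */
run;
```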
TRANSLATE can substitute one character for another in a string. TRANWRD is more
flexible: it can substitute a word or several words for one or more words.
The purpose of the TRANSLATE function is to exchange one character value for another.
For example, you might want to change the values 1-5 to the values A-E.
Syntax: TRANSLATE(character-value, to-1, from-1 <, to-n, from-n>) ; character-value is
any SAS character expression. to-n is a single character or a list of character values.
from-n is a single character or a list of characters. Each character listed in from-n is
changed to the corresponding value in to-n. If a character value is not listed in from-n,
it will be unaffected.
DATA MULTIPLE;
INPUT QUES : $1. @@;
QUES = TRANSLATE(QUES,'ABCDE','12345');
DATALINES;
1 4 3 2 5
5 3 4 2 1
;
run;
In this example, we want to convert the character values 1-5 to the letters A-E.
Tranwrd Function
To substitute one or more words in a string with a replacement word or words. It works
like the find and replace feature of most word processors.
Making the analogy to the find and replace feature of most word processors here,
from-string represents the string to find and to-string represents the string to replace.
Notice that the order of from- and to-string in this function is opposite (and more logi-
cal to this author) from the order in the TRANSLATE function.
Data day1.candy_tranwrd;
Set day1.candy_sales_summary;
New=tranwrd(product,'Chocolate','Choco');
Run;
The substring Chocolate in the variable product is replaced by Choco.
Trim Function
To remove trailing blanks from a character value. This is especially useful when you
data trim;
set day1.candy;
oldName=name||brand;
NewName=trim(Name)||trim(Brand);
run;
In this example we join the two variables name and brand under the new column heading
oldName, which keeps the trailing blanks of name between the two values. To remove the
trailing blanks before joining, we use the TRIM function.
Numeric Functions
Int function
The INT function truncates the decimal portion of the value of the argument. The inte-
ger portion of the value of the argument remains. The INT function takes the integer
value of each element of the argument matrix.
Data day1.candy_int;
Set day1.candy_sales_summary;
Saleamount=int(sale_amount);
Run;
This example keeps only the integer part of sale_amount. If sale_amount is Rs 100.92,
INT returns only the integer part (100).
Round Function
DATA _null_;
cost = 4.99;
units = 3;
ucost = Round(cost/units,.01);
PUT cost units ucost ;
RUN;
Data day1.candy_rnd;
Set day1.candy_sales_summary;
Sale=round(sale_amount);
Sale1=round(sale_amount,.1);
Run;
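ROUND rounds its first argument to the nearest multiple of the second argument; when the second argument is omitted, it rounds to the nearest integer. A minimal sketch contrasting it with INT, using the Rs 100.92 amount from the INT example:

```sas
data _null_;
  amount  = 100.92;
  whole   = int(amount);          /* 100: the decimal part is dropped */
  nearest = round(amount);        /* 101: nearest integer */
  tenth   = round(amount, .1);    /* 100.9: nearest multiple of 0.1 */
  put whole= nearest= tenth=;
run;
```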
In the DATA step you can use a number of SAS functions, e.g., MEAN (computes arith-
metic mean), SUM (calculates sum of arguments), VAR (calculates the variance), ABS
(returns absolute value), SIN (calculates sine), LOG (produces the natural logarithm),
SQRT (calculates the square root). For instance, to create a new variable final which
will be the arithmetic mean (average) of the 3 scores (variables: test1, test2, and
test3), you would use the following command: final=MEAN(test1,test2,test3);
The SUM function sums the numeric arguments. The arguments are separated by commas.
Used in a DATA step, it gives the row sum, i.e. the total across the listed variables
within each observation. The MEAN function averages the numeric arguments. The
arguments are separated by commas.
Data day1.crime_1;
Set day1.crime;
Total_crime=sum(var1,var2,var3..);
Avg_crime=mean(var1,var2,var3..);
Max_crime=max(var1,var2,var3..);
Min_crime=min(var1,var2,var3..);
Run;
These functions are used in the DATA step so that we can get the total, average,
maximum and minimum crime for each state. Functions used in a DATA step operate across
the variables of each row, so SUM gives a row sum.
Input Function
data day1.sales2;
set day1.sales1;
sales_amount1=input(sales_amount,comma12.);
format sales_amount1 comma12.;
run;
Put Function
data day1.sales4;
set day1.sales1;
cust_id=put(customer_id,3.);
format cust_id $3.;/* since cust_id is character variable */
run;
This is an example of how to change a numeric variable, customer_id, to a character
variable. The example uses the PUT function to convert numeric data to character data.
The PUT function writes values with a specified format. It takes two arguments: the
name of the numeric variable and a SAS format or user-defined format for writing the
data.
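INPUT works in the opposite direction to PUT: it reads a character value with an informat and returns a numeric value. A minimal sketch of both conversions, with made-up values:

```sas
data _null_;
  amount_text = '1,234,567';                   /* hypothetical character value */
  amount_num  = input(amount_text, comma12.);  /* character -> numeric, commas removed */
  id_num  = 42;                                /* hypothetical numeric ID */
  id_text = put(id_num, 3.);                   /* numeric -> character, width 3 */
  put amount_num= id_text=;
run;
```

The result of PUT is always character, and the result of INPUT takes its type from the informat, which is why a numeric informat such as COMMA12. yields a numeric variable.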
The COLUMN statement is used to identify all variables used in the generation of the
table. This statement is followed by the DEFINE statement, which specifies how the
column is to be used and what its attributes are to be. One DEFINE statement is used
for each variable in the COLUMN statement. The COMPUTE statement is used to start the
definition of a compute block. The compute block is terminated with an ENDCOMP
statement. The compute block has a variety of uses, including the creation of new
columns and the performance of column-specific operations.
The following PROC step shows the code to create a simple REPORT table.
You can see that this report resembles the output from a PROC PRINT, however there
are several distinct differences. These include:
there is no OBS column
variable labels are used instead of column names
it is possible to calculate summary statistics and new columns with PROC REPORT
The COLUMN statement is used not only to identify the variables of interest, but to also
add headers and to group variables. The primary function of the COLUMN statement is
to provide a list of variables for REPORT to operate against, and these variables are
listed in the order (left to right) that they are to appear on the report. In addition to list-
ing the variables, you can do a number of other things on the COLUMN statement as
well. For instance you can use a comma to attach a statistic to a variable. The de-
fault statistic is the SUM.
Much of a report's layout is determined by the usages that you specify for variables in
the DEFINE statements or DEFINITION windows. For data set variables, these usages are:
DISPLAY, ORDER, ACROSS, GROUP, ANALYSIS. A report can contain variables that are not
in the input data set. These variables must have a usage of COMPUTED.
Display Variables
A report that contains one or more display variables has a row for every observation in
the input data set. Display variables do not affect the order of the rows in the report. If
no order variables appear to the left of a display variable, then the order of the rows in
the report reflects the order of the observations in the data set. By default, PROC REPORT
treats all character variables as display variables.
Order Variables
A report that contains one or more order variables has a row for every observation in
the input data set. If no display variable appears to the left of an order variable, then
PROC REPORT orders the detail rows according to the ascending, formatted values of
the order variable.
Across Variables
PROC REPORT creates a column for each value of an across variable. PROC REPORT
orders the columns by the ascending, formatted values of the across variable.
Group Variables
If a report contains one or more group variables, then PROC REPORT tries to consoli-
date into one row all observations from the data set that have a unique combination
of formatted values for all group variables. When PROC REPORT creates groups, it or-
ders the detail rows by the ascending, formatted values of the group variable.
Analysis Variables
An analysis variable is a numeric variable that is used to calculate a statistic for all the
observations represented by a cell of the report. You associate a statistic with an
analysis variable in the variable's definition or in the COLUMN statement. By default, PROC
REPORT uses numeric variables as analysis variables that are used to calculate the Sum
statistic.
Computed Variables
Computed variables are variables that you define for the report. They are not in the
input data set, and PROC REPORT does not add them to the input data set. However,
computed variables are included in an output data set if you create one. In the
windowing environment, you add a computed variable to a report from the COMPUTED
VAR window.
Example:
In this example, the values of the variable sex do not repeat because of the GROUP
keyword. We have also added a label to the variable sex, i.e. Gender. This is the way
to add a label to a variable in PROC REPORT.
In this example we have arranged the output in ascending order of age. This is possible
because of the ORDER keyword.
Example:
In this example we have changed the default SUM statistic to MEAN by naming the
statistic after the ANALYSIS keyword. The report now shows average height and average
weight.
Example:
In this example we compute a new variable, ratio. That is why it is declared in the
COLUMN statement. We then identify ratio as a computed variable in a DEFINE statement.
The ratio is calculated in a compute block by dividing average height by average weight
for each row of the report. The compute block is not a loop; it is a block of statements
that is opened with COMPUTE and must be closed with ENDCOMP.
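The GROUP, ANALYSIS and COMPUTED usages described above can be combined in a single step. A minimal sketch, assuming the SASHELP.CLASS sample table (columns Sex, Height, Weight) that ships with SAS; the labels and formats are illustrative choices, not necessarily those of the original examples:

```sas
proc report data=sashelp.class nowd;
  /* ratio is not in the input data set, so it must be listed here and defined as COMPUTED */
  column sex height weight ratio;
  define sex    / group 'Gender';                              /* one row per value of sex */
  define height / analysis mean format=6.2 'Average Height';   /* MEAN instead of default SUM */
  define weight / analysis mean format=6.2 'Average Weight';
  define ratio  / computed format=6.3 'Height/Weight';
  compute ratio;
    /* analysis variables are referenced as variable.statistic inside a compute block */
    ratio = height.mean / weight.mean;
  endcomp;
run;
```

The computed column is placed last in the COLUMN statement because a compute block can only use columns that appear to its left.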
Append
The APPEND procedure adds the observations from one SAS data set to the end of an-
other SAS data set. PROC APPEND does not process the observations in the first data
set. It adds the observations in the second data set directly to the end of the original
data set.
By appending Data Sets
This is the concatenation of two data sets that already exist.
The observations in the data sets are stacked in the order in which the data sets are
specified in the SET statement.
Syntax:
DATA output-SAS-data-set;
SET SAS-data-set-1 SAS-data-set-2;
RUN;
Where,
output-SAS-data-set names the data
set to be created
SAS-data-set-1 and SAS-data-set-2
specify the data sets to be read
SAS-data-set-1 and SAS-data-set-2
are appended and copied to
output-SAS-data-set
Example:
Data combined;
Set A C;
Run;
Syntax:
Where,
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
SAS-data-set-2 gets appended to SAS-data-set-1
FORCE is an optional keyword that forces SAS to append when the base file is missing
some variables that are present in the data file
Example:
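A minimal sketch of PROC APPEND, assuming two small hypothetical data sets, work.a and work.c, that have the same variables. BASE= names the data set that receives the rows and DATA= names the data set whose rows are added:

```sas
/* Hypothetical data sets for illustration */
data work.a;
  input num name $;
  datalines;
1 Ann
2 Bob
;
run;

data work.c;
  input num name $;
  datalines;
3 Cid
;
run;

/* Add the rows of work.c to the end of work.a; work.a is updated in place */
proc append base=work.a data=work.c;
run;
```

If work.c contained a variable that work.a lacks, the FORCE option would be needed on the PROC APPEND statement, and the extra variable would be dropped.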
A merge combines observations from two or more SAS data sets based on the values
of specified common variables (one or more). It creates a new data set (the merged
data set). Merging is done in a DATA step with the MERGE and BY statements.
Syntax:
DATA output-SAS-data-set;
MERGE SAS-data-set-1 SAS-data-set-2;
BY <DESCENDING> variable(s);
RUN;
Where,
output-SAS-data-set names the data set to be created
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
variable(s) in the BY statement specifies one or more variables whose values are
used to match observations
DESCENDING indicates that the input data sets are sorted in descending order by
the variable that is specified
If there is more than one variable in the BY statement, DESCENDING applies only
to the variable that immediately follows it
Each input data set in the MERGE statement must be sorted in order of the values
of the BY variable(s)
Each BY variable must have the same type in all data sets to be merged
Syntax:
Proc Sort Data = Data-Set-1 [out = Data-Set-2];
By [Descending] Variable1 [Variable2 ];
Run;
Example:
data merged;
merge a b;
by num;
run;
Clinic.Demog
Clinic.Visit
Example: Merging
data clinic.merged;
merge clinic.demog clinic.visit;
by id;
run;
By default, DATA step match-merging combines all observations in all input data sets.
To exclude unmatched observations from output data set, use the IN= data set option
and the subsetting IF statement in DATA step. In this case, use the IN= data set option
to create and name a variable that indicates whether the data set contributed data
to the current observation; the subsetting IF statement to check the IN= values and to
write to the merged data set only those observations that appear in the data sets for
which IN= is specified.
Syntax: (IN=variable)
Where,
the IN= option, in parentheses, follows the data set name
variable names the variable to be created
Within the DATA step, the value of the variable is 1 if the data set contributed data
to the current observation. Otherwise, its value is 0.
Example:
To Match-merge the data sets Clinic.Demog and Clinic.Visit and select only observa-
tions that appear in both data sets :
Use IN= to create two temporary variables, indemog and invisit
The first IN= creates the temporary variable indemog, which is set to 1 when an ob-
servation from Clinic.Demog contributes to the current observation; otherwise, it is
set to 0
Likewise, the value of invisit depends on whether Clinic.Visit contributes to an ob-
servation or not
IF statement is used to select only observations that appear in both Clinic.Demog
and Clinic.Visit
If the condition is met, the new observation is written to Clinic.Merged. Otherwise,
the observation is deleted
data clinic.merged;
merge clinic.demog (in=indemog)
clinic.visit (in=invisit);
by id;
if indemog=1 and invisit=1;
run;
proc print data=clinic.merged;
run;
Here X and Y are the IN= variables of the left and right data sets.
Right Outer Join: If X = 0 and Y = 1. Includes all the non-matching observations
from the right data set
Left Outer Join: If X = 1 and Y = 0. Includes all the non-matching observations
from the left data set
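The IN= conditions can be tried on two small data sets. A minimal sketch, assuming hypothetical data sets demo_left and demo_right that share the BY variable id; it routes matching and non-matching observations to separate output data sets:

```sas
/* Hypothetical input data, already in ascending order of id as MERGE with BY requires */
data demo_left;
  input id name $;
  datalines;
1 Ann
2 Bob
3 Cid
;
run;

data demo_right;
  input id visit $;
  datalines;
2 Jan
3 Feb
4 Mar
;
run;

data both left_only right_only;
  merge demo_left(in=x) demo_right(in=y);
  by id;
  if x=1 and y=1 then output both;              /* ids 2 and 3: present in both */
  else if x=1 and y=0 then output left_only;    /* id 1: only in demo_left */
  else if x=0 and y=1 then output right_only;   /* id 4: only in demo_right */
run;
```

Testing only one flag widens the selection: a condition of x=1 alone keeps every observation from the left data set, matched or not, and y=1 alone does the same for the right data set.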
INFILE/FILE work with other SAS statements to provide extensive data input and output
in the DATA step, such as:
FILENAME
DATALINES
PUT
INPUT
Where, file-specification can take the form fileref to name a previously defined file ref-
erence or 'filename' to point to the actual name and location of the file and options
describes the input file's characteristics and specifies how it is to be read with the INFILE
statement.
Example:
FILENAME test 'c:\irs\personal\refund.dat';
INFILE test obs=100;
Here,
The INFILE statement is used along with the FILENAME statement;
test is the file reference (fileref) that points to the file containing the data;
The OBS= option reads only the first 100 records from the file;
The INFILE statement can also specify the complete path of a file instead of using the
FILENAME statement;
Input Statement:
Describes the fields of raw data to be read and placed into the SAS data set.
Where,
variable is the SAS variable name assigned to the field
($) identifies the variable type as character (if the variable is numeric, then $ is not
specified)
startcol represents the starting column for this variable
endcol represents the ending column for this variable.
Example:
The following code reads data from the file below.
filename exer 'c:\users\exer.dat';
data exercise ;
infile exer ;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;
run ;
Syntax:
Example:
libname libref 'SAS-data-library';
filename exercise 'c:\users\exer.dat';
data exer ;
infile exercise ;
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;
Run ;
Here, LIBNAME creates a library reference, FILENAME references an external file, the
DATA statement names the SAS data set to be created, the INFILE statement identifies
the external file, and the INPUT statement describes the data in the external file.
It can be used to read character variable values that contain embedded blanks.
input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;
The file below contains personnel information for a technical writing department of a
small computer manufacturer. The fields contain values for each employee's last
name, first name, job title, and annual salary. The values for Salary contain commas.
The values for Salary are considered to be nonstandard numeric values. Column input
cannot be used to read these values.
When raw data that is organized into fixed fields is to be read, use:
Column input to read standard data only
Formatted input to read both standard and nonstandard data.
INPUT Statement
Syntax:
Where,
Column pointer-control positions the input pointer on a specified column
variable is the name of the variable that is being created
informat is the special instruction that specifies how SAS reads raw data.
The @n pointer control moves the input pointer to a specific column number. The @ moves
the pointer to column n, which is the first column of the field that is being read.
Where,
variable is the name of the variable that is being created
informat is the special instruction that specifies how SAS reads raw data
Example:
input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9. ;
Here,
The value for FirstName is read first, starting in column 9.
The lastname is read by taking the @ pointer to the 1st column.
The jobtitle and salary are read from column 15 and column 19 respectively.
The +n pointer control moves the input pointer forward to a column number that is
relative to the current position. It moves the pointer forward n columns.
Where,
variable is the name of the variable that is being created
informat is the special instruction that specifies how SAS reads raw data
In order to count correctly, it is important to understand where the column pointer
control is located after each data value is read
Here,
Because the values for LastName begin in column 1, a column pointer control is
not needed
After LastName is read, the pointer moves to column 8
To start reading FirstName, which begins in column 9, move the column pointer
control ahead 1 column with +1
After reading FirstName, the column pointer moves to column 14
The column pointer is moved ahead 5 columns, from column 14 to column 19, to read Salary
The @n column pointer control is then used to return to column 15 to read jobtitle
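The pointer movements listed above can be sketched as a complete step. The INPUT statement follows the description (LastName in columns 1-7, FirstName from column 9, JobTitle from column 15, Salary from column 19); the data set name work.staff and the two data lines are made up for illustration:

```sas
data work.staff;
  /* LastName ends in column 7, +1 skips to column 9, +5 skips from 14 to 19, */
  /* and @15 moves back to read JobTitle                                      */
  input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;
  datalines;
EVANS   DONNY 112 29,996.63
HELMS   LISA  105 18,567.23
;
run;

proc print data=work.staff;
run;
```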
Array
SAS arrays are useful when we wish to perform a similar operation on a set of variables,
e.g.
array weight wt1-wt50;
do i=1 to 50;
if weight{i}=999 then weight{i}=.;
end;
A SAS array is nothing more than a collection of variables (of the same type), in which
each variable can be identified by referring to the array and, by means of an index, to
the location of the variable within the array.
SAS arrays are defined using the ARRAY statement, and are only valid within the data
step in which they are defined. The syntax for the array statement is:
Suppose we have a data set with 10 variables, named x1, x2, ..., x10. Whenever any of
these variables has a value of 9, we wish to replace it with a missing value (.).
data new;
set old;
array x x1-x10;
do i=1 to dim(x);
if x{i} = 9 then x{i} = .;
end;
run;
Using dim(x) instead of a constant (10) eliminates the need to know the size of the ar-
ray. Special variable lists (like first -- last or x:) can be very useful when setting up an ar-
ray.
In this example the array name is zero. The variable list includes x1,x2,x3,x4 and x5. This
array changes all missing values to zeros in the variables named in the array state-
ment.
data original;
input x1-x5;
cards;
9 8 7 8 .
8 7 6 . 9
. 9 7 6 .
;
Run;
Now we modify the data. Here we are assigning some values to the missing data.
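The modification step can be sketched as follows, assuming the array is named zero and covers x1-x5 as described in the text. The output data set name modified and the index variable i are illustrative, and the input data are repeated so that the sketch runs on its own:

```sas
data original;
  input x1-x5;
  datalines;
9 8 7 8 .
8 7 6 . 9
. 9 7 6 .
;
run;

data modified;
  set original;
  array zero{5} x1-x5;               /* groups x1-x5 under one array name */
  do i = 1 to dim(zero);
    if zero{i} = . then zero{i} = 0; /* replace each missing value with 0 */
  end;
  drop i;                            /* the loop index is not needed in the output */
run;
```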
Interpretation
Here we rename the variables in array var, a1-a20, to na1-na20 in array varnew, and
then assign the new variables a value of 0 if the value was missing. If we do not drop
In the above code, the order of the variables in each array is the same, var1-var5.
We need to specify only one array name in the DO OVER loop because each array contains
the same number of elements.
We can write the above code with an indexed array and a DO loop instead. It will give
us the same results.
Structured Query Language (SQL) is a standardized, widely used language that retrieves
and updates data in relational tables and databases. A relation is a mathematical
concept that is similar to the mathematical concept of a set. Relations are represented
physically as two-dimensional tables that are arranged in rows and columns.
The structured query language is now in the public domain and is part of many vendors'
products.
The SELECT statement is the primary tool of PROC SQL . You use it to identify, retrieve
and manipulate columns of data from a table. You can also use several optional
clauses within the SELECT statement to place restrictions on a query.
Proc Sql;
Select * from day1.uscitycoords;
Quit;
The select statement must contain a select clause and a from clause , both of which
are required in a PROC SQL query.
Proc sql;
Select city, state from day1.uscitycoords;
Quit;
You can eliminate duplicate rows from the results by using the DISTINCT keyword in
the SELECT clause. The following query uses the DISTINCT keyword to produce a
single row of output for each continent that is in the unitedstates table.
Proc Sql;
Select distinct continent from day1.unitedstates;
Quit;
Calculating Values
You can perform calculations with values that you retrieve from numeric columns. The
following example converts temperatures in the Worldtemps table from Fahrenheit to
Celsius. By specifying a column alias, you can assign a new name to any column within a
PROC SQL query. The new name must follow the rules for SAS names.
Proc sql;
Select city, (avglow - 32)*5/9 as lowc format 4.1
From day1.worldtemps;
Quit;
When you use a column alias to refer to a calculated value, you must use the CALCULATED
keyword with the alias to inform PROC SQL that the value is calculated within
the query. The following example uses two calculated values, lowc and highc, to
calculate a third value, range:
Proc sql;
Select city, (avghigh - 32)*5/9 as highc format 4.1,
(avglow - 32)*5/9 as lowc format 4.1,
(calculated highc - calculated lowc) as range format 4.1
from day1.worldtemps;
Quit;
You can use conditional logic within a query by using a CASE expression to conditionally
assign a value. You can use a CASE expression anywhere that you can use a column
name. In this example, a CASE expression determines the climate zone for each
city based on the value in the latitude column in the Worldcitycoords table. The query
also assigns an alias of Location to the value. You must close the CASE logic with the
END keyword.
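A minimal sketch of such a CASE expression. The small table work.citycoords stands in for day1.worldcitycoords, and the latitude cut-offs and zone names are assumptions chosen for illustration:

```sas
/* Hypothetical stand-in for day1.worldcitycoords */
data work.citycoords;
  input city :$12. latitude;
  datalines;
Oslo 60
Nairobi -1
Sydney -34
;
run;

proc sql;
  select city, latitude,
         case
           /* WHEN clauses are tested in order; the first true one wins */
           when latitude gt 67  then 'North Frigid'
           when latitude ge 23  then 'North Temperate'
           when latitude gt -23 then 'Torrid'
           when latitude ge -67 then 'South Temperate'
           else 'South Frigid'
         end as Location
    from work.citycoords;
quit;
```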
You can also construct a Case expression by using the case operand form , as in the
following example. This example selects states and assigns them to a region based on
the value of the continent column.
Proc sql;
Select name, continent,
Case continent
When 'North America' then 'Continental U.S.'
When 'Oceania' then 'Pacific Islands'
Else 'None'
End
As region from day1.unitedstates;
Quit;
The COALESCE function enables you to replace missing values in a column with a new
value that you specify. For every row that the query processes, the coalesce function
checks each of its arguments until it finds a non missing value, then returns that value.
The following query replaces missing values in the high point column in the Continents
table with the words Not available, area with 0 and depth with its mean value.
Proc sql;
Select name,
coalesce(highpoint, 'Not Available') as highc,
coalesce(Area, 0) as newarea ,
coalesce (depth,mean(depth)) as newdepth
from day1.continents;
Quit;
The following CASE expression shows another way to perform the same replacement
of missing values. However , the COALESCE function requires fewer lines of code to ob-
tain the same results.
Proc sql;
Select name,
Case
When highpoint is missing then 'Not Available'
Else highpoint
End as highc
From day1.continents;
Quit;
Sorting Data
You can sort query results with an ORDER BY clause by specifying any of the columns in
the table, including unselected or calculated columns.
The following example selects countries and their populations from the countries table
and orders the results by population.
Proc sql;
Select name, population format comma15. From day1.countries
Order by population;
Quit;
When you use an ORDER BY clause, you change the order of the output but not
the order of the rows that are stored in the table. The PROC SQL default sort order is
ascending.
You can sort by more than one column by specifying the column names, separated
by commas , in the order by clause. The following example sorts the countries table by
two columns, continents and name:
Proc sql;
Select name, Continent
From day1.countries
order by continent, Name;
Quit;
When you specify multiple columns in the order by clause, the first column determines
the primary row order of the results. Subsequent columns determine the order of rows
that have the same value for the primary sort. The following example sorts the features
table by feature type and name.
Proc sql;
Select name, type from day1.features
Order by type desc, name;
Quit;
You can sort by a calculated column by specifying its alias in the order by clause. The
following example calculates population density and then performs a sort on the cal-
culated density column:
Proc Sql;
Select name,
Population format comma15.,
The following report is sorted in ascending order of name (A-Z), because name is the
first column in the SELECT clause and the ORDER BY clause refers to column position 1:
Proc sql;
Select name, population format comma10. From day1.countries
Order by 1;
Quit;
You can sort query results by columns that are not included in the query. For Example,
the following query returns all rows in the Countries table and sorts them by population,
even though the population column is not included in the query.
Proc sql;
Select name, continent from day1.countries
Order by population desc;
Quit;
PROC SQL sorts nulls, or missing values, before character or numeric data. Therefore,
when you specify ascending order, missing values appear first in the query results. The
following example sorts the rows in the continents table by the highpoint column.
Proc sql;
Select name, highpoint
from day1.continents
order by highpoint;
Quit;
The following example uses a where clause to find all countries that are in the conti-
nent of Europe and their populations.
Proc sql;
Select name, population format comma15. From day1.countries
Where continent='Europe';
Quit;
Proc sql;
Select name, population format comma15.
From day1.unitedstates
Where population gt 5000000
Order by population desc;
Quit;
You can use logical, or Boolean, operators to construct a WHERE clause that contains
two or more expressions. The logical operators that you can use are AND, OR and NOT.
The following example uses two expressions to include only countries that are in Africa
and that have a population greater than 100000 people.
Proc sql;
Select name, population format comma15.
From day1.countries
Where continent='Africa' And population gt 100000
Order by population desc;
Quit;
Using IN Operator
The IN operator enables you to include values within a list that you supply . The follow-
ing example uses the IN operator to include only the mountains and waterfalls in the
features table.
Proc sql;
Select name, type, height format comma10.
From day1.features
Where type in ('Mountain','Waterfall')
Order by height;
Quit;
The IS MISSING operator enables you to identify rows that contain columns with missing
values.
Proc sql;
Select name, Highpoint
From day1.continents
Where highpoint is missing;
Quit;
To select rows based on a range of values, you can use the BETWEEN-AND operator.
This example selects cities that have latitudes within five degrees of the equator.
Proc sql;
Select city, country, latitude
From day1.worldcitycoords
Where latitude between -5 and 5;
Quit;
Like operator
The like operator enables you to select rows based on pattern matching. For exam-
ple , the following query returns all countries in the countries table that begin with the
letter A and are any number of characters long, or end with the letter a and are
seven characters long.
Proc Sql;
Select name
From day1.countries
Where name like 'A%';
Quit;
Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like '______a';
Quit;
Similarly, the following queries return rows where the capital starts with B, where
the country name ends with n, and where the country name starts with A and ends with n.
Proc Sql;
Select capital, population format comma15.
From day1.countries
Where capital like 'B%';
Quit;
Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like '%n';
Quit;
Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like 'A%n';
Quit;
The following query searches the countries table for country names that contain the
string ba.
Proc Sql;
Select name, population format comma15.
From day1.countries
Where name like '%ba%';
Quit;
The following query shows how to use a where clause for finding out a range of val-
ues .
Proc sql;
Select name, depth
From day1.continents
Where depth lt 500
Order by depth;
quit;
Proc sql;
Select name, depth
From day1.continents
Where depth lt 500 and depth is not missing
Order by depth;
Quit;
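The above query finds the depth values that are less than 500 and not missing. Because
a missing value is treated as lower than any number, the first query also returns rows
with a missing depth; the second query excludes them.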
A multiple-value subquery can return more than one value from one column. It is
used in a WHERE or HAVING expression that contains the IN operator or a comparison
operator that is modified by ANY or ALL. This example displays the populations of
oil-producing countries. The subquery first returns all countries that are found in the
OILPROD table.
Proc sql;
Select name, population format comma15.
From day1.countries
Where name in (select country from day1.oilprod);
Quit;
If you use the NOT IN operator in this query , then the query result will contain all the
countries that are not contained in the OILPROD table.
Proc sql;
Select name, population format comma15.
From day1.countries
Where name not in (select country from day1.oilprod);
Quit;
Proc sql can combine the results of two or more queries in various ways by using the
following set operators.
Union: Produces all unique rows from both queries.
Except: Produces rows that are part of the first query only.
Intersect: Produces rows that are common to both query results.
The Union Operator combines two query results . It produces all the unique rows that
result from both queries i.e. it returns a row if it occurs in the first table , the second or
both. Union does not return duplicate rows. If a row occurs more than once, then only
one occurrence is returned.
Proc sql;
Select * from day1.table1
Union
Select * from day1.table2;
Quit;
Producing rows that are in only the first query result (except)
The EXCEPT operator returns rows that result from the first query but not from the
second query. In this example, the row that contains the values 3 and three exists in the
first query (table 1) only and is returned by EXCEPT.
Proc sql;
Select * from day1.table1
Except
Select * from day1.table2;
Quit;
The Intersect operator returns rows from the first query that also occur in the second.
Proc sql;
Select * from day1.table1
Intersect
Select * from day1.table2;
Quit;
With the SET clause of the INSERT statement, you assign values to columns by name.
The columns can appear in any order in the SET clause. An INSERT statement can use
multiple SET clauses, one per new row, for example to add two rows to table1.
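A minimal sketch of such an INSERT statement is given here. The column names X and Y
and the table WORK.TABLE1 are assumptions made for illustration; the real table1 may
use different names.

```sas
proc sql;
   /* Hypothetical stand-in for table1: one numeric and one character column */
   create table work.table1
      (x num, y char(8));

   /* Each SET clause adds one row; columns may be listed in any order */
   insert into work.table1
      set x=1, y='one'
      set y='two', x=2;

   select * from work.table1;
quit;
```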
Updating tables
The UPDATE statement modifies a column's values in existing rows of a table or view.
The following PROC SQL code copies the COUNTRIES table to NEWCOUNTRIES1 and then
doubles the population of every country whose name starts with B:
Proc sql;
create table day1.newcountries1 as
Select * from day1.countries;
Update day1.newcountries1
Set population = population * 2
Where name like 'B%';
Quit;
In the above example, the population column is updated only for the rows whose name
starts with B. The pattern in the LIKE condition must be enclosed in quotation marks.
Deleting Rows
The DELETE statement deletes one or more rows in a table. The following DELETE
statement deletes the rows whose name begins with the letter A.
Proc sql;
create table day1.newcountries2 as
Select * from day1.countries;
Delete from day1.newcountries2
Where name like 'A%';
Quit;
The DROP clause of the ALTER TABLE statement deletes columns from tables. For example,
a DROP clause can delete the UNDATE column from the countries table.
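A possible form of that statement is sketched here. The table WORK.NEWCOUNTRIES and its
column definitions are assumptions for illustration; only the column name UNDATE comes
from the text.

```sas
proc sql;
   /* Hypothetical copy of the countries table that has an UNDATE column */
   create table work.newcountries
      (name char(20), population num, undate num);

   /* DROP removes the column and its data from the table */
   alter table work.newcountries
      drop undate;

   /* Writes the remaining column definitions to the SAS log */
   describe table work.newcountries;
quit;
```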
SQL JOIN
The JOIN keyword is used in an SQL statement to query data from two or more tables,
based on a relationship between certain columns in these tables. Tables in a data-
base are often related to each other with keys. A primary key is a column (or a combi-
nation of columns) with a unique value for each row. Each primary key value must be
unique within the table. The purpose is to bind data together, across tables, without
repeating all of the data in every table.
Different SQL JOINs
Before we continue with examples, we will list the types of JOIN you can use, and the
differences between them.
INNER JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the
right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the
left table
FULL JOIN: Return all rows from both tables, whether or not there is a match in the
other table
INNER JOIN
An inner join returns only the rows from the first table that match rows from the
second table. You specify the columns to be compared for matching values in an ON
clause when you use the INNER JOIN keywords, or in a WHERE clause when the tables
are listed in the FROM clause separated by commas.
A table alias is a temporary, alternate name for a table. You specify a table alias in
the FROM clause. Table aliases are used in joins to qualify column names and can make
a query easier to read by abbreviating table names. The following example compares
the oil production of countries to their oil reserves by joining the oilprod and oilrsvrs
tables on their country column. Because the country columns are common to both
tables, they are qualified with their table aliases. You could also qualify the columns by
prefixing the column names with the table names.
Proc sql;
Select * from day1.oilprod as p
inner join
day1.oilrsvrs as r
On p.country= r.country;
Quit;
Left Join
A left join lists all the rows from the left-hand table (the first table listed in the FROM
clause). A left join is specified with the keywords LEFT JOIN and ON.
For example, to list the coordinates of capital cities, join the countries table, which
contains capitals, with the worldcitycoords table, which contains the coordinates of
cities, by using a left join. The left join lists all capitals, regardless of whether the
cities exist in worldcitycoords. Using an inner join would list only capital cities for
which there is a matching city in worldcitycoords.
Proc sql;
Title 'Coordinates of Capital Cities';
Select Capital, Name, Latitude, longitude
From day1.countries as a
left join
day1.worldcitycoords as b
On a.capital=b.city and A.name=b.country;
Quit;
Right Join
The right join is the opposite of the left join. A right outer join produces matched
rows from both tables while preserving all unmatched rows from the right table. This
example reverses the join of the last example. It uses a right join to select all the
cities from the worldcitycoords table and displays the population only if the city is
the capital of a country (that is, if the city exists in the countries table).
Proc sql;
Select city, country, population
From day1.countries as a
right join
day1.worldcitycoords as b
On a.capital=b.city and A.name=b.country;
Quit;
Full Join
A full outer join, specified with the keywords FULL JOIN and ON, selects all matching
and nonmatching rows. This example displays the matching and nonmatching rows
from the city and capital columns of worldcitycoords and countries.
Proc sql;
Select city, country, population
From day1.countries as a
full join
day1.worldcitycoords as b
On a.capital=b.city and a.name=b.country;
Quit;
SAS Macro Language is a tool for extending and customizing the SAS system and for
reducing the amount of text one must enter to do common tasks. SAS Macro facility is
a tool for text substitution. We associate a macro reference with text. When the mac-
ro processor encounters that reference, it replaces the reference with the associated
text. This text can be as simple as text strings or as complex as SAS language state-
ments. SAS macro facility is a component of BASE SAS.
By using SAS Macro facility the program can become reusable, shorter and easier
to follow.
Repetitive works can be accomplished easily and quickly.
The macro language statements start with a percent sign (%) and macro variable ref-
erences start with an ampersand (&).
Macro variables are defined with %Let statement. The %Let statement contains an
assignment where the macro variable is assigned a value. The Macro variable is re-
ferred to by using & symbol.
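As a short sketch of %LET and the ampersand reference, the example here stores a data
set name and a cutoff in two macro variables. The names DSN and MINAGE are assumptions
chosen for illustration.

```sas
/* Hypothetical macro variables holding a data set name and an age cutoff */
%let dsn = sashelp.class;
%let minage = 13;

/* The macro processor substitutes the stored text before the step runs */
proc print data=&dsn;
   where age > &minage;
run;
```

After substitution, SAS sees proc print data=sashelp.class; with where age > 13;.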
The SAS System defines many macro variables for various operations. These are termed
automatic (system-defined) macro variables. They are created when the SAS session
starts. To display a list of automatic macro variables in the log:
%put _automatic_;
Macro Variable Resolution and the Use of Single and Double Quotation Marks
A macro variable reference enclosed in double quotation marks is resolved. A reference
enclosed in single quotation marks is not resolved.
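The effect of the two kinds of quotation marks can be checked with two TITLE
statements. This is a sketch; the macro variable CITY and its value are made up for
illustration.

```sas
%let city = Kolkata;   /* hypothetical value */

title1 "Report for &city";   /* double quotes: title reads Report for Kolkata */
title2 'Report for &city';   /* single quotes: title reads Report for &city */

proc print data=sashelp.class(obs=3);
run;

title;   /* clear the titles */
```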
Two ways to display macro variable values are with the macro language statement
%PUT and with the SAS system option SYMBOLGEN. Both of these features write the
values of macro variables to the SAS log.
The %PUT statement instructs the macro processor to write information to the SAS log.
The %PUT statement can be submitted by itself from the windowing environment Editor
or from within a SAS program. Since %PUT is a macro language statement, it does not
need to be part of a DATA step or PROC step. If it appears inside a step, it executes
when the macro processor encounters it, before the step itself runs. A %PUT statement
displays only text and information about macro variables.
With SYMBOLGEN enabled, SAS presents the results of the resolution of macro variables
in the SAS log. SYMBOLGEN displays the value of a macro variable in the SAS log near
the statement with the macro variable reference.
SYMBOLGEN shows the values of both automatic and user-defined macro variables.
The SYMBOLGEN option helps you debug your programs. If we are getting unexpected
results when using macro variables, enable this option and read the SAS log.
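Both features can be tried together in a few lines. The sketch here assumes a macro
variable named CUTOFF, which is a made-up name for illustration.

```sas
%let cutoff = 60;   /* hypothetical macro variable */

/* %PUT writes text and macro variable values to the log */
%put The cutoff value is &cutoff;
%put _user_;          /* lists all user-defined macro variables */

options symbolgen;    /* log every macro variable resolution */
proc print data=sashelp.class;
   where height > &cutoff;
run;
options nosymbolgen;  /* switch the option off again */
```

With SYMBOLGEN on, the log contains a line of the form
SYMBOLGEN: Macro variable CUTOFF resolves to 60.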
The scope of a macro variable is the region of a program in which the variable can be
accessed. The scope of a SAS macro variable falls into one of two groups: global and
local.
Global macro variables exist for the duration of the SAS session. They can be used
inside or outside macro programs. Global macro variables can be created at any time
during the SAS session or job, and their values can be changed anywhere.
Local macro variables are created within a macro program and exist only while that
macro program executes. They have no meaning outside that macro. They can be
accessed or modified only within the macro that created them.
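The two scopes can be illustrated with a small sketch. The macro SHOWSCOPE and the
variables REGION and COUNTER are hypothetical names used only for this example.

```sas
%let region = East;            /* created in open code, so it is global */

%macro showscope;
   %local counter;             /* exists only while SHOWSCOPE executes */
   %let counter = 1;
   %put Inside: region=&region counter=&counter;
%mend showscope;

%showscope

%put Outside: region=&region;
%put Outside: counter=&counter;   /* log warning: COUNTER is not resolved */
%put _global_;                    /* lists the global macro variables */
```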
Macro Program
Macro variables are only useful for simple text substitution. When a part of the SAS
code or the whole program needs to be repeated then we need to write them in a
macro program. Each macro program is assigned a name. When we reference a
macro program, the statements inside the macro program execute. The text that re-
sults from the execution is substituted into your SAS program at the location of the
macro program reference.
Macro programs use macro variables and macro language statements to generate
the text that builds your SAS programs. Several macro language statements can be
used only inside macro programs. The macro language statements that we have seen
so far, %LET and %PUT, can be used inside or outside macro programs.
%MACRO macro_name;
macro definition;
%MEND macro_name;
A macro program starts with a %MACRO statement and ends with a %MEND statement.
macro_name is the name assigned to the macro program. It must be a valid SAS
name (maximum 32 characters) and must not be a reserved word in the macro facility.
macro definition can include text strings, macro variables, macro functions, SAS
program statements, etc.
To compile the macro program for later use in our SAS session, we have to submit the
macro program definition from the Editor or from within the SAS program that calls it.
The word scanner tokenizes the macro program and sends the tokens to the macro
processor for compilation.
When the macro processor compiles the macro language statements in the macro
program, it saves the results in a SAS catalog. By default, SAS stores macro programs in
a catalog in the WORK library called SASMACR. Macro programs can also be saved in
permanent catalogs and structures called autocall libraries.
A compiled macro program can be reused within the same SAS session. A macro pro-
gram has to be submitted only once in the SAS session. The compiled macro program
remains in the SASMACR catalog throughout the SAS session. When the SAS session
ends, SAS deletes the SASMACR catalog that contains the compiled macro program.
The entry in the catalog for the <macro program> is the compiled version of <macro
program>.
We can also list the entries in the WORK.SASMACR catalog by submitting a PROC
CATALOG step with a CONTENTS statement.
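A minimal sketch of such a PROC CATALOG step is shown here. It assumes that at least
one macro program has already been compiled in the session, and that the session
catalog is named WORK.SASMACR. Some server-based SAS sessions may use a different
catalog name.

```sas
/* List the compiled macro programs stored for this session */
proc catalog catalog=work.sasmacr;
   contents;
run;
quit;
```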
A macro program is called with the following syntax:
%program
where program is the name assigned to the macro program.
A reference to a macro program that has been successfully compiled can be placed
anywhere in your SAS program except in data lines. This call to the macro program is
preceded by a percent sign (%). The percent sign tells the word scanner to direct pro-
cessing to the macro processor. The macro processor takes over and looks for the
compiled program in the WORK.SASMACR catalog of session compiled macro pro-
grams. If found, the macro processor directs execution of the compiled macro pro-
gram. If not found, an error message is written to the SAS log.
No semicolon follows the call to the macro program. The call to a macro program is
not a SAS statement. Indeed, using a semicolon to terminate the call to the macro
program might cause errors in the execution of your macro program.
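Defining, compiling and calling a macro program can be sketched as follows. The macro
name FIRSTFIVE is hypothetical and is used only for illustration.

```sas
/* Submitting the definition compiles the macro program for this session */
%macro firstfive;
   proc print data=sashelp.class(obs=5);
   run;
%mend firstfive;

/* Call the macro program; no semicolon is needed after the call */
%firstfive
```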
We can increase the reusability and flexibility of a macro program by using
parameters. With the help of parameters, we do not need to change the macro program
every time.
Macro parameter names are specified on the %MACRO statement. The names as-
signed to the parameters must be the same as the names of the macro variables that
we want to reference inside the macro program. The initial values of the parameters
are specified on the call to the macro program. When the macro program starts, the
corresponding macro variables are initialized with the values of the parameters.
There are two types of macro program parameters: positional and keyword.
Positional parameters are enclosed in parentheses and are separated with commas.
When we call a macro program that uses positional parameters, we must specify the
same number of values in the macro program call as the number of parameters listed
on the %MACRO statement. Valid values include null values and text. If we want to as-
sign a positional parameter a null value and we want to assign values to subsequent
positional parameters, use a comma as a placeholder.
The general format of a call to a macro program that uses positional parameters is
%program(value-1, value-2, ..., value-n)
%macro print(var1,var2,var3,var4,var5);
proc print data=&var1;
id &var2;
var &var3 &var4;
sum &var5;
run;
%mend;
%print(day1.candy_sales_summary,prodid,subcategory,category,sale_amount);
In keyword parameters, each parameter is specified by its name followed by an equal
sign (=). Unlike positional parameters, keyword parameters can be specified in any
order in the macro call.
%macro print(mydata=,var1=,var2=,var3=);
proc print data=&mydata;
id &var1;
var &var2 &var3;
sum &var3;
run;
%mend;
%print(var1=prodid,mydata=day1.candy_sales_summary,var3=sale_amount,var2=subcategory);
The autocall facility consists of external files or SOURCE entries in SAS catalogs that
contain macro programs. When we specify certain SAS options, the macro processor
searches the autocall libraries when it is resolving a macro program reference.
The stored compiled macro facility consists of SAS catalogs that contain compiled
macro programs. When you specify certain SAS options, the macro processor searches
your catalogs of compiled macro programs when it is resolving a macro program
reference.
When we store a macro program in an autocall library, we do not have to submit the
macro program for compilation before referring to it. The macro processor does that
automatically if it finds the macro program in the autocall library. Several SAS
products ship with libraries of macro programs that we can reference, or that are
referenced by the SAS products themselves.
The main disadvantage of the autocall facility is that the macro program must be
compiled the first time it is used in a SAS session. This takes resources. Resources
are also used to search the autocall libraries for the macro program reference.
After the macro processor finds the macro program in autocall library, it submits the
macro program for compilation. If there are any macro language statements in open
code, these statements are executed immediately. The macro program is compiled
and stored in the session compiled macro program catalog, SASMACR. SASMACR is in
the WORK library.
The macro program can be reused within the SAS session. When it is, only the macro
program itself is executed. Any macro language statements in open code that might
have been stored with the macro program are not executed again. The compiled
macro program is deleted at the end of the session when the catalog
WORK.SASMACR is deleted. The code remains in the autocall library.
Macro programs can be stored as external files or as SOURCE entries in SAS catalogs.
When we store macro programs in a SAS catalog, each macro program must be in a
separate SOURCE entry. The name of the SOURCE entry should be the same as the name
of the macro program.
When we want SAS to search for macro programs in autocall libraries, we must specify
the two SAS options, MAUTOSOURCE and SASAUTOS. These options can be specified in
three ways:
Add MAUTOSOURCE and SASAUTOS to the SAS command that starts the SAS ses-
sion.
Submit an OPTIONS statement with MAUTOSOURCE and SASAUTOS from within a
SAS program.
Submit an OPTIONS statement with MAUTOSOURCE and SASAUTOS from within an
interactive SAS session.
The MAUTOSOURCE option must be enabled to tell the macro processor to search au-
tocall libraries when resolving macro program references. By default, this option is ena-
bled. Specify NOMAUTOSOURCE to turn off this option. A reason someone might disa-
ble MAUTOSOURCE is to save computing resources when not using autocall libraries.
options mautosource;
options nomautosource;
The SASAUTOS= option identifies the location of the autocall libraries for the macro
processor. On the SASAUTOS= option, specify either the actual directory reference en-
closed in single quotation marks or the filerefs that point to the directories. A FILENAME
statement defines the fileref.
The syntax of the SASAUTOS= option follows. The first line shows how to specify one li-
brary. The second line shows how to specify multiple libraries. The macro processor
searches the libraries in the order in which they are listed on the SASAUTOS= option
statement.
options sasautos=library;
options sasautos=(library-1, library-2,..., library-n);
For example, under Windows XP with SAS 9, two filerefs can be defined and assigned
to SASAUTOS=. The OPTIONS statement then lists these two filerefs plus the SASAUTOS
fileref, which points to the autocall libraries supplied by SAS.
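A sketch of such statements is given here. The fileref names REPMACS and GRAPHMCS are
assumptions for illustration; the two directory paths are the ones used in this
section.

```sas
/* Fileref names are hypothetical; a fileref may have at most 8 characters */
filename repmacs  'c:\mymacroprograms\repmacs';
filename graphmcs 'c:\mymacroprograms\graphmacs';

/* Search the two user libraries first, then the SAS-supplied library */
options mautosource sasautos=(repmacs graphmcs sasautos);
```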
To specify the same libraries as above without using filerefs, submit the following state-
ment. Note the inclusion of the SASAUTOS fileref.
options sasautos=
('c:\mymacroprograms\repmacs' 'c:\mymacroprograms\graphmacs' sasautos);
A user-defined autocall library stored in a SAS catalog can be referenced in a similar
way under Windows XP with SAS 9, again together with the SASAUTOS fileref.
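One way to write this is sketched here, using the CATALOG access method of the
FILENAME statement. The libref MYMACS, the catalog name MYAUTOS, the fileref MYCAT and
the directory path are all assumptions for illustration.

```sas
/* Hypothetical library that holds the catalog of macro source entries */
libname mymacs 'c:\mymacroprograms';

/* The CATALOG access method points a fileref at a SAS catalog */
filename mycat catalog 'mymacs.myautos';

/* Search the catalog first, then the SAS-supplied autocall library */
options mautosource sasautos=(mycat sasautos);
```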
Macro programs that you want to save and do not expect to modify can be com-
piled and saved in SAS catalogs using the stored compiled macro facility. When a
compiled macro program is referenced in a SAS program, the macro processor skips
the compiling step, retrieves the compiled macro program, and executes the com-
piled code. The main advantage of this facility is that it prevents repeated compiling
of macro programs that you use frequently.
A disadvantage of this facility is that the compiled versions of macro programs cannot
be moved to other operating systems. The macro source code must be saved and
recompiled under the new operating system. Further, if you are moving the compiled
macro programs to a different release of SAS under the same operating system, you
might also have to recompile the macro programs.
Macro source code is not stored by default with the compiled macro program. You
are responsible for maintaining a copy of the macro source code. A convenient place
to store the code is an autocall library. Also, you can save the source code as a
SOURCE entry in a catalog if you specify the SOURCE option when compiling your
macro program.
Another way of saving the macro program code for later retrieval is shown in a later
section where the SOURCE option is added to the %MACRO statement when creating
a stored compiled macro program. This option stores the macro program code in the
same entry as the compiled code, and you can retrieve this code later with the %COPY
statement.
We need to set two SAS options, MSTORED and SASMSTORE, before we can compile
and store the macro programs.
The MSTORED option instructs SAS that we want to make stored compiled macro pro-
grams available to our SAS session.
options mstored;
To turn off the MSTORED option, submit the following OPTIONS statement.
options nomstored;
SAS stores compiled macro programs in a catalog called SASMACR. The SASMACR
catalog is stored in the library specified by the SASMSTORE= option. In the examples
in this section, that library has the libref MYAPPS. Do not rename the SASMACR
catalog. Use the CATALOG command or PROC CATALOG to view the list of macro
programs stored in this catalog.
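The two options might be set as sketched here. The directory path is an assumption for
illustration; MYAPPS is the libref named in the text.

```sas
/* Hypothetical directory that will hold the SASMACR catalog */
libname myapps 'c:\mymacroprograms\stored';

/* Enable stored compiled macros and name the library that holds them */
options mstored sasmstore=myapps;
```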
You can also tell the macro processor to search SASMACR catalogs in multiple
locations for a stored compiled macro program by listing multiple paths on the
LIBNAME statement. When the LIBNAME statement lists, for example, three paths within
parentheses, the macro processor looks in the SASMACR catalog in each of those
locations. The order in which you list the paths is the order in which SAS searches for
a stored compiled macro program. If you have a macro program with the same name in
two locations, the program found in the first of the two paths is the one that executes.
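A concatenated library of this kind could be defined as in the sketch here. The three
directory paths are hypothetical and stand for any three locations.

```sas
/* Hypothetical paths; SAS searches them in the order listed */
libname myapps ('c:\mymacroprograms\dept'
                'c:\mymacroprograms\team'
                'c:\mymacroprograms\personal');

options mstored sasmstore=myapps;
```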
Once the SAS options in the previous section are set, macro programs can be compiled
and stored in a catalog by adding options to the %MACRO statement. The syntax of
%MACRO when you want to compile and store a macro program follows:
%MACRO program(parameters) / STORE <SOURCE> <SECURE> <DES='description'>;
The STORE keyword is required. The SOURCE, SECURE, and DES= options are not
required. The SOURCE option tells the macro processor to save a copy of the macro
program's source code along with the compiled macro program in the same SASMACR
catalog. The source code does not have a separate entry in the catalog. It is stored in
the same MACRO entry as the compiled macro program.
Starting with SAS 9.2, you can use the SECURE option to encrypt the compiled macro
program and prevent someone from easily obtaining the source code. Without the
SECURE option, it is not easy, but it is possible to extract the code.
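A stored compiled macro program might be created as in the sketch here. The directory
path, the parameter REPNAME and the body of the macro are assumptions for illustration;
the macro name REPTITLE and the libref MYAPPS follow the names used in this section.

```sas
/* Hypothetical location for the stored compiled macro catalog */
libname myapps 'c:\mymacroprograms\stored';
options mstored sasmstore=myapps;

/* STORE saves the compiled macro; SOURCE keeps the code for %COPY */
%macro reptitle(repname) / store source des='Standard report title';
   title "&repname";
   title2 "Run on &sysdate9";
%mend reptitle;

/* Later calls use the stored compiled version */
%reptitle(Sales Summary)
```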
Saving and Retrieving the Source Code of a Stored Compiled Macro Program
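The %COPY statement retrieves the source code of a stored compiled macro program
that was compiled with the SOURCE option. By default, if you do not specify a libref
with the LIBRARY= option, the macro processor looks in the library specified by the
current setting of SASMSTORE.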
If you want to view the code in the SAS log, submit the following statement:
%copy reptitle / library=myapps source;
If you want to save the code in a file called REPTITLE_SOURCE.SAS, submit the
following %COPY statement.
%copy reptitle / library=myapps source
outfile='c:\mymacroprograms\reptitle_source.sas';
Suppose you want to store the sum of 100 and 30 in a global macro variable X. If you
submit %Let X = 100+30; you will find that the log shows 100+30, not 130, because the
macro processor treats the value as text. Use the %EVAL function to evaluate the
expression:
%Let X = %EVAL(100+30);
%Put &X;
%EVAL evaluates an arithmetic expression with integer arithmetic and returns the
result as an integer. For a result with decimal places, use %SYSEVALF instead of
%EVAL.
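The difference between the two functions shows up as soon as a division does not come
out even. A short sketch, with made-up macro variable names A, B and C:

```sas
%let a = %eval(7/2);              /* integer arithmetic: 3 */
%let b = %sysevalf(7/2);          /* floating-point arithmetic: 3.5 */
%let c = %sysevalf(7/2, ceil);    /* optional conversion type CEIL: 4 */

%put a=&a b=&b c=&c;
```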
Data _null_;
Set Day1.Class End=Final;
If age > 13 then n+1;
/* The macro variable name must be quoted; PUT and STRIP remove leading blanks */
If final then CALL SYMPUT('number', strip(put(n, 8.)));
Run;
%PUT &number students have age greater than 13;
Here we created a macro variable NUMBER using the CALL SYMPUT routine. Its value is
the final count of students whose age is greater than 13.
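PROC SQL can also create a macro variable with the INTO clause. Here the macro
variable X receives the name of the student with the maximum height.
Proc SQL;
Select Name Into :X from SASHELP.CLASS
Having Height = Max(Height);
Quit;
%Put &X;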