SAS Aid

Contents

• Basic windows available in SAS
• SAS Language Steps
• Using SPSS to create a SAS dataset and Data Dictionary
• Creating a permanent SAS data set
• Using observations from a SAS data set as input to a new SAS data set
• Calculations in SAS
• Conditional logic and Logical Operators in SAS
• PROC MEANS
• PROC FREQ
• Crosstabs
• Standardization
• Linear Regression
• Stepwise Regression
• Logistic Regression
• Factor Analysis
• Cluster Analysis
• For further Reference

5 basic windows available in the SAS software

• Editor – Write the SAS program
• Log – Check the log after RUNning the SAS code
• Output – Check the output, if applicable, post SAS processing
• Explorer – Navigate and check libraries and datasets
• Results – Stores past results for review

SAS programming language consists mainly of 2 kinds of steps

DATA Step

• DATA steps typically create or modify SAS datasets. For example, you can use DATA steps to:
– put your data into a SAS data set
– compute values
– check for and correct errors in your data
– produce new SAS data sets by subsetting, merging, and updating existing data sets

PROC Step

• PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a SAS data set. For example, you can use PROC steps to:
– produce descriptive statistics
– create a summary report
– produce plots and charts
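
The two kinds of steps can be seen side by side in a minimal sketch. The data set name work.scores, the variable names and the values are made up for illustration; they are not part of the original material.

```sas
/* DATA step: create a small data set and compute a new value */
data work.scores;
    input name $ score;
    pct = score / 50 * 100;   /* hypothetical: score expressed as a percentage of 50 */
    datalines;
Asha 42
Ben 35
Chen 47
;
run;

/* PROC step: descriptive statistics for the data set created above */
proc means data=work.scores n mean min max;
    var score pct;
run;
```

The DATA step ends at its RUN statement and produces the data set; the PROC step then reads that data set and reports on it without changing it.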
Using SPSS to create a SAS dataset and Data Dictionary

We can create a SAS Dataset from an SPSS Dataset as follows:


1. Open SPSS
2. Go to File-> Open->Data and open the .sav file
3. Go to File->Save As and select ‘SAS v7+ Windows Long Extension’ in the save as type

CREATING A DATA DICTIONARY WITH SPSS:


4. Open SPSS
5. Go to File->Display Data File Information->Working File
6. Go to the Output window (Output1 (Document1) by default)
7. Select Variable Information from the pane on the right
8. Right-click on it, choose ‘Export’, and select ‘Excel (.xls)’ as the Document type in the ‘Export Output’ window
9. Repeat the process above for Variable Values (to be selected from the right pane in the Output1 window)
10. Append the Variable Information and Variable Values obtained in the separate Excel files to create a Data Dictionary file.
Creating a permanent SAS data set

• The LIBNAME* statement can be used to create permanent libraries at user-defined locations

• A permanent library stores datasets permanently, i.e., they are not deleted after the SAS session is closed

• Syntax –
libname Win_R "D:\C_Windows Research Projects\Project_1";

Here Win_R is the name of the library, and the quoted string is the path to the folder where the dataset is created/stored

• A valid library name must start with a letter or an underscore and cannot have more than 8 characters

*SAS coding is not case sensitive
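
As a rough sketch of how a permanent library is used once it is defined: the library name myproj and the folder path below are hypothetical and must be replaced with a folder that exists on your machine. sashelp.class is a sample data set that ships with SAS.

```sas
/* Hypothetical folder; point this at an existing folder */
libname myproj "C:\sas_data\project_1";

/* Two-level name (library.dataset): saved as a file in that folder */
data myproj.class_copy;
    set sashelp.class;
run;

/* One-level name: goes to the temporary WORK library and is
   removed when the SAS session ends */
data class_temp;
    set sashelp.class;
run;
```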


Calculations in SAS

Sample code for simple calculation

data tpv.bam_jan10_v10;
set tpv.bam_jan10_v9;
/* Calculation of TPV */
p = 0.3;
q = 0.2;
r = 0.05;
A = 0.25;
B = 0.05;
C = 0.05;
TPV = (p * A) + (q * B) + (r * C);
RUN;

• Comments can be added as shown to explain the code
• Commented parts do not get executed

[Tables: Operators in SAS; complete list of the logical comparison operators]
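
The common arithmetic and comparison operators can be tried out in a short sketch like the one below. The data set name work.operator_demo and all values are made up for illustration.

```sas
data work.operator_demo;
    x = 7;
    y = 2;
    /* Arithmetic operators */
    total   = x + y;
    diff    = x - y;
    product = x * y;
    ratio   = x / y;
    power   = x ** y;          /* exponentiation */
    /* Comparisons evaluate to 1 (true) or 0 (false);
       the symbol and mnemonic forms are interchangeable */
    is_equal   = (x = y);      /* same as x eq y */
    not_equal  = (x ne y);     /* same as x ^= y */
    greater    = (x gt y);     /* same as x > y  */
    less_equal = (x le y);     /* same as x <= y */
    in_list    = (x in (5, 6, 7));
run;

proc print data=work.operator_demo;
run;
```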
Using observations from a SAS data set as input to a new SAS
data set
• Two-level naming convention (library.dataset): used to store datasets in the permanent library

data tpv.bam_jan10_v1;
set tpv.bam_jan10;
RUN;

Copies the bam_jan10 dataset from the tpv library and creates a new dataset bam_jan10_v1 in the tpv library

• When the SAS session is closed, the library reference will get deleted but the SAS datasets in that location will be retained
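
The new data set does not have to be a plain copy. The sketch below reads sashelp.class (a sample data set that ships with SAS) and keeps only some observations and variables. The output name work.class_teens and the derived variable height_cm are hypothetical.

```sas
data work.class_teens;
    set sashelp.class;              /* read observations from the input data set */
    where age >= 13;                /* keep only observations meeting the condition */
    height_cm = height * 2.54;      /* derived variable; assumes height is in inches */
    keep name sex age height_cm;    /* keep only these variables in the new data set */
run;
```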

To RUN only a part of SAS code


Out of hundreds of lines of SAS code, if only a part is to be executed at a time then…
• First select the part of code to be executed (in the Editor window)
• Press F3 to execute that part only
Conditional Logic in SAS – if, else-if
Logical Operators AND, OR and NOT

Example : if

Conditional Logic: Outcome
Q31_1_2 = 0 OR Q31_3_2 = 0: C=0
Q31_1_2 = 1 & Q31_3_2 = 1 & for Q28_2 option 1, 2 or 3 are not selected: C=1
Q31_1_2 = 1 & Q31_3_2 = 1 & for Q28_2 option 1, 2 or 3 are selected: C=2

Sample Code:

data tpv.bam_jan10_v3;
set tpv.bam_jan10_v2;
If Q31_1_2=0 OR Q31_3_2=0 then C=0;
If Q31_1_2=1 AND Q31_3_2=1
AND Q28_2 NOT IN (1,2,3) then C=1;
If Q31_1_2=1 AND Q31_3_2=1
AND Q28_2 IN (1,2,3) then C=2;
RUN;

Meaning of Q28_2 in (1,2,3) : for Q28_2, respondent selected any of option 1, 2 or 3

Example : else-if

Conditional Logic: Outcome
Age is missing: Group = Blank
Age less than 20: Group = 1
Age b/w 20 to 39: Group = 2
Age b/w 40 to 59: Group = 3
Age >=60: Group = 4

Sample Code:

data tpv.bam_jan10_v6;
set tpv.bam_jan10_v5;
/* For cases where respondent has left the field blank */
if missing(Age) then Group = .;
else if Age lt 20 then Group = 1;
else if Age lt 40 then Group = 2;
else if Age lt 60 then Group = 3;
else if Age ge 60 then Group = 4;
RUN;

Note: IF statements are not allowed inside SAS procedures
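
The else-if chain can be tried on a few made-up records, as in the sketch below. The data set work.age_groups and its values are hypothetical. The missing check comes first because SAS treats a missing numeric value as smaller than any number, so without it a missing age would satisfy age lt 20.

```sas
data work.age_groups;
    input id age;
    if missing(age) then age_group = .;      /* blank age stays ungrouped */
    else if age lt 20 then age_group = 1;
    else if age lt 40 then age_group = 2;    /* reached only when age >= 20 */
    else if age lt 60 then age_group = 3;    /* reached only when age >= 40 */
    else age_group = 4;                      /* everything from 60 upwards */
    datalines;
1 17
2 .
3 35
4 60
5 59
;
run;

/* Count the records in each group, including the missing one */
proc freq data=work.age_groups;
    tables age_group / missing;
run;
```

Because each else-if is evaluated only when the earlier conditions were false, the lower bound of each range does not need to be written out.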


PROC MEANS : To get Mean, Std Dev, Min, Max etc.

PROC MEANS data = kevin.mstmcb113009costRussia nmiss mean median N;
var Download_month Usage_month ARPU_Month;
class segment;
where Q74_1_3 in(6,7) and Q74_1_6 in(6,7) and Segment=8;
RUN;

• data = kevin.mstmcb113009costRussia: kevin is the library name and mstmcb113009costRussia is the name of the dataset
• nmiss mean median N: optional statistics to report
• var: variables for which the mean is to be found
• class: produces summary statistics for each grouping class
• For logical conditions in any PROC (PROCedure), ‘Where’ is used, as ‘If’ cannot be used in PROCedures

Partial list of PROC MEANS options (those used above): NMISS (number of missing values), MEAN, MEDIAN, N (number of non-missing values)
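
A few more of the commonly used statistic keywords can be seen in the sketch below, which runs on sashelp.class (a sample data set that ships with SAS). The choice of variables and the age filter are for illustration only.

```sas
proc means data=sashelp.class n nmiss mean median std min max;
    var height weight;     /* analysis variables */
    class sex;             /* one set of statistics per value of sex */
    where age >= 12;       /* WHERE, not IF, filters rows inside a PROC */
run;
```

If no statistic keywords are listed, PROC MEANS reports N, mean, standard deviation, minimum and maximum by default.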


PROC FREQ : To get frequency tables

Syntax:

PROC FREQ < options > ;
BY variables ;
EXACT statistic-options < / computation-options > ;
OUTPUT < OUT=SAS-data-set > options ;
TABLES requests < / options > ;
TEST options ;
WEIGHT variable ;
RUN;

In the syntax, angle brackets denote optional specifications, and vertical bars denote a choice of one of the specifications separated by the vertical bars.

Statement: Description
BY: calculates separate frequency or crosstabulation tables for each BY group.
EXACT: requests exact tests for specified statistics.
OUTPUT: creates an output data set that contains specified statistics.
TABLES: specifies frequency or crosstabulation tables and requests tests and measures of association.
TEST: requests asymptotic tests for measures of association and agreement.
WEIGHT: identifies a variable with values that weight each observation.

Sample Code:

PROC FREQ data = kevin.mstmcb113009costRussia;
tables Q24 Primary Secondary Other / Missing Norow Nocol Nopercent;
where Q74_1_3 in(6,7) and Q74_1_6 in(6,7) and Segment=8;
RUN;

• Q24 Primary Secondary Other: variables for which frequency is to be found
• Optional (options follow the slash in the TABLES statement):
Missing: Treats missing values as a separate category
Norow: Removes row percentages
Nocol: Removes column percentages
Nopercent: Removes cell percentage
Noprint: Suppresses the printed table
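
Noprint is mostly useful together with OUT=, which saves the frequency table as a data set instead of displaying it. The sketch below uses sashelp.class (a sample data set that ships with SAS); the output name work.sex_counts is hypothetical.

```sas
proc freq data=sashelp.class;
    /* Options go after the slash; OUT= writes the counts to a data set */
    tables sex / missing noprint out=work.sex_counts;
run;

/* The output data set holds one row per level, with COUNT and PERCENT */
proc print data=work.sex_counts;
run;
```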
Further Reading : http://www.sfu.ca/sasdoc/sashtml/stat/chap28/index.htm
Crosstabs : To generate two way tables

Sample Code

PROC FREQ data=tpv.bam_jan10_v10_temp;
tables TPV*Segment / nocol norow nopercent;
RUN;

The asterisk between TPV and Segment tells PROC FREQ that you want a two-way table with TPV
forming the rows of the table and Segment forming the columns.

To generate multiple two way tables


tables A * (B C D);
[This statement generates three tables: A by B, A by C, and A by D]

tables (A B) * (C D);
[This request generates four tables: A by C, A by D, B by C, and B by D]
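
The grouping syntax can be tried on sashelp.cars, a sample data set that ships with SAS. The choice of variables below is for illustration only.

```sas
proc freq data=sashelp.cars;
    /* Two tables: origin by type, and origin by drivetrain */
    tables origin * (type drivetrain) / nocol norow nopercent;
run;
```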
Standardization : To standardize the data

PROC STANDARD data = tpv.bam_jan out = tpv.bam_jan_v1 mean=0 std=1 ;
Var TPV Q74_1_5 Q74_1_6 Q74_1_21 Q74_1_15 Q74_1_3 Q74_1_20 Q74_1_7;
RUN;

• Var: all variables to be standardized
• mean=0 std=1: each listed variable is rescaled to have mean 0 and standard deviation 1
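
One way to check that standardization worked is to run PROC MEANS on the output data set, as sketched below on sashelp.class (a sample data set that ships with SAS). The output name work.class_std is hypothetical.

```sas
proc standard data=sashelp.class out=work.class_std mean=0 std=1;
    var height weight;
run;

/* The standardized variables should show a mean of about 0
   and a standard deviation of about 1 */
proc means data=work.class_std mean std maxdec=4;
    var height weight;
run;
```

Variables not listed in the VAR statement are copied to the output data set unchanged.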
Linear Regression

Syntax:

PROC REG < options > ;
MODEL dependents=<regressors> < / options > ;
VAR variables ;
OUTPUT < OUT=SAS-data-set > keyword=names < ... keyword=names > ;
RUN;

In the syntax, angle brackets denote optional specifications, and vertical bars denote a choice of one of the specifications separated by the vertical bars.

Statement: Description
MODEL: specifies the dependent and independent variables in the regression model, requests a model selection method, displays predicted values, and provides details on the estimates (according to which options are selected)
VAR: lists variables for which crossproducts are to be computed, variables that can be interactively added to the model, or variables to be used in scatter plots
OUTPUT: creates an output data set and names the variables to contain predicted values, residuals, and other diagnostic statistics.

Sample Code
PROC REG data = itp.It_pro_wv17;
Model Sales = Price GDP IIP ATL / details VIF;
ods output parameterestimates=est;
RUN;
quit;
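
The OUTPUT statement described above is not used in the sample code. A minimal sketch of it, run on sashelp.class (a sample data set that ships with SAS), might look like this. The model, the output name work.reg_out and the variable names predicted and residual are chosen for illustration only.

```sas
proc reg data=sashelp.class;
    model weight = height age / vif;
    /* P= and R= name the variables that hold predicted values and residuals */
    output out=work.reg_out p=predicted r=residual;
run;
quit;   /* PROC REG is interactive, so QUIT ends the procedure */

proc print data=work.reg_out(obs=5);
run;
```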
Further Reading : http://www.sfu.ca/sasdoc/sashtml/stat/chap55/index.htm
Stepwise Regression

Syntax notations are similar to PROC REG, with a few changes as given in the example below.

Sample Code
PROC REG data=tpv.bam_jan10_v12_v2;
Model TPV= Q74_1_5 Q74_1_6 Q74_1_21 Q74_1_15 Q74_1_3 Q74_1_20 Q74_1_7 Q21 Q29_3
/ details stb vif
selection=stepwise slentry=0.05 slstay=0.05;
RUN;

Where
• Stepwise regression analysis is requested by specifying the SELECTION=STEPWISE option in the MODEL statement
• The option SLENTRY=0.05 specifies that a variable has to be significant at the 0.05 level before it can be entered into the model
• The option SLSTAY=0.05 specifies that a variable in the model has to be significant at the 0.05 level for it to remain in the model
• The DETAILS option requests detailed results for the variable selection process

Further Reading : http://www.sfu.ca/sasdoc/sashtml/stat/chap49/sect33.htm


Logistic Regression

Syntax:

PROC LOGISTIC < options >;
MODEL events/trials = < effects > < / options >;
OUTPUT < OUT=SAS-data-set > < keyword=name...keyword=name > / < option >;
RUN;

In the syntax, angle brackets denote optional specifications, and vertical bars denote a choice of one of the specifications separated by the vertical bars.

Statement: Description
MODEL: you specify one variable (preceding the equal sign) as the response variable, applicable to both binary response data and ordinal response data
STB: displays the standardized estimates
LACKFIT: requests the Hosmer and Lemeshow goodness-of-fit test
OUTROC: names the output data set that holds the data needed to draw the ROC curve
OUTPUT: creates an output data set and names the variables to contain predicted values, residuals, and other diagnostic statistics.

Sample Code (dependent_variable and independent_variables are placeholders for your own variable names)
PROC LOGISTIC DATA = itp.It_pro_wv17_recommend descending;
model dependent_variable = independent_variables
/ selection = stepwise stb slentry = 0.055 slstay = 0.055 lackfit outroc = roc;
ods output ParameterEstimates = pest;
output out = test p = predval;
RUN;
Further Reading : http://www.sfu.ca/sasdoc/sashtml/stat/chap39/index.htm
Factor Analysis
Let's understand PROC FACTOR directly with an example
PROC FACTOR data= tpv.Psatrefresh_us
METHOD=Principal PRIORS=ONE ROTATE = Varimax NFACTORS=5
OUT=tpv.Psatrefresh_us_1
SCREE MSA CORR SCORE RES REORDER
MINEIGEN=1;
VAR Q21_1 Q21_2 Q21_3 Q21_4 Q21_5;
RUN;
Option: Description
METHOD=name: specifies the method for extracting factors
PRIORS=name: specifies a method for computing prior communality estimates; PRIORS=ONE sets all prior communalities to 1.0
ROTATE=name: specifies the rotation method; ROTATE=VARIMAX specifies orthogonal varimax rotation
NFACTORS=n: specifies the maximum number of factors to be extracted
OUT=SAS-data-set: creates a data set containing all the data from the DATA= data set plus variables Factor1, Factor2, … containing factor scores
SCREE: displays a scree plot of the eigenvalues
MSA: produces the partial correlations between each pair of variables controlling for all other variables
CORR: displays the correlation matrix or partial correlation matrix
SCORE: displays the factor scoring coefficients
RES: displays the residual correlation matrix and the associated partial correlation matrix
REORDER: causes the rows (variables) of various factor matrices to be reordered on the output. Variables with their highest absolute loading on the first factor are displayed first, from largest to smallest loading, followed similarly for the second factor …
MINEIGEN=p: specifies the smallest eigenvalue for which a factor is retained. If you specify two or more of the MINEIGEN=, NFACTORS=, and PROPORTION= options, the number of factors retained is the minimum number satisfying any of the criteria

Further Reading : http://www.sfu.ca/sasdoc/sashtml/stat/chap26/index.htm


Cluster Analysis

Let's understand PROC CLUSTER directly with an example


PROC CLUSTER data = ob.omnibus_data
METHOD = ward
OUTTREE = Tree PSEUDO;
VAR Q235_2 Q235_3 Q235_4 Q235_5 Q235_6 Q235_7 Q235_9 Q235_10;
RUN;

Option: Description
METHOD=name: determines the clustering method used by the procedure
WARD: one of the many clustering methods possible; requests Ward's minimum-variance method (error sum of squares, trace W). Distance data are squared unless you specify the NOSQUARE option
OUTTREE=SAS-data-set: creates an output data set that can be used by the TREE procedure to draw a tree diagram. You must give the data set a two-level name to save it permanently
PSEUDO: displays pseudo F and t² statistics. This option is effective only when the data are coordinates or when METHOD=AVERAGE, METHOD=CENTROID, or METHOD=WARD

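To get a dendrogram, use the TREE procedure; sample code is given below

PROC TREE SPACES=2;
RUN;

Here SPACES= specifies the number of spaces between objects. Since no DATA= option is given, PROC TREE reads the most recently created data set (the OUTTREE= data set from PROC CLUSTER above)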
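
PROC TREE can also assign each observation to a cluster. The sketch below runs on sashelp.class (a sample data set that ships with SAS). The choice of three clusters and the data set names work.tree and work.clusters are assumptions made for illustration.

```sas
proc cluster data=sashelp.class method=ward outtree=work.tree pseudo;
    var height weight;
    id name;               /* labels each observation in the tree */
run;

/* Cut the tree at 3 clusters (an arbitrary choice) and save the memberships */
proc tree data=work.tree nclusters=3 out=work.clusters noprint;
run;

/* CLUSTER holds the cluster number assigned to each observation */
proc print data=work.clusters;
run;
```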

Further Reading :
PROC CLUSTER http://www.sfu.ca/sasdoc/sashtml/stat/chap23/index.htm
PROC TREE http://www.sfu.ca/sasdoc/sashtml/stat/chap66/index.htm
For further Reference …

• SAS Self Learning pdf


• http://www.ats.ucla.edu/stat/sas/modules/default.htm
• http://www.sfu.ca/sasdoc/sashtml/stat/index.htm
