Day 1


/*Creating comments - comments are always green in color

When all my code suddenly turns green, it means that I have not closed
my comment*/

*Applying SAS rules and syntax;

*SAS Libraries - Assign Libraries;


*Accessing Data and Creating Libraries using the Libname statement;
/* SYNTAX
Libname ChosenName(must not be more than 8 characters) ' Location of
the file ';
*/
*SAS statements are blue in color;

LIbnAme pairview '/home/olabodeoa0/Data Class';

*Data Step ;
*Data statement is used to create a new SAS data set;

Data customers; *Creating a new SAS data set and saving in the
temporary(WORK) library;
Set AIR.ORGANICS;
run;

Data pairview.customers; *Creating a new SAS data set and saving in the permanent library;
Set AIR.ORGANICS;
run;

*Getting your data in sas and converting data to SAS data sets;
* Read internal data into SAS data set uspresidents using DATA STEPS;
* DATA Statement creates a new data set;
*The INPUT statement describes the arrangement of values in the raw
data file and assigns input values
to the corresponding SAS variables - how to read the data;
*The DATALINES statement marks the start of the internal data and must be the
last statement in the DATA step;
*RUN statement is important - it marks the end of the DATA step;
DATA uspresidents;
INPUT President $ Party $ Number;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
RUN;

*Read data from external file into SAS data set;


*Data Infile - The INFILE statement identifies the physical name and
location of the raw data file;
/*SYNTAX
DATA output-SAS-data-set;
INFILE 'raw-data-file-name';
INPUT specifications;
RUN;
*/
*To write an INPUT statement using list input, simply list the
variable names after the INPUT
keyword in the order they appear in the file;
*Variable names must not exceed 32 characters and there must be at least one
space between the names;
*If the variable type is character put a $ sign after the variable
name, leave blank for Numeric type
e.g. Input Name $ Age Height;

DATA uspresidents;
INFILE '/home/olabodeoa0/LSBDATA&CODE/President.dat' DLM=' ';
INPUT President $ Party $ Number;
RUN;

*Exercise;
* Create a SAS data set named toads;
* Read the data file ToadJump.dat using list input;
DATA ;
INFILE '';
INPUT ToadName $ Weight Jump1 Jump2 Jump3;
RUN;

*Reading delimited files using the DATA Step;

* DLM= or DELIMITER= option in the INFILE statement allows you to read
data files with other delimiters;
DATA reading;
INFILE '/home/olabodeoa0/LSBDATA&CODE/Books.txt' DLM=' ';
INPUT Name $ Week1 Week2 Week3 Week4 Week5;
RUN;
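
/* A minimal sketch of DLM= with a delimiter other than a blank. The data
   set name (reading_csv) and the values are made up for illustration, and
   INFILE DATALINES is used so the step needs no external file.
   DSD treats two consecutive commas as a missing value. */
DATA reading_csv;
INFILE DATALINES DLM=',' DSD;
INPUT Name $ Week1 Week2 Week3;
DATALINES;
Amy,3,5,2
Ben,4,,6
;
RUN;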

*Reading delimited files using the import procedure - highly recommended;
* Proc import - makes it easy to read certain types of files;
*Datafile= - specifies the location of the data, could be physical or
server based;
*OUT= option - specifies the output dataset to create, could be saved
in temporary or permanent library;
*DBMS= specifies the type of file, see the books for more options;
*Replace option - it replaces any existing data set in your library that has
the same name;
/* SYNTAX
PROC Import datafile= 'location of file, including the name'
Out= chosen dataset name DBMS= identifier Replace;
run;
*/
*Example;
PROC IMPORT DATAFILE ='/home/olabodeoa0/LSBDATA&CODE/Bands2.csv'
OUT = music DBMS= CSV REPLACE;
RUN;
*Exercise;
/* Read data from the spreadsheet in the lsb-excelfiles called
Onionrings.xls into SAS, Name the output dataset = Sales,
Save the dataset in your chosen permanent library */
*The Sheet statement specifies the name of the sheet in the Excel
file, it is good practice to check the sheet names first;

PROC IMPORT DATAFILE = ''


OUT = DBMS= Replace;
Sheet='Sheet1';
RUN;

* To create a new SAS data set from an existing SAS data set,
you use a DATA step. You begin the DATA step with the DATA statement,
which provides the name of the SAS data set that you are creating.
The data set can be temporary or permanent.;
*The SET statement specifies the existing SAS data set that you want
to read in as input data. ;

DATA NIGERIA;
SET MAPSSAS.ALGERIA;
RUN;

*Selecting Variables - It is important to select only the variables needed for
a particular analysis, it helps the step
run faster and makes your results easy to read;
*Using the KEEP= option on the DATA statement selects the variables
written to the new dataset;
*Using the KEEP= option on the SET statement selects the variables
read from the existing dataset to create the new dataset - in this example
Selecting_Variables;
* You can use a DROP statement to list the variables to exclude from
the new data set,
or use a KEEP statement to list the variables to include. If you use a
KEEP statement,
you must include every variable to be written, including any new
variables.;
*You cannot drop or keep a new variable in the SET statement options, because new
variables do not exist in the input data set;

data pairview.production_product;
set '/home/olabodeoa0/Data Class/production_product.sas7bdat';
run;

Data Selecting_Variables(keep= Name StandardCost makeflag);


Set pairview.production_product;
run;

Data Selecting_Variables;
Set pairview.production_product ;
keep Name StandardCost;
run;

Data Selecting_Variables(drop= Name StandardCost);


Set pairview.production_product;
run;

Data Selecting_Variables;
Set pairview.production_product;
Drop Name StandardCost;
run;
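
/* A sketch using SASHELP.CLASS (shipped with SAS). The names Class_Subset
   and Height_cm are illustrative. KEEP= on the SET statement limits what
   is read in, so it can only list variables that already exist. The KEEP
   statement controls what is written out, so it can list new variables. */
Data Class_Subset;
Set sashelp.class (keep= Name Height);
Height_cm = Height * 2.54;
keep Name Height_cm;
run;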

*Renaming Variables - could either be at the DATA or SET statement
level, make sure the new name
conforms with the SAS naming convention. Use the RENAME= option at the
DATA or SET statement level;
*FIRSTOBS= option tells SAS to start reading at observation n;
*OBS= option tells SAS to stop reading at observation n;

Data Selecting_Variables (Rename= (Name = Product_Name StandardCost = Price));
Set pairview.production_product (Firstobs=1 obs=10) ;
run;

* You use an assignment statement to create a new variable. The
assignment statement evaluates an
expression and assigns the resulting value to a new or existing
variable. The expression is a
sequence of operands and operators. If the expression includes
arithmetic operators,
SAS performs the numeric operations based on priority, as in math
equations.
You can use parentheses to clarify or alter the order of operations. ;
*Creating and Redefining variables - ;
* Variable = expression ;
* Create a New variable called Percentage_Increase, make the variable
a 10% increase of standard cost;
* Create a new variable called newprice;
* Evaluate the profit;
Data Create_New_Variables (Keep=Name Product_Name StandardCost
Std_Cst_Percent_Inc ListPrice NewPrice Profit);
Set pairview.production_product;
Product_Name = Name;
Std_Cst_Percent_Inc = (10/100)* StandardCost;
NewPrice = StandardCost + Std_Cst_Percent_Inc;
Profit = ListPrice - NewPrice ;
run;

* Using the FORMAT Statement in a DATA Step - you can use the FORMAT
statement in a DATA step to
permanently associate formats with variables;
* Syntax - FORMAT variable(s) format;
*SAS Formats - used to enhance the appearance of variable values in
your reports;
*Formats - a format only affects the displayed value. The stored value
is not affected by a format. ;
*The Sum function ignores missing values, whereas the + operator returns
missing if any operand is missing;

Data Create_New_Variables (Keep= Product_Name StandardCost


Std_Cst_Percent_Inc ListPrice NewPrice NewPrice2 Profit);
Set pairview.production_product;
*format NewPrice dollar10. Std_Cst_Percent_Inc dollar10. Profit
dollar10.;
Product_Name = Name;
Std_Cst_Percent_Inc = (10/100)* StandardCost;
NewPrice = StandardCost + Std_Cst_Percent_Inc;
NewPrice2 = Sum(StandardCost,Std_Cst_Percent_Inc);
Profit = ListPrice - NewPrice ;
format NewPrice dollar10. Std_Cst_Percent_Inc dollar10. Profit
dollar10.;
run;
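
/* A sketch showing that a format changes only the display. The data set
   Format_Demo and its values are made up. Price prints as $1,234.57 but
   the stored value is still 1234.567, so Doubled is computed from that. */
Data Format_Demo;
Price = 1234.567;
Doubled = Price * 2;
format Price dollar10.2;
run;

Proc Print Data= Format_Demo;
run;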

*SAS Functions ;
/*A function performs a calculation on, or a transformation of, the
arguments given in parentheses following the
function name*/
*Data to use ODBC - ADWDW - Dimcustomer;
*Input function - use input function to convert character variables to
numeric which includes dates and currency;
*Put Function - Use the Put function to convert numeric variables to
character variables;
Data SAS_functions (Keep= New_BirthDate New_DOF_purchase
Charac_Birthdate DOB_Month DOB_Day DOB_Year Concat_Names
Concat_Names2 Upcase_FirstName Age Tenuretodate Agedays);
FORMAT New_BirthDate YYMMDD10. New_DOF_purchase YYMMDD10.;
Set PAIRVIEW.dbo_dimcustomer;
New_BirthDate = INPUT(BirthDate, YYMMDD10.);
New_DOF_purchase = Input(DateFirstPurchase, YYMMDD10.);
Charac_Birthdate = Put(New_BirthDate, MONYY5.);
DOB_Month = Month(New_BirthDate);
DOB_Day = Day(New_BirthDate);
DOB_Year = Year(New_BirthDate);
Concat_Names = FirstName || MiddleName || LastName;
Concat_Names2 = Cat(FirstName, MiddleName, LastName);
Upcase_FirstName = Upcase(FirstName);
Age = Yrdif(New_BirthDate, today(),'Age');
Agedays = today() - New_BirthDate; *SAS dates are day counts, so the difference is the exact age in days;
Tenuretodate = Int(Yrdif(New_DOF_purchase, today(), 'Age'));
run;
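
/* A sketch of INPUT, PUT and CATX on made-up values (Function_Demo and
   all variable names here are illustrative). CATX strips leading and
   trailing blanks and inserts the separator, which || and CAT do not. */
Data Function_Demo;
length First Last $ 10;
First = 'Ada';
Last = 'Obi';
Char_Date = '1990-05-17';
Num_Date = input(Char_Date, yymmdd10.);   /* character to numeric SAS date */
Back_To_Char = put(Num_Date, date9.);     /* numeric to character: 17MAY1990 */
Full_Name = catx(' ', First, Last);       /* Ada Obi */
format Num_Date yymmdd10.;
run;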

Day 2

*Using SAS Procedures ;

PROC CONTENTS DATA= '/home/olabodeoa0/dbo_dimemployee.sas7bdat';


RUN;

*Proc print using a permanent SAS data set;


Proc Print Data= pairview.dbo_dimemployee;
run;

*Proc print using a data set referenced by its physical path, in this case on the SAS server;
Proc Print Data= '/home/olabodeoa0/dbo_dimemployee.sas7bdat' (obs=10);
run;

*Proc print without the DATA= option. It prints the most recently created
data set;
Proc Print;
run;

*Using the BY statement - It is used to perform a separate analysis
for each value of the BY
variables. Before using BY in any proc (procedure) ensure the data has
been sorted by the BY variable;
Proc sort data= pairview.dbo_dimemployee;
by Gender;
run;

Proc Print Data= pairview.dbo_dimemployee;


by Gender;
run;

*Using Title and Footnotes;


*TITLE and FOOTNOTE statements are global statements, so they can
stand alone.
Also, any titles or footnotes that you assign remain in effect until
you change them,
cancel them, or end your SAS session. ;
Proc Print Data= pairview.color_spend;
Title 'Count of products by color';
Footnote 'To be reviewed on a monthly basis';
run;

*Using the WHERE statement to subset data in procedures - please note that
IF statements only work in DATA steps;
*You can use the VAR statement in a PROC PRINT step to subset the
variables in a report. You specify the variables
to include and list them in the order in which they are to be
displayed.;
*The WHERE statement in a PROC PRINT step subsets the observations in
a report. When you use a WHERE statement,
the output contains only the observations that meet the conditions
specified in the WHERE expression. ;
Title 'Count of products by color';
Footnote 'To be reviewed on a monthly basis';
Proc Print Data= pairview.color_spend;
Var Color LineTotal;
Where Color in ('Black','Blue','Silver');
run;

*Using a null TITLE or FOOTNOTE statement to cancel the current titles and footnotes;


Proc Print Data= pairview.color_spend;
Var Color LineTotal;
Where Color in ('Black','Blue','Silver');
run;
Title;
Footnote;

*Using 2 WHERE statements - the first one is replaced by the
second WHERE statement;
Proc print data= pairview.dbo_dimemployee;
var DepartmentName BaseRate Gender MaritalStatus;
where Gender = 'M';
where MaritalStatus = 'M';
Title 'Sum of Base rate by Departmentname';
run;
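
/* A sketch on SASHELP.CLASS: to apply both conditions, either combine
   them with AND in one WHERE statement or use WHERE ALSO, which adds to
   the previous WHERE instead of replacing it. */
Proc print data= sashelp.class;
where Sex = 'M';
where also Age >= 14;
run;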
/*You can use the SUM statement in a PROC PRINT step to calculate and
display report totals for the requested numeric variables.
SYNTAX
PROC PRINT DATA=SAS-data-set;
VAR variable(s);
SUM variable(s);
RUN;
*/

Proc print data= pairview.dbo_dimemployee;


var DepartmentName BaseRate;
sum BaseRate;
Title 'Sum of Base rate by Departmentname';
run;

*NOOBS - suppresses the Obs column;


Proc print data= pairview.dbo_dimemployee noobs;
var DepartmentName BaseRate;
sum BaseRate;
Title 'Sum of Base rate by Departmentname';
run;

*Using the FORMAT statement in Proc print to change the appearance of a
variable in the report;
Proc print data=pairview.sas_telco;
var Actual_Revenue MonthsWithPairViewTelecom;
Format Actual_Revenue dollar9. ;
run;

*PROC SORT - Use this procedure to sort your data: to organise data
for a report, before merging or combining
data sets, or before using a BY statement in another PROC or DATA
step;
*BY statement specifies the variable to use to form BY groups. The
variables in the BY statement are called BY variables;
*Put the DESCENDING keyword in the BY statement before
each variable that should sort from high to low (could be one or more).
If you do not specify DESCENDING, SAS sorts in ascending order by
default;
*PROC SORT replaces the original data set unless you specify an output
data set in the OUT= option;
/* PROC SORT DATA= input-SAS-data-set
<OUT=output-SAS-data-set>;
BY<DESCENDING> by-variable(s);
RUN; */
Proc sort data= pairview.dbo_dimemployee;
by descending BaseRate;
run;
*Using OUT= Option - if you would like to create a new SAS data set
without altering the original one;
*NODUPKEY keeps only the first observation for each combination of BY
variable values and deletes the rest;
Proc sort data= pairview.dbo_dimemployee out=work.dbo_dimemployee
nodupkey;
by descending Title;
run;

Proc sort data= pairview.dbo_dimemployee out=work.dbo_dimemployee ;


by HireDate;
run;

Proc print data=work.dbo_dimemployee (firstobs=1 OBS=100);
Title 'Employee data sorted by hire date';
run;

*PROC PRINT - It prints all variables for all observations in a SAS
data set;
*Print Monthly sales volume and revenue;
Proc Sort data=Pairview.sales_salesorderheader;
by SalesOrderID;
run;

Proc Sort data=pairview.sales_salesorderdetail;


by SalesOrderID;
run;

Data Print_data (Keep= SalesOrderID OrderDate OrderQty UnitPrice


New_OrderDate Order_MonYear Year_Order );
Merge Pairview.sales_salesorderheader pairview.sales_salesorderdetail;
Format Order_MonYear Monyy5.;
by SalesOrderID;
New_OrderDate = Put(datepart(OrderDate),Monyy5.);
Order_MonYear = Input(New_OrderDate ,Monyy5.);
Year_Order = Year(datepart(OrderDate));
run;
* Use Proc Print to print the results;
* Use the SUM statement to total numeric variables. Adding a BY
statement gives a subtotal for each BY group (analogous to group by in SQL);
Proc sort data= Print_data;
by Order_MonYear;
run;
Proc print data= Print_data (obs=1000);
By Order_MonYear;
Sum OrderQty UnitPrice;
*var Order_MonYear OrderQty UnitPrice;
Title 'Monthly Sales Revenue & Volume ';
run;

*Summarising Data using Proc Means - Only applicable to numeric data
and calculates basic statistics;
*What is the minimum, maximum and average base rate of ADW employees?;
*Use the CLASS statement to group by - similar to the BY statement but
does not require you to sort the data;
Proc means data=pairview.dbo_dimemployee;
var BaseRate;
run;

Proc means data=pairview.dbo_dimemployee;


Class DepartmentName;
var BaseRate;
run;
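
/* A sketch on SASHELP.CLASS that requests only the statistics asked for
   in the question above. MIN, MAX and MEAN are listed on the PROC MEANS
   statement and MAXDEC= limits the decimal places shown. */
Proc means data=sashelp.class min max mean maxdec=2;
Class Sex;
var Height;
run;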
*Write simple summary statistic of the OrderQty and Unitprice by Month
and year to a SAS DATA Set;
*OUTPUT OUT= - used to save summary statistics to a SAS data set for
further analysis;
*When the results are very large, use the NOPRINT option
to prevent the report from displaying;
Proc means noprint data=Print_data;
By Order_MonYear Year_Order;
Var OrderQty UnitPrice;
Output out= Print_data_Summary
Mean(OrderQty UnitPrice) = M_OrderQty M_UnitPrice
Sum(OrderQty UnitPrice) = S_OrderQty S_UnitPrice;
run;

Proc sort data= Print_data_Summary;


by descending S_UnitPrice;
run;
*PROC Summary;
*Using PROC Means with a NOPRINT option is the same as using PROC
Summary;
Proc print data= sashelp.air;
run;

Proc summary data=Print_data;


By Order_MonYear Year_Order;
Var OrderQty UnitPrice;
Output out= Print_data_Summary
Mean(OrderQty UnitPrice) = M_OrderQty M_UnitPrice
Sum(OrderQty UnitPrice) = S_OrderQty S_UnitPrice;
run;

*What is the average sale per month? ;


Data Print_data_Summary2 (keep= S_OrderQty S_UnitPrice);
set Print_data_Summary;
run;
*Counting Data using PROC FREQ - is a simple list of counts, could be
one way, two way or more frequencies;
*Proc Freq is used to show the distribution of categorical data
values, it can also reveal anomalies in data;
*Create a two way frequencies of Employee Gender and Marital Status;
Proc freq data= pairview.dbo_dimemployee;
table Gender*MaritalStatus;
run;
* Which Job Title has the largest number of employee? ;
Proc freq data= pairview.dbo_dimemployee;
table Title Gender*MaritalStatus;
run;
* What is customer Gender and Marital Status mix within the ADW
Database? ;
* NOPERCENT tells SAS not to print cell percentages. NOROW and NOCOL suppress row and column percentages;
* Use the Out= Option to create a new data set;
Proc Freq Data= pairview.dbo_dimcustomer ;
Tables Gender*MaritalStatus / out=dbo_dimcustomerGM ;
*To suppress percentages add NOPERCENT NOROW NOCOL after the slash;
run;
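
/* A sketch on SASHELP.CLASS with the percentage options switched on, so
   the crosstab shows counts only. Class_Counts is an illustrative name
   for the output data set. */
Proc freq data= sashelp.class;
tables Sex*Age / nopercent norow nocol out=Class_Counts;
run;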
* Producing Tabular Reports with PROC TABULATE;
* PROC Tabulate does much the same thing as PRINT, MEANS and FREQ;
* Class statement tells SAS which variables contain categorical data
to be used for dividing
observations into groups;
*Using the SAS_TELCO dataset and Proc tabulate, analyse NPS_Score by
Region. If the data has not been imported yet, run this first -
Proc import datafile= '/home/olabodeoa0/SAS_TELCO.csv'
Out=pairview.SAS_TELCO dbms=csv replace;
Proc tabulate data= pairview.SAS_TELCO out=SAS_TELCO;
Class Region nps_score ;
Table Region, nps_score ;
Title 'NPS SCORE CATEGORY BY REGION';
run;
Proc tabulate data= pairview.SAS_TELCO out=SAS_TELCO;
Class Region nps_score Gender ;
Table Region, nps_score, Gender;
Title 'NPS SCORE CATEGORY BY REGION & GENDER';
run;
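Proc tabulate data= pairview.SAS_TELCO out=SAS_TELCO;
Class Region nps_score ;
VAR Actual_Revenue;
*The VAR variable must appear in the TABLE statement to be analysed;
Table Region, nps_score*Actual_Revenue*Sum ;
Title 'TOTAL REVENUE BY REGION AND NPS SCORE CATEGORY';
run;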
*Producing Simple Output with PROC REPORT;
*The COLUMN statement in PROC REPORT works like the VAR statement in PROC PRINT, use it
to select the required variables;
Proc report data= pairview.SAS_TELCO;
Format Actual_Revenue dollar9.;
column nps_score Region Actual_Revenue ;
run;
*What is the total revenue of SAS_telco?;
Proc Report Data=pairview.sas_telco;
Format Actual_Revenue dollar9.;
Column Actual_Revenue;
Title 'Total Revenue for SAS_Telco';
Footnote 'Highly Confidential';
run;
*PROC Contents - it lists the contents of a
data set or library;
Proc contents data=work.print_data_summary;
run;
*_ALL_ - used to show the contents of every data set in a SAS library;
Proc contents data=Pairview._ALL_;
run;
* Use the NODS option to suppress the details of the individual data sets and print only the library directory;
Proc contents data=Pairview._ALL_ nods;
run;
*PROC Rank - It computes ranks for one or more numeric variables
across the observations of a SAS data set and outputs the ranks to a
new SAS data set;
*Var - indicates the variable to rank;
*Groups - use this to indicate the number of groups;
*Ranks - used to specify the name of newly created ranked variable;
proc rank data=pairview.dbo_dimemployee out=dbo_dimemployee
descending;
var BaseRate;
ranks BR_Ratings;
run;
proc print data=dbo_dimemployee;
run;
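
/* A sketch of the GROUPS= option on SASHELP.CLASS, which the example above
   does not use. GROUPS=4 splits Height into quartiles numbered 0 to 3.
   Class_Ranked and Height_Quartile are illustrative names. */
proc rank data=sashelp.class out=Class_Ranked groups=4;
var Height;
ranks Height_Quartile;
run;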
*Using PROC UNIVARIATE to Detect Data Outliers: A procedure that is
useful for detecting data outliers,
which are data values that fall outside the expected ranges, is PROC
UNIVARIATE. The syntax for PROC UNIVARIATE
is very similar to PROC MEANS syntax, and like PROC MEANS, PROC
UNIVARIATE produces summary reports of
descriptive statistics. You start with the PROC UNIVARIATE statement
and list the input data set.
You add a VAR statement to list the analysis variables, which are the
numeric variables for which statistics are to be computed.;
/*PROC UNIVARIATE DATA=SAS-data-set;
VAR variable(s);
RUN;*/
Proc univariate data=pairview.dbo_DimCustomer;
Var YearlyIncome;
run;

proc sql;
create table Stay as
select
Customerkey,
avg(SalesAmount) as AvgCust_Sales,
Sum(SalesAmount) as TotalCust_Sales

/* min(Orderdate) as Admit_Date,
max(Orderdate) as Discharge_Date,
max(Orderdate) - min(Orderdate) as Length_of_stay*/
from pairview.DBO_FACTINTERNETSALES
group by Customerkey;
quit;

proc sql;
create table pairview.Stay as
select
A.Customerkey,
MaritalStatus,
/*FORMAT YYMMDD10. input(Birthdate,YYMMDD10.) as BDate,*/
/*DATEDIFF (day, input(Birthdate,YYMMDD10.) , today() ),*/
INTNX('year', input(Birthdate,YYMMDD10.), 2, 'same') as
Nextbday format=YYMMDD10.,
/*'C' (continuous) counts completed years, so Age is age at last birthday*/
INTCK('year', input(Birthdate,YYMMDD10.), today(), 'C') as Age,
avg(SalesAmount) as AvgCust_Sales,
Sum(SalesAmount) as TotalCust_Sales
from pairview.DBO_FACTINTERNETSALES as A
join pairview.DBO_DIMCUSTOMER B
on A.Customerkey = B.Customerkey
group by
A.Customerkey,
MaritalStatus,
Birthdate;
quit;
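
/* A sketch of why the INTCK method matters when computing an age. The
   dates are made up and Age_Demo is an illustrative name. By default
   INTCK counts year boundaries crossed, while the 'C' (continuous)
   method counts full years elapsed since the start date. */
Data Age_Demo;
Birth = '31DEC2000'd;
AsOf = '01JAN2001'd;
Discrete_Years = intck('year', Birth, AsOf);        /* 1: one year boundary crossed */
Continuous_Years = intck('year', Birth, AsOf, 'c'); /* 0: no full year elapsed */
format Birth AsOf date9.;
run;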
