
Survival Analysis

Assignment 1
a. What is the length (byte size) of the variable Product_Description?

As shown in the PROC CONTENTS output, the length (the number of bytes used to
store the variable) of Product_Description is 45.
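
The value 45 also follows from the column range used to read the variable in the appendix (columns 25 to 69, that is 69 - 25 + 1 = 45 bytes). As a minimal sketch, assuming the data set WORK.TEST from the appendix already exists, the length can be read from the PROC CONTENTS output data set instead of the printed listing. The data set name test_vars is a hypothetical name chosen for illustration.

```sas
/* Sketch: write the variable attributes of WORK.TEST to a data set
   (test_vars is a hypothetical name) and suppress the printed listing. */
proc contents data=test out=test_vars(keep=name type length) noprint;
run;

/* Show only the row for Product_Description; LENGTH is its byte size. */
proc print data=test_vars noobs;
   where upcase(name) = 'PRODUCT_DESCRIPTION';
run;
```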
b. Print a subset of this data set with observations either from Denmark (DK) or
not having the word Backpack in their description. Title the report appropriately.
Give the full names of the countries as a footnote (AU Australia, DK Denmark,
ES Spain, US United States).

Report 1:

The following report, converted to PDF format, shows the output of the PROC PRINT
step for employees earning more than $30,000.00, sorted by descending salary, with
the variables EMPLOYEE_HIRE_DATE and SALARY. EMPLOYEEID is shown instead of the
observation number, and a sum is added for the variable SALARY.

[Embedded PDF document: Report 1]

The descriptive statistics of Salary for each gender class are given as follows:

Comment: It can be clearly seen that, on average, females are paid around
$4500, although the standard deviation for females is about half that of
males. This shows a higher degree of variance in the salaries of males. There is
a considerable difference between the maximum salaries of males and females, which
may be the result of a higher concentration of males in higher ranks.
Overall, within each group, salaries appear to be evenly distributed, apart
from the gap between the 99th percentile and the maximum, which might
indicate the presence of outliers in the variable Salary.
Following is the SAS output of PROC FREQ for the variable EMPLOYEE_GENDER and a
cross-tabulation between EMPLOYEE_GENDER and DEPENDENTS.

Comment: The frequency table shows the distribution of employees between Males
and Females in absolute and percentage terms.

Comment: The following insights can be drawn from the above cross-tabulation:
a) About 40% of female employees and 44% of male employees have no
dependents (see the row percentages).
b) A relatively larger share of males (22.32%) than females (19.37%) has 1
dependent. Other than that, females have relatively higher percentages
with 2 and 3 dependents.
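
Because both observations above rely on row percentages, the cross-tabulation can be easier to read when the column and overall percentages are suppressed. The following is a minimal sketch, assuming WORK.PAYROLL_FULL has been created as in the appendix. It is a variation on the PROC FREQ step used for this report, not the step that produced the output discussed here.

```sas
/* Sketch: keep only frequencies and row percentages in each cell.
   NOCOL drops column percentages, NOPERCENT drops overall percentages. */
proc freq data=payroll_full;
   tables Employee_Gender*Dependents / nocol nopercent;
run;
```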

Report 2:
The following report contains the variables SALARY and EMPLOYEE_BIRTH_DATE and the
calculated variables AGE_AT_TERMINATION and TENURE. EMPLOYEEID is shown instead of
the observation number.

[Embedded PDF document: Report 2]

The following table shows the averages of the calculated variables
AGE_AT_TERMINATION and TENURE:

Comment:
The age at termination is 37.67 years on average, while the average tenure of
employees is 10.16 years.

Appendix:
SAS Codes:
1. To input data by data lines:
Data test;
Input Company $1-21 Country $22-24 Product_Description $25-69;
Datalines;
Top Sports           DK Black/Black
Top Sports           DK X-Large Bottlegreen/Black
Top Sports           DK Comanche Women's 6000 Q Backpack. Bark
Miller Trading Inc   US Expedition Camp Duffle Medium Backpack
Toto Outdoor Gear    AU Feelgood 55-75 Litre Black Women's Backpack
Toto Outdoor Gear    AU Jaguar 50-75 Liter Blue Women's Backpack
Top Sports           DK Medium Black/Bark Backpack
Top Sports           DK Medium Gold Black/Gold Backpack
Top Sports           DK Medium Olive Olive/Black Backpack
Toto Outdoor Gear    AU Trekker 65 Royal Men's Backpack
Top Sports           DK Victor Grey/Olive Women's Backpack
Luna sastreria S.A.  ES Hammock Sports Bag
Miller Trading Inc   US Sioux Men's Backpack 26 Litre
;
run;
proc contents data=test;
run;

2. To identify the descriptions which contain "Backpack", I created a new
variable, flag = 'Backpack', and then used the INDEX function to find the
position of the keyword in the string (0 means the keyword is absent).

data test1;
set test;
flag= 'Backpack';
position_backpack=INDEX(Product_Description,flag);
run;
proc print data=test1;
var Company Country Product_Description;
where Country='DK' or position_backpack NE 0;
Title "Danish Companies OR ones containing 'Backpack' in description";
footnote 'AU - Australia, DK - Denmark, ES - Spain, US - United States';
run;
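
As a variation on step 2, the same subset can be selected without the intermediate data set, because the INDEX function may be used directly in a WHERE statement. This is a minimal sketch, assuming WORK.TEST exists as created in step 1. It is an alternative, not the code that produced the report.

```sas
/* Sketch: filter in PROC PRINT itself; INDEX returns 0 when the
   keyword is not found, and the search is case-sensitive. */
proc print data=test;
   var Company Country Product_Description;
   where Country = 'DK' or index(Product_Description, 'Backpack') > 0;
   title "Danish Companies OR ones containing 'Backpack' in description";
   footnote 'AU - Australia, DK - Denmark, ES - Spain, US - United States';
run;
```

Replacing `> 0` with `= 0` would instead select descriptions that do not contain the word, in which case the title would need to change accordingly.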
Report 1 SAS codes:
/*Import data set*/
DATA WORK.payroll_full;
    LENGTH
        Employee_ID        $ 8
        Employee_Gender    $ 1
        Salary               8
        Birth_Date           8
        Employee_Hire_Date   8
        Employee_Term_Date   8
        Marital_Status     $ 1
        Dependents           8 ;
    FORMAT
        Employee_ID        8.
        Employee_Gender    $CHAR1.
        Salary             BEST6.
        Birth_Date         BEST5.
        Employee_Hire_Date BEST5.
        Employee_Term_Date BEST5.
        Marital_Status     $CHAR1.
        Dependents         BEST1. ;
    INFORMAT
        Employee_ID        8.
        Employee_Gender    $CHAR1.
        Salary             BEST6.
        Birth_Date         BEST5.
        Employee_Hire_Date BEST5.
        Employee_Term_Date BEST5.
        Marital_Status     $CHAR1.
        Dependents         BEST1. ;
    INFILE
        'C:\Users\mrt14009\AppData\Local\Temp\SEG1904\EMPLOYEE_PAYROLL2932033834854e828ec3f8ce8830861b.txt'
        LRECL=36
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        Employee_ID        : ?? 8.
        Employee_Gender    : $CHAR1.
        Salary             : ?? BEST6.
        Birth_Date         : ?? BEST5.
        Employee_Hire_Date : ?? BEST5.
        Employee_Term_Date : ?? BEST5.
        Marital_Status     : $CHAR1.
        Dependents         : ?? BEST1. ;
RUN;
/*checking counts of non-missing and missing values*/
proc means data=PAYROLL_full n nmiss;
run;
/*sorting data where salary >30000*/
proc sort data=PAYROLL_full out=payroll_subset;
by descending salary;
where salary >30000;
run;
/*printing report*/
proc print data=payroll_subset noobs;
var Employee_ID Employee_Hire_Date Salary;
sum salary;
label Employee_Hire_Date= 'employee hire date';
format Employee_ID $8. Employee_Hire_Date date8. ;
run;
/*descriptive statistics for variable salary*/
proc means data=payroll_full mean min p1 p5 p10 p25 median p75 p90 p95
p99 max stddev;
var Salary;
class Employee_Gender;
run;
/*frequency table for gender*/
proc freq data=PAYROLL_full;
table Employee_Gender;
run;
/*cross tabulation of gender by dependents*/
proc freq data=PAYROLL_full;
table Employee_Gender*Dependents;
run;

Report 2 SAS codes:


/*Reading data with condition*/
data payroll_terminated;
set payroll_full;
where Employee_Term_Date NE .;
run;
/*sorting by birthdate*/
proc sort data=payroll_terminated;
by Birth_Date;
run;
/*calculating age at termination and tenure*/
data payroll_terminated1;
set payroll_terminated;
age_at_termination_days = intck('day',Birth_Date, Employee_Term_Date);
tenure_days = intck('day',Employee_Hire_Date,Employee_Term_Date);
run;
/*converting days to years; width 5 leaves room for values such as 37.67*/
data payroll_terminated(drop=age_at_termination_days tenure_days);
set payroll_terminated1;
age_at_termination = (age_at_termination_days/365.25);
tenure = (tenure_days/365.25);
label age_at_termination = 'age at termination in years';
label tenure = 'tenure in years';
format age_at_termination 5.2;
format tenure 5.2;
run;
/*reporting*/
proc print data= payroll_terminated noobs;
var Employee_ID salary Birth_Date age_at_termination tenure;
format Birth_Date date8. tenure 5.2 age_at_termination 5.2;
run;

/*averages*/
proc means data=payroll_terminated mean ;
var age_at_termination tenure;
run;
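
Since SAS dates are stored as day counts, the two calculated variables can also be derived in a single DATA step. The following is a minimal sketch of that alternative, assuming WORK.PAYROLL_FULL exists as created above. The data set name payroll_terminated_alt is hypothetical, and the use of YRDIF with the 'AGE' basis (available in SAS 9.3 and later) is a substitute for the division by 365.25 used in this report, so the results may differ slightly in the decimals.

```sas
/* Sketch: age and tenure in one step for terminated employees only. */
data payroll_terminated_alt;   /* hypothetical data set name */
   set payroll_full;
   where Employee_Term_Date ne .;
   /* YRDIF with the 'AGE' basis returns the age in years between two dates */
   age_at_termination = yrdif(Birth_Date, Employee_Term_Date, 'AGE');
   /* Subtracting two SAS dates gives the number of days between them */
   tenure = (Employee_Term_Date - Employee_Hire_Date) / 365.25;
   label age_at_termination = 'age at termination in years'
         tenure = 'tenure in years';
   format age_at_termination tenure 5.2;
run;
```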
