Categorical Analysis Assignment
DEPARTMENT OF STATISTICS
ASSIGNMENT
STUDENT NAME                                                                ID
1. GOSA CHALA.……………....…………………………………………….RU1377/14
• Numerical: Descriptive statistics assume that the data under examination is numerical, as these
measures primarily deal with quantitative information.
• Random: These statistics operate under the assumption of a random and representative sample,
ensuring that the calculated values accurately reflect the broader population.
Understanding these assumptions is pivotal for accurate interpretation and application,
reinforcing the importance of meticulous data collection and the consideration of the statistical
context in which descriptive statistics are employed.
How to Perform Descriptive Statistics in SPSS
Performing Descriptive Statistics in SPSS involves several steps. Here’s a step-by-step guide to
assist you through the procedure:
STEP 3: Specify Variables
Upon selecting “Descriptives,” a dialog box will appear. Transfer the continuous variable you
wish to analyze into the “Variable(s)” box.
SPSS Output for Descriptive Statistics
Descriptives
Interpreting the SPSS output for descriptive statistics is pivotal for drawing meaningful
conclusions. Firstly, focus on the measures of central tendency, such as the mean, median, and
mode. These values provide insights into the typical or average score in your dataset. Next,
examine measures of variability, including the range and standard deviation. The range indicates
the spread of scores from the lowest to the highest, while the standard deviation quantifies the
average amount of variation in your data.
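The same measures can be reproduced outside SPSS. Here is a minimal sketch using Python's standard library on a small made-up set of scores (the values are illustrative, not taken from the SPSS example):

```python
import statistics

# Hypothetical test scores (illustrative only)
scores = [62, 70, 70, 75, 81, 88, 94]

mean = statistics.mean(scores)          # typical (average) score
median = statistics.median(scores)      # middle score
mode = statistics.mode(scores)          # most frequent score
data_range = max(scores) - min(scores)  # spread from lowest to highest
stdev = statistics.stdev(scores)        # sample standard deviation

print(round(mean, 2), median, mode, data_range, round(stdev, 2))
```

Together these mirror the central-tendency and variability columns of the SPSS Descriptives output.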
SPSS Statistics for Descriptives
In our example, SPSS Output for Descriptive Statistics, the descriptive statistics provided
describe five variables: “Having lung cancer or not, Gender of respondents, Smoking cigarette,
Age of respondents, Body Mass Index of respondents” based on a sample of 290 individuals.
The descriptive statistics provided in the data output offer a summary of the data related to lung
cancer.
Here's an interpretation of the key statistics:
• Having Lung Cancer or Not
• Mean: 0.39, indicating that less than half of the respondents have lung cancer.
• Standard Deviation (Std. Deviation): 0.489, suggesting moderate variability in
the responses.
• Skewness: -1.806, indicating the data is highly skewed towards respondents not
having lung cancer.
• Gender of Respondents
• Mean: 0.37, suggesting a slightly lower proportion of one gender over the other.
• Std. Deviation: 0.484, showing moderate variability in gender distribution.
• Skewness: -0.141, indicating a slight skewness in gender distribution.
• Smoking Cigarette
• Mean: 0.42, implying that a slightly higher proportion of respondents are
smokers.
• Std. Deviation: 0.495, indicating moderate variability in smoking status among
respondents.
• Skewness: 0.323, showing a slight skew towards more respondents being
smokers.
• Age of Respondents: The statistics for age are not visible in the image provided.
• Body Mass Index (BMI) of Respondents:
The statistics for BMI are also not visible in the image provided.
Valid N (listwise): 290, indicating that all the statistics are based on 290 valid responses.
• The range and kurtosis for each variable are not discussed here as they are not fully
visible in the table. The range would provide insights into the spread of the data, while
kurtosis would indicate the 'tailedness' of the distribution.
• When interpreting these statistics, it's important to consider the context of the study and
the population from which the data was drawn.
• The mean values give us an average, while the standard deviation, skewness, and kurtosis
(where available) describe the spread and shape of each variable's distribution.
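For a 0/1 variable like these, the mean is simply the proportion coded 1, and the standard deviation follows from that proportion. A sketch assuming a 113-of-290 split, which is an assumption chosen only to be consistent with the reported mean of 0.39 (the exact count is not shown in the output):

```python
import statistics

# Assumed split: 113 of 290 respondents coded 1 (has lung cancer).
# This count is hypothetical, picked to match the reported mean of 0.39.
cancer = [1] * 113 + [0] * 177

mean = statistics.mean(cancer)    # proportion with lung cancer
stdev = statistics.stdev(cancer)  # sample standard deviation

print(round(mean, 2), round(stdev, 3))
```

These reproduce the reported Mean (0.39) and Std. Deviation (0.489) for the lung cancer variable.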
Binary Logistic Regression in SPSS is a powerful statistical technique that unlocks insights in
various fields, from healthcare to marketing. In this post, we’ll navigate the intricacies of
binary logistic regression, providing you with a comprehensive understanding of its applications
and modeling; this guide will equip you with the knowledge and skills to leverage binary logistic
regression effectively.
Before delving into binary logistic regression, let’s take a moment to explore the broader
landscape of logistic regression.
There are three primary types:
• Binomial Logistic Regression,
• Multinomial Logistic Regression, and
• Ordinal Logistic Regression.
• Binomial Logistic Regression deals with binary outcomes, where the dependent variable has
only two possible categories, such as yes/no or pass/fail.
• Multinomial Logistic Regression comes into play when the dependent variable has more than
two unordered categories, allowing us to predict which category a case is likely to fall into.
• Ordinal Logistic Regression is employed when the dependent variable has multiple ordered
categories, like low, medium, and high, enabling us to predict the likelihood of a case falling into
or above a specific category.
Our focus will be on Binary Logistic Regression, which is widely used for binary outcomes and
forms the foundation for understanding logistic regression.
Definition: Binary Logistic Regression
Binary Logistic Regression is a statistical method that deals with predicting binary outcomes,
making it an invaluable tool in various fields, including healthcare, finance, and social sciences.
In binary logistic regression, the dependent variable is categorical with only two possible
outcomes, often coded as 0 and 1. This technique allows us to model and understand the
relationship between one or more independent variables and the probability of an event occurring
or not occurring.
Logistic Regression Equation
At the core of Binary Logistic Regression lies the Logistic Regression Equation, which is vital
for understanding the relationship between the predictor variables and the binary outcome. The
equation can be expressed as follows:
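The equation itself is not reproduced in this copy; in standard notation, writing \(p\) for the probability that the outcome equals 1, it reads:

```latex
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}}
```

The left-hand side is the log odds (logit) of the event; the coefficients \(\beta_i\) are what SPSS reports in the “Variables in the Equation” table.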
Binary Logistic Regression not only predicts the outcome but also provides insights into the
factors that influence the outcome. By examining the coefficients and odds ratios associated with
each predictor variable, analysts can identify the significance and direction of these influences.
This information is invaluable for decision-making and understanding the driving forces behind
binary outcomes in various scenarios, such as predicting customer churn, diagnosing medical
conditions, or assessing the likelihood of loan default.
Assumptions of Binary Logistic Regression
• Logistic regression does not assume a linear relationship between the dependent and
independent variables.
• The independent variables need not be interval, nor normally distributed, nor linearly
related, nor of equal variance within each group.
• Homoscedasticity is not required. The error terms (residuals) do not need to be normally
distributed.
• The dependent variable in logistic regression is not measured on an interval or ratio scale.
The dependent variable must be dichotomous (2 categories) for binary logistic
regression.
• The categories (groups) as a dependent variable must be mutually exclusive and
exhaustive; a case can only be in one group and every case must be a member of one of the
groups.
• Larger samples are needed than for linear regression because maximum likelihood (ML)
coefficient estimates are large-sample estimates. A minimum of 50 cases per predictor is
recommended (Field, 2013).
• Hosmer, Lemeshow, and Sturdivant (2013) suggest a minimum sample of 10 observations
per independent variable in the model, but caution that 20 observations per variable should
be sought if possible.
• Leblanc and Fitzgerald (2000) suggest a minimum of 30 observations per independent
variable.
Hypothesis of Binary Logistic Regression
In Binary Logistic Regression, hypotheses guide the analysis and the interpretation of results.
Specifically, two hypotheses are central to binary logistic regression:
• Null Hypothesis (H0): There is no significant relationship between the independent variables and
the binary outcome.
• Alternative Hypothesis (H1): At least one of the independent variables has a significant effect
on the binary outcome.
Hypothesis testing in binary logistic regression involves examining the significance of the
coefficients associated with each independent variable. If any coefficient has a p-value less than
the chosen significance level (commonly 0.05), it implies that the corresponding variable has a
significant effect on the outcome. Hypothesis testing is a critical step in determining which
predictor variables contribute significantly to the model and understanding their impact on the
binary outcome.
Step by Step: Running Logistic Regression in SPSS Statistics
Now, let’s delve into the step-by-step process of conducting the Binary Logistic
Regression using SPSS Statistics.
Here’s a step-by-step guide on how to perform a Binary Logistic Regression in SPSS:
Note
Conducting a Binary Logistic Regression in SPSS provides a robust foundation for
understanding the key features of your data. Always ensure that you consult the documentation
corresponding to your SPSS version, as steps might slightly differ based on the software version
in use.
SPSS Output for Binary Logistic Regression
Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       282.315a             .305                    .413
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
Classification Tablea
Observed: Having lung cancer or not (Yes/No) vs. Predicted, with Percentage Correct; the cell
counts are not visible in the image provided.
Interpreting Binary Logistic Regression
Interpreting the SPSS output of binary logistic regression involves examining key tables to
understand the model’s performance and the significance of predictor variables. Here are the
essential tables to focus on:
Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       282.315a             .305                    .413
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
Interpretation:
-2 Log likelihood: This is a measure of how well the model fits the data. Lower values indicate
better fit. In this case, the value is 282.315.
Cox & Snell R Square: This is a measure of the proportion of variance in the dependent
variable that is accounted for by the independent variables. It ranges from 0 to 1, where higher
values indicate a better fit. Here, it's 0.305, suggesting that the model explains about 30.5% of the
variance in the dependent variable.
Nagelkerke R Square: This is another measure of the proportion of variance explained by the
model, adjusted for the number of predictors. It's also scaled from 0 to 1, with higher values
indicating a better fit. Here, it's 0.413, meaning that the model explains about 41.3% of the
variance in the dependent variable.
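Both pseudo-R² values can be reproduced from the -2 log likelihoods. The intercept-only model's -2LL is not shown in this output, but it can be recovered as 282.315 plus the omnibus model chi-square of 105.470 reported later. A sketch of the computation:

```python
import math

n = 290                   # sample size (Valid N)
neg2ll_model = 282.315    # -2 Log likelihood of the fitted model
chi_square = 105.470      # omnibus model chi-square (improvement over the null model)
neg2ll_null = neg2ll_model + chi_square  # -2LL of the intercept-only model

# Cox & Snell R Square: 1 - (L0/L1)^(2/n), which equals 1 - exp(-chi_square/n)
cox_snell = 1 - math.exp(-chi_square / n)

# Nagelkerke R Square rescales Cox & Snell by its maximum attainable value
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Rounded to three decimals, these match the .305 and .413 in the Model Summary table.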
Variables not in the Equation (partial)
                      Score     df    Sig.
BMI                   1.490     1     .222
Overall Statistics    95.616    4     .000
Interpretation
For each variable listed in the "Variables not in the Equation" section, the score, degrees of
freedom, and significance level are provided.
A low p-value (usually less than 0.05) suggests that the variable is significantly related to the
dependent variable.
In this case, it seems that "gender" and "smoking" have very low p-values (both 0.000),
indicating that they are highly significant predictors of the dependent variable. Age also has a
significant p-value of 0.016, suggesting it is also relevant to the model.
However, BMI's p-value is 0.222, indicating that it is not statistically significant in predicting the
dependent variable in this model.
Overall, these results suggest that "gender," "smoking," and "age" are important predictors in the
model, while "BMI" is not significant in predicting the dependent variable.
Omnibus Tests of Model Coefficients is used to test the model fit. If the Model is significant, this
shows that there is a significant improvement in fit compared to the null model; hence, the
model shows a good fit.
        Chi-square    df    Sig.
Model   105.470       4     .000
Interpretation
The omnibus test evaluates the overall significance of the model. In this case, the chi-square
value is 105.470 with 4 degrees of freedom, and the associated p-value is < .0001 (or .000). This
indicates that the model as a whole is statistically significant at a very high level of significance.
Essentially, the model, including all the predictors considered together, is providing statistically
significant information in explaining the variation in the dependent variable.
Overall, these results suggest that the predictors included in the model collectively have a
significant effect on the dependent variable.
The Hosmer and Lemeshow test is also a test of model fit. The Hosmer-Lemeshow statistic
indicates a poor fit if the significance value is less than 0.05. Here, the model adequately fits the
data; hence, there is no significant difference between the observed and predicted values.
Interpretation
The Hosmer and Lemeshow test is a statistical test used to assess the goodness of fit of a logistic
regression model. It helps determine whether there is a significant difference between the
observed and predicted values in the model.
In the given data, the test statistic (Chi-square) is 9.509, and the degrees of freedom (df) are 8.
The significance level (Sig.) is 0.301.
To interpret the results:
The chi-square statistic of 9.509 indicates the overall discrepancy between the observed and
predicted values in the logistic regression model.
The degrees of freedom (df) equal the number of groups minus the number of estimated
parameters; with SPSS's default of 10 groups, there are 8 degrees of freedom.
The significance level (Sig.) of 0.301 indicates the p-value associated with the chi-square
statistic. It represents the probability of obtaining a test statistic as extreme as the observed one,
assuming the null hypothesis is true.
Based on the provided data, since the p-value is 0.301, which is greater than the common
significance level of 0.05, we fail to reject the null hypothesis. This means that there is no
significant difference between the observed and predicted values in the logistic regression model.
In other words, the model appears to provide a good fit to the data.
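The reported significance can be checked directly. For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form that needs only the standard library; this sketch reproduces the p-value from the statistic quoted above:

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with an even, positive number of degrees of freedom df."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Hosmer and Lemeshow test from the text: chi-square 9.509 with 8 df
p_value = chi2_sf(9.509, 8)
print(round(p_value, 3))
```

The same function applied to the omnibus chi-square (105.470 with 4 df) returns a value far below 0.001, matching the .000 reported there.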
Classification Table
The Classification Table displays the accuracy of the model in classifying cases into their
respective categories. It includes information on true positives, true negatives, false positives,
and false negatives. The overall classification accuracy percentage indicates how well the model
predicts the binary outcome.
Classification Tablea
Observed: Having lung cancer or not (Yes/No) vs. Predicted, with Percentage Correct; the cell
counts are not visible in the image provided.
Based on the provided data, the model's performance seems reasonably good, with a high overall
percentage correct of 76.9%.
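The cell counts behind the 76.9% are not visible in this copy, so the counts below are hypothetical, chosen only to sum to 290 and to give 223 correct classifications (76.9%). They sketch how the percentages in a classification table are computed:

```python
# Hypothetical confusion-matrix counts (the real cell counts are not
# visible in the source); they sum to 290 and give 76.9% correct.
true_yes_pred_yes = 68   # correctly predicted cases with lung cancer
true_yes_pred_no = 45    # missed cases (false negatives)
true_no_pred_no = 155    # correctly predicted non-cases
true_no_pred_yes = 22    # false alarms (false positives)

total = (true_yes_pred_yes + true_yes_pred_no
         + true_no_pred_no + true_no_pred_yes)
correct = true_yes_pred_yes + true_no_pred_no
overall_pct_correct = 100 * correct / total

print(round(overall_pct_correct, 1))
```

The row-wise percentages (sensitivity and specificity) are computed the same way, dividing each diagonal count by its row total.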
• Coefficients (B) represent the impact of each predictor on the log odds of the binary outcome.
• Exp (B): Odds ratios, which can be calculated as Exp (B) indicate the change in odds for a one-
unit change in the predictor. An odds ratio greater than 1 suggests an increase in the odds of the
event occurring, while a value less than 1 implies a decrease.
• Wald Statistics: the Wald statistic values for each predictor variable help assess the
significance of each predictor. Lower p-values indicate a more significant impact on the
outcome.
• z-values: the square root of the Wald statistic indicates how many standard errors the
coefficient is from zero. Higher absolute values suggest greater significance. (SPSS's logistic
output reports the Wald statistic rather than t-values.)
• P values: Test the null hypothesis that the corresponding coefficient is equal to zero. A low p-
value suggests that the predictors are significantly related to the dependent variable.
By thoroughly examining these output tables, you can gain a comprehensive understanding of
the binary logistic regression model’s performance and the significance of predictor variables.
This information is essential for making informed decisions and drawing meaningful conclusions
from your analysis.
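The significance value attached to each Wald statistic comes from a chi-square distribution with 1 degree of freedom, which reduces to the complementary error function. A sketch using the Wald values reported later in the output (recomputing them from the rounded B and S.E. would give slightly different numbers):

```python
import math

def wald_p_value(wald):
    """p-value for a Wald statistic: chi-square with 1 df,
    P(X > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(wald / 2.0))

print(round(wald_p_value(4.455), 3))   # age: reported Sig. is .035
print(wald_p_value(21.031) < 0.001)    # gender: reported Sig. is .000
```

This is the same test SPSS performs for each row of the Variables in the Equation table.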
• The beta coefficients can be negative or positive, and each has a significance value associated
with it. … If the beta coefficient is negative, the interpretation is that for every 1-unit increase
in the predictor variable, the log odds of the outcome will decrease by the beta coefficient
value.
Based on the Variables in the Equation, it appears that a logistic regression analysis was
performed with several independent variables entered in
Step 1. Let's interpret the results for each variable:
Gender:
The coefficient (B) for gender is 1.410.
The standard error (S.E.) associated with the coefficient is 0.308.
The Wald statistic is 21.031, which is the ratio of the coefficient to its standard error.
The degrees of freedom (df) associated with the Wald statistic is 1.
The significance (Sig.) value indicates the p-value for the Wald statistic, which is 0.000.
The odds ratio (Exp(B)) is 4.097, which represents the change in odds for each unit
increase in gender.
The 95% confidence interval (C.I.) for the odds ratio ranges from 2.242 to 7.486.
Interpretation for Gender:
The variable "gender" is statistically significant (p < 0.001) and has a positive coefficient of
1.410. This indicates that being male (assuming gender 1 represents male and 0 represents
female) is associated with higher odds of the outcome, controlling for other variables.
Specifically, males have 4.097 times higher odds of the outcome compared to females, with a
95% confidence interval ranging from 2.242 to 7.486.
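The odds ratio and its confidence interval can be recovered from B and S.E.; small differences from the SPSS table arise because B and S.E. are rounded to three decimals. A sketch for the gender row:

```python
import math

b, se = 1.410, 0.308   # gender coefficient and standard error from the output
z = 1.959964           # two-sided 95% critical value of the standard normal

odds_ratio = math.exp(b)          # Exp(B)
ci_low = math.exp(b - z * se)     # lower bound of the 95% CI
ci_high = math.exp(b + z * se)    # upper bound of the 95% CI

print(round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3))
```

The same two lines of arithmetic produce the Exp(B) and CI columns for the smoking, age, and BMI rows.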
Smoking:
The coefficient (B) for smoking is 2.363.
The standard error (S.E.) associated with the coefficient is 0.304.
The Wald statistic is 60.579.
The degrees of freedom (df) associated with the Wald statistic is 1.
The significance (Sig.) value is 0.000.
The odds ratio (Exp(B)) is 10.626.
The 95% confidence interval (C.I.) for the odds ratio ranges from 5.860 to 19.268.
Interpretation for Smoking:
The variable "Smoking" is statistically significant (p < 0.001) and has a positive coefficient of
2.363. This indicates that smoking is associated with higher odds of the outcome, controlling for
other variables. Specifically, smokers have 10.626 times higher odds of the outcome compared to
non-smokers, with a 95% confidence interval ranging from 5.860 to 19.268.
Age:
The coefficient (B) for age is 0.020.
The standard error (S.E.) associated with the coefficient is 0.009.
The Wald statistic is 4.455.
The degrees of freedom (df) associated with the Wald statistic is 1.
The significance (Sig.) value is 0.035.
The odds ratio (Exp(B)) is 1.020.
The 95% confidence interval (C.I.) for the odds ratio ranges from 1.001 to 1.039.
Interpretation for Age:
The variable "Age" is statistically significant (p = 0.035) and has a positive coefficient of 0.020.
This indicates that for each unit increase in age, the odds of the outcome increase by a factor of
1.020, controlling for other variables. The 95% confidence interval for the odds ratio ranges from
1.001 to 1.039.
BMI:
The coefficient (B) for BMI is -0.004.
The standard error (S.E.) associated with the coefficient is 0.008.
The Wald statistic is 0.307.
The degrees of freedom (df) associated with the Wald statistic is 1.
The significance (Sig.) value is 0.579.
The odds ratio (Exp (B)) is 0.996.
The 95% confidence interval (C.I.) for the odds ratio ranges from 0.980 to 1.011.
Interpretation for BMI:
The variable "BMI" is not statistically significant (p = 0.579), as the p-value is greater than the
common significance level of 0.05. This suggests that BMI is not significantly associated with
the outcome when controlling for other variables.
Constant:
The coefficient (B) for the constant term is -2.927.
The standard error (S.E.) associated with the coefficient is 0.758.
The Wald statistic is 14.920.
The degrees of freedom (df) associated with the Wald statistic is 1.
The significance (Sig.) value is 0.000.
The constant term represents the log odds of the outcome when all predictors equal zero; it is
usually not of direct interest but serves as the baseline for the other variables in the equation.
Overall, the interpretation of the logistic regression results suggests that gender, smoking, and
age are statistically significant predictors in the logistic regression model for the outcome
variable, while BMI does not appear to be a significant predictor. The coefficients and odds
ratios provide information about the direction and magnitude of the associations between the
independent variables and the outcome, controlling for other variables.
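Putting the fitted coefficients together, the model's predicted probability for any profile of predictor values follows from the logistic equation. The profile below (a 50-year-old male smoker with BMI 25, assuming male = 1 and smoker = 1 in the coding) is hypothetical:

```python
import math

# Fitted coefficients from the Variables in the Equation table
constant = -2.927
b_gender, b_smoking, b_age, b_bmi = 1.410, 2.363, 0.020, -0.004

# Hypothetical respondent: male (1), smoker (1), age 50, BMI 25
gender, smoking, age, bmi = 1, 1, 50, 25

log_odds = (constant + b_gender * gender + b_smoking * smoking
            + b_age * age + b_bmi * bmi)
probability = 1 / (1 + math.exp(-log_odds))

print(round(log_odds, 3), round(probability, 2))
```

Under these assumed predictor values the model puts the probability of lung cancer at roughly 0.85, which illustrates how strongly the gender and smoking coefficients dominate the prediction.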
Odds Ratio = 1
The probability of falling into the target group is equal to the probability of falling into the
non-target group.
Odds Ratio > 1 (Probability of Event Occurring Increases)
The probability of falling into the target group is greater than the probability of falling into the
non-target group: the event is likely to occur.
Odds Ratio < 1 (Probability of Event Occurring Decreases)
The probability of falling into the target group is less than the probability of falling into the
non-target group: the event is unlikely to occur.
We can say that the odds of a customer choosing a Private Bank offering Value Added Services
are 1.367 times higher than for Public Sector Banks, which do not offer Value Added Services,
with a 95% CI of 1.097 to 1.703.
The important thing about this confidence interval is that it doesn’t cross 1. This is important
because values greater than 1 mean that as the predictor variable(s) increase, so do the odds of (in
this case) selecting Private Bank. Values less than 1 mean the opposite: as the predictor increases,
the odds of selecting Private Bank decrease.
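This check is mechanical: the association is treated as significant at the 5% level exactly when the 95% CI for the odds ratio excludes 1. A sketch using the figures quoted above:

```python
odds_ratio = 1.367
ci_low, ci_high = 1.097, 1.703   # 95% CI for the odds ratio

# Significant at the 5% level iff the interval does not contain 1
crosses_one = ci_low <= 1 <= ci_high
print(odds_ratio, "not significant" if crosses_one else "significant")
```

Here the whole interval sits above 1, so the positive association holds across the plausible range.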
Case Processing Summary and Encoding
The first section of the output shows the Case Processing Summary, highlighting the cases
included in the analysis. In this example we have a total of 290 respondents.
Case Processing Summary
Unweighted Casesa                              N       Percent
Selected Cases    Included in Analysis         290     100.0
                  Missing Cases                0       .0
                  Total                        290     100.0
Unselected Cases                               0       .0
Total                                          290     100.0
a. If weight is in effect, see classification table for the total number of cases.
Interpretation
Based on the case processing summary, it appears that all 290 cases in the dataset were selected for
analysis. There are no missing cases, indicating that there is complete data available for the selected
cases. The total number of cases in the dataset is also 290.
This summary provides an overview of the case selection and missing data status, allowing us to
understand the completeness of the dataset and the number of cases available for analysis.
The Dependent Variable Encoding table shows the coding for the criterion variable: in this
example, respondents coded “Yes” are classified as 0, while those coded “No” are classified
as 1.
Dependent Variable Encoding
Original Value Internal Value
Yes 0
No 1
Interpretation
The encoding of the dependent variable suggests that the value "Yes" is represented by the
internal value 0, while the value "No" is represented by the internal value 1. This encoding
allows for numerical representation and analysis of the dependent variable in statistical models
or algorithms.