陳俞成 離散資料分析 Categorical Data Analysis

Outline

Course Syllabus
References
Grading
Chapter 1 Introduction

離散資料分析
Categorical Data Analysis

陳俞成

2005.9.19

Course Syllabus
- Introduction
- Two-Way Contingency Tables
- Three-Way Contingency Tables
- Generalized Linear Models
- Logistic Regression
- Loglinear Models for Contingency Tables
- Building and Applying Logit and Loglinear Models
- Multicategory Logit Models (optional)
- Models for Matched Pairs (optional)
- A Twentieth-Century Tour of Categorical Data Analysis (optional)

References
- 劉應興 (2003). 類別資料分析導論 [Introduction to Categorical Data Analysis]. 華泰文化事業公司.
- Agresti, A. (1996). An Introduction to Categorical Data Analysis. John Wiley & Sons, Inc.
- Agresti, A. (2002). Categorical Data Analysis, 2nd ed. John Wiley & Sons, Inc.
- Agresti, A. (1984). Analysis of Ordinal Categorical Data. New York: Wiley.
- Bishop, Y. M. M., S. E. Fienberg, and P. W. Holland. (1975). Discrete Multivariate Analysis. Cambridge, MA: MIT Press.
- Christensen, R. (1990). Log-Linear Models. Springer-Verlag New York, Inc.
- Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data, 2nd ed. Cambridge, MA: MIT Press.
- Hosmer, D. W. and S. Lemeshow. (1989). Applied Logistic Regression. New York: Wiley.
- Lloyd, C. J. (1999). Statistical Analysis of Categorical Data. John Wiley & Sons, Inc.

Grading

- Midterm exam (30%)
- Final report (30%)
  - What is the problem?
  - What tool or method was used to solve the problem?
  - The result
  - What remains unsolved?
- Homework (40%)

Introduction

- An applied introduction to methods
- Uses and interpretations of the methods rather than the theory

- Fundamental statistical prerequisites
  - The binomial, multinomial, and Poisson distributions
  - Maximum likelihood (e.g., for analyzing proportion data)

Categorical Response Data

- Response-Explanatory Variable Distinction
  - Response variable = dependent variable or Y variable
  - Explanatory variable = independent variable or X variable
  - Regression
  - ANOVA
  - The subject of this book: the analysis of categorical response variables

- Nominal-Ordinal Scale Distinction
  - Nominal (categories, names)
  - Ordinal (order, rank)
  - Interval (intervals, equal spacing)
  - Ratio (ratios, equal proportions)
  - Statistical methods for variables of one type can also be used with variables at higher levels but not at lower levels.

- Continuous-Discrete Variable Distinction
  - Types of discretely measured responses:
    - nominal variables
    - ordinal variables
    - discrete interval variables having relatively few values
    - continuous variables grouped into a small number of categories

- Quantitative-Qualitative Variable Distinction
  - Nominal variables are qualitative
  - Interval and ratio variables are quantitative
  - The position of ordinal variables in this classification is fuzzy
  - In many respects, ordinal variables more closely resemble interval variables than nominal variables

Binomial Distribution

- $y_1, y_2, \ldots, y_n$ = $n$ independent and identical trials such that $P(Y_i = 1) = \pi$ and $P(Y_i = 0) = 1 - \pi$, $i = 1, 2, \ldots, n$
- $Y_i$, $i = 1, 2, \ldots, n$ = Bernoulli trials
- $Y = \sum_{i=1}^{n} Y_i$ has the binomial distribution with index $n$ and parameter $\pi$, denoted by $\mathrm{bin}(n, \pi)$

- The probability mass function
  - $p(y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n-y}$, $y = 0, 1, 2, \ldots, n$
  - $\binom{n}{y} = n!/[y!(n - y)!]$
- $E(Y_i) = E(Y_i^2) = 1 \times \pi + 0 \times (1 - \pi) = \pi$, so $E(Y_i) = \pi$ and $\mathrm{var}(Y_i) = \pi(1 - \pi)$
- $Y = \sum_i Y_i$ has mean and variance $\mu = E(Y) = n\pi$ and $\sigma^2 = \mathrm{var}(Y) = n\pi(1 - \pi)$
- The skewness: $E(Y - \mu)^3/\sigma^3 = (1 - 2\pi)/\sqrt{n\pi(1 - \pi)}$
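
As a quick numerical check of the moment formulas above, the following Python sketch sums directly over the binomial pmf. The helper names `binom_pmf` and `binom_moments` and the values n = 10, π = 0.3 are illustrative assumptions, not part of the lecture.

```python
from math import comb, sqrt

def binom_pmf(y, n, pi):
    """P(Y = y) for Y ~ bin(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

def binom_moments(n, pi):
    """Mean, variance, and skewness obtained by summing over the pmf."""
    ys = range(n + 1)
    mean = sum(y * binom_pmf(y, n, pi) for y in ys)
    var = sum((y - mean)**2 * binom_pmf(y, n, pi) for y in ys)
    third = sum((y - mean)**3 * binom_pmf(y, n, pi) for y in ys)
    return mean, var, third / var**1.5

n, pi = 10, 0.3  # illustrative values (assumption)
mean, var, skew = binom_moments(n, pi)
print(mean, n * pi)                                  # mean vs. n*pi
print(var, n * pi * (1 - pi))                        # variance vs. n*pi*(1-pi)
print(skew, (1 - 2 * pi) / sqrt(n * pi * (1 - pi)))  # skewness vs. closed form
```

Each printed pair should agree up to floating-point rounding.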

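- The binomial distribution converges to normality as $n$ increases, for fixed $\pi$
- $\frac{Y - n\pi}{\sqrt{n\pi(1 - \pi)}} \to N(0, 1)$ in distribution as $n \to \infty$ (the approximation is commonly used when $n\pi > 5$ and $n(1 - \pi) > 5$)
- $Y \overset{\cdot}{\sim} N(n\pi, n\pi(1 - \pi))$, i.e., $Y$ is approximately normal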
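
The convergence can be illustrated numerically. The sketch below compares the exact binomial CDF with the $N(n\pi, n\pi(1-\pi))$ CDF (no continuity correction) and reports the largest gap for a few sample sizes. The function names and the choices π = 0.3 and n = 10, 40, 160 are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(y, n, pi):
    """Exact P(Y <= y) for Y ~ bin(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(y + 1))

def normal_approx_cdf(y, n, pi):
    """P(Y <= y) under N(n*pi, n*pi*(1-pi)), without continuity correction."""
    return NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi))).cdf(y)

def max_cdf_error(n, pi):
    """Largest absolute gap between the exact and approximate CDFs."""
    return max(abs(binom_cdf(y, n, pi) - normal_approx_cdf(y, n, pi))
               for y in range(n + 1))

pi = 0.3  # illustrative value (assumption)
for n in (10, 40, 160):
    print(n, n * pi > 5 and n * (1 - pi) > 5, round(max_cdf_error(n, pi), 4))
```

The gap shrinks as n grows, in line with the limit stated above.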

Multinomial Distribution

- Each of $n$ independent, identical trials has outcome in any of $c$ categories.
- $y_{ij} = 1$ if trial $i$ has outcome in category $j$, and $y_{ij} = 0$ otherwise

- $y_i = (y_{i1}, y_{i2}, \ldots, y_{ic})$ with $\sum_j y_{ij} = 1$ is a multinomial trial and is $(c - 1)$-dimensional.
- Let $n_j = \sum_i y_{ij}$; then the counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.

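- The probability mass function
  - $\pi_j = P(Y_{ij} = 1)$ for all $i$, with $\sum_j \pi_j = 1$
  - $\sum_{j=1}^{c} n_j = n$; since $n_c = n - (n_1 + \cdots + n_{c-1})$, the distribution is $(c - 1)$-dimensional.
  - $p(n_1, n_2, \ldots, n_c) = \left(\frac{n!}{n_1! n_2! \cdots n_c!}\right) \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}$
- $E(n_j) = n\pi_j$, $\mathrm{var}(n_j) = n\pi_j(1 - \pi_j)$, $\mathrm{cov}(n_j, n_k) = -n\pi_j\pi_k$ for $j \neq k$
- $n_j \sim \mathrm{bin}(n, \pi_j)$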
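
A small enumeration can confirm these multinomial facts. The sketch below lists every count vector for n = 6 and probabilities (0.2, 0.3, 0.5), then checks the total probability, one mean, one variance, and one covariance against the formulas above. The helper names and the numeric values are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """p(n_1, ..., n_c) = n!/(n_1! ... n_c!) * prod_j pi_j^{n_j}."""
    coef = factorial(sum(counts))
    for nj in counts:
        coef //= factorial(nj)
    return coef * prod(p**nj for p, nj in zip(probs, counts))

def all_count_vectors(n, c):
    """All (n_1, ..., n_c) summing to n; only c - 1 coordinates are free."""
    for head in product(range(n + 1), repeat=c - 1):
        if sum(head) <= n:
            yield head + (n - sum(head),)

n, probs = 6, (0.2, 0.3, 0.5)  # illustrative values (assumption)
vectors = list(all_count_vectors(n, len(probs)))
pmf = {v: multinomial_pmf(v, probs) for v in vectors}

total = sum(pmf.values())
mean1 = sum(v[0] * p for v, p in pmf.items())
mean2 = sum(v[1] * p for v, p in pmf.items())
var1 = sum((v[0] - mean1)**2 * p for v, p in pmf.items())
cov12 = sum(v[0] * v[1] * p for v, p in pmf.items()) - mean1 * mean2

print(total)                                # should be 1
print(mean1, n * probs[0])                  # E(n_1) vs. n*pi_1
print(var1, n * probs[0] * (1 - probs[0]))  # var(n_1) vs. n*pi_1*(1-pi_1)
print(cov12, -n * probs[0] * probs[1])      # cov(n_1, n_2) vs. -n*pi_1*pi_2
```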

Poisson Distribution
- $y = Y(t)$ is the count of occurrences per unit time or space
- $Y(0) = 0$
- $p[Y(h) = 1] = \mu h + o(h) \iff \lim_{h \to 0} \frac{p[Y(h) = 1] - \mu h}{h} = 0$
- $p[Y(h) \geq 2] = o(h)$
- $p[Y(s + t) - Y(s) = n \mid Y(s) = m] = p[Y(s + t) - Y(s) = n]$ (independent increments)
- $Y$ has the Poisson distribution with parameter $\mu$
- The probability mass function
  - $P(Y = y) = \frac{e^{-\mu} \mu^{y}}{y!}$, $y = 0, 1, 2, \ldots$
  - $E(Y) = \mathrm{var}(Y) = \mu$
- The skewness: $E(Y - \mu)^3/\sigma^3 = 1/\sqrt{\mu}$
- $Y_1, \ldots, Y_n$ iid Poisson(1), $Y = \sum_{i=1}^{n} Y_i \sim$ Poisson($n$). By the C.L.T., $\frac{Y - n}{\sqrt{n}} \to N(0, 1)$
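
The Poisson moment formulas can be checked the same way, by summing over the pmf. In this sketch the infinite sum is truncated at an assumed cutoff `upper = 100`, which is more than enough for the illustrative value μ = 4; both the cutoff and the function names are assumptions.

```python
from math import exp, factorial, sqrt

def poisson_pmf(y, mu):
    """P(Y = y) = exp(-mu) * mu^y / y!."""
    return exp(-mu) * mu**y / factorial(y)

def poisson_moments(mu, upper=100):
    """Mean, variance, and skewness from the pmf, truncated at `upper`."""
    ys = range(upper + 1)
    mean = sum(y * poisson_pmf(y, mu) for y in ys)
    var = sum((y - mean)**2 * poisson_pmf(y, mu) for y in ys)
    third = sum((y - mean)**3 * poisson_pmf(y, mu) for y in ys)
    return mean, var, third / var**1.5

mu = 4.0  # illustrative value (assumption)
mean, var, skew = poisson_moments(mu)
print(mean, var, mu)       # mean and variance should both be close to mu
print(skew, 1 / sqrt(mu))  # skewness vs. 1/sqrt(mu)
```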

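Overdispersion

- Suppose $Y$ is a random variable with variance $\mathrm{var}(Y \mid \mu)$ for given $\mu$. Let $\theta = E(\mu)$
- $E(Y) = E[E(Y \mid \mu)]$, $\mathrm{var}(Y) = E[\mathrm{var}(Y \mid \mu)] + \mathrm{var}[E(Y \mid \mu)]$
- When $Y$ is conditionally Poisson (given $\mu$), then $E(Y) = E(\mu) = \theta$ and $\mathrm{var}(Y) = E(\mu) + \mathrm{var}(\mu) = \theta + \mathrm{var}(\mu) > \theta$ whenever $\mathrm{var}(\mu) > 0$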
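
A minimal sketch of the variance decomposition, assuming a hypothetical two-point mixing distribution in which μ equals 2 or 8 with probability 1/2 each. Then θ = 5 and var(μ) = 9, so the marginal variance of Y should be about 14 rather than 5. The names and the truncation point `upper` are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu)."""
    return exp(-mu) * mu**y / factorial(y)

def mixture_mean_var(mus, weights, upper=120):
    """Marginal mean and variance of Y when Y | mu ~ Poisson(mu) and mu takes
    the value mus[k] with probability weights[k]; sums truncated at `upper`."""
    pmf = [sum(w * poisson_pmf(y, m) for m, w in zip(mus, weights))
           for y in range(upper + 1)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean)**2 * p for y, p in enumerate(pmf))
    return mean, var

mus, weights = (2.0, 8.0), (0.5, 0.5)  # hypothetical mixing distribution
theta = sum(w * m for m, w in zip(mus, weights))
var_mu = sum(w * (m - theta)**2 for m, w in zip(mus, weights))

mean, var = mixture_mean_var(mus, weights)
print(mean, theta)          # E(Y) vs. theta
print(var, theta + var_mu)  # var(Y) vs. theta + var(mu), which exceeds theta
```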

Connection between Poisson and Multinomial Distributions

- Suppose $Y_i \sim$ Poisson($\mu_i$) are independent random variables.
- $n = \sum_i Y_i \sim$ Poisson($\sum_i \mu_i$)

- For $c$ independent Poisson variates with $E(Y_i) = \mu_i$, the conditional probability of a set of counts $\{n_i\}$ given $\sum_i Y_i = n$ is

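$$P\Big[(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c) \,\Big|\, \sum_j Y_j = n\Big]
= \frac{P(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c)}{P(\sum_j Y_j = n)}
= \frac{\prod_i [\exp(-\mu_i)\mu_i^{n_i}/n_i!]}{\exp(-\sum_j \mu_j)(\sum_j \mu_j)^{n}/n!}
= \frac{n!}{\prod_i n_i!} \prod_i \pi_i^{n_i}$$

where $\{\pi_i = \mu_i/(\sum_j \mu_j)\}$.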
- This is the multinomial $(n, \{\pi_i\})$ distribution.
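
The identity can be verified numerically for one set of counts. The sketch below computes the conditional probability directly from Poisson pmfs and compares it with the multinomial pmf using $\pi_i = \mu_i/\sum_j \mu_j$. The means (1.5, 2.0, 4.5), the counts (2, 1, 3), and the function names are hypothetical values picked for illustration.

```python
from math import exp, factorial, prod

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu)."""
    return exp(-mu) * mu**y / factorial(y)

def multinomial_pmf(counts, probs):
    """n!/(prod_i n_i!) * prod_i pi_i^{n_i}."""
    coef = factorial(sum(counts))
    for ni in counts:
        coef //= factorial(ni)
    return coef * prod(p**ni for p, ni in zip(probs, counts))

def conditional_prob(counts, mus):
    """P(Y_1 = n_1, ..., Y_c = n_c | sum_j Y_j = n) for independent Poissons."""
    joint = prod(poisson_pmf(y, m) for y, m in zip(counts, mus))
    return joint / poisson_pmf(sum(counts), sum(mus))

mus = (1.5, 2.0, 4.5)  # hypothetical Poisson means
counts = (2, 1, 3)     # hypothetical observed counts
probs = tuple(m / sum(mus) for m in mus)

print(conditional_prob(counts, mus))
print(multinomial_pmf(counts, probs))
```

The two printed values should agree up to floating-point rounding.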

Maximum Likelihood Estimation

Given the data, for a chosen probability distribution, the likelihood function is the probability of those data, treated as a function of the unknown parameter.
For example, for $Y \sim \mathrm{bin}(n, \pi)$,
$f(y; n, \pi) = \ell(\pi; n, y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n-y}$

If $n = 10$ and $y = 0$, then $\ell(\pi; 10, 0) = (1 - \pi)^{10}$

The maximum likelihood (ML) estimate is the
parameter value that maximizes the likelihood
function.
The parameter value that maximizes the likelihood
function also maximizes the log of that function.

$L(\theta) = \log \ell(\theta)$

The part of a likelihood function involving the parameters is called the kernel.

The binomial log likelihood function (kernel only) is

- $L(\pi) = \log[\pi^{y}(1 - \pi)^{n-y}] = y\log(\pi) + (n - y)\log(1 - \pi)$
- $\partial L(\pi)/\partial\pi = y/\pi - (n - y)/(1 - \pi) = (y - n\pi)/[\pi(1 - \pi)]$
- Setting $\partial L(\pi)/\partial\pi = 0$ gives $\hat{\pi} = y/n$, the MLE of $\pi$
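
As a sanity check on the closed-form result, the sketch below maximizes the binomial log likelihood kernel numerically and compares the maximizer with y/n. Golden-section search is used here only as a convenient stand-in optimizer; it, the function names, and the data y = 7, n = 20 are assumptions for illustration.

```python
from math import log

def log_lik(pi, y, n):
    """Binomial log likelihood kernel; requires 0 < pi < 1."""
    return y * log(pi) + (n - y) * log(1 - pi)

def maximize(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on (lo, hi)."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            a = c  # the maximum lies in [c, b]
        else:
            b = d  # the maximum lies in [a, d]
    return (a + b) / 2

y, n = 7, 20  # hypothetical data
pi_hat = maximize(lambda p: log_lik(p, y, n), 1e-9, 1 - 1e-9)
print(pi_hat, y / n)  # numerical maximizer vs. closed-form MLE
```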

- $-E[\partial^2 L(\pi)/\partial\pi^2] = E[y/\pi^2 + (n - y)/(1 - \pi)^2] = n/[\pi(1 - \pi)]$
- The asymptotic variance of $\hat{\pi}$ is $\pi(1 - \pi)/n$
- Since $E(Y) = n\pi$ and $\mathrm{var}(Y) = n\pi(1 - \pi)$, we have $E(\hat{\pi}) = \pi$ and $\sigma(\hat{\pi}) = \sqrt{\pi(1 - \pi)/n}$

Test about a Proportion

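- $H_0: \pi = \pi_0$ vs. $H_a: \pi \neq \pi_0$ (two-sided)
- The Wald statistic is $z_W = \frac{\hat{\pi} - \pi_0}{\sqrt{\hat{\pi}(1 - \hat{\pi})/n}}$
- The score statistic is $z_S = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$
- The likelihood-ratio statistic is $-2(L_0 - L_1) = 2\left(y \log\frac{y}{n\pi_0} + (n - y)\log\frac{n - y}{n - n\pi_0}\right)$, which is referred to the $\chi^2_1$ distribution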
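
The three statistics can be compared side by side on one data set. The sketch below uses hypothetical data (y = 60 successes in n = 100 trials, π0 = 0.5); the function names are illustrative. The chi-squared P-value with 1 degree of freedom is obtained from the normal distribution through the fact that a $\chi^2_1$ variable is the square of a standard normal.

```python
from math import log, sqrt
from statistics import NormalDist

def proportion_tests(y, n, pi0):
    """Wald z, score z, and likelihood-ratio statistic for H0: pi = pi0.
    Requires 0 < y < n so that every logarithm and standard error is defined."""
    pi_hat = y / n
    z_wald = (pi_hat - pi0) / sqrt(pi_hat * (1 - pi_hat) / n)
    z_score = (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    lr = 2 * (y * log(y / (n * pi0))
              + (n - y) * log((n - y) / (n - n * pi0)))
    return z_wald, z_score, lr

def two_sided_p(z):
    """Two-sided P-value for a standard normal statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

y, n, pi0 = 60, 100, 0.5  # hypothetical data
z_wald, z_score, lr = proportion_tests(y, n, pi0)
print("Wald :", z_wald, two_sided_p(z_wald))
print("score:", z_score, two_sided_p(z_score))
print("LR   :", lr, two_sided_p(sqrt(lr)))  # P(chi2_1 > lr) = P(|Z| > sqrt(lr))
```

With a sample this large the three statistics lead to similar P-values; they can differ noticeably in small samples.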

Confidence Interval for a Proportion

- Let $z_a$ denote the z-score from the standard normal distribution having right-tailed probability $a$; this is the $100(1 - a)$th percentile of that distribution.
- Let $\chi^2_{df}(a)$ denote the $100(1 - a)$th percentile of the chi-squared distribution with degrees of freedom $df$.

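- The Wald confidence interval is the set of $\pi_0$ with $|z_W| < z_{\alpha/2}$, or $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1 - \hat{\pi})/n}$
- The score confidence interval is the set of $\pi_0$ with $|z_S| < z_{\alpha/2}$. Its endpoints are the $\pi_0$ solutions to the equations $\frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}} = \pm z_{\alpha/2}$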
- The likelihood-ratio-based confidence interval is the set of $\pi_0$ satisfying $-2(L_0 - L_1) = -2[L(\pi_0) - L(\hat{\pi})] \leq \chi^2_1(\alpha)$
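
The three intervals can be computed as follows. This is a sketch under stated assumptions: the score endpoints come from solving the quadratic in $\pi_0$ implied by the score equation, the likelihood-ratio endpoints are found by bisection, and $\chi^2_1(\alpha)$ is taken as $z_{\alpha/2}^2$. The function names and the data (y = 15, n = 50) are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def wald_interval(y, n, alpha=0.05):
    """pi_hat +/- z_{alpha/2} * sqrt(pi_hat * (1 - pi_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = y / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def score_interval(y, n, alpha=0.05):
    """Roots in pi0 of (pi_hat - pi0)^2 = z^2 * pi0 * (1 - pi0) / n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = y / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half

def lr_interval(y, n, alpha=0.05, tol=1e-10):
    """Endpoints where -2[L(pi0) - L(pi_hat)] = chi2_1(alpha); needs 0 < y < n."""
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi2_1(alpha) = z^2
    p = y / n

    def loglik(q):
        return y * log(q) + (n - y) * log(1 - q)

    def g(q):
        return -2 * (loglik(q) - loglik(p)) - crit

    def root(inside, outside):
        # g < 0 at pi_hat (inside) and g > 0 near the boundary (outside)
        while abs(outside - inside) > tol:
            mid = (inside + outside) / 2
            if g(mid) < 0:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2

    return root(p, 1e-12), root(p, 1 - 1e-12)

y, n = 15, 50  # hypothetical data
print("Wald :", wald_interval(y, n))
print("score:", score_interval(y, n))
print("LR   :", lr_interval(y, n))
```

The Wald interval is symmetric about $\hat{\pi}$, while the score and likelihood-ratio intervals generally are not.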

For samples that are too small for the normal approximation, one can use the binomial distribution directly in calculating P-values.
For instance, suppose there are $Y = 4$ successes in $n = 5$ trials.
- $H_0: \pi = .5$ vs. $H_a: \pi > .5$ (right one-sided)
- P-value $= P(4) + P(5) = .188$
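
The exact P-value in this example can be reproduced by summing binomial probabilities under $H_0$. A minimal sketch; the function names are illustrative.

```python
from math import comb

def binom_pmf(y, n, pi):
    """P(Y = y) for Y ~ bin(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

def exact_right_tail_p(y_obs, n, pi0):
    """Right one-sided exact P-value: P(Y >= y_obs) under H0: pi = pi0."""
    return sum(binom_pmf(y, n, pi0) for y in range(y_obs, n + 1))

p_value = exact_right_tail_p(4, 5, 0.5)
print(p_value)  # 5/32 + 1/32 = 0.1875, reported as .188 above
```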

Summary

- Types of data and the information they carry
- Three distributions for categorical data
- Inference for a proportion (MLE, testing, confidence interval)
