Survival Data Analysis Assignment: Class B
DEPARTMENT OF MATHEMATICS
YOGYAKARTA
2014
1.1 Log normal distribution
For example, in finance, the variable could represent the compound return from a sequence of
many trades, each expressed as its return + 1, or a long-term discount factor derived from the
product of short-term discount factors. Because the logarithm of such a product is a sum of many
terms, the product tends toward a lognormal distribution. This was observed by Gibrat, who
claimed that the size of a firm and its growth rate are independent, and called the associated
central limit tendency "the law of proportionate effect."[2]
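A small simulation can make the multiplicative argument concrete. It is only a sketch: the factor distribution (each factor is 1 + a return drawn uniformly from [-0.1, 0.1]), the number of factors per product, and the number of simulated products are arbitrary choices, not taken from the source.

```r
# Multiplicative central limit effect: the product of many independent
# positive factors is approximately lognormal, because log(product) is a
# sum of independent terms.
# Assumptions for illustration: each factor is 1 + return, with
# return ~ Uniform(-0.1, 0.1); 100 factors per product; 10000 products.
set.seed(2014)
n_factors <- 100
n_products <- 10000

products <- replicate(n_products, prod(runif(n_factors, min = 0.9, max = 1.1)))
log_products <- log(products)

# E[log(product)] = n_factors * E[log(factor)], with E[log(factor)]
# computed by numerical integration over the uniform density on [0.9, 1.1]
expected_log_mean <- n_factors *
  integrate(function(x) log(x) / 0.2, lower = 0.9, upper = 1.1)$value

# A lognormal variable is right-skewed: its mean exceeds its median
print(round(c(mean_log = mean(log_products),
              expected_mean_log = expected_log_mean,
              mean_product = mean(products),
              median_product = median(products)), 4))
```

The comparison is made on the log scale because that is where the sum of independent terms, and hence the ordinary central limit theorem, applies.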
2.1 LOGNORMAL REGRESSION MODEL
During the past several years, many researchers, including Segal (1988), Ciampi and Thiffault
(1989), Davis and Anderson (1989), Bloch and Segal (1989), Loh (1991), Ciampi (1991),
Leblanc and Crowley (1992), Schmoor et al. (1993), Ahn (1994), and Ahn and Loh (1994),
extended the tree-structured regression method to censored survival data. There are two
broad categories of regression models for censored survival data. One approach uses parametric
families of survival distributions; the other uses semiparametric or nonparametric methods.
The former category includes the exponential, Weibull, lognormal, and log-gamma regression
models. The latter category was introduced by Cox (1972), Miller (1976), Buckley and James
(1979), and Koul, Susarla and Van Ryzin (1981).
One way to examine the relationship of covariates to survival time is through a regression
model in which survival time has a distribution that depends on the covariates. Parametric
regression models would be appropriate for this situation. Among the parametric models, the
log normal distribution has been widely used as a lifetime distribution model.
When covariates are considered, we assume that the survival time, or a function of it, has an
explicit relationship with the covariates. Furthermore, when a parametric model is considered,
we assume that the survival time (or a function of it) follows a given theoretical distribution
(or model) and has an explicit relationship with the covariates.
Let ε in (11.2.4) be the standard normal random variable with the density function g(ε) and
survivorship function G(ε),
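$$ g(\varepsilon) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^{2}}{2}\right) \qquad (11.5.1) $$
$$ G(\varepsilon) = 1 - \Phi(\varepsilon) = 1 - \int_{-\infty}^{\varepsilon} \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}\, dx \qquad (11.5.2) $$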
where Φ is the cumulative distribution function of the standard normal distribution. Then the
model defined by (11.2.4) for the survival time T of individual i,
$$ \log T_i = a_0 + \sum_{k=1}^{p} a_k x_{ki} + \sigma \varepsilon_i = \mu_i + \sigma \varepsilon_i, $$
is the lognormal regression model. T has the lognormal distribution with the density function
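$$ f(t, \mu_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left[-\frac{(\log t - \mu_i)^{2}}{2\sigma^{2}}\right] \qquad (11.5.3) $$
and survivorship function
$$ S(t, \mu_i, \sigma) = 1 - \Phi\left(\frac{\log t - \mu_i}{\sigma}\right) \qquad (11.5.4) $$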
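Formulas (11.5.1)-(11.5.4) can be checked numerically. The R sketch below writes g, G, f and S directly from the equations, compares them with R's built-in dnorm-based integration, dlnorm and plnorm, and simulates log T = μ_i + σε for a single individual. The values μ_i = 1.2 and σ = 0.7 and the simulation size are hypothetical choices for illustration.

```r
# Standard normal g and G from (11.5.1)-(11.5.2), and the lognormal density f
# and survivorship function S from (11.5.3)-(11.5.4), written from the formulas.
# mu_i and sigma below are hypothetical values.
g <- function(eps) exp(-eps^2 / 2) / sqrt(2 * pi)
G <- function(eps) 1 - integrate(g, lower = -Inf, upper = eps, rel.tol = 1e-8)$value

f_lognormal <- function(t, mu, sigma) {
  out <- numeric(length(t))                 # the density is 0 for t <= 0
  pos <- t > 0
  out[pos] <- exp(-(log(t[pos]) - mu)^2 / (2 * sigma^2)) /
    (sqrt(2 * pi) * sigma * t[pos])
  out
}
S_lognormal <- function(t, mu, sigma) 1 - pnorm((log(t) - mu) / sigma)

mu_i <- 1.2
sigma <- 0.7
t_grid <- c(0.5, 1, 3, 10)
z_grid <- (log(t_grid) - mu_i) / sigma     # standardized log times

# S(t) should equal G evaluated at the standardized log time
S_vals <- S_lognormal(t_grid, mu_i, sigma)
G_vals <- sapply(z_grid, G)

# f should agree with the built-in lognormal density (meanlog = mu_i, sdlog = sigma)
f_vals <- f_lognormal(t_grid, mu_i, sigma)
print(cbind(t = t_grid, f = f_vals, dlnorm = dlnorm(t_grid, mu_i, sigma),
            S = S_vals, G_of_z = G_vals))

# Simulate log T = mu_i + sigma * eps and compare the fraction of
# simulated times beyond each t with S(t)
set.seed(11)
T_sim <- exp(mu_i + sigma * rnorm(20000))
empirical_S <- sapply(t_grid, function(t) mean(T_sim > t))
print(cbind(t = t_grid, S = S_vals, empirical = empirical_S))
```

Because log T is linear in the covariates in this model, a one-unit increase in x_k multiplies every quantile of T by exp(a_k).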
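It can be shown that the hazard function h(t, σ, a_0, a_1, …, a_p) of T with covariates x_1, x_2,
…, x_p and unknown parameters and coefficients σ, a_0, a_1, …, a_p can be written as
$$ h(t, \sigma, a_0, a_1, \ldots, a_p) = \frac{f(t, \mu_i, \sigma)}{S(t, \mu_i, \sigma)} = \frac{g\left((\log t - \mu_i)/\sigma\right)}{\sigma t\, G\left((\log t - \mu_i)/\sigma\right)} $$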
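The hazard expression can be evaluated directly. In this sketch the ratio g(z)/G(z) is computed on the log scale so that it stays finite far in the right tail; μ_i = 0 and σ = 1 are hypothetical values. The three evaluation points show that the lognormal hazard first rises and then falls, unlike the constant hazard of the exponential model.

```r
# Lognormal hazard h(t) = f(t) / S(t) = g(z) / (sigma * t * G(z)),
# with z = (log t - mu_i) / sigma.  mu_i and sigma are hypothetical.
hazard_lognormal <- function(t, mu, sigma) {
  z <- (log(t) - mu) / sigma
  # g(z) / G(z) computed on the log scale, which stays finite in the right tail
  ratio <- exp(dnorm(z, log = TRUE) - pnorm(z, lower.tail = FALSE, log.p = TRUE))
  ratio / (sigma * t)
}

mu_i <- 0
sigma <- 1
t_grid <- c(0.2, 0.6, 5)
h_vals <- hazard_lognormal(t_grid, mu_i, sigma)

# Same quantity from the built-in density and survival function
h_check <- dlnorm(t_grid, mu_i, sigma) /
  plnorm(t_grid, mu_i, sigma, lower.tail = FALSE)
print(cbind(t = t_grid, hazard = h_vals, check = h_check))
```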
The Newton-Raphson method is used to find the maximum likelihood estimates of the
parameters. However, if censoring is heavy, this method fails to converge unless the initial
values are very close to the maximum likelihood estimates. In that case, another iterative
procedure is provided, using a method that requires only the first derivatives of the
log-likelihood function.
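Since we work with log times, y_i = ln ω_i represents a log lifetime or a log censoring time.
From the probability density function and survival function of Y, the log-likelihood function
for a censored sample based on n observations is
$$ \ln L(\beta, \sigma) = \sum_{i=1}^{n} \left\{ \delta_i \left[ -\ln \sigma + \ln \phi\left(\frac{y_i - x_i\beta}{\sigma}\right) \right] + (1 - \delta_i) \ln\left[1 - \Phi\left(\frac{y_i - x_i\beta}{\sigma}\right)\right] \right\}, $$
where δ_i = 1 if y_i is an observed log lifetime and δ_i = 0 if it is a log censoring time.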
Here ϕ(·) is the standard normal probability density function and Φ(·) is the standard normal
cumulative distribution function. Let z_i = (y_i − x_iβ)/σ and A(z) = ϕ(z)/[1 − Φ(z)]. The first
and second derivatives of ln L can be obtained from the above formula. The maximum likelihood
equations ∂ln L/∂β_r = 0, r = 1, …, p, and ∂ln L/∂σ = 0 are solved by the Newton-Raphson method
to get the maximum likelihood estimates of β and σ².
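The log-likelihood and the left-hand sides of the maximum likelihood equations can be written as two short R functions. This sketch assumes y holds the log times, delta the censoring indicators (1 observed, 0 censored), and X a design matrix with an intercept column; the simulated data, the function names, and the evaluation point are hypothetical. The analytic score, ∂ln L/∂β = (1/σ) Σ [δ_i z_i + (1 − δ_i) A(z_i)] x_i and ∂ln L/∂σ = (1/σ) Σ [δ_i (z_i² − 1) + (1 − δ_i) z_i A(z_i)], is compared with central finite differences.

```r
# Log-likelihood of a censored sample of log times and its first derivatives.
# A(z) = phi(z) / (1 - Phi(z)), computed on the log scale for stability.
A <- function(z) exp(dnorm(z, log = TRUE) - pnorm(z, lower.tail = FALSE, log.p = TRUE))

loglik <- function(beta, sigma, y, delta, X) {
  z <- drop(y - X %*% beta) / sigma
  sum(delta * (-log(sigma) + dnorm(z, log = TRUE)) +
      (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}

score <- function(beta, sigma, y, delta, X) {
  z <- drop(y - X %*% beta) / sigma
  u <- ifelse(delta == 1, z, A(z))                 # observed: z_i, censored: A(z_i)
  d_beta <- colSums(X * u) / sigma                 # d lnL / d beta_r
  d_sigma <- sum(ifelse(delta == 1, z^2 - 1, z * A(z))) / sigma  # d lnL / d sigma
  c(d_beta, sigma = d_sigma)
}

# Hypothetical data: log survival times with random right censoring
set.seed(3)
n <- 200
x <- rnorm(n)
X <- cbind(intercept = 1, x = x)
y_true <- 1 + 0.5 * x + 0.8 * rnorm(n)
c_log <- rnorm(n, mean = 2, sd = 1)
delta <- as.numeric(y_true <= c_log)
y <- pmin(y_true, c_log)

# Compare the analytic score with central finite differences at a trial point
beta0 <- c(0.8, 0.4)
sigma0 <- 0.9
h <- 1e-5
num_grad <- c(
  sapply(1:2, function(r) {
    e <- replace(numeric(2), r, h)
    (loglik(beta0 + e, sigma0, y, delta, X) -
       loglik(beta0 - e, sigma0, y, delta, X)) / (2 * h)
  }),
  (loglik(beta0, sigma0 + h, y, delta, X) -
     loglik(beta0, sigma0 - h, y, delta, X)) / (2 * h)
)
print(cbind(analytic = score(beta0, sigma0, y, delta, X), numeric = num_grad))
```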
Negative values of σ can arise during iteration. This is avoided by replacing each negative
value with one half of the σ value from the previous iteration.
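A Newton-Raphson iteration for these equations might look like the following sketch. It uses analytic second derivatives of the censored-sample log-likelihood (with A′(z) = A(z)[A(z) − z]), ordinary least squares starting values, and the halving rule for a negative σ. The starting values, tolerance, data, and function names are illustrative assumptions, not the original program; the result is cross-checked against R's general-purpose optim().

```r
# Newton-Raphson for the censored normal model of log times y_i
# (delta_i = 1 observed, 0 censored), theta = (beta, sigma).
# Illustrative sketch: names, starting values and tolerance are assumptions.
A <- function(z) exp(dnorm(z, log = TRUE) - pnorm(z, lower.tail = FALSE, log.p = TRUE))

score_hessian <- function(beta, sigma, y, delta, X) {
  z <- drop(y - X %*% beta) / sigma
  a <- A(z)
  da <- a * (a - z)                          # A'(z) = A(z) * (A(z) - z)
  obs <- delta == 1
  # per-observation terms: first value for observed rows, second for censored
  u_b  <- ifelse(obs, z, a)                  # beta score, times x_i / sigma
  u_s  <- ifelse(obs, z^2 - 1, a * z)        # sigma score, times 1 / sigma
  h_bb <- ifelse(obs, -1, -da)               # times x_i x_i' / sigma^2
  h_bs <- ifelse(obs, -2 * z, -(da * z + a)) # times x_i / sigma^2
  h_ss <- ifelse(obs, 1 - 3 * z^2, -(da * z^2 + 2 * a * z))  # times 1 / sigma^2
  g <- c(colSums(X * u_b), sum(u_s)) / sigma
  H_bb <- crossprod(X, X * h_bb) / sigma^2
  H_bs <- colSums(X * h_bs) / sigma^2
  H <- rbind(cbind(H_bb, H_bs), c(H_bs, sum(h_ss) / sigma^2))
  list(g = g, H = H)
}

newton_raphson <- function(y, delta, X, max_iter = 30, tol = 1e-8) {
  ols <- lm.fit(X, y)                        # starting values ignore censoring
  beta <- ols$coefficients
  sigma <- sqrt(mean(ols$residuals^2))
  p <- ncol(X)
  for (iter in seq_len(max_iter)) {
    sh <- score_hessian(beta, sigma, y, delta, X)
    theta <- c(beta, sigma) - solve(sh$H, sh$g)   # Newton step
    beta_new <- theta[1:p]
    sigma_new <- unname(theta[p + 1])
    if (sigma_new <= 0) sigma_new <- sigma / 2    # rule for a negative sigma
    change <- max(abs(c(beta_new - beta, sigma_new - sigma)))
    beta <- beta_new
    sigma <- sigma_new
    if (change < tol) return(list(beta = beta, sigma = sigma, iterations = iter))
  }
  stop("Newton-Raphson did not converge in ", max_iter, " iterations")
}

# Hypothetical data: log survival times with random right censoring
set.seed(3)
n <- 200
x <- rnorm(n)
X <- cbind(intercept = 1, x = x)
y_true <- 1 + 0.5 * x + 0.8 * rnorm(n)
c_log <- rnorm(n, mean = 2, sd = 1)
delta <- as.numeric(y_true <= c_log)
y <- pmin(y_true, c_log)

nr <- newton_raphson(y, delta, X)
print(nr)

# Cross-check: maximize the same log-likelihood with optim(), sigma = exp(par[3])
negll <- function(par) {
  z <- drop(y - X %*% par[1:2]) / exp(par[3])
  -sum(delta * (-par[3] + dnorm(z, log = TRUE)) +
       (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}
opt <- optim(c(0, 0, 0), negll, method = "BFGS", control = list(reltol = 1e-12))
print(c(opt$par[1:2], sigma = exp(opt$par[3])))
```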
If the Newton-Raphson method fails to converge, a generalization of the method of Sampford
and Taylor (1959) is tried. The maximum likelihood equations are
where V = (v_1, …, v_n)′ and (3) gives
Equations (4) and (5) are used to calculate new estimates β̃ and σ̃² of β and σ². The procedure
is repeated; it converges to the maximum likelihood estimates β̂ and σ̂² if suitable initial
values are used. This procedure is essentially an EM algorithm. It converges more reliably
than Newton-Raphson iteration, though more slowly. A maximum of 30 iterations is allowed in
the program.
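Equations (3)-(5) referred to above are not reproduced in this copy, so the sketch below is not a transcription of them. It implements the standard EM algorithm for right-censored normal log times: the E-step replaces each censored y_i by its conditional mean E[Y_i | Y_i > y_i] (collected in V, which is our assumption about the role of V in the text) and records the truncated-normal conditional variance, and the M-step refits β by least squares on V and updates σ. The 30-iteration cap mirrors the text; the data and function names are hypothetical.

```r
# EM iteration for right-censored normal log times (delta_i = 1 observed,
# 0 censored). Assumption: V holds y_i for observed times and the
# conditional mean E[Y_i | Y_i > y_i] for censored ones.
A <- function(z) exp(dnorm(z, log = TRUE) - pnorm(z, lower.tail = FALSE, log.p = TRUE))

em_censored_normal <- function(y, delta, X, max_iter = 30, tol = 1e-8) {
  ols <- lm.fit(X, y)                        # starting values
  beta <- ols$coefficients
  sigma <- sqrt(mean(ols$residuals^2))
  cens <- delta == 0
  for (iter in seq_len(max_iter)) {
    # E-step: truncated-normal mean and variance for the censored log times
    mu <- drop(X %*% beta)
    alpha <- (y - mu) / sigma
    lam <- A(alpha)
    V <- y
    V[cens] <- mu[cens] + sigma * lam[cens]
    cond_var <- numeric(length(y))
    cond_var[cens] <- sigma^2 * (1 + alpha[cens] * lam[cens] - lam[cens]^2)
    # M-step: least squares of V on X, then the expected residual variance
    beta_new <- drop(solve(crossprod(X), crossprod(X, V)))
    sigma_new <- sqrt(mean((V - drop(X %*% beta_new))^2 + cond_var))
    change <- max(abs(c(beta_new - beta, sigma_new - sigma)))
    beta <- beta_new
    sigma <- sigma_new
    if (change < tol) break
  }
  list(beta = beta, sigma = sigma, iterations = iter, converged = change < tol)
}

# Hypothetical data: log survival times with random right censoring
set.seed(3)
n <- 200
x <- rnorm(n)
X <- cbind(intercept = 1, x = x)
y_true <- 1 + 0.5 * x + 0.8 * rnorm(n)
c_log <- rnorm(n, mean = 2, sd = 1)
delta <- as.numeric(y_true <= c_log)
y <- pmin(y_true, c_log)

em30 <- em_censored_normal(y, delta, X)                   # capped at 30 iterations
em_long <- em_censored_normal(y, delta, X, max_iter = 1000, tol = 1e-10)
print(em30)

# Cross-check the uncapped run against optim() on the same log-likelihood
negll <- function(par) {
  z <- drop(y - X %*% par[1:2]) / exp(par[3])
  -sum(delta * (-par[3] + dnorm(z, log = TRUE)) +
       (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}
opt <- optim(c(0, 0, 0), negll, method = "BFGS", control = list(reltol = 1e-12))
print(c(opt$par[1:2], sigma = exp(opt$par[3])))
```

The capped run may stop before the change drops below the tolerance, which reflects the slower convergence of this scheme compared with Newton-Raphson.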
Example:
Data were obtained from a study of septic shock patients. The event of interest in this study
is relapse (recurrence of septic shock) in 38 septic shock patients, for whom several blood
measurements and sex were recorded. The aim is to determine which variables are significant
for the recurrence of septic shock. (The software used is Minitab and R.)
Survey Data of Septic Shock Patients at PKU MUHAMMADIYAH BANTUL, 2008-2009
[Figure: Minitab probability plots of time for candidate distributions (y-axis: Percent; x-axis: time, logarithmic in some panels). Legible panel titles: Exponential, Normal. One legible value: 0.932.]
Analysis with R
Model 1
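library(survival)  # provides Surv() and survreg()
# Model 1: all four covariates; jk (sex) enters as a factor
model1 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit + leukosit + guladarah,
                  data = data, dist = "lognormal")
summary(model1)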
Output:
$$ S(x) = 1 - \Phi\left( \frac{\log(t) + 0.02289 + 0.25742\,\text{factor(jk)P} + 0.04285\,\text{hematikosit} + 0.00760\,\text{leukosit} + 0.00323\,\text{guladarah}}{1.25} \right) $$
The variables that are not significant are jk, leukosit, and guladarah.
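Models 1 to 4 follow a backward elimination pattern: refit the model, then drop a non-significant covariate. The septic shock data are not included in this document, so this sketch uses the ovarian data shipped with R's survival package as a stand-in. The helper backward_lognormal(), the 0.05 threshold, and the rule of dropping the covariate with the largest p-value are assumptions rather than the author's stated procedure.

```r
# Backward elimination with survreg(): refit the lognormal model and drop the
# covariate with the largest p-value until every remaining one has p < alpha.
# The ovarian data (survival package) stand in for the septic shock data.
library(survival)

backward_lognormal <- function(vars, data, alpha = 0.05) {
  repeat {
    rhs <- if (length(vars) > 0) paste(vars, collapse = " + ") else "1"
    fit <- survreg(as.formula(paste("Surv(futime, fustat) ~", rhs)),
                   data = data, dist = "lognormal")
    if (length(vars) == 0) return(fit)
    # summary()$table has columns "Value", "Std. Error", "z", "p"
    p_values <- summary(fit)$table[vars, "p"]
    if (max(p_values) < alpha) return(fit)
    worst <- vars[which.max(p_values)]
    message("dropping ", worst, " (p = ", signif(max(p_values), 3), ")")
    vars <- setdiff(vars, worst)
  }
}

final_fit <- backward_lognormal(c("age", "resid.ds", "rx", "ecog.ps"), ovarian)
summary(final_fit)
```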
Model 2
# Model 2: leukosit removed
model2 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit + guladarah,
                  data = data, dist = "lognormal")
summary(model2)
Output:
$$ S(x) = 1 - \Phi\left( \frac{\log(t) + 0.25238 + 0.22725\,\text{factor(jk)P} + 0.03897\,\text{hematikosit} + 0.00323\,\text{guladarah}}{1.23} \right) $$
The variables that are not significant are jk and guladarah.
Model 3
# Model 3: guladarah removed
model3 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit, data = data, dist = "lognormal")
summary(model3)
Output:
$$ S(x) = 1 - \Phi\left( \frac{\log(t) + 0.6993 + 0.1895\,\text{factor(jk)P} + 0.03897\,\text{hematikosit}}{1.096} \right) $$
The only variable that is not significant is jk.
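Under (11.5.4), the fitted survivorship function of a lognormal survreg model is S(t | x) = 1 − Φ((log t − x′â)/σ̂), where x′â is the linear predictor and σ̂ the fitted scale. This sketch computes it from a fitted object with predict(type = "lp") and fit$scale. The ovarian data from the survival package and the covariate value age = 60 are hypothetical stand-ins, since the septic shock data are not reproduced here.

```r
# Survival probability from a fitted lognormal survreg model:
# S(t | x) = 1 - Phi((log t - lp) / scale), lp = linear predictor x'a.
# The ovarian data and age = 60 are stand-ins chosen for illustration.
library(survival)

fit <- survreg(Surv(futime, fustat) ~ age, data = ovarian, dist = "lognormal")
new_patient <- data.frame(age = 60)
lp <- predict(fit, newdata = new_patient, type = "lp")   # on the log-time scale
sigma_hat <- fit$scale

S_hat <- function(t) pnorm((log(t) - lp) / sigma_hat, lower.tail = FALSE)

t_grid <- c(100, 365, 1000)
print(cbind(t = t_grid, S = S_hat(t_grid)))

# The fitted median survival time, where S(t) = 0.5, equals exp(lp)
median_time <- predict(fit, newdata = new_patient, type = "quantile", p = 0.5)
print(median_time)
```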
Model 4
# Model 4: jk removed; only hematikosit remains
model4 <- survreg(Surv(time, Status) ~ hematikosit, data = data, dist = "lognormal")
summary(model4)
Output: