Survival Data Analysis Assignment: Class B

ASSIGNMENT
SURVIVAL DATA ANALYSIS

CLASS B
Yogyakarta, 19 May 2015

Group members:
1. Rosanita 2012/331429/PA/14683
2. Tiya Octaviani 2012/331430/PA/14684
3. Atina Husnaqilati 2012/334761/PA/14992
4. Rinda desanty V 2012/334762/PA/14993
5. Indria dewi 2012/334788/PA/15012
6. Ika P 2012/334940/PA/15066
DEPARTMENT OF MATHEMATICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
UNIVERSITAS GADJAH MADA
YOGYAKARTA
2014
1.1 Log normal distribution
In probability theory, a log-normal (or lognormal) distribution is a continuous probability
distribution of a random variable whose logarithm is normally distributed. Thus, if the random
variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if
Y has a normal distribution, then X = exp(Y) has a log-normal distribution. A random variable
which is log-normally distributed takes only positive real values.
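The relationship between X and Y can be checked with a small simulation. This is only a sketch in base R; the seed, sample size and the values of mu and sigma are illustrative assumptions, not values from this report.

```r
# Sketch: if Y ~ Normal(mu, sigma), then X = exp(Y) is log-normal.
set.seed(1)                 # arbitrary seed, for reproducibility only
mu <- 0.5                   # assumed mean of log(X)
sigma <- 0.8                # assumed standard deviation of log(X)

y <- rnorm(10000, mean = mu, sd = sigma)  # normal sample
x <- exp(y)                               # log-normal sample

all_positive <- all(x > 0)  # a log-normal variable takes only positive values
log_mean <- mean(log(x))    # sample mean of log(X), an estimate of mu
log_sd <- sd(log(x))        # sample sd of log(X), an estimate of sigma
median_x <- median(x)       # sample median of X, an estimate of exp(mu)
```

Taking logs of the simulated X recovers a sample whose mean and standard deviation estimate mu and sigma, which is the definition stated above read in the other direction.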
The distribution is occasionally referred to as the Galton distribution or Galton's distribution,
after Francis Galton.[1] The log-normal distribution also has been associated with other
names, such as McAlister, Gibrat and Cobb–Douglas.[1]
A variable might be modeled as log-normal if it can be thought of as the
multiplicative product of many independent random variables, each of which is positive. This
is justified by considering the central limit theorem in the log domain.
For example, in finance, the variable could represent the compound return from a sequence of
many trades, each expressed as its return + 1; or a long-term discount factor can be derived
from the product of short-term discount factors. This was observed by Gibrat, who claimed
that the size of a firm and its growth rate are independent, and called the associated central
limit tendency "the law of proportionate effect."[2]
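The multiplicative argument can be illustrated numerically. The sketch below, in base R, multiplies many independent positive factors and checks that the log of the product behaves like a normal variable. The choice of uniform factors on (0.5, 1.5) and the sample sizes are assumptions made only for this illustration.

```r
# Sketch: the log of a product of many positive factors is roughly normal.
set.seed(2)                 # arbitrary seed
n_factors <- 200            # factors per product (assumed)
n_products <- 5000          # number of simulated products (assumed)

# Each factor is uniform on (0.5, 1.5): positive, and not log-normal itself.
factors <- matrix(runif(n_products * n_factors, min = 0.5, max = 1.5),
                  nrow = n_products)

# log(product) = sum of logs; summing logs also avoids numerical underflow.
log_product <- rowSums(log(factors))

# Standardize and compare coverage with the standard normal distribution,
# for which P(|Z| < 1) is about 0.683 and P(|Z| < 2) is about 0.954.
z <- (log_product - mean(log_product)) / sd(log_product)
frac_within_1sd <- mean(abs(z) < 1)
frac_within_2sd <- mean(abs(z) < 2)
```

Because the sum of the logs is approximately normal by the central limit theorem, the product itself is approximately log-normal.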
In wireless communication, the attenuation caused by shadowing or slow fading from random
objects is often assumed to be log-normally distributed: see log-distance path loss model.
The log-normal distribution is the maximum entropy probability distribution for a random
variate X for which the mean and variance of ln(X) are specified.[3]
Summary
2.1 LOGNORMAL REGRESSION MODEL
During the past several years, many researchers, including Segal (1988), Ciampi and Thiffault
(1989), Davis and Anderson (1989), Bloch and Segal (1989), Loh (1991), Ciampi (1991),
Leblanc and Crowley (1992), Schmoor et al. (1993), Ahn (1994) and Ahn and Loh (1994),
have extended the tree-structured regression method to censored survival data. There are two
broad categories of regression models for censored survival data. One approach uses
parametric families of survival distributions; the other does not assume a parametric family.
The former category includes exponential, Weibull, log-normal and log-gamma regression
models. The latter category was introduced by Cox (1972), Miller (1976), Buckley and James
(1979) and Koul, Susarla and Van Ryzin (1981).
One way to examine the relationship of covariates to survival time is through a regression
model in which survival time has a distribution that depends on the covariates. Parametric
regression models would be appropriate for this situation. Among the parametric models, the
log normal distribution has been widely used as a lifetime distribution model.
When covariates are considered, we assume that the survival time, or a function of it, has an
explicit relationship with the covariates. Furthermore, when a parametric model is considered,
we assume that the survival time (or a function of it) follows a given theoretical distribution
(or model) and has an explicit relationship with the covariates.
2.1.1 Log normal Regression Model

Let T_1, ..., T_n and C_1, ..., C_n be independent random variables, where C_i is the censoring
time associated with the survival time T_i, i = 1, ..., n. We observe (W_1, δ_1), ..., (W_n, δ_n),
where W_i = min{T_i, C_i}, δ_i = I(T_i ≤ C_i) and I(·) is the indicator function. Assume that for
each i, a p-dimensional covariate vector x_i = (x_i1, ..., x_ip) independent of T_i is available.
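Let ε in (11.2.4) be the standard normal random variable with the density function g(ε) and
survivorship function G(ε),
\[ g(\varepsilon) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^{2}}{2}\right) \tag{11.5.1} \]
\[ G(\varepsilon) = 1 - \Phi(\varepsilon) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\varepsilon} e^{-x^{2}/2}\,dx \tag{11.5.2} \]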
where Φ is the cumulative distribution function of the standard normal distribution. Then the
model defined by (11.2.4) for the survival time T_i of individual i,
\[ \log T_i = a_0 + \sum_{k=1}^{p} a_k x_{ki} + \sigma \varepsilon_i = \mu_i + \sigma \varepsilon_i \]
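is the lognormal regression model. T_i has the lognormal distribution with the density function
\[ f(t, \mu_i, \sigma^{2}) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left[-\frac{(\log t - \mu_i)^{2}}{2\sigma^{2}}\right] \tag{11.5.3} \]
and the survivorship function
\[ S(t, \mu_i, \sigma^{2}) = 1 - \Phi\left(\frac{\log t - \mu_i}{\sigma}\right) \tag{11.5.4} \]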
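As a check on (11.5.3) and (11.5.4), the density and survivorship function can be coded directly and compared with R's built-in lognormal functions dlnorm() and plnorm(). This is a sketch only; the coefficients, covariate value and scale below are hypothetical and are not estimates from this report.

```r
# Lognormal regression: log T = mu + sigma * eps, with eps ~ N(0, 1).
a0 <- 1.0       # hypothetical intercept
a1 <- 0.5       # hypothetical coefficient of a single covariate
x1 <- 2         # hypothetical covariate value
sigma <- 0.7    # hypothetical scale parameter
mu <- a0 + a1 * x1

# Density (11.5.3), written out term by term.
dens_lognormal <- function(t, mu, sigma) {
  exp(-(log(t) - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma * t)
}

# Survivorship function (11.5.4).
surv_lognormal <- function(t, mu, sigma) {
  1 - pnorm((log(t) - mu) / sigma)
}

t_grid <- c(0.5, 1, 5, 10, 50)
f_manual <- dens_lognormal(t_grid, mu, sigma)
S_manual <- surv_lognormal(t_grid, mu, sigma)

# Cross-check against the built-in lognormal density and upper tail.
f_builtin <- dlnorm(t_grid, meanlog = mu, sdlog = sigma)
S_builtin <- plnorm(t_grid, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
```

In this parametrization mu plays the role of meanlog and sigma the role of sdlog, so the covariates act only through the location of log T.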
It can be shown that the hazard function h(t, σ, a_0, a_1, ..., a_p) of T with covariates x_1, x_2, ...,
x_p and unknown parameter and coefficients σ, a_0, a_1, ..., a_p can be written as
\[ \log h(t, \sigma, a_0, a_1, \ldots, a_p) = \log h_0\left[t \exp(-\mu)\right] - \mu \tag{11.5.5} \]
where h_0(·) is the hazard function of an individual with all covariates equal to zero, taken
here as the case μ = 0. Equation (11.5.5) indicates that h(t, σ, a_0, a_1, ..., a_p) is a function
of h_0 evaluated at t exp(−μ), so the ratio of h to h_0 is not independent of t. Thus, the
lognormal regression model is not a proportional hazards model.
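The statement that the lognormal model is not a proportional hazards model can be seen numerically. The sketch below computes the hazard as density divided by survivorship function, checks the relation log h(t) = log h_0(t exp(−μ)) − μ with the baseline taken as μ = 0, and then evaluates the hazard ratio on a grid of times. The values of μ and σ are hypothetical.

```r
# Hazard of a lognormal survival time: h(t) = f(t) / S(t).
hazard <- function(t, mu, sigma) {
  dlnorm(t, meanlog = mu, sdlog = sigma) /
    plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
}

sigma <- 1.0    # hypothetical scale
mu <- 0.8       # hypothetical linear predictor of one individual
t_grid <- c(0.5, 1, 2, 5, 10)

h0 <- function(t) hazard(t, mu = 0, sigma = sigma)  # baseline hazard (mu = 0)
h <- hazard(t_grid, mu = mu, sigma = sigma)

# Both sides of the relation log h(t) = log h0(t * exp(-mu)) - mu.
lhs <- log(h)
rhs <- log(h0(t_grid * exp(-mu))) - mu

# Under proportional hazards this ratio would be the same at every t.
hazard_ratio <- h / h0(t_grid)
```

The two sides of the relation agree, while hazard_ratio changes along t_grid, which is exactly what rules out proportional hazards.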
2.1.2 Maximum Likelihood Methods
The Newton-Raphson method is used to find the maximum likelihood estimates of the
parameters. However, if censoring is heavy, this method fails to converge unless the initial
values are very close to the maximum likelihood estimates. In that case, another iterative
procedure is provided, using a method that requires only the first derivatives of the log
likelihood function.
2.1.2.1 The Newton-Raphson method
Since we work with log times, y_i = ln W_i represents a log lifetime or log censoring time. From
the probability density function and survival function of Y, the log likelihood function for a
censored sample based on n observations is
\[ \ln L = \sum_{i=1}^{n} \delta_i \left[ -\ln\sigma + \ln\phi(z_i) \right] + \sum_{i=1}^{n} (1 - \delta_i) \ln\left[ 1 - \Phi(z_i) \right] \]
where ϕ(·) is the standard normal probability density function, Φ(·) is the standard normal
cumulative distribution function, z_i = (y_i − x_iβ)/σ and A(z) = ϕ(z)/(1 − Φ(z)). Here β denotes
the vector of regression coefficients (a_0, a_1, ..., a_p in the notation above). The first and
second derivatives of ln L can be obtained from the above formula. The maximum likelihood
equations ∂ln L/∂β_r = 0, r = 1, ..., p, and ∂ln L/∂σ = 0 are solved by the Newton-Raphson
method to get the maximum likelihood estimates of β and σ². Negative values of σ can arise
during iteration. This is avoided by replacing each negative value with one-half the σ value
from the previous iteration.
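The censored log likelihood described above can also be maximized numerically in a few lines of base R. The sketch below is not the Newton-Raphson routine of the text: it uses the general-purpose optimizer optim() with the quasi-Newton BFGS method, and it keeps σ positive by optimizing over log σ instead of applying the halving rule. The data are simulated, and every name and parameter value is an illustrative assumption.

```r
# Simulated censored log-time data (all values are hypothetical).
set.seed(3)
n <- 200
x <- cbind(1, rnorm(n))                       # design matrix with intercept
beta_true <- c(1.0, 0.5)
sigma_true <- 0.8
log_t <- drop(x %*% beta_true) + sigma_true * rnorm(n)  # log survival times
log_c <- rnorm(n, mean = 2.0, sd = 1)                   # log censoring times
y <- pmin(log_t, log_c)                       # y_i = ln W_i
delta <- as.numeric(log_t <= log_c)           # 1 = event observed, 0 = censored

# Negative log likelihood; theta = (beta_1, beta_2, log sigma).
# Observed times contribute ln phi(z) - ln sigma, censored ones ln(1 - Phi(z)).
neg_loglik <- function(theta) {
  beta <- theta[1:2]
  sigma <- exp(theta[3])
  z <- (y - drop(x %*% beta)) / sigma
  -sum(delta * (dnorm(z, log = TRUE) - log(sigma)) +
         (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}

fit <- optim(c(0, 0, 0), neg_loglik, method = "BFGS")
beta_hat <- fit$par[1:2]
sigma_hat <- exp(fit$par[3])
```

Working with log σ means the optimizer can never propose a negative σ, which is a different way of handling the problem that the halving rule addresses.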
2.1.2.2 An alternative iteration method
In the case that the Newton-Raphson method fails to converge, a generalization of the
method of Sampford and Taylor (1959) is tried. The maximum likelihood equations are
where V = (v_1, ..., v_n)' and (3) gives
Equations (4) and (5) are used to calculate new estimates $\tilde{\beta}$ and $\tilde{\sigma}^2$ of β and σ². The
procedure is repeated. This procedure converges to the maximum likelihood estimates $\hat{\beta}$ and
$\hat{\sigma}^2$ if suitable initial values are used. This procedure is essentially an EM algorithm. This
scheme converges more surely than Newton-Raphson iteration, though more slowly. A maximum
of 30 iterations is allowed in the program.
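To give a feel for an EM-type iteration of this kind, the sketch below implements a textbook EM algorithm for normal regression on right-censored log times. It is not a transcription of equations (3) to (5), which are not reproduced here: the E-step replaces each censored log time by its conditional mean μ_i + σA(z_i) and adds the conditional variance σ²(1 + z_i A(z_i) − A(z_i)²), and the M-step is ordinary least squares. The simulated data, starting values and all names are assumptions.

```r
# Simulated censored log-time data (hypothetical values).
set.seed(4)
n <- 200
x <- cbind(1, rnorm(n))
log_t <- drop(x %*% c(1.0, 0.5)) + 0.8 * rnorm(n)
log_c <- rnorm(n, mean = 2.0, sd = 1)
y <- pmin(log_t, log_c)
delta <- as.numeric(log_t <= log_c)   # 1 = event observed, 0 = censored
cens <- delta == 0

# A(z) = phi(z) / (1 - Phi(z)), computed on the log scale for stability.
A <- function(z) {
  exp(dnorm(z, log = TRUE) - pnorm(z, lower.tail = FALSE, log.p = TRUE))
}

# Censored-data log likelihood, used only to monitor the iteration.
loglik <- function(beta, sigma) {
  z <- (y - drop(x %*% beta)) / sigma
  sum(delta * (dnorm(z, log = TRUE) - log(sigma)) +
        (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}

beta <- c(0, 0)                       # crude starting values
sigma <- 1
ll_trace <- numeric(0)
for (iter in 1:30) {                  # the text mentions a cap of 30 iterations
  mu <- drop(x %*% beta)
  z <- (y - mu) / sigma
  # E-step: conditional mean and variance of each censored log time.
  y_hat <- y
  y_hat[cens] <- mu[cens] + sigma * A(z[cens])
  cond_var <- numeric(n)
  cond_var[cens] <- sigma^2 * (1 + z[cens] * A(z[cens]) - A(z[cens])^2)
  # M-step: least squares on the completed data, then update sigma.
  beta <- drop(solve(crossprod(x), crossprod(x, y_hat)))
  sigma <- sqrt((sum((y_hat - drop(x %*% beta))^2) + sum(cond_var)) / n)
  ll_trace <- c(ll_trace, loglik(beta, sigma))
}
```

Each pass can only increase the log likelihood stored in ll_trace, which matches the remark that such a scheme is more reliable than Newton-Raphson even though it takes more iterations.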
Example:
Data were obtained from a study of septic shock patients. The event of interest in this
study is relapse (recurrence of septic shock). For the 38 septic shock patients, several blood
measurements and the patient's sex are recorded. We want to know which variables are
significant for the recurrence of septic shock. (The software used is Minitab and R.)
Survey Data of Septic Shock Patients at PKU MUHAMMADIYAH BANTUL, 2008-2009
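(jk = sex, L = male, P = female; hematikosit = hematocrit; leukosit = leukocyte count; guladarah = blood glucose)
No jk hematikosit leukosit Status time guladarah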
1 L 38 21.1 0 8 116
2 P 39 9.9 0 6 301
3 L 48 8.2 1 12 116
4 P 33 6.2 0 8 213
5 P 30 3.6 0 7 96
6 L 37 34.1 1 1 55
7 L 45 10.3 0 12 74
8 P 48 10.8 0 10 495
9 L 33 11 1 2 60
10 L 46 18.7 1 14 227
11 L 42 9.1 0 7 80
12 P 23 29.8 1 2 146
13 P 33 13.5 1 4 90
14 L 42 10.5 0 3 123
15 L 34 19.2 1 14 539
16 P 38 17.8 1 2 120
17 L 18 3.2 1 2 149
18 P 33 10.4 0 7 60
19 P 31 26.5 0 14 186
20 L 39 22.6 0 6 114
21 P 34 14.4 1 6 78
22 P 37 1.5 1 6 160
23 P 30 3.2 1 4 50
24 L 40 3.3 0 5 150
25 P 11 4.4 1 3 123
26 P 38 28.1 0 10 172
27 L 43 13.3 1 7 141
28 P 21 19.4 0 5 129
29 L 32 69 0 10 119
30 P 11 60.9 1 5 116
31 L 38 28.6 0 5 96
32 L 27 14.5 0 8 190
33 L 38 3 0 7 188
34 P 28 15.6 0 3 103
35 P 24 14.9 0 2 107
36 P 29 17.1 0 6 138
37 L 45 3.4 1 5 94
38 L 39 6.6 0 5 112
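The survreg() calls in this report refer to a data frame called data with columns jk, hematikosit, leukosit, Status, time and guladarah. One possible way to enter the table above is sketched here with read.table(); the column names follow the table header, and the reading of Status as 1 = relapse observed and 0 = censored is an assumption based on how Surv(time, Status) interprets a 0/1 status.

```r
# Enter the 38 patient records as a data frame named `data`.
data <- read.table(header = TRUE, text = "
No jk hematikosit leukosit Status time guladarah
1 L 38 21.1 0 8 116
2 P 39 9.9 0 6 301
3 L 48 8.2 1 12 116
4 P 33 6.2 0 8 213
5 P 30 3.6 0 7 96
6 L 37 34.1 1 1 55
7 L 45 10.3 0 12 74
8 P 48 10.8 0 10 495
9 L 33 11 1 2 60
10 L 46 18.7 1 14 227
11 L 42 9.1 0 7 80
12 P 23 29.8 1 2 146
13 P 33 13.5 1 4 90
14 L 42 10.5 0 3 123
15 L 34 19.2 1 14 539
16 P 38 17.8 1 2 120
17 L 18 3.2 1 2 149
18 P 33 10.4 0 7 60
19 P 31 26.5 0 14 186
20 L 39 22.6 0 6 114
21 P 34 14.4 1 6 78
22 P 37 1.5 1 6 160
23 P 30 3.2 1 4 50
24 L 40 3.3 0 5 150
25 P 11 4.4 1 3 123
26 P 38 28.1 0 10 172
27 L 43 13.3 1 7 141
28 P 21 19.4 0 5 129
29 L 32 69 0 10 119
30 P 11 60.9 1 5 116
31 L 38 28.6 0 5 96
32 L 27 14.5 0 8 190
33 L 38 3 0 7 188
34 P 28 15.6 0 3 103
35 P 24 14.9 0 2 107
36 P 29 17.1 0 6 138
37 L 45 3.4 1 5 94
38 L 39 6.6 0 5 112
")

n_patients <- nrow(data)        # number of rows read
n_events <- sum(data$Status)    # rows with Status equal to 1
```

Keeping jk as the letters L and P lets factor(jk) in the model formulas create the sex indicator automatically.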
Model Identification
[Figure: Probability Plot for time. LSXY Estimates, Censoring Column in Status. Four panels: Weibull, Lognormal, Exponential, Normal; horizontal axis: time, vertical axis: Percent.]
Correlation coefficients: Weibull 0.969, Lognormal 0.976, Exponential *, Normal 0.932.
Among these, the lognormal distribution has the highest correlation coefficient (0.976).
Using R software
Analysis with R
• Model 1
library(survival)  # provides Surv() and survreg()
model1 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit + leukosit + guladarah,
                  data = data, dist = "lognormal")
summary(model1)
Output:
\[ S(t) = 1 - \Phi\left( \frac{\log(t) + 0.02289 + 0.25742\,\mathrm{factor(jk)P} + 0.04285\,\mathrm{hematikosit} + 0.00760\,\mathrm{leukosit} + 0.00323\,\mathrm{guladarah}}{1.25} \right) \]
The variables that are not significant are jk, leukosit and guladarah.
• Model 2
model2 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit + guladarah,
                  data = data, dist = "lognormal")
summary(model2)
Output:
\[ S(t) = 1 - \Phi\left( \frac{\log(t) + 0.25238 + 0.22725\,\mathrm{factor(jk)P} + 0.03897\,\mathrm{hematikosit} + 0.00323\,\mathrm{guladarah}}{1.23} \right) \]
The variables that are not significant are jk and guladarah.
• Model 3
model3 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit,
                  data = data, dist = "lognormal")
summary(model3)
Output:
\[ S(t) = 1 - \Phi\left( \frac{\log(t) + 0.6993 + 0.1895\,\mathrm{factor(jk)P} + 0.03897\,\mathrm{hematikosit}}{1.096} \right) \]
The variable that is not significant is jk.
• Model 4
model4 <- survreg(Surv(time, Status) ~ hematikosit, data = data, dist = "lognormal")
summary(model4)
Output:
\[ S(t) = 1 - \Phi\left( \frac{\log(t) + 0.9417 + 0.03875\,\mathrm{hematikosit}}{1.09} \right) \]
All remaining variables are now significant.
