Survival Data Analysis Assignment: Class B

ASSIGNMENT

SURVIVAL DATA ANALYSIS


CLASS B

Yogyakarta, 19 May 2015


Group members:
1. Rosanita 2012/331429/PA/14683
2. Tiya Octaviani 2012/331430/PA/14684
3. Atina Husnaqilati 2012/334761/PA/14992
4. Rinda Desanty V 2012/334762/PA/14993
5. Indria Dewi 2012/334788/PA/15012
6. Ika P 2012/334940/PA/15066

DEPARTMENT OF MATHEMATICS

FACULTY OF MATHEMATICS AND NATURAL SCIENCES

UNIVERSITAS GADJAH MADA

YOGYAKARTA

2014
1.1 Log normal distribution

In probability theory, a log-normal (or lognormal) distribution is a continuous probability
distribution of a random variable whose logarithm is normally distributed. Thus, if the random
variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if
Y has a normal distribution, then X = exp(Y) has a log-normal distribution. A random variable
which is log-normally distributed takes only positive real values.
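
As a quick numerical check of this definition, the following R sketch draws normal values Y and exponentiates them. The parameter values (mean 1 and standard deviation 0.5 on the log scale) and the sample size are arbitrary choices made for illustration only.

```r
# Draw Y ~ Normal(mu, sigma) and transform to X = exp(Y), which is log-normal.
set.seed(1)
mu <- 1       # assumed mean of log(X), chosen for illustration
sigma <- 0.5  # assumed standard deviation of log(X)
y <- rnorm(1000, mean = mu, sd = sigma)
x <- exp(y)

# X takes only positive values, and taking logs recovers the normal sample.
all_positive <- all(x > 0)
logs_match <- isTRUE(all.equal(log(x), y))

# The log-normal CDF equals the normal CDF evaluated at log(x).
cdf_match <- isTRUE(all.equal(plnorm(2, mu, sigma), pnorm(log(2), mu, sigma)))
print(c(all_positive, logs_match, cdf_match))
```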

The distribution is occasionally referred to as the Galton distribution or Galton's distribution,
after Francis Galton.[1] The log-normal distribution also has been associated with other
names, such as McAlister, Gibrat and Cobb–Douglas.[1]

A variable might be modeled as log-normal if it can be thought of as the
multiplicative product of many independent random variables, each of which is positive. This
is justified by considering the central limit theorem in the log domain.
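
The central limit argument can be illustrated by simulation. The sketch below multiplies many independent positive factors and compares the skewness of the products with the skewness of their logarithms. The uniform factor distribution, the number of factors and the number of replications are all assumptions made for the illustration.

```r
# Products of many independent positive factors: the log of the product is a
# sum of independent terms, so it is approximately normal by the CLT.
set.seed(2)
n_factors <- 50     # assumed number of factors per product
n_products <- 5000  # assumed number of simulated products
factors <- matrix(runif(n_factors * n_products, min = 0.5, max = 1.5),
                  nrow = n_products)
products <- apply(factors, 1, prod)

# Sample skewness: near 0 for a symmetric (normal-like) sample.
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skewness(products)       # strongly right-skewed
skew_log <- skewness(log(products))  # close to symmetric
print(c(raw = skew_raw, log = skew_log))
```

The raw products are heavily right-skewed, while their logarithms are nearly symmetric, which is what a log-normal model would predict.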

For example, in finance, the variable could represent the compound return from a sequence of
many trades, each expressed as its return + 1; or a long-term discount factor can be derived
from the product of short-term discount factors. This was observed by Gibrat, who claimed
that the size of a firm and its growth rate are independent, and called the associated central
limit tendency "the law of proportionate effect."[2]

In wireless communication, the attenuation caused by shadowing or slow fading from random
objects is often assumed to be log-normally distributed: see log-distance path loss model.

The log-normal distribution is the maximum entropy probability distribution for a random
variate X for which the mean and variance of ln(X) are specified.[3]

Summary
2.1 LOGNORMAL REGRESSION MODEL

During the past several years, many researchers, including Segal (1988), Ciampi and Thiffault
(1989), Davis and Anderson (1989), Bloch and Segal (1989), Loh (1991), Ciampi (1991),
Leblanc and Crowley (1992), Schmoor et al. (1993), Ahn (1994) and Ahn and Loh (1994),
have extended the tree-structured regression method to censored survival data. There are two
broad categories of regression models for censored survival data. One approach uses
parametric families of survival distributions; the other does not assume a parametric family.
The former category includes exponential, Weibull, log-normal and log-gamma regression
models. The latter category was introduced by Cox (1972), Miller (1976), Buckley and James
(1979) and Koul, Susarla and Van Ryzin (1981).

One way to examine the relationship of covariates to survival time is through a regression
model in which survival time has a distribution that depends on the covariates. Parametric
regression models would be appropriate for this situation. Among the parametric models, the
log normal distribution has been widely used as a lifetime distribution model.

When covariates are considered, we assume that the survival time, or a function of it, has an
explicit relationship with the covariates. Furthermore, when a parametric model is considered, we
assume that the survival time (or a function of it) follows a given theoretical distribution (or
model) and has an explicit relationship with the covariates.

2.1.1 Log normal Regression Model


Let T_1, ..., T_n and C_1, ..., C_n be independent random variables, where C_i is the censoring
time associated with the survival time T_i, i = 1, ..., n. We observe (W_1, δ_1), ..., (W_n, δ_n),
where W_i = min{T_i, C_i}, δ_i = I(T_i ≤ C_i) and I(·) is the indicator function. Assume that for
each i, a p-dimensional covariate vector x_i = (x_i1, ..., x_ip) independent of T_i is available.

Let ε in (11.2.4) be the standard normal random variable with the density function g(ε) and
survivorship function G(ε),

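$$g(\varepsilon) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^2}{2}\right) \tag{11.5.1}$$

$$G(\varepsilon) = 1 - \Phi(\varepsilon) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\varepsilon} e^{-x^2/2}\,dx \tag{11.5.2}$$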

where Φ is the cumulative distribution function of the standard normal distribution. Then the
model defined by (11.2.4) for the survival time T_i of individual i,

$$\log T_i = a_0 + \sum_{k=1}^{p} a_k x_{ki} + \sigma \varepsilon_i = \mu_i + \sigma \varepsilon_i$$

is the lognormal regression model. T_i has the lognormal distribution with the density function

$$f(t, \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left[-\frac{(\log t - \mu_i)^2}{2\sigma^2}\right] \tag{11.5.3}$$

and the survivorship function

$$S(t, \mu_i, \sigma^2) = 1 - \Phi\left(\frac{\log t - \mu_i}{\sigma}\right) \tag{11.5.4}$$
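
The density (11.5.3) and survivorship function (11.5.4) translate directly into R. The following is a minimal sketch: the function names `lognormal_surv` and `lognormal_dens`, the coefficient values and the covariate vector are all hypothetical and serve only to show how μ_i enters the formulas.

```r
# Survivorship function of the lognormal regression model,
# S(t) = 1 - Phi((log t - mu_i) / sigma), with mu_i = a0 + sum_k a_k * x_ki.
lognormal_surv <- function(t, x, a0, a, sigma) {
  mu <- a0 + sum(a * x)
  1 - pnorm((log(t) - mu) / sigma)
}

# Density (11.5.3) written out explicitly.
lognormal_dens <- function(t, x, a0, a, sigma) {
  mu <- a0 + sum(a * x)
  exp(-(log(t) - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma * t)
}

# Hypothetical values, for illustration only.
a0 <- 1.0
a <- c(0.04, -0.2)
sigma <- 1.1
x <- c(35, 1)
t <- c(1, 5, 10)
s <- lognormal_surv(t, x, a0, a, sigma)
f <- lognormal_dens(t, x, a0, a, sigma)
print(s)
```

Both functions should agree with R's built-in `plnorm(..., lower.tail = FALSE)` and `dlnorm` evaluated at `meanlog = mu` and `sdlog = sigma`.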

It can be shown that the hazard function h(t, σ, a_0, a_1, ..., a_p) of T with covariates x_1, x_2, ...,
x_p and unknown parameters and coefficients σ, a_0, a_1, ..., a_p can be written as

$$\log h(t, \sigma, a_0, a_1, \ldots, a_p) = \log h_0[t \exp(-\mu_i)] - \mu_i \tag{11.5.5}$$

where h_0(·) is the hazard function of an individual with all covariates equal to zero. Equation
(11.5.5) indicates that h(t, σ, a_0, a_1, ..., a_p) is a function of h_0 evaluated at t exp(−μ_i), so its
ratio to h_0(t) is not independent of t. Thus, the lognormal regression model is not a
proportional hazards model.
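
The lack of proportionality can be checked numerically. The sketch below computes the lognormal hazard as h(t) = f(t)/S(t) for two individuals and prints their hazard ratio at several times. The values of σ and of the two linear predictors are hypothetical, and h_0 is taken to be the hazard with μ = 0.

```r
# Hazard of a lognormal lifetime: h(t) = f(t) / S(t).
lognormal_hazard <- function(t, mu, sigma) {
  dlnorm(t, meanlog = mu, sdlog = sigma) /
    plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
}

# Two individuals whose linear predictors differ (hypothetical values).
sigma <- 1
mu_a <- 1.0
mu_b <- 2.0
t <- c(0.5, 2, 8, 32)

# Under proportional hazards this ratio would be the same at every t.
hazard_ratio <- lognormal_hazard(t, mu_a, sigma) / lognormal_hazard(t, mu_b, sigma)
print(round(hazard_ratio, 3))

# Accelerated failure time relation: h(t) = h0(t * exp(-mu)) * exp(-mu),
# where h0 is the hazard with mu = 0.
h_direct <- lognormal_hazard(t, mu_a, sigma)
h_via_h0 <- lognormal_hazard(t * exp(-mu_a), 0, sigma) * exp(-mu_a)
print(isTRUE(all.equal(h_direct, h_via_h0)))
```

The ratio changes with t, which is the sense in which the model is not a proportional hazards model.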

2.1.2 Maximum Likelihood Methods

The Newton-Raphson method is used to find the maximum likelihood estimates of the
parameters. However, if censoring is heavy, this method fails to converge unless the initial
values are very close to the maximum likelihood estimates. In that case, another iterative
procedure is provided, using a method that requires only the first derivatives of the log
likelihood function.

2.1.2.1 The Newton-Raphson method

Since we work with log times, y_i = ln W_i represents a log lifetime or log censoring time. From the
probability density function and survival function of Y, the log likelihood function for a
censored sample based on n observations is

$$\ln L = \sum_{i=1}^{n} \left\{ \delta_i \left[ -\ln \sigma + \ln \phi(z_i) \right] + (1 - \delta_i) \ln\left[ 1 - \Phi(z_i) \right] \right\}$$

where φ(·) is the standard normal probability density function, Φ(·) is the standard normal
cumulative distribution function, and z_i = (y_i − x_i β)/σ. Let A(z) = φ(z)/(1 − Φ(z)). The first
and second derivatives of ln L can be obtained from the above formula. The maximum likelihood
equations ∂ln L/∂β_r = 0, r = 1, ..., p, and ∂ln L/∂σ = 0 are solved by the Newton-Raphson
method to get the maximum likelihood estimates of β and σ². Negative values of σ can arise
during iteration. This is avoided by replacing each negative value with one-half the σ value in
the previous iteration.
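
A minimal sketch of the censored likelihood fit follows. It does not implement Newton-Raphson itself: it uses R's `optim` with the BFGS quasi-Newton method as a stand-in, and it optimises over log σ so that σ stays positive, which is an alternative to the halving device described above. The function name `censored_loglik`, the simulated data and the true parameter values are all assumptions made for the illustration.

```r
# Log likelihood of a right-censored normal regression on log times:
# uncensored terms contribute log(phi(z) / sigma), censored terms log(1 - Phi(z)).
censored_loglik <- function(par, y, delta, X) {
  p <- ncol(X)
  beta <- par[1:p]
  sigma <- exp(par[p + 1])  # optimise over log(sigma) so sigma stays positive
  z <- as.vector(y - X %*% beta) / sigma
  sum(delta * (dnorm(z, log = TRUE) - log(sigma)) +
        (1 - delta) * pnorm(z, lower.tail = FALSE, log.p = TRUE))
}

# Simulated data with assumed true values beta = (1, 0.5) and sigma = 0.8.
set.seed(3)
n <- 500
X <- cbind(1, rnorm(n))
log_t <- as.vector(X %*% c(1, 0.5)) + 0.8 * rnorm(n)  # log survival times
log_c <- rnorm(n, mean = 2, sd = 1)                   # log censoring times
y <- pmin(log_t, log_c)
delta <- as.numeric(log_t <= log_c)

# Start from least squares on the observed log times, ignoring censoring.
start <- c(qr.solve(X, y), log(sd(y)))
fit <- optim(start, censored_loglik, y = y, delta = delta, X = X,
             method = "BFGS", control = list(fnscale = -1))
beta_hat <- fit$par[1:2]
sigma_hat <- exp(fit$par[3])
print(c(beta_hat, sigma = sigma_hat))
```

On real data the same likelihood is maximised by `survreg(..., dist = "lognormal")` from the survival package, so this sketch is mainly useful for seeing how the censored and uncensored terms enter.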

2.1.2.2 An alternative iteration method

In the case that the Newton-Raphson method fails to converge, a generalization of the
method of Sampford and Taylor (1959) is tried. The maximum likelihood equations are
where V = (v_1, ..., v_n)' and (3) gives

Equations (4) and (5) are used to calculate new estimates β̃ and σ̃² of β and σ². The procedure
is repeated. It converges to the maximum likelihood estimates β̂ and σ̂² if
suitable initial values are used. This procedure is essentially an EM algorithm. This scheme
converges more surely than Newton-Raphson iteration, though more slowly. A maximum of
30 iterations is allowed in the program.
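
To show what an EM iteration of this kind looks like, here is a generic sketch for right-censored normal regression on log times. It is not a reconstruction of equations (3) to (5): it uses the standard EM updates, in which each censored log time is replaced by its conditional mean and the variance update adds the conditional variance. The function name `em_censored`, the iteration limit, the tolerance and the simulated data are assumptions.

```r
# EM iteration for right-censored normal regression on log times.
em_censored <- function(y, delta, X, max_iter = 500, tol = 1e-8) {
  beta <- qr.solve(X, y)  # start from least squares, ignoring censoring
  sigma <- sd(as.vector(y - X %*% beta))
  cens <- delta == 0
  for (iter in seq_len(max_iter)) {
    mu <- as.vector(X %*% beta)
    z <- (y - mu) / sigma
    # A(z) = phi(z) / (1 - Phi(z)), computed on the log scale for stability.
    A <- exp(dnorm(z, log = TRUE) - pnorm(z, lower.tail = FALSE, log.p = TRUE))
    # E-step: conditional mean and variance of the censored log lifetimes.
    v <- ifelse(cens, mu + sigma * A, y)
    extra_var <- ifelse(cens, sigma^2 * (1 - A * (A - z)), 0)
    # M-step: least squares on the completed data, then update sigma.
    beta_new <- qr.solve(X, v)
    sigma_new <- sqrt(mean((v - as.vector(X %*% beta_new))^2 + extra_var))
    converged <- max(abs(c(beta_new - beta, sigma_new - sigma))) < tol
    beta <- beta_new
    sigma <- sigma_new
    if (converged) break
  }
  list(beta = as.vector(beta), sigma = sigma, iterations = iter)
}

# Simulated data with assumed true values beta = (1, 0.5) and sigma = 0.8.
set.seed(4)
n <- 2000
X <- cbind(1, rnorm(n))
log_t <- as.vector(X %*% c(1, 0.5)) + 0.8 * rnorm(n)
log_c <- rnorm(n, mean = 2, sd = 1)
y <- pmin(log_t, log_c)
delta <- as.numeric(log_t <= log_c)

em_fit <- em_censored(y, delta, X)
print(em_fit)
```

Each pass needs only a least squares fit, which is why schemes of this type are more stable than Newton-Raphson but typically take more iterations.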

Example:

The data come from a study of septic shock patients. The event of interest in this study is
relapse (recurrence). For 38 septic shock patients, several blood measurements and the
patient's sex are recorded. We want to determine which variables are significant for the
recurrence of septic shock. (The software used is Minitab and R.)

Survey Data on Septic Shock Patients at PKU MUHAMMADIYAH BANTUL, 2008-2009

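(jk: sex, L = male, P = female; hematikosit: presumably hematocrit; leukosit: leukocytes;
guladarah: blood glucose)

No jk hematikosit leukosit Status time guladarah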
1 L 38 21.1 0 8 116
2 P 39 9.9 0 6 301
3 L 48 8.2 1 12 116
4 P 33 6.2 0 8 213
5 P 30 3.6 0 7 96
6 L 37 34.1 1 1 55
7 L 45 10.3 0 12 74
8 P 48 10.8 0 10 495
9 L 33 11 1 2 60
10 L 46 18.7 1 14 227
11 L 42 9.1 0 7 80
12 P 23 29.8 1 2 146
13 P 33 13.5 1 4 90
14 L 42 10.5 0 3 123
15 L 34 19.2 1 14 539
16 P 38 17.8 1 2 120
17 L 18 3.2 1 2 149
18 P 33 10.4 0 7 60
19 P 31 26.5 0 14 186
20 L 39 22.6 0 6 114
21 P 34 14.4 1 6 78
22 P 37 1.5 1 6 160
23 P 30 3.2 1 4 50
24 L 40 3.3 0 5 150
25 P 11 4.4 1 3 123
26 P 38 28.1 0 10 172
27 L 43 13.3 1 7 141
28 P 21 19.4 0 5 129
29 L 32 69 0 10 119
30 P 11 60.9 1 5 116
31 L 38 28.6 0 5 96
32 L 27 14.5 0 8 190
33 L 38 3 0 7 188
34 P 28 15.6 0 3 103
35 P 24 14.9 0 2 107
36 P 29 17.1 0 6 138
37 L 45 3.4 1 5 94
38 L 39 6.6 0 5 112
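
To make the R calls in the analysis reproducible, the table can be entered as a data frame. The sketch below transcribes the 38 rows above. The object name `data` matches the `data=data` argument used in the `survreg` calls, and the column names are the ones those calls refer to.

```r
# Septic shock data transcribed from the table above (38 patients).
data <- data.frame(
  jk = c("L", "P", "L", "P", "P", "L", "L", "P", "L", "L",
         "L", "P", "P", "L", "L", "P", "L", "P", "P", "L",
         "P", "P", "P", "L", "P", "P", "L", "P", "L", "P",
         "L", "L", "L", "P", "P", "P", "L", "L"),
  hematikosit = c(38, 39, 48, 33, 30, 37, 45, 48, 33, 46,
                  42, 23, 33, 42, 34, 38, 18, 33, 31, 39,
                  34, 37, 30, 40, 11, 38, 43, 21, 32, 11,
                  38, 27, 38, 28, 24, 29, 45, 39),
  leukosit = c(21.1, 9.9, 8.2, 6.2, 3.6, 34.1, 10.3, 10.8, 11, 18.7,
               9.1, 29.8, 13.5, 10.5, 19.2, 17.8, 3.2, 10.4, 26.5, 22.6,
               14.4, 1.5, 3.2, 3.3, 4.4, 28.1, 13.3, 19.4, 69, 60.9,
               28.6, 14.5, 3, 15.6, 14.9, 17.1, 3.4, 6.6),
  Status = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 1,
             0, 1, 1, 0, 1, 1, 1, 0, 0, 0,
             1, 1, 1, 0, 1, 0, 1, 0, 0, 1,
             0, 0, 0, 0, 0, 0, 1, 0),
  time = c(8, 6, 12, 8, 7, 1, 12, 10, 2, 14,
           7, 2, 4, 3, 14, 2, 2, 7, 14, 6,
           6, 6, 4, 5, 3, 10, 7, 5, 10, 5,
           5, 8, 7, 3, 2, 6, 5, 5),
  guladarah = c(116, 301, 116, 213, 96, 55, 74, 495, 60, 227,
                80, 146, 90, 123, 539, 120, 149, 60, 186, 114,
                78, 160, 50, 150, 123, 172, 141, 129, 119, 116,
                96, 190, 188, 103, 107, 138, 94, 112)
)

# Basic checks on the transcription.
str(data)
table(data$Status)
```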
Model Identification

[Figure: Minitab probability plots for time (LSXY estimates, censoring column in Status), with
one panel each for the Weibull, lognormal, exponential and normal distributions; axes are
time and percent. Correlation coefficients: Weibull 0.969, lognormal 0.976, exponential *,
normal 0.932.]

Using R

Analysis with R

• Model 1

library(survival)
# Lognormal regression with all four covariates
model1 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit + leukosit + guladarah,
                  data = data, dist = "lognormal")

summary(model1)

Output:
$$S(t) = 1 - \Phi\left(\frac{\log(t) + 0.02289 + 0.25742\,\mathrm{factor(jk)P} + 0.04285\,\mathrm{hematikosit} + 0.00760\,\mathrm{leukosit} + 0.00323\,\mathrm{guladarah}}{1.25}\right)$$
The non-significant variables are jk, leukosit and guladarah.

• Model 2
# Lognormal regression without leukosit
model2 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit + guladarah,
                  data = data, dist = "lognormal")
summary(model2)
Output:
$$S(t) = 1 - \Phi\left(\frac{\log(t) + 0.25238 + 0.22725\,\mathrm{factor(jk)P} + 0.03897\,\mathrm{hematikosit} + 0.00323\,\mathrm{guladarah}}{1.23}\right)$$
The non-significant variables are jk and guladarah.
• Model 3
# Lognormal regression with jk and hematikosit only
model3 <- survreg(Surv(time, Status) ~ factor(jk) + hematikosit, data = data, dist = "lognormal")
summary(model3)
Output:
$$S(t) = 1 - \Phi\left(\frac{\log(t) + 0.6993 + 0.1895\,\mathrm{factor(jk)P} + 0.03897\,\mathrm{hematikosit}}{1.096}\right)$$
The non-significant variable is jk.
• Model 4
# Lognormal regression with hematikosit only
model4 <- survreg(Surv(time, Status) ~ hematikosit, data = data, dist = "lognormal")
summary(model4)
Output:
$$S(t) = 1 - \Phi\left(\frac{\log(t) + 0.9417 + 0.03875\,\mathrm{hematikosit}}{1.09}\right)$$
All variables are now significant.
