
HW1_Reliability Analysis

R26104047, Graduate Institute of Statistics, second-year master's student, 張晏壬

Integrated circuit failure times in hours. When the test ended at 1,370 hours, 4,128 units were still running
(Meeker 1987)

0.1    0.1    0.15   0.6    0.8    0.8
1.2    2.5    3      4      4      6
10     10     12.5   20     20     43
43     48     48     54     74     84
94     168    263    593    -      -

Use at least two methods to estimate the mean life of the IC products. Present all the methods you know.

When a large share of the observations is censored, and especially when it exceeds 80%, simple estimates of mean
life (such as the average of the observed failure times) are biased. Here 4,128 of the 4,156 units (about 99.3%)
were still running when the test ended. In such cases, methods that account for censoring, such as the
Kaplan-Meier estimator or the Cox proportional hazards model, can be used.
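To see the size of the problem in this data set, the sketch below computes the censored fraction and the naive average of the 28 observed failure times. The variable names are illustrative and not part of the original analysis; the point is only that averaging the failures alone ignores the 4,128 units that survived the whole test.

```r
# Observed failure times in hours (from the table above)
fail_times <- c(0.10, 0.10, 0.15, 0.60, 0.80, 0.80, 1.20, 2.50, 3.00, 4.00,
                4.00, 6.00, 10.00, 10.00, 12.50, 20.00, 20.00, 43.00, 43.00,
                48.00, 48.00, 54.00, 74.00, 84.00, 94.00, 168.00, 263.00, 593.00)
n_censored <- 4128                          # units still running at 1,370 hours
n_total <- length(fail_times) + n_censored  # all units on test

# Share of units whose failure time is unknown (censored)
censored_fraction <- n_censored / n_total

# Naive estimate: average of the observed failures only
naive_mean <- mean(fail_times)

cat("Censored fraction:", round(censored_fraction, 4), "\n")
cat("Naive mean of observed failures:", round(naive_mean, 2), "hours\n")
```

The naive mean comes out near 57 hours even though more than 99% of the units lasted beyond 1,370 hours, which shows how badly it understates the mean life.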

1. Use the Kaplan-Meier estimator

The Kaplan-Meier method provides a non-parametric estimate of the survival function and can handle any number
of censored observations. It is widely used in the analysis of clinical trial data and estimates survival
probabilities over the entire study period.

library(lubridate)
library(ggsurvfit)
library(gtsummary)
library(tidycmprsk)
library(condsurv)
library(survival)
library(survminer)

# Load the failure data
failures <- c(0.10, 0.10, 0.15, 0.60, 0.80, 0.80, 1.20, 2.50, 3.00, 4.00, 4.00, 6.00,
              10.00, 10.00, 12.50, 20.00, 20.00, 43.00, 43.00, 48.00, 48.00, 54.00,
              74.00, 84.00, 94.00, 168.00, 263.00, 593.00, rep(1370, 4128))

status <- c(rep(1, 28), rep(0, 4128)) # 1 for failure data, 0 for censored data

IC_product <- data.frame(time = failures, status = status)

Draw the Kaplan-Meier curves

survfit2(Surv(time, status) ~ 1, data = IC_product) %>%
  ggsurvfit() +
  labs(x = "Hours", y = "Overall survival probability") +
  add_confidence_interval()

[Figure: Kaplan-Meier survival curve with confidence band. x-axis: Hours; y-axis: Overall survival probability (axis ticks from 0.9925 to 1.0000).]

At each event time $t_i$, calculate the conditional proportion of surviving units, $(n_i - d_i)/n_i$, where $n_i$
is the number of units still under test just before $t_i$ and $d_i$ is the number of failures at $t_i$. Then
calculate the cumulative product of these proportions up to each event time, which gives the Kaplan-Meier
estimate of the survival function.
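The product-limit calculation described above can be sketched by hand in base R. This is an illustration with assumed variable names, not the code used for the report; it relies on the fact that no unit in this data set is censored before 1,370 hours, so only earlier failures leave the risk set.

```r
# Observed failure times in hours (from the table above)
fail_times <- c(0.10, 0.10, 0.15, 0.60, 0.80, 0.80, 1.20, 2.50, 3.00, 4.00,
                4.00, 6.00, 10.00, 10.00, 12.50, 20.00, 20.00, 43.00, 43.00,
                48.00, 48.00, 54.00, 74.00, 84.00, 94.00, 168.00, 263.00, 593.00)
n_total <- length(fail_times) + 4128   # failures plus censored units

# Distinct failure times and the number of failures at each one
event_times <- sort(unique(fail_times))
d <- as.vector(table(fail_times))

# Units at risk just before each event time
n_risk <- n_total - c(0, cumsum(d)[-length(d)])

# Conditional survival proportions and their cumulative product
km_surv <- cumprod((n_risk - d) / n_risk)

print(data.frame(time = event_times, n.risk = n_risk, n.event = d,
                 survival = round(km_surv, 4)))
```

The `n.risk` and `n.event` columns produced this way can be compared row by row with the `summary()` output of the fitted `survfit` object.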

sfit <- survfit2(Surv(time, status) ~ 1, data = IC_product)

summary(sfit, times = sfit$time)

## Call: survfit(formula = Surv(time, status) ~ 1, data = IC_product)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.10 4156 2 1.000 0.000340 0.999 1.000
## 0.15 4154 1 0.999 0.000417 0.998 1.000
## 0.60 4153 1 0.999 0.000481 0.998 1.000
## 0.80 4152 2 0.999 0.000589 0.997 1.000
## 1.20 4150 1 0.998 0.000636 0.997 1.000
## 2.50 4149 1 0.998 0.000680 0.997 0.999
## 3.00 4148 1 0.998 0.000721 0.996 0.999
## 4.00 4147 2 0.997 0.000797 0.996 0.999
## 6.00 4145 1 0.997 0.000832 0.995 0.999
## 10.00 4144 2 0.997 0.000899 0.995 0.998
## 12.50 4142 1 0.996 0.000930 0.995 0.998
## 20.00 4141 2 0.996 0.000990 0.994 0.998
## 43.00 4139 2 0.995 0.001046 0.993 0.997
## 48.00 4137 2 0.995 0.001100 0.993 0.997
## 54.00 4135 1 0.995 0.001126 0.993 0.997
## 74.00 4134 1 0.994 0.001151 0.992 0.997
## 84.00 4133 1 0.994 0.001175 0.992 0.997
## 94.00 4132 1 0.994 0.001199 0.992 0.996
## 168.00 4131 1 0.994 0.001223 0.991 0.996
## 263.00 4130 1 0.994 0.001246 0.991 0.996
## 593.00 4129 1 0.993 0.001269 0.991 0.996
## 1370.00 4128 0 0.993 0.001269 0.991 0.996

surv_prob <- summary(sfit, times = sfit$time)$surv

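Let $t_i$ be the $i$-th distinct observation time, $\hat S(t_i)$ the estimated survival function at $t_i$, and $n$ the number of distinct observation times. The mean of the lifetime $T$, written $\bar T$, can be computed as follows.

$\bar T$ is defined as the expected lifetime (the mean life), which is the integral of $t$ times the density $f(t)$:

$$\bar T = \int_0^\infty t\, f(t)\, dt$$

When the sample contains censored observations, $f(t)$ has to be obtained from the survival function $S(t)$:

$$f(t) = -\frac{d}{dt} S(t)$$

Substituting this into the integral and integrating by parts (assuming $t\,S(t) \to 0$ as $t \to \infty$) expresses the mean life as the area under the survival curve:

$$\bar T = \int_0^\infty S(t)\, dt$$

The mean life can also be written without the integral. The Kaplan-Meier estimate is a step function, so the area under it is a finite sum, with $t_0 = 0$ and $\hat S(t_0) = 1$:

$$\bar T \approx \sum_{i=0}^{n-1} \hat S(t_i)\,(t_{i+1} - t_i)$$

Approximate the mean life by the weighted sum of the estimated survival probabilities, using the interval lengths
$t_{i+1} - t_i$ as weights. Because the largest observation (1,370 hours) is censored, the estimated curve never
reaches zero, so the sum is the mean life restricted to 1,370 hours and estimates a lower bound on the true mean life.

# Prepend t = 0, where the survival probability is 1
km_time <- c(0, sfit$time)
km_surv <- c(1, surv_prob)

# Area under the step function: S(t_i) * (t_{i+1} - t_i)
mean_life <- sum(km_surv[-length(km_surv)] * diff(km_time))

cat("The mean life of the IC products (restricted to 1370 hours):", round(mean_life, 2), "hours")

## The mean life of the IC products (restricted to 1370 hours): 1361.16 hours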
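As a cross-check on the area under the Kaplan-Meier curve: in this data set no unit is censored before the test ends at 1,370 hours, so the Kaplan-Meier curve coincides with the empirical survival function on [0, 1370], and its area up to 1,370 hours equals the plain average of min(T, 1370) over all units. The sketch below computes that average in base R; the variable names are illustrative.

```r
# Observed failure times in hours (from the table above)
fail_times <- c(0.10, 0.10, 0.15, 0.60, 0.80, 0.80, 1.20, 2.50, 3.00, 4.00,
                4.00, 6.00, 10.00, 10.00, 12.50, 20.00, 20.00, 43.00, 43.00,
                48.00, 48.00, 54.00, 74.00, 84.00, 94.00, 168.00, 263.00, 593.00)
tau <- 1370          # end of test (hours)
n_censored <- 4128   # units still running at tau

# Every unit contributes its observed time, truncated at tau
observed <- c(fail_times, rep(tau, n_censored))
restricted_mean <- mean(pmin(observed, tau))

cat("Restricted mean life up to", tau, "hours:", round(restricted_mean, 2), "\n")
```

A value obtained by integrating the Kaplan-Meier curve that differs noticeably from this average would point to an error in the numerical integration.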

2. Use the Cox proportional hazards model

The Cox proportional hazards model estimates how covariates affect the hazard, so it can compare mean life
between groups and account for several risk factors at once. The IC data contain no covariates, so the model
below is fitted with an intercept-only formula; it then reduces to an estimate of the baseline hazard alone.

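$$h(t, x) = h_0(t)\, e^{x^{\top}\beta}$$

where $h(t, x)$ is the hazard rate at time $t$, $x$ is the vector of observed covariates, $\beta$ is the vector of coefficients for those covariates, and $h_0(t)$ is the baseline hazard rate.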

# Fit a Cox proportional hazards model to the IC product data
# using the coxph function in the survival package:
cox_model <- coxph(Surv(time, status) ~ 1, data = IC_product)

# Estimate the baseline cumulative hazard function using the basehaz function:
baseline_haz <- basehaz(cox_model)
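With an intercept-only formula, the baseline cumulative hazard is essentially a Nelson-Aalen-type estimate: the running sum of failures divided by units at risk. The base-R sketch below computes that sum by hand and converts it to a survival probability with exp(-H). It is an illustration with assumed variable names, and the tie handling inside `coxph` may differ slightly from this simple version.

```r
# Observed failure times in hours (from the table above)
fail_times <- c(0.10, 0.10, 0.15, 0.60, 0.80, 0.80, 1.20, 2.50, 3.00, 4.00,
                4.00, 6.00, 10.00, 10.00, 12.50, 20.00, 20.00, 43.00, 43.00,
                48.00, 48.00, 54.00, 74.00, 84.00, 94.00, 168.00, 263.00, 593.00)
n_total <- length(fail_times) + 4128   # failures plus censored units

# Failures per distinct time and units at risk just before each time
d <- as.vector(table(fail_times))
n_risk <- n_total - c(0, cumsum(d)[-length(d)])

# Nelson-Aalen cumulative hazard: running sum of d_i / n_i
cum_haz <- cumsum(d / n_risk)

# Survival implied by the cumulative hazard, next to the Kaplan-Meier product
surv_from_haz <- exp(-cum_haz)
km_surv <- cumprod(1 - d / n_risk)

cat("Largest gap between the two survival curves:",
    max(abs(surv_from_haz - km_surv)), "\n")
```

For hazards this small the two curves are nearly indistinguishable, so a mean life computed from either one should agree closely.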


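With no covariates in the model there is no regression coefficient $\beta$ to estimate, so the survival function follows directly from the baseline cumulative hazard, and the mean life is the area under it up to the end of the test, $\tau = 1370$ hours:

$$\hat S(t) = \exp\{-\hat H_0(t)\}, \qquad \bar T \approx \int_0^{\tau} \hat S(t)\, dt$$

where $\bar T$ is the mean life (restricted to $\tau$), $\hat H_0(t)$ is the estimated baseline cumulative hazard, and $\hat S(t)$ is the estimated probability of surviving beyond time $t$.

# Calculate the survival probability at each time point
# using the cumulative hazard function
surv_prob <- exp(-baseline_haz$hazard)

# Estimate the mean life as the area under the survival step function
# between 0 and the end of the test
tau <- max(IC_product$time)
keep <- baseline_haz$time < tau
left <- c(0, baseline_haz$time[keep])
right <- c(baseline_haz$time[keep], tau)
mean_life <- sum(c(1, surv_prob[keep]) * (right - left))

cat("The mean life of the IC products (restricted to 1370 hours):", round(mean_life, 2), "hours")

## The mean life of the IC products (restricted to 1370 hours): 1361.16 hours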
