Cox Regression Model

COX REGRESSION MODEL
• Kathy is reviewing doctor order sets used to diagnose and treat sepsis.
There is a high risk of death when patients present with symptoms
related to sepsis. Research shows that certain tests and care progressions
produce better patient outcomes, so Kathy is meeting with a statistician
named Eric to run the different sepsis patient scenarios the hospital has
seen over the last year through the Cox regression model. This will help
Kathy redesign the sepsis order sets to improve patient care.
The Cox regression model is also known as proportional hazards
regression. It is a survival analysis method used to examine how several
variables affect the time until a specified event occurs. It is used in
health care to determine survival outcomes and works best when applied in
consultation with a statistician. Eric explains to Kathy that the model
involves hazard rates and predictor variables.
• The hazard rate is the risk that the event of interest, such as sepsis or
death, occurs at a certain point in time.
• Predictor variables, also called covariates, are the factors that
influence the hazard rate.
Eric also tells Kathy that there are two important types of predictor
variables.
• Time-dependent covariates change with time and are measured repeatedly
(for example, a temperature taken from each of two patients every hour
for four hours).
• Fixed covariates are factors that do not change (age, sex, race, etc.).
• Kathy realizes the importance of having the guidance of a statistician like
Eric. He will help her work through the various time-dependent and fixed
covariates seen when dealing with sepsis. However, Kathy is a little
confused about how she will end up with statistical results without using
the usual parameters such as means, medians, and standard deviations.
Eric explains that the Cox regression model behaves like a nonparametric
model in this respect: it does not assume that the variables follow a normal
distribution, because there are too many unknown factors that may influence
outcomes when dealing with patients. The model does, however, estimate the
effects that predictor variables have on survival, under the assumption that
those effects remain constant over time; the standard form of the model is
sketched below.
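For reference, a minimal statement of the proportional hazards model Eric is describing (the notation here is the standard textbook form and is an addition, not taken from the slides):

```latex
% Cox proportional hazards model: h_0(t) is the unspecified baseline hazard,
% x_1, ..., x_p are the covariates, and exp(beta_j) is the hazard ratio for
% covariate x_j. The proportional hazards assumption is that these ratios
% do not change over time.
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)
```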
• Kathy wants to make sure that the Cox regression model is the right one to use
in her situation. She has also heard of logistic regression, so she asks Eric
whether logistic regression would be better. Eric describes logistic regression
as a model of cumulative incidence, which takes into account the number of new
specified events that occur over a given amount of time. It estimates an odds
ratio, while Cox regression provides a hazard ratio.
• Eric tells Kathy that Cox regression is better suited here because logistic
regression assesses whether events have occurred by a particular time but cannot
compare survival over time. Cox regression can compare the rates of particular
events over specified times, providing information about the events, survival
rates, and the probability of experiencing the event. Eric begins to take
Kathy's data and apply it to the Cox regression model.
• Cox regression, or proportional hazards regression, is a method for
investigating the effect of several variables on the time until a
specified event takes place. Once the Cox model has been built from the
observed and collected values, it can be used to make predictions for
new data inputs. The Cox regression model was developed as a predictive
model for analysing time-to-event data; Cox regression analysis is a
technique for assessing the association between variables and survival.
• Cox's proportional hazards regression model, also known simply as Cox
regression or Cox's model, was introduced in 1972. It builds a survival
function that gives the probability of a specified event, for example
death, occurring at a particular time. The model is commonly used in
medical research, often run in statistical packages such as SPSS, to
investigate the association between patients' survival times and one or
more predictor variables.
• The model is a survival analysis regression model that describes the
relationship between event incidence, expressed by the hazard function,
and a set of covariates. A multivariate Cox regression (for example, in
SPSS) is used when multiple, potentially interacting covariates need to
be considered.
• The Cox proportional hazards model is a semi-parametric model, because
it makes no assumption about the shape of the baseline hazard function.
Cox models are at their best when they include censored data,
observations in which the event was not observed, alongside data from
observations in which the event actually occurred. Other regression
models used in survival analysis assume a specific distribution for the
survival times, such as the Weibull, Gompertz, exponential, or
log-normal distributions.
• Cox regression can handle quantitative predictor variables as well as
categorical variables, and it addresses problems related to participant
heterogeneity. Although it is popular in survival analysis, Cox
regression has the downside of being harder to understand than other
regression methods. Fitting the model requires technical computations,
including copious matrix multiplications and inversions, which makes it
extremely challenging to calculate by hand; in practice, statistical
packages handle this type of Cox regression, as in the minimal sketch
below.
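As an illustration only (the slides name SPSS; this sketch instead uses the Python lifelines package and its bundled Rossi recidivism dataset as stand-in data):

```python
# Minimal sketch: fitting a Cox proportional hazards model with the
# lifelines package. The bundled Rossi dataset stands in for real patient data.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # columns include 'week' (time) and 'arrest' (event indicator)

cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

# For each covariate the summary reports a coefficient (beta) and the
# hazard ratio exp(beta); exp(beta) > 1 means the covariate increases
# the hazard of the event.
cph.print_summary()
```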
NEURAL NETWORKS
• Neural networks, also known as artificial neural networks (ANNs) or
simulated neural networks (SNNs), are a subset of machine learning and
are at the heart of deep learning algorithms. Their name and structure are
inspired by the human brain, mimicking the way that biological neurons
signal to one another.
• Artificial neural networks (ANNs) are composed of node layers: an input
layer, one or more hidden layers, and an output layer.
Each node, or artificial neuron, connects to another and has an
associated weight and threshold. If the output of any individual node is
above the specified threshold value, that node is activated, sending data
to the next layer of the network. Otherwise, no data is passed along to
the next layer of the network.
• Neural networks rely on training data to learn and improve their
accuracy over time. However, once these learning algorithms are fine-
tuned for accuracy, they are powerful tools in computer science
and artificial intelligence, allowing us to classify and cluster data at a
high velocity. Tasks in speech recognition or image recognition can take
minutes versus hours when compared to the manual identification by
human experts. One of the most well-known neural networks is Google’s
search algorithm.
• Once an input layer is determined, weights are assigned. These weights
help determine the importance of any given variable, with larger ones
contributing more significantly to the output compared to other inputs.
All inputs are then multiplied by their respective weights and then
summed. Afterward, the output is passed through an activation function,
which determines the output. If that output exceeds a given threshold, it
“fires” (or activates) the node, passing data to the next layer in the
network. This results in the output of one node becoming the input of
the next node. This process of passing data from one layer to the next
defines this neural network as a feedforward network.
Let’s break down what one single node might look like using binary values. We can
apply this concept to a more tangible example, like whether you should go surfing
(Yes: 1, No: 0). The decision to go or not to go is our predicted outcome, or y-hat.
Let’s assume that there are three factors influencing your decision-making:
• Are the waves good? (Yes: 1, No: 0)
• Is the line-up empty? (Yes: 1, No: 0)
• Has there been a recent shark attack? (Yes: 0, No: 1)
• Then, let’s assume the following inputs:
• X1 = 1, since the waves are pumping
• X2 = 0, since the crowds are out
• X3 = 1, since there hasn’t been a recent shark attack
• Now, we need to assign some weights to determine importance. Larger weights signify that
particular variables are of greater importance to the decision or outcome.
• W1 = 5, since large swells don’t come around often
• W2 = 2, since you’re used to the crowds
• W3 = 4, since you have a fear of sharks
• Finally, we’ll also assume a threshold value of 3, which translates to a bias value of –3.
With all the inputs in place, we can plug the values into the formula to get the
output.
• Y-hat = (1*5) + (0*2) + (1*4) – 3 = 6
• If we use the activation function from the beginning of this section, we can
determine that the output of this node would be 1, since 6 is greater than 0. In this
instance, you would go surfing; but if we adjust the weights or the threshold, we can
achieve different outcomes from the model. When we observe one decision, like in
the above example, we can see how a neural network could make increasingly
complex decisions depending on the output of previous decisions or layers.
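The single-node calculation above can be written out as a minimal Python sketch (the function and variable names are illustrative, not from the slides):

```python
# Single perceptron-style node for the surfing example: a weighted sum plus a
# bias, passed through a step activation that "fires" (returns 1) when the
# result is greater than 0.

def step_activation(value: float) -> int:
    return 1 if value > 0 else 0

def node_output(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step_activation(weighted_sum)

inputs = [1, 0, 1]    # good waves, crowded line-up, no recent shark attack
weights = [5, 2, 4]   # importance of each factor
bias = -3             # a threshold of 3 expressed as a bias of -3

# (1*5) + (0*2) + (1*4) - 3 = 6, and 6 > 0, so the node fires: go surfing.
print(node_output(inputs, weights, bias))  # prints 1
```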
• In the example above, we used perceptrons to illustrate some of the mathematics at
play here, but neural networks leverage sigmoid neurons, which are distinguished by
having values between 0 and 1. Since neural networks behave similarly to decision
trees, cascading data from one node to another, having x values between 0 and 1 will
reduce the impact of any given change of a single variable on the output of any
given node, and subsequently, the output of the neural network.
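For comparison, a sigmoid neuron replaces the hard threshold with the standard logistic function, which squashes the weighted sum into a value between 0 and 1 (a small illustrative sketch, not from the slides):

```python
# Sigmoid activation: maps any real-valued weighted sum into (0, 1), so a
# small change in one input produces only a small change in the output.
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Same weighted sum as the surfing example above (6), now as a graded output.
print(round(sigmoid(6), 3))  # about 0.998, i.e. very close to "go surfing"
```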
• Neural networks are artificial systems that were inspired by biological
neural networks. These systems learn to perform tasks by being exposed
to various datasets and examples without any task-specific rules. The
idea is that the system generates identifying characteristics from the data
it has been given, without being programmed with any prior understanding of
these datasets.
• Neural networks are based on computational models for threshold logic,
which is a combination of algorithms and mathematics. Work on neural
networks draws either on the study of the brain or on the application of
neural networks to artificial intelligence, and it has led to
improvements in finite automata theory.
SUPERVISED VS UNSUPERVISED
LEARNING:
• Neural networks learn via supervised learning. Supervised machine learning
involves an input variable X and an output variable y, and the algorithm learns
from a training dataset: it iteratively makes predictions on the data and is
corrected against the known answers. Learning stops when the algorithm reaches
an acceptable level of performance.
• Unsupervised machine learning has input data X and no corresponding output
variables. The goal is to model the underlying structure of the data in order
to understand more about it. The keywords for supervised machine learning are
classification and regression; for unsupervised machine learning, the keywords
are clustering and association. The contrast is sketched in code below.
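A minimal scikit-learn sketch of the contrast, with illustrative toy data (the specific model choices are assumptions for demonstration, not part of the slides):

```python
# Supervised vs. unsupervised learning on toy data: the classifier is fit on
# inputs X together with labels y, while the clustering algorithm sees only X
# and must discover structure on its own.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 8.0], [5.2, 7.9]])
y = np.array([0, 0, 1, 1])  # labels exist only in the supervised setting

clf = LogisticRegression().fit(X, y)   # supervised: classification
print(clf.predict([[1.1, 1.9]]))       # predicted class label

km = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: clustering
print(km.labels_)                      # cluster assignment for each row of X
```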
EVOLUTION OF NEURAL
NETWORKS:
• Hebbian learning deals with neural plasticity. It is unsupervised and is
connected to long-term potentiation, and it deals with pattern recognition,
exclusive-or circuits, and if-then rules.
• Back propagation solved the exclusive-or problem that Hebbian learning could
not handle, and it made multi-layer networks feasible and efficient. When an
error is found, it is corrected at each layer by modifying the weights at each
node. This line of work led to the development of support vector machines,
linear classifiers, and max-pooling. The vanishing gradient problem affects
feedforward networks that use back propagation as well as recurrent neural
networks, and overcoming it in many-layered networks is central to what is
known as deep learning.
• Hardware-based designs are used for biophysical simulation and neuromorphic
computing. Large-scale component analysis and convolution created a new class
of analog neural computing, and this also made back-propagation workable for
many-layered feedforward neural networks.
• Convolutional networks alternate convolutional layers and max-pooling layers,
followed by fully or sparsely connected layers and a final classification
layer. The learning is done without unsupervised pre-training. Each filter is
equivalent to a weight vector that has to be trained. Shift invariance has to
be guaranteed for both small and large neural networks; this is being addressed
in developmental networks.
TYPES OF NEURAL NETWORKS
There are seven types of neural networks that can be used.
• The first is a multilayer perceptron which has three or more layers and
uses a nonlinear activation function.
• The second is the convolutional neural network that uses a variation of
the multilayer perceptrons.
• The third is the recursive neural network that uses weights to make
structured predictions.
• The fourth is a recurrent neural network that makes connections between
the neurons in a directed cycle. The long short-term memory (LSTM) network
builds on the recurrent architecture and adds gated memory cells that
control what information is kept or forgotten.
• The final two are sequence-to-sequence models, which use two recurrent
networks, and shallow neural networks, which produce a vector space from a
body of text. These neural networks are applications of the basic neural
network described above.
LIMITATIONS:
• The basic neural network described here is a supervised model. It does not
handle unsupervised machine learning and does not cluster or associate data.
It also lacks the level of accuracy found in more computationally expensive
neural networks. In addition, such a network only works when the dimensions
of the input matrix X, the weight matrix W, and the output Y are compatible
with one another.
• The next steps would be to create an unsupervised neural network and to
increase computational power for the supervised model with more
iterations and threading.
SUPPORT VECTOR
MACHINE
• Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as
Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed a Support Vector Machine. Picture two different categories that are
separated by a decision boundary, or hyperplane.
• Example: SVM can be understood with the example used for the KNN classifier.
Suppose we see a strange cat that also has some features of dogs, and we want
a model that can accurately identify whether it is a cat or a dog. Such a
model can be created using the SVM algorithm. We first train our model with
lots of images of cats and dogs so that it can learn the different features of
cats and dogs, and then we test it with this strange creature. The support
vector machine creates a decision boundary between the two classes (cat and
dog) and chooses the extreme cases (support vectors), so it looks at the
extreme cases of cat and dog. On the basis of the support vectors, it will
classify the animal as a cat.
• SVM can be used for face detection, image classification, text
categorization, and similar tasks.
TYPES OF SVM
SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can
be classified into two classes using a single straight line, it is termed
linearly separable data, and the classifier used is called a linear SVM
classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a
dataset cannot be classified using a straight line, it is termed non-linear
data, and the classifier used is called a non-linear SVM classifier.
HYPERPLANE AND SUPPORT
VECTORS IN THE SVM
ALGORITHM:
Hyperplane: There can be multiple lines/decision boundaries to segregate
the classes in n-dimensional space, but we need to find out the best
decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features in the
dataset: if there are 2 features, the hyperplane is a straight line, and if
there are 3 features, the hyperplane is a two-dimensional plane.
• We always create the hyperplane that has the maximum margin, which means
the maximum distance between the data points of the two classes.
• Support Vectors: The data points or vectors that are closest to the
hyperplane and affect its position are termed support vectors. Since these
vectors support the hyperplane, they are called support vectors.
HOW DOES SVM WORK?
Linear SVM:
• The working of the SVM algorithm can be understood with an example. Suppose
we have a dataset with two tags (green and blue) and two features, x1 and x2.
We want a classifier that can classify each pair (x1, x2) of coordinates as
either green or blue.
• Since this is a 2-d space, we can separate the two classes with a straight
line, but there can be multiple lines that separate them.
• Hence, the SVM algorithm helps find the best line or decision boundary; this
best boundary or region is called a hyperplane. The SVM algorithm finds the
points of both classes that are closest to the line. These points are called
support vectors. The distance between the vectors and the hyperplane is
called the margin, and the goal of SVM is to maximize this margin. The
hyperplane with the maximum margin is called the optimal hyperplane. A
minimal code sketch of the linear case follows.
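A minimal scikit-learn sketch of the linear case described above (the toy two-feature data are an illustrative assumption):

```python
# Linear SVM on a toy two-feature dataset: fit a maximum-margin line and
# inspect the support vectors that define it.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 8]])  # (x1, x2) pairs
y = np.array([0, 0, 0, 1, 1, 1])                                # blue vs. green tags

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)           # the extreme points closest to the hyperplane
print(clf.predict([[3, 3], [6, 6]]))  # classify two new points
```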
NON-LINEAR SVM:

• If data is linearly arranged, we can separate it with a straight line, but
for non-linear data, we cannot draw a single straight line.
• So to separate these data points, we need to add one more dimension. For
linear data we used two dimensions, x and y, so for non-linear data we add a
third dimension z. It can be calculated as:
z = x² + y²
• By adding the third dimension, the sample space becomes three-dimensional.
• So now, SVM will divide the datasets into classes by finding a separating
plane in this lifted space.
• Since we are in 3-d space, the separating boundary looks like a plane
parallel to the x-axis. If we convert it back to 2-d space, with z = 1, it
becomes a circle.
• Hence, for non-linear data we get a circle of radius 1 as the decision
boundary, as in the sketch below.
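The lifting trick described above can be sketched directly: add z = x² + y² as a third feature and fit a linear SVM in the lifted space (the toy circular data are an illustrative assumption; in practice the same effect is usually obtained with a kernel such as SVC(kernel="rbf")).

```python
# Non-linear SVM idea: points on a small circle vs. a large circle cannot be
# separated by a straight line in (x, y), but adding the feature z = x**2 + y**2
# makes them linearly separable.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, 40)
inner = np.c_[0.5 * np.cos(angles), 0.5 * np.sin(angles)]  # class 0: radius 0.5
outer = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)]  # class 1: radius 2.0
X = np.vstack([inner, outer])
y = np.array([0] * 40 + [1] * 40)

# Lift to 3-d by appending z = x^2 + y^2, then fit a linear SVM.
z = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, z])
clf = SVC(kernel="linear").fit(X_lifted, y)
print(clf.score(X_lifted, y))  # 1.0: perfectly separable in the lifted space
```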
