Data Science and Applications Notes

What is Predictive Modelling and the jargon around it?

• A predictive / machine learning model is a mathematical function of the
input data that maps the relationship between the input data and the
output as closely as possible.
• The established relationship is used for forecasting / predicting in new
scenarios.
• It helps in understanding the relation between the input features and the
target variable.

In short, predictive modelling is a statistical technique that uses machine
learning and data mining to predict and forecast likely future outcomes with
the aid of historical and existing data. It works by analysing current and
historical data and using what it learns to build a model that forecasts
likely outcomes. Predictive modelling can be used to predict just about
anything, from TV ratings and a customer’s next purchase to credit risks and
corporate earnings.

Two of the most widely used predictive modelling methods are:

1. Simple linear regression: A statistical method that models the
relationship between two continuous variables, one predictor and one target.

2. Multiple linear regression: A statistical method that models the
relationship between a continuous target variable and two or more predictor
variables.
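
As a minimal sketch of simple linear regression, the least-squares line can be fitted with the closed-form formulas slope = covariance(x, y) / variance(x) and intercept = mean(y) - slope * mean(x). The function names and the small data set (hours studied versus score) are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted relationship to a new input."""
    return slope * x + intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
hours = [1.0, 2.0, 3.0, 4.0]
score = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(hours, score)
new_prediction = predict(slope, intercept, 5.0)
print(slope, intercept, new_prediction)
```

Multiple linear regression follows the same least-squares idea with several predictors, but it needs a matrix solution or an iterative optimizer rather than these two scalar formulas.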

What are some applications of data science and predictive modelling in
industry?

Data Science Applications in Different Areas


• Education.
• Airline Route Planning.
• Healthcare Industry.
• Delivery Logistics.
• Banking and Finance.
• Filtered Internet Search.
• Product Recommendation Systems.
• Digital Marketing and Advertising.

Health Care and Pharmaceuticals - important improvements in clinical care
and treatments
• The health care industry generates a massive amount of data from
Electronic Health Records (EHR), genome sequencing, mobile health
devices, social media, and other sources.

Finance - deep learning to analyse trading patterns before and after the
announcement of significant company news
• Applications of data science to finance are important to financial
security, managing risk, marketing and improving trading. One example is
the communications that banks rapidly send after a suspicious transaction.

Applications of predictive modelling

• Auto insurance - Predictive modelling can be used to determine the risk of
accidents to policy holders.
• Fraud detection systems - Predictive modelling can be used to identify
high-risk transactions/customers.
• Pro-active customer retention - Predictive modelling can be used to predict
the probability of a customer terminating his/her services.

Predictive modelling can also be carried out successfully on live
transactions.

Predictive modelling is also extremely useful in Big Data scenarios where the
data is large, unstructured, and complex, and cannot be managed by using a
normal database management system. For example, social networks and
web logs are sources of Big Data which, if studied and analysed carefully,
can provide significant insights into user behavioural patterns.

What are cross-validation, the bias-variance trade-off, overfitting,
multicollinearity, and the cost function of linear regression models?

Bias-variance trade-off -

• Lower model complexity / less tuning leads to high bias in the predictions
(UNDERFITTING)
• Higher model complexity / more tuning leads to higher variance but lower
bias (OVERFITTING)
• A good model strikes a balance between underfitting and overfitting; in
other words, it manages the bias versus variance trade-off
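
The trade-off is usually formalised through the standard decomposition of the expected squared prediction error. The notation below is an assumption and does not come from these notes: the data follow y = f(x) + ε with noise variance σ², and \hat{f} is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simple models tend to have a large first term, complex models a large second term, and no model can remove the third.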

Multicollinearity -
• Inherent correlation among the predictors (X variables) leads to the problem
of multicollinearity
• Not only pairwise correlation between X variables but also a linear
relationship among several of them leads to the problem
• Multicollinearity leads to instability / high variance / overfitting of the
model's coefficients. Even a small change in the training data can lead to a
dramatic change in the coefficients / predictions
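
One common diagnostic, not mentioned in these notes, is the variance inflation factor (VIF). The sketch below covers only the special case of exactly two predictors, where the VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The function names and data are hypothetical, and the threshold of roughly 5 to 10 that is often quoted for a "high" VIF is a rule of thumb rather than a fixed standard.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def vif_two_predictors(a, b):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_collinear = [2.1, 3.9, 6.2, 7.8, 10.1]      # roughly 2 * x1
x2_uncorrelated = [2.0, -1.0, -2.0, -1.0, 2.0]  # zero correlation with x1

vif_high = vif_two_predictors(x1, x2_collinear)
vif_low = vif_two_predictors(x1, x2_uncorrelated)
print(vif_high, vif_low)
```

A VIF near 1 means the predictor carries information the other does not; a very large VIF means its coefficient is poorly determined because the other predictor explains almost all of it.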

Cost function of linear regression models -


• A cost function measures the performance of a machine learning model on a
data set. It quantifies the error between predicted and expected values and
presents that error in the form of a single real number.
• Depending on the problem, the cost function can be formed in many different
ways. Training looks for the model parameters that minimize it (or maximize
it, when it is defined as a score rather than an error).
• For algorithms relying on gradient descent to optimize model parameters, the
cost function has to be differentiable.
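
A minimal sketch of one such cost function for linear regression is the mean squared error (MSE). The notes do not say which cost function is intended, so MSE is an assumption here; some texts also divide by 2n instead of n to simplify the derivative. The helper names and data are hypothetical.

```python
def mse_cost(predicted, expected):
    """Mean squared error: the average squared difference, a single real number."""
    n = len(expected)
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / n


def linear_predictions(slope, intercept, xs):
    """Predictions of the line y = slope * x + intercept for each x."""
    return [slope * x + intercept for x in xs]


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

cost_good = mse_cost(linear_predictions(2.0, 0.0, xs), ys)  # perfect fit
cost_poor = mse_cost(linear_predictions(1.0, 0.0, xs), ys)  # errors of 1, 2 and 3
print(cost_good, cost_poor)
```

Because MSE is a smooth function of the slope and intercept, it satisfies the differentiability requirement for gradient descent.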

Overfitting -
• A model is said to be overfitting if it performs well on the training data
but significantly worse on the test data
• Reasons for overfitting
  • Complex model (with a large number of variables and confounding)
  • Excessive tuning of hyperparameters in order to get better training accuracy
Cross-validation -
The purpose of cross-validation is to test the ability of a machine learning model
to predict new data. It is also used to flag problems like overfitting or selection bias
and gives insights on how the model will generalize to an independent dataset.
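
The notes do not name a specific scheme, so the following is a sketch of one common variant, k-fold cross-validation: the data are split into k folds, and each fold in turn is held out for testing while the model is fitted on the rest. A least-squares line is used as the model; all function names and the data are hypothetical. In practice the rows are usually shuffled before splitting, which this sketch skips to stay deterministic.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        errors = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in fold]
        scores.append(sum(errors) / len(errors))
    return scores


# Hypothetical data: roughly y = 2x + 1 with small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

fold_scores = cross_validate(xs, ys, k=3)
print(fold_scores, sum(fold_scores) / len(fold_scores))
```

The average of the fold scores estimates the error on unseen data; a value far above the training error would be a sign of overfitting.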

Gradient descent algorithm and its working mechanism
• Gradient descent is one of the most widely used algorithms in machine
learning in the industry, and yet it confounds a lot of newcomers.
• Gradient descent is an iterative optimization algorithm for finding a
local minimum of a differentiable function.
• The goal of the gradient descent algorithm is to minimize the given
function (say, the cost function). To achieve this goal, it performs two
steps iteratively:

1. Compute the gradient (slope), the first-order derivative of the function
at the current point
2. Take a step (move) in the direction opposite to the gradient, that is,
the direction in which the function decreases, moving from the current
point by alpha (the learning rate) times the gradient at that point
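
The two steps above can be sketched for a function of one variable as follows. The function name, the example function f(x) = (x - 3)², the value of alpha, and the fixed number of steps are assumptions chosen for illustration; practical implementations usually stop when the gradient or the change in x becomes very small.

```python
def gradient_descent(gradient, start, alpha, n_steps):
    """Repeatedly apply the update x <- x - alpha * gradient(x)."""
    x = start
    for _ in range(n_steps):
        grad = gradient(x)    # step 1: compute the gradient at the current point
        x = x - alpha * grad  # step 2: move against the gradient, scaled by alpha
    return x


# Example function f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
# Its minimum is at x = 3.
minimum = gradient_descent(
    gradient=lambda x: 2.0 * (x - 3.0), start=0.0, alpha=0.1, n_steps=200
)
print(minimum)
```

If alpha is too large the updates can overshoot and diverge; if it is too small, convergence takes many more steps.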

Gradient descent follows the direction of steepest descent towards a
minimum. If the cost function is convex and the step size alpha is suitably
small, it converges to the global minimum; if the cost function is not
convex, it may converge only to a local minimum, depending on the starting
point.
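
The dependence on the starting point for a non-convex function can be shown with a small experiment. The function f(x) = x⁴ - 3x² + x, the starting points, and the step size are assumptions chosen for illustration; this function has one local minimum at a positive x and a lower, global minimum at a negative x.

```python
def gradient_descent(gradient, start, alpha, n_steps):
    """Repeatedly apply the update x <- x - alpha * gradient(x)."""
    x = start
    for _ in range(n_steps):
        x = x - alpha * gradient(x)
    return x


def f(x):
    """A non-convex function with two separate minima."""
    return x ** 4 - 3.0 * x ** 2 + x


def f_prime(x):
    """First derivative of f."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


# Same algorithm and step size, two different starting points.
from_left = gradient_descent(f_prime, start=-2.0, alpha=0.01, n_steps=2000)
from_right = gradient_descent(f_prime, start=2.0, alpha=0.01, n_steps=2000)
print(from_left, f(from_left))
print(from_right, f(from_right))
```

Both runs stop where the gradient is zero, but the run started on the right settles in the higher local minimum, while the run started on the left reaches the lower one.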
