Unit1 Dav PDF
• Making predictions and searching for different structures in data is the most important part of data science.
• Probability and statistics are important because they can handle a wide range of analytical tasks.
• Probability and statistics underlie many of the predictive algorithms used in machine learning. They help in deciding how reliable the data is.
• Probability is one of the most fundamental concepts in statistics.
• A statistic is a result derived from performing a mathematical operation on numerical data.
• Probability is all about chance, whereas statistics is about how we handle data using different techniques.
Statistics Basics:-
• Statistics is the study of the collection, analysis, interpretation, presentation,
and organization of data.
• It is a method of collecting and summarising the data. This has many
applications from a small scale to large scale.
• Whether it is the study of the population of the country or its economy, stats
are used for all such data analysis.
• Statistics has a huge scope in many fields such as sociology, psychology,
geology, weather forecasting, etc.
• The data collected for analysis can be quantitative or qualitative. Quantitative data is further divided into two types: discrete and continuous. Discrete data can take only distinct, countable values (for example, the number of students in a class), whereas continuous data can take any value within a range (for example, height).
Exploring Descriptive and Inferential Statistics
• In general, you use statistics in decision making. Statistics come in two flavours:
• Descriptive: Descriptive statistics describe some characteristic of a numerical dataset, including its distribution, central tendency (such as the mean or median), and dispersion (such as the range, standard deviation, and variance).
• Inferential: Rather than focus on pertinent descriptions of a dataset, inferential
statistics carve out a smaller section of the dataset and attempt to deduce
significant information about the larger dataset.
• Use this type of statistics to get information about a real-world measure in which
you’re interested.
Descriptive Statistics
• Descriptive statistics describe the characteristics of a numerical dataset, but on their own they don't tell you why you should care.
• Most data scientists are interested in descriptive statistics only because of what they reveal about the real-world measures they describe.
• For example, a descriptive statistic is often associated with a degree of accuracy,
indicating the statistic’s value as an estimate of the real-world measure.
• You can use descriptive statistics in many ways — to detect outliers, for example,
or to plan for feature pre-processing requirements or to quickly identify what
features you may want, or not want, to use in an analysis.
Statistic          Class value
Mean               79.18
Range              66.21 – 96.53
Proportion >= 70   86.7%
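The three statistics in the table can each be computed in a line or two. The sketch below uses Python's standard `statistics` module; the `scores` list is hypothetical sample data (not the data behind the table), and the threshold of 70 simply mirrors the table's "Proportion >= 70" row.

```python
from statistics import mean

def describe(values, threshold=70):
    """Return the mean, the range (min, max), and the share of values >= threshold."""
    values = list(values)
    at_or_above = sum(1 for v in values if v >= threshold)
    return {
        "mean": mean(values),
        "range": (min(values), max(values)),
        "proportion_at_or_above": at_or_above / len(values),
    }

# Hypothetical class scores (illustrative only).
scores = [66.21, 72.5, 85.0, 96.53, 68.0, 80.0]
summary = describe(scores)
print(f"Mean: {summary['mean']:.2f}")
print(f"Range: {summary['range'][0]} - {summary['range'][1]}")
print(f"Proportion >= 70: {summary['proportion_at_or_above']:.1%}")
```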
Inferential Statistics
• Inferential statistics are used to reveal something about a real-world measure.
• Inferential statistics do this by providing information about a small data selection,
so you can use this information to infer something about the larger dataset from
which it was taken.
• In statistics, this smaller data selection is known as a sample, and the larger,
complete dataset from which the sample is taken is called the population.
• If your dataset is too big to analyse in its entirety, pull a smaller sample of this
dataset, analyse it, and then make inferences about the entire dataset based on
what you learn from analysing the sample.
• You can also use inferential statistics in situations where you simply can’t afford
to collect data for the entire population.
• In this case, you’d use the data you do have to make inferences about the
population at large.
• At other times, you may find yourself in situations where complete information for the population is not available. In these cases, you can use inferential statistics to estimate values for the missing data based on what you learn from analysing the data that is available.
• For an inference to be valid, you must select your sample carefully so that you get
a true representation of the population.
• Even if your sample is representative, the numbers in the sample dataset will
always exhibit some noise — random variation, in other words — that guarantees
the sample statistic is not exactly identical to its corresponding population statistic.
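To see sampling noise directly, the sketch below draws a simple random sample (one common way to aim for a representative sample) from a hypothetical population and compares the sample mean with the population mean. The population is synthetic uniform data generated for illustration; the seed only makes the run repeatable.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 values drawn uniformly between 0 and 100.
population = [random.uniform(0, 100) for _ in range(10_000)]

# Simple random sample of 1,000 values, drawn without replacement.
sample = random.sample(population, 1_000)

population_mean = mean(population)  # the real-world measure we care about
sample_mean = mean(sample)          # our estimate of it from the sample

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Sampling error:  {sample_mean - population_mean:+.2f}")
```

The sample mean lands close to, but not exactly on, the population mean: that gap is the random variation described above.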
Probability basics:-
• Probability measures how likely the outcome of a random event is.
• For example, when we flip a coin, what is the probability of getting a head? The answer depends on the number of possible outcomes. Here there are two equally likely outcomes, head or tail, so the probability of getting a head is 1/2.
• The formula for probability (when all outcomes are equally likely) is:
• P(E) = Number of favourable outcomes / Total number of outcomes
• P(E) = n(E)/n(S), where n(E) is the number of outcomes in event E and n(S) is the number of outcomes in the sample space S.
• Probability is a mathematical concept that predicts how likely events are to occur.
• Probability values lie between 0 and 1.
• This fundamental idea of probability is also applied to probability distributions.
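The classical formula P(E) = n(E)/n(S) translates directly into counting. This sketch uses Python's `fractions.Fraction` to keep answers exact; the coin and die sample spaces are illustrative, and the helper name `probability` is hypothetical.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability P(E) = n(E) / n(S), assuming equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability(lambda o: o == "H", coin)        # 1 favourable out of 2
p_even = probability(lambda o: o % 2 == 0, die)       # 3 favourable out of 6
p_greater_than_4 = probability(lambda o: o > 4, die)  # 2 favourable out of 6

print(p_head, p_even, p_greater_than_4)  # 1/2 1/2 1/3
```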
Axioms of probability
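• An axiom is a rule or principle accepted as true without proof. It is the premise on which further reasoning is based.
• There are three axioms of probability that form the foundation of probability theory:
• Axiom 1: Probability of an event
• The probability of an event is always between 0 and 1, that is, 0 ≤ P(E) ≤ 1. A probability of 1 means the event is certain to occur, and 0 means it cannot occur.
• Axiom 2: Probability of the sample space
• The probability of the whole sample space S is 1, that is, P(S) = 1, because one of the possible outcomes must occur.
• Axiom 3: Mutually exclusive events
• If A and B are mutually exclusive (they cannot occur together), then P(A ∪ B) = P(A) + P(B).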
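• There are two major kinds of probability distributions, based on the type of values the variable can take:
1. Discrete distributions: the variable takes distinct, countable values (for example, the number of heads in 10 coin flips, which follows a binomial distribution).
2. Continuous distributions: the variable can take any value within a range (for example, heights, which are often modelled with a normal distribution).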
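The difference shows up in how each kind is evaluated. A discrete distribution assigns a probability to each value, and those probabilities sum to 1; a continuous distribution is described by a density, and probabilities come from the area under it over a range (any single exact value has probability 0). The sketch below uses the binomial and normal distributions as examples; the function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution: k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (a density value, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Discrete: probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, 10, 0.5)
# The probabilities of every possible count 0..10 add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

# Continuous: the density at the mean of a standard normal distribution.
density_at_mean = normal_pdf(0.0)

print(p_five_heads, total, density_at_mean)
```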
Discrete Distribution Vs Continuous Distribution
• You can use the Naïve Bayes machine learning method, which was borrowed straight from the
statistics field, to predict the likelihood that an event will occur, given evidence defined in your
data features — something called conditional probability.
• Naïve Bayes is a classification method that is especially useful if you need to classify text data.
• This model is easy to build and works well with large datasets. It is a probabilistic machine learning model used for classification problems.
• The core of the classifier depends on the Bayes theorem with an assumption of independence
among predictors. That means changing the value of a feature doesn’t change the value of another
feature.
• Why is it called Naive?
• It is called Naive because of the assumption that 2 variables are independent when they may not
be. In a real-world scenario, there is hardly any situation where the features are independent.
• Conditional probability is defined as the likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome. The probability that both events occur is calculated by multiplying the probability of the preceding event by the conditional probability of the succeeding event: P(A and B) = P(A) × P(B|A).
• A conditional probability would look at such events in relationship with one another.
• Conditional probability is thus the likelihood of an event or outcome occurring based on the occurrence of
some other event or prior outcome.
• Two events are said to be independent if one event occurring does not affect the probability that the other
event will occur.
• However, if one event occurring or not does, in fact, affect the probability that the other event will occur, the
two events are said to be dependent. If events are independent, then the probability of some event B is not
contingent on what happens with event A.
• A conditional probability, therefore, relates to those events that are dependent on one another.
• Conditional probability is often portrayed as the "probability of A given B," notated as P(A|B).
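• Because the probability that both events occur equals the probability of the preceding event multiplied by the conditional probability of the succeeding event, the conditional probability is the joint probability divided by the probability of the preceding event.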
• Four candidates A, B, C, and D are running for a political office. Each
has an equal chance of winning: 25%. However, if candidate A drops
out of the race due to ill health, the probability will change: P(Win |
One candidate drops out) = 33.33%.
The formula for conditional probability is:
P(B|A) = P(A and B) / P(A)
which you can also rewrite as:
P(B|A) = P(A∩B) / P(A)
Example:-
• In a group of 100 sports car buyers, 40 bought alarm systems, 30
purchased bucket seats, and 20 purchased an alarm system and bucket
seats. If a car buyer chosen at random bought an alarm system, what is
the probability they also bought bucket seats?
• Step 1: Let A = "bought an alarm system" and B = "bought bucket seats". Figure out P(A). It's given in the question as 40 out of 100 buyers, or 0.4.
• Step 2: Figure out P(A∩B). This is the intersection of A and B: both happening together. It's given in the question as 20 out of 100 buyers, or 0.2.
• Step 3: Insert your answers into the formula:
P(B|A) = P(A∩B) / P(A) = 0.2 / 0.4 = 0.5
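The same three steps can be written as a small function working from the raw counts in the sports-car example. The helper name `conditional_probability` is illustrative; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def conditional_probability(n_a_and_b, n_a, n_total):
    """P(B|A) = P(A and B) / P(A), computed from raw counts."""
    p_a = Fraction(n_a, n_total)              # Step 1: P(A)
    p_a_and_b = Fraction(n_a_and_b, n_total)  # Step 2: P(A and B)
    return p_a_and_b / p_a                    # Step 3: apply the formula

# Sports-car example: 100 buyers, 40 bought alarms (A),
# 20 bought both an alarm and bucket seats (A and B).
p_seats_given_alarm = conditional_probability(n_a_and_b=20, n_a=40, n_total=100)
print(p_seats_given_alarm, float(p_seats_given_alarm))  # 1/2 0.5
```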
Bayes’ Theorem (Example)
• Mathematically, Bayes’ theorem can be stated as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Here we are trying to find the probability of event A, given that event B is true. Event B is also called the evidence.
• P(A) is called the prior probability: the probability of A before the evidence is seen.
• P(A|B) is called the posterior probability: the probability of A after the evidence is seen.
• P(B|A) is the likelihood, and P(B) is the probability of the evidence.
• With regard to our dataset, this formula can be rewritten as:
$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$
• y: class variable
• X: dependent feature vector (of size n), X = (x_1, x_2, ..., x_n)
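Bayes' theorem lets us reverse a conditional probability. Reusing the numbers from the sports-car example (P(alarm) = 0.4, P(bucket seats) = 0.3, P(bucket seats | alarm) = 0.5), the sketch below computes P(alarm | bucket seats). The helper name `bayes` is illustrative.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Values from the sports-car example (A = alarm system, B = bucket seats).
p_alarm = Fraction(40, 100)
p_seats = Fraction(30, 100)
p_seats_given_alarm = Fraction(20, 40)  # 0.5

# Reverse the condition: of the buyers with bucket seats, how many have alarms?
p_alarm_given_seats = bayes(p_seats_given_alarm, p_alarm, p_seats)
print(p_alarm_given_seats)  # 2/3
```

As a check, the direct count gives the same answer: 20 of the 30 bucket-seat buyers also bought an alarm.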
What is Naive Bayes?
• Bayes’ rule provides us with the formula for the probability of Y given some
feature X. In real-world problems, we hardly find any case where there is
only one feature.
• If we assume that the features are independent, we can extend Bayes’ rule to what is called Naive Bayes. Independence means that changing the value of one feature doesn't influence the values of the other features, and this assumption is why the algorithm is called “NAIVE”.
• Naive Bayes can be used for various things like face recognition, weather
prediction, Medical Diagnosis, News classification, Sentiment Analysis, and
a lot more.
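• When there are multiple X variables, we simplify the computation by assuming that the x's are independent given the class, so that
$$P(X \mid y) = P(x_1 \mid y)\,P(x_2 \mid y)\cdots P(x_n \mid y)$$
For n features, the formula becomes Naive Bayes:
$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}$$
Since the denominator is the same for every class, the predicted class is the one that maximizes the numerator:
$$\hat{y} = \arg\max_{y} P(y)\prod_{i=1}^{n} P(x_i \mid y)$$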
Naive Bayes Example
• Let’s take a dataset to predict whether we can pet an animal or not.
Assumptions of Naive Bayes
• All the variables are independent. That is, if the animal is a dog, that doesn't mean its size will be medium.
• All the predictors have an equal effect on the outcome. That is, the animal being a dog does not have more importance in deciding whether we can pet it or not. All the features have equal importance.
• Before applying the Naive Bayes formula to the above dataset, we need to do some precomputations on it.
• We also need the probabilities (P(y)), which are calculated in the table
below. For example, P(Pet Animal = NO) = 6/14.
• Now suppose our test data is test = (Cow, Medium, Black).
Probability of petting an animal:
We see here that P(Yes|Test) > P(No|Test), so the prediction that we can pet this animal
is “Yes”.
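The precomputations above amount to counting. Because the original pet-animal table is not reproduced here, the sketch below uses a small hypothetical dataset with the same columns (Animal, Size, Color, Pet) to show the mechanics: the prior P(y) is a class frequency, each P(x_i | y) is a within-class frequency, and the scores are normalised into posteriors.

```python
from collections import Counter
from fractions import Fraction

def train(rows):
    """Count class labels and (feature position, value, label) occurrences."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter()
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts

def posteriors(test, class_counts, feature_counts):
    """P(y | test) for each class: P(y) * product of P(x_i | y), then normalised."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, total)  # prior P(y)
        for i, value in enumerate(test):
            score *= Fraction(feature_counts[(i, value, label)], n_label)  # P(x_i | y)
        scores[label] = score
    evidence = sum(scores.values())
    if evidence == 0:
        raise ValueError("every class has a zero count for some test value")
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical data (not the table from the notes): Animal, Size, Color, Pet.
DATA = [
    ("Dog", "Medium", "Black", "Yes"),
    ("Dog", "Small", "White", "Yes"),
    ("Cow", "Medium", "Brown", "Yes"),
    ("Cat", "Small", "Black", "Yes"),
    ("Cow", "Big", "Black", "No"),
    ("Cow", "Big", "White", "No"),
    ("Dog", "Big", "White", "No"),
    ("Cat", "Medium", "White", "No"),
]
post = posteriors(("Cow", "Medium", "Black"), *train(DATA))
print(post, "->", max(post, key=post.get))
```

In practice, Laplace (add-one) smoothing is often added so that a single unseen value does not zero out a whole class; it is left out here so the arithmetic matches the formula exactly.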
Types of Naïve Bayes
• Naïve Bayes comes in these three popular flavors:
• MultinomialNB: Use this version if your variables describe discrete frequency counts, like word counts.
• This version of Naïve Bayes assumes a multinomial distribution, as is often the case with text data.
• It does not accept negative values.
• BernoulliNB: If your features are binary, you use Bernoulli Naïve Bayes to make predictions.
• This version works for classifying text data, but isn't generally known to perform as well as MultinomialNB.
• If you want to use BernoulliNB to make predictions from continuous variables, that will work, but you first need to sub-divide them into discrete interval groupings (also known as binning).
• GaussianNB: Use this version if all predictive features are normally distributed. It's not a good option for classifying text data, but it can be a good choice if your data contains both positive and negative values (and if your features have a normal distribution, of course).
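To show what the Gaussian variant does internally, here is a from-scratch sketch (not scikit-learn's `GaussianNB`): each feature's likelihood P(x_i | y) is a normal density with the class's mean and variance, and log-probabilities are summed to avoid underflow. The small `VAR_SMOOTHING` constant and the training data are assumptions made for illustration.

```python
import math
from statistics import mean, pvariance

VAR_SMOOTHING = 1e-9  # assumption: tiny constant so a zero variance can't divide by zero

def fit_gaussian_nb(X, y):
    """Per class: the prior, plus the mean and variance of every feature."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            "vars": [pvariance(col) + VAR_SMOOTHING for col in columns],
        }
    return model

def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(model, x):
    """Pick the class maximising log P(y) + sum of log P(x_i | y)."""
    def log_score(label):
        params = model[label]
        return math.log(params["prior"]) + sum(
            log_gaussian(xi, mu, var)
            for xi, mu, var in zip(x, params["means"], params["vars"])
        )
    return max(model, key=log_score)

# Hypothetical training data containing both positive and negative values.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, -3.0], [5.2, -2.8], [4.8, -3.2]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.1]), predict(model, [4.9, -2.9]))
```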
Quantifying Correlation
• Many statistical and machine learning methods assume that your features are independent.
• To test whether they’re independent, though, you need to evaluate their correlation — the extent
to which variables demonstrate interdependency.
• We briefly introduce Pearson correlation and Spearman’s rank correlation.
• Correlation is used to test relationships between quantitative variables or categorical variables. In other words, it's a measure of how things are related. The study of how variables are correlated is called correlation analysis.
• For example, caloric intake and body weight tend to be highly correlated.
• Correlation analysis finds the association between two variables, and correlation coefficients measure how strong the relationship between them is. The most popular correlation coefficient is Pearson’s correlation coefficient, which is very commonly used in linear regression.
• Correlation is quantified per the value of a variable called r, which
ranges between –1 and 1.
• The closer the r-value is to 1 or –1, the more correlation there is
between two variables.
• If two variables have an r-value that’s close to 0, they have little or no linear relationship, which may indicate that they’re independent variables.
Calculating correlation with Pearson’s r
• If you want to uncover dependent relationships between continuous variables in
a dataset, you’d use statistics to estimate their correlation.
• The simplest form of correlation analysis is the Pearson correlation, which
assumes that
• Your data is normally distributed.
• You have continuous, numeric variables.
• Your variables are linearly related.
• Because the Pearson correlation has so many conditions, only use it to determine
whether a relationship between two variables exists, but not to rule out possible
relationships.
• If you were to get an r-value that is close to 0, it indicates that there is no linear
relationship between the variables, but that a nonlinear relationship between them
still could exist.
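The sketch below computes Pearson's r from its definition and shows the point just made: y = x² on a symmetric range depends entirely on x, yet r is 0. It also includes Spearman's rank correlation, which is Pearson's r computed on the ranks of the values, so it detects any monotonic relationship, linear or not. All data are illustrative.

```python
import math

def pearson(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [-2, -1, 0, 1, 2]
print(pearson(x, [2 * v + 1 for v in x]))  # linear: r is (approximately) 1
print(pearson(x, [v ** 2 for v in x]))     # U-shaped: r is 0, yet y depends on x

x_pos = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in x_pos]
print(pearson(x_pos, cubes), spearman(x_pos, cubes))  # r < 1, Spearman ~ 1
```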
• Consider the example of car price prediction, where we have to predict the price considering all the variables that affect the price of a car, such as carlength, curbweight, carheight, carwidth, fueltype, carbody, horsepower, etc.
• We can see in the scatterplot that as carlength, curbweight, and carwidth increase, the price of the car also increases.
• So we can say that each of these three variables has a positive correlation with car price.
• Here, we also see that there is no correlation between carheight and car price.
• To find the Pearson coefficient, also referred to as the Pearson correlation
coefficient or the Pearson product-moment correlation coefficient, the two
variables are placed on a scatter plot. The variables are denoted as X and Y.
• The coefficient is meaningful only if the relationship is roughly linear; for a scatter plot that shows no resemblance to a linear relationship, it is of little use.
• The closer the resemblance to a straight line of the scatter plot, the higher
the strength of association.
• Numerically, the Pearson coefficient is represented the same way as a
correlation coefficient that is used in linear regression, ranging from -1 to
+1.
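Formula:-
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where \bar{x} and \bar{y} are the means of X and Y.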
Find the value of the correlation
coefficient from the following table:
• Let’s first define what a dimension is. Given a matrix A, the dimension of the matrix is the number of rows by the number of columns. If A has 3 rows and 5 columns, A would be a 3x5 matrix.
• In the simplest terms, dimensionality reduction is exactly what it sounds like: you’re reducing the dimension of a matrix to something smaller than it currently is.
• Given a square (n by n) matrix A, the goal would be to reduce the dimension of this matrix to be smaller than n x n.
• Current dimension of A: n
• Reduced dimension of A: n - x, where x is some positive integer
• The most common application is data visualization. It’s quite difficult to visualize anything graphically in a space with more than 3 dimensions.
• Through dimensionality reduction, you’ll be able to transform a dataset with thousands of rows and columns into one small enough to visualize in three, two, or one dimensions.
What is dimensionality Reduction?
• As data generation and collection keeps increasing, visualizing it and
drawing inferences becomes more and more challenging.
• One of the most common ways of doing visualization is through
charts.
• Suppose we have 2 variables, Age and Height. We can use a scatter or
line plot between Age and Height and visualize their relationship
easily:
• Now consider a case in which we have, say, 100 variables (p = 100).
• In this case, we can have 100(100 - 1)/2 = 4950 different pairwise plots.
• It does not make much sense to visualize each of them separately.
• In such cases, where we have a large number of variables, it is better to select a smaller set of variables (p << 100) that captures as much information as the original set of variables.
• We can reduce the p dimensions of the data to a subset of k dimensions (k << p). This is called dimensionality reduction.
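One common way to go from p dimensions to k is principal component analysis (PCA), which this unit turns to later. The sketch below is a minimal PCA via NumPy's `np.linalg.svd`, assuming NumPy is available; the 2-D data are hypothetical points lying exactly on a line, so a single component keeps essentially all of the variance.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto their top-k principal components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean  # PCA works on mean-centred data
    # Rows of vt are the principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    scores = centered @ components.T       # the reduced, k-dimensional data
    explained_ratio = s**2 / np.sum(s**2)  # share of variance per component
    return scores, components, mean, explained_ratio[:k]

# Hypothetical 2-D data lying exactly on the line y = 2x.
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8]])
scores, components, mean, ratio = pca(X, k=1)
print(scores.shape)  # (4, 1): four points, now in one dimension
print(ratio)         # the single component keeps (almost) all of the variance

# Map the 1-D scores back to 2-D to check how much information survived.
reconstructed = scores @ components + mean
print(np.allclose(reconstructed, X))
```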
Benefits of Dimensionality Reduction
• Factor analysis is along the same lines as SVD in that it’s a method you can use for filtering out
redundant information and noise from your data.
• An offspring of the psychometrics field, this method was developed to help you derive a root
cause, in cases where a shared root cause results in shared variance — when a variable’s variance
correlates with the variance of other variables in the dataset.
• A variable’s variance measures how much its values vary around its mean.
• The greater a variable’s variance, the more information that variable contains.
• When you find shared variance in your dataset, that means information redundancy is at play.
• You can use factor analysis or principal component analysis to clear your data of this information
redundancy.
• In order to apply Factor Analysis, we must make sure the data we have
is suitable for it.
• The simplest approach would be to look at the correlation matrix of
the features and identify groups of intercorrelated variables.
• If some features have a correlation greater than 0.3, it may be worth using factor analysis. Groups of highly intercorrelated features are merged into one latent variable, called a factor.
• Factor analysis makes the following assumptions:
• Your features are metric — numeric variables on which meaningful calculations
can be made.
• Your features should be continuous or ordinal.
• You have more than 100 observations in your dataset and at least 5 observations
per feature.
• Your sample is homogeneous.
• There is r > 0.3 correlation between the features in your dataset.
• In factor analysis, you do a regression on features to uncover underlying latent
variables, or factors.
• You can then use those factors as variables in future analyses, to represent the
original dataset from which they’re derived.
• At its core, factor analysis is the process of fitting a model to prepare a dataset for
analysis by reducing its dimensionality and information redundancy.
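The suitability check described above (look for groups of features with correlation above 0.3) can be sketched as a scan over all feature pairs. This is a pre-screen only, not factor analysis itself; the feature names, data, and helper names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(features, threshold=0.3):
    """Return feature pairs whose |r| exceeds the threshold, strongest first."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Hypothetical features: f1 and f2 move together, f3 is mostly unrelated.
features = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 6, 8, 10, 12.5],
    "f3": [5, 3, 6, 2, 6, 3],
}
pairs = correlated_pairs(features)
for a, b, r in pairs:
    print(f"{a} - {b}: r = {r:.2f}")
```

Here only f1 and f2 clear the 0.3 threshold, so they are the candidates to be merged into one factor.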
Decreasing dimensionality and removing outliers with PCA
Linear Regression
• The main aim of a linear regression model is to find the best-fit line, that is, the optimal values of the intercept and coefficients that minimize the error.
The above graph presents the linear relationship between the output (y) variable and the predictor (X) variables. The blue line is referred to as the best-fit straight line. Based on the given data points, we attempt to plot a line that fits the points best.
• Before using linear regression, though, make sure you’ve considered
its limitations:
• Linear regression only works with numerical variables, not categorical ones.
• If your dataset has missing values, it will cause problems. Be sure to address
your missing values before attempting to build a linear regression model.
• If your data has outliers present, your model will produce inaccurate results.
• Check for outliers before proceeding.
• The linear regression assumes that there is a linear relationship
between dataset features and the target variable. Test to make sure this
is the case, and if it’s not, try using a log transformation to
compensate.
• The linear regression model assumes that all features are independent
of each other.
• Prediction errors, or residuals, should be normally distributed.
• You should have at least 20 observations per predictive feature if you expect to generate reliable results using linear regression.
Logistic regression
• Logistic regression is a supervised machine learning algorithm that can be used to model the probability of a certain class or event. It is used when the outcome is binary in nature, and it separates the classes with a linear decision boundary.
• Logistic regression is a machine learning method you can use to estimate values for a categorical
target variable based on your selected features.
• Your target variable should be numeric, and contain values that describe the target’s class — or
category.
• Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, or True or False. However, instead of giving an exact value of 0 or 1, it gives probabilistic values that lie between 0 and 1.
• Logistic regression is very similar to linear regression except in how they are used. Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose output lies between 0 and 1.
• The curve from the logistic function indicates the likelihood of something
such as whether the cells are cancerous or not, a mouse is obese or not
based on its weight, etc.
• Logistic Regression is a significant machine learning algorithm because it
has the ability to provide probabilities and classify new data using
continuous and discrete datasets.
• Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for
the classification.
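Logistic Function (Sigmoid Function):
• Start from the equation of a straight line:
$$y = b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n$$
• In logistic regression, y can only be between 0 and 1, so we divide y by (1 - y). This ratio, the odds, ranges from 0 (when y = 0) to infinity (when y = 1):
$$\frac{y}{1-y}$$
• But we need a range from -∞ to +∞, so we take the logarithm, and the equation becomes:
$$\log\left[\frac{y}{1-y}\right] = b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n$$
• Solving for y gives the sigmoid (logistic) function:
$$y = \frac{1}{1 + e^{-(b_0 + b_1x_1 + b_2x_2 + \cdots + b_nx_n)}}$$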
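The sigmoid and the log-odds (logit) are inverses of each other, which is why the linear equation on the log-odds scale maps back to a probability between 0 and 1. A small sketch, written so that large negative inputs do not overflow `math.exp`; the function names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)

def logit(p):
    """Log-odds log(p / (1 - p)), the inverse of the sigmoid, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for z in (-4, -1, 0, 1, 4):
    p = sigmoid(z)
    print(f"z = {z:>2} -> y = {p:.3f} -> log-odds back = {logit(p):.3f}")
```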
• One cool thing about logistic regression is that, in addition to predicting the class of observations in your target variable, it gives the probability for each of its estimates. Although logistic regression is like linear regression, its requirements are simpler:
• There does not need to be a linear relationship between the features and target variable.
• Residuals don’t have to be normally distributed.
• Predictive features are not required to have a normal distribution.
• When deciding whether logistic regression is a good choice for you, make sure to consider the
following limitations:
• Missing values should be treated or removed.
• Your target variable must be binary or ordinal.
• Predictive features should be independent of each other.
• Logistic regression requires a greater number of observations (than linear regression) to produce a
reliable result.
• The rule of thumb is that you should have at least 50 observations per predictive feature if you
expect to generate reliable results.
Example:-
• Let us consider a problem where we are given a dataset containing
Height and Weight for a group of people.
• Our task is to predict the Weight for new entries in the Height column.
• So we can figure out that this is a regression problem where we will
build a Linear Regression model.
• We will train the model with provided Height and Weight values.
• Once the model is trained we can predict Weight for a given unknown
Height value.
• Now suppose we have an additional field Obesity and we have to classify whether a
person is obese or not depending on their provided height and weight.
• This is clearly a classification problem where we have to segregate the dataset into two
classes (Obese and Not-Obese).
• So, for the new problem, we can again follow the Linear Regression steps and build a
regression line.
• This time, the line will be based on two parameters, Height and Weight, and the regression line will fit between two discrete sets of values.
• As this regression line is highly susceptible to outliers, it will not do a good job in
classifying two classes.
• To get a better classification, we will find probability for each output value from the
regression line.
• Now based on a predefined threshold value, we can easily classify the output into two
classes Obese or Not-Obese.
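The threshold idea can be sketched end to end: fit p = sigmoid(b0 + b1·x) by batch gradient descent on the log-loss, then label a point as class 1 when p reaches the threshold. This is a from-scratch illustration, not a library API; the single centred feature stands in for the Height/Weight example, and the data, learning rate, and epoch count are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

def classify(x, b0, b1, threshold=0.5):
    """Return (probability, label) using a predefined threshold."""
    p = sigmoid(b0 + b1 * x)
    return p, int(p >= threshold)

# Hypothetical single feature (centred); 0 = Not-Obese, 1 = Obese.
xs = [-4, -3, -2, -1, 1, 2, 3, 4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
for x in (-2.5, 2.5):
    p, label = classify(x, b0, b1)
    print(f"x = {x:>4}: P(obese) = {p:.3f} -> class {label}")
```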
Ordinary least squares (OLS) regression methods
• Ordinary Least Squares regression (OLS) is a common technique for estimating coefficients
of linear regression equations which describe the relationship between one or more independent
quantitative variables and a dependent variable (simple or multiple linear regression).
• Least squares stands for minimizing the sum of squared errors (SSE). Maximum likelihood and the generalized method of moments are alternative approaches to OLS.
• Example: We want to predict the height of plants depending on the number of days they have spent in the sun. Before getting exposure, they are 30 cm tall. A plant grows 1 mm (0.1 cm) for each day it is exposed to the sun. We can model this as Y = β0 + β1X, where:
• Y is the height of the plants
• X is the number of days spent in the sun
• β0 is 30 because it is the value of Y when X is 0.
• β1 is 0.1 because it is the coefficient multiplied by the number of days.
• A plant being exposed 5 days to the sun has therefore an estimated height of Y = 30 + 0.1*5 = 30.5
cm.
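In the plant example the coefficients were given; OLS estimates them from data. For one predictor, minimising the SSE has a closed form: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1·x̄. The measurements below are hypothetical, slightly noisy readings generated around the example's line.

```python
def ols_fit(xs, ys):
    """Simple linear regression by ordinary least squares.

    Returns (beta0, beta1) minimising the sum of squared errors
    of the model y = beta0 + beta1 * x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Hypothetical noisy measurements: days in the sun vs plant height (cm).
days = [0, 2, 4, 6, 8, 10]
heights = [30.02, 30.18, 30.41, 30.59, 30.80, 31.01]
beta0, beta1 = ols_fit(days, heights)
print(f"Y = {beta0:.2f} + {beta1:.3f} * X")
print(f"Estimated height after 5 days: {beta0 + beta1 * 5:.2f} cm")
```

The fitted coefficients come out close to the example's β0 = 30 and β1 = 0.1; the small differences are the noise in the hypothetical readings.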
How do ordinary least squares (OLS) work?
Finding outliers using the interquartile range (IQR)
Consider the following dataset:
26 37 24 28 35 22 31 53 41 64 29
Step 1: Sort your data from low to high
First, you’ll simply sort your data in ascending order.
22 24 26 28 29 31 35 37 41 53 64
Step 2: Identify the median, the first quartile (Q1), and the third quartile (Q3)
The median is the value exactly in the middle of your dataset when all values are ordered from low to
high.
Since you have 11 values, the median is the 6th value. The median value is 31.
22 24 26 28 29 31 35 37 41 53 64
• Next, we’ll use the exclusive method for identifying Q1 and Q3. This
means we remove the median from our calculations.
• The Q1 is the value in the middle of the first half of your dataset,
excluding the median. The first quartile value is 26.
22 24 26 28 29
Your Q3 value is in the middle of the second half of your dataset, excluding the median. The third
quartile value is 41.
35 37 41 53 64
Calculate your IQR
The IQR is the range of the middle half of your dataset. Subtract Q1 from Q3 to calculate the IQR.
Formula: IQR = Q3 – Q1
Calculation: Q1 = 26, Q3 = 41
IQR = 41 – 26 = 15
Calculate your upper fence
The upper fence is the boundary around the third quartile. It tells you that any values
exceeding the upper fence are outliers.
Formula: Upper fence = Q3 + (1.5 * IQR)
Calculation: Upper fence = 41 + (1.5 * 15) = 41 + 22.5 = 63.5
Calculate your lower fence
The lower fence is the boundary around the first quartile. Any values less than the lower fence
are outliers.
Formula: Lower fence = Q1 – (1.5 * IQR)
Calculation: Lower fence = 26 – (1.5 * 15) = 26 – 22.5 = 3.5
Use your fences to highlight any outliers
Go back to your sorted dataset from Step 1 and highlight any values that are greater than the upper
fence or less than your lower fence.
These are your outliers.
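• Upper fence = 63.5
• Lower fence = 3.5
22 24 26 28 29 31 35 37 41 53 64
Here, 64 is greater than the upper fence (63.5), so it is an outlier. No value is below the lower fence (3.5).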
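The whole procedure (sort, exclusive-method quartiles, IQR, fences, flag outliers) fits in a short function. This sketch reproduces the worked example's numbers; the function names are illustrative, and `k = 1.5` is the same multiplier used for the fences above.

```python
from statistics import median

def quartiles_exclusive(data):
    """Q1, median, Q3 using the exclusive method (median left out of both halves)."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(s), median(upper)

def iqr_outliers(data, k=1.5):
    """Return the quartiles, IQR and fences, plus the values outside the fences."""
    q1, _, q3 = quartiles_exclusive(data)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in sorted(data) if x < lower_fence or x > upper_fence]
    return (q1, q3, iqr, lower_fence, upper_fence), outliers

data = [26, 37, 24, 28, 35, 22, 31, 53, 41, 64, 29]
(q1, q3, iqr, lower, upper), outliers = iqr_outliers(data)
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print("Outliers:", outliers)  # [64]
```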
• In comparison, a Tukey boxplot is a pretty easy way to spot outliers.
• Each boxplot has whiskers that are set at 1.5*IQR. Any values that lie
beyond these whiskers are outliers.
• Figure shows outliers as they appear within a Tukey boxplot.
Detecting outliers with multivariate analysis
• Sometimes outliers only show up within combinations of data points from disparate variables.
• These outliers really wreak havoc on machine learning algorithms, so it’s important to detect and
remove them.
• You can use multivariate analysis of outliers to do this.
• A multivariate approach to outlier detection involves considering two or more variables at a time
and inspecting them together for outliers.
• There are several methods you can use, including
• Scatter-plot matrix
• Boxplot
• Density-based spatial clustering of applications with noise (DBSCAN)
• Principal component analysis (PCA)
Introducing Time Series Analysis
• A time series is just a collection of data on attribute values over time.
• Time series analysis is performed to predict future instances of the measure based
on the past observational data.
• To forecast or predict future values from data in your dataset, use time series techniques.
• In time series, the order of observations provides an additional source of information that should be analysed and used in the prediction process.
• Time series are typically assumed to be generated at regularly spaced intervals of time (e.g. daily temperature), and so are called regular time series.
• A time series is ordered by time, at a granularity such as years, months, weeks, days, hours, minutes, or seconds.
• A time series is a sequence of observations taken at successive, discrete points in time.
• A time series is often plotted as a run chart (a line chart of the values over time).
• Time Series Analysis (TSA) is used in different fields for time-based
predictions – like Weather Forecasting, Financial, Signal processing,
Engineering domain – Control Systems, Communications Systems.
• Because the order of observations matters in TSA, it is distinct from spatial and other analyses.
• Time series can have one or more variables that change over time.
• If there is only one variable varying over time, we call it Univariate
time series.
• If there is more than one variable it is called Multivariate time series.
How to analyse Time Series?
• The main steps are listed below for quick reference; they are discussed in more detail later.
• Collecting the data and cleaning it
• Visualizing the key feature against time
• Observing the stationarity of the series
• Developing charts to understand its nature.
• Model building – AR, MA, ARMA and ARIMA
• Extracting insights from prediction
Identifying patterns in time series
• Time series exhibit specific patterns.
• Take a look at Figure to get a better understanding of what these patterns are all
about.
• Constant time series remain at roughly the same level over time, but are subject
to some random error.
• In contrast, trended series show a stable linear movement up or down.
• Whether constant or trended, time series may also exhibit seasonality: predictable, cyclical fluctuations that recur seasonally throughout a year.
• As an example of seasonal time series, consider how many businesses show
increased sales during the holiday season.
• Let’s discuss the types of time series data and their influence. There are two major types:
• Stationary
• Non-stationary
• Stationary: a stationary series has no trend, seasonality, or cyclical component, and follows these rules of thumb:
• The MEAN should remain constant over time.
• The VARIANCE should remain constant over time.
• The COVARIANCE between two observations should depend only on the time lag between them, not on when they occur.
• Non-stationary: this is the opposite of stationary; at least one of the rules above is violated.
• If you’re including seasonality in your model, incorporate it in the quarter, month,
or even 6-month period — wherever it’s appropriate.
• Time series may show nonstationary processes, that is, unpredictable cyclical behaviour that is not related to seasonality and that results from economic or industry-wide conditions instead.
• Because they’re not predictable, nonstationary processes can’t be forecast directly.
• You must transform nonstationary data to stationary data before moving forward
with an evaluation.
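The notes don't name a specific transformation; differencing (replacing each value with its change from the previous period) is one common way to remove a trend. The sketch below applies first-order differencing to a hypothetical trended series and compares chunk-by-chunk means, a rough check inspired by the "constant mean" rule rather than a formal stationarity test.

```python
from statistics import mean, pvariance

def difference(series, lag=1):
    """First-order differencing: y'[t] = y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def split_stats(series, parts=3):
    """Mean and variance of consecutive equal chunks; drifting means hint at a trend."""
    size = len(series) // parts
    chunks = [series[i * size:(i + 1) * size] for i in range(parts)]
    return [(mean(c), pvariance(c)) for c in chunks]

# Hypothetical trended series: rises 2 units per period, plus a repeating wiggle.
wiggle = [0.5, -0.5, 0.3, -0.3]
series = [10 + 2 * t + wiggle[t % 4] for t in range(13)]

original_stats = split_stats(series)
diffed = difference(series)
diffed_stats = split_stats(diffed)
print("Original chunk means:   ", [round(m, 2) for m, _ in original_stats])
print("Differenced chunk means:", [round(m, 2) for m, _ in diffed_stats])
```

The original chunk means climb steadily, while the differenced series has the same mean in every chunk.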
Modelling univariate time series data