RESEARCH IN ENGINEERING DATA ANALYSIS
Probability and statistics are closely related: ideas developed in probability theory are compared with statistical results to provide further insight into the data. Statistics, in turn, can be used to estimate the likelihood of a future event.
Descriptive Analysis
Descriptive analysis is the first kind of data analysis. All further data insight is built upon it, and in business today it is the most straightforward and common use of data. Using a dashboard-style summary of historical data, descriptive analysis answers the question "what happened?"
Diagnostic Analysis
Diagnostic analysis probes deeper to identify the reasons behind those results. Organisations use this kind of analytics because it finds patterns of behaviour and draws more connections between data. When fresh issues come up, chances are you have already gathered some information about the situation. Having that data at your disposal eliminates duplicated effort and connects related issues.
Predictive Analysis
The goal of predictive analysis is to answer the question "what is likely to happen?" This kind of analysis is a step beyond descriptive and diagnostic analysis in that it makes predictions about future events based on historical data. With the help of the data we have compiled, predictive analysis can logically forecast how events will turn out.
Prescriptive Analysis
Prescriptive analysis is the frontier of data analysis: it uses advanced techniques and technology to synthesise the knowledge from all prior analyses and suggest a course of action for a given problem or decision. Businesses must ensure they are prepared and willing to invest the necessary time, energy, and resources, because it is a significant organisational commitment.
There are two forms of data: discrete and continuous. In this case, a discrete random variable has been taken into consideration. Note that a discrete random variable is not the same thing as an algebraic variable: a discrete random variable can take any of a set of possible values, each with an associated probability, whereas an algebraic variable stands for a single value.
The mean and variance of a continuous random variable can also be determined with the help of its probability density function, f(x).
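As a small sketch of these definitions in the discrete case, the mean and variance can be computed directly from a probability mass function. The die example below is an assumption chosen to match the dice discussion that follows; the function name `mean_var` is illustrative, not from the text.

```python
# Mean and variance of a discrete random variable, computed from its
# probability mass function. A fair six-sided die is used as the example.

def mean_var(pmf):
    """pmf: dict mapping each value x to its probability P(X = x)."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

die = {x: 1 / 6 for x in range(1, 7)}  # fair die: each face has probability 1/6
mu, sigma2 = mean_var(die)
print(mu, sigma2)  # mean 3.5, variance about 2.9167 (= 35/12)
```

For a continuous random variable the sums become integrals over f(x), but the structure of the calculation is the same.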
In this example two dice are used, and the goal is to determine the probability that each die shows a four. Recall that a die has six sides, so the probability of a four on a single die is 1/6.
Because the two rolls are independent events, the joint probability is found by multiplying the individual probabilities together: (1/6) × (1/6) = 1/36.
This means that there is a 1/36 chance of rolling two fours with a pair of dice.
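The multiplication step can be checked exactly with Python's `fractions` module, which keeps the probabilities as exact rational numbers rather than floats:

```python
from fractions import Fraction

# Joint probability of independent events: P(A and B) = P(A) * P(B).
p_four = Fraction(1, 6)        # probability a single die shows a four
p_two_fours = p_four * p_four  # independent rolls multiply
print(p_two_fours)  # 1/36
```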
Sampling Distributions
A sampling distribution of a statistic is a probability distribution produced by drawing many random samples of a specific size from the same population. These distributions show how a sample statistic varies from sample to sample.
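A sampling distribution can be simulated directly, which is one sketch of the idea above: draw many samples of a fixed size from one population and record the mean of each. The population parameters (mean 50, standard deviation 10) and sample size 30 are arbitrary assumptions for illustration.

```python
import random
import statistics

# Simulate the sampling distribution of the sample mean: draw many
# random samples of a fixed size from one population, record each mean.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_means = [
    statistics.mean(random.sample(population, 30))  # sample size n = 30
    for _ in range(1_000)
]

# The means cluster around the population mean, and their spread is
# much smaller than the spread of the raw population values.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```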
Data allows researchers, marketers, statisticians, analysts, and academics to draw significant conclusions about particular subjects. It can help governments plan for the services a population requires, or help enterprises make decisions about their future and improve their performance.
Types of Distribution
t = ( x̄ - μ ) / ( s / sqrt(n) )
In the formula, "x̄" is the sample mean, "μ" is the population mean, "s" is the sample standard deviation, and "n" is the sample size.
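The formula translates directly into code. The sample data below is made up for illustration; `statistics.stdev` computes the sample standard deviation (dividing by n − 1), matching "s" in the formula.

```python
import math
import statistics

# One-sample t statistic: t = (xbar - mu) / (s / sqrt(n)).
def t_statistic(sample, mu):
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    n = len(sample)
    return (xbar - mu) / (s / math.sqrt(n))

data = [18, 21, 17, 22, 19, 20, 16, 23]  # hypothetical sample
print(t_statistic(data, mu=18))
```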
Estimations
Any statistical process used to determine a population value based on observations made in a sample drawn from that population is called estimation. Point and interval estimates are the two categories of estimation.
Estimation is used in statistics to determine the true value of a quantity in a population; a sample drawn from the population helps in figuring out this true value.
Point Estimate
A parameter's single value estimate is called a point estimate. A sample mean, for
example, is a point estimate of the population mean.
Interval Estimate
A range of values within which the parameter is anticipated to lie is provided by
an interval estimate. The most popular kind of interval estimate is a confidence interval.
Both types of estimates are important for gathering a clear idea of where a
parameter is likely to lie.
Confidence intervals
A confidence interval produces an interval estimate for a parameter from the variability surrounding a statistic. Because they account for sampling error, confidence intervals are helpful in parameter estimation.
Each confidence interval has an associated confidence level. The confidence level states the percentage of times that, if the study were repeated many times, the computed interval would contain the true parameter value.
With random sampling, a 95% confidence interval of [16, 22] means you can be reasonably confident that the average number of vacation days is between 16 and 22.
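A minimal sketch of computing such an interval, assuming a large-sample normal approximation: the critical value 1.96 corresponds to 95% confidence (a t critical value would be more accurate for small samples). The vacation-day data is invented for illustration.

```python
import math
import statistics

# Approximate 95% confidence interval for a mean, using the normal
# critical value 1.96 (a t critical value suits small samples better).
def confidence_interval_95(sample):
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
    margin = 1.96 * se
    return xbar - margin, xbar + margin

days = [16, 20, 19, 22, 18, 21, 17, 23, 19, 20]  # hypothetical sample
low, high = confidence_interval_95(days)
print(round(low, 1), round(high, 1))
```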
Correlation Test
Correlation tests determine the extent to which two variables are associated.
Regression Test
Regression analysis examines whether changes in predictor variables are associated with changes in an outcome variable. Which regression test to choose depends on how many variables you have, and of what kinds, as outcomes and predictors.
Commonly used regression tests are primarily parametric. If your data is not normally distributed, you can apply data transformations.
Data transformations use mathematical procedures, such as taking the square root of each value, to help bring your data closer to a normal distribution.
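The square-root transformation mentioned above is one line of Python. The skewed values here are an assumed toy example: large values are pulled in much more than small ones, which is what reduces right skew.

```python
import math

# Square-root transformation: compresses large values more than small
# ones, a common way to reduce right skew before a parametric test.
skewed = [1, 4, 9, 16, 25, 100]
transformed = [math.sqrt(x) for x in skewed]
print(transformed)  # [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
```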
Calculating Regression
Linear regression models frequently use the least-squares method to find the line of best fit. The least-squares method minimises the sum of squared residuals: for each data point, the vertical distance between the point and the regression line is squared, and the line that minimises the total of these squared distances is chosen.
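The least-squares line y = a + bx has a closed-form solution, sketched below using the standard formulas for slope and intercept. The data points are assumptions chosen to lie near the line y = 2x.

```python
import statistics

# Least-squares fit of y = a + b*x: choose a and b to minimise the sum
# of squared vertical distances between the points and the line.
def least_squares(xs, ys):
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)  # slope
    a = ybar - b * xbar                   # intercept
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x, for illustration
a, b = least_squares(xs, ys)
print(a, b)  # intercept close to 0, slope close to 2
```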
Analysis of Variance
Analysis of variance (ANOVA) is a statistical analysis tool that divides the observed aggregate variability in a data set into two categories: systematic factors and random factors. The systematic factors have a statistical influence on the data set, while the random factors do not. In a regression study, analysts use the ANOVA test to determine the effect that independent variables have on the dependent variable.
The first step in examining the factors influencing a particular data set is to run an ANOVA test. Once the test is complete, the analyst re-tests the systematic factors that contribute meaningfully to the variability in the data set. The analyst then uses the ANOVA results in an F-test to generate additional evidence for the proposed regression models.
One-Way ANOVA
ANOVA comes in two primary varieties: one-way (or linear) and two-way. Other variants also exist: MANOVA, or multivariate ANOVA, differs from ANOVA in that it evaluates several dependent variables at a time, whereas ANOVA evaluates only one. The term "one-way" or "two-way" refers to the number of independent variables in your analysis of variance test. A one-way ANOVA assesses the effect of one factor on one response variable and establishes whether all the samples come from the same population. One-way ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
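The one-way ANOVA F statistic can be computed from scratch, which makes the "between-group versus within-group variability" idea concrete. The three groups of scores below are invented for illustration; a full test would compare F against the F distribution's critical value.

```python
import statistics

# One-way ANOVA F statistic: between-group mean square divided by
# within-group mean square, for k groups and n total observations.
def one_way_anova_f(groups):
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups for v in g)

    ms_between = ss_between / (k - 1)  # between-groups mean square
    ms_within = ss_within / (n - k)    # within-groups mean square
    return ms_between / ms_within

groups = [[85, 86, 88, 75], [91, 92, 93, 85], [79, 78, 88, 94]]  # toy data
print(one_way_anova_f(groups))
```

When all group means are identical, the between-group sum of squares is zero and F = 0; larger F values indicate group means that differ by more than within-group noise would explain.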
Two-Way ANOVA
The two-way ANOVA is an extension of the one-way ANOVA. In a one-way design, one independent variable influences a dependent variable; a two-way ANOVA has two independent variables. For instance, a business can use a two-way ANOVA to compare employee productivity based on two independent factors, such as skill set and salary. It is employed to assess the simultaneous influence of the two variables and to monitor the interaction between them.
Correlation and regression are used to characterise the relationship between quantitative variables that are thought to be linearly related.
Regression
One definition of regression is a measurement that quantifies the impact of
changing one variable on another. Regression analysis is used to determine the
relationship between two variables. Since it is simpler to analyse than the other types
of regression, linear regression is the one that is employed the most frequently. To
determine a link between variables, linear regression is used to locate the line that fits
the data the best.
Regression analysis establishes the link between two variables in order to estimate the value of the unknown variable from the known ones. The aim of linear regression is the best-fitting line through the data points. For two variables x and y, this line can be written as y = a + bx.
Correlation
A measurement that quantifies the link between variables is called correlation. A
direct correlation exists between two variables when an increase or decrease in one
results in a commensurate increase or decrease in the other. In a similar vein, variables
are said to be indirectly associated if a rise in one results in a reduction in another or
vice versa. They are uncorrelated if a change in one variable has no corresponding effect on the other.
The purpose of correlation analysis is to ascertain whether the variables under investigation are related. Additionally, a correlation coefficient, such as Pearson's correlation coefficient, yields a signed numerical value that represents the strength and direction of the correlation.
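Pearson's coefficient is the covariance of the two variables divided by the product of their standard deviations, giving a value between −1 and 1. A sketch with assumed data: the first pair of series is perfectly directly correlated, the second perfectly indirectly correlated.

```python
import math
import statistics

# Pearson's correlation coefficient r: covariance of x and y divided by
# the product of their spreads, yielding a value in [-1, 1].
def pearson_r(xs, ys):
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ~1.0: perfect direct correlation
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # ~-1.0: perfect indirect correlation
```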
A scatter plot of the individual (x, y) data points can be used to visualise the correlation between the two variables.
Principles of experimentation
Almost all studies use the three fundamental principles of randomization, replication, and local control. These three principles complement one another: together they improve experimental accuracy and offer a reliable test of significance, while each retains its own role in the experiment. Before delving into the specifics of these three principles, it is helpful to understand the general terms used in experimental design, as well as the nature of variation among observations in an experiment.
It is also worth noting that the errors introduced into experimental observations by extraneous factors can be systematic or random. Errors resulting from equipment wear and tear, such as a spring balance that drifts out of alignment with frequent use, or errors caused by human fatigue, are instances of systematic error. By contrast, in a related experiment there was an unpredictable fluctuation in the amount of leaves gathered in litter traps under different treatments; this variation was random in nature. While random errors can be expected to cancel out with repeated measurements, no number of repeated measurements can overcome a systematic error.
Completely randomized design
Randomization is the process of allocating the treatments or factors to be tested to the experimental units in accordance with the laws of probability. Randomization ensures that systematic error is eliminated and guarantees that any error component remaining in the observations is entirely random. This provides a foundation for accurately estimating random variation, which is crucial for judging the significance of real differences.
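A completely randomized design can be generated with a simple shuffle, sketched below. The plot names, treatments A/B/C, and two replicates per treatment are assumptions for illustration; the key point is that every assignment of treatments to units is equally likely.

```python
import random

# Completely randomized design: assign treatments to experimental units
# purely at random, with equal replication of each treatment.
def randomize(units, treatments, reps):
    labels = [t for t in treatments for _ in range(reps)]
    random.shuffle(labels)  # random allocation of treatment labels
    return dict(zip(units, labels))

random.seed(42)  # fixed seed so the plan is reproducible
units = [f"plot{i}" for i in range(1, 7)]
plan = randomize(units, ["A", "B", "C"], reps=2)
print(plan)
```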