RESEARCH IN ENGINEERING DATA ANALYSIS
Probability and statistics are closely related: ideas developed in probability theory are compared with statistical results to provide further insight into the data. Statistics, in turn, can be used to estimate the likelihood of a future event.
Descriptive Analysis
Descriptive analysis is the first kind of data analysis. All further data insight is built upon it, and in business today it is the most straightforward and common use of data. Using a dashboard-style summary of historical data, descriptive analysis answers the question "what happened?"
Diagnostic Analysis
Diagnostic analysis probes deeper to identify the reasons behind those results. Organisations use this kind of analytics because it finds patterns of behaviour and draws more connections between data. When fresh issues come up, chances are you have already gathered some information about the situation. Having that data at your disposal eliminates duplicated effort and connects related issues.
Predictive Analysis
The goal of predictive analysis is to answer the question "what is likely to happen?" This kind of analysis is a step beyond descriptive and diagnostic analysis in that it makes predictions about future events based on historical data. With the help of the data we have compiled, predictive analysis can logically forecast how events will turn out.
Prescriptive Analysis
Prescriptive analysis is the frontier of data analysis: it uses advanced techniques and technology to synthesise the knowledge from all prior analyses and suggest a course of action for a given problem or decision. Businesses must ensure they are prepared and willing to invest the necessary time, energy, and resources, because it is a significant organisational commitment.
There are two forms of data: discrete and continuous. In this case, a discrete random variable has been taken into consideration. Note that a discrete random variable is not the same thing as an algebraic variable: a discrete random variable can take any of a set of possible values, each with an associated probability, whereas an algebraic variable stands for a single value.
The mean and variance of a continuous random variable can also be determined with the help of its probability density function, f(x).
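As a small sketch of these definitions in the discrete case, the mean and variance can be computed directly from a probability mass function. The die example below is an assumption chosen to match the dice discussion that follows; the function name `mean_var` is illustrative, not from the text.

```python
# Mean and variance of a discrete random variable, computed from its
# probability mass function. A fair six-sided die is used as the example.

def mean_var(pmf):
    """pmf: dict mapping each value x to its probability P(X = x)."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

die = {x: 1 / 6 for x in range(1, 7)}  # fair die: each face has probability 1/6
mu, sigma2 = mean_var(die)
print(mu, sigma2)  # mean 3.5, variance about 2.9167 (= 35/12)
```

For a continuous random variable the sums become integrals over f(x), but the structure of the calculation is the same.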
In this example two dice are used, and the goal is to determine the probability that each die shows a four. Recall that a die has six sides, so the probability of a four on a single die is 1/6.
Because the two rolls are independent events, the joint probability is found by multiplying the individual probabilities together: (1/6) × (1/6) = 1/36.
This means that there is a 1/36 chance of rolling two fours with a pair of dice.
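The multiplication step can be checked exactly with Python's `fractions` module, which keeps the probabilities as exact rational numbers rather than floats:

```python
from fractions import Fraction

# Joint probability of independent events: P(A and B) = P(A) * P(B).
p_four = Fraction(1, 6)        # probability a single die shows a four
p_two_fours = p_four * p_four  # independent rolls multiply
print(p_two_fours)  # 1/36
```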
Sampling Distributions
A sampling distribution of a statistic is a probability distribution produced by drawing many random samples of a specific size from the same population. These distributions show how a sample statistic varies from sample to sample.
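A sampling distribution can be simulated directly, which is one sketch of the idea above: draw many samples of a fixed size from one population and record the mean of each. The population parameters (mean 50, standard deviation 10) and sample size 30 are arbitrary assumptions for illustration.

```python
import random
import statistics

# Simulate the sampling distribution of the sample mean: draw many
# random samples of a fixed size from one population, record each mean.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_means = [
    statistics.mean(random.sample(population, 30))  # sample size n = 30
    for _ in range(1_000)
]

# The means cluster around the population mean, and their spread is
# much smaller than the spread of the raw population values.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```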
Data allows researchers, marketers, statisticians, analysts, and academics to draw significant conclusions about particular subjects. It can help governments plan for the services a population requires, or help enterprises make decisions about their future and improve their performance.
Types of Distribution
t = ( x̄ - μ ) / ( s / sqrt(n) )
In the formula, "x̄" is the sample mean, "μ" is the population mean, "s" is the sample standard deviation, and "n" is the sample size.
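The formula translates directly into code. The sample data below is made up for illustration; `statistics.stdev` computes the sample standard deviation (dividing by n − 1), matching "s" in the formula.

```python
import math
import statistics

# One-sample t statistic: t = (xbar - mu) / (s / sqrt(n)).
def t_statistic(sample, mu):
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    n = len(sample)
    return (xbar - mu) / (s / math.sqrt(n))

data = [18, 21, 17, 22, 19, 20, 16, 23]  # hypothetical sample
print(t_statistic(data, mu=18))
```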
Estimations
Any statistical process used to determine a population value based on observations made in a sample drawn from that population is called estimation. Point and interval estimates are the two categories of estimation.
Estimation is used in statistics to determine the true value of a quantity in a population; a sample drawn from the population helps in figuring out this true value.
Point Estimate
A parameter's single value estimate is called a point estimate. A sample mean, for
example, is a point estimate of the population mean.
Interval Estimate
A range of values within which the parameter is anticipated to lie is provided by
an interval estimate. The most popular kind of interval estimate is a confidence interval.
Both types of estimates are important for gathering a clear idea of where a
parameter is likely to lie.
Confidence intervals
A confidence interval produces an interval estimate for a parameter from the variability surrounding a statistic. Because they account for sampling error, confidence intervals are helpful in parameter estimation.
Each confidence interval has an associated confidence level. The confidence level states the percentage of times that, if the study were repeated many times, the computed interval would contain the true parameter value.
With random sampling, a 95% confidence interval of [16, 22] means you can be reasonably confident that the average number of vacation days is between 16 and 22.
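A minimal sketch of computing such an interval, assuming a large-sample normal approximation: the critical value 1.96 corresponds to 95% confidence (a t critical value would be more accurate for small samples). The vacation-day data is invented for illustration.

```python
import math
import statistics

# Approximate 95% confidence interval for a mean, using the normal
# critical value 1.96 (a t critical value suits small samples better).
def confidence_interval_95(sample):
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
    margin = 1.96 * se
    return xbar - margin, xbar + margin

days = [16, 20, 19, 22, 18, 21, 17, 23, 19, 20]  # hypothetical sample
low, high = confidence_interval_95(days)
print(round(low, 1), round(high, 1))
```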
Correlation Test
Correlation tests determine the extent to which two variables are associated.
Regression Test
Regression analysis examines whether changes in predictor variables are associated with changes in an outcome variable. Which regression test to choose depends on how many variables you have, and of what kinds, as outcomes and predictors.
Commonly used regression tests are primarily parametric. If your data is not normally distributed, you can apply data transformations.
Data transformations use mathematical procedures, such as taking the square root of each value, to help bring your data closer to a normal distribution.
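The square-root transformation mentioned above is one line of Python. The skewed values here are an assumed toy example: large values are pulled in much more than small ones, which is what reduces right skew.

```python
import math

# Square-root transformation: compresses large values more than small
# ones, a common way to reduce right skew before a parametric test.
skewed = [1, 4, 9, 16, 25, 100]
transformed = [math.sqrt(x) for x in skewed]
print(transformed)  # [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
```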
Calculating Regression
Linear regression models frequently use the least-squares method to find the line of best fit. The least-squares method minimises the sum of squared residuals: for each data point, the vertical distance between the point and the regression line is squared, and the line that minimises the total of these squared distances is chosen.
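The least-squares line y = a + bx has a closed-form solution, sketched below using the standard formulas for slope and intercept. The data points are assumptions chosen to lie near the line y = 2x.

```python
import statistics

# Least-squares fit of y = a + b*x: choose a and b to minimise the sum
# of squared vertical distances between the points and the line.
def least_squares(xs, ys):
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)  # slope
    a = ybar - b * xbar                   # intercept
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x, for illustration
a, b = least_squares(xs, ys)
print(a, b)  # intercept close to 0, slope close to 2
```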
Analysis of Variance
Analysis of variance (ANOVA) is a statistical analysis tool that divides the observed aggregate variability in a data set into two categories: systematic factors and random factors. The systematic factors have a statistical influence on the data set, while the random factors do not. In a regression study, analysts use the ANOVA test to determine the effect that independent variables have on the dependent variable.
The first step in examining the factors influencing a particular data set is to run an ANOVA test. Once the test is complete, the analyst re-tests the systematic factors that contribute meaningfully to the variability in the data set. The analyst then uses the ANOVA results in an F-test to generate additional evidence for the proposed regression models.
One-Way ANOVA
ANOVA comes in two primary varieties: one-way (or linear) and two-way. Other variants also exist: MANOVA, or multivariate ANOVA, differs from ANOVA in that it evaluates several dependent variables at a time, whereas ANOVA evaluates only one. The term "one-way" or "two-way" refers to the number of independent variables in your analysis of variance test. A one-way ANOVA assesses the effect of one factor on one response variable and establishes whether all the samples come from the same population. One-way ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
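The one-way ANOVA F statistic can be computed from scratch, which makes the "between-group versus within-group variability" idea concrete. The three groups of scores below are invented for illustration; a full test would compare F against the F distribution's critical value.

```python
import statistics

# One-way ANOVA F statistic: between-group mean square divided by
# within-group mean square, for k groups and n total observations.
def one_way_anova_f(groups):
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups for v in g)

    ms_between = ss_between / (k - 1)  # between-groups mean square
    ms_within = ss_within / (n - k)    # within-groups mean square
    return ms_between / ms_within

groups = [[85, 86, 88, 75], [91, 92, 93, 85], [79, 78, 88, 94]]  # toy data
print(one_way_anova_f(groups))
```

When all group means are identical, the between-group sum of squares is zero and F = 0; larger F values indicate group means that differ by more than within-group noise would explain.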
Two-Way ANOVA
The two-way ANOVA is an extension of the one-way ANOVA. In a one-way design, one independent variable influences a dependent variable; a two-way ANOVA has two independent variables. For instance, a business can use a two-way ANOVA to compare employee productivity based on two independent factors, such as skill set and salary. It is employed to assess the simultaneous influence of the two variables and to monitor the interaction between them.
Correlation and regression are used to characterise the relationship between quantitative variables that are thought to be linearly related.
Regression
One definition of regression is a measurement that quantifies the impact of
changing one variable on another. Regression analysis is used to determine the
relationship between two variables. Since it is simpler to analyse than the other types
of regression, linear regression is the one that is employed the most frequently. To
determine a link between variables, linear regression is used to locate the line that fits
the data the best.
Regression analysis establishes the link between two variables in order to estimate the value of the unknown variable from the known ones. The aim of linear regression is the best-fitting line through the data points. For two variables x and y, this line can be written as y = a + bx.
Correlation
A measurement that quantifies the link between variables is called correlation. A
direct correlation exists between two variables when an increase or decrease in one
results in a commensurate increase or decrease in the other. In a similar vein, variables
are said to be indirectly associated if a rise in one results in a reduction in another or
vice versa. They are uncorrelated if a change in one variable has no corresponding effect on the other.
The purpose of correlation analysis is to ascertain whether the variables under investigation are related. Additionally, a correlation coefficient, such as Pearson's correlation coefficient, yields a signed numerical value that represents the strength and direction of the correlation.
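Pearson's coefficient is the covariance of the two variables divided by the product of their standard deviations, giving a value between −1 and 1. A sketch with assumed data: the first pair of series is perfectly directly correlated, the second perfectly indirectly correlated.

```python
import math
import statistics

# Pearson's correlation coefficient r: covariance of x and y divided by
# the product of their spreads, yielding a value in [-1, 1].
def pearson_r(xs, ys):
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ~1.0: perfect direct correlation
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # ~-1.0: perfect indirect correlation
```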
A scatter plot of the individual (x, y) data points can be used to visualise the correlation between the two variables.
Principles of experimentation
Almost all studies use the three fundamental principles of randomization, replication, and local control. These three principles complement one another: together they improve experimental accuracy and offer a reliable test of significance, while each retains its own role in the experiment. Before delving into the specifics of these three principles, it is helpful to understand the general terms used in experimental design, as well as the nature of variation among observations in an experiment.
It is also worth noting that the errors introduced into experimental observations by extraneous factors can be systematic or random. Errors resulting from equipment wear and tear, such as a spring balance that drifts out of alignment with frequent use, or errors caused by human fatigue, are instances of systematic error. By contrast, in a related experiment there was an unpredictable fluctuation in the amount of leaves gathered in litter traps under different treatments; this variation was random in nature. While random errors can be expected to cancel out with repeated measurements, no number of repeated measurements can overcome a systematic error.
Completely randomized design
Randomization is the process of allocating the treatments or factors to be tested to the experimental units in accordance with the laws of probability. Randomization ensures that systematic error is eliminated and guarantees that any error component remaining in the observations is entirely random. This provides a foundation for accurately estimating random variation, which is crucial for judging the significance of real differences.
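A completely randomized design can be generated with a simple shuffle, sketched below. The plot names, treatments A/B/C, and two replicates per treatment are assumptions for illustration; the key point is that every assignment of treatments to units is equally likely.

```python
import random

# Completely randomized design: assign treatments to experimental units
# purely at random, with equal replication of each treatment.
def randomize(units, treatments, reps):
    labels = [t for t in treatments for _ in range(reps)]
    random.shuffle(labels)  # random allocation of treatment labels
    return dict(zip(units, labels))

random.seed(42)  # fixed seed so the plan is reproducible
units = [f"plot{i}" for i in range(1, 7)]
plan = randomize(units, ["A", "B", "C"], reps=2)
print(plan)
```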