Eda Critique Paper

CRITIQUE PAPER
ENGINEERING DATA ANALYSIS
(MWF 2:30-4:00 PM)
SUBMITTED BY:
ANNA MIKAELA DC SANCHEZ
BSCE-2B
SUBMITTED TO: ENGR. JOEL MOLINA
DATA MINING VS STATISTICS: 7 CRITICAL DIFFERENCES

TABLE OF CONTENTS
• What is Data Mining?
• What is Statistics?
• Data Mining vs Statistics: Key Differences
  o Data Mining vs Statistics: Deriving Insights and Interpreting Data
  o Data Mining vs Statistics: Quantitative and Generic Input
  o Data Mining vs Statistics: Exploring Data and Formalizing thoughts
  o Data Mining vs Statistics: Importance of Domain Knowledge
  o Data Mining vs Statistics: Focus on Data Collection
  o Data Mining vs Statistics: Tools and Techniques
• Conclusion
INTRODUCTION
Data Mining and Statistics are two widely used terms in the field of data analysis. Data Mining is
about looking deep into data to derive hidden patterns. It involves a variety of techniques,
including domain understanding and mathematical rules. It is usually performed by a data scientist,
business intelligence developer, or business analyst with data exposure. Various tools, such as
statistical and visualization frameworks, are used to mine data.

At a high level, Data Mining can be divided into four concepts: grouping data according to
patterns, finding anomalies, determining relationships, and predictive modeling. Statistics is the
science of analyzing and interpreting numeric data. It involves drawing conclusions from a small
sample of data and then extending them to the whole population. Hypothesis testing helps establish
whether results found on a small sample remain valid for the larger population.
SUMMARY
WHAT IS DATA MINING?
Data mining is about looking deep into data to derive hidden patterns. Data in this context can be
anything: natural language sentences, images, or numeric data. Data Mining involves using a
variety of techniques, including domain understanding and mathematical rules.
In earlier days, Data Mining was a manual process, but with the advent of cheap processing power,
it has become a semi-automatic one. It is usually performed by a data scientist, business
intelligence developer, or business analyst with data exposure.
Numerous tools are available to mine data, including statistical and visualization frameworks. A
Data Mining professional usually has exposure to tools related to storage, exploration,
visualization, and statistics. Even a database with good querying ability is a productive tool for
an expert data miner.
At a high level, Data Mining can be divided into the following concepts.
Grouping Data According to Patterns: This involves techniques like clustering and
classification. Clustering groups data without prior knowledge of the number of output groups.
Classification assigns each data point to one of a set of predefined labels.
Finding Anomalies: Extracting data points that are significantly different from the rest of the
set is needed to establish patterns. Concepts like the Normal distribution and statistical rules
are employed to extract these anomalies.
Deriving Relationships: Relationships between variables can be extracted statistically.
Association rule learning, which finds items that frequently occur together, is commonly used to
accomplish this, although such rules show co-occurrence rather than proving cause and effect.
Predictive Modeling: While it may seem like a concept entirely separate from Data Mining,
predictive modeling is often used to uncover insights, such as the reasons for specific customer
behavior, and to estimate other unknown outcomes.
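Two of the concepts above, finding anomalies and deriving relationships, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the threshold, and the sample data are hypothetical. Anomalies are flagged with a z-score rule (distance from the mean in standard deviations, which assumes roughly Normally distributed data), and an association rule is measured by its support and confidence, the two basic metrics used in association rule learning.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the set
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def rule_support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the association rule `antecedent -> consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    lhs = sum(1 for t in transactions if antecedent <= set(t))
    support = both / len(transactions)         # how often the items appear together
    confidence = both / lhs if lhs else 0.0    # how often the rule holds when it applies
    return support, confidence


# Hypothetical sensor readings. With only eight points a 3-sigma cut-off can
# never trigger, so a lower threshold is used for this tiny example.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_outliers(readings, threshold=2.0))  # [50]

# Hypothetical shopping baskets
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(rule_support_confidence(baskets, {"bread"}, {"milk"}))  # support 0.5, confidence about 0.67
```

A high-confidence rule such as "bread -> milk" only says the items tend to appear together; it does not by itself show that one causes the other.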
Results obtained through Data Mining are usually verified with a statistical technique called
hypothesis testing. Hypothesis testing helps establish whether results found on a smaller sample
remain valid for the larger population.
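As a minimal sketch of that verification step, assume the mined pattern is a difference in rates between two groups, for example a hypothetical finding that one customer segment responds to an offer more often than another. A two-proportion z-test (normal approximation, reasonable for large samples) checks whether the gap is larger than sampling noise would explain. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: groups A and B share the same underlying rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # common rate assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all outcomes identical; the test is undefined")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # normal approximation
    return z, p_value


# Hypothetical mined pattern: segment A responded 120/1000 times, segment B 90/1000.
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.19, p = 0.029
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be a fluke of this particular sample.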
Because Data Mining often deals with personal information, the patterns it derives usually raise
questions regarding legality and ethics.
WHAT IS STATISTICS?
Statistics is the science of analysis and interpretation of numeric data. It is considered a part of
applied mathematics. Using Statistics generally involves drawing conclusions from a small amount
of data and then extending them to the whole population. In a statistical sense, the population is
the complete set of data to which a conclusion is meant to apply. A sample is a subset of the
population on which an experiment or observation is conducted. At a high level, Statistics can be
divided into two branches: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics focuses on summarizing the data in terms of different metrics. These may be
measures of central tendency, such as the mean, median, or mode, or measures of variation in the
data, such as the standard deviation and range. Distribution is another term generally used with
Descriptive Statistics. It describes the shape of the data and forms the basis for defining
properties like probability distribution functions.
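The summary metrics named above map directly onto Python's standard `statistics` module. The sketch below computes them for a small, hypothetical list of numbers; the `describe` helper is illustrative, not a standard function.

```python
from statistics import mean, median, mode, stdev


def describe(data):
    """Summarize a list of numbers with common descriptive statistics."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "stdev": stdev(data),            # sample standard deviation (n - 1 denominator)
        "range": max(data) - min(data),  # spread between the extremes
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical measurements
print(summary)  # mean 5, median 4.5, mode 4, stdev about 2.14, range 7
```

`stdev` treats the data as a sample; if the list is the whole population, `statistics.pstdev` is the matching function.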
Inferential Statistics uses descriptive statistics computed on a sample to draw conclusions about
the whole population. It relies on probability distributions to make these inferences. Hypothesis
testing is a critical part of Inferential Statistics: it assesses how well the sample represents the
population and how confidently sample results can be extended to the population.
An example is surveying a small percentage of your customers about a product feature and
generalizing the results to everyone who uses the product.
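The survey example can be made concrete with a confidence interval for a proportion. This is a minimal sketch assuming a simple random sample that is large enough for the normal (Wald) approximation, which becomes unreliable for very small samples or rates near 0 or 1. The function name and the 130-out-of-200 figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n                            # proportion observed in the sample
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 130 of 200 sampled customers like the feature.
low, high = proportion_confidence_interval(130, 200)
print(f"{low:.3f} to {high:.3f}")  # roughly 0.584 to 0.716
```

The interval expresses how far the sample result of 65% might plausibly be from the true share among all users.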
Data Mining vs Statistics: Key Differences
Now that we understand the basics of Data Mining and Statistics, let us explore how they differ
from each other.
o Data Mining vs Statistics: Deriving Insights and Interpreting Data
o Data Mining vs Statistics: Quantitative and Generic Input
o Data Mining vs Statistics: Exploring Data and Formalizing thoughts
o Data Mining vs Statistics: Importance of Domain Knowledge
o Data Mining vs Statistics: Focus on Data Collection
o Data Mining vs Statistics: Tools and Techniques
Data Mining vs Statistics: Deriving Insights and Interpreting Data

As the sections above show, Data Mining and Statistics are distinct concepts. Data Mining is the
process of deriving useful insights from data. Statistics is the science of collecting, analyzing,
and interpreting data. Statistics can be one of the methods used in Data Mining.
Data Mining vs Statistics: Quantitative and Generic Input
Statistics works primarily with quantitative data, while Data Mining deals with any kind of data,
including text and images. Deriving numeric metrics from data is often the first step in applying
statistics to it.
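As a small illustration of that first step, the sketch below turns free-text product reviews into two numbers that statistical methods can work with: the average review length and the share of reviews mentioning a keyword. The reviews, the keyword, and the helper name are hypothetical, and the plain substring match is a deliberate simplification (it would also match words like "batteryless").

```python
from statistics import mean


def text_to_metrics(reviews, keyword):
    """Turn free-text reviews into numeric metrics that statistics can work with."""
    word_counts = [len(review.split()) for review in reviews]           # words per review
    mentions = [keyword.lower() in review.lower() for review in reviews]  # True/False per review
    return {
        "mean_words": mean(word_counts),
        "share_mentioning_keyword": sum(mentions) / len(reviews),
    }


reviews = [
    "Battery life is great",
    "Screen cracked after a week",
    "Great battery, slow charger",
]
print(text_to_metrics(reviews, "battery"))  # mean_words about 4.33, share about 0.67
```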
Data Mining vs Statistics: Exploring Data and Formalizing thoughts
The final result of Data Mining is often a predictive model, while the result of statistical
analysis is usually a conclusion deduced from probability distributions. Data Mining is often
exploratory in nature, whereas Statistics is about confirming hypotheses.
Data Mining vs Statistics: Importance of Domain Knowledge
Heuristics are rules of thumb formed from knowledge of a domain. Heuristics are very important in
Data Mining and often form the basis of exploration. Statistics, by contrast, sets heuristics aside
and interprets data only on the basis of mathematical evidence and probability.
Data Mining vs Statistics: Focus on Data Collection
Collecting and cleaning data are important parts of Statistics. Data Mining is meant to work with
virtually any kind of data and does not put much emphasis on data collection. It is more about
working with the available data than defining strategies for collecting data.
Data Mining vs Statistics: Tools and Techniques
A Data Mining expert must be familiar with the tools and techniques used in data storage,
exploration, and visualization, which means working with a wide range of tools. For storage, this
could be anything from a simple relational database to fully managed object storage like Amazon S3.
NoSQL databases are also important for a Data Mining professional. Data exploration tools like SQL
and processing frameworks like Spark are also important for Data Mining. Visualization tools like
Tableau and Power BI help present the results. Last but not least, a Data Miner must also have some
background in statistics.
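To illustrate how a database with good querying ability doubles as an exploration tool, the sketch below loads a few rows into an in-memory SQLite database (via Python's built-in `sqlite3` module, standing in for a production database) and summarizes them with a single SQL `GROUP BY` query. The `sales` table, its columns, and the data are hypothetical.

```python
import sqlite3


def revenue_by_region(rows):
    """Load (region, amount) rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        query = """
            SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
            FROM sales
            GROUP BY region
            ORDER BY revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()  # release the in-memory database


sales = [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 200.0)]
print(revenue_by_region(sales))
# [('East', 1, 200.0), ('North', 2, 150.0), ('South', 1, 80.0)]
```

The same query would run largely unchanged on most relational databases, which is part of why SQL remains a core exploration skill.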
A statistician works with open-source or proprietary tools that help compute descriptive
statistics and derive inferences. These include open-source tools like R or scikit-learn and
proprietary tools like SAS, SPSS, and Minitab. Even a spreadsheet tool like Microsoft Excel or
OpenOffice is a potent tool for statisticians.
CRITIQUE
The article served its purpose of explaining and differentiating Data Mining and Statistics. Its
structure and flow were clear and factually accurate, and the comparison of the two was simple yet
covered complex ideas. The article defines Data Mining as "looking deep into data to derive hidden
patterns," where "data in this context can be anything: natural language sentences, images, or
numeric data," and Statistics as "the science of analysis and interpretation of numeric data,"
considered "a part of applied mathematics." These two definitions are self-explanatory: they show
that Data Mining is a generalized way of working with any kind of data, while Statistics works with
specific numeric data gathered to draw general conclusions.
CONCLUSION
We have now learned the basics of Data Mining and Statistics. As discussed, Data Mining and
Statistics are distinct concepts. While Data Mining is the exploration of data to derive insights,
Statistics is the science of interpreting data. Statistics is a core part of Data Mining, but the
two are not the same. Data Mining employs statistical techniques to derive prediction models or
confirm results, but it is much more than statistics and also includes storage, exploration, and
visualization.
