Eda Critique Paper
SUBMITTED BY :
BSCE-2B
INTRODUCTION
Data Mining and Statistics are two universal terms in this domain. Data Mining is about looking
deep into data to derive hidden patterns. It involves using a variety of techniques, including
business intelligence developer, or business analyst with data exposure. Tools and techniques are
modeling. Statistics is the science of analysis and interpretation of numeric data. It involves
drawing conclusions based on a small amount of data and then extending it to the whole
population. Hypothesis testing helps establish the validity of results found on smaller data to the
SUMMARY
Data mining is about looking deep into data to derive hidden patterns. Data in this context can be
anything: natural language sentences, images, or numeric data. Data Mining involves using a
In the earlier days, Data Mining used to be a manual process, but with the advent of cheap
Numerous tools are available to mine data, including statistical and visualization frameworks. A
Data mining professional usually has exposure to tools related to storage, exploration,
visualization, and statistics. Even a database with good querying ability is a productive tool for
Data Mining can be divided into the below concepts on a high level.
Grouping Data According to Patterns: This involves techniques like clustering and
classification. Clustering groups data without prior knowledge of the number of output groups.
Finding Anomalies: Extracting data that is significantly different from other data points in the
set is required to establish patterns. Concepts like the normal distribution and statistical rules are
Deriving Relationships: Extracting cause and effect relationships can be done statistically.
Predictive Modeling: While it may seem like an entirely different concept compared to Data
Mining, predictive modeling is often used to uncover insights like reasons for specific customer
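As a minimal sketch of the anomaly-finding concept listed above, the Python example below flags values that lie far from the mean using a z-score rule. The function name find_anomalies, the threshold of 2 standard deviations, and the order counts are illustrative assumptions and do not come from the reviewed article.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold.

    The z-score measures how many sample standard deviations a value
    lies from the sample mean. The default threshold is an assumption.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts with one unusual day.
daily_orders = [52, 48, 50, 47, 53, 49, 51, 50, 120, 48]
print(find_anomalies(daily_orders))
```

With very small samples a single extreme value also inflates the standard deviation, so a lower threshold or a median-based rule may suit such data better.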
Verifying results obtained through data mining is usually done using a statistical technique called
hypothesis testing. Hypothesis testing helps one establish the validity of results found on smaller
Since Data mining often involves dealing with personal information and deriving patterns, it
WHAT IS STATISTICS?
Statistics is the science of analysis and interpretation of numeric data. It is considered a part of
applied mathematics. Using Statistics generally involves drawing conclusions based on a small
amount of data and then extending it to the whole population. Population in a statistical sense is
the total data where something is applicable. A sample is a subset of the population where an
data in terms of different metrics. These metrics could be aggregation metrics like mean, median,
or mode. Or it could be metrics related to variation in data like standard deviation, range, etc.
Distribution is another term that is generally used with descriptive statistics. It denotes the shape
of the data and forms the basis of defining properties like probability distribution functions.
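The descriptive metrics named above can be computed with Python's standard statistics module. This is only a sketch: the describe helper and the list of scores are hypothetical and serve to show the metrics side by side.

```python
import statistics


def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    return {
        # Aggregation metrics (measures of central tendency)
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Metrics related to variation in the data
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }


# Hypothetical exam scores.
scores = [70, 85, 85, 90, 95]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```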
Inferential Statistics is the method of using descriptive statistics to form deductions about the
sample and then extending them to the whole population. It relies on probability distributions and
makes deductions based on them. Hypothesis testing is a critical part of inferential statistics.
Hypothesis testing establishes how well the sample represents the population and the degree of
An example of this could be using a simple survey among a small percentage of your customers
about a product feature and generalizing the results to the whole set of people who use the
product.
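The survey example can be sketched in Python as follows. The code assumes a simple random sample and uses the normal approximation for a proportion. The function names, the survey counts (130 of 200 customers in favor), and the 50% baseline are illustrative assumptions, not figures from the article.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, sample_size, confidence=0.95):
    """Approximate confidence interval for a population proportion."""
    p_hat = successes / sample_size
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / sample_size)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def proportion_z_test(successes, sample_size, p0=0.5):
    """Two-sided z-test of the null hypothesis that the proportion is p0."""
    p_hat = successes / sample_size
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / sample_size)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical survey: 130 of 200 sampled customers like the feature.
low, high = proportion_confidence_interval(successes=130, sample_size=200)
z_score, p_value = proportion_z_test(successes=130, sample_size=200)
print(f"95% interval for all customers: {low:.3f} to {high:.3f}")
print(f"z = {z_score:.2f}, p-value = {p_value:.5f}")
```

A small p-value here would suggest that the share of customers who like the feature differs from the assumed 50% baseline, provided the sample is representative of the whole customer base.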
Now that we understand the basics of what Data Mining and Statistics are, let us explore how
Data Mining is the process of deriving useful insights from data. Statistics is the science of
collecting, analyzing, and interpreting data. Statistics can be one of the methods that are used in
data mining.
Statistics is concerned with quantitative data only while Data Mining deals with any kind of data.
Deriving numeric metrics out of data is often the first step of using statistics on it.
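To illustrate deriving numeric metrics from non-numeric data, the sketch below turns a few text reviews into word counts and then summarizes them. The reviews and the word_count helper are hypothetical, and word count is only one of many possible metrics.

```python
from statistics import mean


def word_count(text):
    """Derive a simple numeric metric from text: the number of words."""
    return len(text.split())


# Hypothetical customer reviews (non-numeric data).
reviews = [
    "Great product",
    "Battery life is too short",
    "Works as expected",
]

# Step 1: convert each review into a number.
counts = [word_count(review) for review in reviews]

# Step 2: apply ordinary statistics to the derived numbers.
print(counts)
print(mean(counts))
```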
The final result of data mining is often a prediction method, while for statistics, this is more
about deducing something based on probability distributions. Data Mining is often exploratory in
Heuristics are rules of thumb formed from knowledge of a domain. Heuristics are
very important in data mining and often form the base of exploration. Statistics is about setting aside
all heuristics and interpreting data only on the basis of mathematical evidence and probability.
Collecting and cleaning data is an important part of statistics. Data Mining is supposed to work
with virtually any kind of data and does not put much emphasis on the collection of data. It is
more about working with available data than defining strategies for collecting data.
A Data Mining expert must be aware of tools and techniques used in data storage, exploration,
and visualization. This means he must be an expert in a wide range of tools. For storage, it could
be anything from a simple relational database to a completely managed flat-file storage like S3.
Even NoSQL databases are important for a data mining professional. Data exploration tools like
SQL and processing frameworks like Spark are also important for Data Mining. Visualization
tools like Tableau, PowerBI, etc., help him present the results. And last but not least, a Data Miner
A Statistician works with open source or proprietary tools that help him compute descriptive
statistics and derive inferences. This includes open-source tools like R or scikit-learn and
proprietary tools like SAS, SPSS, Minitab, etc. Even a spreadsheet tool like Microsoft Excel or
CRITIQUE
The whole article served its purpose of explaining and differentiating Data Mining and
Statistics. The structure and flow were clear and grounded in facts, and the comparison of the two
was simple to follow despite the complexity of the subject. Data Mining is about looking deep into data to derive hidden
patterns. Data in this context can be anything: natural language sentences, images, or numeric
data. Statistics is the science of analysis and interpretation of numeric data. It is considered a part
of applied mathematics. These statements were self-explanatory, as the
article describes Data Mining as a generalized form of data gathering, while Statistics is specified
CONCLUSION
We have now learned about the basics of Data Mining and Statistics. As discussed, Data Mining
and Statistics are different concepts on their own. While Data Mining is the exploration of data
to derive insights, Statistics is the science of interpreting data. Statistics is a core part of Data
Mining, but they are not the same. Data Mining employs statistical techniques to derive
prediction models or confirm results, but it is much more than statistics and includes storage,