Batch 2 - PE31 - Data Preprocessing and Visualisation
Problem Statement: To revise the prerequisites of Deep Learning (DL) and explore the Wine Quality dataset.
Objectives:
1. To understand the basics and prerequisites of Deep Learning (DL).
2. To analyze the functions for exploring the Wine Quality Dataset.
Theory:
● Data Collection
Data collection refers to the process of gathering information from various sources. There are
several techniques associated with data collection, including:
o Experiments: This involves manipulating one or more variables to observe the effect on a
particular outcome.
o Case studies: This involves gathering detailed information about a specific individual,
group, or organization to understand a particular phenomenon.
o Focus groups: This involves gathering a small group of people together to discuss a
specific topic or product.
o Self-report measures: This involves asking individuals to report on their own behavior,
attitudes, or experiences.
o Transactional data: This involves collecting data from financial transactions, such as
purchase history.
o Web scraping: This involves extracting large amounts of data from websites quickly and
automatically.
o Social media scraping: This is similar to web scraping but targets social media
platforms such as Twitter, Facebook, and Instagram.
● Pre-processing of Data
Data pre-processing refers to the techniques used to prepare raw data for analysis. It is an
important step in the data science process as it helps ensure that the data is clean, consistent,
and ready for analysis. Some common techniques associated with data pre-processing include:
o Data cleaning: This involves identifying and correcting errors, inconsistencies, and
missing values in the data.
o Data integration: This involves combining data from multiple sources to create a unified
dataset.
o Data transformation: This involves converting data into a format that can be easily
analysed, such as normalizing or scaling the data.
o Data reduction: This involves reducing the complexity of the data by selecting a subset
of the most relevant features for analysis.
o Data normalization: This involves rescaling the values of numeric variables to a fixed
range, typically 0 to 1 (min-max scaling).
o Data standardization: This involves transforming the values of numeric variables so that
they have a mean of 0 and a standard deviation of 1.
o Data augmentation: This refers to the technique of creating new data samples by
applying random transformations to existing samples.
o Data encoding: This refers to the process of converting categorical variables into
numerical values.
o Data hashing: This refers to the process of converting data into a fixed-length numerical
representation, so that it can be used as an input to machine learning algorithms.
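A few of the techniques above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes "normalization" means min-max rescaling to the range 0 to 1 and "standardization" means z-scoring with the population standard deviation. The function names and the sample values are hypothetical and are not taken from the Wine Quality dataset.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot_encode(labels):
    """Map each categorical label to a 0/1 vector with one position per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical alcohol percentages and wine types, made up for illustration.
alcohol = [9.0, 10.0, 11.0, 12.0, 13.0]
wine_type = ["red", "white", "red"]

print(min_max_normalize(alcohol))
print(standardize(alcohol))
print(one_hot_encode(wine_type))
```

Normalization keeps the values inside a bounded range, while standardization centres them on 0 without bounding them. Which one suits a given model is a design choice that depends on the data and the algorithm.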
● Statistical Analysis
Statistical analysis is the use of statistical methods to collect, organize, analyze, interpret and
present data. It is used to make inferences about a population based on a sample, and to test
hypotheses about relationships between variables. Some common techniques associated with
statistical analysis include:
o Descriptive statistics: This involves summarizing and describing the data using measures
such as mean, median, mode, standard deviation, and frequency distributions.
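The descriptive measures listed above can be computed with Python's standard library. The sketch below is only an illustration: the quality scores are made-up values, not rows from the Wine Quality dataset, and the variable names are hypothetical.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical quality scores, made up for illustration (not real dataset rows).
quality = [5, 5, 6, 7, 5, 6, 6, 5, 4, 7]

summary = {
    "mean": mean(quality),
    "median": median(quality),
    "mode": mode(quality),
    "stdev": stdev(quality),  # sample standard deviation (n - 1 denominator)
}
frequency = Counter(quality)  # frequency distribution: score -> count

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
for score in sorted(frequency):
    print(f"quality {score}: {frequency[score]} samples")
```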
Program code:
Dataset used:
https://archive.ics.uci.edu/ml/datasets/wine+quality
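As a starting point for exploring the dataset, the sketch below reads semicolon-delimited rows with the standard csv module. It assumes the downloaded CSV uses ";" as the delimiter with a quoted header row. The inline sample, its reduced set of columns, and the helper name load_rows are hypothetical and made up for illustration; a real file would be passed in via open(path, newline="").

```python
import csv
import io

# Made-up sample in a semicolon-delimited layout (not real dataset rows).
SAMPLE = '''"fixed acidity";"alcohol";"quality"
7.0;9.5;5
8.1;10.2;6
6.4;11.0;6
'''


def load_rows(handle):
    """Read semicolon-delimited rows and convert every field to float."""
    reader = csv.DictReader(handle, delimiter=";")
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), "rows; columns:", list(rows[0]))
print("mean quality:", sum(r["quality"] for r in rows) / len(rows))
```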
Output:
FAQs:
Conclusion:
The prerequisites of DL were studied and the implementation was performed for analysing the Wine Quality
dataset.