Data Science

Unit - 1
Definition
• Data science is the area of study that involves extracting insights from vast amounts of data using various scientific methods, algorithms, and processes.
• It helps discover hidden patterns in raw data.
• Data science emerged from the evolution of mathematical statistics, data analysis, and big data.
• Data science is an interdisciplinary field that extracts knowledge from structured or unstructured data.
• It translates a business problem into a research project and then translates the results back into a practical solution.
Why Data Science?
• Data is the oil of today's world.
• With tools, technologies, and algorithms, we can convert data into a distinctive business advantage.
• It helps detect fraud using advanced machine learning algorithms and thereby prevents significant monetary losses.
• It allows us to build intelligence into machines.
• It enables you to make better and faster decisions.
• It helps you recommend the right product to the right customer to enhance your business.
Evolution of Data Science
What is a Data Scientist?
• A data scientist is a professional responsible for collecting,
analyzing and interpreting extremely large amounts of data.
• The data scientist role is an offshoot of several traditional
technical roles.

• A data scientist can:
– understand the background domain
– design solutions that produce added value to the organization
– implement the solutions efficiently
– communicate the findings clearly (important!)

• A data scientist is a practitioner with sufficient expertise in software engineering, statistics/machine learning, and the application domain.
Data Science Process
Data Preparation
• Data preparation is also referred to as data wrangling, data munging, or data cleaning.
• The time needed to prepare data for a particular analysis problem depends directly on the health of the data, i.e., how complete it is, how many values are missing, how clean it is, and what inconsistencies it contains.
• Data preparation is a vital step in the data science process because valuable insights depend on it, which is why data scientist jobs command high pay packages in the industry.
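The "health of the data" described above can be measured before any cleaning starts. The following is a minimal sketch using only the Python standard library; the CSV contents, the column names and the `data_health` function are hypothetical, and the two inconsistency checks (non-numeric ages, city names differing only in letter case) are just examples of what such a report might look for.

```python
import csv
import io

# Hypothetical raw data: a few customer records with gaps and inconsistencies.
RAW = """id,age,city
1,34,London
2,,london
3,29,
4,41,Paris
5,abc,Paris
"""

def data_health(text):
    """Report completeness, missing values and simple inconsistencies."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    report = {"rows": len(rows), "missing": {}, "non_numeric_age": 0}
    # Missing values: empty cells per column.
    for col in columns:
        report["missing"][col] = sum(1 for r in rows if r[col].strip() == "")
    # Inconsistency 1: values that should be numeric but are not.
    for r in rows:
        age = r["age"].strip()
        if age and not age.isdigit():
            report["non_numeric_age"] += 1
    # Inconsistency 2: the same category spelled with different letter case.
    spellings = {}
    for r in rows:
        city = r["city"].strip()
        if city:
            spellings.setdefault(city.lower(), set()).add(city)
    report["case_variants"] = {k: sorted(v) for k, v in spellings.items() if len(v) > 1}
    # Completeness: share of non-empty cells in the whole table.
    total_cells = len(rows) * len(columns)
    report["completeness"] = 1 - sum(report["missing"].values()) / total_cells
    return report

if __name__ == "__main__":
    print(data_health(RAW))
```

The more gaps and inconsistencies such a report shows, the more preparation time the dataset is likely to need.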
The key steps in data preparation
• Accessing Data
• Cleaning Data and Improving Data Quality
• Blending and Reconciling Data
• Transforming and Reformatting Data
• Exporting and Using Your Data
• Expanding Your Connectivity
• Repeating Tasks with Automation
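Several of these steps can be illustrated as small functions chained into one repeatable pipeline. This is a minimal sketch with the Python standard library, assuming two hypothetical CSV sources (orders and a customer lookup); all function and column names are illustrative only, and connecting to further sources ("Expanding Your Connectivity") is not shown.

```python
import csv
import io

# Hypothetical sources: an orders extract and a customer lookup table.
ORDERS_CSV = """order_id,customer_id,amount
100, C1 ,19.99
101,C2,5.00
102,C1,
103,C3,42.50
"""
CUSTOMERS_CSV = """customer_id,region
C1,North
C2,South
C3,North
"""

def access(text):
    """Access: load rows from a CSV source (a string here; a file or database in practice)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: trim stray whitespace and drop rows with a missing amount."""
    cleaned = []
    for r in rows:
        r = {k: v.strip() for k, v in r.items()}
        if r["amount"]:
            cleaned.append(r)
    return cleaned

def blend(orders, customers):
    """Blend: reconcile the two sources by joining on customer_id."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    return [{**o, "region": region_of.get(o["customer_id"], "UNKNOWN")} for o in orders]

def transform(rows):
    """Transform: convert the amount column from text to a number."""
    return [{**r, "amount": float(r["amount"])} for r in rows]

def export(rows):
    """Export: write the prepared rows back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["order_id", "customer_id", "region", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

def prepare(orders_text, customers_text):
    """Automation: chaining the steps lets the whole preparation be re-run on new data."""
    orders = clean(access(orders_text))
    rows = transform(blend(orders, access(customers_text)))
    return rows, export(rows)

if __name__ == "__main__":
    _, csv_text = prepare(ORDERS_CSV, CUSTOMERS_CSV)
    print(csv_text)
```

Keeping each step as a separate function makes it easy to rerun the same preparation whenever a new extract arrives.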
What is Data Exploration?
• Data exploration refers to the initial step in data analysis, in which data analysts use data visualization and statistical techniques.
• It describes dataset characteristics, such as size, quantity, and accuracy, in order to better understand the nature of the data.
• Data exploration techniques include both manual analysis and automated data exploration software solutions.
• These techniques are used to visually explore and identify relationships between different data variables, the structure of the dataset, the presence of outliers, and the distribution of data, revealing patterns and points of interest and enabling data analysts to gain greater insight into the raw data.
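A first statistical pass over a dataset might look like the sketch below, which describes one variable's size, centre and spread, flags outliers, and measures the relationship between two variables. It uses only the Python standard library; the data values are invented for illustration, and the outlier rule (Tukey's fences with k = 1.5) is one common convention, not the only one.

```python
import math
import statistics

# Hypothetical dataset: weekly advertising spend vs. sales.
spend = [10, 12, 14, 15, 16, 18, 20, 95]
sales = [25, 29, 33, 34, 37, 40, 45, 60]

def describe(values):
    """Basic characteristics of one variable: size, centre and spread."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long variables."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

if __name__ == "__main__":
    print(describe(spend))
    print("outliers in spend:", iqr_outliers(spend))
    print("correlation(spend, sales):", round(pearson(spend, sales), 3))
```

Here the single large spend value pulls the mean (25) well above the median (15.5) and is flagged as an outlier, which is exactly the kind of point of interest exploration is meant to surface before modelling.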
Data Exploration Tools

• Manual data exploration methods: writing scripts to analyze raw data or manually filtering data in spreadsheets.
• Automated data exploration tools, such as data visualization software, help data scientists easily monitor data sources and explore otherwise overwhelmingly large datasets.
• Graphical displays of data, such as bar charts and scatter plots,
are valuable tools in visual data exploration.
• A popular tool for manual data exploration is Microsoft Excel.
• Excel spreadsheets can be used to create basic charts for data exploration, view raw data, and identify correlations between variables.
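The two manual techniques mentioned here, filtering rows and drawing a basic bar chart, can also be done with a short script instead of a spreadsheet. The sketch below is a plain-text stand-in for a charting tool; the category counts, the rows and both function names are hypothetical.

```python
# Hypothetical category counts, e.g. support tickets per product line.
counts = {"Laptops": 12, "Phones": 7, "Tablets": 3}

def text_bar_chart(data, width=20, symbol="#"):
    """Render a horizontal bar chart as text, scaling the largest value to `width` symbols."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)

def filter_rows(rows, column, predicate):
    """Script equivalent of a spreadsheet filter: keep rows whose `column` passes `predicate`."""
    return [r for r in rows if predicate(r[column])]

if __name__ == "__main__":
    print(text_bar_chart(counts))
    products = [{"product": "Phones", "price": 300}, {"product": "Laptops", "price": 900}]
    print(filter_rows(products, "price", lambda p: p > 500))
```

For large datasets a real visualization tool is the better choice; a script like this is mainly useful for a quick look at raw data.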
Data Science Modelling
• A data model determines how data is exposed to the end user. The desired role of data modelling is to create and structure database tables optimally so that they answer business questions, setting the stage for the best possible data analysis by exposing end users to the most relevant data they require.
• Data Modelling Makes Analysis Easier
• The fundamental objective of data modelling is to expose only data that holds value for the end user.
• Clearly delineating which questions a table should answer is essential, and deciding how different types of data will be modelled creates optimal conditions for data analysis.
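One way to "only expose data that holds value" is to keep a detailed base table and give analysts a view shaped around a single business question. This is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `sales` table, its columns and the `revenue_by_region` view are hypothetical examples, not a prescribed schema.

```python
import sqlite3

def build_model(conn):
    """Create a sales table and a view that exposes only what the business question needs."""
    conn.executescript("""
        CREATE TABLE sales (
            sale_id     INTEGER PRIMARY KEY,
            region      TEXT NOT NULL,
            amount      REAL NOT NULL,
            card_number TEXT          -- sensitive detail the analyst does not need
        );
        -- The view answers one question: total revenue per region.
        CREATE VIEW revenue_by_region AS
            SELECT region, SUM(amount) AS revenue, COUNT(*) AS n_sales
            FROM sales
            GROUP BY region;
    """)
    conn.executemany(
        "INSERT INTO sales (region, amount, card_number) VALUES (?, ?, ?)",
        [("North", 120.0, "xxxx-1111"), ("South", 80.0, "xxxx-2222"), ("North", 30.0, "xxxx-3333")],
    )

def revenue_by_region(conn):
    """What the end user queries: the view, never the raw table."""
    return conn.execute(
        "SELECT region, revenue, n_sales FROM revenue_by_region ORDER BY region"
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_model(conn)
    print(revenue_by_region(conn))
```

The end user sees three relevant columns instead of the whole table, and the sensitive column never reaches the analysis layer.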
The key phases in building a data science model
• Set the objectives
• Communicate with key stakeholders
• Collect the necessary data for exploratory data analysis (EDA)
• Determine the functional form of the model
• Split the data into training and validation sets
• Assess the model performance
• Deploy the model for real-time prediction
• Rebuild the model
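The technical phases in this list (collecting data, choosing a functional form, splitting, and assessing performance) can be sketched end to end in a few functions. This sketch assumes synthetic data generated from y = 3 + 2x plus small noise and picks a straight line fitted by ordinary least squares as the functional form; the data, the 80/20 split and all function names are illustrative choices, not a recommended setup.

```python
import random

def make_data(n=20, seed=0):
    """Synthetic data for illustration: y = 3 + 2x plus small uniform noise."""
    rng = random.Random(seed)
    return [(x, 3 + 2 * x + rng.uniform(-0.5, 0.5)) for x in range(n)]

def train_validation_split(data, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for validation."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * validation_fraction)
    return rows[:cut], rows[cut:]

def fit_line(train):
    """Functional form chosen here: y = b0 + b1*x, fitted by ordinary least squares."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def mean_squared_error(model, rows):
    """Assess performance: average squared error of the model's predictions."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in rows) / len(rows)

if __name__ == "__main__":
    train, valid = train_validation_split(make_data())
    model = fit_line(train)
    print("coefficients (b0, b1):", model)
    print("validation MSE:", mean_squared_error(model, valid))
```

Measuring error on the held-out validation rows, rather than on the training rows, gives a fairer estimate of how the model will behave once deployed; rebuilding the model amounts to rerunning these steps on fresh data.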
