

(AIT 120)


Dr. Varun Malik, Assistant Professor

Name: Parth Khatter
Roll No: 1955991626


UNIT 1: Introduction to Data and Data Science

1.1 Data Science

Data science is the field of study that combines domain expertise, programming skills, and
knowledge of mathematics and statistics to extract meaningful insights from data. Data
science practitioners apply machine learning algorithms to numbers, text, images, video,
audio, and more to produce artificial intelligence (AI) systems to perform tasks that ordinarily
require human intelligence. In turn, these systems generate insights which analysts and
business users can translate into tangible business value.

1.2 Types of variables

Data comes from many sources and in many forms, so it is important to understand its
characteristics. Variables describe those characteristics, and they can be categorized by the
kind of values they hold.

There are two main types of variables:

i) categorical and ii) numerical

i. Categorical: A categorical variable holds values that can be divided into groups. It is
also known as a qualitative variable.

Examples: Car Brand is a categorical variable that holds values such as Audi, Toyota, and
BMW. Answer is a categorical variable that holds the values yes and no.

ii. Numerical: A variable whose values are numbers is known as a numerical variable. It is
also known as a quantitative variable. It can be a) discrete or b) continuous.

a) Discrete: A variable that holds countable values is known as a discrete variable.

Example: Number of Children, SAT Score, and Population are all discrete variables.

b) Continuous: A variable that can take any value within a range is known as a continuous
variable. Its possible values cannot be counted, because between any two values there is
always another one.

Example: Height, Weight, and Temperature are continuous variables.
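
The distinction between categorical, discrete and continuous variables can be sketched in
Python. This is only a rough heuristic under stated assumptions: the function name
variable_type and the sample columns are hypothetical, whole numbers are assumed to be
discrete, and numbers with a fractional part are assumed to be continuous. Numeric codes that
are really labels (for example postal codes) would be misclassified.

```python
from numbers import Integral, Real

def variable_type(values):
    """Classify a column as 'categorical', 'discrete', or 'continuous'."""
    # bool is a subclass of int in Python, so yes/no flags are treated as categorical.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "categorical"
    # Only whole numbers: assume the values are counts.
    if all(isinstance(v, Integral) for v in values):
        return "discrete"
    # Real numbers that are not all whole: assume measurements.
    return "continuous"

# Hypothetical sample columns.
columns = {
    "car_brand": ["Audi", "Toyota", "BMW"],
    "answer": ["yes", "no", "yes"],
    "no_of_children": [0, 2, 1],
    "height_cm": [172.5, 160.2, 181.0],
}
for name, values in columns.items():
    print(name, "->", variable_type(values))
```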

1.3 Data Analytics

Data analytics is the process of analysing raw data to find trends and answer questions. This
definition captures the broad scope of the field, which includes many techniques with many
different goals.

The data analytics process has several components that can support a variety of initiatives.
By combining these components, a successful data analytics initiative provides a clear picture
of where you are, where you have been and where you should go.

1.3.1 Types of Data Analytics

Data analytics is a broad field. There are four primary types of data analytics: descriptive,
diagnostic, predictive and prescriptive analytics. Each type has a different goal and a different
place in the data analysis process. These are also the primary data analytics applications in

• Descriptive analytics helps answer questions about what happened. These techniques
summarize large datasets to describe outcomes to stakeholders. By developing key
performance indicators (KPIs), these strategies can help track successes or failures.
Metrics such as return on investment (ROI) are used in many industries, and specialized
metrics are developed to track performance in specific industries. This process
requires the collection of relevant data, processing of the data, data analysis and data
visualization, and it provides essential insight into past performance.
• Diagnostic analytics helps answer questions about why things happened. These
techniques supplement more basic descriptive analytics. They take the findings from
descriptive analytics and dig deeper to find the cause. The performance indicators are
further investigated to discover why they got better or worse. This generally occurs in
three steps:

    • Identify anomalies in the data. These may be unexpected changes in a metric or in
    a particular market.
    • Collect data that is related to these anomalies.
    • Use statistical techniques to find relationships and trends that explain these
    anomalies.
• Predictive analytics helps answer questions about what will happen in the future.
These techniques use historical data to identify trends and determine whether they are
likely to recur. Predictive analytical tools provide valuable insight into what may happen
in the future, using a variety of statistical and machine learning techniques such as
neural networks, decision trees, and regression.
• Prescriptive analytics helps answer questions about what should be done. By using
insights from predictive analytics, data-driven decisions can be made. This allows
businesses to make informed decisions in the face of uncertainty. Prescriptive
analytics techniques rely on machine learning strategies that can find patterns in large
datasets. By analysing past decisions and events, the likelihood of different outcomes
can be estimated.
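
The four types of analytics can be illustrated on one small data set. The sketch below is only
an illustration under stated assumptions: the sales figures, the 2-standard-deviation anomaly
rule, the straight-line forecast and the stock_on_hand value are all hypothetical choices, not
methods prescribed by these notes.

```python
from statistics import mean, stdev

# Hypothetical monthly sales (units sold); month 6 is deliberately unusual.
sales = [100, 104, 108, 112, 116, 60, 124, 128]

# Descriptive: summarize what happened with simple KPIs.
total, average = sum(sales), mean(sales)

# Diagnostic (first step): flag months more than 2 standard deviations from the mean.
spread = stdev(sales)
anomalies = [m for m, s in enumerate(sales, start=1) if abs(s - average) > 2 * spread]

def fit_line(points):
    """Least-squares straight line through (x, y) points; returns (slope, intercept)."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Predictive: fit a trend to the non-anomalous months and extrapolate to month 9.
normal_months = [(m, s) for m, s in enumerate(sales, start=1) if m not in anomalies]
slope, intercept = fit_line(normal_months)
forecast = slope * 9 + intercept

# Prescriptive: turn the forecast into a recommended action (hypothetical stock level).
stock_on_hand = 120
reorder = max(0, round(forecast) - stock_on_hand)

print("total:", total, "average:", average)
print("anomalous months:", anomalies)
print("forecast for month 9:", round(forecast))
print("units to reorder:", reorder)
```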

1.4 Data Analysis

Data analysis is defined as a process of cleaning, transforming, and modelling data to
discover useful information for business decision-making. The purpose of data analysis is to
extract useful information from data and to take decisions based on that information.

A simple example: whenever we take a decision in day-to-day life, we think about what
happened last time or what is likely to happen if we choose a particular option. This is
nothing but analysing our past or our expected future and deciding on that basis. For that, we
draw on memories of the past or expectations of the future. When an analyst does the same
thing for business purposes, it is called data analysis.

1.4.1 Types of Data Analysis: Techniques and Methods

Several types of data analysis techniques exist, depending on the business and the technology
involved. The major data analysis methods are:

• Text Analysis
• Statistical Analysis
• Diagnostic Analysis
• Predictive Analysis
• Prescriptive Analysis

1.4.2 Data Analysis Tools

1.4.3 Data Analysis Process

The data analysis process is nothing but gathering information by using a proper
application or tool that allows you to explore the data and find patterns in it. Based on that
information, you can make decisions or draw conclusions.

Data analysis consists of the following phases:

• Data Requirement Gathering
• Data Collection
• Data Cleaning
• Data Analysis
• Data Interpretation
• Data Visualization
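
The phases above can be sketched as a small pipeline. This is a minimal illustration, not a
standard implementation: the survey records and the function names clean, analyse, interpret
and visualize are hypothetical, and requirement gathering is assumed to have already decided
that respondent age is the quantity of interest.

```python
from statistics import mean

# Data collection: hypothetical raw survey records (age stored as text, some unusable).
raw = [{"age": "34"}, {"age": ""}, {"age": "41"}, {"age": "n/a"}, {"age": "29"}]

def clean(records):
    """Data cleaning: keep only records whose age parses as a whole number."""
    return [int(r["age"]) for r in records if r["age"].isdigit()]

def analyse(ages):
    """Data analysis: compute simple summary statistics."""
    return {"count": len(ages), "mean_age": mean(ages), "oldest": max(ages)}

def interpret(summary):
    """Data interpretation: turn the numbers into a plain-language finding."""
    return f"{summary['count']} valid responses, average age {summary['mean_age']:.1f}"

def visualize(ages):
    """Data visualization: a crude text bar chart, one '#' per 5 years of age."""
    return "\n".join(f"{a:>3} {'#' * (a // 5)}" for a in ages)

ages = clean(raw)
summary = analyse(ages)
print(interpret(summary))
print(visualize(ages))
```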

1.5 Data Mining

Data mining is a process used by companies to turn raw data into useful information. By
using software to look for patterns in large batches of data, businesses can learn more about
their customers in order to develop more effective marketing strategies, increase sales and
decrease costs. Data mining depends on effective data collection, warehousing, and computer
processing.
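
One simple way to look for patterns in a batch of data is to count which items are often
bought together. The sketch below assumes hypothetical shopping baskets and a hypothetical
function name frequent_pairs; real data mining tools use far more scalable algorithms, so this
only illustrates the idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer's purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

def frequent_pairs(baskets, min_count=2):
    """Return the item pairs that appear together in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one consistent order.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

print(frequent_pairs(baskets))
```

A pair that shows up often, such as bread and butter here, is the kind of pattern a business
could use when planning promotions.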

1.6 Big Data

Big Data is essentially a special application of data science, in which the data sets are
enormous and require overcoming logistical challenges to deal with them. The primary
concern is efficiently capturing, storing, extracting, processing, and analysing information
from these enormous data sets.
Processing and analysing these huge data sets with conventional tools is often not feasible
because of physical and/or computational constraints. Special techniques and tools (e.g.,
software, algorithms, parallel programming, etc.) are therefore required.
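
One such technique is to process the data in chunks and combine the partial results, so that
the whole data set never has to be in memory at once. The sketch below is an assumption-laden
illustration: the names chunks and chunked_mean, the chunk size and the generated readings are
hypothetical, and the chunks are processed one after another here, although in practice each
chunk could be handed to a separate worker.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items without loading everything."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def chunked_mean(values, size=1000):
    """Compute a mean from per-chunk partial sums and counts."""
    total, count = 0, 0
    for chunk in chunks(values, size):
        total += sum(chunk)   # partial sum for this chunk, merged into the running total
        count += len(chunk)   # partial count for this chunk
    return total / count

# A generator stands in for a data set that is too large to hold in memory at once.
readings = (i % 10 for i in range(100_000))
print(chunked_mean(readings))
```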

Big Data is the term that is used to encompass these large data sets, specialized techniques,
and customized tools. It is often applied to large data sets in order to perform general data
analysis and find trends, or to create predictive models.

1.7 Business Intelligence

Business intelligence (BI) combines business analytics, data mining, data visualization, data
tools and infrastructure, and best practices to help organizations make more data-driven
decisions. In practice, you know you’ve got modern business intelligence when you have a
comprehensive view of your organization’s data and use that data to drive change, eliminate
inefficiencies, and quickly adapt to market or supply changes.

It’s important to note that this is a very modern definition of BI, and BI has had a tangled
history as a buzzword. Traditional Business Intelligence, capital letters and all, originally
emerged in the 1960s as a system of sharing information across organizations. It further
developed in the 1980s alongside computer models for decision-making and turning data into
insights, before becoming a specific offering from BI teams with IT-reliant service solutions.
Modern BI solutions prioritize flexible self-service analysis, governed data on trusted
platforms, empowered business users, and speed to insight. This section is only an
introduction to BI and covers just the tip of the iceberg.

1.8 Artificial Intelligence

In the simplest terms, artificial intelligence (AI) refers to systems or machines that mimic
human intelligence to perform tasks and can iteratively improve themselves based on the
information they collect. AI manifests in a number of forms. A few examples are:

• Chatbots use AI to understand customer problems faster and provide more efficient
answers
• Intelligent assistants use AI to parse critical information from large free-text datasets to
improve scheduling
• Recommendation engines can provide automated recommendations for TV shows based
on users’ viewing habits

AI is much more about the process and the capability for superpowered thinking and data
analysis than it is about any particular format or function. Although AI brings up images of
high-functioning, human-like robots taking over the world, AI isn’t intended to replace
humans. It’s intended to significantly enhance human capabilities and contributions. That
makes it a very valuable business asset.

1.9 Machine Learning

Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict new
output values.

Recommendation engines are a common use case for machine learning. Other popular uses
include fraud detection, spam filtering, malware threat detection, business process
automation (BPA) and predictive maintenance.

1.9.1 Types of Machine Learning

• Supervised learning: In this type of machine learning, data scientists supply algorithms
with labelled training data and define the variables they want the algorithm to assess for
correlations. Both the input and the output of the algorithm are specified.
• Unsupervised learning: This type of machine learning involves algorithms that train on
unlabelled data. The algorithm scans through data sets looking for any meaningful
connection. No target output is specified in advance; the algorithm discovers groupings
or structure in the data on its own.
• Semi-supervised learning: This approach to machine learning involves a mix of the two
preceding types. Data scientists may feed an algorithm some labelled training data
together with unlabelled data, and the model is free to explore the data on its own and
develop its own understanding of the data set.
• Reinforcement learning: Data scientists typically use reinforcement learning to teach a
machine to complete a multi-step process for which there are clearly defined rules. Data
scientists program an algorithm to complete a task and give it positive or negative cues as
it works out how to do so. For the most part, the algorithm decides on its own what steps
to take along the way.
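
The difference between the first two types can be sketched with two tiny functions. This is an
illustration only: nearest-neighbour classification and 1-D k-means with two clusters are
chosen here as simple stand-ins for supervised and unsupervised learning, the transaction
amounts and labels are hypothetical, and two_means assumes the data contains at least two
distinct values.

```python
def nearest_neighbour(train, x):
    """Supervised: predict the label of the closest labelled training example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def two_means(values, rounds=10):
    """Unsupervised: split unlabelled numbers into two clusters (1-D k-means, k=2)."""
    lo, hi = min(values), max(values)              # initial cluster centres
    for _ in range(rounds):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)  # move each centre to its cluster mean
    return a, b

# Labelled data: hypothetical (transaction amount, label) pairs.
labelled = [(12, "normal"), (15, "normal"), (900, "fraud"), (950, "fraud")]
print(nearest_neighbour(labelled, 870))

# Unlabelled data: the algorithm is given amounts only, with no labels.
amounts = [12, 15, 14, 900, 950, 11, 920]
print(two_means(amounts))
```

In the supervised case the labels come from the training data, whereas in the unsupervised
case the two groups emerge from the amounts alone and still need a human to name them.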

1.10 Applications of Data Science

1. In Search Engines
One of the most useful applications of data science is in search engines. When we want to
search for something on the internet, we mostly use search engines such as Google and
Yahoo, and data science helps them return relevant results quickly.
For example, when we search for “Data Structure and algorithm courses”, the first link we
get may be for GeeksforGeeks courses. One reason is that the GeeksforGeeks website is
visited very often for information about data structure courses and computer-related
subjects. This kind of analysis is done using data science, and it brings the most visited
web links to the top.

2. In Transport
Data science has also entered the transport field, for example in driverless cars, which are
expected to help reduce the number of accidents.
For example, in driverless cars the training data is fed into the algorithm and, with the
help of data science techniques, analysed to learn things such as the speed limit on
highways, busy streets and narrow roads, and how to handle different situations while
driving.

3. In Finance
Data science plays a key role in the financial industry, which always faces the problems of
fraud and risk of losses. Financial institutions therefore need to automate risk-of-loss
analysis in order to make strategic decisions. They also use data science analytics tools to
make predictions, for example about customer lifetime value and stock market moves.
For example, data science is central to stock market analysis. It is used to examine past
behaviour from historical data with the goal of estimating future outcomes. Data is
analysed in a way that makes it possible to estimate future stock prices over a set time
frame.

4. In E-Commerce
E-commerce websites such as Amazon and Flipkart use data science to create a better user
experience with personalized recommendations.
For example, when we search for something on an e-commerce website, we get suggestions
similar to our past choices, and we also get recommendations based on the most bought,
most rated and most searched products. This is all done with the help of data science.

5. In Health Care
In the healthcare industry, data science acts as a boon. Data science is used for:
• Tumour detection
• Drug discovery
• Medical image analysis
• Virtual medical bots
• Genetics and genomics
• Predictive modelling for diagnosis, etc.

6. Image Recognition
Data science is currently also used in image recognition.

For example, when we upload a picture with a friend on Facebook, Facebook suggests who
to tag in the picture. This is done with the help of machine learning and data science.
When an image is recognized, the faces in it are compared with those of one’s Facebook
friends, and if a face in the picture matches someone’s profile, Facebook suggests
auto-tagging.

7. Targeting Recommendation
Targeted recommendation is one of the most important applications of data science.
Whatever the user searches for on the internet, he/she will then see related posts
everywhere. This can be explained with an example: suppose I want a mobile phone, so I
search for it on Google, and after that I change my mind and decide to buy offline. Data
science helps the companies that are paying for advertisements for their mobile phones. So
everywhere on the internet, on social media, on websites and in apps, I will see
recommendations for the mobile phone I searched for. This will push me to buy online.

8. Airline Route Planning
With the help of data science, the airline sector is also growing; for example, it becomes
easier to predict flight delays. It also helps to decide whether to fly directly to the
destination or to make a halt in between: a flight from Delhi to the U.S.A. can take a
direct route, or it can halt on the way before reaching the destination.

9. Data Science in Gaming
In most games where a user plays against a computer opponent, data science concepts are
used together with machine learning so that, with the help of past data, the computer
improves its performance. Many games, such as chess and EA Sports titles, use data
science concepts.

10. Medicine and Drug Development
The process of creating medicine is very difficult and time-consuming, and it has to be
done with full discipline because it is a matter of someone’s life. Without data science,
developing a new medicine or drug takes a lot of time, resources and money. With the help
of data science it becomes easier, because the likelihood of success can be estimated from
biological data or factors. Algorithms based on data science can forecast how a drug will
act in the human body before lab experiments are run.

11. In Delivery Logistics

Various logistics companies such as DHL and FedEx make use of data science. It helps
these companies find the best route for shipping their products, the best time for delivery,
the best mode of transport to reach the destination, etc.
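
Finding the best route can be sketched as a shortest-path problem. The notes do not say which
method logistics companies actually use; the example below uses Dijkstra's algorithm as one
standard choice, and the road network, travel times and the function name shortest_route are
hypothetical.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path) for the cheapest route."""
    queue = [(0, start, [start])]      # (cost so far, current stop, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue                   # a cheaper way to this stop was already expanded
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []            # the goal cannot be reached

# Hypothetical road network: travel time in hours between stops.
roads = {
    "Depot": {"A": 4, "B": 1},
    "B": {"A": 2, "Customer": 7},
    "A": {"Customer": 3},
}
print(shortest_route(roads, "Depot", "Customer"))
```

The same structure works for other notions of "best" if the weights are replaced by cost or
distance instead of travel time.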

12. Autocomplete

The autocomplete feature is an important application of data science: the user types just a
few letters or words and is offered a completion for the rest of the line. In Google Mail,
when we are writing a formal mail to someone, the autocomplete feature offers an efficient
way to complete the whole line. Autocomplete is also widely used in search engines, on
social media and in various apps.
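
A very simple form of autocomplete suggests previously typed phrases that start with the same
letters, most frequent first. The sketch below is only that simple form, under stated
assumptions: the search history and the names build_index and autocomplete are hypothetical,
and production systems rely on much richer language models.

```python
from collections import Counter

def build_index(history):
    """Count how often each phrase was typed before (case-insensitive)."""
    return Counter(phrase.lower() for phrase in history)

def autocomplete(index, prefix, limit=3):
    """Return up to `limit` past phrases starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(phrase, n) for phrase, n in index.items() if phrase.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))   # by count, then alphabetically
    return [phrase for phrase, _ in matches[:limit]]

# Hypothetical search history.
history = ["data science", "data structures", "data science", "database", "deep learning"]
index = build_index(history)
print(autocomplete(index, "dat"))
```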

13. Finds a solution for social issues

Social issues such as poverty, unemployment, corruption, and high mortality rates are
significant concerns globally. With the help of data, entrepreneurs and organizations can
implement solutions to tackle such serious problems. Data science provides practical ways to
explore social issues that have been a concern for many years, and data science consultants
are in demand to help with various social problems worldwide. Data science also aids in
improving communication between people, because it allows organizations to gain access to
information that may normally be hidden from them. Analysts can analyse the information
gathered with the help of tools and statistics, and the results can then be put into practice
to build effective solutions.

14. Helps predict climate change

Climate change has become a severe concern impacting all forms of life on Earth. Data
visualization and machine learning are two popular methods for studying the effects of
climate change. Data visualization, for example, plays an essential role in helping
individuals understand climate change. It gives data scientists the means to engage the
audience and highlight the dangers of global warming. Climate action is becoming the need of
the hour, and data science can help predict the effects of climate change on land, marine
life, and the ecosystem. The predictions provided by data scientists guide institutions and
environmentalists to implement measures that may potentially lessen the effects of global
warming.
