
Introduction to Data Science

Work is Worship 1
LEARNING OBJECTIVES

The participants will be able to:

• define data and information.
• discuss different types of data.
• compare the types of data.
• demonstrate the structure of data in Python.
• understand the DIKW model.
• know about data science and careers in data science.
• learn about data science in Python.

What is Data and Information

• Data is a collection of raw facts and figures.
• Data can be defined as facts or figures which, when stored, can be used as a basis for decision making, calculation or discussion.
• Information is organized or classified data that has meaningful value.

Forms of Data
Data can be categorized into three forms:

Structured Data: Structured data is information that has been formatted and transformed into a well-defined data model.

Semi-Structured Data: Semi-structured (or partially structured) data is a category between structured and unstructured data.

Unstructured Data: Unstructured data is data in its absolute raw form. It is difficult to process because of its complex arrangement and formatting.

Forms of Data
Unstructured data is not organized.
Semi-structured data is only partly organized: it does not follow a fixed data structure.
Structured data is organized.
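The three forms can be illustrated with a small Python sketch. The sample values below (product ids, the JSON record, the doctor's note) are hypothetical and chosen only for illustration; the sketch uses the standard csv and json modules.

```python
import csv
import io
import json

# Structured: every record has the same named fields (hypothetical sample rows).
structured_text = "product_id,price\nP001,250\nP002,400\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
print(rows[0]["price"])  # fields are addressed by column name

# Semi-structured: keys label the values, but records may differ in shape.
semi_structured_text = '{"name": "Asha", "phones": ["111", "222"], "notes": {"vip": true}}'
record = json.loads(semi_structured_text)
print(record["phones"])

# Unstructured: free text with no fields; it can only be handled as a string.
unstructured_text = "Patient reports mild headache since Monday."
print(len(unstructured_text.split()), "words")
```

Note how the structured and semi-structured values can be looked up by name, while the unstructured note has to be processed as raw text.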

ACTIVITY
“Group the given data into Structured and Unstructured Data”.

 1. Product ids                  12. Audio and Video Communication
 2. MP3                          13. Pricing data
 3. Phone Numbers                14. Text Files
 4. Credit Card Numbers          15. Reports of Training
 5. Customer account data        16. Customer Account Data
 6. Doctor's Notes               17. X-Ray and MRI Scans
 7. Email Messages               18. Patient Forms
 8. Biometrics Data              19. JPG
 9. Financial Transactions       20. Call logs and Web logs
10. PDF                          21. CSV
11. Medical Insurance Data       22. XML

ANSWERS
STRUCTURED DATA             UNSTRUCTURED DATA
Product ids                 Biometrics Data
Pricing data                X-Ray and MRI Scans
Customer account data       Doctor's Notes
Patient Forms               Call logs and Web logs
Medical Insurance Data      Audio and Video Communication
Financial Transactions      PDF
Customer Account Data       JPG
CSV                         MP3
XML                         Text Files
Phone Numbers               Reports of Training
Credit Card Numbers         Email Messages

How to Structure Data in Python?
Example:
# A list stores several values in one variable.
list1 = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(list1)

# Find the largest element in a list.
list2 = [34, 12, 56, 78, 23, 45]
print("The largest element in the list is:", max(list2))
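Beyond a flat list, records with several fields are often held as a list of dictionaries. The sketch below is one possible way to do this; the field names and sample values are assumptions made for illustration.

```python
# Each dictionary is one record; the keys play the role of column names.
# The values are sample numbers used only for illustration.
sessions = [
    {"Duration": 30, "Average_Pulse": 80, "Max_Pulse": 120},
    {"Duration": 30, "Average_Pulse": 85, "Max_Pulse": 120},
    {"Duration": 45, "Average_Pulse": 90, "Max_Pulse": 130},
]

# Pull one "column" out of the records.
pulses = [s["Average_Pulse"] for s in sessions]
print(pulses)

# Find the whole record with the highest average pulse.
highest = max(sessions, key=lambda s: s["Average_Pulse"])
print(highest)
```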

Types of Data
Data comes in different types.
Ex: Text, Image, Video, Numbers, Spreadsheets, Sound.

Qualitative Data: Qualitative data is descriptive information.
Ex: Names, Smell, Taste, Colours, "What a nice day it is!"

Quantitative Data: Quantitative data is numerical information.
Ex: 1, 3.65, Shoe Size, Weight of Students, Salary of Employees.

Types of Data

Quantitative Data is of two kinds: Discrete Data and Continuous Data.

Discrete Data is countable. It can be expressed as a specific value.
Ex: Number of months in a year, Waist Size, Number of Principals in a school.

Continuous Data is measurable. It can be any value in an interval.
Ex: Time, Age of members in a family, Temperature.
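As a rough illustration, Python's built-in types mirror these categories: text for qualitative data, int for counted (discrete) values, and float for measured (continuous) values. The describe() helper and the sample values below are hypothetical, and classifying real data still needs human judgement.

```python
# Hypothetical sample values, chosen only for illustration.
student_name = "Ravi"    # qualitative: descriptive text
months_in_year = 12      # quantitative, discrete: counted
temperature_c = 36.6     # quantitative, continuous: measured


def describe(value):
    """Rough classification by Python type (a simplification)."""
    if isinstance(value, bool):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    return "qualitative"


kinds = [describe(v) for v in (student_name, months_in_year, temperature_c)]
print(kinds)
```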
Real world Examples and influence of data in our lives

• Weather Data
• Education
• Stock Market Data
• Internet Search
• Health Records
• Entertainment
• Social Media Posts
• Augmented Furniture Shopping
• E-Commerce
• Casino Gambling
• GPS Data
• Magic Shows
• Census Data
• Traffic Data
• Energy Consumption Data
• Banking

DIKW Model

Data, after transformation to information, can also be converted to knowledge and wisdom. This is called the DIKW model, which explains how we move from Data to Information to Knowledge to Wisdom.

Examples of DIKW
Example 1
Data: 100 °C
Information: Boiling point of water.
Knowledge: Knowledge is gained of water's boiling point.
Wisdom: If the water is touched, hands may burn.

Example 2
Data: Age 77; taking Amiodarone and Warfarin 250 mg.
Information: Elderly; heart condition needs anticoagulation; liver damage.
Knowledge: On a drug that decreases Warfarin metabolism; at risk of over-anticoagulation.
Wisdom: Decrease Warfarin; monitor INR carefully; avoid drugs that injure the liver.

Example 3
Data: Red light on the road.
Information: The south-facing traffic light at Khandagiri has turned red.
Knowledge: The traffic light on the route I am driving towards has turned red.
Wisdom: I had better stop the car.

What are Data Footprints

A data footprint is the data that is left behind when users are online.
There are two types of data footprints.

• Active: Active digital data footprints consist of data a user leaves intentionally.

• Passive: Passive digital data footprints are composed of data that a user leaves behind unintentionally on the internet.

Active Data                      Passive Data
User Clickstream                 Web Browsing History
Social Media Interactions        Location Data
Transaction History              Social Media Activity
Streaming Data                   Device Metadata
Network Logs                     Network Activity
Customer Support Transactions    Biometric Data
Health Monitoring                Purchase History
Supply Chain Tracking            Communication Patterns
Energy Consumption Data          Search Queries

Data Loss and Recovery
The process of restoring inaccessible, lost, corrupted, damaged, or deleted data is called data recovery.

Common causes of data loss:
System Failure: Power failure, hardware failure, software crash
Disaster: Natural disaster, fire
Crime: Theft, hacking, computer virus
Unintentional Action: Accidental deletion of files, loss of pen drives or laptops
Intentional Action: Deletion of files or programs

What is Data Collection and Variables
Data Collection: The method of gathering data for calculation and analysis is known as data collection.
Variable: A variable is an attribute of an object of study that may vary between cases.

• Numerical variable
These variables hold numeric values, for example age, weight, height.
• Categorical variable
These variables hold values expressed as words (categories), for example name, nationality, sport.
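One way to separate numerical from categorical variables in practice is to let pandas inspect the column types. This is a minimal sketch; the people table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample table: two categorical and two numerical variables.
people = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "nationality": ["Indian", "British", "Chinese"],
    "age": [21, 35, 28],
    "weight": [55.5, 80.2, 68.0],
})

# select_dtypes() filters columns by their data type.
numerical = people.select_dtypes(include="number").columns.tolist()
categorical = people.select_dtypes(exclude="number").columns.tolist()

print("Numerical variables:", numerical)
print("Categorical variables:", categorical)
```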

What are Data Sources
Data sources can be classified into two types:

1. Primary Data Source: Online Surveys, Interviews, Marketing Campaigns, Feedback Forms.
2. Secondary Data Source: Transactional Data, IoT Sensor Data, Satellite Data, Web Traffic, Social Media.

What is Big Data

When data volumes exceed the processing capacities of traditional databases, it is called Big Data.

[Figure: the 5 V's of Big Data. Image source: www.kaggle.com]

Questioning Your Data

Will a customer buy this product? Will India win the match? Are you happy with your salary?

Checking blood pressure; checking car tyre pressure.

What will the temperature be today? How many goals did your favorite team score?

Distinguishing height and number of students in a class.

I am driving a car and the traffic lights are red.

Introduction to Data Science

• Data is a collection of information gathered by observations, measurements, research or analysis.
• Data Science is about finding patterns in data through analysis and making future predictions.

Careers in Data Science

Data Scientist: Data Scientists are data enthusiasts who gather and analyze large sets of structured and unstructured data. They analyze, process, and model data and later interpret the results to create actionable plans for companies and organizations.

Business Intelligence Analyst: Business Intelligence Analysts use data to assess the market and find the latest business trends in the industry. This helps to develop a clearer picture of how a company should shape its strategy.

Data Engineer: Data Engineers examine not only the data for their own business but also that of third parties. In addition to mining data, a data engineer creates robust algorithms to help analyze the data further.

Data Architect: Data Architects work closely with users, system designers, and developers to create a blueprint that data management systems use to centralize, integrate and maintain the data sources.

Senior Data Scientist: Senior Data Scientists anticipate the business's needs in the future. Although they might not be involved in gathering data, they play a high-level role in analyzing it.

Where is Data Science needed
For route planning: To discover the best routes to ship.
To foresee delays for flight/ship/train etc. (through predictive analysis).
To create promotional offers.
To analyze health benefit of training.
To predict who will win elections.
Data Science can be applied in nearly every part of a business where data
is available.
Consumer goods
Stock markets
Industry
Politics
E-commerce

How does a Data Scientist work?
A Data Scientist requires expertise in several areas: Statistics, Programming (Python or R), Mathematics, Databases.

A typical workflow:
Ask the right questions: To understand the business problem.
Explore and collect data: From databases, web logs, customer feedback.
Extract the data: Transform the data to a standardized format.
Clean the data: Remove erroneous values from the data.
Find and replace missing values: Check for missing values and replace them with a suitable value.
Analyze the data: Find patterns and make future predictions.
Represent the result: Present the result in a useful way that the company can understand.
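The "find and replace missing values" step can be sketched with pandas as follows. Using the column mean as the "suitable value" is an assumption made here for illustration; the median or a domain-specific default may fit better in other cases. The sample readings are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical pulse readings with one missing value (NaN).
pulse = pd.DataFrame({"Average_Pulse": [80, 85, np.nan, 95, 100]})

# Step 1: check how many values are missing.
missing_count = int(pulse["Average_Pulse"].isna().sum())
print("Missing values:", missing_count)

# Step 2: replace them with a suitable value, here the column mean.
mean_pulse = pulse["Average_Pulse"].mean()  # mean() skips missing values by default
pulse["Average_Pulse"] = pulse["Average_Pulse"].fillna(mean_pulse)
print(pulse)
```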

Database Table and Database Table Structure
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
45        100            140        280              0           7
60        105            140        290              7           8
60        110            145        300              7           8
60        115            145        310              8           8
75        120            150        320              0           8
75        125            150        330              8           8

A row is a horizontal representation of data.
A column is a vertical representation of data.
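The table above can be held in a pandas DataFrame, where rows and columns are accessed directly. This is a minimal sketch that types the table values in by hand; the variable names are chosen here for illustration.

```python
import pandas as pd

# The table above, entered column by column.
health_data = pd.DataFrame({
    "Duration": [30, 30, 45, 45, 45, 60, 60, 60, 75, 75],
    "Average_Pulse": [80, 85, 90, 95, 100, 105, 110, 115, 120, 125],
    "Max_Pulse": [120, 120, 130, 130, 140, 140, 145, 145, 150, 150],
    "Calorie_Burnage": [240, 250, 260, 270, 280, 290, 300, 310, 320, 330],
    "Hours_Work": [10, 10, 8, 8, 0, 7, 7, 8, 0, 8],
    "Hours_Sleep": [7, 7, 7, 7, 7, 8, 8, 8, 8, 8],
})

first_row = health_data.iloc[0]               # a row: one horizontal record
pulse_column = health_data["Average_Pulse"]   # a column: one vertical variable

print(health_data.shape)  # (number of rows, number of columns)
print(first_row)
print(pulse_column)
```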

Data Science & Python

• Python is a programming language widely used by Data Scientists.
• Python has built-in mathematical libraries and functions, making it easier to solve mathematical problems and to perform data analysis.

Python Libraries

Pandas - This library is used for structured data operations, such as importing CSV files, creating dataframes, and preparing data.

NumPy - This is a mathematical library. It provides a powerful N-dimensional array object and linear algebra functions. NumPy stands for Numerical Python.

Matplotlib - This library is used for visualization of data.

SciPy - This library has linear algebra modules. SciPy stands for Scientific Python.


Installing Python Libraries

Python DataFrame

A data frame is a structured representation of data.


import pandas as pd
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print(df)

To install pandas, run the following command in a terminal (not inside Python):
pip install pandas

Data Science Functions

The mean() function


The NumPy mean() function is used to find the average value of an array.

Ex:

import numpy as np
Calorie_burnage = [240, 250, 260, 270, 280, 290, 300, 310, 320, 330]
Average_calorie_burnage = np.mean(Calorie_burnage)
print(Average_calorie_burnage)
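To see what the mean is doing, the same average can be worked out by hand with plain Python: add up the values and divide by how many there are. The variable names below are chosen here for illustration.

```python
calorie_burnage = [240, 250, 260, 270, 280, 290, 300, 310, 320, 330]

total = sum(calorie_burnage)   # add up all the values
count = len(calorie_burnage)   # how many values there are
average = total / count        # mean = total / count

print(total, count, average)   # 2850 10 285.0
```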

Data Science Functions

The max() function

The Python max() function is used to find the highest value in a list or among its arguments.
Ex:
Average_pulse_max = max(80, 85, 90, 95, 100, 105, 110, 115, 120, 125)
print(Average_pulse_max)

The min() function

The Python min() function is used to find the lowest value in a list or among its arguments.
Ex:
Average_pulse_min = min(80, 85, 90, 95, 100, 105, 110, 115, 120, 125)
print(Average_pulse_min)

Data Preparation

Extract and Read Data With Pandas


Before data can be analyzed, it must be imported/extracted.
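We use the pandas read_csv() function to import a CSV file. Here header=0 means the first row holds the column names, and sep="," means the values are separated by commas.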

import pandas as pd
health_data = pd.read_csv("data.csv", header=0, sep=",")
print(health_data)
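If no data.csv file is at hand, read_csv() can be tried on text held in memory. The sketch below assumes a small hypothetical CSV with three columns and wraps it in io.StringIO so that pandas treats it like a file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = (
    "Duration,Average_Pulse,Max_Pulse\n"
    "30,80,120\n"
    "45,90,130\n"
    "60,105,140\n"
)

# io.StringIO makes the string behave like an open file.
health_data = pd.read_csv(io.StringIO(csv_text), header=0, sep=",")
print(health_data.head())  # head() shows the first rows
```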

Data Cleaning Functions
Remove Blank Rows - dropna() function
health_data.dropna(axis=0, inplace=True)
print(health_data)
Data Types - info() function
Ex: health_data.info()  # info() prints its summary itself, so print() is not needed
Convert Data Types - astype() function
health_data["Average_Pulse"] = health_data["Average_Pulse"].astype(float)
health_data["Max_Pulse"] = health_data["Max_Pulse"].astype(float)
health_data.info()
Analyze the Data - describe() function
print(health_data.describe())
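The cleaning functions above can be tried end to end on a small table. This is a sketch under assumptions: the raw values are hypothetical, and it uses dropna() without inplace=True so that the original table is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: pulse stored as text, with one blank in each column.
raw = pd.DataFrame({
    "Average_Pulse": ["80", "85", None, "95"],
    "Max_Pulse": [120, 120, 130, np.nan],
})

# Remove rows that contain any missing value.
clean = raw.dropna(axis=0).copy()

# Convert both columns to float so they can be used in calculations.
clean["Average_Pulse"] = clean["Average_Pulse"].astype(float)
clean["Max_Pulse"] = clean["Max_Pulse"].astype(float)

clean.info()                 # column names, non-null counts and data types
summary = clean.describe()   # count, mean, std, min, quartiles, max
print(summary)
```

Only the first two rows survive dropna(), because the third row is missing Average_Pulse and the fourth is missing Max_Pulse.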

Data Visualization
Data Visualization is the representation of data or information in a graph,
chart or other visual formats.

Common types of data visualizations are:

Charts
Graphs
Tables
Maps
Histograms

Data Visualization

import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()

Data Visualization

import matplotlib.pyplot as plt


x = [1,2,3] # x axis values
y = [2,4,1] # corresponding y axis values
plt.plot(x, y) # plotting the points
plt.xlabel('x - axis') # naming the x axis
plt.ylabel('y - axis') # naming the y axis
plt.title('My first graph!') # giving a title to my graph
plt.show() # function to show the plot

Data Visualization

import matplotlib.pyplot as plt


langs = ['C', 'C++','Java', 'Python', 'PHP']
students= [20,28,42,78,10]
plt.bar(langs,students)
plt.show()
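Histograms are listed among the common visualizations, so here is a hedged sketch of one using plt.hist(). The choice of five bins, the off-screen "Agg" backend and the output file name pulse_histogram.png are assumptions made for this example; drop the backend line and call plt.show() to open a window instead.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: draw off-screen so no display is needed
import matplotlib.pyplot as plt

average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# hist() groups the values into bins and returns the count in each bin.
counts, bin_edges, _ = plt.hist(average_pulse, bins=5)
plt.xlabel("Average pulse")
plt.ylabel("Number of sessions")
plt.title("Distribution of average pulse")
plt.savefig("pulse_histogram.png")  # hypothetical output file name

print(counts)
print(bin_edges)
```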

Data is the new science.
Big Data holds the answers.
Artificial Intelligence controls the world!
