Python-II (Part a)
Smart Phone
Diary
Various devices are connected through the Internet and they communicate with each other
Millions of devices around us like refrigerators, cars, machines and dish washers are producing da
This data is measured in Zettabytes (10^21 bytes)
They used data to get useful insights about customer shopping patterns
All of the stores were selling them very well, but two stores were not selling them at all
The situation was investigated and it was found that there was a simple stocking oversight
They are quite smart at analyzing data about people's likes and dislikes
They saw that Facebook users were crazy about cake pops, so they introduced them immediately in their stores
The reason for this success is that they don't see the huge data as a burden; they use it to their benefit
Data Science is the process of extracting knowledge and insights from data by using scientific methods
Netflix is using the process of data science to explore user interests
Facebook and other social media platforms are using the same exploration techniques
How do data scientists get useful insights from the data?
Machine Learning
Statistics
Big Data Processing (Unstructured)
Programming Language
Data Visualization
Data Extraction and Processing
Probability
Data Wrangling (Cleaning) and Exploration
Categories of data
What is Statistics ?
Sampling Techniques
Types of Statistics
Descriptive Statistics
Probability
Inferential Statistics
Data with no inherent order or ranking, such as gender or race, is nominal data
ID:0 Good
ID:1 Bad
ID:2 Average
Discrete Data:
Discrete data, also known as categorical data, can hold a finite number of possible values (it can be counted)
Continuous Data:
• Continuous data can hold an infinite number of possible values (within a certain range)
• Time to wake up
The company has created a medicine to cure cancer. How would you test its effectiveness through
statistics?
You and a friend have a bet that Shahid Afridi will make 24 runs in the next over
Sales data has come in at the company. The manager asks you to make a report
Sample:
A sample is a subset of the population. A well-chosen sample will contain most of the information about
Will we knock on every door and ask each household to tell us about their son?
Instead, you take a sample and apply your statistics to that particular sample
Stratified:
• It is basically focused on main characteristics of data. It provides graphical summary of the data
Case:
Suppose you want to gift T-shirts to all the students of your class.
The steps you will take:
By applying descriptive statistics, you will find the maximum, minimum & average shirt size
Large
Medium
Small
You've grouped the people into large, medium and small
Inferential statistics allows us to infer data parameters based on a statistical model using sample data
Measures of Central Tendency
Measures of spread
• Range
• Inter Quartile Range
• Variance
• Standard Deviation
MODE: in 22, 3, 4, 34, 4, 8, 8, 9, 5, 7, 12, 45, 89, 4, 4, 9, 0, 12 the mode is 4 (the value that occurs most often)
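import numpy as np
from statistics import mode  # mode() comes from the standard-library statistics module

a = np.random.randint(12, 47, 50)  # 50 random integers from 12 to 46 (upper bound is exclusive)
np.mean(a)    # Calculates the mean of the data
np.median(a)  # Calculates the median of the data
mode(a)       # Calculates the mode (most frequent value) of the data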
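As a small sketch, the values listed for the MODE example above can be checked with Python's standard statistics module. The variable names are assumptions made only for this illustration.

```python
from statistics import mean, median, mode

# The values listed in the MODE example above
data = [22, 3, 4, 34, 4, 8, 8, 9, 5, 7, 12, 45, 89, 4, 4, 9, 0, 12]

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # middle of the sorted values
data_mode = mode(data)      # value that occurs most often

print(data_mean, data_median, data_mode)
```

The large values 45 and 89 pull the mean well above the median, which is why the three measures are usually reported together.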
Measures of spread
We have found the range easily, but do we understand what it actually tells us?
The interquartile range (IQR) is a measure of variability (how much the values change), based on dividing the dataset into quartiles
Quartiles divide a rank-ordered dataset into four equal parts; the cut points are Q1, Q2 and Q3
IQR = Q3 – Q1
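A minimal sketch of the quartile calculation, assuming a small made-up dataset. Several conventions exist for locating quartiles; the "inclusive" method of the standard-library `statistics.quantiles` is used here as one possible choice, so other tools may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical rank-ordered dataset, chosen only for illustration
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print(q1, q2, q3, iqr)
```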
Population Variance
Sample Variance
Population Variance: It is the average of the squared deviations from the mean
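The difference between the two variances can be sketched as follows, assuming a small made-up dataset. The population variance divides the sum of squared deviations by N, while the sample variance divides by N - 1. The standard-library functions `pvariance` and `variance` are used only to cross-check the hand calculation.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, for illustration only

mu = sum(data) / len(data)                           # mean of the data
squared_deviations = [(x - mu) ** 2 for x in data]   # squared distance of each value from the mean

population_var = sum(squared_deviations) / len(data)      # divide by N
sample_var = sum(squared_deviations) / (len(data) - 1)    # divide by N - 1

print(population_var, pvariance(data))
print(sample_var, variance(data))
```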
For understanding of Sample & Population, move to Slide # 19
Check the examples provided below for clearing all the concepts
The greater the value of the standard deviation, the further the data tend to be
dispersed from the mean
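A short sketch of this statement, assuming two made-up datasets that share the same mean but differ in how far the values lie from it.

```python
from statistics import mean, pstdev

tight = [48, 49, 50, 51, 52]  # hypothetical values close to the mean
wide = [10, 30, 50, 70, 90]   # hypothetical values far from the mean

tight_sd = pstdev(tight)  # population standard deviation of the tight data
wide_sd = pstdev(wide)    # population standard deviation of the wide data

print(mean(tight), tight_sd)
print(mean(wide), wide_sd)
```

Both datasets have the same mean, but the second has a much larger standard deviation because its values are more dispersed.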
Designed by Syed Umaid Ahmed 54
[Figure: American Men, Mean of Heights. Horizontal axis: 5' 1'', 5' 4'', 5' 7'', 5' 10'', 6' 1'', 6' 4'', 6' 7''; tails labeled 0.15%]
Entropy is freedom to move: Ice (very low), Liquid (low), Gas (high)
Pick a ball from Container 1, 2 & 3
Combinations
Entropy (Parent) =
strong weak
strong weak
Probability = Desired Outcomes / Total Outcomes
Statistics
Basic Terminologies in Probability
Random Experiment: (the result can't be predicted, but ALL possible results are known)
An experiment or a process for which the result cannot be predicted with certainty
e.g. rolling a die or drawing a card is a random experiment
Sample Space: (if we put all the results of a random experiment in a set, it is called the sample space)
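As a minimal sketch, the sample space of rolling one die can be written as a Python set, and the probability of an event is then the number of desired outcomes divided by the total number of outcomes. The event "even number" is an assumption chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # every possible result of rolling one die

# Event: the die shows an even number (chosen only as an example)
event_even = {x for x in sample_space if x % 2 == 0}

# Desired outcomes divided by total outcomes
p_even = Fraction(len(event_even), len(sample_space))

print(p_even)
```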
Non-Disjoint Event:
1. A student can obtain 100 marks in accounting and 100 marks in Python
2. Normal Distribution
Continuous?
1. Probability Density Function
The equation describing a continuous probability distribution is called a Probability Density Function
3. The probability that a random variable assumes a value between a and b is equal to the area under the PDF bounded by a and b
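The area interpretation can be sketched numerically. The example below assumes a standard normal distribution (mean 0, standard deviation 1) and approximates the area under its PDF between a = -1 and b = 1 with the trapezoidal rule. The function names are hypothetical helpers written for this illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, steps=10_000):
    """Approximate the area under normal_pdf between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))  # end points count half
    for i in range(1, steps):
        total += normal_pdf(a + i * width)          # interior points count fully
    return total * width

p = area_under_pdf(-1.0, 1.0)  # P(-1 <= X <= 1) for a standard normal variable
print(round(p, 4))
```

The result is about 0.68, the familiar share of values lying within one standard deviation of the mean.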
2. Normal Distribution (Gaussian Distribution)
The graph of the normal distribution depends upon two factors: the mean and the standard deviation
If we had a large population and divided it into many samples, the mean of all the samples would be almost equal to the population mean
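A rough simulation of this idea, assuming a made-up population of 100,000 normally distributed heights (mean 170 cm, standard deviation 10 cm) that is cut into 100 samples of 1,000 values each. All numbers here are assumptions for illustration.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 heights in cm
population = [random.gauss(170, 10) for _ in range(100_000)]

# Divide the population into 100 samples of 1,000 values each
sample_size = 1_000
samples = [population[i:i + sample_size]
           for i in range(0, len(population), sample_size)]

sample_means = [fmean(s) for s in samples]   # one mean per sample
population_mean = fmean(population)
mean_of_sample_means = fmean(sample_means)   # average of the sample means

print(population_mean, mean_of_sample_means)
```

Each individual sample mean lands close to the population mean, and their average matches it almost exactly.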
Different Types of Probability
1. Marginal Probability
It is the probability of the occurrence of a single event
13/52
The probability that a card is both an Ace and Red is a joint probability, i.e. the probability of the intersection of the two events
2/52
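Both fractions can be reproduced by counting cards in a standard 52-card deck. The sketch assumes that the 13/52 example refers to drawing a card of one particular suit (hearts is used here); the source does not say which event it had in mind.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def probability(event):
    """Desired outcomes divided by total outcomes for a single draw."""
    desired = [card for card in deck if event(card)]
    return Fraction(len(desired), len(deck))

# Marginal probability: a single event (assumed here to be "the card is a heart")
p_heart = probability(lambda card: card[1] == "hearts")

# Joint probability: the card is an Ace AND it is red
p_red_ace = probability(lambda card: card[0] == "A" and card[1] in red_suits)

print(p_heart, p_red_ace)
```

`Fraction` prints the reduced forms 1/4 and 1/26, which equal 13/52 and 2/52.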
The conditional probability of an event B is the probability that the event will occur given that event A has already occurred
If A and B are dependent events, then the expression for conditional probability is: P(B|A) = P(A and B) / P(A)
If A and B are independent events, then the expression for conditional probability is: P(B|A) = P(B)
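A small sketch of both cases using a standard deck of cards and the usual definition P(B|A) = P(A and B) / P(A). The chosen events (red, ace, heart) are assumptions for illustration. Knowing a card is red does not change the chance that it is an ace (independent), but it does change the chance that it is a heart (dependent).

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def p(event):
    """Probability of an event for a single draw from the deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def conditional(b, a):
    """P(B|A) = P(A and B) / P(A)."""
    return p(lambda card: a(card) and b(card)) / p(a)

def is_red(card):
    return card[1] in {"hearts", "diamonds"}

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

p_ace_given_red = conditional(is_ace, is_red)      # equals P(ace): independent events
p_heart_given_red = conditional(is_heart, is_red)  # differs from P(heart): dependent events

print(p_ace_given_red, p(is_ace))
print(p_heart_given_red, p(is_heart))
```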
• Most of you have noticed it in your email addresses. This is done through ML by Naïve Bayes
Of the items produced by machine A, 5% are defective; similarly, 3% of machine B's items and 1% of
machine C's are defective.
If a randomly selected item is defective, what is the probability it was produced by machine C?
Probability of coming from C?
P(C | defective) = (0.01 x 0.50) / 0.024 = 5/24
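The calculation can be sketched with Bayes' theorem. Only machine C's share of production (0.50) and the overall defect probability (0.024) appear in the figures above; the shares of 0.20 for A and 0.30 for B are assumptions, chosen because they reproduce that 0.024 total.

```python
from fractions import Fraction

# Share of total production per machine.
# C = 0.50 comes from the text; A = 0.20 and B = 0.30 are ASSUMED values.
share = {"A": Fraction(20, 100), "B": Fraction(30, 100), "C": Fraction(50, 100)}

# Defect rate per machine, as stated in the problem
defect_rate = {"A": Fraction(5, 100), "B": Fraction(3, 100), "C": Fraction(1, 100)}

# Total probability that a randomly selected item is defective
p_defective = sum(share[m] * defect_rate[m] for m in share)

# Bayes' theorem: P(C | defective) = P(defective | C) * P(C) / P(defective)
p_c_given_defective = defect_rate["C"] * share["C"] / p_defective

print(float(p_defective), p_c_given_defective)
```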
• This is done by different methods, i.e. point estimation and interval estimation
1. State the Hypothesis: This involves stating the null hypothesis (H0) and the alternative hypothesis (H1)
Analyze Sample Data: Calculation and interpretation of the test statistic as described in the analysis plan
Interpret Results: Involves applying the decision rule described in the analysis plan
Use Case
They all bunked a class, so as a punishment they have to clean the classroom every
week.
Furqan decided that everyone's name would be put on a chit and one chit would be drawn every day.
Whoever's name is drawn will clean the room. Three days passed, and Furqan's name
had not come up
Three days passed, and everyone's name came out except Furqan's.
Assume that the draw is free of bias. What is the probability that Furqan's name does not come out?
P(Furqan not picked for 12 days) = (3/4)^12 ≈ 0.032 < 0.05
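The reasoning above can be sketched as a simple calculation. It assumes four students (implied by the 3/4 chance of not being picked on a given day), a fair and independent draw each day, and a 0.05 threshold for calling the result suspicious.

```python
students = 4                              # ASSUMED class of four, implied by the 3/4 factor
p_not_picked = (students - 1) / students  # chance Furqan is not picked on one fair draw
significance_level = 0.05                 # threshold used in the comparison above

# Probability that Furqan is not picked on any of the first `day` days
probabilities = {day: p_not_picked ** day for day in range(1, 13)}

# First day on which a fair draw becomes harder to believe than the threshold allows
first_unlikely_day = min(day for day, p in probabilities.items()
                         if p < significance_level)

for day, p in probabilities.items():
    print(day, round(p, 3))
print("Below 0.05 from day", first_unlikely_day)
```

Once the probability under the "fair draw" assumption falls below the threshold, the null hypothesis of an unbiased draw is rejected.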
explicitly programmed.
• Make Decisions
i.e.
Netflix Recommendations, Facebook Tagging &
Predictor Variable: It is the feature of the data used to predict the output
Training Data: The Machine Learning model is built using Training Data. The data is split into two parts, (8
Testing Data: The Machine Learning Model is always evaluated using Testing Data. It is the smaller part (20%)
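A minimal sketch of such a split, assuming an 80/20 ratio (the 20% for testing is stated above; the 80% for training is inferred). The helper `split_train_test` is a hypothetical function written for this illustration and is not part of any library.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)          # shuffle a copy so the input order is untouched
    n_test = round(len(shuffled) * test_fraction)  # size of the testing part
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # hypothetical dataset of 100 records
train, test = split_train_test(data)

print(len(train), len(test))
```

Shuffling before the split keeps both parts representative of the whole dataset.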
2. Fill in the missing values and remove redundant variables, unnecessary data, duplicate values and so on.
2. The model is the Machine Learning algorithm that predicts the output by using data fed to it
3. Further modification is done in the model using techniques like Parameter Tuning and Cross-Validation
1. The final outcome is predicted after performing parameter tuning and improving the accuracy of the model
https://www.youtube.com/watch?v=kmSab3AuLdY&list=RDCMUCYUjYU5FveRAscQ8V21w81A&start_radio=1&t=3
https://www.youtube.com/watch?v=xs99sViYwJA&list=PLMoSUbG1Q_r-rw4MC03RUrzQzm0NuBCRj&index=6
https://www.youtube.com/watch?v=VjHBotwi5a8&list=RDCMUCYUjYU5FveRAscQ8V21w81A&index=3
Corona Predictor:
https://www.youtube.com/watch?v=6CZiz-FLZF0&list=PLMoSUbG1Q_r-rw4MC03RUrzQzm0NuBCRj&index=8