
Data Analyst


Eng. Ejaz Ahmad
What is Data Analysis?

 A process of inspecting, cleaning, transforming, and
modeling data with the goal of finding useful
information and supporting decision making
1. Inspecting
2. Cleaning
3. Transforming
4. Modeling
Skills Required

 Programming (Python, R)
 SQL
 Excel
 Tableau
 Statistics
Programming

 Python libraries
•Pandas
•NumPy
•Matplotlib
•scikit-learn (sklearn)
•TensorFlow / machine learning
1.Data Extraction

 Data extraction draws on the following sources
1. SQL
2. Web scraping
3. File formats (CSV, XML, JSON)
4. Consulting APIs
5. Buying data
6. Distributed databases
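
As a minimal sketch of two of the sources above (file formats and SQL), the following reads CSV and JSON into pandas and queries an in-memory SQLite table. The sample data, table name, and column names are hypothetical and chosen only for illustration.

```python
import io
import json
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for real files
csv_text = "id,city,price\n1,Lahore,100\n2,Karachi,150\n"
json_text = '[{"id": 1, "rating": 4.5}, {"id": 2, "rating": 3.8}]'

# File formats: load CSV and JSON content into DataFrames
sales = pd.read_csv(io.StringIO(csv_text))
ratings = pd.DataFrame(json.loads(json_text))

# SQL: store the table in an in-memory SQLite database and query it
conn = sqlite3.connect(":memory:")
sales.to_sql("sales", conn, index=False)
expensive = pd.read_sql_query(
    "SELECT id, city FROM sales WHERE price > 120", conn
)
conn.close()

print(expensive)
```

With real data, the in-memory strings would be replaced by file paths or a connection to an actual database.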
2.Data Cleaning

 Data cleaning includes the following steps
1. Missing values and empty data
2. Data imputation
3. Incorrect types
4. Incorrect and invalid values
5. Outliers and non-relevant data
6. Statistical sanitization
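
The cleaning steps above can be sketched with pandas as follows. The column names, the plausible age range (0 to 120), and the choice of median imputation are assumptions made for this example, not a prescribed recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: ages stored as text, one missing, one impossible value
raw = pd.DataFrame({
    "age": ["25", "31", None, "29", "240"],
    "income": [3000.0, 3200.0, 3100.0, np.nan, 2900.0],
})
df = raw.copy()

# Incorrect types: convert text to numbers (unparseable values become NaN)
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Incorrect and invalid values: ages outside the assumed range become missing
df.loc[~df["age"].between(0, 120), "age"] = np.nan

# Missing values: count them per column before filling
missing_counts = df.isna().sum()

# Data imputation: fill the gaps with each column's median
df = df.fillna(df.median())

print(missing_counts)
print(df)
```

Median imputation is only one option; dropping rows or using a model-based imputer may suit other datasets better.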
3.Data Wrangling

1. Hierarchical data
2. Handling Categorical data
3. Reshaping and transforming structure
4. Indexing data for quick access
5. Merging, combining, and joining
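
A short pandas sketch of the wrangling steps above, using two hypothetical tables (orders and customers) invented for this example:

```python
import pandas as pd

# Hypothetical tables for illustration
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "category": ["book", "toy", "toy", "book"],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Lahore", "Karachi", "Lahore"],
})

# Merging and joining: attach each customer's city to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# Handling categorical data: one-hot encode the category column
encoded = pd.get_dummies(merged, columns=["category"])

# Reshaping: total amount per city (rows) and category (columns)
pivot = merged.pivot_table(
    index="city", columns="category", values="amount",
    aggfunc="sum", fill_value=0,
)

# Indexing for quick access: look up an order by its id
by_id = merged.set_index("order_id")
order_3_amount = by_id.loc[3, "amount"]

# Hierarchical data: a two-level (city, category) index
hier = merged.set_index(["city", "category"]).sort_index()

print(pivot)
```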
4.Analysis

1. Exploration
2. Building statistical models
3. Visualization and representation
4. Correlation vs causation analysis
5. Hypothesis testing
6. Statistical analysis
7. Reporting
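
As a rough illustration of exploration, correlation, and hypothesis testing, the sketch below uses synthetic data (study hours and exam scores, both invented here) and a simple permutation test. The permutation test is one possible choice, not the only way to test a hypothesis.

```python
import numpy as np
import pandas as pd

# Synthetic data: score depends on hours plus noise (an assumption for the demo)
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
score = 50 + 4 * hours + rng.normal(0, 5, size=200)
df = pd.DataFrame({"hours": hours, "score": score})

# Exploration: summary statistics for every column
summary = df.describe()

# Correlation: Pearson correlation between the two columns
corr = df["hours"].corr(df["score"])

# Hypothesis testing: do students with more than 5 hours score differently?
mask = (df["hours"] > 5).to_numpy()
scores = df["score"].to_numpy()
observed = scores[mask].mean() - scores[~mask].mean()

# Permutation test: shuffle scores to see how large a difference chance produces
perm_diffs = []
for _ in range(1000):
    shuffled = rng.permutation(scores)
    perm_diffs.append(shuffled[mask].mean() - shuffled[~mask].mean())
p_value = (np.abs(perm_diffs) >= abs(observed)).mean()

print(summary)
print("correlation:", corr, "p-value:", p_value)
```

A strong correlation here reflects how the data was generated; on real data, correlation alone does not establish causation.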
5.Action

1. Building Machine Learning Models
2. Feature Engineering
3. Moving ML into Production
4. Building ETL pipelines
5. Live dashboard and reporting
6. Decision making and real life testing
MACHINE LEARNING

Engr. Ejaz Ahmad
Machine learning steps

 Frame the problem
 Get data
 Discover and visualize the data to gain insights
 Prepare the data for machine learning
 Select a model and train it
 Fine tune your model
 Present your solution
 Launch your system
1.Frame the problem

 What is the business objective of this model?
 What is the previous solution to the problem, if any?
 Decide which type of machine learning algorithm
to apply
Types of Machine
Learning

 Supervised
• Learns from known (labeled) datasets, called training
datasets
 Unsupervised
• Learns from unlabeled data; used to find structure
and patterns in big data
 Reinforcement
• Learns from experience and rewards
Selecting an Algorithm

 Classification Algorithm
•Is this A or B?
 Anomaly Detection Algorithm
•Is this weird? Analyzes patterns
 Regression Algorithm
•How much or how many? An estimator
 Clustering Algorithm
•Finds structure in datasets
 Reinforcement Algorithm
•Used to make decisions
1.Classification

 It assigns each input to one of a fixed set of classes
 With 2 possible outputs (yes or no), it is called two-class classification
 With 3 or more possible outputs (yes, no, or maybe), it is called multi-class classification
Classification Algorithm

 Logistic Regression
 Decision Tree
 Artificial Neural network
 K-Nearest Neighbor
 Support Vector Machine
 Random Forest
 Naïve Bayes
 Stochastic Gradient Descent
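
As a minimal sketch of one algorithm from this list, the following trains a logistic regression classifier with scikit-learn on a synthetic two-class dataset. The dataset parameters are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data (parameters are assumptions for the demo)
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=2, n_clusters_per_class=1, class_sep=2.0, random_state=42,
)

# Hold out 20% of the samples to evaluate the trained model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the classifier and measure accuracy on unseen data
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Predicted class (0 or 1) for the first five test samples
predictions = clf.predict(X_test[:5])
print("accuracy:", accuracy, "predictions:", predictions)
```

The other classifiers listed, such as decision trees or random forests, follow the same `fit` / `predict` pattern in scikit-learn.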
2.Anomaly Detection

 It analyzes certain patterns and alerts you when
there is a change in those patterns
 Credit card companies use this kind of algorithm to detect any
unusual change
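
One simple way to flag unusual values, sketched below, is a robust z-score based on the median and the median absolute deviation (MAD). The transaction amounts are invented, and the 0.6745 scaling constant and 3.5 cutoff are a common convention assumed here, not something prescribed by the slides.

```python
import numpy as np

# Hypothetical card transaction amounts; one is far larger than the rest
amounts = np.array([42.0, 38.5, 45.0, 40.2, 39.9, 41.3, 950.0, 43.1, 37.8, 40.0])

# Robust center and spread: median and median absolute deviation
median = np.median(amounts)
mad = np.median(np.abs(amounts - median))

# Robust z-score; 0.6745 rescales MAD to be comparable to a standard deviation
robust_z = 0.6745 * (amounts - median) / mad

# Flag values whose robust z-score exceeds the assumed cutoff of 3.5
is_anomaly = np.abs(robust_z) > 3.5
anomalies = amounts[is_anomaly]

print(anomalies)  # [950.]
```

Production fraud systems typically use richer features and models; this only shows the idea of alerting on a break from the usual pattern.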
3.Regression

 It is an estimator
 Predicts numerical values
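
A minimal regression sketch with scikit-learn's `LinearRegression`, assuming a made-up, exactly linear relation between house area and price:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: price = 1000 * area + 5000 (an assumption for the demo)
area = np.array([[50], [60], [80], [100], [120]])
price = 1000 * area.ravel() + 5000

# Fit the estimator on the known examples
model = LinearRegression().fit(area, price)

# Predict a numerical value for an unseen area of 90
predicted = model.predict(np.array([[90]]))[0]
print(predicted)
```

Because the toy data is perfectly linear, the prediction lands very close to 95000; real data would include noise.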
4.Clustering Algorithms

 Unsupervised learning used to understand the
structure of data
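
As an illustration, the sketch below runs k-means (one common clustering algorithm, chosen here as an example) on two synthetic groups of points. No labels are given to the algorithm; it recovers the two groups from the data alone.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic, well-separated groups of 2-D points (invented for the demo)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])

# Ask k-means for 2 clusters; it assigns a cluster id to every point
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels[:5], labels[-5:])
```

The number of clusters is an input here; with real data, choosing it is part of the analysis.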
5.Reinforcement
Algorithms

 Used to make a decision
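
A small sketch of learning decisions from rewards: an epsilon-greedy agent choosing among three options whose reward probabilities it does not know. The probabilities, the exploration rate, and the number of steps are all assumptions made for this example.

```python
import random

random.seed(0)

# Hidden reward probability of each action (unknown to the agent)
true_reward_prob = [0.2, 0.5, 0.8]

counts = [0, 0, 0]          # how often each action was tried
values = [0.0, 0.0, 0.0]    # running average reward per action
epsilon = 0.1               # fraction of steps spent exploring

for step in range(5000):
    if random.random() < epsilon:
        # Explore: pick a random action
        action = random.randrange(3)
    else:
        # Exploit: pick the action with the best estimated reward so far
        action = max(range(3), key=lambda a: values[a])

    # Receive a reward of 1 or 0 from the environment
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0

    # Update the running average for the chosen action
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

best_action = max(range(3), key=lambda a: values[a])
print("estimated values:", values, "best action:", best_action)
```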

Popular data sources

 UC Irvine Machine Learning Repository
 Kaggle datasets
 Amazon’s AWS datasets
 http://dataportals.org/
 http://opendatamonitor.eu/
 http://quandl.com/
 Wikipedia’s list of Machine Learning datasets
 Quora.com question

Splitting Train and Test Data

from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; a fixed random_state makes the split reproducible
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)
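
The split can be tried end to end with a small stand-in for `housing`. The DataFrame below is hypothetical; the slides do not show how `housing` is loaded.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the housing dataset (10 rows)
housing = pd.DataFrame({
    "rooms": range(1, 11),
    "price": [100 + 10 * i for i in range(10)],
})

# 80% of the rows go to training, 20% to testing
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)

print(len(train_set), len(test_set))  # 8 2
```

Every row ends up in exactly one of the two sets, so the test set stays unseen during training.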
