
EXTRA CLASS FOR AI

AI PROJECT CYCLE
REVISION
Preparing Dinner
• Decide the dish
• Collect the ingredients
• Identify the taste most liked by the family
• Identify the best method of preparation of the dish
• Prepare the dish
• Family tastes the dish to decide if the dish is good or not
Analogy with Project cycle
• Decide the dish - Identify the problem to be solved - Problem scoping
• Collect the ingredients - Collect data required to solve the problem -
Data Acquisition
• Identify the taste most liked by the family - Analyse the data to find
patterns - Data Exploration
• Identify the best method of preparation of the dish - Selecting a
model
• Prepare the dish - Algorithm and coding for the model - Modelling
• Family tastes the dish to decide if the dish is good or not - Checking if
the model is working well - Evaluation
Project cycle
• Project cycle refers to the process of developing an AI system
• Different stages of AI project cycle are :
• Problem scoping
• Data Acquisition
• Data Exploration
• Modelling
• Evaluation
Problem Scoping
• Identifying a problem and having a vision to solve it is Problem scoping
• Solving a problem is not always easy, as the problem may be complex
• A deeper understanding of the problem will help us solve it
We use 4W Problem canvas as a tool for problem scoping (to understand the
problem better)
4Ws - Who, What, Where, Why
• Who - The people who are directly or indirectly affected by the problem (the stakeholders)
• What - What the problem is, and how we know it is a problem (evidence for the problem - newspaper articles, websites, or complaints from users)
• Where - The context of the problem - the location or situation in which the problem arises
• Why - The benefits the stakeholders will get from the solution - what would be of key value to the stakeholders
Once we answer all the questions in 4W Problem canvas, we can
summarize this in one template called Problem Statement Template.
In future, whenever there is need to look back at the basis of the
problem, we can take a look at the Problem Statement Template and
understand the key elements of the problem.
Problem Statement Template
Data Acquisition
• Second stage - Data Acquisition
• Data can be a piece of information or facts and statistics collected together
for reference or analysis
• Acquisition - Collecting - collecting data from different sources
• Data Acquisition is the stage where data is collected for the problem to be
solved.
• The collected data is used for training an intelligent system and testing
whether the system is predicting accurately.
• Data collected is therefore split into training dataset and testing dataset.
• For any AI project to be efficient, the training data should be reliable,
authentic and relevant to the problem statement scoped.
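The split of acquired data into a training dataset and a testing dataset can be sketched in Python. The dataset below and the 80/20 ratio are made-up illustrations, not from the source:

```python
import random

# Hypothetical dataset: (years_of_experience, salary_in_thousands) records
data = [(1, 30), (2, 35), (3, 41), (4, 47), (5, 52),
        (6, 58), (7, 64), (8, 69), (9, 75), (10, 80)]

random.seed(42)      # fixed seed so the split is reproducible
random.shuffle(data)

split = int(0.8 * len(data))   # a common 80/20 split
train_set = data[:split]       # used to train the model
test_set = data[split:]        # held back to test predictions

print(len(train_set), len(test_set))  # 8 2
```

The shuffle before splitting matters: without it, the model might train only on low-experience records and be tested only on high-experience ones.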
Identifying required data - Data feature
• If we want to predict the salary of a person, what data would we
need?
• current salary, years of experience, annual increment etc.
• These are data related to the problem we are trying to solve
• Data features refer to the type of data you want to collect which will
help you solve the problem at hand
• In other words, the parameters that affect a given problem will be the
data features
Once we identify the data to be collected, we identify the sources for
data
Sources of Data
• Surveys - Ask a sample set of people and collect data
• Web Scraping - An application which reads HTML files and looks for
relevant data and collects it
• Sensors - Collect data from sensors - for e.g. temperature, pressure
etc.
• Cameras - Collect data as images
• Observations - Collect data from reports or experiments
• API - Application programming interface - our AI program can collect
data from another company's database using a function
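A web scraper essentially reads HTML and pulls out the relevant pieces. Below is a minimal sketch using Python's built-in html.parser, with a made-up in-memory HTML snippet standing in for a page a real scraper would first download:

```python
from html.parser import HTMLParser

# Hypothetical fragment of a weather page; a real scraper would
# fetch this HTML from a website first.
page = "<ul><li class='temp'>31</li><li class='temp'>29</li></ul>"

class TempScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_temp = False
        self.temps = []

    def handle_starttag(self, tag, attrs):
        # Only look inside <li class="temp"> elements
        self.in_temp = tag == "li" and ("class", "temp") in attrs

    def handle_data(self, data):
        if self.in_temp:
            self.temps.append(int(data))

scraper = TempScraper()
scraper.feed(page)
print(scraper.temps)  # [31, 29]
```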
Things to note when deciding source of data
• Collecting data from random websites is not correct - Data may not
be reliable and authentic in such cases
• Collecting other people's data without consent breaches the privacy
of others and is therefore not right.
• Extracting private data can be an offence
• One of the most reliable and authentic sources of information are the open-sourced websites hosted by the government. These government portals have general information collected in a suitable format which can be downloaded and used wisely.
• Some of the open-sourced Govt. portals are: data.gov.in, india.gov.in
Data Exploration
• Data is a complex entity
• We cannot make sense of data just by looking at it
• We have to find patterns in the data to understand it
• Data Exploration is the stage where we clean the data and try to understand it by identifying patterns in it.
• To identify patterns or analyse the data we use Data visualization.
• Visualization helps to:
• Quickly get a sense of the trends, relationships and patterns contained within the data
• Define the strategy for which model to use at a later stage
• Communicate the findings to others effectively
To visualise data, we can use various types of visual representations.
Visualization using Charts
• Bar Chart - Categories vs Numbers (e.g. sales in different months - month is the category, sales is the number)
• Histogram - Ranges vs Numbers (e.g. number of students who fall in each height range)
• Pie chart - Categories vs Numbers (e.g. amount spent on purchasing different items)
• Line chart - Numbers vs Numbers (e.g. weight of a child vs age of the child)
• Scatter plot - Numbers vs Numbers
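As a rough illustration of the bar-chart idea (category vs number), here is a plain-text sketch in Python with hypothetical monthly sales figures; in practice a charting library would draw the real chart:

```python
# Hypothetical monthly sales figures (category vs number -> bar chart)
sales = {"Jan": 12, "Feb": 18, "Mar": 9, "Apr": 15}

def text_bar_chart(data):
    """Return one text line per category, using '#' marks as the bar."""
    lines = []
    for label, value in data.items():
        lines.append(f"{label} | {'#' * value} {value}")
    return lines

for line in text_bar_chart(sales):
    print(line)
```

Even this crude chart makes the pattern (February is the best month, March the worst) visible at a glance, which is the point of visualization.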
Modelling
• When we want machines to identify patterns, data needs to be in its most basic form (0s and 1s)
• Machines use mathematical representations to identify patterns in the data (a mathematical formula for the graphical representation identified during data exploration)
• The ability to mathematically describe the relationship between
parameters is the heart of every AI model.
• Modelling refers to the mathematical approach towards analysing
data
Types of Models
Rule based model
• Rules are defined by the developer
• The machine follows the rules or instructions mentioned by the
developer and performs its task accordingly
• In the rule-based approach, we feed the data along with the rules to the machine; after getting trained on them, the machine is able to predict answers for similar inputs
Drawbacks of rule based model
• A drawback of this approach is that the learning is static. Once trained, the machine does not take into consideration any changes made in the original training dataset.
• Once trained, the model cannot improve itself on the basis of feedback
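The rule-based approach can be sketched as a grading function in Python. The thresholds below are hypothetical rules written by the developer, and they stay fixed no matter what new data arrives, which is exactly the static-learning drawback described above:

```python
def grade(marks):
    """Developer-written rules: these hypothetical cut-offs never
    change, regardless of any new data the system later sees."""
    if marks >= 90:
        return "A"
    elif marks >= 75:
        return "B"
    elif marks >= 50:
        return "C"
    else:
        return "D"

print([grade(m) for m in (95, 80, 60, 32)])  # ['A', 'B', 'C', 'D']
```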
Learning based model
• A learning-based model is one where the machine learns by itself. Under the learning-based approach, the AI model gets trained on the data fed to it and is then able to design a model which is adaptive to changes in the data.
• Three types of learning based models:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
Supervised Learning
• In a supervised learning model, the dataset which is fed to the
machine is labelled.
• A label is some information which can be used as a tag for data
• The intelligent system will try to identify patterns among the labels and form rules by itself
There are two types of supervised learning models:
• Classification
• Regression
Classification - Identifying the label for input data
• For example, in a grading system, students are classified as belonging to different grades based on their marks in the examination.
• In classification, the marks and grades of a few students are given to the system. The system identifies the rule for grading and allots a grade to a student based on marks.
• (i.e.) Input data is classified or grouped into one of the labels.
• This model works on a discrete dataset
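One simple way classification can work is sketched below: a 1-nearest-neighbour rule in Python that assigns a new student the grade of the closest labelled example. The marks and grades are made up, and real systems use more sophisticated models:

```python
# Labelled training data: hypothetical (marks, grade) pairs for a few students
labelled = [(95, "A"), (88, "A"), (72, "B"), (65, "B"), (40, "C")]

def classify(marks):
    """Assign the label of the closest known example (1-nearest neighbour)."""
    nearest = min(labelled, key=lambda pair: abs(pair[0] - marks))
    return nearest[1]

print(classify(90))  # 'A' - the closest labelled example is (88, "A")
print(classify(50))  # 'C' - the closest labelled example is (40, "C")
```

Note that no grading thresholds were written by the developer; the rule emerges from the labelled examples, which is the key difference from the rule-based approach.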
Regression - Predicting continuous numerical values
• If you wish to predict your next salary, you would put in the data of your previous salary, any increments, etc., and train the model.
• Here, the data fed to the machine is continuous (decimal numbers)
• Models that predict the output as numerical values use regression
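A minimal regression sketch in Python: fitting a straight line by least squares to a hypothetical salary history and predicting a continuous value for the next year:

```python
# Hypothetical salary history: (years_of_experience, salary_in_thousands)
history = [(1, 30.0), (2, 35.5), (3, 41.0), (4, 46.5)]

n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n

# Least-squares fit of the line: salary = slope * years + intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in history)
         / sum((x - mean_x) ** 2 for x, _ in history))
intercept = mean_y - slope * mean_x

# Predict the (continuous) salary after 5 years of experience
print(round(slope * 5 + intercept, 1))  # 52.0
```

Unlike classification, the output is not one of a fixed set of labels but any point on the fitted line.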
Unsupervised learning
• An unsupervised learning model works on unlabelled dataset
• The unsupervised learning models are used to identify relationships,
patterns and trends out of the data which is fed into it.
• The person training the system does not have any information about
the data
• After training, the machine comes up with patterns which it was able to identify in the data.
• The machine might come up with patterns which are already known to the user, or something very unusual.
Unsupervised models are classified into two categories:
• Clustering
• Dimensionality Reduction
Clustering and Dimensionality reduction
• Clustering refers to an unsupervised learning model which clusters unknown data according to patterns or trends identified by the machine itself.
• A picture or an image is a 2D representation of our 3D world. When
we take a picture of a ball, we can see only the circle not a sphere. We
can see only one side of the ball in the picture. Even though we do
not see the other side of the ball, we know it is a ball.
• Dimensionality reduction is all about reducing the dimensions of the
data collected and still make sense out of the data.
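Clustering can be sketched with a tiny k-means routine in Python on made-up, unlabelled numbers; the machine groups the points without ever being told what the groups mean:

```python
# Unlabelled data: hypothetical daily temperatures from two seasons, mixed together
points = [10, 11, 12, 30, 31, 32]

def kmeans_1d(data, c1, c2, steps=10):
    """Tiny k-means with k=2: assign each point to the nearer centre,
    then move each centre to the mean of its assigned points."""
    for _ in range(steps):
        a = [p for p in data if abs(p - c1) <= abs(p - c2)]
        b = [p for p in data if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(a) / len(a), sum(b) / len(b)
    return sorted(a), sorted(b)

cluster1, cluster2 = kmeans_1d(points, c1=0, c2=40)
print(cluster1, cluster2)  # [10, 11, 12] [30, 31, 32]
```

The algorithm was never told "cold" or "hot"; it discovered the two groups purely from the structure of the data, which is what makes it unsupervised.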
Evaluation
• Once a model has been made and trained, it needs to go through
proper testing so that one can calculate the efficiency and
performance of the model.
• Hence, the model is tested with the help of Testing Data (separated
out of the acquired dataset during Data Acquisition)
• Efficiency of the model is calculated on the basis of:
• Accuracy
• Precision
• F1 Score
• Recall
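These four measures can be computed directly from a model's predictions on the testing data. A sketch in Python with hypothetical test results (1 = spam, 0 = not spam):

```python
# Hypothetical results on the testing dataset
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives

accuracy  = (tp + tn) / len(actual)            # fraction of all correct answers
precision = tp / (tp + fp)                     # of predicted positives, how many were right
recall    = tp / (tp + fn)                     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # balance of precision and recall

print(accuracy, precision, recall, round(f1, 3))
```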
Neural Networks
• Neural networks are modelled after the neurons in the human brain
• Neural networks extract data features on their own, without the need for developers to identify the data features that will help in solving the problem.
• They are a fast and efficient way to solve problems which have huge datasets, such as images
• Neural network has three layers – Input layer, Hidden layer, Output layer
• Input layer collects the data and feeds it to the hidden layer
• Hidden layer does the processing and sends the output to the output layer.
Hidden layer is not visible to the user. There can be any number of hidden
layers in a neural network. Each node in a hidden layer is a machine
learning algorithm. There can be any number of nodes in a hidden layer.
The last hidden layer passes the final processed data to the output layer.
• Output layer passes the data from the hidden layer to the user as the final
output
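A single forward pass through such a network (input layer, one hidden layer, output layer) can be sketched in Python. The weights below are made up for illustration; a real network learns them during training:

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Made-up weights: 2 inputs -> 2 hidden nodes -> 1 output node
hidden_weights = [[0.5, -0.6], [0.1, 0.8]]   # one row per hidden node
output_weights = [0.7, -0.3]

def forward(inputs):
    # Input layer just passes the data to the hidden layer
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in hidden_weights]
    # Output layer combines the hidden-layer results into the final answer
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

print(forward([1.0, 0.0]))
```

Each hidden node computes a weighted sum of its inputs and applies a small function to it; stacking more hidden layers simply repeats this step.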
Structure of a Neural network
(diagram: Input Layer → Hidden Layer → Output Layer)
