
AI Project cycle

The AI project cycle mainly has 5 stages:


1. Problem scoping
2. Data acquisition
3. Data exploration
4. Modelling
5. Evaluation/Testing
 Problem scoping: It is the stage of the AI project cycle in which you define the goal/aim of the problem that you want to solve with the help of the project.
 Problem scoping is the first stage of any AI project cycle. To find out the goal of the project, you need to understand the 4Ws canvas of the AI project cycle:
 WHO has the problem?
 WHAT is the problem?
 WHERE does this problem arise?
 WHY is the problem worth solving?

 Data acquisition: It is the second stage of the AI project cycle, in which data is collected for the scoped problem. This data has to be reliable, authentic and efficient.
 As the term clearly mentions, this stage is about acquiring data for the project. Data can be a piece of information or facts and statistics collected together for reference or analysis. Whenever we want an AI project to be able to predict an output, we need to train it first using data.

 As we know, there are mainly two types of data: training data and testing data.
 Training data: It is the initial set of data used to help a program understand how to apply technologies like neural networks to learn and produce mature results.

 Testing data: It is the data used to assess the AI machine for its efficiency and performance.
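A minimal sketch of how a dataset is commonly split into these two types (the 80/20 ratio, the toy marks/pass data and the use of scikit-learn's train_test_split are assumptions made for illustration):

from sklearn.model_selection import train_test_split

# Hypothetical dataset: each row is [marks secured], each label is pass (1) or fail (0).
X = [[35], [42], [48], [55], [62], [70], [78], [84], [90], [95]]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# 80% of the data becomes training data; the remaining 20% is kept aside
# as testing data to assess efficiency and performance later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), "training samples,", len(X_test), "testing samples")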

 Data Features: Data features refer to the type of data you want to
collect.

 The data can be collected from various sources like:
 Newspaper.
 Web scraping.
 Camera.
 Surveys.
 Observation.
 Application program interface (API).
 Questionnaires.
Sometimes, you use the internet and try to acquire data for your project from some random websites. Such data might not be authentic, as its accuracy cannot be proved. Due to this, it becomes necessary to find a reliable source of data from where some authentic information can be taken. At the same time, we should keep in mind that the data which we collect is open-sourced and not someone's property. Extracting private data can be an offence. One of the most reliable and authentic sources of information is the open-sourced websites hosted by the government. These government portals have general information collected in a suitable format which can be downloaded and used wisely.
Some of the open-sourced Govt. portals are: data.gov.in, india.gov.in, nic.in, incometaxindia.gov.in, etc.
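As a rough sketch of this stage (the file name, its columns and the use of the pandas library are assumptions made for illustration), data downloaded from an open portal such as data.gov.in could be loaded for further work like this:

import pandas as pd

# Hypothetical CSV file downloaded from an open government portal
# such as data.gov.in; the file name and its columns are assumptions.
data = pd.read_csv("rainfall_statistics.csv")

# A quick look at the acquired data and its features (columns).
print(data.head())      # first few records
print(data.columns)     # the data features collected
print(data.shape)       # number of records and number of features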
Data exploration: It is the third stage of the AI project cycle, in which the data is put into a meaningful form so that it can be analyzed.
For analyzing the data, the data needs to be visualized to find the relationships between the data, get the patterns out of it and sense the trends.
Data visualization: It is a tool for pictorial representation of the data in the form of bar graphs, double bar graphs, pie charts, histograms, line graphs, double line graphs, etc.
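A small sketch of data visualization (the month-wise numbers are made up and matplotlib is an assumed choice of library):

import matplotlib.pyplot as plt

# Hypothetical data: number of elephant sightings recorded each month.
months = ["Jan", "Feb", "Mar", "Apr"]
sightings = [4, 7, 2, 9]

# A simple bar graph to spot patterns and trends in the data.
plt.bar(months, sightings)
plt.xlabel("Month")
plt.ylabel("Sightings")
plt.title("Elephant sightings per month")
plt.show()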
Modelling: It refers to developing algorithms/models which are trained in such a manner that they generate effective, efficient and reliable results.

Rule Based Approach: It refers to the AI modelling where the rules are defined by the developer. The machine follows the rules or instructions mentioned by the developer and performs its task accordingly. For example, we have a dataset which tells us about the conditions on the basis of which we can decide if an elephant may be spotted or not while on safari. The parameters are: Outlook, Temperature, Humidity and Wind.
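A minimal sketch of such a rule-based model (the notes only name the four parameters, so the specific rules and threshold values below are assumptions):

# Rule-based model for the elephant-spotting example: the machine only
# follows rules written by the developer; the exact rules are assumptions.
def elephant_may_be_spotted(outlook, temperature, humidity, wind):
    if outlook == "Rain":
        return False                    # rule: no sighting expected in rain
    if humidity == "High" and wind == "Strong":
        return False                    # rule: harsh conditions, elephants stay away
    if outlook == "Sunny" and temperature != "Hot":
        return True                     # rule: pleasant weather, sighting likely
    return temperature == "Mild"        # fallback rule defined by the developer

print(elephant_may_be_spotted("Sunny", "Mild", "Normal", "Weak"))   # True
print(elephant_may_be_spotted("Rain", "Cool", "High", "Strong"))    # False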
Learning Based Approach: It refers to the AI modelling where the machine learns by itself. Under the Learning Based Approach, the AI model gets trained on the data fed to it and is then able to design a model which is adaptive to the change in data. That is, if the model is trained with X type of data and the machine designs the algorithm around it, the model would modify itself according to the changes which occur in the data so that all the exceptions are handled.
 The learning-based approach can further be divided into three parts:
Supervised learning: In a supervised learning model, the dataset which is fed to the machine is labeled. In other words, we can say that the dataset is known to the person who is training the machine; only then is he/she able to label the data. A label is some information which can be used as a tag for the data. For example, students get grades according to the marks they secure in examinations. These grades are labels which categorize the students according to their marks.
There are two types of Supervised Learning models:
 Classification: Where the data is classified according to the labels. For example, in the grading system, students are classified on the basis of the grades they obtain with respect to their marks in the examination. This model works on a discrete dataset, which means the data need not be continuous. (A small sketch of both model types follows after this list.)

 Regression: Such models work on continuous data. For example, if you wish to predict your next salary, then you would put in the data of your previous salary, any increments, etc., and would train the model. Here, the data which has been fed to the machine is continuous.
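A rough sketch of both types of supervised model (the marks, grades and salary figures are made up, and the choice of scikit-learn models is an assumption):

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification (discrete labels): grading students according to their marks.
marks = [[35], [48], [62], [75], [88], [95]]
grades = ["D", "C", "B", "B", "A", "A"]
classifier = DecisionTreeClassifier()
classifier.fit(marks, grades)
print(classifier.predict([[80]]))    # predicts a discrete grade label

# Regression (continuous data): predicting the next salary from past salaries.
years = [[1], [2], [3], [4], [5]]
salary = [30000, 35000, 41000, 46000, 52000]
regressor = LinearRegression()
regressor.fit(years, salary)
print(regressor.predict([[6]]))      # predicts a continuous salary value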
Unsupervised Learning:
An unsupervised learning model works on an unlabeled dataset. This means that the data which is fed to the machine is random, and there is a possibility that the person who is training the model does not have any information regarding it. Unsupervised learning models are used to identify relationships, patterns and trends in the data which is fed to them. They help the user understand what the data is about and what the major features identified by the machine in it are.
Unsupervised learning models can be further divided into two
categories:
Clustering: It refers to the unsupervised learning algorithm which can cluster unknown data according to the patterns or trends identified in it. The patterns observed might be ones which are already known to the developer, or the algorithm might even come up with some unique patterns of its own.
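A small sketch of clustering on unlabeled data (the measurements and the choice of k-means from scikit-learn are assumptions):

from sklearn.cluster import KMeans

# Unlabeled data: hypothetical (height in cm, weight in kg) measurements.
points = [[150, 45], [152, 48], [155, 50],
          [178, 80], [180, 85], [182, 88]]

# No labels are given; the algorithm groups the data by the patterns it finds.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(points)
print(model.labels_)    # cluster assigned to each point, e.g. [0 0 0 1 1 1]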
Dimensionality Reduction: We humans are able to visualize only up to 3 dimensions, but according to a lot of theories and algorithms, there are various entities which exist beyond 3 dimensions. For example, in Natural Language Processing, words are considered to be N-dimensional entities. This means that we cannot visualize them, as they exist beyond our visualization ability. Hence, to make sense of them, we need to reduce their dimensions. Here, a dimensionality reduction algorithm is used.
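A minimal sketch of dimensionality reduction (the 4-dimensional points are made up, and PCA from scikit-learn is an assumed choice of algorithm):

from sklearn.decomposition import PCA

# Hypothetical 4-dimensional data points that we cannot visualize directly.
points_4d = [[2.5, 2.4, 0.5, 1.1],
             [0.5, 0.7, 1.9, 2.2],
             [2.2, 2.9, 0.4, 1.0],
             [1.9, 2.2, 0.6, 1.3]]

# Reduce the data to 2 dimensions so it can be plotted and made sense of.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(points_4d)
print(points_2d.shape)    # (4, 2) -- each point now has only 2 dimensions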
Evaluation: Once a model has been made and trained, it needs to go through proper testing so that one can calculate the efficiency and performance of the model. Hence, the model is tested with the help of the Testing Data (which was separated out of the acquired dataset at the Data Acquisition stage), and the efficiency of the model is calculated on the basis of the parameters mentioned below:
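As one common way of scoring such a test (accuracy is used here as an assumed example parameter, and the labels are made up), the model's predictions on the testing data can be compared with the true labels like this:

from sklearn.metrics import accuracy_score

# Hypothetical true labels from the testing data and the model's predictions.
y_test = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Accuracy = correct predictions / total predictions (6 out of 8 = 0.75 here).
print(accuracy_score(y_test, y_pred))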
