
VISION AI

Abstract— Vision AI is the automated extraction of information from images. Information can mean anything from 3D models, camera position, object detection and recognition to grouping and searching image content. Here we take a wide definition of computer vision and include topics such as image warping, de-noising, and augmented reality. Sometimes vision AI tries to mimic human vision, sometimes it takes a data-driven, statistical approach, and sometimes geometry is the key to solving a problem.

1. INTRODUCTION

As humans, we are capable of understanding and describing a scene encapsulated in an image. This involves much more than detecting four people in the foreground, one street, and several cars, as in the image below.

Fig. 1

Aside from that basic information, we are able to understand that the people in the foreground are walking, that one of them is barefoot — a curious thing — and we even know who they are. We can reasonably infer that they are not in danger of being hit by a car and that the white Volkswagen is poorly parked. A human would also have no problem describing the clothes they are wearing and, in addition to indicating the color, guessing at the material and texture of each outfit.

These are also the skills a vision AI system needs. In a few words, the main problem solved by computer vision can be summarized as follows: given a two-dimensional image, a vision AI system must recognize the objects present and their characteristics, such as shapes, textures, colors, sizes, and spatial arrangement, in order to provide as complete a description of the image as possible.

2. THE VISION AI

Vision AI is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do.

Vision AI tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

The scientific discipline of vision AI is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, multi-dimensional data from a 3D scanner, or output from a medical scanning device. The technological discipline of vision AI seeks to apply its theories and models to the construction of computer vision systems.

3. WORKING

When a machine sees the image of a person, we feed the input to a network of neurons, and the model extracts the features of that person's face by applying a series of operations; once those features are extracted, we apply a convolutional neural network so that the model learns the face.
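To make the image-as-matrix idea concrete, here is a minimal sketch (using NumPy; the tiny synthetic image and the standard luminosity weights are illustrative assumptions, not part of the original text) of treating an RGB image as a 3D matrix of pixel values and collapsing it to a single grayscale channel:

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB matrix to an H x W grayscale matrix
    using the common luminosity weighting (an assumed choice)."""
    weights = np.array([0.299, 0.587, 0.114])  # R, G, B contributions
    return rgb @ weights

# A tiny synthetic 2 x 2 "image" stands in for a real photo.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.float64)

gray = to_grayscale(image)
print(gray.shape)  # (2, 2): one value per pixel instead of three
```

The grayscale matrix is what the text later calls the image's "features": a single number per pixel that downstream layers can process more quickly than three channels.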
A machine reads the image as a matrix of numbers, typically a 3D matrix in RGB format. The features of the image are simply the pixel values of this matrix. We convert the matrix into grayscale format (from three channels to one) so that the features can be extracted in a short time and training takes less time. Once that is done, the machine feeds the result into a dense neural network and lets it finish training on the features of that face. At last the machine has learned the features of that face, and the model is trained.

Fig. 2

Simply put, vision AI mimics natural processes: it retrieves visual information, handles it, and interprets it. The state-of-the-art algorithms used for vision AI tasks, so-called neural nets, replicate natural neural networks. It is all about the data in vision AI: if the data is not proper, there will be no accuracy. The better the data, the better the result.

4. FEATURES OF VISION AI

Vision AI is being used today in a wide variety of real-world applications, which include:

4.1 Optical character recognition (OCR): reading handwritten postal codes on letters and automatic number plate recognition (ANPR);

Fig. 3

4.2 Machine inspection: rapid parts inspection for quality assurance, using stereo vision with specialized illumination to measure tolerances on aircraft wings or auto body parts, or looking for defects in steel castings using X-ray vision;

Fig. 4

4.3 Retail: object recognition for automated checkout lanes;

Fig. 5

4.4 3D model building (photogrammetry): fully automated construction of 3D models from aerial photographs, used in systems such as Bing Maps;

4.5 Medical imaging: registering pre-operative and intra-operative imagery, or performing long-term studies of people's brain morphology as they age;

Fig. 8

4.6 Automotive safety: detecting unexpected obstacles such as pedestrians on the street, under conditions where active vision techniques such as radar or lidar do not work well; autonomous driving is also included.

Fig. 6

4.7 Match move: merging computer-generated imagery (CGI) with live-action footage by tracking feature points in the source video to estimate the 3D camera motion and the shape of the environment.

Fig. 7

5. TECHNIQUES OF VISION AI

5.1 IMAGE CLASSIFICATION

Given a set of images that are all labeled with a single category, we are asked to predict these categories for a novel set of test images and to measure the accuracy of the predictions. There are a variety of challenges associated with this task, including viewpoint variation, scale variation, intra-class variation, image deformation, image occlusion, illumination conditions, and background clutter.

The most popular architecture used for image classification is the Convolutional Neural Network (CNN). A typical use case for CNNs is one where you feed the network images and the network classifies the data. CNNs tend to start with an input "scanner" that is not intended to parse all the training data at once; for example, to input an image of 100 x 100 pixels, you would not want a layer with 10,000 nodes.

Fig. 9
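As a minimal sketch of the CNN idea described above (pure NumPy, with random untrained weights; the layer sizes and the single filter are illustrative assumptions), one convolution, pooling, and classification pass might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# An 8x8 grayscale "image" stands in for real input data.
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))   # one convolutional filter (here random)
weights = rng.standard_normal((9, 3))  # dense layer: 9 features -> 3 classes

features = np.maximum(conv2d(image, kernel), 0)  # convolution + ReLU: 6x6
pooled = max_pool(features)                      # pooled feature map: 3x3
probs = softmax(pooled.flatten() @ weights)      # class probabilities
```

Note how the "scanner" works: the 3x3 kernel visits small patches rather than connecting every input pixel to its own node, which is exactly why a 100 x 100 input does not need a 10,000-node first layer.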
5.2 OBJECT DETECTION

Object detection allows us to identify and locate objects in an image or video. With this kind of identification and localization, object detection can be used to count objects in a scene and to determine and track their precise locations, all while accurately labeling them. Object detection is commonly confused with image recognition, so it is necessary to distinguish between them: image recognition assigns a label to an image, whereas object detection detects an object and draws a bounding box around it.

Fig. 10

5.3 OBJECT TRACKING

Object tracking refers to the process of following a specific object of interest, or multiple objects, in a given scene. It traditionally has applications in video and real-world interactions where observations are made following an initial object detection. It is now crucial to autonomous driving systems such as the self-driving vehicles from companies like Uber and Tesla.

Fig. 11

Object tracking methods can be divided into two categories according to the observation model: generative methods and discriminative methods. A generative method uses a generative model to describe the object's appearance and searches for the object by minimizing the reconstruction error, as in PCA-based approaches.

5.4 SEMANTIC SEGMENTATION

Central to vision AI is the process of segmentation, which divides whole images into pixel groupings that can then be labelled and classified. In particular, semantic segmentation tries to understand the role of each pixel in the image (e.g., is it a car, a motorbike, or some other class?). For example, in the picture above, apart from recognizing the person, the road, the cars, the trees, etc., we also have to delineate the boundaries of each object. Therefore, unlike classification, we need dense pixel-wise predictions from our models.

Fig. 12
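The dense pixel-wise prediction just described can be sketched in a few lines (NumPy; the 3-class random score maps are a stand-in for a real segmentation network's output):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend a segmentation network produced per-class score maps
# for a 4x4 image and 3 classes (e.g., road, car, background).
scores = rng.random((3, 4, 4))  # shape: (num_classes, height, width)

# Dense prediction: every pixel gets the label of its highest-scoring class.
label_map = scores.argmax(axis=0)

print(label_map.shape)  # (4, 4): one class index per pixel
```

Unlike whole-image classification, which emits a single label, the output here has the same spatial resolution as the input: one decision per pixel.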

5.5 INSTANCE SEGMENTATION

Beyond semantic segmentation, instance segmentation segments the different instances of each class, such as labelling five cars with five different colors. In classification, there is generally an image with a single object as the focus, and the task is to say what that image is. But in order to segment instances, we need to carry out a far more complex task: we see complicated sights with multiple overlapping objects and different backgrounds, and we not only classify these different objects but also identify their boundaries, differences, and relations to one another.

Fig. 13

6. HISTORY

1970s. When vision AI first started out in the early 1970s, it was viewed as the visual perception component of an ambitious agenda to mimic human intelligence and to endow robots with intelligent behavior. At the time, it was believed by some of the early pioneers of artificial intelligence and robotics (at places such as MIT, Stanford, and CMU) that solving the "visual input" problem would be an easy step along the path to solving more difficult problems such as higher-level reasoning and planning. According to one well-known story, in 1966 Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to "spend the summer linking a camera to a computer and getting the computer to describe what it saw" (Boden 2006, p. 781). We now know that the problem is slightly more difficult than that. What distinguished vision AI from the already existing field of digital image processing (Rosenfeld and Pfaltz 1966; Rosenfeld and Kak 1976) was a desire to recover the three-dimensional structure of the world from images and to use this as a stepping stone towards full scene understanding.

1980s. In the 1980s, a lot of attention was focused on more sophisticated mathematical techniques for performing quantitative image and scene analysis.

1990s. Multiplex recording devices were introduced, together with covert video surveillance for ATMs.

2000s. Two researchers at MIT introduced the first face detection framework (Viola-Jones) that works in real time. Google started testing robot cars on roads and released Goggles, an image recognition app for searches based on pictures taken by mobile devices. To help tag photos, Facebook began using facial recognition, and facial recognition was used to help confirm the identity of Osama bin Laden after he was killed in a US raid. Google Brain's neural network recognized pictures of cats using a deep learning algorithm, and Google launched the open-source machine learning system TensorFlow.

7. DISTINGUISHING VISION AI FROM RELATED FIELDS

It is important to understand that vision AI accomplishes much more than other fields such as image processing or machine vision, with which it shares several characteristics. Let's have a look at the differences between the fields.

Image processing
Image processing is focused on processing raw images to apply some kind of transformation. Usually, the goal is to improve images or to prepare them as input for a specific task, while in vision AI the goal is to describe and explain images. For instance, noise reduction, contrast, or rotation operations, typical components of image processing, can be performed at the pixel level and do not need a complex grasp of the image that allows for some understanding of what is happening in it.

Machine vision

This is a particular case where vision AI is used to perform some action, typically on production or manufacturing lines. In the chemical industry, machine vision systems can help with the manufacturing of products by checking the containers in the line (are they clean, empty, and free of damage?) or by checking that the final product is properly sealed.

Vision AI

Vision AI can solve more complex problems, such as facial recognition (used, for example, by Snapchat to apply filters), detailed image analysis that allows for visual searches like the ones Google Images performs, or biometric identification.

8. CHALLENGES OF VISION AI

Vision AI is difficult to implement because:

8.1 CLOUD IS NOT ENOUGH FOR MOST USE CASES

Artificial Intelligence is present in many areas of our lives, providing visible improvements to the way we discover information, communicate, or move from point A to point B. AI adoption is rapidly increasing not only in consumer areas such as digital assistants and self-driving vehicles, but across all industries, disrupting whole business models and creating new opportunities to generate new sources of customer value.

Focusing on visual AI, the number of use cases for applying AI that performs at human level or better is increasing exponentially, given the fast-paced advances in machine learning. Visual AI encompasses techniques used in the image processing industry to solve a wide range of previously intractable problems by using vision AI and deep learning. However, high innovation potential does not come without challenges.

AI inference requires a considerable amount of processing power, especially for real-time, data-intensive applications. AI solutions can be deployed in cloud environments (Amazon AWS, Google GCP, Microsoft Azure) in order to take advantage of simplified management and scalable computing assets. Nevertheless, in most circumstances the cloud is not an adequate environment for deploying Artificial Intelligence:

- What if your solution needs to run in real time and requires fast response times?
- How do you operate a system that is mission critical and running off-grid?
- How do you handle the high operating costs of analyzing massive data in the cloud?
- What about data privacy when sending and storing video material in the cloud?

Therefore, vision AI solutions will need to be deployed on edge endpoints for most use cases. This allows the data to be processed where it is captured, while only the results (light data) are sent back to the cloud for further analysis.

8.2 MOVING TO THE EDGE REQUIRES HARDWARE KNOWLEDGE
For most use cases, deploying AI solutions on edge devices is the only reasonable way to solve the challenge. A fine example is a farming analytics system that has to capture and run inference on 30 images per second per camera. For an average setup of 100 cameras, we get a volume of 259.2 million images per day, which is highly inefficient to process in the cloud.

The best option for this use case is to run inference in real time at the edge and analyse the data where it is being generated, communicating only key data points to the cloud backend for aggregation and further analysis. Edge computing is considered to be a current key trend in the IT industry.
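The volume figure in the farming example can be verified with a one-line computation (the 86,400 seconds per day is the only constant added here):

```python
# 30 images/s per camera, 100 cameras, 86,400 seconds in a day
images_per_day = 30 * 100 * 86_400
print(images_per_day)  # 259200000, i.e. 259.2 million images per day
```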

Considering the rapid growth of AI inference capabilities in edge hardware platforms (Intel NUC, Intel NCS, Nvidia Jetson, ARM Ethos), transferring the processing requirements from cloud to edge becomes a very attractive option for a wide range of businesses.

8.3 COMPLEXITY LETS MOST ORGANIZATIONS FAIL AT SCALING VISUAL AI

Even with the promise of great hardware support for edge deployments, developing a visual AI solution remains a complex process. In a traditional approach, several of the following building blocks may be necessary for developing your solution at scale:

- Collecting input data specific to the problem
- Expertise with deep learning tools such as TensorFlow, PyTorch, Keras, Caffe, and MXNet for training and evaluating deep learning models
- Selecting the appropriate hardware (e.g. Intel, NVIDIA, ARM) and software platforms (e.g. Linux, Windows, Docker, Kubernetes) and optimizing DL models for the deployment environment
- Managing deployments to thousands of distributed edge devices
- Managing updates, data analysis, and real-time insights
- Knowledge of data privacy and security best practices

There is a high level of development risk associated with this approach, especially when considering development time, the required domain experts, and the difficulty of building a scalable infrastructure. Fortunately, there is a finer path to making your vision a reality, one that allows you to reduce your development costs and time to market by an order of magnitude.

9. SOLUTIONS

Vision AI problems can be solved using machine learning algorithms, since machine learning has expanded computers' ability to understand images and extract information from visual data. The visual programming approach described below can also play a vital role in solving vision AI challenges.

Visual Programming: use a visual approach to build complex visual AI solutions on the fly. The visual programming approach can reduce development time by over 90%; it not only greatly reduces the effort of writing code from scratch, but also gives visibility into how the vision application works.

9.1 Device Management: add and manage thousands of edge devices easily, no matter the device type and architecture (amd64, aarch64, ...). Create a device image and flash it to your device to make it appear in your workspace. Check device health metrics and online or deployment statuses without writing a single line of code.

9.2 Deployment Management: use an integrated device management tool to enroll and manage endpoint devices. Deploy AI applications to numerous edge devices at the click of a button. While you focus on algorithm development, deployment, versioning, and device management are taken care of for you.

9.3 Modular Approach: benefit from many pre-existing software modules to build your own use case. Viso Suite provides the most common deep learning models for object detection, image classification, object segmentation, and keypoint detection off the shelf. Select a suitable model and create your application with thousands of ready-to-choose logic modules.

9.4 Flexibility where needed: add your own algorithms and code where needed for your custom vision AI solution. Build code only where it does not exist yet, and get to market 10x faster and with very limited risk.

10. APPLICATIONS

Vision AI technology today is powered by deep learning algorithms that use a special kind of neural network, the convolutional neural network (CNN), to make sense of images. These neural networks are trained using thousands of sample images, which helps the algorithm understand and break down everything contained in an image. The networks scan images pixel by pixel to identify patterns and "memorize" them. A network also memorizes the ideal output that it should produce for each input image (in the case of supervised learning), or classifies components of images by scanning characteristics such as contours and colors. This memory is then used by the system as a reference while scanning more images, and with every iteration the AI system becomes better at providing the right output. The following are a few areas in which vision AI technology is being used or tested:

10.1 HEALTH CARE

Diagnosing diseases by analyzing images obtained from CT scans and other medical imaging processes. Vision AI is today assisting an increasing number of doctors to better diagnose their patients, monitor the evolution of diseases, and prescribe the right treatments. The technology also helps medical professionals save time on routine tasks and give more time to their patients.

10.2 SECURITY

Using biometric analysis such as retinal and fingerprint scanning to uniquely identify individuals for security purposes. As cameras and sensors proliferate, vision AI provides insights into public spaces and workplaces, airports and industrial sites, so that security and safety officers can make sense of a flood of images and data from both remote and high-traffic areas.

Fig. 14

10.3 MANUFACTURING

Inspecting manufacturing processes and finished products for non-conformance and defects. Vision AI helps the manufacturing process in several respects, such as time-efficiency, accuracy, repeatability, reduced costs, and post-pandemic value.
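A toy version of the inspection idea (a hypothetical NumPy sketch; the reference image, threshold, and pass/fail criterion are all illustrative assumptions, not a real inspection system) might compare each produced part against a known-good reference:

```python
import numpy as np

def inspect(part, reference, threshold=30, max_bad_pixels=5):
    """Flag a part as defective when too many pixels deviate
    from the known-good reference image."""
    deviation = np.abs(part.astype(int) - reference.astype(int))
    bad_pixels = int((deviation > threshold).sum())
    return bad_pixels <= max_bad_pixels

reference = np.full((8, 8), 128, dtype=np.uint8)  # stand-in "good part" image

good_part = reference.copy()
bad_part = reference.copy()
bad_part[:3, :3] = 255  # a bright 3x3 blemish: 9 deviating pixels

print(inspect(good_part, reference))  # True  (passes inspection)
print(inspect(bad_part, reference))   # False (too many deviating pixels)
```

Real inspection systems replace the simple pixel difference with learned models, but the structure (compare, score, threshold) is the same.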

10.4 TRANSPORTATION

Guiding autonomous vehicles by identifying obstacles, people, and road signs along the way. Self-driving vehicles are one of the best examples: in order to navigate safely through the streets, they must be able to identify the obstacles, signposts, other vehicles, and any person that may come into their path.

Fig. 15

11. FUTURE

With further research on and refinement of the technology, the future of vision AI will see it perform a broader range of functions. Not only will vision AI technologies be easier to train, but they will also be able to discern more from images than they do now. Vision AI can also be used in conjunction with other technologies or other subsets of AI to build more potent applications. For instance, image captioning applications can be combined with natural language generation (NLG) to interpret objects in the surroundings for visually challenged people. Vision AI will also play a vital role in the development of artificial general intelligence (AGI) and artificial super intelligence (ASI) by giving them the ability to process information as well as, or even better than, the human visual system.

Fig. 16

12. CONCLUSION

Based on the above discussion, it is clear that vision AI technology will continue to be a very useful tool in tackling a wide variety of challenges. Considering the capabilities of present-day vision AI, it is not hard to believe that there are many more benefits and applications of the technology that remain unexplored. The future of vision AI will pave the way for artificial intelligence systems that are as human as us. However, before doing so, there are a few challenges that must be overcome, the biggest of them being the demystification of the black box of AI. That is because, just like other deep learning applications, vision AI, while being functionally effective, is undecipherable when it comes to its inner workings.

13. REFERENCES

[1] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach.
[2] https://insights.daffodilsw.com/blog/vision-ai-what-is-it-and-why-does-it-matter
[3] https://www.usmsystems.com
[4] J. E. Solem, Programming Computer Vision with Python.
[5] https://www.forbes.com/sites/cognitiveworld/2019/06/26/the-present-and-future-of-computer-vision/?sh=70b13835517d
[6] https://www.sciencedirect.com/topics/food-science/computer-vision-technology
[7] https://viso.ai/deep-learning/why-computer-vision-is-difficult/
[8] https://tryolabs.com/resources/introductory-guide-computer-vision
