Notes On COMPUTER VISION
Fig. 2
Fig. 4
4.3 Retail:
object recognition for automated checkout lanes
Fig. 3
Fig. 8
5 TECHNIQUES OF VISION AI
Fig. 7
4.7 Match move:
merging computer-generated imagery (CGI) with
live action footage by tracking feature points in the
source video to estimate the 3D camera motion and
shape of the environment.
Fig. 9
5.2 OBJECT DETECTION
Object detection allows us to identify and locate objects in an image or video. With this kind of identification and localization, object detection can be used to count the objects in a scene and to determine and track their precise locations, all while accurately labeling them. Object detection is commonly confused with image recognition, so it is necessary to distinguish between them. Image recognition assigns a label to an image. Object detection detects an object and draws a bounding box around it.
Fig. 10
Fig. 11
Object tracking methods can be divided into two categories according to the observation model: generative methods and discriminative methods. A generative method uses a generative model, such as PCA, to describe the appearance of the object and searches for the object by minimizing the reconstruction error.
5.4 SEMANTIC SEGMENTATION
Semantic segmentation tries to semantically understand the role of each pixel in the image (e.g. is it a car, a motorbike, or some other class?). For example, in the picture above, apart from recognizing the person, the road, the cars, the trees, etc., we also have to delineate the boundaries of each object. Therefore, unlike classification, we need dense pixel-wise predictions from our models.
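To make "dense pixel-wise predictions" concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score map per class; the class names, the toy scores, and the helper name segment are all hypothetical and chosen only for illustration. The sketch shows only the final step, where every pixel receives the label of its highest-scoring class, and it does not represent any particular segmentation network.

```python
from typing import List

# Hypothetical class names, used only for this illustration.
CLASS_NAMES = ["road", "car", "person"]


def segment(scores: List[List[List[float]]]) -> List[List[int]]:
    """Turn per-class score maps into a dense label map.

    scores[c][y][x] is the (assumed) model score for class c at pixel (y, x).
    Returns labels[y][x], the index of the highest-scoring class at that pixel.
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Per-pixel argmax over the class dimension.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        labels.append(row)
    return labels


# Toy 2x3 "image" with made-up scores, one 2x3 map per class.
toy_scores = [
    [[0.90, 0.80, 0.10], [0.70, 0.20, 0.10]],  # road
    [[0.05, 0.10, 0.20], [0.20, 0.70, 0.10]],  # car
    [[0.05, 0.10, 0.70], [0.10, 0.10, 0.80]],  # person
]

label_map = segment(toy_scores)
for row in label_map:
    print([CLASS_NAMES[i] for i in row])
```

Image classification would reduce the whole image to a single label, whereas here the output has the same height and width as the input, which is what makes the prediction dense.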
Fig. 12
Fig. 13
6 HISTORY
1970s. When vision AI first started out in the early 1970s, it was viewed as the visual perception component of an ambitious agenda to mimic human intelligence and to endow robots with intelligent behavior. At the time, it was believed by some of the early pioneers of artificial intelligence and robotics (at places such as MIT, Stanford, and CMU) that solving the “visual input” problem would be an easy step along the path to solving more difficult problems such as higher-level reasoning and planning. According to one well-known story, in 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to “spend the summer linking a camera to a computer and getting the computer to describe what it saw” (Boden 2006, p. 781). We now know that the problem is slightly more difficult than that. What distinguished vision AI from the already existing field of digital image processing (Rosenfeld and Pfaltz 1966; Rosenfeld and Kak 1976) was a desire to recover the three-dimensional structure of the world from images and to use this as a stepping stone towards full scene understanding.
1980s. In the 1980s, a lot of attention was focused on more sophisticated mathematical techniques for performing quantitative image and scene analysis.
2000s. Paul Viola and Michael Jones introduced the first face detection framework (Viola-Jones) that works in real time. Google started testing robot cars on roads. Google released Goggles, an image recognition app for searches based on pictures taken by mobile devices. To help tag photos, Facebook began using facial recognition. Facial recognition was used to help confirm the identity of Osama bin Laden after he was killed in a US raid. Google Brain’s neural network recognized pictures of cats using a deep learning algorithm. Google launched its open-source machine learning system, TensorFlow.
7 DISTINGUISHING VISION AI FROM RELATED FIELDS
It is important to understand that vision AI accomplishes much more than other fields such as image processing or machine vision, with which it shares several characteristics. Let’s have a look at the differences between the fields.
Image processing
Image processing is focused on processing raw images to apply some kind of transformation. Usually, the goal is to improve images or prepare them as an input for a specific task, while in vision AI the goal is to describe and explain images. For instance, noise reduction, contrast, or rotation operations, typical components of image processing, can be performed at pixel level and do not require any understanding of what is happening in the image.
Focusing on Visual AI, the number of use cases for applying AI that performs at human level or better is growing rapidly, given the fast-paced advances in Machine Learning.
Visual AI encompasses techniques used in the image processing industry to solve a wide range of previously intractable problems by using Vision AI and Deep Learning.
However, high innovation potential does not come without challenges.
10.2 SECURITY
Fig. 15
Fig. 16
11. FUTURE
With further research on and refinement of the technology, the future of vision AI will see it perform a broader range of functions. Not only will vision AI technologies be easier to train but also be able to discern more from images than they do now. This can also be used in conjunction with other technologies or other subsets of AI to build more potent applications. For instance, image captioning applications can be combined with natural language generation (NLG) to interpret the
12. CONCLUSION
9. REFERENCES
[1] Computer Vision: A Modern Approach. Book by David Forsyth and Jean Ponce.
[2] https://insights.daffodilsw.com/blog/vision-ai-what-is-it-and-why-does-it-matter
[3] https://www.usmsystems.com
[4] Programming Computer Vision With Python. Book by Jan Erik Solem.
[5] https://www.forbes.com/sites/cognitiveworld/2019/06/26/the-present-and-future-of-computer-vision/?sh=70b13835517d
[6] https://www.sciencedirect.com/topics/food-science/computer-vision-technology
[7] https://viso.ai/deep-learning/why-computer-vision-is-difficult/
[8] https://tryolabs.com/resources/introductory-guide-computer-vision