
FINAL PROJECT

Subject: Introduction to Digital Business &


Artificial Intelligence
Topic: a DB&AI project

Group: Ka de ét èn ti ti en ti
Lecturer: Nguyen Van Ho

Ho Chi Minh City, 12/2022


CHAPTER 1. INTRODUCTION
1.1. Overview of the problem
Artificial intelligence has been developed for more than 60 years, and its research
results have penetrated every aspect of our economy and society, with many
outstanding achievements. For instance, IBM's Deep Blue computer defeated the
world chess champion in 1997, marking the official arrival of artificial intelligence
(AI). In 2016, Google's AlphaGo beat the top professional Go player Lee Se-dol,
making artificial intelligence almost synonymous with the future. In 2017, China's
State Council officially established the national strategic aim for the development of
the new generation of AI by releasing its development plan for the new generation
of AI. AI has already shown its value in industries such as marketing, healthcare,
finance, and education. [1]
Facial recognition is one of the front-runner applications of AI. It is one of the
advanced forms of biometric authentication capable of identifying and verifying a
person. Biometric facial recognition is a form of AI that involves the automated
extraction, digitization, and comparison of the spatial and geometric distribution of
facial features to identify individuals. Using a digital photograph of a subject’s face, a
contour map of the position of facial features is converted into a digital template,
using an algorithm to compare an image of a face with one stored in a database.
Images can be collected from repositories of passport or driver’s license photographs,
or from the vast number of images that have been uploaded to social media sites and
the internet. Biometric facial recognition systems can be integrated with the closed-
circuit television systems that already exist in public and private spaces to identify
people in real-time (Smith et al., 2018). [2]
Research in automatic face recognition started in the 1960s, and investment in
facial recognition technology has increased steadily since. Venture funding in facial
recognition start-ups saw a massive uptick in 2021. With advancements in this
technology, significant progress has been made in this area, and several face
recognition systems have been developed and deployed. [3]
1.2. Reason for choosing the topic
AI facial recognition technology is extremely successful when applied in many fields.
1.2.1. For the economy
a. Face Unlock on Smart Devices
The birth of facial recognition was a turning point for the technology industry: the
phone industry in particular developed a type of facial security key for unlocking
devices.
A typical example of the explosion of AI facial recognition technology is Apple, a
technology company leading the era of intelligent facial recognition by applying AI
Face Recognition on its iPhone models. It is here that a fast-paced and progressive
AI technology race opened. (Note: Apple is not the first adopter, but it is the creator
of the global technology race.)
b. Advertising in companies
Few people can imagine the ability of facial recognition to create ads: it helps
corporations analyze and target specific audiences, delivering advertising programs
that are more effective and less expensive than mass advertising.
1.2.2. For education
a. Checking the presence of the student in class
Classes in schools and universities frequently include many students, and traditional
attendance-taking has many drawbacks. We may, however, use a face recognition
system in which students must undergo a face scan before entering class to prove
they are present. Compared to conventional attendance systems, this not only
assures accuracy but also saves time.
b. Ensuring students stay focused
The ESG School of Management in Paris is testing a facial security monitoring
solution in two online classes to ensure students are not distracted during class. Using
software called Nestor, each student's personal computer webcam analyzes eye
movements and facial expressions to find out if the student is concentrating on the
lecture videos.
1.2.3. For national security
a. Facial recognition aids in retail crime prevention
Facial recognition camera systems are presently being used to identify criminals;
they also assist store owners by surfacing an offender's history of fraud and
informing them so that they may act quickly to stop it. Along with data that is given
to or made instantly known to the business owner, the information also contains
personal photographs of offenders. The accomplishments of this technology are
evident in the significant decrease in the number of crimes in stores.
b. Looking for missing people
When someone goes missing, especially a child, facial recognition also works very
well to identify victims in kidnappings.
Although facial recognition technology still has many limitations and certain risks in
terms of information security, its great contributions to social life, as mentioned
above, are undeniable. We therefore decided to choose the project "AI Facial
Recognition" because of all these positive aspects. We hope to bring useful
information and lesser-known benefits of this technology to light, so that
researchers can properly target the research path to optimally develop this
technology, and customers can choose the right technology for their demands.
Face recognition is one of the most important abilities that we use in our daily lives.
There are several reasons for the growing interest in automated face recognition,
including rising concerns for public security, the need for identity verification for
physical and logical access, and the need for face analysis in multimedia data
management and digital entertainment.[4]

CHAPTER 2. DESCRIPTION OF APPLICATIONS, MODELS, SOLUTIONS, SOFTWARE
2.1. Defining facial recognition
Face recognition is a research topic in computer science that has developed since the
early 1990s [5]. Up to now, it has received the attention of many researchers from
different fields such as pattern recognition, machine learning, statistics, and
biometrics. Many practical applications require a facial recognition system, from
simple login management systems to surveillance applications in public area
surveillance, population management, and forensics. Besides, compared with
recognition systems based on other biometric characteristics such as the iris,
fingerprints, or gait, facial recognition has many advantages:

- A facial recognition system requires no direct interaction between the recognized
objects and the system.
- Data acquisition (photographs of human faces) makes the process of identifying a
person easier than acquiring other biological characteristics (such as fingerprints,
hand geometry, or iris scans).
- Facial data is more common than other features due to the explosion of social
networks (Facebook, Twitter, Yahoo...), media sharing services (YouTube,
Vimeo...), and the rapid development of image acquisition devices.
- From a person's face, we can extract a lot of related information beyond that
person's identity, such as gender, skin color, gaze direction, race, behavior, health,
age, and emotions.

2.2. Approaches to facial recognition problem


Based on the facial features used during recognition, recognition systems are divided
into two main approaches: global approaches and local-feature-based approaches.
Global methods use holistic features of the face (shape, color, key contours, etc.),
while local methods use local features of the face (pixels, details like the eyes, nose,
mouth, and eyebrows) for identification. Among facial recognition systems based
on global features, Eigenfaces and Fisherfaces are the most representative.
Eigenfaces uses Principal Component Analysis (PCA) to represent each face image
as a linear combination of eigenvectors obtained by decomposing the covariance
matrix computed from normalized training images. Because PCA is an unsupervised
algorithm, it cannot take advantage of class information when the training image set
has more than one sample per class. So in Fisherfaces, Linear Discriminant
Analysis (LDA) is used to better exploit this information. Regarding face recognition
systems based on local features, the LBP method and Gabor wavelets are typical
techniques used to extract local features from face images.
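The Eigenfaces idea described above can be sketched in a few lines of NumPy. This is a simplified illustration on random data standing in for flattened face images, not a full recognition system; the image size, number of images, and number of retained components are all made-up values:

```python
import numpy as np

# Each row of X stands in for a flattened, normalized face image.
rng = np.random.default_rng(0)
X = rng.random((20, 64))          # 20 hypothetical 8x8 "face" images

mean_face = X.mean(axis=0)
centered = X - mean_face

# The principal components are the eigenvectors of the covariance matrix;
# SVD of the centered data yields them without forming the covariance explicitly.
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt[:10]              # keep the 10 strongest components

# Represent one face as a linear combination of eigenfaces (its PCA weights).
weights = (X[0] - mean_face) @ eigenfaces.T
reconstruction = mean_face + weights @ eigenfaces
```

Two faces can then be compared by the distance between their weight vectors instead of their raw pixels, which is both smaller and more robust.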


Comparison of the level of recognition between PCA and LDA

Local Binary Pattern (LBP)

Example of three Gabor filter responses to two facial expression images [6].
Studies have shown that systems based on local features give better results than
systems following the global approach, especially on images affected by challenging
conditions such as variations in lighting, pose, and occlusion. Another approach
combines global and local characteristics into a hybrid system to achieve higher
efficiency. The systems in this document are based on local features.
Although there have been many studies on face recognition, few build a fully
automated system from the first step to the last. In this topic, we aim to build a
fully automatic facial recognition system with techniques applied in the following
steps:
- Face detection: using OpenCV's Haar cascade classifier [7] and Dlib's HOG feature
detector [8].

For example, Haar cascade may not work well: in one of our test images it
recognized only 7 of 12 faces and mistook a hand for a face.

Feature extraction of HOG

- Feature extraction: determine 128 quantitative values for each face through Dlib's
pre-trained ResNet network.
- Object recognition: compute the Euclidean distance [9] between two encoded
faces to decide whether they match.

Eigenfaces (1st row), Fisherfaces (2nd row)


2.3. Models of Operation
Facial recognition is a fast-growing area used widely in identity verification,
monitoring, and access control systems. High recognition rates and less training time
are key factors in facial recognition problems. In our paper, we have compared
artificial neural network (ANN) and convolutional neural network (CNN) for this
specified problem. Our dataset contains more than 14,855 images, of which 1325
images with varied expressions and backgrounds are of the subject to be recognized.
Results show the supremacy of CNN over ANN in terms of accuracy in facial
recognition and fewer epochs, i.e., less training time. [10]
Face recognition is the problem of identifying or verifying faces in a photograph. A
general statement of the problem of machine recognition of faces can be formulated as
follows: given still or video images of a scene, identify or verify one or more persons
in the scene using a stored database of faces [11]. Face detection, face alignment,
feature extraction, and finally face recognition are the four steps in many
descriptions of the face recognition process.

Face detection: Determine which faces are present in the image and mark them with a
bounding box.
Face alignment: Normalize the face so that its photometric and geometric properties
are consistent with the database.
Feature Extraction: Extract features from the facial features that can be applied to the
task of recognition.
Face Recognition: Match the face against one or more known faces in a database that
has been prepared.
Unlike in the past, a system may integrate any or all of the processes into a single
process instead of having a separate module or program for each phase.
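The four steps above can be sketched as a pipeline in which each stage's output feeds the next. Every helper below is a hypothetical stub standing in for a real detector, aligner, encoder, and matcher; only the structure is the point:

```python
# A structural sketch of the four-stage pipeline. All helpers are stubs.

def detect_faces(image):
    # Stub: pretend we found one face with bounding box (x, y, w, h).
    return [(10, 10, 50, 50)]

def align_face(image, box):
    # Stub: a real aligner would warp the crop so eyes/mouth are normalized.
    x, y, w, h = box
    return ("aligned-crop", x, y, w, h)

def extract_features(aligned):
    # Stub: a real encoder would output e.g. 128 measurements.
    return [0.1, 0.2, 0.3]

def match(embedding, database):
    # Nearest known embedding by squared Euclidean distance.
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(database, key=lambda name: dist(embedding, database[name]))

def recognize(image, database):
    return [match(extract_features(align_face(image, box)), database)
            for box in detect_faces(image)]

known = {"Alice": [0.1, 0.2, 0.3], "Bob": [0.9, 0.8, 0.7]}
print(recognize("some-image", known))   # → ['Alice']
```

A system that merges stages into a single model still follows this logical flow; only the module boundaries disappear.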

Although the method used for facial recognition may sound simple, it is in fact a
highly challenging problem.
The objective is to take a person's face as input and output an identification using a
deep neural network. This implies that the neural network must be taught to
automatically recognize various facial features and generate numbers from those
features. If you pass in multiple photographs of the same person, the neural
network's outputs will be extremely similar or close; however, if you pass in images
of a different person, the output of the neural network will be considerably different.
By selecting a machine learning algorithm, feeding in data, and producing a result,
machine learning has helped tackle numerous issues. It is unnecessary for us to
create our own neural network: a pre-trained model is available in the Dlib library.
When we input a face image, it returns a face encoding, which we can compare to
other face encodings to see if the inputted face matches any of the faces we have
photographs of. This is exactly what we need it to accomplish. However, face
recognition is a collection of numerous connected issues:
First, take a look at the image and identify each face.
Next, concentrate on each face and be able to recognize the same person even if their
face is turned in an odd direction or has poor lighting.
Third, be able to identify distinguishing characteristics of a face that you may use to
differentiate it from those of other people, such as the size of the eyes, the length of
the face, etc.

Finally, the name of the person can then be determined by comparing the distinctive
traits of that face to all the others you are familiar with.
Your brain is programmed to perform all of this swiftly and automatically if you're
human. Because we are so adept at identifying faces, we frequently see faces in
everyday objects.
We must educate computers on each stage of this process independently because
they are unable to make such high-level generalizations (at least not yet). We
construct a pipeline where each stage of facial recognition is solved independently
and the output is passed on to the following phase. To put it another way, we will
combine several machine learning algorithms: a different algorithm for every stage.
To prevent this from becoming a book, we won't go into detail about every
algorithm, but you will understand the fundamental concepts underlying each one
and how to construct facial recognition.
The approach is to tackle this problem one step at a time. Each step uses a different
machine learning algorithm.
2.3.1. Step 1: Finding all the Faces
The first step in the pipeline is face detection: obviously, the faces in a photograph
need to be located before we can tell them apart.
Face detection is a great feature for cameras. When the camera can automatically
pick out faces, it can make sure that all the faces are in focus before it takes the
picture. But we'll use it for a different purpose: finding the areas of the image we
want to pass on to the next step in our pipeline.
Face detection went mainstream in the early 2000s when Paul Viola and Michael
Jones invented a way to detect faces that was fast enough to run on cheap cameras.
However, much more reliable solutions exist now. We will use a method invented in
2005 called Histogram of Oriented Gradients (HOG for short) to find faces in an
image.
Making the image black and white is the first step.

Then we’ll look at every single pixel in our image one at a time. For every single
pixel, we look at the pixels that directly surround it:

Our goal is to figure out how dark the current pixel is compared to the pixels directly
surrounding it. Then we want to draw an arrow showing in which direction the image
is getting darker:

Looking at just this one pixel and the pixels touching it, the image is getting darker
towards the upper right.
If you repeat that process for every single pixel in the image, you end up with every
pixel being replaced by an arrow. These arrows are called gradients and they show the
flow from light to dark across the entire image:

This might seem like a random thing to do, but there’s a really good reason for
replacing the pixels with gradients. If we analyze pixels directly, really dark images
and light images of the same person will have different pixel values. But by only
considering the direction that brightness changes, both dark images, and bright images
will end up with the same representation.
But saving the gradient for every single pixel gives us far too much detail; we end up
missing the forest for the trees. It would be better if we could just see the basic flow
of lightness/darkness at a higher level, capturing the basic pattern of the image.

To do this, we break the image into small squares of 16x16 pixels each. In each
square, we count how many gradients point in each major direction (how many
point up, up-right, right, etc.). Then we replace that square in the image with the
arrow direction that was strongest. The result is that we turn the original image into
a very simple representation that captures the basic structure of a face:

The original image is turned into a HOG representation that captures the major
features of the image regardless of image brightness.
To find faces in this HOG image, all we have to do is find the part of our image that
looks most similar to a known HOG pattern extracted from a bunch of other
training faces.
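The per-pixel gradients and 16x16-cell binning described above can be sketched in NumPy. This is a simplified illustration of the idea (dominant direction per cell), not a production HOG extractor; cell size and bin count are the values mentioned in the text and a common choice, respectively:

```python
import numpy as np

def dominant_directions(gray, cell=16, bins=8):
    """Simplified HOG-style sketch: compute per-pixel gradient directions,
    then keep only the strongest direction in each cell."""
    gray = gray.astype(float)
    # How is brightness changing around each pixel? (the gradients)
    gy, gx = np.gradient(gray)
    angle = np.arctan2(gy, gx)          # direction of change
    magnitude = np.hypot(gx, gy)        # strength of change
    # Map each angle to one of `bins` major directions.
    bin_idx = ((angle + np.pi) / (2 * np.pi) * bins).astype(int) % bins

    h, w = gray.shape
    out = np.zeros((h // cell, w // cell), dtype=int)
    for i in range(h // cell):
        for j in range(w // cell):
            sl = np.s_[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            votes = np.bincount(bin_idx[sl].ravel(),
                                weights=magnitude[sl].ravel(),
                                minlength=bins)
            out[i, j] = votes.argmax()  # strongest direction wins
    return out

# A horizontal brightness ramp: every cell should agree on one direction.
ramp = np.tile(np.arange(64.0), (64, 1))
cells = dominant_directions(ramp)
print(cells.shape)   # → (4, 4)
```

The 64x64 ramp collapses to a 4x4 grid of directions, which is exactly the "forest instead of trees" compression the text describes.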

2.3.2. Step 2: Posing and Projecting Faces


We isolated the faces in our image. However, we have to deal with the problem that
faces turned different directions look different to a computer:

To account for this, we will warp each picture so that the eyes and lips are always in
the same place in the image. This will make it a lot easier to compare faces in the
next steps.
We are going to use an algorithm called face landmark estimation. There are lots of
ways to do this, but we are going to use the approach invented in 2014 by Vahid
Kazemi and Josephine Sullivan.
The basic idea is we will come up with 68 specific points (called landmarks) that exist
on every face — the top of the chin, the outside edge of each eye, the inner edge of
each eyebrow, etc. Then we will train a machine learning algorithm to be able to find
these 68 specific points on any face:

The 68 landmarks we will locate on every face. This image was created by Brandon
Amos of CMU, who works on OpenFace.
The result of locating the 68 face landmarks on our test image.
Now that we know where the eyes and mouth are, we'll simply rotate, scale, and
shear the image so that the eyes and mouth are centered as well as possible. A shear
mapping is a linear map that displaces each point in a fixed direction, by an amount
proportional to its signed distance from the line that is parallel to that direction and
goes through the origin. [12]

We will not do any fancy 3D warps because that would introduce distortions into
the image. We are only going to use basic image transformations like rotation and
scaling that preserve parallel lines (called affine transformations):

Now no matter how the face is turned, we can center the eyes and mouth in roughly
the same position in the image. This will make our next step a lot more accurate.
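An affine alignment like this amounts to solving for the 2x3 matrix that maps detected landmarks (say, the two eyes and the mouth) onto fixed template positions. A minimal NumPy sketch with made-up landmark coordinates, since three point pairs determine an affine map exactly:

```python
import numpy as np

# Detected landmark positions in the input photo (hypothetical values):
src = np.array([[38.0, 52.0],   # left eye
                [74.0, 48.0],   # right eye
                [55.0, 90.0]])  # mouth center

# Where we want those landmarks to land in every aligned image:
dst = np.array([[30.0, 40.0],
                [70.0, 40.0],
                [50.0, 80.0]])

# Solve dst = A @ [x, y, 1] for the 2x3 affine matrix A.
ones = np.ones((3, 1))
A = np.linalg.solve(np.hstack([src, ones]), dst).T   # shape (2, 3)

def warp_point(p):
    """Apply the affine transform to a single (x, y) point."""
    return A @ np.array([p[0], p[1], 1.0])

# Each landmark is sent exactly to its template position.
print(np.round(warp_point(src[0]), 6))
```

Because the map is affine, parallel lines in the face stay parallel, which is exactly the property the text asks for; a full implementation would then resample the whole image with this matrix (e.g. via `cv2.warpAffine`).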
2.3.3. Step 3: Encoding Faces
Now we get to the meat of the problem: actually telling faces apart.
The simplest approach to face recognition is to directly compare the unknown face we
found in Step 2 with all the pictures we have of people that have already been tagged.
When we find a previously tagged face that looks very similar to our unknown face, it
must be the same person. Seems like a pretty good idea, right?
There is a huge problem with that approach. A site like Facebook with billions of
users and a trillion photos cannot possibly loop through every previously-tagged face
to compare it to every newly uploaded picture. That would take way too long. They
need to be able to recognize faces in milliseconds, not hours.
What we need is a way to extract a few basic measurements from each face. Then we
could measure our unknown face the same way and find the known face with the
closest measurements. For example, we might measure the size of each ear, the
spacing between the eyes, the length of the nose, etc.
2.3.4. The most reliable way to measure a face
The measurements that seem obvious to us humans (like eye color) don't make sense
to a computer looking at individual pixels in an image. Researchers have discovered
that the most accurate approach is to let the computer figure out which
measurements to collect itself. Deep learning does a better job than humans at
figuring out which parts of a face are important to measure. We are going to train
the network to generate 128 measurements for each face.
The training process works by looking at 3 face images at a time:

1. Load a training face image of a known person.
2. Load another picture of the same known person.
3. Load a picture of a different person.
Then the algorithm looks at the measurements it is currently generating for each of
those three images. It then tweaks the neural network slightly so that it makes sure the
measurements it generates for #1 and #2 are slightly closer while making sure the
measurements for #2 and #3 are slightly further apart:

After repeating this step millions of times for millions of images of thousands of
different people, the neural network learns to reliably generate 128 measurements
for each person. Any ten different pictures of the same person should give roughly
the same measurements.
Machine learning people call the 128 measurements of each face an embedding. The
idea of reducing complicated raw data like a picture into a list of computer-generated
numbers comes up a lot in machine learning (especially in language translation). The
exact approach for faces we are using, FaceNet, was invented in 2015 by researchers
at Google, but many similar approaches exist.
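The training objective described above (pull images #1 and #2 together, push #3 away) is known as the triplet loss, and it can be written in a few lines. A toy NumPy sketch with random vectors standing in for real 128-number embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize the network unless the anchor is closer to the positive
    (same person) than to the negative (different person) by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)   # same-person distance
    d_neg = np.linalg.norm(anchor - negative)   # different-person distance
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(1)
person_a = rng.normal(size=128)
anchor   = person_a + 0.01 * rng.normal(size=128)  # one photo of person A
positive = person_a + 0.01 * rng.normal(size=128)  # another photo of A
negative = rng.normal(size=128)                    # some other person

# A well-separated triplet incurs zero loss; a confused one is penalized.
print(triplet_loss(anchor, positive, negative))    # → 0.0
```

During training, this loss (summed over many triplets) is what the network's weights are tweaked to minimize; at zero loss, the 128 measurements already separate the two identities.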
2.3.5. Encoding our face image
This process of training a convolutional neural network to output face embeddings
requires a lot of data and computing power. But once the network has been trained,
it can generate measurements for any face, even ones it has never seen before, so
this step only needs to be done once. The fine folks at OpenFace already did this,
and they published several trained networks. We need to run our face images
through their pre-trained network to get the 128 measurements for each face. The
measurements for our test image:

The network generates nearly the same numbers when looking at two different
pictures of the same person.
2.3.6. Step 4: Finding the person’s name from the encoding
This last step is the easiest in the whole process. All we have to do is find the person
in our database of known people who has the closest measurements to our test image.
You can do that by using any basic machine-learning classification algorithm. No
fancy deep-learning tricks are needed. We will use a simple linear SVM classifier. In
machine learning, support vector machines (SVMs, also support vector networks) are
supervised learning models with associated learning algorithms that analyze data for
classification and regression analysis [12]. However, lots of classification algorithms
could work.

All we need to do is train a classifier that can take in the measurements from a new
test image and tell which known person is the closest match. Running this classifier
takes milliseconds. The result of the classifier is the name of the person.
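A minimal sketch of this last step with scikit-learn's linear SVM, assuming scikit-learn is installed. The tiny, well-separated random clusters here are made-up stand-ins for real 128-number face embeddings:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up "embeddings": in a real system these would be the 128 measurements
# produced by the pre-trained network for each known person's photos.
alice = rng.normal(loc=0.0, scale=0.05, size=(5, 128))
bob   = rng.normal(loc=1.0, scale=0.05, size=(5, 128))

X = np.vstack([alice, bob])
y = ["Alice"] * 5 + ["Bob"] * 5

clf = SVC(kernel="linear")      # a simple linear SVM classifier
clf.fit(X, y)

# A new embedding close to Alice's cluster should be labeled "Alice".
test_embedding = rng.normal(loc=0.0, scale=0.05, size=(1, 128))
print(clf.predict(test_embedding)[0])   # → Alice
```

Any basic classifier (nearest neighbor, logistic regression) would work in its place; the heavy lifting was already done by the embedding network.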
2.3.7. Conclusion
By detecting, aligning, and encoding the contours of the face, we were able to
determine whom the image in the picture represents.
This demonstrates emphatically how quickly machine learning has seized control in
the field of artificial intelligence. A typical example is Facebook, which tags
individuals spotted in a photo automatically rather than manually. This wasn't
possible in the past; instead, tag suggestions were added to the persons in images.
2.4. Advantages of face recognition
Facial recognition offers further advantages in addition to unlocking your smartphone:
2.4.1. Increased security
On a governmental level, facial recognition can help to identify terrorists or
other criminals. On a personal level, facial recognition can be used as a security tool
for locking personal devices and for personal surveillance cameras.
2.4.2. Reduced crime
It is simpler to find burglars, thieves, and trespassers when face recognition is
used. Even just knowing that facial recognition technology is in place can reduce
crime, especially minor offenses. Cybersecurity has advantages in addition to physical
security. Face recognition technology can be used by businesses to access computers
without passwords. Theoretically, because there is nothing to steal or alter, as there is
with a password, the technology cannot be hacked.
2.4.3. Removing bias from stop and search
Police forces face complaints from the public about unlawful stops and
searches; facial recognition technology could streamline the procedure. By
identifying suspects in crowds through an automated procedure rather than a human
one, face recognition technology could help minimize potential prejudice and reduce
stops and searches of law-abiding persons.

Facial recognition technology is used at an intersection to identify jaywalkers in
Shenzhen, China. Offenders' faces are often displayed on the screen. (Shutterstock)
2.4.4. Greater convenience
Customers won't need to use their credit cards or cash to make purchases in
stores as the technology becomes more widely used. This might shorten wait times at
the register. Facial recognition provides a quick, automatic, and seamless verification
process because it doesn't involve physical touch like fingerprinting or other security
procedures, which is important in the post-COVID environment.
2.4.5. Faster processing
Facial recognition has advantages for the businesses that utilize it because it
can identify faces in under a second. In a time of sophisticated hacking tools and
cyberattacks, businesses want both safe and quick solutions. Quick and effective
identity verification is made possible by facial recognition.
2.4.6. Integration with other technologies
The majority of security software is compatible with facial recognition
systems; in fact, integration is simple. As a result, the amount of additional funding
needed to implement it is limited.
2.5. Disadvantages of face recognition
While some people do not mind being filmed in public and do not object to the use of
facial recognition where there is a clear benefit or rationale, the technology can inspire
intense reactions from others. Some of the disadvantages or concerns include:
2.5.1 Surveillance

Surveillance of facial recognition

Some worry that the use of facial recognition along with ubiquitous video cameras,
artificial intelligence, and data analytics creates the potential for mass surveillance,
which could restrict individual freedom. While facial recognition technology allows
governments to track down criminals, it could also allow them to track down ordinary
and innocent people at any time.
2.5.2 Scope for error
Facial recognition data is not free from error, which could lead to people being
implicated for crimes they have not committed. For example, a slight change in
camera angle or a change in appearance, such as a new hairstyle, could lead to an
error. In 2018, Newsweek reported that Amazon’s facial recognition technology had
falsely identified 28 members of the US Congress as people arrested for crimes.
2.5.3 Breach of privacy
The question of ethics and privacy is the most contentious one. Governments have
been known to store several citizens' pictures without their consent. In 2020, the
European Commission said it was considering a ban on facial recognition technology
in public spaces for up to 5 years, to allow time to work out a regulatory framework to
prevent privacy and ethical abuses.
2.5.4 Massive data storage
Facial recognition software relies on machine learning technology, which requires
massive data sets to “learn” to deliver accurate results. Such large data sets require
robust data storage. Small and medium-sized companies may not have sufficient
resources to store the required data.
2.6. Uses of facial recognition
In reality, facial recognition technology has already permeated our day-to-day
activities. It unlocks phones, tags friends on Facebook, and secures homes. In part,
these small integrations fulfill the overarching promise of technology by providing a
simplified, expedited, and, in some cases, more secure way to proceed with daily life.
These convenient, personal, and consensual interactions with facial recognition
technology might reduce or even dispel common clichés about surveillance. But
personal engagement with technology doesn’t always translate into a full
understanding of how that technology collects and uses data. This is exacerbated when
the technology’s use isn’t limited to the interactions that we can see and catalog.[14]
2.6.1. Unlock Phones
Numerous phones, including the most recent iPhone, may now be unlocked using face
recognition. With the help of this technology, it is possible to protect critical
information and make sure that if a phone is stolen, the thief cannot access personal
information.

2.6.2. Smarter Advertising
Face recognition can improve the targeting of advertisements by establishing accurate
assumptions about people's gender and age. Companies like Tesco already have plans
to install facial recognition screens at gas stations. It won't take long for facial
recognition to be widely employed in advertising.
2.6.3. Find missing people
Face recognition technology may be used to find missing children and victims of
human trafficking. Law enforcement can be contacted if a missing person is
recognized by facial recognition in a public location, such as an airport, store, or
another public area, as long as they are listed as missing in a database.
2.6.4. Protect Law Enforcement and security services
Face recognition programs on mobile devices are already helping police officers
swiftly identify persons in the field from a safe distance. This can give officers
context about the people they are dealing with and whether they should proceed
cautiously.

A visitor from Denmark stands in front of a face recognition camera after arriving at
customs at Orlando International Airport on June 21, 2018. Florida's busiest airport
is becoming the first in the nation to require a face scan of passengers on all arriving
and departing international flights. (AP Photo/John Raoux)
For instance, if a policeman stops a wanted murderer during a routine traffic check,
the officer will immediately realize that the man or woman is armed and dangerous
and will request assistance.
2.6.5. Identifying People on social media
Facebook uses face recognition technology to automatically identify people who
appear in pictures. This makes it simpler for people to find pictures in which they
feature, and it allows Facebook to suggest when certain people should be tagged in
pictures.

2.6.6. Tracking Student Attendance
In addition to enhancing safety in schools, face recognition can track children's
attendance. In the past, pupils could mark an absent classmate as present on paper
attendance sheets. Facial recognition technology is already widely used in
schools to prevent students from skipping classes. To confirm students' identities,
tablets are used to scan their faces and compare the results to a database.
2.6.7. Finance and banking
Facial recognition technology can be used for customer identification and
verification. Similarly, facial recognition software can speed up the customer
onboarding procedure. Facial recognition can also be used to increase security and
prevent ATM theft and financial fraud.
2.7. Challenges in facial recognition
Building a fully automatic face recognition system with high accuracy has been a
real challenge for researchers, because subjective and objective factors affect the
image acquisition process and produce highly variable photos of the same face. The
main factors affecting the accuracy of a face recognition system are:
- Lighting conditions: Photos taken under different lighting conditions can differ
greatly, which reduces the accuracy of the recognition process.

Examples of faces under different lighting conditions


- Aging changes: Changes in a face over time make recognition difficult, even for
the human visual system.

- Pose variations: Recognition of frontal images gives much better results than
recognition of images taken at a large angle. The usual solution for large pose
angles is to use interpolation algorithms to compensate for the occluded parts of
the face.

Facial recognition under pose variations
- Emotions (facial expression variations): In different emotional states, features
important for facial recognition (such as the eyes, nose, and mouth) can be
distorted, leading to incorrect identification results.

Faces showing different emotions


- Occlusions: Face photos can be partially obscured by objective factors such as
obstacles in front of the face, or by subjective factors such as accessories
(bandanas, eyeglasses, etc.), which disrupts the recognition process.

Face with glasses

2.8. Software and support libraries:
2.8.1. OpenCV (Open Computer Vision)

OpenCV system
OpenCV (Open Source Computer Vision Library) is an open-source computer vision
and machine learning software library. OpenCV was built to provide a common
infrastructure for computer vision applications and to accelerate the use of machine
perception in commercial products. Being an Apache 2 licensed product, OpenCV
makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which includes a
comprehensive set of both classic and state-of-the-art computer vision and machine
learning algorithms. These algorithms can be used to detect and recognize faces,
identify objects, classify human actions in videos, track camera movements, track
moving objects, extract 3D models of objects, produce 3D point clouds from stereo
cameras, stitch images together to produce a high-resolution image of an entire scene,
find similar images from an image database, remove red eyes from images taken using
flash, follow eye movements, recognize scenery and establish markers to overlay it
with augmented reality, etc. OpenCV has more than 47 thousand users and an
estimated number of downloads exceeding 18 million. The library is used extensively
by companies, research groups, and by governmental bodies.
Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM,
Sony, Honda, and Toyota that employ the library, there are many startups such as
Applied Minds, VideoSurf, and Zeitera, that make extensive use of OpenCV.
OpenCV’s deployed uses span the range from stitching street view images together,
detecting intrusions in surveillance video in Israel, monitoring mine equipment in
China, helping robots navigate and pick up objects at Willow Garage, detecting
swimming pool drowning accidents in Europe, running interactive art in Spain and
New York, checking runways for debris in Turkey, inspecting labels on products in
factories around the world, and rapid face detection in Japan.

It has C++, Python, Java, and MATLAB interfaces and supports Windows, Linux,
Android, and Mac OS. OpenCV leans mostly towards real-time vision applications
and takes advantage of MMX and SSE instructions when available. Full-featured
CUDA and OpenCL interfaces are being actively developed. There are over
500 algorithms and about 10 times as many functions that compose or support those
algorithms. OpenCV is written natively in C++ and has a templated interface that
works seamlessly with STL containers.[15]
2.8.2. Dlib

Structure of the Dlib library


Dlib is an open-source suite of applications and libraries written in C++ under a
permissive Boost license. Dlib offers a wide range of functionality across several
machine learning sectors, including classification and regression, numerical
algorithms such as quadratic program solvers, an array of image processing tools, and
diverse networking functionality, among many other facets.

Dlib's HOG detector in action


Dlib also features robust tools for object pose estimation, object tracking, face
detection (classifying a perceived object as a face), and face recognition (identifying a
perceived face).

Dlib is being used in a Jupyter notebook as part of a novel facial recognition
framework.
Though Dlib is a cross-platform resource, many custom workflows involving facial
capture and analysis (whether recognition or detection) use the OpenCV library of
functions, operating in a Python environment, as in the image below.

A common environment for use of Dlib – with Python OpenCV


In the field of computer vision, Dlib has APIs that help with tasks such as facial
landmark detection, correlation tracking, and deep learning. Unlike OpenCV, whose
purpose is to provide an algorithmic infrastructure for image processing and
computer vision applications, Dlib is designed for machine learning and artificial
intelligence applications, with the following main sublibraries:
- Classification: classification techniques based mainly on two basic methods,
k-NN and SVM.

- Data transformation: algorithms transform data to reduce the amount, remove
redundant data, and enhance the distinctiveness (discrimination) of retained
characteristics.
- Clustering: clustering techniques.
- Regression: regression techniques.
- Structured Prediction: structured prediction algorithms.
- Markov Random Fields: algorithm based on random Markov fields.
Dlib itself does not require any Python libraries, but if you use Dlib for work
related to image processing and computer vision, you should install the NumPy,
SciPy, and scikit-image libraries. Libraries needed when installing Dlib:
- Boost: a collection of C++ libraries that spare programmers from coding basic
but difficult tasks from scratch, such as linear algebra, multithreading, basic
image processing, and unit testing.
- Boost.Python: provides interoperability between the C++ and Python languages.
- CMake: an open-source toolkit for compiling, testing, and packaging software.
- X11/XQuartz: provides a basic framework for GUI development that is very popular
on Unix-like operating systems.[16]
2.8.3. Python
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy-to-learn syntax emphasizes readability
and therefore reduces the cost of program maintenance. Python supports modules and
packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms and can be freely distributed.[17]
As of 2016, Python ranked 3rd among the top 10 most popular programming
languages in the world:

Top 10 most popular programming languages in the world
According to statistics from the top 39 computer science schools, the majority of
these schools use Python to teach introductory programming.

Statistical graph of programming languages used for teaching in 39 computer
science schools

CHAPTER 3. STUDY AND RESEARCH PLAN


3.1. Related Subjects:
3.1.1. Machine learning
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to
predict new output values. [18]
3.1.2. Natural language processing
Natural language processing (NLP) refers to the branch of computer science, and
more specifically the branch of artificial intelligence (AI), concerned with giving
computers the ability to understand text and spoken words in much the same way
human beings can. [19]
3.1.3. Speech
Natural language processing is an area of applied research related to artificial
intelligence, with the task of giving computers the ability to understand, hear,
and express human language, thereby helping humans perform certain tasks through
several different forms, such as speech and writing.[20]
3.1.4. Expert systems
An expert system is a computer program that uses artificial intelligence (AI)
technologies to simulate the judgment and behavior of a human or an organization that
has expertise and experience in a particular field.[21]
3.1.5. Robot
The advancement in recent technologies such as artificial intelligence (AI), computer
vision, and the internet of things (IoT) extends to various areas, especially surveillance
systems that require real-time facial recognition processing to ensure safety.
Mobile robots are widely used in surveillance systems to perform dangerous tasks
that humans cannot.[22]
3.1.6. Vision
Assisted by industrial cameras, machine vision systems can measure and count
products, calculate their weight or volume, and inspect goods at high speed against
predefined characteristics. Furthermore, they automatically extract limited but
crucial information from huge quantities of data, or they help experts interpret
images by filtering, optimizing, and supplementing the image itself, or by
facilitating quick retrieval and availability of the data.[23]
3.2. Learning and research methods
3.2.1. Machine learning
Steps to learn machine learning:
Start with Basic Knowledge of Mathematics:
 Linear algebra:
 Linear algebra is a subfield of mathematics involving vectors, matrices, and
linear transformations. It is an important foundation for machine learning, from
the notation used to describe the operation of algorithms to the implementation
of algorithms in code. It helps represent data as systems of linear equations.
 Learn the basic concepts of algebra: solving and graphing linear equations,
working with slopes, and performing matrix operations and analysis.
 Linear algebra is used to model and solve real-world problems; it underlies
recommendation systems and facial recognition, where data is represented using
matrices.
 Calculus: maxima and minima, functions of single and multiple variables, and
partial derivatives are some of the important topics of calculus to cover. Calculus
is used to study variables and how they change; knowledge of it is needed to build
many machine-learning techniques and applications.
 Probability:
 Probability is a field of mathematics that quantifies uncertainty. It is
undeniably a pillar of machine learning, and many people recommend it as a
prerequisite topic before starting. It is an important area of mathematics for
collecting and analyzing data in machine learning.
 Probability is a measure of the likelihood of an event occurring. You need to
draw insights from the available data, and for this purpose you need to understand
probabilities.
 Usually, probability and statistics need to be studied together. A combination
of both skills is required to become a machine learning expert, as they tell you
what kind of data analysis is needed.

 Statistics: Descriptive statistics are required to describe and summarize the
data available to you, so that you can decide which data analysis tools to use to
interpret the results. The topics to cover in descriptive statistics are central
tendency, normal distribution, variability, and sampling distributions.
 Inferential statistics: Inferential statistics help you draw inferences and
conclusions after analyzing the data. The topics to study are estimation,
hypothesis testing, ANOVA, correlation, regression, etc. These techniques are
applied to a smaller sample and generalize its results to the larger group.
 Programming languages: No single programming language covers all machine
learning work, so the choice of language depends on the project you are working
on. Programming languages let you express your problems in a form machines can
execute. Some languages to start with are Python, Java, R, and Scala; Python is
the most popular programming language used for machine learning projects.
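To make the role of linear algebra concrete, the sketch below implements a tiny nearest-neighbor classifier (the k-NN method also named in the Dlib section) in pure NumPy: each sample is a vector, and classification reduces to matrix and vector distance computations. The data points are made up purely for illustration.

```python
import numpy as np

# Toy training data: each row is a feature vector, with a class label.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.3]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(x, X, y, k=1):
    """Classify x by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(X - x, axis=1)    # Euclidean distance to each row
    nearest = np.argsort(dists)[:k]          # indices of the k closest samples
    return np.bincount(y[nearest]).argmax()  # most common label among them

print(knn_predict(np.array([0.9, 1.1]), X_train, y_train))  # → 0
print(knn_predict(np.array([8.1, 7.9]), X_train, y_train))  # → 1
```

The broadcasting in `X - x` is exactly the matrix representation of data the linear-algebra bullet points describe: one subtraction handles every training sample at once.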
3.2.2. Natural language processing
 Lexical analysis: identifying and analyzing word structure. The text is divided
into paragraphs, sentences, and words.
 Syntactic analysis (parsing): analyzing the words in a sentence for grammar and
rearranging them to determine how they relate to each other. It rejects sentences
like "The apple eats the girl."
 Semantic analysis: extracting dictionary meanings from the text. It also maps
syntactic structures and objects in the task domain to check for meaningfulness.
It rejects phrases like "tall dwarf".
 Discourse integration: the meaning of the current sentence is inferred from the
sentences before and after it.
 Pragmatic analysis: reinterpreting the statement to ensure it accurately captures
the intended meaning. It attempts to recover aspects of the language that require
knowledge of the real world.
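The first of these steps, splitting text into sentences and words, can be sketched with the Python standard library alone. The splitter below is a deliberate simplification; real tokenizers handle abbreviations, quotes, and punctuation far more carefully.

```python
import re

def lexical_analysis(text):
    """Split a paragraph into sentences, and each sentence into words."""
    # Naive sentence boundary: split after '.', '!' or '?' followed by space.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word tokenization: keep alphanumeric runs, drop punctuation.
    return [re.findall(r"\w+", s) for s in sentences]

tokens = lexical_analysis("The girl eats the apple. It tastes good!")
print(tokens)  # [['The', 'girl', 'eats', 'the', 'apple'], ['It', 'tastes', 'good']]
```

The later stages (parsing, semantic analysis) then operate on these token lists rather than on raw character strings.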
3.2.3. Speech
 The voice is a basic human means of communication, aimed at exchanging
information as well as the emotions of the speaker [24].

 Speech processing [25] is the study of human speech in the form of signals, and
methods of processing these signals. Speech signals are usually expressed in
numerical form, i.e., "digitized", and therefore, speech processing can be
considered the intersection of “digital signal processing” and “natural language
processing”.
 Intonation is a general and important component of speech, because all
languages have intonation. In linguistics, intonation is a component of phonetics
and is represented by physical factors such as time, pitch, intensity, and
spectrum. An intonation system is a change in the pitch, intensity, pauses, and
spectrum of a sentence to express meaning and emotional nuance when communicating
by voice [26][27]. Intonation consists of several components: pitch, duration,
and intensity.
 Some intonation models:
- INTSINT: a discrete pitch model that defines an accent by a single point; the
spline curve passing through these points forms the F0 contour [28].
- ToBI: the most widely used discrete accent model is presented in the ToBI
framework [29].
- Fujisaki: Fujisaki's model was developed based on the filtering method. Fujisaki
argues that pitch contours consist of two components: accent and phrase intonation
[30].
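Since the pitch contour (F0) is central to these models, the self-contained sketch below estimates the F0 of a digitized signal by autocorrelation, one common approach. It runs on a synthetic 200 Hz tone so the expected answer is known; real speech additionally needs voicing detection and contour smoothing.

```python
import numpy as np

SR = 16000  # sampling rate in Hz

def estimate_f0(signal, sr=SR, fmin=50, fmax=400):
    """Estimate fundamental frequency via the autocorrelation peak."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # candidate period range (samples)
    lag = lo + int(np.argmax(ac[lo:hi]))     # best-matching period
    return sr / lag

# Synthetic 200 Hz tone standing in for one voiced speech frame.
t = np.arange(2048) / SR
f0 = estimate_f0(np.sin(2 * np.pi * 200 * t))
print(f"Estimated F0: {f0:.1f} Hz")  # ≈ 200 Hz
```

Applied frame by frame over an utterance, this yields the F0 contour that INTSINT, ToBI, and the Fujisaki model each describe in their own terms.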

REFERENCES
[1] Prof. Yanli, Mahabubur Rahman Miraj, Md Sazibur Rahman, Tariqul Islam, and
Mir Abdur Rob, "Artificial Intelligence (AI) for Energizing the E-commerce,"
March 15, 2022.

[2] The ethical application of biometric facial recognition technology – Marcus Smith
& Seumas Miller.
[3] What Is AI, ML & How They Are Applied to Facial Recognition Technology -
suneratech.com.
[4] [5] Handbook of Face Recognition - Stan Z. Li & Anil K. Jain, 2nd Edition.
Springer, 2011.
[6] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, “Comparison between
geometry-based and Gabor-wavelets-based facial expression recognition using multi-
layer perceptrons,” Proc. Int’l Conf. Automatic Face and Gesture Recognition, 454-
459, 1998
[7] M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” 1991,
pp.586–591.
[8] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs.
fisherfaces: Recognition using class specific linear projection,” Pattern Anal. Mach.
Intell. IEEE Trans. On, vol. 19, no. 7, pp. 711–720, 1997.
[9] T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary
patterns: Application to face recognition,” Pattern Anal. Mach. Intell. IEEE Trans.
On, vol. 28, no. 12, pp. 2037–2041, 2006.
[10] Neelabh Shanker Singh. (2019, 12 3). Facial Recognition Using Deep
Learning. Springer.
[11] Kishor, S. B., Naranje, S. G., & Urkude, M. D. (2015). Face Recognition
Technique: A Literature Survey on Face Recognition and Insight on Machine
Recognition Using. CreateSpace Independent Publishing Platform.
[12] Shear -- from Wolfram MathWorld. (n.d.). Wolfram MathWorld. Retrieved
December 13, 2022
[13] Support vector machine. (n.d.). Wikipedia. Retrieved December 13, 2022
[14] Gladstone, N. (2018, September 19). How Facial Recognition Technology
Permeated Everyday Life. Centre for International Governance Innovation. Retrieved
November 27, 2022
[15] OpenCV. (2020, November 4).

[16] dlib C++ Library. (n.d.).

[17] What is Python? Executive Summary. (n.d.).

[18] Burns, E. (2021, March 30). machine learning. Enterprise AI.

[19] Education, I. C. (2021, August 17). Natural Language Processing (NLP).

[20] Lutkevich, B., & Burns, E. (2021, March 2). natural language processing
(NLP). Enterprise AI.

[21] Lutkevich, B. (2022, July 7). expert system. Enterprise AI.

[22] (Just a Moment. . ., n.d.)

[23] Ag, B. (2022, December 16). How Does the Industrial Camera Work as the
Centerpiece of a Machine Vision System? Basler AG.

[24] D.-K. Mac, V. Aubergé, A. Rilliard, and E. Castelli, "Cross-cultural
perception of Vietnamese Audio-Visual prosodic attitudes," Speech Prosody 2010,
2010.
[25] J. P. H. van Santen, Progress in Speech Synthesis. Springer Science & Business
Media, 1996.
[26] A. Botinis, Intonation: Analysis, modeling, and technology, vol. 15. Springer,
2000.
[27] D. Hirst and A. Di Cristo, Intonation systems: a survey of twenty languages.
Cambridge University Press, 1998.
[28] J. A. Louw and E. Barnard, “Automatic intonation modeling with INTSINT,”
Proc. Pattern Recognit. Assoc. South Afr., pp. 107–111, 2004.
[29] K. E. Silverman, M. E. Beckman, J. F. Pitrelli, M. Ostendorf, C. W. Wightman,
P. Price, J. B. Pierrehumbert, and J. Hirschberg, “TOBI: a standard for labeling
English prosody.,” in ICSLP, 1992, vol. 2, pp. 867–870.
[30] H. Fujisaki, S. Ohno, C. Wang, “A command-response model for F0 contour
generation in multilingual speech synthesis”, Journal of Phonetics, vol. 2, pp 223-232,
1974.
