Chapter 2
This chapter includes the related literature and related studies that gave the researchers ideas in the formulation of the study. According to the literature reviewed, face recognition is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. One of the ways to do this is by comparing selected facial features from the image with a facial database. Face detection, in turn, is a technology that identifies human faces in digital images; it also refers to the psychological process by which humans locate and attend to faces in a visual scene. Face recognition is typically used in security systems and can be compared to other biometrics such as fingerprint or eye iris recognition systems (Parveen Kumar & Doulat Singh, 2016).
In this study, face recognition is implemented and used for identifying the user accessing the proposed system. It is capable of identifying a student from video frames captured by the system's camera. The system locates and extracts facial features and then compares them to a stored database of known faces.
Face Recognition & Detection Techniques

On a paper about face recognition and detection techniques, the authors describe images as an important form of data used in almost every application. Some applications cannot use images directly due to the large amount of memory space needed to store them. One of the most critical decision points in the design of a face recognition system is the choice of an appropriate representation of the face, one that captures invariant and non-redundant facial information. Motion information is used to find the moving regions, and probable eye-region blobs are extracted by thresholding the image. These blobs reduce the search space for face verification, which is done by template matching. Experimental results for face detection show good performance even across orientation and pose variation to a certain extent. Face recognition is carried out by cumulatively summing the Euclidean distances between the test face images and the stored database, which shows good discrimination between true and false subjects. As the human face is a dynamic object with a high degree of variability in its appearance, face detection is a difficult problem in computer vision, and accuracy and speed of identification are the main issues in this field. The study concludes that image-based approaches are the best among the approaches for face recognition, and that image-based approaches fall into three main categories.
On the paper "Face Recognition Algorithms", the authors presented the face recognition area, explaining the different approaches, methods, tools and algorithms used since the 1960s. Some algorithms are better, some are less accurate, some are more versatile, and others are too computationally costly. The paper also discusses feature extraction methods for face recognition, as well as problems such as pose and illumination and how they can be addressed.
In this study, knowledge of known face recognition and detection techniques is essential. Taking into account the speed and accuracy of each technique also provides the researchers of this study with useful information and approaches to improve face recognition. The "Face Recognition Algorithms" paper guides the researchers in resolving some of the significant issues in face recognition, such as pose and illumination, and helps the researchers of this study weigh the most effective algorithms to be used for face recognition. Thus, the above-mentioned face detection and recognition techniques and algorithms provide very useful information in the field of face recognition.
On the paper by Dalal and Triggs (2005), Histograms of Oriented Gradients for Human Detection, the authors show, after reviewing existing edge- and gradient-based descriptors, that grids of Histogram of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. The study evaluates the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results.
In this study, the HOG algorithm is implemented and used for locating and detecting faces in images. HOG features have seen increasing use over the past decade. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions. This is implemented by dividing the image window into small spatial regions ("cells") and, for each cell, accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. In the feature extraction and object detection chain, Histogram of Oriented Gradient feature vectors are extracted, and the combined vectors are fed to a linear SVM for object/non-object classification. The detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.
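The per-cell histogram step at the heart of HOG can be sketched with plain numpy. This is a minimal illustration assuming simple finite-difference gradients and unsigned orientation bins; block normalization, the SVM, and the sliding window are omitted.

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """Per-cell histograms of gradient orientations (minimal HOG sketch).

    Gradients come from finite differences; orientations are folded
    into n_bins unsigned bins over [0, 180) degrees and weighted by
    gradient magnitude, as in the cell step described above.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned

    h, w = image.shape
    cells_y, cells_x = h // cell_size, w // cell_size
    histograms = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins

    for cy in range(cells_y):
        for cx in range(cells_x):
            sl = (slice(cy * cell_size, (cy + 1) * cell_size),
                  slice(cx * cell_size, (cx + 1) * cell_size))
            bins = (orientation[sl] // bin_width).astype(int) % n_bins
            for b in range(n_bins):
                histograms[cy, cx, b] = magnitude[sl][bins == b].sum()
    return histograms
```

On a pure horizontal intensity ramp, every gradient points along x, so all the weight lands in the first orientation bin of each cell.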
Face Alignment
On the paper by Kazemi, Vahid and Sullivan, Josephine (2014), One Millisecond Face Alignment with an Ensemble of Regression Trees, the authors present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum-of-square-error loss and naturally handles missing or partially labelled data. The paper presents a new, faster algorithm for face alignment and face landmark estimation, addressing the problem of face alignment for a single image. It shows how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high-quality predictions. The regression trees can be used to regress the locations of face landmarks such as the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, the lips, etc. As shown in Figure 3, an ensemble of randomized regression trees is used to detect 194 landmarks on a face from a single image.
In this study, the algorithm used by Kazemi, Vahid and Sullivan, Josephine (2014) is implemented to locate face landmarks such as the eyes, nose, lips, and eyebrows on a given image; the image is then rotated so that these landmarks are centered in the image frame.
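The rotation step can be illustrated with a small numpy sketch that computes the angle needed to level the eyes and applies the corresponding rotation to the landmark points; the eye coordinates are assumed to come from a landmark estimator such as the one described above.

```python
import numpy as np

def eye_alignment_angle(left_eye, right_eye):
    """Angle (degrees) of the line through the eye centers.

    Rotating the image by the negative of this angle makes the eyes
    horizontal. Eye positions are (x, y) pixel coordinates assumed
    to come from a landmark estimator.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return np.degrees(np.arctan2(dy, dx))

def rotate_points(points, angle_deg, center):
    """Rotate landmark points by -angle_deg about center (upright face)."""
    theta = np.radians(-angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (np.asarray(points) - center) @ rot.T + center
```

In practice the same rotation is applied to the whole image, not just the landmarks, but the geometry is identical.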
On the paper by Florian Schroff, Dmitry Kalenichenko and James Philbin (2015), FaceNet: A Unified Embedding for Face Recognition and Clustering, the authors present a system called FaceNet that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be implemented using standard techniques, with FaceNet embeddings as feature vectors. The method uses deep convolutional neural network algorithms to optimize the embedding itself, and the system achieves state-of-the-art face recognition performance. Furthermore, the OpenFace paper, which targets mobile applications, presents a face recognition library that implements face recognition with neural networks using the techniques of Facebook's DeepFace and Google's FaceNet. In addition, the article Machine Learning is Fun (Geitgey, 2016) explains how effective algorithms like Histogram of Oriented Gradients, face landmark estimation and deep neural networks are for face recognition.
In this study, these algorithms have been implemented to extract the facial measurements of a human face from a given image by using the OpenFace trained convolutional neural network feature extractor. It uses a trained neural network that takes the 128 facial measurements of a face needed for face recognition. Figure 4 shows the logic flow for face recognition with neural networks.
Once a face is detected, the system preprocesses each face in the image to create a normalized, fixed-size input to the neural network. Preprocessing normalizes the faces so that the eyes, nose, and mouth appear at similar locations in each image, as shown in Figure 5, by using an affine transformation. The transformation is based on the large blue landmarks, and the final image is cropped to the boundaries and resized to 96x96 pixels.
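The affine transformation above amounts to solving for the 2x3 matrix that maps three detected landmarks onto fixed template positions. Below is a minimal numpy sketch; the landmark and template coordinates in the usage example are illustrative, not OpenFace's actual template.

```python
import numpy as np

def affine_from_landmarks(src_pts, dst_pts):
    """Solve for the 2x3 affine matrix A with A @ [x, y, 1] mapping
    each source landmark onto its template position.

    src_pts and dst_pts are three (x, y) pairs each, e.g. the two
    eyes and the nose tip; three pairs determine the affine exactly.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    src_h = np.hstack([src, np.ones((3, 1))])  # homogeneous rows [x, y, 1]
    a_t = np.linalg.solve(src_h, dst)          # solve src_h @ A.T = dst
    return a_t.T                               # 2x3 affine matrix

def apply_affine(matrix, point):
    """Map one (x, y) point through the 2x3 affine matrix."""
    x, y = point
    return matrix @ np.array([x, y, 1.0])
```

Applying this matrix to every pixel (as image libraries do internally) warps the face so the landmarks land on the template before the 96x96 crop.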
The preprocessed images are still too high-dimensional for a classifier to take directly as input, so the neural network is used as a feature extractor to produce a low-dimensional embedding: a 128-dimensional vector normalized such that every point has distance 1 from the origin. Constraining the embedding to the unit hypersphere makes the Euclidean distances between embeddings directly comparable as a measure of face similarity.
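Comparing two faces then reduces to a Euclidean distance check between unit-length embeddings, sketched below with numpy. The match threshold is illustrative; a real system tunes it on validation data.

```python
import numpy as np

def normalize(v):
    """Project an embedding onto the unit hypersphere (length 1)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def same_person(emb_a, emb_b, threshold=1.1):
    """Declare a match when the unit embeddings are closer than the
    threshold. The value 1.1 here is an assumption for illustration,
    not a figure from the study."""
    dist = np.linalg.norm(normalize(emb_a) - normalize(emb_b))
    return dist < threshold
```

Because both vectors lie on the unit sphere, the distance ranges from 0 (identical direction) to 2 (opposite direction), which keeps a single global threshold meaningful.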
On the paper Strengths and Weaknesses of Deep Learning Models for Face Recognition Against Image Degradations, the authors study the effects of different covariates on the verification performance of four recent deep CNN models using the Labeled Faces in the Wild (LFW) dataset. The paper specifically investigated the influence of covariates related to image quality (blur, JPEG compression, occlusion, noise, image brightness, and contrast) as well as model characteristics such as the descriptor computation strategy, analyzing their impact on the face verification performance of the deep learning models and presenting key areas for potential future research. The
results indicate that high levels of noise, blur, missing pixels, and brightness have a
detrimental effect on the verification performance of all models, whereas the impact of
contrast changes and compression artifacts is limited. It was also found that the descriptor computation strategy and color information do not have a significant influence on performance.
In this study, neural network models are used to take facial measurements. The findings above also indicate that input images exhibiting high levels of blur, missing pixels, or extreme brightness will degrade the overall face recognition capability of the system.
Speech Recognition
Speech recognition is the ability of a software program or hardware device to decode the human voice. Voice recognition is commonly used to operate a device, perform commands, or write without having to use a keyboard, mouse, or press any buttons. Today, this is done on a computer with ASR (automatic speech recognition) software programs. Many ASR programs require the user to "train" the ASR program to recognize their voice so that it can more accurately convert the speech to text. For example, you could say "open Internet" and the computer would open the Internet browser. The first ASR device was used in 1952 and recognized single digits spoken by a user (it was not computer driven). Today, ASR programs are used in many industries, including healthcare and the military (e.g., F-16 fighter jets).
In this study, speech recognition is implemented and used to navigate through the system. Voice commands given by its users are recognized through speech recognition and executed by the system, allowing users to operate the system without having to press buttons.
A speech recognition system needs a speedy means of comparing stored speech patterns to incoming signals. The speech patterns are stored on the hard drive and loaded into memory when the program is run. A comparator checks these stored patterns against the output of the A/D converter -- an action called pattern matching.
In this study, audio signals captured by the microphone are segmented into chunks 10 to 15 milliseconds long, which are then compared against the system's acoustic model to find the most likely phonemes.
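The segmentation step can be sketched as slicing the sampled waveform into fixed-length frames. The 16 kHz sample rate and non-overlapping 10 ms frames below are assumptions chosen to keep the example short; real front ends typically use overlapping windows (e.g. 25 ms windows every 10 ms).

```python
import numpy as np

def frame_signal(samples, sample_rate=16000, frame_ms=10):
    """Split a 1-D audio signal into consecutive fixed-length frames.

    Each frame holds sample_rate * frame_ms / 1000 samples; any
    ragged tail shorter than one frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    n_frames = len(samples) // frame_len
    return np.asarray(samples[:n_frames * frame_len],
                      dtype=float).reshape(n_frames, frame_len)
```

One second of 16 kHz audio thus becomes 100 frames of 160 samples, each of which the recognizer scores against its acoustic model.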
On a paper by Mark Gales and Steve Young (2007), "The Application of Hidden Markov Models in Speech Recognition", the authors describe how Hidden Markov Models (HMMs) provide a simple and effective framework for speech recognition, along with the various refinements used in almost all present-day large vocabulary continuous speech recognition (LVCSR) systems.
In this study, the Hidden Markov Model algorithm is implemented and primarily used to recognize speech spoken by the users of the system. The Hidden Markov Model algorithm implemented in this study is combined with an acoustic model for extracting statistical features of a given speech. As shown in Figure 7, the basic unit of sound is the phone. For example, the word "bat" is composed of the three phones /b/ /ae/ /t/; about 40 such phones are required for English. For any given word w, the corresponding acoustic model is synthesized by concatenating phone models as defined by a pronunciation dictionary. The parameters of these phone models are estimated from training data consisting of speech waveforms and their orthographic transcriptions. The language model is typically an N-gram model in which the probability of each word is conditioned only on its N-1 predecessors, with the parameters estimated by counting N-tuples in appropriate text corpora. The decoder operates by searching through
all possible word sequences using pruning to remove unlikely hypotheses thereby
keeping the search tractable. When the end of the utterance is reached, the most likely
word sequence is output. Alternatively, modern decoders can generate lattices containing
a compact representation of the most likely hypotheses. The feature extraction stage
seeks to provide a compact representation of the speech waveform. This form should
minimize the loss of information that discriminates between words, and provide a good
match with the distributional assumptions made by the acoustic models. For example, if
diagonal covariance Gaussian distributions are used for the state-output distributions then
the features should be designed to be Gaussian and uncorrelated. Feature vectors are typically computed every 10 ms using an overlapping analysis window of around 25 ms. One of the simplest and most widely used encoding schemes is based on mel-frequency cepstral coefficients (MFCCs), generated by applying a truncated discrete cosine transform (DCT) to a log spectral estimate computed with around 20 frequency bins distributed non-linearly across the speech spectrum. The non-linear frequency scale used is called a mel scale, and it approximates the response of the human ear. The DCT is applied in order to smooth the spectral estimate and approximately decorrelate the feature elements. After the cosine transform, the first element represents the average of the log-energy of the frequency bins; it is sometimes replaced by the log energy of the frame or removed completely.
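The mel scale mentioned above is a logarithmic warping of frequency. Below is a short numpy sketch of the conversion and of placing 20 filterbank centers evenly on that scale; the formula constants follow the common HTK-style convention, which is an assumption here.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_centers(n_filters=20, f_min=0.0, f_max=8000.0):
    """Center frequencies of n_filters bins spaced evenly on the mel
    scale, i.e. non-linearly in Hz: dense at low frequencies and
    sparse at high ones, mimicking the resolution of the human ear."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels[1:-1])
```

Because the mapping is logarithmic, equal steps in mels correspond to ever-wider steps in Hz, which is exactly the "non-linearly distributed frequency bins" property described above.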
On a paper by Paul Lamere, Philip Kwok, Evandro Gouvea, Bhiksha Raj, et al. (2003), The CMU Sphinx-4 Speech Recognition System, the authors describe the salient features of the architecture of CMU's latest Sphinx-4 speech recognition system: the advantages of the system's modular design, its flexibility in the usage of various kinds of acoustic and language representations, and the features due to the Java platform on which it has been developed. The recognizer provides the Bush-derby algorithm in addition to the conventional Viterbi search algorithm. The Bush-derby algorithm enables the system to include all paths through a unit in the score for that unit. It also facilitates the use of multiple sources of information, such as audio and visual features, or parallel feature streams, in a natural way. Shown in Figure 8 is the Sphinx-4 front end module.
The main blocks are the front end, the decoder, and the knowledge base. The decoder comprises the linguist, the search manager, and the acoustic scorer. The front end consists of several communicating blocks, each with an input and an output, and each block has its input linked to the output of its predecessor. When a block is ready for more data, it reads data from its predecessor and interprets it to find out whether the incoming information is speech data or a control signal. A control signal might indicate the beginning or end of speech (important for the decoder) or might indicate dropped data or some other problem. If the incoming data is speech, it is processed and the output is buffered, waiting for the successor block to request it. The decoder block consists of three modules: the search manager, the linguist, and the acoustic scorer. The primary function of the search manager is to construct and search a tree of possibilities for the best hypothesis. The construction of the search tree is done based on information obtained from the linguist. In addition, the
search manager communicates with the acoustic scorer to obtain acoustic scores for incoming data. The search manager can use the standard Viterbi algorithm as well as a new algorithm called Bush-derby. Rather than considering only the highest-scoring path for each unit (surrounded by dotted boxes in Figure 9), this algorithm combines all paths of the unit into a single score. During search, competing units (phoneme, word, grammar, etc.) are each represented by a directed acyclic graph (DAG). Each DAG has a source and a sink, as illustrated by the examples in Figure 9, which shows two-node DAGs for the two competing phonemes AX and AXR. The DAGs associated with the competing word units CAT and RAT in Figure 9b are more complicated. In Viterbi, each competing unit
is scored by the probability of the (single) best path through it and the unit with
maximum best-path score wins. For example, if the probabilities on the edges were
(0.9,0.02,0.01) and (0.2,0.7,0.6) for the phonemes AX and AXR, respectively, then the
scores would be 0.9 and 0.7 and AX would win. If we used the sum of the probabilities of
all paths rather than the maximum, then the phoneme AXR would come out higher.
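The difference between the two scoring rules in this example can be shown directly in a few lines of Python, using the edge probabilities quoted above:

```python
# Viterbi (max-path) vs. Bush-derby-style (sum-of-paths) unit scoring,
# with the per-path probabilities from the AX / AXR example.
path_probs = {
    "AX":  [0.9, 0.02, 0.01],
    "AXR": [0.2, 0.7, 0.6],
}

def viterbi_winner(units):
    """Each unit is scored by its single best path (max rule)."""
    return max(units, key=lambda u: max(units[u]))

def sum_winner(units):
    """Each unit is scored by the sum over all of its paths."""
    return max(units, key=lambda u: sum(units[u]))
```

With these numbers the max-path rule scores AX at 0.9 against 0.7 for AXR, while the sum rule scores AX at 0.93 against 1.5, flipping the winner to AXR, exactly as described in the paper.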
In this study, the researchers utilized the CMU Sphinx-4 speech recognition system for its modularity, flexibility and algorithmic aspects. It supports newer search algorithms, types of acoustic models and feature streams, and its algorithmic innovations allow multiple information sources to be combined in a natural manner. The system is entirely developed on the Java™ platform and is highly portable.
On a project by Chun-Feng Liao (2016), the purpose was to find out how an efficient speech recognition engine can be implemented. The author examined the source code of Sphinx2 carefully and figured out the role and function of each component. The project found that the system is composed of three main components: the FrontEnd, the Knowledge Base and the Decoder. The FrontEnd receives and processes speech signals, the Knowledge Base provides data for the decoder, and the Decoder searches the states and then returns the result. The speech recognition system also provides some APIs to be used by applications.
In this study, the paper by Chun-Feng Liao helped the researchers understand the core architecture of CMU Sphinx and how the principal components of the system work.
Pocketsphinx
On a paper by David Huggins-Daines, et al. (2006), the problem considered is speech recognition on hand-held devices for a medium to large vocabulary scenario. The need to minimize the size and power consumption of these devices leads to compromises in their hardware and operating system software that further restrict their capabilities below what one might assume from their raw CPU speed. The paper presents a preliminary case study on the porting and optimization of CMU SPHINX-II, a popular open source large vocabulary continuous speech recognition system, to hand-held devices. The resulting system operates at an average of 0.87 times real time on a 206 MHz device, 8.03 times faster than the baseline system. To the authors' knowledge, it is the first hand-held LVCSR system of its kind.
In this study, Pocketsphinx modules such as the acoustic models and the phonetic dictionary provided the researchers the ability to customize the speech recognition part of the proposed system. In addition to the hardware limitations faced by the researchers, building a speech recognition system from the ground up requires access to proprietary speech recognition toolkits, which are often expensive and usually provided without source code. Pocketsphinx provided a framework for the researchers to work on.
Kiosk Systems
Kiosks are typically placed in building lobbies and other large areas to provide visitors with free information, such as directions or announcements. In this study, the researchers developed kiosk software with face and speech recognition that can be installed on physical hardware and serve as a student kiosk.
On a study by Dr. Stephen Jonas, Stephen Megregian, Gary Wenger, et al. (1994), the paper presents case studies of three community colleges that implemented a campus-wide information system (CWIS) using touchscreen kiosks placed around campus for access to the CWIS. The study concluded that the Intouch kiosk system reached a broad spectrum of students: men and women of all ages, ethnic backgrounds, majors, and academic disciplines. The report titled "Who's Using the Kiosks?" notes that a large share of the student enrollment accessed their personal records (from SCC's student information system) at least once during the term. Additionally, the statistics showed that kiosks seem to attract students who are somewhat more dedicated and academically successful than the overall student population, as indicated by kiosk users having slightly higher grade point averages and higher credit-hour loads than the general student population. The actual kiosk-user distribution breakdown by major closely parallels SCC's overall breakdown by major.
In this study, the three community college case studies give insight to the researchers to further improve and add additional features to the current ISAT U kiosk.
Lobby Attendant Software[67] is virtual front-desk kiosk software sold online that can be installed on any computer and placed in a building lobby; its contents are customizable depending on the needs. The software can display a Building Directory, Visitor Check-In, Local News & Information, VOIP Calling, Text Messaging, and Visitor Badge Printing; it can also help visitors find their way and search for businesses.
The above-mentioned software is similar to this study in the sense that it provides ease of access to information needed by users. However, the software above is targeted at businesses and offers a much more customizable user interface and many more features, at a cost.
Another existing student kiosk enables students to view Class Schedules, Academic Records, Student Tuition Fees and Balances, Update Mobile Number, View Clearance, Courses Offered, and Officers' Directory. It is similar to this study because it is made for schools and for students to view information. However, in the software mentioned above, students can also update their information on the kiosk and view the school organization's structure. In this study, the student kiosk with face and speech recognition can be further improved by adding those features.
ISO 25010
On a paper titled Evaluation of e-Book Applications Using ISO 25010, the authors aim to evaluate the quality of e-Book applications for classroom learning based on the characteristics set forth by the ISO 25010 standard. A survey of 37 primary schools in the district of Kemaman in Terengganu, Malaysia was carried out involving 200 teachers. Results indicate that e-Book applications are perceived as usable, reliable, functional, and of acceptable overall quality.
On a paper titled Evaluation of Open Source Software Quality Models against ISO 25010, the authors aim to evaluate existing open source software quality models against the ISO 25010 standard. A comparison between ISO 25010 and the other basic quality models showed that the ISO model was more comprehensive in terms of the number of quality characteristics that it covers.
The quality model is the cornerstone of a product quality evaluation system. The quality model determines which quality characteristics will be taken into account when evaluating the properties of a software product. The quality of a system is the degree to which the system satisfies the stated and implied needs of its various stakeholders, and thus provides value. Those stakeholders' needs (functionality, performance, security, maintainability, etc.) are precisely what is represented in the quality model, which categorizes product quality into characteristics and sub-characteristics. Furthermore, in this study, ISO 25010 is used to evaluate the system's conformance to the standard set by the ISO for computer systems. The system will be evaluated based on the following quality characteristics set by ISO 25010.
Functional suitability, the degree to which a product or system provides functions that meet stated and implied needs when used under specified conditions. This characteristic is composed of the following sub-characteristics. Functional completeness, degree to which the set of functions covers all the specified tasks and user objectives. Functional correctness, degree to which a product or system provides the correct results with the needed degree of precision. Functional appropriateness, degree to which the functions facilitate the accomplishment of specified tasks and objectives.
Performance efficiency, performance relative to the amount of resources used under stated conditions. This characteristic is composed of the following sub-characteristics. Time behaviour, degree to which the response and processing times and throughput rates of a product or system, when performing its functions, meet requirements. Resource utilization, degree to which the amounts and types of resources used by a product or system, when performing its functions, meet requirements. Capacity, degree to which the maximum limits of a product or system parameter meet requirements.
Compatibility, degree to which a product, system or component can exchange information with other products, systems or components, and/or perform its required functions while sharing the same hardware or software environment. This characteristic is composed of the following sub-characteristics. Co-existence, degree to which a product can perform its required functions efficiently while sharing a common environment and resources with other products, without detrimental impact on any other product. Interoperability, degree to which two or more systems, products or components can exchange information and use the information that has been exchanged.
Usability, degree to which a product or system can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use. This characteristic is composed of the following sub-characteristics. Appropriateness recognizability, degree to which users can recognize whether a product or system is appropriate for their needs. Learnability, degree to which a product or system can be used by specified users to achieve specified goals of learning to use the product or system with effectiveness, efficiency, freedom from risk and satisfaction in a specified context of use. Operability, degree to which a product or system has attributes that make it easy to operate and control. User error protection, degree to which a system protects users against making errors. User interface aesthetics, degree to which a user interface enables pleasing and satisfying interaction for the user. Accessibility, degree to which a product or system can be used by people with the widest range of characteristics and capabilities to achieve a specified goal in a specified context of use.
Reliability, degree to which a system, product or component performs specified functions under specified conditions for a specified period of time. This characteristic is composed of the following sub-characteristics. Maturity, degree to which a system, product or component meets needs for reliability under normal operation. Availability, degree to which a system, product or component is operational and accessible when required for use. Fault tolerance, degree to which a system, product or component operates as intended despite the presence of hardware or software faults. Recoverability, degree to which, in the event of an interruption or a failure, a product or system can recover the data directly affected and re-establish the desired state of the system.
Security, degree to which a product or system protects information and data so that persons or other products or systems have the degree of data access appropriate to their types and levels of authorization. This characteristic is composed of the following sub-characteristics. Confidentiality, degree to which a product or system ensures that data are accessible only to those authorized to have access. Integrity, degree to which a system, product or component prevents unauthorized access to, or modification of, computer programs or data. Non-repudiation, degree to which actions or events can be proven to have taken place, so that the events or actions cannot be repudiated later. Accountability, degree to which the actions of an entity can be traced uniquely to the entity. Authenticity, degree to which the identity of a subject or resource can be proved to be the one claimed.
Maintainability, degree of effectiveness and efficiency with which a product or system can be modified to improve it, correct it or adapt it to changes in environment and in requirements. This characteristic is composed of the following sub-characteristics. Modularity, degree to which a system or computer program is composed of discrete components such that a change to one component has minimal impact on other components. Reusability, degree to which an asset can be used in more than one system, or in building other assets. Analysability, degree of effectiveness and efficiency with which it is possible to assess the impact on a product or system of an intended change to one or more of its parts, or to diagnose a product for deficiencies or causes of failures, or to identify parts to be modified. Modifiability, degree to which a product or system can be effectively and efficiently modified without introducing defects or degrading existing product quality. Testability, degree of effectiveness and efficiency with which test criteria can be established for a system, product or component and tests can be performed to determine whether those criteria have been met.