Chapter 2
This chapter includes the related literature and related studies that gave the researchers ideas in the formulation of the study. According to the literature reviewed, face recognition is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. One of the ways to do this is by comparing selected facial features from the image with a facial database. Face detection, in turn, is a technology that identifies human faces in digital images; it also refers to the psychological process by which humans locate and attend to faces in a visual scene. Face recognition is typically used in security systems and can be compared to other biometrics such as fingerprint or eye iris recognition systems (Parveen Kumar & Doulat Singh, 2016).
In this study, face recognition is implemented and used for identifying the user accessing the proposed system. It is capable of identifying a student from video frames captured by the system's camera. The system locates and extracts facial features and then compares them to a stored database of known faces.
Face Recognition & Detection Techniques

On a paper about face recognition and detection techniques, the authors describe images as an important form of data used in almost every application. Some applications cannot use images directly due to the large amount of memory space needed to store them. One of the most critical decision points in the design of a face recognition system is the choice of an appropriate representation of the face, one that captures invariant and non-redundant facial information. Motion information is used to find the moving regions, and probable eye-region blobs are extracted by thresholding the image. These blobs reduce the search space for face verification, which is done by template matching. Experimental results for face detection show good performance even across orientation and pose variation to a certain extent. Face recognition is carried out by cumulatively summing the Euclidean distances between the test face images and the stored database, which shows good discrimination between true and false subjects. As the human face is a dynamic object with a high degree of variability in its appearance, face detection is a difficult problem in computer vision, and accuracy and speed of identification are the main issues in this field. The study concludes that image-based approaches are the best among the approaches for face recognition, and that image-based approaches fall into three main categories.
On the paper "Face Recognition Algorithms", the authors presented the face recognition area, explaining the different approaches, methods, tools and algorithms used since the 1960s. Some algorithms are better, some are less accurate, some are more versatile, and others are too computationally costly. The paper also discusses feature extraction methods for face recognition, as well as problems such as pose and illumination and how they can be addressed.
In this study, knowledge of known face recognition and detection techniques is essential. Taking into account the speed and accuracy of each technique also provides the researchers of this study with useful information and approaches to improve face recognition. The "Face Recognition Algorithms" paper guides the researchers in resolving some of the significant issues in face recognition, such as pose and illumination, and helps the researchers of this study weigh the most effective algorithms to be used for face recognition. Thus, the above-mentioned face detection and recognition techniques and algorithms provide very useful information in the field of face recognition.
On the paper by Dalal and Triggs (2005), Histograms of Oriented Gradients for Human Detection, the authors show, after reviewing existing edge- and gradient-based descriptors, that grids of Histogram of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. The study evaluates the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results.
In this study, the HOG algorithm is implemented and used for locating and detecting faces in images. HOG features have seen increasing use over the past decade. The basic idea is that local object appearance and shape can often be characterized rather well by the distribution of local intensity gradients or edge directions. This is implemented by dividing the image window into small spatial regions ("cells") and, for each cell, accumulating a local 1-D histogram of gradient directions or edge orientations over the pixels of the cell. In the feature extraction and object detection chain, Histogram of Oriented Gradient feature vectors are extracted, and the combined vectors are fed to a linear SVM for object/non-object classification. The detection window is scanned across the image at all positions and scales, and conventional non-maximum suppression is run on the output pyramid to detect object instances.
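The per-cell histogram step at the heart of HOG can be sketched with plain numpy. This is a minimal illustration assuming simple finite-difference gradients and unsigned orientation bins; block normalization, the SVM, and the sliding window are omitted.

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """Per-cell histograms of gradient orientations (minimal HOG sketch).

    Gradients come from finite differences; orientations are folded
    into n_bins unsigned bins over [0, 180) degrees and weighted by
    gradient magnitude, as in the cell step described above.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned

    h, w = image.shape
    cells_y, cells_x = h // cell_size, w // cell_size
    histograms = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins

    for cy in range(cells_y):
        for cx in range(cells_x):
            sl = (slice(cy * cell_size, (cy + 1) * cell_size),
                  slice(cx * cell_size, (cx + 1) * cell_size))
            bins = (orientation[sl] // bin_width).astype(int) % n_bins
            for b in range(n_bins):
                histograms[cy, cx, b] = magnitude[sl][bins == b].sum()
    return histograms
```

On a pure horizontal intensity ramp, every gradient points along x, so all the weight lands in the first orientation bin of each cell.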
Face Alignment
On the paper by Kazemi, Vahid and Sullivan, Josephine (2014), One Millisecond Face Alignment with an Ensemble of Regression Trees, the authors present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum-of-square-error loss and naturally handles missing or partially labelled data. The paper presents a new, faster algorithm for face alignment and face landmark estimation, addressing the problem of face alignment for a single image. It shows how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high-quality predictions. The regression trees can be used to regress the locations of face landmarks such as the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, the lips, etc. As shown in Figure 3, an ensemble of randomized regression trees is used to detect 194 landmarks on a face from a single image.
In this study, the algorithm used by Kazemi, Vahid and Sullivan, Josephine (2014) is implemented to locate face landmarks such as the eyes, nose, lips, and eyebrows on a given image; the image is then rotated so that these landmarks are centered in the image frame.
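The rotation step can be illustrated with a small numpy sketch that computes the angle needed to level the eyes and applies the corresponding rotation to the landmark points; the eye coordinates are assumed to come from a landmark estimator such as the one described above.

```python
import numpy as np

def eye_alignment_angle(left_eye, right_eye):
    """Angle (degrees) of the line through the eye centers.

    Rotating the image by the negative of this angle makes the eyes
    horizontal. Eye positions are (x, y) pixel coordinates assumed
    to come from a landmark estimator.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return np.degrees(np.arctan2(dy, dx))

def rotate_points(points, angle_deg, center):
    """Rotate landmark points by -angle_deg about center (upright face)."""
    theta = np.radians(-angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (np.asarray(points) - center) @ rot.T + center
```

In practice the same rotation is applied to the whole image, not just the landmarks, but the geometry is identical.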
On the paper by Florian Schroff, Dmitry Kalenichenko and James Philbin (2015), FaceNet: A Unified Embedding for Face Recognition and Clustering, the authors present a system called FaceNet that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be implemented using standard techniques, with FaceNet embeddings as feature vectors. The method uses deep convolutional neural network algorithms to optimize the embedding itself, and the system achieves state-of-the-art face recognition performance. Furthermore, the OpenFace paper, which targets mobile applications, presents a face recognition library that implements face recognition with neural networks using the techniques of Facebook's DeepFace and Google's FaceNet. In addition, the article Machine Learning is Fun (Geitgey, 2016) explains how effective algorithms like Histogram of Oriented Gradients, face landmark estimation and deep neural networks are for face recognition.
In this study, these algorithms have been implemented to extract the facial measurements of a human face from a given image by using the OpenFace trained convolutional neural network feature extractor. It uses a trained neural network that takes the 128 facial measurements of a face needed for face recognition. Figure 4 shows the logic flow for face recognition with neural networks.
Once a face is detected, the system preprocesses each face in the image to create a normalized, fixed-size input to the neural network. Preprocessing normalizes the faces so that the eyes, nose, and mouth appear at similar locations in each image, as shown in Figure 5, by using an affine transformation. The transformation is based on the large blue landmarks, and the final image is cropped to the boundaries and resized to 96x96 pixels.
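The affine transformation above amounts to solving for the 2x3 matrix that maps three detected landmarks onto fixed template positions. Below is a minimal numpy sketch; the landmark and template coordinates in the usage example are illustrative, not OpenFace's actual template.

```python
import numpy as np

def affine_from_landmarks(src_pts, dst_pts):
    """Solve for the 2x3 affine matrix A with A @ [x, y, 1] mapping
    each source landmark onto its template position.

    src_pts and dst_pts are three (x, y) pairs each, e.g. the two
    eyes and the nose tip; three pairs determine the affine exactly.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    src_h = np.hstack([src, np.ones((3, 1))])  # homogeneous rows [x, y, 1]
    a_t = np.linalg.solve(src_h, dst)          # solve src_h @ A.T = dst
    return a_t.T                               # 2x3 affine matrix

def apply_affine(matrix, point):
    """Map one (x, y) point through the 2x3 affine matrix."""
    x, y = point
    return matrix @ np.array([x, y, 1.0])
```

Applying this matrix to every pixel (as image libraries do internally) warps the face so the landmarks land on the template before the 96x96 crop.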
The preprocessed images are still too high-dimensional for a classifier to take directly as input, so the neural network is used as a feature extractor to produce a low-dimensional embedding: a 128-dimensional vector normalized such that every point has distance 1 from the origin. Constraining the embedding to the unit hypersphere makes the Euclidean distances between embeddings directly comparable as a measure of face similarity.
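Comparing two faces then reduces to a Euclidean distance check between unit-length embeddings, sketched below with numpy. The match threshold is illustrative; a real system tunes it on validation data.

```python
import numpy as np

def normalize(v):
    """Project an embedding onto the unit hypersphere (length 1)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def same_person(emb_a, emb_b, threshold=1.1):
    """Declare a match when the unit embeddings are closer than the
    threshold. The value 1.1 here is an assumption for illustration,
    not a figure from the study."""
    dist = np.linalg.norm(normalize(emb_a) - normalize(emb_b))
    return dist < threshold
```

Because both vectors lie on the unit sphere, the distance ranges from 0 (identical direction) to 2 (opposite direction), which keeps a single global threshold meaningful.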
On the paper Strengths and Weaknesses of Deep Learning Models for Face Recognition Against Image Degradations, the authors study the effects of different covariates on the verification performance of four recent deep CNN models using the Labeled Faces in the Wild (LFW) dataset. The paper specifically investigated the influence of covariates related to image quality (blur, JPEG compression, occlusion, noise, image brightness, and contrast) as well as model characteristics such as the descriptor computation strategy, analyzing their impact on the face verification performance of the deep learning models and presenting key areas for potential future research. The
results indicate that high levels of noise, blur, missing pixels, and brightness have a
detrimental effect on the verification performance of all models, whereas the impact of
contrast changes and compression artifacts is limited. It was also found that the descriptor computation strategy and color information do not have a significant influence on performance.
In this study, neural network models are used to take facial measurements. The findings above also indicate that input images exhibiting high levels of blur, missing pixels, or extreme brightness will degrade the overall face recognition capability of the system.
Speech Recognition
Speech recognition is the ability of a software program or hardware device to decode the human voice. Voice recognition is commonly used to operate a device, perform commands, or write without having to use a keyboard, mouse, or press any buttons. Today, this is done on a computer with ASR (automatic speech recognition) software programs. Many ASR programs require the user to "train" the ASR program to recognize their voice so that it can more accurately convert the speech to text. For example, you could say "open Internet" and the computer would open the Internet browser. The first ASR device was used in 1952 and recognized single digits spoken by a user (it was not computer driven). Today, ASR programs are used in many industries, including healthcare and the military (e.g., F-16 fighter jets).
In this study, speech recognition is implemented and used to navigate through the system. Voice commands given by its users are recognized through speech recognition and executed by the system, allowing users to operate the system without having to press buttons.
A speech recognition system needs a speedy means of comparing stored speech patterns to incoming signals. The speech patterns are stored on the hard drive and loaded into memory when the program is run. A comparator checks these stored patterns against the output of the A/D converter -- an action called pattern matching.
In this study, audio signals captured by the microphone are segmented into chunks 10 to 15 milliseconds long, which are then compared against the system's acoustic model to find the most likely phonemes.
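The segmentation step can be sketched as slicing the sampled waveform into fixed-length frames. The 16 kHz sample rate and non-overlapping 10 ms frames below are assumptions chosen to keep the example short; real front ends typically use overlapping windows (e.g. 25 ms windows every 10 ms).

```python
import numpy as np

def frame_signal(samples, sample_rate=16000, frame_ms=10):
    """Split a 1-D audio signal into consecutive fixed-length frames.

    Each frame holds sample_rate * frame_ms / 1000 samples; any
    ragged tail shorter than one frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    n_frames = len(samples) // frame_len
    return np.asarray(samples[:n_frames * frame_len],
                      dtype=float).reshape(n_frames, frame_len)
```

One second of 16 kHz audio thus becomes 100 frames of 160 samples, each of which the recognizer scores against its acoustic model.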
On a paper by Mark Gales and Steve Young (2007), "The Application of Hidden Markov Models in Speech Recognition", the authors describe how Hidden Markov Models (HMMs) provide a simple and effective framework for speech recognition, along with the various refinements used in almost all present-day large vocabulary continuous speech recognition (LVCSR) systems.
In this study, the Hidden Markov Model algorithm is implemented and primarily used to recognize speech spoken by the users of the system. The Hidden Markov Model algorithm implemented in this study is combined with an acoustic model for extracting statistical features of a given speech. As shown in Figure 7, the basic unit of sound is the phone. For example, the word "bat" is composed of the three phones /b/ /ae/ /t/; about 40 such phones are required for English. For any given word w, the corresponding acoustic model is synthesized by concatenating phone models as defined by a pronunciation dictionary. The parameters of these phone models are estimated from training data consisting of speech waveforms and their orthographic transcriptions. The language model is typically an N-gram model in which the probability of each word is conditioned only on its N-1 predecessors, with the parameters estimated by counting N-tuples in appropriate text corpora. The decoder operates by searching through
all possible word sequences using pruning to remove unlikely hypotheses thereby
keeping the search tractable. When the end of the utterance is reached, the most likely
word sequence is output. Alternatively, modern decoders can generate lattices containing
a compact representation of the most likely hypotheses. The feature extraction stage
seeks to provide a compact representation of the speech waveform. This form should
minimize the loss of information that discriminates between words, and provide a good
match with the distributional assumptions made by the acoustic models. For example, if
diagonal covariance Gaussian distributions are used for the state-output distributions then
the features should be designed to be Gaussian and uncorrelated. Feature vectors are typically computed every 10 ms using an overlapping analysis window of around 25 ms. One of the simplest and most widely used encoding schemes is based on mel-frequency cepstral coefficients (MFCCs), generated by applying a truncated discrete cosine transform (DCT) to a log spectral estimate computed with around 20 frequency bins distributed non-linearly across the speech spectrum. The non-linear frequency scale used is called a mel scale, and it approximates the response of the human ear. The DCT is applied in order to smooth the spectral estimate and approximately decorrelate the feature elements. After the cosine transform, the first element represents the average of the log-energy of the frequency bins; it is sometimes replaced by the log energy of the frame or removed completely.
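The mel scale mentioned above is a logarithmic warping of frequency. Below is a short numpy sketch of the conversion and of placing 20 filterbank centers evenly on that scale; the formula constants follow the common HTK-style convention, which is an assumption here.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_centers(n_filters=20, f_min=0.0, f_max=8000.0):
    """Center frequencies of n_filters bins spaced evenly on the mel
    scale, i.e. non-linearly in Hz: dense at low frequencies and
    sparse at high ones, mimicking the resolution of the human ear."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels[1:-1])
```

Because the mapping is logarithmic, equal steps in mels correspond to ever-wider steps in Hz, which is exactly the "non-linearly distributed frequency bins" property described above.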
On a paper by Paul Lamere, Philip Kwok, Evandro Gouvea, Bhiksha Raj, et al. (2003), The CMU Sphinx-4 Speech Recognition System, the authors describe the salient features of the architecture of CMU's latest Sphinx-4 speech recognition system: the advantages of the system's modular design, its flexibility in the usage of various kinds of acoustic and language representations, and the features due to the Java platform on which it has been developed. The recognizer provides the Bush-derby algorithm in addition to the conventional Viterbi search algorithm. The Bush-derby algorithm enables the system to include all paths through a unit in the score for that unit. It also facilitates the use of multiple sources of information, such as audio and visual features, or parallel feature streams, in a natural way. Shown in Figure 8 is the Sphinx-4 front end module.
The main blocks are the front end, the decoder, and the knowledge base. The decoder comprises the linguist, the search manager, and the acoustic scorer. The front end consists of several communicating blocks, each with an input and an output, and each block has its input linked to the output of its predecessor. When a block is ready for more data, it reads data from its predecessor and interprets it to find out whether the incoming information is speech data or a control signal. A control signal might indicate the beginning or end of speech (important for the decoder) or might indicate dropped data or some other problem. If the incoming data is speech, it is processed and the output is buffered, waiting for the successor block to request it. The decoder block consists of three modules: the search manager, the linguist, and the acoustic scorer. The primary function of the search manager is to construct and search a tree of possibilities for the best hypothesis. The construction of the search tree is done based on information obtained from the linguist. In addition, the
search manager communicates with the acoustic scorer to obtain acoustic scores for incoming data. The search manager can use the standard Viterbi algorithm as well as a new algorithm called Bush-derby. Rather than considering only the highest-scoring path for each unit (surrounded by dotted boxes in Figure 9), this algorithm combines all paths of the unit into a single score. During search, competing units (phoneme, word, grammar, etc.) are each represented by a directed acyclic graph (DAG). Each DAG has a source and a sink, as illustrated by the examples in Figure 9, which shows two-node DAGs for the two competing phonemes AX and AXR. The DAGs associated with the competing word units CAT and RAT in Figure 9b are more complicated. In Viterbi, each competing unit
is scored by the probability of the (single) best path through it and the unit with
maximum best-path score wins. For example, if the probabilities on the edges were
(0.9,0.02,0.01) and (0.2,0.7,0.6) for the phonemes AX and AXR, respectively, then the
scores would be 0.9 and 0.7 and AX would win. If we used the sum of the probabilities of
all paths rather than the maximum, then the phoneme AXR would come out higher.
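The difference between the two scoring rules in this example can be shown directly in a few lines of Python, using the edge probabilities quoted above:

```python
# Viterbi (max-path) vs. Bush-derby-style (sum-of-paths) unit scoring,
# with the per-path probabilities from the AX / AXR example.
path_probs = {
    "AX":  [0.9, 0.02, 0.01],
    "AXR": [0.2, 0.7, 0.6],
}

def viterbi_winner(units):
    """Each unit is scored by its single best path (max rule)."""
    return max(units, key=lambda u: max(units[u]))

def sum_winner(units):
    """Each unit is scored by the sum over all of its paths."""
    return max(units, key=lambda u: sum(units[u]))
```

With these numbers the max-path rule scores AX at 0.9 against 0.7 for AXR, while the sum rule scores AX at 0.93 against 1.5, flipping the winner to AXR, exactly as described in the paper.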
In this study, the researchers utilized the CMU Sphinx-4 speech recognition system for its modularity, flexibility and algorithmic aspects. It supports newer search algorithms, types of acoustic models and feature streams, and its algorithmic innovations allow multiple information sources to be combined in a natural manner. The system is entirely developed on the Java™ platform and is highly portable.
On a project by Chun-Feng Liao (2016), the purpose was to find out how an efficient speech recognition engine can be implemented. The author examined the source code of Sphinx2 carefully and figured out the role and function of each component. The project found that the system is composed of three main components: the FrontEnd, the Knowledge Base and the Decoder. The FrontEnd receives and processes speech signals, the Knowledge Base provides data for the decoder, and the Decoder searches the states and then returns the result. The speech recognition system also provides some APIs to be used by applications.
In this study, the paper by Chun-Feng Liao helped the researchers understand the core architecture of CMU Sphinx and how the principal components of the system work.
Pocketsphinx
On a paper by David Huggins-Daines, et al. (2006), the problem considered is speech recognition on hand-held devices for a medium to large vocabulary scenario. The need to minimize the size and power consumption of these devices leads to compromises in their hardware and operating system software that further restrict their capabilities below what one might assume from their raw CPU speed. The paper presents a preliminary case study on the porting and optimization of CMU SPHINX-II, a popular open source large vocabulary continuous speech recognition system, to hand-held devices. The resulting system operates at an average of 0.87 times real time on a 206 MHz device, 8.03 times faster than the baseline system. To the authors' knowledge, it is the first hand-held LVCSR system of its kind.
In this study, Pocketsphinx modules such as the acoustic models and the phonetic dictionary provided the researchers the ability to customize the speech recognition part of the proposed system. In addition to the hardware limitations faced by the researchers, building a speech recognition system from the ground up requires access to proprietary speech recognition toolkits, which are often expensive and usually provided without source code. Pocketsphinx provided a framework for the researchers to work on.
Kiosk Systems
Kiosks are typically placed in building lobbies and other large areas to provide visitors with free information, such as directions or announcements. In this study, the researchers developed kiosk software with face and speech recognition that can be installed on physical hardware and serve as a student kiosk.
On a study by Dr. Stephen Jonas, Stephen Megregian, Gary Wenger, et al. (1994), the paper presents case studies of three community colleges that implemented a campus-wide information system (CWIS) using touchscreen kiosks placed around campus for access to the CWIS. The study concluded that the Intouch kiosk system reached a broad spectrum of students: men and women of all ages, ethnic backgrounds, majors, and academic disciplines. The report titled "Who's Using the Kiosks?" notes that a large share of the student enrollment accessed their personal records (from SCC's student information system) at least once during the term. Additionally, the statistics showed that kiosks seem to attract students who are somewhat more dedicated and academically successful than the overall student population, as indicated by kiosk users having slightly higher grade point averages and higher credit-hour loads than the general student population. The actual kiosk-user distribution breakdown by major closely parallels SCC's overall breakdown by major.
In this study, the three community college case studies give insight to the researchers to further improve and add additional features to the current ISAT U kiosk.
Lobby Attendant Software[67] is virtual front-desk kiosk software sold online that can be installed on any computer and placed in a building lobby; its contents are customizable depending on the needs. The software can display a Building Directory, Visitor Check-In, Local News & Information, VOIP Calling, Text Messaging, and Visitor Badge Printing; it can also help visitors find their way and search for businesses.
The above-mentioned software is similar to this study in the sense that it provides ease of access to information needed by users. However, the software above is targeted at businesses and offers a much more customizable user interface and many more features, at a cost.
Another existing student kiosk enables students to view Class Schedules, Academic Records, Student Tuition Fees and Balances, Update Mobile Number, View Clearance, Courses Offered, and Officers' Directory. It is similar to this study because it is made for schools and for students to view information. However, in the software mentioned above, students can also update their information on the kiosk and view the school organization's structure. In this study, the student kiosk with face and speech recognition can be further improved by adding those features.
ISO 25010
On a paper titled Evaluation of e-Book Applications Using ISO 25010, the authors aim to evaluate the quality of e-Book applications for classroom learning based on the characteristics set forth by the ISO 25010 standard. A survey of 37 primary schools in the district of Kemaman in Terengganu, Malaysia was carried out involving 200 teachers. Results indicate that e-Book applications are perceived as usable, reliable, functional, and of acceptable overall quality.
On a paper titled Evaluation of Open Source Software Quality Models against ISO 25010, the authors aim to evaluate existing open source software quality models against the ISO 25010 standard. A comparison between ISO 25010 and the other basic quality models showed that the ISO model was more comprehensive in terms of the number of quality characteristics that it covers.
The quality model is the cornerstone of a product quality evaluation system. The quality model determines which quality characteristics will be taken into account when evaluating the properties of a software product. The quality of a system is the degree to which the system satisfies the stated and implied needs of its various stakeholders, and thus provides value. Those stakeholders' needs (functionality, performance, security, maintainability, etc.) are precisely what is represented in the quality model, which categorizes product quality into characteristics and sub-characteristics. Furthermore, in this study, ISO 25010 is used to evaluate the system's conformance to the standard set by the ISO for computer systems. The system will be evaluated based on the following quality characteristics set by ISO 25010.
Functional suitability, the degree to which a product or system provides functions that meet stated and implied needs when used under specified conditions. This characteristic is composed of the following sub-characteristics. Functional completeness, degree to which the set of functions covers all the specified tasks and user objectives. Functional correctness, degree to which a product or system provides the correct results with the needed degree of precision. Functional appropriateness, degree to which the functions facilitate the accomplishment of specified tasks and objectives.
Performance efficiency, performance relative to the amount of resources used under stated conditions. This characteristic is composed of the following sub-characteristics. Time behaviour, degree to which the response and processing times and throughput rates of a product or system, when performing its functions, meet requirements. Resource utilization, degree to which the amounts and types of resources used by a product or system, when performing its functions, meet requirements. Capacity, degree to which the maximum limits of a product or system parameter meet requirements.
Compatibility, degree to which a product, system or component can exchange information with other products, systems or components, and/or perform its required functions while sharing the same hardware or software environment. This characteristic is composed of the following sub-characteristics. Co-existence, degree to which a product can perform its required functions efficiently while sharing a common environment and resources with other products, without detrimental impact on any other product. Interoperability, degree to which two or more systems, products or components can exchange information and use the information that has been exchanged.
Usability, degree to which a product or system can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use. This characteristic is composed of the following sub-characteristics. Appropriateness recognizability, degree to which users can recognize whether a product or system is appropriate for their needs. Learnability, degree to which a product or system can be used by specified users to achieve specified goals of learning to use the product or system with effectiveness, efficiency, freedom from risk and satisfaction in a specified context of use. Operability, degree to which a product or system has attributes that make it easy to operate and control. User error protection, degree to which a system protects users against making errors. User interface aesthetics, degree to which a user interface enables pleasing and satisfying interaction for the user. Accessibility, degree to which a product or system can be used by people with the widest range of characteristics and capabilities to achieve a specified goal in a specified context of use.
Reliability, degree to which a system, product or component performs specified functions under specified conditions for a specified period of time. This characteristic is composed of the following sub-characteristics. Maturity, degree to which a system, product or component meets needs for reliability under normal operation. Availability, degree to which a system, product or component is operational and accessible when required for use. Fault tolerance, degree to which a system, product or component operates as intended despite the presence of hardware or software faults. Recoverability, degree to which, in the event of an interruption or a failure, a product or system can recover the data directly affected and re-establish the desired state of the system.
Security, degree to which a product or system protects information and data so that persons or other products or systems have the degree of data access appropriate to their types and levels of authorization. This characteristic is composed of the following sub-characteristics. Confidentiality, degree to which a product or system ensures that data are accessible only to those authorized to have access. Integrity, degree to which a system, product or component prevents unauthorized access to, or modification of, computer programs or data. Non-repudiation, degree to which actions or events can be proven to have taken place, so that the events or actions cannot be repudiated later. Accountability, degree to which the actions of an entity can be traced uniquely to the entity. Authenticity, degree to which the identity of a subject or resource can be proved to be the one claimed.
Maintainability, degree of effectiveness and efficiency with which a product or system can be modified to improve it, correct it or adapt it to changes in environment and in requirements. This characteristic is composed of the following sub-characteristics. Modularity, degree to which a system or computer program is composed of discrete components such that a change to one component has minimal impact on other components. Reusability, degree to which an asset can be used in more than one system, or in building other assets. Analysability, degree of effectiveness and efficiency with which it is possible to assess the impact on a product or system of an intended change to one or more of its parts, or to diagnose a product for deficiencies or causes of failures, or to identify parts to be modified. Modifiability, degree to which a product or system can be effectively and efficiently modified without introducing defects or degrading existing product quality. Testability, degree of effectiveness and efficiency with which test criteria can be established for a system, product or component and tests can be performed to determine whether those criteria have been met.