
THE FEASIBILITY OF A SPEECH RECOGNIZER AS A FORM OF SECURITY

MEASURE

A Thesis Presented To
the High School Department of
Sacred Heart Academy
of Pasig

In Partial Fulfillment of the


Requirements in English

And Science 10

Researchers:

BAROJA, Drezen Scott A.

DAGSIL, Dyanne Francine D.

DONATO, Asianti Crishna E.

MIRAFLOR, Robbin Cross F.

REYES, Louise Erlle P.

TALOSIG, Nathan E.

10- Prudence

Research Advisers:

Mr. Raldin Gem Frias

Ms. Kendra Caramat


Table of Contents

Acknowledgements iii

Abstract iv

Chapter 1: The Problem and Its Background 1

Introduction 2

Background of the Study 2

Conceptual Framework 3

Statement of the Problem 8

Significance of the Study 9

Scope and Delimitations 10

Definition of Terms 11

Chapter 2: Review of Related Literature and Studies 13

Synthesis 18

Chapter 3: Research Methodology 20

Research Design 20

Research Setting 20

Research Instruments 21

Materials 22

Equipment 23

Procedures 24

Statistical Treatment 25

Chapter 4: Results and Discussion 27

Discussion 40

Chapter 5: Summary, Conclusions and Recommendations 1

Summary 42

Conclusion 43

Recommendation 45

Bibliography 46

Curriculum Vitae 47

Acknowledgements

The researchers would like to thank their respective families for giving them the support

they needed to get through such horrendous times. They would also like to thank SHAP

for giving them the opportunity to write such an amazing piece and learn about speech

recognition and security. The researchers would also like to thank their classmates for

helping them every step of the way. Lastly, the researchers want to thank two of the

best research advisers out there, Sir Raldin Gem Frias and Ms. Kendra Caramat. They

have guided the researchers into creating a product that can revolutionize the world.

From the making of the title to the title defense, they stuck with the researchers, even

when the researchers had no clue what to do. The researchers thank them for being patient and

understanding. The researchers would also like to thank the panelists for their

constructive criticism which helped the researchers improve the research even more.

Abstract

The first speech recognition systems were focused on numbers, not words. In 1952,

Bell Laboratories designed the “Audrey” system which could recognize a single voice

speaking digits aloud. Ten years later, IBM introduced “Shoebox” which understood and

responded to 16 words in English. Speech recognition was invented with the idea of

making things more hands-free and easier. Though the world is more secure than ever,

that is no reason for you to take the issue of security lightly. This research titled “The

Feasibility of a Speech Recognizer as a Form of Security Measure” focuses on

creating a security measure using speech recognition that is both effective and

cheap. The researchers used an open-source speech recognizer as the base to test

how effective a regular speech recognizer would be. The research was mainly aimed at

finding out how feasible a speech recognition activated security measure is and if the

call/text that will be produced is fast enough to help a person in danger. This thesis

answers three research questions posed by the researchers. The researchers

answered the questions mentioned through a series of tests that determined the overall

effectivity of the product. The researchers conclude, with the results taken from the

tests, that speech recognition is a feasible security measure.

CHAPTER 1

THE PROBLEM AND ITS BACKGROUND

Introduction

There are numerous crimes both in homes and public places such as theft,

robbery, homicide and others. Nowadays, the use of an effective security measure is

becoming an essential part of people’s everyday lives.

A security measure is a precaution taken against terrorism, espionage or other

danger. “It is the protection of personnel, hardware, software, networks and data from

physical actions and events that could cause serious loss or damage to an enterprise,

agency or institution. This includes protection from fire, flood, natural disasters, burglary,

theft, vandalism and terrorism.” (Rouse, 2016, para. 1). It also preserves an

organization’s ability to perform its appointed tasks by protecting it from attacks

from inside and outside the organization. There are methods and measures meant to

detect attackers and intruders and keep them from affecting protected assets.

A speech recognizer offers a simpler and effective form of security measure.

Hope (2019) stated that it is a computer software program or hardware device with the

ability to decode the human voice. It is commonly used to operate a device, perform

commands, or write without having to use a keyboard, mouse, or press any buttons.

Speech recognition occurs when the recognizer detects an assigned word or

words that were programmed into the software. Rouse (2016) stated that rudimentary software has a limited

vocabulary of words and phrases, and it may only identify these if they are spoken very

clearly. Speech recognizer software is easy to use because it is frequently installed on

computers and mobile devices. The disadvantages of a speech recognizer include its

inability to recognize or capture words due to mispronunciation, lack of support for

different languages, and its inability to sort through background noise. Nevertheless,

it remains a simple and easy form of security measure because it enables

hands-free control of various devices and equipment, provides input to automatic

translation, and creates print-ready dictation. Speech must be converted from physical

sound to an electrical signal with a microphone, and then to digital data with an analog-

to-digital converter.
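
The conversion chain described above can be illustrated with a minimal sketch. It assumes that a 440 Hz tone stands in for the microphone's electrical signal, and that the converter samples at 16 kHz with 16-bit resolution. None of these values come from the study; they are placeholders chosen only to show sampling and quantization.

```python
import math

SAMPLE_RATE_HZ = 16_000   # assumed sampling rate, not specified in the study
BIT_DEPTH = 16            # assumed resolution of the analog-to-digital converter


def analog_signal(t: float) -> float:
    """Stand-in for the microphone's electrical signal: a 440 Hz tone in [-1, 1]."""
    return math.sin(2 * math.pi * 440 * t)


def digitize(duration_s: float) -> list[int]:
    """Sample the signal at fixed intervals and quantize each sample to an integer."""
    max_code = 2 ** (BIT_DEPTH - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(duration_s * SAMPLE_RATE_HZ)
    return [round(analog_signal(n / SAMPLE_RATE_HZ) * max_code)
            for n in range(n_samples)]


samples = digitize(0.01)   # 10 ms of audio
print(len(samples), samples[:4])
```

The list of integers is the "digital data" that a speech recognizer would receive in place of the original sound wave.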

Background of the Study

Security measures are important because they protect a person’s belongings. They

matter greatly to society because many crimes are reported nowadays, yet they are

usually overlooked by most organizations. Attackers have many motives: financial

gain, personal gain, revenge, or simply the availability of a vulnerable target.

Security is more challenging than in previous decades because more sensitive

devices are available, such as USB drives, smartphones, laptops, and tablets, which

make stealing data easier. Nowadays, however, there are high-tech security measures

that cannot be bypassed easily. Their great advantage is that criminals or attackers

must pass through many methods and layers of security and, as a result, will have a

hard time achieving their objectives. There are many types of security measures that

are effective and efficient in keeping people safe. Physical security has three

important components, which

are access control, surveillance, and testing. Obstacles should be placed where

attackers most frequently pursue their objectives. The location where a security

measure is placed should be specific and effective.

Amos (2018) stated that early systems of a speech recognizer were limited to a

single speaker and had limited vocabularies of about a dozen words. The earliest forms

of speech recognizers were automated telephone systems and medical dictation

software. Speech recognition is frequently used for dictation and for giving commands to computer-based

systems. Velde (2019) stated that the first-ever recorded attempt at speech recognition

technology dates back to 1,000 A.D. through the development of an instrument that

could supposedly answer “yes” or “no” to direct questions. Modern speech recognizers

can recognize speech from multiple speakers and have enormous

vocabularies in numerous languages. According to Hope (2019), today, speech

recognition is done on a computer with ASR (automatic speech recognition) software

programs. Most ASR programs require the user to train the program to recognize the

user’s voice so that it can more accurately convert speech into text. The first ASR

device was used in 1952; it recognized single digits spoken by a

single user.

Conceptual Framework

The researchers used the CIPP Evaluation model in order to show the Context,

Input, Process, and Product of the study. The CIPP Evaluation Model is a decision-

focused framework made by Daniel Stufflebeam and his colleagues in the 1960s for the

main purpose of guiding evaluations of programs and projects. The CIPP Model is also

defined as the “systematic collection of information about the activities, characteristics,

and outcomes of programs to make judgements about the program, improve program

effectiveness, and/or inform decisions about future programming,” (Stufflebeam, 2003).

CIPP is an acronym for the four main concepts this model uses. It stands for Context,

Input, Process, and Product. The researchers chose the CIPP Model because it “aims to

provide an analytic and rational basis for program decision-making, based on a cycle of

planning, structuring, implementing and reviewing and revising decisions, each

examined through a different aspect of evaluation - Context, Input, Process, and

Product evaluation,” (Robinson, 2002).

The researchers decided to employ symbols and colors to provide a clean and

understandable process using a flowchart diagram (a notation developed for computer programs by Goldstine and

von Neumann in the 1940s). A flowchart is a diagram of the sequence of movements

or actions of people or things involved in a complex system or activity. Each color in this

diagram represents a certain part of the process. The researchers decided to use colors

to distinguish between the different concepts to make the diagram simple and not

cluttered. The blue rectangle signifies the Context or the objectives of the study, the

black rectangle signifies the Input, the green signifies the Process, and the yellow

teardrop signifies the Product.

A blue rectangle was chosen by the researchers to show the Context. Blue was

chosen because people psychologically associate blue with determination and goal-making,

important qualities in setting the objectives, or the context. In Context, the

researchers collected and assessed information to scale the main objectives the

researchers want to accomplish. The researchers stated 4 objectives that they want to

accomplish, as shown in Figure 1. The researchers want to make an affordable security

system for local store owners who can’t afford the more expensive security measures.

They wish to provide an enticing alternative that won’t cost as much as other

security systems. They also want to show people the advantages of a hands-free

security system and maybe give them a glimpse of the future of security systems. They

also aim to prove that speech recognition can be an effective security measure. As speech

recognition is currently looked at as a “pet project” in the security system industry, the

researchers want to show people that using speech recognition as a form of security

measure is not only possible but also very effective. They lastly want to encourage

security companies to use speech recognition in their security measures.

After the Context is the Input Evaluation Stage. The purpose of this stage is for

the researchers to assemble a concrete list of materials and components needed to

make and execute the procedure to create the intended product. After the researchers

outlined their main goals, they then made a list of the components required. The main

components needed in this study are the Raspberry Pi, which is the brain of the whole product,

and the microphone, which detects the sounds needed to

activate the security measure. Since the Raspberry Pi comes preinstalled with its own

OS and it’s powerful enough to be used independently, a computer is really not required

unless a powerful system is required to process or load a certain code. Other equipment

such as the cables (micro-USB cable and HDMI cable) are essential in the process.

The Process Evaluation Stage is one of the most important stages in the CIPP

Model since the quality of the product here is investigated, documented and assessed

(Wilson & Mertens, 2012). The steps need to be executed properly in order to produce

excellent results. Programming is very crucial as one mistake can stop the program

from working. The researchers need to be meticulous and careful in their coding.

Testing and Troubleshooting are very important steps to spot and correct mistakes.

Once the code is written, testing must follow to make sure no mistakes were

made.

The last and final stage in the CIPP Model is the product evaluation. A teardrop

was chosen to show emphasis on the outcome. This stage assesses the final outcome

of the study, whether expected or unexpected. Once the steps have been followed

properly, the Product will be made. Once the product is made, the researchers can then

determine the effectivity of a speech recognizer as a form of security measure.

Conceptual Framework

Context
1. make an affordable security system for local store owners;
2. show the advantages of using a hands-free security system;
3. prove that speech recognition can be an effective security measure; and
4. encourage security companies to use speech recognition in their security
measures.

Input
- Raspberry Pi 3 Model B
- Micro-USB Cable and Power Brick
- RODD Brand USB Computer Microphone
- USB Speakers
- HDMI Cable

Process
1. Connect all necessary components to Raspberry Pi.
2. Program all the required codes.
3. Test and troubleshoot.

Product
Security Measure using Speech
Recognition Software

Figure 1. The procedures needed to make a security measure using speech recognition

software.

Statement of the Problem

The purpose of this study is to develop a security measure using a speech

recognizer software wherein the speech recognizer will recognize a certain phrase or

word provided by the researchers and the software will secretly call the police and a

close relative/friend alerting them that there is an emergency. The researchers will also

explore the concept of speech recognition software as an alternative to other

forms of security. The researchers aim to give an affordable alternative to local store

owners who cannot afford the other security measures. During the study, the

researchers aim to answer the following questions:

1. What will be the optimal distance between the user and the speech recognizer

for the software and microphone to properly identify the voice?

2. How accurate will the sensor and speech recognizing software be in recognizing

and identifying when the user is trying to activate the security measure?

3. How long is the elapsed time between the activation of the speech recognition

software and the moment the call/text is made?
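
The flow described in this section, together with the elapsed time asked about in the third question, can be sketched as follows. This is only an illustration under stated assumptions: the recognizer is assumed to hand over a text transcript, the trigger phrase `help me now` is hypothetical, and `send_alert` is a stand-in callback, since placing a real call or text requires a telephony service that is not shown here.

```python
import time
from typing import Callable, Optional

TRIGGER_PHRASE = "help me now"   # hypothetical phrase, not the one used in the study


def contains_trigger(transcript: str, phrase: str = TRIGGER_PHRASE) -> bool:
    """Return True if the recognized text contains the trigger phrase."""
    normalized = " ".join(transcript.lower().split())
    return phrase in normalized


def handle_transcript(transcript: str,
                      send_alert: Callable[[str], None]) -> Optional[float]:
    """Send an alert if the phrase was heard; return elapsed seconds, else None."""
    if not contains_trigger(transcript):
        return None
    started = time.perf_counter()            # moment the phrase is recognized
    send_alert("Emergency reported by the store's voice alarm")
    return time.perf_counter() - started     # time until the alert call returns


sent = []                                    # stand-in for a call/text service
elapsed = handle_transcript("Please  HELP me now", sent.append)
print(sent, elapsed)
```

Timing the interval inside the program, rather than with a stopwatch, is one way the elapsed time could be recorded consistently across trials.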

Hypotheses

1. If the microphone isn’t obscured and the background noise level ranges

from 35 to 110 dB, then the sound level the user needs to produce is

approximately 60-80 dB at a distance of around 2-10 meters

from the mic. The sound level the user needs to produce is directly

proportional to the distance of the user from the microphone. If there are no

obstructions covering the microphone, then the optimal distance of the user from

the microphone is 5 meters.

2. If the user is standing 5 meters away from the microphone and says the

word at 50-60 dB, then the software has about a 90% chance of success, keeping in

mind other factors such as the background noise, possible objects covering up the

mic and the clarity of the voice of the user.

3. If the user speaks at the optimal distance from the speech recognizer

and produces the sound level needed by the speech

recognizer, then the pre-recorded call/message will take 5-20

seconds to process and reach the number entered.
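
The relationship between distance, speaking level, and background noise in these hypotheses can be explored with a rough estimate. The sketch below assumes a free-field point source, where the sound level falls by about 6 dB for every doubling of distance; this model is an assumption added for illustration and is not part of the study, and a real store with walls and reflections will behave differently. The 70 dB speaking level at 1 meter and the function names are also hypothetical.

```python
import math


def level_at_distance(level_ref_db: float, ref_distance_m: float,
                      distance_m: float) -> float:
    """Free-field estimate: the level falls by about 6 dB per doubling of distance."""
    return level_ref_db - 20 * math.log10(distance_m / ref_distance_m)


def signal_to_noise_db(speech_db: float, noise_db: float) -> float:
    """Margin by which the speech level exceeds the background noise level."""
    return speech_db - noise_db


speech_at_1m = 70.0   # assumed speaking level measured 1 m from the mouth
for d in (2, 5, 10):
    at_mic = level_at_distance(speech_at_1m, 1.0, d)
    print(d, round(at_mic, 1), round(signal_to_noise_db(at_mic, 35.0), 1))
```

Under this assumption the level reaching the microphone changes with the logarithm of distance, which the distance tests can confirm or refute for the actual setup.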

Significance of the Study

Caliwan (2019) stated that though the total number of index crimes has

dropped, the number of robberies and thefts has remained nearly constant, only dropping by

0.4%. The researchers aim to help Filipino citizens by giving them an alternative

security system that is effective and seamless. This study aims to show Filipino citizens

the effectivity of speech recognition when used in security systems. The researchers

want to provide people an affordable yet reliable security measure usually seen in

higher-end products that cost up to P100,000, something an ordinary person cannot afford.

The researchers also aim to show security system companies that speech recognition

can be a viable form of security. This study aims to specifically help these groups of

people:

Small Local Store Owners. Stores, especially small local stores, are prone to being

robbed. Small local store owners usually can’t afford security systems that can aid

them during a robbery or emergency. The researchers focused on trying to make a

cheap yet reliable security measure. This product takes the feature of speech

recognition from higher-end security systems.

Security System Companies. The researchers aim to establish speech recognition

software as a viable security measure. The researchers want to challenge security

system companies to look into speech recognition and invest in this portion of the

security system industry. With this, speech recognition-based security measures can

become cheaper and more affordable for the consumer.

Corner Stores and Gas Stations. These places stay open until midnight, which

makes them a perfect target for burglars. Mesa

Alarm Systems (2017) stated that over 7,000 corner stores are robbed each year and

most burglars rarely walk away with more than $900. Since most robberies happen at

night, people should limit their time at these locations after dark. The researchers aim to

lessen the number of robberies in these places. With the speech recognizer, the

employees can easily alert authorities about burglars.

Scope and Delimitations

This study aims to create a device that can detect an emergency based on sound. To

do so, the researchers examined the sensitivity of the sensor, even to the

faintest sounds. This study will cover the effectivity of the product in capturing sounds

accurately. The device will be tested at different sound levels to determine its capability

in capturing and detecting sounds from the user’s activities and speech and in

distinguishing the speaker from background noise. To further assess the device’s

capability, the researchers will also test how precisely the device can pick

up sounds in different environments. The researchers will also focus on programming

the sensor to pick up sounds efficiently and properly.

Definition of Terms

For better understanding of this study, the following terms are defined operationally:

1. Acoustic Model. It is a file that contains statistical representations of each of the

distinct sounds that make up a word.

2. Artificial Intelligence (AI). It is the branch of computer science that deals with

writing computer programs that can solve problems creatively.

3. Automated Telephone Systems. It is a telephone system that interacts with

callers without input from a human other than the caller.

4. Automatic Speech Recognition (ASR). It is the use of computer hardware and

software-based techniques to identify and process human voices. It is used to

identify the words a person has spoken or to authenticate the identity of a person

speaking into the system.

5. Cyber Espionage. It is unauthorized spying by computer; the term generally

refers to the deployment of viruses that clandestinely observe or destroy data in

the computer systems of government agencies and large enterprises.

6. Hacking. It generally refers to unauthorized intrusion into a computer or a

network. The person engaging in hacking activities is known as a hacker.

7. Internet of Things (IoT). It is a system of interrelated computing devices,

mechanical and digital machines, objects, animals or people that are provided

with unique identifiers (UIDs) and the ability to transfer data over a network

without requiring human-to-human or human-to-computer interaction.

8. Malware. It is any program or file that is harmful to a computer user. It is also

called malicious software.

9. Markov model. It is a stochastic method for randomly changing systems where it

is assumed that future states depend only on the current state, not on the states that preceded it.

10. Phoneme. It is any of the abstract units of the phonetic system of a language

that correspond to a set of similar speech sounds which are perceived to be a

single distinctive sound in the language.

11. Physical Security. It is that part of security concerned with physical measures

designed to safeguard personnel; and to safeguard them against espionage,

sabotage, damage, and theft.

12. Rudimentary Speech Recognition Software. It has limited vocabulary of words

and phrases, and it may only identify these if they are spoken very clearly.

13. Security Measure. It is a measure taken as a precaution against theft or

espionage or sabotage etc.

14. Speech Recognizer/Speech Recognition. It is a computer software program or

hardware device with the ability to decode human voice.

15. Statistical Language Model. It is a file used by a speech recognition engine to

recognize speech. It contains a large list of words and their probabilities of

occurrence and is used in dictation applications.

16. Trigram. In a statistical language model, it is a sequence of three consecutive words or other units.

17. Unique Identifiers (UIDs). It is a numeric or alphanumeric string that is

associated with a single entity within a given system. This makes it possible to

address that entity, so that it can be accessed and interacted with.

CHAPTER 2

REVIEW OF RELATED LITERATURE AND STUDIES

The literature and studies cited in this chapter tackle the different concepts,

understandings, and ideas related to the topic of the effectivity of a speech recognizer as

a form of security measure. This chapter also contains generalizations or conclusions

and different developments related to the topic. The literature and studies included in

this chapter can help familiarize the reader with information and abstracts that are

relevant and similar to the present study.

Current State of Security in the Philippines

Caliwan (2019) stated that total crime volume is down and still declining,

thanks in large part to Philippine National Police’s intensified drive against crime and

lawlessness. The past year saw index crimes (such as robberies,

murders, homicide, physical injury, rape, theft, carnapping and cattle rustling) drop by

22.6%, from 7,421 in May 2018 to 5,744 in May 2019. Though the total number of index

crimes has dropped, the number of robberies and thefts has remained nearly constant, dropping by

only 0.4%. He stressed the importance of safety, stating that a drop in

total crime numbers does not mean people should be complacent. Filipinos

should always stay vigilant.

Bueza (2018) wrote that around 1.4 million families fell victim to common crimes

in the third quarter of 2018, according to a Social Weather Stations (SWS) survey

released on November 29, 2018. The SWS survey held from September 15 to 23

showed that 6.1% of Filipino families (around 1.4 million families) reported victimization

by any of the common crimes within the past 6 months alone (common crimes refer to

pickpocketing or robbery of personal property, break-ins, carnapping, and physical

violence). It also said that 5.6% of Filipino families have suffered from property crimes. It

is therefore strongly recommended that people have at least one security measure in

their homes and stores.

Importance of Security Systems

Kaysen (2017) stressed the importance of security systems in homes (and

stores) and their effectivity in reducing burglaries and robberies. The National Council for

Home Safety and Security stated that homes without alarms are three times more likely

to get burglarized. Burglaries, since the boom of the new generation, have dramatically

declined, with crime rates down to 28%. The article also discusses the positive and negative effects of

having security systems in the vicinity.

Rode (2019) addressed the importance of security systems for retail stores. He

stated that, as stores are big investments, it can be very upsetting and stressful when

the safety of a person’s investment is compromised by an outside intruder. Security

systems not only act as a form of safety measure; they can also be used to deter criminals

from even attempting to enter a person’s store. If a break-in does occur, having a

security system can provide police with invaluable information that can lead them to a

suspect.

Overviews on Speech Recognition

“Speech recognition is the ability of a machine or program to identify words and

phrases in spoken language and convert them to a machine-readable format.” (Rouse,

2007, para. 2). In this article, she states the meaning of speech recognition and how it

works. Speech recognition works using algorithms through acoustic and language

modeling. Acoustic modeling represents the relationship between linguistic units of speech and

audio signals. Language modeling matches sounds with word sequences to help

distinguish between words that sound similar.
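
How a language model helps choose between similar-sounding words can be shown with a toy example. The sketch below is an assumption-laden illustration, not the model used by any real recognizer: it counts word pairs in a three-sentence corpus invented for this example, applies add-one smoothing, and scores two candidate transcriptions that an acoustic model might confuse.

```python
from collections import Counter

# Toy training text; a real language model is estimated from far more data.
CORPUS = "call the police . call the store owner . the coal is black .".split()

bigrams = Counter(zip(CORPUS, CORPUS[1:]))
unigrams = Counter(CORPUS)
VOCAB_SIZE = len(unigrams)


def bigram_probability(prev: str, word: str) -> float:
    """P(word | prev) with add-one smoothing so unseen pairs keep a small probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE)


def sequence_score(words: list[str]) -> float:
    """Product of bigram probabilities over the word sequence."""
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= bigram_probability(prev, word)
    return score


candidates = ["call the police", "coal the police"]
best = max(candidates, key=lambda s: sequence_score(s.split()))
print(best)
```

Because "call the" appears in the toy corpus and "coal the" does not, the first candidate receives the higher score, which is the kind of evidence a language model contributes alongside the acoustic model.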

Velde (2019) explained how speech recognition works and its uses. Speech

recognition technology is not just about making things easier. It is also about safety.

Instead of texting while driving, people can now tell their car who to call or what

restaurant to navigate to. As beneficial as it may seem in an ideal scenario, it is

dangerous when implemented before it has high enough accuracy. Speech recognition

analyzes sounds by filtering what you say, digitizing it to a format it can “read,” and then
analyzing it for meaning. Then, based on algorithms and previous input, it can make a

highly accurate educated guess as to what the person is saying. It gets to know the

speaker’s use of language. Background noise can easily throw a speech recognition

device off track. This is because it does not inherently have the ability to distinguish the

ambient sounds it “hears” of a dog barking or a helicopter flying overhead, from a

person’s voice.

Velde (2019) also said that, as of May 2017, Google’s machine learning algorithms

had achieved a 95% word accuracy rate for the English language. That

rate also happens to be the threshold for human accuracy. She compared the growth of

speech recognition to a child learning his or her first words. Whereas humans have

refined the process, they are still figuring out the best practices for computers. Computers

have to be trained in the same way parents and teachers train children. That

training involves a lot of innovative thinking, manpower, and research.

Gong (1995) surveyed how speech recognition in noisy environments works.

Gong concluded that environmental noise significantly degrades the performance of

most current automatic speech recognition systems. This degradation comes mainly

from differences in the learning and use environments of a system. In recent years,

many studies have focused on reducing these differences, but the technology, even to

this day, still has a hard time distinguishing the voice from background noise.

Graff and Peacocke (1995) gave an introduction to speech and speaker

recognition. Speech recognition has already proven useful for certain applications, such

as telephone voice-response systems for selecting services or information, digit

recognition for cellular phones, and data entry while walking around a railway yard or

clambering over a jet engine during an inspection. Speaker recognition is related to

work on speech recognition. Instead of determining what was said, the focus is on

determining the speaker. Deciding whether or not a particular speaker produced the

utterance is called verification, and choosing a person's identity from a set of known

speakers is called identification.
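
The difference between verification and identification can be made concrete with a small sketch. Everything in it is hypothetical: the three-number "voice profiles" stand in for features that a real system would extract from audio, the names are invented, and the 0.95 threshold is an arbitrary choice for illustration.

```python
import math

# Hypothetical enrolled voice profiles; real systems derive such vectors from audio.
ENROLLED = {
    "owner": [0.9, 0.1, 0.3],
    "cashier": [0.2, 0.8, 0.5],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two feature vectors, 1.0 meaning the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def verify(claimed: str, sample: list[float], threshold: float = 0.95) -> bool:
    """Verification: does the sample match the one speaker it claims to be?"""
    return cosine_similarity(ENROLLED[claimed], sample) >= threshold


def identify(sample: list[float]) -> str:
    """Identification: which enrolled speaker is the sample closest to?"""
    return max(ENROLLED, key=lambda name: cosine_similarity(ENROLLED[name], sample))


sample = [0.85, 0.15, 0.35]
print(verify("owner", sample), verify("cashier", sample), identify(sample))
```

Verification answers a yes-or-no question about one claimed identity, while identification always returns the closest speaker from the known set.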

Application of Speech Recognizers used as a Security Measure

Reynolds (2002) described the deployment of speech technologies in

STARHome, a fully functional smart home prototype. The prototype

includes a security feature for calling people

the user knows. Reynolds also stated that using voice biometrics for security and home

automation involves several ergonomic constraints including a drastic limitation of

speech duration for recognition.

Foster (1996) wrote about his speech-activated security system. It covers

security devices and methods whereby a lock, or other security or access

device, may be actuated by speech input, but without disclosing the actual

code to those hearing the code words spoken during use of the security

device. The security device includes a microphone, a display for displaying a plurality of

code elements, and a processor for controlling the display and analyzing the

microphone signal to detect a proper sequence of code elements spoken by a user as

detected by the microphone and to operate the security device in response thereto. He

stated that using speech recognition can effectively halve the physical effort involved.

Effectivity of Speech Recognizers as Security Measures

De Leon, Hernaez, Pucher, Saratxaga and Yamagishi (2012) wrote about the

evaluation of speaker verification security and detection of HMM-based synthetic

speech. Through a hidden Markov model (HMM)-based text-to-speech (TTS)

synthesizer, which can synthesize speech for a target speaker using small amounts of

training data through model adaptation of an average voice or background model, they

tested speaker verification (SV) systems and found that over 81% of the matched claims made with synthetic speech were accepted. This result

suggests a vulnerability in SV systems and thus a need to accurately detect synthetic

speech.

Chow, He, Su, Yang and Zhang (2000) focused their paper on the architecture

design, implementation and optimization of distributed speech recognition systems.

They concluded that speech recognition is best when there is no background noise and

there are no physical obstructions.

Synthesis

Crime remains a serious issue throughout the

Philippines. Caliwan (2019) stated that robberies and thefts have remained nearly constant from

May 2018 to May 2019. Thus, there is a need for effective and affordable

security systems that are accessible to the people. Kaysen (2017) stated that homes

without alarms are three times more likely to get burglarized. Security systems are

essential in every home or store to protect the property from robberies, theft and other

property crimes. Having a security system can act not only as a safety measure to

protect against burglars and home intruders but also as a way to drive away these criminals.

Rouse (2007, para. 2) stated that “Speech recognition is the ability of a machine

or program to identify words and phrases in spoken language and convert them to a

machine-readable format.” Speech recognition technology is not just about making

things easier; it is also about safety. It has many uses and advantages, yet only a

few people have tried to use speech recognizers as security measures. One example of

a speech recognizer application is STARHome. Reynolds (2002) described the deployment of

speech technologies in STARHome, a fully functional smart home prototype. It

includes a security feature that

can call the person’s emergency contact. Foster also made a speech-actuated security

system back in 1996: a security device and methods whereby a lock, or other security

or access device, may be actuated by a speech input. It includes a microphone, a

display for displaying a plurality of code elements, and a processor for controlling the

display and analyzing the microphone signal to detect a proper sequence of code

elements spoken by a user as detected by the microphone and to operate the security

device.

De Leon, Hernaez, Pucher, Saratxaga and Yamagishi (2012) concluded in their
study on the evaluation of speaker verification security and the detection of HMM-based
synthetic speech that over 81% of the matched claims are accepted. This result
suggests a vulnerability in such systems. For this study, the researchers will be using
Google’s API Client Library for speech recognition. Velde (2019) stated that Google’s
machine learning algorithms have achieved 95% word accuracy. Gong (1995) concluded
that environmental noise greatly degrades the performance of the speech recognizer,
something the researchers aim to prevent. Chow, He, Su, Yang and Zhang (2000)
concluded that speech recognition is best when there is no background noise
and there are no physical obstructions.

In conclusion, the speech recognizer has proven itself as a useful tool and the future
of technology. With the Philippines’ current climate towards security, having an
affordable and seamless security system with speech recognition capabilities is a step
towards the future. There have been countless examples of people trying to use speech
recognition as a form of security measure, and with the advanced technologies currently
available, the researchers are confident that they can create an effective security
system.

CHAPTER 3

RESEARCH METHODOLOGY

Research Design

“Research design is defined as a framework of methods and techniques chosen

by a researcher to combine various components of research in a reasonably logical

manner so that the research problem is efficiently handled. It provides insights about

“how” to conduct research using a particular methodology,” (Bhat, 2019). Research

design is a model or layout used to answer the research questions. The researchers
used a true experimental research design, which is considered the most accurate type
of experimental research, to gather the data and other information needed for the
product. Bhat (2019) stated that experimental research is any research conducted with
a scientific approach, where one set of variables is kept constant while the other set
of variables is measured as the subject of the experiment. The researchers conducted
several trials in order to know if the speech recognizer is effective as a form of
security measure. The researchers chose the experimental research design to test the
security system’s ability to sort through background noise, properly identify the
keyword, and quickly notify the authorities.

Research Setting

The study was conducted in two rooms within a condominium at East
Residences Ortigas, a condominium complex in Pasig, Metro Manila. Each
room had ample space to provide a spacious working environment. One room was used
to test the effectivity of the speech recognizer in a natural setting, where the
usual hustle and bustle of daily routine was ongoing. The researchers also provided a
room in which no background noise was being emitted. This was done to properly record
the sensor’s capability without any physical obstructions. This place was chosen for
the experiment to simulate the noisy environments usually seen in local stores and to
test the sensor and software’s capability to distinguish between the background noise
and the speaker. This place was also chosen to examine the sensor’s capability to
properly and quickly identify the keyword.
Research Instruments

The results of the product were acquired through experimentation. The

researchers used a decibel meter to accurately detect the decibel level of the room. The

researchers also used a tape measure to see the distance between the microphone and

the speaker. The experiment was done to find the effectivity of the speech recognizer
as a form of security measure by measuring and identifying the relationship between
the background noise level, the distance of the speaker from the microphone, and the
ability of the sensor to properly pick up and identify the keyword. To further test
the ability of the speech recognizer, the researchers used a loudspeaker playing
noises based solely on regular routines.

Experimentation was chosen as a research instrument to properly identify the

effectivity of speech recognizer as a form of security measure using controlled variables

to procure the most accurate results. Data collected will then be processed through the

success rate percentage.
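
The success rate percentage mentioned above can be sketched as a simple ratio of recognized attempts to total attempts. The function name below is hypothetical and is not taken from the researchers' program; it only illustrates the arithmetic.

```python
def success_rate(recognized: int, attempts: int) -> float:
    """Return the percentage of attempts in which the keyword was recognized."""
    if attempts <= 0:
        raise ValueError("attempts must be a positive integer")
    if not 0 <= recognized <= attempts:
        raise ValueError("recognized must be between 0 and attempts")
    return 100.0 * recognized / attempts
```

For instance, 19 recognized attempts out of 20 would be computed as success_rate(19, 20), which is 95 percent.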

Materials

Table 1. The Raspberry Pi 3 Model B and the Microphone with its Corresponding
Quantity, Price and Appearance.

Materials                               Quantity   Price    Appearance

Raspberry Pi 3 Model B                  1          ₱2500

Home Studio USB Condenser Microphone    1          ₱928

F-165 Multimedia Speakers               1 Pair     ₱300

The materials provided above are the components that were used during the

making of the speech-recognizer security system. Moody (2011) described the

Raspberry Pi as a "potential BBC Micro 2.0", not by replacing PC compatible machines

but by supplementing them. The Raspberry Pi is a very versatile product mainly used

for robotics. Due to these reasons, the researchers chose to use the Raspberry Pi

instead of other alternatives.

The Home Studio USB Condenser Microphone has a rating of 4.5 stars from 81
reviews on Lazada. The researchers decided to use this microphone as it also came with
a noise filter, which can eliminate most of the researchers’ problems with background
noise.

The F-165 Multimedia Speakers are cheap wired speakers that provide sound loud
and clear enough to alert nearby neighbors and to scare intruders away.
Equipment

Table 2. All the Equipment Used with its Corresponding Quantity, Price and

Appearance.

Equipment Quantity Price Appearance

OTG Cable 1 piece provided

HDMI Cable  1 piece provided

Monitor  1 piece provided

Keyboard 1 piece provided

Mouse 1 piece provided

The equipment listed was essential and was mainly used to communicate with the
Raspberry Pi more efficiently. An OTG cable (a wire that enables a connection between
micro-USB and regular USB) was used to connect normal USB devices to the
Raspberry Pi, since the Raspberry Pi only had micro-USB ports. A monitor was used to
see the output of the device. The HDMI cable was used to connect the monitor to the
Raspberry Pi. A keyboard was used to type all the necessary code, and a mouse was
used to interact with the Raspberry Pi through its user interface.

Procedures

1. Connect the Raspberry Pi to a monitor, a mouse, and a keyboard.

2. Choose an operating system for the Raspberry Pi. The researchers chose to install

Linux, an open-source operating system modelled on UNIX, as the operating system

for the security system.

3. Install Python, including all the necessary libraries. Download the installer from the
official Python website. Run the installer, then choose the path you want Python to be
installed in. Choose custom and tick all of the boxes.

4. Install PIP, the package installer for Python, which can be used to install packages
from the Python Package Index and other indexes. To install PIP, open a terminal and
execute curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py to download get-pip.py.
Afterwards, execute python get-pip.py.

5. PyAudio, an extension of Python and a requirement for Python to be able to use the
microphone, needs to be installed. To install PyAudio, use APT and execute
sudo apt-get install python-pyaudio python3-pyaudio in a terminal.

6. Install Google API Client Library for Python, the main library the software
will use to properly identify the word by constantly listening to the user and
comparing the audio to thousands of files until it finds a match. To install Google
API Client Library for Python, open a terminal and execute pip install
--upgrade google-api-python-client.

7. Connect the USB Microphone to the port using an OTG cable. Make sure the
Internet is connected for the Google API Client Library to work.

8. Open IDLE, the default source code editor that comes pre-installed with Python, and

write this code:

LINK FOR CODE:

https://github.com/malnourishita/speech/blob/master/GUI%20TEST.py

Make sure there are no errors in the code. Test and run the code many times. Run the

program and leave it.

9. Activate the security system by saying the placed keyword.
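
The program referred to in step 8 is only linked, not reproduced, in this paper. As a rough illustration of the idea, and not the researchers' actual code, the sketch below assumes the third-party SpeechRecognition package (imported as speech_recognition) together with PyAudio, which is a different library from the Google API Client Library named in step 6. The function names are hypothetical.

```python
def contains_keyword(transcript: str, keyword: str) -> bool:
    """Return True if the keyword (a word or phrase) appears in the transcript."""
    spoken = transcript.lower().split()
    wanted = keyword.lower().split()
    if not wanted:
        return False
    # Slide a window over the spoken words so multi-word phrases also match.
    return any(
        spoken[i:i + len(wanted)] == wanted
        for i in range(len(spoken) - len(wanted) + 1)
    )


def listen_for_keyword(keyword: str = "help") -> bool:
    """Listen once on the default microphone; True if the keyword was heard.

    Assumption: the SpeechRecognition package and PyAudio are installed and
    an Internet connection is available for Google's web speech service.
    """
    import speech_recognition as sr  # imported here so the helper above has no dependencies

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate against room noise
        audio = recognizer.listen(source, phrase_time_limit=5)
    try:
        transcript = recognizer.recognize_google(audio)
    except sr.UnknownValueError:  # speech was unintelligible
        return False
    except sr.RequestError:  # service unreachable, for example no Internet
        return False
    return contains_keyword(transcript, keyword)
```

A program built this way could call listen_for_keyword() repeatedly and trigger the text and call once it returns True. Matching whole words rather than substrings keeps a word such as "helpful" from activating a "help" keyword.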

Statistical Treatment

To analyze the data gathered, the researchers used analysis of variance (ANOVA).

Analysis of Variance

Analysis of variance compares two or more sets of data to determine whether the
differences between them are larger than would be expected by chance. This allows the
researchers to draw comparisons and predictions from the sets of data gathered.

Steps In ANOVA

First procedure is to determine the optimal distance between the sensor and the

user.

Second procedure is to test how many times the speech recognizer will accept
the keyword out of a hundred attempts.

Third procedure is to assess the elapsed time between the activation of the

speech recognition and the text/call.
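
For readers unfamiliar with the test, a one-way ANOVA compares the variation between group means with the variation within each group, giving an F statistic. The sketch below is a generic textbook computation in plain Python, not the researchers' own calculation; the function name and any sample groups passed to it are illustrative only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of numeric groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

A larger F means the group means differ more than the scatter within the groups would suggest; the value is then compared against a critical value from the F distribution for k - 1 and N - k degrees of freedom.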

CHAPTER 4

RESULTS AND DISCUSSION

This chapter presents and discusses the results, analysis and interpretation of the
data gathered by the researchers. This study aims to determine the effectivity of a
speech recognizer as a form of security measure. The researchers applied
an experimental study to properly procure the data presented. Experiments were done
to answer the questions posed in the statement of the problem. The analytical
procedures are arranged according to the sequence of the specific questions.

The experiments are mainly focused on the feasibility and accuracy of the speech
recognition and the user’s satisfaction towards the product. No experiment was done
to measure the effectivity of the security measure after the text has been received by
the user, as that depends on variables that are too broad and too hard to control.
The keyword used in the program is “help” unless stated otherwise.

Through the experiments, the researchers wish to answer the following questions

from the statement of the problem:

1. What will be the optimal distance between the user and the speech recognizer

for the software and microphone to properly identify the voice?

2. How accurate will the sensor and speech recognizing software be in recognizing

and identifying when the user is trying to activate the security measure?

3. How long is the elapsed time between the activation of the speech recognition

software and the call and text made?

[Figure 1.1: chart of the Number of Times Recognized (0 to 20) against Distance (in meters), from 0.5 to 8 meters in steps of 0.5 meters.]

Figuring Out the Optimal Distance

Figure 1.1 shows how many times the speech recognizer successfully recognized the

word in specific distances.

Figure 1.1 shows the effectivity of the speech recognizer at specific distances. At
each distance, the researchers made sure the speaker kept a constant level of 70 dB to
guarantee an accurate finding. The speaker was asked to say the default keyword, help,
20 times at each distance. A measuring tape was used to measure the distance between
the user and the microphone. A decibel meter app was used to detect the decibel level
of the user (the decibel meter was placed near the speaker, not the speech
recognizer). The researchers included only the results that were around 67-73 dB;
those that did not meet the criteria were not included. 70 dB was chosen as the
standard in all of the researchers’ tests as it is the approximate decibel level
produced by a human when shouting or talking loudly (Sramkova, 2015). Interestingly
enough, the graph is not a smooth, gradual decline. There are dips at certain
distances. These dips can be attributed to the program’s difficulty in picking up
sounds at long distances or to a software glitch. 0.5 m and 1 m are the most optimal
distances, with 19 successful activations out of 20. 1.5 to 3.5 meters follow next,
with 17 to 18 activations each, except for a dip at 2.5 m, which scored only 16 out of
20, while 3 m had 17 activations out of 20. The researchers decided to stop at 8 m as
they felt further distances would no longer be effective and would not help in the
study. 0.5 m to 3.5 m is the optimal distance range in which the user should stand in
order to properly activate the security measure. Based on this, the researchers
conclude that 2 m is the optimal distance, as it was effective in activating the
speech recognizer without sacrificing space and range of motion for the user.

[Figure 1.2: bar chart of the number of times recognized (0 to 60) at 1 to 8 meters. Series: Set A (constant: 60 dB) and Set B (60 dB, +5 dB for every meter).]
Figure 1.2 shows the relationship between decibels produced, the distance between the

user and the speech recognizer and its effectivity in successfully identifying whenever

the user is trying to activate the speech recognizer.

The experiment was done with a background noise of 55-65 dB. Two sets of
experiments were done, and for both sets the researchers attempted to activate the
speech recognizer at 1-8 m. In Set A, the speaker’s decibel level was held constant at
60 dB across the different distances. In Set B, the speaker started at 60 dB and
increased her decibel level by 5 dB for every meter. Each set had 400 attempts
(divided among the distances). Figure 1.2 shows that Set A sees a gradual decline from
1 meter to 6 meters and a massive drop at 7 and 8 meters. Set B, as seen in Figure
1.2, stays nearly flat in its effectivity, with minor dips in between and an
improvement in performance at 7 meters (at a 90 dB level). These minor dips fall
within the margin of error. With the results displayed in Figure 1.2, the researchers
concluded that if the decibels produced stay the same and the distance becomes
farther, the effectivity level starts declining. If the decibels produced increase
alongside the distance, however, then the effectivity

stays relatively the same.
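
The Set B procedure can be written as a small rule. The sketch below assumes that the speaker starts at 60 dB at 1 meter and adds 5 dB for each further meter, which is consistent with the 90 dB level reported at 7 meters; the function name and parameter names are hypothetical.

```python
def set_b_level_db(distance_m: int, base_db: int = 60, step_db: int = 5) -> int:
    """Speaker level in dB for Set B at a whole-meter distance from 1 to 8."""
    if not 1 <= distance_m <= 8:
        raise ValueError("Set B was only tested from 1 to 8 meters")
    # The base level applies at 1 meter; each further meter adds one step.
    return base_db + step_db * (distance_m - 1)
```

Under this assumption, set_b_level_db(7) gives the 90 dB level mentioned above.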

Testing the Product’s Effectivity and Accuracy

[Figure 2.1: pie chart, Accuracy of the Speech Recognizer. Recognized: 94%. Not recognized: 6%.]

Figure 2.1 shows the accuracy of the speech recognizer in recognizing when the user is

trying to activate it.

To find out the accuracy of the speech recognizer, the researchers had the
speaker stand at the optimal distance of 2 meters and speak at 70 dB. There was no
physical obstruction between the user and the microphone. The background noise level
was around 55-65 dB. All variables were set at their optimal settings. Figure 2.1
shows that out of 100 attempts to activate the security measure, the speech recognizer
recognized 94, making the speech recognizer’s accuracy 94%. Any attempt that was
detected after 30 seconds was considered as not recognized, which was the case for 2
attempts. This shows that the speech recognizer, in its most optimal setting, is very
effective.

Accuracy on Different Background Noise Levels

Background Noise Level    Number of Times Recognized

60 dB                     48

70 dB                     46

80 dB                     43

90 dB                     37

100 dB                    25

110 dB                    16

Decibel Levels and its Real Life Equivalent

Real Life Equivalent         Decibel Level

Normal Conversation Noise    60

Washing Machine, TV          70

Blender                      80

Subway Train                 90

Factory Machinery            100

Rock Concert                 110

Figure 2.2 shows the accuracy of the speech recognizer in recognizing when the user is

trying to activate it on certain background noise levels.

Figure 2.3 shows the decibel levels and its real life equivalent.
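
The counts in Figure 2.2 can be converted to recognition rates, given the 50 attempts made at each background noise level. The snippet below is only an arithmetic sketch using the counts shown in the figure; the variable names are illustrative.

```python
ATTEMPTS_PER_LEVEL = 50

# Background noise level (dB) -> number of times the keyword was recognized (Figure 2.2).
recognized_by_noise_db = {60: 48, 70: 46, 80: 43, 90: 37, 100: 25, 110: 16}

# Convert each count to a percentage of the attempts made at that level.
recognition_rate = {
    level: 100 * count / ATTEMPTS_PER_LEVEL
    for level, count in recognized_by_noise_db.items()
}

for level, rate in recognition_rate.items():
    print(f"{level} dB: {rate:.0f}% recognized")
```

On these counts the rate stays at 86 percent or higher through 80 dB and falls to 32 percent at 110 dB.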

The speaker stood at the optimal distance of 2 meters away from the
microphone and spoke at 70 dB. The researchers played TV static white
noise at different volumes to maintain consistency throughout the test. 50 attempts
were given to each background noise level. Figure 2.2 shows that the speech
recognizer has no problem recognizing the keyword at around 60, 70 and 80 dB of
background noise. According to Figure 2.3, 60 dB is the approximate decibel level
of normal conversation noise, 70 dB is the approximate decibel level of a washing
machine and 80 dB is the approximate decibel level of a blender. This shows that the
speech recognizer can withstand noise ranging from normal conversation to a blender.
The researchers noticed the speech recognizer having some difficulties at around 90 dB
(the approximate decibel level of a subway train). At 100 dB, there was a significant
drop in the number of times the speech recognizer recognized when the user was trying
to activate it. The researchers conclude that the optimal background noise level for
the speech recognizer should be around 60-80 dB. For reference, Figure 2.3 shows the
various decibel levels and their real-life examples.

Number of Times Recognized, Different Keywords

Keyword                              Number of Times Recognized

Help                                 20

Spanghew                             17

Mytacism                             16

Whiffle                              19

Gyascutus                            17

Levament                             18

Supercalifragilisticexpialidocious   19

Onomatopoeia                         17

Poltophagy                           18

Axinomacy                            17
Figure 2.4 shows the effectivity of speech recognizer on different types of keywords.

The speaker stood at the optimal distance of 2 meters and spoke at 70 dB. The
background noise level was around 55-65 dB. Figure 2.4 shows the effectiveness of the
speech recognizer with different types of keywords. The researchers conducted this
experiment to show the range of the speech recognizer. To test its ability, the
researchers used unconventional words that are not used in everyday conversation. The
researchers chose the top 8 of the most obscure words, according to Merriam-Webster,
and the words supercalifragilisticexpialidocious and onomatopoeia, as these two words
are infamous for being hard to pronounce. Such words are good candidates for a
keyword because they are rarely said in everyday conversation. The researchers also
included the word help as a baseline to compare with the other keywords. The
researchers made sure the speaker pronounced each word properly to maintain accuracy.
Figure 2.4 shows that the speech recognizer was successful in recognizing the inputted
keywords. The word the speech recognizer had the hardest time detecting was Mytacism,
but that can be attributed to the word’s hard pronunciation and its obscurity. A
problem the researchers also noticed was mispronunciation and lack of clarity in the
voice of the user, as the speaker frequently mispronounced the words (though these
attempts were not counted in the final result). The researchers conclude that the
speech recognizer can detect obscure words with ease, as long as the word is said
properly.

Number of Times Recognized, Different Keywords

Phrase                           Number of Times Recognized

Robber Robber Get Him Robber     44

Help Me Somebody Please          48

I Think There is a Trespasser    43

Call the Cops Now                50

There is Someone in My House     44

My Favorite Day is Friday        49
Figure 2.5 shows the effectivity of speech recognizer on different types of phrases and

sentences.

Effectivity If Covered By Various Objects

Object         Number of Times Recognized

Windbreaker    17

Pillow         16

Paper Bag      19

Plastic Bag    20

The speaker stood at the optimal distance of 2 meters and spoke at 70 dB. The
background noise level was around 55-65 dB. Figure 2.5 shows how effective the speech
recognizer is at detecting different types of phrases and sentences. The researchers
used phrases that are related to robberies and trespassing. The results for phrases
and sentences were better than expected: Figure 2.5 shows that the phrases actually
fared better than the words used in Figure 2.4. The researchers noticed a correlation
between the number of syllables in a sentence and its effectivity, as the phrases with
fewer syllables got higher results whilst the phrases with the most syllables got the
worst results (both “I think there is a trespasser” and “Robber robber get him robber”
have 8 syllables, the largest number of syllables among the phrases). The researchers
concluded that any word can be used, as long as it is pronounced properly. The
researchers also concluded through this experiment that a phrase is more effective and
carries less risk of activating the security system by mistake.
Figure 2.6 shows the effectivity of speech recognizer if obscured by various thin objects.

As the user might want to hide the mic, the researchers tested whether covering it
would hinder the effectivity of the speech recognizer. The speaker stood at the
optimal distance of 2 meters and spoke at 70 dB. The background noise level was around
55-65 dB. The researchers tested a windbreaker, a pillow, a paper bag and a plastic
bag, as these are relatively thin items that can be used as a cover for the
microphone. Figure 2.6 shows that thin objects like plastic and paper bags do not
hinder the speech recognizer’s effectivity, whilst the speech recognizer does struggle
a bit with objects made of thicker material. The researchers suggest covering the mic
with plastic bags or paper bags, as these were the most effective. They also suggest
that the user avoid covering the mic with pillows, jackets or anything made of thicker
material.

TRIAL NO.    ACTIVATED

1            NO
2            NO (37.56 sec)
3            NO (39.08 sec)
4            NO
5            NO (45.91 sec)
6            NO
7            NO
8            NO (34.81 sec)
9            NO
10           NO
11           YES
12           NO
13           NO
14           NO
15           NO
16           NO
17           NO
18           NO
19           NO (43.01 sec)
20           NO

Figure 2.7 shows the effectivity of speech recognizer if obscured by a wall.

Physical obstructions are among the worst enemies of a speech recognizer, as they
greatly decrease its effectivity. Any building has one physical obstruction that
cannot be removed because it is part of the building’s structure: its walls. Figure
2.7 shows that the speech recognizer is not effective if obscured by a wall. The
speaker stood at the optimal distance of 2 meters and spoke at 70 dB. The background
noise level was around 55-65 dB. Out of the 20 times the speech recognizer was tested,
only trial no. 11 succeeded. A trial with a number beside it means the speech
recognizer did detect that the user was trying to activate the security measure,
though only after 30 seconds. The number shows how long it took for the attempt to be
recognized by the speech recognizer. After 1 minute, the researchers moved on to
another trial. The researchers conclude that walls

will hinder and make the speech recognizer not effective.
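
The 30-second rule applied in Figure 2.7 can be sketched as follows. The dictionary is an illustrative encoding of the late detection times listed in the table, and the function name is hypothetical. Trial 11 has no time listed, so it is simply counted as the one success.

```python
CUTOFF_S = 30.0  # detections slower than this count as not recognized


def counts_as_recognized(detection_time_s, cutoff_s=CUTOFF_S):
    """Apply the cutoff; None means the keyword was never detected."""
    return detection_time_s is not None and detection_time_s <= cutoff_s


# Late detection times (seconds) listed in Figure 2.7, keyed by trial number.
late_detections = {2: 37.56, 3: 39.08, 5: 45.91, 8: 34.81, 19: 43.01}

# Check whether any late detection would still fall within the cutoff.
late_but_counted = [t for t, s in late_detections.items() if counts_as_recognized(s)]

successes = 1 + len(late_but_counted)  # trial 11 plus any late detection within the cutoff
print(f"{successes} of 20 trials recognized ({100 * successes / 20:.0f}%)")
```

Since every listed detection time exceeds 30 seconds, none of the late detections count, which leaves trial 11 as the only successful activation.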

TRIAL NUMBER ELAPSED TIME BETWEEN CALL AND ACTIVATION

1 16.98 seconds

2 15.46 seconds

3 10.32 seconds

4 15.34 seconds

5 13.78 seconds

6 10.53 seconds

7 14.46 seconds

8 10.16 seconds

9 8.6 seconds

10 9.09 seconds

11 10.23 seconds

12 12.04 seconds

13 9.93 seconds

14 26.20 seconds

15 11.07 seconds

Finding the Elapsed Time between Calls and Texts

Figure 3.1 shows how long the elapsed Time is between the activation of the speech

recognition software and the call made.

Figure 3.1 shows the inconsistency in how long the security measure takes to
call the inputted number. The speaker stood at the optimal distance of 2 meters
and spoke at 70 dB. The background noise level was around 55-65 dB. Out
of the 15 trials, the results show that the fastest time was during the 9th trial, at
8.60 seconds. However, there was an irregularity, as the time almost tripled during
the 14th trial. These errors can be attributed to software glitches, which make the
speech recognizer inconsistent in activating the security measure.
The average elapsed time between the activation of the speech
recognition and the call is 12.946 seconds.
TRIAL NUMBER ELAPSED TIME BETWEEN TEXT AND ACTIVATION

1 10.23 seconds

2 7.31 seconds

3 7.25 seconds

4 8.64 seconds

5 10.01 seconds

6 6.23 seconds

7 7.12 seconds

8 6.45 seconds

9 8.13 seconds

10 8.69 seconds

11 9.23 seconds

12 7.46 seconds

13 8.12 seconds

14 7.04 seconds

15 9.62 seconds

Figure 3.2 shows how long the elapsed Time is between the activation of the speech

recognition software and the text made.

Figure 3.2 shows that the security measure, compared to Figure 3.1, is far more
consistent in sending a text than in initiating a call. The speaker stood at the
optimal distance of 2 meters and spoke at 70 dB. The background noise
level was around 55-65 dB. Out of 15 trials, the results show that the fastest text
was sent during the 6th trial, at 6.23 seconds. The longest was during the 1st
trial, at 10.23 seconds, which is modest compared to the tripled time in Figure 3.1.
The average elapsed time between the activation of the speech recognition and the
text is

7.063 seconds.
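
The averages can be recomputed from the trial times listed in Figures 3.1 and 3.2. The sketch below uses Python's statistics module; the variable names are illustrative. With the values as listed, the call times average about 12.946 seconds, matching the reported figure, while the text times average about 8.10 seconds rather than 7.063 seconds, so either a listed text time or the reported average may contain a transcription error.

```python
from statistics import mean, stdev

# Elapsed times in seconds, trials 1 to 15, as listed in Figures 3.1 and 3.2.
call_times_s = [16.98, 15.46, 10.32, 15.34, 13.78, 10.53, 14.46, 10.16,
                8.60, 9.09, 10.23, 12.04, 9.93, 26.20, 11.07]
text_times_s = [10.23, 7.31, 7.25, 8.64, 10.01, 6.23, 7.12, 6.45,
                8.13, 8.69, 9.23, 7.46, 8.12, 7.04, 9.62]

for label, times in (("call", call_times_s), ("text", text_times_s)):
    print(f"{label}: mean {mean(times):.3f} s, "
          f"standard deviation {stdev(times):.2f} s, "
          f"range {min(times):.2f}-{max(times):.2f} s")
```

The standard deviation gives a simple measure of the consistency discussed above: a smaller value means the times vary less from trial to trial.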

Discussion

As established by Figure 1.1, the optimal distance between the user and the speech
recognizer is 0.5 to 3.5 meters, with the decibel level of the user being 70 dB. The
researchers chose 2 meters as the optimal distance between the speech
recognizer and the user, as it provides effectivity while still providing space and
range of motion for the user. According to Figure 1.2, as the user’s distance
increases, the effectivity of the product decreases. To keep the effectivity as the
user’s distance increases, the user’s decibel level should increase along with the
distance, as supported by Figure 1.2.

The researchers measured the effectivity of the speech recognizer and found
that the speech recognizer had a 94% chance of working in its most optimal setting.
Background noise, when too loud, can hinder the performance of the speech
recognizer, as shown in Figure 2.2. 60 to 70 dB is the ideal background noise level.
The researchers suggest that the user avoid background noise levels of 90 to 110
dB, as they significantly reduce the effectivity of the product.

The researchers recommend that the user choose a phrase or sentence as the
keyword, as it proved to be more effective than using an obscure word. According to
Figure 2.4 and Figure 2.5, simple phrases were detected more often than the
complicated words. However, the researchers concluded that any word can be used
and that the word itself will not hinder the performance of the speech recognizer. The
pronunciation and clarity of the user, on the other hand, is an important factor and
can reduce the effectivity of the product. The researchers suggest using a word that
the user can properly pronounce and easily remember but that is obscure enough that it
will not be said in regular conversation. The effectivity of the product can also be
reduced if the microphone is obscured by an object. Thin objects like paper and
plastic will not affect the product’s effectivity, but thicker objects can slightly
hinder the performance of the speech recognizer. In Figure 2.7, the researchers tried
to activate the speech recognizer through a wall but to no avail. The researchers
conclude that the speech recognizer cannot be activated from a different room.

The security measure’s elapsed time between the activation and the initiation of
a call is very inconsistent, with an average elapsed time of 12.946 seconds,
according to Figure 3.1. The security measure’s elapsed time between the activation
and the sending of a text, however, was smooth, consistent and a lot faster than
initiating a call. The average elapsed time is 7.063 seconds, according to Figure
3.2.

CHAPTER 5

SUMMARY, CONCLUSIONS AND RECOMMENDATIONS

This chapter presents the summary of the findings, conclusions and the

corresponding recommendations.

Summary

Though crime rates have dropped for non-index crimes, robberies and theft have
remained constant and are even increasing. Having a security measure in every home
and in public areas like malls, restaurants, and local stores can make people feel
safer. Security measures play a big role in the lives of people because they can save
people from danger. Local stores in the Philippines lack security measures because
(a) these are too expensive or (b) the owners do not think of securing their stores
until it is too late.

The researchers wanted to provide an affordable and reliable security system to
the masses, while also being forward-thinking and advanced. That is why the
researchers decided to combine speech recognition and security to produce a
security measure that is hands-free, reliable, easy to use, and affordable. The
researchers aim to have an affordable and viable product that can show
Filipinos a glimpse of the future of safety. Speech recognition is the easiest form of
security measure. By just saying the keyword, the user can have the speech recognizer
immediately activate its safety precautions and call for emergency help.

The researchers created the security system using the Raspberry Pi, a popular

micro-computer used by programmers everywhere and a simple USB microphone. The

researchers programmed it and fine-tailored it until it became effective enough for the

researchers’ standards. Once the speech recognizer is activated, the security measure

will then send a text and initiate a call to the inputted numbers. The call will play a pre-

recorded message that is customizable. The GUI allows the user to change the text, the

numbers it will call and the keyword used to activate the security system.

Once the product was done, the speech recognizer went through a myriad of
tests in order for the researchers to examine how effective the speech recognition is.
The security measure was then tested by getting the average gap between the
activation and the call and text. The product turned out to be very effective and
successful, which shows that speech recognition is effective and feasible as a
security measure.

Conclusion

1. What will be the optimal distance between the user and the speech recognizer for

the software and microphone to properly identify the voice?

- The researchers conducted a series of tests and concluded that the optimal
distance range between the user and the speech recognizer is around 0.5 meters
to 3.5 meters and that the optimal distance is 2 meters (considering all other
variables are at their optimal settings). The researchers also concluded that the
decibel level of the user should increase along with the distance between the user
and the speech recognizer in order to keep the effectivity of the
speech recognizer.

2. How accurate will the sensor and speech recognizing software be in recognizing and

identifying when the user is trying to activate the security measure?

- The researchers tested the speech recognizer 100 times, trying to activate the
security system, and they got a result of 94 out of 100. The researchers
concluded that the speech recognizer has an accuracy of 94%. Background
noise of around 60-80 dB is the optimal setting to achieve maximum effectivity for
the product. The researchers also conducted a series of tests on the
effectivity of the speech recognizer in recognizing both obscure words and
phrases, and they concluded that using phrases is not only more effective than
using words but can also prevent accidental activations of the security system.
The researchers also tested the speech recognizer to see if it will activate if
covered by thin materials or objects and by a wall. They concluded that thin materials
will not affect the effectivity of the speech recognizer, while thicker objects,
including walls, can hinder the effectivity of the product.

3. How long is the elapsed time between the activation of the speech recognition

software and the call/text made?

- On average, the call was made 12.946 seconds after the speech recognition software was activated, while the text was sent 7.063 seconds after activation.
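The accuracy figure in item 2 is the number of successful activations divided by the number of trials (94 / 100 = 94%). Because 100 trials is a limited sample, it can also help to report a range around that figure. The sketch below uses the Wilson score interval, which is a standard statistical technique and not something the researchers computed; the function names are illustrative, and z = 1.96 assumes a 95% confidence level.

```python
import math
from typing import Tuple


def accuracy(successes: int, trials: int) -> float:
    """Return the fraction of trials that succeeded."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    return successes / trials


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for the true success rate."""
    p = accuracy(successes, trials)
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2)) / denom
    return center - half, center + half
```

Calling wilson_interval(94, 100) gives a range of about 88% to 97%, so the measured 94% is consistent with a true accuracy anywhere in that band. More trials would narrow the range.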

After conducting numerous tests and extensive research, the researchers concluded that speech recognition is an effective, feasible, and affordable security system.

Recommendation

There are aspects of the study that the researchers recommend future researchers improve on. For example, a better microphone and better equipment are recommended if future researchers want better results. A more advanced Raspberry Pi model is also recommended to run the program more efficiently, though it is not required.

To further enhance the product, a program that can execute the commands more quickly and more efficiently, with fewer bugs, is recommended. Using a custom speech library instead of Google's involves a trade-off: Google's libraries are well optimized, but a custom library can remove the product's dependence on Wi-Fi.

Better testing conditions and more trials can further flesh out the results of the product. Aside from contributing to the success of the research, focusing on these areas can help future researchers who wish to contribute to this investigatory project.

Bibliography

1. A brief history of speech recognition. (n.d.). Retrieved from

https://sonix.ai/history-of-speech-recognition

2. Rouse, M. (2016, September 21). What is physical security? - Definition from WhatIs.com. Retrieved from

https://searchsecurity.techtarget.com/definition/physical-security

3. Rouse, M. (2016, December 6). What is speech recognition? - Definition from

WhatIs.com. Retrieved from

https://searchcustomerexperience.techtarget.com/definition/speech-recognition

4. Real Python. (2020, January 23). The Ultimate Guide To Speech Recognition With

Python. Retrieved from https://realpython.com/python-speech-recognition/

5. Kaysen, R. (2017, December 22). Do Security Systems Make Your Home Safer? Retrieved from

https://www.nytimes.com/2017/12/22/realestate/do-security-systems-make-your-home-safer.html?rref=collection%2Ftimestopic%2FHome%20Security&action=click&contentCollection=timestopics&region=stream&module=stream_unit&version=latest&contentPlacement=5&pgtype=collection

6. Importance of Home Security System. (n.d.). Retrieved from

http://www.netfreedom.org/the-importance-of-home-security-system.asp

7. The Importance Of Security Alarm Systems For Your Retail Store – Security Alarms Miami - Articles - Advanced Fire & Security - Advanced Fire Sprinklers. (n.d.). Retrieved from

http://www.advfireonline.com/advanced-fire-and-security-articles-the-importance-of-security-alarm-systems-for-your-retail-store.html

8. The National Security situation in 2018, and outlook for 2019. (n.d.). Retrieved from

https://www.google.com/amp/s/pia.gov.ph/news/articles/1016616.amp

9. Caliwan, C. L. (2019, June 16). Total crime volume down in May 2019: PNP.

Retrieved from https://www.pna.gov.ph/articles/1072470

10. Rouse, M. (2016, December 6). What is speech recognition? - Definition from

WhatIs.com. Retrieved from

https://searchcustomerexperience.techtarget.com/definition/speech-recognition

11. Krishnan, S. (2018, October 12). Create your own Voice based application using Python. Retrieved from

https://medium.com/@sundarstyles89/create-your-own-google-assistant-voice-based-assistant-using-python-94b577d724f9

12. Googleapis. (2020, March 24). googleapis/google-api-python-client. Retrieved from

https://github.com/googleapis/google-api-python-client

13. Googleapis. (2020, March 26). googleapis/google-cloud-python. Retrieved from

https://github.com/googleapis/google-cloud-python

14. Speech Recognition. (n.d.). Retrieved from

https://pypi.org/project/SpeechRecognition/

15. Making Calls. (n.d.). Retrieved from

https://www.twilio.com/docs/voice/make-calls?fbclid=IwAR0mxVZzHMat3JBaBc8PJGe_0DwDLdc6IZGDOpvubp_15CH5lEJKufrHFfs

16. Prell, C. G. L., & Clavier, O. H. (2016, October 12). Effects of noise on speech

recognition: Challenges for communication by service members. Retrieved from

https://www.sciencedirect.com/science/article/pii/S0378595516303513

17. EarQ. (n.d.). Retrieved from https://www.earq.com/hearing-health/decibels

TALOSIG, Nathan E.

Lot 31, Blk. 2, Dahlia Street, Phase 7-B, Greenwoods Exec.

Village, Cainta, Rizal

09437057481

nathantalosig7@gmail.com

EDUCATIONAL BACKGROUND

High School

Sacred Heart Academy of Pasig (2016 – 2020)

Grade School

Saint Gabriel International School

Angelicum College

Pre School

Saint Vincent Preschool

Saint Gabriel International School

ACHIEVEMENTS

Excellence in Conduct (Grade 2-3)

Excellence in Academics (Grade 2)

Top 3 in Reading (Grade 6)

MTAP Participant (Grade 3)

MTAP Participant (Grade 7-9)

Top 3 in Bookmark Making Contest (Grade 2)

Best Boy Scout (Grade 4)

Green Merit Card Receiver (Grade 7-1st Quarter, 3rd Quarter, 4th Quarter)

White Merit Card Receiver (Grade 7-2nd Quarter)

White Merit Card Receiver (Grade 8, Grade 9)

Green Merit Card Receiver (Grade 10- 1st Quarter, 2nd Quarter, 3rd Quarter)

Poetry Slam Contest: 3rd Place (Grade 7)

Mr. & Ms. UN Participant (Grade 7)

Mr. & Ms. UN First Runner Up (Grade 9)

SHAP Pautakan Participant (Grade 8)

SHAP Pautakan Participant (Grade 9)

Literary Cosplay Junior Winner (Grade 9)

Literary Cosplay 3rd Placer All in All (Grade 9)

Perfect Attendance (Grade 9-1st Quarter, 3rd Quarter)

Ultimate Sci-Math Quiz Bee Participant (Grade 8)

Ultimate Sci-Math Quiz Bee Participant (Grade 9)

English Quiz Bee 3rd Placer (Grade 10)

Social Studies Quiz Bee Participant (Grade 10)

2nd Honorable Mention (Grade 6)

Second Honors (Grade 5)

Pep Squad Varsity (Grade 9)

Cheerdance Competition (Grade 8-9)

Volleyball Varsity (Grade 9)

Dance Troupe (Grade 6)

Choir (Grade 5-6)

INTERESTS

Rapping

Making YouTube Videos

Being Beautiful and Smart

______________________________________________________________________

CHARACTER REFERENCE

Name: Kendra Caramat

Occupation: English Teacher

Name: Raldin Gem Frias

Occupation: Science Teacher

Name: Yvonne Cagalingan

Occupation: Filipino Teacher

Name: Katreng Solas Aporo

Occupation: Social Studies Teacher / Best Friend

“I hereby certify that the information above is true and correct.”

REYES, Louise Erlle P.

Zuri Residences, Tokyo Avenue, Block 6 Lot 8, Cabrera Road,

Barangay Dolores, Taytay Rizal

09063037954

izzyreyes1222@icloud.com / louiseekim03@gmail.com

EDUCATIONAL BACKGROUND

High School

Sacred Heart Academy of Pasig (2014 – 2020)

Grade School

Calvary Christian School

Pre School

Mona Lisa Academy

ACHIEVEMENTS

With Honors (Nursery – Grade 4)

Most Improved (Grade 3 – 4)

Dance Class (New Generation Workshop) (2010)

Violin and Piano Lesson (2012 - 2014)

Contestant on The Voice Academy (2015)

With Honors (2017 – 2020)

Dance Class (ACTS Academy) (2018)

Cheer Dance Competition (2017 – 2019)

Perfect Attendance Awardee (2015 – 2019)

INTERESTS

Dancing

Singing

Modeling

______________________________________________________________________

CHARACTER REFERENCE

Name: Kendra Caramat

Occupation: English Teacher

Name: Raldin Gem Frias

Occupation: Science Teacher

“I hereby certify that the information above is true and correct.”

MIRAFLOR, Robbin Cross F.

Ciudad del Carmen b1 l1 Rosario Pasig City

09202672074

robbinmiraflor@yahoo.com/crossmiraflor16@gmail.com

EDUCATIONAL BACKGROUND

High School

Sacred Heart Academy of Pasig

Grade School

Sacred Heart Academy of Pasig

Pre School

Woodstock Learning Center

ACHIEVEMENTS

With Honors (Grade 10 1st quarter)

Perfect Attendance Awardee

INTERESTS

Basketball

Gaming

______________________________________________________________________

CHARACTER REFERENCE

Name: Kendra Caramat

Occupation: English Teacher

Name: Raldin Gem Frias

Occupation: Science Teacher

“I hereby certify that the information above is true and correct.”

DONATO, Asianti Crishna E.

Unit 141 Amethyst bldg. East Residences Ortigas Pasig City

09993080408

donatoasianti@gmail.com

EDUCATIONAL BACKGROUND

High School

Sacred Heart Academy of Pasig (2014 – 2020)

Grade School

Paintbox School for Kids

Pre School

Paintbox School for Kids

ACHIEVEMENTS

With Honors (Nursery –Grade 5)

Second Honorable Mention (Grade 6)

Piano and swimming lessons (2010-2013)

Violin lessons (2016-2018)

Cheer Dance Competition (2017;2019)

INTERESTS

Reading

Painting

Drawing

______________________________________________________________________

CHARACTER REFERENCE

Name: Kendra Caramat

Occupation: English Teacher

Name: Raldin Gem Frias

Occupation: Science Teacher

“I hereby certify that the information above is true and correct.”

DAGSIL, Dyanne Francine D.

B6L15 Star Apple St. Ph. 8D Greenwoods Exe. Vil. Taytay Rizal

2752891

dyannefrancinedagsil@gmail.com

EDUCATIONAL BACKGROUND

High School

Sacred Heart Academy of Pasig (Grade 7-10)

Grade School

Sacred Heart Academy of Pasig (Grade 1-6)

Pre School

John Michael Learning Center

ACHIEVEMENTS

Perfect Attendance Awardee (Grade 6-10)

With Honors (Grade 1-10)

3rd Place in Filipino Poster Making Contest (Grade 8)

INTERESTS

Animating

Drawing

______________________________________________________________________

CHARACTER REFERENCE

Name: Kendra Caramat

Occupation: English Teacher

Name: Raldin Gem Frias

Occupation: Science Teacher

“I hereby certify that the information above is true and correct.”
