Professional Documents
Culture Documents
Understanding and Mitigating Security Risks of Voice-Controlled Third-Party Functions On Virtual Personal Assistant
Understanding and Mitigating Security Risks of Voice-Controlled Third-Party Functions On Virtual Personal Assistant
Understanding and Mitigating Security Risks of Voice-Controlled Third-Party Functions On Virtual Personal Assistant
TEAM 7
LAKSHMISAGAR A S
LOHITH J R
MANAVI SATTIGARAHALLI
Virtual Personal Assistants (VPA) rely on voice commands to communicate with users.
Smart speakers allow third-party to publish function (skills or actions) for service of users.
VPAs are vulnerable to security threats like voice squatting and voice masquerading, whose aim will be to steal personal
information.
The paper proposes technique to automatically captures ongoing attacks to address threats within Amazon Echo and Google
Home.
CONTENT
Introduction
Background
Exploiting VPA Voice control
Defending voice squatting
Defending voice masquerading
Discussion
Conclusion
INTRODUCTION
Voice-based attacks
Adversary exploits utilizing the variations in commands and how the skill is invoked.
. Attackers aims at interactions between users and VPA, and the hand-overing commands are manipulated.
VPA responses with the skill which contains the maximum matching words or longest matching pattern.
Alexa and Google Home cannot recognize accurately and are capable of hijacking.
Experiment
In Alexa skills names need not be unique. Multiple skills with same names are allowed in market.
The skills are chosen based on lonest matching pattern and some undisclosed policies.
Google does not allow duplicate invocation names, but pronunciations can be misleading.
As only one skill runs at a time and there is no strong indication whether skill is quitted, VPAs are vulnerable.
An adversary exploits how a skill is invoked utilizing the variations in the ways the command is spoken. (accent,
courteous expression, etc.)
There can be voice squatting (phonetically similar) and word squatting (includes strategical extra words).
According to a study, Alexa and Google invoked around 50% of skills mistakenly.
Voice Masquerading Attack
Running skill can pretend to hand-over control or fake termination in order to impersonate.
The response content is in JSON object for each command. By having a Goodbye or silent response, the skill can fake-terminate.
After every 6 seconds for Alexa and 8 seconds for Google, skills will be forceful terminated. To continue faking, the silent audio
needs to be repromoted.
Sensitive data or information can be obtained by the attacking skill by impersonating legitimate skill.
Analysis show that out of 9,582 commands, 52 were to switch skill and 485 were to terminate in the middle, and these could easily
be exploited.
Data Collection
Ran web crawlers to collect metadata like name, author, invocation name, sample utterances, description and reviews.
Gathered around 23,758 skills with 19,670 of them being 3 rd party skills.
Methodology
Built scanner to capture Competitive Invocation Name (CIN) – names for a given invocation name.
Two parts within defensing voice squatting –
Utterance paraphrasing – identifies suspicious variations of invocation name
Pronunciation comparison – finds the similarity in pronunciation between two different names
Utterance paraphrasing
Paraphrase common invocation utterances of the target skill using Bilingual pivoting and deep neural networks.
Gathered 11 prefixes and 6 suffixes and applied to build a variations recognizable to VPA system.
Please, my, can you, it, the, for me, app, some, few, etc.…
Pronunciation comparison
Converts a name into phonemic presentation using ARPABET phoneme code.
CMU pronunciation dictionary which includes 134,000 words is used.
Train a Grapheme-to-phoneme model using recurrent neural network with long short term memory(LSTM) units.
Every phonemic representation is compared with target name and their edit distant is measured within scanner.
Weighed cost matrix is obtained for the operations on different phoneme pairs.
Needleman- Wunsch algorithm used to identify the minimum edit distance and related edit path.
)=1-
Evaluation
Calculated the cost for transforming misrecognized invocations names to identified from voice commands.
3655 out of 19,670 Alexa skills have CINs which include identical invocation names.
After removing skills with identical names, 531 skills have CINs with average of 1.31 CINs.
Invocation name with most CINs is “cat facts” with 66 skills.
345 skills CINs are the utterance paraphrasing of other skills’ names.
Google has only 1,001 skills and does not allow them to have identical invocation names.
Only 4 skills have similarly pronounced CINs.
Some skills deliberately utilize the invocation names unrelated to their functionalities and follow popular skills
invocation name.
DEFENDING VOICE MASQUERADING
Limitation of Defense – All the data set might not comprehend enough for covering real-world attack cases that
could happen.
Future direction – Manual inspection of each skill is infeasible as skill’s inside logic is invisible to VPA system.
ANY
QUESTIONS?