
US010228904B2

(12) United States Patent — Raux          (10) Patent No.: US 10,228,904 B2
                                          (45) Date of Patent: Mar. 12, 2019

(54) GAZE TRIGGERED VOICE RECOGNITION INCORPORATING DEVICE VELOCITY

(71) Applicant: Lenovo (Singapore) Pte. Ltd., Singapore (SG)

(72) Inventor: Antoine Roland Raux, Cupertino, CA (US)

(73) Assignee: Lenovo (Singapore) Pte. Ltd., Singapore (SG)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 277 days.

(21) Appl. No.: 14/539,495

(22) Filed: Nov. 12, 2014

(65) Prior Publication Data: US 2016/0132290 A1, May 12, 2016

(51) Int. Cl.
     G10L 15/22 (2006.01)
     G06F 3/01 (2006.01)
     G06F 3/12 (2006.01)
     G06F 3/16 (2006.01)

(52) U.S. Cl.
     CPC: G06F 3/167 (2013.01); G06F 3/013 (2013.01); G10L 15/22 (2013.01); G06F 3/1208 (2013.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01)

(58) Field of Classification Search
     CPC: G06F 3/167; G06F 3/013; G06F 3/1208; G10L 15/22
     USPC: 345/173, 156
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS
     6,970,824 B2        11/2005   Hinde et al.
     2002/0105575 A1      8/2002   Hinde et al.
     2013/0307771 A1     11/2013   Parker et al.
     2014/0184550 A1*     7/2014   Hennessey ......... G06F 3/013 (345/173)
     2014/0350942 A1     11/2014   Kady et al.
     2015/0109191 A1*     4/2015   Johnson ........... G10L 15/22 (345/156)
     2015/0207957 A1*     7/2015   Lee ............... G06F 3/1208 (358/452)

     FOREIGN PATENT DOCUMENTS
     CN 104023127 A       9/2014
     EP 1215658 A2        6/2002
     EP 2806335 A1       11/2014

     * cited by examiner

Primary Examiner — Farzad Kazeminezhad
(74) Attorney, Agent, or Firm — Ference & Associates LLC

(57) ABSTRACT

One embodiment provides a method, involving: detecting, at an electronic device, a location of user gaze; activating, based on the location of the user gaze, a voice input module, wherein the activating is based on a command input using a modality that detects an input in combination with the location of the user gaze, the modality comprising a change in velocity of the device; detecting, at the electronic device, a voice input; evaluating, using the voice input module, the voice input; and performing, based on evaluation of the voice input, at least one action. Other aspects are described and claimed.

18 Claims, 4 Drawing Sheets

[Front-page representative drawing: an example web page 300 (a tablet shopping page), as in FIG. 3, marking the gaze-trigger targets described in the specification: a predetermined screen location 301, a microphone icon 302, and an animated character 303.]
U.S. Patent — Mar. 12, 2019 — Sheet 1 of 4 — US 10,228,904 B2

[FIG. 1: example smart phone/tablet circuitry 100 — a single chip 110 combining software and processor(s), with power management circuit 130, battery 140, WWAN transceiver 150, WLAN transceiver 160, touch screen/controller 170, flash memory 180, SDRAM 190, and additional devices 120 (short range wireless, camera, audio devices, microphone, external speakers, etc.).]
[FIG. 2 (drawing sheet 2 of 4): block diagram of an example computer system — chipset 210 with a core and memory control group 220 (processor(s) 222, single or multi-core; FSB 224; memory controller hub 226; memory 240; LVDS/LCD interface 232; PCI-E 234; discrete graphics 236; HDMI/DVI/SDVO/DisplayPort block 238) linked via DMI 242 or link controller 244 to an I/O controller hub 250 (SATA 251, PCI-E 252, USB 253, LAN 254, GPIO 255, power management 261, clock generator 262, audio 263 with speaker(s) 294 and microphone(s), TCO 264, SM bus 265, SPI Flash 266 with BIOS 268 and boot code 290, LPC 270 with ASIC(s) 271, TPM 272, super I/O 273, firmware hub 274, BIOS support 275, ROM 277, Flash 278, NVRAM 279), HDD/SDD 280, wireless connections 282, and keyboard/mouse 284.]
[FIG. 3 (drawing sheet 3 of 4): an example web page 300 (a tablet shopping page) marking the gaze-trigger targets: a predetermined screen location 301, a microphone icon 302, and an animated character 303.]
[FIG. 4 (drawing sheet 4 of 4): the web page of FIG. 3, shown as 400, with the visual states of the trigger targets altered when gazed at — a highlighted location marked by a visual symbol 401, a highlighted icon 402, and a reacting animated character 403.]
GAZE TRIGGERED VOICE RECOGNITION INCORPORATING DEVICE VELOCITY

BACKGROUND

With the creation of intelligent digital personal assistants (e.g., SIRI, S Voice, GOOGLE NOW, CORTANA, and HIDI), the use of voice commands to control electronic devices has become extremely popular. SIRI is a registered trademark of Apple Inc. in the United States and other countries. S VOICE is a registered trademark of Samsung Electronics Co. in the United States and other countries. GOOGLE is a registered trademark of Google Inc. in the United States and other countries. CORTANA is a pending trademark of Microsoft in the United States and other countries. Generally, a user interacts with a voice input module, for example embodied in a personal assistant, through use of natural language. This style of interface allows a device to receive voice inputs, e.g., voice commands, from the user (e.g., "What is the weather tomorrow," "Call Dan"), process those requests, and perform the user's desired actions by carrying out the task itself or delegating user requests to a desired application.

Because natural language is a major method of communication that people are comfortable with, the ability to use voice commands offers a natural and efficient way to utilize functions of a device's operating system or applications, no matter how simple or complex. However, one of the major issues when utilizing the personal assistants is determining what portion of a user's speech is intended to be received as a voice command. Constantly listening to the user has proven too difficult a task to achieve with a usable level of false positives (i.e., the assistant responding to unrelated speech) and false negatives (i.e., the assistant ignoring user commands). In addition, the personal assistant can be an energy intensive application, so allowing it to run constantly in the background could have a significant impact on battery life. In order to overcome this issue, most voice controlled assistants today make use of some form of trigger to initiate the voice recognition process. This trigger assumes that any speech directly following the trigger is a command directed to the assistant. Some common triggers are physical button presses (e.g., SIRI activation) and special key phrases to be spoken before any system-directed command (e.g., "Okay GOOGLE").

BRIEF SUMMARY

In summary, one aspect provides a method, comprising: detecting, at an electronic device, a location of user gaze; activating, based on the location of the user gaze, a voice input module; detecting, at the electronic device, a voice input; evaluating, using the voice input module, the voice input; and performing, based on evaluation of the voice input, at least one action.

Another aspect provides an information handling device, comprising: a processor; at least one sensor operatively coupled to the processor; and a memory that stores instructions executable by the processor to: detect a location of user gaze; activate, based on the location of the user gaze, a voice input module; detect a voice input using the at least one sensor; evaluate, using the voice input module, the voice input; and perform, based on the evaluation of the voice input, at least one action.

A further aspect provides a product, comprising: a storage device having code stored therewith, the code being executable by the processor and comprising: code that detects a location of a user gaze; code that activates, based on the location of the user gaze, a voice input module; code that detects a voice input using a modality; code that evaluates, using the voice input module, the voice input; and code that performs, based on the evaluation of the voice input, at least one action.

The foregoing is a summary and thus may contain simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting.

For a better understanding of the embodiments, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings. The scope of the invention will be pointed out in the appended claims.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 illustrates an example of information handling device circuitry.

FIG. 2 illustrates another example of information handling device circuitry.

FIG. 3 illustrates an example method of gaze triggered recognition.

FIG. 4 illustrates a further example method of gaze triggered recognition.

DETAILED DESCRIPTION

It will be readily understood that the components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations in addition to the described example embodiments. Thus, the following more detailed description of the example embodiments, as represented in the figures, is not intended to limit the scope of the embodiments, as claimed, but is merely representative of example embodiments.

Reference throughout this specification to "one embodiment" or "an embodiment" (or the like) means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or the like in various places throughout this specification are not necessarily all referring to the same embodiment.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments. One skilled in the relevant art will recognize, however, that the various embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, et cetera. In other instances, well known structures, materials, or operations are not shown or described in detail to avoid obfuscation.

An embodiment allows users to interact with an electronic device by tracking the user's gaze and using the location of the user's gaze as a trigger mechanism. For example, an embodiment may actively listen for audio input when the user's gaze is fixed on the upper right-hand corner of a smart phone screen. An embodiment thus conveniently and easily removes the need to manually trigger an electronic device to receive audio inputs such as voice commands.
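By way of illustration only, the method steps summarized above (detect the location of user gaze; activate a voice input module based on that location; detect, evaluate, and act on a voice input) can be sketched as a simple event loop. This sketch is not the claimed implementation; every name in it (`Region`, `gaze_tracker`, `voice_module`, `TRIGGER_REGION`, and the example coordinates) is a hypothetical assumption introduced for the sketch.

```python
# Illustrative sketch of the summarized method: gaze-triggered voice input.
# All names (Region, gaze_tracker, voice_module, TRIGGER_REGION) are
# hypothetical; the specification does not prescribe any particular API.

from dataclasses import dataclass


@dataclass
class Region:
    """A rectangular trigger region on the display, in pixels."""
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: float, py: float) -> bool:
        """True if the gaze point (px, py) falls inside this region."""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


# Assumed example: a 100x100-pixel trigger zone in the lower-left corner
# of a 1280x800 display (matching the lower-left location 301 of FIG. 3).
TRIGGER_REGION = Region(x=0, y=700, w=100, h=100)


def gaze_trigger_loop(gaze_tracker, voice_module):
    """Activate the voice input module when the gaze hits the trigger region."""
    while True:
        px, py = gaze_tracker.gaze_location()          # detect user gaze
        if TRIGGER_REGION.contains(px, py):            # gaze-based trigger
            voice_module.activate()                    # activate voice input module
            utterance = voice_module.listen()          # detect a voice input
            intent = voice_module.evaluate(utterance)  # evaluate the voice input
            if intent is not None:
                intent.perform()                       # perform at least one action
```

A real implementation would substitute an actual gaze-estimation sensor and speech recognizer for the hypothetical `gaze_tracker` and `voice_module` objects; only the region test is concrete here.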
Some currently available commercial systems use triggers that require the pressing of a particular button (e.g., pressing and holding the home button to activate SIRI, or pressing and holding the search button to activate CORTANA). An alternative method currently available is the use of a key phrase (e.g., saying "Hey SIRI" while a device running iOS 8 or later is plugged in, or saying "Okay GOOGLE" while a device running ANDROID 4.3 is awake). ANDROID is a registered trademark of Google Inc. in the United States and other countries. Once a user speaks a key phrase, the device is triggered to listen for the voice commands following the key phrase.

The main issue with the current methods of activating a trigger is that they tend to disrupt whatever task the user is currently involved in (e.g., exiting an application upon activation of the personal assistant). In particular, if a user is involved in performing a task that requires keyboard, mouse, or touch input on the device (e.g., editing an email, editing a document, browsing photos, or viewing social networking), they will have to interrupt that task, or possibly even close their current application, to click, touch, or enter a separate area to access the personal assistant.

One current solution to the requirement of tactile input is the use of a key phrase. Currently, most key phrases can only be used outside of third party applications, or require the user to be in a certain menu or screen in the device's operating system (e.g., being in the GOOGLE Now application before saying "Okay GOOGLE"). Thus, the key phrase triggers may not be as restrictive as the key press method, which can force the user to reposition their hand or use two hands to press a key. However, the method of using key phrases involves drawbacks as well. Even if the key phrase could be used while in a third party application, key phrase triggers must be spoken prior to each voice command given by the user. This constant and repetitive act places a burden on the user and undercuts the benefit of the natural language aspect of the intelligent assistant, which is one of its primary qualities.

Thus, an embodiment addresses these limitations by utilizing gaze tracking, which allows the user to trigger voice recognition by simply looking at a designated area on the device's display. An embodiment uses a sensor device that detects the location of a user's gaze. An embodiment then activates a voice input module, e.g., an intelligent assistant, which detects any speech commands from the user. The trigger could be activated by the user fixing their gaze on a particular corner of a device's screen, or looking at a predetermined location set by the user. Additionally, an embodiment could have an icon or even an animated character (e.g., CLIPPY, Microsoft's beloved office assistant) that the user is to focus on when they wish to activate the intelligent assistant.

It should be noted that while examples are provided herein focusing on an intelligent assistant, these examples are non-limiting and the general techniques may be applied to voice modules generally, such as those provided for dictation in forms or within applications generally.

The illustrated example embodiments will be best understood by reference to the figures. The following description is intended only by way of example, and simply illustrates certain example embodiments.

While various other circuits, circuitry or components may be utilized in information handling devices, with regard to smart phone and/or tablet circuitry 100, an example illustrated in FIG. 1 includes a system on a chip design found, for example, in tablet or other mobile computing platforms. Software and processor(s) are combined in a single chip 110. Processors comprise internal arithmetic units, registers, cache memory, busses, I/O ports, etc., as is well known in the art. Internal busses and the like depend on different vendors, but essentially all the peripheral devices (120) may attach to a single chip 110. The circuitry 100 combines the processor, memory control, and I/O controller hub all into a single chip 110. Also, systems 100 of this type do not typically use SATA or PCI or LPC. Common interfaces, for example, include SDIO and I2C.

There are power management chip(s) 130, e.g., a battery management unit, BMU, which manage power as supplied, for example, via a rechargeable battery 140, which may be recharged by a connection to a power source (not shown). In at least one design, a single chip, such as 110, is used to supply BIOS-like functionality and DRAM memory.

System 100 typically includes one or more of a WWAN transceiver 150 and a WLAN transceiver 160 for connecting to various networks, such as telecommunications networks and wireless Internet devices, e.g., access points. Additionally, devices 120 are commonly included, e.g., an image sensor such as a camera. System 100 often includes a touch screen 170 for data input and display/rendering. System 100 also typically includes various memory devices, for example flash memory 180 and SDRAM 190.

FIG. 2 depicts a block diagram of another example of information handling device circuits, circuitry or components. The example depicted in FIG. 2 may correspond to computing systems such as the THINKPAD series of personal computers sold by Lenovo (US) Inc. of Morrisville, N.C., or other devices. As is apparent from the description herein, embodiments may include other features or only some of the features of the example illustrated in FIG. 2.

The example of FIG. 2 includes a so-called chipset 210 (a group of integrated circuits, or chips, that work together, chipsets) with an architecture that may vary depending on manufacturer (for example, INTEL, AMD, ARM, etc.). INTEL is a registered trademark of Intel Corporation in the United States and other countries. AMD is a registered trademark of Advanced Micro Devices, Inc. in the United States and other countries. ARM is an unregistered trademark of ARM Holdings plc in the United States and other countries. The architecture of the chipset 210 includes a core and memory control group 220 and an I/O controller hub 250 that exchanges information (for example, data, signals, commands, etc.) via a direct management interface (DMI) 242 or a link controller 244. In FIG. 2, the DMI 242 is a chip-to-chip interface (sometimes referred to as being a link between a "northbridge" and a "southbridge"). The core and memory control group 220 includes one or more processors 222 (for example, single or multi-core) and a memory controller hub 226 that exchange information via a front side bus (FSB) 224; noting that components of the group 220 may be integrated in a chip that supplants the conventional "northbridge" style architecture. One or more processors 222 comprise internal arithmetic units, registers, cache memory, busses, I/O ports, etc., as is well known in the art.

In FIG. 2, the memory controller hub 226 interfaces with memory 240 (for example, to provide support for a type of RAM that may be referred to as "system memory" or "memory"). The memory controller hub 226 further includes a low voltage differential signaling (LVDS) interface 232 for a display device 292 (for example, a CRT, a flat panel, a touch screen, etc.). A block 238 includes some technologies that may be supported via the LVDS interface 232 (for example, serial digital video, HDMI/DVI, display port). The memory controller hub 226 also includes a PCI-express interface (PCI-E) 234 that may support discrete graphics 236.
In FIG. 2, the I/O hub controller 250 includes a SATA interface 251 (for example, for HDDs, SDDs, etc., 280), a PCI-E interface 252 (for example, for wireless connections 282), a USB interface 253 (for example, for devices 284 such as a digitizer, keyboard, mice, cameras, phones, microphones, storage, other connected devices, etc.), a network interface 254 (for example, LAN), a GPIO interface 255, an LPC interface 270 (for ASICs 271, a TPM 272, a super I/O 273, a firmware hub 274, BIOS support 275, as well as various types of memory 276 such as ROM 277, Flash 278, and NVRAM 279), a power management interface 261, a clock generator interface 262, an audio interface 263 (for example, for speakers 294), a TCO interface 264, a system management bus interface 265, and SPI Flash 266, which can include BIOS 268 and boot code 290. The I/O hub controller 250 may include gigabit Ethernet support.

The system, upon power on, may be configured to execute boot code 290 for the BIOS 268, as stored within the SPI Flash 266, and thereafter processes data under the control of one or more operating systems and application software (for example, stored in system memory 240). An operating system may be stored in any of a variety of locations and accessed, for example, according to instructions of the BIOS 268. As described herein, a device may include fewer or more features than shown in the system of FIG. 2.

Information handling device circuitry, as for example outlined in FIG. 1 or FIG. 2, may be used in devices such as tablets, smart phones, personal computer devices generally, and/or electronic devices which users may use to enter, record, or modify data. For example, the circuitry outlined in FIG. 1 may be implemented in a tablet or smart phone embodiment, whereas the circuitry outlined in FIG. 2 may be implemented in a personal computer embodiment.

It will be understood that such devices (e.g., a tablet computing device, personal computer, or smartphone) primarily offer touch screens, microphones and cameras as primary input devices, with current devices relying primarily on the touch screen and microphone inputs for application control. In an embodiment, fusion of such modalities provides a more user friendly experience, particularly for certain applications that may warrant the use of other input modalities not supported by such devices.

By way of example, and referring now to FIG. 3, a webpage is shown as it would be viewed on a display of a typical information handling device such as a touch screen, 170 of FIG. 1, or a display device, 292 of FIG. 2. An embodiment allows a user to activate the intelligent digital personal assistant in a non-invasive fashion.

In an embodiment, the location of a user's gaze is detected using a sensor device. The sensor device may be housed within the information handling device (e.g., a webcam in a tablet, smartphone, personal computer, etc.). Additionally or alternatively, the sensor device could be a separate device (e.g., a stand-alone webcam, or a sensor such as a KINECT device). KINECT is a registered trademark of Microsoft Corporation in the United States and other countries. In a further embodiment, the sensor device could be any image capture device or video capture device. Additionally, the sensor could be of a more complex nature (e.g., a range imaging device, a 3D scanning device, etc.).

By way of example, in an embodiment, a user could be browsing a webpage 300 and wish to utilize the intelligent digital personal assistant to inquire about something on the website they are viewing. Instead of requiring the user to back out of their browser application, forcing them to press and hold a button, or repeat an annoying key phrase, an embodiment allows the user to simply look at a predetermined location on the screen (e.g., the lower left corner 301). By looking at this predetermined location, the user would activate the personal assistant, and any subsequent input (e.g., voice commands) would be interpreted as intended to be used by the personal assistant.

By way of further example, in an embodiment, a user could be browsing a webpage 300 and wish to utilize the intelligent digital personal assistant to issue a command about something unrelated to the website they are viewing (e.g., adding an appointment to their calendar). Once again, instead of requiring the user to back out of their browser application, forcing them to press and hold a button, or repeat an annoying key phrase, the user could simply look at a predetermined icon on the screen (e.g., a microphone icon located on the screen 302). By looking at the icon, an intuitive symbol for speech recognition, the user would activate the personal assistant, and any additional input (e.g., voice commands) would be interpreted as intended for use by the personal assistant.

In an additional embodiment, a user could be browsing a webpage 300 and wish to utilize the intelligent digital personal assistant to inquire about something on a third party application (e.g., checking prices on an online shopping application). Once again, instead of requiring the user to back out of their browser application, forcing them to press and hold a button, or repeat an annoying key phrase, the user could simply look at an anthropomorphic agent on the screen (e.g., CLIPPY, an animated character located on the screen 303, or the like). By looking at the visual representation of their personal assistant, the user would activate the personal assistant, and any additional input (e.g., voice commands) would be interpreted as intended to be used by the personal assistant.

In order to further increase the intuitive nature, an embodiment can alter the visual representation of the predetermined location. By way of example, and referring to FIG. 4, an embodiment could change the color of, or highlight, the predetermined location with a visual symbol 401 when the user directs his gaze at the location. This altering of the visual state of the location is a clear indicator to the user that the intelligent personal assistant is currently active and able to receive further commands. Additionally, the altering of the visual state of the location enables the user to avoid false positives. If the user did not intend to activate the personal assistant, they could avoid issuing further commands and avert their gaze, thus avoiding the need to cancel or exit the personal assistant application.

Additionally or alternatively, an embodiment can alter the visual representation of the predetermined icon. By way of example, and referring to FIG. 4, an embodiment could change the background color of, or highlight, the predetermined icon 402 when the user directs his gaze at the icon. In an additional embodiment, the icon could simply appear or disappear when the user's gaze is focused on the known location of the icon. This altering of the visual state of the icon, as before, is a clear indicator that the intelligent personal assistant is currently active. Additionally, the altering of the visual state of the icon enables the user to avoid false positives. If the user did not intend to activate the personal assistant, they easily avoid the need to cancel or exit the personal assistant activation, thus saving the user time and frustration with the personal assistant.
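By way of illustration only, the gaze-driven visual feedback and false-positive avoidance described above can be sketched as a small state machine: the icon highlights as soon as the gaze lands on it, arms the assistant only after a short dwell, and resets the moment the gaze is averted. The dwell threshold and all names here are assumptions made for the sketch, not values taken from the specification.

```python
# Sketch of gaze-driven visual feedback with dwell-based arming.
# DWELL_SECONDS and the class/attribute names are illustrative assumptions;
# the specification does not specify a dwell time or API.

DWELL_SECONDS = 0.6  # assumed threshold before the assistant activates


class GazeActivatedIcon:
    """Tracks gaze on a trigger icon, mirroring the described visual states."""

    def __init__(self, dwell_seconds: float = DWELL_SECONDS):
        self.dwell_seconds = dwell_seconds
        self.gaze_started_at = None   # timestamp when gaze first hit the icon
        self.highlighted = False      # visual state: icon highlighted/recolored
        self.active = False           # assistant actively listening

    def update(self, gaze_on_icon: bool, now: float) -> None:
        """Feed one gaze sample with the current timestamp in seconds."""
        if gaze_on_icon:
            if self.gaze_started_at is None:
                self.gaze_started_at = now
            # Immediate visual feedback that the gaze trigger is engaged.
            self.highlighted = True
            if now - self.gaze_started_at >= self.dwell_seconds:
                self.active = True    # dwell met: activate the voice input module
        else:
            # Gaze averted: clear the feedback and cancel the pending activation,
            # so an unintended glance never requires cancelling the assistant.
            self.gaze_started_at = None
            self.highlighted = False
            self.active = False
```

A caller would feed this object one gaze sample per frame (e.g., `icon.update(gaze_on_icon=True, now=time.monotonic())`) and route audio to the voice input module only while `icon.active` is true.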
In a further embodiment, the animated character could react to the user's gaze. By way of example, and referring to FIG. 4 in comparison to 303 of FIG. 3, an embodiment could change the reaction of the animated character 403 when the user directs his gaze at its location. In an additional embodiment, the animated character could have multiple reactions depending on multiple circumstances (e.g., what application was open, how long the user's gaze was present, the time of day, etc.), which could indicate to the user that the intelligent personal assistant was ready to receive a specific set of commands. Not only would this altering of the reaction of the character be a clear indicator that the intelligent personal assistant is currently active; additionally, as before, the altering of the visual state of the location enables the user to avoid false positives. If the user did not intend to activate the personal assistant, they easily avoid the need to cancel or exit the personal assistant activation, thus saving the user time and frustration when utilizing the personal assistant.

In an additional embodiment, the user can select which option they wish to utilize (e.g., the location, icon, character, etc.). Thus, if a user found the animated character overly invasive or annoying, they could choose the simpler or cleaner option of the predetermined location. Alternatively, if a user had difficulty remembering the predetermined location, they may choose to implement the icon and have it remain on the screen at all times, thus allowing for easier identification. In a further embodiment, the user could select a personalized icon or character based on an image, video, third party application, or the like.

Additionally, an embodiment allows the user to select the predetermined location of whatever mode of location identification they choose (e.g., predetermined location, icon, character, etc.). In addition to an overall default setting, a user may also set the location of the identifier based on which application the user has open (e.g., lower corners for browsers to avoid covering the uniform resource locator (URL)/search bar, upper corners for videos to avoid covering the play/time bar, etc.). In an additional embodiment, third party applications can have a preset preferred location based on the graphical user interface (GUI) of the application. In a further embodiment, this preset could be overruled by the user.

In addition to the ease of use, an embodiment allows for greater accuracy. A user may wish to further protect themselves from the possibility of false positives. Thus, an embodiment may provide an additional mode of activation in tandem with the user's gaze. This additional activation [...] user could flick or alter the position or angle of their device (i.e., utilize the accelerometer) as the second modality to verify their intent to activate the intelligent personal assistant. This adds increased accuracy without the need for a second hand or requiring the user to repeat an annoying key phrase.

Additionally or alternatively, an embodiment may, for example, allow the user to send an electronic communication to their device (e.g., through a Bluetooth headset, near field communication device, etc.) in combination with utilizing their gaze. For example, if a user affixed their gaze upon the predetermined icon and the icon's visual state changed, the user could interact with a separate device (e.g., press a button on their Bluetooth headset) to verify their intent to activate the intelligent personal assistant.

As will be appreciated by one skilled in the art, various aspects may be embodied as a system, method or device program product. Accordingly, aspects may take the form of an entirely hardware embodiment or an embodiment including software that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects may take the form of a device program product embodied in one or more device readable medium(s) having device readable program code embodied therewith.

It should be noted that the various functions described herein may be implemented using instructions stored on a device readable storage medium, such as a non-signal storage device, that are executed by a processor. A storage device may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a storage device is not a signal and "non-transitory" includes all media except signal media.

Program code embodied on a storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, et cetera, or any suitable combination of the foregoing.

Program code for carrying out operations may be written
step could include current methods of activation such as in any combination of one or more programming languages .
pressing and holding a particular key while the user ' s gaze The program code may execute entirely on a single device ,
is located at a predetermined location (e . g ., the button to be partly on a single device , as a stand - alone software package ,
pressed ). Additionally or alternatively , an embodiment could 50 partly on single device and partly on another device , or
make use of a key phrase as the additionalmode of activa entirely on the other device . In some cases, the devices may
tion (e . g ., referencing the animated character 403 by name be connected through any type of connection or network ,
when focused on him ). including a local area network (LAN ) or a wide area network
In addition to the above mentioned existing trigger meth - (WAN ), or the connection may be made through other
ods gaze tracking enables alternative methods. For example , 55 devices ( for example , through the Internet using an Internet
an embodimentmay allow a user to blink once or twice with Service Provider ), through wireless connections, e . g ., near
one or both eyes before activation of the intelligent personal field communication , or through a hard wire connection ,
assistant. This additional step allows for a higher degree of such as over a USB connection .
competency without requiring a great deal of additional Example embodiments are described herein with refer
effort by the user , and without being overly burdensome 60 ence to the figures, which illustrate example methods,
(e .g ., repeating the same key phrase each time the user devices and program products according to various example
wishes it activate the personal assistant ). embodiments. It will be understood that the actions and
Additionally or alternatively, an embodiment may allow functionality may be implemented at least in part by pro
the user to move their device ( e . g ., tablet, smartphone , gram instructions. These program instructions may be pro
personal computer, etc .) in combination with utilizing their 65 vided to a processor of a general purpose information
gaze . For example , if a user affixed their gaze upon the handling device , a special purpose information handling
predetermined icon and the icon 's visual state changed , the device, or other programmable data processing device to
US 10,228,904 B2
produce a machine, such that the instructions, which execute via a processor of the device, implement the functions/acts specified.
It is worth noting that while specific blocks are used in the figures, and a particular ordering of blocks has been illustrated, these are non-limiting examples. In certain contexts, two or more blocks may be combined, a block may be split into two or more blocks, or certain blocks may be re-ordered or re-organized as appropriate, as the explicit illustrated examples are used only for descriptive purposes and are not to be construed as limiting.
As used herein, the singular "a" and "an" may be construed as including the plural "one or more" unless clearly indicated otherwise.
This disclosure has been presented for purposes of illustration and description but is not intended to be exhaustive or limiting. Many modifications and variations will be apparent to those of ordinary skill in the art. The example embodiments were chosen and described in order to explain principles and practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Thus, although illustrative example embodiments have been described herein with reference to the accompanying figures, it is to be understood that this description is not limiting and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the disclosure.
What is claimed is:
1. A method, comprising:
detecting, at a display device of an electronic device, a location of user gaze;
activating, based on the location of the user gaze, a voice input module of a digital assistant, wherein the activating is based on a command input using a modality that detects an input in combination with the location of the user gaze, the modality comprising a change in velocity of the device;
detecting, at the electronic device and after activation of the voice input module, a voice input; and
performing, based on the voice input, at least one action, wherein the performed at least one action is irrespective of the location of the user gaze.
2. The method of claim 1, wherein the voice input module activation is triggered by the location of user gaze being focused on a characteristic selected from a group consisting of: a predetermined location, an icon, an anthropomorphic agent, a user selected image, and a third party created agent.
3. The method of claim 2, further comprising changing a visual state of the characteristic in response to the user gaze being focused on the characteristic.
4. The method of claim 2, wherein the location of the characteristic is determined based on a factor selected from a group consisting of: user selection, third party application preference, and current device task.
5. The method of claim 1, wherein detecting a location of user gaze comprises: using a sensor device to detect the user gaze.
6. The method of claim 5, wherein the sensor device is selected from a group consisting of: an image capture device, a video capture device, a range imaging device, and a 3D scanning device.
7. The method of claim 1, wherein detecting a voice input comprises: using an audio capture device to detect audio.
8. The method of claim 7, wherein the audio detected comprises voice commands from the user.
9. The method of claim 1, wherein the modality detects an input selected from a group consisting of: facial manipulation, a change in velocity of the device, an electronic communication, a key phrase, and a button press.
10. An information handling device, comprising:
a display device;
a processor operatively coupled to the display device;
at least one sensor operatively coupled to the processor; and
a memory that stores instructions executable by the processor to:
detect a location of user gaze on the display device;
activate, based on the location of the user gaze, a voice input module of a digital assistant, wherein the activating is based on a command input using a modality that detects an input in combination with the location of the user gaze, the modality comprising a change in velocity of the device;
detect, after activation of the voice input module, a voice input using the at least one sensor; and
perform, based on the voice input, at least one action, wherein the performed at least one action is irrespective of the location of the user gaze.
11. The information handling device of claim 10, wherein voice input module activation is triggered by the location of user gaze being focused on a characteristic selected from a group consisting of: a predetermined location, an icon, an anthropomorphic agent, a user selected image, and a third party created agent.
12. The information handling device of claim 11, further comprising changing a visual state of the characteristic in response to the user gaze being focused on the characteristic.
13. The information handling device of claim 11, wherein the location of the characteristic is determined based on a factor selected from a group consisting of: user selection, third party application preference, and current device task.
14. The information handling device of claim 10, wherein detecting a location of user gaze comprises: using a sensor device selected from a group consisting of: an image capture device, a video capture device, a range imaging device, and a 3D scanning device.
15. The information handling device of claim 10, wherein detecting a voice input comprises: using an audio capture device to detect audio.
16. The information handling device of claim 15, wherein the audio detected comprises voice commands from the user.
17. The information handling device of claim 10, wherein the modality detects an input selected from a group consisting of: facial manipulation, a change in velocity of the device, an electronic communication, a key phrase, and a button press.
18. A product, comprising:
a storage device having code stored therewith, the code being executable by the processor and comprising:
code that detects a location of a user gaze on a display device;
code that activates, based on the location of the user gaze, a voice input module of a digital assistant, wherein the activating is based on a command input using a modality that detects an input in combination with the location of the user gaze, the modality comprising a change in velocity of the device;
code that detects, after activation of the voice input module, a voice input using a modality; and
code that performs, based on the voice input, at least one action, wherein the performed at least one action is irrespective of the location of the user gaze.
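As an illustrative sketch only (not part of the claimed subject matter), the two-condition activation logic described above — the voice input module wakes only when the user's gaze rests on a predetermined screen region and a second modality, such as a change in device velocity, confirms intent — might look like the following Python. All names, thresholds, and sensor interfaces are hypothetical.

```python
# Hypothetical sketch: gaze-on-region check combined with a
# velocity-change modality to gate voice input activation.

from dataclasses import dataclass

@dataclass
class GazeSample:
    x: float  # estimated gaze point in normalized screen coordinates
    y: float

@dataclass
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, g: GazeSample) -> bool:
        return self.left <= g.x <= self.right and self.top <= g.y <= self.bottom

def velocity_changed(speeds: list[float], threshold: float = 0.5) -> bool:
    """True if any two consecutive speed readings differ by more than threshold."""
    return any(abs(b - a) > threshold for a, b in zip(speeds, speeds[1:]))

def should_activate(gaze: GazeSample, icon: Region, speeds: list[float]) -> bool:
    # Both conditions must hold: gaze focused on the predetermined
    # location, plus the velocity-change modality, reducing false positives.
    return icon.contains(gaze) and velocity_changed(speeds)

icon = Region(left=0.9, top=0.9, right=1.0, bottom=1.0)  # e.g., lower-right corner
gaze = GazeSample(x=0.95, y=0.95)
print(should_activate(gaze, icon, [0.0, 0.0, 1.2]))  # gaze on icon + device moved: True
print(should_activate(gaze, icon, [0.0, 0.0, 0.0]))  # gaze alone: False
```

In a real implementation the speed samples would come from the device's motion sensors and the gaze samples from an image capture or range imaging device, as the claims enumerate; here both are supplied as plain values for clarity.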