Google Soli: 2019-20 Dept. of ECE, KLECET, Chikodi

Google Soli
Chapter 1
INTRODUCTION
In the past, radar was widely used for long-range detection and surveillance of
objects. Over the last decade, however, there have been studies on object detection
using short-range, high-resolution radar such as ultra-wideband (UWB) radar. Unlike
optical cameras, a radar sensor is not affected by illumination and can detect
objects even when they are occluded. Radar can therefore be used in a wide variety of
applications, in both outdoor and indoor environments. Furthermore, it can operate at
lower power than optical cameras and, because radar signals pass through many materials,
it does not need to be exposed on the outside of the device it is attached to. Because the
sensor can see through the blocking material, human-computer interaction (HCI)
devices can be designed more neatly. Finally, recent advances in machine
learning have led to studies on obtaining meaningful knowledge or context from the raw
radar signal.
Radars are still regarded as suitable only for detecting moving objects at long
range, and there is little research on recognizing non-rigid objects such as human hands at
short range. Very little research has been conducted on applications such as gesture
recognition, and although radar is a useful sensor for machine learning techniques,
radar research based on machine learning remains limited.
Recently, a short-range, high-resolution, low-power radar called Soli was
developed for tracking and recognizing fine hand gestures. In the Soli project, various
features that can be obtained from radar signals were defined, and feature-based gesture
recognition was performed using a random forest classifier. In other work, a convolutional
neural network (CNN) was employed to classify a driver’s hand gestures based on an optical
camera, a depth camera, and a radar sensor. The CNN fused data from the three
sensors, which improved accuracy under varying lighting conditions. A CNN
was also used for gesture recognition from micro-Doppler signatures, with a classification
accuracy of 85.6% for 10 gestures. There are also application studies using
short-range radar. Feature-based gesture recognition using a 24 GHz frequency-modulated
continuous wave (FMCW) radar achieved a classification accuracy of
88.57% for 7 gestures, and feature analysis was also performed. RadarCat was developed for
material and object recognition. However, the studies described are not robust to variations
in the range and speed of the motions or in the shape of the hand, which differ from person
to person. They are also designed to classify only a small number of gestures. With an
accuracy of about 90%, they are insufficient for practical applications[2].
Soli is a new, robust, high-resolution, low-power, miniature gesture sensing
technology for interactive computer graphics based on millimeter-wave radar. Radar operates
on the principle of reflection and detection of radio frequency (RF) electromagnetic waves.
The RF spectrum has several highly attractive properties as a sensing modality for interactive
systems and applications. RF sensors do not depend on lighting, noise or atmospheric
conditions. They are extremely fast and highly precise and can work through materials,
which allows them to be easily embedded into devices and environments. When
implemented at millimeter-wave RF frequencies, the entire sensor can be designed as a
compact solid-state semiconductor device: a radar chip that is a miniature, low-power device
with no moving parts and that can be manufactured inexpensively at scale. The resulting Soli
sensor delivers the promise of truly ubiquitous gesture interaction across a very broad range
of applications, including but not limited to virtual reality (VR), wearables and smart
garments, Internet of Things (IoT) and game controllers, as well as more traditional devices
such as mobile phones, tablets and laptops[1].
Creating input technology that can capture the precision of fine finger motion with
free touchless gestures was one of the important motivations of the work on the Soli sensor.
The development of a radar-based sensor optimized for human-computer interaction (HCI)
requires re-thinking and re-building the entire sensor architecture from the ground up,
starting with basic principles. Soli is the first end-to-end radar sensing system specifically
designed for tracking and recognizing fine hand gestures. The complete end-to-end design,
development, and evaluation of this new gesture sensing modality is the major achievement
and contribution of that work, opening new research frontiers in non-imaging sensors for
interaction.

Chapter 2
LITERATURE SURVEY
[1] This paper describes a new approach to developing a radar-based sensor optimized
for human-computer interaction, building the sensor architecture from the ground up. It covers
radar design principles, high temporal resolution gesture tracking, a hardware
abstraction layer (HAL), a solid-state radar chip and system architecture, interaction models
and gesture vocabularies, and gesture recognition. The authors demonstrate that Soli can be
used for robust gesture recognition and can track gestures with sub-millimeter accuracy,
running at over 10,000 frames per second on embedded hardware. They argue that radar is a
viable, powerful and attractive technology that can enhance and improve user interaction.
Soli proposes a new category of gesture sensors based on physical principles of
millimeter-wave RF radiation that were not previously explored in interactive applications.
The Soli Software Pipeline (SSP) supports real-time gesture recognition using multiple radar
hardware architectures and can be optimized for embedded application processors to allow
high frame rates with minimized latency for improved temporal resolution.
[2] This paper proposes a novel machine learning architecture specifically designed
for radio-frequency based gesture recognition. The authors focus on high-frequency (60 GHz),
short-range radar sensing, in particular Google’s Soli sensor. The signal has unique
properties such as resolving motion at a very fine level and allowing for segmentation in
range and velocity spaces rather than image space. This enables recognition of new types of
inputs but poses significant difficulties for the design of input recognition algorithms. The
proposed algorithm is capable of detecting a rich set of dynamic gestures and can resolve
small motions of fingers in fine detail. The technique is based on an end-to-end trained
combination of deep convolutional and recurrent neural networks. The algorithm achieves
high recognition rates on a challenging set of 11 dynamic gestures and generalizes well
across 10 users. The proposed model runs on commodity hardware at 140 Hz (CPU only).
[3] This paper introduces a simple but effective technique for automatic hand gesture
recognition using radar. The proposed technique classifies hand gestures based on the
envelopes of their micro-Doppler (MD) signatures. These envelopes capture the distinctions
among different hand movements and their corresponding positive and negative Doppler
frequencies that are generated during each gesture. The positive and negative
frequency envelopes of the MD signature are detected separately and augmented into a single
feature vector. A k-nearest neighbor (kNN) classifier is used with the Manhattan distance
(L1) measure, in lieu of the Euclidean distance (L2), so as not to diminish small but critical
envelope values. The method is shown to outperform both low-dimension representation
techniques based on principal component analysis (PCA) and sparse reconstruction using a
Gaussian-windowed Fourier dictionary, and it can achieve very high classification rates.
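The classification step summarized in [3] can be illustrated with a minimal sketch of a kNN classifier that uses the Manhattan (L1) distance. The envelope values, the gesture labels "swipe" and "snap", and the function names below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter

def manhattan(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Majority vote over the k training vectors closest to `query` in L1 distance.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: manhattan(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy features: three positive-envelope samples followed by
# three negative-envelope samples, joined into one vector per gesture.
train = [
    ([5.0, 9.0, 4.0, -1.0, -1.5, -1.0], "swipe"),
    ([4.5, 8.0, 5.0, -0.5, -1.0, -1.5], "swipe"),
    ([1.0, 1.5, 1.0, -6.0, -9.0, -5.0], "snap"),
    ([1.5, 1.0, 0.5, -5.0, -8.5, -6.0], "snap"),
]
query = [4.0, 8.5, 4.5, -1.0, -1.0, -1.0]
print(knn_predict(train, query, k=3))
```

Because the L1 distance sums absolute differences without squaring them, small envelope differences contribute in proportion to their size rather than being dwarfed by a few large ones, which is the motivation given in [3] for preferring it over L2.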
[4] This paper introduces a simple but effective technique for automatic hand gesture
recognition using radar. The proposed technique classifies hand gestures based on the
envelopes of their micro-Doppler signatures. These envelopes capture the distinctions among
different hand movements and their corresponding positive and negative Doppler frequencies
that are generated during each gesture. Automatic hand gesture recognition is poised to
make homes more user friendly and more efficient through the use of contactless radio
frequency (RF) sensors that can identify different hand gestures for instrument and household
appliance control. The paper presents a method to discriminate five classes of dynamic hand
gestures using a radar micro-Doppler sensor. These classes are swiping hand, hand rotation,
flipping fingers, calling and snapping fingers.
[5] This paper proposes a hand gesture recognition system for real-time HCI
applications using Soli, the 60 GHz frequency-modulated continuous wave (FMCW) radar
developed by Google. The overall system includes a signal processing part that generates
range-Doppler map (RDM) sequences without clutter and a machine learning part that uses a
long short-term memory (LSTM) encoder to learn the temporal characteristics of the RDM
sequences. Data were collected from 10 participants for the experiment. The proposed hand
gesture recognition system distinguishes 10 gestures with a classification accuracy
of 99.10%. It also recognizes the gestures of a new participant with an accuracy of 98.48%.

Chapter 3
SOLI TECHNOLOGY
3.1 Background and Related Work

Input is an essential component of interactive computer
graphics systems. As mobile computing grows, new modes of interaction are emerging and
becoming feasible, including touch and touchless gestures using camera-based tracking or
capacitive field sensors, voice and gaze input, and a multitude of sensors embedded in
various objects, the human body, clothes and environments or distributed as 3D interfaces.
Free-air gestures are emerging as a promising and attractive form of human-computer
interaction. In particular, researchers are interested in gestures that involve highly precise
and controlled motions performed by small muscle groups in the wrist and fingers. It has been
well established over decades of research that small muscle groups in the hands allow for
fluid, effective and rapid manipulation, resulting in precise and intuitive interaction.
Creating input technology that can capture this precision of finger motion with free touchless
gestures was one of the important motivations of the Soli sensor.
3.2 Radar Fundamentals

The Soli sensor is a solid-state millimeter-wave radar for mobile gesture recognition. The
fundamental principles of radar sensing are straightforward. A modulated electromagnetic
wave is emitted toward a moving or static target that scatters the transmitted radiation, with
some portion of the energy redirected back toward the radar, where it is intercepted by the
receiving antenna[2].
Fig.3.1 The overview of the proposed gesture recognition system

The design of any radar system includes (a) hardware, such as antennas and internal
circuitry components, (b) signal processing techniques to modulate the transmitted waveform
and extract information from the received waveform, and (c) radar control software that
executes radar operation and algorithms. The design of all these elements is strongly
interconnected and cannot be specified independently of one another or of the specifics of
the application.
3.3 Solid-State Radar Devices

Electronic hardware design for high-frequency radar can be challenging, requiring
highly specialized equipment and skill. In particular, antenna and waveguide design for
wideband, super-GHz frequencies can present a significant barrier to cost-efficient,
ubiquitous sensing. Soli overcomes this challenge with an all-in-one radar IC that
integrates all radar functionality onto a single chip, including antennas and preprocessing,
and interfaces directly to a standard microprocessor of the kind found in a normal mobile
phone or a smart watch[2].
3.4 Scattering Center Model of Human Hand

The RF response of the hand is modeled as a superposition of responses from discrete,
dynamic scattering centers. Scattering center models are consistent with the geometrical
theory of diffraction when the wavelength is small in comparison to the target’s spatial
extent, an assumption that holds for millimeter-wave sensing of the hand. The equation below
gives a generalized time-varying scattering center model that accounts for non-rigid hand
dynamics.
Fig.3.2 Scattering Center Model of Human Hand

Each scattering center is parameterized by a complex reflectivity parameter $\rho_i(T)$ and
radial distance $r_i(T)$ from the sensor, which vary as a function of time $T$:

$$y(r, T) = \sum_{i=1}^{N_{SC}} \rho_i(T)\,\delta\big(r - r_i(T)\big)$$

where $N_{SC}$ is the number of scattering centers and $\delta(\cdot)$ is the Dirac delta function.
The complex reflectivity parameter ρ is frequency dependent and varies with the local hand
geometry, orientation with respect to the radar, surface texture, and material composition.
This parametric description of the millimeter-wave scattering response for a dynamically
reconfiguring hand presents a tractable model for gesture parameter estimation and tracking.
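As a rough illustration of the scattering center model, the sketch below evaluates a discretized version of y(r, T) at one time instant: the Dirac delta is approximated by adding each center's complex reflectivity to the range bin that contains its radial distance. The two-center hand, its reflectivities and its motion are hypothetical values assumed purely for the example.

```python
import cmath

def range_profile(centers, r_max, n_bins):
    """Discretized y(r, T) at one instant.

    `centers` is a list of (rho, r) pairs: complex reflectivity and radial
    distance in meters. Each center contributes rho to the bin containing r.
    """
    profile = [0j] * n_bins
    bin_width = r_max / n_bins
    for rho, r in centers:
        idx = int(r / bin_width)
        if 0 <= idx < n_bins:
            profile[idx] += rho
    return profile

def hand_centers(t):
    """Hypothetical hand: a fingertip receding at 0.2 m/s and a static palm."""
    fingertip = (0.8 * cmath.exp(1j * 0.3), 0.105 + 0.2 * t)
    palm = (1.5 + 0j, 0.145)
    return [fingertip, palm]

for t in (0.0, 0.1):
    y = range_profile(hand_centers(t), r_max=0.30, n_bins=30)
    occupied = [i for i, v in enumerate(y) if abs(v) > 0]
    print(t, occupied)
```

Between the two instants only the fingertip's bin changes, which is how a time-varying set of scattering centers encodes non-rigid hand motion.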
3.5 Gesture Recognition Pipeline

Soli utilizes a single broad antenna beam to illuminate the entire hand as modulated
pulses are transmitted at very high repetition rates (between 1 and 10 kHz). The raw received
signal, consisting of a superposition of reflections from scattering centers within the radar’s
antenna beam, is then processed into multiple abstract signal representations
(transformations). The sensor does not resolve the shape of objects with high spatial
resolution but instead provides high temporal resolution, capturing primarily changes in
hand pose.
Real-time operation is an important consideration when recognizing human gestures,
because radar signal processing is computationally heavy. For the
purpose of real-time recognition, the sampling frequency of the Soli radar is set slightly
lower, and several signal processing parameters, such as the size of the FFT, are also set to
values that allow real-time processing. Such a lower hardware
specification can decrease the range and Doppler resolutions, but it is sufficient to recognize
gestures by applying a machine learning technique such as Long Short-Term Memory
(LSTM).
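The trade-off between processing load and resolution can be made concrete with the standard FMCW relations: range resolution c/(2B), velocity resolution λ/(2·N·Tc), and maximum unambiguous velocity λ/(4·Tc). The sketch below assumes that the Doppler FFT size equals the number of chirps N per frame. The 7 GHz bandwidth, 2 kHz chirp repetition rate and FFT sizes are assumed values for illustration, not Soli specifications; only the 60 GHz carrier and the 1 to 10 kHz repetition range come from the text.

```python
C = 299_792_458.0  # speed of light in m/s

def range_resolution(bandwidth_hz):
    """Smallest separable range difference for a sweep of the given bandwidth."""
    return C / (2.0 * bandwidth_hz)

def velocity_resolution(carrier_hz, n_chirps, chirp_period_s):
    """Doppler (radial velocity) bin width for n_chirps chirps per frame."""
    wavelength = C / carrier_hz
    return wavelength / (2.0 * n_chirps * chirp_period_s)

def max_unambiguous_velocity(carrier_hz, chirp_period_s):
    """Largest radial speed measurable without Doppler aliasing."""
    wavelength = C / carrier_hz
    return wavelength / (4.0 * chirp_period_s)

carrier = 60e9          # 60 GHz carrier, as stated for Soli
bandwidth = 7e9         # assumed sweep bandwidth
chirp_period = 1 / 2e3  # assumed 2 kHz repetition rate (within 1 to 10 kHz)

print(f"range resolution: {range_resolution(bandwidth) * 100:.2f} cm")
for n_chirps in (64, 32):  # assumed Doppler FFT sizes
    dv = velocity_resolution(carrier, n_chirps, chirp_period)
    print(f"Doppler FFT size {n_chirps}: velocity resolution {dv:.3f} m/s")
print(f"max velocity: {max_unambiguous_velocity(carrier, chirp_period):.2f} m/s")
```

Under these assumptions the range resolution is about 2.1 cm, and halving the Doppler FFT size halves the computation per frame while doubling the width of each velocity bin.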
The first step in any machine learning pipeline is to extract features from the data.
Traditionally this has been done manually, but recently Convolutional Neural Networks
(CNNs) have been successful at learning features automatically in a variety of challenging
tasks. While not encoding shape, the Range-Doppler Image (RDI), also called a Range-Doppler
Map (RDM), still contains interpretable information about the motion of reflection centers,
and CNNs can extract useful intermediate representations from it. In an RDI, the distance and
radial velocity of the reflecting objects are expressed as the two dimensions. Before
detecting gestures, clutter caused by reflections from objects other than the hand is
extracted from the raw RDMs. Assuming that all objects except the hand are almost static,
the background subtraction method can be applied to extract the clutter. By generating an
adaptive background model based on the Gaussian mixture model (GMM), clutter that might
change over time is effectively extracted. The clutter is then removed by calculating the
difference between the current frame and the background model that contains the clutter of
the radar signal[5].
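The preprocessing described above can be sketched as a two-dimensional FFT followed by background subtraction. Note that this sketch swaps the GMM background model of [5] for a much simpler running-average model, and the class name, the adaptation rate `alpha` and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np

def range_doppler_map(frame):
    """Magnitude RDM from one radar frame of shape (n_chirps, n_samples)."""
    range_fft = np.fft.fft(frame, axis=1)        # fast time -> range bins
    doppler_fft = np.fft.fft(range_fft, axis=0)  # slow time -> Doppler bins
    # Shift so that zero Doppler sits in the middle row.
    return np.abs(np.fft.fftshift(doppler_fft, axes=0))

class RunningBackground:
    """Running-average background model (a simple stand-in for the GMM)."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha  # hypothetical adaptation rate
        self.model = None

    def remove_clutter(self, rdm):
        if self.model is None:
            self.model = rdm.copy()
        foreground = np.clip(rdm - self.model, 0.0, None)
        self.model = (1.0 - self.alpha) * self.model + self.alpha * rdm
        return foreground

# Synthetic scene: a static reflector in range bin 5 and a moving hand in
# range bin 10 with a Doppler shift of +3 bins.
n_chirps, n_samples = 16, 32
chirp = np.arange(n_chirps)[:, None]
sample = np.arange(n_samples)[None, :]
static = np.exp(2j * np.pi * 5 * sample / n_samples) * np.ones((n_chirps, 1))
moving = np.exp(2j * np.pi * (10 * sample / n_samples + 3 * chirp / n_chirps))

background = RunningBackground()
for _ in range(10):  # learn the static scene first
    background.remove_clutter(range_doppler_map(static))
foreground = background.remove_clutter(range_doppler_map(static + moving))
peak = np.unravel_index(np.argmax(foreground), foreground.shape)
print(peak)  # (Doppler row, range column) of the strongest remaining return
```

After subtraction the static reflector cancels against the background model, so the strongest remaining cell is the moving hand. A GMM-based model would additionally track clutter whose strength fluctuates between several levels.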
Fig.3.3 Gesture recognition pipeline. (1) Data produced by the sensor when sliding index finger over
thumb. (2) Preprocessing and stacking of frames. (3) CNN. (4) RNN with per-frame predictions.
Designing CNN architectures is a complex task involving many hyperparameters,
such as the number of layers and neurons, activation functions and filter sizes. In
the experiments of [2], different CNN variants are reported. Most saliently, a network
adapted from computer vision is compared to a network designed specifically for
the Soli data.
Recurrent Neural Networks (RNNs) differ from feed-forward networks in that they contain
feedback loops, encoding contextual information of a temporal sequence. The output of the
CNN is given to the RNN, and the outputs of the RNN are fed to a softmax layer, providing
per-frame gesture probabilities. During training, standard RNNs may suffer from numerical
instabilities known as the vanishing or exploding gradient problem. To avoid this issue, LSTMs
use memory cells whose contents are stored, modified and accessed via special gates.
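The gating mechanism can be illustrated with a minimal NumPy sketch of one LSTM time step in its standard textbook form (input, forget and output gates plus a candidate update), followed by a softmax layer that produces per-frame gesture probabilities. The layer sizes, the random weights and the random vectors standing in for the CNN output are assumptions for illustration and do not reproduce the trained network of [2].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D), U: (4H, H), b: (4H,), with rows ordered as
    input gate, forget gate, output gate, candidate update.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate memory content
    c = f * c_prev + i * g         # updated memory cell
    h = o * np.tanh(c)             # hidden state passed to the next frame
    return h, c

rng = np.random.default_rng(0)
D, H, n_gestures, n_frames = 8, 4, 3, 5  # hypothetical sizes
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
V = rng.normal(scale=0.1, size=(n_gestures, H))  # projection to gesture scores

h, c = np.zeros(H), np.zeros(H)
per_frame_probs = []
for t in range(n_frames):
    cnn_features = rng.normal(size=D)  # stand-in for the CNN output of frame t
    h, c = lstm_step(cnn_features, h, c, W, U, b)
    per_frame_probs.append(softmax(V @ h))
per_frame_probs = np.array(per_frame_probs)
print(per_frame_probs.shape)  # (5, 3)
```

Because the memory cell is updated additively (`f * c_prev + i * g`) rather than by repeated matrix multiplication alone, gradients can flow across many frames without vanishing as quickly as in a plain RNN.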

Chapter 4
ADVANTAGES, DISADVANTAGES AND
APPLICATIONS
4.1 Advantages
1. Replaces all kinds of buttons and switches and makes devices operable remotely.
This feature provides the user with a virtual switch through which the devices can be
controlled without touching them.
2. Most existing motion sensing techniques are camera based.
Camera-based motion sensors are sensitive to light, which may reduce their accuracy,
whereas the Soli sensor is insensitive to noise and capable of detecting minute
motion.
3. Allows gadgets to be controlled with gestures.
The Soli sensor recognizes the gesture from the reflected complex wave, performs
transformation operations on the reflected signal and controls the electronic gadgets in
very little time.
4. Allows free hand typing.
Soli frees device interaction from a screen or similar surface, so that the human hand can
operate in the air.
4.2 Disadvantages
1. It has limited radar range.
The Soli sensor can sense hand gestures only up to a distance of 10 meters.
2. Multiple gestures cannot be recognized at the same time.
The Soli sensor is capable of recognizing only a single gesture at a time.
3. Highly expensive.
Soli is made up of a small millimeter-wave radar and many other components, individually
cheap, which together make the sensor expensive.

4.3 Applications
1. Medical
The implications of Project Soli for healthcare are very exciting. Incorporating Project
Soli type chips would allow users to control devices without touching them, reducing the
risk of spreading infections.
2. Gaming
Wireless interfacing with the gaming kit makes games more interesting. For example, in a
poker game, it could be used to recognize the player’s turn or automatically update the
game without player or dealer intervention.
3. Education
Soli can make learning more fun for children.
4. Smart homes or offices
Users can control gadgets without touching them.

Chapter 5
CONCLUSION AND FUTURE SCOPE
5.1 Conclusion
Soli is an interaction platform for a connected world. It proposes a new interaction
experience across multiple connected products of the future. It can be applied to wearables,
mobile devices, the Internet of Things, automotive and industrial products, and many more.
It offers a third dimension of interaction which complements and enhances other interaction
modalities.
5.2 Future scope

Soli technology is in a very early stage of development. There is very little prior art
available. An important area of future research on Soli is the human factors implications of
these new interaction modalities. There are also many exciting opportunities to discover and
develop novel interaction techniques, applications and use cases of this technology.

REFERENCES
[1] Jaime Lien, Nicholas Gillian, M. Emre Karagozler, Patrick Amihood, Carsten
Schwesig, Erik Olson, Hakim Raja and Ivan Poupyrev, “Soli: Ubiquitous Gesture
Sensing with Millimeter Wave Radar”, ACM Transactions on Graphics, Vol. 35, No.
4, Article 142, July 2016.
[2] Saiwen Wang, Jie Song, Jaime Lien, Ivan Poupyrev and Otmar Hilliges, “Interacting
with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency
Spectrum”, ACM Transactions, October 2016.
[3] Moeness G. Amin, Zhengxin Zeng and Tao Shan, “Hand Gesture Recognition based
on Radar Micro-Doppler Signature Envelopes”, International Graduate Exchange
Program of Beijing Institute of Technology.
https://www.researchgate.net/publication/329362210.
[4] Hui-Shyong Yeo and Aaron Quigley, “Radar Sensing in Human-Computer Interaction”,
interactions.acm.org, January–February 2018.
[5] Jae-Woo Choi, Si-Jung Ryu and Jonghwan Kim, “Short-Range Radar Based Real-Time
Hand Gesture Recognition Using LSTM Encoder”, Vol. 7, ISSN 2169-3536.
