Mobile and Ubiquitous Systems: Computing, Networking and Services. 18th EAI International Conference, MobiQuitous 2021, Virtual Event, November 8-11, 2021, Proceedings. Takahiro Hara
Lecture Notes of the Institute
for Computer Sciences, Social Informatics
and Telecommunications Engineering 419
Editors
Takahiro Hara, Osaka University, Osaka, Japan
Hirozumi Yamaguchi, Osaka University, Osaka, Japan
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2022
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
We are delighted to introduce the proceedings of the eighteenth edition of the European
Alliance for Innovation (EAI) International Conference on Mobile and Ubiquitous
Systems: Computing, Networking and Services (MobiQuitous 2021). Despite the
considerable research effort in the area of mobile and ubiquitous computing over nearly
two decades, and the maturity of some of its base technologies, many challenges
persist. The goal of this conference series is to provide a forum for presenting and
discussing high-quality research in the field, involving international practitioners and
researchers from diverse backgrounds. Areas addressed by MobiQuitous include sys-
tems, applications, social networks, middleware, networking, sensing, data manage-
ment, data processing, and services, all with special focus on mobile and ubiquitous
computing.
The technical program of MobiQuitous 2021 consisted of 26 full papers, selected
from 79 submitted papers, in oral presentation sessions at the main conference track.
The acceptance rate was 33%. Aside from the high-quality technical paper presentations,
the technical program also featured two keynote speeches, one industrial session,
one poster and demo session, and four technical workshops. The two keynote speeches
were given by Chieko Asakawa (IBM Fellow/Chief Executive Director of the National
Museum of Emerging Science and Innovation (Miraikan), Japan) and Claudio Bettini
(University of Milan, Italy). The industrial session included two invited speeches by
Chihiro Ono (KDDI Research, Inc., Japan) and Kota Tsubouchi (Yahoo! Japan Corporation,
Japan). The poster and demo session consisted of 11 presentations.
The four workshops organized were the First Workshop on Ubiquitous and
Multi-domain User Modeling (UMUM 2021), the First International Workshop on
Smart Society Technologies (IWSST 2021), the Fourth International Workshop on
Mobile Ubiquitous Systems, Infrastructures, Communications and AppLications
(MUSICAL 2021 Fall) and the Workshop on Innovative Technologies for the
Healthcare Empowerment (InnovTech4Health). The UMUM 2021 workshop aimed to
address how to build ubiquitous and multi-domain user models and increase their
applicability in real-world contexts. The IWSST 2021 workshop aimed to discuss
technologies and applications that solve relevant problems in modern society.
The MUSICAL 2021 Fall workshop aimed to discuss technologies related to future
mobile ubiquitous systems. The InnovTech4Health workshop focused on the devel-
opment of information and communication technologies for wellbeing.
Coordination with the steering chair, Imrich Chlamtac, was essential for the success
of the conference. We sincerely appreciate his constant support and guidance. It was
also a great pleasure to work with such an excellent organizing committee team, and we
thank its members for their hard work in organizing and supporting the conference.
In particular, we are grateful to the Technical Program Committee, who completed the
peer-review process for technical papers and helped to put together a high-quality
technical program. We are also grateful to the conference manager, Karolina Marcinova,
for her support, and to all the authors who submitted their papers to the
MobiQuitous 2021 conference and workshops.
We strongly believe that MobiQuitous provides a good forum for all researchers,
developers, and practitioners to discuss current and future research directions of mobile
and ubiquitous computing. We also expect that the future MobiQuitous conferences
will be as successful and stimulating as this year’s, as indicated by the contributions
presented in this volume.
Steering Committee
Imrich Chlamtac        University of Trento, Italy
Organizing Committee
General Chairs
Takahiro Hara          Osaka University, Japan
Hirozumi Yamaguchi     Osaka University, Japan
Local Chairs
Shigemi Ishida         Kyushu University, Japan
Tetsuya Shigeyasu      Prefectural University of Hiroshima, Japan
Workshops Chairs
Takuya Yoshihiro       Wakayama University, Japan
Janick Edinger         University of Mannheim, Germany
Md Osman Gani          Miami University, USA
Publications Chair
Akira Uchiyama         Osaka University, Japan
Web Chair
Hiroki Yoshikawa       Osaka University, Japan
Financial Chair
Akimitsu Kanzaki       Shimane University, Japan
MobiQuitous 2021
Is This IoT Device Likely to Be Secure? Risk Score Prediction for IoT Devices Using Gradient Boosting Machines
Carlos A. Rivera A., Arash Shaghaghi, David D. Nguyen, and Salil S. Kanhere
One-Shot Wayfinding Method for Blind People via OCR and Arrow Analysis with a 360-Degree Smartphone Camera
Yutaro Yamanaka, Seita Kayukawa, Hironobu Takagi, Yuichi Nagaoka, Yoshimune Hiratsuka, and Satoshi Kurihara
Air Handling Unit Explainability Using Contextual Importance and Utility
Avleen Malhi, Manik Madhikermi, Matti Huotari, and Kary Främling
Premises Based Smart Door Chains System Using IoT Cloud
Abdul Hannan, Faisal Hussain, Sehrish Munawar Cheema, and Ivan Miguel Pires
Short Papers
1 Introduction
2 Related Work
In this research, we tackle the problem of event detection from smartphone GPS
readings, and the extraction of informative event-relevant tweets that can help
people understand an event. We first describe in this section the existing, per-
tinent event detection research works. Afterwards, we describe existing research
on event-relevant tweet extraction.
an event. Thus, there is a limit to the types of events that can be captured. Our
aim is to detect events occurring in a city independent of domain, event type
(scheduled or unscheduled) and scale. Hence, less constraining data sources and
methods are needed.
Twitter, one of the most popular social networking platforms today, is a
promising source of data for detecting various types of events in varying scale.
Many studies on event detection have focused on the burst, i.e., drastic increase
in volume, of event-related keywords in the Twitter stream [5,10,11,25,32]. However,
a keyword burst is not always indicative of congestion. Tweets also
burst when, say, a well-known celebrity dies or a grave disaster occurs in another
country, which are instances where observers of the event do not necessarily con-
centrate physically in the same place. These methods therefore may not shed
light on whether an event is causing real-world congestion. Some event detection
studies have leveraged geotagged tweets, which include location data of where
the tweets were posted, to identify the location of an event where there is a
concentration of people [16,31,37,38]. However, only 0.9% of all tweets are geotagged
[7,23]. Moreover, the number of geotagged tweets has decreased since June 2019,
when geotagging options were removed [17]. Geotag-based methods
cannot detect events when no geotagged tweets are posted.
Smartphone GPS is a city-level and situation-agnostic (no information about
the situation is provided) data source that has also recently garnered attention.
Smartphone GPS data are often collected via applications installed with user
permission [34], and it is possible to capture human mobility as an increase or
decrease in population by counting the number of unique users per grid cell
in each timeslot. In parallel, anomaly detection based on
human mobility has been attracting the attention of the research community
[14,27,28]. For instance, Neumann et al. [28] proposed a method that uses a
day’s data of human mobility as a feature to classify whether the day is special
(e.g., December 24 or long weekends) or not by computing how much human
mobility on that day deviated from normal. However, a day’s worth of data
(24 h) is needed before classification can be achieved, which makes real-time
event detection difficult. The research works of Fuse et al. [14] and Mishima
et al. [27] are similar in that they detect anomalies emerging from grid cells in
real time. Fuse et al. [14] classified whether an anomaly occurred or not using a
sticky hierarchical Dirichlet process - hidden Markov model (sHDP-HMM). Fuse
et al.’s approach learns the latent state and hyperparameters of the training
data that represent the normal state, and the sHDP-HMM infers the latent
state of the test data with the learned hyperparameters. It compares the normal
state and the latent state of the test data, and detects an anomaly if they are
unequal. Mishima et al. [27] computed an anomaly score that indicates the
difference from normal human mobility volume per grid cell, where this normalcy
is inferred using past human mobility data. An anomaly is detected when the
anomaly score exceeds a predefined threshold value. Both approaches,
although promising for detecting events that cause
congestion, were tested only on train accidents, typhoons, and New
Event Detection and Event-Relevant Tweet Extraction with Human Mobility 7
Year’s holidays, so they have not been sufficiently verified with respect to event type
and scale. In addition, Fuse et al.’s approach requires manually defining one
day of normal human mobility as training data per grid cell, whereas Mishima
et al.’s approach automatically estimates normal human mobility from multiple
days. To avoid overfitting to the training data and to automatically estimate normal
human mobility for a large number of grid cells, we applied Mishima et al.’s
approach to event detection for a variety of event types and scales, and verified
its limitations.
Note that since GPS-based event detection does not provide information
to people on what kind of event is occurring, we need to extract such missing
information from event-relevant tweets (see below).
3 Proposed Method
We now describe our approach for detecting events and extracting event-relevant
tweets. Figure 1 shows the framework of our approach. First, our method detects
an event in a grid cell, which indicates a higher than normal concentration of
8 N. Takeda et al.
human mobility in that location, by calculating from GPS data the number
of unique users in a grid cell. Next, our method extracts event-relevant tweets
useful for understanding what is occurring in the event using the POI names
associated to grid cell where the event is detected. To extract event-relevant
tweets regardless of event types and scales, event-related topics that correlate
with human mobility are selected and three scores are computed for each tweet
considering the tweet’s meaning and the posting time. We detail these below.
We adopted the approach of Mishima and colleagues [27] for event detection. We
use smartphone GPS data consisting of a timestamp, latitude, and longitude
to compute human mobility. The basic idea is to infer the normal human
mobility per grid cell from past human mobility data, and then to compute
an anomaly score based on how far the current human mobility
deviates from the inferred normal human mobility. Our method detects an event
in a timeslot (i.e., a unit of time over which the population in a grid cell is
aggregated using GPS) when the anomaly score exceeds a predefined threshold
value.
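As an illustration of the aggregation step, the following is a minimal Python sketch of counting unique users per grid cell and timeslot. It is not the authors' implementation: the `user_id` field, the flat-earth grid mapping, the reference latitude, and all function names are assumptions made for this example; only the 15-minute timeslot and the 250 m cell size come from the example settings given in the text.

```python
import math
from collections import defaultdict
from datetime import datetime

SLOT_MINUTES = 15                # timeslot length from the paper's example
CELL_METERS = 250.0              # grid cell edge length from the paper's example
METERS_PER_DEG_LAT = 111_320.0   # rough approximation (assumption)


def grid_cell(lat, lon, ref_lat=35.0):
    """Map a coordinate to an integer (row, col) cell index.

    Uses a flat-earth approximation around ref_lat (an assumption);
    a production system would use a proper projected mesh.
    """
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * METERS_PER_DEG_LAT / CELL_METERS)
    col = math.floor(lon * meters_per_deg_lon / CELL_METERS)
    return row, col


def timeslot(ts):
    """Index of the 15-minute slot within the day (0..95)."""
    return (ts.hour * 60 + ts.minute) // SLOT_MINUTES


def unique_users_per_cell(records):
    """Count distinct users per (cell, date, timeslot).

    records: iterable of (user_id, datetime, lat, lon) tuples, where
    user_id is a hypothetical anonymized identifier.
    """
    users = defaultdict(set)
    for user_id, ts, lat, lon in records:
        users[(grid_cell(lat, lon), ts.date(), timeslot(ts))].add(user_id)
    return {key: len(ids) for key, ids in users.items()}
```

Repeated readings from the same user within a timeslot are counted once, which matches the definition of human mobility as the number of unique users.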
The method of Mishima et al. [27] infers normal human mobility for each
grid cell and each day-group, i.e., the characteristics of each day (e.g., weekday,
weekend/holiday, or first day of a consecutive holiday), because human mobility
trends differ between weekdays and weekends. First, the method partitions
the smartphone GPS data by grid cell and day-group according to latitude,
longitude, and timestamp, and computes human mobility transitions, where
human mobility is the number of unique users in a grid cell. Hereafter, event
detection is described for a single grid cell $c$; the same process is applied to
all grid cells. The human mobility dataset $V$ holds the number of unique users for
each day-group and each timeslot, and is represented as $V \in \mathbb{R}^{N_g, N_t}$, where
where $\mu_i$ is the mean vector of human mobility in cluster $cl_{g,i}$. The normal
human mobility is defined as the mean and standard deviation of $cl_{g,i^*}$, i.e.,
$\mu_{g,i^*} = \{\mu_{g,i^*,t} \mid t = 1, 2, \ldots, N_t\}$, $\sigma_{g,i^*} = \{\sigma_{g,i^*,t} \mid t = 1, 2, \ldots, N_t\}$. From here
we derive the anomaly score $A_{g,t}$ for current state of human mobility $v_{g,t}$ as the
z-score, that is,
$$A_{g,t} = \frac{v_{g,t} - \mu_{g,i^*,t}}{\sigma_{g,i^*,t}}. \tag{2}$$
The following then becomes the function for detecting an event:
$$\mathrm{IsEvent}(A_{g,t}) = \begin{cases} \text{true}, & \text{if } A_{g,t} > \phi \\ \text{false}, & \text{otherwise,} \end{cases} \tag{3}$$
where φ is the predefined threshold value. As an example, with the process above,
event detection is performed every 15 min across the 250 m × 250 m grid cells
throughout the city.
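The detection rule in Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the normal mean and standard deviation per timeslot are taken as given inputs (the cluster selection that produces them is not reproduced here), the function names are hypothetical, and the handling of a zero standard deviation is our own choice rather than something the text specifies.

```python
def anomaly_score(v, mu, sigma):
    """Z-score of the current unique-user count against the normal profile."""
    if sigma == 0:
        # Assumption: a flat normal profile carries no scale information,
        # so report no anomaly rather than divide by zero.
        return 0.0
    return (v - mu) / sigma


def is_event(score, phi=3.0):
    """Eq. (3): flag an event only when the score strictly exceeds phi."""
    return score > phi


def detect_events(current, normal_mean, normal_std, phi=3.0):
    """Return the timeslot indices whose anomaly score exceeds phi."""
    return [
        t
        for t, (v, mu, sd) in enumerate(zip(current, normal_mean, normal_std))
        if is_event(anomaly_score(v, mu, sd), phi)
    ]
```

Because the comparison in Eq. (3) is strict, a timeslot whose score equals the threshold exactly is not reported as an event.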
represent the grid cell-POI database as $P$, where $P_c$ denotes the multiple POI
names contained in grid cell $c$. When an anomaly is detected in grid cell $c$ at
timeslot $t_e$, tweets that include POI names from that cell are extracted, i.e.,
$TW_c = \{tw_t \mid p_c \in P_c, 1 \le t \le t_e\}$, where $tw_t$ denotes the tweets posted at
timeslot $t$. Only tweets posted up to the event detection time are used.
Note, however, that even the tweets that include the POI names
may contain a considerable number of tweets that are not event-relevant.
Hereafter, we omit the subscript $c$ and describe the event-relevant tweet
extraction for a single grid cell $c$.
Next, our method clusters $TW$ based on semantics, i.e., on feature vectors, to
select the cluster that has the highest relevance to the event, i.e., the
event-related topic. Each tweet is vectorized with a pretrained BERT (Bidirectional
Encoder Representations from Transformers) model [12] to learn the
representation of semantic distance between tweets. BERT is a Transformer-based
language representation model that can vectorize sentences in a context-aware
manner. BERT encodes $TW$ into feature vectors $F = \{f_i \mid i = 1, 2, \ldots, N\}$,
where $f_i$ is the average feature vector of all words included in a tweet and
represents the meaning of $tw_i$, i.e., $f_i = \frac{1}{M} \sum_{j=1}^{M} \mathrm{BERT}(w_j \in W)$, where
$W = \{w_j \mid w_j \in tw_i, j = 1, 2, \ldots, M\}$ and $M$ is the number of words in $tw_i$.
Afterwards, we apply k-means clustering to separate the tweets into topics,
towards extracting an event-related topic. Tweet topics $TP$ are extracted by
k-means using $TP = \{tp_i \mid i = 1, 2, \ldots, L\} = \mathrm{kmeans}(F, L)$. There are efforts to
vectorize documents (or words) with BERT and cluster similar meanings with
k-means, and the effectiveness of these efforts has been verified [19,26]. However,
methods for automatically determining the number of clusters have not been
sufficiently explored. Thus, in our study, to discover the optimal number of
clusters $L^*$, we increment $L$ in sequence and evaluate the clustering result for each
$L$ based on the correlation of time series changes between human mobility and
topic transitions. A topic transition is a time series of the number of tweets in
each topic for each timeslot (Fig. 2(a)). The topic transition for a topic $tp_i$ is
represented as $TV_i = \{|tw_t| \mid tw_t \in TW, 1 \le t \le t_e\}$. Considering that topics
appearing in tweets are mostly independent, only event-related topics are expected to be
highly correlated with human mobility (Fig. 2(b)). When the human mobility data
is $V = \{v_t \in V \mid 1 \le t \le t_e\}$ and the event-related topic is $tp_o \in TP$, our
assumption is $TV_o \propto V$. We compute the correlation and independence score
(CI) for each $L$. $CI_L$ is computed using the following:
$$CI_L = \frac{\max_{1 \le i \le L} \dfrac{\mathrm{cov}(TV_i, V)}{\sigma(TV_i)\,\sigma(V)}}{\max_{i \ne j} \dfrac{\mathrm{cov}(TV_i, TV_j)}{\sigma(TV_i)\,\sigma(TV_j)}}, \tag{4}$$
$$L^* = \arg\max_{L} CI_L, \tag{5}$$
Fig. 2. Topic transitions and human mobility in a grid cell with an anomaly caused
at 18:00. Topic 3 is selected as event-related topic because of its high correlation with
human mobility.
where the denominator indicates the highest correlation value between topics (a low
value indicates that the topics are independent
of each other), and the numerator indicates the highest correlation value between
human mobility transition and each topic transition (a high value indicates that
the topic correlated with human mobility is included). Thus, if $CI_L$ is high,
the clustering result indicates that the topics are independent of each other and
contain topics that are highly correlated with human mobility transition.
Ultimately, we adopt the value of $L$ that maximizes $CI_L$ as the optimal value $L^*$.
We select the topic $tp_o$ that has the maximum correlation with human mobility
as the event-related topic containing the event-relevant tweets when the number
of clusters is $L^*$. Figure 2(a) shows an example of extracted topics, with $L^* = 4$,
from a grid cell with a detected event. The correlations between human mobility
(Fig. 2(b)) and Topics 0 to 3 were 0.64, 0.37, 0.77, and 0.86, respectively. Thus,
our approach selected Topic 3 (red line in Fig. 2(a)) as the event-related topic $tp_o$.
Although there is a burst of tweets in Topic 0 (blue line in Fig. 2(a)) at 12:30 due
to an announcement of a future event, our approach successfully identified it as
a topic that was not related to the event. Our method can therefore eliminate
such noisy topics.
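The selection of the number of clusters and of the event-related topic can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: it assumes the per-topic time series have already been produced by clustering for each candidate L, the function names are hypothetical, and the small `eps` floor on the denominator is our own guard for the case where no two topics are positively correlated, which the text does not discuss.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sa, sb = a.std(), b.std()
    if sa == 0 or sb == 0:
        return 0.0
    return float(((a - a.mean()) * (b - b.mean())).mean() / (sa * sb))


def ci_score(topic_series, mobility, eps=1e-9):
    """Ratio of the best topic-mobility correlation to the highest
    topic-topic correlation, following the structure of Eq. (4).

    topic_series: list of L per-timeslot tweet counts (L >= 2).
    Returns (CI_L, index of the topic most correlated with mobility).
    """
    to_mobility = [pearson(tv, mobility) for tv in topic_series]
    between = max(
        pearson(topic_series[i], topic_series[j])
        for i in range(len(topic_series))
        for j in range(i + 1, len(topic_series))
    )
    best = int(np.argmax(to_mobility))
    # Assumption: floor the denominator so the ratio stays finite.
    return to_mobility[best] / max(between, eps), best


def select_clustering(candidates, mobility):
    """candidates: {L: topic_series for that L}. Returns (L*, topic index)."""
    scored = {L: ci_score(series, mobility) for L, series in candidates.items()}
    best_L = max(scored, key=lambda L: scored[L][0])
    return best_L, scored[best_L][1]
```

A clustering that splits one coherent topic into two near-duplicate series raises the denominator toward 1 and is therefore penalized, which is the behavior the score is designed to have.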
Finally, we scored each tweet to extract those that are especially relevant
to the event from the plethora of tweets that belong to tpo . The top scoring
tweets allow the user to understand what is occurring in the grid cell where the
event is detected. To extract the event-relevant tweets, we focus not only on
the meaning of the tweets, but also tweet posting time because such tweets are
frequently posted at relevant times, such as during or just before the event. We
assume that the time when event-relevant tweets are likely to be posted depends on
the event type. For example, during a festival, tweets from people enjoying the
festival are posted while the festival is ongoing, i.e., when the human mobility
volume is high. On the other hand, at a live concert, people do not tweet
while watching, so many tweets are instead posted just before the concert starts,
i.e., when event-related topics are frequently posted (they are posted after the
event as well, but we need to extract tweets at the event detection time, such
as before the event starts or during the event). Thus, to extract event-relevant
tweets, we defined three different weighting schemes based on three hypotheses.
I. Event-relevant tweets are located close in the feature space to the average
feature vector of tweets belonging to the event-related topic.
II. Event-relevant tweets are more likely to be posted during times when there
are more event participants.
III. Event-relevant tweets are more likely to be posted during the times when
the event-related topics appear more frequently.
We start by defining the D-Score based on H-I (H, henceforth, stands for
hypothesis). We assume that the tweets that are representative of a topic are
distributed close to the average feature vector of tweets in the event-related topic,
inspired by an existing method for extracting representative phrases from documents [6]. We
obtained the topic vector by averaging the vectors in the event-related topic, i.e.,
$t\text{-}vec = \mu_{tp_o}$. The D-Score reflects how close each tweet is to the t-vec in
the feature space: a tweet with a higher D-Score is semantically more
similar to the t-vec. The D-Score of a tweet $tw \in tp_o$ is computed using the
following equation, normalized as a z-score:
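$$D\text{-}Score_{tw} = \frac{\cos(tw, t\text{-}vec) - \frac{1}{N} \sum_{i}^{N} \cos(tw_i, t\text{-}vec)}{\sqrt{\frac{1}{N} \sum_{i}^{N} \left( \cos(tw_i, t\text{-}vec) - \frac{1}{N} \sum_{j}^{N} \cos(tw_j, t\text{-}vec) \right)^2}}, \tag{6}$$
where $\cos(tw, t\text{-}vec)$ is the cosine similarity between $tw$ and t-vec, $N$ is the
number of tweets belonging to $tp_o$, i.e., $N = |tp_o|$.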
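The D-Score in Eq. (6) can be sketched with NumPy as below. This is a minimal illustration, assuming the tweet feature vectors are already available as non-zero rows of an array (the BERT encoding step is not reproduced); the function name and the handling of the degenerate case where all similarities are equal are our own choices.

```python
import numpy as np


def d_scores(tweet_vectors):
    """Z-normalized cosine similarity of each tweet to the topic vector.

    tweet_vectors: array of shape (N, dim), one non-zero row per tweet
    in the event-related topic.
    """
    X = np.asarray(tweet_vectors, dtype=float)
    t_vec = X.mean(axis=0)                       # topic vector (mean of tweets)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(t_vec)
    cos = (X @ t_vec) / norms                    # cos(tw_i, t-vec) per tweet
    std = cos.std()                              # population std, as in Eq. (6)
    if std == 0:
        return np.zeros_like(cos)                # assumption: all tweets tie
    return (cos - cos.mean()) / std
```

Tweets whose vectors point away from the bulk of the topic receive negative scores, so sorting by this score in descending order surfaces the most representative tweets first.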
Second, we defined the HV-Score based on H-II. Even the tweets that belong to
an event-related topic may include event-irrelevant tweets. Thus,
we consider that during high human mobility, event participants and people
caught in the crowds post numerous event-relevant tweets, and assume that a
tweet posted when the human mobility volume is high is more
likely to be an event-relevant tweet. We therefore weight
tweets based on human mobility. The HV-Score indicates the relative human
mobility transition per timeslot, and is computed for the posting timeslot $t$
as
$$HV\text{-}Score_t = \frac{v_t - \mu_V}{\sigma_V}, \tag{7}$$
where μV and σV are the mean and standard deviation, respectively, of the
human mobility transition.
Third, we defined the P T -Score based on H-III. Our method weights each
tweet based on the number of tweets on the event-related topic. For events such
as live concerts and stage performances, the number of tweets decreases during
the event because people participating in the event refrain from posting tweets.
Instead, users post numerous tweets about their expectations of participating in
the event just before the event starts. Thus, for such events, weight should be
given to tweets posted when there are many posts about an event-related topic.
The PT-Score indicates the number of posted tweets regarding the event-related
topic in each timeslot, and is computed in the posting timeslot $t$ as
$$PT\text{-}Score_t = \frac{|tw_t| - \mu_{TV_o}}{\sigma_{TV_o}}, \tag{8}$$
where $\mu_{TV_o}$ and $\sigma_{TV_o}$ are the mean and standard deviation, respectively, of the
number of posted tweets about the event-related topics.
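The two time-based scores in Eqs. (7) and (8) are both z-scores of a per-timeslot series, looked up at each tweet's posting timeslot. The following NumPy sketch illustrates this; the function names, the 0-based timeslot indexing, and the zero-variance fallback are assumptions made for the example.

```python
import numpy as np


def zscore_series(series):
    """Z-normalize a per-timeslot series (population standard deviation)."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)   # assumption: a flat series gives no weight
    return (x - x.mean()) / std


def time_scores(mobility, topic_counts, post_slots):
    """Look up HV-Score and PT-Score for each tweet's posting timeslot.

    mobility:     unique-user counts v_t for the timeslots up to t_e
    topic_counts: tweet counts |tw_t| of the event-related topic per timeslot
    post_slots:   0-based posting timeslot index of each tweet
    """
    hv = zscore_series(mobility)        # Eq. (7)
    pt = zscore_series(topic_counts)    # Eq. (8)
    idx = np.asarray(post_slots, dtype=int)
    return hv[idx], pt[idx]
```

In the usage below, a tweet posted during the mobility peak gets the higher HV-Score, while a tweet posted during the topic burst gets the higher PT-Score, mirroring the festival and concert cases described in the text.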
Note that we normalized each score above using z-score so that these differ-
ent scores can be added together. In the succeeding section, we show how we
combined these three scores to come up with four different scoring schemes and
then compare how they contribute to our method’s performance.
4 Experiments
We evaluated our method in two experiments. We first describe here in detail
the datasets we used, and then describe our experiment set-up to evaluate the
detection capability of our method compared with a geotag-based method, given both
scheduled and unscheduled events at different scales (Experiment-I). Finally,
we describe our event-relevant tweet extraction experiment to evaluate whether
users can understand what is occurring in the event (i.e., the cause and time of
the event) by referring to the extracted tweets (Experiment-II).
4.1 Datasets
We created datasets for each experiment¹ because there are no open datasets
with human mobility data attached to a tweet dataset. Our target events include
nine scheduled events and five unscheduled events as shown in Table 1. We
selected events that can elicit behavioral responses from people, such as those
that affect urban traffic flow and prompt users to consider avoiding traffic
congestion (e.g., due to a baseball game or train delay) or that draw people in to
participate (e.g., festival, live concert). We also selected events with
hundreds to hundreds of thousands of participants in order to examine
the effect of differences in scale. For the human mobility data, we aggregated
the smartphone GPS data of users who agreed to provide their location data,
which were collected by an application made by a Japanese mobile carrier. The
minimum time interval for each GPS data instance was two minutes, with the
sampling rate depending on the smartphone’s model and signal conditions. We
utilized GPS readings that were collected between May 1, 2019 and Sept. 30,
2020 from several million people. The human mobility data were computed every
15 min (i.e., Nt = 96) in 250 m × 250 m grid cells in Japan’s Tokyo and Aichi
prefectures.
¹ We cannot disclose the number of tweets and the number of people in each experiment due to the agreed terms of use.
Table 1. Details of the target events. If the number of participants in an event was not
officially announced (e.g., unscheduled events), the maximum capacity of the venue is
provided instead. The times of the events are the officially announced times of their
occurrence.
Event type    ID  POI name                                      Event name                                 Event date and time       No. of participants
Scheduled     A   Meiji Jingu Stadium (large-scale stadium)     Baseball game                              July 29, 2020, 18:00~     4,982
events        B   TOHO CINEMAS Ikebukuro (movie theatre)        Opening a branch                           July 03, 2020, 9:00~      Up to 1,735
              C   Shinjuku BLAZE (small concert venue)          Live concert                               July 11, 2020, 12:30~     Up to 800
              D   Shinjuku LOFT (small concert venue)           Live concert                               Sep. 27, 2020, 17:15~     Up to 550
              E   Shibuya Eggman (small concert venue)          Live concert                               Sep. 09, 2020, 19:45~     Up to 350
              F   Tokyo Dome (large-scale stadium)              Baseball game                              July 26, 2019, 18:00~     45,817
              G   Tokyo Big Sight (convention center)           Sales exhibition                           July 14, 2019, 11:00~     Approx. 48,000 (a)
              H   Nagoya Dome (large-scale stadium)             Live concert                               May 26, 2019, 16:00~      49,692
              I   Port of Nagoya                                Fireworks festival                         July 15, 2019, 19:00~     Approx. 360,000 (total for one day)
Unscheduled   J   Ikebukuro Station                             Train delay                                June 29, 2020, 08:30~     Approx. 560,000 (total for one day)
events        K   Sugamo Station                                Unannounced street speech by a politician  July 03, 2020, 12:00~     Approx. 76,000 (total for one day)
              L   Tokyo International Airport                   Heavy congestion                           Sep. 19, 2020 (b)         Approx. 200,000 (total for one day)
              M   Shibuya Station                               Train delay                                June 29, 2020, 08:30~     Approx. 360,000 (total for one day)
              N   Oizumi-gakuen Station                         Train delay                                Sep. 03, 2020, 17:40~     Approx. 87,000 (total for one day)
(a) Calculated as the average number of participants per event, based on the number of participants and the number of events per year.
(b) The time is not described because it is not clear what time the congestion occurred.
As for the tweet datasets, we used both geotagged and non-geotagged tweets
containing POI names. The tweet datasets were sampled from 10% of all tweets
posted within Japan. In Exp-I, we used geotagged tweets as baseline by aggre-
gating the time series of hourly geotagged tweet volumes on a per grid-cell basis
in the month in which the target event occurred. For Exp-II, we used tweets
containing the grid cell-associated POI names. We created the grid cell-POI
database by processing an open dataset² that contains the latitude and longitude
of major and local POIs within Japan. BERT for vectorizing each tweet
is pretrained on Wikipedia³, and this model is often used to vectorize Japanese
tweets [2,36].
² https://nlftp.mlit.go.jp/ksj/index.html.
³ https://github.com/cl-tohoku/bert-japanese.
Our method and comparative methods were evaluated on whether each target
event was detected in the grid cell where it occurred. The event-detection time
was also evaluated because if the target event could be anticipated early or
detected quickly, relevant information could be delivered before people are caught
in congestion or before the event is over.
We defined the number of day groups $N_g = 8$ (i.e., 2 × 2 × 2) according
to whether the day, the previous day, and the next day are each a weekday or a
weekend/holiday, considering that human mobility varies depending on the day of the
week as well as the type of the previous or succeeding day (e.g., although both are
weekdays, Wednesdays and Fridays should be in different day groups). We
computed normal human mobility using data from the two months
before a target event occurred. We performed k-means clustering to compute
the normal human mobility in each grid cell, searched for the parameters that
would optimally detect the correct events, and set the number of clusters $K = 2$
and the threshold $\phi = 3$. We explain below the technical details.
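The 2 × 2 × 2 day-group definition can be illustrated with a short Python sketch. The text only states that the eight groups follow from the day type of the previous, current, and next day; the integer encoding, the function names, and the caller-supplied `holidays` set below are assumptions made for this example.

```python
from datetime import date, timedelta


def is_day_off(d, holidays):
    """True for Saturdays, Sundays, and dates in the supplied holiday set."""
    return d.weekday() >= 5 or d in holidays


def day_group(d, holidays=frozenset()):
    """Return an index in 0..7 encoding (previous, current, next) day types."""
    one = timedelta(days=1)
    flags = (
        is_day_off(d - one, holidays),
        is_day_off(d, holidays),
        is_day_off(d + one, holidays),
    )
    # Pack the three booleans into a 3-bit integer: 2 x 2 x 2 = 8 groups.
    return flags[0] << 2 | flags[1] << 1 | flags[2]
```

With this encoding a Wednesday flanked by weekdays and a Friday followed by a Saturday fall into different groups, which is the distinction the text motivates.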
The most common conventional event detection approach is a geotag-based
method. However, it will not be able to accurately detect events because the
number of geotagged tweets is very small now (e.g., only 17 geotagged tweets
were posted throughout the day in event L). We benchmarked two comparative
methods (henceforth, CM) using the SR approach [29], a state-of-the-art
unsupervised anomaly detection method. We denote as CM-1 the SR-based
approach that uses the transition of the number of geotagged tweets per hour,
as CM-2 the SR-based approach that uses human mobility data, and our
proposed method as PM. The SR approach can detect anomalies using a saliency
map even if similar patterns have not appeared in the past. For both human
mobility data and changes in tweet volume, the shape of the time series differs
greatly depending on the event type, e.g., the amount of data increases rapidly
in the case of train accidents, but on the other hand, gradually increases before
the start of the event in the case of concert events. The SR method is suitable
as a comparison method because it has been verified to be robust to various
shapes of time series data [29]. As SR hyperparameters, the threshold τ is set
to 3, the number of estimated points ρ is set to 5, and the sliding window size ω
is set to 30, based on a search for the best parameters for event
detection. Note that non-geotagged tweets cannot be used for Exp-I because their
locations cannot be identified at the resolution of small grid cells.
Table 2 shows the results of Exp-I. PM detected 9 out of 10 events, in contrast
with CM-1, which detected only 3 events, and CM-2, which detected just 4 events. PM detected
events B, D, J, K, M, and N that were not detected by CM-1. In these events,
no geotagged tweets were posted around the time of the event. CM-1 detected
events A, C, and L, but only a very small number of geotagged tweets were
posted at that time (up to three tweets for event L), so if a few geotagged tweets
are posted in a grid cell, the event may be falsely detected. Also, PM detected
events B, C, D, K, and M that were not detected by CM-2. We confirmed that
human mobility in these events was different from the normal human mobility,
16 N. Takeda et al.
although the magnitude of the peaks themselves was not large. We consider
this to be the reason why PM could detect these events.
Table 2. Results of Exp-I. The number in parentheses indicates the difference between
event detection time and event occurrence time.
Take, for instance, the two events in Fig. 3, which compares human mobility
on normal days and on the event day for events A (baseball game) and K
(unannounced street speech). CM-1 and CM-2 detected event A, which shows a
significant increase in human mobility (Fig. 3(a)), but did not detect event
K, which shows a relatively small increase (Fig. 3(b)). PM was able to detect
even small increases in human mobility because it looks at the difference from
the normal human mobility transitions (the difference between the blue and
orange lines in Fig. 3(b)), whereas CM-2 could not because it only considers
the current transitions in human mobility (the orange line in Fig. 3(b)).
Setting the threshold τ of CM-2 to a smaller value might reveal event K, but it
would also produce many false positives. Although PM was able to detect 9 out
of 10 events of different scales (ranging from 550 to 560,000 participants), a
small-scale live concert with up to 350 participants (i.e., event E) was not
detected. This result is due to the large number of people outside the venue,
albeit within the grid
Event Detection and Event-Relevant Tweet Extraction with Human Mobility 17
cell, as well as the relatively low impact of this small event (i.e., the number
of participants in the event is very small compared to v_{g,t,c}). Such
relatively small-scale events might be detected by making the grid cell
adjustable, e.g., by defining rectangular polygon data for each concert venue
individually.
Fig. 3. Comparison of human mobility between normal days and the day-event. The
highlighted blocks indicate the duration of the event.
PM detected events on average 22 min earlier than CM-1 (for events A, C, and
L) and 77.5 min earlier than CM-2 (for events A, J, L, and N). For the scheduled
events, we assumed that people gradually gathered ahead of the start time of
the event, which caused congestion even before the event started. In Fig. 3(a),
we confirmed that people indeed gathered gradually from approximately 16:00,
two hours before the event started at 18:00. PM detected the event at 16:45
(75 min earlier), while CM-1 detected it at 18:00 (±0 min), and CM-2 detected
it at 19:00 (60 min later). Thus, PM detected these events even before they
started. Further, we considered that, for the unscheduled events J, M, N
(train delays), and K (unannounced street speech), people gathered in the grid
cell immediately after the event started, resulting in a sudden concentration
of people in a short period of time compared to scheduled events. In Fig. 3(b),
we can see that people gathered between 12:00 and 13:00 for event K, and PM was
able to detect this event in real time while congestion was occurring within
that short duration. The unscheduled event L (heavy congestion at an airport)
does not have a clear event occurrence time, but PM detected congestion at 6:00
in the morning, 15 min earlier than CM-1 and 120 min earlier than CM-2. CM-1 and
CM-2 did not detect the events until people had clearly formed a crowd, since
they detect events characterized by a significant increase in human mobility.
These results suggest that PM can deliver information before the congestion
caused by an event ends (i.e., with a delay of at most 30 min, and as early as
75 min before the event starts), across different event types and scales.
extracted tweets are effective for understanding the target events. We compared
four scoring strategies, each using a different combination of Eqs. (6)–(8).
Tweet extraction based solely on the D-Score is denoted as M-1, D-Score +
HV-Score as M-2, D-Score + PT-Score as M-3, and D-Score + HV-Score + PT-Score
as M-4. The optimal number of clusters L∗ for each event is automatically
determined by our method (see Sect. 3.2). We compared these four methods with
two baselines:
– B-1: The most intuitive and simple method, which assumes that tweets posted
closer to the event-detection time are more likely to be event-relevant. We
extracted tweets that included POI names and ordered them by how close their
posting time was to the time at which the event was detected.
– B-2: To verify the effectiveness of our method in selecting the event-related
topic, we extracted tweets that included POI names and ordered them based on
their distance to the centroid of all the tweets in the feature space without
topic clustering, i.e., the score of a tweet tw is computed as B-2-Score_tw =
cos(tw, μ_TW). This method is similar to techniques used in existing document
summarization tasks [6].
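The B-2 baseline can be sketched in a few lines of Python. The sketch assumes that each tweet has already been mapped to a fixed-length embedding vector (the toy 2-D vectors below are made up) and that μ_TW is the plain arithmetic mean of those vectors; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of the tweet embeddings (assumed definition of mu_TW)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def b2_rank(embeddings, top_n=10):
    """Return (tweet index, score) pairs sorted by cos(tw, mu_TW), best first."""
    mu = centroid(embeddings)
    scores = [cosine(e, mu) for e in embeddings]
    order = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_n]]


# Toy embeddings: tweets 0 and 1 share a topic, tweet 2 is an outlier.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(b2_rank(embeddings))
```

Because the centroid is dominated by the majority topic, this ranking favours whatever most tweets are about, whether or not that topic is the ongoing event.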
Using an existing study [24] as our reference, we manually evaluated the
relevance of the extracted tweets to the event in question. Each method scored
the tweets’ relevance to the event, and the 10 tweets with the highest scores
were extracted as event-relevant tweets. Three annotators graded each tweet
(420 tweets in total, since there are 6 methods and 7 events) on a 3-point
scale. A grade of 2 means that the annotator can identify what event is
occurring (i.e., the cause and time of the event) by referring to the tweet. A
grade of 1 means that the annotator can identify the event, but other
interpretations are also possible. A grade of 0 means that the annotator cannot
identify the event just by referring to the tweet. Each method was evaluated by
the percentage of tweets that were given a grade of 2 by at least two
annotators. Lastly, we computed Fleiss’ κ to measure the inter-rater agreement
[13]. κ was 0.976 (almost perfect agreement) over all tweet grades provided by
our three annotators, which suggests that the tweet grades are highly reliable.
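The two quantities used in this evaluation can be computed as in the following minimal Python sketch: the share of tweets graded 2 by at least two annotators, and Fleiss' κ over the three grades. The toy ratings are invented for illustration and are not the study's annotations; the function names are hypothetical.

```python
def relevant_ratio(ratings, grade=2, min_votes=2):
    """Fraction of items that received `grade` from at least `min_votes` raters."""
    hits = sum(1 for item in ratings if item.count(grade) >= min_votes)
    return hits / len(ratings)


def fleiss_kappa(ratings, categories=(0, 1, 2)):
    """Fleiss' kappa for a list of items, each a list of grades (same rater count)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: how many raters assigned category j to item i
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (p_bar - p_e) / (1.0 - p_e)


# Toy grades from three annotators for four tweets.
ratings = [[2, 2, 2], [2, 2, 1], [0, 0, 0], [1, 1, 1]]
print(relevant_ratio(ratings))
print(round(fleiss_kappa(ratings), 3))
```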
Table 3 shows the evaluation results, which demonstrate that M-4 performs
well on these datasets. In 5 out of 7 events, all 10 extracted tweets are
event-relevant. In particular, the M-4 score for the baseball game at the Tokyo
Dome (event F) is a significant improvement over that of B-2 (0.70 and 0.20,
respectively). B-1 and B-2 extracted tweets on the wrong topic when several
events were associated with the same place. In event F, for instance, a popular
singer announced via Twitter, during a baseball game, a future live concert at
the Tokyo Dome, and many Twitter users simultaneously posted tweets about that
future event. Consequently, B-1 and B-2 incorrectly extracted the tweets about
the upcoming live concert instead of the tweets about the ongoing baseball
game, because they were not able to select the event-related topic. However,
M-4 also extracted a few tweets that were not relevant to the event,
specifically, 5 of the 70 extracted tweets, all at events F and L. These tweets
coincidentally slipped into the
relevant topics at the time the event occurred. For instance, the irrelevant
tweets at the Tokyo Dome (event F) were about baseball video games and about
baseball games at the Tokyo Dome on the following day. It is difficult to
filter these tweets because they are also about baseball. To eliminate such
tweets, setting appropriate stop words (e.g., “video”) or keeping only current
tweets by considering the tense of the tweet content would be effective.
Further, our method scored 0.80 on the heavy congestion at Tokyo International
Airport (event L), while B-2 scored 0.90 (however, one of the two irrelevant
tweets was given a grade of 2 by one annotator and 1 by two annotators). Six of
the 10 tweets extracted using B-2 were short statements of users’ thoughts
citing the same breaking news about the event (i.e., airport congestion). We
found that when many similar event-relevant tweets are posted at the same time
and no other topics are posted (i.e., there is little diversity in the
semantics of the tweets), they can be correctly extracted regardless of the
topic. This suggests that clustering can be skipped when the tweets are not
widely distributed in the feature space, because such a distribution indicates
a lack of semantic diversity among the tweets.
Event type           ID  POI name                     B-1   B-2   M-1   M-2   M-3   M-4
Scheduled events     F   Tokyo Dome                   0.30  0.20  0.60  0.70  0.60  0.70
                     G   Tokyo Big Sight              0.20  0.90  0.80  0.90  1.00  1.00
                     H   Nagoya Dome                  0.70  0.70  0.90  0.70  1.00  1.00
                     I   Port of Nagoya               0.60  0.60  0.70  1.00  0.80  1.00
Unscheduled events   L   Tokyo International Airport  0.40  0.90  0.50  0.60  0.80  0.80
                     M   Shibuya Station              0.50  0.90  1.00  1.00  1.00  1.00
                     N   Oizumi-gakuen Station        0.90  1.00  1.00  1.00  1.00  1.00
Total average                                         0.51  0.74  0.79  0.84  0.89  0.93
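As a quick consistency check, the "Total average" row and the count of irrelevant M-4 tweets can be recomputed from the per-event scores listed above. The short Python sketch below only re-does that arithmetic; the variable names are hypothetical, and the conversion from a score to a tweet count assumes 10 extracted tweets per event, as stated in the text.

```python
methods = ["B-1", "B-2", "M-1", "M-2", "M-3", "M-4"]

# Per-event scores copied from Table 3 (columns in the order of `methods`).
scores = {
    "F": [0.30, 0.20, 0.60, 0.70, 0.60, 0.70],
    "G": [0.20, 0.90, 0.80, 0.90, 1.00, 1.00],
    "H": [0.70, 0.70, 0.90, 0.70, 1.00, 1.00],
    "I": [0.60, 0.60, 0.70, 1.00, 0.80, 1.00],
    "L": [0.40, 0.90, 0.50, 0.60, 0.80, 0.80],
    "M": [0.50, 0.90, 1.00, 1.00, 1.00, 1.00],
    "N": [0.90, 1.00, 1.00, 1.00, 1.00, 1.00],
}

# Column means over the 7 events, rounded to two decimals as in the table.
averages = {
    m: round(sum(row[i] for row in scores.values()) / len(scores), 2)
    for i, m in enumerate(methods)
}

# Number of M-4 tweets (10 per event) that were not judged event-relevant.
m4 = methods.index("M-4")
irrelevant_m4 = sum(round((1 - row[m4]) * 10) for row in scores.values())

print(averages)
print(irrelevant_m4)
```

The recomputed averages match the last row of the table, and the M-4 count matches the 5 irrelevant tweets (3 at event F and 2 at event L) mentioned in the text.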
Table 3 also shows the improvement in score with the addition of the
HV-Score (M-2 and M-4) for the fireworks festival at the Port of Nagoya (event
I). We confirmed that the event-relevant tweets about the festival were mostly
posted during the event. In this case, we can obtain a higher score by focusing
on the times when people are concentrated (HV-Score) rather than on the times
when there are many tweets (PT-Score). However, for the live concert at Nagoya
Dome (event H), the score was improved by adding the PT-Score (M-3). We
can observe in Fig. 4 that the number of tweets drastically decreased during the
live concert because people participating in the event refrained from posting
tweets. In this case, the PT-Score is effective because it focuses on the time
when there are many tweets (i.e., 15:00). This suggests that it is important to
consider both the human mobility transition and the number of tweets when
selecting the topics from which to extract the event-relevant tweets.
Fig. 4. Number of tweets and human mobility transitions at the Nagoya Dome (event
H). The highlighted block indicates the duration of the event.
Table 4 shows sample tweets that were extracted using M-4 and were given
a grade of 2. We can identify what event is occurring at the POI by referring
to tweets like those in Table 4. For example, in event I, users can understand
that congestion is occurring in a particular grid cell and that the cause of the
congestion is a festival, and if they are intrigued by the tweets, they might
consider participating. Also, in event M, users may consider changing their route
to avoid the congestion due to the train delay. In summary, our method that
considers all of these characteristics, D-Score + HV-Score + PT-Score, i.e.,
M-4, was the most robust of the six methods regardless of the event type.
References
1. Ahmed, M.F., Vanajakshi, L., Suriyanarayanan, R.: Real-time traffic congestion
information from tweets using supervised and unsupervised machine learning tech-
niques. Transp. Dev. Econ. 5(2), 1–11 (2019). https://doi.org/10.1007/s40890-019-
0088-2
2. Akahori, T., Dohsaka, K., Ishii, M., Ito, H.: Efficient creation of Japanese tweet
emotion dataset using sentence-final expressions. In: 2021 IEEE 3rd Global Con-
ference on Life Sciences and Technologies, pp. 501–505 (2021)
3. Allan, J.: Introduction to topic detection and tracking. In: Allan, J. (ed.) Topic
Detection and Tracking. The Information Retrieval Series, vol. 12, pp. 1–16.
Springer, Boston (2002). https://doi.org/10.1007/978-1-4615-0933-2_1
4. Alsaedi, N., Burnap, P., Rana, O.: Can we predict a riot? disruptive event detection
using twitter. ACM Trans. Internet Technol. 17(2), 1–26 (2017)
5. Bhuvaneswari, A., Valliyammai, C.: Identifying event bursts using log-normal dis-
tribution of tweet arrival rate in twitter stream. In: Proceedings of the 10th Inter-
national Conference on Advanced Computing, pp. 339–343 (2018)
6. Bennani-Smires, K., Musat, C., Hossmann, A., Baeriswyl, M., Jaggi, M.: Simple
unsupervised keyphrase extraction using sentence embeddings. In: Proceedings of
the 22nd Conference on Computational Natural Language Learning, pp. 221–229
(2018)
7. de Bruijn, J.A., de Moel, H., Jongman, B., Wagemaker, J., Aerts, J.C.J.H.:
TAGGS: grouping tweets to improve global geoparsing for disaster response. J.
Geovisualization Spat. Anal. 2(1), 1–14 (2017). https://doi.org/10.1007/s41651-
017-0010-6
8. Calabrese, F., Ferrari, L., Blondel, V.D.: Urban sensing using mobile phone network
data: a survey of research. ACM Comput. Surv. 47(2), 25-1-25-20 (2014)
9. Chen, Y., Lv, Y., Wang, X., Li, L., Wang, F.Y.: Detecting traffic information from
social media texts with deep learning approaches. IEEE Trans. Intell. Transp. Syst.
20(8), 3049–3058 (2019)
10. Comito, C., Forestiero, A., Pizzuti, C.: Bursty event detection in twitter streams.
ACM Trans. Knowl. Discov. Data 13(4), 1–28 (2019)
11. Cordeiro, M.: Twitter event detection: combining wavelet analysis and topic infer-
ence summarization. In: Proceedings of the 7th Doctoral Symposium in Informatics
Engineering, pp. 123–138 (2012)
12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep
bidirectional transformers for language understanding. In: Proceedings of the 2019
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, vol. 1, pp. 4171–4186 (2019)
13. Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull.
76(5), 378–382 (1971)
14. Fuse, T., Kamiya, K.: Statistical anomaly detection in human dynamics monitoring
using a hierarchical dirichlet process hidden markov model. IEEE Trans. Intell.
Transp. Syst. 18(11), 3083–3092 (2017)
15. Gutiérrez, C., Figuerias, P., Oliveira, P., Costa, R., Jardim-Goncalves, R.: Twit-
ter mining for traffic events detection. In: Proceedings of the 2015 Science and
Information Conference, pp. 371–378 (2015)
16. Han, Y., Karunasekera, S., Leckie, C., Harwood, A.: Multi-spatial scale event detec-
tion from geo-tagged tweet streams via power-law verification. In: Proceedings of
the 2019 IEEE International Conference on Big Data, pp. 1131–1136 (2019)
17. Hu, Y., Wang, R.Q.: Understanding the removal of precise geotagging in tweets.
Nat. Human Behav. 4, 1219–1221 (2020)
18. Kaiser, M.S., et al.: Advances in crowd analysis for urban applications through
urban event detection. IEEE Trans. Intell. Transp. Syst. 19(10), 3092–3112 (2018)
19. Kaviani, M., Rahmani, H.: EmHash: hashtag recommendation using neural net-
work based on BERT embedding. In: 2020 6th International Conference on Web
Research, pp. 113–118 (2020)
20. Kong, X., et al.: Big trajectory data: a survey of applications and services. IEEE
Access 6, 58295–58306 (2018)
21. Kong, X., Song, X., Xia, F., Guo, H., Wang, J., Tolba, A.: LoTAD: long-term
traffic anomaly detection based on crowdsourced bus trajectory data. World Wide
Web 21(3), 825–847 (2018)
22. Lam, C.T., Gao, H., Ng, B.: A real-time traffic congestion detection system using
on-line images. In: Proceedings of the 2017 IEEE 17th International Conference
on Communication Technology, pp. 1548–1552 (2017)
23. Lee, K., Ganti, R., Srivatsa, M., Mohapatra, P.: Spatio-temporal provenance:
identifying location information from unstructured text. In: Proceedings of the
2013 IEEE International Conference on Pervasive Computing and Communica-
tions Workshops, pp. 499–504 (2013)
24. Mele, I., Crestani, F.: A Multi-source collection of event-labeled news documents.
In: Proceedings of the 2019 ACM SIGIR International Conference on Theory of
Information Retrieval, pp. 205–208 (2019)
25. Metzler, D., Cai, C., Hovy, E.: Structured event retrieval over microblog archives.
In: Proceedings of the 2012 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies, pp.
646–655 (2012)
26. Miller, D.: Leveraging BERT for extractive text summarization on lectures. arXiv
preprint, arXiv:1906.04165 (2019)
27. Mishima, Y., Minamikawa, A.: Anomaly detection of urban dynamics in an extreme
weather with mobile GPS data. In: Proceedings of NetMob 2019 (2019)
28. Neumann, J., Zao, M., Karatzoglou, A., Oliver, N.: Event detection in communi-
cation and transportation data. In: Pattern Recognition and Image Analysis, pp.
827–838 (2013)
29. Ren, H., et al.: Time-series anomaly detection service at Microsoft. In: Proceedings
of the 25th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, pp. 3009–3017 (2019)
30. Silveira Jacques Junior, J.C., Musse, S.R., Jung, C.R.: Crowd analysis using com-
puter vision techniques. IEEE Sig. Process. Mag. 27(5), 66–77 (2010)
31. Wei, H., Zhou, H., Sankaranarayanan, J., Sengupta, S., Samet, H.: Detecting lat-
est local events from geotagged tweet streams. In: Proceedings of the 26th ACM
SIGSPATIAL International Conference on Advances in Geographic Information
Systems, pp. 520–523 (2018)
32. Weng, J., Lee, B.S.: Event detection in twitter. In: Proceedings of the 5th Inter-
national AAAI Conference on Weblogs and Social Media (2011)
33. Xu, Z., et al.: Crowdsourcing based description of urban emergency events using
social media big data. IEEE Trans. Cloud Comput. 8(2), 387–397 (2020)
34. Yabe, T., Tsubouchi, K., Sudo, A.: A framework for evacuation hotspot detec-
tion after large scale disasters using location data from smartphones: case study
of Kumamoto earthquake. In: Proceedings of the 24th ACM SIGSPATIAL Inter-
national Conference on Advances in Geographic Information Systems, pp. 1–10
(2016)
35. Yamaki, S., Lin, S.D., Kameyama, W.: Detection of anomaly state caused by unex-
pected accident using data of smart card for public transportation. In: Proceedings
of the 2019 IEEE International Conference on Big Data, pp. 1693–1698 (2019)
36. Yamamoto, K., Shimada, K.: Acquisition of periodic events with person attributes.
In: 2020 International Conference on Asian Language Processing, pp. 229–234
(2020)
37. Zhang, C., et al.: TrioVecEvent: embedding-based online local event detection in
geo-tagged tweet streams. In: Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 595–604 (2017)
38. Zhang, C., et al.: GeoBurst: real-time local event detection in geo-tagged tweet
streams. In: Proceedings of the 39th International ACM SIGIR Conference on Research
and Development in Information Retrieval, pp. 513–522 (2016)
39. Zhang, Q., Chan, A.B.: Wide-area crowd counting via ground-plane density maps
and multi-view fusion CNNs. In: Proceedings of the 2019 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pp. 8289–8298 (2019)
Design of Room-Layout Estimator Using
Smart Speaker
1 Introduction
Currently, smart home appliances are becoming prevalent owing to recent
advances in wireless communication and Internet of Things (IoT)-related tech-
nologies. Using smart speakers working as a voice user interface (VUI), such as
Google Home and Amazon Alexa, we can control smart home appliances using
our voice.
For VUI-based control, we need to specify which device should do what. For
example, we can turn on the lights by ordering a smart speaker to “turn on the
light in the living room.” In this example, we need to explicitly specify the light in
the living room because there are lights in every room. To uniquely specify the
target device, we often use room names that are configured on the smart speaker
before it is used.
This work was supported in part by the Japan Society for the Promotion of Science
(JSPS) KAKENHI Grant Numbers JP21K11847, JP20KK0258, and JP19KT0020 as
well as the Cooperative Research Project Program of RIEC, Tohoku University.
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2022
Published by Springer Nature Switzerland AG 2022. All Rights Reserved
T. Hara and H. Yamaguchi (Eds.): MobiQuitous 2021, LNICST 419, pp. 24–39, 2022.
https://doi.org/10.1007/978-3-030-94822-1_2
However, smart speakers often suffer from the ambiguous target problem. We
often forget to specify a room name because we implicitly aim to control devices
in the room we are in. A target device specified by demonstrative words, such
as “this light,” also causes a similar ambiguity.
Another cause of the ambiguous target problem is the ambiguity in room
names. Different names are often used for the same room. For example, we might
attempt to turn on the light in the living room by ordering “turn on the light in
the drawing room” or “turn on the light in the front room.”
To address the ambiguous target problem, context-aware decision-making has
been proposed [2,3]. In context-aware decision-making approaches, the control
target is estimated based on the user’s context. However, user context estima-
tion requires sensors and a machine-learning model pre-trained with the user’s
previous behaviors.
In this study, we propose a new approach, room-layout-based appliance
control, as shown in Fig. 1. In practical situations, ambiguous control commands
such as “turn on the light” are often used. When a user gives such a command,
we assume that the user means “turn on the light in the room I am in.” A smart
speaker therefore estimates the room where the user is located using a
user-location estimator. The room layout, which comprises room directions and
types, such as living room and bedroom, is also estimated by the smart speaker
using a room-layout estimator to determine the name of the room where the
control target is located.
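The idea can be sketched as follows, assuming that the estimated layout is available as a list of rooms, each with a type name and an angular sector as seen from the smart speaker, and that the user-location estimator returns a direction in degrees. The `Room` structure, the sector representation, and both function names are hypothetical and serve only to illustrate the flow.

```python
from dataclasses import dataclass


@dataclass
class Room:
    name: str         # room type from the layout estimator, e.g. "kitchen"
    start_deg: float  # sector start, measured from the speaker
    end_deg: float    # sector end (exclusive); sectors are narrower than 360 degrees


def room_for_direction(layout, direction_deg):
    """Return the room whose sector contains the direction, or None."""
    d = direction_deg % 360
    for room in layout:
        s, e = room.start_deg % 360, room.end_deg % 360
        # A sector such as 315..45 wraps around 0 degrees.
        inside = (s <= d < e) if s <= e else (d >= s or d < e)
        if inside:
            return room
    return None


def resolve_command(command, layout, user_direction_deg):
    """Append the user's room to a command that names no room."""
    if any(room.name in command for room in layout):
        return command  # the command is already unambiguous
    room = room_for_direction(layout, user_direction_deg)
    if room is None:
        return command  # the user's room is unknown; leave the command as is
    return f"{command} in the {room.name}"


layout = [
    Room("living room", 315, 45),
    Room("kitchen", 45, 135),
    Room("bedroom", 135, 225),
]
print(resolve_command("turn on the light", layout, 90))
```

A command that already names a room is passed through unchanged, so the layout is consulted only for ambiguous commands.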
As a first step toward this goal, in this study, we present the design of a
room-layout estimator for smart speakers. Our assumption here is that smart
speakers are equipped with a couple of microphones to estimate the user location.
Analyzing the sound source direction, the room-layout estimator first estimates
the direction of the rooms. The type of the rooms is then estimated based on
26 T. Joya et al.
the activity sounds, such as faucet, dish, and TV sounds, coming from each
room direction. Although smart speakers on the market have a single
microphone, we believe that in the near future, smart speakers will be equipped
with multiple microphones to improve robustness to noise and users’
voice separation performance.
Our main contributions are as follows:
– We propose a room-layout-based appliance control method for smart speakers.
To the best of our knowledge, this is the first attempt to utilize the layout of
rooms estimated on smart speakers to determine the control target appliance.
– We present the design of a room-layout estimator for smart speakers equipped
with multiple microphones. In contrast to existing sound source localization
technologies, our approach for the room-layout estimation utilizes the room-
specific characteristics of the reflected sound to distinguish different rooms.
– We show the basic performance of our room-layout estimator through exper-
imental evaluations. We collected the home activity sound data from two
different houses. The experimental evaluations demonstrated that the room-
direction estimation accuracy and room-type estimation accuracy were 0.850
and 0.714, respectively.
The remainder of this paper is organized as follows. Section 2 describes related
work on sound source localization in indoor environments. In Sect. 3, we present
the design of our room-layout estimator that utilizes multiple microphones on
a smart speaker, followed by experimental evaluations in Sect. 4. Finally, the
paper is concluded in Sect. 5.
2 Related Work
To the best of our knowledge, this is the first attempt to estimate a room layout
using a microphone array rather than sound sources.
Sound source localization, which estimates the location of sound sources using
a microphone array, has been widely studied; approaches include time delay
estimation, beamforming, and subspace-based methods. Typical time delay
estimators are cross-correlation-based methods, in which the locations of sound
sources are estimated by calculating the cross-correlations between microphones
[7,12,15]. Beamforming methods are represented by delay-and-sum beamformers,
which combine the sound signals of multiple microphones with phase compensation
[14,16]. The representative subspace-based method is the MUSIC method, which
utilizes the orthogonality of signal and noise components in the spatial
correlation matrix of microphone array signals to estimate the location of
sound sources [4,11].
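To make the cross-correlation idea concrete, the following minimal Python sketch estimates the delay between two microphone signals by searching for the lag that maximizes their cross-correlation, and then converts that delay into a far-field arrival angle for a two-microphone pair. It is a textbook-style illustration under simplifying assumptions (one source, no reverberation, a speed of sound of 343 m/s), not one of the cited methods; all names are hypothetical.

```python
import math


def cross_correlation_delay(x, y, max_lag):
    """Lag (in samples) maximizing sum_i x[i] * y[i + lag]; positive if y lags x."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                val += x[i] * y[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def arrival_angle(delay_samples, sample_rate, mic_distance, speed_of_sound=343.0):
    """Far-field angle in degrees from broadside for a two-microphone pair."""
    s = delay_samples / sample_rate * speed_of_sound / mic_distance
    s = max(-1.0, min(1.0, s))  # clamp against rounding and measurement noise
    return math.degrees(math.asin(s))


# Toy signals: y is x delayed by 3 samples.
x = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
delay = cross_correlation_delay(x, y, max_lag=5)
print(delay, arrival_angle(delay, sample_rate=16000, mic_distance=0.1))
```

In reverberant rooms, reflected copies of the signal add further correlation peaks, which is one reason the localization performance degrades indoors.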
Numerous studies on sound source localization have attempted to reduce the
influence of reflected sound signals in indoor environments, where the sound
localization performance degrades because of reverberation.
Suzuki et al. presented a sub-band peak hold process, which considers the
amplitude of a direct sound signal, the sound signal that first reaches the micro-
phones, and masks the reflected sound signals that reach subsequent to the direct
Fig. 2. Example of a sound density map with a single sound source moving in 4 rooms
3.1 Approach
Our primary approach to estimating the room layout is to extract the reverbera-
tion features using a sound density map (a map of the sound power distribution
as a function of time for each angle). We found that the sound signals from
different rooms have different reverberation features because of the differences
in size, wall locations, and diffraction objects. The difference in reverberation
features appears as a difference in the band on the sound density map. There-
fore, we distinguish sound signals from different rooms based on the features of
bands on a sound density map using unsupervised learning algorithms.
Figure 2 shows an example of a sound density map with a single sound source,
that is, a vacuum cleaner, moving through four rooms. We installed a microphone
array in a room of a 1-bedroom smart house and collected sound signals to draw a
sound density map using the MUSIC method [11]. In Fig. 2, the sound source
moves from one room to the next at the times indicated by the dashed
lines. We can confirm that the width and fluctuation of the band appearing on
the sound density map depend on the location of the sound source.
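One simple way to quantify the width and fluctuation of a band is sketched below in Python. The sketch assumes that the sound density map is a list of time frames, each holding the power per angle bin, and it measures the band as the contiguous run of bins around the peak whose power is at least half the peak power. The half-power threshold, the two summary features, and the function names are assumptions for illustration and are not necessarily the features used by the estimator.

```python
from statistics import mean, pstdev


def band_of_frame(frame, rel_threshold=0.5):
    """Return (peak_bin, width) of the contiguous band around the peak of one frame."""
    peak_bin = max(range(len(frame)), key=frame.__getitem__)
    limit = rel_threshold * frame[peak_bin]
    left = peak_bin
    while left > 0 and frame[left - 1] >= limit:
        left -= 1
    right = peak_bin
    while right < len(frame) - 1 and frame[right + 1] >= limit:
        right += 1
    return peak_bin, right - left + 1


def band_features(density_map, rel_threshold=0.5):
    """Per-frame (peak_bin, width) pairs for a whole sound density map."""
    return [band_of_frame(frame, rel_threshold) for frame in density_map]


def summarize(features):
    """Mean band width and fluctuation (std. dev.) of the peak direction."""
    peaks = [p for p, _ in features]
    widths = [w for _, w in features]
    return {"mean_width": mean(widths), "peak_fluctuation": pstdev(peaks)}


# Toy map: two time frames over six angle bins (made-up power values).
density_map = [
    [0.1, 0.2, 0.9, 1.0, 0.6, 0.2],
    [0.1, 1.0, 0.1, 0.1, 0.1, 0.1],
]
features = band_features(density_map)
print(features, summarize(features))
```

Summaries of this kind, computed over short time windows, could then be passed to an unsupervised clustering step to group sounds by the room they came from.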
There are multiple sound sources in a practical environment, resulting in
multiple bands corresponding to the sound sources on a sound density map. We
first divide the sound sources and then group them by estimating the room where
the sound source was located, by unsupervised learning with features extracted
from a sound density map.
3.2 Assumptions
We assume that our method, that is, the room-layout estimator for a smart
speaker, is used in a residential environment, such as a 2-bedroom house where
multiple rooms are on the same floor and are located adjacent to each other
with doors separating them. A smart speaker with a microphone array is
installed in one of the rooms. Our goal is to estimate the layout of the rooms
connected via a door to the room where the smart speaker is installed. In these
rooms, multiple people live together, and they might make living noises at
different locations simultaneously. The number of rooms next to the room where
the smart speaker is installed is given before the room-layout estimation.
Greeks, from the summit of Olympus. All the rest is Mussulman. One
minaret was built by Mahomet the Conqueror, one by Selim II., the
other two by Amurath III. Of the same Amurath are the buttresses
built at the end of the Sixteenth Century to support the walls shaken
by an earthquake, and the enormous crescent in bronze planted
upon the top of the dome, of which the gilding alone cost fifty
thousand ducats.
THE MOSQUE OF SANTA-SOFIA.
IT is said that the line in Heber’s “Palestine” which describes the rise
of Solomon’s temple originally ran—
“Like the green grass, the noiseless fabric grew;”
and that, at Sir Walter Scott’s suggestion, it was altered to its present
form—
“Like some tall palm, the noiseless fabric sprung.”
Whether we adopt the humbler or the grander image, the
comparison of the growth of a fine building to that of a natural
product is full of instruction. But the growth of an historical edifice like
Westminster Abbey needs a more complex figure to do justice to its
formation: a venerable oak, with gnarled and hollow trunk, and
spreading roots, and decaying bark, and twisted branches, and
green shoots; or a coral reef extending itself with constantly new
accretions, creek after creek, and islet after islet. One after another,
a fresh nucleus of life is formed, a new combination produced, a
larger ramification thrown out. In this respect Westminster Abbey
stands alone amongst the edifices of the world. There are, it may be,
some which surpass it in beauty or grandeur; there are others,
certainly, which surpass it in depth and sublimity of association; but
there is none which has been entwined by so many continuous
threads with the history of a whole nation....
WESTMINSTER ABBEY.
The vast political pageants of which it has been the theatre, the dust
of the most worldly laid side by side with the dust of the most saintly,
the wrangles of divines or statesmen which have disturbed its sacred
peace, the clash of arms which has pursued fugitive warriors and
princes into the shades of its sanctuary—even the traces of
Westminster boys who have played in its cloisters and inscribed their
names on its walls—belong to the story of the Abbey no less than its
venerable beauty, its solemn services, and its lofty aspirations....
The Chapel of Henry VII. is indeed well called by his name, for it
breathes of himself through every part. It is the most signal example
of the contrast between his closeness in life, and his “magnificence
in the structures he had left to posterity”—King’s College Chapel, the
Savoy, Westminster. Its very style was believed to have been a
reminiscence of his exile, being “learned in France,” by himself and
his companion Fox. His pride in its grandeur was commemorated by
the ship, vast for those times, which he built, “of equal cost with his
Chapel,” “which afterwards, in the reign of Queen Mary, sank in the
sea and vanished in a moment.”
It was to be his chantry as well as his tomb, for he was
determined not to be behind the Lancastrian princes in devotion; and
this unusual anxiety for the sake of a soul not too heavenward in its
affections expended itself in the immense apparatus of services
which he provided. Almost a second Abbey was needed to contain
the new establishment of monks, who were to sing in their stalls “as
long as the world shall endure.” Almost a second Shrine, surrounded
by its blazing tapers, and shining like gold with its glittering bronze,
was to contain his remains.
To the Virgin Mary, to whom the chapel was dedicated, he had a
special devotion. Her “in all his necessities he had made his
continual refuge;” and her figure, accordingly, looks down upon his
grave from the east end, between the apostolic patrons of the Abbey,
Peter and Paul, with “the holy company of heaven—that is to say,
angels, archangels, patriarchs, prophets, apostles, evangelists,
martyrs, confessors and virgins,” to “whose singular mediation and
prayers he also trusted,” including the royal saints of Britain, St.
Edward, St. Edmund, St. Oswald, St. Margaret of Scotland, who
stand, as he directed, sculptured, tier above tier, on every side of the
Chapel; some retained from the ancient Lady Chapel; the greater
part the work of his own age. Around his tomb stand his
“accustomed Avours or guardian saints” to whom “he calls and
cries”—“St. Michael, St. John the Baptist, St. John the Evangelist, St.
George, St. Anthony, St. Edward, St. Vincent, St. Anne, St. Mary
Magdalene, and St. Barbara,” each with their peculiar emblems,
—“so to aid, succour, and defend him, that the ancient and ghostly
enemy, nor none other evil or damnable spirit, have no power to
invade him, nor with their wickedness to annoy him, but with holy
prayers to be intercessors to his Maker and Redeemer.” These were
the adjurations of the last mediæval King, as the Chapel was the
climax of the latest mediæval architecture. In the very urgency of the
King’s anxiety for the perpetuity of these funeral ceremonies, we
seem to discern an unconscious presentiment lest their days were
numbered.
But, although in this sense the Chapel hangs on tenaciously to
the skirts of the ancient Abbey and the ancient Church, yet that
solemn architectural pause between the two—which arrests the most
careless observer, and renders it a separate structure, a foundation
“adjoining the Abbey” rather than forming part of it—corresponds
with marvellous fidelity to the pause and break in English history of
which Henry VII.’s reign is the expression. It is the close of the
Middle Ages: the apple of Granada in its ornaments shows that the
last Crusade was over; its flowing draperies and classical attitudes
indicate that the Renaissance had already begun. It is the end of the
Wars of the Roses, combining Henry’s right of conquest with his
fragile claim of hereditary descent. On the one hand, it is the
glorification of the victory of Bosworth. The angels, at the four
corners of the tomb, held or hold the likeness of the crown which he
won on that famous day. In the stained-glass we see the same
crown hanging on the green bush in the fields of Leicestershire. On
the other hand, like the Chapel of King’s College at Cambridge, it
asserts everywhere the memory of the “holy Henry’s shade”; the Red
Rose of Lancaster appears in every pane of glass: and in every
corner is the Portcullis—the “Altera securitas,” as he termed it, with
an allusion to its own meaning, and the double safeguard of his
succession—which he derived through John of Gaunt from the
Beaufort Castle in Anjou, inherited from Blanche of Navarre by
Edmund Crouchback; whilst Edward IV. and Elizabeth of York are
commemorated by intertwining these Lancastrian symbols with the
Greyhound of Cecilia Neville, wife of Richard, Duke of York, with the
Rose in the Sun, which scattered the mists at Barnet, and the Falcon
on the Fetterlock, by which the first Duke of York expressed to his
descendants that “he was locked up from the hope of the kingdom,
but advising them to be quiet and silent, as God knoweth what may
come to pass.”
It is also the revival of the ancient, Celtic, British element in the
English monarchy, after centuries of eclipse. It is a strange and
striking thought, as we mount the steps of Henry VII.’s Chapel, that
we enter there a mausoleum of princes, whose boast it was to be
descended, not from the Confessor or the Conqueror, but from
Arthur and Llewellyn; and that round about the tomb, side by side
with the emblems of the great English Houses, is to be seen the Red
Dragon of the last British king, Cadwallader—“the dragon of the
great Pendragonship” of Wales, thrust forward by the Tudor king in
every direction, to supplant the hated White Boar of his departed
enemy—the fulfilment, in another sense than the old Welsh bards
had dreamt, of their prediction that the progeny of Cadwallader
should reign again....
We have seen how, by a gradual but certain instinct, the main
groups have formed themselves round particular centres of death:
how the Kings ranged themselves round the Confessor; how the
Prince and Courtiers clung to the skirts of Kings; how out of the
graves of the Courtiers were developed the graves of the Heroes;
how Chatham became the centre of the Statesmen, Chaucer of the
Poets, Purcell of the Musicians, Casaubon of the Scholars, Newton
of the Men of Science: how, even in the exceptional details, natural
affinities may be traced; how Addison was buried apart from his
brethren in letters, in the royal shades of Henry VII.’s Chapel,
because he clung to the vault of his own loved Montague; how
Ussher lay beside his earliest instructor, Sir James Fullerton, and
Garrick at the foot of Shakespeare, and Spelman opposite his
revered Camden, and South close to his master Busby, and
Stephenson to his fellow-craftsman Telford, and Grattan to his hero
Fox, and Macaulay beneath the statue of his favourite Addison.
These special attractions towards particular graves and
monuments may interfere with the general uniformity of the Abbey,
but they make us feel that it is not a mere dead museum, that its cold
stones are warmed with the life-blood of human affections and
personal partiality. It is said that the celebrated French sculptor of the
monument of Peter the Great at St. Petersburg, after showing its
superiority in detail to the famous equestrian statue of Marcus
Aurelius at Rome, ended by the candid avowal, “Et cependant cette
mauvaise bête est vivante, et la mienne est morte.” Perhaps we may
be allowed to reverse the saying, and when we contrast the
irregularities of Westminster Abbey with the uniform congruity of
Salisbury or the Valhalla, may reflect, “Cette belle bête est morte,
mais la mienne est vivante.”