Textbook Web Information Systems Engineering Wise 2017 18Th International Conference Puschino Russia October 7 11 2017 Proceedings Part I 1St Edition Athman Bouguettaya Et Al Eds Ebook All Chapter PDF
Athman Bouguettaya · Yunjun Gao
Andrey Klimenko · Lu Chen
Xiangliang Zhang · Fedor Dzerzhinskiy
Weijia Jia · Stanislav V. Klimenko · Qing Li (Eds.)
LNCS 10569
Web Information Systems Engineering –
WISE 2017
18th International Conference
Puschino, Russia, October 7–11, 2017
Proceedings, Part I
Lecture Notes in Computer Science 10569
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Editors
Athman Bouguettaya, University of Sydney, Darlington, NSW, Australia
Yunjun Gao, Zhejiang University, Hangzhou, China
Andrey Klimenko, Institute of Computing for Physics and Technology, Protvino, Russia
Lu Chen, Nanyang Technological University, Singapore, Singapore
Xiangliang Zhang, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Fedor Dzerzhinskiy, Institute of Computing for Physics and Technology, Protvino, Russia
Weijia Jia, Shanghai Jiao Tong University, Minhang Qu, China
Stanislav V. Klimenko, Institute of Computing for Physics and Technology, Protvino, Russia
Qing Li, City University of Hong Kong, Kowloon, Hong Kong
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
Mr. Ravshan Burkhanov, and Mr. Boris Strelnikov; the WISE Steering Committee
representative, Prof. Yanchun Zhang. The editors and chairs are grateful to Ms. Sudha
Subramani and Mr. Sarathkumar Rangarajan for their help with preparing the
proceedings and updating the conference website.
We would like to sincerely thank our keynote and invited speakers:
– Professor Beng Chin Ooi, Fellow of the ACM, IEEE, and Singapore National
Academy of Science (SNAS), NGS faculty member and Director of Smart Systems
Institute, National University of Singapore, Singapore
– Professor Lei Chen, Department of Computer Science and Engineering, Hong Kong
University, Hong Kong, SAR China
– Professor Jie Lu, Associate Dean (Research Excellence) in the Faculty of
Engineering and Information Technology, University of Technology Sydney, Sydney,
Australia
In addition, special thanks are due to the members of the international Program
Committee and the external reviewers for a rigorous and robust reviewing process. We
are also grateful to the Moscow Institute of Physics and Technology, Russia, the
Institute of Computing for Physics and Technology, Russia, City University of Hong
Kong, SAR China, University of Sydney, Australia, Zhejiang University, China,
Victoria University, Australia, University of New South Wales, Australia, and the
International WISE Society for supporting this conference. The WISE Organizing
Committee is also grateful to the special session organizers for their great efforts to help
promote Web information system research to a broader audience.
We expect that the ideas that emerged at WISE 2017 will result in the development
of further innovations for the benefit of scientific, industrial, and social communities.
General Co-chairs
Stanislav V. Klimenko Moscow Institute of Physics and Technology, Russia
Qing Li City University of Hong Kong, SAR China
Program Co-chairs
Athman Bouguettaya University of Sydney, Australia
Yunjun Gao Zhejiang University, China
Andrey Klimenko Institute of Computing for Physics and Technology, Russia
Workshop Co-chairs
Reynold C.K. Cheng The University of Hong Kong, SAR China
An Liu Soochow University, China
Publication Chair
Lu Chen Nanyang Technological University, Singapore
Publicity Co-chairs
Jiannan Wang Simon Fraser University, Canada
Bin Yao Shanghai Jiao Tong University, China
Daria Marinina Moscow Institute of Physics and Technology, Russia
Mikhail Pochkaylov Moscow Institute of Physics and Technology, Russia
Anton Semenistyy Moscow Institute of Physics and Technology, Russia
VIII Organization
Program Committee
Karl Aberer EPFL, Switzerland
Mohammed Eunus Ali Bangladesh University of Engineering and Technology,
Bangladesh
Toshiyuki Amagasa University of Tsukuba, Japan
Athman Bouguettaya University of Sydney, Australia
Yi Cai South China University of Technology, China
Xin Cao UNSW, Australia
Bin Cao Zhejiang University of Technology, China
Richard Chbeir LIUPPA Laboratory, France
Lisi Chen Hong Kong Baptist University, SAR China
Jinchuan Chen Renmin University of China, China
Cindy Chen University of Massachusetts Lowell, USA
Jacek Chmielewski Poznań University of Economics and Business, Poland
Alex Delis University of Athens, Greece
Ting Deng Beihang University, China
Hai Dong RMIT University, Australia
Schahram Dustdar TU Wien, Austria
Fedor Dzerzhinskiy Promsvyazbank, Russia
Islam Elgedawy Middle East Technical University, Turkey
Hicham Elmongui Alexandria University, Egypt
Yunjun Gao Zhejiang University, China
Thanaa Ghanem Metropolitan State University, USA
Azadeh Ghari Neiat University of Sydney, Australia
Daniela Grigori Laboratoire LAMSADE, Université Paris Dauphine,
France
Viswanath Gunturi Indian Institute of Technology Ropar, India
Hakim Hacid Bell Labs, USA
Armin Haller Australian National University, Australia
Tanzima Hashem Bangladesh University of Engineering and Technology,
Bangladesh
Data Mining
Pattern Mining
Cloud Computing
Query Processing
Graph Theory
Event Detection
Event Cube – A Conceptual Framework for Event Modeling and Analysis
Qing Li, Yun Ma, and Zhenguo Yang
Web-Based Applications
A Robust and Fast Reputation System for Online Rating Systems
Mohsen Rezvani and Mojtaba Rezvani
Efficient Multi-version Storage Engine for Main Memory Data Store
Jinwei Guo, Bing Xiao, Peng Cai, Weining Qian, and Aoying Zhou
Sentiment Analysis
Recommender Systems
Tao Zhang¹ (✉), Bin Zhou¹, Jiuming Huang¹, Yan Jia¹, Bing Zhang²,
and Zhi Li²
¹ National University of Defense Technology, Changsha, Hunan, China
towermxt@gmail.com, binzhou@nudt.edu.cn,
jiuming.huang@qq.com, jiayanjy@vip.sina.com
² Hunan Eefung Software Co., Ltd., Changsha, Hunan, China
{zhangbing,lizhi}@eefung.com
1 Introduction
Microblogs (such as Twitter, Snapchat, and Sina Weibo), among the most prevalent
social media, allow users to share and exchange small pieces of digital content (tweets,
blog posts, photos, etc.) in real time. New and interesting events usually spread
very fast on microblogs and trigger a myriad of discussion posts. For purposes such as
relationship crisis management, product marketing, or even emergency management,
many microblog users (whether organizational or personal) prefer to be
informed or alerted as soon as bursty topics start to go viral or grow dramatically. Tracking
the microblog stream in real time makes it possible to detect such headlines or breaking
news as early as possible.
Bursty topic detection on real-time streams has attracted much research effort in
recent years, and is increasingly used in many user-focused tasks, such as information
recommendation (Diao et al. [1], Kleinberg [2], Xie et al. [3], Xie et al. [4], Zhu and
Shasha [5]), trend analysis (Huang et al. [6]), and document search (Magdy et al. [7]).
These detection tasks have been categorized as feature-pivot techniques in some survey
works (Atefeh and Khreich [8]). Bursty topics on real-time microblog streams exhibit
not only keywords that surge in the short term, but also sharply increasing
tweet volume.
In the bursty topic detection task, researchers face two main challenges: topic
interpretability and memory scalability. Most of the effective prior works [2–4, 7, 9–12]
take tweet volume, word frequency, or word co-occurrence frequency in the data
stream as the bursty features of a topic. When tracking these bursty features on real-time
microblog streams, memory scalability is also a big challenge. Sketch-based methods,
such as TopicSketch [3, 4, 13] and SigniTrend [9], surpass the rest in memory
scalability.
Unfortunately, word intrusion and topic overlap are always detrimental to the
quality of detected bursty topics. Besides, the coherence of topic words is sensitive to the
fixed value N of the selected top-N topic words. Therefore, achieving topic quality with fine
coherence and granularity is another great challenge. In previous studies, a typical approach
[10, 14, 15] is to detect bursty words and then cluster them. However, two
drawbacks have caused it to be superseded: complicated heuristic tuning and post-processing,
since noisy and ambiguous words are unavoidable. Another approach is to discover
bursty topics via topic models, such as TopicSketch [4] and LDA [16]. But there is no
consensus on how to choose the top-N words of a detected topic for general topics.
In this paper, we propose a novel detection framework that detects bursty topics soon
after they start to burst, and devise an automatic evaluation of detected topics that provides
coherent topic words with fine granularity. We summarize our major contributions as
follows:
• We propose a refined version of TopicSketch, an up-to-date and efficient detection
method that uses tensor decomposition and dimension reduction [3, 17] for real-time
bursty topic detection. Our main improvement is the evaluation of word
intrusion and topic coherence, which uses clustering and fuzzy set theory jointly
to facilitate the extraction of informative and interpretable bursty topics and
their bursty scores.
• We propose a novel topic quality measure, the sketch-based PMI method, which estimates
word intrusion and topic coherence from the pairwise pointwise mutual information
(PMI) among topic words. We take the word sketch statistics as the PMI reference
corpus, in which words are dynamically sampled over consecutive sliding windows
on the real-time data stream; feeding fresh word probabilities into PMI makes the
estimate of topic coherence more reasonable and precise.
• We conduct extensive experiments on real-world data from Eefung.com¹ to
demonstrate the efficiency of real-time bursty topic detection, the soundness of the
coherence of the detected topics, and the effectiveness of bursty topic interpretability.
¹ http://www.eefung.com/.
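The PMI-based coherence idea in the second contribution can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names, the smoothing constant `eps`, the "lowest mean PMI" intruder rule, and the toy probabilities (standing in for dictionary sketch statistics) are all assumptions made for illustration.

```python
import math
from itertools import combinations

def pmi(word_a, word_b, word_prob, pair_prob, eps=1e-12):
    """Pointwise mutual information of two words; eps (an assumption) avoids log(0)."""
    joint = pair_prob.get(frozenset((word_a, word_b)), 0.0)
    return math.log((joint + eps) / (word_prob[word_a] * word_prob[word_b]))

def topic_coherence(topic_words, word_prob, pair_prob):
    """Mean pairwise PMI over the topic words (higher means more coherent)."""
    pairs = list(combinations(topic_words, 2))
    return sum(pmi(a, b, word_prob, pair_prob) for a, b in pairs) / len(pairs)

def likely_intruder(topic_words, word_prob, pair_prob):
    """Hypothetical intruder rule: the word with the lowest mean PMI to the others."""
    def mean_pmi(word):
        others = [w for w in topic_words if w != word]
        return sum(pmi(word, o, word_prob, pair_prob) for o in others) / len(others)
    return min(topic_words, key=mean_pmi)

# Toy probabilities standing in for the dictionary sketch statistics.
word_prob = {"quake": 0.1, "tsunami": 0.1, "rescue": 0.1, "pizza": 0.1}
pair_prob = {
    frozenset(("quake", "tsunami")): 0.08,
    frozenset(("quake", "rescue")): 0.06,
    frozenset(("tsunami", "rescue")): 0.05,
    frozenset(("quake", "pizza")): 0.001,
}
topic = ["quake", "tsunami", "rescue", "pizza"]
print(likely_intruder(topic, word_prob, pair_prob))
print(topic_coherence(topic, word_prob, pair_prob))
```

In this toy setting, dropping the flagged word raises the mean pairwise PMI of the remaining words, which is the behaviour a coherence-driven refinement would rely on.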
A Refined Method for Detecting Interpretable and Real-Time Bursty Topic 5
This paper is organized as follows: Sect. 2 briefly reviews the related work.
Section 3 gives an overview of the solution. Section 4 explains our topic refinement
model based on tensor decomposition with topic evaluation. Section 5 discusses the
experimental results, and Sect. 6 concludes the paper.
2 Related Work
For early bursty topic detection, Kleinberg [2] proposes an infinite-state automaton that
models the arrival times of documents in a stream to identify bursts of high
intensity over limited durations of time. The states of the probabilistic automaton
correspond to the frequencies of individual words, while the state transitions capture
bursts, which correspond to significant changes in word frequency. Twevent [10]
detects bursty tweet segments as event segments and then clusters the event segments
into events, considering both their frequency distribution and content similarity.
Wikipedia is exploited to identify realistic events. Statistics-based methods generate
bursty topics from the trend of bursty features over the real-time data stream. TopicSketch
[3, 4, 13] monitors the acceleration of three quantities to provide early signals of a
popularity surge, and estimates the word probability distribution and the
acceleration of each topic. EMA/MACD [18], a trend indicator widely used in the stock market, and
the sketch structure contribute to its remarkable memory scalability.
SigniTrend [5] proposes a significance measure to detect emerging topics early, and can
track all keyword pairs using only a fixed amount of memory. Finally, it
aggregates the detected co-trends into larger topics. Huang et al. [6] extract high-quality
microblogs by transforming important social media features into the wavelet domain
and fusing them into a weighted ensemble value, which filters out many noisy documents,
and then obtain bursty topics by applying LDA to the data stream of each new time window.
Research on topic quality evaluation has made impressive progress, approaching or
even surpassing human levels of accuracy. Newman et al. [19] introduce the notion of
topic “coherence”, and propose an automatic method for estimating topic coherence
based on pairwise PMI between the topic words. Aletras and Stevenson [20] calculate
the distributional similarity between semantic vectors for the top-N topic words using a
range of distributional similarity measures such as cosine similarity and the Dice
coefficient. They show that their method correlates well with the coherence
rated by human judges, taking Wikipedia as the reference corpus. Lau et al. [21] explore
two tasks, automatic evaluation of single topics and automatic evaluation of whole
topic models, and provide recommendations on the best strategy for performing the two
tasks. They can automatically evaluate the human-interpretability of topics, as
well as of topic models. Besides, they systematically compare different existing
methods and find appreciable differences between them. For reasonable topic
granularity, Lau and Baldwin [22], following Lau et al. [21], investigate the impact of the
cardinality hyper-parameter, that is, the parameter N of the top-N words, on topic coherence
evaluation.
6 T. Zhang et al.
3 Solution Overview
3.1 Problem Formulation
Like TopicSketch [3, 4], we follow two criteria in defining a bursty topic:
(1) A bursty topic must show a sudden surge in the number of related tweets within a short
time, so that continuing hot topics are not blended into the detection. (2) The number of
microblogs related to a bursty topic must be large enough, so that trivial topics are filtered away.
For topics generated by a topic model, extrinsic evaluation and intrinsic evaluation
demonstrate the efficiency and effectiveness of the detected topics. Extrinsic evaluation
concerns early detection and the importance of the discoveries. Intrinsic evaluation
quantifies the interpretability of the topics by scoring word intrusion and topic
coherence using the top-N topic words [19, 21, 23].
We first discuss how to extract bursty topics based on the refined topic model, and
then explain how to evaluate the detected bursty topics automatically.
2
http://research.pinnacle.smu.edu.sg/clear/.
Sketch
In computing, the sketch and its variant, the count-min sketch [24], are probabilistic data
structures that serve as frequency tables of events in a stream of data³. The sketch in our
method, itself a variant designed to capture the trend of word token frequencies, has
three components: the trend of microblog volume, the co-occurrence sketch, and the
dictionary sketch.
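To make the notion of a sketch as a fixed-memory frequency table concrete, here is a minimal count-min sketch in Python. It illustrates only the generic data structure, not the three-component sketch of this paper; the class name, the default `width` and `depth`, and the SHA-256-based row hashing are assumptions chosen for the example.

```python
import hashlib

class CountMinSketch:
    """Fixed-memory frequency table; estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, item, row):
        # One hash function per row, derived by salting the item with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._column(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][self._column(item, row)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for token in ["quake", "quake", "tsunami", "quake"]:
    sketch.add(token)
print(sketch.estimate("quake"), sketch.estimate("tsunami"))
```

Memory stays at `width * depth` counters however many distinct tokens arrive, which is the property that makes sketch structures attractive for unbounded streams.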
The trend of microblog volume is a valuable indicator of a burst stream containing
bursty topics. We estimate the volume trend by EMA (Exponential Moving Average)
and MACD (Moving Average Convergence/Divergence) [18], widely accepted stock
market trend analysis techniques. Let $D_t$ denote all microblogs at timestamp $t$, and
let $|D_t|$ be the size of $D_t$. For a time interval $\Delta t$, the microblog volume rate is
$v = |\Delta D_{\Delta t}| / \Delta t$, and we form a discrete time series $V = \{v_t \mid t = 0, 1, \ldots\}$.
The n-interval EMA with smoothing factor $\alpha$ is

$$\mathrm{EMA}(n)[v_t] = \alpha v_t + (1 - \alpha)\,\mathrm{EMA}(n-1)[v_{t-1}] = \sum_{k=0}^{n} \alpha (1 - \alpha)^{k} v_{t-k} \qquad (1)$$

Here n is called the window size of the EMA. Usually, the MACD is used to estimate
the acceleration of $v_t$, and is defined as the difference of its $n_1$- and $n_2$-interval moving
averages:

$$\mathrm{MACD}(n_1, n_2)[v_t] = \mathrm{EMA}(n_1)[v_t] - \mathrm{EMA}(n_2)[v_t] \qquad (2)$$
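The EMA recurrence and the MACD difference described above can be checked numerically with a short Python sketch. The function names are hypothetical, the base case EMA(0)[v_t] = αv_t is inferred from the closed-form sum in Eq. (1), and using one shared smoothing factor `alpha` for both windows is an assumption; practical systems may tie the smoothing factor to the window size.

```python
def ema(series, t, n, alpha):
    """Closed form of Eq. (1): sum_{k=0}^{n} alpha * (1 - alpha)**k * v[t - k]."""
    if t - n < 0:
        raise ValueError("need at least n + 1 observations up to index t")
    return sum(alpha * (1 - alpha) ** k * series[t - k] for k in range(n + 1))

def ema_recursive(series, t, n, alpha):
    """Recursive form of Eq. (1), bottoming out at EMA(0)[v_t] = alpha * v_t."""
    if n == 0:
        return alpha * series[t]
    return alpha * series[t] + (1 - alpha) * ema_recursive(series, t - 1, n - 1, alpha)

def macd(series, t, n1, n2, alpha):
    """Difference of the n1- and n2-interval moving averages."""
    return ema(series, t, n1, alpha) - ema(series, t, n2, alpha)

volume = [1.0, 2.0, 3.0, 4.0]  # toy microblog volume rates v_0..v_3
print(ema(volume, 3, 1, 0.5), ema(volume, 3, 2, 0.5), macd(volume, 3, 1, 2, 0.5))
```

The recursive and closed forms agree on this input, which is a quick sanity check that the two sides of Eq. (1) describe the same quantity.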
The co-occurrence sketch contains the word-pair acceleration $M_2$ and the word-triple
acceleration $M_3$. Their definitions are the same as in TopicSketch [3]. The accelerations are
the trends of the frequencies of word pairs and word triples, respectively. The dictionary
sketch holds statistics for the probabilities of all words and word pairs in the current data
stream, and it is devised for PMI estimation at the topic evaluation stage.
³ https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch.
Tensor Decomposition Model
Following [17], each document's topic is one of $K$ distinct topics, drawn according to the
discrete distribution specified by the probability vector $w = (w_1, w_2, \ldots, w_K)$, which we
call the burst level in our method. Given the topic $k$, the document's $l$ words are drawn
independently according to the discrete distribution specified by the probability vector $\phi_k$.
The sketches $M_2$ and $M_3$ [3] can be expressed as:

$$M_2 = \sum_{k=1}^{K} w_k\, \phi_k \otimes \phi_k = \sum_{k=1}^{K} w_k\, \phi_k \phi_k^{T} \qquad (3)$$

$$M_3 = \sum_{k=1}^{K} w_k\, \phi_k \otimes \phi_k \otimes \phi_k \qquad (4)$$

$$M_3(\eta) = \sum_{k=1}^{K} w_k\, \phi_k \phi_k^{T} \langle \eta, \phi_k \rangle \qquad (5)$$
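As an illustration of Eqs. (3) and (5), the following Python sketch builds $M_2$ and $M_3(\eta)$ from a toy mixture using plain lists. The helper names, the two-topic example, and the choice of $\eta$ as the all-ones vector are assumptions made for the example; in the paper these moments are estimated from stream statistics rather than built from known topics.

```python
def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]

def add_scaled(acc, mat, scale):
    """Return acc + scale * mat, element-wise."""
    return [[a + scale * m for a, m in zip(row_a, row_m)]
            for row_a, row_m in zip(acc, mat)]

def second_moment(weights, topics):
    """M2 = sum_k w_k * phi_k phi_k^T, as in Eq. (3)."""
    n = len(topics[0])
    m2 = [[0.0] * n for _ in range(n)]
    for w, phi in zip(weights, topics):
        m2 = add_scaled(m2, outer(phi, phi), w)
    return m2

def third_moment_projected(weights, topics, eta):
    """M3(eta) = sum_k w_k * <eta, phi_k> * phi_k phi_k^T, as in Eq. (5)."""
    n = len(topics[0])
    m3 = [[0.0] * n for _ in range(n)]
    for w, phi in zip(weights, topics):
        inner = sum(e * p for e, p in zip(eta, phi))
        m3 = add_scaled(m3, outer(phi, phi), w * inner)
    return m3

weights = [0.7, 0.3]                          # toy burst levels w_k
topics = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]   # toy word distributions phi_k
m2 = second_moment(weights, topics)
m3 = third_moment_projected(weights, topics, [1.0, 1.0, 1.0])
```

With $\eta$ set to the all-ones vector, each inner product $\langle \eta, \phi_k \rangle$ equals the sum of a probability vector, so $M_3(\eta)$ coincides with $M_2$ up to rounding; a non-uniform $\eta$ is what lets the projected third moment tell the topics apart.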
Algorithm 1 from TopicSketch [3] explains how the tensor decomposition
generates topics. First, SVD is performed on $M_2$ to find a whitening matrix, which is used to
whiten $M_3(\eta)$ and obtain the matrix $T_3$. Next, SVD is performed on $T_3$ to find the eigenvectors
$v_k$, from which the topic word vectors are recovered. The procedure thus contains two SVD
stages and a recovery stage. The most time-consuming part is transforming $M_3(\eta)$ from an
$N \times N$ matrix into the $K \times K$ matrix $T_3$, which takes time on the order of $O(KN^2)$.
The method is proved in detail in [3].
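The whitening step can be spelled out with a short derivation. This is a sketch that follows the standard method-of-moments argument and assumes that the $\phi_k$ are linearly independent and all $w_k > 0$; the symbols $U$, $\Sigma$, and $W$ are introduced here for illustration and may differ from the notation of Algorithm 1 in [3].

```latex
% Rank-K SVD of the second moment and the whitening matrix built from it
M_2 = U \Sigma U^{T}, \qquad W = U \Sigma^{-1/2}, \qquad W^{T} M_2 W = I_K

% Substituting Eq. (3) shows that the whitened topic vectors are orthonormal
v_k = \sqrt{w_k}\, W^{T} \phi_k, \qquad \sum_{k=1}^{K} v_k v_k^{T} = I_K

% Whitening Eq. (5) therefore yields a matrix that is diagonal in the basis v_k
T_3 = W^{T} M_3(\eta)\, W = \sum_{k=1}^{K} \langle \eta, \phi_k \rangle\, v_k v_k^{T}

% Recovery: map each eigenvector back and normalise it to a probability vector
\phi_k \propto (W^{T})^{+} v_k
```

Under these assumptions the eigenvalues of $T_3$ are the inner products $\langle \eta, \phi_k \rangle$, so a generic $\eta$ keeps them distinct and makes the eigenvectors $v_k$ identifiable.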
Refinement Model
Despite having been adopted in an efficient real-time bursty topic detection system
[13], the tensor decomposition [17] topic model does not perform well on topic quality because
of word noise and spam. Our goal in this part is therefore to preserve bursty topic
interpretability and filter away trivial topics by refining the tensor decomposition model.
As a result of word intrusion and topic overlap, a topic ($\phi_k$) derived from tensor
decomposition typically cannot be interpreted as a single real event. We cluster
co-occurring words to avoid these problems. Equation 6 gives the notation for each cluster
at each burst level $w_k$.
Each word in cluster $C_{km}$ retains its word probability $p_i$ from $\phi_k$, which helps to
estimate the burst score $a_{km}$ of cluster $C_{km}$ at burst level $w_k$:

$$a_{km} = w_k \sum_{i} p_i, \quad p_i \in \phi_k \cap C_{km} \qquad (8)$$
The refinement model contains two steps, as described in Algorithm 2. The first step
clusters words at each burst level $w_k$: the top-N words according to $\phi_k$ are grouped into M
clusters according to the co-occurring word pairs in the pair dictionary sketch. The
variable size of the clusters provides flexible topic granularity. Moreover, most of the
word pairs that come from microblogs related to one bursty topic will fall into
one cluster, so the clusters preserve topic interpretability
quite well. The second step computes a burst score for each cluster obtained in step 1.
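The two steps can be sketched in Python as follows. Algorithm 2 is not reproduced in this excerpt, so the clustering rule used here (connected components of the co-occurrence graph, found with union-find) is only an assumption standing in for the authors' method; the function names and toy values are likewise hypothetical. The burst score follows Eq. (8).

```python
def cluster_words(words, cooccurring_pairs):
    """Step 1 (assumed rule): group words into connected components
    of the co-occurrence graph using union-find."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in cooccurring_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())

def burst_scores(burst_level, word_probs, clusters):
    """Step 2, Eq. (8): a_km = w_k * sum of p_i over the words of cluster C_km."""
    return [burst_level * sum(word_probs[w] for w in c) for c in clusters]

# Toy top-N words of one topic phi_k with their probabilities p_i.
word_probs = {"quake": 0.4, "tsunami": 0.3, "sale": 0.2, "coupon": 0.1}
pairs = [("quake", "tsunami"), ("sale", "coupon")]
clusters = cluster_words(list(word_probs), pairs)
scores = burst_scores(0.5, word_probs, clusters)
print(list(zip(clusters, scores)))
```

Because the clusters partition the top-N words, the cluster scores add up to the burst level times the total probability mass of those words, so a topic that mixes two events has its burst level split between them.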
The Project Gutenberg eBook of Los cien mil hijos
de San Luis
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online at
www.gutenberg.org. If you are not located in the United States, you will
have to check the laws of the country where you are located before
using this eBook.
Language: Spanish
Credits: Ramón Pajares Box. (This file was produced from images
generously made available by The Internet Archive/Canadian
Libraries.)
Transcriber's note
SAN LUIS
33.000
MADRID
OBRAS DE PÉREZ GALDÓS
132, Hortaleza
1904
EST. TIP. DE LA VIUDA E HIJOS DE TELLO
IMPRESOR DE CÁMARA DE S. M.
C. de San Francisco, 4.
LOS CIEN MIL HIJOS
DE SAN LUIS
1. General Don Francisco Eguía himself, whose high mission was to
promote, from the frontier, the uprising of royalist bands.
2. Don José Morejón, an official of the War Secretariat and
later Confidential Secretary to His Majesty with authority to issue
decrees, who was charged with negotiating in Paris, with the
French government, the means of tearing from Spain the cautery of the
Cádiz Constitution and replacing it with a soothing poultice
made in the same pharmacy that had produced the Charter of Louis XVIII.
I praised these things so as not to quarrel with the old general, who was
very gallant and attentive toward me; but inwardly, as a most
faithful lover of the absolute regime, I deplored that matters so grave should be
undertaken through the mediation of persons of such doubtful worth. In those
days I did not know Morejón, but by my information
he had not been the inventor of gunpowder. As for Eguía, I must say
with my habitual frankness that he was one of the most poorly gifted
men I have seen in my life.
He still wore the pigtail that had made him so famous in 1814, and with the
pigtail the same bilious, despotic, fickle, and scolding temper. But
in Bayonne he did not inspire fear as he had in Madrid, and everyone laughed at him.
Nothing that has been said of the cunning pastry cook who came to
dominate him is exaggerated. I knew her, and I can attest that the agent of our
illustrious sovereign lamentably compromised his dignity, and even the
dignity of the crown, by placing such delicate business in the hands of that
infamous woman. She attended the conferences, administered
a large part of the funds, dealt directly with the partisans
who crossed the frontier day after day, and seemed in everything to be herself
the organizer of the uprising and the chief agent of our
beloved king.
Since then I have spent short seasons in Bayonne, and I have seen the
shameful conduct of certain Spaniards who conspire without cease
in that town, the true antechamber of our revolutions; but
I have never seen degradation and ineptitude like those of
Eguía's time. I wrote then to Don Víctor Sáez, who resided in Madrid, and
told him: "Congratulate the Freemasons, for as long as
His Majesty's salvation remains entrusted to the hands that
run the show here, they have reason to rejoice."
In the winter of that same year the predictions came true that I,
being unable to give him advice, had made to Eguía himself: having
summoned, by order of the king, other absolutist figures
to work in concert, they fell out with one another so badly that the affair
looked less like a junta than like the scattering of the peoples. Each thought
differently, and none yielded in his stubborn opinion. This variety
of views, and the obstinacy in upholding them, I call harnessing
minds a la calesera, that is to say, in the Spanish fashion. The Marquis of
Mataflorida[2] proposed the establishment of pure absolutism;
Balmaseda, commissioned by the French government to deal with this
matter, also favored despotism, though not to so
furious a degree; Morejón clung to the French Charter; Eguía upheld the
absolute veto and the two Chambers, despite not knowing what either
one was; and Saldaña, appointed as a kind of fifth party to the
dispute, could not decide between whole tyranny and tyranny by
halves.
[2] Known as Don Buenaventura in Memorias de un cortesano
and in La segunda casaca.