Wanxiang Che · Qilong Han
Hongzhi Wang · Weipeng Jing
Shaoliang Peng · Junyu Lin
Guanglu Sun · Xianhua Song
Hongtao Song · Zeguang Lu (Eds.)
Social Computing
Second International Conference
of Young Computer Scientists,
Engineers and Educators, ICYCSEE 2016
Harbin, China, August 20–22, 2016, Proceedings, Part I
Communications
in Computer and Information Science 623
Commenced Publication in 2007
Founding and Former Series Editors:
Alfredo Cuzzocrea, Dominik Ślęzak, and Xiaokang Yang
Editorial Board
Simone Diniz Junqueira Barbosa
Pontifical Catholic University of Rio de Janeiro (PUC-Rio),
Rio de Janeiro, Brazil
Phoebe Chen
La Trobe University, Melbourne, Australia
Xiaoyong Du
Renmin University of China, Beijing, China
Joaquim Filipe
Polytechnic Institute of Setúbal, Setúbal, Portugal
Orhun Kara
TÜBİTAK BİLGEM and Middle East Technical University, Ankara, Turkey
Igor Kotenko
St. Petersburg Institute for Informatics and Automation of the Russian
Academy of Sciences, St. Petersburg, Russia
Ting Liu
Harbin Institute of Technology (HIT), Harbin, China
Krishna M. Sivalingam
Indian Institute of Technology Madras, Chennai, India
Takashi Washio
Osaka University, Osaka, Japan
More information about this series at http://www.springer.com/series/7899
Editors
Wanxiang Che, Harbin Institute of Technology, Harbin, China
Qilong Han, Harbin Engineering University, Harbin, China
Hongzhi Wang, Harbin Institute of Technology, Harbin, China
Weipeng Jing, Northeast Forestry University, Harbin, China
Shaoliang Peng, National University of Defense Technology, Changsha, China
Junyu Lin, Harbin Engineering University, Harbin, China
Guanglu Sun, Harbin University of Science and Technology, Harbin, China
Xianhua Song, Harbin University of Science and Technology, Harbin, China
Hongtao Song, Harbin Engineering University, Harbin, China
Zeguang Lu, Harbin Sea of Clouds and Computer Technology, Harbin, China
As the general and program co-chairs of the Second International Conference of Young
Computer Scientists, Engineers and Educators 2016 (ICYCSEE 2016), it is our great
pleasure to welcome you to the proceedings of the conference, which was held in
Harbin, China, during August 20–22, 2016, hosted by Harbin Engineering University.
The goal of this conference is to provide a forum for young computer scientists,
engineers, and educators.
The call for papers of this year’s conference attracted 338 paper submissions. After
the hard work of the Program Committee, 91 papers were accepted to appear in the
conference proceedings, with an acceptance rate of 27%. The main theme of this
conference was “Social Computing.” The accepted papers cover a wide range of areas
related to social computing such as: science and foundations for social computing,
computation infrastructure for social computing, big data management analysis for
social computing, evaluation methodologies for social computing and social media,
intelligent computation for social computing, natural language processing techniques
and culture analysis in social computing and social media, mobile social computing and
social media, privacy and security in social computing and social media, public opinion
analysis for social media, social modeling, social network analysis, user-generated
content (wikis, blogs), and visualizing social interaction.
We would like to thank all the Program Committee members – 178 members from
84 institutes – for their hard work in completing the review tasks. Their collective
efforts made it possible to attain quality reviews for all the submissions within a few
weeks. Their diverse expertise in each individual research area helped us to create an
exciting program for the conference. Their comments and advice helped the authors to
improve the quality of their papers and gain deeper insights.
Our thanks also go to the authors and participants for their tremendous support in
making the conference a success. Moreover, we thank Dr. Lanlan Chang and Jian Li
from Springer, whose professional assistance was invaluable in the production of the
proceedings.
Besides the technical program, this year's ICYCSEE offered the participants a variety of additional experiences. We hope you enjoy the conference proceedings.
Organization

General Chairs
Qilong Han, Harbin Engineering University, China
Wanxiang Che, Harbin Institute of Technology, China

Program Chairs
Hongzhi Wang, Harbin Institute of Technology, China
Shaoliang Peng, National University of Defense Technology, China
Junyu Lin, Harbin Engineering University, China

Organization Chairs
Hongtao Song, Harbin Engineering University, China
Zeguang Lu, Sea of Clouds and Computer Technology Services Ltd., China

Publication Chairs
Guanglu Sun, Harbin University of Science and Technology, China
Zhaowen Qiu, Northeast Forestry University, China

Publication Co-chairs
Weipeng Jing, Northeast Forestry University, China
Xianhua Song, Harbin University of Science and Technology, China

Education Chairs
Yingtao Zhang, Harbin Institute of Technology, China
Zhongyang Han, Heilongjiang Institute of Technology, China

Industrial Chair
Jiquan Ma, Heilongjiang University, China

Demo Chairs
Changjian Zhou, Northeast Agricultural University, China
Qi Han, Harbin Institute of Technology, China

Panel Chairs
Haiwei Pan, Harbin Engineering University, China
Hui Gao, Harbin Huade University, China

Registration/Financial Chairs
Yong Wang, Harbin Engineering University, China
Fa Yue, Sea of Clouds and Computer Technology Services Ltd., China

Post/Expo Chair
Tingting Chen, SuperMap Software Co., Ltd.
Program Committee
Tian Bai, Jilin University, China
Zhifeng Bao, University of Tasmania, Australia
Jiajun Bu, Zhejiang University, China
Zhipeng Cai, Georgia State University, USA
Wanxiang Che, Harbin Institute of Technology, China
Xuebin Chen, Hebei United University, China
Wenliang Chen, Soochow University, China
Siyao Cheng, Harbin Institute of Technology, China
Dansong Cheng, Harbin Institute of Technology, China
Yuan Cheng, Harbin University of Science and Technology, China
Yan Chu, Harbin Engineering University, China
Lei Cui, Microsoft Research
Beiliang Cui, Nanjing Tech University, China
Bin Cui, Peking University, China
Jianrui Ding, Harbin Institute of Technology, China
Minghui Dong, Institute for Infocomm Research, Singapore
Xunli Fan, Northwest University, China
Chunxiang Fan, University of Ulm, Germany
Guangsheng Feng, Harbin Engineering University, China
Yansong Feng, University of Edinburgh, UK
Guohong Fu, Heilongjiang University, China
Hui Gao, Harbin Huade University, China
Shang Gao, Jilin University, China
Jing Gao, University at Buffalo, USA
Dianxuan Gong, North China University of Science and Technology, China
Yi Guan, Harbin Institute of Technology, China
Quanlong Guan, Jinan University, China
Yuhang Guo, Beijing Institute of Technology, China
Research Track
MapReduce for Big Data Analysis: Benefits, Limitations and Extensions (p. 453)
Yang Song, Hongzhi Wang, Jianzhong Li, and Hong Gao

Education Track

Industry Track

Demo Track
The BBC News Hunter: A Novel Crawler for BBC News (p. 217)
Mingxin Wang, Ning Wang, Boran Wang, Can Tian, Yanchun Liang, Guozhong Zhao, and Xiaosong Han
1 Introduction
The anaphoric zero pronoun *pro* corefers with a noun phrase NP1. Chinese zero pronouns have been studied at the syntactic level, with features extracted from syntactic trees [1]. In addition, a tree kernel method [3] has been proposed. These methods may not work well when the syntactic tree is simple, as in conversational text. Moreover, using syntactic features tends to generate inconsistent samples, because two sub-trees with the same structure may have opposite labels. Furthermore, lexical and semantic information is ignored. To overcome these shortcomings, we use distributed representations to build a context-aware model. Distributed representations, also called word embeddings, encode each word as a vector in which each dimension represents a latent feature of the word. Distributed representations are used in many NLP tasks and have proved good at capturing syntactic and semantic regularities in language [4].
Our motivation is to exploit the relation between the distributed representations of an AZP's antecedent and of the context following the AZP position. We propose a strategy for extracting key words from the context and from the candidate antecedents. Using the key words' distributed representations, we train a classifier to identify whether a candidate is the real antecedent of a given AZP. The experimental results show that our model outperforms previous supervised methods.
The rest of this paper is organized as follows. In Sect. 2, we review previous work. In Sect. 3, we describe how to use distributed representations to build our context-aware model. In Sect. 4, we introduce the baseline methods and report the experimental results. Finally, we conclude our work and outline future work.
2 Related Work
Chinese Zero Pronoun. Early methods for zero pronoun resolution are rule-based. Converse employs Hobbs' algorithm [6] and selects antecedents by syntactic structure in Chinese Treebank documents. Maximum entropy models have also been used to find the antecedents of overt, third-person pronouns [5]. Yeh and Chen [7] employ Centering Theory (Grosz et al. [8]) and constraint rules to identify the antecedents of zero anaphors using shallow parsing.
Zhao and Ng [1] propose a feature-based method, the first supervised machine learning approach: they extract a set of syntactic and positional features for the zero pronoun identification and resolution tasks, treating the two tasks as separate binary classification problems. Kong and Zhao [3] propose a tree-kernel method that resolves Chinese zero pronouns with appropriate syntactic parse tree structures, building a unified framework that handles zero pronoun identification, anaphoricity determination, and antecedent selection. Chen and Ng [9] extend Zhao and Ng's [1] feature set and exploit the coreference links between zero pronouns. Chen and Ng [10] propose an unsupervised approach using a ranking model and Integer Linear Programming: they assume that zero pronouns and overt pronouns have similar distributions and train an unsupervised model to resolve Chinese zero pronouns. Rao et al. [11] build a novel model that tracks the flow of focus in a discourse. Chen and Ng [12] propose an unsupervised probabilistic model for zero pronoun resolution.
3 A Context-Aware Model Using Distributed Representations
In this section, we show how to build a context-aware model to resolve Chinese zero pronouns. We define the task of antecedent selection as a binary classification problem. We employ a strategy to generate antecedent candidates and to create positive and negative samples. A heuristic method for extracting key words from the context is proposed, and an instance form is designed to transform the key words into representation features. Finally, we build a binary classifier to identify candidate antecedents.
Fig. 1. The parse tree that corresponds to the AZP example in Sect. 3.2
fit all sentence patterns. According to our data analysis, we find that the verbs and nouns close to the AZP tend to have stronger semantic links with the antecedent than other words do. We define m as the maximal number of key words extracted from the context after the AZP's position in a sentence. The extraction rules are as follows:
(1) Extract verbs and nouns sequentially from the context following the AZP.
(2) If the total number of key words does not exceed m, pad with the symbol '*' until the total equals m.
(3) If the total number of key words exceeds m, delete words from back to front until the total equals m; nouns have higher priority for deletion than verbs, and modifier nouns have higher priority than terminal nouns.
Additionally, because candidate antecedents differ in the number of words they contain, we define the maximal number of words per candidate as c and extract c words from every candidate. If a candidate contains more than c words, we keep the last c words; otherwise, we pad with the symbol '*' in front of the candidate until its length is c.
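The extraction and padding rules above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the POS tagset ('v' for verbs, 'n' for nouns) and the function names are our assumptions, and the modifier-vs-terminal noun priority in rule (3) is collapsed into "delete the last noun first".

```python
def extract_context_keywords(tagged_words, m, pad="*"):
    """Rules (1)-(3): up to m verb/noun key words from the context after the AZP."""
    # Rule (1): take verbs ('v') and nouns ('n') in order of appearance.
    keys = [(w, p) for w, p in tagged_words if p in ("v", "n")]
    # Rule (3): while too many, delete from back to front, nouns before verbs
    # (the modifier-vs-terminal noun distinction is omitted here).
    while len(keys) > m:
        noun_idxs = [i for i, (_, p) in enumerate(keys) if p == "n"]
        keys.pop(noun_idxs[-1] if noun_idxs else len(keys) - 1)
    words = [w for w, _ in keys]
    # Rule (2): pad with '*' until the total equals m.
    return words + [pad] * (m - len(words))

def extract_candidate_words(candidate_words, c, pad="*"):
    """Keep the last c words of a candidate antecedent; pad '*' in front."""
    kept = candidate_words[-c:]
    return [pad] * (c - len(kept)) + kept
```

Both routines always return fixed-length lists (m and c respectively), which is what allows every sample to be mapped to a fixed-dimensional feature vector later.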
In the OntoNotes 5.0 corpus, each AZP is annotated with its antecedent, so we can obtain the tag t of every sample easily. For this supervised task, we define the form of every sample as [t, a1, …, ac, w1, …, wm], where t is the tag of the sample, ai is the i-th key word of the AZP's candidate antecedent, and wi is the i-th key word from the AZP's following context. For example, in Fig. 1, with m = 6 and c = 3, we get the forms:
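A minimal illustration of what one such sample looks like with c = 3 and m = 6 (the tag value and words here are made up for illustration, not taken from Fig. 1):

```python
# One sample in the form [t, a1..a3, w1..w6] for a (candidate, AZP) pair.
sample = [1,                                      # t: 1 = candidate is the true antecedent
          "*", "进出口", "贸易",                    # a1..a3: candidate words, '*'-padded in front
          "占", "上升", "*", "*", "*", "*"]         # w1..w6: context key words, '*'-padded at the end
assert len(sample) == 1 + 3 + 6                   # fixed length: tag + c + m
```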
In the example above, the two words 进出口 (import and export) and 贸易 (trade) from the antecedent are closely related, at the semantic level, to the word 进出口 (import and export) from the context. By contrast, the words 中国 (China) and 产品 (products) have little relation to the key words in the AZP's following context, such as 占 (represents) and 上升 (increasing).
$$P(w_i \mid w_j) = \frac{\exp\left(u_{w_i}^{\top} v_{w_j}\right)}{\sum_{w=1}^{W} \exp\left(u_{w}^{\top} v_{w_j}\right)} \qquad (1)$$
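Equation (1) is the standard skip-gram softmax used to train word embeddings. The following toy sketch (made-up vocabulary size and random vectors, purely illustrative) just checks that it defines a proper probability distribution over the vocabulary:

```python
import numpy as np

# Toy check of Eq. (1), the skip-gram softmax P(w_i | w_j).
rng = np.random.default_rng(0)
W, d = 5, 3                       # vocabulary size and embedding dimension (made up)
u = rng.normal(size=(W, d))       # "output" vectors u_w
v = rng.normal(size=(W, d))       # "input" vectors v_w

def skipgram_prob(i, j):
    scores = u @ v[j]                      # u_w^T v_{w_j} for every word w
    probs = np.exp(scores - scores.max())  # numerically stabilised exponentials
    probs /= probs.sum()                   # normalise over the whole vocabulary
    return probs[i]

# The probabilities over the whole vocabulary sum to 1.
assert abs(sum(skipgram_prob(i, 2) for i in range(W)) - 1.0) < 1e-9
```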
We define the dimension of every word embedding as d. All the words in a sample are replaced by their distributed representations, so every sample is transformed into a feature vector with (c + m) * d dimensions. The padding symbol '*' is replaced by the d-dimensional zero vector. We use an SVM model to train a binary classifier, which identifies whether a candidate is the AZP's antecedent.
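A minimal sketch of this classification step, assuming a word-to-vector lookup (`embeddings`) is available and using scikit-learn's linear-kernel SVC as a stand-in for whatever SVM implementation the authors used:

```python
import numpy as np
from sklearn.svm import SVC

def sample_to_vector(words, embeddings, d):
    """Concatenate the d-dim embeddings of the (c + m) key words.

    The padding symbol '*' maps to the d-dimensional zero vector; unknown
    words also fall back to the zero vector (our assumption).
    """
    parts = [np.zeros(d) if w == "*" else embeddings.get(w, np.zeros(d))
             for w in words]
    return np.concatenate(parts)   # shape: ((c + m) * d,)

def train_classifier(samples, embeddings, d):
    """samples: list of (tag, [a1..ac, w1..wm]) pairs; returns a fitted SVM."""
    X = np.stack([sample_to_vector(ws, embeddings, d) for _, ws in samples])
    y = np.array([t for t, _ in samples])
    return SVC(kernel="linear").fit(X, y)
```

At prediction time, each candidate antecedent for an AZP is mapped through `sample_to_vector` and the classifier's output indicates whether it is taken as the antecedent.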
8 B. Wu and T. Zhao
4 Experiment
4.1 Dataset
We employ the Chinese portion of the OntoNotes 5.0 corpus, which was used in the official CoNLL-2012 shared task [21]. Because only the training set and the development set of the CoNLL-2012 shared dataset contain ZP coreferential annotation, we use the training set for model training and the development set for model evaluation. In the OntoNotes 5.0 corpus, a ZP is tagged as *pro*, and all ZPs that have explicit coreferential annotation are regarded as anaphoric ZPs. The Chinese portion of OntoNotes 5.0 contains six source genres: Broadcast News (BN), Newswire (NW), Broadcast Conversation (BC), Magazine (MZ), Telephone Conversation (TC), and Web Blog (WB).
Baseline Systems. We compare against three baseline systems, all supervised machine learning models. (1) Zhao and Ng [1] propose a method for generating candidate antecedents and design 26 features over the zero pronoun and the candidate; Soon et al.'s method [22] is used to create positive and negative instances. These features are listed in Table 1 [10]. (2) Kong and Zhao [3] use a tree-kernel method to resolve Chinese zero pronouns; they use Z&N's method to generate candidate antecedents and Soon et al.'s method to create instances. (3) Chen and Ng [9] extend Z&N's method and add features capturing the contextual information between candidate antecedents and ZPs; they also create coreference links to resolve far-away antecedents for ZPs.
Table 1. Features for AZP resolution in the Zhao and Ng baseline system. z is a zero pronoun, a is a candidate antecedent of z, and V is the VP node following z in the parse tree.

Features between a and z (4): the sentence distance between a and z; the segment distance between a and z, where segments are separated by punctuation; whether a is the closest NP to z; whether a and z are siblings in the associated parse tree.

Features on a (12): whether a has an ancestor NP, and if so, whether this NP is a descendant of a's lowest ancestor IP; whether a has an ancestor VP, and if so, whether this VP is a descendant of a's lowest ancestor IP; whether a has an ancestor CP; the grammatical role of a; the clause type in which a appears; whether a is an adverbial NP, a temporal NP, a pronoun, or a named entity; whether a is in the headline text.

Features on z (10): whether V has an ancestor NP, and if so, whether this NP node is a descendant of V's lowest ancestor IP; whether V has an ancestor VP, and if so, whether this VP is a descendant of V's lowest ancestor IP; whether V has an ancestor CP; the grammatical role of z; the type of the clause in which V appears; whether z is the first or last ZP of the sentence; whether z is in the headline of the text.
The baseline systems above share the same strategy for generating candidate antecedents and instances, but use different ways to classify whether a candidate antecedent is coreferent with an AZP. All three methods train the binary classifier with an SVM: Z&N and C&N use the SVMlight tool, while K&Z uses SVMlight-TK. Our linear model uses the same data and the same strategies for generating candidate antecedents and instances as the baseline methods. Therefore, we use Chen and Ng's [10] experimental results for the baselines as the comparative data in Table 2.