Data Science Analytics and Applications Proceedings of The 1st International Data Science Conference iDSC2017 1st Edition Peter Haber
Peter Haber
Thomas Lampoltshammer
Manfred Mayr Eds.
Data Science –
Analytics
and Applications
Proceedings of the 1st International Data
Science Conference – iDSC2017
Thomas Lampoltshammer
Department for E-Governance in Business and Administration (Department für E-Governance in Wirtschaft und Verwaltung)
Donau-Universität Krems (Danube University Krems), Krems an der Donau, Austria
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic
data are available on the Internet at http://dnb.d-nb.de.
Springer Vieweg
© Springer Fachmedien Wiesbaden GmbH 2017
This work, including all of its parts, is protected by copyright. Any use that is not expressly permitted
by copyright law requires the prior consent of the publisher. This applies in particular to
reproductions, adaptations, translations, microfilming, and storage and processing in
electronic systems.
The use of general descriptive names, trade names, trademarks, etc. in this work does not imply, even without
specific identification, that such names are to be regarded as free under trademark protection
legislation and may therefore be used by anyone.
The publisher, the authors and the editors assume that the statements and information in this work are
complete and correct at the time of publication. Neither the publisher nor the authors or the editors
give any warranty, express or implied, for the content of the work or for any errors or statements. The publisher remains
neutral with regard to geographical designations and territorial names in published maps and institutional
addresses.
It is with deep satisfaction that we write this foreword for the Proceedings of the 1st International Data
Science Conference (iDSC) held in Salzburg, Austria, June 12th - 13th 2017. The conference program
and the resulting proceedings represent the efforts of many people. We want to express our gratitude
towards the members of our program committee as well as towards our external reviewers for their hard
work during the reviewing process.
iDSC proved itself an innovative conference that gave its participants the opportunity to delve
into state-of-the-art research and best practice in the fields of Data Science and data-driven business
concepts. Our research track offered a series of presentations by Data Science researchers regarding
their current work in the fields of Data Mining, Machine Learning, Data Management, and the entire
spectrum of Data Science.
In our industry track, practitioners demonstrated showcases of data-driven business concepts and how
they use Data Science to achieve organisational goals, with a focus on manufacturing, retail, and
financial services. Within each of these areas, experts described their experience, demonstrated their
practical solutions, and provided an outlook into the future of Data Science in the business domain.
Besides these two parallel tracks, a European symposium on Text and Data Mining was integrated
into the conference. This symposium highlighted the EU project FutureTDM, granting insights into the
future of Text and Data Mining, and introducing overarching policy recommendations and sector-specific
guidelines to help stakeholders overcome the legal and technical barriers, as well as the lack of
skills, that have been identified.
Our sponsors had their own dedicated platform: workshops that offered hands-on interaction with tools
and introduced approaches towards concrete solutions. In addition, an exhibition of products and services
offered by our sponsors took place throughout the conference, giving our participants the opportunity
to seek contact and advice.
Completing the picture of our program, we proudly presented keynote presentations from leaders in
Data Science and data-driven business, both researchers and practitioners. These keynotes provided all
participants the opportunity to come together and share views on challenges and trends in Data Science.
In addition to the contributed papers, five invited keynote presentations were given by: Euro Beinat (CS
Research, Salzburg University), Mario Meir-Huber (Microsoft Austria), Mike Olson (Cloudera), Ralf
Klinkenberg (RapidMiner) and Janek Strycharz (Digital Center Poland). We thank the invited speakers
for sharing their insights with our community.
Conference chair John Thompson also helped us in many ways in setting up the industry track, for
which we are grateful. We would especially like to thank our two colleagues, Astrid Karnutsch and
Maximilian Tschuchnig, for their enormous and constructive commitment to organizing and conducting
the conference. The paper submission and reviewing process was managed using the EasyChair system.
These proceedings will provide scientists and practitioners with an excellent reference to current
activities in the Data Science domain. We also trust that they will be an impetus to stimulate further
studies, research activities and applications in all the areas discussed, an aim supported by our publisher
Springer Vieweg, Wiesbaden, Germany.
Finally, the conference would not have been possible without the excellent papers contributed by our
authors. We thank them for their contributions and their participation at iDSC'17.
FutureTDM is a European project focusing on reducing barriers and increasing uptake of Text and
Data Mining (TDM) for research environments in Europe. The outcomes of the project were
presented at the Symposium, which also served to connect key actors and interest groups and
promote open dialogue via discussion panels and informal workshops. The FTDM Symposium
was scheduled alongside iDSC 2017, given that both events address similar target groups and
share a common perspective: they both aimed at creating a communication network among the
members of the TDM community, where experts can exchange ideas and share the most up-to-date
research results, as well as legal and industrial advances relevant to TDM. The audience targeted
by the iDSC conference was the broad community of researchers and industry practitioners as well
as other practitioners and stakeholders, making it ideal for disseminating the project’s results.
The project’s objective has been to detect the barriers to TDM, reveal best practices and put
together sets of recommendations for TDM practitioners through a collaborative knowledge and
open information approach. The barriers recorded were grouped around four pillars: a) legal, b)
economic, c) skills, d) technical. These categories emerged after discussions with respective
stakeholders such as researchers, developers, publishers and SMEs during Knowledge Cafés run
across Europe (the Netherlands, the United Kingdom, Italy, Slovenia, Germany, Poland, etc.) and
two workshops held in Brussels¹ (on 27 September 2016 and 29 March 2017).
The Symposium² was a chance to invite experts from all over Europe to share their experience and
expertise in different domains. It was also a great opportunity to announce the guidelines and
recommendations formulated in order to increase TDM uptake. It started with a brief introduction
by Bernhard Jäger (SYNYO)³ underlining the need to bring together different groups of
stakeholders, such as policy makers and legislators, developers and users, who would benefit from
the project’s findings and the respective recommendations formed by the FTDM working groups.
It continued with a keynote speech by Janek Strycharz (Projekt Polska Foundation) dedicated to
the Economic Potential of Data Analytics. He elaborated on the different types of Big
Data and the variety of possibilities they offer, and explained how Big Data and TDM could bring
benefits at both the global and the European scale (European GDP alone would be
increased by USD 200 billion).
¹ FutureTDM Workshop I and II outcomes can be found at http://www.futuretdm.eu/knowledge-cafes/futuretdm-workshop/ and http://www.futuretdm.eu/knowledge-cafes/futuretdm-workshop-2/
² All presentation slides are available online at www.slideshare.net/FutureTDM/presentations
³ The presentation "Introduction to the FutureTDM project" is available at https://www.slideshare.net/FutureTDM/introduction-to-the-future-tdm-project
VIII Future TDM
The first session entitled “Data Analytics and the Legal Landscape: Intellectual Property and
Data Protection” included Freyja van den Boom, researcher from Open Knowledge
International/Content Mine who presented the legal barriers identified and the respective
recommendations created under the subject "Dealing with the legal bumps on the road to further
TDM uptake". The focus of the presentation was on the principles identified to counterbalance
barriers: Awareness and Clarity, TDM Without Boundaries, and Equitable Access. The session
was chaired by Ben White (Head of Intellectual Property at the British Library) and included the
following panelists: i) Duncan Campbell (John Wiley & Sons, Inc.), representing the publisher’s
perspective, ii) Prodromos Tsiavos (Onassis Cultural Centre/IP Advisor), providing an
organization’s point of view, iii) Marie Timmermann (Science Europe), offering her point of view
as the EU Legislation and Regulatory Affairs Officer and iv) Romy Sigl (AustrianStartups), sharing
her experience from start-ups. The discussion revolved around regulations which must address the
implementation of the law and its exceptions, copyright issues, the distinction between commercial
and noncommercial activities, the need for better communication between different groups of
stakeholders and the importance and value of TDM for publishers.
During the following session the projects ContentMine (Stefan Kasberger), PLAZI (Donat Agosti),
CORE (Petr Knoth), RapidMiner (Ralf Klinkenberg), clarin:el (Maria Gavrilidou) and ALCIDE
(Alessio Palmero Aprosio) were introduced, and the presenters were available afterwards to present
their work in more detail to interested attendees. The
researchers shared their experience of the technical and legal problems they had encountered and
demonstrated the TDM applications and infrastructures they had created.
The next session offered an overview of FTDM case studies from Startups to Multinationals.
The presentation entitled "Stakeholder consultations - The Highlights" was given by Freyja van
den Boom (Open Knowledge International/Content Mine) who talked about the findings from
continuous stakeholder consultations throughout the project. The session was chaired by Maria
Eskevich (Radboud University) and included as panelists Donat Agosti (PLAZI), Petr Knoth
(CORE), Kim Nilsson (PIVIGO), and Peter Murray-Rust (ContentMine). The issues raised during
discussion pinpointed the need for realistic solutions to infrastructures, community engagement,
and open source and data.
Kiera McNeice (British Library) was the presenter in the fourth session and her presentation was
entitled "Supporting TDM in the Education Sector". The session, focusing on “Universities, TDM
and the need for strategic thinking on educating researchers”, was chaired by Ben White (Head
of Intellectual Property at the British Library) and included as panelists Claire Sewell (Cambridge University
Library), Jonas Holm (Stockholm University Library), and Kim Nilsson (PIVIGO). The discussion
which followed touched upon issues such as the future of Data Science and the nature of Data
Scientists. Some of the key concepts which were discussed were that of inclusion and diversity,
gender imbalance and nationality characteristics, which all affect access to Data Science and the
ability to become a Data Scientist. Concerns were expressed as to whether anyone could become a
Data Scientist, and whether the focus should be on becoming a Data Scientist or a more efficient
TDM user.
The challenges and solutions regarding technologies and infrastructures supporting Text and
Data Analytics were the topic of the fifth session, the main presenter of which was Maria Eskevich
(Radboud University). She focused on "The TDM Landscape: Infrastructure and Technical
Implementation" and touched upon the business and scientific perspectives on TDM by showing
the investment made by the EU in the five economic sectors. She also talked about the
barriers and challenges encountered in terms of accessibility and interoperability of infrastructures,
sustainability of data and digital readiness of language resources. The following discussion,
chaired by Stelios Piperidis (ARC) with Mihai Lupu (Data Market Austria), Maria Gavrilidou
(clarin:el) and Nelson Silva (Know-Center), revolved around real TDM problems and the solutions
the researchers came up with, and closed with the requirements of an effective TDM infrastructure.
The final session of the Symposium was dedicated to the Next Steps: A Roadmap to promoting
greater uptake of Data Analytics in Europe. A presentation was made by Kiera McNeice
(British Library) who briefly summarised what the project has achieved so far and focussed on the
key principles from the FutureTDM Policy Framework⁴ which must underlie all the efforts to be
made in the future in Legal Policies, Skills and Education, Economy and Incentives and Technical
and Infrastructure.
The Symposium closed with a presentation by Bernhard Jäger and Burcu Akinci (SYNYO) of the
FutureTDM platform (http://www.futuretdm.eu/), which is populated with the project outcomes
and findings. The platform will continue to exist after the end of the project and will be
continuously revised and updated in order to maintain a coherent and up-to-date view of the TDM
landscape that is open to the public.
Kornella Pouli
Athena RIC/ILSP, Athens
Burcu Akinci
SYNYO GmbH, Vienna
⁴ http://www.futuretdm.eu/policy-framework/
Organisation
Organising Institutions
Salzburg University of Applied Sciences
Information Professionals GmbH
Conference Chairs
Peter Haber Salzburg University of Applied Sciences
Thomas J. Lampoltshammer Danube University Krems
Manfred Mayr Salzburg University of Applied Sciences
John A. Thompson Information Professionals GmbH
Organising Committee
Peter Haber Salzburg University of Applied Sciences
Astrid Karnutsch Salzburg University of Applied Sciences
Thomas J. Lampoltshammer Danube University Krems
Manfred Mayr Salzburg University of Applied Sciences
John A. Thompson Information Professionals GmbH
Susanne Schnitzer Information Professionals GmbH
Maximilian E. Tschuchnig Salzburg University of Applied Sciences
Program Committee
David C. Anastasiu San Jose State University
Vera Andrejcenko University of Antwerp
Christian Bauckhage University of Bonn
Markus Breunig Rosenheim University of Applied Sciences
Stefanie Cox IT Innovation Centre
Werner Dubitzky University of Ulster, Coleraine
Günther Eibl Salzburg University of Applied Sciences
Süleyman Eken University Kocaeli
Karl Entacher Salzburg University of Applied Sciences
Edison Pignaton de Freitas Federal University of Rio Grande do Sul
Bernhard Geissler Danube University Krems
Charlotte Gerritsen Netherlands Institute for the Study of Crime and Law
Enforcement (NSCR)
Mohammad Ghoniem Luxembourg Institute of Science and Technology
Peter Haber Salzburg University of Applied Sciences
Johann Höchtl Danube University Krems
Martin Kaltenböck Semantic Web Company
Astrid Karnutsch Salzburg University of Applied Sciences
Elmar Kiesling Vienna University of Technology
Robert Krimmer University of Tallinn
Peer Kröger Ludwig-Maximilians-Universität München
Thomas J. Lampoltshammer Danube University Krems
Michael Leitner Louisiana State University
Giuseppe Manco University of Calabria
Manfred Mayr Salzburg University of Applied Sciences
Mark-David McLaughlin Bentley University
Robert Merz Salzburg University of Applied Sciences
Elena Lloret Pastor University of Alicante
Cody Ryan Peeples Cisco
Gabriela Viale Pereira Fundação Getúlio Vargas – EAESP
Peter Ranacher University of Zurich
Siegfried Reich Salzburg Research Forschungsgesellschaft mbH
Eric Rozier Iowa State University
Johannes Scholz Graz University of Technology
Maximilian E. Tschuchnig Salzburg University of Applied Sciences
Jürgen Umbrich Vienna University of Economics and Business
Andreas Unterweger Salzburg University of Applied Sciences
Eveline Wandl-Vogt Austrian Academy of Sciences
Stefan Wegenkittl Salzburg University of Applied Sciences
Stefanie Wiegand IT Innovation Centre / University of Southampton
Peter Wild Austrian Institute of Technology
Radboud Winkels University of Amsterdam
Anneke Zuiderwijk - van Eijk Delft University of Technology
Reviewer
David C. Anastasiu San Jose State University
Christian Bauckhage University of Bonn
Markus Breunig Rosenheim University of Applied Sciences
Cornelia Ferner Salzburg University of Applied Sciences
Werner Dubitzky University of Ulster, Coleraine
Günther Eibl Salzburg University of Applied Sciences
Karl Entacher Salzburg University of Applied Sciences
Bernhard Geissler Danube University Krems
Johann Höchtl Danube University Krems
Martin Kaltenböck Semantic Web Company
Peer Kröger Ludwig-Maximilians-Universität München
Thomas J. Lampoltshammer Danube University Krems
Michael Leitner Louisiana State University
Elena Lloret Pastor University of Alicante
Manfred Mayr Salzburg University of Applied Sciences
Robert Merz Salzburg University of Applied Sciences
Edison Pignaton de Freitas Federal University of Rio Grande do Sul
Siegfried Reich Salzburg Research Forschungsgesellschaft mbH
Eric Rozier Iowa State University
Johannes Scholz Graz University of Technology
Maximilian E. Tschuchnig Salzburg University of Applied Sciences
Jürgen Umbrich Vienna University of Economics and Business
Andreas Unterweger Salzburg University of Applied Sciences
Stefan Wegenkittl Salzburg University of Applied Sciences
Sponsors of the conference
Platinum Sponsors
Cloudera GmbH
Apache Hadoop-based software, support
and services, and training
www.cloudera.com
Silver Sponsors
F&F GmbH
IT consulting, solutions and Big Data Analytics
www.ff-muenchen.de
RapidMiner GmbH
Data science software platform for data
preparation, machine learning, deep learning,
text mining, and predictive analytics
www.rapidminer.com
Circadian Cycles and Work Under Pressure: A Stochastic Process Model for E-learning
Population Dynamics
Internet analytics techniques, designed to quantify patterns of Internet use, allow a
deeper understanding of human behavior. Recent models of human behavioral dynamics
have shown that, in contrast to randomly distributed events, people engage in activities that
exhibit bursty behavior. Participation in online courses in particular often shows periods
of inactivity and procrastination followed by frequent visits shortly before examinations. Here
we propose a stochastic process model that characterizes such patterns and
incorporates the circadian rhythms of human activity. We evaluate our model on real
data collected over a period of two years on a platform for university courses.
We then propose a dynamic model that accounts for both
periods of procrastination and periods of working under time pressure. Since
circadian rhythms and procrastination-pressure cycles are essential to human behavior,
our method can be extended to other activities, such as the analysis of
browsing habits and customer purchasing behavior.
Towards German Word Embeddings: A Use Case with Predictive Sentiment Analysis
Despite the research boom of recent years in word embeddings and their text mining applications,
the majority of publications focus exclusively on the English language.
Moreover, hyperparameter tuning is a process that is rarely well documented (especially for
non-English texts), yet it is very important for obtaining high-quality word representations. In
this work we show how different hyperparameter combinations influence the
resulting German word vectors and how these word representations can be part of a more complex
model. Specifically, we first carry out an intrinsic evaluation of our German
word embeddings, which are later used in a predictive sentiment analysis model.
The latter not only serves as an extrinsic evaluation of the German word embeddings,
but also shows whether customer preferences can be predicted from document embeddings
alone.
German Abstracts 5
Feature Extraction and Large Activity-Set Recognition Using Mobile Phone Sensors
This work addresses the problem of activity recognition using data
collected from the user's mobile phone. We begin by reviewing and assessing
the limitations of common activity recognition approaches for mobile phones. We then present
our approach for recognizing a large number of activities, which covers most
user activities. In addition, different environments are supported, such as
at home, at work and on the move. Our approach proposes a single-level classification model that
classifies activities accurately, covers a large number of activities comprehensively, and is feasible
to apply in real-world environments. No single approach in the literature combines all three
properties. Existing approaches usually optimize their models for
one or at most two of the following properties: accuracy, coverage and applicability.
Our results show that our approach achieves sufficient performance in terms of accuracy on
a realistic data set, despite a significantly larger number of activities compared to common
activity recognition models.
[Figure 1: (a) distribution of proxies over a semester (x-axis: Semester Dates, Oct to Mar, labeled 2013); (b) average distribution of visits per day (x-axis: time of day, 0:00 to 21:00).]
Fig. 1. Example of the temporal distribution of several proxies for the activity on a course related site in an eLearning system.
and determine $\theta_{pr}$ as those parameters that minimize the following error

\[ D(t, \theta_{pr}) = \sum_{\tau=1}^{T_d} \left( V_{dd}(\tau) - \hat{V}_{dd}(\tau)[\theta_{pr}] \right)^2 \qquad (9) \]

To minimize this expression, we resort to the Levenberg-Marquardt algorithm and let $\tau = 1, \ldots, \tau_d$ vary over the whole number of days of the semester. The fluctuating nature of the distribution of visits is accounted for via a Gaussian noise assumption with variance $\sigma_n^2$.

D. Piecewise PR for Course Workload

Although some courses will have the characteristics required by the continuous model in (7), most of the behavior related to a course will be characterized by rapid shocks occurring on days which define a deadline of some sort (examination or course work). This behavior, however, can be easily incorporated into (7) by requiring a high value of $\alpha$. This will produce a high visit rate on only a few days around $\beta$, up to the deadline.

To fully specify a semester we then need to define one such shock for each deadline and therefore assume

\[ V_d(t) = T_a - (T_a - V_0)\, e^{-\left( \frac{t}{t_a - \delta} \right)^{\alpha}} \qquad (10) \]

if $t_{a-1} < t < t_a$, which provides a piecewise approximation of our model, i.e. one solution of (5) for each course deadline, where $t_0 = 0$ and $t_a$ defines the point in time of deadline $a \in \{0, 1, \ldots, M\}$, where $M$ is the number of deadlines in a course. For simplicity, we assume that all deadlines are separated by $\delta$ days from the day at which students begin to react (reaction width). Finally, we let $T_a$ be the number of visits of the population at that deadline. Since we do not know in advance how many visits will happen for a given deadline, we model shocks in terms of random variables. From our empirical data we found via Kolmogorov-Smirnov tests that the distribution that best fits our model is the gamma distribution

\[ \mathrm{Gamma}(T \mid \rho, \nu) = \frac{\rho^{\nu}}{\Gamma(\nu)}\, T^{\nu - 1} e^{-\rho T} \qquad (11) \]

This is again very much in line with known models of human attention [9].

Finally, we observe that this kind of assumption, in which we define the intensity function of an inhomogeneous Poisson process through another stochastic process, is known as a doubly stochastic Poisson process or as a Cox process [13].

E. Cascade Rates

In earlier related work [5], it has been pointed out that short term behaviors of user populations can be modeled via a Poisson process in which another rate is imposed after the initial visit. Since we do not know which of the visits will generate a cascade of activities, we define a variable $p_c$ as the proportion of initial Poisson events which give rise to a cascade. Furthermore, $\lambda_c$ defines the rate of the uniform Poisson process which characterizes the Poisson distribution.

Algorithm 1 Generating a Cox Process
Require: Semester $\tau_s$, Distribution Parameters $\theta_{pw}$, $E$
$\{\hat{t}_i\}_{i=1}^{E} \sim \mathrm{Uniform}(\tau_s)$
$T_a \sim P(T \mid \theta)$
$U \leftarrow \emptyset$
for $i \leftarrow 1, \ldots, E$ do
    $u_i \sim \mathrm{Uniform}(0, 1)$
    $r_i \sim V(t_i \mid T_a)$
    if $u_i < r_i$ then
        $U \leftarrow U \cup \hat{t}_i$
    end if
end for
return $U$

F. PPRC

Summarizing all of the above, we refer to our full model as the piecewise, procrastination reaction model (PPRC). The complete set of parameters of our model is given by

\[ \theta_{pw} = \left\{ \delta, V_0, \sigma_0, \nu, \rho, P_d(t), \lambda_c, p_c \right\} \qquad (12) \]

and characterizes the main aspects of human behavior for the inter-event time distribution. Circadian characteristics are captured by $P_d(t)$, the reaction of the population towards a given task is parametrized by $\nu$, $\rho$, and $\delta$, baseline behaviors are expressed via $V_0$ and $\sigma_0$, and short term behaviors via $\lambda_c$ and $p_c$.

III. METHODOLOGY

Next, we outline the procedure we use for model fitting. First we show how to obtain the procrastination reaction parameters and then introduce an algorithm for simulating a Cox process. Finally, we discuss a training procedure based on simulated annealing which allows us to obtain the parameters $\theta_{pw}$ of the PPRC as defined by (12).

A. Thinning Algorithm

In order to simulate and generate data from our model, we proceed via a modification of the rejection sampling algorithm for point data, known as thinning.

Given our overall collection of observed data, we intend to generate a set of time stamps $\{t_i\}$, in seconds, ranging from 0 to $\tau_s$, the total number of seconds in one semester. Traditionally, inhomogeneous Poisson process generation [14], [15] requires us to sample from a uniform Poisson distribution via a maximum intensity $\lambda^*$, since an inhomogeneous Poisson process with intensity $\lambda(s)$ requires its number of events to be distributed via $N(S) = \int \lambda(s)\, ds$.

In our case, however, we do not know the contribution of cascades to the distribution of the number of events, so we directly generate $E$ events from a uniform distribution over the interval $(0, \tau_s)$. We then generate a gamma distributed sample $T_a$ (see again (11)) and random noise from $N(0, \sigma_n)$. This then allows us to create the stochastic intensity function of the PPRC model.
1.0 0.6
0.8
0.6 0.4
0.4 0.2
0.2
0.0 0.0
0 100 200 300 400 500 600 700 800 0 2 4 6 8 10 12 14
# Iterations Log Inter Event Time (s)
(a) Area test statistic obtained from iterating a simulated (b) Logaarithm of the cumulative distribution of the inter-
annealing procedure for diffferent
ferent temperature values. event tim
mes
Fig. 3. Exemplary results of the behavior of model in training and empirical data fitting.
B. Tr
Training
Given a sample of empirical point visits {te }, we next
wish to estimate the parameters of our model θpw which
best reproduce the data. The daily behavior of the model Fig. 4. Real distribution of visits over a period three weeks before an
as established by Pd (t) can be obtained directly from the examination deaddline related to one of the courses in our data set (low wer
panel), and simulaated point process of visits using the Cox process discusssed
histogram of the hours of {te }. To
To obtain the PPRC parameters in the text. Note that
t idle periods reflect reduced activities over night.
we first need to obtain the daily visits by a histogram of the
point data for each day.
We then obtain the peak values corresponding to the relevant
We we use simulaated
t d annealing
li [16] by b random
d di l
displacement t on
course work by normalizing the daily visits histogram and consecutively choosing the biggest peaks of the distribution (located at {ta} with values {Ta}) until the cumulative distribution possesses a standard deviation bigger than 0.25, up to a maximum of 24 different peaks (which would correspond to the 24 weeks per semester and a maximum of one homework per week).

We obtain the parameters of the gamma distribution using maximum likelihood estimation from the values of the {Ta} found by this procedure. In order to train the remaining parameters of our model, we require that the inter-event distribution PM(u|θpw), as sampled via the numerical Cox algorithm, reproduces the empirical cumulative distribution PD(u). The objective function we consider for minimization is given by the area test statistic

$$A = \int \left| P_D(u) - P_M(u \mid \theta_{pw}) \right| \, du$$

where we choose u = log(t) for numerical convenience. We thus obtain the cumulative model distribution as a sample statistic from our model. As such, for given values of the parameters, different samples will generate different values of the area statistic. In order to minimize the stochastic surface defined by the parameters, [...] the space defined by the variables (V0, σ0, δ, pc, λc, E). As an example of the results of this model fitting procedure, we show the outcome w.r.t. a computer science course in Fig. 3, where we compare the performance of our model to a Pareto and an exponential distribution chosen as baseline models.

IV. RESULTS

In a series of practical experiments, we trained our models for time marks of 20 different courses. In each case, we initialized the model parameters to E = 4000, λr = 1500, pc = 0.7, V0 = 5, σ0 = 1, δ = 0.1 and used a total of 40 000 iterations and an annealing temperature of 0.2. After training, we obtained an average Kolmogorov-Smirnov (KS) divergence statistic of 0.003 ± 0.01 for the whole data set as well as a cascade rate of 1930 ± 691 s, i.e. an average cascade rate of about 30 minutes. Finally, the reaction width δ found in our data was 2.32 ± 1.6 days before the deadline.

Table IV presents goodness-of-fit statistics for a random selection of highly attended courses. Overall, the Pairwise Procrastination Reaction Cascade model proposed in this paper was found to fit (almost surprisingly) well to our empirical
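The area test statistic A described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names `ecdf` and `area_statistic` and the `grid_size` parameter are assumptions introduced here, and `model_times` merely stands in for a sample drawn from the model (the numerical Cox sampling step itself is not reproduced).

```python
import bisect
import math


def ecdf(sorted_values, x):
    """Empirical CDF of a pre-sorted sample, evaluated at x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def area_statistic(data_times, model_times, grid_size=512):
    """Approximate A = integral of |P_D(u) - P_M(u)| du with u = log(t).

    Both inputs are positive inter-event times. The integral is a
    trapezoidal sum on a uniform grid in u (grid_size is an assumption).
    """
    u_data = sorted(math.log(t) for t in data_times)
    u_model = sorted(math.log(t) for t in model_times)
    lo = min(u_data[0], u_model[0])
    hi = max(u_data[-1], u_model[-1])
    if hi == lo:
        return 0.0
    step = (hi - lo) / (grid_size - 1)
    gaps = [
        abs(ecdf(u_data, lo + k * step) - ecdf(u_model, lo + k * step))
        for k in range(grid_size)
    ]
    # Trapezoidal rule over the uniform grid.
    return step * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))


# Illustrative data only: the second sample is the first one scaled by 10,
# so the two CDFs are shifted by log(10) on the u axis.
sample_a = [1.0, 2.0, 4.0]
sample_b = [10.0, 20.0, 40.0]
area_ab = area_statistic(sample_a, sample_b)
```

Because the model CDF is itself estimated from a random sample, repeated calls with fresh model samples would return slightly different values of A, which is the stochastic surface the text refers to.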
18 C. Ojeda, R. Sifa and C. Bauckhage
REFERENCES
[1] A. L. Barabasi, “The origin of bursts and heavy tails in human dynam-
ics,” Nature, vol. 435, no. 7039, pp. 207–211, 2005.
[2] F. Wu and B. Huberman, “Novelty and collective attention,” PNAS, vol.
104, no. 45, pp. 17 599–17 601, 2007.
[3] R. Sifa, F. Hadiji, J. Runge, A. Drachen, K. Kersting, and C. Bauckhage,
“Predicting Purchase Decisions in Mobile Free-to-Play Games,” in Proc.
of AAAI AIIDE, 2015.
[4] C. Ojeda, K. Cvejoski, R. Sifa, and C. Bauckhage, “Variable Attention
and Variable Noise: Forecasting User Activity,” in Proc. of LWDA
KDML, 2016.
Investigating and Forecasting User Activities in
Newsblogs: A Study of Seasonality, Volatility and
Attention Burst
César Ojeda∗, Rafet Sifa∗† and Christian Bauckhage∗†
∗Fraunhofer IAIS, St. Augustin, Germany
†University of Bonn
Abstract—The study of collective attention is a major topic in the area of Web science as we are interested to know how a particular news topic or meme is gaining or losing popularity over time. Recent research focused on developing methods which quantify the success and popularity of topics and studied their dynamics over time. Yet, the aggregate behavior of users across content creation platforms has been largely ignored even though the popularity of news items is also linked to the way users interact with the Web platforms. In this paper, we present a novel framework of research which studies the shifts of attention of a population across newsblogs. We concentrate on the commenting behavior of users for news articles, which serves as a proxy for attention to Web content. We make use of methods from signal processing and econometrics to uncover patterns in the behavior of users which then allow us to simulate and hence to forecast the behavior of a population once an attention shift occurs. Studying a data set of over 200 blogs with 14 million news posts, we found periodic regularities in the commenting behavior, namely cycles of 7 days as well as 24 days of activity which may be related to known scales of meme lifetimes.

I. INTRODUCTION

Much recent research on Web analytics has concentrated on developing theories of collective attention where the main object of study is the evolution of the popularity of topics, ideas, or sets of news [1], [2]. In this context, the concept of a meme has arisen as the main atom of modern quantitative social science [1], [3] and researchers seek to understand whether a particular meme will remain popular and, if so, for how long. Under this line of research, virality is the main phenomenon to model. Usually, a contagious (i.e. network based) approach is followed, and virality is literally treated as an infection in a given population: as an item becomes popular, i.e. as a piece of news is retweeted or discussed in the media, the population unit, whether blog or Twitter account, is considered infected.

It is important to note that this paradigm of research ignores the impact of the source of the meme. The website or content generating media plays a key role in generating (or hindering) the evolution of a particular idea or topic. Naturally, if a given website has a large number of users, it is likely that certain topics become popular and capture the attention of the population. We might thus ask whether the baseline behavior of the users of a website is enough to generate popularity. Is it the content which is popular or is it the website which is popular? If we can use the baseline behavior of a given website as a reference, we might use such information as a general guideline for forecasting. Information regarding the population behavior over a website will provide the basis for understanding both the evolution of that particular website as well as the possible success of the future content.

The population behavior on a given website is unavoidably stochastic, as we cannot know a priori whether a particular user or a set of users will visit or not (see Fig. 1 for an exemplary time series of activities of blog commenters). Statistical time series analysis has a rich history of success in fields as diverse as electronics, computer science, and economics and we know that, given that relevant information regarding the behavior of the system is properly modeled in the phenomenon that is measured, and randomness is realized under proper bounds, estimates can be achieved and predictive analytics becomes possible.

In this work we present a bottom-up case study to analyze the attention of blog users (particularly commenters) in order to understand and predict their activity patterns. It exploits the fact that many websites as well as blogs have years of user history which can be mined in search of relevant patterns.

II. RELATED WORK

The study of information diffusion on the Web aims to model the pathways and dynamics through which ideas propagate. One of the goals is to infer the structure of networks in which information propagates [4]. Researchers often try to devise equations which govern the aggregate behavior [2]; these equations typically have parameters which depend on the population size, the rate of the spreading process, and universal features of how attention fades. They model time series which show how much activity a particular topic or meme attracts. Forecasting can then be performed once the parameters of a particular population have been determined. In [5] Matsubara et al. learned the parameters of the attention time series related to the first Harry Potter movie and then predicted the behavior for the next movies by measuring only the initial population reaction. More importantly, the natural limitations of human attention and human behavior have been shown to define the overall behavior of the population as they access the information [6], [7], [8], [9].

Blog dynamics are traditionally studied in the context of conversation trees established between different blogs [10],
number of blogs          203
number of posts          713,122
number of comments       14,883,752
time span of activity    2006-2014

TABLE I: Characteristics of the Wordpress Data Set

Blog URL                        News Genre       Viral Post Count
le-grove.co.uk                  Sport            620
politicalticker.blogs.cnn.com   Politics         542
order-order.com                 Politics         294
sloone.wordpress.com            Personal         248
technologizer.com               Technology       136
snsdkorean.wordpress.com        Entertainment    116
collegecandy.com                Magazine         108
seokyualways.wordpress.com      Personal         96
religion.blogs.cnn.com          Religion         93
kickdefella.wordpress.com       Personal         89
[Figure 1 plot: http://politicalticker.blogs.cnn.com/; series "Number Of Comments" and "Average Trend"; y-axis: Number Of Comments Ct; x-axis: Dates, Oct 2011 to Oct 2014.]

Fig. 1: Time series of the number of comments Ct and bursts yt for a newsblog. The upper figure shows the daily fluctuations of activities (blue) and the monthly moving average trend (red). The lower figure shows a burst representation that allows for locating noticeable changes in the overall behavior.

Fig. 2: Seasonality study through a semilog plot of the periodogram for estimation of the power spectral density from an example newsblog. Notice the presence of 3 frequencies among the noisy behavior that account for commenting activity in periods of three-and-a-half and seven days.
then users will show more activity, increase their commenting rate, and also share the content with other users who are not part of the baseline set of followers of the blog. Consequently, some of the comments will appear as a result of a spreading process due to the web site followers. The popularity of the web page translates into popularity of the content.

Assuming a somewhat stable set of users has the added advantage of exploiting seasonality effects, as people are known to follow seasonal behavior [20]. Such periodicities can help to better understand the repeating behavior (for instance weekly visits of average users) as well as to forecast activities.

V. TIME SERIES ANALYSIS

We model the user commenting behavior as a discrete stochastic process, i.e. as a collection of random variables X1, X2, X3, ..., Xk indexed by time. Since we aim at attention modeling, our first variable of study is Ct, the number of comments an entire blog has at time t. Considering daily samples accords with the pace of publication in most blogs.

A. Seasonality

One approach to time series prediction using stochastic processes requires detecting periodicities. If these are recovered by detecting their frequencies and associated amplitudes, we can predict the behavior at each cycle. A common method considers power spectra of the stochastic process at hand via the discrete Fourier transform

$$f_x(\omega) = \sqrt{\frac{1}{T}} \sum_{t=1}^{T} x_t \exp(-i\omega t). \qquad (1)$$

We then square the transform

$$I(\omega) = \frac{1}{2\pi} f_x(\omega) \hat{f}_x(\omega) \qquad (2)$$

to obtain the so-called periodogram of a time series.

If seasonal behavior is present, peaks will appear in the periodogram. For example, Fig. 2 shows the periodogram for a time series of comment counts from our dataset. Notice the peaks at frequencies of 0.13, 0.29 and 0.31, which account for weekly and half-weekly seasonal behavior in the comment patterns, indicating that the users of that particular newsblog leave comments at intervals of three and a half and seven days.

B. Burst and Volatility

In order to characterize the popularity of a particular news source, we require a quantitative measurement of attention shifts. These can be described via an inter-burst measure [21], namely

$$y_t = \frac{C_t - C_{t-1}}{C_t}. \qquad (3)$$

Eq. (3) is also known as the logarithmic derivative of the process Ct. Due to its rescaled nature, yt allows for comparison between different newsblogs and times but, for our data set, it provides limited value for forecasting. We thus also consider the detrended comment count C̃t, which is obtained after using the Baxter-King filter (see the Appendix).

C. Modeling Volatility

Having introduced C̃t, we next need an approach that can account for fluctuations. Since we obtain a zero mean behavior after the filter is applied, we can assume that the comments behavior will vary as

$$\tilde{C}_t = \sigma_t \epsilon_t \qquad (4)$$

where σt represents the standard deviation of the fluctuation at time t and εt ∼ N(0, 1) is a noise value at time t, sampled from a normal distribution of mean 0 and variance 1. In finance, when modeling the fluctuations in returns (as opposed to C̃t), σt is known as the volatility. The simplest models for volatility occur in econometrics under the names of ARCH (autoregressive conditional heteroscedasticity) and generalized ARCH or
[...]

The values of p and q determine how correlated σt is with past fluctuations so that the model is called GARCH(p, q). To learn these parameters from data, we have to take into account that for GARCH(1, 0) the square of the fluctuations [...] values of the partial autocorrelations of the square variable [...]
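The seasonality and burst measures of Section V (Eqs. (1) to (3)) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names `periodogram` and `bursts` are introduced here, the hat in Eq. (2) is read as the complex conjugate, the series is mean-centred before the transform (an added step that removes the zero-frequency term), frequencies are expressed in cycles per day, and the comment counts are synthetic.

```python
import cmath
import math


def periodogram(x, freqs):
    """I(w) = |f_x(w)|^2 / (2*pi), f_x(w) = sqrt(1/T) * sum_t x_t exp(-i*w*t).

    freqs are given in cycles per sample (cycles per day for daily counts),
    so the angular frequency is w = 2*pi*f.
    """
    T = len(x)
    mean = sum(x) / T
    centred = [v - mean for v in x]  # assumption: drop the zero-frequency term
    out = []
    for f in freqs:
        w = 2.0 * math.pi * f
        fx = sum(v * cmath.exp(-1j * w * t)
                 for t, v in enumerate(centred, start=1))
        fx /= math.sqrt(T)
        # Product with the complex conjugate gives the squared modulus.
        out.append((fx * fx.conjugate()).real / (2.0 * math.pi))
    return out


def bursts(counts):
    """y_t = (C_t - C_{t-1}) / C_t for t >= 1; assumes every C_t is non-zero."""
    return [(c - p) / c for p, c in zip(counts, counts[1:])]


# Synthetic daily comment counts with a weekly cycle (illustrative data only).
days = 140
counts = [100.0 + 30.0 * math.cos(2.0 * math.pi * t / 7.0) for t in range(days)]
freqs = [k / days for k in range(1, days // 2 + 1)]
spectrum = periodogram(counts, freqs)
peak_frequency = freqs[spectrum.index(max(spectrum))]
peak_period_days = 1.0 / peak_frequency
burst_series = bursts(counts)
```

On this synthetic series the largest periodogram value sits at 1/7 cycles per day, the same kind of weekly peak the text reports near 0.13 for the example newsblog.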
VI. MAIN RESULTS

A. Attention Proxy

[Figure: "Distribution of Comments From WP Selection"; empirical distribution with a lognormal ("lognorm") fit; x-axis: Number Of Comments; y-axis: Probability.]
[Figure: counts (# Counts) with marks at 17 days and 7 days.]
[Figure: RealizedVolatility and GARCH (p: 6, q: 6); y-axis: Volatility σ; x-axis: Dates, 2008 to 2012.]
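To make the volatility model of Eq. (4) concrete, the following sketch simulates the simplest case, in which the variance depends only on the previous squared fluctuation. The recursion σt² = α0 + α1 C̃²(t−1) is the standard textbook ARCH(1) form and is an assumption here, since the paper's own variance equation is not recoverable from this text; the function names, parameter values and seed are likewise illustrative.

```python
import math
import random


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate x_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2.

    eps_t is standard normal noise; the recursion is the textbook ARCH(1)
    form, assumed here for illustration.
    """
    rng = random.Random(seed)
    x, sigma = [], []
    prev = 0.0
    for _ in range(n):
        s = math.sqrt(alpha0 + alpha1 * prev * prev)
        prev = s * rng.gauss(0.0, 1.0)
        sigma.append(s)
        x.append(prev)
    return x, sigma


def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


x, sigma = simulate_arch1(n=20000, alpha0=1.0, alpha1=0.3, seed=42)
acf_raw = autocorr(x, 1)                       # fluctuations themselves
acf_sq = autocorr([v * v for v in x], 1)       # squared fluctuations
```

In such a simulation the fluctuations themselves are essentially uncorrelated while their squares are clearly autocorrelated, which is why the (partial) autocorrelations of the squared variable are the natural diagnostic for choosing the model order.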
Fig. 230.—Characteristic
Jamaican and Haitian
Mollusca: A, Sagda
epistylium Müll., Jamaica; B,
Chondropoma salleanum Pfr.,
San Domingo; C,
Eutrochatella Tankervillei
Gray, Jamaica; D, Cylindrella
agnesiana C. B. Ad.,
Jamaica.
The land operculates form the bulk of the land fauna, there being
actually 242 species, as against 221 of land Pulmonata, a proportion
never again approached in any part of the world. As many as 80 of
these belong to the curious little genus Stoastoma, which is all but
peculiar to the island, one species having been found in San
Domingo, and one in Porto Rico. Geomelania and Chittya, two
singular inland forms akin to Truncatella, are quite peculiar. Alcadia
reaches its maximum of 14 species, as against 4 species in San
Domingo and 9 species in Cuba, and Lucidella is common to San
Domingo only; but, if Stoastoma be omitted, the Helicinidae
generally are not represented by so many or by so striking forms as
in Cuba, which has 90 species, as against Jamaica 44, and San
Domingo 35.
(c) San Domingo, although not characterised by the extraordinary
richness of Cuba and Jamaica, possesses many specially
remarkable forms of land Mollusca, to which a thorough exploration,
when circumstances permit, will no doubt make important additions.
From its geographical position, impinging as it does on all the islands
of the Greater Antilles, it would be expected that the fauna of San
Domingo would not exhibit equal signs of isolation, but would appear
to be influenced by them severally. This is exactly what occurs, and
San Domingo is consequently, although very rich in peculiar species,
not equally so in peculiar genera. The south-west district shows
distinct relations with Jamaica, the Jamaican genera Leia,
Stoastoma, Lucidella, and the Thaumasia section of Cylindrella
occurring here only. The north and north-west districts are related to
Cuba, while the central district, consisting of the long band of
mountainous country which traverses the island, contains the more
characteristic Haitian forms.
The Helicidae are the most noteworthy of the San Domingo land
Mollusca. The group Eurycratera, which contains some of the finest
existing land snails, is quite peculiar, while Parthena, Cepolis,
Plagioptycha, and Caracolus here reach their maximum. The
Cylindrellidae are very abundant, but no section is peculiar. Land
operculates do not bear quite the same proportion to the Pulmonata
as in Cuba and Jamaica, but they are well represented (100 to 152);
Rolleia is the only peculiar genus.
The relations of San Domingo to the neighbouring islands are
considerably obscured by the fact that they are well known, while
San Domingo is comparatively little explored. To this may perhaps
be due the curious fact that there are actually more species common
to Cuba and Porto Rico (26) than to Porto Rico and San Domingo.
Cuba shares with San Domingo its small-sized Caracolus and also
Liguus, but the great Eurycratera, Parthena, and Plagioptycha are
wholly wanting in Cuba. The land operculates are partly related to
Cuba, partly to Jamaica, thus Choanopoma, Ctenopoma, Cistula,
Tudora, and many others, are represented on all these islands, while
the Jamaican Stoastoma occurs on San Domingo and Porto Rico,
but not on Cuba, and Lucidella is common to San Domingo and
Jamaica alone. An especial link between Jamaica and San Domingo
is the occurrence in the south-west district of the latter island of
Sagda (2 sp.). The relative numbers of the genera Strophia,
Macroceramus, and Helicina, as given below (p. 351), are of interest
in this connexion.
Porto Rico, with Vièque, is practically a fragment of San Domingo.
The points of close relationship are the occurrence of Caracolus,
Cepolis, and Parthena among the Helicidae, and of Simpulopsis,
Pseudobalea, and Stoastoma. Cylindrella and Macroceramus are
but poorly represented, but Strophia still occurs. The land
operculates (see the Table) show equal signs of removal from the
headquarters of development. Megalomastoma, however, has some
striking forms. The appearance of a single Clausilia, whose nearest
relations are in the northern Andes, is very remarkable. Gaeotis,
which is allied to Peltella (Ecuador only), is peculiar.
Fig. 231.—Examples of West Indian
Helices: A, Helix (Parthena)
angulata Fér., Porto Rico; B,
Helix (Thelidomus) lima Fér.,
Vièque; C, Helix (Dentellaria) nux
denticulata Chem., Martinique.
Land Mollusca of the Greater Antilles
Cuba. Jamaica. S. Domingo. Porto Rico.
Glandina 18 24 15 8
Streptostyla 4 ... 2 ...
Volutaxis ... 11 (?) 1 ...
Selenites 1 ... ... ...
Hyalinia 4 11 5 6
Patula 5 1 ... ...
Sagda ... 13 2 ...
Microphysa 7 18 8 3
Cysticopsis 9 6 ... ...
Hygromia (?) ... ... 3 ...
Leptaxis (?) ... ... 1 ...
Polygyra 2 ... ... ...
Jeanerettia 6 ... ... 1
Euclasta ... ... ... 4
Plagioptycha ... ... 14 2
Strobila ... 1 ... ...
Dialeuca ... 1 ... ...
Leptoloma 1 8 ... ...
Eurycampta 4 ... ... ...
Coryda 7 ... ... ...
Thelidomus 15 3 ... 3
Eurycratera ... ... 7 ...
Parthena ... ... 2 2
Cepolis ... ... 3 1
Caracolus 8 ... 6 2
Polydontes 3 ... ... 1
Hemitrochus 12 1 ... ...
Polymita 5 ... ... ...
Pleurodonta ... 34 ... ...
Inc. sed. 5 ... ... ...
Simpulopsis ... ... 1 1
Bulimulus 3 3 6 7
Orthalicus 1 1 ... ...
Liguus 3 ... 1 ...
Gaeotis ... ... ... 3
Pineria 2 ... ... 1
Macroceramus 34 2 14 3
Leia ... 14 2 ...
Cylindrella 130 36 35 3
Pseudobalea 2 ... 1 1
Stenogyra 6 7 (?) ...
Opeas 8 (?) 4 6
Subulima 6 14 2 2
Glandinella 1 ... ... ...
Spiraxis 2 (?) 2 1
Melaniella 7 ... ... ...
Geostilbia 1 ... 1 ...
Cionella 2 ... ... ...
Leptinaria ... 1 ... 3
Obeliscus ... ... 1 2
Pupa 2 7 3 2
Vertigo 4 ... ... ...
Strophia 19 ... 3 2
Clausilia ... ... ... 1
Succinea 11 2 5 3
Vaginula 2 2 2 1
Megalomastoma 13 ... 1 3
Neocyclotus 1 33(?) ... ...
Licina 1 ... 3 ...
Jamaicia ... 2 ... ...
Crocidopoma ... 1 3 ...
Rolleia ... ... 1 ...
Choanopoma 25 12 19 3
Ctenopoma 30 2 1 ...
Cistula 15 3 3 3
Chondropoma 57 (?) 19 4
Tudora 7 17 5 ...
Adamsiella 1 12 ... ...
Blaesospira 1 ... ... ...
Xenopoma 1 ... ... ...
Cistula 15 3 3 ...
Colobostylus 4 13 5 ...
Diplopoma 1 ... ... ...
Geomelania ... 21 ... ...
Chittya ... 1 ... ...
Blandiella ... ... 1 ...
Stoastoma ... 80 1 1
Eutrochatella 21 6 6 ...
Lucidella ... 4 1 ...
Alcadia 9 14 4 ...
Helicina 58 16 24 9
Proserpina 2 4 ... ...
The Virgin Is., with St. Croix, Anguilla, and the St. Bartholomew
group (all of which are non-volcanic islands), are related to Porto
Rico, while Guadeloupe and all the islands to the south, up to
Grenada (all of which are volcanic), show marked traces of S.
American influence. St. Kitt’s, Antigua, and Montserrat may be
regarded as intermediate between the two groups. St. Thomas, St.
John, and Tortola have each one Plagioptycha and one Thelidomus,
while St. Croix has two sub-fossil Caracolus which are now living in
Porto Rico, together with one Plagioptycha and one Thelidomus
(sub-fossil). The gradual disappearance of some of the characteristic
greater Antillean forms, and the appearance of S. American forms in
the Lesser Antilles, is shown by the following table:—
(Columns, left to right: Porto Rico, St. Thomas, St. Jan, St. Croix, Tortola, Anguilla, St. Kitt’s, Antigua, Guadeloupe, Dominica, Martinique, St. Lucia, Barbados, St. Vincent, Grenada, Trinidad.)
Bulimulus 7 4 2 4 1 2 2 3 8 9 5 3 3 6 2 4
Cylindrella 3 2 1 1 1 . . . . 1 1 1 1 . . 1
Macroceramus 3 1 1 . 2 1 . . . . . . . . . .
Cyclostomatidae, etc. 23 4 1 5 1 1 1 . 4 . . . . . . 1
Dentellaria . . . . . . 1 1 8 5 11 2 2 . 1 1
Cyclophorus . . . . . . . . 1 2 2 . . . . .
Amphibulimus . . . . . . . . 2 3 1 . . . . .
Homalonyx . . . . . . . . 1 1 . . . . . .