Professional Documents
Culture Documents
A Theoretical Framework of Data Quality in Partici
A Theoretical Framework of Data Quality in Partici
A Theoretical Framework of Data Quality in Partici
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/286763523
CITATION READS
1 135
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Syarulnaziah Anawar on 18 March 2016.
1.0 INTRODUCTION later may distribute the collected data, analyze and
interpret them. In participatory sensing, human
The growth of smartphone era makes a new paradigm voluntarily act as sensors where people can perform
that smartphone is not only a device but capable of data contribution alone or in group, and shares
capturing moments as data through images, video information about their life, habits, routines and
recordings, audio, and gps information. Mobile phones enviroment that is captured by their mobile devices to
becomes one of the important thing in human life, in improve quality of life.
which people carry mobile phones anywhere, and In the recent years, participatory sensing system has
anytime. With large number of mobile phone users, it been used as a mean for data collections in various
increases the capabilities of the device to capture, domains like healthcare, cultural and heritage, urban
classifying or transmitting image, acoustic, location planning, environmental monitoring system, and
and others data in interactive or autonomous emergency response [1]. Many participatory sensing
manner[6]. systems have been built to elaborate this paradigm to
Mobile crowd-sensing is a mega paradigm of using collect various scale of data using participatory
mobile devices as a sensor to collect data, in which sensing. The essential components of implementing
participatory sensing is included. Participatory sensing participatory sensing system are capturing ubiquitous
has unique characteristics that differentiate them from data, and leveraging data processing to enable the
traditional sensor network. Participatory sensing is an technical capabilities of these systems. These
approach of mobile crowd-sensing where citizens elements are especially important because it will offer
contribute data on participatory sensing platform and mechanisms to support advocacy and civic
engagement of participant in any participatory able to provide worthwhile interaction and not being
sensing campaign. able to adapt to any changes that may occur in the
The aim of this paper is to propose a conceptual environment [20].
framework of data quality in participatory sensing that
incorporate what is typically presented as information 2.2 Data Quality in Participatory Sensing
uncertainty in other computing domain as data
quality in participatory sensing research area. We Data quality is an important part of almost of all
argue that in the view of participatory sensing research area, including participatory sensing.
literature, the interpretation of data quality is rather Participatory sensing is an approach to distribute the
loose and there is no established framework that data collection gathered from participant that giving
represents the elements of data quality in participatory contribution to the system in various aspects of their
sensing system. We will focus the argument of our life. With a large amount of various data from many
study using a case of mHealth participatory sensing aspects, participatory sensing also have an aim to
system. analyze and interpret the contribution data into
This paper will serve a guideline of data quality in something that usable to help improving life or
participatory sensing. This paper will explore on human environment. Participatory sensing networks will enable
factor or human error as the main factor of uncertainty public and professionals to gather and share local
information. To the best of our knowledge, this area is knowledge of their life and the environment [6].
still very much unexplored. We will take a different It is important to note that in most participatory
direction with other researchers which mostly focus on sensing literature [13], they address three (3) main
improving the network or the system itself before problems associated with the area. Anawar and
sharing time process in the participatory sensing Yahya (2013) classify the problems as incentive,
system. Furthermore, this paper will address the identity, and integrity [2]:
following research question; what variables and a. Incentive: The motivator that drives the campaign
factors of uncertainty information that might exist in participants in contributing data.
participatory sensing system? To answer this question, b. Identity: Level of anonymity and privacy of the
we have studied about the uncertainty information in campaign participants.
a few domains and explore variables and factors that c. Integrity: The validity of the data contributed by
exists in each domain. the campaign participants.
The rest of this paper is organized as follow: First, we
give an overview about participatory sensing system. We narrow down on the integrity aspect as our main
Second, we highlights the problem on data quality in concern, where participatory system has a unique set
participatory sensing system and how uncertainty of issues associated with them. Ganti et al. (2011) said
information affect data quality. Third, we give an maintaining the integrity of collected sensor data is an
overview about the studies of uncertainty information important problem [12]. An idea called “human as
in various domain and then specifically to the sensor” become the foundation of participatory
participatory sensing system. Next, we provide the sensing system and it is not new, and have been
discussion of our proposed framework for data quality applied successfully in personal sensing application for
in mHealth participatory sensing and last, we explain social improvement and health monitoring.
about our future works in this area. Participatory sensing helps to collect a very large
amount of data at various area. There is no doubt that
gaining masses of information through participatory
2.0 DATA QUALITY IN PARTICIPATORY sensing in different domains is possible. Participants are
SENSING SYSTEM expected to give and get correct and valid
information where providers have the important role to
2.1 Overview of Data Quality provide the trustworthy information for users. In
participatory sensing, participant is not only a
Data quality is a combination of reliability, accuracy, contributor but also act as an information provider.
completeness, timeliness, accessibility, consistency Information gathered from the participant is hard to
and validity of data [28]. In his study, Even et al. (2009) manage, because it involves a massive amount of
has stated the important of data quality in information data [28].
system as higher quality of data makes the In a participatory sensing system, the validity of the
organizational data sources become more usable and collected data is highly depends on the sensing data
consequently increase the benefits gained from the collected by mobile devices carried by participants.
data [11]. This will contribute to the effectiveness and Participants are expected to explicitly follow task
efficiency of business operation which also increase guideline where instructions were given as to the
trust in information system [9]. In 2012, McNaull duration of each activity needed and the manner of
described the importance of data quality, where a each activity to be performed. However, it is difficult
low quality of data may results in incorrect contextual for the system to obtain accurate sensing data,
knowledge to be processed. McNaull argues that low because high mobility and environmental complexity
quality of data in information system signifies a failure in participatory sensing systems may bring much more
in anticipating user’s need, where user may not being uncertainty, and there may be some inexperienced
139 Andita Suci Pratiwi & Syarulnaziah Anawar / Jurnal Teknologi (Sciences & Engineering) 77:18 (2015) 137–146
and non-reputable participants who will generate is missing or it is not consistent. Uncertainty information
corrupted data [29]. This problem will greatly reduce is an existing issue in many research fields, including
the quality of the sensing result. Consequently, it highly participatory sensing system. Those possibilities of
important to improve the quality of the sensing data uncertainty information bring us to untrusted
by identifying the information that has become information.
uncertain due to the above problems. Kiurehgian et al. (2008), has classified uncertainty
Little attention has been given by researchers in information into two (2) : aleatory uncertainty and
participatory sensing area on how having human to epistemic uncertainty. Aleatory uncertainty is
operate, carry and interact with participatory sensing uncertainty that presumed to be an intrinsic
system affects participation, data quality and spatial- randomness of some phenomenon, which means the
temporal coverage. Focuses on methods to model uncertainty is being affected by nature and physical
sensing of individuals, access their spatial and world [15]. On the other hand, epistemic uncertainty
temporal availability and also determine the ideal comes from a lack of knowledge or data. Uncertainty
feedback mechanisms will greatly help the data information is complex problem and affects decision
collection process. Having human as participant of making of service provider in participatory sensing
sensing process leads to unique research challenge system. The trustworthiness of information is in question
such like; can we model the “human factors” involved because the information will later be used for life
in the sensing process and use it to gather higher improvement. In order to handle the uncertainty
quality data while helping the users to have better information, it is necessary to study about the
understanding of their contribution [24]. characteristics and factors that cause uncertainty
While different explanation extend our information.
understanding of data quality, inconsistent Berztiss (2002) studied about uncertainty
conceptualizations of the term can lead to confusion management and explained various aspects of
in identifying the elements of data quality in uncertainty which are inconsistency, vagueness,
participatory sensing. In line with Reddy’s work (2007), imprecision, incompleteness and rounding [5]. Among
we strongly argue that human factor is one of the five (5) aspects of uncertainty, Berztiss highlight
biggest influence to participant’s contribution and inconsistency as the variable to be measured. In 2004,
some of participant’s behavior leads to uncertainty Antifakos et al. introduced task difficulty, cost,
information which in the end affects quality of data. knowledge of uncertainty, and level of uncertainty as
Therefore, in the context of our study, we define data variables in their work [3]. In 2008, Coppi defined
in participatory sensing is of high quality when: various sources of uncertainty information which are
“Participant’s contribution of sensed data is randomness, imprecision, and vagueness. However,
complete, accurate, up to date, and relevant only imprecision and vagueness were used in his work.
within the context of the system and service [7].
provider’s goal, while minimizing the uncertainty In 2012, MacEachren et al. explained the types of
information” uncertainty to the components of information as:
Based on that definition, we propose hypothesis that if accuracy, precision, completeness, consistency,
the contributed information having one of the currency timing and interrelatedness or it has the same
elements of uncertainty then we could called it as meaning as relevance [19]. In their study, MacEachren
uncertainty information which exists in participatory et al. used accuracy, precision and currency as
sensing system. variables they used. 1 year later, Dragos defined
various types of uncertainty as precision, ambiguty,
vagueness, and inconsistencies [8]. In the same year, Li
3.0 OVERVIEW OF UNCERTAINTY et al. (2013) further explored the concept of
INFORMATION randomness by categorizing unknowing exact result of
event as randomness in uncertainty information [17].
3.1 Uncertainty Information From the literatures of uncertainty information, we
found many variables that each research area may
Data correctness issue and quality of data is important have different element(s) of uncertainty information
to be preconcerted before analyzing information. and different technique(s) to solve uncertainty
Information should be of high integrity when information. Some research may have more than one
requirements of participatory sensing system is fulfilled. variable of uncertainty information to be tackled. We
Unfortunately, the information shared by participants represents the variables of uncertainty information
has possibilities of being uncertainty due to the user from various research areas in Table 1.
action toward the system. For example, information
may not be clear, having vague meaning, some data
140 Andita Suci Pratiwi & Syarulnaziah Anawar / Jurnal Teknologi (Sciences & Engineering) 77:18 (2015) 137–146
4.0 UNCERTAINTY INFORMATION IN mHealth behaviors and practices into personalized, evidence-
PARTICIPATORY SENSING based, in health-care domain.
There are difficulties in mHealth participatory
The phenomenon of using mobile phones as a sensor sensing, one of the most challenging problem is to
for human, increases the functionalities of devices ensuring that the device is compatible and
and enable participants to contribute and share accessible and also to ensuring relevancy and
information ranging from their environment to their accuracy of the data. To provide the patient with
health state to improve their quality of life. mHealth the most relevant data and help them with self-
participatory sensing system serve as data collection monitoring of their illness, it is better to sensing in
platform using mobile devices and allow community frequency (Arvidson, 2012). There’s also a lot of
and stakeholder to collect, analyze and submit or challenges in developing the infrastructure of
share health information to a larger interactive mHealth participatory sensing systems and the main
participatory sensing network. issue is to concern in knowledge representation from
Many mHealth participatory sensing systems has the information that gathered which is highly variable
been built to improve the quality of environment or within different service providers and sources. The
for health care monitoring. mHealth participatory variability includes information uncertainty.
sensing systems will greatly reduce operational cost Yu et al. (2014) argue that research in participatory
as it allows knowledge transfer between medical sensing area is still remains at theoretical and
professionals and community, and improving experimental stage. A system that enhances the
community’s participation in wellness care and quality of sensing data is required because data
health maintenance. mHealth participatory sensing shared, captured, and collected by participants
systems will transform previously unmeasured cannot be applied as intended if it is not reliable and
142 Andita Suci Pratiwi & Syarulnaziah Anawar / Jurnal Teknologi (Sciences & Engineering) 77:18 (2015) 137–146
inaccurate [29]. By having human as contributor of becomes more efficient and at the same time
the sensing process will lead to a unique research helping participants to understand more about their
challenge. Human factors has a big influence in contributions [24]. Figure 2 shows stakeholders and
elevating quality of data in participatory sensing architectural components of mHealth participatory
because it will make the process of data gathering sensing.
Figure 2 Stakeholders and architectural components of mHealth participatory sensing (adapted from Christin, D et al., 2011)
From literature, we found that there are rare studies based on data trustworthiness level. Yang et al.,
being made concerning data quality in participatory outlined that participant’s ability was attributed to
sensing [24]. Presents quality involved in “human-in- their responsiveness towards the system which is
the-loop” sampling with five variables in the metrics parallel with earlier argument by Reddy et al.
which are timeliness, capture, relevancy, coverage, In 2013, [29] studied about improving data quality
and responsiveness. Timeliness presents the exactness using their proposed accumulated reputation model
of time of the event timing. Capture is a variable that in participatory sensing systems using Gompertz
affected by the sensor specifications and the function and give a reputation score to contributed
capturing process by the participants. Relevancy is information [14] studied about a reputation system for
description of the phenomenon or event is mobile phone based sensing which focused on noise
corresponding to what that participant sense and it is monitoring. They used robust average algorithm and
can be completely irrelevant to completely relevant. combined it with Gompertz reputation and applied it
Coverage is the representing of the spatial and in their module.
temporal availability that associated with the
coverage, and the key role of this variable are
temporal extent and resolution. Reddy further 5.0 THEORETICAL FRAMEWORK OF DATA
describe responsiveness as the responding of the
QUALITY IN mHEALTH PARTICIPATORY
sensing request from the system by participants.
In 2011, [28] studied about improving data quality SENSING
using reputation management in participatory
sensing for data classification. They explored the 5.1 Theoretical Framework
trustworthiness of data or information from
participants using reputation management approach One of the interesting area of participatory sensing is
and introduced three (3) categories of reputation: in health-monitoring where participatory sensing
data quality record and participant’s past helps people to monitor their health using their
performance (DR); participant’s ability and device smartphones to maintaining or even improving their
capabilities, represents personal information (PI); and, health. This is an interesting field to be explored in
community trust and organizer trust, represents participatory sensing which the campaign should be
indirect reputation (IR). They came out with the in long-term time frames to collect the data and
conceptual classification model to classify data analyze it.
143 Andita Suci Pratiwi & Syarulnaziah Anawar / Jurnal Teknologi (Sciences & Engineering) 77:18 (2015) 137–146
Due to the scope of this paper which emphasizes on participatory sensing campaign based on their
participant’s data input in M-Health domain, not all contribution data and level of uncertainty of their
the variables will be used in the proposed data contribution based on the quality of data that they
quality framework. From the presented literature, we shared while using mHealth application.
found 14 variables that have been studied in data We represents our finding about uncertainty elements
quality in various research areas, and 3 variables that in participatory sensing system that has been studied
have been studied particularly in participatory by other researchers in Table 3.
sensing area which is accuracy, timeliness, and
relevancy. The data quality metrics that meets the Table 3 Uncertainty Elements in Participatory Sensing
nature of mHealth participatory sensing is accuracy, Research(s)
relevancy, completeness, and timeliness. .
Variable(s) Author(s)
This study focus on variable that can be affected
by human errors as one of the factor that influencing [24] [28] [29]
the trustworthiness of data and cause uncertainty Timeliness √ √ -
information. The variables chosen are variables that
Accuracy - - √
can cause the data become uncertain when it
comes from participant contribution behavior. We Relevancy √ - -
exclude coverage as variable due to scope and Completeness - √ -
limitation of this study which not focus on spatial and
Responsiveness √ √ -
temporal areas because coverage and device’s
capability are mostly influence by the sensor or the Capture √ √ -
device not human as contributor. Output of this study Coverage √ - -
is to determine the reputation of participants of
Figure 3 The Proposed Theoretical Framework for Data Quality in mHealth Participatory Sensing
Figure 3 shows the proposed theoretical framework a. Participant Input (t) = The quantity of
for mHealth participatory sensing. We applied the participation is measured by the frequency of
experimentation variables set in [23] as the data to input recorded by the participants. The
be examined in our framework. The experimentation requirement was participant should record their
variables are weight for each week. The frequency is
compared over weekly time intervals that make
144 Andita Suci Pratiwi & Syarulnaziah Anawar / Jurnal Teknologi (Sciences & Engineering) 77:18 (2015) 137–146
5.2 Work in Progress suppress the chance of their obesity becomes others
coroner disease.
We apply the proposed framework on our mHealth Data from this system will be collected in web
participatory sensing system called w8loss that is databases. The application has food database,
developed in Android platform. This system aims to workout option, calculation of BMI and calculation of
help people to maintain or improve their quality of calories budget of each day based on participant’s
health by monitoring and controlling their food and BMI information. We include some of screen captures
their ideal calories in-take each day in terms of doing of the application.
healthy diet and reduce weight periodically to As shown in Figure 4, we include some of screen
captures of the application.
6.0 CONCLUSION AND FUTURE WORK software for qualitative research. The finding will be
validated using the statistical validation on the data
In this paper, we identify variables and factors of collected in mHealth database.
uncertainty information that might exist in
participatory sensing system by providing uncertainty
information in a few domains and explore variables Acknowledgement
and factors that exists in each domain. Next, we
propose a conceptual framework of data quality in This research is supported by Fundamental Research
participatory sensing that incorporate what is Grant Scheme, Ministry of Higher Education,
typically presented as information uncertainty in Malaysia. (FRGS/2012/FTMK/TK06/03/1-F00142).
other computing domain as data quality in
participatory sensing research area. The six (6)
variables included in the proposed framework are
Timeliness, Incompleteness, Randomness, Rounding, References
and Relevancy.
For future work, we will perform data collection [1]. Anawar, S., Yahya, S., Ananta, G. P., Abidin, Z. Z., & Ayop,
Z. 2013. Conceptualizing Autonomous Engagement in
and analysis using a qualitative Data will be
Participatory Sensing Design: A Deployment for Weightloss
collected through interview with each participant as Self Monitoring Campaign. In Proceeding of IEEE
respondent after they have using the w8loss Conference on e-Learning, e-Management, and e-
application for two (2) weeks. The collected data will Service, IC3E 2013, Kuching, Malaysia.
be analyze using Atlas.ti, a qualitative data analysis [2]. Anawar, Syarulnaziah, and Saadiah Yahya. 2013.
Empowering Health Behaviour Intervention Through
146 Andita Suci Pratiwi & Syarulnaziah Anawar / Jurnal Teknologi (Sciences & Engineering) 77:18 (2015) 137–146
Computational Approach for Intrinsic Incentives in [19]. McNaull, J, Augusto, J, C, Mulvenna, M, McCullagh, P.
Participatory Sensing Application. Research and 2012. Data and Information Quality Issues in Ambient
Innovation in Information Systems (ICRIIS), 2013 Assisted Living Systems. Journal Data and Information
International Conference on. IEEE. Quality. 4(1). DOI = 10.1145/2378016.2378020.
[3]. Antifakos, S., Schwaninger, A., & Schiele, B. 2004. [20]. Nottelmann, H., & Fuhr, N. 2003. From Uncertain Inference
Evaluating the Effects of Displaying Uncertainty in Context- to Probability of Relevance for Advanced IR Applications.
Aware Applications. 54–69. 235-250.
[4]. Arvidson. 2012. [21]. Omerovic, A., & Stølen, K. 2011. A Practical Approach to
[5]. Berztiss, A. T. 2002. Uncertainty Management. Uncertainty Handling and Estimate Acquisition in Model-
[6]. Burke, J, Estrin, D, Hansen, M, Parker, A, Ramanathan, N, based Prediction of System Quality. 4(1): 55-70.
Reddy, S, Srivastava, M B. 2006. Participatory Sensing. [22]. Pratiwi, A, S, Anawar, S. 2014. Resolving Uncertainty
Papers. Center for Embedded Network Sensing, UC Los Information Using Case-Based Reasoning Approach in
Angeles. 1-5. Weight-loss Participatory Sensing Campaign. Recent
[7]. Coppi, R. 2008. Management of Uncertainty in Statistical Advances on Soft Computing and Data Mining. Advance
Reasoning: The Case of Regression Analysis. International in Intelligent Systems and Computing 287. DOI:
Journal of Approximate Reasoning. 47(3): 284-305. 10/1007/978-3-319-07692-8_52. Springer.
doi:10.1016/j.ijar.2007.05.011. [23]. Reddy, S, Burke, J, Estrin, D, Hansen, M, Srivastava, M. 2007.
[8]. Dragos, V. 2013. An Ontological Analysis of Uncertainty in A Framework for Data Quality and Feedback in
Soft Data. 1566-1573. Participatory Sensing. SenSys'07. ACM 1-59593-763-
[9]. DeLone, W. and McLean, E. 1992. Information Systems 6/07/0011. Sydney, Australia.
Success: The Quest for the Dependent Variable. Inform. [24]. Ucla, C. 2011. 2 . 2 Participatory Sensing ( PART ). Center
Syst. Res. 3(1): 60-95. for Embedded Networked Sensing. 25-72.
[10]. Even, A. Shankaranaryanan, l. 2009. Dual Assessment of [25]. Wolf, G., Kalavagattu, A., Khatri, H., Balakrishnan, R.,
Data Quality in Customer Databases. ACM Journal of Chokshi, B., Fan, J., Kambhampati, S. 2009. Query
Data and Information Quality. 1(3). 15: 1-15: 29. Processing Over Incomplete Autonomous Databases:
[11]. Ganti, R, K. 2011. Mobile Crowdsensing : Current State and Query Rewriting Using Learned Data Dependencies. The
Future Challenges. IEEE Communication Magazine. VLDB Journal. 18(5): 1167-1190. doi:10.1007/s00778-009-
November. 32-39. 0155-0.
[12]. Goldman, J, Shilton, K, Burke, J. 2009. Participatory [26]. Xu, C., Cheung, S. C., Chan, W. K., & Ye, C. 2010. Partial
Sensing: A Citizen-Powered Approach to Illuminating the Constraint Checking for Context Consistency in Pervasive
Patterns that Shape Our World. Foresight & Governance Computing. ACM Transactions on Software Engineering
Project, White Paper. 1-15. and Methodology. 19(3): 1-61.
[13]. Huang, K, L, Kanhere, S, S, Hu, W. 2014. On the Need for a doi:10.1145/1656250.1656253.
Reputation System in Mobile Phone Based Sensing. Ad [27]. Yang, H, F, Zhang, J, Roe, P. 2011. Using Reputation
Hoc Networks. Elsevier B.V. 12(1): 130-149. Management in Participatory Sensing for Data
[14]. Kiureghian, A, D, Ditlevsen, O. 2009. Aleatory or Epistemic? Classification. Procedia Computer Science. DOI:
Does It Matter? Structural Safety. Elsevier Ltd. 31(2): 105- 10.1016/j.procs.2011.07.026. ISBN: 1877-0509. ISSN:
112. 18770509. 190-197.
[15]. Lalmas, M. 1998. Information Retrieval and Dempster- [28]. Yu, R, Liu, R, Wang, X, Cao, J. 2014. Improving Data
Shafer ’ s Theory of Evidence. 157-176. Quality with an Accumulated Reputation Model in
[16]. Li, Y., Chen, J., & Feng, L. 2013. Dealing with Uncertainty: A Participatory Sensing Systems. Sensors. 14. DOI:
Survey of Theories and Practices. IEEE Transactions on 10.3390/s140305573. ISBN: 8624836875. ISSN: 14248220.
Knowledge and Data Engineering. 25(11): 2463-2482. 5573-5594.
doi:10.1109/TKDE.2012.179. [29]. Zhang, S., Chen, Q., & Yang, Q. 2010. Acquiring
[17]. Liang, L. R., Looney, C. G., & Mandal, V. 2011. Fuzzy- Knowledge from Inconsistent Data Sources Through
Inferenced Decisionmaking Under Uncertainty and Weighting. Data & Knowledge Engineering. 69(8): 779-799.
Incompleteness. Applied Soft Computing. 11(4): 3534- doi:10.1016/j.datak.2010.03.001.
3545. doi:10.1016/j.asoc.2011.01.026. [30]. Zhang, W., Lin, X., Pei, J., & Zhang, Y. 2008. Managing
[18]. MacEachren, a. M., Roth, R. E., O’Brien, J., Li, B., Swingley, Uncertain Data: Probabilistic Approaches. 2008 The Ninth
D., & Gahegan, M. 2012. Visual Semiotics & International Conference on Web-Age Information
Uncertainty Visualization: An Empirical Study. IEEE Management, 405–412. doi:10.1109/WAIM.2008.42.
Transactions on Visualization and Computer Graphics,
18(12): 2496-2505. doi:10.1109/TVCG.2012.279.