Ontology-Based Quality Evaluation of Value Generalization Hierarchies For Data Anonymization
1 Lero@UCD, School of Computer Science and Informatics, University College Dublin
vanessa.ayala-rivera@ucdconnect.ie, {thomas.cerqueus,liam.murphy}@ucd.ie
2 Lero@DCU, School of Electronic Engineering, Dublin City University
patrick.mcdonagh@dcu.ie
1 Introduction
Data publishing is an essential element of scientific and societal research.
By exploiting data, researchers can create innovative solutions and im-
proved services. However, this data often contains sensitive information
about individuals, whose personal data needs to be protected from dis-
closure. Privacy-Preserving Data Publishing (PPDP) develops methods
of anonymization for releasing this data without compromising the con-
fidentiality of individuals, while trying to retain the utility of the data.
A common mechanism to anonymize data is generalization. This consists
of replacing a specific value with a broader, more general value
(e.g., replacing flu with respiratory disease) with the objective of making
the original value more difficult to distinguish. Full-domain generalization
is one of the best-known and most widely used generalization schemes
[17, 18, 21, 28, 31]. Under this scheme, all values in an attribute are gener-
alized to their respective ancestor values at the same (higher) level of a
hierarchy. This hierarchy, commonly known as Value Generalization Hi-
erarchy (VGH) [28], contains a set of terms related to an attribute within
a specific domain. The leaf nodes correspond to the original values of a
dataset and the ancestor nodes correspond to the candidate values used
for the generalizations. More general terms are located at higher levels in
the VGH and more specialized terms are lower in the VGH.
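As a minimal sketch (assuming a simple dict-based encoding; the disease values are illustrative, echoing the flu example above), a VGH can be stored as a chain of ancestors per leaf value, and full-domain generalization then replaces every value with its ancestor at the chosen level:

```python
# Minimal sketch of a Value Generalization Hierarchy (VGH): each leaf value
# maps to its chain of ancestor values, ordered from level 1 (most specific
# generalization) up to the root. All names here are illustrative.
VGH = {
    "flu":        ["respiratory disease", "disease", "medical condition"],
    "bronchitis": ["respiratory disease", "disease", "medical condition"],
    "gastritis":  ["digestive disease",   "disease", "medical condition"],
}

def generalize(value, level, vgh=VGH):
    """Full-domain generalization: return the ancestor of `value`
    at the given level (level 0 keeps the original value)."""
    if level == 0:
        return value
    return vgh[value][level - 1]

# Under a full-domain scheme, all values are generalized to the same level.
print([generalize(v, 2) for v in ["flu", "gastritis"]])  # ['disease', 'disease']
```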
For categorical attributes, a generalization should ideally correspond
to a “less specific but semantically consistent value” [31]. In spite of this
objective, most anonymization methods do not consider the
semantics of the terms [22]. Some generalization methods rely on the
assumption that VGHs are well-specified, i.e., that the proper semantics
are preserved in the VGH specification. In this context, it has been discussed
in the literature that VGHs play an important role in the quality of the
anonymized data [7,27]. It has also been argued that a “good” VGH may
improve the utility of the anonymized data [7]. Similarly, a “bad” VGH
may cause over-generalization, which can potentially reduce data precision
[20, 27]. However, it is unclear what a “good” or “bad” VGH
is quantitatively, and how the quality of a VGH can be measured. So far,
the responses to these questions have been left to the judgement of the
users who define the VGHs. Moreover, these decisions sometimes represent
the subjective opinion of a single individual, and thus correspond
to just one interpretation of the domain to which the VGH pertains.
These situations demonstrate how user-defined VGHs can offer a partial
and subjective knowledge model of a domain. In our opinion, the above
problems occur because there are currently no approaches that examine
what a “good” VGH is, nor any mechanisms that allow users to assess
the quality of their VGHs in a standardized manner. This is further
exacerbated by the fact that VGHs may be specified without deep knowledge
of the underlying semantics of the domain that the VGH represents.
In this paper, to address the above problems, we introduce a method
for evaluating the quality of VGHs with respect to how well the
semantics of the concepts specified in the VGH are maintained through-
out the generalization process. The quality of VGHs is measured using
semantic similarity metrics applied to the concepts found in the VGH
and also using the structural organization of the VGH. To the best of
our knowledge, no previous work has proposed an approach
that applies ontologies and semantics to VGHs to allow users to assess the
quality of the VGHs used for anonymization. By measuring the
semantic loss of VGHs, users can improve the specification of their
VGHs and prevent applications from using inconsistent, incorrect, or
redundant VGHs. This helps to improve the utility of the anonymized
data by retaining more of the meaning of the original concepts.
The main contributions of this paper are as follows:
– We propose a new method and a composite score to evaluate the
quality of a given VGH based on the semantic properties of the VGH
and the information contained in a reference ontology.
– We analyze and discuss the issues commonly encountered in the spec-
ification of VGHs and identify desirable properties in a “good” VGH.
Section 2 discusses the related work and the motivation for the use of
semantics and ontologies in anonymization. Section 3 presents our VGH
quality assessment method. Section 4 presents our empirical evaluation.
Section 5 presents our conclusions and future work.
Procedure 1 depicts the process for computing the GSL score for a
VGH, termed VghGSL. To aid in the understanding of the quality as-
sessment process, we use a VGH created for a set of vertebrate animals
(shown in Figure 1), noun as the syntactic category and WordNet [11]
as the reference ontology. Even though there are limitations to WordNet
(e.g., inaccurate or incomplete domain specifications), for the purpose of
our experiment, we consider that WordNet represents the standard ontol-
ogy. To calculate semantic similarity, we will use the WuP metric (shown
in Appendix C). Compared to other metrics, its simplicity leads to a
computationally efficient solution; however, our approach can also be
applied with other similarity metrics.
For each level in the VGH (levels are defined by the height at which the
ancestor nodes are positioned in the VGH), the similarity between each
leaf and ancestor node needs to be calculated. First, each word
in the VGH is mapped to a concept (or synset, if WordNet is used) in
the reference ontology. If the exact word is not found, a synonym is used.
When multiple senses are available for the same word, the correct sense for
the word must be disambiguated. Automatic word-sense disambiguation
[26] is a broad research field on its own, and is beyond the scope of
this paper. In our approach, the senses of the terms at the leaf nodes
are disambiguated by hand (as the user is involved in the assessment
process), consequently the ancestors’ senses are derived from the inherited
hypernyms associated with the leaf terms. Appendix D provides a method
which demonstrates the retrieval process for a concept.
Once the correct concepts (and senses) are retrieved from the refer-
ence ontology, the GSL scores for leaf-ancestor transitions (TransGSL)
are calculated (as given by Equation 1). This process is depicted in Fig-
ure 1, which shows an example of how the TransGSL between the leaf node
salmon (sense#1), and its corresponding ancestor nodes is calculated us-
ing the WuP metric. The semantic similarity is calculated according to
WordNet and the associated hypernym tree for the salmon concept. For
example, the semantic similarity between salmon#1 and fish#1 is 0.9231.
Thus, the TransGSL for this transition is 1 − 0.9231 = 0.0769. It can be
seen that the TransGSL for the generalization salmon -> ectotherm is
higher than for the other two ancestors. This is because ectotherm is not
part of the hypernym tree for salmon, but a sister term (i.e., sharing the
same hypernym) of chordate. Once the TransGSL scores have been calculated
for each leaf-ancestor transition, a representative score for each level is
obtained. This is given by:
LevelGSL(i) = max_{(l,a)} TransGSL(l, a)        (2)
generalizations cat -> mammal and dog -> mammal. In this example, we
simplified the calculation of VghGSL by setting all weights to 1/h (i.e.,
1/3). The LevelGSL scores are added up (0.1538/3 + 0.3333/3 + 0.2/3 = 0.2290).
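The VghGSL computation with constant weights can be sketched as follows (a minimal Python reimplementation under the paper's definitions; the per-level maxima match the worked example, while the other transition values are illustrative placeholders):

```python
def level_gsl(transitions):
    """Equation (2): a level's representative score is the maximum
    TransGSL over all of its leaf-ancestor transitions."""
    return max(transitions)

def vgh_gsl(levels, weights=None):
    """Weighted sum of the per-level scores; constant weights 1/h by default."""
    h = len(levels)
    if weights is None:
        weights = [1.0 / h] * h          # constant weight: every level gets 1/h
    return sum(w * level_gsl(t) for w, t in zip(weights, levels))

# TransGSL scores per level; the maxima (0.1538, 0.3333, 0.2) are the
# LevelGSL values from the worked example, the other entries are made up.
levels = [[0.0769, 0.1538], [0.3333, 0.2000], [0.2000, 0.1500]]
print(round(vgh_gsl(levels), 4))  # -> 0.229 (0.2290 in the text)
```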
Weights. For the computation of VghGSL, we handle two weight
variations. The first one is a constant weight (i.e., 1/h), which does not
depend on the levels of a VGH, thus, all the levels are penalized in the
same manner. This weight is defined with the aim of using the arithmetic
mean in the computation of the VghGSL, as it is unknown how many
generalizations will be needed to satisfy the privacy requirement. The
second variation is a level-based weight, which depends on the VGH level
considered; it is given by w_i = (h + 1 − i) / Σ_{j=1}^{h} j, where i is the index of a level
in the VGH and h denotes the height of the VGH. To better explain how
the weights work, consider the case where two VGHs have obtained the
same LevelGSL scores, but in different levels. VGH1 has a score of 0.1
and 0.2 in Levels 1 and 2 respectively. VGH2 has the same scores but
reversed, that is, 0.2 for Level 1 and 0.1 for Level 2. When all the leaf-
ancestor transitions in the VGH have the same penalty (using a constant
weight, i.e., 1/2), both VGHs obtain the same VghGSL score (0.15). Since
we use the average function, the assessment will provide similar scores for
correctly (i.e., VGH1) and incorrectly (i.e., VGH2) ordered VGHs. Most
of the similarity metrics consider the fact that concepts at the lower
levels are more similar than those at the upper levels (e.g., WuP). In
order to reintroduce this aspect in our assessment, we penalize the loss of
information per level, giving a larger weight to the lower levels, compared
to the higher levels. By using the level-based weight in this scenario (0.666
for Level 1 and 0.333 for Level 2), the VghGSL score is 0.1333 for VGH1
and 0.1666 for VGH2.
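The VGH1/VGH2 scenario above can be verified with a short sketch (assumed helper names; the weights follow the w_i formula defined earlier in this section):

```python
def level_weights(h):
    """Level-based weights w_i = (h + 1 - i) / (1 + 2 + ... + h):
    lower levels (small i) receive larger weights."""
    denom = sum(range(1, h + 1))
    return [(h + 1 - i) / denom for i in range(1, h + 1)]

def vgh_gsl(level_scores, weights):
    """VghGSL as the weighted sum of the per-level LevelGSL scores."""
    return sum(w * s for w, s in zip(weights, level_scores))

vgh1 = [0.1, 0.2]                  # LevelGSL scores, correctly ordered
vgh2 = [0.2, 0.1]                  # same scores, reversed
const_w = [1 / 2, 1 / 2]           # constant weight 1/h with h = 2

print(round(vgh_gsl(vgh1, const_w), 4), round(vgh_gsl(vgh2, const_w), 4))  # 0.15 0.15
print(round(vgh_gsl(vgh1, level_weights(2)), 4))  # 0.1333
print(round(vgh_gsl(vgh2, level_weights(2)), 4))  # 0.1667 (0.1666 in the text, truncated)
```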
4 Empirical Evaluation
To evaluate our proposed method, we conducted a case study using mem-
bers from our research group. We pursued two objectives in this experi-
ment: (i) to investigate how VGHs (of the same domain) created by different
people reflect their subjective interpretations of the domain, and (ii) to
demonstrate how our proposed VGH assessment method can be applied
to quantitatively measure the quality of the created VGHs. We present
the study in two phases. First, we review some of the issues encountered
in the specification of a categorical VGH, and second, we show how the
VghGSL score can be used to compare, in a standard manner, the quality
of multiple VGHs. This helps to identify which VGH (among a set of
VGHs created for a domain) can retain higher utility in the anonymized
data by better preserving the semantics of the original values.
For our evaluation, consider the scenario where a veterinary labora-
tory has been testing a new treatment for animals. The laboratory would
like to share their results, while protecting the specific details about the
animals used in their tests; thus the dataset needs to be anonymized.
Phase 1: Specification of VGHs. To guide the anonymization
of the animal attribute, we asked two members of our team (postdoc-
toral researchers who are not experts in the field of knowledge engineer-
ing) to create their own VGHs using multiple sources (e.g., dictionaries,
Wikipedia, WordNet) and their own knowledge about the domain. It is
worth mentioning that the subjects (i.e., researchers) created the VGHs
without pre-computing the semantic loss, or any other information met-
rics, among the terms in their VGHs. The VGHs created are provided
in Appendix E and are denoted as VGH1 and VGH2. The leaf nodes
correspond to the original values of the animal attribute. The VGHs created
are height-unbalanced (i.e., leaf nodes are at different heights). Since a
common pre-condition of full-domain generalization methods is that the
VGHs are height-balanced, a typical approach is to replicate the leaf values
until they reach the height of the deepest leaf node.
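The leaf-replication step can be sketched as follows (a minimal illustration with made-up animal paths, not the authors' tooling):

```python
def balance_path(path, height):
    """Pad a root-to-leaf path to a fixed height by replicating the
    leaf value, so every leaf ends up at the depth of the deepest one."""
    pad = height - (len(path) - 1)      # missing edges below this leaf
    return path + [path[-1]] * pad

# Illustrative unbalanced VGH: the 'Frog' path is one level shorter.
paths = [
    ["Animal", "Vertebrate", "Mammal", "Tiger"],
    ["Animal", "Vertebrate", "Frog"],
]
height = max(len(p) - 1 for p in paths)
balanced = [balance_path(p, height) for p in paths]
print(balanced[1])  # ['Animal', 'Vertebrate', 'Frog', 'Frog']
```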
As discussed in Section 2, it is common that data publishers (who are
not necessarily domain experts) create a VGH with the aim of anonymizing
a dataset. (WordNet was used by the subjects only as a source of knowledge,
e.g., definitions and taxonomies, and not to measure similarity between
terms.) In our experiment, the subjects were not experts in the
domain, so they faced some difficulties while defining their VGHs. It was
reported that the process of building a VGH from multiple sources was
cumbersome, as different taxonomies were available for the same domain.
Most of these taxonomies were application-specific, so it was challenging
to come up with a final aggregated taxonomy. Another issue in the defini-
tion of the VGHs was that the subjects often used adjectives as the terms
of the ancestor nodes, which modify or elaborate the meaning of words,
rather than representing an is-a relationship. This caused the VGHs to
have mixed syntactic categories (e.g., nouns and adjectives) in the defini-
tion of the ancestor nodes. It has been argued that language semantics are
mostly captured by nouns; therefore, most research on semantic similarity
calculation focuses on nouns [25]. This is the case for WordNet-based
similarity metrics. Since these metrics are focused on taxonomic relations,
their applicability is restricted to the noun and verb categories. Moreover,
the terms to be compared have to be of the same category (i.e., noun-
noun or verb-verb). Therefore, we nominalized the adjectives found in the
VGHs, mapping them to a related noun: for example, warm-blooded was
mapped to homeotherm; similarly cold-blooded was mapped to ectotherm.
Even though the subjects attempted to provide adequate generaliza-
tions in the VGH, in the end they were uncertain about the quality of
their VGHs. Thus, the second phase of our experiment was to compare
the quality of the VGHs using our proposed VghGSL measure.
Phase 2: Comparing the Quality of VGHs. In our implemen-
tation, we used WordNet 3.0 and the Java libraries JAWS 1.3 [1] and
RiTa [3] to retrieve data from the WordNet database. To calculate the
semantic similarity among terms, we used the library JWI [2].
To compute the VghGSL score, we used the two weight variations
explained in Section 3: the constant weight (setting all level weights to
1/h), which applies no level-dependent penalty, and the level-based weight
(using the w_i equation), which penalizes information loss at the lower
levels more heavily. To compare the
VGHs, we first calculated the TransGSL score for all leaf-ancestor transi-
tions and then obtained the LevelGSLs (using the max function). Table 2
presents the results for each VGH, showing the transitions causing the
maximum loss per level, and the LevelGSL scores calculated using the
constant weight (1/4) and the level-based weights (0.4 for Level 1, 0.3
for Level 2, 0.2 for Level 3 and 0.1 for Level 4). The VghGSL scores are
shown in the last row of each VGH table.
From Table 2, it can be deduced that VGH1 is better specified than
VGH2: according to the VghGSL scores, VGH1 better preserves the
semantics of the original values.
Table 2. VGHs Comparison using Constant and Level-Based Weighted GSL.

VGH1
Generalization  Max TransGSL Transition               LevelGSL·1/h  LevelGSL·wi
L0->L1          Horse, Giraffe -> Ungulate            0.0258        0.0414
L0->L2          Horse, Giraffe, Tiger -> Mammal       0.0463        0.0556
L0->L3          Horse, Giraffe, Tiger -> Homeotherm   0.09          0.072
L0->L4          Horse, Giraffe, Tiger -> Animal       0.0833        0.0333
VghGSL Score                                          0.2454        0.2023

VGH2
Generalization  Max TransGSL Transition               LevelGSL·1/h  LevelGSL·wi
L0->L1          Horse, Giraffe -> Herbivore           0.09          0.1440
L0->L2          Horse, Giraffe, Tiger -> Mammal       0.0463        0.0556
L0->L3          Horse, Giraffe, Tiger -> Vertebrate   0.0577        0.0462
L0->L4          Horse, Giraffe, Tiger -> Animal       0.0833        0.0333
VghGSL Score                                          0.2773        0.2791
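Since the VghGSL row is just the sum of the weighted per-level scores, Table 2 can be cross-checked with a few lines (values copied from the table):

```python
# LevelGSL values copied from Table 2, per VGH and weighting scheme.
vgh1_const = [0.0258, 0.0463, 0.0900, 0.0833]
vgh1_wi    = [0.0414, 0.0556, 0.0720, 0.0333]
vgh2_const = [0.0900, 0.0463, 0.0577, 0.0833]
vgh2_wi    = [0.1440, 0.0556, 0.0462, 0.0333]

# Each VghGSL score is the sum of the corresponding column.
for name, scores in [("VGH1 const", vgh1_const), ("VGH1 wi", vgh1_wi),
                     ("VGH2 const", vgh2_const), ("VGH2 wi", vgh2_wi)]:
    print(name, round(sum(scores), 4))
# VGH1 const 0.2454, VGH1 wi 0.2023, VGH2 const 0.2773, VGH2 wi 0.2791
```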
[Figures 3 and 4 plot the LevelGSL scores (constant and level-based weights, respectively) per VGH level (0-4) for VGH1 and VGH2.]
Fig. 3. Constant Weight LevelGSLs. Fig. 4. Level-Based Weight LevelGSLs.
This appendix presents the equation for the Wu-Palmer (WuP) measure, given by:
SimWuP(c1, c2) = (2 × N3) / (N1 + N2 + 2 × N3)
where c1 and c2 are the two concepts for which the semantic similarity is
measured, N1 and N2 denote the number of is-a links on the path from
c1 and c2 respectively, to their least common subsumer (LCS), and N3
denotes the number of is-a links on the path from the LCS to the root
of the taxonomy. The score range is (0,1] (1 for identical concepts). To
illustrate the WuP metric, say we want to calculate the similarity between
car and compact car in the ontology shown in Appendix A. The LCS is
car. Thus, following the formula, we obtain SimWuP(car, compact car) =
(2 × 2) / (0 + 1 + 2 × 2) = 0.8.
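The same computation can be sketched over a toy is-a taxonomy (assumed here so that car lies two links below the root, consistent with the worked result; this is not the ontology from Appendix A itself):

```python
# Toy is-a taxonomy: each concept maps to its parent (hypernym).
parent = {
    "entity": None,            # root
    "vehicle": "entity",
    "car": "vehicle",
    "compact car": "car",
}

def ancestors(c):
    """Path from a concept up to the root, including the concept itself."""
    path = []
    while c is not None:
        path.append(c)
        c = parent[c]
    return path

def wup(c1, c2):
    """Wu-Palmer similarity 2*N3 / (N1 + N2 + 2*N3): N1, N2 count the
    is-a links from c1, c2 to their least common subsumer (LCS), and
    N3 counts the links from the LCS to the root."""
    a1, a2 = ancestors(c1), ancestors(c2)
    lcs = next(a for a in a1 if a in a2)   # least common subsumer
    n1, n2 = a1.index(lcs), a2.index(lcs)
    n3 = len(ancestors(lcs)) - 1           # links from LCS to root
    return (2 * n3) / (n1 + n2 + 2 * n3)

print(wup("car", "compact car"))  # 0.8, matching the worked example
```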
In our work, the senses of the terms at the ancestor nodes are obtained
from the inherited hypernyms associated with the leaf terms. To do this,
a matching is performed between the hypernyms of a leaf node and each
of the leaf node’s ancestors. If there is a match, the sense for the matched
hypernym is selected. Otherwise, a manual disambiguation is needed. This
process is shown below in Procedure 2.
Procedure 2 getConceptFromOntology
Input: a node in the VGH n, reference ontology O, syntactic category of the words
in the VGH cat, inherited hypernyms of the concept in a VGH node Hn ;
Output: underlying lexical concept from ontology for the VGH node concept;
1: Cn = getConceptSetForWord(n, O, cat);
2: if n is a leaf node then
3: sn = getDisambiguatedSense(n, Cn );
4: else
5: if Cn is found in Hn then
6: sn = getSense(Cn , Hn );
7: else
8: sn = getDisambiguatedSense(n, Cn );
9: end if
10: end if
11: concept = getConcept(sn , Cn );
12: return concept;
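A rough, self-contained Python rendering of Procedure 2's matching logic (the mock ontology and helper names are illustrative stand-ins, not the JAWS/JWI API):

```python
# Sketch of Procedure 2 over a minimal mock ontology. The "ontology" maps a
# word to its candidate senses, and each sense carries its hypernym chain;
# all names and data here are illustrative stand-ins.
ontology = {
    "salmon": {"salmon#1": ["fish#1", "vertebrate#1", "animal#1"]},
    "fish":   {"fish#1": ["vertebrate#1", "animal#1"],
               "fish#2": ["zodiac_sign#1"]},        # an ambiguous word
}

def get_concept(node, is_leaf, inherited_hypernyms, disambiguate):
    """Return the chosen sense of `node`: leaf senses are disambiguated
    manually (via `disambiguate`); an ancestor's sense is matched against
    the hypernyms inherited from the leaves, falling back to manual choice."""
    candidates = ontology[node]
    if is_leaf:
        return disambiguate(node, candidates)
    for sense in candidates:               # match against inherited hypernyms
        if sense in inherited_hypernyms:
            return sense
    return disambiguate(node, candidates)  # no match: manual disambiguation

pick_first = lambda word, senses: sorted(senses)[0]   # stand-in for the user
leaf_sense = get_concept("salmon", True, [], pick_first)
hypernyms = ontology["salmon"][leaf_sense]
print(get_concept("fish", False, hypernyms, pick_first))  # fish#1
```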
This appendix shows the VGHs created for the animal attribute.
Acknowledgments