Open Government Data
In December 2010, the International Open Government Dataset Search (IOGDS) team at the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI) embarked on a project to discover, document, and analyze open data catalogs published by governments at various levels around the world.1 By early 2013, the IOGDS project had accumulated descriptive metadata for more than 1,022,787 datasets from 192 catalogs in 24 languages, representing 43 countries and international organizations.

RPI’s aggregate catalog, implemented using the Resource Description Framework (RDF) and published using both a public SPARQL endpoint and a faceted user interface, has proven to be a valuable tool for gaining insight into the nature of open government data publication. Here, we discuss what our team has learned about international government data publication trends and tendencies through the application of data analytics and data visualization to this metadata collection.

Open Government Data Publication: A Review

Motivated by the first Obama Administration’s transparency initiatives, in May 2009 the US launched the Data.gov Web portal with a catalog of 47 datasets containing previously unreleased government data. During its first year, Data.gov grew to more than 250,000 datasets, inspired hundreds of applications and services, and was seen as the flagship of the worldwide movement toward open government data publication. Other governments followed in rapid succession, and in the next few years open government sites for countries, municipalities, cities, and others went online.

The significant growth in number and size of open government data catalogs since 2009 has been made possible by the emergence of an open government data ecosystem consisting of policy makers, agencies (as providers and consumers), data experts, independent software developers and service providers, academia, and citizen stakeholders. The publication of widely varied data has inspired a wide assortment of applications and services; provided essential data for journalists, bloggers, and activists; and fueled academic research. In turn, demand from stakeholders has increased the quality and variety of this data, and has encouraged the publication of thousands of datasets as readily consumed linked open government data.2

Building a Million-Dataset Catalog

Starting in 2010, countries published open government data catalogs using a range of platforms and cataloging approaches. Our project recognized these catalogs as potentially valuable sources of key data, providing names, descriptions, and URLs of datasets from many countries, if only the contents of those catalogs could be collected and analyzed. Governments seldom publish data catalogs using uniform data models, much less as machine-readable RDF following linked data principles.3 To generate a uniform aggregate catalog, our team developed a semi-automated process that included manual data portal identification; catalog and dataset metadata identification; per-catalog customization of metadata harvesting tools; and automated linked data conversion and publication on a public SPARQL endpoint based on an existing laboratory infrastructure.4 A novel faceted browser developed for the Semantic eScience Framework (SeSF)5,6 project was adapted to provide a highly efficient faceted browse and search experience for the user.

IOGDS didn’t consider certain other characteristics of open government data publication that might be of particular interest to practitioners. For example, it would be useful to have greater detail regarding the file formats in use, giving us deeper
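The automated linked data conversion step described under "Building a Million-Dataset Catalog" can be illustrated with a minimal sketch. It assumes a harvested record has already been normalized into a small dictionary and maps it to N-Triples using terms from the Data Catalog Vocabulary (DCAT) and Dublin Core. The record shape, the `record_to_ntriples` helper, and the example URIs are hypothetical, and the article does not state that IOGDS used this exact vocabulary or serialization.

```python
"""Sketch: turn one harvested catalog record into N-Triples lines.

Assumptions (not from the article): records are dicts with 'id', 'title',
and optional 'description', 'url', and 'lang' keys; DCAT and Dublin Core
terms are used for the mapping.
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record, base_uri):
    """Return a list of N-Triples lines describing one dataset record.

    IRIs are used as given; a real converter would also validate and
    percent-encode them.
    """
    subject = "<" + base_uri + record["id"] + ">"
    lang = record.get("lang")
    tag = "@" + lang if lang else ""

    title = escape_literal(record["title"])
    triples = [
        f"{subject} <{RDF_TYPE}> <{DCAT}Dataset> .",
        f'{subject} <{DCT}title> "{title}"{tag} .',
    ]
    if record.get("description"):
        description = escape_literal(record["description"])
        triples.append(f'{subject} <{DCT}description> "{description}"{tag} .')
    if record.get("url"):
        url = record["url"]
        triples.append(f"{subject} <{DCAT}landingPage> <{url}> .")
    return triples


if __name__ == "__main__":
    # Hypothetical record, as a per-catalog harvester might emit it.
    example = {
        "id": "dataset-1",
        "title": 'Hospital "Compare" data',
        "url": "http://example.org/data/1",
        "lang": "en",
    }
    for line in record_to_ntriples(example, "http://example.org/catalog/"):
        print(line)
```

Keeping the per-catalog harvesting logic separate from this conversion step means that only the harvester needs customizing for each new catalog, while the mapping to RDF stays uniform.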
Figure 3. Word cloud showing the top keywords from the 1,405 datasets published by the US government’s Medicare program.

Figure 4. Word cloud examining keywords from 10,678 datasets published in the UK government catalog (http://data.gov.uk).

data movement. Our analysis of the IOGDS data has shed some light on the coverage, trends, and diversity of published data around the world.

In the future, the adoption of better standards will allow applications and services to be able to repeat our data analytics at larger

Acknowledgments

This work has been made possible by a generous gift to the Tetherless World Constellation at Rensselaer Polytechnic Institute from Microsoft Research.

References

1. J.S. Erickson et al., “TWC International Open Government Dataset Catalog,” Proc. 7th Int’l Conf. Semantic Systems, ACM, 2011, pp. 227–229.
2. L. Ding, V. Peristeras, and M. Hausenblas, “Linked Open Government Data,” IEEE Intelligent Systems, vol. 27, no. 3, 2012, pp. 11–15; http://bit.ly/16YYb7s.
3. T. Berners-Lee, “Linked Data,” W3C Design Issues, 27 July 2006; http://bit.ly/cwflPW.
4. L. Ding et al., “TWC LOGD: A Portal for Linked Open Government Data Ecosystems,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 9, no. 3, 2011; http://bit.ly/16tmY9q.
5. E. Rozell et al., “A Framework for Integrating Oceanographic Data Repositories,” Proc. AGU Fall Meeting 2010, Am. Geophysical Union, 2010; http://bit.ly/1dLdxFc.
6. E. Rozell, Extensible User Interface Framework for Faceted Browsing Applications, master’s thesis, Rensselaer Polytechnic Inst., 2012; http://bit.ly/16EnnpG.
7. J. Hendler and T. Pardo, Open Government Primer on Machine-Readability, blog, 24 Sept. 2012; www.data.gov/communities/node/116/blogs/76451.
8. F. Maali and J. Erickson, eds., Data Catalog Vocabulary (DCAT), W3C.org, 1 Aug. 2013; www.w3.org/TR/vocab-dcat.
9. European Commission: Interoperability for European Public Administrations (ISA), DCAT Application Profile for Data Portals in Europe, 2 Sept. 2013; http://bit.ly/19kaBwo.

John S. Erickson is the Director of Web Science Operations at the Tetherless World Constellation at Rensselaer Polytechnic Institute. Contact him at erickj4@rpi.edu.

Amar Viswanathan is a PhD student at the Tetherless World Constellation at Rensselaer Polytechnic Institute. Contact him at kannaa@rpi.edu.

Joshua Shinavier is a PhD student in computer science at the Tetherless World Constellation at Rensselaer Polytechnic Institute. Contact him at shinaj@rpi.edu.

Yongmei Shi is a research associate at the Tetherless World Constellation at Rensselaer Polytechnic Institute. Contact her at yongmei.shi@gmail.com.

James A. Hendler is the director of the Rensselaer Institute for Data Exploration and Applications (IDEA), the Tetherless World Senior Constellation Chair, and a member of the faculty in the Department of Computer Science and the Department of Cognitive Science at Rensselaer Polytechnic Institute. Contact him at hendler@cs.rpi.edu.

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.