DataRss Tech Overview
Roles
DataRSS is used between two parties: the Publisher, who ‘owns’ some data, and the Accessor, who wants to use
that data. Publisher and Accessor are organizations with people in them. The Publisher wants to offer a technical
means that gives an application program simple, standardized access to its data. The Accessor wants to write an
application program that accesses, and does something useful with, data coming from any Publisher. Accessor and
Publisher don’t know each other.
Accessor A’s application can get data from Publisher P as easily as from Publisher Q, and Publisher P’s data can be
accessed as easily by Accessor A as by Accessor B.
Data RSS is a simple protocol and a simple data format. It can be implemented in any programming language
and, more importantly, the Publisher and Accessor software need not (and cannot) know what language the
counterparty’s software is written in.
All DataRss requests return a response in one of several formats. For now those are XML, JSON and HTML.
Why HTML? So that a request made from an ordinary browser returns useful, human-readable information.
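To make the "several formats" point concrete, here is a minimal Python sketch that renders one flat record as JSON, as XML and as an HTML table. The XML root element name (`dataset`) and the HTML table layout are assumptions for illustration only; this document does not define the XML or HTML shapes.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# One flat record, shaped like the ./datasets entries in this document.
record = {"name": "newswire", "fullname": "New York Times Newswire API"}


def to_json(rec: dict) -> str:
    """Serialize the record as a JSON object."""
    return json.dumps(rec)


def to_xml(rec: dict, root_tag: str = "dataset") -> str:
    """Serialize as <root_tag><key>value</key>...</root_tag> (keys must be valid XML names)."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_html(rec: dict) -> str:
    """Render the record as a two-column HTML table for a browser."""
    rows = "".join(
        f"<tr><th>{escape(str(k))}</th><td>{escape(str(v))}</td></tr>"
        for k, v in rec.items()
    )
    return f"<table>{rows}</table>"


print(to_json(record))
print(to_xml(record))
print(to_html(record))
```

The same dict feeds all three serializers, so a Publisher only has to produce its data once.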
DataRss Endpoint
In essence, DataRss is embodied by a URL, which we call the DataRss endpoint. A Publisher makes its data
available to others by the single, simple act of implementing responses to this URL. For example, hypothetically,
the Sunlight Foundation could let the world know that its DataRss endpoint is at
http://services.sunlightfoundation.com/datarss.
At minimum, this would mean that clicking on that link returns a response that looks something like this:
---
datarss:
  version: 0.1
source:
  name: Sunlight Labs
  version: 1
---
In what follows I will document key examples of the format as it is evolving. This is organized along the lines of
the top-level URL components that are used to control it.
REST
The overall scheme is that I am trying to describe a unified set of REST URL patterns. Some of the routes return
information about the data sets (i.e. discovery) and some of them return actual data.
N.B. There are many ways to skin this cat, as is evidenced by the fact that each Publisher who designed a REST
API for their data approached it in a slightly different way. In a way, that is the problem I am trying to address.
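One way to picture a unified set of REST URL patterns is as a single route table. The sketch below is a hypothetical Python router over the paths used in this document. The route names (`hello`, `lookup` and so on) and the convention that paths are relative to the endpoint (`./...`) are assumptions, not part of any spec.

```python
import re
from typing import Optional

# Hypothetical route table: (route name, regex over the path relative to the endpoint).
ROUTES = [
    ("hello", r""),
    ("info", r"info"),
    ("datasets", r"datasets"),
    ("fields", r"dataset/(?P<dataset>[^/]+)/fields"),
    ("queries", r"dataset/(?P<dataset>[^/]+)/queries"),
    ("query", r"dataset/(?P<dataset>[^/]+)/query/(?P<query>[^/]+)(?:/(?P<arg>[^/]+))?"),
    # url-index lookups: ./dataset/<name>/<field>/<value>
    ("lookup", r"dataset/(?P<dataset>[^/]+)/(?P<field>[^/]+)/(?P<value>[^/]+)"),
]


def route(path: str) -> Optional[tuple[str, dict]]:
    """Match a relative path such as './dataset/candidates/fields' against ROUTES."""
    path = path.split("?", 1)[0]  # question-mark parameters are handled elsewhere
    if path.startswith("./"):
        path = path[2:]
    elif path == ".":
        path = ""
    path = path.strip("/")
    for name, pattern in ROUTES:  # first match wins, so order matters
        m = re.fullmatch(pattern, path)
        if m:
            return name, {k: v for k, v in m.groupdict().items() if v is not None}
    return None


print(route("./dataset/candidates/query/businesses/9120"))
```

Order matters here. `query` is tried before the generic `lookup` pattern, so in this sketch a field literally named `query` could not be used as a url-index.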
Request url: .
The base Data RSS Endpoint returns a basic “hello world” response to prove that there is, in fact, a Data RSS
Endpoint here. It indicates the version of DataRSS and the name of the publisher, as well as whatever version
number they might set for their implementation.
Example:
---
datarss:
  version: 0.1
source:
  name: Sunlight Labs
  version: 1
---
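A Publisher's implementation of the base endpoint can be very small. This hypothetical Python sketch builds the hello-world document as a dict and serializes it to JSON. It also includes an Accessor-side check that a decoded response looks like a DataRss endpoint. The helper names are made up for illustration. The version is kept as the string `"0.1"`, whereas a YAML parser would read the value shown above as a number.

```python
import json

DATARSS_VERSION = "0.1"  # the DataRss version shown in the example above


def hello_document(publisher_name: str, implementation_version) -> dict:
    """Build the base-endpoint response: DataRss version plus publisher identity."""
    return {
        "datarss": {"version": DATARSS_VERSION},
        "source": {"name": publisher_name, "version": implementation_version},
    }


def is_datarss_endpoint(doc: dict) -> bool:
    """Accessor side: does a decoded response carry a datarss version?"""
    return isinstance(doc.get("datarss"), dict) and "version" in doc["datarss"]


body = json.dumps(hello_document("Sunlight Labs", 1))
print(body)
```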
./info
Return performance and feature information about this particular endpoint. An Accessor might call this at the
very start to learn something about the particular implementation.
Example:
REQUEST: ./info
RESPONSE:
---
features:
  api-key-required: Yes
  formats: [JSON, XML]
---
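An Accessor might turn the ./info features into decisions about how to make later requests. The following is a hypothetical sketch that assumes the response has already been decoded into a dict. The examples spell the flag both `Yes` and `yes`, and a YAML 1.1 parser may already have turned it into a boolean, so both forms are accepted. The function names and the `preferred` ordering are illustrative.

```python
def flag(value) -> bool:
    """Interpret a yes/no feature flag given as a string or a boolean."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("yes", "true")


def plan_requests(info: dict, preferred=("JSON", "XML")) -> dict:
    """Pick a response format and note whether an API key and paging are needed."""
    features = info.get("features", {})
    offered = [str(f).upper() for f in features.get("formats", [])]
    fmt = next((f for f in preferred if f in offered), None)
    if fmt is None:
        raise ValueError(f"no supported format among {offered}")
    return {
        "format": fmt,
        "needs_api_key": flag(features.get("api-key-required", "no")),
        "paginated": flag(features.get("paginated", "no")),
    }


info = {"features": {"api-key-required": "Yes", "formats": ["JSON", "XML"]}}
print(plan_requests(info))
```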
./datasets
Return a list of all the distinct data sets that this endpoint publishes. Each dataset corresponds roughly to a
table, database or list of information. Datasets may also present various canned queries and default behaviors.
Example:
REQUEST: ./datasets
Data RSS - Technical Overview
Pito Salas - rps@salas.com - April 9, 2009
RESPONSE:
---
name: newswire
fullname: New York Times Newswire API
---
name: campaigns
fullname: New York Times Campaign Finance API
---
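The examples in this document are written as `---`-separated records of `key: value` lines. For experimenting with them, a tiny hypothetical parser like the following is enough. It handles only the flat form used in listings like the one above and keeps list values such as `[JSON, XML]` as raw strings. It is no substitute for a real YAML parser.

```python
def parse_records(text: str) -> list:
    """Split a '---'-separated stream of flat 'key: value' records into dicts.

    Only the flat form is handled; nested keys and multi-line values
    need a real YAML parser.
    """
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "---":  # record separator: close the current record, if any
            if current:
                records.append(current)
            current = {}
        elif line:
            key, sep, value = line.partition(":")  # split on the first colon only
            if not sep:
                raise ValueError(f"not a key: value line: {raw!r}")
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


sample = """---
name: newswire
fullname: New York Times Newswire API
---
name: campaigns
fullname: New York Times Campaign Finance API
---"""
print(parse_records(sample))
```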
./dataset/<name>/fields
Return the list of all the distinct fields of information that may appear in responses from this dataset.
Example:
REQUEST: ./dataset/candidates/fields
RESPONSE:
---
name: imsp_candidate_id
fullname: the id number of the candidate
url-index: yes
---
name: candidate_name
fullname: the name of the candidate
url-index: no
---
Notes:
• url-index: yes means that this field can be used as an actual part of the URL, in exactly this way:
./dataset/candidates/imsp_candidate_id/9120
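On the Accessor side, the url-index flag decides which fields may appear as path segments. This hypothetical sketch assumes the fields listing has been decoded into a list of dicts. `lookup_url` and `is_yes` are illustrative names, and passing `"."` as the base reproduces the relative form used in this document.

```python
from urllib.parse import quote


def is_yes(value) -> bool:
    """Accept 'yes'/'Yes' strings or a boolean already produced by a YAML parser."""
    return value is True or str(value).strip().lower() == "yes"


def lookup_url(base: str, dataset: str, fields: list, field: str, value) -> str:
    """Build <base>/dataset/<dataset>/<field>/<value>, refusing non-url-index fields."""
    spec = next((f for f in fields if f.get("name") == field), None)
    if spec is None:
        raise KeyError(f"dataset {dataset!r} has no field {field!r}")
    if not is_yes(spec.get("url-index", "no")):
        raise ValueError(f"field {field!r} is not url-index: yes")
    segments = [dataset, field, str(value)]
    # Percent-encode each segment so values cannot inject extra path components.
    return base.rstrip("/") + "/dataset/" + "/".join(quote(s, safe="") for s in segments)


candidate_fields = [
    {"name": "imsp_candidate_id", "url-index": "yes"},
    {"name": "candidate_name", "url-index": "no"},
]
print(lookup_url(".", "candidates", candidate_fields, "imsp_candidate_id", 9120))
```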
./dataset/<name>/queries
Return the list of all the standing queries that this dataset defines. A standing query is a canned query that is
meaningful in a particular domain.
Example:
REQUEST: ./dataset/candidates/queries
RESPONSE:
---
name: businesses
type: url-parameter
parameter: imsp_candidate_id
fullname: This query will summarize contributions at the business level for a
  specific candidate.
---
Notes:
• type: named-query
A simple name that denotes a request for a specific result set. For example, ./dataset/newswire/query/last24hours
would return records corresponding to the named query last24hours.
• type: url-parameter
A query that includes a parameter right in the URL. For example, ./dataset/candidates/query/businesses/9120
would return records for the query called businesses with the argument 9120.
• type: question-mark
The most powerful query type, which allows a more open-ended set of question-mark URL parameters. For
example, ./dataset/districts/query/zips?state=MA&districtnumber=29 would return records for the query called
zips on the districts dataset, with parameters state and districtnumber.
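The three query types map naturally onto three ways of building a URL. The following Python sketch assumes a query description decoded from a ./dataset/<name>/queries listing into a dict. The function name and the choice to reject undeclared question-mark parameters are assumptions.

```python
from urllib.parse import quote, urlencode


def query_url(base, dataset, query, arg=None, params=None):
    """Build a request URL for a named-query, url-parameter or question-mark query.

    `query` is one entry from a queries listing, e.g.
    {"name": "businesses", "type": "url-parameter", "parameter": "imsp_candidate_id"}.
    """
    path = (f"{base.rstrip('/')}/dataset/{quote(dataset, safe='')}"
            f"/query/{quote(query['name'], safe='')}")
    kind = query.get("type")
    if kind == "named-query":
        return path
    if kind == "url-parameter":
        if arg is None:
            raise ValueError(f"query {query['name']!r} needs a {query.get('parameter')} argument")
        return f"{path}/{quote(str(arg), safe='')}"
    if kind == "question-mark":
        params = params or {}
        unknown = set(params) - set(query.get("parameters", []))
        if unknown:
            raise ValueError(f"undeclared parameters: {sorted(unknown)}")
        return f"{path}?{urlencode(params)}"
    raise ValueError(f"unknown query type {kind!r}")


zips = {"name": "zips", "type": "question-mark", "parameters": ["state", "districtnumber"]}
print(query_url(".", "districts", zips, params={"state": "MA", "districtnumber": 29}))
```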
Conclusion
Please note: this is not a specification. It is a working document that will change with feedback and further
design. In the Annotated Examples below you can see the examples I have worked through that have driven the
design.
The next step is to continue applying this model to other existing data APIs and find the holes. So far none have
been especially hard to overcome.
Annotated Examples
EXAMPLE 1: New York Times Newswire API
REQUEST: ./info
RESPONSE:
---
features:
  formats: [JSON, XML]
  api-key-required: yes
  paginated: no
---
REQUEST: ./datasets
RESPONSE:
---
name: newswire
fullname: New York Times Newswire API
---
name: campaigns
fullname: New York Times Campaign Finance API
---
REQUEST: ./dataset/newswire/fields
RESPONSE:
---
name: url
url-index: no
---
name: section
url-index: no
---
name: summary
url-index: no
---
name: type
url-index: no
---
name: people
url-index: no
---
name: created
url-index: no
---
name: pubdate
url-index: no
---
... and so on
REQUEST: ./dataset/newswire/queries
RESPONSE:
---
name: recent
type: named-query
fullname: all available recent items
---
name: last24hours
type: named-query
fullname: items published in last 24 hours
---
REQUEST: ./dataset/newswire/query/last24hours
RESPONSE:
---
url: xxx
section: yyy
summary: zzz
type: aaa
people: xxx
---
and so on.
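On the Publisher side, named queries such as recent and last24hours could be backed by a simple registry of functions. The queries listing is then derived from that same registry. This is a hypothetical Python sketch. It assumes pubdate values are ISO 8601 timestamps with a UTC offset, and it interprets recent as "newest items first, capped at a limit". Neither detail is defined in this document.

```python
from datetime import datetime, timedelta, timezone


def _pubdate(record: dict) -> datetime:
    # Assumption: pubdate is ISO 8601 with an explicit UTC offset.
    return datetime.fromisoformat(record["pubdate"])


def last24hours(records, now):
    """Items whose pubdate falls within the 24 hours up to `now`."""
    cutoff = now - timedelta(hours=24)
    return [r for r in records if cutoff <= _pubdate(r) <= now]


def recent(records, now, limit=20):
    """Assumed meaning: newest items first, at most `limit` of them (`now` is unused)."""
    return sorted(records, key=_pubdate, reverse=True)[:limit]


# Registry: query name -> implementation.
NAMED_QUERIES = {"recent": recent, "last24hours": last24hours}


def queries_listing():
    """The ./dataset/<name>/queries response, derived from the registry."""
    return [{"name": name, "type": "named-query"} for name in NAMED_QUERIES]


def run_named_query(name, records, now=None):
    if name not in NAMED_QUERIES:
        raise KeyError(f"no named query {name!r}")
    return NAMED_QUERIES[name](records, now or datetime.now(timezone.utc))


items = [
    {"url": "a", "pubdate": "2009-04-09T12:00:00+00:00"},
    {"url": "b", "pubdate": "2009-04-07T12:00:00+00:00"},
]
now = datetime(2009, 4, 9, 18, 0, tzinfo=timezone.utc)
print(run_named_query("last24hours", items, now))
```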
EXAMPLE 2: FOLLOWTHEMONEY
REQUEST: ./datasets
RESPONSE:
---
name: candidates
fullname: Follow the Money information about candidates
paginated: yes
sorts: [sector_name, industry_name, ...]
---
name: party_pacs
fullname: Follow the Money information about Pacs
paginated: yes
---
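For datasets marked paginated: yes, an Accessor needs to walk through the pages. This document does not define how a page is requested, so the sketch below leaves that to a caller-supplied `fetch_page(n)` function (a hypothetical name). It assumes that an empty page marks the end. The `max_pages` limit guards against a Publisher that never returns an empty page.

```python
from typing import Callable, Iterator


def iterate_pages(fetch_page: Callable[[int], list],
                  start: int = 1, max_pages: int = 1000) -> Iterator[dict]:
    """Yield records page by page until an empty page (or max_pages) is reached."""
    for page in range(start, start + max_pages):
        records = fetch_page(page)
        if not records:  # assumption: an empty page means there is no more data
            return
        yield from records


# Stand-in for real HTTP requests: two pages of candidate records.
pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
print(list(iterate_pages(lambda n: pages.get(n, []))))
```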
REQUEST: ./dataset/candidates/fields
RESPONSE:
---
name: imsp_candidate_id
fullname: the id number of the candidate
url-index: yes
---
name: candidate_name
fullname: the name of the candidate
url-index: no
---
name: state
url-index: no
fullname: the state this candidate is in
---
REQUEST: ./dataset/candidates/queries
RESPONSE:
---
name: businesses
type: url-parameter
parameter: imsp_candidate_id
fullname: This query will summarize contributions at the business level for a
  specific candidate.
---
EXAMPLE 3: SUNLIGHT LABS
REQUEST: ./info
RESPONSE:
---
datarss:
  version: 0.1
source:
  name: Sunlight Labs
  version: 1
features:
  api-key-required: Yes
  formats: [JSON, XML]
---
REQUEST: ./datasets
RESPONSE:
---
name: legislators
fullname: US Representatives and Senators, providing basic contact
  information as well as all the various IDs we track for legislators.
paginated: no
---
name: districts
fullname: Congressional districts, providing lookups to obtain district
  information from a zipcode or latitude and longitude.
paginated: no
---
REQUEST: ./dataset/districts/fields
RESPONSE:
---
name: state
fullname: the state of a district
url-index: no
---
name: districtnumber
fullname: the number of a district within a state
url-index: yes
---
name: zip
fullname: the zipcode of a district within a state
url-index: yes
---
REQUEST: ./dataset/districts/zip/02474
RESPONSE:
A list of all districts in that zip code. This example illustrates url-index: yes.
REQUEST: ./dataset/districts/queries
RESPONSE:
---
name: zips
type: question-mark
parameters: [state, districtnumber]
---
REQUEST: ./dataset/districts/query/zips?state=MA&districtnumber=29
RESPONSE:
A list of information about all the zip codes in the specified district. This example illustrates query type:
question-mark.
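On the Publisher side, a question-mark query has to pull its parameters out of the URL and check them against the parameters it declared. Here is a hypothetical Python sketch using only the standard library. Rejecting undeclared or repeated parameters is a design choice, not something this document specifies. Declared parameters that are absent are simply left out of the result.

```python
from urllib.parse import parse_qs, urlsplit


def parse_question_mark(url: str, declared: list) -> dict:
    """Extract question-mark parameters from a request URL and validate them."""
    raw = parse_qs(urlsplit(url).query, keep_blank_values=True)
    unknown = set(raw) - set(declared)
    if unknown:
        raise ValueError(f"undeclared parameters: {sorted(unknown)}")
    params = {}
    for name, values in raw.items():
        if len(values) != 1:  # e.g. ?state=MA&state=NY
            raise ValueError(f"parameter {name!r} given {len(values)} times")
        params[name] = values[0]
    return params


print(parse_question_mark("./dataset/districts/query/zips?state=MA&districtnumber=29",
                          ["state", "districtnumber"]))
```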