
The government owns a lot of data. A lot. And a huge portion of it is public information that you’re allowed to request (e.g., through a Freedom of Information Act request or a local equivalent, or by filling out some paperwork at City Hall). The information is technically available to you, but more often than not the process is pretty cumbersome: your request might not be answered quickly, it might be denied for various reasons (some of them pretty dubious), they might give it to you in hard copy, you might be charged for the printing costs, the information they give you might not be quite what you requested… the list goes on. To illustrate the point, here’s an awesome blog post by Ian White of Urban Mapping describing his very humorous, VERY maddening dealings with various municipal transit authorities while trying to get basic data from them, like train schedules and station locations. It’s worth a skim, if only to see the incredible amount of incompetence and/or obstinacy you often have to deal with just to get data out of government.

The idea that citizens should be able to access government data rests on two basic facts. First, works created by the US federal government can’t be copyrighted. The reasoning makes sense: if taxpayers funded the creation of the data, they should be able to see the results. As it turns out, this provision of the US Code (17 U.S.C. § 105) doesn’t apply to state or local governments. But that shouldn’t make much of a difference for most datasets, because, as we saw in Feist v. Rural, the Supreme Court ruled that a mere collection of facts in a database is not entitled to copyright protection. (I say “most” because it’s not always quite so simple: an original selection or arrangement of facts can still be protected. But in general, much of the data cities collect is not protected by copyright.)
In recognition of these facts and of citizens’ right to data, some governments have adopted open data policies that encourage the publication of government data. This often takes the form of an online portal, at the federal level (Data.gov), the state level (Data.Colorado.gov, among others), and the local level (Data.SFGov.org, among others). The benefits of an open data portal are significant, both from a public perspective:
- Increased government transparency
- Lowered barriers to access for government data
- Data now available in more useful forms (i.e., not a ream of paper printed single-sided)
- Data now available in one centralized location (i.e., not spread across various department websites)
and from a government perspective:
- Helps reduce corruption by making ethics reports, employee salaries, campaign finance data, etc. public
- Citizens can sometimes uncover inaccuracies or omissions, so public availability can actually improve the government’s data
- Developers can use the data to build useful applications at no cost to the city (examples from San Francisco)
- Pre-empts information requests by posting data publicly, reducing the bureaucratic cost of complying with initial and duplicate requests

Sounds like a pretty sweet deal, right? So why doesn’t everyone have one of these? The overhead costs aren’t huge, and with a competent IT department the technical administration isn’t a terrible burden. It turns out the main objections are cultural. I worked on San Francisco’s open data policy this summer and found that the biggest obstacle with departments reluctant to post data was simply that they didn’t recognize the larger benefits of open data, and therefore felt little motivation to commit resources to the cause. Another major concern was that many departments just didn’t know what they should and shouldn’t post. Every department has a ton of data, but deciding whether a given dataset is of public value and worth posting is very much a subjective call.
But beyond the public-value question is an even larger one: privacy, which is of particular concern for local governments. The data released by the federal government tends to consist mainly of huge aggregations, so the privacy risk its release poses to the average citizen is quite low. Local governments, by contrast, collect a lot of data that could hurt an individual citizen if released: say, crime data for incidents in front of your home that could lower its value just as you’re trying to sell it. The laws currently in place in several local jurisdictions don’t provide much guidance on the matter. Take San Francisco’s policy, which is fairly vague on privacy:
Data prioritized for publication should be of likely interest to the public and should not disclose information that is proprietary, confidential, or protected by law or contract;
New York’s policy has several more provisions, but still leaves a lot of questions for those deciding what to include on an online portal. So far, the interests of privacy and transparency have been weighed against each other on an ad hoc basis, and data is sometimes modified to address privacy concerns before it is published. Crime data, the example above, has in many cases been aggregated to the block level so that individual homeowners are not targeted. Names are redacted, information related to ongoing criminal investigations is withheld, and so on. Local governments have mostly erred on the conservative side when cataloguing data for publication. Every once in a while they mess up (and then learn their lesson), but for the most part these sites seem to protect residents’ privacy. However, without bright-line standards, governments will continue to have to make subjective calls about transparency versus privacy. (Having helped write San Francisco’s policy this summer, I can say from experience that coming up with bright-line standards for this sort of thing is extremely difficult, maybe impossible.)
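For the technically curious, the block-level aggregation and name redaction described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not how San Francisco or any other city actually processes its data: the record fields (`name`, `house_number`, `street`, `category`) are made up, and the sketch assumes an exact address can simply be rounded down to its hundred block (1234 Main St becomes the “1200 block of Main St”).

```python
from collections import Counter

# Hypothetical raw incident records; these field names are illustrative
# assumptions, not any city's real schema.
RAW_INCIDENTS = [
    {"name": "Jane Doe", "house_number": 1234, "street": "Main St", "category": "burglary"},
    {"name": "John Roe", "house_number": 1288, "street": "Main St", "category": "burglary"},
    {"name": "Ann Poe", "house_number": 310, "street": "Oak Ave", "category": "vandalism"},
]


def to_hundred_block(house_number: int) -> int:
    """Round a house number down to its hundred block (e.g. 1234 -> 1200)."""
    return (house_number // 100) * 100


def anonymize(record: dict) -> dict:
    """Drop the person's name and replace the exact address with a block label."""
    block = to_hundred_block(record["house_number"])
    return {
        "block": f"{block} block of {record['street']}",
        "category": record["category"],
    }


def publishable_counts(records: list[dict]) -> Counter:
    """Count anonymized incidents per (block, category) pair for publication."""
    return Counter((r["block"], r["category"]) for r in map(anonymize, records))


if __name__ == "__main__":
    for (block, category), count in sorted(publishable_counts(RAW_INCIDENTS).items()):
        print(f"{block}: {count} {category}")
```

Running it prints `1200 block of Main St: 2 burglary` and `300 block of Oak Ave: 1 vandalism`: the two Main St incidents collapse into a single block-level count, and no names survive. A real pipeline would also have to withhold records tied to ongoing investigations, which this sketch doesn’t attempt.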
More available data makes for more useful apps (just ask any third-party app on Facebook that’s stealing your information), but at a certain point government needs to weigh the interests of developers against those of its residents. We worry so much about the data private companies hold about us online, but often don’t even think about all of the data the government collects. Concerns about online privacy extend here too, and only time will tell whether clearer standards for choosing which datasets to publish will be developed.

————————-
Further reading:
- Open Government: http://en.wikipedia.org/wiki/Open_government
- Open Government Data: http://opengovernmentdata.org/
- The Sunlight Foundation: http://sunlightfoundation.com/
- The 8 Principles of Open Government Data: http://www.opengovdata.org/home/8principles
————————-
Image credits: http://3.bp.blogspot.com/-80zvXKxYCO0/T0HxCMRlFeI/AAAAAAAADmU/TXWPprCRK2s/s1600/us.capitol.building.double.twin.pillars.jpg, http://quickmeme.com, http://philosophyforchange.wordpress.com/2010/05/17/camus-authenticity-and-revolt/, http://data.sfgov.org, http://newsserve.net/i/20120426/2093-Privacy-vs-transparency.jpg
I think you are right that there will always be a tension between privacy and transparency. I’m curious why you believe it was a mistake to release teachers’ names along with their students’ average standardized test scores. Do you object to the large margin of error in the data obtained, the lurking variables in the study, the small sample size, or the actual act of publishing teachers’ names alongside test scores? I agree there were issues with the study, but don’t parents have a right to know the abilities of the teachers instructing their children?