Articles by Hugh

I am a web-guy, writer, and participant in the open movement. I started LibriVox.org, and have a little software development company.

You can find me at http://hughmcguire.net.

Our friends at freeourdata.org.uk have an article about abolishing Crown Copyright in the UK. Canada suffers under the same sort of copyright policy on government documents and data, while in the USA, everything published by the government is de facto public domain.

The key point is:

But the problem with crown copyright as it stands, and more importantly as it’s used, is that it’s used to restrict.

[link…]

MoveMyData.org

This is a bit off-topic, but spiritually related to the mission of datalibre.ca … MoveMyData.org. From the “about”:

Your content and data should be yours to manage and do with as you please. Your images, writing, tags, profile, blog entries, comments, testimonials, video, and music should be yours to download and move anyplace you want.

We will help ensure that no website ever holds your data hostage.

[link…]

I have not played with it yet, but I love the idea.

Jon Udell interviews Greg Elin, chief info architect of the Sunlight Foundation, which aims to make the operation of Congress and the U.S. government more transparent and accountable. It’s interesting to follow this debate in the USA, where government data and reports are de facto public domain (though true access is a different story), and to compare it with Canada, where government data is often covered by restrictive copyright provisions (starting with Crown Copyright). Says Udell about the talk:

Having surveyed a wide range of government data sources, Greg’s conclusion is that the future is already here, but not yet evenly distributed. There are pockets within the government where data management practices are excellent, and large swaths where they are mediocre to horrible. The Sunlight Foundation has an interesting take on how to bootstrap better data practices across the board. By demonstrating them externally, in compelling ways, you can incent the government to internalize them:

Some of that can be said of Canada too, but we are behind the curve: we still face the big hurdle of convincing the Canadian government that the proven wisdom of US government data policy is compelling. Making government data available spurs innovation. Restricting it restricts innovation.

See the Interviews with Innovators page, on IT Conversations.

From O’Reilly Radar:

Carl Malamud has this funny idea that public domain information ought to be… well, public. He has a history of creating public access databases on the net when the provider of the data has failed to do so or has licensed its data only to a private company that provides it only for pay. His technique is to build a high-profile demonstration project with the intent of getting the actual holder of the public domain information (usually a government agency) to take over the job.

Carl’s done this in the past with the SEC’s Edgar database, with the Smithsonian, and with Congressional hearings. But now, he’s set his eyes on the crown jewels of public data available for profit: the body of Federal case law that is the foundation of multi-billion dollar businesses such as WestLaw.

In a site that just went live tonight, Carl has begun publishing the full text of legal opinions, starting back in 1880, and outlined a process that will eventually lead to a full database of US Case law. Carl writes:

1. The short-term goal is the creation of an unencumbered full-text repository of the Federal Reporter, the Federal Supplement, and the Federal Appendix.
2. The medium-term goal is the creation of an unencumbered full-text repository of all state and federal cases and codes.

Link to the database.

GeoData Alliance

The GeoData Alliance is a nonprofit organization open to all individuals and institutions committed to using geographic information to improve the health of our communities, our economies, and the Earth.

The purpose of the GeoData Alliance is to foster trusted and inclusive processes to enable the creation, effective and equitable flow, and beneficial use of geographic information.

[more…]

Good news: Elections Ontario makes the postal code/electoral riding data file available:

The Postal Codes by Electoral Districts (ED) file provides a link between the six-character postal code and Ontario’s new provincial electoral districts. It is a zip file containing three files that can be loaded and used in spreadsheets and databases. The first is a text file with the ED names; the second file contains the postal codes that have been assigned to a single ED; and the third file contains postal codes that have been found in multiple EDs. This third file repeats the postal code for each ED in which it is found.
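For anyone who wants to play with the file, here is a minimal sketch of how the second and third files might be combined into one lookup. The actual column layout is not described above, so the (postal code, ED) row shape, the function name `build_lookup`, and the sample postal codes and district labels below are all assumptions made up for illustration, not the real Elections Ontario format.

```python
from collections import defaultdict


def build_lookup(single_rows, multi_rows):
    """Map each postal code to the sorted list of EDs it falls in.

    Both arguments are iterables of (postal_code, ed) pairs. This row
    shape is an assumption; the real files may be laid out differently.
    """
    lookup = defaultdict(set)
    for postal_code, ed in list(single_rows) + list(multi_rows):
        # Normalize "m5v 2t6" and "M5V2T6" to the same six-character key.
        key = postal_code.replace(" ", "").upper()
        lookup[key].add(ed)
    return {code: sorted(eds) for code, eds in lookup.items()}


# Hypothetical sample rows, not taken from the real data file.
single = [("K1A 0B1", "District A")]
# The third file repeats the postal code once per ED it is found in.
multi = [("m5v2t6", "District B"), ("M5V 2T6", "District C")]

lookup = build_lookup(single, multi)
```

A postal code from the "single ED" file maps to a one-item list, and one from the "multiple EDs" file maps to every district it appears in, so callers can treat both cases the same way.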

I have not looked at it yet. Any comments about formats, license, etc.?

(from the civicaccess.ca mailing list).

Jon Udell writes about the outside edge of what’s happening on the web (including lawnmowers), but his focus is often as much about how regular, non-digerati people might be helped by new technologies. Formerly the blogger-in-chief at InfoWorld, he’s now working with Microsoft. He’s been writing recently about public data, and I wanted to find out why.

1. you seem to be spending much time recently writing about access to public data … why is that?

I’ve always thought the real purpose of information technology was to harness our collective intelligence to tackle complex and pressing problems. When I heard Doug Engelbart’s talk at the 2004 Accelerating Change conference, I realized for the first time how all of his work points toward that one goal. Graphical user interfaces, networks, hyperlinked webs of information: for him, these are all means by which we “augment” our human capabilities so we can have some hope of dealing with the challenges we face as a species.

In that context, getting data into shared information spaces is just part of the story. We’ll also need to be able to share the tools we use to analyze and interpret the data, and the conversations we have about the analysis and interpretation.

2. what do you think is the most compelling argument for making public data available to citizens?

Well it’s ours, our taxes paid for it, so we should have it. But the compelling reason is that we need more eyeballs, hands, and brains figuring out what’s going on in the world, so that when we debate courses of action we can ground our thinking in the best facts and interpretations.

3. are you convinced by any arguments *against* making public data available to citizens?

Here’s an argument I don’t buy: That amateur analysts will do more harm than good. I don’t buy it because there will be checks and balances. Those who don’t cite data will be laughed at. Those who do cite data but interpret it incorrectly will be corrected. Those who do great work will develop reputations that are discoverable and measurable.

Here’s an argument I do buy: There’s the risk of violating privacy. The District of Columbia, for example, has released a lot of data but has postponed releasing adult arrests and charges until the location information can be aggregated. We will increasingly have to make these kinds of calls.

5. public data is an issue that most people will have trouble getting excited about. how do you think “data activists” should approach it?

The best advice I’ve heard comes from Tom Steinberg, founder of MySociety.org. He counsels activists to use data in ways that matter directly to people. Suppose you could get geographic data on planned highway routes, for example. Nobody cares, until you connect the dots and show people their houses will have to be bulldozed to make way for it. Then they really care.

6. in your experience with government officials, how have *they* reacted to your requests for data?

When I started asking my local police department for crime data, they stonewalled. Eventually I had to get a lawyer to write them a letter citing our state’s ‘Right to Know’ act, and we were both unhappy about having to do that.

But once I met with the police chief and explained my interest in exploring both local patterns as well as this whole general process, he was OK with that. Better than OK, actually. I think he was relieved when he saw that some questions people have been speculating about might now be discussed in a more rational way. And he’s really excited by the prospect of geographical analysis because they haven’t had that capability.

8. what do you think are the connections between open access to public data and other similar movements – free culture, free software etc?

There’s an arc that runs from free and open-source software, to open data, to Web 2.0-style participation, and now to the collaborative use of software, services, and public data in order to understand and influence public policy.

9. with your crystal ball, where do you think the confluence of these movements will take us in, say, 5 years?

I’m sure it won’t happen that soon, but here’s what I’d like to see. Imagine some local, state, or national debate. The facts and interpretations at issue are rarely attached to URLs, much less to primary sources of data at those URLs and to interactive visualizations of the data. We spend lots of time arguing about facts and interpretations, but mostly in a vacuum with no real shared context, which is wildly unproductive. If we could establish shared context, maybe we could argue more productively, and get more stuff done more quickly and more sanely.

The good folks at freeourdata.org.uk (one of the major inspirations for this blog) met with the UK’s Minister of Information, Michael Wills. The whole interview is interesting, of course, but just the opening remarks from Michael Wills show a remarkable openness to the idea of freed data:

Personally I’m very excited by this area, I asked to do this as part of my portfolio… The whole issue of data is I think tremendously exciting for all the reasons that you’ve said, it’s part of the infrastructure now of our society and our economy and it’s going to become more so with what’s happening with data mashing, the extraordinary intellectual creative energy that’s being unleashed is something that as a government we have to respond to, and the power of information you know is a very exciting document, something that I think is very much where government wants to be.

[more…]

More interesting stuff from Jon Udell, this time taking some climate data for his area, using the ManyEyes platform and trying to see what has been happening in New Hampshire in the last century.

The experiment is inconclusive, but there is an excellent debate in the comment threads about the problems with amateurs getting their hands on the data, and the hash they can make of things because they are not experts.

Says one commenter (Brendan Lane Larson, Meteorologist, Weather Informaticist and Member of the American Meteorological Society):

Your vague “we” combined with the demonstration of the Many Eyes site trivializes the process of evidence exploration and collaborative interpretation (community of practice? peer review?) with an American 1960s hippy-like grandiose dream of democratization of visualized data that doesn’t need to be democratized in the first place. Did you read the web page at the URI that Bob Drake posted in comments herein? Do you really think that a collective vague “we” is going to take the time to read and understand (or have enough background to understand) the processes presented on that page such as “homogenization algorithms” and what these algorithms mean generally and specifically?

To which Udell replies:

I really do think that the gap between what science does and what the media says (and what most people understand) about what science does can be significantly narrowed by making the data behind the science, and the interpretation of that data, and the conversations about the interpretations, a lot more accessible.

To turn the question around, do you think we can, as a democratic society, make the kinds of policy decisions we need to make — on a range of issues — without narrowing that gap?

There is much to be said about this … but Larson’s comment “Do you really think that a collective vague “we” is going to take the time to read and understand (or have enough background to understand) the … XYZ…” is the same question that has been asked countless times, about all sorts of open approaches (from making software, to encyclopaedia, to news commentary). And the answer in general is “yes.” That is, not every member of the vague “we” will take the time, but very often with issues of enough importance, many of the members of the vague “we” can and do take the time to understand, and might just do a better job of demonstrating, interpreting or contextualizing data in ways that other members of the vague “we” can connect with and understand.

The other side of the coin of course, is that along with the good amateur stuff there is always much dross – data folk are legitimately worried about an uneducated public getting their hands on data and making all sorts of errors with it – which of course is not a good thing. But, I would argue, the potential gains from an open approach to data outweigh the potential problems.

UPDATE: good addition to the discussion from Mike Caulfield.

Quality Repositories is a website that comes out of a stats (?) course at University of Maryland. It aims to evaluate the usefulness and availability of various sources of public data: US Government, non-US government, academic, and sports-related (?) data sets. Evaluations are based on criteria such as online availability, browsability, searchability, and retrievable formats. The about text:

Data repositories provide a valuable resource for the public; however, the lack of standards in terminology, presentation, and access of this data across repositories reduces the accessibility and usability of these important data sets. This problem is complex and likely requires a community effort to identify what makes a “good” repository, both in technical and information terms. This site provides a starting point for this discussion….

This site suggests criteria for evaluating repositories and applies them to a list of statistical repositories. We’ve selected statistical data because it is one of the simplest data types to access and describe. Since our purpose is partly to encourage visualization tools, statistical data is also one of the easiest to visualize. The list is not comprehensive but should grow over time. By “repositories” we mean a site that provides access to multiple tables of data that they have collected. We did not include sites that linked to other sites’ data sources.

The site was created by Rachael Bradley, Samah Ramadan and Ben Shneiderman.

(Tip to Jon Udell and http://del.icio.us/tag/publicdata)
