October 2011

Data and public policy – OECD Social Justice Report

I entered into the discourse on open data to facilitate the production of these types of reports.

Social Justice in the OECD – How Do the Member States Compare?  Sustainable Governance Indicators 2011

I am really interested in public policy issues such as social justice, health inequality and the environment, and I hope that open data and open government policies will lead to access to these types of data, especially at the neighbourhood scale. I hope that apps will open the door to access, but that eventually we will work toward comprehensive access to data for this type of analysis and develop new ways for citizens and government to engage in dialogue, using data for evidence-based decision-making.

Currently in Canada, it is incredibly difficult to put one of these reports together. The way data are aggregated differs from agency to agency, and one has to try to pry data from multiple federal agencies, multiple agencies in each province and territory, and a number of municipal agencies. Because of staff changes in government offices, contacts are lost, numerous cold calls have to be made again and access to the data renegotiated.

Page 14 of this report shows the model used to create the indices in this OECD report. At a glance there are 29 variables, each consisting of between one and five data sets, suggesting that these data may need to be requested from more than 50 different public officials at different levels of government, divisions, departments, etc. Then there is the negotiating of use, licenses, costs, aggregation, accuracy, timeliness and formats, since no two agencies, even within one government department, follow the same rule book. In fact, access is often determined by the mood of the public official or by what they think the rules are. Doing a time series is even more complex, as data are not collected at the same intervals. A follow-up report to track trends requires almost the same amount of work, since the data gathering process often has to start nearly from scratch. This is a highly inefficient and cost-prohibitive process.
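To put rough numbers on that, here is a back-of-the-envelope sketch. It uses only the two figures quoted above (29 variables, one to five data sets each); the function name is mine, and it makes no claim about how the report's authors actually counted their sources.

```python
# Rough bounds on how many data sets a report like this may draw on.
VARIABLES = 29                # variables in the report's model (from the text above)
MIN_SETS, MAX_SETS = 1, 5     # data sets per variable (from the text above)


def dataset_bounds(variables, min_sets, max_sets):
    """Return the (fewest, most) data sets that might have to be requested."""
    return variables * min_sets, variables * max_sets


low, high = dataset_bounds(VARIABLES, MIN_SETS, MAX_SETS)
print(f"Between {low} and {high} data sets to track down")
```

Even at the low end that is 29 separate requests, and at the high end 145, which is why an estimate of more than 50 officials to contact seems plausible.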

To make matters worse, in Canada we have lost the think tanks and national social policy research organizations that used to do this kind of work, because their funding was cut, and of course we have lost the census.

I hope we can think of open data and open government as including apps to catch the bus, find a skating rink or remember to take out the garbage, but more importantly as a way to inform public policy on transit, public health and the environment. Also, along with open data we need the resources to produce information products such as this report. Many things can be crowdsourced, but a census and this type of analysis cannot. There is a role for government and non-profit organizations in translating the data into meaningful information, and then for us to use that knowledge to improve, track and critique existing programs, or to develop new ones that address what the data tell us.

Apps rely on one or two datasets; these reports rely on hundreds. I want the hundreds, which requires a broader open data policy in Canada at all levels of government. I would go further and suggest that open data needs to move beyond the institutional boundaries of IT and CIO divisions and into thematic areas, as that is where the data for these indicators are produced and owned.

I met Alex at the Cybera Summit at the Banff Centre in October, and that is where I was introduced to the WEHUB. There are many interesting ways to do open data and science, and to use the cloud to do so. I invited Alex to prepare the following guest post about how WEHUB does it.

Water and Environmental Hub…aggregating water data from across North America and making it available through an API


Alex Joseph, Executive Director – Water and Environmental Hub 

As anyone searching for water data from multiple sources knows…there isn’t really a Google for water data. 

A search for water data often results in a web page with a phone number to call, or an anonymous information request form. The water datasets that are available are often embedded as graphs in .pdf files, which obscures the raw data, or are available in real time but embedded in HTML code on web pages. In the best cases, raw water data is available in large .zip files where you get the whole dataset. In the opposite case, you are faced with downloading data for hundreds of individual observation stations and then trying to sew together hundreds of spreadsheet files, hoping that the columns all line up!
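To make the "hoping that the columns all line up" problem concrete, here is a minimal sketch of stitching per-station files together by column name rather than by position. The station names, column names and the `merge_station_files` helper are all made up for illustration; real station files will need more cleaning than this (units, date formats, duplicate headers).

```python
import csv
import io


def merge_station_files(files):
    """Combine per-station CSV files into one table.

    `files` maps a station name to an open text file. Columns are matched
    by header name, not by position, and a column that a station does not
    report is filled with an empty string. A "station" column is added
    (and overwrites any column of that name already in a file).
    """
    columns = ["station"]
    rows = []
    for station, handle in files.items():
        reader = csv.DictReader(handle)
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for record in reader:
            record["station"] = station
            rows.append(record)
    # Normalise every row to the full column list.
    merged = [{c: (r.get(c) or "") for c in columns} for r in rows]
    return columns, merged


# Two hypothetical stations whose files disagree on column order and content.
station_a = io.StringIO("date,flow_m3s\n2011-10-01,12.5\n")
station_b = io.StringIO("flow_m3s,date,temp_c\n9.1,2011-10-01,7.0\n")

columns, merged = merge_station_files({"A": station_a, "B": station_b})
print(columns)
for row in merged:
    print(row)
```

Matching on header names only helps when agencies use the same names for the same thing, which is exactly what cannot be assumed across jurisdictions.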

It gets even more time consuming and expensive when one tries to find water data that crosses political boundaries. Imagine the effort required to find data on the “Lake Winnipeg Watershed”: a search involves multiple provinces and states, three levels of government, multiple departments within those governments and so on, with a high probability that each of those datasets is in a different format.

Besides the challenges with access to water data, the few water datasets that are accessible on the web are unlikely to be provided through an API. Thus, those generous web developers who attended the World Bank-sponsored Water Hackathons last week likely found that very little water data is available through an API that would allow them to build dynamic water apps….

…but this is changing.

The Water and Environmental Hub (WEHUB) project is an open cloud-based web platform that aggregates, federates, and connects water data and information with users looking to search, discover, download, analyze, model and interpret water and environmental-based information. By combining water expertise with an open web development approach and an entrepreneurial foundation, the project hopes to spur economic diversification and benefit both public users and the private sector by improving the access to water data and tools for academia, government, industry, NGOs and the general public.

The WEHUB also enables organizations and users to develop customized applications on top of the WEHUB platform using our (RESTful) API, so that the data can be easily shared, integrated, leveraged, and customized.

The web platform is structured as a three-tiered system with a Client, Server and Database.  Each tier in the system is divided into components that address the catalogue, spatial and non-spatial data, and the social network requirements.  The catalogue acts as the index for the data and allows for easy search, download and upload of the data. The spatial data is shown on the client – as a map – making it easy for the user to visualize the data.  The social network allows for commenting, flagging and sharing of data. The WEHUB employs a Representational State Transfer (REST) software architecture. Open standards (e.g. OGC standards such as WMS, WFS, SOS, WaterML, GroundwaterML) are used whenever practical, efficient and economical to meet the needs of users.
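As an illustration of what relying on open standards buys a developer, here is a small sketch that builds a WMS 1.3.0 `GetMap` request URL. The parameter names come from the OGC WMS standard; the host `example.org`, the layer name and the bounding box are placeholders, and nothing here describes the WEHUB's actual endpoints or layers.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build a WMS 1.3.0 GetMap URL for one layer.

    `bbox` is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so the
    axis order is longitude first, then latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",               # empty value asks for the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer; the box is only a rough one around Lake Winnipeg.
url = wms_getmap_url(
    "https://example.org/wms",
    layer="monitoring_stations",
    bbox=(-99.5, 50.0, -96.0, 54.0),
)
print(url)
```

Because the request is just a URL with standard parameters, the same client code should work against any server that implements the standard, which is the point of using open standards in the first place.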

In terms of geographical scope, the project began with Alberta and Western Canadian water data and information, a region in which the partners have relevant expertise and networks. As development successes have been achieved, the project has extended across North America, with scalability a key design thrust.

Open Data & Public Research, U of T Open Access Week

The University of Toronto Map and Data Librarians put together a really fun panel for Open Access Week with the City of Toronto Open Data team and Jury Konga on the topic of Open Data. As promised here are my slides. There were some great questions from the audience and it was a very well attended session.

There is also an honourable mention to the Toronto Wellbeing initiative.

This year, for Ada Lovelace Day, I thought I would honour Wendy Watkins, a founder, with Ernie Boyko, of the Data Liberation Initiative (DLI) in Canada.

Wendy Watkins

In Canada our census data is sold back to us under a cost recovery program initiated by the Brian Mulroney Conservative Government in the mid-1980s. In fact, the Conservatives of that day also tried to cancel the census, but the constitution prevented them from doing so. Instead they cut Statistics Canada’s budget severely, which led to a very regressive cost recovery practice. The prices were so high that not only could citizens not afford to use their own data, but universities encouraged students to use free US census data, since they did not have the resources to pay for Canadian census data. During those years, Canadians became experts on the US and not on Canada.

It is through the hard work of Wendy Watkins, her collaborators, and data, map and research librarians that Canadian universities now have census data for faculty and students, along with the associated census geographic files. I had the good fortune as a student to benefit from the DLI. Here is an excerpt from one of Wendy’s papers about the history of the DLI:

In April 1993, after receipt of the “Liberation Paper,” the Social Science Federation of Canada (SSFC) hosted a meeting with representatives from the Social Sciences and Humanities Research Council (SSHRC), the Association of Universities and Colleges of Canada (AUCC), the Canadian Association of Research Libraries (CARL), the Canadian Association of Public Data Users (CAPDU) and other interested parties to devise a strategy to make Canadian data more readily available to the education and research communities. The meeting resulted in the striking of a smaller working group, under the aegis of the SSFC, to devise a plan that would be acceptable to all parties. Statistics Canada and the DSP [Depository Service Program] played advisory roles in this process. While the initiative has involved government in an advisory role, it is unique in that it was conceived and developed by members of the Canadian research community.

The working group, consisting of researchers, representatives from CARL and CAPDU, as well as members of the SSFC, held a series of meetings over the next months. Advice from both Statistics Canada and the Depository Services Program was invited and found to be invaluable. When the group had formulated a working document to which both Statistics Canada and the DSP agreed, meetings were arranged with senior management in several government departments. The SSFC also met with Ministers and their executive assistants in order to move the proposal forward. Finally, in December 1995, the DLI had received a strong enough informal blessing that the project was deemed to be a go. Letters of agreement were distributed and data began to be released.

More officially, the DLI received approval by the Treasury Board Ministers in a February 1996 decision. It was subsequently included as part of the federal government’s Science and Technology Strategy in March. Most recently, in October 1996, it was officially announced by Dr. John Gerard, Minister of State for Science and Technology at a press conference held in conjunction with National Science and Technology Week and the 30th anniversary of Carleton University’s Data Centre. (Data Liberation and Academic Freedom, 1996).

The DLI not only fueled Canadian research, it also promoted academic freedom, advanced data-driven, informed decision-making and created a new class of librarian, the data librarian, as well as data centres in libraries. Data also became artifacts to be collected in libraries, which introduced the new practice of adding digital material to a catalog alongside the hard copy books on the shelves. The DLI spurred the early adoption of the Internet, with basic FTP used to transfer data between Statistics Canada and university libraries, and it was a forerunner in the acquisition of digital materials. The DLI also promoted collaboration between universities and government via a consortium agreement, a model that has been embraced by other organizations such as the Community Social Data Strategy. Finally, the DLI accelerated the development of a new type of expertise in metadata, data catalogs, data citation and data preservation.

Today there is a very vibrant DLI community of practice that shares knowledge on a yearly basis at DLI Bootcamps, maintains a repository of training materials and an active blog, Data Interests Group for Reference Services, and actively exchanges expertise on the DLILIST listserv.

The DLI also politicized access to data very early on, and in a sense it began the discourse on data access in Canada. The cancellation of the 2011 Census was one of the big issues DLI supporters took on. Further, Wendy Watkins and her colleagues participate in key roundtable discussions on access to research data and the preservation of data, and develop important infrastructures that disseminate Canadian data.

Data users and Canadians can thank Wendy for being in the vanguard of open data, open government and data liberation in Canada, and for building an incredible cadre of data-literate librarians, faculty and students. Open Data initiatives in Canada can benefit from her work and should recognize Wendy as one of their data access pioneers. Now we just need to have a census and for those data to be cost free to the public.

