The Data Liberation Initiative (DLI)

I’m going to feature some Canadian data access projects and people working with data in Canada that I find interesting and important on datalibre.ca. Here is my first go at it. Hope you like it! It is about a great program called the Data Liberation Initiative (DLI) that was formally instituted in 1996. I greatly benefited from the DLI as an undergraduate student studying Geomatics at Carleton University.

Tracey

************
Did you know that until the latter half of the 1990s students and faculty in Canadian universities had to pay for Canadian demographic data that were collected with the use of their own tax dollars? Well, it’s true! If students and faculty wanted access to Statistics Canada data to conduct any kind of demographic analysis, to study the labour market or issues related to income and poverty, to explore provincial migration patterns, etc., they had to pay exorbitant amounts. What was the effect? Canadian students became experts on the US, since US data were free, and, worse, policy decisions for Canadians were based on US data! The real knowledge and social cost of Data Cost Recovery policies can never be recovered!

Why access to Canadian public data?

I think Professor Paul Bernard, Chair, Advisory Committee on Social Conditions (Statistics Canada) and member of the National Statistics Council said it well back in 1991:

“…the genuine exercise of democracy increasingly requires that citizens get access to complex information and have the skills required to understand it.” While he realizes there are pressures on Statistics Canada to reduce costs and increase income, he feels the outcome has been the restriction of “…access to information only to groups that have the solid ability to pay.” Bernard feels that this may “…hamper the participation in public debates of groups whose contribution is not backed up by much money” as well as “those who have no prospect of turning a profit or reaping some tangible and relatively immediate benefit from using it.” This, he states, is “…likely to lead, in the long run, to suboptimal development and less than full-blown democracy.” (see Watkins).

Interestingly, since 1927 the Government of Canada has had a program to share government information, the Depository Services Program (DSP), which is

an arrangement with some 680 public and academic libraries to house, catalogue and provide reference services for the federal government publications they acquire under the Program. These depositories must make their DSP collections available to all Canadians and for interlibrary loans. DSP also includes depositories such as Parliamentarians, central libraries of the federal government departments and press libraries.

The DSP however does not include the dissemination of public data files or databases collected and managed by the Government of Canada. Data users were and still are considered a special interest group. Odd! Numerate Canadian citizens a special interest group? Imagine literate Canadian citizens being considered a special interest group! Anyway, this meant that, without independent analysis, a variety of topics important to Canadians were left unquestioned, unstudied, ignored and unknown. Not the best scenario for a democracy or a knowledge-based economy, let alone for the promotion and growth of a numerate workforce and citizenry.

Fortunately, in 1993 we see the early formation of the Data Liberation Initiative (DLI). An early working group consisting of researchers, data librarians and representatives from the Canadian Association of Research Libraries (CARL), the Canadian Association of Public Data Users (CAPDU), Statistics Canada and the DSP, as well as members of the Social Science Federation of Canada (SSFC), got together and held a series of meetings. In 1995 Statistics Canada gave the DLI its formal blessing, and the DLI received Treasury Board approval in 1996.

What is the Data Liberation Initiative?

The DLI is a data purchasing consortium between Canadian universities and Statistics Canada. Large universities pay $12,000 per year and smaller universities pay $3,000. The Treasury Board of Canada, Industry Canada, Health Canada, Human Resources Development Canada, Social Sciences and Humanities Research Council of Canada, Medical Research Council of Canada and Statistics Canada also financially contribute. These institutions subscribe to the service.

The DLI provides

affordable and equitable access to the standard data products listed in the Statistics Canada Catalogue through an annual subscription fee. The terms of agreement specified in the DLI license place conditions on the use of products disseminated through this program. These restrictions are directed at stopping the redistribution of data received through this channel and protecting against the loss of sales to non-educational markets for Statistics Canada, which is known within Statistics Canada as “leakage”. The license allows the use of DLI data for non-profit, academic research and instruction. Access to statistical information through DLI does require student or staff affiliation with a DLI member institution. While students and staff do not have to pay directly for access, DLI does require mediated services to disseminate statistical and data products on local campuses.

How it works:

Students and faculty go to their respective data libraries, consult with the data librarian, sign a plain-English use agreement (the DLI Data Use License), access the data via a dedicated computer and download what they need.

The Infrastructure:

An elaborate organizational structure with very dedicated members is in place, along with a technical infrastructure for data delivery that includes a web site, an FTP service, a CD-ROM data delivery service and a special order process. In addition, each participating university institutes a ‘data service’ which assumes responsibility for DLI at its site. The project is also glued together with two listservs. The data files are delivered in ASCII formats with associated metadata discoverable using StatCan Software at dedicated workstations in the Library.

Critical Note:

The DLI was and is the best possible reaction and compromise to the very restrictive data cost recovery policies initiated in the 1980s that remain alive and well with us today. It is important to repeat that these public data have already been paid for by taxation and are paid for again with tuition, and that DLI data access is restricted to Canadians who are university students and faculty. The DLI solved one very important Canadian knowledge creation and dissemination issue in academic institutions but not the broader issue of access to data by Canadian citizens. It did set a precedent!

Statistics Canada data are still sold to federal departments, provincial governments and municipal governments, which are not allowed to share the data among themselves due to very stringent licensing regimes. Our taxes have paid for many of the same datasets multiple times, since these are government purchases and transactions. Just think of all the bureaucracy needed to manage these license regimes: royalties, lawyers, purchasing and accounting services, storage, and so on. In addition, civil society organizations such as non-governmental organizations, non-profit organizations, community-based researchers etc., which are not wealthy yet fulfill an important democratic function, cannot afford these data even though it is their role to keep government accountable on a variety of issues (e.g. Environment, Homelessness, Education etc.). Further, citizens who want to learn about their communities, develop a community plan or start a new business want access to data but can only get it if they have a significant amount of cash. The result: a lack of informed decision making.

Dream Idea:

It would be fantastic to have the knowledge, training and infrastructure of the DLI extended to all of our public libraries and community access points. Imagine one-stop shopping for knowledge: picking up a video, a music CD, a novel and some demographic data related to school closures in your neighbourhood. Wow! Of course, the data should be at no cost to the citizen or the library. Also, imagine having a data librarian in every library who can help citizens find the data they need and learn how to use them. Now that is a knowledge society.

References:

You can access the documents I referred to here – my del.icio.us – tagged with datalibre civicaccess and DLI.

Continuum of Access, By Chuck Humphrey, University of Alberta.

Charles Humphrey (2005). Collaborative Training in Statistical and Data Library Services: Lessons from the Canadian Data Liberation Initiative. Resource Sharing & Information Networks, Vol. 18 (1/2), pp. 167-181.

5 comments

Helen Clarke reported in March 2006 that Statistics Canada was planning to convert its electronic publications to open access, starting in April 2006. What ever happened to that plan?

They made their PDF copies of publications free if acquired online but not the data that went into creating them (http://www.statcan.ca/cgi-bin/downpub/freepub.cgi). Publications contain the information created with data. The data are what people would like access to so that they may do their own analysis, make maps of their neighbourhoods, conduct an analysis about school closures to keep the school boards accountable, look at housing statistics, decide where they want to locate their businesses or do other market research, etc. At the moment those data are not freely (as in no cost) available to citizens, NGOs, community health centres, municipalities etc. We would love for the public data that we have already paid for in taxation to be available at no cost to citizens.

You wrote:

“The DSP however does not include the dissemination of public data files or databases collected and managed by the Government of Canada. ”

This is not entirely true. DSP libraries do get quite a few CD-ROM-based databases (e.g. Labour Force Historical Review, Income Trends in Canada) that are sometimes the only source of data on particular topics, even for DLI institutions. DSP libraries are also eligible to register with E-STAT, which provides access to a lot of census data (with mapping capabilities), plus CANSIM, STC’s time-series database (although the E-STAT version of CANSIM is only updated once per year, unlike the commercial version). There are currently 117 DSP libraries registered with E-STAT (according to http://www.statcan.ca/english/Estat/Schools/DSP/dsp.htm), so access is not as restricted as you might think. Ultimately, I suspect that the problem is due to Crown copyright and the fact that STC cannot afford to forgo the income it currently receives from the sale of data files, due to inadequate government funding. I can tell you that many individual STC employees would *love* to give the data away, but are prevented from doing so. The situation has improved tremendously in just the last few years, but still has a long way to go.

This is really great info! Thanks Walter!

I just got off the phone with a librarian at the Greenborough Branch of the Ottawa Public Library (OPL). She explained that yes, in fact, they had a small subset of the CANSIM data available to patrons who came to visit the library in person. She also knew that the Library had a few CD-ROMs of data but she did not know offhand what was on the CDs. The details of which E-STAT data sets are available were not listed on the OPL website, although a reference to the E-STAT database was listed: http://www.biblioottawalibrary.ca/connect/online_resources/subject_e.cfm?subject_id=alpha#results.

The librarian did however indicate that the intended users for these datasets were high school students and lay researchers. She also indicated that more detailed data for more sophisticated applications were available from StatCan for a fee. Additionally, she did not know if the data available from their online subscription services were downloadable or only for viewing, as she had never had a request nor had she ever tried to access these data herself. I did however pique her interest to try!

I will make a trip to the Metcalfe Library, which is the Main OPL Branch (Downtown Ottawa), tomorrow, armed with a data key and a hopeful attitude, and will report back on the experience in a post! So stay tuned!

Regarding public officials, I completely agree that many officials would rather make the data available. These are the subject matter and data specialists; the managers, who may be neither of those, may not be as amenable to those ideas. I did come across some studies that suggest that managing the sale and royalties of a really small amount of data costs more than just giving them away. As we both know, the cost of knowledge lost as a result of cost recovery is incalculable and ephemeral. I will see if I can dig up these documents in the coming weeks and speak to them here.

Cheers
t