
I met with Wendy Watkins at the Carleton University Data Library yesterday. She is one of the founders and current co-chair of DLI and CAPDU (Canadian Association of Public Data Users), a member of the governing council of the International Association for Social Science Information Services and Technology (IASSIST), and a great advocate for data accessibility and whatever else you can think of in relation to data.

Wendy introduced me to a very interesting project that is happening between and among university libraries in Ontario called the Ontario Data Documentation, Extraction Service Infrastructure Initiative (ODESI). ODESI will make discovery, access and integration of social science data from a variety of databases much easier.

Administration of the Project:

The project is administered by the Carleton University Data Library in cooperation with the University of Guelph. The portal will be hosted at the Scholars' Portal at the University of Toronto, which makes online journal discovery and access a dream. The project is partially funded by the Ontario Council of University Libraries (OCUL) and OntarioBuys, operated out of the Ontario Ministry of Finance. It is a three-year project with $1 040 000 in funding.

How it works:

ODESI operates on a distributed data access model, where servers that host data from a variety of organizations will be accessed via Scholars' Portal. The metadata are written in the DDI standard, which is expressed in XML. DDI is the

Data Documentation Initiative [which] is an international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML.
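Because DDI metadata is plain XML, a portal can flatten a codebook into searchable text fields. Here is a minimal sketch in Python; the element names follow the general DDI-Codebook layout but the fragment and survey title are made-up illustrations, not an authoritative schema:

```python
# Sketch: flattening DDI-style XML metadata into searchable fields.
# The sample document and element names are illustrative only.
import xml.etree.ElementTree as ET

DDI_SAMPLE = """\
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Labour Force Survey, 2006</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="PROV"><labl>Province of residence</labl></var>
    <var name="HRLYEARN"><labl>Usual hourly wages</labl></var>
  </dataDscr>
</codeBook>
"""

def searchable_fields(xml_text):
    """Collect the study title and variable labels as flat strings."""
    root = ET.fromstring(xml_text)
    fields = [root.findtext("./stdyDscr/citation/titlStmt/titl", default="")]
    for var in root.iter("var"):
        fields.append(f"{var.get('name')}: {var.findtext('labl', default='')}")
    return fields

print(searchable_fields(DDI_SAMPLE))
```

Because every participating archive writes metadata to the same XML layout, a single search index can be built over many servers' holdings, which is what makes the cross-institution discovery described here feasible.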

The standard has been adopted by several international organizations such as IASSIST, Interuniversity Consortium for Political and Social Research (ICPSR), Council of European Social Science Data Archives (CESSDA) and several governmental departments including Statistics Canada, Health Canada and HRSDC.

Collaboration:

This project will integrate with and is based on the existing and fully operational Council of European Social Science Data Archives (CESSDA), a cross-boundary data initiative. CESSDA

promotes the acquisition, archiving and distribution of electronic data for social science teaching and research in Europe. It encourages the exchange of data and technology and fosters the development of new organisations in sympathy with its aims. It associates and cooperates with other international organisations sharing similar objectives.

The CESSDA Trans-Border Agreement and Constitution are very interesting models of collaboration. CESSDA is the governing body of a group of national European social science data archives. The CESSDA data portal is accompanied by a multilingual thesaurus; currently 13 nations and 20 organizations are involved, and data from thousands of studies are made available to students, faculty and researchers at participating institutions. The portal search mechanism is quite effective, although not pretty!

In addition, CESSDA is associated with a series of national data archives. Wow! Canada does not have a data archive!

Users:

Users would come to the portal, search across the various servers on the metadata fields, and access the data. Additionally, users will be provided with tools to integrate myriad datasets and conduct analyses with the statistical tools that are part of the service. For some of the data, basic thematic maps can also be made.

Eventually the discovery tools will be integrated with the journal search tools of the Scholars' Portal. You will be able to search for data and find the journals that have used those data, or vice versa: find the journal and then the data. This will hugely simplify the search and integration process of data analysis. At the moment, any data-intensive research endeavour needs to dedicate 80-95% of the job to finding the data in a bunch of different databases, navigating complex licensing and access regimes, maybe paying a large sum of money, and organizing the data in such a way that comparisons are statistically sound. Eventually one gets to talk about results!

Data Access:

Both the CESSDA data portal project and ODESI are groundbreaking initiatives that are making data accessible to the research community. These data, however, will only be available to students, faculty and researchers at participating institutions. Citizens who do not fall into those categories can search the metadata elements and see what is available, but will not get access to the data.

Comment:

It is promising that a social and physical infrastructure exists to make data discoverable and accessible between and among national and international institutions. What is needed is a massive cultural shift in our social science data creating and managing institutions that would make them amenable to the creation of policies to unlock these same public data assets, some of the private sector data assets (Polls, etc.) and make them freely (as in no cost) available to all citizens.

The Canadian Recording Industry Association (CRIA) releases all kinds of data related to sales.  It is also an organization that has quite a bit of power with the Canadian Government.

Michael Geist has an interesting piece on interpreting CRIA sales data!  It is an industry I know very little about and I would probably have just accepted their reported numbers as I would not have had the contextual knowledge to frame what they were saying otherwise!

Numbers are tricky rascals at best! Especially when an industry is trying to lobby for its own interests, and at times politicians just believe any ole number thrown at them! Worse, the wrong numbers, or numbers out of context, get picked up by newswires and repeated ad nauseam! It just depends whose ear a particular industry has, I guess, and how much homework a reporter does.

One of the great data myths is that cost recovery policies are synonymous with higher data quality. Often the myth-making stems from effective communications from nations with heavy cost recovery policies, such as the UK, which often argues that its data are of better quality than those of the US, which has open access policies. Canada, depending on the data and the agencies they come from, is at either end of this spectrum and often in between.

I just read an interesting study that examined open access versus cost recovery for two framework datasets. The researchers looked at the technical characteristics and use of datasets from jurisdictions of similar socio-economic status, size, population density, and government type (the Netherlands, Denmark, the German state of North Rhine-Westphalia, the US state of Massachusetts and the US metropolitan region of Minneapolis-St. Paul). The study compared parcel and large-scale topographic datasets, typically found as framework datasets in geospatial data infrastructures (see SDI def. page 8). Some of these datasets were free, some were extremely expensive, and all were under different licensing regimes that defined use. The researchers looked at both technical characteristics (e.g. data quality, metadata, coverage) and non-technical characteristics (e.g. legal access, financial access, acquisition procedures).

For parcel datasets, the study discovered that datasets assembled by a centralized authority were judged to be technically more advanced. Datasets that required assembly from multiple jurisdictions were of higher quality where standards existed or a central institution integrated them, and of poor quality where there were no standards, as the sets were not harmonized and/or coverage was inconsistent. Regarding non-technical characteristics, many datasets came at a high cost, most were not easy to access from one location, and there were a variety of access and use restrictions on the data.

For topographic information, the technical averages were less than ideal. On the non-technical criteria, access was impeded in some cases by the involvement of utilities (which tend toward cost recovery); in other cases multiple jurisdictions (over 50 for some) needed to be contacted to acquire complete coverage, and in some cases coverage is simply not complete.

The study’s hypothesis was:

that technically excellent datasets have restrictive-access policies and technically poor datasets have open access policies.

General conclusion:

All five jurisdictions had significant levels of primary and secondary uses but few value-adding activities, possibly because of restrictive-access and cost-recovery policies.

Specific Results:

The case studies yielded conflicting findings. We identified several technically advanced datasets with less advanced non-technical characteristics…We also identified technically insufficient datasets with restrictive-access policies…Thus cost recovery does not necessarily signify excellent quality.

Although the links between access policy and use and between quality and use are apparent, we did not find convincing evidence for a direct relation between the access policy and the quality of a dataset.

Conclusion:

The institutional setting of a jurisdiction affects the way data collection is organized (e.g. centralized versus decentralized control), the extent to which data collection and processing are incorporated in legislation, and the extent to which legislation requires use within government.

…We found a direct link between institutional setting and the characteristics of the datasets.

In jurisdictions where information collection was centralized in a single public organization, datasets (and access policies) were more homogenous than datasets that were not controlled centrally (such as those of local governments). Ensuring that data are prepared to a single consistent specification is more easily done by one organization than by many.

…The institutional setting can affect access policy, accessibility, technical quality, and consequently, the type and number of users.

My Observations:
It is really difficult to find solid studies like this one that systematically look at both technical and access issues related to data. It is easy to find off-the-cuff statements without sufficient backup proof, though! While these studies are a bit of a dry read, they demonstrate the complexities of the issues, try to tease out the truth, and reveal that there is no one-stop shopping for data at any given scale in any country. In other words, there is merit in pushing for some sort of centralized, standardized and interoperable way (which could also mean distributed) to discover and access public data assets. In addition, there is an argument to be made for making those data freely (no cost) accessible in formats we can readily use and reuse. This of course includes standardizing licensing policies!

Reference: Institutions Matter: The Impact of Institutional Choices Relative to Access Policy and Data Quality on the Development of Geographic Information Infrastructures, by Van Loenen and De Jong, in Research and Theory in Advancing Data Infrastructure Concepts, edited by Harlan Onsrud, 2007, ESRI Press.

If you have references to more studies send them along!

Boris leaves me excellent links from time to time in my del.icio.us account! I usually find them in those in-between times, idling between jobs; that's when I remember to go over, see what'zup, and find lovely info gifts in the Links For You section. This time he left a delightful info present: an exquisite way to make numbers tangible through the artistic expressions of Chris Jordan in his Running the Numbers photo exhibit.

This new series looks at contemporary American culture through the austere lens of statistics. Each image portrays a specific quantity of something: fifteen million sheets of office paper (five minutes of paper use); 106,000 aluminum cans (thirty seconds of can consumption) and so on. My hope is that images representing these quantities might have a different effect than the raw numbers alone, such as we find daily in articles and books. Statistics can feel abstract and anesthetizing, making it difficult to connect with and make meaning of 3.6 million SUV sales in one year, for example, or 2.3 million Americans in prison, or 426,000 cell phones retired every day. This project visually examines these vast and bizarre measures of our society, in large intricately detailed prints assembled from thousands of smaller photographs.

I luv how he plays with scale and patterns to represent the tyranny of our mass consumption (see Plastic Bottles, 2007) or his choice of materials (see Building Blocks, 2007) to symbolize an issue.

Chris Jordan Shipping Containers 2007

Here are some of the quantities his photos depict:

  • nine million wooden ABC blocks, equal to the number of American children with no health insurance coverage in 2007.
  • eight million toothpicks, equal to the number of trees harvested in the US every month to make the paper for mail-order catalogs.
  • two million plastic beverage bottles, the number used in the US every five minutes.
  • 65,000 cigarettes, equal to the number of American teenagers under age eighteen who become addicted to cigarettes every month.

Material and consumption culture is frighteningly beautiful in his photos. My favorite is the

  • 75,000 shipping containers, the number of containers processed through American ports every day (Photos in this post).

Chris Jordan Shipping Containers 2007

That’s a lot of stuff moving from place to place!

I tripped over this yesterday while looking for arguments for and against cost recovery. The arguments are quite good and comprehensive. If any of you can think of more, send them to the civicaccess.ca list or leave comments here.

This text, I believe, was put together by Jo Walsh and colleagues as they were preparing positions for the INSPIRE Directive, which became official May 7, 2007. Public Geo Data put together a great campaign, an online petition, a discussion list and superb material to lobby EUROGI for Free and Open Access to Geo Data. At the time the UK was pushing heavily for the Ordnance Survey‘s extreme cost recovery model for the EU, while other European nations were working towards more open and free access models. You can read more about it by going through the archive of their mailing list.

Here is the full text for Why Should Government Spatial Data be Free?

I’m going to feature some Canadian data access projects and people working with data in Canada that I find interesting and important on datalibre.ca. Here is my first go at it. Hope you like it! It is about a great program called the Data Liberation Initiative (DLI), which was formally instituted in 1996. I greatly benefited from the DLI as an undergraduate student studying geomatics at Carleton University.

Tracey

************
Did you know that until the latter half of the 1990s, students and faculty in Canadian universities had to pay for Canadian demographic data that were collected with the use of their own tax dollars? Well, it’s true! If students and faculty wanted access to Statistics Canada data to conduct any kind of demographic analysis, to study the labour market or issues related to income and poverty, to explore provincial migration patterns, etc., they had to pay exorbitant amounts. What was the effect? Canadian students became US experts since US data were free, and worse, policy decisions for Canadians were based on US data! The real knowledge and social cost of data cost recovery policies can never be recovered!

Why access to Canadian public data?

I think Professor Paul Bernard, Chair, Advisory Committee on Social Conditions (Statistics Canada) and member of the National Statistics Council said it well back in 1991:

…the genuine exercise of democracy increasingly requires that citizens get access to complex information and have the skills required to understand it.” While he realizes there are pressures on Statistics Canada to reduce costs and increase income, he feels the outcome has been the restriction of “…access to information only to groups that have the solid ability to pay.” Bernard feels that this may “…hamper the participation in public debates of groups whose contribution is not backed up by much money” as well as “those who have no prospect of turning a profit or reaping some tangible and relatively immediate benefit from using it.” This, he states, is “…likely to lead, in the long run, to suboptimal development and less than full-blown democracy.” (see Watkins).

Interestingly, since 1927 the Government of Canada did have a program to share Government information via the Depository Services Program (DSP) which is

an arrangement with some 680 public and academic libraries to house, catalogue and provide reference services for the federal government publications they acquire under the Program. These depositories must make their DSP collections available to all Canadians and for interlibrary loans. DSP also includes depositories such as Parliamentarians, central libraries of the federal government departments and press libraries.

The DSP, however, does not include the dissemination of public data files or databases collected and managed by the Government of Canada. Data users were, and still are, considered a special interest group. Odd! Numerate Canadian citizens a special interest group? Imagine literate Canadian citizens being considered a special interest group! Anyway, this meant that independent analysis on a variety of topics important to Canadians was left unquestioned, unstudied, ignored and unknown. Not the best scenario for a democracy or a knowledge-based economy, let alone for the promotion and growth of a numerate workforce and citizenry.

Fortunately, in 1993 we see the early formation of the Data Liberation Initiative (DLI). An early working group consisting of researchers, data librarians and representatives from the Canadian Association of Research Libraries (CARL), the Canadian Association of Public Data Users (CAPDU), Statistics Canada and the DSP, as well as members of the Social Science Federation of Canada (SSFC), got together and held a series of meetings. In 1995 Statistics Canada gave the DLI its formal blessing, and the DLI received Treasury Board approval in 1996.

What is the Data Liberation Initiative?

The DLI is a data purchasing consortium between Canadian universities and Statistics Canada. Large universities pay $12,000 per year and smaller universities pay $3,000. The Treasury Board of Canada, Industry Canada, Health Canada, Human Resources Development Canada, the Social Sciences and Humanities Research Council of Canada, the Medical Research Council of Canada and Statistics Canada also contribute financially. These institutions subscribe to the service.

The DLI provides

affordable and equitable access to the standard data products listed in the Statistics Canada Catalogue through an annual subscription fee. The terms of agreement specified in the DLI license place conditions on the use of products disseminated through this program. These restrictions are directed at stopping the redistribution of data received through this channel and protecting against the loss of sales to non-educational markets for Statistics Canada, which is known within Statistics Canada as “leakage”. The license allows the use of DLI data for non-profit, academic research and instruction. Access to statistical information through DLI does require student or staff affiliation with a DLI member institution. While students and staff do not have to pay directly for access, DLI does require mediated services to disseminate statistical and data products on local campuses.

How it works:

Students and faculty go to their respective data libraries, consult with the data librarian, sign a plain-English use agreement (a DLI Data Use License), access the data via a dedicated computer and download what they need.

The Infrastructure:

An elaborate organizational structure with very dedicated members is in place, with a data delivery technical infrastructure that includes a web site, an FTP service, a CD-ROM data delivery service and a special order process. In addition, each participating university institutes a ‘data service’ which assumes responsibility for DLI at its site. The project is also glued together with two listservs. The data files are delivered in ASCII formats, with associated metadata discoverable using StatCan software at dedicated workstations in the library.
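Flat ASCII files like these are typically fixed-width records whose column positions are spelled out in an accompanying codebook. Here is a hedged sketch of how such a record might be sliced into named variables; the layout and variable names below are hypothetical stand-ins, not an actual Statistics Canada record layout:

```python
# Sketch: parsing one fixed-width ASCII record using a codebook-style
# column layout. The positions and variable names are hypothetical.

# (start, end) character positions per variable, as a codebook might list them
LAYOUT = {
    "province": (0, 2),   # 2-digit province code
    "age":      (2, 5),   # age in years, zero-padded
    "income":   (5, 12),  # annual income in dollars, right-justified
}

def parse_record(line):
    """Slice one fixed-width record into a dict of named fields."""
    return {name: line[start:end].strip() for name, (start, end) in LAYOUT.items()}

record = parse_record("35042 045000")
print(record)
```

The point of the arrangement is that the data file itself carries no labels; without the metadata (the DDI-style codebook) the ASCII extract is just an opaque wall of digits, which is why the DLI pairs every file with discoverable documentation.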

Critical Note:

The DLI was, and is, the best possible reaction and compromise to the very restrictive data cost recovery policies initiated in the 80s, which remain alive and well with us today. It is important to repeat that these public data have already been paid for by taxation, they are re-paid for with tuition, and DLI data access is restricted to Canadians who are university students and faculty. The DLI solved one very important Canadian knowledge creation and dissemination issue in academic institutions, but not the broader issue of access to data by Canadian citizens. They did set a precedent!

Statistics Canada data are still sold to federal departments, provincial governments and municipal governments, who are not allowed to share them between and among themselves due to very stringent licensing regimes. Our taxes have paid for many of the same datasets multiple times, since these are government purchases and transactions. Just think of all the bureaucracy needed to manage these licensing regimes: royalties, lawyers, purchasing and accounting services, storage, and so on. In addition, civil society organizations such as non-governmental organizations, non-profit organizations and community-based researchers, who are not wealthy yet fulfill an important democratic function, cannot afford these data, while it is their role to keep government accountable on a variety of issues (e.g. environment, homelessness, education). Further, citizens who want to learn about their communities, develop a community plan or start a new business want access to data, but can only get it if they have a significant amount of cash. The result: a lack of informed decision making.

Dream Idea:

It would be fantastic to have the knowledge, training and infrastructure of the DLI extended to all of our public libraries and community access points. Imagine one-stop knowledge shopping: picking up a video, a music CD, a novel and some demographic data related to school closures in your neighbourhood. Wow! Of course, the data should come at no cost to the citizen or the library. Also, imagine having a data librarian in every library who can help citizens find the data they need and learn how to use them. Now that is a knowledge society.

References:

You can access the documents I referred to here – my del.icio.us – tagged with datalibre civicaccess and DLI.

Continuum of Access, By Chuck Humphrey, University of Alberta.

Charles Humphrey (2005). Collaborative Training in Statistical and Data Library Services: Lessons from the Canadian Data Liberation Initiative. Resource Sharing & Information Networks, Vol. 18 (1/2), pp. 167-181.

The great and famous Rosling Video, about data, from TED.

Not Canadian, but could it be? We have the best Radarsat data in the world and have done some great work in the past tracking down toxic bins floating around in flood zones using radar. Radar is the only remote sensing technique that will cut through rain, fog, and cloud cover, making it ideal during tropical storms or for rainforest imagery.

The funding mechanism is also very interesting.
