Uncategorized


CODATA has just released the following special journal issue –

Open Data for Global Science

Lots of great FREE articles on access to scientific data from a variety of perspectives – public policy, standards, case studies, etc.

Checking the #s

A great post on Michael Geist’s blog about really really checking the numbers! Read – Misleading RCMP Data Undermines Counterfeiting Claims.

An excellent article by CBC online today –
Mixed-race identity: The Current looks at the growing number of mixed-race Canadians. It is a great look at how statistics help us understand the composition of our societies and their increasing complexity as they change, and how this information leads to changes in public policy. The release of the 2006 Census data has fueled some great articles in the Ottawa Citizen, including one on same-sex couples. Seems like there are more males than females in same-sex relationships, 9% of the couples have children, and those children are found predominantly in the two-mom households. Toronto, Montreal, Vancouver and Ottawa have the highest populations of same-sex couples. In addition, it seems that married people are now in the minority in Canada, but people remain in committed relationships and rear children! According to the article this is not of much concern provided that:

Canadians continue to form families that fulfill the societal functions they always have — providing economic stability, raising children, instilling values — the categorization of those relationships can be “completely irrelevant.”

But Ms. Tipper said the rise in the number of some family groupings, such as those headed by lone parents, explains part of the increase in the unmarried population and that represents a significant social and economic challenge for Canada.

A NYTimes editorial, What You Don’t Know Can Hurt You, discusses the real cost of not having information and the politics of the US Census.

Just before the break, the House of Representatives passed a bill that would cut $23.6 million from the bureau’s 2008 budget for compiling the nation’s most important economic statistics. A cut of that size would result in the largest loss of source data since the government started keeping the statistics during the Great Depression, impairing the accuracy of figures on economic growth, consumer spending, corporate profits, labor productivity, inflation and other benchmark indicators.

Imagine the Ottawa Riverkeeper having access to this type of data! Or the folks along the St. Lawrence Seaway! How wonderful for citizens to be able to view a 3D model of their rivers and their conditions at any time of day!

This is exactly what is going on along the Hudson, where IBM and the Beacon Institute, a nonprofit scientific-research organization in New York, are collaborating on the development of the River and Estuary Observatory Network (REON), which combines

distributed-processing hardware and analytical software, the system designed to take heterogeneous data from a variety of sources and make sense of it in real time. The software learns to recognize data patterns and trends and prioritizes useful data. If some data stream begins to exhibit even minor variations, the system automatically redirects resources toward it. The system will also be equipped with IBM’s visualization technologies; fed with mapping data, they can create a virtual model of the river and simulate its ecosystem in real time.

The types of data that will be gathered from the sensors include

temperature, pressure, salinity, dissolved oxygen content, and pH levels, which will indicate whether pollutants have entered the river. Other sensors will be directed toward sea life, says Nierzwicki-Bauer, and will be used to study species and determine how communities of microscopic organisms change over time.

It is expected that many hundreds of sensors will be required for this project, which will rely on fibre optic cables and wireless technologies. Eventually the system will be connected to ocean sensor and monitoring networks.
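To make that “redirect resources toward it” idea a bit more concrete, here is a minimal sketch, in Python, of how a monitoring loop might score each new reading against its stream’s recent history and rank the streams that are drifting. The sensor names, window size and threshold are invented for illustration only; this is not IBM’s actual software.

    # A minimal sketch of stream prioritization: keep a short history per
    # sensor stream, score new readings against that history, and rank the
    # most anomalous streams first. All names and numbers are hypothetical.
    from collections import deque
    from statistics import mean, stdev

    WINDOW = 50          # readings kept per stream
    THRESHOLD = 3.0      # flag readings more than 3 standard deviations out

    history = {name: deque(maxlen=WINDOW)
               for name in ("temperature", "salinity", "dissolved_oxygen", "ph")}

    def ingest(stream, value):
        """Record a reading and return how unusual it is versus recent history."""
        past = history[stream]
        score = 0.0
        if len(past) >= 10 and stdev(past) > 0:
            score = abs(value - mean(past)) / stdev(past)
        past.append(value)
        return score

    def prioritize(scores):
        """Order streams so the most anomalous ones are examined first."""
        return sorted(scores, key=scores.get, reverse=True)

    latest = {s: ingest(s, v) for s, v in
              [("temperature", 11.2), ("salinity", 0.41),
               ("dissolved_oxygen", 8.9), ("ph", 7.1)]}
    for stream in prioritize(latest):
        if latest[stream] > THRESHOLD:
            print(f"redirect resources toward {stream} (z = {latest[stream]:.1f})")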

REON

Ah! Nice to see some exciting data collecting activities!

Via:
Networking the Hudson River: The Hudson could become the world’s largest environmental-monitoring system. By Brittany Sauser.

This CBC.ca video gives a brief look at how 2D and 3D street-view data are collected. In this case it is the city of Toronto, and the data collector is Tele Atlas. The things cartographers do to make maps! Tele Atlas seems to be selling georeferenced landmarks, street networks, and a variety of other data it collects simply by driving the streets with cameras and GPS mounted on the roofs of cars. At 500 km a day and terabytes of data, these folks are collecting and selling tons of geo-information that we like to play with on Google Earth, that helps us find places in MapQuest, and that allows city planners or police forces to prepare evacuation plans, understand the characteristics of the route planned for a protest, or know the point address in a 911 call.

The video also briefly discusses privacy issues. Seems like the street is public space, and if you happen to be naughty going into some tawdry establishment and your act happens to be caught on film, well, so be it: either behave or accept the digital consequences of your private acts in public space, or so the video suggests!

Regarding access to these data, well, my guess is a big price tag. It is a private company after all!

The Information Machine is a short film written, produced and directed by Charles and Ray Eames for the IBM Pavilion at the 1958 Brussels World’s Fair. Animation by John Whitney. Music by Elmer Bernstein. The topic is primarily the computer in the context of human development, but I think it also represents our fascination with and need to collect and organize data and abstract the world around us. Since it was written in 1958 it does go on about he, his, him, man and men’s yada yada ad nauseam, but it nonetheless remains a cute, informative short film in the public domain, captured in the Internet Archive, and it represents ideas as relevant to us today as they were then!

via: Information Aesthetics 

I met with Wendy Watkins at the Carleton University Data Library yesterday. She is one of the founders and current co-chair of DLI and CAPDU (Canadian Association of Public Data Users), a member of the governing council of the International Association for Social Science Information Service and Technology (IASSIST) and a great advocate for data accessibility and whatever else you can think of in relation to data.

Wendy introduced me to a very interesting project that is happening between and among university libraries in Ontario called the Ontario Data Documentation, Extraction Service Infrastructure Initiative (ODESI). ODESI will make discovery, access and integration of social science data from a variety of databases much easier.

Administration of the Project:

Carleton University Data Library in cooperation with the University of Guelph. The portal will be hosted at Scholars Portal at the University of Toronto, which makes online journal discovery and access a dream. The project is partially funded by the Ontario Council of University Libraries (OCUL) and OntarioBuys, operated out of the Ontario Ministry of Finance. It is a 3-year project with $1,040,000 in funding.

How it works:

ODESI operates on a distributed data access model, where servers that host data from a variety of organizations will be accessed via Scholars Portal. The metadata are written in the DDI standard, which is expressed in XML. DDI is the

Data Documentation Initiative [which] is an international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML.

The standard has been adopted by several international organizations such as IASSIST, Interuniversity Consortium for Political and Social Research (ICPSR), Council of European Social Science Data Archives (CESSDA) and several governmental departments including Statistics Canada, Health Canada and HRSDC.
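To see why a shared XML standard matters for discovery, here is a rough sketch of pulling a study title and variable labels out of a codebook-style record with Python. The tiny record below only mimics the general shape of a DDI 2.x codeBook; real ODESI and CESSDA records are far richer and the element paths may differ.

    # Illustrative only: parse a DDI-like codebook and list its variables.
    import xml.etree.ElementTree as ET

    ddi_record = """
    <codeBook>
      <stdyDscr>
        <citation><titlStmt><titl>General Social Survey, Cycle 19</titl></titlStmt></citation>
      </stdyDscr>
      <dataDscr>
        <var name="agegr10"><labl>Age group of respondent</labl></var>
        <var name="sex"><labl>Sex of respondent</labl></var>
      </dataDscr>
    </codeBook>
    """

    root = ET.fromstring(ddi_record)
    title = root.findtext("./stdyDscr/citation/titlStmt/titl")
    variables = {v.get("name"): v.findtext("labl") for v in root.iter("var")}

    print(title)
    for name, label in variables.items():
        print(f"  {name}: {label}")

Because every participating archive describes its holdings this way, the same few lines could index studies from any of the partner institutions without custom code per archive.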

Collaboration:

This project will integrate with and is based on the existing and fully operational Council of European Social Science Data Archives (CESSDA), which is a cross-boundary data initiative. CESSDA

promotes the acquisition, archiving and distribution of electronic data for social science teaching and research in Europe. It encourages the exchange of data and technology and fosters the development of new organisations in sympathy with its aims. It associates and cooperates with other international organisations sharing similar objectives.

The CESSDA Trans-Border Agreement and Constitution are very interesting models of collaboration. CESSDA is the governing body of a group of national European social science data archives. The CESSDA data portal is accompanied by a multilingual thesaurus; currently 13 nations and 20 organizations are involved, and data from thousands of studies are made available to students, faculty and researchers at participating institutions. The portal search mechanism is quite effective, although not pretty!

In addition, CESSDA is associated with a series of national data archives. Wow! Canada does not have a data archive!

Users:

Users would come to the portal, search across the various servers on the metadata fields, and access the data. Additionally, users will be provided with some tools to integrate myriad data sets and conduct analyses with the statistical tools that are part of the service. For some of the data, basic thematic maps can also be made.
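As a toy illustration of the “search across the various servers” step, the sketch below fans a keyword query out over a few made-up archive catalogues and merges the hits. Real ODESI searches run over DDI metadata held on the distributed servers, not hard-coded lists like these.

    # Hypothetical federated search over invented archive catalogues.
    CATALOGUES = {
        "Archive A": [{"title": "Election Study 2006", "keywords": ["voting", "politics"]}],
        "Archive B": [{"title": "Farm Management Survey", "keywords": ["agriculture", "environment"]}],
        "Archive C": [{"title": "Census of Population 2006 PUMF", "keywords": ["census", "demographics"]}],
    }

    def search(term):
        term = term.lower()
        hits = []
        for archive, studies in CATALOGUES.items():
            for study in studies:
                text = " ".join([study["title"], *study["keywords"]]).lower()
                if term in text:
                    hits.append((archive, study["title"]))
        return hits

    for archive, title in search("census"):
        print(f"{archive}: {title}")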

Eventually the discovery tools will be integrated with the journal search tools of Scholars Portal. You will be able to search for data and find the journals that have used those data, or vice versa, find the journal and then the data. This will hugely simplify the search and integration process of data analysis. At the moment, any data-intensive research endeavour or data-based project needs to dedicate 80-95% of the effort to finding the data in a bunch of different databases, navigating the complex licensing and access regimes, maybe paying a large sum of money, and organizing the data in such a way that it is statistically accurate before making those comparisons. Eventually one gets to talk about results!

Data Access:

Both the CESSDA data portal project and ODESI are groundbreaking initiatives that are making data accessible to the research community. These data, however, will only be available to students, faculty and researchers at participating institutions. Citizens who do not fall into those categories can only search the metadata elements and see what is available, but will not get access to the data.

Comment:

It is promising that a social and physical infrastructure exists to make data discoverable and accessible between and among national and international institutions. What is needed is a massive cultural shift in our social science data creating and managing institutions that would make them amenable to creating policies to unlock these same public data assets, along with some of the private sector data assets (polls, etc.), and make them freely (as in no cost) available to all citizens.

The Canadian Recording Industry Association (CRIA) releases all kinds of data related to sales.  It is also an organization that has quite a bit of power with the Canadian Government.

Michael Geist has an interesting piece on interpreting CRIA sales data!  It is an industry I know very little about and I would probably have just accepted their reported numbers as I would not have had the contextual knowledge to frame what they were saying otherwise!

Numbers are tricky rascals at best! Especially when an industry is trying to lobby for its own interests and at times politicians just believe any ole number thrown at them! Worse, the wrong numbers, or numbers out of context, get picked up by newswires and get repeated ad nauseam! It just depends whose ear a particular industry has, I guess, and how much homework a reporter does.

One of the great data myths is that cost recovery policies are synonymous with higher data quality. Often the myth making stems from effective communications from nations with heavy cost recovery policies, such as the UK, which often argue that their data are of better quality than those of the US, which has open access policies. Canada, depending on the data and the agencies they come from, is at either end of this spectrum and often in between.

I just read an interesting study that examined open access versus cost recovery for two framework datasets. The researchers looked at the technical characteristics and use of datasets from jurisdictions of similar socio-economic standing, size, population density, and government type (the Netherlands, Denmark, the German state of North Rhine-Westphalia, the US state of Massachusetts and the US metropolitan region of Minneapolis-St. Paul). The study compared parcel and large-scale topographic datasets typically found as framework datasets in geospatial data infrastructures (see SDI def. page 8). Some of these datasets were free, some were extremely expensive, and all were under different licensing regimes that defined use. The study looked at both technical characteristics (e.g. data quality, metadata, coverage, etc.) and non-technical characteristics (e.g. legal access, financial access, acquisition procedures, etc.).

For parcel datasets, the study discovered that datasets assembled by a centralized authority were judged to be technically more advanced. Datasets that required assembly from multiple jurisdictions were of higher quality when standards were in place or a central institution integrated them, and of poorer quality without standards, since the sets were not harmonized and/or coverage was inconsistent. Regarding non-technical characteristics, many datasets came at a high cost, most were not easy to access from one location, and there were a variety of access and use restrictions on the data.

For topographic information, the technical averages were less than ideal, while on the non-technical criteria access was impeded in some cases by the involvement of utilities (which tend toward cost recovery); in other cases multiple jurisdictions – over 50 for some – need to be contacted to acquire complete coverage, and in some cases coverage is simply not complete.

The study’s hypothesis was:

that technically excellent datasets have restrictive-access policies and technically poor datasets have open access policies.

General conclusion:

All five jurisdictions had significant levels of primary and secondary uses but few value-adding activities, possibly because of restrictive-access and cost-recovery policies.

Specific Results:

The case studies yielded conflicting findings. We identified several technically advanced datasets with less advanced non-technical characteristics…We also identified technically insufficient datasets with restrictive-access policies…Thus cost recovery does not necessarily signify excellent quality.

Although the links between access policy and use and between quality and use are apparent, we did not find convincing evidence for a direct relation between the access policy and the quality of a dataset.

Conclusion:

The institutional setting of a jurisdiction affects the way data collection is organized (e.g. centralized versus decentralized control), the extent to which data collection and processing are incorporated in legislation, and the extent to which legislation requires use within government.

…We found a direct link between institutional setting and the characteristics of the datasets.

In jurisdictions where information collection was centralized in a single public organization, datasets (and access policies) were more homogenous than datasets that were not controlled centrally (such as those of local governments). Ensuring that data are prepared to a single consistent specification is more easily done by one organization than by many.

…The institutional setting can affect access policy, accessibility, technical quality, and consequently, the type and number of users.

My Observations:
It is really difficult to find solid studies like this one that systematically look at both technical and access issues related to data. It is easy to find off-the-cuff statements without sufficient backup proof though! While these studies are a bit of a dry read, they demonstrate the complexities of the issues, try to tease out the truth, and reveal that there is no one-stop shopping for data at any given scale in any country. In other words, there is merit in pushing for some sort of centralized, standardized and interoperable way – which could also mean distributed – to discover and access public data assets. In addition, there is an argument to be made for making those data freely (no cost) accessible in formats we can readily use and reuse. This of course includes standardizing licensing policies!

Reference: Institutions Matter: The Impact of Institutional Choices Relative to Access Policy and Data Quality on the Development of Geographic Information Infrastructures, by van Loenen and de Jong, in Research and Theory in Advancing Data Infrastructure Concepts, edited by Harlan Onsrud, ESRI Press, 2007.

If you have references to more studies send them along!
