Cost Recovery Policies are NOT Synonymous with Data Quality

One of the great data myths is that cost recovery policies are synonymous with higher data quality. Often the myth making stems from effective communications from nations with heavy cost recovery policies such as the UK who often argue that their data are of better quality than those of the US which have open access policies. Canada, depending on the data and the agencies they come from is at either end of this spectrum and often in between.

I just read an interesting study that examined open access versus cost recovery for two framework datasets. The researchers looked at the technical characteristics and use of datasets from nations of similar socio-economic, jurisdiction size, population density, and government type (Netherlands, Denmark, German State of the North Rhine Westfalia, US State of Massachusetts and the US Metropolitan region of Minneapolis-St. Paul). The study compared parcel and large scale topographic datasets typically found as framework datasets in geospatial data infrastructures (see SDI def. page 8). Some of these datasets were free, some were extremely expensive and all under different licensing regimes that defined use. They looked at both technical (e.g. data quality, metadata, coverage, etc.) and non-technical characteristics (e.g. legal access, financial access, acquisition procedures, etc.).

For Parcel Datasets the study discovered that datasets that were assembled from a centralized authority were judged to be technically more advanced while those that require assembly from multiple jurisdictions with standardized or a central institution integrating them were of higher quality while those of multiple jurisdictions without standards were of poor quality as the sets were not harmonized and/or coverage was inconsistent. Regarding non-technical characteristics many datasets came at a high cost, most were not easy to access from one location and there were a variety of access and use restrictions on the data.

For Topographic Information the technical averages were less than ideal while for non-technical criteria access was impeded in some cases due to involvement of utilities (tendency toward cost recovery) and in other cases multiple jurisdictions – over 50 for some – need to be contacted to acquire a complete coverage and in some cases coverage is just not complete.

The study’s hypothesis was:

that technically excellent datasets have restrictive-access policies and technically poor datasets have open access policies.

General conclusion:

All five jurisdictions had significant levels of primary and secondary uses but few value-adding activities, possibly because of restrictive-access and cost-recovery policies.

Specific Results:

The case studies yielded conflicting findings. We identified several technically advanced datasets with less advanced non-technical characteristics…We also identified technically insufficient datasets with restrictive-access policies…Thus cost recovery does not necessarily signify excellent quality.

Although the links between access policy and use and between quality and use are apparent, we did not find convincing evidence for a direct relation between the access policy and the quality of a dataset.

Conclusion:

The institutional setting of a jurisdiction affects the way data collection is organized (e.g. centralized versus decentralized control), the extent to which data collection and processing are incorporated in legislation, and the extent to which legislation requires use within government.

…We found a direct link between institutional setting and the characteristics of the datasets.

In jurisdictions where information collection was centralized in a single public organization, datasets (and access policies) were more homogenous than datasets that were not controlled centrally (such as those of local governments). Ensuring that data are prepared to a single consistent specification is more easily done by one organization than by many.

…The institutional setting can affect access policy, accessibility, technical quality, and consequently, the type and number of users.

My Observations:
It is really difficult to find solid studies like this one that systematically look at both technical and access issues related to data. It is easy to find off the cuff statements without sufficient backup proof though! While these studies are a bit of a dry read, they demonstrate the complexities of the issues, try to tease out the truth, and reveal that there is no one stop shopping for data at any given scale in any country when it comes to data. In other words, there is merit in pushing for some sort of centralized, standardized and interoperable way – which could also mean distributed – to discover and access public data assets. In addition, there is an argument to be made to make those data freely (no cost) accessible in formats we can readily use and reuse. This of course includes standardizing licensing policies!

Reference Institutions Matter: The Impact of Institutional Choices Relative to Access Policy and Data Quality on the Development of Geographic Information Infrastructures by Van Loenen and De Jong in Research and Theory in Advancing Data Infrastructure Concepts edited by Harlan Onsrud, 2007 published by ESRI Press.

If you have references to more studies send them along!