Udell on climate

More interesting stuff from Jon Udell, this time taking climate data for his area, putting it on the Many Eyes platform, and trying to see what has been happening in New Hampshire over the last century.

The experiment is inconclusive, but there is an excellent debate in the comment thread about the problems with amateurs getting their hands on the data – and the hash they can make of things because they are not experts.

Says one commenter (Brendan Lane Larson, Meteorologist, Weather Informaticist and Member of the American Meteorological Society):

Your vague “we” combined with the demonstration of the Many Eyes site trivializes the process of evidence exploration and collaborative interpretation (community of practice? peer review?) with an American 1960s hippy-like grandiose dream of democratization of visualized data that doesn’t need to be democratized in the first place. Did you read the web page at the URI that Bob Drake posted in comments herein? Do you really think that a collective vague “we” is going to take the time to read and understand (or have enough background to understand) the processes presented on that page such as “homogenization algorithms” and what these algorithms mean generally and specifically?

To which Udell replies:

I really do think that the gap between what science does and what the media says (and what most people understand) about what science does can be significantly narrowed by making the data behind the science, and the interpretation of that data, and the conversations about the interpretations, a lot more accessible.

To turn the question around, do you think we can, as a democratic society, make the kinds of policy decisions we need to make — on a range of issues — without narrowing that gap?

There is much to be said about this, but Larson’s question, “Do you really think that a collective vague “we” is going to take the time to read and understand (or have enough background to understand) the … XYZ…”, is the same question that has been asked countless times about all sorts of open approaches (from making software, to encyclopaedias, to news commentary). And the answer, in general, is “yes.” That is, not every member of the vague “we” will take the time, but very often, with issues of enough importance, many members of the vague “we” can and do take the time to understand, and might just do a better job of demonstrating, interpreting or contextualizing data in ways that other members of the vague “we” can connect with and understand.

The other side of the coin, of course, is that along with the good amateur work there is always much dross – data folk are legitimately worried about an uneducated public getting their hands on data and making all sorts of errors with it – which of course is not a good thing. But, I would argue, the potential gains from an open approach to data outweigh the potential problems.

UPDATE: a good addition to the discussion from Mike Caulfield.

3 comments

data folk are legitimately worried about an uneducated public getting their hands on data and making all sorts of errors with it – which of course is not a good thing

I’ve only dealt with a few small repositories of data (some for bioinformatics, some for computer security). The data itself is useful, but the metadata behind it is at least as important: How was the data collected? What does it include? How do we pare out non-representative data? How do we quantify and eliminate noise from the dataset?

Udell neatly avoids the hard question in your original post:

Your vague “we” combined with the demonstration of the Many Eyes site trivializes the process of evidence exploration and collaborative interpretation (community of practice? peer review?)

How do the people using the data know that they are interpreting it properly? How do they know what visualizations make sense? How do they tell the difference between random correlation and actual causation?

Any idiot can throw data at a visualization tool and produce meaningless pictures. The trick lies in encoding the appropriate metadata in the original dataset, making tools aware of it, and ensuring that the tool can provide appropriate commentary to prevent naive users from falling into common traps.
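That "metadata-aware tool" idea can be sketched very simply. Everything below is hypothetical (the class, the field names, the `visualize` function are all invented for illustration, not from any real tool): the point is just a dataset that carries its own provenance and caveats, and a rendering step that refuses to run silently without them.

```python
# Hypothetical sketch: a dataset that bundles its own metadata, and a
# visualization step that surfaces caveats instead of plotting blindly.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    values: list                                  # the observations themselves
    collection_method: str                        # how the data were gathered
    known_gaps: str                               # what the series does not cover
    caveats: list = field(default_factory=list)   # traps for naive users

def visualize(ds: Dataset) -> str:
    """Refuse to render silently: always attach the recorded caveats."""
    if not ds.caveats:
        raise ValueError("no caveats recorded; metadata is incomplete")
    header = "; ".join(ds.caveats)
    return f"[CAVEATS: {header}] plotting {len(ds.values)} points"

temps = Dataset(
    values=[8.1, 8.3, 7.9],
    collection_method="monthly station averages",
    known_gaps="station moved; pre-move readings not homogenized",
    caveats=["raw values, no homogenization applied",
             "correlation in this series does not imply causation"],
)
print(visualize(temps))
```

A real tool would do far more, but even this much changes the default: the naive user sees the warnings before the picture, rather than never.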

“The data itself is useful, but the metadata behind it is at least as important”…

Right, and I guess the idea is that the more “open” your data and metadata are, the more likely we are to get good interpretations out of it, because we can see not just the visualization/analysis, but also the underlying data, and even more the metadata. The other option is to “trust the experts,” which… has a number of flaws.

Not least of which is that for the general public, the usual method of exposure to data (if they have any at all) is the mainstream media (also, apparently, “experts”). One area where I have a little bit of expertise, climate change, is so consistently badly reported on in the media (actually better in the last couple of years) that one wonders how anyone could make a bigger hash of things than professional journalists.

I did quite a bit of research on the topic of scientific data portals: how scientists trust the data they use and read, metadata, and issues related to accuracy, reliability and authenticity. For me to trust any analysis, I want the metadata, I want the methodology, I want to know the limitations of the data, I want to know the background of the person doing the analysis so that I may decide whether or not they know what they are doing, and I want other supporting documentation related to the methodology, and so on. Just as when I read about a new miracle drug, I look up who wrote the report to see who paid for the research! So I want to know if there is bias, but also whether the work was done well.

Most, but most certainly not all, scientific data (and here I am speaking of the natural sciences, not the social or medical sciences) are made quite accessible once one knows where to look, and most of these data are available at no cost if they are being used for non-commercial purposes. So it is not scientists who are restricting access to data generally, but the bureaucracy that surrounds them. In Canada it is the governmental department, the city government or the provincial government that restricts this access. If the data are collected by the private sector, then it is the companies who restrict them. If scientists are working in a university, the restriction is not the institution but the lack of an infrastructure to enable them to share their data, namely a portal with a decent interface and excellent metadata forms for the scientist to fill out. Further, Canada is way behind the times, as it does not have a data archive, and scientific research funding bodies neither mandate scientists to share their data nor provide the resources to do so. I do not like off-the-cuff statements that scientists are afraid of the general public! Some are, for sure, but in my experience this is a minority. It is also speculation, an assumption not based on facts or data, and it stereotypes a very large number of people in a particular profession, something none of us likes having done to us.

I like that Udell is creating a public discussion about data. I read most of the comments last night, and in some ways they were more interesting than the analysis he did. His analysis was faulty and lacked rigour, which he admits in the comments. And as seen in the comments, important algorithmic information related to climate models was completely overlooked and not discussed. I nevertheless appreciate his experimentation. My greatest difficulty is that people will believe whatever old chart, map or number they are given, so I always hope that people who generate that content take responsibility for what they say and really understand what they are doing with their data – some sort of accountability for what they put out there. A datapedia would be good, as you would get data and scientific geeks peer reviewing material created by amateurs who are learning the ropes. Just as when I get my bike fixed by my really nice neighbour: he does a good job but not a great job, as his knowledge only goes so far. I take my good bike to a good bike shop that will back up its work with a warranty and a reputation to maintain. That way I know my brakes won’t fail, and I trust that my good gear is in good hands. In other words, there is a trust issue regarding what is being said, and how it is being said, when the public works with data. And I think that doubt is reasonable.

The National Cancer Registry in Ireland, for example, allows its data to be used for free by anyone. However, if you are going to publish results derived from its data, it asks that the methodology and the data used be submitted for verification: partly because it does not want the organization to be associated with faulty work, but also because it has a public health concern, namely avoiding improper information that could misinform public health policy, or a faulty rumour that could harm people. This is what distinguishes snake-oil sellers from legitimate chemists. Recent events in China, with dangerous products being sold, are also related to this: non-professionals producing antibiotics that kill people because they do not follow methodological rigour in the production process, and because the products are not inspected. I think it is perfectly legitimate, for accountability reasons, that there be lots of questions and a professional and scientific process in place to ensure some sort of accountability and responsibility.

The meteorologist who responded on the Udell blog had very legitimate concerns, as did a number of commenters. And I greatly respected the way Udell responded and accepted their criticism. That is a conversation about data, not an accusatory statement about how scientists are this, professionals are that, and amateurs are this. In the end, what is the best way of telling the truth about something? That is what we should be aiming for, and the MNPOV is a good way to have that conversation.