<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: udell on climate</title>
	<atom:link href="http://datalibre.ca/2007/07/23/udell-on-climate/feed/" rel="self" type="application/rss+xml" />
	<link>http://datalibre.ca/2007/07/23/udell-on-climate/</link>
	<description>urging governments to make data about canada and canadians free and accessible to citizens</description>
	<lastBuildDate>Mon, 21 May 2012 16:17:42 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Tracey</title>
		<link>http://datalibre.ca/2007/07/23/udell-on-climate/comment-page-1/#comment-43</link>
		<dc:creator>Tracey</dc:creator>
		<pubDate>Wed, 25 Jul 2007 15:37:28 +0000</pubDate>
		<guid isPermaLink="false">http://datalibre.ca/2007/07/23/udell-on-climate/#comment-43</guid>
		<description>I did quite a bit of research on the topic of scientific data portals, how scientists trust the data they use and read, metadata, and issues related to accuracy, reliability and authenticity.  For me to trust any analysis, i want the metadata, i want the methodology, i want to know the limitations of the data, i want to know the background of the person doing the analysis so that i may decide whether or not they know what they are doing, i want other supporting documentation related to the methodology and so on.  Just like when i read about a new miracle drug, i look up who wrote the report to see who paid for the research!  So i want to know if there is bias but also i want to know if the work was done well.

Most, but most certainly not all, scientific data, and here i am speaking of the natural sciences and not the social sciences nor medical sciences, are made quite accessible once one knows where to look, and most of these data are available at no cost if the data are being used for non-commercial purposes.  So it is not scientists who are restricting access to data generally but the bureaucracy that surrounds them.  In Canada it is the governmental department, the city government or the provincial government who restrict this access.  If the data are collected by the private sector then it is the companies who restrict them.  If scientists are working in a university, the restriction is not the institution but the lack of an infrastructure to enable the scientist to share his/her data, namely a portal with a decent interface and excellent metadata forms for the scientist to fill out.  Further, Canada is way behind the times as it does not have a data archive, and scientific research funding bodies do not mandate that scientists share their data, nor do these agencies provide resources to do so.  I do not like off the cuff statements that scientists are afraid of the general public!  Some are for sure but in my experience this is a minority. Also, that is speculation and assumption, not based on facts or data.  It is also stereotyping a very large number of people in a particular profession, something none of us like having done to us.

I like that Udell is creating a public discussion about data, i read most of the comments last night and they in some way were more interesting than the analysis he did.  His analysis was faulty and lacked rigour, which he admits in the comments. And as seen in the comments, important algorithmic information related to climate models was completely overlooked and not discussed.  I however appreciate his experimentation.  My greatest difficulty is that people will believe whatever ole chart, map or number they are given so i always hope that people who do generate that content take responsibility for what they say and really understand what they are doing with their data.  Some sort of accountability for what they put out there.  A datapedia would be good as you would get data and scientific geeks peer reviewing stuff created by amateurs who are learning the ropes.  Just like when i get my bike fixed by my really nice neighbour, he does a good job but not a great job as his knowledge only goes so far.  I take my good bike to a good bike shop who will back up their work with a warranty and a reputation to maintain.  That way i know my brakes won&#039;t fail.  I also trust that my good gear is in good hands.  In other words there is a trust issue regarding what is being said and how it is being said when the public works with data.  And i think that doubt is reasonable.

The National Cancer Registry in Ireland for example, allows their data to be used for free by anyone.  However if you are going to publish results derived from their data they ask that the methodology and the data used be submitted for verification.  Partially because they do not want their organization to be associated with faulty work, but also because they have a public health concern, namely they want to avoid improper information that could misinform a public health policy or a faulty rumour that could harm people.  This is what distinguishes snake oil sellers from legitimate chemists.  Recent events in China with dangerous products being sold are also related to that.  Non-professionals are producing antibiotics that kill people, as they are not following the methodological rigour in that production process and the products are not being inspected.  I think it is perfectly legitimate for accountability reasons that there be lots of questions and a professional and scientific process in place to ensure some sort of accountability and responsibility.

The meteorologist who responded on the Udell blog had very legitimate concerns, as did a number of commentators.  And I greatly respected the way Udell responded and accepted their criticism.  That is a conversation about data, not an accusatory statement about how scientists are this, and professionals are that, and amateurs are this.  In the end, what is the best way of telling the truth about something, that is what we should be aiming for, and the MNPOV is a good way to have a conversation.</description>
		<content:encoded><![CDATA[<p>I did quite a bit of research on the topic of scientific data portals, how scientists trust the data they use and read, metadata, and issues related to accuracy, reliability and authenticity.  For me to trust any analysis, i want the metadata, i want the methodology, i want to know the limitations of the data, i want to know the background of the person doing the analysis so that i may decide whether or not they know what they are doing, i want other supporting documentation related to the methodology and so on.  Just like when i read about a new miracle drug, i look up who wrote the report to see who paid for the research!  So i want to know if there is bias but also i want to know if the work was done well.</p>
<p>Most, but most certainly not all, scientific data, and here i am speaking of the natural sciences and not the social sciences nor medical sciences, are made quite accessible once one knows where to look, and most of these data are available at no cost if the data are being used for non-commercial purposes.  So it is not scientists who are restricting access to data generally but the bureaucracy that surrounds them.  In Canada it is the governmental department, the city government or the provincial government who restrict this access.  If the data are collected by the private sector then it is the companies who restrict them.  If scientists are working in a university, the restriction is not the institution but the lack of an infrastructure to enable the scientist to share his/her data, namely a portal with a decent interface and excellent metadata forms for the scientist to fill out.  Further, Canada is way behind the times as it does not have a data archive, and scientific research funding bodies do not mandate that scientists share their data, nor do these agencies provide resources to do so.  I do not like off the cuff statements that scientists are afraid of the general public!  Some are for sure but in my experience this is a minority. Also, that is speculation and assumption, not based on facts or data.  It is also stereotyping a very large number of people in a particular profession, something none of us like having done to us.</p>
<p>I like that Udell is creating a public discussion about data, i read most of the comments last night and they in some way were more interesting than the analysis he did.  His analysis was faulty and lacked rigour, which he admits in the comments. And as seen in the comments, important algorithmic information related to climate models was completely overlooked and not discussed.  I however appreciate his experimentation.  My greatest difficulty is that people will believe whatever ole chart, map or number they are given so i always hope that people who do generate that content take responsibility for what they say and really understand what they are doing with their data.  Some sort of accountability for what they put out there.  A datapedia would be good as you would get data and scientific geeks peer reviewing stuff created by amateurs who are learning the ropes.  Just like when i get my bike fixed by my really nice neighbour, he does a good job but not a great job as his knowledge only goes so far.  I take my good bike to a good bike shop who will back up their work with a warranty and a reputation to maintain.  That way i know my brakes won&#8217;t fail.  I also trust that my good gear is in good hands.  In other words there is a trust issue regarding what is being said and how it is being said when the public works with data.  And i think that doubt is reasonable.</p>
<p>The National Cancer Registry in Ireland for example, allows their data to be used for free by anyone.  However if you are going to publish results derived from their data they ask that the methodology and the data used be submitted for verification.  Partially because they do not want their organization to be associated with faulty work, but also because they have a public health concern, namely they want to avoid improper information that could misinform a public health policy or a faulty rumour that could harm people.  This is what distinguishes snake oil sellers from legitimate chemists.  Recent events in China with dangerous products being sold are also related to that.  Non-professionals are producing antibiotics that kill people, as they are not following the methodological rigour in that production process and the products are not being inspected.  I think it is perfectly legitimate for accountability reasons that there be lots of questions and a professional and scientific process in place to ensure some sort of accountability and responsibility.</p>
<p>The meteorologist who responded on the Udell blog had very legitimate concerns, as did a number of commentators.  And I greatly respected the way Udell responded and accepted their criticism.  That is a conversation about data, not an accusatory statement about how scientists are this, and professionals are that, and amateurs are this.  In the end, what is the best way of telling the truth about something, that is what we should be aiming for, and the MNPOV is a good way to have a conversation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hugh</title>
		<link>http://datalibre.ca/2007/07/23/udell-on-climate/comment-page-1/#comment-41</link>
		<dc:creator>Hugh</dc:creator>
		<pubDate>Tue, 24 Jul 2007 19:15:04 +0000</pubDate>
		<guid isPermaLink="false">http://datalibre.ca/2007/07/23/udell-on-climate/#comment-41</guid>
		<description>&quot;The data itself is useful, but the metadata behind it is at least as important&quot;... 

right and i guess the idea is that the more &quot;open&quot; your data and meta data is, the more likely we are to get the good interpretations out of it, because we can see not just the visualization/analysis, but also the underlying data, and even more the meta data. the other option is to &quot;trust the experts&quot; which... has a number of flaws.

not least of which is that for the general public, the usual method of exposure to data (if they have any at all) is the mainstream media (also, apparently, &quot;experts&quot;). one area where I have a little bit of expertise, climate change, is so consistently badly reported on in the media (actually better the last couple of years) that one wonders how anyone could make a bigger hash of things than professional journalists.</description>
		<content:encoded><![CDATA[<p>&#8220;The data itself is useful, but the metadata behind it is at least as important&#8221;&#8230; </p>
<p>right and i guess the idea is that the more &#8220;open&#8221; your data and meta data is, the more likely we are to get the good interpretations out of it, because we can see not just the visualization/analysis, but also the underlying data, and even more the meta data. the other option is to &#8220;trust the experts&#8221; which&#8230; has a number of flaws.</p>
<p>not least of which is that for the general public, the usual method of exposure to data (if they have any at all) is the mainstream media (also, apparently, &#8220;experts&#8221;). one area where I have a little bit of expertise, climate change, is so consistently badly reported on in the media (actually better the last couple of years) that one wonders how anyone could make a bigger hash of things than professional journalists.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erigami</title>
		<link>http://datalibre.ca/2007/07/23/udell-on-climate/comment-page-1/#comment-40</link>
		<dc:creator>Erigami</dc:creator>
		<pubDate>Tue, 24 Jul 2007 18:30:11 +0000</pubDate>
		<guid isPermaLink="false">http://datalibre.ca/2007/07/23/udell-on-climate/#comment-40</guid>
		<description>&lt;blockquote&gt;data folk are legitimately worried about an uneducated public getting their hands on data and making all sorts of errors with it - which of course is not a good thing&lt;/blockquote&gt;

I&#039;ve only dealt with a few small repositories of data (some for bioinformatics, some for computer security). The data itself is useful, but the metadata behind it is at least as important: How was the data collected? What does it include? How do we pare out non-representative data? How do we quantify and eliminate noise from the dataset?

Udell neatly avoids the hard question in your original post:

&lt;blockquote&gt;Your vague “we” combined with the demonstration of the Many Eyes site trivializes the process of evidence exploration and collaborative interpretation (community of practice? peer review?)&lt;/blockquote&gt;

How do the people using the data know that they are interpreting it properly? How do they know what visualizations make sense? How do they tell the difference between random correlation and actual causation?

Any idiot can throw data at a visualization tool and produce meaningless pictures. The trick lies in encoding the appropriate metadata in the original dataset, making tools aware of it, and ensuring that the tool can provide appropriate commentary to prevent naive users from falling into common traps.</description>
		<content:encoded><![CDATA[<blockquote><p>data folk are legitimately worried about an uneducated public getting their hands on data and making all sorts of errors with it &#8211; which of course is not a good thing</p></blockquote>
<p>I&#8217;ve only dealt with a few small repositories of data (some for bioinformatics, some for computer security). The data itself is useful, but the metadata behind it is at least as important: How was the data collected? What does it include? How do we pare out non-representative data? How do we quantify and eliminate noise from the dataset?</p>
<p>Udell neatly avoids the hard question in your original post:</p>
<blockquote><p>Your vague &#8220;we&#8221; combined with the demonstration of the Many Eyes site trivializes the process of evidence exploration and collaborative interpretation (community of practice? peer review?)</p></blockquote>
<p>How do the people using the data know that they are interpreting it properly? How do they know what visualizations make sense? How do they tell the difference between random correlation and actual causation?</p>
<p>Any idiot can throw data at a visualization tool and produce meaningless pictures. The trick lies in encoding the appropriate metadata in the original dataset, making tools aware of it, and ensuring that the tool can provide appropriate commentary to prevent naive users from falling into common traps.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

