datasets

Open Dinosaur Project

This, I love:

The Open Dinosaur Project was founded to involve scientists and the public alike in developing a comprehensive database of dinosaur limb bone measurements, to investigate questions of dinosaur function and evolution. We have three major goals: 1) do good science; 2) do this science in the most open way possible; and 3) allow anyone who is interested to participate. And by anyone, we mean anyone! We do not care about your education, geographic location, age, or previous background with paleontology. The only requirement for joining us is that you share the goals of our project and are willing to help out in the efforts.

Want to sign up? Email project head Andy Farke (andrew.farke@gmail.com), and welcome aboard!

[via datalibre]

Last night I attended the Town Hall Discussion on The Future of the Internet: Access, Openness and Inclusion. The moderator, Marita Moll, hinted that Industry Canada, as part of its Broadband Program, might be releasing a map of Canadian broadband. There has been some interesting discussion in the US about access to broadband data at Off the Map and in a podcast at All Points Blog. An E-Scan report has been done in Ontario on possibilities for the development of a Broadband Atlas for Ontarians. In all cases, access to infrastructure data is highlighted as a barrier, particularly as infrastructure has increasingly become privatized and splintered.

The City of Vancouver will soon vote on a Motion to have:

  • Open Standards
  • Open Source
  • Open Data

    CBC News: Vancouver mulls making itself an ‘open city’, by Emily Chung

    Via: Digital Copyright Canada

    It is quite surprising that this was not already the norm for managing the public good!

    the Federal Court of Canada released late yesterday that it will force the federal government to stop withholding data on one of Canada’s largest sources of pollution – millions of tonnes of toxic mine tailings and waste rock from mining operations throughout the country.

    The Federal Court sided with the groups and issued an Order demanding that the federal government immediately begin publicly reporting mining pollution data from 2006 onward to the National Pollutant Release Inventory (NPRI). The strongly worded decision describes the government’s pace as “glacial” and chastises the government for turning a “blind eye” to the issue and dragging its feet for “more than 16 years”.

    I look forward to reading the court order. According to Ecojustice (formerly the Sierra Legal Defence Fund), the ruling includes the following strong wording:

    * It calls the federal government’s pace “glacial” [paragraph 145];
    * It says the government’s approach has been simply to turn a “blind eye” [207];
    * It notes that the frustration felt by advocates trying to uncover this information “after more than 16 years of consultation” is “perfectly understandable” [124];
    * It states that not reporting “denies the Canadian public its rights to know how it is threatened by a major source of pollution” [127];
    * It highlights that the minister has chosen not to publish the pollution data “in deference to” the mining industry [220];
    * It used unusually simple language even I understand when it said that the government was simply “wrong” [177].

    The advocates were Justin Duncan and Marlene Cashin, along with their dedicated clients at Great Lakes United and Mining Watch Canada, who launched the case in 2007.

    It is uncertain how these data will be released. Currently, these types of pollutant data are released through the National Pollutant Release Inventory (NPRI), described as follows:

    The National Pollutant Release Inventory (NPRI) is Canada’s legislated, publicly accessible inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. (Mining Watch)

    The NPRI is fairly usable and accessible, and it includes georeferencing and some mapping services. However, when I tried to use their library, it was not working!

    The Mining Association of Canada wants to read the ruling “carefully” to assess how Environment Canada should release these data. I find this confusing, since I thought the Government got to decide how these data are to be released and what is to be included, and that decision was based on ensuring the public good and the public right to know. The fight is not yet quite over. It will be important to ensure the data are not watered down for public consumption.

    It is another wonderful example of creating an infrastructure – NPRI + law – to distribute public data. This also teaches us something about governmentality (gouvernementalité) and who the government thinks with: in this case, the mineral and mining industry and not citizens. Citizens should not have to lobby for 16 years and expend incredible resources to get the courts to get the government to ensure the public good!

    Articles:

  • Court orders pollution data from mining made public, by Juliet O’Neill, Canwest News Service, April 24, 2009
  • Environment Canada forced to reveal full extent of pollution from mines: Court ruling considered major victory for green organizations, by Martin Mittelstaedt, Saturday’s Globe and Mail, April 24, 2009
  • Great Lakes United Press Release, Court victory forces Canada to report pollution data for mines, April 24, 2009 – 11:16am — Brent Gibson
  • Mining Watch Press Release: Court Victory Forces Canada to Report Pollution Data for Mines, Friday April 24, 2009 11:31 AM

    Canada Institute for Scientific and Technical Information (CISTI) is

    Canada’s national science library and leading scientific publisher, provides Canada’s research and innovation community with tools and services for accelerated discovery, innovation and commercialization.

    CISTI delivers science data and information to Canadians online, through the Depository Service, and as a paper delivery service to researchers in universities. But its days of doing that are numbered…

    CISTI has just suffered very serious budget cuts (a 70% cut) that affect scientific innovation, access to scientific data, the dissemination of Canadian science and open access publishing.

    The Government of Canada and the National Research Council of Canada have decided that the journals and services of NRC Research Press will be transferred to the private sector.

    Privatization? In a sense they are a victim of their own success.  The NRC frames it as follows in a letter to their clients (e.g. Depository Service Program):

    this transformation is not the development of a “new business” but the movement of a successful program into a new legal and business environment. It is our belief that this new environment will afford us more flexibility to manage our publishing activities.

    More flexibility to reduce services to Canadians, more like it. The Depository Services Program (DSP) and the delivery of online access to journals to Canadians cannot be funded by an entity outside of the Federal government, and it is expected that the termination date for journals delivered in this way will be sometime in 2010.

    This means less access to scientific journals for Canadians, and to research Canadians have paid for! CISTI journals deposited in the DSP were important, since the DSP’s:

    primary objective is to ensure that Canadians have ready and equal access to federal government information. The DSP achieves this objective by supplying these materials to a network of more than 790 libraries in Canada and to another 147 institutions around the world holding collections of Canadian government publications.

    In addition, hundreds of government jobs (scientists, librarians and researchers) are expected to be lost. The budget cut is $35 million in annual expenditures.

    This plan includes a reduction in NRC’s a-base funding totalling $16.8 million per year by 2011-2012 (announced in Budget 2009) as well as reductions in revenue-generating activities.

    Hmm! Wonder what our current Federal Minister of State for Science and Technology’s thoughts are about science?

    Here are a few articles:

  • NRC cuts could affect 300 positions, The Ottawa Citizen
  • Access to CISTI Source to End

    Ready or Not, Here Comes Open Access: Sure, you’d rather focus on science than on debates about open access. But the decisions made today about publishing models are relevant not only to your work, but also to the future of biomedical research. So pay attention.

    The November issue of Genome Technology focuses entirely on Open Access and is openly available under a CC license. The articles discuss both data and publications. Wonderful! See page 40 of the journal to read Ready or Not, Here Comes Open Access.

    Via: SPARC, the Scholarly Publishing and Academic Resources Coalition

    I was looking for some cross-city comparison data yesterday and recalled the Federation of Canadian Municipalities (FCM) Quality of Life Reporting System (QoLRS).

    Developed by FCM, the Quality of Life Reporting System (QOLRS) measures, monitors and reports on the quality of life in Canadian urban municipalities using data from a variety of national and municipal sources.

    Starting with 16 municipalities in 1999, the QOLRS has grown to include 22 municipalities, comprising some of Canada’s largest urban centres and many of the suburban municipalities surrounding them.

    The FCM’s QoLRS site includes all the documentation, data, metadata and methodologies related to the development of their indicators and the system they have developed.

    :: Reports
    :: Annexes
    :: Indicators

    Their data are most impressive. You can download a spreadsheet of the data for each indicator for 1991, 1996 and 2001, and I expect the 2006 QoLRS data will be coming soon. Each variable was also adjusted to the current geographies of amalgamated cities, which makes comparison across time and space possible (see the guide to geographies). This was not easy to do at the time. Each spreadsheet includes the data source, the variable, and a tab that provides the metadata. This means that you can verify what was done, reuse those data or, if you had some money and loads of time, purchase the data pertaining to your city and add it to the indicator system. Unfortunately, the FCM had to purchase these datasets, and it cost them many thousands of dollars.

    There are 11 themes and 72 indicators over 3 census periods for 20 cities (Sudbury, Regina, Winnipeg, Niagara, CMQ, Saskatoon, Edmonton, Hamilton, Halifax, Windsor, Toronto, Kingston, London, Ottawa, Vancouver, Waterloo, Halton, Calgary, Peel, York).  Datasets come from:

    • Statistics Canada
    • Canada Mortgage and Housing Corporation
    • Environment Canada
    • the 22 cities themselves
    • Elections Canada
    • Audit Bureau of Circulation
    • Tax Filer Data
    • Human Resources and Development Services Canada
    • FCM Special Surveys
    • Industry Canada
    • Anielski Management (Ecological Footprint)
    • Canadian Centre for Justice

    Putting something like this together is no small feat, so please go check out what is available, play with the data a little, and if you cannot find data for your city, call up your local councillor and ask them to become a member of the QoLRS team! Also let the FCM know they are doing a good job, as this is one way for us Canadians to see what is going on in our cities over time.

    gas price map

    From gasbuddy.com:

    Now you can see what gas prices are around the country at a glance. Areas are color coded according to their price for the average price for regular unleaded gasoline.

    Here is the US map.

    [via infoesthetics]

    Remember hearing about SETI@home? Check out, and download, Gridrepublic.org:

    GridRepublic members run a screensaver that allows their computers to work on public-interest research projects when the machines are not otherwise in use. This screensaver does not affect performance of the host computer any more than an ordinary screensaver does.

    By aggregating idle resources from users around the world, we create a massive supercomputer.

    Gridrepublic is built on the system that started as SETI@home, which was turned into a general distributed computing platform, BOINC. Gridrepublic is a central place for all projects using this distributed platform, where you can download and install the system and, even better, choose which projects your computer’s idle time will support, including:

    Einstein@home: you can contribute your computer’s idle time to a search for spinning neutron stars (also called pulsars) using data from the LIGO and GEO gravitational wave detectors.

    BBC Climate Change: The same model that the Met Office uses to make daily weather forecasts has been adapted to run on home PCs. The model incorporates many variable parameters, allowing thousands of sets of conditions. Your computer will run one individual set of conditions– in effect your individual version of how the world’s climate works– and then report back to the research team what it calculates. This experiment was described on the BBC television documentary Meltdown (BBC-4, February 20th, 2006). Note: workunits require several months of screensaver time; faster computers recommended.

    Rosetta@home: needs your help to determine the 3-dimensional shape of proteins as part of research that may ultimately contribute to cures for major human diseases such as AIDS / HIV, Malaria, Cancer, and Alzheimer’s.

    Proteins@Home: investigating the “Inverse Protein Folding Problem”: Whereas “Protein Folding” seeks to determine a protein’s shape from its amino acid sequence, “Inverse Protein Folding” begins with a protein of known shape and seeks to “work backwards” to determine the amino acid sequence from which it is generated.

    Quantum Monte Carlo: Reactions between molecules are important for virtually all parts of our lives. The structure and reactivity of molecules can be predicted by Quantum Chemistry, but the solution of the vastly complex equations of Quantum Theory often require huge amounts of computing power. This project seeks to raise the necessary computing time in order to further develop the very promising Quantum Monte Carlo (QMC) method for general use in Quantum Chemistry.

    Donate here.

    Science 2.0 — Is Open Access Science the Future?
    Is posting raw results online, for all to see, a great tool or a great risk?
    By M. Mitchell Waldrop, Scientific American

    The first generation of World Wide Web capabilities rapidly transformed retailing and information search. More recent attributes such as blogging, tagging and social networking, dubbed Web 2.0, have just as quickly expanded people’s ability not just to consume online information but to publish it, edit it and collaborate about it—forcing such old-line institutions as journalism, marketing and even politicking to adopt whole new ways of thinking and operating.

    Science could be next. A small but growing number of researchers (and not just the younger ones) have begun to carry out their work via the wide-open tools of Web 2.0. And although their efforts are still too scattered to be called a movement—yet—their experiences to date suggest that this kind of Web-based “Science 2.0” is not only more collegial than traditional science but considerably more productive…

    read the rest of the article…

    Via Zzzoot
