DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.
MySociety has released some very useful and sexy interactive travel-time maps for the UK using public data.
From Wired:
Sources at Google have disclosed that the humble domain, http://research.google.com, will soon provide a home for terabytes of open-source scientific datasets. The storage will be free to scientists and access to the data will be free for all. The project, known as Palimpsest and first previewed to the scientific community at the Science Foo camp at the Googleplex last August, missed its original launch date this week, but will debut soon.
Building on the company’s acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.
[Via Open Access News]
The Federation of Canadian Municipalities has just released its Quality of Life Reporting System (Press Release) – Trends & Issues in Affordable Housing and Homelessness (Report in pdf). If you go to the end of the report & this post you will find the data sources required to write this important report on the situation of housing & homelessness in Canadian Cities.
NOTE – these public datasets were purchased to do this analysis. It costs many thousands of dollars to acquire these public data: public data used to inform citizens on a most fundamental issue, shelter for Canadians. Statistics Canada does not generally do city-scale analysis, as it is a federal agency, and the provinces will generally not do comparative analysis beyond cities in their respective provinces. This type of cross-country and cross-city analysis requires a not-for-profit organization or the private sector to do the work. We are very fortunate that the FCM has the wherewithal to prepare these reports on an ongoing basis. This is an expensive proposition, not only because subject matter specialists are required and much city consultation is necessary for contextual and validation reasons, but also because the datasets themselves are extremely costly. There is no real reason to justify this beyond cost recovery policies. Statistics Canada and CMHC are the worst in this regard. The other datasets used are not readily accessible to most, while the contextual data require specially designed surveys.
The documents referred to in this report were, however, freely available but not readily findable or discoverable, as there is no central repository or portal where authors can register and catalogue their reports. This is unfortunate, as it takes a substantial amount of effort to dig up specialized material from each city, province, federal department and NGO.
Public (but not free) Datasets Used in the report:
- Statistics Canada – Population Census, Population and Dwelling Counts, Age and Sex, Families and Households, and Housing and Shelter Costs, Tax Filer Statistics for economic and some population data, and Migration Estimates.
- Canada Mortgage and Housing Corporation – Customized data on the cost of housing, the availability of housing, vacancy rates and housing Starts and Completions Survey.
- Citizenship and Immigration Canada – volume of immigration, demographics of immigrants and destinations in Canadian cities.
- Human Resources and Social Development Canada – Minimum wage database, Homeless Individuals and Families Information System (HIFIS).
- Homeless and Social Housing Data were derived from 22 FCM QOLRS participating municipalities.
Report and Press Release Via Ted on the Social Planning Network of Ontario Mailing List.
The Toronto Star has a big map of languages spoken in Toronto, using 2006 census data.
Bricoleururbanism.org, a wonderful blog, has digested some of the map images for you, and here is one:
Via: Spacingmontreal.ca
Makes me think of a good question for schools and universities: why aren’t you guys doing this stuff in your classes and publishing it like crazy? Wouldn’t it be nice if a big chunk of school work was designed to be actually useful to the world, and actually was? And was distributed freely on the net?
From The Economist:
A good graphic can tell a story, bring a lump to the throat, even change policies. Here are three of history’s best…
They chose these 3:
- Florence Nightingale’s chart of the causes of the deaths of soldiers in the Crimean war
- Charles Joseph Minard’s chart of Napoleon’s Russian campaign of 1812
- William Playfair’s chart of “weekly wages of a good mechanic” and the “price of a quarter of wheat” against monarchs.
[link…]
Dennis D. McDonald on data & energy use:
What kind of culture changes will be needed, I wonder, both for energy utility staff and for customers when customers are able to make a much more direct connection between the devices they use at home and their monthly bill? This change has the potential for making the customer-company relationship more interactive than it is now. This raises some interesting questions:
* Who is going to teach customers how to best manage their energy consumption?
* Will the energy company’s call center staff have to develop a new set of counseling and advice-giving skills?
* What new tools will control room staff need to monitor distribution network performance, and will these tools take into account human-supplied information alongside automatically-supplied data from the grid and its increasing number of sensors?
(via jon udell)
An excellent post on the development of Government Open Data Principles. These were developed at an O’Reilly and Associates workshop. Ethan Zuckerman provides some excellent background on his blog, and an Open Data wiki is accepting comments and, of course, collaboration. Has this been done anywhere else?
Government data shall be considered open if it is made public in a way that complies with the principles below:
1. Complete
All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
2. Primary
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3. Timely
Data is made available as quickly as necessary to preserve the value of the data.
4. Accessible
Data is available to the widest range of users for the widest range of purposes.
5. Machine processable
Data is reasonably structured to allow automated processing.
6. Non-discriminatory
Data is available to anyone, with no requirement of registration.
7. Non-proprietary
Data is available in a format over which no entity has exclusive control.
8. License-free
Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
EcoLogo is an excellent example of a Government of Canada consumer data and information service that helps people make informed decisions on how and what to consume.
I discovered it this morning while reading an article about greening computers in the Globe and Mail. I pay attention to electronic waste on my personal blog, but I think EcoLogo is also relevant here, as it is a program that provides data on green consumer product certification to Canadians using a rigorous review system. There is also a tinge of national pride when I read the following, even though I know that Canada as a green country is a myth. Nonetheless, EcoLogo was:
launched by the Canadian federal government in 1988, EcoLogo is North America’s oldest environmental standard and certification organization (and the second oldest in the world). It is the only North American standard accredited by the Global Ecolabeling Network as meeting the international ISO 14024 standard for Type I (third-party certified, multi-attribute) environmental labels.
Environment Canada has always been excellent at developing sustainability and other quality of life criteria and monitoring measures. It is one of those interesting departments that is both science and policy, and they stick to good science in their methods to communicate, evaluate and disseminate – budgets permitting of course!
EcoLogo certification criteria documents (CCDs) are developed in an open, public and transparent process, with a broad base of stakeholder participation including user groups (e.g. procurement associations, institutional purchasers and consumer protection organizations), product producers (e.g. industry members and associations), government / regulators, general science-based representatives (e.g. academics, life cycle experts and other scientists), environmental non-governmental organizations (ENGOs), and other environmental advocates. The criteria address multiple environmental attributes related to human health and environmental considerations throughout the life cycle of the product. Currently, there are 122 Certification Criteria Documents addressing over 250 product types.
You can look up just about anything and discover products in their impressive list. I like that there is a rigorous system in place that is about making informed choices. This is what data are for! They also have an excellent purchaser’s tool box organized by product, category or company.
Hmm! Wonder if we could ever develop criteria to evaluate organizations on their access, preservation and dissemination of data? What would be the key criteria in such an evaluation? Would an organization get a Free and Open Knowledge certificate (the acronym is terrible! we need Michael Lenczner’s help here!)? A CivicAccess gold, silver or bronze stamp of data democracy and liberation?
Yes, we map all 4,294,967,296 IP addresses onto a huge image and let you zoom into it and pan around. Just like google maps, but more internetty.
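For the curious, the arithmetic works out neatly: 4,294,967,296 is 2^32, which is exactly 65,536 × 65,536, so every IPv4 address can have its own pixel in a square image. The post does not say how the map lays addresses out, so the sketch below simply assumes a row-major layout (left to right, top to bottom); other layouts, such as a space-filling curve, are equally possible. The names `ip_to_pixel` and `pixel_to_ip` are made up for illustration.

```python
import ipaddress

# 65536 * 65536 == 2**32, one pixel per IPv4 address.
SIDE = 65536


def ip_to_pixel(ip):
    """Map a dotted-quad IPv4 address to (x, y) in a SIDE x SIDE image.

    Assumes a row-major layout, which is an illustrative choice and not
    necessarily the layout the actual map uses.
    """
    n = int(ipaddress.IPv4Address(ip))  # address as an integer in 0..2**32-1
    y, x = divmod(n, SIDE)              # row index, then column within the row
    return x, y


def pixel_to_ip(x, y):
    """Inverse of ip_to_pixel under the same row-major assumption."""
    return str(ipaddress.IPv4Address(y * SIDE + x))


if __name__ == "__main__":
    print(ip_to_pixel("1.2.3.4"))
    print(pixel_to_ip(772, 258))
```

Under this layout the first two octets of an address pick the row and the last two pick the column, so each /16 block occupies one full row of the image.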