datasets

Casualty Count

Coalition Casualty Count is a site managed by independent US citizens who analytically count the coalition casualties for Operation Iraqi Freedom and Operation Enduring Freedom [Afghanistan]. We attempt to be up to date, precise, accurate and reliable.

There are many other sites on the web that list information of Fatalities from Iraq, but few if any of them do this in an analytical fashion. We endeavor to provide not just a list of names but a resource detailing when, where and how fatalities occurred.

You can read their methodology here. I am always happy when I get to see the data and read how they were assembled, because this gives me the means to critically assess what is being presented to me. I love the myriad visualization tools that are emerging on the net; however, I wish they were accompanied by metadata to help me better understand what is being said and decide whether or not I trust it.

There are a lot of data points and even some maps on this site, and these folks deserve to be commended for doing this work and telling this important story. There is also a list of the Canadian men and women who were casualties in Afghanistan.

via: Spatial Sustain

banks & dataviz

It’s amazing the flowering of data visualization projects – and how well they sometimes bring to life abstract issues.

Here is a beautiful little project, which helps you understand the scale of the financial woes brought on by the subprime mortgage troubles in the US. It’s a complex problem with all sorts of reasons and ramifications, but the simplest explanation is this: in the past decade, banks have been falling over themselves to give out loans to really, really bad credit risks. This means that lots of money that’s gone out in loans isn’t coming back. Which means banks are going to start to fail.

You can see this by asking: how many loan repayments are more than 90 days late? And you could split that out among various banks, and track it over the period from 2002-2007, and see not just how many, but the value of those overdue payments. And if you did that, you’d get this:

[Image: bank mortgage]

If you made that graph into a little movie over time, you’d be in good shape. Which is what the blog and still i persist has done.
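The tabulation described in this post (count the loans more than 90 days past due, split by bank and by year, and total their value) can be sketched roughly as follows. This is only an illustration: the record layout, the `overdue_summary` helper and the sample rows are hypothetical placeholders, not the data or the method behind the graph.

```python
from collections import defaultdict

# Hypothetical loan records: (bank, year, days_past_due, outstanding_balance).
# These rows are placeholders for illustration only, not real figures.
LOANS = [
    ("Bank A", 2006, 120, 250_000.0),
    ("Bank A", 2006, 30, 180_000.0),
    ("Bank A", 2007, 95, 310_000.0),
    ("Bank B", 2006, 200, 90_000.0),
    ("Bank B", 2007, 91, 120_000.0),
    ("Bank B", 2007, 400, 75_000.0),
]


def overdue_summary(loans, threshold_days=90):
    """Return {(bank, year): (count, total_value)} for loans past the threshold."""
    counts = defaultdict(int)
    values = defaultdict(float)
    for bank, year, days_late, balance in loans:
        # Only loans strictly more than `threshold_days` late are counted.
        if days_late > threshold_days:
            counts[(bank, year)] += 1
            values[(bank, year)] += balance
    return {key: (counts[key], values[key]) for key in sorted(counts)}


if __name__ == "__main__":
    for (bank, year), (count, value) in overdue_summary(LOANS).items():
        print(f"{bank} {year}: {count} loans, ${value:,.0f} overdue")
```

Plotting one frame per year from a summary like this is what would turn the static graph into the "little movie" mentioned above.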

PS time to dump your shares of Wells Fargo, I’d say.

[thanks, as always, to infosthetics]

world clock

the world clock.

UNdata

Check out the new UNdata – United Nations Data Access System (UNdata)

The new UN data access system (UNdata) will improve the dissemination of statistics by United Nations Statistics Division (UNSD) to the widest possible audience. An easy to use data access system was developed that meets UNSD’s vision of providing an integrated information resource with current, relevant and reliable statistics free of charge to the global community.

Subsequent stages of the development of the UN data access system will extend to UN system data as well as to data of national statistical offices – providing the user with a simple single-entry point to global statistics.

Imagine if we could do that in Canada!

I have a thing about cars, idling and air quality, and I really appreciate it when people develop interesting visualizations & sonifications that make car population issues tangible by using metaphors that make the data meaningful. While this is a human resource intensive and expensive visualization project, it could not have been done without access to some free data, in this case from Madrid Movilidad. I would have liked a bit more metadata and methodological explanation to accompany the visualizations though! Nonetheless, this project reinforces the argument that experimentation and innovation come with free data!

Cascade on Wheels is a visualization project that intends to express the quantity of cars we live with in big cities nowadays. The data set we worked on is the daily average of cars passing by streets, over a year. In this case, a section of the Madrid city center, during 2006. The averages are grouped into four categories of car types: light vehicles, taxis, trucks, and buses.

We made two different visualizations of the same data set. We intended not just to visualize the data in a readable way, but also to express its meaning, with the use of metaphors. In the Walls Map piece, car counts are represented by 3D vertical columns emerging from the street map, like walls. The Traffic Mixer piece, where noise is the metaphor, is a hybrid of a visualization and a sound toy. The first piece focuses more on showing the data in a readable and functional way, while the latter focuses more on expressing the meaning of the data and immersing the user into these numbers. Both pieces try to complete each other.

Check out their videos!

Well, the folks (Matt Ball and Jeff Thurston) over at Spatial Sustain, a Vector 1 Media blog, have a great article on exactly that topic here. The article discusses free data as a platform for economic expansion, how free geospatial data weighed against its cost represents a return on investment, and how industries in the US were created on the basis of free government data.

Free federal data spurred free market competition. If the data were locked up to begin with, the market would never have taken off. There wouldn’t be the level of investment in technology, and we’d be much poorer in terms of both economic benefit and our knowledge of our world.

A few years back, Gabe Sawhney and I co-prepared a presentation entitled CivicAccess.ca: Democracy in an information age and the need for free and open civic data, which Gabe gave at Geotec, an event organized by Matt. It is nice to see Matt doing some new stuff.

dbpedia

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.
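To give a feel for what such a query might look like, here is a minimal sketch that builds a request URL for DBpedia's public SPARQL endpoint. The query itself (the most populous Canadian cities) is just an illustration; the `dbo:City`, `dbo:country` and `dbo:populationTotal` terms and the `format` parameter are assumptions about how the data and endpoint are currently set up, and `build_query_url` is a hypothetical helper. Nothing is sent over the network here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public DBpedia SPARQL endpoint (assumed to be reachable at this address).
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: the ten most populous cities in Canada.
# The class and property names are assumptions about the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Canada ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL that submits `query` to a SPARQL endpoint."""
    params = {
        "query": query.strip(),
        # Ask for JSON results; this parameter is an endpoint-specific assumption.
        "format": "application/sparql-results+json",
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(QUERY))
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, should return the result set if the endpoint and property names are as assumed.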

Dreamy Travel-Time Maps

MySociety has released some very useful and sexy interactive travel-time maps for the UK using public data.

From Wired:

Sources at Google have disclosed that the humble domain, http://research.google.com, will soon provide a home for terabytes of open-source scientific datasets. The storage will be free to scientists and access to the data will be free for all. The project, known as Palimpsest and first previewed to the scientific community at the Science Foo camp at the Googleplex last August, missed its original launch date this week, but will debut soon.

Building on the company’s acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.

[Via Open Access News]

The Federation of Canadian Municipalities has just released its Quality of Life Reporting System (Press Release) – Trends & Issues in Affordable Housing and Homelessness (Report in pdf).  If you go to the end of the report & this post you will find the data sources required to write this important report on the situation of housing & homelessness in Canadian Cities.

NOTE – these public datasets were purchased to do this analysis. It costs many thousands of dollars to acquire these public data, which are used to inform citizens on a most fundamental issue: shelter for Canadians. Statistics Canada does not generally do city-scale analysis as it is a federal agency, and the provinces will generally not do comparative analysis of cities beyond their own borders. This type of cross-country and cross-city analysis requires a not-for-profit organization or the private sector to do the work. We are very fortunate that the FCM has the wherewithal to prepare these reports on an ongoing basis. This is an expensive proposition, not only because subject matter specialists are required and much city consultation is necessary for contextual and validation reasons, but also because the datasets themselves are extremely costly. There is no real justification for these prices beyond cost recovery policies. Statistics Canada and CMHC are the worst in this regard. The other datasets used are not readily accessible to most, while the contextual data require specially designed surveys.

The documents referred to in this report were, however, freely available but not readily findable/discoverable, as there is no central repository or portal where authors can register/catalogue their reports. This is unfortunate, as it takes a substantial amount of effort to dig up specialized material from each city, province, federal department and NGO.

Public (but not free) Datasets Used in the report:

  • Statistics Canada – Population Census, Population and Dwelling Counts, Age and Sex, Families and Households, and Housing and Shelter Costs; Tax Filer Statistics for economic and some population data; and Migration Estimates.
  • Canada Mortgage and Housing Corporation – Customized data on the cost of housing, the availability of housing and vacancy rates, and the housing Starts and Completions Survey.
  • Citizenship and Immigration Canada – volume of immigration, demographics of immigrants and destinations in Canadian cities.
  • Human Resources and Social Development Canada – Minimum wage database, Homeless Individuals and Families Information System (HIFIS).
  • Homeless and Social Housing Data were derived from 22 FCM QOLRS participating municipalities.

Report and Press Release Via Ted on the Social Planning Network of Ontario Mailing List.
