In 1985 I started my “data scientist” career as the head of the DataBank at the Center for Urban Economic Development at UIC-Chicago. Ahh yes, these were the days when universities were investing in mainframe computers, Home Mortgage Disclosure Act data were becoming available, and the phrase “data-driven decision making” was entering our lexicon. Heady days. My first action item was to visit local community and public-sector agencies to find out what data they wanted and needed to run their programs. The primary ask was for city- or neighborhood-level data on health, employment and housing. Could the DataBank somehow find these data?
I’m now head of the Metro Chicago Information Center and am still a part of the civic data movement, now in the era of Big Data and Open Government. MCIC is running the Apps Competition for Metro Chicago, Illinois (www.appsformetrochicago.org) – an unprecedented government partnership that deliberately reaches out to developers and community organizations to learn what each needs and to connect them with one another. By design, MCIC is collecting data desires from coders and community groups – what do you wish you had?
The funny thing is that these folks want the same stuff the civic groups wanted in 1985. As I look at the data wish list compiled by our outreach folks at MCIC, the similarities to 1985 are striking: neighborhood housing data, local health indicators, more data on city businesses. The difference is that now there is a false belief that these data exist somewhere, if only they could be delivered to potential users in a structured and consumable manner. Unlock the data!
But are there really more civic data to be easily accessed in the Big Data era? To a certain extent, yes: technology opens new data possibilities. But I believe we’ve also been lulled into a false sense of abundance. The reality is that timely data are not as available as one might think; that the amount of data varies widely by subject, by governmental source and by geography; and that most data are not easily mash-able/app-able for quick digestion. Data emerge through a complex socio-operational context of which technological change is only a part. Most important, the operational needs of cities lie at the core of any determination of civic data availability. City governments collect data that help them operate cities.
Think about it. Data on housing vacancies aren’t available from city governments because homeowners have to file a certificate of “occupancy” but not a certificate of “non-occupancy.” Public health incident data aren’t available because myriad health facilities are run by nonprofits, federal agencies, and city health departments, and they very rarely share data. We don’t know the characteristics of businesses in cities because local governments generally don’t collect the kind of information we want. Yes, they do inspections and licensing as part of operations, but that doesn’t give us the number of employees, or sales history, or lines of business. The (federal) Bureau of Labor Statistics DOES collect this information, but does NOT publish economic data for cities. Maybe that’s what we should be advocating for.
Coupling the energy of amazing civic coders with the multifaceted knowledge of data geeks is the best way to bring about real data liberation.