The Art Of The Database: How To Mine For Data For Your Reporting

Reuters Content Solutions

There is a wealth of data out there that can enrich journalists’ stories, but it can be hidden treasure – and unearthing it may take longer than the typical news cycle allows.

During this volatile and fast-moving presidential election, the media is under intense pressure to break news quickly, and finding data is crucial for analyzing public opinion, revealing special interests, and tracking campaign finance.

The process of hunting down significant data can be byzantine, and analyzing it can be daunting. Freedom of Information Act roadblocks are rife, but fortunately, data stories can be told without the dreaded public-records request process. Here is a quick guide for accessing and using data.

Be a reporter first

Start by researching agencies that might have the databases you’re looking for and calling officials there to minimize a potentially frustrating records-request process, experts say. “One of the hardest things for journalists is to figure out who has what,” says John Wihbey, an assistant professor of journalism at Northeastern University.

Every agency on the federal and state level employs someone to help people get access to information. Developing a relationship with that gatekeeper is a good place to start.

Sometimes a little sweetness goes a long way, says Derek Willis, a news-app developer for ProPublica who has contacted numerous county, state, and federal agencies for his Open Elections project. “If it’s really local, like it’s your county, bringing donuts is never a bad idea,” he says. “If I’m asking for, like, 10 years of elections results, then that goes above and beyond what they normally do.”

FOIA frustrations

The Obama administration has been one of the least open in history. The federal government has a backlog of more than 100,000 Freedom of Information Act (FOIA) requests, resulting in wait times of months or years. President Obama signed a law last month that strengthens FOIA by requiring agencies to err on the side of disclosure. But the law does nothing to diminish the backlog.

“The backlogs are so great that it makes FOIA a really, really difficult tool to make it possible for any contemporaneous reporting,” says Adam Marshall, an attorney with the nonprofit Reporters Committee for Freedom of the Press.

Only about 33% of FOIA requests were granted in full in 2015. About 60% were partially granted and 7% were rejected outright.

Exemptions vary by state, but they allow agencies to deny requests that might violate personal privacy, compromise law-enforcement efforts, threaten national security, reveal commercially valuable information, or break laws surrounding confidential communication between attorneys and clients.

The Center for Public Integrity put together a scorecard that ranks states on transparency and public corruption, using each state’s laws on access to public records as one factor. It found that only three states (Alaska, California, and Connecticut) scored higher than a “D.”

Open data as alternative to FOIA

Thanks to advocacy groups and the open-data movement, there are troves of data floating around the Internet.

Sites like and are good places to get acquainted with the kinds of information that’s online for the public. Harvard University’s Shorenstein Center on Media, Politics, and Public Policy put together a digest of useful data sets, including a list of administrative databases from federal agencies.

The administrative databases include data sets like demographic information on food stamp recipients from the Department of Agriculture and civil rights data for public schools from the Department of Education. Other open data sites collect niche data sets, like FDAzilla, which sells data on restaurant inspections to business owners but is also accessible to journalists.

City governments, like New York City, and organizations like the World Bank are examples of institutions with open-data initiatives and online archives. ProPublica makes its data available at a cost through its data store. Some private companies, such as the home-sharing startup Airbnb, post public data about their business.

Universities also sell data. T. Christian Miller, a senior investigative reporter with ProPublica and a board member of Investigative Reporters and Editors (IRE), says he often bypasses law enforcement agencies and goes straight to the University of Michigan’s National Archive of Criminal Justice Data.

“They keep it up to date, they massage it, and they may share it with you if you can make a good case argument for why you need it,” says Miller.

Public voter registration files are one way to find an election story, says Alex Howard, a senior analyst at the Sunlight Foundation and a former fellow at the Tow-Knight Center for Entrepreneurial Journalism.

Howard says that analyzing registration data could offer a glimpse into “whether it’s difficult to register to vote” because of state voter ID laws. Willis of ProPublica says he has analyzed voter registration data to debunk claims that candidates were bringing new voters to primaries.

“This is an answerable question, thanks to data,” says Willis.
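For readers who want to try this kind of check themselves, here is a minimal Python sketch of the idea, not Willis's actual method. It assumes a voter file exported as CSV with hypothetical columns named registration_date and voted_in_primary; real voter files vary by state and will need their own column names and cleanup. The sample rows are made up for illustration.

```python
import csv
import io
from datetime import date

# Made-up sample rows; a real voter file will have different columns and far more records.
SAMPLE_VOTER_FILE = """voter_id,registration_date,voted_in_primary
1,2008-09-14,yes
2,2016-01-20,yes
3,2015-11-02,no
4,2012-03-30,yes
5,2016-02-11,yes
6,1999-06-01,no
"""


def share_of_new_primary_voters(voter_csv, cutoff):
    """Return the fraction of primary voters who registered on or after cutoff."""
    primary_voters = 0
    newly_registered = 0
    for row in csv.DictReader(io.StringIO(voter_csv)):
        # Only people who actually voted in the primary count toward the total.
        if row["voted_in_primary"].strip().lower() != "yes":
            continue
        primary_voters += 1
        if date.fromisoformat(row["registration_date"]) >= cutoff:
            newly_registered += 1
    # Avoid dividing by zero when the file has no primary voters.
    return newly_registered / primary_voters if primary_voters else 0.0


share = share_of_new_primary_voters(SAMPLE_VOTER_FILE, date(2015, 7, 1))
print(f"{share:.0%} of primary voters registered on or after the cutoff")
```

The cutoff date is a judgment call: what counts as a "new" voter depends on the claim being checked, so it is worth stating the cutoff in the story.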

Stories on campaign finance can use databases from the Center for Responsive Politics and the National Institute for Money in State Politics.

Open data resources can be a good source of story ideas, but they are unlikely to lead to the kind of groundbreaking investigative stories that make a difference, warns Miller.

“I’m very conflicted,” says Miller. “I think open data as a theory is awesome. In practice, it’s what the government wants you to see … restaurant inspections are very important, but what they’re not going to let you see is the lobbying deals. I have yet to see a story done from open data that’s particularly revealing.”

Putting the data to work

Once you’ve tracked down a data set, the next challenge is to analyze it. “Data is only interesting if it tells us something new,” says Wihbey.

There are numerous educational resources for data reporting, ranging from the Shorenstein Center’s tip sheet on basic math for journalists to lessons on using spreadsheets and learning to code.

“Increasingly, if you at least don’t have a base of technical skills, the kind of things you can do is narrowed,” says Willis, adding that he advises journalists to learn how to convert PDFs into spreadsheets if they will be working with records.
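As a rough illustration of the PDF-to-spreadsheet step, here is a minimal Python sketch. It assumes the text has already been pulled out of the PDF with an extraction tool of your choice and that columns are separated by two or more spaces; both are assumptions, and real records are often messier. The county and precinct figures are made up.

```python
import csv
import io
import re

# Made-up text standing in for what a PDF-to-text tool might produce.
EXTRACTED_TEXT = """County        Precinct   Votes Cast
Adams         1A         1,204
Adams         1B         987
Baker         2A         2,310
"""


def text_table_to_csv(text):
    """Split whitespace-aligned rows into columns and return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Two or more spaces mark a column break, so "Votes Cast" stays one cell.
        writer.writerow(re.split(r"\s{2,}", line.strip()))
    return out.getvalue()


csv_text = text_table_to_csv(EXTRACTED_TEXT)
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened in any spreadsheet program. Always spot-check a few rows against the original PDF, since misaligned columns are a common source of errors.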

IRE’s National Institute on Computer-Assisted Reporting (NICAR) organizes seminars, workshops, and an annual conference for journalists who want to brush up on their data-reporting skills. Local journalism programs often offer seminars on digital journalism, too.

Northeastern University’s online StoryBench project has tutorials on digital reporting, including how to manage data and build graphs and charts.

Technical implementation aside, experts say that data should just be one part of the reporting process, not a substitute for human voices. “You should also do some shoe-leather reporting, talk to people who are involved,” says Wihbey.

“The cliché is putting a face on the data,” he adds. “It’s also to make sure the numbers reflect what’s going on. Don’t just rely on the numbers: talk to humans, who are stakeholders in the actual data.”

For more on this topic, read the Data Journalism Addendum: Three Experts Pick The Best Tools For Data Journalists

This blog was written through a partnership with Thomson Reuters.


About Reuters Content Solutions

Reuters Solutions develops tailored, multimedia content with a journalistic approach. Reuters Solutions operates independently from Reuters editorial.