Data stories are usually big stories – in terms of time, cost, and what well-crunched numbers have the potential to reveal. Because the stakes are high, it’s important that reporters acquire the best data tools and the knowledge to use them.
So we asked three experts to weigh in on the technology that data journalists currently use to find, process, and contextualize data. Our experts are:
- Derek Willis, a news-app developer for ProPublica and co-founder of the OpenElections Project
- John Wihbey, lecturer at Northeastern University and assistant director of Harvard University’s Shorenstein Center on Media, Politics, and Public Policy
- Alexander Howard, senior analyst at the Sunlight Foundation, an open-government advocacy group
Here’s what they said.
The spreadsheet is supreme
It may sound simple, but the spreadsheet is one of the most fundamental tools for crunching data. Experts use tools that are at the fingertips of any layperson: Microsoft Excel or Google Sheets. The latter allows users to import an Excel spreadsheet, collaborate remotely with colleagues, and create colorful infographics.
There are free tools out there to gather, clean, and organize data, but they vary in sophistication. OpenRefine is one useful tool for cleaning up “messy data,” such as inconsistent formatting, and turning it into a streamlined spreadsheet that is ready to analyze, says Wihbey of the Shorenstein Center.
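For readers curious what that kind of cleanup looks like in code, here is a minimal Python sketch. It is not how OpenRefine works internally, and the sample rows, the column names, and the list of date formats are all invented for illustration. It only shows the general idea of normalizing inconsistent spacing, capitalization, and date formats so that matching values end up identical.

```python
import re
from datetime import datetime

# Invented sample rows with the kind of inconsistencies a spreadsheet
# export often contains: stray spaces, mixed case, mixed date formats.
RAW_ROWS = [
    {"city": "  des moines ", "date": "01/04/2016"},
    {"city": "Des Moines", "date": "2016-01-05"},
    {"city": "DES  MOINES", "date": "Jan 6, 2016"},
]

# Date layouts we assume may appear; extend this tuple for real data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y")


def clean_text(value):
    """Collapse repeated whitespace, trim the ends, and use title case."""
    return re.sub(r"\s+", " ", value).strip().title()


def clean_date(value):
    """Return the date as YYYY-MM-DD, or the input unchanged if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return value  # leave unknown formats alone so a person can review them


cleaned = [
    {"city": clean_text(row["city"]), "date": clean_date(row["date"])}
    for row in RAW_ROWS
]

for row in cleaned:
    print(row["city"], row["date"])
```

Leaving unrecognized dates untouched, rather than guessing, keeps questionable values visible for a reporter to check by hand.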
For more pointers on the basics of finding and using data, experts recommend the Data Journalism Handbook.
Data visualization doesn’t have to be difficult
The Internet is rich with free applications for visualizing data. Some of these Web-based tools don’t require coding, either. CartoDB is a popular tool for creating interactive maps and other charts and doesn’t require sophisticated programming, says Wihbey. Other tools with a low barrier to entry in terms of coding skill are Infogr.am and Plot.ly.
Willis says Chart Chooser by Juicebox is a “great way to quickly visualize data,” as is Bl.ocks.org. SAP BusinessObjects Lumira also ranks high in ease of use, with point-and-click data manipulation and the ability to create engaging visualizations with any amount of data in real time.
Another tool for visualizing data is research, says Howard. “Data visualization is its own art and science. If you’re going to use charts, graphs, and diagrams, study the history of information presentation in the sciences,” he says, recommending Edward Tufte’s books on the subject as a good place to start.
Scientific tools take research to the next level
“The highest levels of data journalism blur into data science,” says Wihbey. Some of the tools used by the most skilled data journalists are also used in the tech industry and academia.
Python is a programming language that can be used to “scrape” data from the Internet, creating a data set that never existed before. The Shorenstein Center’s Journalist’s Resource website provides a “Web scraping” tutorial showing how Python was used to harvest Iowa’s Polk County inmate records and put them in a spreadsheet for analysis.
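To give a rough sense of the idea, here is a minimal sketch of the core step in scraping: pulling the rows out of an HTML table and writing them in spreadsheet-friendly CSV form. It is not the code from the Journalist’s Resource tutorial. The sample markup, the names in it, and the helper names TableParser and table_to_csv are invented for illustration, and the sketch uses only Python’s standard library. A real scraper would first download the page, for example with urllib.request.urlopen, instead of reading a built-in sample.

```python
import csv
import io
from html.parser import HTMLParser

# Invented sample markup standing in for a downloaded Web page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Booked</th><th>Charge</th></tr>
  <tr><td>Doe, Jane</td><td>2016-01-04</td><td>Trespass</td></tr>
  <tr><td>Roe, Richard</td><td>2016-01-05</td><td>Theft</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in the HTML as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The csv module adds quotation marks around values that contain commas, such as a name written last name first, so the columns still line up when the file is opened in Excel or Google Sheets.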
Beyond Python, the programming language R “is really the leading edge in journalism” for those who have a coding background, says Wihbey. It’s used by The New York Times Upshot blog and Nate Silver’s FiveThirtyEight.com for statistical analysis. But this advanced level of data analysis and presentation requires knowledge of Python, he adds.
Another emerging trend is the advent of data services that offer increasingly sophisticated insights, such as SAP Digital Consumer Insight, which provides information on consumer “mobile moments” in and around specific locations. This is a fast-growing trend, and many such offerings are likely to become available in the future.
This blog was written through a partnership with Thomson Reuters. To learn how SAP is helping it Run Live, click here.