One particular field of journalism is data journalism. Simon Rogers, Data Editor at Twitter, former editor of the Guardian’s award-winning Datablog and an instructor of the free online data-driven journalism course, describes data journalism as a way of telling stories with numbers. It brings stories that are in the public eye to life by showing the numbers behind the news. The data can be accompanied by visualizations, but these are only there in service of the story.
For as long as journalism has existed, the reporting of data has played a role in it. In the past, data was often collected with a notebook and a cassette recorder, and journalists often had to rely solely on the research and analysis performed by statisticians. Over the years the techniques of data journalism have changed. Journalists now have much easier access to tools that help them gather data, such as Excel and Numbers, and to tools that help them visualize it. In the digital age, open data has also become far more widespread. Governments and other organizations that collect statistics around the world are publishing thousands of databases online, which has made it both easier and harder for journalists to find the data they are looking for. The search itself has become easier, because journalists can now browse the Internet for the information they need. However, since so many datasets are now available to journalists and the public in general, it is also more difficult to find the ‘perfect’ dataset. By the ‘perfect’ dataset I mean a dataset that not only offers the data you are looking for to accompany your story, but that is also valid. This blog post will offer you, as journalists, guidelines on how to find this ‘perfect’ dataset yourselves.
How can you get data to support your story?
Quote: “Data journalism begins in one of two ways: either you have a question that needs data, or a dataset that needs questioning. Whichever it is, the compilation of data is what defines it as an act of data journalism”. – Paul Bradshaw
Paul Bradshaw is the Head of the Online Journalism MA at Birmingham City University, Visiting Professor at City University’s School of Journalism in London and also an instructor for the online data journalism course. His quote shows that a story can start either from a question for which you need to find data, or from a dataset that raises an interesting question of its own. This blog post focuses on the first situation: you have a certain topic in mind and are looking for data to accompany your story. The first thing to do is ask yourself ‘What kind of data am I looking for?’ Once you know what you are looking for, you can start searching for the data.
Where can you find data?
Of course you can collect your data by doing your own research, but in most cases you will probably not have the time or money to do that. A quicker way to gather data is to look for it online. As mentioned before, more and more organizations are publishing their data online. You can, for instance, go to a government website or the website of a national statistical service and find all sorts of data there. On this Wikipedia page you can find a list of national and international statistical services you could use to gather data. You can also look for information on the websites of international bodies, e.g. the World Health Organization, the United Nations, the World Bank or the European Union.
How do you know if your dataset is valid?
When you have found a dataset, you need to make sure that the data is valid and does indeed support your story. So how do you know if your data is trustworthy? Rogers states that when you are relying on data that was collected by someone else, you need to check who collected it, and when and how it was collected. Get in touch with the person who collected the data and ask them about it. In addition, try to find another source that has the same kind of data and compare that dataset with the one you found. These two steps are very important in determining whether your data is valid.
Take for instance this example described by TechTarget, which shows how the analysis in big data projects can go wrong. In this project researchers wanted to use Twitter feeds and other social media to predict the unemployment rate in the United States. They looked for words that pertained to unemployment, e.g. jobs, unemployment and classifieds, in tweets and posts on other social media. They then looked for correlations between the number of these words per month and the unemployment rate of that month. During the project there was a sudden increase in the word count, so the researchers believed they were on to something. What they failed to notice was that Steve Jobs died in the same period in which they found the increase. The number of tweets containing ‘jobs’ was of course higher, but the rise had nothing to do with unemployment. If the researchers had looked more closely at what was happening during the time of their research, they would have known that the increase was unrelated to the unemployment rate.
So it is important for you, as a journalist, to be aware that not all research is accurate and trustworthy. If you find another dataset that says the same thing, the chances that you have found a good, trustworthy dataset are higher. Furthermore, you need to be aware of how you interpret the data. Most mistakes in data analysis come from interpreting the data wrongly. Look carefully at what the data is actually saying and not just at what you want or believe it is saying.
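To make the Steve Jobs pitfall concrete, here is a minimal sketch in Python of how a naive keyword count can be inflated by an unrelated event. The sample posts, the month labels and the names `monthly_counts`, `KEYWORDS` and `FALSE_FRIEND` are all invented for illustration; this is not the method the researchers in the example used, only a toy version of the same idea.

```python
import re
from collections import Counter

# Hypothetical sample posts, invented purely for illustration.
posts = [
    ("month_1", "Checking the classifieds again this morning"),
    ("month_1", "Unemployment numbers are out today"),
    ("month_2", "RIP Steve Jobs, thank you for everything"),
    ("month_2", "Steve Jobs changed how we use phones"),
    ("month_2", "Still no jobs in my town"),
]

# The three example keywords mentioned in the text.
KEYWORDS = re.compile(r"\b(?:jobs|unemployment|classifieds)\b", re.IGNORECASE)
# A phrase that contains a keyword but has nothing to do with unemployment.
FALSE_FRIEND = re.compile(r"\bsteve jobs\b", re.IGNORECASE)


def monthly_counts(posts, exclude=None):
    """Count keyword hits per month, optionally removing a misleading phrase first."""
    counts = Counter()
    for month, text in posts:
        if exclude is not None:
            text = exclude.sub("", text)
        counts[month] += len(KEYWORDS.findall(text))
    return dict(counts)


naive = monthly_counts(posts)
filtered = monthly_counts(posts, exclude=FALSE_FRIEND)
print("naive:   ", naive)
print("filtered:", filtered)
```

In this toy data the naive count rises in the second month, while the filtered count falls. A filter like this only helps once you already know what to exclude, which is why reading a sample of the underlying posts remains the more important check.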
Keep the five W’s in mind
The most important things to remember when you are trying to see if your dataset is valid are the five W’s, as described by Simon Rogers. Ask yourself these questions before you use the dataset you found.
- Who: Where did the data come from?
- What: What are you trying to say with your data?
- When: How old is your data?
- Where: Which situation is described by the collected data? An essential part of data journalism is combining different datasets to create a new story. Simon Rogers has, for instance, combined data on gun ownership and homicides around the world into a single supporting visual.
- Why: Why is the data you found interesting and what does the data mean?
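The earlier advice to compare your dataset with a second source can be partly automated. The sketch below, in Python, flags values that disagree between two sources and entries that appear in only one of them. The function name `compare_sources`, the 5% tolerance and the figures in `source_a` and `source_b` are assumptions made up for this example, not real statistics.

```python
def compare_sources(primary, secondary, tolerance=0.05):
    """Report keys whose values differ by more than `tolerance` (relative),
    and keys that are present in only one of the two sources."""
    report = {"mismatch": [], "missing": []}
    for key in sorted(set(primary) | set(secondary)):
        if key not in primary or key not in secondary:
            report["missing"].append(key)
            continue
        a, b = primary[key], secondary[key]
        baseline = max(abs(a), abs(b))
        # Skip the division when both values are zero.
        if baseline and abs(a - b) / baseline > tolerance:
            report["mismatch"].append(key)
    return report


# Hypothetical yearly figures from two sources, invented for illustration.
source_a = {"2019": 5.1, "2020": 8.3, "2021": 6.0}
source_b = {"2019": 5.0, "2020": 6.9, "2022": 4.2}

print(compare_sources(source_a, source_b))
```

A mismatch does not tell you which source is wrong. It tells you where to start asking questions about who collected the data, when, and how.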
In conclusion, the ‘perfect’ dataset offers the data you are looking for to accompany your story and is also valid. This blog post has shown you how to find such a dataset and how to determine whether it is valid. To summarize, you need to check who collected the data you found, and when and how it was collected. Get in touch with that person and ask them questions about their data. When you have found a dataset that could support your story, be aware that not all data is accurate and trustworthy. Try to look for another source with the same kind of data. The chances that your dataset is trustworthy are higher when you have another source that says the same thing. If you want your data to be valid, always keep the five W’s in mind. They offer guidelines that can help you determine whether the data can be trusted.