If the 2015 SCAA Symposium had to be summarized in one word, that word would be “data.” We heard a lot of data and a lot about data, along with the need for data-derived decisions (alliteration win). I got the impression from presenters that we’ve been flying a little blind when it comes to decision-making. As an engineer and scientist, I was thrilled to hear that the specialty coffee world is moving toward more analytical methods of decision-making, but I would also venture to say that, as an engineer and scientist, I am intimately familiar with the dangers of data. So I offer a few thoughts to keep in mind when data is used as the major tool of persuasion and decision-making.
No doubt, by now, we’ve all heard that we live in the age of big data. It’s all around us. Google, Facebook, and other entities make fortunes on data. Information is the currency of our time. Yet although data is constantly used in rhetoric to convince us of something, data in and of itself has no meaning. Data is only as good as two things: the design of the experiment (or method of data collection) and the data interpreter.
Experiment design and method of data collection have a profound impact on the quality of the data used for decision-making. For instance, every five minutes we hear statistics spouted from the most recent survey on such and such, and how it emphatically proves whatnot beyond a shadow of a doubt. Well, until tomorrow, when the next survey proves the exact opposite. What’s happening here? Doesn’t the data prove one case over the other? More often than not, the issue stems from poor experiment (or survey) design, which results in biased data. Biased data normally arises when the method of collection doesn’t cast a wide enough net around the issue, and the results are skewed. For example, suppose an experiment surveyed an international, multi-generational, mixed-income set of participants and found that the University of Cincinnati was the most-chosen school for higher education. Sounds good, right? That is, until I tell you that the survey pool was my family: father from Zimbabwe, mother from the USA, different generations, and different income levels, but three of us went to UC for our bachelor’s degrees. Obviously, this is ridiculous, but you can easily see how data can be biased toward a particular result. Not all experiments are this way, but be aware and critical when reading the newest set of “objective” statistics. Excellent experiment design and unbiased data collection are critically important for data-derived decisions, especially when those decisions could affect large numbers of people.
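To make the sampling-bias point concrete, here is a toy sketch in Python (every name and number below is made up purely for illustration): a survey frame that only reaches one family crowns UC the “winner,” even though UC is a minority choice in the wider population.

```python
from collections import Counter

# Hypothetical population of 1,000 respondents: UC is one choice among many.
population = ["UC"] * 50 + ["OSU"] * 300 + ["Other"] * 650

# A "survey" whose frame is a single family (a biased net): 3 of 5 chose UC.
family_sample = ["UC", "UC", "UC", "Other", "Other"]

# The biased sample and the population disagree about the "most chosen" school.
print(Counter(population).most_common(1))     # [('Other', 650)]
print(Counter(family_sample).most_common(1))  # [('UC', 3)]
```

The arithmetic is fine in both cases; the error lives entirely in the frame the data was drawn from, which is why no amount of downstream analysis can fix a badly cast net.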
Even with impeccable experiment design and data collection that yields unbiased data, interpretation of that data can be problematic. I would like to draw your attention to one of the most common mistakes I come across as I read published journal papers, reports, and internet articles: the ecological fallacy. This fallacy is common in research and social/geographic analysis. One type of ecological fallacy occurs when we have data about a group but make broad inferences about individuals in that group. Be warned: once you understand this fallacy, you will see it everywhere! For example, my wife teaches two sections of senior English. On average, class 1 scores significantly higher than class 2. I run into a student from class 1 and say, “Here, edit my blog post, because you’re an English master!” False. They may be an English master, but they could also be the worst in the class. Just because the average of class 1 is higher than that of class 2 does not make each individual in class 1 a brilliant wordsmith. In fact, statistically, there will likely be several students in class 2 who are more capable English students than some in class 1: the ecological fallacy in full glory.
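The class example can be sketched numerically (the scores below are entirely hypothetical): class 1 has the higher mean, yet the particular student you run into could score below everyone in class 2.

```python
# Hypothetical scores: class 1 wins on average, but its weakest
# student trails every single student in class 2.
class1 = [95, 92, 88, 85, 60]
class2 = [90, 84, 80, 76, 70]

mean1 = sum(class1) / len(class1)  # 84.0
mean2 = sum(class2) / len(class2)  # 80.0

# Group-level claim: "class 1 is better" -- true on average.
print(mean1 > mean2)  # True

# Individual-level claim fails: the class 1 student I meet
# might be its weakest, who scores below all of class 2.
weakest_in_class1 = min(class1)                    # 60
print(all(s > weakest_in_class1 for s in class2))  # True
```

The group-level comparison is perfectly correct; the fallacy is extending it to any particular individual drawn from the group.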
So why bring this up? Because data and its use impact us all. Many high-level decisions in any organization are based on collected data. Bad data = bad decision. Good data + bad interpretation = bad decision. As the specialty coffee community moves forward into a new age of data-derived management, I hope we remember these two important facts about data, especially the ecological fallacy, since every country and coffee-growing region has its own culture, laws, and ways of life. Trust me, few could be more excited than I am that the SCAA is pushing for more data and research, but we need to be diligent in our approach to the collection and analysis of data. Let’s avoid the mo’ data, mo’ problems scenario.