The Pumpkin Spice School of Big Data

Source: Pumpkin Spice Trident Layers Gum by Mike Mozart

In our particular pocket of New England, the leaves are turning golden, and football is replacing baseball on the TVs. For coffee drinkers, this means one thing: the return of the Pumpkin Spice Latte at Starbucks. Over the past ten years, this drink has gone from an odd cult favorite to a phenomenon so large that it has earned its own hashtag on Twitter: #PSL.

At the same time, one has to wonder, “What is Pumpkin Spice?” (Other than possibly the long-lost American cousin of the Spice Girls?) Pumpkin spice doesn’t actually contain pumpkin. And it’s far from the spiciest flavor out there. Still, the phrase “pumpkin spice” evokes something handmade, traditional, and uniquely American, and that image is what makes people want to consume it. Given its complete lack of pumpkin and relative lack of spice, the flavor itself is almost secondary to the cultish conceit that has been built around “Pumpkin Spice.”

Unfortunately, the hype and ubiquity of Pumpkin Spice are matched in the enterprise world by the most overhyped phrase in tech: Big Data. Like Pumpkin Spice, everybody wants Big Data, everybody wants to invest in Big Data tools, and everybody thinks that we are living in a season or era of Big Data. In the past, we’ve explained why we reluctantly think the term “Big Data” is still necessary. But when you go behind the curtain and try to figure out what Big Data actually is, what do you find?

For one thing, “Big Data” often isn’t that big. Although we talk about petabytes of data, some practitioners describe “Big Data” problems that involve only hundreds of megabytes. These are still sizable datasets, but they are manageable with traditional analytics tools.

And even when Big Data is “big,” the term is still very relative. For instance, even when an organization collects terabytes of data, text, and binaries, that data is rarely analyzed on a daily basis. In fact, we still lack the sentiment analysis, video analysis, and audio analysis capabilities needed to analyze large amounts of data quickly. And we know that data volumes are about to grow by at least one order of magnitude, if not two, as the Internet of Things and its accompanying billions of sensors start to embed themselves into our planet.

Even outside of the Internet of Things, the entire biological ecosystem represents yet another large source of data that we are just starting to tap. We are nowhere close to understanding what happens in each of our organs, much less in each cell of our bodies. Capturing this level of detail for any life form would add further orders of magnitude of data.

And then there’s an even higher level of truly Big Data: tracking matter, molecules, and atomic behavior on a broad scale to truly understand the nature of chemical reactions and mechanical physics. Compared to all of this, we are just starting to collect data on Planet Earth. And yet we call it Big Data.

So, our “Big Data” isn’t big compared to the amount of data that actually exists on Earth. And the types of data we collect are still very limited, since they almost always come from electronic sources and often lack the level of detail needed to faithfully recreate the environment and context of the transaction in question. And yet we are already calling it Big Data and setting ourselves up to start talking about “Bigger Data,” “Enormous Data,” and “Insanely Large Data.”

To get past the hype, we should start thinking about Big Data in terms of the scope of what is actually being collected and supported. There is nothing wrong with talking about the scale of “log management data” or “sensor data” or “video data” or “DNA genome data.” Those of us who live in each of these worlds know that log management is measured in terabytes per day, or that the human genome has 3 billion base pairs and approximately 3 million single-nucleotide polymorphisms (SNPs). Using measures like these, we start talking about meaningful quantities of data again, rather than simply defaulting to the overused Big Data term.

I will say that there is one big difference between Pumpkin Spice season and Big Data season. I can count on Pumpkin Spice season ending around the end of the year. The imprecise cult of Big Data, however, seems far from over; the community of tech thought leaders keeps pushing more and more use cases into Big Data, rather than providing clarity on what is actually “Big,” what actually constitutes “Data,” and how to use these tools correctly in the Era of Big Data.

In this light, Blue Hill Research promises to keep the usage of the phrase “Big Data” to a minimum. We believe there are more valuable ways to talk about data, such as:

- Our primary research in log and machine data management
- Our scheduled research in self-service topics including data quality, business intelligence, predictive analytics, and enterprise performance management
- Tracking the $3 billion spent on analytics over the past five years
- Cognitive and neuroinspired computing

By focusing on the actual data topics that provide financial, operational, and line-of-business value, Blue Hill will do its best to keep Big Data season from dragging on.

