By Jeff Melton, SVP, Global Technology & Platforms, MSLGROUP
The relentless advance of technology has exponentially increased the volume and variety of structured and unstructured data, which has in turn increased our ability to quantify the world around us. Just four years ago, technology’s creation of data reached a tipping point that led to the popular adoption of a previously obscure phrase – Big Data.
According to Google Trends, search interest in the term ‘Big Data’ started to rise in 2011, though The New York Times cites 2012 as “the breakout year for Big Data as an idea, in the marketplace, and as a term.” The same New York Times article credits John Mashey, chief scientist at Silicon Graphics in the 1990s, as the originator of the term. Mr. Mashey stated, “I was using one label for a range of issues, and I wanted the simplest, shortest phrase to convey that the boundaries of computing keep advancing.”
Seek the right data, not more data
As Big Data has grown, so has the perceived potential for insight generation. But with potential comes complexity, and so we must not forget the research and statistical principles so important to insight generation. When gazing at the vast opportunity that Big Data presents, we need to remember to seek the right data, not more data.
When analyzing data via a statistical model, the main goal is to obtain an estimate of an unknown quantity. We take a sample of observations, investigate their behavior, and generalize in order to understand the wider population. Throughout this endeavor, precision and accuracy are of utmost importance; the sheer volume of data at hand addresses only half of this dichotomy.
Within an experimental design, replication is necessary to establish a finding. We want evidence that points towards an associative relationship: when X happens, Y is likely to follow. In this light, precision is a measure of how often we obtain a similar result upon replication. Precision can be refined by gathering more data – unfortunately, the accuracy of our results depends not so much on how much data we have as on the kind of data we’re working with.
Simply gathering copious amounts of data can actually amplify flaws in statistical methodology and potentially induce unwanted biases rather than refine results.
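The point can be illustrated with a minimal simulation (hypothetical numbers, Python standard library only): if the data source carries a systematic bias, a larger sample makes the estimate more precise – the standard error shrinks – but no more accurate, because the estimate converges to the biased value rather than the truth.

```python
import random
import statistics

random.seed(42)

TRUE_MEAN = 0.0   # the population quantity we actually want to estimate
BIAS = 0.5        # hypothetical systematic bias in how the data is collected

def biased_sample(n):
    """Draw n observations from a flawed source: every reading is shifted
    by BIAS, so the problem lies in the kind of data, not the amount."""
    return [random.gauss(TRUE_MEAN + BIAS, 1.0) for _ in range(n)]

for n in (100, 10_000, 1_000_000):
    sample = biased_sample(n)
    estimate = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / n ** 0.5  # shrinks as n grows
    print(f"n={n:>9,}  estimate={estimate:+.3f}  std. error={stderr:.4f}")
```

However large n becomes, the estimate hovers near 0.5 instead of the true value 0.0 – more of the wrong data only makes us more confidently wrong.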
The dichotomy between precision & accuracy
Consider the analogy of throwing darts at a target: our goal is to hit the very center every time.
- A precise individual will be able to consistently hit approximately the same spot on the target over the course of many throws – but this location may not be near the center.
- On the other hand, an accurate individual will, on average, home in on the center of the target, yet their darts may land with high variability.
- The best player is one who possesses both the ability to replicate their actions and hit the center of the target at a whim (i.e., both precision and accuracy).
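The dart analogy can be sketched as a small simulation (hypothetical parameters, Python standard library only): `offset` models inaccuracy, a systematic pull away from the bullseye, while `spread` models imprecision, throw-to-throw scatter. Only the player with both near zero scores well on both measures.

```python
import random
import statistics

random.seed(7)

def throws(n, offset, spread):
    """Simulate n dart throws: `offset` shifts every throw away from the
    bullseye (inaccuracy); `spread` is throw-to-throw variability (imprecision)."""
    return [(random.gauss(offset, spread), random.gauss(0.0, spread))
            for _ in range(n)]

def summarize(darts):
    xs, ys = zip(*darts)
    cx, cy = statistics.fmean(xs), statistics.fmean(ys)
    off_center = (cx ** 2 + cy ** 2) ** 0.5  # centroid distance from bullseye
    scatter = statistics.fmean(              # average distance from own centroid
        ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in darts)
    return off_center, scatter

players = {
    "precise, not accurate": throws(500, offset=2.0, spread=0.1),
    "accurate, not precise": throws(500, offset=0.0, spread=2.0),
    "precise and accurate":  throws(500, offset=0.0, spread=0.1),
}
for name, darts in players.items():
    off_center, scatter = summarize(darts)
    print(f"{name:<22} off-center={off_center:.2f}  scatter={scatter:.2f}")
```

The first player's cluster is tight but misplaced, the second's is centered but diffuse; only the third combines a small off-center distance with a small scatter.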
Achieving a balance in data collection
Among the sea of Big Data, blindly riding the torrents can lead to shipwreck; we have to be selective and mindful of the waves that will carry us safely to shore. A polished body of data is worth far more than sheer magnitude; thus, data collection must strike a balance between sufficient volume and targeted data sets.
The solution is a clear measurement plan complete with a clear data plan. Both plans assist in the forming of a clear hypothesis that will allow for the efficient collection, analysis, management, and visualization of the right data. Finding the right data can be a daunting task that is only going to grow as data gets bigger.
From managing to accessing: A new approach to data
There is a market opportunity for technology and/or services that assist in the understanding of existing – but more importantly – new data sources: a data match-making service.
Creating an easy and intuitive service – whether automated technology or consulting – that matches business challenges, hypotheses, or generic questions with the right technologies, methodologies, and data sources might become the critical missing piece in the growth of bigger and bigger data.
The future won’t be about managing Big Data but rather about understanding it as a growing library of opportunity. The entities that best understand Big Data sources, technologies, and methodologies in order to uncover the right data will advance most efficiently – and potentially most profitably.
This post is part of our People’s Insights report Data In. Data Out. Transforming Big Data into Smart Ideas.