What is Big Data?

Big Data: what is it?

What is “big data”? Good question. The name suggests a large pile of something, collected and organized by a company: numbers, autocorrect mistakes, search queries, anything.

More practically, Big Data is often the tagline for aggregative software services that do things like predict fluctuations in airline ticket prices, or track video-viewing habits on Netflix. Such a service collects and stores all of this data for later retrieval, and then uses it to try to predict outcomes. Accordingly, phenomena like House of Cards (based on painstaking research of Netflix habits), the 2008 Wall Street meltdown, and the installation of (mostly unmonitored) video cameras in seemingly every last corner of Chicago are good examples of Big Data at work. What’s so great about any of that? To be fair, Google and especially Facebook can be regarded as leading Big Data collectors, too, but in both cases, the benefits they’ve provided are often matched by the privacy infringements, security concerns, and general Internet fatigue that both of those “free” services can cause.

The next time that some TED speaker, Amazon-bestselling author, or columnist tells you that we are living in a uniquely disruptive and transformative era and that (this time, anyway) Big Data is the reason why, you should be skeptical. Big Data, as understood in the tech media, is basically a way to collect data, infringe on privacy, and, in return, provide services (often “free” – be wary of anything that’s “free,” because it usually has a hidden price in the data it collects). Its Bigness is a byproduct of higher network speeds and cheaper, easier cloud storage. Other than size, its data collection targets (what we do, watch, buy, sell, etc.) are old hat, nothing that would shock even the Attic Greeks, who kept their own meticulous manual measurements (Small Data?) of diet and exercise regimens. There’s nothing new out there.

But Big Data is a Big Deal because it has no drawbacks for the parties that promote it. As Anthony Nyström pointed out recently, the idea of Big Data is so nebulous that even if it fails to deliver, the speakers and evangelists who have sold tons of books and speeches on its account can simply say that the “data is bad” or that it’s your problem. This is what happens when people are allowed to get away with generalities and are not pressed to be more concrete in their assertions. But it also highlights how flimsy the notion of “data” is, anyway. “Data-driven” and “the data” are terms that have become almost sacrosanct, in the United States in particular. Elon Musk’s recent spat with the NYT over its “fake” review of the Tesla Model S is a good case in point. The reviewer-driver, Musk asserted, was simply lying when he said that the car had an unreliable battery that couldn’t hold a charge in cold weather, and “the data” that Musk’s company had collected from the car would shatter the reviewer’s soft nonsense. No such thing happened. If anything, Musk’s torrent of data only inflamed the he-said/he-said debate.

Look: data is not some god or force of nature. It’s manmade, and handled by humans who then have to make sense of it. If you have a bad analyst, or too much data, then the entire operation can be compromised. Would Apple have been better off collecting more data about tablets before it made the iPad, rather than simply following Steve Jobs’ gut assertion that users needed to be guided in what they wanted? It’s debatable whether more data even leads to better decisions. And even in cases where the amount of data isn’t an issue, its quality can be, even if it seems like good data on the surface. Data showing lower crime rates in certain neighborhoods could lead one to think that crime wasn’t an issue there, despite the obvious blind spot that many crimes go unreported and as such are not part of “the data.”

But that can be fixed, you might say – we just need better surveillance and better tools to give us better data. More technological progress, you might add (I disagree with the entire notion of “progress,” but I’ll let that slide for another time). OK: but at what cost? The same sort of nonsensical, overexcited language that drives a lot of the press about Big Data also drives the posts of many tech bloggers who advocate for rollbacks on privacy or on any notion of an unconnected world. Jeff Jarvis thinks you shouldn’t be worried about losing your privacy, since publicness makes our lives better. Nick Bilton just can’t stand it that electronic devices can’t be used during airplane takeoff, as if those few moments of not being able to refresh Gmail or Facebook were critical to the betterment of humanity.

In these cases, as with the debate about Big Data and all of its privacy entanglements, it’s not so much the content of the assertion as it is the attitude with which it is made. It rings of “I know best” and has little regard for niceties like privacy and offline existence in particular. Don’t want to be part of “the data” made by Big Data and its tools? Too bad, that aforementioned attitude would say. What’s worse, the price of this “progress” toward more data and bigger data is often hidden because so many of Big Data’s tools are “free.” To be fair, paid services like Netflix are also part of the overall Big Data dredge. But consumers’ general awareness of how and why their data is being collected, whether by a free or paid service, appears to be low, and that’s too bad.

Slate has already worried that Big Data could be the end of creativity. I disagree, but I’m glad to see at least some pushback on the Big Data train – it isn’t clear that Big Data, despite all of its pretenses, is giving, or can give, us what we really want or need. Big Data, I think, assumes a certain linearity in how humans operate – that we show a machine, by way of what we click or like or +1, what we truly want, and that that input can be transformed into a high-quality output, like a certain type of content. I admit to making some data-based posts myself, but if I were to make this entire blog a slave to the data it collects, it would probably look like a super-geeky version of BuzzFeed, which, while fun for a while, would preclude some of the longer or more detailed posts that provide variety and often are surprise hits (at least from my modest perspective). So I’m sticking with just a modest, consciously restrained dose of data for now, something I think that those aforementioned Greeks would approve of.


