While most companies selling you a big data solution may tell you they've got it all figured out, one of IBM's top data analytics researchers delivered a hefty dose of reality at the company's Information on Demand 2013 conference in Las Vegas today.
Aya Soffer, director of information management for analytics research at IBM, identified learning which data to "trust" as a "big problem" in big data analytics. She questioned whether many analysts are still collecting the wrong data and, as a result, risk making "stupid decisions".
Discussing ways in which big data can overcome one of its much-hyped "Vs", veracity, Soffer acknowledged that the rise of "Hadoop-like architecture" has made data far easier to parallelise and, with that, easier to digest, interpret and ultimately trust, even under uncertainty.
"What Watson [IBM's flagship cognitive computer, which beat champions of the game show Jeopardy back in 2011] does," explained Soffer, "is say 'I don't know if it's 100 per cent, but I can tell you 78 per cent that I think this is something'."
Soffer also described how IBM is starting to look into mining social media data to work out how to "fuse information" so that it can be better cross-referenced.
"If you have some of the information extracted out of social media, and you want to make a decision on whether this is a real trend or just people posting things on behalf of companies, this is where the numbers play in our favour," she explained.
"So we can do statistical analysis to see if it's meaningful, statistically. And the more you put the pieces of the puzzle together, the more you can trust information to some extent."
But "to some extent", said Soffer, still isn't enough.
"That's clearly still a big problem: what data can you really trust? Are we ingesting lots and lots of information just to make stupid decisions, because the data was bad?"
To combat this, Soffer is looking at a form of computer crowd-sourcing, having observed that "if you pull a crowd of people to look at [something], they usually come up with the right observations".
Building a matrix of tiny, "dumb" computers could be a data analytics equivalent of this approach. But it's important, said Soffer, not always to force computers to think like people.