Market researchers have long been confident that there are some questions big data can’t answer. But new work in the field suggests that confidence might be misplaced.
Of course, big data faces its own set of challenges. The reported failure of Google Flu Trends to accurately predict flu levels may prove to be a turning point in popular perceptions of the supposed infallibility of big data. There are also questions about representativeness. The idea that we can easily replace a carefully designed sample with an ‘N=All’ approach is starting to look a little naive.
But while big data might be looking a little weaker in some respects, in others it’s strengthening its hand. We in market research have been feeling pretty secure that even if big data can capture everything about what we do, organisations will still need market research to explain why we do it. So here’s the bad news. The evidence is mounting that big data can tell us a lot about the soft issues that we had assumed were the preserve of market research.
One example that I have written about previously was work undertaken by Cambridge University and the Microsoft Research Centre, which found that Facebook Likes can be used to predict a variety of personal attributes including religion, politics, race and sexual orientation. Their research involved 58,000 Facebook users in the US who completed a psychometric questionnaire through an app called ‘myPersonality’. Those taking the test were asked to provide the researchers with access to their Facebook data. The team were able to create some highly predictive models using these Likes. For example, they were able to identify male sexuality and sort African-Americans from Caucasian Americans, Christians from Muslims and Republicans from Democrats. There were also some pretty impressive figures for predicting relationship status and substance abuse.
Another example is a study by researchers at Cornell University, who analysed over 1.5 million geotagged tweets from almost 10,000 people in the US. They wanted to understand whether the content of the tweets themselves could be used to predict the location of the user, as identified from the geotagging. So they divided the data set in two, using 90% of the tweets to train their algorithm and the remaining 10% to test it. What they found was that tweets contained an awful lot of information about the likely location of the user. Some of it was obvious, such as tweets that were generated by the location-based social networking site Foursquare, thus giving an exact location. Other tweets contained references to the city they were in. And others made reference to events that were taking place in their location. As a result of all this information, they were able to create an algorithm that correctly predicted people’s home cities 68% of the time, their home state 70% of the time and their time zone 80% of the time.
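For readers who want a feel for the mechanics, the 90/10 approach can be sketched in a few lines of Python. This is only an illustration and not the Cornell team’s method: the tweets are invented, the function names are hypothetical, and a crude word-count scorer stands in for whatever algorithm the researchers actually used.

```python
import random
from collections import Counter, defaultdict


def split_holdout(records, test_fraction=0.1, seed=0):
    """Shuffle the records and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_word_counts(train):
    """Count how often each word appears in tweets from each city."""
    counts = defaultdict(Counter)
    for text, city in train:
        counts[city].update(text.lower().split())
    return counts


def predict_city(counts, text):
    """Pick the city whose training tweets share the most words with this tweet."""
    words = text.lower().split()
    scores = {city: sum(seen[w] for w in words) for city, seen in counts.items()}
    return max(scores, key=scores.get)


def accuracy(counts, test):
    """Share of held-out tweets whose city is predicted correctly."""
    hits = sum(predict_city(counts, text) == city for text, city in test)
    return hits / len(test)


# Invented toy data (assumption): a real study would use millions of tweets.
tweets = [
    ("stuck on the subway again", "New York"),
    ("best bagel in brooklyn", "New York"),
    ("subway delays near times square", "New York"),
    ("walking the high line in manhattan", "New York"),
    ("yankees game tonight", "New York"),
    ("traffic on the freeway is brutal", "Los Angeles"),
    ("sunset at venice beach", "Los Angeles"),
    ("hiking to the hollywood sign", "Los Angeles"),
    ("lakers game tonight", "Los Angeles"),
    ("tacos in silver lake", "Los Angeles"),
]

train, test = split_holdout(tweets, test_fraction=0.1)
model = train_word_counts(train)
print(f"Trained on {len(train)} tweets, tested on {len(test)}")
print(f"Held-out accuracy: {accuracy(model, test):.0%}")
```

The point of holding back the 10% is that the accuracy figure is measured on tweets the algorithm has never seen, which is what makes percentages like those quoted above meaningful.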
Finally, a paper published in Nature last year found that the lifestyles of mobile phone users could be predicted from their patterns of movement. This was based on an analysis of mobile phone location data, using the data automatically generated when the device pings the network at regular intervals, regardless of whether it is in use or not. The analysis found it was possible to assign 95% of users a unique ‘fingerprint’ based on their movements, and so to predict accurately at what time of day individuals would be in a certain neighbourhood or town. When this information was linked to mapping data, it was then possible to infer a lot about that individual’s lifestyle.
It is highly likely that these studies represent merely the tip of the iceberg of activity that is underway in this area. It is usually only academic researchers who place their findings in the public domain and make them available for peer review. And academics often struggle to get access to big data assets. So we can assume that this sort of activity is being widely undertaken by many data-intensive industries including, of course, database marketing organisations.
So is this a threat or an opportunity for market research? While it is hard to see how the current level of understanding in this area (as outlined above) could directly displace much of the existing portfolio of market research survey work, the issue is less to do with what is possible now than with the direction in which this is heading. In a very short space of time, we have reached a position where some fairly basic, but intimate, information has been accurately inferred from our data trails. What if we can start inferring levels of customer satisfaction from digital behaviour? What if we can start inferring consumers’ needs before they have expressed them? That might sound far-fetched but it is exactly what Google is exploring with its anticipatory systems such as Google Now.
Yet again, this is a reinforcement of the need for market research to engage with a much wider set of tools than that of its traditional repertoire. We need an understanding of the academic literature around consumer behaviour, access to and ability to handle large-scale data sets, and a facility to leverage this to meet business needs. These sound like skills which reside in our industry and as such we should be perfectly placed to meet this exciting challenge. But only if we see this as an opportunity and act before others do.