What problems can Big Data solve?

Depending on whom you ask, big data is solving a variety of problems. As the UN Global Pulse initiative sees it, big data allows “…decision makers to track development progress, improve social protection, and understand where existing policies and programmes require adjustment.” [Big Data for Development: Challenges & Opportunities, p4, May 2012].

Big data is also said to enable the creation of financial value across business sectors and industries [source: McKinsey Global Institute analysis]. Retail in particular should benefit from creating better customer experiences, while the manufacturing sector should be able to increase its productivity significantly.

For us regular folks, whether we see ourselves as buyers of industrially produced goods or as frequent users of public transport, big data can help us find out whether goods are physically available, and help us avoid traffic congestion.

All that is said to be enabled through the monitoring and deep analysis of previously uncorrelated data streams that have become so ubiquitous through new technologies.

The promise that big data holds in store is “It makes our lives easier.” Especially for us Westerners living in urban agglomerations, this may sound truly compelling.
According to a video that OgilvyOne produced and posted a couple of days ago, big data can help ordinary people solve ordinary problems. The video is nicely done, and quite entertaining.

The data layers staged in that video are – roughly – centered around three different aspects of big data: authentication, peer activity, and network activity.

Authentication, in that regard, is tied to individually owned devices. Peer activity defines a social vicinity that can consist both of personally known peers (whatever their current physical location may be) and of others who occupy a close-range shared space and are probably involved in activities similar to those of the protagonist in the video. Network activity, finally, relates to individuals who are not personally known and whose activity aggregates within an infrastructure (this could be a public transport system, an operator’s cellular network, or something similar).

The three main problem areas that the video addresses are selection, physical availability, and navigation.
These three things are deeply interlinked: ideally, only what is available and can easily be navigated to is made selectable. “Availability”, in that sense, has two facets: (1) in existence, (2) in one’s vicinity.
In other words: only what is accessible is marked as selectable.
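
The rule that only accessible things are selectable can be sketched in a few lines of Python. This is a minimal illustration, not anything taken from the video: the `Offer` type, the shop names, and the 1 km cut-off for “vicinity” are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical item a person might select."""
    name: str
    in_stock: bool       # facet (1): in existence
    distance_km: float   # basis for facet (2): in one's vicinity


def selectable(offers, max_distance_km=1.0):
    """Return the names of offers that are both in stock and within reach.

    max_distance_km is an assumed cut-off for "vicinity".
    """
    return [
        o.name
        for o in offers
        if o.in_stock and o.distance_km <= max_distance_km
    ]


offers = [
    Offer("umbrella, shop A", in_stock=True, distance_km=0.3),
    Offer("umbrella, shop B", in_stock=False, distance_km=0.1),  # near, but sold out
    Offer("umbrella, shop C", in_stock=True, distance_km=4.2),   # in stock, but too far
]

print(selectable(offers))
```

Only shop A passes both checks; shop B fails on existence and shop C on nearness, which is the double facet of “availability” described above.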

And, as is easily understood, any data that individuals bring into the big data game for authentication always also contributes, by implication, to peer activity and to network activity. The “peer” layer, in other words, is situated right between the individual and the network, just as “availability” binds together “existence” and “nearness”.

What big data really enables is this: it allows us to dynamically reorganize the varying aspects of “peers” and “availability” by incorporating both temporal and spatial dimensions. This creates a much richer context (Martin Heidegger might have called it a space of possibility) than we have had at our disposal so far. We can now draw a “soft” border between two views of a person: the individual, bound and limited by their own volatile decisions or selections, and the individual-as-peer, who is exposed to a selection offer. The individual may decide on that offer by involving a particular set of other peers, but their response can still be placed in a context of statistical significance.

Similarly, this approach can be turned around to illustrate the menace of big data: every single individual’s contribution to network activity at the peer level can easily be processed with recourse to that individual’s activity and authentication patterns. We simply can’t be sure whether the data traces we leave behind are processed with regard to us as individuals, or to us as peers (or network contributors).

As advertisers keep telling us: the selection network grows semantically richer if personally identifiable information (such as likes and preferences) is utilized across different points of the passage through a city. Having aggregated data about commuter streams is something entirely different from knowing, by reading their RFID chip, that a certain individual has passed a particular landmark at a particular point in time. But which part of our data is transferred, and to whom, is simply not in our hands.
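
The difference between an aggregated commuter stream and an individual trace can be shown on one and the same set of records. The sketch below is purely illustrative: the tag IDs, landmarks, and hours are invented, and the two functions are hypothetical names, not part of any real system.

```python
from collections import Counter

# Hypothetical sightings: (rfid_tag, landmark, hour_of_day). All values invented.
sightings = [
    ("tag-17", "central station", 8),
    ("tag-42", "central station", 8),
    ("tag-17", "market square", 9),
    ("tag-99", "central station", 9),
]


def commuter_counts(records):
    """Network-level reading: passages per landmark, identities discarded."""
    return Counter(landmark for _, landmark, _ in records)


def trace(records, tag):
    """Individual-level reading: where and when one specific tag was seen."""
    return [(landmark, hour) for t, landmark, hour in records if t == tag]


print(commuter_counts(sightings))
print(trace(sightings, "tag-17"))
```

Nothing in the records themselves decides which of the two readings is applied; that choice lies entirely with whoever holds the data.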

These are first world problems, for sure.

According to the UN Global Pulse program, big data has entirely different implications for rural communities in developing countries.
The UN’s Global Pulse paper stages the situation of a “hypothetical household living in the outskirts of a medium-size city a few hours from the capital in a developing country.
The head of the household is a mechanic who owns a small garage. His wife cultivates vegetables and raises a few sheep on their plot of land as well as sews and sells drapes in town. They have four children aged 6 to 18. Over the past couple of months, they have faced soaring commodity prices, particularly food and fuel. Let us consider their options.” (p13)

In the scenario developed in the Pulse paper, the individual is not at all at the center of attention. Instead, rural communities are primarily staged as a network of peers. Consequently, the scenario is centered around “availability”, but with regard to scarcity, not to abundance.
And, as the authors write: “…a systemic – as opposed to idiosyncratic – shock will prompt dozens, hundreds or thousands of households and individuals to react in roughly similar ways.” (p14)

We see the notion of individuals-as-peers at work here. At the individual level, the need to weigh options (“selection”) is a given, but characterizing the scenario as “systemic, not idiosyncratic” draws the focus to the peer level.

Along with that, the “network” level is largely determined by data sources that are suited to showing a collective shift in usage patterns: data from local mobile operators may “show a significant drop in calls and an increase in text messages”, mobile banking service providers may notice “a depletion of mobile money savings accounts”, satellite imagery may “show a decrease in the movement of cars and trucks moving in and out of the city’s local market”. (all citations from p14)
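
To make the distinction between a systemic and an idiosyncratic shock a little more tangible, here is a minimal sketch in Python. It is not the method of the UN paper: the household IDs, the 20% drop threshold, and the 50% share are assumptions chosen only for illustration.

```python
def classify_shift(changes, drop_threshold=-0.2, systemic_share=0.5):
    """Classify a shift in usage across households.

    changes: mapping of household id -> relative change in call volume
             (e.g. -0.3 means 30% fewer calls than before).
    drop_threshold and systemic_share are assumed values, not from the paper.
    """
    if not changes:
        return "no data"
    affected = [h for h, change in changes.items() if change <= drop_threshold]
    share = len(affected) / len(changes)
    if share >= systemic_share:
        return "systemic"        # many households react in roughly similar ways
    if affected:
        return "idiosyncratic"   # only isolated households are affected
    return "none"


# Invented example data: three of four households cut back on calls.
many_hit = {"h1": -0.35, "h2": -0.40, "h3": -0.25, "h4": 0.02}
# Invented example data: a single household cuts back.
one_hit = {"h1": -0.35, "h2": 0.00, "h3": 0.05, "h4": 0.02}

print(classify_shift(many_hit))
print(classify_shift(one_hit))
```

The point of the sketch is that the classification looks only at how many households move in the same direction, not at why any single household did so.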

It is noteworthy that the UN paper doesn’t take the perspective of individuals sliding into debt, but aims to mitigate an economically challenging situation for a community, or equally for a country, at a systemic (or: network) level.

What remains?
We need to be very careful: the level at which we read what big data tells us (individual, peer, or network) is not a quality inherent in the data. The same data can be used to do individual credit scoring, or to counteract a systemic crisis.

What remains the same in both scenarios is that the distinction between the data layers is a merely theoretical one, and that contributors cannot be certain about the extent to which their data is tied to particular levels of analysis. As explained earlier, this is the precise strength of big data initiatives: dynamic data modeling in multiple contexts.
Big data is not solving problems. It just serves a whole variety of purposes.

As the recent discussions about “Prism”, “BLARNEY”, “Tempora”, and other surveillance programs have shown, the legitimacy of big data initiatives is open to dispute. And rightfully so.