Review of Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures & their Consequences (Sage 2014) 208 Pages £22.99
Reviewed by David Moats
Rob Kitchin’s The Data Revolution helpfully strips away the hype surrounding ‘big data’ and clarifies key terms. This review suggests some avenues where the book could be taken further, particularly in reference to the performativity of big data discourses and infrastructures, giving the example of Cameron and Palan’s dismantling of the hype around globalisation. It is also suggested that the performative effects of data enact sociality in new ways and may require more dramatic methodological and philosophical shifts. Crucially, however, Kitchin’s book successfully clears a space of shared understanding on which more radical projects can be built.
Keywords: Big Data, Open Data, Small Data, Performativity, Methods
As an industry, academia is not immune to cycles of hype and fashion. Terms like ‘postmodernism’, ‘globalisation’, and ‘new media’ have each had their turn filling the top line of funding proposals. Although they are each grounded in tangible shifts, these terms become stretched and fudged to the point of becoming almost meaningless. Yet, they elicit strong, polarised reactions. For at least the past few years, ‘big data’ seems to be the buzzword, which elicits funding, as well as the ire of many in the social sciences and humanities.
Rob Kitchin’s book The Data Revolution is one of the first systematic attempts to strip back the hype surrounding our current data deluge and take stock of what is really going on. This is crucial because this hype is underpinned by very real societal change, threats to personal privacy and shifts in store for research methods. The book acts as a helpful wayfinding device in an unfamiliar terrain, which is still being reshaped, and is admirably written in a language relevant to social scientists, comprehensible to policy makers and accessible even to the less tech savvy among us.
The Data Revolution seems to present itself as the definitive account of this phenomenon but in filling this role ends up adopting a somewhat diplomatic posture. Kitchin takes all the correct and reasonable stances on the matter and advocates all the right courses of action but he is not able, in the context of this book, to pursue these propositions fully. This review will attempt to tease out some of these latent potentials and how they might be pushed in future work, in particular the implications of the ‘performative’ character of both big data narratives and data infrastructures for social science research.
Kitchin’s book starts with the observation that ‘data’ is a misnomer – etymologically data should refer to phenomena in the world which can be abstracted, measured etc. as opposed to the representations and measurements themselves, which should by all rights be called ‘capta’. This is ironic because the worst offenders in what Kitchin calls “data boosterism” seem to conflate data with ‘reality’, unmooring data from its conditions of production and making the relationship between the two seem given or natural.
As Kitchin notes, following Bowker (2005), ‘raw data’ is an oxymoron: data are not so much mined as produced and are necessarily framed technically, ethically, temporally, spatially and philosophically. This is the central thesis of the book, that data and data infrastructures are not neutral and technical but also social and political phenomena. For those at the critical end of research with data, this is a starting assumption, but one which not enough practitioners heed. Most of the book is thus an attempt to flesh out these rapidly expanding data infrastructures and their politics.
The first part of the book, however, is largely normative, attempting to nail down slippery terms. Firstly, these are technical terms like primary, secondary, tertiary, meta-data etc. Kitchin describes how what we retroactively call ‘small data’ has become an increasingly valuable commodity for both the state and business, driven by the development of computers and in particular relational databases. Now, data collection and analysis is often outsourced to companies – ‘data brokers’, who specialise solely in data – which rent or sell it to other companies or governments. The book then discusses the drive toward open government data (mainly economic and transport data) initiated by the US. Kitchin reminds us that there are costs to making data open, analysable and compatible. Open data is not always “linked” data – data which uses unambiguous markers like social security numbers to allow the joining up of different datasets. Finally, he attempts to define the character of so-called ‘big data’ outside of the standard understandings based on quantity and necessary computing power. He refers to the famous ‘three V’s’ (volume, velocity, variety) but adds to them exhaustiveness, (high) resolution, relationality and flexibility. Big data is not just a technical phenomenon; it entails a drive for completeness in the consumption of ever more data. After setting up the field, Kitchin then goes on to analyse debates and discourses around big data within governments, industry and academia and how big data is authorised, critiqued and implemented.
As Kitchin makes clear, big data is not only material infrastructures and technologies, but also discourses and narratives, which justify their existence and shape their trajectories. He describes data as immersed in data assemblages – “amalgams of systems of thought, forms of knowledge, finance, political economies, governmentalities and legalities, materialities and infrastructures, practices, organisations and institutions, subjectivities and communities, places, and marketplaces – that frame how data are produced and to what ends they are employed” (xvi). The term ‘assemblage’ and the host of entities above simultaneously invoke Foucauldian dispositifs or apparatuses, Deleuzian assemblages and Michel Callon’s socio-technical devices, which all have common origins and usages but slightly different jurisdictions (Law and Ruppert, 2013). Kitchin tends to lean toward Foucault as opposed to Callon, though he frequently acknowledges that data assemblages are performative: they act on social life in the sense that economics formats the economy.
Accepting this proposition, however, might require a different mode of analysis. For example, in relation to another set of assemblages constitutive of what we call globalisation, Cameron and Palan (2004) demonstrate that globalisation, as both popular and academic accounts have it, does not exist as such, but that the feverish narratives of globalisation are performative – in the sense that leaders and institutions reshape their worlds in anticipation of globalisation, but with unforeseen effects. The authors need to develop new terms to describe what globalisation actually enacts as distinct from what it purports to name and explain. They give the name “the offshore economy” to the limited instances of networked finance and transnational trade which globalisation proponents cite as evidence, but ironically they show that globalisation actually presides over a strengthening of the nation in the form of the “private economy” and the emergence of an “anti-economy”, including the poor and attendant state services which remain supposedly untouched by globalisation. These new spaces are “constituted within and constitutive of the narrative configurations of globalization” (Cameron and Palan, 2004: 19).
To analyse big data discourse in this way, we would firstly need to quarantine all these terms that Kitchin has so neatly laid out for us and deploy different ones. To give a modest example in relation to data collection, Richard Rogers (2004) makes an analytic division between ‘tracking’ (monitoring an item as it moves through various gateways) and ‘tracing’ (or attaching the monitoring device to the object itself). Other scholars have, in different ways, used the term ‘connectivity’ (Bennett and Segerberg, 2012; van Dijck, 2013) to describe certain new forms of sociality on social media. Kitchin calls for more theory about data itself but tends to ignore related contributions from software studies (Fuller, 2008), media studies (Gillespie, 2013) and cultural studies (Mackenzie, 2006) which have theorised about data for years – admittedly not in a way which is necessarily compatible with some modes of empirical research or policy.
Secondly, we would need to look at the life of the industry terms, particularly big data and openness (see Tkacz, 2014), the infrastructures they create and the boundaries they police. This task is more difficult and might require another decade of hindsight and a battery of empirical studies. It is also complicated by access: we cannot easily peer into boardrooms, unpack proprietary algorithms or analyse government surveillance programmes to see how data narratives steer new developments.
However, it is a bit easier to see the way big data and big data narratives act on our own terrain – the social sciences. Whether a ‘paradigm shift’ is happening or not, this kind of trumped-up language and the technologies underwriting it certainly have effects. At the very least, as Kitchin notes, funding is being increasingly directed to computational social science and digital humanities projects at the expense of other topics and qualitative, small data approaches. But he also alludes to effects on the content of research, the potential that the “tail is wagging the dog”, that we are only asking questions that can be asked by the data (Vis, 2013) as opposed to pursuing more sociological questions (see also Marres and Weltevrede, 2013).
Kitchin is at his best when revealing the gap between the narratives and the reality of data analysis such as the fallacy of empiricism – the assertion that, given the granularity and completeness of big data sets and the availability of machine learning algorithms which identify patterns within data (with or without the supervision of human coders), data can “speak for themselves”. Kitchin reminds us that no data set is complete and even these out-of-the-box algorithms are underpinned by theories and assumptions in their creation, and require context-specific knowledge to unpack their findings. Kitchin also rightly raises concerns about the limits of big data, that access to and interoperability of data are not given and that these gaps and silences are also patterned (Twitter is biased as a sample towards middle class, white, tech-savvy people). Yet, this language of veracity and reliability seems to suggest that big data is being conceptualised in relation to traditional surveys, or that our population is still the nation state, when big data could helpfully force us to reimagine our analytic objects and truth conditions and, more pressingly, our ethics (Rieder, 2013).
However, performativity may again complicate things. As Kitchin observes, supermarket loyalty cards do not just create data about shopping, they encourage particular sorts of shopping; when research subjects change their behaviour to cater to the metrics and surveillance apparatuses built into platforms like Facebook (Bucher, 2012), then these are no longer just data points representing the social, but partially constitutive of new forms of sociality (this is also true of other types of data as discussed by Savage (2010), but in perhaps less obvious ways). This might have implications for how we interpret data, the distribution between quantitative and qualitative approaches (Latour et al., 2012) or even more radical experiments (Wilkie et al., 2014). Kitchin is relatively cautious about proposing these sorts of possibilities, which is not the remit of the book, though it clearly leaves the door open.
The Data Revolution’s main success lies in clearing a space – cutting out the conjecture and gloss, the Utopians and the reactionaries pulling in different directions – and locating a common ground from which to build something. What Kitchin does concretely propose is a mixed methods programme for how to research data assemblages, including ethnography, Foucauldian genealogies and ‘observant participation’, all of which would contribute to the sorts of avenues outlined above. It is perhaps necessary for Kitchin to establish a shared set of normative terms even if these need to be complicated later with some critical distance. Still, if these technologies are in fact disruptive technologies, then perhaps we should be more revolutionary in our approach now, before the dust settles.
Bennett WL and Segerberg A (2012) The Logic of Connective Action. Information, Communication & Society, 15(5), 739–768.
Bowker GC (2005) Memory Practices in the Sciences. Cambridge, MA: MIT Press.
Cameron A and Palan R (2004) The Imagined Economies of Globalization. London: Sage.
Fuller M (2008) Software studies: a lexicon. Cambridge, MA: MIT Press.
Gillespie T (2013) The relevance of algorithms. In: Gillespie T, Boczkowski P and Foot K (eds) Media Technologies. Cambridge, MA: MIT Press, 167–194.
Latour B, Jensen P, Venturini T, et al. (2012) The Whole is Always Smaller than Its Parts: A Digital Test of Gabriel Tarde’s Monads. British Journal of Sociology, 63(4), 590–615.
Law J and Ruppert E (2013) The Social Life of Methods: Devices. Journal of Cultural Economy, 6(3), 229–240.
Mackenzie A (2006) Cutting code: Software and sociality. New York: Peter Lang.
Marres N and Weltevrede E (2013) Scraping the Social? Issues in real-time social research. Journal of Cultural Economy, 6(3), 313–335.
Rieder B (2013) Studying Facebook via data extraction: the Netvizz application. In: Proceedings of the 5th Annual ACM Web Science Conference, ACM, 346–355.
Rogers R (2004) Why map? The techno-epistemological outlook. Media Design Research.
Tkacz N (2014) Wikipedia and the Politics of Openness. University of Chicago Press.
Van Dijck J (2013) The Culture of Connectivity: A Critical History of Social Media. Oxford: Oxford University Press.
Vis F (2013) A critical reflection on Big Data: Considering APIs, researchers and tools as data makers. First Monday, 18(10).
Wilkie A, Michael M and Plummer-Fernandez M (2015) Speculative method and Twitter: Bots, energy and three conceptual characters. The Sociological Review, 63(1), 79–101.
David Moats is an ESRC funded PhD candidate in Sociology at Goldsmiths. His research concerns how public science controversies, in particular over nuclear power, unfold on various participatory media platforms (Wikipedia, Youtube, Facebook, Twitter) and how these devices enact participation in contingent ways. David is working to develop ‘quanti-qualitative methods’: a mix of large-scale mapping techniques and qualitative textual analysis informed by the study of controversies in science and technology studies (STS). David is also the co-editor of the Centre for the Study of Invention and Social Process (CSISP) Blog with Noortje Marres and Joe Deville.