What can social and political scientists learn from data science? And what can data science contribute to research on peace and conflict?
‘Most importantly, one has to know what questions to ask’, says Gabriele Schweikert, Research Fellow at the School of Informatics at the University of Edinburgh. ‘And secondly, one needs the necessary data to answer that question.’
For example, researchers on urban conflict might want to find out how different instances of violence are distributed across a city over time. Data from media reports on the location and intensity of violence can be harvested with the help of automated bots searching for keywords. ‘But if researchers have only a vague idea of their question and do not know what data can do and what not, they might end up with a trivial answer’, she says, adding: ‘Such as the simple result that violent conflict in cities tends to take place in streets.’
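To make the idea of keyword harvesting concrete, here is a minimal Python sketch that filters reports by keyword and tallies the matches by district and month. Everything in it is an assumption made for illustration: the keyword list, the district names and the sample reports are invented, and a real study would work with far larger and messier data.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; a real study would derive it from the research question.
KEYWORDS = ("shooting", "riot", "explosion")

# Invented sample records standing in for harvested media reports.
REPORTS = [
    {"date": date(2016, 1, 14), "district": "North", "text": "Shooting reported near the market"},
    {"date": date(2016, 1, 20), "district": "North", "text": "Council debates new bus routes"},
    {"date": date(2016, 2, 3), "district": "Harbour", "text": "Riot breaks out after match"},
    {"date": date(2016, 2, 9), "district": "North", "text": "Explosion damages shop; second shooting follows"},
]


def mentions_violence(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)


def incidents_by_district_and_month(reports):
    """Count keyword-matching reports per (district, 'YYYY-MM') pair."""
    counts = Counter()
    for report in reports:
        if mentions_violence(report["text"]):
            month = report["date"].strftime("%Y-%m")
            counts[(report["district"], month)] += 1
    return counts


counts = incidents_by_district_and_month(REPORTS)
for (district, month), n in sorted(counts.items()):
    print(district, month, n)
```

Even this toy version shows why the question matters. Plain substring matching would also flag a report containing the word ‘patriot’, because it contains ‘riot’, and the tally is only as meaningful as the districts and time windows the researcher chooses to count by.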
Can data predict conflict?
Gabriele’s colleague, Guido Sanguinetti, a Reader in Machine Learning in Informatics at Edinburgh, is an expert in running prediction models, usually in the field of computational biology. But when a friend who worked as a data scientist for the New York Times sent him a visualisation of violent incidents in Afghanistan, taken from the WikiLeaks Afghan War Diaries, he realised that he could ‘do much more with the available data’.
The data set was released by WikiLeaks in 2010 and detailed individual unit behaviour in Afghanistan and Iraq as recorded by US armed forces. It has since been widely discussed in the media.
Using ideas from statistics, signal processing, and ecology, Sanguinetti and his team applied computer models to the data in the hope of predicting aspects of how the conflict would develop. Besides forward prediction of violent events, the method also extracted non-trivial and potentially useful statistics about the conflict, including diffusion, relocation, heterogeneous escalation, and volatility.
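To give a feel for what ‘forward prediction’ means at its very simplest, the Python sketch below smooths a series of past weekly event counts to produce a one-step-ahead estimate. This is a deliberately naive stand-in (simple exponential smoothing), not the team's model, which is not described here in enough detail to reproduce. The function name, the smoothing weight and the weekly counts are all invented for illustration.

```python
def ewma_forecast(counts, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    alpha is an assumed smoothing weight between 0 and 1; higher values
    make the forecast react faster to the most recent observations.
    """
    if not counts:
        raise ValueError("need at least one observation")
    level = float(counts[0])
    for observed in counts[1:]:
        # Blend the newest observation with the running level.
        level = alpha * observed + (1 - alpha) * level
    return level


# Invented weekly counts of recorded incidents in one hypothetical region.
weekly_events = [4, 6, 5, 9, 12]
forecast = ewma_forecast(weekly_events)
print(f"Naive forecast for next week: {forecast:.1f} events")
```

A baseline like this ignores where events happen and how violence spreads between regions, which is exactly the kind of structure (diffusion, relocation, escalation) that richer spatio-temporal models try to capture.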
The goal of the project seemed highly relevant: to limit the severity of conflicts. Indeed, international organisations, governments, humanitarian agencies, non-governmental organisations (NGOs) and insurance companies are all interested in assessing conflict and predicting its progression.
But modelling and predicting conflict remains a challenging task, because the data typically available are as dynamic as politics and human action themselves: the course of a conflict can change quickly under the influence of events. In their publication, Sanguinetti and his colleagues came up with a way to control some of this variation. However, they write that applying data analysis tools to conflict situations ‘blindly’ can create major problems and counter-productive research outcomes.
‘Don’t use data to confirm your assumptions’
‘I definitely see a danger’, says Schweikert, adding that ‘models necessarily involve simplification of complex systems, and sometimes one may hard-wire assumptions in models and then end up seeing those assumptions confirmed by the model’.
This echoes the impression that quantitative methods occasionally serve merely to solidify preconceived arguments. Researchers may be tempted by the hope that hard data will give them arguments that trump qualitative, in-the-field research.
There is also a massive difference between traditional quantitative research, such as multivariate survey analysis, and big data analysis, which brings perils of its own. Understanding the deeper workings of the new data environment is thus increasingly important in all areas of research.
Need for collaboration
The scope of data set analysis and the ‘thickness’ of on-the-ground field research might seem worlds apart, but in fact this contrast is precisely what opens up opportunities for mutual learning.
Schweikert mentions the example of social media incitement and violence, as in the case of nationalist groups in Europe. ‘Is it possible to establish a clear causal relationship between social media discourse and real-life events, even to predict events through prediction modelling of social media posts?’ she asks. ‘And what can we learn from social media data about the dynamics of radicalisation within communities?’
The ability of social scientists to provide reliable observations and research from the field, as anthropologists did in the recent comparative study ‘Why We Post’, could well provide additional insights and guidance that allow a deeper argument to come out of data science. At the same time, the sheer scope and reach of data analysis with the help of prediction models could add unique value to qualitative research in the field, and provide fertile avenues for hypothesis generation, hypothesis testing and quantitative model comparison.
Break out of your comfort zone
If collaboration between social and data scientists is such a win-win scenario, why is it so difficult to achieve?
‘Language barriers are a major hurdle’, says Sanguinetti, and he does not mean English or another spoken language, but rather the whole data universe and its inner logic and grammars.
‘We have been working at the interface between biology and computing for over a decade, and that has been difficult enough, even if experiments and big data are readily available in biology’, he says, adding that future cooperation between the social sciences and informatics might take even more resources and time.
‘Another major issue is the intrinsic difference between the local, detailed observation of the field researcher and the global, coarse view of the data scientist’, says Schweikert, while Sanguinetti adds: ‘Interdisciplinary research is always praised, but seldom practised. Teaching and funding are often community-driven, and unless incentives are built to break barriers, the simple inertia of academic life leads us to choose the path of least resistance and stay in our comfort zone.’
Need for change?
What is lacking so far are university programmes that bridge the growing ‘language gap’ between the social and political sciences and the world of data. Meanwhile, private institutions, such as General Assembly, are filling the gap by attracting applicants from all fields of professional life who learn coding and data science in boot camps.
Here are some interesting resources on data science and peace and conflict studies:
– GDELT project (Global Data on Events, Location and Tone; offers insights beyond conflicts)
– Computational Event Data System (focuses on the Middle East, Balkans and West Africa; machine-coded from news reports)
– ACLED (Armed Conflict Location and Event Data, focuses on the Global South)
– Clionadh Raleigh’s publications (a political geographer working on conflict and data at Sussex)
– Nils Weidmann’s work at the University of Konstanz
– The introduction of the book ‘Modeling Conflict Dynamics with Spatio-temporal Data’, co-authored by Guido Sanguinetti
– More background on the Afghan War Diary project