Publication: Big Data, Societies and Social Sciences

Published on October 9, 2018

Gilles Bastin, co-leader of the Data Institute’s work package 4 on « Data Science, Social Media and Social Sciences », and Paola Tubaro coordinated the latest issue of the Revue française de sociologie, devoted to “Big Data, Societies and Social Sciences”.
Big Data, Societies and Social Sciences
A special issue of the Revue française de sociologie, 2018/3 (Vol. 59)
Edited by Gilles Bastin and Paola Tubaro

https://www.cairn.info/revue-francaise-de-sociologie-2018-3.htm

This special issue of the Revue française de sociologie aims to assess the effects of the data deluge on the practices, objects and results of the academic community in sociology. What practical benefits have sociologists who have started working with these data been able to derive for their research? What pitfalls did they encounter along the way? What has been learned about big data throughout these first years of experimentation? Providing answers to these questions is necessary to assess the scope and scale of the changes that have already taken place in the academic world and to anticipate, as much as possible, its future.

Contributions to this issue address two major questions: how do big data transform society? How do these data affect the practice of sociology (and, more generally, the social sciences)? They include:

« The time for big data in the social sciences » (Gilles Bastin and Paola Tubaro)


This introduction raises questions about the jurisdiction of social science in the age of big data. Two main issues are addressed: the issue of accessing data for social science in a world where private companies have commodified personal data in an unprecedented way and the general public rightfully demands more control over those data; and the issue of methods suited to exploring those data, notably the rise of machine learning and artificial intelligence as tools for social scientists. (available at https://halshs.archives-ouvertes.fr/hal-01885416v1)

« What’s behind the age gap between spouses? Big data and the study of age difference within couples » (Marie Bergström)


In the majority of heterosexual couples the man is older than the woman. This observation is surprisingly consistent over time and space. In almost all known societies, the husband is on average older than the wife. Yet although this fact is well established, the mechanisms at work are much less so. How does this gender asymmetry come to be? Traditional surveys have a hard time answering this question; because they focus on individuals who are already in a couple, they do not adequately capture the dating process. This article relies on an alternative approach that mobilizes data from an online dating site. These services, which are now widely used in France, provide an original viewpoint on women’s and men’s mate preferences and the matching mechanisms. In doing so, they provide new results. Whereas survey data suggest that the age difference is above all sought by women, the data from the website show that it is also desired by men, especially after a separation. More generally, the study questions the notion of “partner choice”, widely used in the sociological literature, and shows that romantic and sexual encounters are based on a compromise between female and male preferences that diverge rather than coincide. Through the example of age difference between spouses, the article seeks to demonstrate some of the opportunities provided by “big data”.

« Platform, big data and the reshaping of urban government. The effects of Waze on traffic regulation policies » (Antoine Courmont)


This article adopts a perspective of sociology of data to analyze how urban governance is being reshaped in connection with the new quantification regime of big data. Drawing on the case of traffic flow plans and the Waze application, it pursues two hypotheses.
1) Big data puts forward new representations of the city that disturb what has been public institutions’ stable, ordered organization of reality, enabling new actors called platforms to offer alternative ways of regulating urban space and thereby generating tensions with local public authorities. 2) Nonetheless, the analysis specifies that the ways in which these new data are produced bring to light modes of accommodating both the reality established by public institutions and the reality of digital service platforms. Through the data path, new types of coordination between public and private actors emerge. The article thus illustrates what sociology of data can contribute to understanding how, in the big data era, different types of regulation can be applied locally to produce new modes of urban governance.

« The whole rather than its parts. Big data and the multiplicity of opinion measures on the Web » (Baptiste Kotras)


In the form of blogs, forums and social networking sites, the abundance and calculability of the discourse of Internet users provide access to spontaneous opinion, taken directly from the traces of our everyday conversations. Since the 2000s, a group of start-ups and agencies have been developing social and algorithmic methods to take advantage of this abundant material in order to provide a new way of measuring public opinion, which could be more authentic than that measured by traditional surveys. On the basis of a social history of the online opinion survey market, this article studies the way in which a new regime of knowledge about opinion is being reconstructed from its digital traces, and underlines the varied, contingent and situated nature of the epistemic projects that capture big data. Using interviews and ethnographic studies, we demonstrate the contrast between companies that are experts at sampling digital records, and others which aim instead at capturing as much as possible of the opinions being voiced on the social web. In particular, we analyze the technical and epistemic challenges faced by online opinion actors that nullify sample-study based approaches, and thus support a different approach that involves extensive and continuous study of online conversation.

« The Great Regression. Machine Learning, Econometrics, and the Future of Quantitative Social Sciences » (Julien Boelaert and Étienne Ollion)


What can social sciences do with machine learning, and what can the latter do to them? A contribution to the emerging debate on the role of machine learning for the social sciences, this article offers an introduction to this class of statistical techniques. It details its premises, logic, and the challenges it faces. This is done by comparing machine learning to more classical approaches to quantification – most notably parametric regression – both at a general level and in practice. The article is thus an intervention in the contentious debates about the role and possible consequences of adopting statistical learning in science. We claim that the revolution announced by many and feared by others will not happen any time soon, at least not in the terms that both proponents and critics of the technique have spelled out. The growing use of machine learning is not so much ushering in a radically new quantitative era as it is fostering an increased competition between the newly termed classic method and the learning approach. This, in turn, results in more uncertainty with respect to quantified results. Surprisingly enough, this may be good news for knowledge overall.

« Mining political opinion on Twitter: Challenges and opportunities of multiscale approaches » (Marta Severo and Robin Lamarche-Perrin)


Social research on public opinion has been affected by the recent deluge of new digital data on the Web, from blogs and forums to Facebook pages and Twitter accounts. This fresh type of information, useful for mining opinions, is emerging as an alternative to traditional techniques such as opinion polls. Firstly, by reviewing the state of the art of studies of political opinion based on Twitter data, this paper aims at identifying the relationship between the chosen data analysis method and the definition of political opinion implied in these studies. Secondly, it aims at investigating the feasibility of performing multiscale analysis in digital social research on political opinion by addressing the merits of several methodological techniques, from content-based to interaction-based methods, from statistical to semantic analysis, from supervised to unsupervised approaches. The end result of such an approach is to identify future trends in social science research on political opinion.

« What Big data does to the sociological analysis of texts? A review of recent research » (Jean-Philippe Cointet and Sylvain Parasie)


Since the 2000s, new techniques of text analysis have emerged at the crossroads of computer science, artificial intelligence and natural language processing. Although they were developed independently of any sociological theory, these methods are now being used by researchers (sociologists and non-sociologists alike) to produce new knowledge of the social domain by exploiting the massive volume of textual materials now available. By providing an overview of recent sociological investigations that are based on quantitative analyses of textual corpora, this article identifies three conditions under which these approaches can be a resource for sociological inquiry. The three conditions that emerge from our analysis concern: 1) knowledge of the context of production of textual inscriptions; 2) the integration of external data into the study itself; 3) the adaptation of algorithms for sociological reasoning.
