What Difference Does Quantity Make? On the Epistemology of Big Data in Biology
Abstract: This paper addresses the epistemological significance of big data within biology: is big data science a new way of doing research? What difference does data quantity make to knowledge production strategies and their outputs? I argue that the novelty of big data science does not lie in the sheer quantity of data involved, though this certainly affects research methods and results. Rather, the novelty of big data science lies in (1) the prominence and status acquired by data as a scientific commodity and recognised output; and (2) the methods, infrastructures, technologies, skills and knowledge developed to handle (format, disseminate, retrieve, model and interpret) data. These developments generate the impression that data-intensive research is a new mode of doing science, with its own epistemology and norms. I argue that in order to assess this claim, we need to analyze the ways in which data are actually disseminated and used to generate knowledge, and use such empirical study to question what counts as data in the first place. Accordingly, the bulk of this paper reviews the development of sophisticated ways to disseminate, integrate and re-use data acquired on model organisms over the last three decades of experimental biology. I focus on online databases as infrastructures set up to organise and interpret such data; and on the wealth and diversity of expertise, resources and conceptual scaffolding that such databases draw upon in order to function well. This case illuminates some of the conditions under which large amounts of data are collected and integrated in the first place, so as to become available for analysis and interpretation by researchers wishing to use those data to foster discovery.
In my conclusions, I reflect on the difference that data quantity is making to contemporary biological research, the methodological and epistemic challenges of identifying and analyzing data given these developments, and the opportunities and worries associated with big data discourse and methods.
Sabina Leonelli is a senior lecturer in the Department of Sociology, Philosophy and Anthropology at the University of Exeter, UK, where she also acts as associate director of the Exeter Centre for the Study of the Life Sciences (Egenis). From January to May 2014, she is also a visiting scholar at the “Sciences of the Archive” project of the Max Planck Institute for the History of Science in Berlin. As an empirical philosopher of science, Sabina uses historical and sociological research to foster philosophical understandings of knowledge-making practices and processes, focusing particularly on data-intensive biomedicine, model organism biology and plant science. From 2014 to 2019 she holds an ERC Starting Grant to pursue research in this area (http://www.datastudies.eu). She is also interested in science policy, particularly current debates on Open Science. She has authored numerous research papers in philosophy, STS and science journals; co-edited the volume Scientific Understanding: Philosophical Perspectives (2009, University of Pittsburgh Press) as well as special issues in BioSocieties, Studies in History and Philosophy of Biological and Biomedical Sciences and Public Culture; and is an active member of several science and philosophy organisations, most notably the Global Young Academy.