In the early days of e-Research there was much talk of the data deluge (from new scientific instruments, sensor networks, online transactions…) and how researchers couldn’t cope: manual triaging of available data into manageable chunks would lead to failure to discover results and an inability to see patterns in the overall picture.
The ‘obvious’ answer was to make everything as machine-processable as possible, so that computers could automatically assist the humans in this task – computers getting on with the mundane, repetitive tasks, thus enabling humans to do what they are best at.
For example, scientific workflow technologies were created to provide automated “data analysis pipelines” that systematically handle large volumes of data – significantly, this approach also provides a sharable artefact that records the analysis and supports reproducibility and reuse. Semantic Web technologies help us to make sure that metadata is machine-processable, and that data itself can be integrated by machine – automatically – across diverse sources.
Applications of these technologies are co-evolving and we see them embedding with some sophistication into research practice – subtle, successful, ramping up and in stark contrast to the early “build it and they will come” approach, where infrastructural investments for the data deluge didn’t always attract the anticipated research users (think Grid). The humans are very much in control; competence and confidence in the new tooling are growing.
But let’s fast forward a few years. I have increasing volumes of data. I have an increasing body of machine-processable methods for processing it. I have captured the interactions of scientists – in data analysis and in visualisation, singly and collaboratively. I have more and more computational power. Now I, the computer, can do research! I can spot patterns in data, try out hypotheses and models, evolve them based on their outcomes. I can generate new scientific results.
(Okay, we might argue whether the computer is really “doing research” but if we watch what it is doing it is indistinguishable from watching a researcher. Or a thousand researchers. And it can see more than any human. This is further emphasised if we consider laboratory automation, where the “robot scientist” really is conducting physical experiments under the guidance of programs.)
The future of e-Research inevitably involves increasing automation, so what are the ethics of automation? Who owns the intellectual property in a machine-generated scientific outcome that was based on the work of the computer, or in an outcome that arose from the logs of the activities of thousands of people? We can give 1000 citizen scientists the credit, but who takes the blame? Whose responsibility is it if a program generates bad data or if something goes wrong in its execution? What is “ethical behaviour” in automated systems and how is it observed?
This is not so far away – the more I can do assisted by machine, the more the machine can soon do by itself. Today I can write a workflow that creates workflows based on those of others, and I could automatically modify it on a trial-and-error basis (think genetic mutation and crossovers) just to see what comes out. I can already query over an increasing number and diversity of “linked data” sources to ask new research questions. I can generate and test hypotheses and models, using both observed data of natural experiments and the outcomes of simulations. I can lodge persistent queries in search engines so that as new information becomes available I will be notified automatically, and I can then process it – or the machine could just do this automatically.
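To make the “mutation and crossovers” idea a little more concrete, here is a minimal sketch in Python. It is a toy under loud assumptions: a workflow is treated as nothing more than a list of step names, and `STEP_POOL`, `crossover`, `mutate`, `evolve` and the scoring function `distinct_steps` are all hypothetical names invented for illustration, not part of any real workflow system.

```python
import random

# Hypothetical pool of processing steps. A real workflow system would have
# typed, connected components rather than bare names.
STEP_POOL = ["load", "clean", "normalise", "filter", "cluster", "plot"]


def crossover(a, b, rng):
    """Single-point crossover: the head of one workflow, the tail of another.

    Assumes both workflows have at least two steps.
    """
    cut = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]


def mutate(workflow, rng):
    """Return a copy with one randomly chosen step swapped for a pool step."""
    child = list(workflow)
    child[rng.randrange(len(child))] = rng.choice(STEP_POOL)
    return child


def evolve(population, score, generations=20, seed=0):
    """Trial and error: breed a child, keep it only if it beats the weakest.

    Assumes the population holds at least two workflows.
    """
    rng = random.Random(seed)
    population = [list(w) for w in population]
    for _ in range(generations):
        parent_a, parent_b = rng.sample(population, 2)
        child = mutate(crossover(parent_a, parent_b, rng), rng)
        worst = min(population, key=score)
        if score(child) > score(worst):
            population[population.index(worst)] = child
    return max(population, key=score)


def distinct_steps(workflow):
    """Toy score (an assumption): reward workflows with more varied steps."""
    return len(set(workflow))
```

Calling `evolve([["load", "clean", "plot"], ["load", "filter", "cluster"]], distinct_steps)` returns the best-scoring workflow found. The interesting part is what the sketch leaves out: nobody in this loop decides whether the result is meaningful, or who answers for it.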
As ever when the application of new technologies raises new challenges, we can also look to technologies to help us solve them. A deluge of data means a deluge of methods used to process the data and a deluge of interactions with the data. But equally we can track the provenance of those methods and results, so that we can better interpret them and assign credit where credit is due. Perhaps we can automate tools that help address ethical considerations!
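As a rough illustration of tracking provenance to assign credit, the sketch below records who or what produced each result and from which inputs, then walks back through the chain to list everyone involved. The `Record` structure, the `contributors` function and the example names are hypothetical and deliberately simplistic; this is not a standard provenance model.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One derived item: what it is, who or what made it, how, and from what."""
    name: str
    agent: str          # a person, an instrument or a program
    method: str
    inputs: list = field(default_factory=list)  # names of upstream records


def contributors(record_name, records):
    """Collect every agent in the derivation chain behind a result."""
    seen, agents, stack = set(), set(), [record_name]
    while stack:
        name = stack.pop()
        if name in seen or name not in records:
            continue
        seen.add(name)
        rec = records[name]
        agents.add(rec.agent)
        stack.extend(rec.inputs)
    return agents


# Hypothetical example data.
records = {
    "raw": Record("raw", "sensor-7", "measurement"),
    "cleaned": Record("cleaned", "alice", "outlier removal", ["raw"]),
    "result": Record("result", "workflow-42", "clustering", ["cleaned"]),
}
```

Here `contributors("result", records)` yields the sensor, the human and the workflow together, which is exactly the mix of parties among whom credit and blame would have to be shared.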