The growing potential of health informatics and the use of linked datasets offer opportunities for a new wave of health research which can track large numbers of individuals or populations across different datasets. However, such study designs differ substantially from the models of clinical trials which have informed previous narratives on research ethics. The potential use of health and other (social care or social survey) datasets as linked datasets, and the potential level of detail contained within such linked datasets, raise new ethical issues. These concern not only consent but also the effectiveness of research, researchers’ responsibilities, data security, disclosure and anonymity.
In ascertaining the efficacy of drug treatments, the randomised controlled trial (RCT) is considered the gold standard and sits at the top of the ‘evidence pyramid’. RCTs are valuable when conducted to best practice and where participants are representative of the patient population. However, ‘real world’ patients are frequently excluded from clinical trials because recruitment is constrained by strict eligibility criteria, and RCTs are usually powered to detect efficacy rather than safety. RCTs are therefore not always generalisable to the patient population for which the treatments under scrutiny were intended. Secondary data analysis using individual e-Health records linked to dispensed prescription data offers a powerful adjunctive way of observing adverse events associated with prescription regimens, and may reinforce prescribing guidelines or identify undesirable long-term adverse effects. Estimates of additional deaths and hospitalisations, set against health benefits associated with treatments, can be made by diagnostic group over a relatively long time-span.
Central to conducting a clinical trial is the recruitment of patients, who have the right to information about the study and the autonomy to participate in or withdraw from research, as embodied in the ‘informed consent’ process. With vulnerable groups there are more complex questions around capacity to give informed consent, and as a consequence such groups are frequently excluded. Population-level studies using individual-level linked data do not have comparable recruitment stages, and in the past consent for secondary use of data has rarely been sought at the point of data collection. If informed consent for secondary use of data is insisted upon (as is more likely in the future), individuals will probably be offered the opportunity to ‘opt out’ of population-level linkages, with obvious detrimental effects on the studies and on the usefulness of the collected data. More granular varieties of opt-in/opt-out might equally be supported, so that individuals can allow their data to be used in some kinds of studies but not others. A balance is required to ensure that the impact of participation on individuals does not become overly burdensome as future linkages proliferate.
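As a minimal sketch of how such granular opt-in/opt-out preferences might be represented, the following Python fragment filters individuals by study category before any linkage takes place. The class name `ConsentPreference`, the function `eligible_ids` and the category labels are hypothetical and are not drawn from any existing system; the records are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentPreference:
    """Hypothetical per-person consent record (illustrative only)."""
    person_id: str
    # True models an opt-out regime: included unless the person has refused.
    default_allow: bool = True
    refused_categories: set = field(default_factory=set)
    allowed_categories: set = field(default_factory=set)

    def permits(self, study_category: str) -> bool:
        # An explicit refusal always wins over any other setting.
        if study_category in self.refused_categories:
            return False
        # An explicit permission overrides a restrictive default.
        if study_category in self.allowed_categories:
            return True
        return self.default_allow


def eligible_ids(preferences, study_category):
    """Return the IDs of people whose preferences permit this kind of study."""
    return [p.person_id for p in preferences if p.permits(study_category)]


# Invented example preferences; the category names are assumptions.
preferences = [
    ConsentPreference("A"),
    ConsentPreference("B", refused_categories={"commercial"}),
    ConsentPreference("C", default_allow=False,
                      allowed_categories={"drug_safety"}),
]

print(eligible_ids(preferences, "drug_safety"))
print(eligible_ids(preferences, "commercial"))
```

The design choice here is that an explicit refusal takes precedence over everything else, which is one possible policy among several; a real scheme would also need to record when and how each preference was given.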
One solution which strikes a balance between the public good and the informed consent model is to anonymise datasets once linkages have been undertaken. In the absence of personal information, consent is typically no longer required. However, using fully anonymised data diminishes the possibility of exploiting complex data to its full potential (since informative distinctions have to be suppressed to reduce the risk of identifying individual cases). In addition, removing personal identifiers (such as health care provider numbers for exact matching, or other personal information for probabilistic matching) precludes the possibility of enriching the dataset in the future by onward linking to other datasets, such as health or household surveys or indeed longitudinal studies. Alternative data models are therefore used, in which data are partially anonymised (the possibility of identifying individual cases being minimised but not eliminated) and/or ‘pseudonymised’ (encrypting identifying information such as a unique health number, and creating separate ‘look-up’ tables which provide a means to get back to the identifying data for future linkages). Here the related ethical issues of confidentiality, data security and statistical disclosure control arise. In this model, researchers are obliged to accept conditions of access: for example, they agree not to attempt to use the data to identify individuals, and they must have completed specific mandatory training and acquired a ‘research passport’ in order to be considered ‘bona fide’ researchers who can be granted access to data by research ethics committees.
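The pseudonymisation model described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular system: it substitutes a keyed hash (HMAC-SHA256 from the standard library) for the ‘encryption’ step, and the function name `pseudonymise`, the field name `health_number`, the key and all records are invented.

```python
import hashlib
import hmac


def pseudonymise(records, secret_key, id_field="health_number"):
    """Replace the identifier in each record with a keyed-hash pseudonym.

    Returns (research_rows, lookup). The look-up table maps each pseudonym
    back to the original identifier and would be held separately by the
    data custodian, never released to researchers.
    """
    research_rows = []
    lookup = {}
    for rec in records:
        identifier = rec[id_field]
        # Keyed hash: the same identifier and key always give the same
        # pseudonym, so separately processed datasets remain linkable.
        pseudonym = hmac.new(secret_key, identifier.encode("utf-8"),
                             hashlib.sha256).hexdigest()
        lookup[pseudonym] = identifier
        row = {k: v for k, v in rec.items() if k != id_field}
        row["pseudonym"] = pseudonym
        research_rows.append(row)
    return research_rows, lookup


# Invented toy data: one health record and one pharmacy record per person.
key = b"custodian-held secret (assumption: managed outside the research team)"
health = [{"health_number": "0000000001", "diagnosis": "depression"}]
pharmacy = [{"health_number": "0000000001", "drug": "example-drug"}]

health_rows, health_lookup = pseudonymise(health, key)
pharmacy_rows, _ = pseudonymise(pharmacy, key)

# The two releases share a pseudonym, so they can be linked without
# exposing the underlying health number.
print(health_rows[0]["pseudonym"] == pharmacy_rows[0]["pseudonym"])
```

Because the look-up table and the key allow re-identification, both have to be governed as strictly as the original identifiers; the researcher-facing rows alone carry no direct identifier, although indirect identification through combinations of attributes remains a statistical disclosure concern.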
e-Science technologies and expertise offer two important options in the production of individual-level linked datasets of the whole patient population. Firstly, they offer infrastructural arrangements to support the storage of original data in secure but accessible environments, typically hosted by the data providers themselves (e.g. Scottish Health Informatics Programme; www.scot-ship.ac.uk), and subsequently allow limited, policy-driven access to authorised researchers for data linkage purposes, thus dramatically increasing the availability of population-linked datasets for research purposes. Secondly, they offer analysis-oriented services which support storage of pseudonymised and partially anonymised data that have been linked in secure environments, where all data providers are able to sign off on the linked, joined, anonymised datasets before final data release to the researchers. The ‘Vanguard’ system [1,2] and its application in the Data Management for the e-Social Sciences project (DAMES; www.dames.org.uk) provide one such example.
As an illustrative example, in the domain of e-Health research there is substantial scope to explore the safety of medication prescribing. This is of particular interest in under-researched vulnerable groups, such as pregnant women, people with mental health disorders and the elderly. Analysing secondary health data linked to dispensed drug data can reinforce existing prescribing guidelines or identify medication regimens associated with adverse events happening many years or decades later. Using population-level data, an estimate of additional deaths and hospitalisations associated with particular treatments can be made by diagnostic group; such estimates are likely to be of particular value where there are existing areas of uncertainty. However, the ethical considerations in re-using health data or onward linking to other datasets without consent are not trivial.
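To make the idea of estimating additional events by diagnostic group concrete, the following Python sketch computes a crude difference in event rates between treated and untreated people within each group, expressed per 1,000 people. It is a deliberately simple illustration: the function name, field names and records are invented, and a crude rate difference takes no account of confounding, which any real pharmacoepidemiological analysis would need to address.

```python
from collections import defaultdict


def excess_events_per_1000(records):
    """Crude excess events per 1,000 people (treated minus untreated),
    computed separately for each diagnostic group.

    Groups lacking either a treated or an untreated arm yield None.
    """
    # counts[group][treated] holds [number of events, number of people]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for r in records:
        cell = counts[r["diagnostic_group"]][r["treated"]]
        cell[0] += int(r["event"])
        cell[1] += 1

    result = {}
    for group, by_treatment in counts.items():
        events_t, n_t = by_treatment[True]
        events_u, n_u = by_treatment[False]
        if n_t == 0 or n_u == 0:
            result[group] = None  # no comparison possible
        else:
            result[group] = 1000 * (events_t / n_t - events_u / n_u)
    return result


def make(group, treated, n, events):
    """Build n invented records, the first `events` of which had an event."""
    return [{"diagnostic_group": group, "treated": treated, "event": i < events}
            for i in range(n)]


# Invented toy data, not real results.
toy_records = (make("A", True, 4, 2) + make("A", False, 4, 1)
               + make("B", True, 2, 0))
estimates = excess_events_per_1000(toy_records)
print(estimates)
```

With real linked data the cells for small diagnostic groups may contain very few people, which is where the statistical disclosure control issues raised earlier become relevant.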
The integrated e-Health data model using linked health and pharmacy datasets offers a complementary way of assessing medication safety and related adverse events in the whole population. It presents a pragmatic and relatively cheap way of exploring health outcomes by sub-group. Drug interactions and adverse event incidence rates are likely to vary across treated people and to differ from those seen under controlled trial conditions; they are predicted to be worse in patient-level analysis of the whole population, where ‘real’ patients with complex health and social conditions exist. The e-Health data model also offers the capability of onward individual-level linking to datasets containing information which may help unpick the underlying reasons for health inequalities. Such studies enable the tracking of the effects of prescription drugs and other interventions, as linked dataset studies offer scope to look at data collected over years (whereas clinical trials usually have relatively short observational timespans, typically a number of weeks). Moreover, linked studies of health, pharmacy and social science datasets offer scope to capture most of the total population resident in the UK receiving treatment for specific disorders, thereby minimising participation bias. e-Health research studies therefore present considerable opportunities to explore the added impact of other factors such as deprivation, where social problems and riskier behaviour can mask events. For instance, prescription drug overdoses may have as much to do with the biological disorder as with the social problems many individuals face. A particular advantage of e-Health research is the ability to periodically run repeat analyses on dynamically updated databases in order to examine the effect of policy changes after implementation.
The political will to facilitate e-Health research with population-linked data, which is overwhelmingly for public benefit, may be constrained by the views of a minority with legitimate concerns about invasion of privacy and the harm that may follow from disclosure. The principal ethical issue, that of using existing or future population data without gaining informed consent, is contentious. Researchers currently face a choice: either to do what may be perceived by a minority to be ‘wrong’, and proceed on the basis that in the presence of reasonable safeguards public good outweighs the need for individual consent; or to avoid research which does not achieve informed consent. The latter option all but rules out using population-linked data, especially on populations with limited or no capacity to provide consent (e.g. those who have developed conditions such as dementia), and research on those who may be ‘lost’ to the system through change of address or death.
The second position is unsatisfactory, and accordingly it is desirable to seek a means of working effectively with population-linked data. Custodians of relevant data, such as the NHS, have an opportunity to champion the secondary use of data to improve health services and reduce health inequalities. This can be achieved by addressing privacy concerns and demonstrating that the necessary safeguards have made accidental disclosure of personal information such a low-risk prospect that no harm could come to an individual as a result. As an academic research community, we can champion the health benefits of individual-level linked data research using sophisticated tools and fine-grained security solutions. We can argue for this research because it offers a relatively unbiased sampling method which includes those with complex social problems, those with multiple medical conditions and those at highest risk in society, who are often excluded from formal clinical research. We can also press for the inclusion of provisions in an e-research manifesto to the effect that, where research being undertaken using linked datasets is clearly for population benefit, and where fine-grained data and security systems are in place to keep data safe, the need for specific individual informed consent should be waived.