Publishing your data: the ethics question
23 July 2013
In practice, this means looking at all of those legal and policy issues which have an impact on data sharing and use, such as copyright, licensing, ethics, Gov 2.0, etc and keeping an eye on developments overseas.
ANDS is building the Australian Research Data Commons: a cohesive collection of research resources from all research institutions, to make better use of Australia’s research data outputs.
Last April, I published a short item in this blog about the importance of data citation if you are to get recognition for publishing your data and making it available to others.
There is increasing evidence that making your data available to others (or, to use a more familiar term, publishing your data) can enhance your research reputation. But consider researchers whose work involves human subjects, and who might feel that their data cannot be published. This presents something of a challenge: how to handle sensitive data so that others can use it, while following ethical guidelines and making sure that the data cannot be misused.
Data can be sensitive for a variety of reasons. Privacy considerations mean that personal, identified data cannot usually be made available to others. Security considerations might mean that you would be putting people at risk if you made some data available. You might also consider the possibility that your data describes the last remaining population of some rare species and, while most of the data might be fine to share, you don’t want anyone to put the species at risk by providing location information (think of the Wollemi Pine).
All is not lost.
The fact that your data might be sensitive does not mean that you cannot consider sharing it. However, there are ethical considerations to keep in mind, and it helps if you do your planning at the beginning of a project.
First, there is the issue of consent when dealing with personal information. This means that you need to inform participants in your study how the research data will be stored, preserved, and used in the long term. You will need this for any study of this kind. You’ll need to inform your participants how confidentiality will be maintained. You’ll need informed consent, either written or verbal, for data sharing.
Second, you may need to consider anonymising any data that you plan to share so that individuals, organisations, or businesses cannot be identified. A number of techniques can be used: removing direct identifiers such as names or addresses; aggregating variables, for example replacing date of birth with age groups; and restricting the upper or lower ranges of a variable to hide outliers (such as subjects with very high salaries or advanced ages). There is really good information from the Australian Bureau of Statistics on the finer details of how to anonymise your data.
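To make those three techniques a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a recommended procedure: the field names, the salary cap, and the age bands are all assumptions invented for the example, and real data would need the more careful treatment described in the ABS guidance.

```python
from datetime import date

# All names and thresholds below are hypothetical, chosen for illustration.
DIRECT_IDENTIFIERS = {"name", "address"}  # fields removed outright
SALARY_CAP = 150_000                      # assumed top-coding threshold
AGE_BAND_WIDTH = 10                       # assumed width of each age group
OLDEST_BAND_START = 80                    # ages 80 and over share one group


def age_on(dob: date, reference: date) -> int:
    """Return age in whole years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


def age_band(age: int) -> str:
    """Aggregate an exact age into a band, with one open-ended top band."""
    if age >= OLDEST_BAND_START:
        return f"{OLDEST_BAND_START}+"
    start = (age // AGE_BAND_WIDTH) * AGE_BAND_WIDTH
    return f"{start}-{start + AGE_BAND_WIDTH - 1}"


def anonymise(record: dict, reference: date) -> dict:
    """Apply the three techniques to one record, returning a new dict."""
    # 1. Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # 2. Replace date of birth with an age group.
    dob = out.pop("date_of_birth")
    out["age_group"] = age_band(age_on(dob, reference))
    # 3. Cap the upper range of salary so very high earners do not stand out.
    out["salary"] = min(out["salary"], SALARY_CAP)
    return out


participant = {
    "name": "Jane Citizen",
    "address": "1 Example St",
    "date_of_birth": date(1931, 5, 17),
    "salary": 420_000,
    "response": "agree",
}
result = anonymise(participant, reference=date(2013, 7, 23))
print(result)
```

In this sketch a capped salary should be read as "150,000 or more" rather than as an exact figure. Note also that steps like these reduce the risk of identification but do not guarantee it: combinations of the remaining variables can still single out an individual in a small sample.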
Anonymisation is not always required. You may, for example, have conducted oral history interviews where it is customary to publish and share the names of people interviewed, if they have given their consent.
In addition, not all data can be anonymised. Audiovisual data is very difficult to anonymise, so the effort may not be worthwhile. Consider sharing transcripts of interviews instead, or accept that there is some data which simply cannot be shared.
Third, you’ll need to consider how future users might access the data, and any conditions that you might want to apply to its use. You might, for example, want to limit access to people working in the same discipline, or wish to have an assurance that there will be no data-mining that might allow the data to be re-identified.
This is probably the trickiest aspect of sharing the data, as you need to have confidence that the data repository you choose has the capacity to keep your data secure and mediate requests for its use.
In Australia, there are several facilities that might meet your needs. The Australian Data Archive (ADA) is well set up for this purpose, as is its offshoot, the Aboriginal and Torres Strait Islander Data Archive (ATSIDA). Both can handle qualitative and quantitative data. There are other excellent facilities in other countries.
Any access conditions you want to impose can be expressed by the licence attached to the data. If the data repository you choose does not have a standard licence, there is a template available through AusGOAL, the Australian Governments Open Access and Licensing Framework.
Any plans you have to make your data available will have to be approved by your local Human Research Ethics Committee. Some members of the Committee may not be comfortable about your proposals. If they aren’t, you could tell them about one major data sharing initiative that has been a huge success in terms of making data available to improve our capacity to understand Alzheimer’s Disease.
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) was established in California in 2004. It is a longitudinal study intended to improve our understanding of the progression from normal ageing through mild cognitive impairment to Alzheimer’s disease (AD). Central to the study has been the notion that all data should be shared, enabling researchers from different countries and disciplines to take part. By March 2013, there were 3,712 approved investigators accessing the data.
Anyone seeking to make use of the de-identified data must sign an agreement, which stipulates that they will not attempt to re-identify or contact subjects involved in the study, disclose or redistribute the data, or attempt to contact ADNI principal investigators (PIs) or staff.
Australian ADNI is based on neuroimaging data from the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL), which has 12 partners around Australia, all sharing data.
So, who says it can’t be done?
There is a more detailed guide to the whole topic of ethics, consent and data sharing available via the ANDS website.
The Margaret Henty series on data management
- Data sharing in a time of data-intensive research, 4 December 2012.
- Tattoo your data, 23 April 2013.
- Publishing your data: the ethics question (this post).