Babies Learning Language: Transparency and openness is an ethical duty, for individuals and institutions

(tl;dr: I wrote an opinion piece a couple of years ago - now rejected - on the connection between ethics and open science. Rather than letting it just get even staler than it was, here it is as a blog post.)

In the past few years, journals, societies, and funders have increasingly oriented themselves towards open science reforms, which are intended to improve reproducibility and replicability. Typically, transparency policies focus on open access to publications and the sharing of data, analytic code, and other research products.

Many working scientists have a general sense that transparency is a positive value, but also have concerns about specific initiatives. For example, sharing data often carries confidentiality risks that can only be mitigated via substantial additional effort. Further, many scientists worry about personal or career consequences from being “scooped” or having errors discovered. And transparency policies sometimes require resources that are not available to researchers outside of rich institutions.

I argue below that despite these worries, scientists have an ethical duty to be open. Further, where this duty is in conflict with scientists' other responsibilities, we need to lobby our institutions – universities, journals, and funders – to mitigate the costs and risks of openness.

Scientists have an ethical duty to be open

Openness is definitional to the scientific enterprise. The sociologist Robert Merton (1942) described a set of norms that science is assumed to follow: communism – that scientific knowledge belongs to the community; universalism – that the validity of scientific results is independent of the identity of the scientists; disinterestedness – that scientists and scientific institutions act for the benefit of the overall enterprise; and organized skepticism – that scientific findings must be critically evaluated prior to acceptance. The choice to be a scientist constitutes acceptance of these norms.

For individual scientists to adhere to these norms, the products of research must be open. To contribute to the communal good, papers must be available so they can be read, evaluated, and extended. And to be subject to skeptical inquiry, experimental materials, research data, analytic code, and software must all be available so that analytic calculations can be verified and experiments can be reproduced. Otherwise, evaluators must accept arguments on the authority of the reporter rather than by virtue of the materials and data, an alternative that is inimical to the norm of universalism. For many scientists, the situation is neatly summarized by the motto of the Royal Society: “Nullius in verba,” often loosely translated as “on no one’s word”.

Beyond its centrality to science, openness also carries benefits, both to science and to scientists. Open access to the scientific literature increases the impact of publications, which in turn increases the pace of discovery. Openly accessible data increases the potential for citation and reuse, and maximizes the chances that errors are found and corrected. These benefits accrue not just to the scientific ecosystem at large but also to individual scientists, who gain via citations, media impact, collaborations, and funding opportunities.

Some responsibilities follow from these benefits. Because openness maximizes the impact of research and its products, researchers have a responsibility to their funders to pursue open practices so as to seek the maximal return on funders’ investments. And by the same logic, if research participants contribute their time to scientific projects, the researchers also owe it to these participants to maximize the impact of their contributions, as my colleague Russ Poldrack has argued.

For all of these reasons, individual scientists have a duty to be open, and scientific institutions have a duty to promote transparency in the science they support and publish.

The negatives of openness

Scientists have many other ethical duties beyond openness, however. They have obligations to their collaborators and trainees. They have committed to funders to complete specific studies. And in biomedical and social science fields, they have duties to preserve the welfare of their research participants as well. Conflicts with these duties are often the source of researchers’ hesitance to embrace openness.

Transparency policies also carry costs in terms of time and effort. For example, some routes to open access publication require authors to pay substantial publication costs (i.e., article processing charges). Organizing materials and data for sharing as well as providing support to dataset users can also be time-consuming, especially for larger datasets.

Maintaining participant confidentiality is a major source of both cost and risk for biomedical and other human subjects research. Loss of confidentiality by research participants can have big negative consequences for health, employment, and well-being. While ensuring that tabular data does not contain identifying information is often relatively straightforward, other types of data can be tricky and expensive to anonymize. For example, removing identifying information from video data requires considerable time and expertise. And certain types of dense or narrative data simply may not be de-identifiable due to aspects of the data or the participants’ identities.

Transparency can even be a source of risk – actual or perceived – to researchers themselves. Effort spent pursuing open practices may not be seen as compatible with other career incentives. For example, learning technical tools to facilitate code and data sharing could take away from time to pursue new research. Disclosure of high value datasets prior to publication could in principle lead to opportunities for “scooping” – though it turns out that there are very few documented cases of pre-emption as a result of data sharing. Finally, open sharing of research products prior to and during peer review might carry greater risk for junior researchers and for researchers from disadvantaged groups, because of their greater vulnerability to critiques or negative attention.

Individuals should consider openness as a default

In the face of competing duties as well as potential negatives to openness, what should individual researchers do? First, because of the ethical duty to openness for every scientist, open practices should be a default in cases where risks and costs are limited. For example, the vast majority of journals allow authors to post accepted manuscripts in their untypeset form to an open repository. This route to “green” open access is easy, cost free, and – because it comes only after articles are accepted for publication – confers essentially no risks of scooping. As a second example, the vast majority of analytic code can be posted as an explicit record of exactly how analyses were conducted, even if posting data is sometimes more fraught. These kinds of “incentive compatible” actions towards openness can bring researchers much of the way to a fully transparent workflow, and there is no excuse not to take them.

For some researchers, however, there will be real negatives associated with one or more open practices. If they are not aware of the positive benefits of transparency and sharing for their work and the work of their trainees, they may consider open practices only as a necessary evil, rather than as opportunities to increase citations or build a reputation. But if they recognize the potential benefits of openness, researchers can ask whether there are steps that can be taken to realize some of those benefits while mitigating risks – for example, releasing only summary, tabular data rather than raw media data, or making use of a data sharing repository with robust access control.

In some cases, researchers might decide not to share. One example of this kind of situation came up in my own work, when I was studying dense audio-video recordings of the private life of a single identified family; these data are both sensitive and impossible to de-identify. The family decided not to share these data, and I support this decision, having seen how much the data would have compromised their family's privacy – though we did make tabular data available so that statistical results could be reproduced. A second more general case is archival data without consent for sharing where recontacting participants may be impossible or impractical. These cases are relatively rare, however; it is more common that sharing simply presents some potentially mitigable costs. It is precisely in these cases that institutions should step in.

Institutions can mitigate the risks and costs of openness

Given the ethical imperative towards openness, institutions like funders, journals, and societies need to use their role to promote open practices and to mitigate potential negatives. Scholarly societies have an important role to play in educating scientists about the benefits of openness and providing resources to steer their members towards best practices for sharing their publications and other research products. Similarly, journals can set good defaults, for example by requiring data and code sharing except in cases where a strong justification is given (equivalent to adopting the second highest level in the Transparency and Openness Promotion guidelines). I don't think the TOP guidelines are perfect, but I'm not sure why in this case we'd let the perfect be the enemy of the good.

Departments and research institutes can also signal their interest in open practices in job advertisements and tenure/promotion guidelines. We did this the last time we had a search at Stanford Psych and it signaled our department's general interest in these practices, leading to some good conversations with candidates (and letting us notice explicitly if candidates weren't as interested as we were). In addition, by structuring graduate programs to provide training in tools and methods for data and code sharing, departments can educate grad students about producing reproducible and replicable research – this has been my hobby horse for quite a while (see here and here).

Institutional funders of research play the most important role, however. Most funders already signal an interest in openness through a required data management plan or similar document, and some (like the US NIH) mandate data sharing to the extent permissible given other regulatory constraints (e.g., institutional review, health or data privacy laws). These requirements, though laudable, don't really change the scientific incentives at play. Data sharing should not just be required: It should also be treated as part of the scientific merit of an application. Creating a sufficiently high value dataset should be itself meritorious enough to warrant funding. And on the opposite side of the calculus, funders should signal their willingness to support the effort required to mitigate data sharing costs. For example, this could take the form of extra budget supplements explicitly tied to sharing activities.

More generally, funders and other institutional stakeholders need to act to change the incentive structure for individuals. For example, funding agencies could make it a priority to invest in creating technical tools and practice guidelines for human subject data anonymization. A small RFP for these could create huge value, making it much more straightforward to participate in data sharing.

Conclusion

Both advocates and critics of open practices often appear to be arguing about the merits of radical transparency, but this goal is often not achievable. Instead, individual researchers and institutions should proceed from both an understanding of the benefits of openness and an appreciation of the ethical duty to be open. These starting points lead naturally to a set of practices that are open by default, with exceptions in case of specific risks.

When individual researchers can't mitigate the costs associated with openness, responsibility falls to institutional actors in the scientific ecosystem to help. We can all do our part in this by lobbying our journals, scientific societies, institutions, and funders to support researchers in making the right decisions around transparency.

Monday, February 8, 2021
