Friday, March 30, 2018

Open science: do we need that? And if so: how can we get started?

Recently, together with the Young Academy of Groningen, I organized a workshop on Open Science and Reproducibility. What is open science? Isn't science supposed to be open anyway? Keynote speaker Simine Vazire showed us how this is not always the case. She took us back to the fundamental ideals of science, sharing Merton's norms, which distinguish science from other forms of knowing. Science has a sense of universalism, in which the validity of a scientific claim does not depend on who is making it--there are no arguments from authority. Another ideal is "communality"--scientific findings belong to everyone and everyone can check them. Science should also be disinterested--all results should be reported without bias, and scientists should not withhold findings that are unfavorable to them. Finally, nothing is sacred in science, and all claims should be tested. But is this really how science proceeds in practice? How many scientific findings can really be checked by everyone? How many scientists are really unbiased? Studies show that even scientists themselves think that most science does not adhere to these ideals. So, science needs to become more open, such that it becomes easier for everyone to check scientific claims.
[Photo: keynote by Simine Vazire]

A major problem in science is the emphasis on significant results as a precondition for publication. Unfortunately, it is quite easy to obtain significant results with enough p-hacking (trying out many different tests on different subsets of your data until something comes out significant) and HARKing (hypothesizing after the results are known--presenting the obtained significant results as the original hypothesis). Probably as a result of the prevalence of these practices, many studies do not replicate, which was most clearly shown in large-scale replication attempts (e.g., the Reproducibility Project).
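To make the p-hacking problem concrete, here is a minimal simulation sketch in R (my own toy example, not material from the workshop): even when there is no true effect at all, testing a few arbitrary subsets of the same data and keeping the best p-value makes "significant" findings far more likely than the nominal 5%.

```
# How p-hacking inflates false positives.
# There is NO true effect; we test five arbitrary subsets and keep the smallest p.
set.seed(1)
n_sims <- 2000
hacked_p <- replicate(n_sims, {
  x     <- rnorm(100)                 # outcome: pure noise
  group <- rep(c("a", "b"), 50)       # two groups with no real difference
  covar <- sample(1:5, 100, TRUE)     # an arbitrary variable to subset on
  ps <- sapply(1:5, function(k) {
    keep <- covar != k                # each "analysis" drops a different subset
    t.test(x[keep] ~ group[keep])$p.value
  })
  min(ps)                             # report only the best-looking result
})
mean(hacked_p < .05)  # well above the nominal .05 false-positive rate
```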
[Photo: here I open the workshop]

So how can we improve? A good first step would be to share all materials and data so that others can check them. A good resource recommended here was the Open Science Framework; Candice Morey, for example, keeps all her materials and data from various projects there. However, during the data management panel, the Research Data Management Office mentioned that this does not comply with the new European privacy regulation (the GDPR). Better options would be to work with, for example, Dataverse.nl. Moreover, it is important to think carefully about how you de-identify your data, because with current machine learning algorithms it is surprisingly easy to re-identify someone by combining a few different data sources.

Another challenge in opening up your data is that you may not remember the connection between all your graphs and the raw data, or you may feel your data analysis scripts are too messy. Laura Bringmann shared some knowledge about R Markdown, which allows you to seamlessly interleave narrative text, analysis code, and the resulting output in a single document, avoiding the need to have code, graphs, and data live in different places. This also makes revisions much easier, because you can exactly reproduce the original analyses that led to specific numbers and plots. Of course, even if you decide to open your data, many others may not do so. One practical step individuals can take to enhance openness in science is to join the Peer Reviewers' Openness Initiative, in which you pledge to only review articles that make their data open (or provide a good reason why they cannot do so). Another way in which openness can be improved is if universities consider the extent to which individuals make their data and materials open in hiring and promotion decisions.
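To give a flavor of what this looks like, here is a minimal R Markdown sketch (a toy example of my own, not one from the workshop; the file name and variable names are hypothetical). Knitting the file regenerates every number and figure directly from the raw data:

````
---
title: "Minimal reproducible report"
output: html_document
---

The numbers and the figure below are regenerated from the raw data
every time this file is knitted, so text, code, and results cannot
drift apart.

```{r analysis}
data <- read.csv("raw_data.csv")  # hypothetical raw data file
summary(lm(score ~ condition, data = data))
```

```{r figure}
boxplot(score ~ condition, data = data)
```
````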
[Photo: Candice Morey talking about pre-registration]

In addition, it helps to promote open science to formalize your hypotheses and deposit them somewhere before you collect the data. This procedure is called pre-registration, a topic also discussed in a keynote by Candice Morey. Pre-registration can be done quite easily on the Open Science Framework. Another interesting option is to write up your hypotheses in the Registered Report format (offered by more and more journals), in which reviewers decide on acceptance based on your introduction and methods before you collect the data; you are then guaranteed acceptance (in principle), independent of how your results turn out. Of course, academic incentives should also change to promote this: rewarding these research practices instead of high-impact publications.
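One concrete way to formalize a hypothesis is to write the planned confirmatory analysis as a script before any data exist and deposit it alongside the pre-registration. A hypothetical sketch in R (the file, variable, and group names are my own inventions):

```
# Pre-registered analysis plan, written and deposited BEFORE data collection.
# Hypothesis: the intervention group scores higher than the control group.
# Planned sample: n = 60 per group; planned test: Welch's t-test, alpha = .05.
run_confirmatory_analysis <- function(data_file) {
  data <- read.csv(data_file)  # hypothetical file produced by later data collection
  # Fix the factor order so alternative = "greater" means intervention > control
  data$group <- factor(data$group, levels = c("intervention", "control"))
  t.test(score ~ group, data = data, alternative = "greater")
}

# After data collection, this exact script is run once, unchanged:
# run_confirmatory_analysis("study1_data.csv")
```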

A further step forward would be to stop overselling our results and to better understand statistics. Rink Hoekstra talked about common misunderstandings of statistics. Most notably, almost everyone's intuitions about p-values are wrong. A p-value can never tell you that your statistical hypothesis is true; it only provides some evidence against a null hypothesis, and it always carries a certain level of uncertainty. It is therefore never possible to make very definite claims about your data, unlike what journals, and even more so the media, want. A very insightful visualization of how little p-values really mean is the Dance of the P-values. Instead of relying blindly on p-values, it is critical to focus more on visualizing your data, for which Gert Stulp provided some useful resources.
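The "dance" is easy to see for yourself. Below is a small simulation in the spirit of Geoff Cumming's demonstration (my own sketch, not his code): the exact same experiment, repeated with a real, medium-sized effect, produces p-values that jump all over the place from one replication to the next.

```
# The "dance of the p-values": identical experiments, wildly varying p-values.
set.seed(42)
p_values <- replicate(25, {
  control   <- rnorm(32, mean = 0,   sd = 1)  # same design every replication
  treatment <- rnorm(32, mean = 0.5, sd = 1)  # a real, medium-sized effect
  t.test(treatment, control)$p.value
})
round(p_values, 3)    # ranges from < .001 to well above .05
mean(p_values < .05)  # roughly half the replications come out "significant"
```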
[Photo: the data management panel]

In short, there is still a long way to go to open up your science, but more and more resources are available. The full slides and materials of the meeting can be found here. You can also check out the hashtag #RUGopenScience.