
Friday, March 30, 2018

Open science: do we need that? And if so: how can we get started?

Recently, I organized a workshop on Open Science and Reproducibility with the Young Academy of Groningen. What is open science? Isn't science supposed to be open anyway? Keynote speaker Simine Vazire showed us that this is not always the case. She took us back to the fundamental ideals of science, sharing Merton's norms, which distinguish science from other forms of knowing. Science has a sense of universalism: the validity of a scientific claim does not depend on who is making it--there are no arguments from authority. Another ideal is "communality"--scientific findings belong to everyone, and everyone can check them. Science should also be disinterested--all results should be reported without bias, and scientists should not withhold findings that are unfavorable to them. Finally, nothing is sacred in science, and all claims should be tested. But is this really how science proceeds in practice? How many scientific findings can really be checked by everyone? How many scientists are truly unbiased? Studies show that even scientists think most science does not live up to these ideals. So science needs to become more open, such that it is easier for everyone to check scientific claims.
keynote by Simine Vazire

A major problem in science is the emphasis on significant results as a precondition for publication. Unfortunately, it is quite easy to obtain significant results with enough p-hacking (trying out many different tests on different subsets of your data) and HARKing (hypothesizing after results are known--presenting the obtained significant results as the original hypothesis). Probably as a result of how common these practices are, many studies fail to replicate, as was most clearly shown in large-scale replication attempts (e.g., the Reproducibility Project).
Here I open the workshop

So how can we improve? A good first step would be to share all materials and data so that others can check them. A good resource recommended here was the Open Science Framework; Candice Morey, for example, has the materials and data from various projects there. However, during the data management panel, the Research Data Management Office mentioned that this does not comply with the new European privacy regulations; a better option would be to work with, for example, Dataverse.nl. Moreover, it is important to think carefully about how you de-identify your data, because with current machine learning algorithms it is surprisingly easy to identify someone's data by combining a few different sources.

Another challenge in opening up your data is that you may not remember the connection between all your graphs and the raw data, or you may feel your data analysis scripts are too messy. Laura Bringmann shared some knowledge about R Markdown, which lets you seamlessly integrate your write-up with your analysis code, so that code, graphs, and data no longer live in different places. This also makes revisions much easier, because you can readily reproduce the original analyses that led to specific numbers and plots. Of course, even if you decide to open your data, many others may not do so. One practical step individuals can take to enhance openness in science is to join the Peer Reviewers' Openness Initiative, in which you pledge to only review articles that make their data open (or provide a good reason why they cannot). Openness could also be improved if universities considered the extent to which candidates make their data and materials open in hiring and promotion decisions.
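The underlying principle can be sketched even without R Markdown: keep one script that goes from the raw data file all the way to the reported numbers, so a revision just means re-running it. Here is a minimal Python sketch of that idea (the file name and "score" column are hypothetical, purely for illustration):

```python
import csv
import pathlib
import statistics

# Hypothetical raw data file; the "score" column is assumed for illustration
DATA = pathlib.Path("data/raw_responses.csv")

def load_scores(path):
    """Read the raw data; every reported number traces back to this file."""
    with open(path, newline="") as f:
        return [float(row["score"]) for row in csv.DictReader(f)]

def summarize(scores):
    """Compute the descriptives that would appear in the write-up."""
    return {"n": len(scores),
            "mean": round(statistics.mean(scores), 2),
            "sd": round(statistics.stdev(scores), 2)}

if __name__ == "__main__" and DATA.exists():
    # Re-running this one script regenerates the reported statistics
    print(summarize(load_scores(DATA)))
```

Because nothing is copied by hand, reviewers (and your future self) can regenerate every number from the raw data in one step; R Markdown extends this same pipeline to include the prose and figures.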
Candice Morey talking about pre-registration

In addition, what helps to promote open science is to formalize your hypotheses and deposit them somewhere before you collect the data. This procedure is called pre-registration, a topic also discussed in the keynote by Candice Morey. Pre-registration can be done quite easily on the Open Science Framework. Another interesting option is to write up your hypotheses as a Registered Report (a format offered by more and more journals), in which reviewers decide on acceptance based on your introduction and methods before you collect the data; you are then (in principle) guaranteed acceptance, no matter how your results turn out. Of course, academic incentives should also change to promote this: rewarding these research practices rather than high-impact publications.

A further step would be to stop overselling our results and to better understand statistics. Rink Hoekstra talked about common misunderstandings of statistics. Most notably, almost everyone's intuitions about p-values are wrong. A p-value can never tell you that your statistical hypothesis is true; it only provides some evidence against a null hypothesis, and it always carries a certain level of uncertainty. It is therefore never possible to make very definite claims about your data, unlike what journals, and even more so the media, want. A very insightful visualization of how little p-values really mean is the Dance of the p-Values. Instead of relying blindly on p-values, it is critical to focus more on visualizing your data, for which Gert Stulp provided some useful resources.
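The "dance" is easy to see for yourself: simulate the exact same experiment many times, with a real effect and a reasonable sample size, and watch the p-value jump around from run to run. A rough Python sketch (using a normal approximation rather than a proper t-test, so the numbers are illustrative only; sample sizes and effect size are my own choices):

```python
import random
from statistics import NormalDist, mean, stdev

def approx_p_value(a, b):
    """Two-sided p-value for a two-sample comparison,
    using a normal approximation to the t distribution."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(b) - mean(a)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(2018)
p_values = []
for _ in range(20):  # 20 replications of the exact same study
    control = [rng.gauss(0.0, 1.0) for _ in range(32)]
    treatment = [rng.gauss(0.5, 1.0) for _ in range(32)]  # true effect d = 0.5
    p_values.append(approx_p_value(control, treatment))

print(sorted(round(p, 3) for p in p_values))
```

Even though every run samples from the very same populations, the p-values typically scatter from well below .05 to well above it, which is exactly the point the Dance of the p-Values demonstration makes.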
The data management panel

In short, there is still a long way to go to open up your science, but more and more resources are available. The full slides and materials of the meeting can be found here. You can also check out the hashtag #RUGopenScience.


Monday, November 27, 2017

Invisible scientists and the messiness of science: a discussion about open science

Today, with the Young Academy of Groningen, we hosted a visit by Rosanne Hertzberger. The theme of the afternoon was "open science", and I heard some soundbites that were too good not to share. Rosanne is a very passionate and courageous person who decided to pioneer being a freelance scientist. She started by saying that we as scientists at the university are unaware of how invisible we are. Why? Because we write lots of stuff that gets put in journals behind a paywall, we talk about our science at conferences that only scientists attend, and we tend not to talk to the public (because we're too busy writing our papers). Good point. Sometimes I feel like the university considers me to be a little hamster running faster and faster in the paper-producing wheel.

She also talked about how science is the only profession you cannot do as an amateur--you have to be the equivalent of an Olympic athlete, or nothing at all. But why do you have to dedicate yourself fully to science? Why is it frowned upon to have a significant other interest (in her case: writing articles, books, and columns)? I sometimes feel that too: why do people think it is so crazy to be a serious amateur ballet dancer as well as a scientist? (See the inspiring quantum physicist Merritt Moore, or my own attempts at dancing and sciencing here.) We had a discussion about the extent to which "everyone" can do science (cf. the citizen science movement), but Rosanne countered that many people who get a PhD never get the opportunity to continue in science because there are so few jobs. And someone else asked: are we even that special as scientists?

Probably one of the most important discussions revolved around the issue of invisibility. Rosanne said, "it's very disappointing to see how little openness social media has brought to science. Why is live-tweeting a conference talk still a thing?" In other words, why do scientists not share their talks on YouTube? (See, for one example to the contrary, Richard Morey's Periscope broadcasts, or the Lab Scribbles open lab notebook.) Why don't scientists share their intermediate results on Twitter (while we do see pictures of their kids or cats)? We also discussed the benefits of peer review; Rosanne posited that it holds us back, because there is too little communication between scientists, in the heat of the process, about things that work and things that don't. This makes progress very slow, which is particularly problematic in the case of diseases and epidemics.

I think one other very important point was that in our communication to the public, and in our textbooks, science looks very clean and shiny, while it is quite messy when you are in the middle of it. Why don't we share our mess online? Rosanne: "it should be standard procedure to overshare. There is no such thing as TMI in science." There was some debate about whether this would result in us all drowning in information, but Rosanne argued that a mechanism like Reddit's would easily allow us to manage this.

A final remark that I really liked was "aren't we reproducing each other's work all the time? It's called scooping." Good point. We also discussed incentives in science quite a bit. Sharing results and materials takes a lot of time for little reward, but it is what will make science progress much faster. There are probably also many things we can learn from talking to people in other fields: in our discussion we learned that, for example, in informatics producing reproducible code is standard practice, while in psychology even sharing questionnaires is not.

In short: a lot of work needs to be done, and sharing more of our scientific mess, materials, intermediate data, and so on would probably be a good idea. To be continued!