Welcome to the Advanced Data Skills, Open Science and Reproducibility M.Res. unit BIOL63101. The following video (as with all the others in this unit) is best viewed in fullscreen mode. I have recorded the audio with a podcasting microphone, so it’s best listened to with headphones. YouTube generates subtitles automatically, so please turn those in if you’d find them useful.

   
  

Workshop 1

Open Research and Reproducibility

In this workshop I will first introduce you to the key concepts in open research, and talk about the so-called “replication crisis” in the Psychological, Biomedical, and Life Sciences that has resulted in the Open Research movement I will also discuss the importance of adopting reproducible research practices in your own research, and provide an introduction to various tools and processes you can incorporate into your own research workflows that will allow you to conduct reproducible research. To go to the first part of the workshop, just click on the image below.

 

   

Experimental Power

The second part of this workshop covers experimental power (and why it is important). One of the insights revealed by the “replication crisis” is that very often research is underpowered for the effect size of interest (i.e., even if the effect is there, your experiment is unlikely to find it). Many of the issues stem from researchers not spending sufficient time considering the power aspects of their research design. In this workshop, I’ll provide you with an overview of some of the issues - just click on the image below to view this second part of the workshop:

 

   

Open Source Software

The third part of the workshop involves a very brief overview of open source software, the use of which is arguably key for researchers to be able to adopt open and reproducible research workflows. To view this third part, just click on the image below.

 

   

  

Workshop 2

Starting with R and RStudio Desktop

In this workshop I’ll introduce you to R (the language) and RStudio Desktop (the environment we use to interact with the language). I’ve also added a link to a great talk by the founder of RStudio, J.J. Alaire. At the end of the workshop I have put together a video which will show you how to run your first R script.

 

   

  

Workshop 3

Data Wrangling

In this workshop I will introduce you to a number of key packages known as the Tidyverse These packages contain a large number of functions for working with data in tidy format. By making our data wrangling reproducible (i.e., by coding it in R), we can easily re-run this stage of our analysis pipeline as new data gets added. Reproducibility of the data wrangling stage is a key part of the analysis process and often gets overlooked in terms of needing to ensure it is reproducible. There are two parts to this workshop. The first focuses on data wrangling/tidying. To go to this first part, just click on the image below.

 

 

  

Summarising Your Data

Once you have completed this first part and have your R script up and running, click on the image below for the second part where you’ll learn how to aggregate and summarise your data.

 

   

  

Workshop 4

Data Visualisation

In this workshop we will explore the basics of Data Visualization using R. You’ll have the opportunity to write an R script on your own computer that will generate some nice data visualisations. Just click on the image below to start.

 

 

  

Workshop 5

R Markdown

In this workshop I will show you how to generate a report in .html format using R Markdown. Reports written using R Markdown allow you to combine narrative that you’ve written alongwith R code chunks, and the output associated with those code chunks all in one knitted document. The assignments for this unit need to be produced using R Markdown.

 

 

  

Workshop 6

Regression Part 1

In this workshop we will explore Simple Regression in the context of the General Linear Model (GLM). You will also have the opportunity to build some regression models where you predict an outcome variable on the basis of one predictor. You will also learn how to run model diagnostics to ensure you are not violating any key assumptions of regression.

 

 

  

Workshop 7

Regression Part 2

In this workshop will explore Multiple Regression in the context of the General Linear Model (GLM). Multiple Regressions builds on Simple Regression, except that having one predictor (as is the case with Simple Regression) we will be dealing with multiple predictors. Again, you will have the opporunity to build some regression models and use various methods to decide which one is ‘best’. You will also learn how to run model disagnostics for these models as you did in the case of Simple Regression.

 

 

  

Workshop 8

ANOVA Part 1

In this workshop we will explore Analysis of Variance (ANOVA) in the context of model building in R for between participants designs, repeated measures designs, and factorial designs. You will learn how to use the {afex} package for building models with Type III Sums of Squares, and the {emmeans} package to conduct follow up tests to explore main effects and interactions.

 

 

  

Workshop 9

ANOVA Part 2

In this workshop we will build on Workshop 8 to explore Analysis of Covariance (ANCOVA). In this workshop we will also examine ANOVA and ANCOVA as special cases of regression and see how we can build both via a linear model. By then doing this yourselves, you wil hopefully be convinced that ANOVA and regression are really the same thing.

 

 

  

Workshop 10

This workshop will introduce you to Binder, a tool for allowing you to reproduce your computational environment, as well as your code and data, and share all of these with a collaborator via sharing a simple web link.

 

 

  

Technical Details

The structure for this unit was very much inspired by the Sharing At Short Notice webinar by Alison Hill and Desirée De Leon.

The repo for each workshop can be accessed via the ‘Improve this Workshop’ link at the bottom of each workshop page. The workshops and this website were all written using R Markdown and the website is hosted on Netlify via continunous deployment from this GitHub repository.

The source code for each of the Workshops above is licensed under the MIT license, and the lecture content under CC-BY.