Quick tips for making reproducible code in R

Lara Southard, PhD
3 min read · Feb 10, 2021


Making your code reproducible should (at minimum) be the last step before closing out a project. You want someone (especially your future self) to be able to pick up your code and read it quickly and efficiently. Typical advice for reproducible code centers on good commenting and formatting, but I'm going to go into detail on a few tips that have made my projects much cleaner.


set.seed()

Any time you run code that involves a random process, set your seed. The seed can be any number; most people choose birthdates, anniversaries, or favorite numbers, and I usually go four digits out. Setting a seed lets anyone run your code and get the same results.
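A minimal sketch of what this looks like: seed once, then run your random process. The specific seed value (1234) and the sample() call are just illustrations.

```r
set.seed(1234)          # any fixed number works; pick one and keep it
draw1 <- sample(1:100, 5)

set.seed(1234)          # same seed again...
draw2 <- sample(1:100, 5)

identical(draw1, draw2) # ...gives the identical draw every time
```

Without the second set.seed() call, draw2 would continue from the random-number stream and (almost certainly) differ from draw1.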


paste() and paste0()

Most human errors come from copying and pasting values, or from typos. R has some elegant ways to avoid this that I've found myself using regularly. Not only does leveraging these functions help us avoid typos, but it also makes your code more reproducible.

I typically wrap arithmetic functions and numbers in paste0() because it does a better job of handling NAs and spacing around numbers.


iris %>% 
  group_by(Species) %>%
  mutate(group = paste(str_to_title(Species), ":",
                       " Average petal width is ",
                       format(round(mean(Petal.Width),
                                    digits = 1),
                              big.mark = ",",
                              trim = TRUE),
                       sep = ""))

Fun little note: stringr::str_to_title() capitalizes the first letter of each word in a string.
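For reference, the separator behavior that distinguishes the two functions (base R only, no packages needed):

```r
paste("Sepal", "Width")            # "Sepal Width"  (default sep is one space)
paste("Sepal", "Width", sep = "")  # "SepalWidth"
paste0("Sepal", "Width")           # "SepalWidth"   (paste0 is paste with sep = "")
```

If you find yourself writing sep = "" in every paste() call, paste0() is the cleaner habit.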

pull() in RMarkdown

I often leverage RMarkdown's functionality to knit documents to PDF or HTML when creating reports. dplyr::pull() extracts a single column as a bare vector, which makes it easy to drop a computed value straight into an inline code chunk.

In RMarkdown:

`r format(round(iris %>% 
     select(Species, Petal.Width) %>%
     group_by(Species) %>%
     summarise(avg.width = mean(Petal.Width)) %>%
     ungroup() %>%
     filter(Species == "setosa") %>%
     pull(avg.width), digits = 3),
   big.mark = ",")`

You can always test this by just running:

iris %>% 
  select(Species, Petal.Width) %>%
  group_by(Species) %>%
  summarise(avg.width = mean(Petal.Width)) %>%
  ungroup() %>%
  filter(Species == "setosa") %>%
  pull(avg.width)

Either way, you’ll get [1] 0.246 (or the same number rounded based on your digits = argument in round()).
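A compact version of the same check, which also shows why pull() matters here: it returns a bare numeric vector rather than a one-column dataframe, so the value can be formatted directly. The variable name avg is my own choice for illustration.

```r
library(dplyr)

avg <- iris %>%
  filter(Species == "setosa") %>%
  summarise(avg.width = mean(Petal.Width)) %>%
  pull(avg.width)   # a bare numeric vector, not a one-column dataframe

round(avg, 3)       # 0.246
```

Had we used select(avg.width) instead of pull(avg.width), round() would be handed a dataframe, which doesn't print cleanly inside an inline chunk.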

Reduce dataframes

When you’re done with your analysis, you’ll want to reduce the number of objects/dataframes you have saved. This way if you need to go back and update anything, you only have 1 or 2 dataframes to update instead of a whole mess of them. I usually do this in a few ways.
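One way to do that cleanup (my suggestion, not necessarily the author's exact workflow) is to drop everything from the session except the dataframe(s) you want to keep. The name main_df is hypothetical:

```r
main_df  <- iris        # the one general-purpose dataframe everything derives from
scratch1 <- head(iris)  # intermediate objects you no longer need
scratch2 <- tail(iris)

# Remove every object except main_df
rm(list = setdiff(ls(), "main_df"))
ls()  # in a fresh session, only "main_df" remains
```

This pairs well with restarting R and re-running the script top to bottom, which is the real test of reproducibility.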

Visualizations all link to a general dataframe

I try to have all of my visualizations pull straight from my main dataframe. All filters and changes to the data are made right before I pipe into ggplot(). I usually use dplyr::select() in this process so I'm not taking up a lot of computation space. I like this method for two reasons: 1) I can see exactly how the underlying data for the figure is structured, and 2) I can see any data nuances (filtering, variables, etc.).

Here’s just a little example that doesn’t make a lot of theoretical sense, but shows my goal:

iris %>% 
  select(Species, Petal.Width) %>%  # reduce down to just what's needed
  filter(Petal.Width > 1) %>%       # some filtering criteria
  group_by(Species, Petal.Width) %>%
  summarise(count = n()) %>%        # one way of many to get a count
  ungroup() %>%
  # Now plot!
  ggplot(aes(x = Petal.Width, y = count, fill = Species)) +
  geom_col(position = position_dodge())

Models and analysis

I leverage %>% in the data = argument of various model functions in R (e.g. glm(), lmer(), etc.). Here's a crude example:

glm(Sepal.Length ~ Species, 
    data = iris %>%
      filter(Petal.Width > 0))

I like this for a similar reason as to above. I can see exactly what’s included in the model right there in the code and I don’t have to go searching for how the dataframe was created.

These are just a few gotchas that have made me like past versions of myself much more than I used to. If you have any other helpful ideas, please drop them in the comment section!
