Quick tips for making reproducible code in R

Making your code reproducible should be, at minimum, the last step before you close out a project. You want someone (especially your future self) to be able to pick up your code and read it quickly and easily. Typical advice on reproducible code centers on good commenting and formatting, but here I’ll go into detail on a few tips that have made my code much cleaner.

set.seed()

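Any time your code uses random numbers (sampling rows, splitting data into training and test sets, running simulations), call set.seed() at the top of the script so the same “random” results come out every time it runs:

set.seed(42)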
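A minimal sketch of why this matters, using base R’s sample() (the variable names are just for illustration): resetting the same seed before each draw gives identical “random” samples.

```r
# Same seed before each draw -> identical "random" samples
set.seed(42)
first_draw <- sample(1:150, 5)   # e.g., pick 5 row numbers of iris

set.seed(42)
second_draw <- sample(1:150, 5)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the next call continues the random stream,
# so it will generally give a different sample
third_draw <- sample(1:150, 5)
```

One caveat worth knowing: R 3.6.0 changed the default algorithm behind sample(), so the same seed can give different draws on R versions before and after that release.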

paste() and paste0()

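I typically wrap numbers (and the results of arithmetic functions like mean() and round()) in paste0() because, unlike paste(), it doesn’t insert a separator between its pieces, so I control exactly where the spaces go around a number.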

Example:

library(dplyr)

iris %>%
  group_by(Species) %>%
  mutate(group = paste(Species, "Average Petal Width",
                       # paste0() glues "(", the formatted mean and ")" with no spaces
                       paste0("(",
                              format(round(mean(Petal.Width), digits = 1),
                                     big.mark = ",", trim = TRUE),
                              ")"))) %>%
  distinct(group)
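To see the spacing difference on its own, here is a small sketch (the variable names are hypothetical, chosen for illustration). It also shows a related gotcha: paste0() has no sep argument, so a sep = passed to it is treated as one more string to glue on.

```r
# Overall average petal width, rounded to one decimal (1.2)
avg <- round(mean(iris$Petal.Width), digits = 1)

with_paste  <- paste("(", avg, ")")   # "( 1.2 )": a space between every piece
with_paste0 <- paste0("(", avg, ")")  # "(1.2)": nothing between pieces

# paste0() has no sep argument, so sep = " " is pasted on as another string
with_stray_sep <- paste0("(", avg, ")", sep = " ")  # "(1.2) " with a trailing space
```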

pull() in RMarkdown

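In an RMarkdown document, you can use pull() inside inline R code to drop a single computed value straight into your text. pull() turns the one-cell result into a plain number that round() and format() can handle and that prints cleanly in a sentence (this assumes dplyr was loaded in an earlier chunk):

`r format(round(iris %>%
  select(Species, Petal.Width) %>%
  group_by(Species) %>%
  summarise(avg.width = mean(Petal.Width)) %>%
  ungroup() %>%
  filter(Species == "setosa") %>%
  pull(avg.width), digits = 3),
  big.mark = ",")`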

To check the value before knitting, run the same pipeline in the console:

library(dplyr)

iris %>%
  select(Species, Petal.Width) %>%
  group_by(Species) %>%
  summarise(avg.width = mean(Petal.Width)) %>%
  ungroup() %>%
  filter(Species == "setosa") %>%
  pull(avg.width)

In the console you’ll see [1] 0.246; in the knitted document, the inline code renders as plain text, 0.246 (rounded according to the digits argument you pass to round()).
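If the inline expression gets long, one alternative (a style choice, not something the approach above requires) is to compute the value in a regular code chunk and keep the inline code short. A sketch, where setosa_width is a hypothetical name:

```r
library(dplyr)

# In a code chunk: compute the number once and give it a name
setosa_width <- iris %>%
  filter(Species == "setosa") %>%
  summarise(avg.width = mean(Petal.Width)) %>%
  pull(avg.width)

# The inline code then only needs to format it, e.g.
# `r format(round(setosa_width, digits = 3), big.mark = ",")`
format(round(setosa_width, digits = 3), big.mark = ",")  # "0.246"
```

The calculation stays easy to run and test in the console, while the sentence that reports the number stays short.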

Reduce dataframes

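Instead of saving a new intermediate dataframe for every plot, I have all my visualizations start from one general dataframe and reduce it inline, right before plotting. That way, everything a plot depends on is visible in one place.

Here’s a small example that doesn’t make much analytical sense but shows the idea: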

library(dplyr)
library(ggplot2)

iris %>%
  select(Species, Petal.Width) %>%     # reduce down to just what's needed
  filter(Petal.Width > 1) %>%          # some filtering criteria
  group_by(Species, Petal.Width) %>%
  summarise(count = n()) %>%           # one way of many to get a count
  ungroup() %>%
  # Now plot!
  ggplot(aes(x = Petal.Width, y = count, fill = Species)) +
  geom_col(position = position_dodge())
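The pipeline’s comment notes that summarise(count = n()) is one of many ways to get a count. A sketch of another, dplyr’s count(), which groups, tallies and ungroups in one step (name = "count" keeps the column name the plot expects; the object names are just for illustration):

```r
library(dplyr)

# The grouped-summarise approach
via_summarise <- iris %>%
  filter(Petal.Width > 1) %>%
  group_by(Species, Petal.Width) %>%
  summarise(count = n(), .groups = "drop")

# The same tally with count()
via_count <- iris %>%
  filter(Petal.Width > 1) %>%
  count(Species, Petal.Width, name = "count")
```

Either result has the columns Species, Petal.Width and count, so it can be piped into the same ggplot() call.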

Models and analysis

library(dplyr)

glm(Sepal.Length ~ Species,
    data = iris %>%
      filter(Petal.Width > 0))

I like this for the same reason as above: I can see exactly what’s included in the model right there in the code, and I don’t have to go searching for how the dataframe was created.
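A related benefit, shown in a small sketch (the 0.2 cut-off is arbitrary, chosen only so the filter actually drops some rows): glm() stores the data it was given in the fitted object, so the filtered rows travel with the model and can be checked later.

```r
library(dplyr)

model <- glm(Sepal.Length ~ Species,
             data = iris %>% filter(Petal.Width > 0.2))

nobs(model)                # number of rows the model was fit on
table(model$data$Species)  # glm() keeps its data argument in model$data
```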

These are just a few gotchas that have made me like past versions of myself a lot more. If you have other helpful ideas, please drop them in the comments!

trained neuroscientist | professional data scientist | lifelong feminist