# Code for the first bar plot
library(simstudy)
library(ggsankey); library(patchwork)
library(thekidsbiostats) # install with remotes::install_github("The-Kids-Biostats/thekidsbiostats")
library(tidyverse)       # dplyr, stringr and ggplot2 are used below

# Note: 'max_prop' (the upper y-axis limit) is assumed to be computed beforehand;
# see the ordinal_plot() function later in the post for how it is calculated.
p1 <- dat %>%
  filter(group == "Intervention") %>%
  group_by(pre) %>%
  tally() %>%
  mutate(freq = n / sum(n),
         res = str_c(n, "\n(", round(freq*100, 1), "%)")) %>%
  ggplot(aes(x = as.factor(pre), y = freq)) +
  geom_bar(aes(fill = as.factor(pre)), stat = "identity", alpha = 0.8,
           colour = "black") +
  theme_institute(base_size = 14) +
  theme(legend.position = "none",
        panel.grid.major.x = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(),
                     breaks = seq(0, max_prop, by = 0.05),
                     expand = expansion(mult = c(0, 0.1))) +
  coord_cartesian(ylim = c(0, max_prop)) +
  scale_fill_viridis_d(option = "plasma", end = 0.85, direction = -1) +
  labs(title = "Pre",
       fill = "Response",
       x = "Response", y = "") +
  geom_text(aes(label = res), vjust = -0.1,
            family = "Barlow Semi Condensed") +
  guides(fill = guide_legend(nrow = 1))
Overview
(Lengthy) readable code. Efficient functions composed of the minimum number of characters necessary. An age-old battle in programming.
Way back when there were only two users on Bluesky, I linked to one of our earlier blog posts where we presented a nice way to visualise pre-post ordinal Likert data.
The only other user on Bluesky at the time commented:
All of the points raised were correct. It was a good post, but there was a lot of repetition in the code. Which got me thinking about that age-old battle in programming…
The case for code
There are (always) a lot of people out there learning to use R, and a lot of them love experimenting with the great visualisation tools R has to offer. And many of those people are still learning to be code savvy. I know this because I cross paths with this type of user multiple times a week: copying and pasting examples from blogs (like this one) and/or using trial and error to learn. It is this large cohort of users that one can appeal to by sharing verbose code.
Some benefits of posting code:
- If one encounters an error, it is easy for someone still learning to run the code in stages to identify where the problem may be (see the short sketch after this list),
- If one wants to make minor changes (colours, line thickness, alpha etc.), they can easily see where to dive in and do so,
- It is crystal clear what can (successfully) be customised, and where. Customisations can be made without having to worry about whether those arguments will be honoured by the function or will return a deluge of errors,
- It reinforces the amount of work that goes into producing an end result, and
- It gives the reader direct line of sight to foundational code concepts and principles, as opposed to these being hidden and/or the outputs appearing seemingly by magic.
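To illustrate the first point, someone debugging the pipeline shown at the top of this post can simply run it a few steps at a time and inspect each intermediate object (the object names below are ours, purely for illustration):

```r
# Run the pipeline in stages and check each intermediate result
step1 <- dat %>% filter(group == "Intervention")  # did the filter keep the rows you expect?
step2 <- step1 %>% group_by(pre) %>% tally()      # are the counts per response sensible?
step3 <- step2 %>% mutate(freq = n / sum(n))      # do the proportions look right?
sum(step3$freq)                                   # quick sanity check: should be 1
```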
For example:
The purpose of this post is not really about running this code. However, if you are looking to, the code to create the dat object this relates to can be found here.
Also, in no time at all, one can copy that code into an LLM and ask it to explain what is happening at each step along the way (assuming, of course, the author didn’t take the time to do that themselves…).
# Filter and process data for the "Intervention" group
p1 <- dat %>%
  filter(group == "Intervention") %>%  # Keep only rows where 'group' is "Intervention"
  group_by(pre) %>%                    # Group data by 'pre' variable
  tally() %>%                          # Count the number of occurrences in each 'pre' group
  mutate(
    freq = n / sum(n),                                 # Calculate frequency for each group as a proportion
    res = str_c(n, "\n(", round(freq * 100, 1), "%)")  # Create label with count and percentage
  ) %>%

  # Start plotting the processed data
  ggplot(aes(x = as.factor(pre), y = freq)) +  # Define x-axis as 'pre' factor and y-axis as frequency

  # Add bars to the plot
  geom_bar(
    aes(fill = as.factor(pre)),  # Use 'pre' as fill color for each bar
    stat = "identity",           # Use actual values for bar height
    alpha = 0.8,                 # Set transparency of bars
    colour = "black"             # Add black outline to bars
  ) +

  ...
The full commented code for the above plot is below.
Code
# Filter and process data for the "Intervention" group
p1 <- dat %>%
  filter(group == "Intervention") %>%  # Keep only rows where 'group' is "Intervention"
  group_by(pre) %>%                    # Group data by 'pre' variable
  tally() %>%                          # Count the number of occurrences in each 'pre' group
  mutate(
    freq = n / sum(n),                                 # Calculate frequency for each group as a proportion
    res = str_c(n, "\n(", round(freq * 100, 1), "%)")  # Create label with count and percentage
  ) %>%

  # Start plotting the processed data
  ggplot(aes(x = as.factor(pre), y = freq)) +  # Define x-axis as 'pre' factor and y-axis as frequency

  # Add bars to the plot
  geom_bar(
    aes(fill = as.factor(pre)),  # Use 'pre' as fill color for each bar
    stat = "identity",           # Use actual values for bar height
    alpha = 0.8,                 # Set transparency of bars
    colour = "black"             # Add black outline to bars
  ) +

  # Apply custom theme for aesthetic consistency
  theme_institute(base_size = 14) +  # Use institute's custom theme with base font size 14
  theme(
    legend.position = "none",               # Remove legend from the plot
    panel.grid.major.x = element_blank(),   # Remove major grid lines on x-axis
    plot.title = element_text(hjust = 0.5)  # Center-align the plot title
  ) +

  # Customize the y-axis scale
  scale_y_continuous(
    labels = scales::percent_format(),     # Show y-axis labels as percentages
    breaks = seq(0, max_prop, by = 0.05),  # Define tick marks on y-axis
    expand = expansion(mult = c(0, 0.1))   # Add space above the bars for clarity
  ) +

  # Set the limits of the y-axis
  coord_cartesian(ylim = c(0, max_prop)) +  # Constrain y-axis from 0 to 'max_prop'

  # Define color palette for fill colors
  scale_fill_viridis_d(
    option = "plasma",  # Use "plasma" color scheme from the viridis package
    end = 0.85,         # Limit the color intensity at the end
    direction = -1      # Reverse the color scale
  ) +

  # Add titles and labels
  labs(
    title = "Pre",      # Set the main title for the plot
    fill = "Response",  # Label for the fill legend (not shown since legend is removed)
    x = "Response",     # Label for x-axis
    y = ""              # No label for y-axis
  ) +

  # Add text labels on top of each bar
  geom_text(
    aes(label = res),                 # Use 'res' (count and percent label) for text labels
    vjust = -0.1,                     # Position text slightly above each bar
    family = "Barlow Semi Condensed"  # Use specific font for text labels
  ) +

  # Customize the legend layout
  guides(
    fill = guide_legend(nrow = 1)  # Arrange legend items in a single row (applies if legend is displayed)
  )
The case for functions
Of course, this figure was actually used multiple times in a report we were producing, across a range of pre-post ordinal Likert variables. So we wrapped the code into a function that could be called regularly without making our actual report (.qmd) incredibly lengthy.
Some benefits of posting functions:
- All code is reusable, per se, but making the effort to put your code into a function makes it readily reusable,
- It can remove a barrier to entry, especially for those who don’t want to clutter their .qmd files or risk any copying errors; they can load the function in and get on with their day,
- It can increase uptake, as more advanced users may view the solution as more robust and flexible (less customised) than if it were simply presented as a ‘wall of code’, and
- You can reduce the risk of errors, as the process of turning code into a more generic function often reveals areas where hard-coded parameters become problematic (or where there might be a copy-and-paste error). This can force you to make the code more versatile and reduce the likelihood of hard-coded or use-case-specific errors creeping in (a small sketch of this is below).
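As a small, hypothetical sketch of that last point (the object and function names here are ours, not from the report): the hard-coded group label in the original pipeline becomes an argument once the code is generalised.

```r
# Before: the group label is hard coded, so reusing this for another group means copy, paste and edit
intervention_counts <- dat %>%
  filter(group == "Intervention") %>%
  count(pre)

# After: the hard-coded label becomes an argument with a sensible default
count_pre_responses <- function(data, which_group = "Intervention") {
  data %>%
    filter(group == which_group) %>%
    count(pre)
}

# count_pre_responses(dat, "Intervention")  # now trivially reusable for any group
```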
So what did this look like in our actual use case?
Expand the code below to see how our ordinal_plot() function was structured.
Code
ordinal_plot <- function(data,
                         left_var = "pre",
                         left_title = "Pre",
                         left_fill_title = "Response",
                         left_x_axis_label = "Response",
                         left_y_axis_label = "",
                         center_title = "Pre-Post",
                         center_fill_title = "Response",
                         center_x_axis_label = "",
                         center_y_axis_label = "",
                         right_var = "post",
                         right_title = "Post",
                         right_fill_title = "Response",
                         right_x_axis_label = "Response",
                         right_y_axis_label = "",
                         legend.position = "bottom",
                         id_var = "id") {

  vars <- c(id_var, left_var, right_var)

  # Common y-axis upper limit across the pre and post panels, rounded up to the nearest 0.05
  max_prop <- data %>%
    select(all_of(vars)) %>%
    pivot_longer(cols = c(!!sym(left_var), !!sym(right_var))) %>%
    group_by(name) %>%
    count(value) %>%
    mutate(freq = n / sum(n)) %>%
    .$freq %>%
    max
  max_prop <- plyr::round_any(max_prop, 0.05, f = ceiling)

  # Left panel: bar plot of the 'pre' responses
  p1 <- data %>%
    group_by(!!sym(left_var)) %>%
    tally() %>%
    mutate(freq = n / sum(n),
           res = str_c(n, "\n(", round(freq * 100, 1), "%)")) %>%
    ggplot(aes(x = as.factor(!!sym(left_var)), y = freq)) +
    geom_bar(
      aes(fill = as.factor(!!sym(left_var))),
      stat = "identity",
      alpha = 0.8,
      colour = "black"
    ) +
    theme_institute(base_size = 14) +
    theme(
      legend.position = "none",
      panel.grid.major.x = element_blank(),
      plot.title = element_text(hjust = 0.5)
    ) +
    scale_y_continuous(
      labels = scales::percent_format(),
      breaks = seq(0, max_prop, by = 0.05),
      expand = expansion(mult = c(0, 0.1))
    ) +
    coord_cartesian(ylim = c(0, max_prop)) +
    scale_fill_viridis_d(option = "plasma",
                         end = 0.85,
                         direction = -1) +
    labs(
      title = left_title,
      fill = left_fill_title,
      x = left_x_axis_label,
      y = left_y_axis_label
    ) +
    geom_text(aes(label = res), vjust = -0.1, family = "Barlow Semi Condensed") +
    guides(fill = guide_legend(nrow = 1))

  # Centre panel: sankey plot of the pre-to-post transitions
  p2 <- data %>%
    rename(Pre = !!sym(left_var), Post = !!sym(right_var)) %>%
    make_long(Pre, Post) %>%
    mutate(node = factor(node, levels = c(7, 6, 5, 4, 3, 2, 1)),
           next_node = factor(next_node, levels = c(7, 6, 5, 4, 3, 2, 1))) %>%
    ggplot(aes(
      x = x,
      next_x = next_x,
      node = node,
      next_node = next_node,
      fill = factor(node)
    ), alpha = 0.8) +
    geom_sankey(alpha = 0.8, node.color = 'black') +
    geom_sankey_label(
      aes(label = node),
      alpha = 0.8,
      size = 3,
      color = "black",
      fill = "gray80"
    ) +
    scale_x_discrete(expand = c(0.05, 0.05)) +
    theme_institute(base_size = 14) +
    theme(
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      axis.title.y = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      legend.position = legend.position,
      plot.title = element_text(hjust = 0.5)
    ) +
    guides(fill = guide_legend(reverse = T, nrow = 1)) +
    labs(title = center_title, fill = center_fill_title, x = center_x_axis_label)

  # Right panel: bar plot of the 'post' responses
  p3 <- data %>%
    group_by(!!sym(right_var)) %>%
    tally() %>%
    mutate(freq = n / sum(n),
           res = str_c(n, "\n(", round(freq * 100, 1), "%)")) %>%
    ggplot(aes(x = as.factor(!!sym(right_var)), y = freq)) +
    geom_bar(
      aes(fill = as.factor(!!sym(right_var))),
      stat = "identity",
      alpha = 0.8,
      colour = "black"
    ) +
    theme_institute(base_size = 14) +
    theme(
      legend.position = "none",
      panel.grid.major.x = element_blank(),
      plot.background = element_blank(),
      plot.title = element_text(hjust = 0.5)
    ) +
    scale_y_continuous(
      labels = scales::percent_format(),
      breaks = seq(0, max_prop, by = 0.05),
      expand = expansion(mult = c(0, 0.1))
    ) +
    coord_cartesian(ylim = c(0, max_prop)) +
    scale_fill_viridis_d(option = "plasma",
                         end = 0.85,
                         direction = -1) +
    labs(
      title = right_title,
      fill = right_fill_title,
      x = right_x_axis_label,
      y = right_y_axis_label
    ) +
    geom_text(aes(label = res), vjust = -0.1, family = "Barlow Semi Condensed") +
    guides(fill = guide_legend(nrow = 1))

  # Combine the three panels with patchwork
  p1 + p2 + p3
}
You will see the use of !!sym(..) here. This is to do with ‘injecting’ ‘expressions’ within functions, and it can be tricky to understand. The sym() part removes the quote marks from the string and turns it into a symbol (as in, a name that refers to an object in the environment). The !! part then injects the ‘symbol’ (remember, referring to the object) rather than just injecting the ‘character string’ per se.
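A minimal sketch of the difference (not from the post), using a hypothetical helper called count_by():

```r
library(dplyr)

count_by <- function(data, var) {   # 'var' arrives as a character string, e.g. "pre"
  data %>%
    group_by(!!sym(var)) %>%        # sym("pre") creates the symbol `pre`; !! injects it into group_by()
    tally()
}

# count_by(dat, "pre")    # groups by the column named pre
# group_by(dat, "pre")    # by contrast, this would 'group' by the literal string "pre"
```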
And how did we use it within the report (.qmd)?

...
source(".../ordinal_plot.R")
...

# Change in response - intervention group

Some descriptive text ..., and, below is the figure.

```{r fig.width = 8, fig.height = 6}
dat %>%
  filter(group == "Intervention") %>%
  ordinal_plot()
```
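And because the defaults are just defaults, the same call can be customised where needed. For example (an illustrative call, not one from the report), using arguments already defined in the function signature:

```r
dat %>%
  filter(group == "Intervention") %>%
  ordinal_plot(left_title = "Baseline",
               right_title = "Follow-up",
               center_title = "Baseline to follow-up",
               legend.position = "bottom")
```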
- We certainly acknowledge that we could have included more arguments in our function, passed through to the underlying code, to facilitate additional on-the-fly customisation. The balance we settled on here was a function that was sufficiently versatile for the elements we needed to modify throughout our reporting use case, and we perhaps avoided obscurity by not unnecessarily passing ‘everything’ through.
Closing comments
When deciding between using basic code or wrapping that code into functions, it’s important to balance simplicity and efficiency, and to think of your audience. For R-based visualisations and reproducible research workflows used within Quarto documents, showing basic code offers transparency and fosters learning, making it ideal for teaching or exploratory analyses. On the other hand, functions enhance re-usability and scalability, streamlining workflows for complex or repeated tasks.
By combining both approaches (in a blog post)—sharing foundational coding steps while having a more advanced option to encapsulate repetitive tasks within functions—you can assist both beginners and advanced users to create clear, flexible, and reproducible documents and workflows.
Acknowledgements
Thanks to Wesley Billingham and Dr Elizabeth McKinnon for providing feedback on and reviewing this post.
Reproducibility Information
To access the .qmd (Quarto markdown) files, as well as any R scripts or data that were used in this post, please visit our GitHub:
https://github.com/The-Kids-Biostats/The-Kids-Biostats.github.io/tree/main/posts/
The session information can also be seen below.
Code
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Perth
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.3.3 fastmap_1.2.0 cli_3.6.3
[5] tools_4.3.3 htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10
[9] rmarkdown_2.29 knitr_1.49 jsonlite_1.8.9 xfun_0.49
[13] digest_0.6.37 rlang_1.1.4 evaluate_1.0.1