Likert scale variables (and hence data) are widely utilised in research—they are useful for getting participants to rate things, or to provide an average quantity as a response in situations where asking for the exact quantity may be problematic. How can asking for an exact quantity be problematic? Well, consider the example below that relates to how much water one drinks per day. Very few people drink the exact same amount of water each day, so asking participants “How many (250ml) glasses of water do you drink per day” and getting a response of “3” is typically pointless—there is likely substantial measurement error, and if they drank 3 glasses yesterday (anomaly or otherwise) but 4 glasses the day before, is the response of 3 not just outright incorrect?
What is presented below is nothing groundbreaking. We went in search of a concise and accurate way to display “pre/post” Likert data specifically, and this is where we are currently at.
The data
Definitions
Firstly, some definitions. There are two main types of Likert data. We are going to refer to them as “ordinal” and “bidirectional”.
Ordinal Likert data (sometimes called unipolar Likert data, or interval Likert data) involves category responses that have a natural order (decreasing/increasing). The width of the categories and the distance between them are not necessarily consistent, and the categories often represent an underlying continuous scale that has been ‘binned’ (into the Likert categories).
An example. “How many glasses of water do you typically drink per day?” with response options:
Less than one glass/day
1-2 glasses/day
3-4 glasses/day
5-6 glasses/day
More than six glasses/day
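The ‘binning’ idea can be sketched in base R with `cut()`. The intake values below are made up for illustration, and the sketch assumes respondents mentally round to the nearest whole glass before picking a category (the response options themselves leave values such as 2.5 glasses ambiguous).

```r
# Hypothetical average daily intake (glasses/day) for eight respondents
glasses <- c(0.3, 1, 2.2, 3, 4.4, 5, 6.2, 8)

# Assumption: respondents round to a whole glass, then choose a category.
# Bins on the rounded value: 0 | 1-2 | 3-4 | 5-6 | 7 or more
intake_l <- cut(
  round(glasses),
  breaks = c(-Inf, 0, 2, 4, 6, Inf),
  labels = c("Less than one glass/day", "1-2 glasses/day", "3-4 glasses/day",
             "5-6 glasses/day", "More than six glasses/day"),
  ordered_result = TRUE  # keep the natural ordering of the categories
)

table(intake_l)
as.integer(intake_l)  # underlying integer codes (1 to 5)
```

Note that the bins are not equally wide on the underlying scale, which is exactly why the integer codes should not be read as evenly spaced quantities.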
Bidirectional Likert data (sometimes called bipolar Likert data) involves category responses that have a natural order, with responses running in two opposing directions (typically negative and positive) around a central (or neutral) point.
An example. “The amount of reading I do influences how much reading my child does?” with response options:
Strongly disagree
Disagree
Neither agree nor disagree (the neutral midpoint)
Agree
Strongly agree
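As a rough illustration of what ‘around a central point’ means in practice, the sketch below stores some made-up responses as an ordered factor and centres the integer codes on the neutral midpoint. The responses and the variable names are hypothetical.

```r
# The five response options, ordered from most negative to most positive
agree_levels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
                  "Agree", "Strongly agree")

# Hypothetical responses from six participants
responses <- factor(
  c("Agree", "Strongly disagree", "Neither agree nor disagree",
    "Strongly agree", "Disagree", "Agree"),
  levels = agree_levels, ordered = TRUE
)

# Centre the integer codes (1 to 5) on the neutral midpoint: -2, -1, 0, 1, 2
midpoint <- (length(agree_levels) + 1) / 2
centred <- as.integer(responses) - midpoint

# Direction of each response relative to the midpoint
direction <- factor(sign(centred), levels = c(-1, 0, 1),
                    labels = c("Negative", "Neutral", "Positive"))
table(direction)
```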
We’ll return to bidirectional Likert data in a future post; for now we will look at ordinal Likert data.
Ordinal Likert data
Demo data
Code
library(simstudy)
library(ggsankey); library(ggalluvial)
library(likert); library(patchwork)
library(gt); library(gtsummary)
library(flextable)
library(thekidsbiostats) # install with remotes::install_github("The-Kids-Biostats/thekidsbiostats")
We’re going to use one of our favourite packages, simstudy, to create some synthetic data.
Specifically, we will simulate some pre and post response data, a group identifier (intervention or control), and then some labelled response columns.
Code
set.seed(123) # For reproducibility

# dat_i is the intervention group
n <- 183 # Set the number of individuals
def <- defData(varname = "pre", formula = "1;5", dist = "uniformInt") # Pre values: uniformly distributed between 1 and 5
dat_i <- genData(n, def)

# grp: 1 = no change, 2 = increase, 3 = decrease
group_probs <- c(0.45, 0.45, 0.10)
dat_i$grp <- sample(1:3, n, replace = TRUE, prob = group_probs)
dat_i$post <- dat_i$pre
dat_i$post[dat_i$grp == 2] <- pmin(dat_i$pre[dat_i$grp == 2] + (rbinom(sum(dat_i$grp == 2), 2, 0.2) + 1), 5) # Increase by 1-3 levels, max 5
dat_i$post[dat_i$grp == 3] <- pmax(dat_i$pre[dat_i$grp == 3] - (rbinom(sum(dat_i$grp == 3), 2, 0.2) + 1), 1) # Decrease by 1-3 levels, min 1

# dat_c is the control group
n <- 154 # Set the number of individuals
def <- defData(varname = "pre", formula = "1;5", dist = "uniformInt") # Pre values: uniformly distributed between 1 and 5
dat_c <- genData(n, def)

group_probs <- c(0.55, 0.25, 0.20)
dat_c$grp <- sample(1:3, n, replace = TRUE, prob = group_probs)
dat_c$post <- dat_c$pre
dat_c$post[dat_c$grp == 2] <- pmin(dat_c$pre[dat_c$grp == 2] + (rbinom(sum(dat_c$grp == 2), 2, 0.2) + 1), 5) # Increase by 1-3 levels, max 5
dat_c$post[dat_c$grp == 3] <- pmax(dat_c$pre[dat_c$grp == 3] - (rbinom(sum(dat_c$grp == 3), 2, 0.2) + 1), 1) # Decrease by 1-3 levels, min 1

# Combine the control & intervention data into one dataframe
dat <- rbind(cbind(dat_i, group = "Intervention"),
             cbind(dat_c, group = "Control")) %>%
  mutate(post = as.integer(post)) %>%
  select(-grp)

# Add some factored labels
dat <- dat %>%
  mutate(pre_l = fct_case_when(pre == 1 ~ "Less than one cup/day",
                               pre == 2 ~ "About 1-2 cups/day",
                               pre == 3 ~ "About 3-4 cups/day",
                               pre == 4 ~ "About 5-6 cups/day",
                               pre == 5 ~ "More than 6 cups/day"),
         post_l = fct_case_when(post == 1 ~ "Less than one cup/day",
                                post == 2 ~ "About 1-2 cups/day",
                                post == 3 ~ "About 3-4 cups/day",
                                post == 4 ~ "About 5-6 cups/day",
                                post == 5 ~ "More than 6 cups/day"))

# Visualise the first few rows of data
head(dat, 5) %>%
  thekids_table(colour = "Saffron", padding = 3)
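One consequence of capping the shifted values with `pmin()` and `pmax()` is that respondents who start at level 5 cannot increase, and those who start at level 1 cannot decrease. A minimal base R sketch of that shift logic, using a handful of made-up values rather than the simulated data:

```r
pre   <- c(1, 2, 3, 4, 5)  # one hypothetical respondent per level
shift <- c(1, 1, 2, 3, 1)  # assumed size of each move (1 to 3 levels)

post_up   <- pmin(pre + shift, 5)  # upward move, capped at level 5
post_down <- pmax(pre - shift, 1)  # downward move, floored at level 1

data.frame(pre, shift, post_up, post_down)
```

These floor and ceiling effects are worth keeping in mind when reading any summary of change by baseline level.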
And, we might also like to table some of the ‘change’ data that this plot is based on, using our favourite package (to battle with), gtsummary.
Code
dat %>%
  mutate(Change = fct_case_when(post < pre ~ "Decrease",
                                pre == post ~ "No change",
                                post > pre ~ "Increase")) %>%
  select(group, pre_l, Change) %>%
  tbl_strata(
    strata = group,
    ~ .x %>%
      tbl_summary(by = pre_l) %>%
      modify_header(all_stat_cols() ~ "**{level}**"),
    .combine_with = "tbl_stack"
  ) %>%
  thekids_table(colour = "Saffron")
| Group | Characteristic | Less than one cup/day¹ | About 1-2 cups/day¹ | About 3-4 cups/day¹ | About 5-6 cups/day¹ | More than 6 cups/day¹ |
|---|---|---|---|---|---|---|
| Control | Change | | | | | |
| | Decrease | 0 (0%) | 9 (29%) | 5 (19%) | 3 (12%) | 8 (20%) |
| | No change | 24 (80%) | 15 (48%) | 14 (52%) | 13 (50%) | 32 (80%) |
| | Increase | 6 (20%) | 7 (23%) | 8 (30%) | 10 (38%) | 0 (0%) |
| Intervention | Change | | | | | |
| | Decrease | 0 (0%) | 6 (14%) | 6 (15%) | 3 (7.7%) | 3 (9.1%) |
| | No change | 15 (50%) | 20 (48%) | 10 (26%) | 20 (51%) | 30 (91%) |
| | Increase | 15 (50%) | 16 (38%) | 23 (59%) | 16 (41%) | 0 (0%) |

¹ n (%)
Or perhaps just this will suffice:
Code
tbl_merge(
  tbls = list(
    dat %>%
      mutate("Change in water intake" = fct_case_when(post < pre ~ "Decrease",
                                                      pre == post ~ "No change",
                                                      post > pre ~ "Increase")) %>%
      filter(group == "Control") %>%
      select("Change in water intake") %>%
      tbl_summary(),
    dat %>%
      mutate("Change in water intake" = fct_case_when(post < pre ~ "Decrease",
                                                      pre == post ~ "No change",
                                                      post > pre ~ "Increase")) %>%
      filter(group == "Intervention") %>%
      select("Change in water intake") %>%
      tbl_summary()
  ),
  tab_spanner = c("**Control**", "**Intervention**")
) %>%
  thekids_table(colour = "Saffron")
| Characteristic | Control, N = 154¹ | Intervention, N = 183¹ |
|---|---|---|
| Change in water intake | | |
| Decrease | 25 (16%) | 18 (9.8%) |
| No change | 98 (64%) | 95 (52%) |
| Increase | 31 (20%) | 70 (38%) |

¹ n (%)
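If a quick cross-check without gtsummary is wanted, the same kind of change summary can be sketched in base R with `sign()`, `table()` and `prop.table()`. The toy data frame below is made up for illustration and is not the simulated data from this post.

```r
# Toy pre/post codes (1 to 5) for two hypothetical groups of four respondents
toy <- data.frame(
  group = rep(c("Control", "Intervention"), each = 4),
  pre   = c(1, 3, 5, 2, 2, 4, 3, 5),
  post  = c(1, 2, 5, 3, 3, 5, 3, 5)
)

# sign(post - pre) is -1, 0 or 1; map those onto labelled change categories
toy$change <- factor(sign(toy$post - toy$pre), levels = c(-1, 0, 1),
                     labels = c("Decrease", "No change", "Increase"))

counts <- table(toy$change, toy$group)
counts
round(100 * prop.table(counts, margin = 2), 1)  # column percentages within each group
```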
Closing comments
The above isn’t perfect—one could argue that there is no need to duplicate the figure headings and that the legend could be handled better. But is a figure ever perfect?
The figure does show all the raw data (counts and percentages), clearly delineates the pre and post data, gives some idea of the flow of data between levels, and highlights that in the post period, the intervention group comprised a higher proportion of level 5 responses. Combined with a summary table that shows the actual proportional movements from each pre (baseline) group—we are getting somewhere.
As mentioned, we’ll return to bidirectional Likert data in a future post.
Acknowledgements
Thanks to Wesley Billingham and Dr Elizabeth McKinnon for providing feedback on and reviewing this post.
You can look forward to seeing posts from these other team members here in the coming weeks and months.
Reproducibility Information
To access the .qmd (Quarto markdown) files as well as any R scripts or data that was used in this post, please visit our GitHub: