Reverse Coding
Composites
When we design questionnaires & scales we have the choice of phrasing items positively or negatively
A positively-phrased item could be:
"I love R with all of my heart."
A negatively-phrased item could be:
"I hate R with every part of my being."
Woah! Negatively-phrased items can help reduce/identify acquiescent or careless responding on surveys - the goal is always high quality data 😍
Reverse Coding (or reverse scoring) is where we change the numeric values assigned to negatively-phrased items so that a high score reflects the same type of response across all items in our questionnaire
With reverse coding, we're not changing participants' data, but just the numeric values we have assigned to the responses for a specific item
"Q1. I really enjoy learning R"
"Q2. I find R soul-destroying."
Assuming responses are scored from strongly disagree = 1 to strongly agree = 5, a participant who feels consistently positive about R would get a score of 5 for Q1 & a score of 1 for Q2
If we calculated their mean score across these two items, we'd get 3. That makes it look as though they feel pretty neutral towards R, instead of reflecting their R love
But we can reverse code/score the negatively-phrased item by flipping around the scoring
Now, the first item is still scored as strongly disagree = 1 and strongly agree = 5, but our second item is now scored in the opposite direction where strongly disagree = 5 and strongly agree = 1
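A minimal worked sketch of the arithmetic above, assuming the 5-point scale runs from 1 (strongly disagree) to 5 (strongly agree). The object names `q1`, `q2` and `q2_rev` are made up for illustration:

```r
# One participant who feels consistently positive about R
q1 <- 5  # "I really enjoy learning R": strongly agree
q2 <- 1  # "I find R soul-destroying": strongly disagree

# Mean before reverse coding the negatively-phrased item
mean_before <- mean(c(q1, q2))

# Reverse code Q2 on a 5-point scale: 1 <-> 5, 2 <-> 4, 3 stays 3
q2_rev <- 6 - q2

# Mean after reverse coding
mean_after <- mean(c(q1, q2_rev))

print(c(before = mean_before, after = mean_after))
```

The mean moves from 3 to 5, which now matches the participant's positive feelings towards R.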
"Q1. I really enjoy learning R"
"Q2. I find R soul-destroying."
There are multiple ways we can reverse code our variables in R:
If we have numeric data, we can use simple maths or we can use the dplyr::recode() function - just like in the skills lab last week
If we have factor data, we can use the forcats::fct_rev() function to reverse the order of the levels (or forcats::fct_relevel() to set the order by hand)
If we have character data, we can use dplyr::recode()
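The factor and character options might look something like the sketch below. The response labels and object names are hypothetical, and it assumes the forcats and dplyr packages are installed:

```r
# Hypothetical response labels, ordered from lowest to highest agreement
labels <- c("strongly disagree", "disagree", "neutral", "agree", "strongly agree")

# Factor data: reverse the order of the levels
resp_fct <- factor(c("strongly agree", "disagree", "neutral"), levels = labels)
resp_fct_rev <- forcats::fct_rev(resp_fct)
as.integer(resp_fct)      # integer codes under the original level order
as.integer(resp_fct_rev)  # integer codes under the reversed level order

# Character data: map each label straight to its reversed numeric score
resp_chr <- c("strongly agree", "disagree", "neutral")
resp_chr_rev <- dplyr::recode(
  resp_chr,
  "strongly disagree" = 5, "disagree" = 4, "neutral" = 3,
  "agree" = 2, "strongly agree" = 1
)
resp_chr_rev
```

In both cases the responses themselves are untouched; only the numeric value attached to each label is flipped.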
We can, but we don't need to, reverse code items before we do a factor analysis (FA). We must, however, reverse code negative items before we do a reliability analysis (RA)
We also need to reverse code items before creating composites - but more on that later!
Top Tip! It's a good idea to reverse code items after running an FA, because items you might assume are negatively-phrased don't always turn out to be!
Step 1: Look at the factor loadings:
Variable | MR1 | MR2 | MR3 |
---|---|---|---|
RQ6 | 0.91 | 0.12 | 0.15 |
RQ8 | 0.79 | -0.08 | -0.33 |
RQ7 | 0.71 | 0.16 | -0.06 |
RQ5 | -0.54 | 0.37 | -0.08 |
RQ3 | 0.01 | 0.95 | -0.05 |
RQ2 | 0.04 | 0.95 | -0.02 |
RQ9_NEG | 0.08 | -0.14 | 0.95 |
RQ1_NEG | -0.19 | 0.01 | 0.81 |
RQ4_NEG | -0.35 | -0.20 | 0.42 |
Variable | Question |
---|---|
RQ6 | I enjoy working with computers. |
RQ8 | I feel confident using computers. |
RQ7 | I find computers easy to use. |
RQ5 | I find technology challenging. |
RQ3 | I love learning about statistics. |
RQ2 | I like maths. |
RQ9_NEG | I hate RStudio |
RQ1_NEG | I dislike my R practicals. |
RQ4_NEG | I don't enjoy using R. |
Top Tip! We're looking for inconsistent signs +/- on the primary factor loading!
Step 2: Decide which, if any, variables are negatively related
Variable | MR1 | MR2 | MR3 |
---|---|---|---|
RQ6 | 0.91 | 0.12 | 0.15 |
RQ8 | 0.79 | -0.08 | -0.33 |
RQ7 | 0.71 | 0.16 | -0.06 |
RQ5 | -0.54 | 0.37 | -0.08 |
RQ3 | 0.01 | 0.95 | -0.05 |
RQ2 | 0.04 | 0.95 | -0.02 |
RQ9_NEG | 0.08 | -0.14 | 0.95 |
RQ1_NEG | -0.19 | 0.01 | 0.81 |
RQ4_NEG | -0.35 | -0.20 | 0.42 |
Variable | Question |
---|---|
RQ6 | I enjoy working with computers. |
RQ8 | I feel confident using computers. |
RQ7 | I find computers easy to use. |
RQ5 | I find technology challenging. |
RQ3 | I love learning about statistics. |
RQ2 | I like maths. |
RQ9_NEG | I hate RStudio |
RQ1_NEG | I dislike my R practicals. |
RQ4_NEG | I don't enjoy using R. |
Top Tip! We're looking for inconsistent signs +/- on the primary factor loading!
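One way to make Steps 1 and 2 concrete is to find each item's primary factor (its largest absolute loading) and flag items whose sign disagrees with the other items on that factor. This is only a sketch in base R: the loadings are typed in by hand from the table above, and the column names `primary_factor`, `primary_loading` and `flag_reverse` are made up for illustration:

```r
# Factor loadings copied from the table above
loadings <- data.frame(
  variable = c("RQ6", "RQ8", "RQ7", "RQ5", "RQ3", "RQ2",
               "RQ9_NEG", "RQ1_NEG", "RQ4_NEG"),
  MR1 = c(0.91, 0.79, 0.71, -0.54, 0.01, 0.04, 0.08, -0.19, -0.35),
  MR2 = c(0.12, -0.08, 0.16, 0.37, 0.95, 0.95, -0.14, 0.01, -0.20),
  MR3 = c(0.15, -0.33, -0.06, -0.08, -0.05, -0.02, 0.95, 0.81, 0.42)
)

load_mat <- as.matrix(loadings[, c("MR1", "MR2", "MR3")])

# Primary factor = the column with the largest absolute loading in each row
primary_idx <- max.col(abs(load_mat), ties.method = "first")
loadings$primary_factor <- colnames(load_mat)[primary_idx]
loadings$primary_loading <- load_mat[cbind(seq_len(nrow(load_mat)), primary_idx)]

# Dominant sign of the primary loadings within each factor
factor_sign <- ave(loadings$primary_loading, loadings$primary_factor,
                   FUN = function(x) sign(sum(x)))

# Flag items whose primary loading runs against the rest of their factor
loadings$flag_reverse <- sign(loadings$primary_loading) != factor_sign

loadings[, c("variable", "primary_factor", "primary_loading", "flag_reverse")]
```

With these loadings only RQ5 is flagged: it loads negatively on MR1 while RQ6, RQ7 and RQ8 load positively, so it is a candidate for reverse coding. The three _NEG items all load positively on MR3, so they are consistent with one another.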
Step 3: Recode our items
# method 1 - using simple maths on a 5-point scale
data <- data |>
  dplyr::mutate(column_name = 6 - column_name)

# method 2 - using recode
data <- data |>
  dplyr::mutate(column_name = dplyr::recode(
    column_name,
    `1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1
  ))
Step 4: Check recoding
Use the table() function to compare frequencies before and after recoding to check it has worked
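A small sketch of that check, using made-up responses and a hypothetical column called `rq5`. It writes the reversed scores to a new column (`rq5_rev`) so that both versions are available to compare, and assumes dplyr is installed:

```r
# Made-up responses to one 5-point item
data <- data.frame(rq5 = c(1, 2, 2, 3, 5, 5, 5, 4, 1, 2))

# Keep the original column so the two versions can be compared
data <- data |>
  dplyr::mutate(rq5_rev = 6 - rq5)

# Frequencies before and after: the counts should mirror each other
table(data$rq5)
table(data$rq5_rev)

# Cross-tabulation: every 1 should now be a 5, every 2 a 4, and so on
table(original = data$rq5, reversed = data$rq5_rev)
```

If the recoding has worked, the count for 1 before recoding equals the count for 5 afterwards, the count for 2 equals the count for 4, and the count for 3 is unchanged.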
Demo! Reverse coding numeric items & double checking with table()!
Composite Scores are usually created by calculating the mean of a set of items from a scale or questionnaire
We can calculate composite scores from all items on a scale
Or we can create composites from different subsets of items that make up different factors/subscales
We look at the results of our factor analysis to decide how items can be combined into a composite (i.e., if we have one overall score, or different subscales), or we can use the scoring instructions for any pre-validated scales that we're using
ID | r_love_1 | r_love_2 | r_love_3 | r_love_4 | r_love_comp |
---|---|---|---|---|---|
ppt1 | 1 | 2 | 3 | 1 | 1.75 |
ppt2 | 2 | 5 | 3 | 5 | 3.75 |
ppt3 | 5 | 4 | 4 | 3 | 4.00 |
ppt4 | 3 | 1 | 2 | 1 | 1.75 |
ppt5 | 4 | 2 | 5 | 5 | 4.00 |
ppt6 | 2 | 3 | 3 | 1 | 2.25 |
ppt7 | 1 | 3 | 1 | 4 | 2.25 |
ppt8 | 2 | 1 | 3 | 4 | 2.50 |
We can use a combination of different functions to create a composite score: dplyr::rowwise(), dplyr::mutate(), dplyr::c_across() and mean()
Before creating composites we should also reverse score any negative items. Positive and negative items are scored in opposite directions, so a mean taken across both would not reflect participants' responses accurately
We often have to make some decisions around how we choose to handle missing data when creating composites
I.e., whether we'll calculate the mean regardless of any missing data, whether we'll calculate it for only complete cases, or whether we'll allow some missing data
This is entirely up to the researcher, or the scale creator (there may be specific recommendations you need to follow for scales created by others)
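The three options could be sketched as follows. The data are made up, the cut-off of "at most one missing item" is an arbitrary assumption, and the sketch uses base R's rowMeans() rather than the rowwise() approach, purely to keep it short:

```r
# Made-up item responses with some missing values
items <- data.frame(
  item1 = c(4, 5, NA, 2),
  item2 = c(3, NA, NA, 2),
  item3 = c(5, 4, 1, NA)
)

# Number of missing items per participant
n_missing <- rowSums(is.na(items))

# Option 1: mean of whatever was answered, regardless of missing data
comp_any <- rowMeans(items, na.rm = TRUE)

# Option 2: complete cases only (NA if any item is missing)
comp_complete <- rowMeans(items, na.rm = FALSE)

# Option 3: allow some missing data (here, at most one missing item)
comp_some <- ifelse(n_missing <= 1, rowMeans(items, na.rm = TRUE), NA)

data.frame(n_missing, comp_any, comp_complete, comp_some)
```

The third participant answered only one item, so they get a composite under option 1 but not under options 2 or 3.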
# method 1 - using columns next to each other
comp_scores <- data |>
  dplyr::rowwise() |>
  dplyr::mutate(
    column_name1_comp = mean(dplyr::c_across(item1:item3)),
    column_name2_comp = mean(dplyr::c_across(item4:item5))
  ) |>
  dplyr::ungroup() # drop the rowwise grouping afterwards

# method 2 - using columns NOT next to each other
data <- data |>
  dplyr::rowwise() |>
  dplyr::mutate(
    column_name1_comp = mean(c(item1, item2, item3)),
    column_name2_comp = mean(c(item4, item5))
  ) |>
  dplyr::ungroup() # drop the rowwise grouping afterwards
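As a runnable sketch, the r_love table shown earlier can be rebuilt and its composite column recomputed with the rowwise approach. The item values are typed in from that table; the sketch assumes dplyr is installed and that any negative items have already been reverse coded:

```r
# Item scores copied from the r_love table
r_love <- data.frame(
  ID = paste0("ppt", 1:8),
  r_love_1 = c(1, 2, 5, 3, 4, 2, 1, 2),
  r_love_2 = c(2, 5, 4, 1, 2, 3, 3, 1),
  r_love_3 = c(3, 3, 4, 2, 5, 3, 1, 3),
  r_love_4 = c(1, 5, 3, 1, 5, 1, 4, 4)
)

# One mean per participant (row) across the four adjacent item columns
r_love <- r_love |>
  dplyr::rowwise() |>
  dplyr::mutate(r_love_comp = mean(dplyr::c_across(r_love_1:r_love_4))) |>
  dplyr::ungroup()

r_love$r_love_comp
```

The resulting values match the r_love_comp column in the table, e.g. 1.75 for ppt1 and 3.75 for ppt2.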
Demo! Creating composites!