Skills Lab 06: Data Wrangling

Straight Outta Composites

Dr Danielle Evans

7 March 2024

1 / 18

Overview

  • Reverse Coding

    • What?
    • Why?
    • How?
  • Composites

    • What?
    • Why?
    • How?
Let's get ready to wRangle!! (Yes, again...)

2 / 18

Reverse Coding - What & Why?

  • When we design questionnaires & scales we have the choice of phrasing items positively or negatively

  • A positively-phrased item could be:

"I love R with all of my heart."

  • Whereas a negatively-phrased item could be:

"I hate R with every part of my being."

  • Both Qs measure the same underlying construct, but a high score on these two items reflects opposing feelings (i.e., high like or high dislike)



Woah! Negatively-phrased items can help reduce/identify acquiescent or careless responding on surveys - the goal is always high quality data 😍

3 / 18

Reverse Coding - What & Why?

  • Reverse Coding (or reverse scoring) is where we change the numeric values assigned to negatively-phrased items so that a high score reflects the same type of response across all items in our questionnaire

  • Reverse coding doesn't change what participants told us; it only changes the numeric values we've assigned to the response options for a specific item

4 / 18

Reverse Coding - What & Why?

  • Let's say we have the two example items below in our questionnaire, where strongly disagree is scored as 1 and strongly agree is scored as 5:

"Q1. I really enjoy learning R"

"Q2. I find R soul-destroying."


  • Assuming our participant feels consistently positive about R, then they would get a score of 5 for Q1 & a score of 1 for Q2

  • So if we calculated their mean score across these two items, we'd get (5 + 1) / 2 = 3. That doesn't reflect their positive feelings towards R: it looks as though they feel pretty neutral, rather than showing their R love

5 / 18

Reverse Coding - What & Why?

  • But we can reverse code/score the negatively-phrased item by flipping around the scoring

  • Now, the first item is still scored as strongly disagree = 1 and strongly agree = 5, but our second item is now scored in the opposite direction where strongly disagree = 5 and strongly agree = 1


"Q1. I really enjoy learning R"

"Q2. I find R soul-destroying."


  • Now if we calculated the mean score across these items for our R LovR participant, they would score a 5, which accurately reflects their consistently positive feelings towards R
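
The arithmetic on these slides can be checked with a few lines of base R. This is only a minimal sketch: the object names `q1` and `q2` are made up for illustration, and it assumes the 1-5 agreement scale described above.

```r
# One participant who consistently loves R, on a 1-5 agreement scale
q1 <- 5  # "I really enjoy learning R" -> strongly agree
q2 <- 1  # "I find R soul-destroying"  -> strongly disagree

# Mean before reverse coding: looks neutral
mean_raw <- mean(c(q1, q2))

# Reverse code the negatively-phrased item: 1 <-> 5, 2 <-> 4, 3 stays 3
q2_reversed <- 6 - q2

# Mean after reverse coding: reflects the positive attitude
mean_reversed <- mean(c(q1, q2_reversed))

print(c(raw = mean_raw, reversed = mean_reversed))
```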
6 / 18

Any Questions?

7 / 18

Reverse Coding - How?

  • There are multiple ways we can reverse code our variables in R:

    • If we have numeric data, we can use simple maths or the dplyr::recode() function, just like in the skills lab last week

    • If we have factor data, we can reverse the order of the levels with forcats::fct_rev() (or set the order by hand with forcats::fct_relevel())

    • If we have character data, we can use dplyr::recode()

  • We can reverse code items before we do a factor analysis (FA), but we don't have to. We must, however, reverse code negative items before we do a reliability analysis (RA)

  • We also need to reverse code items before creating composites - but more on that later!
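
For factor and character data, a sketch along the following lines may help. It is an illustration rather than the lab's own code: the response labels between "Strongly disagree" and "Strongly agree" are assumed, the object names are hypothetical, and `forcats::fct_rev()` is used here as one way of flipping the level order. The dplyr and forcats packages need to be installed.

```r
# Assumed 5-point agreement labels, ordered from lowest to highest
agreement <- c("Strongly disagree", "Disagree", "Neutral",
               "Agree", "Strongly agree")

# Hypothetical responses to a negatively-phrased item
responses <- c("Strongly agree", "Disagree", "Neutral")

# Factor data: reverse the order of the levels
resp_factor <- factor(responses, levels = agreement)
resp_factor_rev <- forcats::fct_rev(resp_factor)

# The underlying integer codes are now flipped
codes_before <- as.integer(resp_factor)
codes_after <- as.integer(resp_factor_rev)

# Character data: map each label straight to its reversed score
resp_numeric_rev <- dplyr::recode(
  responses,
  "Strongly disagree" = 5, "Disagree" = 4, "Neutral" = 3,
  "Agree" = 2, "Strongly agree" = 1
)

print(data.frame(responses, codes_before, codes_after, resp_numeric_rev))
```

Both routes should give the same reversed scores, so either can be used depending on how the item is stored.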


Top Tip! It's a good idea to reverse code items after running a FA, because items you might assume are negatively phrased don't always turn out to be!

8 / 18

Reverse Coding - How?

Step 1: Look at the factor loadings:

Variable     MR1     MR2     MR3
RQ6         0.91    0.12    0.15
RQ8         0.79   -0.08   -0.33
RQ7         0.71    0.16   -0.06
RQ5        -0.54    0.37   -0.08
RQ3         0.01    0.95   -0.05
RQ2         0.04    0.95   -0.02
RQ9_NEG     0.08   -0.14    0.95
RQ1_NEG    -0.19    0.01    0.81
RQ4_NEG    -0.35   -0.20    0.42

Variable   Question
RQ6        I enjoy working with computers.
RQ8        I feel confident using computers.
RQ7        I find computers easy to use.
RQ5        I find technology challenging.
RQ3        I love learning about statistics.
RQ2        I like maths.
RQ9_NEG    I hate RStudio.
RQ1_NEG    I dislike my R practicals.
RQ4_NEG    I don't enjoy using R.


Top Tip! We're looking for inconsistent signs +/- on the primary factor loading!

9 / 18

Reverse Coding - How?

Step 2: Decide which, if any, variables are negatively related

Variable     MR1     MR2     MR3
RQ6         0.91    0.12    0.15
RQ8         0.79   -0.08   -0.33
RQ7         0.71    0.16   -0.06
RQ5        -0.54    0.37   -0.08
RQ3         0.01    0.95   -0.05
RQ2         0.04    0.95   -0.02
RQ9_NEG     0.08   -0.14    0.95
RQ1_NEG    -0.19    0.01    0.81
RQ4_NEG    -0.35   -0.20    0.42

Variable   Question
RQ6        I enjoy working with computers.
RQ8        I feel confident using computers.
RQ7        I find computers easy to use.
RQ5        I find technology challenging.
RQ3        I love learning about statistics.
RQ2        I like maths.
RQ9_NEG    I hate RStudio.
RQ1_NEG    I dislike my R practicals.
RQ4_NEG    I don't enjoy using R.


Top Tip! We're looking for inconsistent signs +/- on the primary factor loading!
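
The sign check can also be sketched in base R. This is an illustrative helper, not part of the lab materials: it types in the loadings from the table above, treats each item's largest absolute loading as its primary loading, and flags items whose sign disagrees with the majority of items on the same factor. The column names `primary_factor`, `primary_loading` and `flag_reverse` are made up here.

```r
# Factor loadings copied from the table on this slide
loadings <- data.frame(
  item = c("RQ6", "RQ8", "RQ7", "RQ5", "RQ3", "RQ2",
           "RQ9_NEG", "RQ1_NEG", "RQ4_NEG"),
  MR1 = c(0.91, 0.79, 0.71, -0.54, 0.01, 0.04, 0.08, -0.19, -0.35),
  MR2 = c(0.12, -0.08, 0.16, 0.37, 0.95, 0.95, -0.14, 0.01, -0.20),
  MR3 = c(0.15, -0.33, -0.06, -0.08, -0.05, -0.02, 0.95, 0.81, 0.42)
)
load_mat <- as.matrix(loadings[, c("MR1", "MR2", "MR3")])

# Primary factor = the factor with the largest absolute loading
primary_idx <- unname(apply(abs(load_mat), 1, which.max))
loadings$primary_factor <- colnames(load_mat)[primary_idx]
loadings$primary_loading <-
  load_mat[cbind(seq_len(nrow(load_mat)), primary_idx)]

# Majority sign of the primary loadings within each factor
majority_sign <- tapply(sign(loadings$primary_loading),
                        loadings$primary_factor,
                        function(s) sign(sum(s)))

# Flag items whose sign is inconsistent with the rest of their factor
loadings$flag_reverse <- sign(loadings$primary_loading) !=
  as.vector(majority_sign[loadings$primary_factor])

items_to_reverse <- loadings$item[loadings$flag_reverse]
print(items_to_reverse)
```

With these loadings only RQ5 is flagged. The three `_NEG` items all load positively on MR3, so they are consistent with one another even though they are negatively phrased.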

10 / 18

Reverse Coding - How?

Step 3: Recode our items

# method 1 - using simple maths on a 5-point scale
data <- data |>
  dplyr::mutate(column_name = 6 - column_name)

# method 2 - using recode
data <- data |>
  dplyr::mutate(column_name = dplyr::recode(
    column_name, `1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1))
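
The 6 in method 1 is just the scale minimum plus the scale maximum (1 + 5), so the same idea carries over to other scale lengths. A minimal sketch, assuming a hypothetical helper called `reverse_code()` and a made-up item `rq5` (neither comes from the lab materials):

```r
# Hypothetical helper: reverse a numeric item on any scale
reverse_code <- function(x, scale_min = 1, scale_max = 5) {
  (scale_min + scale_max) - x
}

# Toy data for one item, including a missing response
survey <- data.frame(rq5 = c(1, 2, 3, 4, 5, NA))

# Write the result to a NEW column: re-running the code then can't
# accidentally flip the item back, and the original stays available
survey <- survey |>
  dplyr::mutate(rq5_rev = reverse_code(rq5))

# The same helper on a 7-point scale
seven_point <- reverse_code(c(1, 4, 7), scale_min = 1, scale_max = 7)

print(survey)
print(seven_point)
```

Missing responses stay missing, because `NA` minus anything is still `NA`.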

Step 4: Check recoding

Use the table() function to compare frequencies before and after recoding to check it has worked
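
As a rough illustration of that check, the sketch below uses made-up responses (the vectors `original` and `reversed` are hypothetical) and assumes a 5-point scale:

```r
# Made-up responses to one negatively-phrased item
original <- c(1, 2, 2, 3, 5, 5, 5, 4)
reversed <- 6 - original

# Frequencies before and after: the counts should mirror each other,
# e.g. the number of 5s before should equal the number of 1s after
freq_before <- table(original)
freq_after <- table(reversed)
print(freq_before)
print(freq_after)

# A cross-tabulation makes it even clearer: every response should sit
# on the anti-diagonal (1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1)
crosstab <- table(original, reversed)
print(crosstab)
```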


Demo! Reverse coding numeric items & double checking with table()!

12 / 18

Creating Composites - What & Why?

  • Composite Scores are usually created by calculating the mean of a set of items from a scale or questionnaire

  • We can calculate composite scores from all items on a scale

    • We could have a measure of "Love for R" made up of all 50 survey items
  • Or we can create composites from different subsets of items that make up different factors/subscales

    • Maybe our measure of "Love for R" can be broken down into 3 factors/subscales of "Love for Stats", "Love for Computers", and "Coolness"
  • We look at the results of our factor analysis to decide how items can be combined into a composite (i.e., if we have one overall score, or different subscales), or we can use the scoring instructions for any pre-validated scales that we're using

13 / 18

Creating Composites - What & Why?

  • For example, we could have 4 items of 'r_love' that we can use to create a composite measure by calculating the mean of those items for each participant

ID     r_love_1   r_love_2   r_love_3   r_love_4   r_love_comp
ppt1          1          2          3          1          1.75
ppt2          2          5          3          5          3.75
ppt3          5          4          4          3          4.00
ppt4          3          1          2          1          1.75
ppt5          4          2          5          5          4.00
ppt6          2          3          3          1          2.25
ppt7          1          3          1          4          2.25
ppt8          2          1          3          4          2.50

  • We can then use these composites to represent a given construct in further analyses/models
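
The composite column in this example can be reproduced with a short base R sketch. It types in the item scores from the table and uses `rowMeans()`, which is swapped in here as a compact way of taking each participant's mean; the data frame name `r_love` is an assumption.

```r
# Item scores copied from the example table
r_love <- data.frame(
  ID = paste0("ppt", 1:8),
  r_love_1 = c(1, 2, 5, 3, 4, 2, 1, 2),
  r_love_2 = c(2, 5, 4, 1, 2, 3, 3, 1),
  r_love_3 = c(3, 3, 4, 2, 5, 3, 1, 3),
  r_love_4 = c(1, 5, 3, 1, 5, 1, 4, 4)
)

# Composite = mean of the four items for each participant (row)
item_cols <- c("r_love_1", "r_love_2", "r_love_3", "r_love_4")
r_love$r_love_comp <- rowMeans(r_love[, item_cols])

print(r_love)
```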
14 / 18

Any Questions?

15 / 18

Creating Composites - How?

  • We can use a combination of different functions to create a composite score:

    • dplyr::mutate(), dplyr::rowwise(), mean(), c() & dplyr::c_across()
  • Before creating composites, we should reverse score any negative items. Otherwise a high score means opposite things on positive and negative items, and the mean across them would be misleading

  • We often have to make some decisions around how we choose to handle missing data when creating composites

    • I.e., whether we'll calculate the mean regardless of any missing data, whether we'll calculate it for only complete cases, or whether we'll allow some missing data

    • This is entirely up to the researcher, or the scale creator (there may be specific recommendations you need to follow for scales created by others)
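
The three missing-data options above might look something like this in practice. It is a sketch under assumptions: the data and column names are made up, and the "allow at most one missing item" rule is an arbitrary example rather than a recommendation.

```r
# Toy data: ppt2 is missing one item, ppt3 is missing two
scores <- data.frame(
  id = c("ppt1", "ppt2", "ppt3"),
  item1 = c(4, 5, NA),
  item2 = c(4, NA, NA),
  item3 = c(4, 4, 2)
)

scores <- scores |>
  dplyr::rowwise() |>
  dplyr::mutate(
    n_missing = sum(is.na(dplyr::c_across(item1:item3))),
    # Option 1: mean of whatever items were answered
    comp_any = mean(dplyr::c_across(item1:item3), na.rm = TRUE),
    # Option 2: complete cases only (any missing item gives NA)
    comp_complete = mean(dplyr::c_across(item1:item3)),
    # Option 3: allow some missing data (here, at most one item)
    comp_max1 = if (n_missing <= 1) {
      mean(dplyr::c_across(item1:item3), na.rm = TRUE)
    } else {
      NA_real_
    }
  ) |>
  dplyr::ungroup()

print(scores)
```

Whichever rule is chosen, it is worth reporting it alongside the results so readers know how the composites were built.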

16 / 18

Creating Composites - How?

# method 1 - using columns next to each other
comp_scores <- data |>
  dplyr::rowwise() |>
  dplyr::mutate(column_name1_comp = mean(dplyr::c_across(item1:item3)),
                column_name2_comp = mean(dplyr::c_across(item4:item5))) |>
  dplyr::ungroup()  # switch off row-by-row mode afterwards

# method 2 - using columns NOT next to each other
data <- data |>
  dplyr::rowwise() |>
  dplyr::mutate(column_name1_comp = mean(c(item1, item2, item3)),
                column_name2_comp = mean(c(item4, item5))) |>
  dplyr::ungroup()  # switch off row-by-row mode afterwards
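
Putting the two halves of the session together, a small end-to-end sketch could reverse code a negative item and then build the composite. Everything here is made up for illustration (the data frame `survey`, the item names and the scores), and it assumes a 1-5 scale.

```r
# Toy survey: two positively-phrased items and one negatively-phrased item
survey <- data.frame(
  id = c("ppt1", "ppt2", "ppt3"),
  enjoy_r = c(5, 4, 1),   # positively phrased
  r_is_fun = c(5, 5, 2),  # positively phrased
  hate_r = c(1, 3, 3)     # negatively phrased
)

survey <- survey |>
  # Step 1: reverse code the negative item into a new column
  dplyr::mutate(hate_r_rev = 6 - hate_r) |>
  # Step 2: take the mean across items for each participant
  dplyr::rowwise() |>
  dplyr::mutate(r_love_comp = mean(c(enjoy_r, r_is_fun, hate_r_rev))) |>
  dplyr::ungroup()

print(survey)
```

Note that the composite uses `hate_r_rev`, not the original `hate_r`; mixing in the unreversed item would pull consistent R lovers towards the middle of the scale.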






Demo! Creating composites!

17 / 18

That's all - happy wrangling!



Artwork by @allison_horst

18 / 18
