Skills Lab 06: Data Wrangling
Straight Outta Composites
Dr Danielle Evans
7 March 2024
1 / 18

Overview

Reverse Coding
- What?
- Why?
- How?
Composites
- What?
- Why?
- How?

Let's get ready to wRangle!! (Yes, again...)

2 / 18

Reverse Coding - What & Why?

When we design questionnaires & scales we have the choice of phrasing items positively or negatively
A positively-phrased item could be:

"I love R with all of my heart."

Whereas a negatively-phrased item could be:

"I hate R with every part of my being."

Both items measure the same underlying construct, but a high score on each reflects opposite feelings (i.e., strong liking vs. strong dislike)

Woah! Negatively-phrased items can help reduce/identify acquiescent or careless responding on surveys - the goal is always high quality data 😍

3 / 18

Reverse Coding - What & Why?

Reverse Coding (or reverse scoring) is where we change the numeric values assigned to negatively-phrased items so that a high score reflects the same type of response across all items in our questionnaire
With reverse coding, we're not changing participants' responses, just the numeric values we have assigned to those responses for a specific item

4 / 18

Reverse Coding - What & Why?

Let's say we have the two example items below in our questionnaire, where strongly disagree is scored as 1 and strongly agree is scored as 5:

"Q1. I really enjoy learning R"

"Q2. I find R soul-destroying."

If our participant feels consistently positive about R, they would score 5 on Q1 and 1 on Q2
So if we calculated their mean score across these two items, we'd get 3, which makes them look fairly neutral towards R instead of reflecting their R love
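A quick check of that arithmetic in base R, using the hypothetical scores from this example (5 on Q1, 1 on Q2, with Q2 not yet reverse coded):

```r
# Hypothetical responses from one participant who loves R
# (1 = strongly disagree, 5 = strongly agree)
q1 <- 5  # "Q1. I really enjoy learning R"
q2 <- 1  # "Q2. I find R soul-destroying."

# Averaging without reverse coding Q2 pulls the score to the scale midpoint
naive_mean <- mean(c(q1, q2))
naive_mean
```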

5 / 18

Reverse Coding - What & Why?

But we can reverse code/score the negatively-phrased item by flipping around the scoring
Now, the first item is still scored as strongly disagree = 1 and strongly agree = 5, but our second item is now scored in the opposite direction where strongly disagree = 5 and strongly agree = 1

"Q1. I really enjoy learning R"

"Q2. I find R soul-destroying."

Now if we calculated the mean score across these items for our R LovR participant, they would score a 5, which accurately reflects their consistently positive feelings towards R
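The same participant's scores after flipping Q2, sketched in base R. On a 1 to 5 scale, subtracting a response from 6 (i.e., min + max) maps 1 to 5, 2 to 4 and so on; the scores themselves are the hypothetical ones from this example:

```r
q1 <- 5  # "Q1. I really enjoy learning R"
q2 <- 1  # "Q2. I find R soul-destroying."

# Reverse code Q2 on a 1-5 scale: (1 + 5) - x turns 1 into 5 and 5 into 1
q2_rev <- 6 - q2

reversed_mean <- mean(c(q1, q2_rev))
reversed_mean
```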

6 / 18

Any Questions?

7 / 18

Reverse Coding - How?

There are multiple ways we can reverse code our variables in R:
- If we have numeric data, we can use simple maths or the dplyr::recode() function (just like in last week's skills lab)
- If we have factor data, we can use forcats::fct_rev() to reverse the order of the levels (or forcats::fct_relevel() to set the order by hand)
- If we have character data, we can use dplyr::recode()
We can reverse code items before a factor analysis (FA), but we don't need to; we must, however, reverse code negative items before a reliability analysis (RA)
We also need to reverse code items before creating composites - but more on that later!

Top Tip! It's a good idea to reverse code items after running an FA, because items you might assume are negatively-phrased don't always turn out to be!
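To make the three options concrete without needing dplyr or forcats loaded, here is a base-R sketch on hypothetical responses. Each step mirrors what the functions named above do, but the vectors and the `flip` lookup are illustrative assumptions, not part of the lab data:

```r
# Hypothetical responses to one negatively-phrased item on a 1-5 scale
num_item <- c(1, 2, 5, 4, 3)

# 1. Numeric data: simple maths, (min + max) - x
num_rev <- 6 - num_item

# 2. Factor data: reverse the order of the levels
#    (forcats::fct_rev() does the same job in the tidyverse)
fac_item <- factor(c("disagree", "neutral", "agree", "agree"),
                   levels = c("disagree", "neutral", "agree"))
fac_rev <- factor(fac_item, levels = rev(levels(fac_item)))

# 3. Character data: look each response up in a named vector
#    (dplyr::recode() does the same job in the tidyverse)
chr_item <- c("agree", "disagree", "neutral", "agree")
flip <- c(agree = "disagree", neutral = "neutral", disagree = "agree")
chr_rev <- unname(flip[chr_item])
```

Note that the factor version keeps every response label as it was and only reverses the level order, so the underlying integer codes are flipped (`as.integer(fac_rev)` runs 3, 2, 1 where `fac_item` ran 1, 2, 3).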

8 / 18

Reverse Coding - How?

Step 1: Look at the factor loadings:

Variable	MR1	MR2	MR3
RQ6	0.91	0.12	0.15
RQ8	0.79	-0.08	-0.33
RQ7	0.71	0.16	-0.06
RQ5	-0.54	0.37	-0.08
RQ3	0.01	0.95	-0.05
RQ2	0.04	0.95	-0.02
RQ9_NEG	0.08	-0.14	0.95
RQ1_NEG	-0.19	0.01	0.81
RQ4_NEG	-0.35	-0.20	0.42

Variable	Question
RQ6	I enjoy working with computers.
RQ8	I feel confident using computers.
RQ7	I find computers easy to use.
RQ5	I find technology challenging.
RQ3	I love learning about statistics.
RQ2	I like maths.
RQ9_NEG	I hate RStudio.
RQ1_NEG	I dislike my R practicals.
RQ4_NEG	I don't enjoy using R.

Top Tip! We're looking for inconsistent signs +/- on the primary factor loading!
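Finding each item's primary loading can also be done in code rather than by eye. A minimal base-R sketch, assuming the loadings have been copied by hand from the table above into a matrix (in practice they would come from your FA output); "primary" is taken to mean the loading with the largest absolute value:

```r
# Loadings copied from the table above (rows = items, columns = factors)
loadings <- matrix(
  c( 0.91,  0.12,  0.15,
     0.79, -0.08, -0.33,
     0.71,  0.16, -0.06,
    -0.54,  0.37, -0.08,
     0.01,  0.95, -0.05,
     0.04,  0.95, -0.02,
     0.08, -0.14,  0.95,
    -0.19,  0.01,  0.81,
    -0.35, -0.20,  0.42),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("RQ6", "RQ8", "RQ7", "RQ5", "RQ3", "RQ2", "RQ9_NEG", "RQ1_NEG", "RQ4_NEG"),
    c("MR1", "MR2", "MR3")
  )
)

# The primary loading is the one with the largest absolute value in each row
primary_col <- apply(abs(loadings), 1, which.max)

primary <- data.frame(
  item    = rownames(loadings),
  factor  = colnames(loadings)[primary_col],
  loading = loadings[cbind(seq_len(nrow(loadings)), primary_col)],
  stringsAsFactors = FALSE
)
primary
```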

9 / 18

Reverse Coding - How?

Step 2: Decide which, if any, variables are negatively related

Variable	MR1	MR2	MR3
RQ6	0.91	0.12	0.15
RQ8	0.79	-0.08	-0.33
RQ7	0.71	0.16	-0.06
RQ5	-0.54	0.37	-0.08
RQ3	0.01	0.95	-0.05
RQ2	0.04	0.95	-0.02
RQ9_NEG	0.08	-0.14	0.95
RQ1_NEG	-0.19	0.01	0.81
RQ4_NEG	-0.35	-0.20	0.42

Variable	Question
RQ6	I enjoy working with computers.
RQ8	I feel confident using computers.
RQ7	I find computers easy to use.
RQ5	I find technology challenging.
RQ3	I love learning about statistics.
RQ2	I like maths.
RQ9_NEG	I hate RStudio.
RQ1_NEG	I dislike my R practicals.
RQ4_NEG	I don't enjoy using R.

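Top Tip! We're looking for inconsistent signs +/- on the primary factor loading!

Here, RQ5 ("I find technology challenging.") is the only item whose primary loading (-0.54 on MR1) has the opposite sign to the other items on its factor, so it is the item to reverse code. The three _NEG items all load positively on MR3, so they are consistent with one another even though they are negatively phrased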
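The sign check in Step 2 can be sketched in base R as well. The rule used here (flag an item whose primary loading has a different sign from the majority of items on the same factor) is an illustrative heuristic, not a standard function, and the loadings are copied by hand from the table above:

```r
loadings <- matrix(
  c( 0.91,  0.12,  0.15,   0.79, -0.08, -0.33,   0.71,  0.16, -0.06,
    -0.54,  0.37, -0.08,   0.01,  0.95, -0.05,   0.04,  0.95, -0.02,
     0.08, -0.14,  0.95,  -0.19,  0.01,  0.81,  -0.35, -0.20,  0.42),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("RQ6", "RQ8", "RQ7", "RQ5", "RQ3", "RQ2", "RQ9_NEG", "RQ1_NEG", "RQ4_NEG"),
    c("MR1", "MR2", "MR3")
  )
)

# Primary factor and the sign of the primary loading for each item
primary_col <- apply(abs(loadings), 1, which.max)
primary_factor <- colnames(loadings)[primary_col]
primary_sign <- sign(loadings[cbind(seq_len(nrow(loadings)), primary_col)])

# Majority sign of the primary loadings within each factor
# (a tie gives 0 and would flag every item on that factor, so check ties by eye)
majority_sign <- ave(primary_sign, primary_factor, FUN = function(s) sign(sum(s)))

flagged <- rownames(loadings)[primary_sign != majority_sign]
flagged
```

This only points you at candidates; whether an item really should be reversed is still a judgement based on its wording.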

10 / 18

Reverse Coding - How?

Step 3: Recode our items

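# method 1 - using simple maths on a 5-point scale
# (1 + 5) - x, so 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
data <- data |>
  dplyr::mutate(column_name = 6 - column_name)

# method 2 - using recode
# (the old values go in backticks because they are numbers used as names)
data <- data |> 
  dplyr::mutate(column_name = dplyr::recode(
    column_name, `1` = 5, `2` = 4, `3` = 3, `4` = 2, `5` = 1))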
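The `6 - x` trick in method 1 is a special case of `(min + max) - x`, which works for any scale. A minimal sketch, where `reverse_score()` is a hypothetical helper (not part of dplyr or base R):

```r
# Reverse-score responses on a scale running from scale_min to scale_max.
# (scale_min + scale_max) - x generalises "6 - x" for a 1-5 scale.
reverse_score <- function(x, scale_min, scale_max) {
  (scale_min + scale_max) - x
}

reverse_score(c(1, 2, 3, 4, 5), scale_min = 1, scale_max = 5)  # 1-5 scale
reverse_score(c(1, 4, 7), scale_min = 1, scale_max = 7)        # 1-7 scale
reverse_score(c(0, 10, NA), scale_min = 0, scale_max = 10)     # 0-10 scale
```

Because it is vectorised, it could be used inside `dplyr::mutate()`, e.g. `dplyr::mutate(column_name = reverse_score(column_name, 1, 5))`, and missing values (`NA`) stay missing.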

Step 4: Check recoding

Use the table() function to compare frequencies before and after recoding to check it has worked
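A sketch of that check in base R, using a hypothetical vector of responses in place of a real column. The counts should be identical, just attached to mirrored values, and a cross-tabulation makes the mapping explicit:

```r
# Hypothetical responses to one item on a 1-5 scale
before <- c(1, 2, 2, 5, 4, 4, 4, 3, 1)
after  <- 6 - before

# Same counts, mirrored values: the count for 1 before should match 5 after, etc.
table(before)
table(after)

# Every response should sit on the anti-diagonal (1 with 5, 2 with 4, 3 with 3, ...)
table(before, after)
```

With a real data frame you would pass the column before and after the `mutate()` step, e.g. `table(data$column_name)`.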

Demo! Reverse coding numeric items & double checking with table()!

12 / 18

Creating Composites - What & Why?

Composite Scores are usually created by calculating the mean of a set of items from a scale or questionnaire
We can calculate composite scores from all items on a scale
- We could have a measure of "Love for R" made up of all 50 survey items
Or we can create composites from different subsets of items that make up different factors/subscales
- Maybe our measure of "Love for R" can be broken down into 3 factors/subscales of "Love for Stats", "Love for Computers", and "Coolness"
We look at the results of our factor analysis to decide how items can be combined into composites (i.e., whether we have one overall score or several subscales), or we follow the scoring instructions for any pre-validated scales we're using

13 / 18

Creating Composites - What & Why?

For example, we could have 4 items of 'r_love' that we can use to create a composite measure by calculating the mean of those items for each participant

ID	r_love_1	r_love_2	r_love_3	r_love_4	r_love_comp
ppt1	1	2	3	1	1.75
ppt2	2	5	3	5	3.75
ppt3	5	4	4	3	4.00
ppt4	3	1	2	1	1.75
ppt5	4	2	5	5	4.00
ppt6	2	3	3	1	2.25
ppt7	1	3	1	4	2.25
ppt8	2	1	3	4	2.50

We can then use these composites to represent a given construct in further analyses/models

14 / 18

ID	r_love_1	r_love_2	r_love_3	r_love_4	r_love_comp
ppt1	1	2	3	1	1.75
ppt2	2	5	3	5	3.75
ppt3	5	4	4	3	4.00
ppt4	3	1	2	1	1.75
ppt5	4	2	5	5	4.00
ppt6	2	3	3	1	2.25
ppt7	1	3	1	4	2.25
ppt8	2	1	3	4	2.50
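The composite column in this table can be reproduced with base R's `rowMeans()`, a vectorised alternative to the `rowwise()` approach. A minimal sketch, with the item scores typed in from the table:

```r
r_love <- data.frame(
  ID       = paste0("ppt", 1:8),
  r_love_1 = c(1, 2, 5, 3, 4, 2, 1, 2),
  r_love_2 = c(2, 5, 4, 1, 2, 3, 3, 1),
  r_love_3 = c(3, 3, 4, 2, 5, 3, 1, 3),
  r_love_4 = c(1, 5, 3, 1, 5, 1, 4, 4)
)

# Average across the four item columns for each participant (row)
item_cols <- c("r_love_1", "r_love_2", "r_love_3", "r_love_4")
r_love$r_love_comp <- rowMeans(r_love[, item_cols])
r_love
```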

Any Questions?

15 / 18

Creating Composites - How?

We can use a combination of different functions to create a composite score:
- dplyr::mutate(), dplyr::rowwise(), mean(), c() & c_across()
Before creating composites we should also reverse score any negative items; otherwise a high score means opposite things on different items, and their mean will not accurately reflect the construct
We often have to make some decisions around how we choose to handle missing data when creating composites
- I.e., whether we'll calculate the mean regardless of any missing data, whether we'll calculate it for only complete cases, or whether we'll allow some missing data
- This is entirely up to the researcher, or the scale creator (there may be specific recommendations you need to follow for scales created by others)
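The three missing-data options above can be sketched in base R. The data are hypothetical, and the "at most 1 missing item" threshold in option 3 is an arbitrary example, not a recommendation; use whatever rule the scale's authors (or your own preregistration) specify:

```r
# Hypothetical item responses with some missing values (NA)
items <- data.frame(
  item1 = c(4, 2, NA, 5),
  item2 = c(5, NA, NA, 4),
  item3 = c(3, 2, 1, 5)
)

# Option 1: complete cases only - any NA makes the composite NA
comp_complete <- rowMeans(items)

# Option 2: use whatever is available - NAs are dropped before averaging
comp_available <- rowMeans(items, na.rm = TRUE)

# Option 3: allow some missing data - here (hypothetically) at most 1 missing item
n_missing <- rowSums(is.na(items))
comp_max1 <- ifelse(n_missing <= 1, rowMeans(items, na.rm = TRUE), NA)
```

Inside a `dplyr::rowwise()` pipeline, option 2 corresponds to `mean(dplyr::c_across(item1:item3), na.rm = TRUE)`.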

16 / 18

Creating Composites - How?

# method 1 - using columns next to each other
comp_scores <- data |> 
  dplyr::rowwise() |>
  dplyr::mutate(column_name1_comp = mean(dplyr::c_across(item1:item3)),
                column_name2_comp = mean(dplyr::c_across(item4:item5))) |>
  dplyr::ungroup()  # drop the row-wise grouping once the composites are made

# method 2 - using columns NOT next to each other
data <- data |>
  dplyr::rowwise() |>
  dplyr::mutate(column_name1_comp = mean(c(item1, item2, item3)),
                column_name2_comp = mean(c(item4, item5))) |>
  dplyr::ungroup()

Demo! Creating composites!

17 / 18

That's all - happy wrangling!

Artwork by @allison_horst

Give session feedback here! 😀

18 / 18
