Chapter 21 Centering & Standardizing Variables

In this chapter, we will learn how to center and standardize variables.

21.1 Conceptual Overview

Centering or standardizing variables can be a useful data preparation step. For example, we often center predictor variables prior to specifying a product term (i.e., interaction term) when estimating moderation effects in a multiple linear regression model.

21.1.1 Review of Centering Variables

Centering is the process of subtracting the variable mean (average) from each of the values of that same variable; in other words, it’s a linear rescaling of a variable. Centering variables is sometimes completed prior to including those variables as predictors in a regression model, and it is generally done for one or both of the following purposes: (a) to make the intercept value more interpretable, and (b) to reduce collinearity between predictor variables and the multiplicative terms created from them, such as an interaction term (product term) in a moderated multiple linear regression model or a polynomial term in a polynomial regression model. Regarding the first purpose, centering to enhance the interpretability of the intercept (constant) value in a regression model is relevant to the extent that we wish to interpret the intercept value. In an ordinary least squares (OLS) regression model, the intercept value represents the model-predicted mean of the outcome variable when all predictor variables are set to zero. If the scaling of any of the predictor variables in our model does not include zero (e.g., predictor variables are on a 1-10 scale), then interpreting the intercept value when the predictors are zero doesn’t make much sense. Regarding the second purpose, we typically center predictor variables prior to using them to create a multiplicative term, such as an interaction term (i.e., product term) or a polynomial term (e.g., quadratic term, cubic term). In the context of a moderated multiple linear regression model, centering does not affect the significance of the interaction term, but a lack of centering will affect the interpretation of the main effects, because each main effect is then interpreted as the effect of that predictor when the other predictor equals zero.
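
To make these two purposes concrete, here is a minimal sketch using simulated data. The variable names (x, z, y), the sample size, and the coefficients used to generate the data are hypothetical and do not come from this chapter’s data file.

```r
# Simulated (hypothetical) data: two predictors on a 1-10 scale and an outcome
set.seed(123)
n <- 200
x <- runif(n, min = 1, max = 10)
z <- runif(n, min = 1, max = 10)
y <- 2 + 0.5 * x + 0.3 * z + 0.1 * x * z + rnorm(n)

# Grand-mean center each predictor
c_x <- x - mean(x)
c_z <- z - mean(z)

# Purpose (a): in a simple regression, centering changes the intercept
# but leaves the slope untouched
fit_raw <- lm(y ~ x)
fit_ctr <- lm(y ~ c_x)
coef(fit_raw)  # intercept = predicted y when x is 0 (outside the 1-10 scale)
coef(fit_ctr)  # intercept = predicted y at the mean of x, which equals mean(y)

# Purpose (b): compare how strongly a predictor correlates with the
# product term before and after centering
cor(x, x * z)
cor(c_x, c_x * c_z)

# The coefficient for the product term itself is the same either way
mod_raw <- lm(y ~ x * z)
mod_ctr <- lm(y ~ c_x * c_z)
coef(mod_raw)[["x:z"]]
coef(mod_ctr)[["c_x:c_z"]]
```

In this sketch, only the intercept and the lower-order coefficients change between the raw and centered moderated models; the product-term coefficient does not.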

Thus far, I have mostly been describing what is known as grand-mean centering. To grand-mean center a variable, we simply subtract the overall (grand) mean of the entire sample for that variable from each value of that variable, thereby creating a new variable in which the mean is zero and the standard deviation is the same as it was before centering. For example, let’s assume that we have a variable called Age for a sample of individuals, where Age is measured in years. In its raw format, the mean of Age is 36.2 years with a standard deviation of 8.1. If we grand-mean center Age, then for each individual in our sample, we create a new variable in which we take their Age and subtract the mean Age of 36.2. If an individual has an Age of 40.0, then their centered Age would be 3.8 (40.0 - 36.2 = 3.8). By centering each individual’s Age relative to the grand mean, we end up with a variable that has a mean of 0.0, but with a standard deviation that is equal to 8.1, which is equal to the standard deviation of the original Age variable. Why is this the case? Well, we have just performed a linear shift of Age, which affects only the mean and not the standard deviation.
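
As a quick illustration of the Age example, the sketch below uses five hypothetical ages that I chose so that their mean is 36.2; their standard deviation is not 8.1, because the values are made up purely for demonstration.

```r
# Hypothetical ages (in years) for a small sample; the mean is 36.2
age <- c(24.5, 31.0, 36.2, 40.0, 49.3)

# Grand-mean center: subtract the overall sample mean from each value
c_age <- age - mean(age)

# The person with an Age of 40.0 has a centered Age of 3.8
c_age[4]

# Centering shifts the mean to zero (within floating-point rounding)...
mean(age)
mean(c_age)

# ...but leaves the standard deviation unchanged
sd(age)
sd(c_age)
```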

In the context of multilevel models, group-mean centering becomes relevant and important. In short, group-mean centering refers to the process of subtracting the respective group mean for a particular variable (based on another variable that acts as a grouping or clustering variable) from each case’s score on that same variable. For example, if Employee A belongs to Work Team A, and Work Team A consists of 10 other employees, we would first calculate the mean of Work Team A employees’ scores on a continuous variable of interest, and second, we would subtract that group mean from each Work Team A employee’s score on that continuous variable. We would then repeat this process for all employees relative to their respective work teams. In the context of multilevel modeling, both grand-mean centering and group-mean centering can have pronounced effects on the estimated coefficients and the interpretation of those coefficients. For instance, group-mean centering can be used in multilevel models to separate out the within-group effects and the between-groups effects, if that is of interest. A full discussion of grand-mean centering and group-mean centering in the context of multilevel modeling is beyond the scope of this tutorial; for a more complete overview, please check out Professor Jason Newsom’s handout: http://web.pdx.edu/~newsomj/mlrclass/ho_centering.pdf.
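
The sketch below illustrates the work-team idea with a small, hypothetical data frame (the object name teams, the variable names, and the scores are all made up). It uses the ave function from base R to attach each employee’s own team mean, which is just one of several ways to do this.

```r
# Hypothetical scores for six employees nested in two work teams
teams <- data.frame(
  team  = c("A", "A", "A", "B", "B", "B"),
  score = c(4, 5, 6, 7, 8, 9)
)

# Attach each employee's own team mean (the between-groups part)
teams$team_mean <- ave(teams$score, teams$team, FUN = mean)

# Group-mean centered score (the within-group part)
teams$gpmc_score <- teams$score - teams$team_mean

# Grand-mean centered score, for comparison
teams$c_score <- teams$score - mean(teams$score)

teams
```

In this toy example, the group-mean centered scores are -1, 0, and 1 within both teams, even though the grand-mean centered scores differ between the teams.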

21.1.2 Review of Standardizing Variables

Like centering variables, when standardizing (or scaling) variables, we center the variables around a mean of zero. However, when standardizing a variable, we are actually converting the variable to a z-score, which means we set the mean to 0 and the variance to 1; because the standard deviation is just the square root of the variance, the standard deviation is also set to 1. So how do you interpret a variable that is standardized? Let’s assume that we standardized a variable called Age for a sample of individuals, where Age is measured in years. In its raw format, the mean of Age is 36.2 years with a standard deviation of 8.1, which would mean, for example, that a person who is 44.3 years old has an Age that is exactly 1 standard deviation higher than the mean (36.2 + 8.1 = 44.3). If we standardize the Age variable, then the mean becomes 0.0 and the standard deviation becomes 1.0. Accordingly, the standardized score for the person who has an Age of 44.3 years (which was 1 standard deviation above the mean) would become 1.0. If a person has an Age of 20.0, then that means they have a standardized score of -2.0, which represents 2.0 standard deviations below the mean (20.0 - 36.2 = -16.2 and -16.2 / 8.1 = -2.0).
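
The arithmetic in the Age example can be reproduced with a few lines of R. This is only a sketch of the z-score formula applied to the mean (36.2) and standard deviation (8.1) described above; the object names are hypothetical.

```r
# Mean and standard deviation of Age from the example above
age_mean <- 36.2
age_sd   <- 8.1

# Three illustrative ages: 1 SD above the mean, 2 SDs below, and at the mean
ages <- c(44.3, 20.0, 36.2)

# z-score: subtract the mean, then divide by the standard deviation
z_age <- (ages - age_mean) / age_sd

# Round to two decimals to hide floating-point noise
round(z_age, 2)
```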

21.2 Tutorial

This chapter’s tutorial demonstrates how to center and standardize variables in R.

21.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below. Please note that the video shows how to grand-mean center and standardize variables – but not how to group-mean center variables. If your goal is to group-mean center variables, then check out the corresponding section below.

Link to video tutorial: https://youtu.be/2_TxnvZGtV0

21.2.2 Functions & Packages Introduced

Function	Package
`mean`	base R
`scale`	base R
`mutate`	`dplyr`

21.2.3 Initial Steps

If you haven’t already, save the file called “DiffPred.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “DiffPred.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")

# Access readr package
library(readr)

# Read data and name data frame (tibble) object
df <- read_csv("DiffPred.csv")

## Rows: 377 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): emp_id, gender, race
## dbl (3): perf_eval, interview, age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Print the names of the variables in the data frame (tibble) object
names(df)

## [1] "emp_id"    "perf_eval" "interview" "age"       "gender"    "race"

# View variable type for each variable in data frame
str(df)

## spc_tbl_ [377 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ emp_id   : chr [1:377] "MA322" "MA323" "MA324" "MA325" ...
##  $ perf_eval: num [1:377] 4.2 4.9 4.2 5.3 4.2 6.9 3.4 5.8 4.4 5.6 ...
##  $ interview: num [1:377] 7.5 9.3 7.5 8 9.3 6.8 6.7 7 8.2 6.4 ...
##  $ age      : num [1:377] 29.7 31.7 29.4 37.9 30.9 46.2 43.9 47.8 31.7 44.6 ...
##  $ gender   : chr [1:377] "woman" "man" "woman" "woman" ...
##  $ race     : chr [1:377] "asian" "asian" "asian" "asian" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   emp_id = col_character(),
##   ..   perf_eval = col_double(),
##   ..   interview = col_double(),
##   ..   age = col_double(),
##   ..   gender = col_character(),
##   ..   race = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

# View first 6 rows of data frame
head(df)

## # A tibble: 6 × 6
##   emp_id perf_eval interview   age gender race 
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>
## 1 MA322        4.2       7.5  29.7 woman  asian
## 2 MA323        4.9       9.3  31.7 man    asian
## 3 MA324        4.2       7.5  29.4 woman  asian
## 4 MA325        5.3       8    37.9 woman  asian
## 5 MA326        4.2       9.3  30.9 man    black
## 6 MA327        6.9       6.8  46.2 woman  asian

There are 6 variables and 377 cases (i.e., employees) in the DiffPred data frame: emp_id, perf_eval, interview, age, gender, and race. Per the output of the str (structure) function above, the variables perf_eval, interview, and age are of type numeric (continuous: interval/ratio), and the variables emp_id, gender, and race are of type character (nominal/categorical). The variable emp_id is the unique employee identifier. Imagine that these data were collected as part of a criterion-related validation study - specifically, a concurrent validation design in which job incumbents were administered a rated structured interview (interview) 90 days after entering the organization. The structured interview (interview) variable was designed to assess individuals’ interpersonal skills, and ratings can range from 1 (very weak interpersonal skills) to 10 (very strong interpersonal skills). The interviews were scored by untrained raters who were often, but not always, the hiring managers. The perf_eval variable is the criterion (outcome) of interest, and it is a 90-day-post-hire measure of supervisor-rated job performance, with possible ratings ranging from 1-7, with 7 indicating high performance. The age variable represents the job incumbents’ ages (in years). The gender variable represents the job incumbents’ gender identity and is defined by two levels/categories/values: man and woman. Finally, the race variable represents the job incumbents’ race/ethnicity and is defined by three levels/categories/values: asian, black, and white.

21.2.4 Grand-Mean Center Variables

We only center variables that are of type numeric and that we conceptualize as having a continuous (interval/ratio) measurement scale. Further, if we’re centering variables prior to inclusion in a regression model, we often only center those variables that we plan on using as predictor variables (and not outcome variables). Thus, in our current data frame, we will grand-mean center just the interview and age variables; for more information on which variables to center, check out the chapter on estimating moderation effects in a multiple linear regression model. I will demonstrate three approaches, and you can try all three or just one, as any of the three will work.

21.2.4.1 Option 1: Basic Arithmetic and `mean` Function from Base R

We’ll start with what is arguably the most intuitive approach for grand-mean centering. We must begin by coming up with a name for our soon-to-be grand-mean centered variable, and in this example, we will center the interview variable. I typically like to name the centered variable by simply adding the c_ prefix to the existing variable’s name (e.g., c_interview). Type the name of the data frame object to which the new centered variable will be attached (df), followed by the $ operator and the name of the new variable we are creating (c_interview). Next, add the <- operator to indicate what values you will assign to this new variable. To create a vector of values to assign to this new c_interview variable, we will subtract the mean (average) score for the original variable (interview) from each case’s value on the variable. Specifically, enter the name of the data frame object, followed by the $ operator and the name of the original variable (interview). After that, enter the subtraction symbol (-). And finally, type the name of the mean function from base R. As the first argument in the mean function, enter the name of the data frame object (df), followed by the $ operator and the name of the original variable (interview). As the second argument, enter na.rm=TRUE to indicate that you wish to drop missing values when calculating the grand mean for the sample.

# Grand-mean centering: Using basic arithmetic and the mean function from base R
df$c_interview <- df$interview - mean(df$interview, na.rm=TRUE)

To admire your work, take a look at the first six rows of your data frame object to inspect the new grand-mean centered variable called c_interview.

# Print first 6 rows of data frame
head(df)

## # A tibble: 6 × 7
##   emp_id perf_eval interview   age gender race  c_interview
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>       <dbl>
## 1 MA322        4.2       7.5  29.7 woman  asian       0.245
## 2 MA323        4.9       9.3  31.7 man    asian       2.04 
## 3 MA324        4.2       7.5  29.4 woman  asian       0.245
## 4 MA325        5.3       8    37.9 woman  asian       0.745
## 5 MA326        4.2       9.3  30.9 man    black       2.04 
## 6 MA327        6.9       6.8  46.2 woman  asian      -0.455

21.2.4.2 Option 2: `scale` Function from Base R

An alternative approach to grand-mean centering is to use the scale function from base R. For some, this function might be preferable to the approach described above, but again, it’s really a matter of preference. As an initial step, start by coming up with a name for our soon-to-be grand-mean centered variable, and in this example, we will center the interview variable. As I mentioned above, I typically like to name the centered variable by simply adding the c_ prefix to the existing variable’s name - for example: c_interview. Now, enter the name of the data frame object to which the new centered variable will be attached (df), followed by the $ operator and the name of the new variable we are creating (c_interview). Next, add the <- operator to indicate what values you will assign to this new variable. To create a vector of values to assign to this new c_interview variable, begin by typing the name of the scale function. As the first argument, type the name of the data frame object (df), followed by the $ operator and the name of the original variable (interview). As the second argument, type center=TRUE, which instructs the function to grand-mean center the values. As the third argument, type scale=FALSE to inform the function that you do not wish to scale or standardize the variable you are centering.

# Grand-mean centering: Using scale function from base R
df$c_interview <- scale(df$interview, center=TRUE, scale=FALSE)

Take a look at the first six rows of your data frame object to inspect the new grand-mean centered variable called c_interview.

# Print first 6 rows of data frame
head(df)

## # A tibble: 6 × 7
##   emp_id perf_eval interview   age gender race  c_interview[,1]
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>           <dbl>
## 1 MA322        4.2       7.5  29.7 woman  asian           0.245
## 2 MA323        4.9       9.3  31.7 man    asian           2.04 
## 3 MA324        4.2       7.5  29.4 woman  asian           0.245
## 4 MA325        5.3       8    37.9 woman  asian           0.745
## 5 MA326        4.2       9.3  30.9 man    black           2.04 
## 6 MA327        6.9       6.8  46.2 woman  asian          -0.455
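
The [,1] in the column header above appears because the scale function returns a one-column matrix rather than a plain numeric vector. If you would prefer a plain vector, one possibility is to wrap the result in the as.numeric function from base R. The sketch below uses a small toy data frame (hypothetically named demo) built from the six interview scores printed above, so the centered values differ from the output above because the mean is computed from only six rows.

```r
# Toy data frame built from the first six interview scores shown above
demo <- data.frame(interview = c(7.5, 9.3, 7.5, 8.0, 9.3, 6.8))

# scale() returns a one-column matrix with a "scaled:center" attribute
centered_matrix <- scale(demo$interview, center = TRUE, scale = FALSE)
is.matrix(centered_matrix)   # TRUE
attributes(centered_matrix)

# as.numeric() drops the matrix dimensions and attributes
demo$c_interview <- as.numeric(centered_matrix)
is.matrix(demo$c_interview)  # FALSE

# The values match the arithmetic approach from Option 1
all.equal(demo$c_interview, demo$interview - mean(demo$interview))
```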

21.2.4.3 Option 3: `mutate` Function from `dplyr` and `mean` Function from Base R

This third approach to grand-mean centering variables can come in handy when we want to grand-mean center multiple variables in a single step. With that said, I will begin by showing how to grand-mean center a single variable using this approach, and then we will extend the code/script to involve two variables. We will be using the mutate function from the dplyr package (Wickham et al. 2023), so if you haven’t already, be sure to install and access the dplyr package using the functions below.

# Install dplyr package if not already installed
install.packages("dplyr")

# Access dplyr package
library(dplyr)

In this example, I use the pipe (%>%) operator, which the dplyr package makes available from the magrittr package. For more information on using pipes with the mutate function, check out the chapter on cleaning data, and for more information on using pipes in general, check out this free eBook by one of the creators of the dplyr package and RStudio.

To begin, type the name of the data frame object you wish to create (or overwrite), and in this example, we are going to overwrite our existing data frame (df) by naming our new data frame object the same thing. To create and name the object, we use the <- assignment operator.
Type the name of the original data frame to which the variable we wish to grand-mean center belongs (df), followed by the pipe (%>%) operator.
Type the name of the mutate function. As the first and only argument, begin by typing what you would like to name the new grand-mean centered variable (c_interview). After that, type the = operator to assign values to this new variable. Finally, we will specify a formula to inform the function how the new values will be calculated. Specifically, we type the name of the original variable (interview), type the subtraction (-) operator, and finally type the name of the mean function from base R. As the first argument in the mean function, type the name of the variable (interview) for which you would like to calculate the mean, and as the second argument, enter na.rm=TRUE to indicate that you wish to drop missing values when computing this grand mean for the sample.

# Grand-mean centering: Using mutate function from dplyr and mean function from base R
df <- df %>%
  mutate(c_interview = interview - mean(interview, na.rm=TRUE))

Take a look at the first six rows of your data frame object to inspect the new grand-mean centered variable called c_interview.

# Print first 6 rows of data frame
head(df)

## # A tibble: 6 × 7
##   emp_id perf_eval interview   age gender race  c_interview
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>       <dbl>
## 1 MA322        4.2       7.5  29.7 woman  asian       0.245
## 2 MA323        4.9       9.3  31.7 man    asian       2.04 
## 3 MA324        4.2       7.5  29.4 woman  asian       0.245
## 4 MA325        5.3       8    37.9 woman  asian       0.745
## 5 MA326        4.2       9.3  30.9 man    black       2.04 
## 6 MA327        6.9       6.8  46.2 woman  asian      -0.455

One of the advantages of using this approach is that we can center multiple variables in a single step. To do so, we simply specify which additional variable we would like to grand-mean center by adding an additional argument to the mutate function.

# Grand-mean centering: Multiple variables
df <- df %>%
  mutate(c_interview = interview - mean(interview, na.rm=TRUE),
         c_age = age - mean(age, na.rm=TRUE))
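
If you have many variables to center, writing one line per variable can become tedious. One possible shortcut, which is not covered in this chapter’s video, is the across function from dplyr. The sketch below assumes a reasonably recent version of dplyr (1.0.0 or later) and uses a toy data frame (hypothetically named demo) built from the first six rows printed earlier.

```r
library(dplyr)

# Toy data frame built from the first six rows of interview and age shown earlier
demo <- data.frame(
  interview = c(7.5, 9.3, 7.5, 8.0, 9.3, 6.8),
  age       = c(29.7, 31.7, 29.4, 37.9, 30.9, 46.2)
)

# Grand-mean center both variables at once; .names adds the c_ prefix
demo <- demo %>%
  mutate(across(c(interview, age),
                ~ .x - mean(.x, na.rm = TRUE),
                .names = "c_{.col}"))

# New columns: c_interview and c_age
names(demo)
```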

21.2.5 Group-Mean Center Variables

When estimating multilevel models, there are certain contexts in which group-mean centering should be applied. For more information on centering in general and group-mean centering specifically, please check out this handout created by my colleague Jason Newsom.

Like we did above with Option 3 for grand-mean centering, for group-mean centering we will also use the mutate function from dplyr; however, we will also go a step further by applying the group_by function from dplyr to group the data by the values of a categorical (nominal, ordinal) grouping variable. For more information on grouping data, check out the chapter on aggregation and segmentation.

Let’s group-mean center interview scores by race variable categories.

To begin, type the name of the data frame object you wish to create (or overwrite), and in this example, we are going to overwrite our existing data frame (df) by naming our new data frame object the same thing. To create and name the object, we use the <- assignment operator.
Type the name of the original data frame to which the variable we wish to group-mean center belongs (df), followed by the pipe (%>%) operator.
Type the name of the group_by function, and as the function’s argument(s), specify the name(s) of the grouping variable(s); in this example, we will group by the race variable. Follow this function with the pipe (%>%) operator.
Type the name of the mutate function. As the first and only argument, begin by typing what you would like to name the new group-mean centered variable (gpmc_interview). After that, type the = operator to assign values to this new variable. Finally, we will specify a formula to inform the function how the new values will be calculated. Specifically, we type the name of the original variable (interview), type the subtraction (-) operator, and finally type the name of the mean function from base R. As the first argument in the mean function, type the name of the variable (interview) for which you would like to calculate the mean, and as the second argument, enter na.rm=TRUE to indicate that you wish to drop missing values when computing the mean for each group. Because the data are grouped, the mean function returns the mean of each case’s own group rather than the grand mean. Follow the mutate function with the pipe (%>%) operator.
Finally, type the name of the ungroup function, and don’t specify any arguments within the function’s parentheses. This last step makes sure that we ungroup the grouping that we initially applied to the data frame.

# Group-mean centering by race variable
df <- df %>%
  group_by(race) %>%
  mutate(gpmc_interview = interview - mean(interview, na.rm=TRUE)) %>%
  ungroup()

Take a look at the first six rows of your data frame object to inspect the new group-mean centered variable called gpmc_interview.

# Print first 6 rows of data frame
head(df)

## # A tibble: 6 × 9
##   emp_id perf_eval interview   age gender race  c_interview c_age gpmc_interview
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>       <dbl> <dbl>          <dbl>
## 1 MA322        4.2       7.5  29.7 woman  asian       0.245 -7.04         0.730 
## 2 MA323        4.9       9.3  31.7 man    asian       2.04  -5.04         2.53  
## 3 MA324        4.2       7.5  29.4 woman  asian       0.245 -7.34         0.730 
## 4 MA325        5.3       8    37.9 woman  asian       0.745  1.16         1.23  
## 5 MA326        4.2       9.3  30.9 man    black       2.04  -5.84         0.877 
## 6 MA327        6.9       6.8  46.2 woman  asian      -0.455  9.46         0.0305
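
A simple way to check group-mean centering is to confirm that the new variable has a mean of zero within every group. The sketch below does this with the group_by and summarise functions from dplyr, using a small hypothetical data frame (named toy, with made-up interview scores and one missing value) rather than the DiffPred data.

```r
library(dplyr)

# Hypothetical toy data (not the DiffPred data), including one missing score
toy <- data.frame(
  race      = c("asian", "asian", "black", "black", "white", "white"),
  interview = c(7.5, 9.3, 9.3, 6.1, 8.0, NA)
)

# Group-mean center interview scores within each race category
toy <- toy %>%
  group_by(race) %>%
  mutate(gpmc_interview = interview - mean(interview, na.rm = TRUE)) %>%
  ungroup()

# Check: the mean of the centered variable should be zero
# (within floating-point rounding) in every group
check <- toy %>%
  group_by(race) %>%
  summarise(mean_gpmc = mean(gpmc_interview, na.rm = TRUE))

check
```

Note that the case with a missing interview score keeps a missing value on the group-mean centered variable.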

21.2.6 Standardize Variables

There are different approaches we can use to standardize a variable. I provide two options below. You can try both options or one. Both will get you to the same end. I suggest picking the one that is most intuitive for you.

21.2.6.1 Option 1: `scale` Function from Base R

As our first approach to standardizing (or scaling) a variable, we will use the scale function from base R. In fact, we can even apply this function within the lm (linear regression) function from base R to get standardized regression coefficients; for more information on standardized regression coefficients, check out the chapters on predicting criterion scores using simple linear regression and estimating incremental validity using multiple linear regression. Using the scale function on its own is fairly straightforward when the goal is to standardize. Start by coming up with a name for our soon-to-be standardized variable, and in this example, we will standardize the interview variable. I typically like to name the standardized variable by simply adding the st_ prefix to the existing variable’s name (e.g., st_interview). Now, enter the name of the data frame object to which the new standardized variable will be attached (df), followed by the $ operator and the name of the new variable we are creating (st_interview). Next, add the <- operator to indicate what values you will assign to this new variable. To assign values to this new st_interview variable, begin by typing the name of the scale function. As the first and only argument, type the name of the data frame object (df), followed by the $ operator and the name of the original variable (interview). By default, the scale function both centers the variable and divides it by its standard deviation.

# Standardizing: Using scale function from base R
df$st_interview <- scale(df$interview)

Take a look at your data frame object to inspect the new standardized variable called st_interview.

# Print first 6 rows of data frame
head(df)

## # A tibble: 6 × 10
##   emp_id perf_eval interview   age gender race  c_interview c_age gpmc_interview st_interview[,1]
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>       <dbl> <dbl>          <dbl>            <dbl>
## 1 MA322        4.2       7.5  29.7 woman  asian       0.245 -7.04         0.730             0.203
## 2 MA323        4.9       9.3  31.7 man    asian       2.04  -5.04         2.53              1.70 
## 3 MA324        4.2       7.5  29.4 woman  asian       0.245 -7.34         0.730             0.203
## 4 MA325        5.3       8    37.9 woman  asian       0.745  1.16         1.23              0.618
## 5 MA326        4.2       9.3  30.9 man    black       2.04  -5.84         0.877             1.70 
## 6 MA327        6.9       6.8  46.2 woman  asian      -0.455  9.46         0.0305           -0.378
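
As mentioned above, the scale function can also be applied directly within the lm function to obtain standardized regression coefficients. Here is a minimal sketch of that idea, using a toy data frame (hypothetically named demo) built from the first six rows printed earlier; with only six cases, the estimates are for illustration only and should not be interpreted substantively.

```r
# Toy data frame built from the first six rows of perf_eval and interview
demo <- data.frame(
  perf_eval = c(4.2, 4.9, 4.2, 5.3, 4.2, 6.9),
  interview = c(7.5, 9.3, 7.5, 8.0, 9.3, 6.8)
)

# Standardize the outcome and the predictor inside the model formula
fit_std <- lm(scale(perf_eval) ~ scale(interview), data = demo)

# The intercept is essentially zero, and the slope is the standardized coefficient
coef(fit_std)

# With a single predictor, the standardized slope equals the Pearson correlation
cor(demo$perf_eval, demo$interview)
```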

21.2.6.2 Option 2: `mutate` Function from `dplyr` and `scale` Function from Base R

This alternative approach to standardizing variables can come in handy when we want to standardize multiple variables in a single step. With that said, I will begin by showing how to standardize a single variable using this approach, and then we will try two variables. We will use the mutate function from the dplyr package, so if you haven’t already, be sure to install and access the dplyr package using the functions below.

# Install dplyr package if not already installed
install.packages("dplyr")

# Access dplyr package
library(dplyr)

As the first step, enter the name of the data frame object you wish to create (or overwrite), and in this example, we are going to overwrite our existing data frame (df) by naming our new data frame object the same thing. To create and name the object, we use the <- operator. Second, type the name of the original data frame to which the variable we wish to standardize belongs (df). Third, type the pipe (%>%) operator. Fourth, type the name of the mutate function. As the first and only argument, begin by typing what you would like to call the new standardized variable, which in this case is st_interview. After that, type the = operator followed by the name of the scale function from base R. As the first and only argument in the scale function, type the name of the variable (interview) that you would like to standardize.

# Standardizing: Using mutate function from dplyr and scale function from base R
df <- df %>%
  mutate(st_interview = scale(interview))

Take a look at your data frame object to inspect the new standardized variable called st_interview.

# Print first 6 rows of data frame
head(df)

## # A tibble: 6 × 10
##   emp_id perf_eval interview   age gender race  c_interview c_age gpmc_interview st_interview[,1]
##   <chr>      <dbl>     <dbl> <dbl> <chr>  <chr>       <dbl> <dbl>          <dbl>            <dbl>
## 1 MA322        4.2       7.5  29.7 woman  asian       0.245 -7.04         0.730             0.203
## 2 MA323        4.9       9.3  31.7 man    asian       2.04  -5.04         2.53              1.70 
## 3 MA324        4.2       7.5  29.4 woman  asian       0.245 -7.34         0.730             0.203
## 4 MA325        5.3       8    37.9 woman  asian       0.745  1.16         1.23              0.618
## 5 MA326        4.2       9.3  30.9 man    black       2.04  -5.84         0.877             1.70 
## 6 MA327        6.9       6.8  46.2 woman  asian      -0.455  9.46         0.0305           -0.378

One of the advantages of using this approach is that we can standardize multiple variables in a single step. To do so, we simply specify which additional variable we would like to standardize by adding an additional argument to the mutate function.

# Standardizing: Multiple variables
df <- df %>%
  mutate(st_interview = scale(interview),
         st_age = scale(age))
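
After standardizing, it can be reassuring to confirm that each new variable has a mean of 0 and a standard deviation of 1. The sketch below is one possible variation on the code above: it uses the across function from dplyr (version 1.0.0 or later) together with as.numeric, so that the new columns are plain numeric vectors instead of one-column matrices. The toy data frame (hypothetically named demo) is built from the first six rows printed earlier.

```r
library(dplyr)

# Toy data frame built from the first six rows of interview and age shown earlier
demo <- data.frame(
  interview = c(7.5, 9.3, 7.5, 8.0, 9.3, 6.8),
  age       = c(29.7, 31.7, 29.4, 37.9, 30.9, 46.2)
)

# Standardize both variables at once; .names adds the st_ prefix
demo <- demo %>%
  mutate(across(c(interview, age),
                ~ as.numeric(scale(.x)),
                .names = "st_{.col}"))

# Check: means of 0 (within rounding) and standard deviations of 1
sapply(demo[c("st_interview", "st_age")], mean)
sapply(demo[c("st_interview", "st_age")], sd)
```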

21.2.7 Summary

In this chapter, we learned how to grand-mean center, group-mean center, and standardize variables. For grand-mean centering variables, we used basic arithmetic and the mean function from base R, the scale function from base R, and a combination of the mutate function from dplyr and the mean function from base R. For group-mean centering variables, we combined the group_by, mutate, and ungroup functions from dplyr with the mean function from base R. For standardizing variables, we used the scale function from base R and a combination of the mutate function from dplyr and the scale function from base R.

References

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.