# Chapter 42 Testing for Differential Prediction Using Moderated Multiple Linear Regression

In this chapter, we will learn how to apply test whether a selection tool shows evidence of differential prediction by using moderating multiple linear regression. We’ll begin with conceptual overviews of the moderated multiple linear regression and differential prediction, and we’ll conclude with a tutorial.

## 42.1 Conceptual Overview

In this chapter, we will learn how to use an ordinary least squares (OLS) moderated multiple linear regression model to test for differential prediction of an employee selection tool due to protected class membership. Thus, the way in which we build and interpret our moderated multiple regression model will be couched in the context of differential prediction. Nonetheless, this tutorial should still provide you with the technical and interpretation skills to understand how to apply moderated multiple linear regression in other non-selection contexts.

### 42.1.1 Review of Moderated Multiple Linear Regression

Link to conceptual video: https://youtu.be/mkia_KLjJ0M

**Moderated multiple linear regression (MMLR)** is a specific type of multiple linear regression, which means that MMLR involves multiple predictor variables and a single outcome variable. For more information on multiple linear regression, check out the chapter on estimating incremental validity. In MMLR, we are in effect determining whether the association between a predictor variable and the outcome variable is conditional upon the level/value of another predictor variable, the latter of which we often refer to as a moderator or moderator variable.

**Moderators:** A **moderator** (i.e., moderator variable) is a predictor variable upon which an association between two other variables is conditional. A moderator variable can be continuous (e.g., age: measured in years) or categorical (e.g., location: Portland vs. Beaverton). When we find that one variable moderates the association between two other variables, we often refer to this as an **interaction** or interaction effect. For example, let’s imagine that we are interested in the association between job satisfaction and job performance in our organization. By introducing the categorical moderator variable of employee location (Portland vs. Beaverton), we can determine whether the magnitude and/or sign of the association between job satisfaction and job performance varies by the location at which employees work. Perhaps we find that the association between job satisfaction and job performance is significantly stronger for employees who work in Beaverton as compared to employees who work in Portland (see figure below), and thus we would conclude that there is a significant interaction between job satisfaction and location in relation to job performance.

When investigating whether an interaction exists between two variables relative to the outcome variable, we need to create a new variable called an **interaction term** (i.e., **product term**). An interaction term is created by multiplying the scores of two predictor variables – or in other words, the computing the product of two variables’ respective scores. In fact, the presence of an in interaction term – and associated multiplication – signals why MMLR is referred to as a *multiplicative model*. In terms of the language we use to describe an interaction and the associated interaction term, we often refer to one of the predictor variables involved in an interaction as the moderator. For an illustration of how an interaction term is created, please refer to the table below.

**Centering Predictor & Moderator Variables:** To improve the interpretability of the main effects (but not the interaction term) and to reduce collinearity between the predictor variables involved in the interaction term, we typically grand-mean center any continuous (interval, ratio) predictor variables; we do *not* need to center the outcome variable or a predictor or moderator variable that is categorical (e.g., dichotomous). As discussed in the chapter on centering and standardizing variables, grand-mean centering refers to the process of subtracting the overall sample mean for a variable from each case’s score on that variable. We center the predictor variables (i.e., predictor and moderator variables) *before* computing the interaction term based on those variables. To illustrate how grand-mean centering works, take a look at the table below, and note that we are just subtracting the (grand) mean for the sample from the specific value on the predictor variable for each case.

When we grand-mean center the predictor variables in a regression model, it also changes our interpretation of the \(\hat{Y}\)-intercept value. If you recall from prior reviews of simple linear regression and multiple linear regression, the \(\hat{Y}\)-intercept value is the mean of the outcome variable when each predictor variable(s) in the model is/are set to zero. Thus, when we center each predictor variable, we shift the mean of each variable to zero, which means that after grand-mean centering, the \(\hat{Y}\)-intercept value is the mean of the outcome variable when each predictor variable is set to its mean. Beyond moderated multiple linear regression, grand-mean centering can be useful in simple and multiple linear regression models when interpreting the regression coefficients associated with the intercept and the slope(s), as in some cases, a score of zero on the predictor variable is theoretically meaningless, implausible, or both. For example, if one of our predictor variables has an interval measurement scale that ranges from 10-100, then a score of zero is not even part of the scale range, which would make shifting the zero value to the grand-mean a more meaningful value.

Regardless of your decision to grand-mean center or to not grand-mean center the predictor variable and moderator variables, the interaction term’s significance level (*p*-value) and model fit (i.e., *R*^{2}) will remain the same. In fact, sometimes it makes more sense to plot the findings of your model based on the uncentered version of the variables. That choice is really up to you and should be based on the context at hand.

**Moderated Multiple Linear Regression:** In moderated multiple linear regression (MMLR), we have three or more predictor variables and a single outcome variable, where one of the predictor variables is the interaction term (i.e., product term) between two of the predictor variables (e.g., predictor and moderator variables). Just like multiple linear regression, MMLR allows for a series of covariates (control variables) to be included in the model; accordingly, each regression coefficient (i.e., slope, weight) represents the relationship between the associated predictor and the outcome variable when statistically controlling for all other predictors in the model.

To better understand what MMLR is and how it works, I find that it’s useful to consider a MMLR model in equation form. Specifically, let’s consider a scenario in which we estimate a MMLR with two predictor variables, their interaction term (i.e., product term), and a single outcome variable. The equation for such a model with *unstandardized* regression coefficients (\(b\)) would be as follows:

\(\hat{Y} = b_{0} + b_{1}X_1 + b_{2}X_2 + b_{3}X_1*X_2 + e\)

where \(\hat{Y}\) represents the predicted score on the outcome variable (\(Y\)), \(b_{0}\) represents the \(\hat{Y}\)-intercept value (i.e., model constant) when the predictor variables \(X_1\) and \(X_2\) and interaction term \(X_1*X_2\) are equal to zero, \(b_{1}\), \(b_{2}\), and \(b_{3}\) represent the unstandardized coefficients (i.e., weights, slopes) of the associations between the predictor variables \(X_1\) and \(X_2\) and interaction term \(X_1*X_2\), respectively, and the outcome variable \(\hat{Y}\), and \(e\) represents the (residual) error in prediction. If the regression coefficient associated with the interaction term (\(b_{3}\)) is found to be statistically significant, then we would conclude that there is evidence of a significant interaction effect.

If we conceptualize one of the predictor variables as a moderator variable, then we might re-write the equation from above by swapping out \(X_2\) with \(W\) (or \(Z\)). The meaning of the equation remains the same, but changing the notation in this way may signal more clearly which variable we are labeling a moderator.

\(\hat{Y} = b_{0} + b_{1}X + b_{2}W + b_{3}X*W + e\)

Unstandardized regression coefficient indicates the “raw” regression coefficient (i.e., weight, slope) when controlling for the associations between all other predictors in the model and the outcome variable; accordingly, each unstandardized regression coefficient represents how many unstandardized units of \(\hat{Y}\) increase/decrease as a result of a single unit increase in a predictor variable, when statistically controlling for the effects of other predictors in the regression model.

Typically, we are most interested in whether slope differences exist; with that being said, in the context of testing for differential prediction (described below), we are also interested in whether intercept differences exist. In essence, a significant interaction term indicates the presence of **slope differences**, where slopes differences refer to significant variation in the association (i.e., slope) between the predictor variable and outcome variable by levels/values of the moderator variable. In the context of our first MMLR equation (see above), if the regression coefficient (\(b_{3}\)) associated with the interaction term (\(X_1*X_2\)) is statistically significant, we conclude that there is significant interaction and thus significant slope differences. In fact, we might even us language like, “\(X_2\) moderates the association between \(X_1\) and \(Y\).”

When we find a statistically significant interaction, it is customary to plot the **simple slopes** and estimate whether each of the simple slopes is significantly different from zero. Simple slopes allow us to probe the form of the significant interaction so that we can more accurately describe it. Plus, the follow-up simple slopes analysis helps us understand at which levels of the moderator variable the association between the predictor and outcome variables is statistically significant. For a *continuous* moderator variable, we typically plot and estimate the simple slopes at 1 standard deviation (SD) above and below the mean for the variable – and sometimes at the mean of the variable as well; with that being said, if we want, we can choose more theoretically or empirically meaningful values for the moderator variable. For a *categorical* variable, we plot and estimate the simple slopes based on the different discrete levels (categories) of the variable. For example, if our moderator variable captures gender identity operationalized as man (0) and woman (1), then we would plot the simple slopes when sex is equal to man (0) and when sex is equal to woman (1).

#### 42.1.1.1 Statistical Assumptions

The statistical assumptions that should be met prior to running and/or interpreting estimates from a moderated multiple linear regression model are the same as those for conventional multiple linear regression include:

- Cases are randomly sampled from the population, such that the variable scores for one individual are independent of the variable scores of another individual;
- Data are free of multivariate outliers;
- The association between the predictor and outcome variables is linear;
- There is no (multi)collinearity between predictor variables;
- Average residual error value is zero for all levels of the predictor variables;
- Variances of residual errors are equal for all levels of the predictor variables, which is referred to as the assumption of homoscedasticity;
- Residual errors are normally distributed for all levels of the predictor variables.

The fourth statistical assumption refers to the concept of **collinearity** (**multicollinearity**). This can be a tricky concept to understand, so let’s take a moment to unpack it. When two or more predictor variables are specified in a regression model, as is the case with multiple linear regression, we need to be wary of collinearity. Collinearity refers to the extent to which predictor variables correlate with each other. Some level of intercorrelation between predictor variables is to be expected and is acceptable; however, if collinearity becomes substantial, it can affect the weights – and even the signs – of the regression coefficients in our model, which can be problematic from an interpretation standpoint. As such, we should avoid including predictors in a multiple linear regression model that correlate highly with one another. The **tolerance** statistic is commonly computed and serves as an indicator of collinearity. The tolerance statistic is computed by computing the shared variance (*R*^{2}) of just the predictor variables in a single model (excluding the outcome variable), and subtracting that *R*^{2} value from 1 (i.e., 1 - *R*^{2}). We typically grow concerned when the tolerance statistic falls below .20 and closer to .00. Ideally, we want the tolerance statistic to approach 1.00, as this indicates that there are lower levels of collinearity. From time to time, you might also see the **variance inflation factor** (**VIF**) reported as an indicator of collinearity; the VIF is just the reciprocal of the tolerance (i.e., 1/tolerance), and in my opinion, it is redundant to report and interpret both the tolerance and VIF. My recommendation is to focus just on the tolerance statistic when inferring whether the statistical assumption of no collinearity might have been violated.

Finally, with respect to the assumption that cases are randomly sampled from population, we will assume in this chapter’s data that this is not an issue. If we were to suspect, however, that there were some clustering or nesting of cases in units/groups (e.g., by supervisors, units, or facilities) with respect to our outcome variable, then we would need to run some type of multilevel model (e.g., hierarchical linear model, multilevel structural equation model), which is beyond the scope of this tutorial. An *intraclass correlation (ICC)* can be used to diagnose such nesting or cluster. Failing to account for clustering or nesting in the data can bias estimates of standard errors, which ultimately influences the *p*-values and inferences of statistical significance.

#### 42.1.1.2 Statistical Signficance

Using null hypothesis significance testing (NHST), we interpret a *p*-value that is *less than .05* (or whatever two- or one-tailed alpha level we set) to meet the standard for statistical significance, meaning that we reject the null hypothesis that the regression coefficient is equal to zero when controlling for the effects of the other predictor variable(s) and interaction term variable in the model. In other words, if a regression coefficient’s *p*-value is less than .05, we conclude that the regression coefficient differs from zero to a statistically significant extent when controlling for the effects of other predictor variables in the model. In contrast, if the regression coefficient’s *p*-value is *equal to or greater than .05*, then we fail to reject the null hypothesis that the regression coefficient is equal to zero when controlling for the effects of other predictor variable(s) and interaction term variable in the model. Put differently, if a regression coefficient’s *p*-value is equal to or greater than .05, we conclude that the regression coefficient does *not* differ from zero to a statistically significant extent, leading us to conclude that there is no association between the predictor variable (or interaction term variable) and the outcome variable in the population – when controlling for the effects of other predictor variables and interaction term variable in the model.

When setting an alpha threshold, such as the conventional two-tailed .05 level, sometimes the question comes up regarding whether borderline *p*-values signify significance or nonsignificance. For our purposes, let’s be very strict in our application of the chosen alpha level. For example, if we set our alpha level at .05, *p* = .049 would be considered statistically significant, and *p* = .050 would be considered statistically nonsignificant.

Because our regression model estimates are based on data from a sample that is drawn from an underlying population, sampling error will affect the extent to which our sample is representative of the population from which its drawn. That is, a regression coefficient estimate (*b*) is a *point estimate* of the population parameter that is subject to sampling error. Fortunately, confidence intervals can give us a better idea of what the true population parameter value might be. If we apply an alpha level of .05 (two-tailed), then the equivalent confidence interval (CI) is a 95% CI. In terms of whether a regression coefficient is statistically significant, if the lower and upper limits of 95% CI do *not* include zero, then this tells us the same thing as a *p*-value that is less than .05. Strictly speaking, a 95% CI indicates that if we were to hypothetically draw many more samples from the underlying population and construct CIs for each of those samples, then the true parameter (i.e., true value of the regression coefficient in the population) would likely fall within the lower and upper bounds of 95% of the estimated CIs. In other words, the 95% CI gives us an indication of plausible values for the population parameter while taking into consideration sampling error. A wide CI (i.e., large difference between the lower and upper limits) signifies more sampling error, and a narrow CI signifies less sampling error.

#### 42.1.1.3 Practical Significance

As a reminder, an effect size is a standardized metric that can be compared across samples. In a moderated multiple linear regression (MMLR) model, an *unstandardized* regression coefficient (\(b\)) is *not* an effect size. The reason being, an unstandardized regression coefficient estimate is based on the original scaling of the predictor and outcome variables, and thus the same effect will take on different regression coefficients to the extent that the predictor and outcome variables have different scalings across samples.

A *standardized* regression coefficient (\(\beta\)) can be interpreted as an effect size (and thus an indicator of practical significance) given that it is standardized. With that being said, I suggest doing so with caution as collinearity (i.e., correlation) between predictor variables in the model can bias our interpretation of \(\beta\) as an effect size. Thus, if your goal is just to understand the bivariate association between a predictor variable and an outcome variable (without introducing statistical control), then I recommend to just estimate a correlation coefficient as an indicator of practical significance, which I discuss in the chapter on estimating criterion-related validity using correlations.

In a MMLR model, we can also describe the magnitude of the effect in terms of the proportion of variance explained in the outcome variable by the predictor variables (i.e., *R*^{2}). That is, in a multiple linear regression model, *R*^{2} represents the proportion of *collective* variance explained in the outcome variable by all of the predictor variables. Conceptually, we can think of the overlap between the variability in the predictor variables and and outcome variable as the variance explained (*R*^{2}), and *R*^{2} is a way to evaluate how well a model fits the data (i.e., model fit). I’ve found that the *R*^{2} is often readily interpretable by non-analytics audiences. For example, an *R*^{2} of .25 in a MMLR model can be interpreted as: the predictor variable and interaction term variable scores explain 25% of the variability in scores on the outcome variable. That is, to convert an *R*^{2} from a proportion to a percent, we just multiply by 100.

R^{2} |
Description |
---|---|

.01 | Small |

.09 | Medium |

.25 | Large |

### 42.1.2 Review of Differential Prediction

**Differential prediction** refers to differences in intercepts, slopes, or both based on subgroup (projected class) membership (Society for Industrial & Organizational Psychology 2018; Cleary 1968), and sometimes differential prediction is referred to as *predictive bias*. Assuming a single regression line is used to predict the criterion (e.g., job performance) scores of applicants, differential prediction is concerned with overpredicting or underpredicting criterion scores for one subgroup relative to another (Cascio and Aguinis 2005). A common regression line would be inappropriate when evidence of intercept differences, slope differences, or both exist.

In the U.S., the use of separate regression lines – one for each subgroup – is legally murky, with some interpreting separate regression lines as violation of federal law and others cautioning that the practice might be interpreted as score adjustment, thereby running afoul of the Civil Rights Act of 1991 (Berry 2015; Saad and Sackett 2002). Specifically, Section 106 of the Civil Rights Act of 1991 amends Section 703 of the Civil Rights Act of 1964 to prohibit the discriminatory use of selection test scores:

“It shall be an unlawful employment practice for a respondent, in connection with the selection or referral of applicants or candidates for employment or promotion, to adjust the scores of, use different cutoff scores for, or otherwise alter the results of, employment related tests on the basis of race, color, religion, sex, or national origin.”

Thus, a legally cautious approach would be to discontinue the use of a selection tool in its current form should evidence of differential prediction come to light.

We’ll begin by considering an example of **intercept differences** when predicting scores on the criterion (e.g., job performance) based on scores on a selection tool. Imagine a situation in which scores on a selection tool differentially predict scores on job performance for men and women, such that the slopes are the same, but the intercepts of each slope are different (see figure below). The dashed black line indicates what a common regression line would look like if it were used. As you can see, if we were to use a single common regression line when intercept differences exist between men and women, then we would tend to *underpredict* job performance scores for men and *overpredict* for women.

Next, we’ll consider an example involving **slope differences** in the context of differential prediction. In the figure below, the slopes (e.g., criterion-related validities) are different when the selection tool is used for men as compared to women. Again, the dashed black line represents what would happen if a common regression line were used, which would clearly be inappropriate. Note that, in this particular example, the job performance scores would generally be *underpredicted* for men and *overpredicted* for women when using the same regression. Further, the selection tool appears to show criterion-related validity (i.e., a significant slope) for men but not for women. Accordingly, based on this sample, the selection tool might only be a valid predictor of job performance for men. Clearly this would be a problem.

Finally, we’ll consider an example involving **both intercept differences and slope differences**. The figure below shows that use of a common regression line would pose problems in terms of predictive bias. In this example, for men, using of a common regression line would *overpredict* job performance scores for a certain range of selection tool scores and *underpredict* job performance scores for another range of selection tool scores; the opposite would be true for women.

**Words of Caution:** In this tutorial, we will learn how to apply MMLR in the service of detecting differential predication using an approach that is generally supported by the U.S. court system and researchers. With that said, using MMLR to detect differential prediction should be used cautiously with small samples, as failure to detect differential prediction that actually exists (i.e., Type II Error, false negative) could be the result of low statistical power, and relatedly, unreliability or bias in the selection tool or criterion and violation of certain statistical assumptions (e.g., homoscedasticity of residual error variances) can affect our ability to find differential prediction that actually exists (Aguinis and Pierce 1998). Further, when one subgroup is less than 30% of the total sample size, statistical power is also diminished (Stone-Romero, Alliger, and Aguinis 1994). The take-home message is to remain cautious and thoughtful when investigating differential prediction.

**Multistep MMLR Process for Differential Prediction:** We will use a three-step “step-up” process for investigating differential prediction using MMLR (Society for Industrial & Organizational Psychology 2018; Cleary 1968). This process uses what is referred to as hierarchical regression, as additional predictor variables are added at each subsequent step, and their incremental validity above and beyond the other predictors in the model is assessed.

In the **first step**, we will regress the criterion variable on the selection tool variable, and in this step, we will verify whether the selection tool shows *criterion-related validity* by itself. If we find evidence of criterion-related validity for the selection tool in this first step, then we proceed to the second step.

In the **second step**, we include the selection tool variable as before but add the dichotomous (binary) protected class/group variable to the model. If the protected class variable is statistically significant when controlling for the selection tool variable, then there is evidence of *intercept differences*. You might be asking yourself: What happens when a protected class variable is not inherently categorical or, specifically, dichotomous? When it comes to the protected class of race, many organizations have more than two races represented among their employees. In this situation, it is customary (at least within the U.S.) to compare two subgroups at time (e.g., White and Black) or to create a dichotomy consisting of the majority race and members of other non-majority races (e.g., White and Non-White). For a inherently continuous variable like age, classically, the standard procedure was to dichotomize the age variable by those who are 40 years and older and those who are younger than 40 years, as this is consistent with the U.S. Age Discrimination in Employment Act (ADEA) of 1967. There are two potential issues with this: (a) Some states have further age protections in place that protect workers under 40 as well, and (b) Some jobs in organizations have very young workforces, leaving the question of whether people in their 30s are being favored over people in their 20s, or vice versa (for example). In addition, when we look at organizations with workers who are well into their 50s, 60s, and 70s, then there might be evidence of age discrimination and differential prediction for those who are in their 50s compared to those who are in their 60s. Dichotomizing the age variable using a 40 and above vs. under 40 split might miss these issues and generally nuance. (For a review of this topic, see Fisher et al. 2017.) As such, with a variable like age, there might be some advantages to keeping it as a continuous variable, as MMLR can handle this.

In the **third step**, we include the selection tool and protected class variables as before but add the interaction term between the selection tool and protected class variables. If the interaction term is statistically significant, then there is evidence of differential prediction in the form of *slope differences*.

#### 42.1.2.1 Sample Write-Up

Our HR analytics team recently found evidence in support of the criterion-related validity of a new structured interview selection procedure. As a next step, we investigated whether there might be evidence of differential prediction of the structured interview based on the U.S. protected class of gender. Based on a sample of 377 individuals, we applied a three-step process with moderated multiple linear regression, which is consistent with established principles (Society for Industrial & Organizational Psychology 2018; Cleary 1968). Based on our analyses, we found evidence of differential prediction (i.e., predictive bias) based on gender for the new structured interview procedure, and specifically, we found evidence of both intercept and slope differences. In the first step, we found evidence of criterion-related validity based on a simple linear regression model, as structured interview scores were positively and significantly associated with job performance (*b* = .307, *p* < .001), thereby indicating some degree of job-relatedness and -relevance. In the second step, we added the protected class variable associated with gender to the model, resulting in a multiple linear regression model, and found that the model intercept for women was significantly lower than the intercept for men (*b* = -.602, *p* < .001). In the third step, we added the interaction term between the structured interview and gender variables, resulting in a moderated multiple linear regression model, and we found that gender moderated the association between interview scores and job performance scores to a statistically significant extent (*b* = .284, *p* = .007). Based on follow-up simple slopes analyses, we found that the association between structured interview scores and job performance scores was statistically significant for women (*b* = .26, *p* < .01), such that for every one unit increase in structured interview scores, job performance scores tended to increase by .26 units; in contrast, the associated between structured interview scores and job performance scores for men was nonsignificant (*b* = -.02, *p* = .77). In sum, we found evidence of both intercept and slope differences, which means that this interview tool shows predictive bias with respect to gender, making the application of a common regression line (i.e., equation) inappropriate according to U.S. guidelines. If possible, this structured interview should be redesigned or the interviewers should be trained to reduce the predictive bias based on gender.
In the interim, we caution against using separate regression lines for men and women, as this may be interpreted as running afoul of the U.S. Civil Rights Act of 1991 (Saad and Sackett 2002). A more conservative approach would be to develop a structured interview process that allows for the use of a common regression line across legally protected groups.

## 42.2 Tutorial

This chapter’s tutorial demonstrates how to use moderated multiple linear regression in the service of detecting differential prediction in a selection tool.

### 42.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/3n2vvMc4yHs

### 42.2.2 Functions & Packages Introduced

Function | Package |
---|---|

`mean` |
base R |

`Regression` |
`lessR` |

`reg_brief` |
`lessR` |

`probe_interaction` |
`interactions` |

### 42.2.3 Initial Steps

If you haven’t already, save the file called **“DiffPred.csv”** into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., `"H:/RWorkshop"`

). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from the my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the `setwd`

function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to *Session > Set Working Directory > Choose Directory…*. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

```
# Set your working directory
setwd("H:/RWorkshop")
```

Next, read in the .csv data file called **“DiffPred.csv”** using your choice of read function. In this example, I use the `read_csv`

function from the `readr`

package (Wickham, Hester, and Bryan 2022). If you choose to use the `read_csv`

function, be sure that you have installed and accessed the `readr`

package using the `install.packages`

and `library`

functions. *Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once ever 1-3 months.* For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

```
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
```

```
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
<- read_csv("DiffPred.csv") DiffPred
```

```
## Rows: 377 Columns: 6
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): emp_id, gender, race
## dbl (3): perf_eval, interview, age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
# Print the names of the variables in the data frame (tibble) objects
names(DiffPred)
```

`## [1] "emp_id" "perf_eval" "interview" "age" "gender" "race"`

```
# View variable type for each variable in data frame
str(DiffPred)
```

```
## spec_tbl_df [377 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ emp_id : chr [1:377] "MA322" "MA323" "MA324" "MA325" ...
## $ perf_eval: num [1:377] 4.2 4.9 4.2 5.3 4.2 6.9 3.4 5.8 4.4 5.6 ...
## $ interview: num [1:377] 7.5 9.3 7.5 8 9.3 6.8 6.7 7 8.2 6.4 ...
## $ age : num [1:377] 29.7 31.7 29.4 37.9 30.9 46.2 43.9 47.8 31.7 44.6 ...
## $ gender : chr [1:377] "woman" "man" "woman" "woman" ...
## $ race : chr [1:377] "asian" "asian" "asian" "asian" ...
## - attr(*, "spec")=
## .. cols(
## .. emp_id = col_character(),
## .. perf_eval = col_double(),
## .. interview = col_double(),
## .. age = col_double(),
## .. gender = col_character(),
## .. race = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
```

```
# View first 6 rows of data frame
head(DiffPred)
```

```
## # A tibble: 6 × 6
## emp_id perf_eval interview age gender race
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 MA322 4.2 7.5 29.7 woman asian
## 2 MA323 4.9 9.3 31.7 man asian
## 3 MA324 4.2 7.5 29.4 woman asian
## 4 MA325 5.3 8 37.9 woman asian
## 5 MA326 4.2 9.3 30.9 man black
## 6 MA327 6.9 6.8 46.2 woman asian
```

There are 6 variables and 377 cases (i.e., employees) in the `DiffPred`

data frame: `emp_id`

, `perf_eval`

, `interview`

, `age`

, `gender`

, and `race`

. Per the output of the `str`

(structure) function above, the variables `perf_eval`

, `interview`

, and `age`

are of type *numeric* (continuous: interval/ratio), and the variables `emp_id`

, `gender`

, and `race`

are of type *character* (nominal/categorical). The variable `emp_id`

is the unique employee identifier. Imagine that these data were collected as part of a criterion-related validation study - specifically, a concurrent validation design in which job incumbents were administered a rated structured interview (`interview`

) 90 days after entering the organization. The structured interview (`interview`

) variable was designed to assess individuals’ interpersonal skills, and ratings can range from 1 (very weak interpersonal skills) to 10 (very strong interpersonal skills). The interviews were scored by untrained raters who were often the hiring managers but not always. The `perf_eval`

variable is the criterion (outcome) of interest, and it is a 90-day-post-hire measure of supervisor-rated job performance, with possible ratings ranging from 1-7, with 7 indicating high performance. The `age`

variable represents the job incumbents’ ages (in years). The `gender`

variable represents the job incumbents’ tender identify and is defined by two levels/categories/values: *man* and *woman*. Finally, the `race`

variable represents the job incumbents’ race/ethnicity and is defined by three levels/categories/values: *asian*, *black*, and *white*.

### 42.2.4 Grand-Mean Center Continuous Predictor Variables

To improve the interpretability of our model and to reduce the collinearity between the predictor and moderator variables with their interaction term, we will grand-mean center continuous predictor and moderator variables. *[Note: A moderator variable is just a predictor variable that we are framing as a moderator.]* With regard to interpretability, grand-mean centering aids in our interpretation of the regression coefficients associated with our predictor and moderator variables in our MMLR by reducing collinearity between these variables and their interaction term and by estimating the y-intercept when all predictors are at their respective means (averages), as opposed to when they are at zero (which is not always a plausible or meaningful value for some variables). Further, when interpreting the association between one predictor variable and the outcome variable while statistically controlling for other variables in the model, grand-mean centering allows us to interpret the effect of the focal predictor variable when the other predictor variables are at their respective means (as opposed to zero). For more detailed information on centering, check out the chapter on centering and standardizing variables.

We only center predictor and moderator variables that are of type *numeric* and that we conceptualize as having a continuous (interval or ratio) measurement scale. In our current data frame, we will grand-mean center just the `interview`

and `age`

variables. We will use what is perhaps the most intuitive approach for grand-mean centering variables. We must begin by coming up with a new name for one of our soon-to-be grand-mean centered variable, and in this example, we will center the `interview`

variable. I typically like to name the centered variable by simply adding the `c_`

prefix to the existing variable’s names (e.g., `c_interview`

). Type the name of the data frame object to which the new centered variable will be attached (`DiffPred`

), followed by the `$`

operator and the name of the new variable we are creating (`c_interview`

). Next, add the `<-`

operator to indicate what values you will assign to this new variable. To create a vector of values to assign to this new `c_interview`

variable, we will subtract the mean (average) score for the original variable (`interview`

) from each case’s value on the variable. Specifically, enter the name of the data frame object, followed by the `$`

operator and the name of the original variable (`interview`

). After that, enter the subtraction symbol (`-`

). And finally, type the name of the `mean`

function from base R. As the first argument in the `mean`

function, enter the name of the data frame object (`DiffPred`

), followed by the `$`

operator and the name of the original variable (`interview`

). As the second argument, enter `na.rm=TRUE`

to indicate that you wish to drop missing values when calculating the grand mean for the sample. Now repeat those same steps for the `age`

variable.

```
# Grand-mean centering: Using basic arithmetic and the mean function from base R
$c_interview <- DiffPred$interview - mean(DiffPred$interview, na.rm=TRUE)
DiffPred$c_age <- DiffPred$age - mean(DiffPred$age, na.rm=TRUE) DiffPred
```

### 42.2.5 Estimate Moderated Multiple Linear Regression Model

As described above, we will use the standard three-step differential prediction assessment using MMLR. We will repeat this three step process for all three protected class variables in our data frame: `gender`

, `c_age`

, and `race`

. [Note that we are using the grand-mean centered age variable we created above.]. Our selection tool of interest is our `c_interview`

(which is the grand-mean centered version of the variable we created above), and our criterion of interest is `perf_eval`

. I will show you how to test differential prediction for each of these protected class variables using the `Regression`

(or `reg_brief`

) function from `lessR`

(Gerbing, Business, and University 2021).

*Note:* To reduce the overall length of this tutorial, I have omitted the diagnostic tests of the statistical assumptions, and thus for the purposes of this tutorial, we will assume that the statistical assumptions have been met. For your own work, be sure to run diagnostic tests to evaluate whether your model meets these statistical assumptions. For more information on statistical assumption testing in this context, check out the chapter on estimating increment validity using multiple linear regression.

If you haven’t already install and access the `lessR`

package using the `install.packages`

and `library`

functions, respectively (see above). This will allow us to access the `Regression`

and `reg_brief`

functions.

```
# Install lessR package if you haven't already
install.packages("lessR")
```

```
# Access lessR package
library(lessR)
```

#### 42.2.5.1 Test Differential Prediction Based on `gender`

Let’s start with investigating whether differential prediction exists for our `interview`

selection tool in relation to `perf_eval`

based `gender`

subgroup membership (i.e., man, woman).

**Step One:** Let’s being by regressing the criterion variable `perf_eval`

on the selection tool variable `c_interview`

. Here we are just establishing whether the selection tool shows evidence of criterion-related validity. See the chapters on predicting criterion scores using simple linear regression if you would like more information on regression-based approaches to assessing criterion-related validity, including determining whether statistical assumptions have been met. You can use either the `Regression`

or `reg_brief`

function from `lessR`

, where the latter provides less output. To keep the amount of output in this tutorial to a minimum, I will use the `reg_brief`

function, but when you evaluate whether statistical assumptions have been met, be sure to use the full `Regression`

function.

```
# Regress perf_eval (criterion) on c_interview (selection tool)
reg_brief(perf_eval ~ c_interview, data=DiffPred)
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview, data=DiffPred, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable: c_interview
##
## Number of cases (rows) of data: 377
## Number of cases retained for analysis: 377
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.705 0.049 95.145 0.000 4.608 4.802
## c_interview 0.307 0.041 7.465 0.000 0.226 0.387
##
## Standard deviation of perf_eval: 1.028
##
## Standard deviation of residuals: 0.960 for 375 degrees of freedom
## 95% range of residual variation: 3.776 = 2 * (1.966 * 0.960)
##
## R-squared: 0.129 Adjusted R-squared: 0.127 PRESS R-squared: 0.120
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 55.732 df: 1 and 375 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## Model 1 51.380 51.380 55.732 0.000
## Residuals 375 345.720 0.922
## perf_eval 376 397.100 1.056
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient for `c_interview`

is statistically significant and positive (*b* = .307, *p* < .001), which provides evidence of criterion-related validity. The amount of variance explained by `c_interview`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .129, adjusted *R*^{2} = .127, *p* < .001). See table below for common rules of thumbs for qualitatively describing *R*^{2} values.

R^{2} |
Description |
---|---|

.01 | Small |

.09 | Medium |

.25 | Large |

Because Step One revealed a statistically significant association between `c_interview`

and `perf_eval`

– and thus evidence of criterion-related validity for the selection tool – we will proceed to Step Two.

**Step Two:** Next, let’s add the protected class variable `gender`

to our model to investigate whether *intercept differences* exist. For more information on how to interpret and test the statistical assumptions of a multiple linear regression model, check out the chapter on estimating incremental validity using multiple linear regression.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (gender)
reg_brief(perf_eval ~ c_interview + gender, data=DiffPred)
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview + gender, data=DiffPred, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable 1: c_interview
## Predictor Variable 2: gender
##
## Number of cases (rows) of data: 377
## Number of cases retained for analysis: 377
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 5.083 0.091 55.833 0.000 4.904 5.262
## c_interview 0.164 0.049 3.325 0.001 0.067 0.261
## genderwoman -0.602 0.123 -4.889 0.000 -0.844 -0.360
##
## Standard deviation of perf_eval: 1.028
##
## Standard deviation of residuals: 0.932 for 374 degrees of freedom
## 95% range of residual variation: 3.666 = 2 * (1.966 * 0.932)
##
## R-squared: 0.182 Adjusted R-squared: 0.177 PRESS R-squared: NA
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 41.521 df: 2 and 374 p-value: 0.000
##
## -- Analysis of Variance from Type II Sums of Squares
##
## df Sum Sq Mean Sq F-value p-value
## c_interview 1 9.603 9.603 11.053 0.001
## gender 1 20.770 20.770 23.906 0.000
## Residuals 374 324.950 0.869
##
##
## MODELS OF perf_eval FOR LEVELS OF gender
##
## -- Test of Interaction
##
## c_interview:gender df: 1 df resid: 373 SS: 6.385 F: 7.476 p-value: 0.007
##
## -- Assume parallel lines, no interaction of gender with c_interview
##
## Level man: y^_perf_eval = 5.083 + 0.164(x_c_interview)
## Level woman: y^_perf_eval = 4.482 + 0.164(x_c_interview)
##
## -- Visualize Separately Computed Regression Lines
##
## Plot(c_interview, perf_eval, by=gender, fit="lm")
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient associated with `gender`

(or more specifically `genderwoman`

) is negative and statistically significant (*b* = -.602, *p* < .001) when controlling for `c_interview`

, which indicates that there is evidence of intercept differences. Note that the name of the regression coefficient for `gender`

has `woman`

appended to it. This means that the intercept represents the `perf_eval`

scores of women minus those for men; because the coefficient is negative, it implies that the intercept value is significantly lower for women relative to men. The amount of variance explained by `c_interview`

*and* `gender`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .182, adjusted *R*^{2} = .177, *p* < .001), which is larger than when just `c_interview`

was included in the model.

We can visualize the nature of the intercept differences by creating a plot with the simple slopes. To do so, we will use a package called `interactions`

(Long 2019), so if you haven’t already, install and access the package.

```
# Install interactions package if you haven't already
install.packages("interactions")
# Note: You may need to install the sandwich package independently
# install.packages("sandwich")
```

```
# Access interactions package
library(interactions)
```

As input to the `probe_interaction`

function from the `interactions`

package, we will need to use the `lm`

function base R to specify our regression model from above. We can just copy and paste the same arguments from the `reg_brief`

function above into the `lm`

function parentheses. We need to name this model something so that we can reference it in the next function, so let’s name it `Int_Model`

(for intercept differences model) using the `<-`

operator. **Note: If your selection tool variable is dichotomous and is of type factor or character, you will need to convert it to a variable of type numeric (e.g., levels: 0 & 1) – a requirement of the probe_interaction function.**

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (gender)
<- lm(perf_eval ~ c_interview + gender, data=DiffPred) Int_Model
```

Now we’re ready to use the `probe_interaction`

to plot this additive model. As the first argument, enter the name of the regression model we just created above (`Int_Model`

). As the second argument, type the name of the variable you are framing as the predictor after `pred=`

, which in our case is the selection tool variable (`c_interview`

). As the third argument, type the name of the variable you are framing as the moderator after `modx=`

, which in our case is the protected class variable (`gender`

). As the fourth argument, enter `johnson_neyman=FALSE`

, as we aren’t requesting the Johnson-Neyman test in this tutorial. Finally, if you choose to, you can use `x.label=`

and `y.label=`

to create more informative names for the predictor and outcome variables, respectively. You could also add, if you should choose to do so, the `legend.main=`

argument to provide a more informative name for the moderator variable, and/or the `modx.labels=`

to provide more informative names for the moderator variable labels.

```
# Visualize intercept differences
probe_interaction(Int_Model,
pred=c_interview,
modx=gender,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when gender = man:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.16 0.05 3.32 0.00
##
## Slope of c_interview when gender = woman:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.16 0.05 3.32 0.00
```

*[If you got an error message when running the probe_interaction function, you may need to install the sandwich package and then try again.]*

Because we were only testing an additive model, the figure shows the slopes for each subgroup (men, women) as equal with just the intercepts differing. In the following step, we will formally test whether a multiplicative interaction effect exists, which would suggest slope differences.

**Step Three:** As the final step, it’s time to add the interaction term between `c_interview`

and `gender`

to the model. Fortunately, this is very easy to do in R. All we need to do is change the `+`

symbol from our previous regression model script to the `*`

, where the `*`

implies an interaction in the context of a model. Conceptually, the `*`

is appropriate because we are estimating a multiplicative model when we have an interaction term, which is commonly known as a product term.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (gender),
# and interaction between selection tool and protected class
reg_brief(perf_eval ~ c_interview * gender, data=DiffPred)
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview * gender, data=DiffPred, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable 1: c_interview
## Predictor Variable 2: gender
##
## Number of cases (rows) of data: 377
## Number of cases retained for analysis: 377
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 5.258 0.111 47.559 0.000 5.040 5.475
## c_interview -0.025 0.085 -0.291 0.771 -0.191 0.142
## genderwoman -0.724 0.130 -5.572 0.000 -0.980 -0.469
## c_interview:genderwoman 0.284 0.104 2.734 0.007 0.080 0.488
##
## Standard deviation of perf_eval: 1.028
##
## Standard deviation of residuals: 0.924 for 373 degrees of freedom
## 95% range of residual variation: 3.634 = 2 * (1.966 * 0.924)
##
## R-squared: 0.198 Adjusted R-squared: 0.191 PRESS R-squared: NA
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 30.652 df: 3 and 373 p-value: 0.000
##
## -- Analysis of Variance from Type II Sums of Squares
##
## df Sum Sq Mean Sq F-value p-value
## c_interview 1 9.603 9.603 11.053 0.001
## gender 1 20.770 20.770 24.319 0.000
## c_interview:gender 1 6.385 6.385 7.476 0.007
## Residuals 373 318.565 0.854
##
##
## MODELS OF perf_eval FOR LEVELS OF gender
##
## -- Test of Interaction
##
## c_interview:gender df: 1 df resid: 373 SS: 6.385 F: 7.476 p-value: 0.007
##
## -- Assume parallel lines, no interaction of gender with c_interview
##
## Level man: y^_perf_eval = 5.258 + -0.025(x_c_interview)
## Level woman: y^_perf_eval = 4.533 + -0.025(x_c_interview)
##
## -- Visualize Separately Computed Regression Lines
##
## Plot(c_interview, perf_eval, by=gender, fit="lm")
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

In our output, we now have a third regression term (`c_interview:gender`

) that is our interaction term. You may find that the because the label for the interaction term is long-ish, the columns associated with regression coefficients, standard errors, etc. are shifted over a bit, so make sure that you are referencing the values that you think you are. The regression coefficient associated with the interaction term (`c_interview:gender`

) is positive and statistically signficant (*b* = .284, *p* = .007), but don’t pay too much attention to the sign of the interaction term as it is difficult to interpret it without graphing the entire regression equation. What we do care about is that the interaction term is statistically significant, which indicates that there is evidence of *slope differences*; in other words, there seems to be multiplicative between `c_interview`

and `gender`

. We already know that there is evidence of intercept differences, where men have a higher intercept, but now we have evidence that the criterion-related validities for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) are significantly different from one another. The amount of variance explained by `c_interview`

, `gender`

, *and* their interaction in relation to `perf_eval`

is medium-large in magnitude (*R*^{2} = .198, adjusted *R*^{2} = .191, *p* < .001), which is larger than when just `c_interview`

and `gender`

were included in the model.

Given the statistically significant interaction term, it is appropriate (and helpful) to probe the interaction by plotting it. This will help us understand the form of the interaction. Just like we did for visualizing intercept differences, we will use the `probe_interaction`

function from the `interactions`

package. As input to the `probe_interaction`

function from the `interactions`

package, we will need to use the `lm`

function base R to specify our regression model from above. We can just copy and paste the same arguments from the `reg_brief`

function above into the `lm`

function parentheses. We need to name this model something so that we can reference it in the next function, so let’s name it `Slope_Model`

(for slope differences model) using the `<-`

operator.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (gender),
# and interaction between selection tool and protected class
<- lm(perf_eval ~ c_interview * gender, data=DiffPred) Slope_Model
```

Now we’re ready to use the `probe_interaction`

to plot this multiplicative model. As the first argument, enter the name of the regression model we just created above (`Slope_Model`

). As the second argument, type the name of the variable you are framing as the predictor after `pred=`

, which in our case is the selection tool variable (`c_interview`

). As the third argument, type the name of the variable you are framing as the moderator after `modx=`

, which in our case is the protected class variable (`gender`

). As the fourth argument, enter `johnson_neyman=FALSE`

, as we aren’t requesting the Johnson-Neyman test in this tutorial. Finally, if you choose to, you can use `x.label=`

and `y.label=`

to create more informative names for the predictor and outcome variables, respectively. You could also add, if you should choose to do so, the `legend.main=`

argument to provide a more informative name for the moderator variable, and/or the `modx.labels=`

to provide more informative names for the moderator variable labels.

```
# Probe the significant interaction (slope differences)
probe_interaction(Slope_Model,
pred=c_interview,
modx=gender,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when gender = man:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.02 0.08 -0.29 0.77
##
## Slope of c_interview when gender = woman:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.26 0.06 4.32 0.00
```

The plot output shows that the two slopes are indeed different, such that there doesn’t seem to be much of an association between the selection tool (`c_interview`

, Interview) and the criterion (`perf_eval`

, Job Performance) for men, but there seems to be a positive association for women. Now take a look at the text output in your console. The simple slopes analysis automatically estimates the regression coefficient (slope) for both levels of the moderator variable (`gender`

: man, woman). Corroborating what we see in the plot, we find that the simple slope for women is indeed statistically significant and positive (*b* = .26, *p* < .01), such that for every one unit increase in interview scores, job performance scores tend to go up by .26 units. In contrast, we find that the simple slope for men is nonsignificant (*b* = -.02, *p* = .77).

In sum, we found evidence of intercept *and* slope differences for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) that were conditional upon the levels/subgroups of the protected class variable (`gender`

). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

**Sample Write-Up:** Our HR analytics team recently found evidence in support of the criterion-related validity of a new structured interview selection procedure. As a next step, we investigated whether there might be evidence of differential prediction of the structured interview based on the U.S. protected class of gender. Based on a sample of 377 individuals, we applied a three-step process with moderated multiple linear regression, which is consistent with established principles (Society for Industrial & Organizational Psychology 2018; Cleary 1968). Based on our analyses, we found evidence differential prediction (i.e., predictive bias) based on gender for the new structured interview procedure, and specifically, we found evidence of both intercept and slope differences. In the first step, we found evidence of criterion-related validity based on a simple linear regression model, as structured interview scores were positively and significantly associated with job performance (*b* = .307, *p* < .001), thereby indicating some degree of job-relatedness and -relevance. In the second step, we added the protected class variable associated with gender to the model, resulting in a multiple linear regression model, and found that the model intercept for women was significantly lower than the intercept for men (*b* = -.602, *p* < .001). In the third step, we added the interaction term between the structured interview and gender variables, resulting in a moderated multiple linear regression model, and we found that gender moderated the association between interview scores and job performance scores to a statistically significant extent (*b* = .284, *p* = .007). Based on follow-up simple slopes analyses, we found that the association between structured interview scores and job performance scores was statistically significant for women (*b* = .26, *p* < .01), such that for every one unit increase in structured interview scores, job performance scores tended to increase by .26 units; in contrast, the associated between structured interview scores and job performance scores for men was nonsignificant (*b* = -.02, *p* = .77). In sum, we found evidence of both intercept and slope differences, which means that this interview tool shows predictive bias with respect to gender, making the application of a common regression line (i.e., equation) inappropriate according to U.S. guidelines. If possible, this structured interview should be redesigned or the interviewers should be trained to reduce the predictive bias based on gender. In the interim, we caution against using separate regression lines for men and women, as this may be interpreted as running afoul of the U.S. Civil Rights Act of 1991 (Saad and Sackett 2002). A more conservative approach would be to develop a structured interview process that allows for the use of a common regression line across legally protected groups.

#### 42.2.5.2 Test Differential Prediction Based on `c_age`

Investigating whether differential prediction exists for `c_interview`

in relation to `perf_eval`

based on the continuous moderator variable `c_age`

will unfold very similarly to the process we used for the dichotomous `gender`

variable from above. The only differences will occur when it comes to estimating and visualizing the intercept differences and simple slopes (assuming we find statistically significant effects). Given this, we will breeze through the steps, which means I will provide less explanation.

**Step One:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

.

```
# Regress perf_eval (criterion) on c_interview (selection tool)
reg_brief(perf_eval ~ c_interview, data=DiffPred)
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview, data=DiffPred, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable: c_interview
##
## Number of cases (rows) of data: 377
## Number of cases retained for analysis: 377
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.705 0.049 95.145 0.000 4.608 4.802
## c_interview 0.307 0.041 7.465 0.000 0.226 0.387
##
## Standard deviation of perf_eval: 1.028
##
## Standard deviation of residuals: 0.960 for 375 degrees of freedom
## 95% range of residual variation: 3.776 = 2 * (1.966 * 0.960)
##
## R-squared: 0.129 Adjusted R-squared: 0.127 PRESS R-squared: 0.120
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 55.732 df: 1 and 375 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## Model 1 51.380 51.380 55.732 0.000
## Residuals 375 345.720 0.922
## perf_eval 376 397.100 1.056
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient for `c_interview`

is statistically significant and positive (*b* = .307, *p* < .001), which provides evidence of criterion-related validity. The amount of variance explained by `c_interview`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .129, adjusted *R*^{2} = .127, *p* < .001).

**Step Two:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

and the protected class variable `c_age`

.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
reg_brief(perf_eval ~ c_interview + c_age, data=DiffPred)
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview + c_age, data=DiffPred, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable 1: c_interview
## Predictor Variable 2: c_age
##
## Number of cases (rows) of data: 377
## Number of cases retained for analysis: 377
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.705 0.046 102.806 0.000 4.615 4.795
## c_interview 0.558 0.049 11.307 0.000 0.461 0.655
## c_age 0.059 0.007 7.988 0.000 0.045 0.074
##
## Standard deviation of perf_eval: 1.028
##
## Standard deviation of residuals: 0.889 for 374 degrees of freedom
## 95% range of residual variation: 3.495 = 2 * (1.966 * 0.889)
##
## R-squared: 0.256 Adjusted R-squared: 0.252 PRESS R-squared: 0.244
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 64.442 df: 2 and 374 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## c_interview 1 51.380 51.380 65.067 0.000
## c_age 1 50.392 50.392 63.816 0.000
##
## Model 2 101.772 50.886 64.442 0.000
## Residuals 374 295.328 0.790
## perf_eval 376 397.100 1.056
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient associated with `c_age`

is positive and statistically significant (*b* = .059, *p* < .001) when controlling for `c_interview`

, which indicates that there is evidence of intercept differences. Because the coefficient is positive, it implies that the intercept value is significantly higher for older workers relative to younger workers. The amount of variance explained by `c_interview`

*and* `c_age`

in `perf_eval`

is large in magnitude (*R*^{2} = .256, adjusted *R*^{2} = .252, *p* < .001), which is notably larger than when just `c_interview`

was included in the model.

Using the `probe_interaction`

function from the `interactions`

package, we will visualize the intercept differences. As input to the `probe_interaction`

function, use the `lm`

function base R to specify our regression model from above. Name this model something so that we can reference it in the next function (`Int_Model`

).

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
<- lm(perf_eval ~ c_interview + c_age, data=DiffPred) Int_Model
```

Now specify the arguments in the `probe_interaction`

function. Note that the function automatically categorizes the continuous moderator variable by its mean and 1 standard deviation (SD) below and above the mean. As you can see, individuals whose age is 1 SD above the mean have the highest intercept value.

```
# Visualize intercept differences
probe_interaction(Int_Model,
pred=c_interview,
modx=c_age,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when c_age = -8.029929095154546558888 (- 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.56 0.05 11.31 0.00
##
## Slope of c_interview when c_age = 0.000000000000001243921 (Mean):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.56 0.05 11.31 0.00
##
## Slope of c_interview when c_age = 8.029929095154550111602 (+ 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.56 0.05 11.31 0.00
```

**Step Three:** Now add the interaction term between `c_interview`

and `c_age`

to the model using the `*`

symbol.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (c_age),
# and interaction between selection tool and protected class
reg_brief(perf_eval ~ c_interview * c_age, data=DiffPred)
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview * c_age, data=DiffPred, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable 1: c_interview
## Predictor Variable 2: c_age
##
## Number of cases (rows) of data: 377
## Number of cases retained for analysis: 377
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.795 0.051 93.404 0.000 4.694 4.896
## c_interview 0.572 0.049 11.753 0.000 0.476 0.668
## c_age 0.069 0.008 8.893 0.000 0.054 0.085
## c_interview:c_age 0.015 0.004 3.658 0.000 0.007 0.023
##
## Standard deviation of perf_eval: 1.028
##
## Standard deviation of residuals: 0.874 for 373 degrees of freedom
## 95% range of residual variation: 3.438 = 2 * (1.966 * 0.874)
##
## R-squared: 0.282 Adjusted R-squared: 0.276 PRESS R-squared: 0.267
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 48.845 df: 3 and 373 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## c_interview 1 51.380 51.380 67.222 0.000
## c_age 1 50.392 50.392 65.929 0.000
## c_interview:c_age 1 10.230 10.230 13.384 0.000
##
## Model 3 112.002 37.334 48.845 0.000
## Residuals 373 285.098 0.764
## perf_eval 376 397.100 1.056
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient associated with the interaction term (`c_interview:c_age`

) is positive and statistically signficant (*b* = .015, *p* < .001), which indicates that there is evidence of *slope differences*. The amount of variance explained by `c_interview`

, `c_age`

, *and* their interaction in relation to `perf_eval`

is large in magnitude (*R*^{2} = .282, adjusted *R*^{2} = .276, *p* < .001), which is larger than when just `c_interview`

and `c_age`

were included in the model. Given the statistically significant interaction term, we will use the `probe_interaction`

function from the `interactions`

package to probe the interaction. As input to the `probe_interaction`

function from the `interactions`

package, we will need to use the `lm`

function base R to specify our regression model from above. We need to name this model something so that we can reference it in the next function, so let’s name it `Slope_Model`

(for slope differences model) using the `<-`

operator.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (c_age),
# and interaction between selection tool and protected class
<- lm(perf_eval ~ c_interview * c_age, data=DiffPred) Slope_Model
```

Now we’re ready to use the `probe_interaction`

to plot this multiplicative model.

```
# Probe the significant interaction (slope differences)
probe_interaction(Slope_Model,
pred=c_interview,
modx=c_age,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when c_age = -8.029929095154546558888 (- 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.45 0.06 8.09 0.00
##
## Slope of c_interview when c_age = 0.000000000000001243921 (Mean):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.57 0.05 11.75 0.00
##
## Slope of c_interview when c_age = 8.029929095154550111602 (+ 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.69 0.06 11.40 0.00
```

The plot and the simple slopes analyses output show that the two slopes are indeed different, such that the association between the selection tool (`c_interview`

, Interview) and the criterion (`perf_eval`

, Job Performance) is stronger for older workers (*b* = .69, *p* < .01), than for average-aged workers (*b* = .57, *p* < .01) and younger workers (*b* = .45, *p* < .01). Notably, all three simple slopes are statistically significant and positive, but the slope is stronger for older workers.

In sum, we found evidence of intercept *and* slope differences for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) that were conditional upon the protected class variable (`c_age`

). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

#### 42.2.5.3 Test Differential Prediction Based on `race`

Investigating whether differential prediction exists for `c_interview`

in relation to `perf_eval`

based on `race`

variable will unfold in mostly the same manner as the dichotomous `gender`

variable from above. However, we will filter our data within the regression function to compare just Asian individuals with White individuals, as it is customer to compare just two races/ethnicities at a time. The decision to focus on only Asian and white individuals is arbitrary and just for demonstration purposes. Note that in the `DiffPred`

data frame object that both `asian`

and `white`

are lowercase category labels in the `race`

variable.

**Step one:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

. To subset the data such that only individuals who are Asian or white are retained, we will add the following argument: `rows=(race=="asian" | race=="white")`

. This argument uses conditional statements to indicate that we want to retain those cases for which `race`

is equal to Asian *or* for which `race`

is equal to White. We have effectively dichotomized the `race`

variable using this approach. If you need to review conditional/logical statements when filtering data, check out the chapter on filtering data.

```
# Regress perf_eval (criterion) on c_interview (selection tool)
reg_brief(perf_eval ~ c_interview, data=DiffPred,
rows=(race=="asian" | race=="white"))
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview, data=DiffPred, rows=(race == "asian" | race == "white"), Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable: c_interview
##
## Number of cases (rows) of data: 346
## Number of cases retained for analysis: 346
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.751 0.051 93.696 0.000 4.651 4.851
## c_interview 0.348 0.043 8.128 0.000 0.264 0.433
##
## Standard deviation of perf_eval: 1.024
##
## Standard deviation of residuals: 0.940 for 344 degrees of freedom
## 95% range of residual variation: 3.696 = 2 * (1.967 * 0.940)
##
## R-squared: 0.161 Adjusted R-squared: 0.159 PRESS R-squared: 0.152
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 66.064 df: 1 and 344 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## Model 1 58.317 58.317 66.064 0.000
## Residuals 344 303.658 0.883
## perf_eval 345 361.975 1.049
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient for `c_interview`

is statistically significant and positive (*b* = .348, *p* < .001) based on this reduced sample of 346 individuals, which provides evidence of criterion-related validity. The amount of variance explained by `c_interview`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .161, adjusted *R*^{2} = .159, *p* < .001).

**Step Two:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

and the protected class variable `race`

.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (race)
reg_brief(perf_eval ~ c_interview + race, data=DiffPred,
rows=(race=="asian" | race=="white"))
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview + race, data=DiffPred, rows=(race == "asian" | race == "white"), Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable 1: c_interview
## Predictor Variable 2: race
##
## Number of cases (rows) of data: 346
## Number of cases retained for analysis: 346
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.918 0.070 70.754 0.000 4.781 5.054
## c_interview 0.407 0.045 8.945 0.000 0.317 0.496
## racewhite -0.373 0.108 -3.445 0.001 -0.586 -0.160
##
## Standard deviation of perf_eval: 1.024
##
## Standard deviation of residuals: 0.925 for 343 degrees of freedom
## 95% range of residual variation: 3.639 = 2 * (1.967 * 0.925)
##
## R-squared: 0.189 Adjusted R-squared: 0.184 PRESS R-squared: NA
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 40.010 df: 2 and 343 p-value: 0.000
##
## -- Analysis of Variance from Type II Sums of Squares
##
## df Sum Sq Mean Sq F-value p-value
## c_interview 1 68.459 68.459 80.004 0.000
## race 1 10.156 10.156 11.869 0.001
## Residuals 343 293.502 0.856
##
##
## MODELS OF perf_eval FOR LEVELS OF race
##
## -- Test of Interaction
##
## c_interview:race df: 1 df resid: 342 SS: 1.081 F: 1.264 p-value: 0.262
##
## -- Assume parallel lines, no interaction of race with c_interview
##
## Level asian: y^_perf_eval = 4.918 + 0.407(x_c_interview)
## Level white: y^_perf_eval = 4.545 + 0.407(x_c_interview)
##
## -- Visualize Separately Computed Regression Lines
##
## Plot(c_interview, perf_eval, by=race, fit="lm")
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient associated with `race`

(or more specifically `racewhite`

) is negative and statistically signficant (*b* = -.373, *p* = .001) when controlling for `c_interview`

, which indicates that there is evidence of intercept differences. Because the coefficient is negative and because `white`

is appended to the `race`

variable name, it implies that the intercept value is significantly lower for White individuals compared to Asian individuals. The amount of variance explained by `c_interview`

*and* `race`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .189, adjusted *R*^{2} = .184, *p* < .001).

Using the `probe_interaction`

function from the `interactions`

package, we will visualize the intercept differences. As input to the `probe_interaction`

function, use the `lm`

function base R to specify our regression model from above. The `subset=`

argument functions the same way as the `rows=`

argument from the `reg_brief`

and `Regression`

functions from `lessR`

, so we’ll use that instead for the `lm`

function. Name this model something so that we can reference it in the next function (`Int_Model`

).

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (race)
<- lm(perf_eval ~ c_interview + race, data=DiffPred,
Int_Model subset=(race=="asian" | race=="white"))
```

Now specify the arguments in the `probe_interaction`

function.

```
# Visualize intercept differences
probe_interaction(Int_Model,
pred=c_interview,
modx=race,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when race = white:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.41 0.05 8.94 0.00
##
## Slope of c_interview when race = asian:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.41 0.05 8.94 0.00
```

**Step Three:** Now add the interaction term between `c_interview`

and `race`

to the model using the `*`

symbol.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (race),
# and interaction between selection tool and protected class
reg_brief(perf_eval ~ c_interview * race, data=DiffPred,
rows=(race=="asian" | race=="white"))
```

```
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(perf_eval ~ c_interview * race, data=DiffPred, rows=(race == "asian" | race == "white"), Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: DiffPred
##
## Response Variable: perf_eval
## Predictor Variable 1: c_interview
## Predictor Variable 2: race
##
## Number of cases (rows) of data: 346
## Number of cases retained for analysis: 346
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 4.896 0.072 67.863 0.000 4.754 5.038
## c_interview 0.362 0.061 5.974 0.000 0.243 0.481
## racewhite -0.374 0.108 -3.459 0.001 -0.587 -0.161
## c_interview:racewhite 0.103 0.092 1.124 0.262 -0.077 0.283
##
## Standard deviation of perf_eval: 1.024
##
## Standard deviation of residuals: 0.925 for 342 degrees of freedom
## 95% range of residual variation: 3.638 = 2 * (1.967 * 0.925)
##
## R-squared: 0.192 Adjusted R-squared: 0.185 PRESS R-squared: NA
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 27.115 df: 3 and 342 p-value: 0.000
##
## -- Analysis of Variance from Type II Sums of Squares
##
## df Sum Sq Mean Sq F-value p-value
## c_interview 1 68.459 68.459 80.004 0.000
## race 1 10.156 10.156 11.878 0.001
## c_interview:race 1 1.081 1.081 1.264 0.262
## Residuals 342 292.421 0.855
##
##
## MODELS OF perf_eval FOR LEVELS OF race
##
## -- Test of Interaction
##
## c_interview:race df: 1 df resid: 342 SS: 1.081 F: 1.264 p-value: 0.262
##
## -- Assume parallel lines, no interaction of race with c_interview
##
## Level asian: y^_perf_eval = 4.896 + 0.362(x_c_interview)
## Level white: y^_perf_eval = 4.522 + 0.362(x_c_interview)
##
## -- Visualize Separately Computed Regression Lines
##
## Plot(c_interview, perf_eval, by=race, fit="lm")
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
##
## RESIDUALS AND INFLUENCE
##
##
## PREDICTION ERROR
```

The regression coefficient associated with the interaction term (`c_interview:racewhite`

) is *not* statistically significant (*b* = .103, *p* = .262), which indicates that there is no evidence of *slope differences*. Given this, we won’t probe the interaction because, well, there is not an interaction to probe.

In sum, we found evidence of intercept differences for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) that were conditional upon the protected class variable (`race`

). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

### 42.2.6 Summary

In this chapter, we learned how to use moderated multiple linear regression (MMLR) to test whether evidence of differential prediction exists. To do so, we learned how to use the `Regression`

function (well, technically we used the `reg_brief`

function) from `lessR`

. In addition, we used the `probe_interaction`

function from the `interactions`

package to probe the intercept differences and slope differences visually and using simple slopes analyses.

## 42.3 Chapter Supplement

In addition to the `Regression`

function from the `lessR`

package covered above, we can use the `lm`

function from base R to estimate a moderated multiple linear regression (MMLR) model. Because this function comes from base R, we do not need to install and access an additional package. In this supplement, you will also have an opportunity to learn how to make an APA (American Psychological Association) style table of regression results.

### 42.3.1 Functions & Packages Introduced

Function | Package |
---|---|

`mean` |
base R |

`lm` |
base R |

`summary` |
base R |

`anova` |
base R |

`probe_interaction` |
`interactions` |

`apa.reg.table` |
`apaTables` |

### 42.3.2 Initial Steps

If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.

```
# Set your working directory
setwd("H:/RWorkshop")
```

```
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
```

```
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
<- read_csv("DiffPred.csv") DiffPred
```

```
## Rows: 377 Columns: 6
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): emp_id, gender, race
## dbl (3): perf_eval, interview, age
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
# Print the names of the variables in the data frame (tibble) objects
names(DiffPred)
```

`## [1] "emp_id" "perf_eval" "interview" "age" "gender" "race"`

```
# View variable type for each variable in data frame
str(DiffPred)
```

```
## spec_tbl_df [377 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ emp_id : chr [1:377] "MA322" "MA323" "MA324" "MA325" ...
## $ perf_eval: num [1:377] 4.2 4.9 4.2 5.3 4.2 6.9 3.4 5.8 4.4 5.6 ...
## $ interview: num [1:377] 7.5 9.3 7.5 8 9.3 6.8 6.7 7 8.2 6.4 ...
## $ age : num [1:377] 29.7 31.7 29.4 37.9 30.9 46.2 43.9 47.8 31.7 44.6 ...
## $ gender : chr [1:377] "woman" "man" "woman" "woman" ...
## $ race : chr [1:377] "asian" "asian" "asian" "asian" ...
## - attr(*, "spec")=
## .. cols(
## .. emp_id = col_character(),
## .. perf_eval = col_double(),
## .. interview = col_double(),
## .. age = col_double(),
## .. gender = col_character(),
## .. race = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
```

```
# View first 6 rows of data frame
head(DiffPred)
```

```
## # A tibble: 6 × 6
## emp_id perf_eval interview age gender race
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 MA322 4.2 7.5 29.7 woman asian
## 2 MA323 4.9 9.3 31.7 man asian
## 3 MA324 4.2 7.5 29.4 woman asian
## 4 MA325 5.3 8 37.9 woman asian
## 5 MA326 4.2 9.3 30.9 man black
## 6 MA327 6.9 6.8 46.2 woman asian
```

### 42.3.3 `lm`

Function from Base R

We will use the `lm`

function from base R to specify and estimate our regression functions. As I did above, I will show you how to apply moderated multiple linear regression (MMLR) in the service of detecting differential prediction of a selection tool based on protected class membership.

**Grand-Mean Center Variables:** Before we estimate the regression models, however, we need to grand-mean center certain variables. We only center predictor and moderator variables that are of type *numeric* and that we conceptualize as having a continuous (interval or ratio) measurement scale. In our current data frame, we will grand-mean center just the `interview`

and `age`

variables. We will use what is perhaps the most intuitive approach for grand-mean centering variables. We must begin by coming up with a new name for one of our soon-to-be grand-mean centered variable, and in this example, we will center the `interview`

variable. I typically like to name the centered variable by simply adding the `c_`

prefix to the existing variable’s names (e.g., `c_interview`

). Type the name of the data frame object to which the new centered variable will be attached (`DiffPred`

), followed by the `$`

operator and the name of the new variable we are creating (`c_interview`

). Next, add the `<-`

operator to indicate what values you will assign to this new variable. To create a vector of values to assign to this new `c_interview`

variable, we will subtract the mean (average) score for the original variable (`interview`

) from each case’s value on the variable. Specifically, enter the name of the data frame object, followed by the `$`

operator and the name of the original variable (`interview`

). After that, enter the subtraction symbol (`-`

). And finally, type the name of the `mean`

function from base R. As the first argument in the `mean`

function, enter the name of the data frame object (`DiffPred`

), followed by the `$`

operator and the name of the original variable (`interview`

). As the second argument, enter `na.rm=TRUE`

to indicate that you wish to drop missing values when calculating the grand mean for the sample. Now repeat those same steps for the `age`

variable.

```
# Grand-mean centering: Using basic arithmetic and the mean function from base R
$c_interview <- DiffPred$interview - mean(DiffPred$interview, na.rm=TRUE)
DiffPred$c_age <- DiffPred$age - mean(DiffPred$age, na.rm=TRUE) DiffPred
```

#### 42.3.3.1 Test Differential Prediction Based on `gender`

Let’s start with investigating whether differential prediction exists for our `interview`

selection tool in relation to `perf_eval`

based `gender`

subgroup membership (i.e., man, woman).

**Step One:** Let’s being by regressing the criterion variable `perf_eval`

on the selection tool variable `c_interview`

. Here we are just establishing whether the selection tool shows evidence of criterion-related validity. See the chapters on criterion-related validity using correlation and predicting criterion scores using simple linear regression if you would like more information on regression-based approaches to assessing criterion-related validity, including determining whether statistical assumptions have been met. Let’s name our regression model for this first step `model.1`

. Remember, we need to use the `summary`

function from base R to view our model output.

```
# Regress perf_eval (criterion) on c_interview (selection tool)
.1 <- lm(perf_eval ~ c_interview, data=DiffPred)
modelsummary(model.1)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview, data = DiffPred)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.86274 -0.58012 0.01324 0.58922 2.74118
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.70504 0.04945 95.145 < 0.0000000000000002 ***
## c_interview 0.30665 0.04108 7.465 0.000000000000587 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9602 on 375 degrees of freedom
## Multiple R-squared: 0.1294, Adjusted R-squared: 0.1271
## F-statistic: 55.73 on 1 and 375 DF, p-value: 0.0000000000005866
```

The regression coefficient for `c_interview`

is statistically significant and positive (*b* = .307, *p* < .001), which provides evidence of criterion-related validity. The amount of variance explained by `c_interview`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .129, adjusted *R*^{2} = .127, *p* < .001). See table below for common rules of thumbs for qualitatively describing *R*^{2} values.

R^{2} |
Description |
---|---|

.01 | Small |

.09 | Medium |

.25 | Large |

**Step Two:** Next, let’s add the protected class variable `gender`

to our model to investigate whether *intercept differences* exist. We’ll name this model `model.2`

. For more information on how to interpret and test the statistical assumptions of a multiple linear regression model, check out the chapter on estimating incremental validity. In addition, let’s use the `anova`

function from base R to do a nested model comparison between `model.1`

and `model.2`

. This will tell us whether adding `gender`

to the model significantly improved model fit. Finally, let’s reference the summary/output for each model and specifically the `r.squared`

values to determine what the change in *R*^{2} was from one model to the next. The change in *R*^{2} gives as an idea of how much (in terms of practical significance) the model fit improved when adding the additional variable. Note that we use the `$`

symbol to indicate that we want the `r.squared`

value from the `summary(model.2)`

and `summary(model.2)`

output, and then we do a simple substraction to get the change in *R*^{2}.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (gender)
.2 <- lm(perf_eval ~ c_interview + gender, data=DiffPred)
modelsummary(model.2)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview + gender, data = DiffPred)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.93543 -0.63543 -0.00836 0.59461 2.75731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.08327 0.09104 55.833 < 0.0000000000000002 ***
## c_interview 0.16419 0.04939 3.325 0.000973 ***
## genderwoman -0.60166 0.12306 -4.889 0.0000015 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9321 on 374 degrees of freedom
## Multiple R-squared: 0.1817, Adjusted R-squared: 0.1773
## F-statistic: 41.52 on 2 and 374 DF, p-value: < 0.00000000000000022
```

```
# Nested model comparison
anova(model.1, model.2)
```

```
## Analysis of Variance Table
##
## Model 1: perf_eval ~ c_interview
## Model 2: perf_eval ~ c_interview + gender
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 375 345.72
## 2 374 324.95 1 20.77 23.905 0.000001505 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Change in R-squared
summary(model.2)$r.squared - summary(model.1)$r.squared
```

`## [1] 0.052305`

The regression coefficient associated with `gender`

(or more specifically `genderwoman`

) is negative and statistically signficant (*b* = -.602, *p* < .001) when controlling for `c_interview`

, which indicates that there is evidence of intercept differences. Note that the name of the regression coefficient for `gender`

has `woman`

appended to it. This means that the intercept represents the `perf_eval`

scores of women minus those for men; because the coefficient is negative, it implies that the intercept value is significantly lower for women relative to men. The amount of variance explained by `c_interview`

*and* `gender`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .182, adjusted *R*^{2} = .177, *p* < .001). The significant *F*-value (23.905, *p* < .001) in from the nested model comparison indicates that adding `gender`

significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in *R*^{2} (*R*^{2} = .052), which indicates that adding `gender`

to the model explained an additional 5.2% of the variance in `perf_eval`

.

We can visualize the nature of the intercept differences by creating a plot with the simple slopes. To do so, we will use a package called `interactions`

, so if you haven’t already, install and access the package.

```
# Install interactions package if you haven't already
install.packages("interactions")
```

```
# Access interactions package
library(interactions)
```

Now we’re ready to use the `probe_interaction`

to plot this additive model. As the first argument, enter the name of the regression model we just created above (`model.2`

). As the second argument, type the name of the variable you are framing as the predictor after `pred=`

, which in our case is the selection tool variable (`c_interview`

). As the third argument, type the name of the variable you are framing as the moderator after `modx=`

, which in our case is the protected class variable (`gender`

). As the fourth argument, enter `johnson_neyman=FALSE`

, as we aren’t requesting the Johnson-Neyman test in this tutorial. Finally, if you choose to, you can use `x.label=`

and `y.label=`

to create more informative names for the predictor and outcome variables, respectively. You could also add, if you should choose to do so, the `legend.main=`

argument to provide a more informative name for the moderator variable, and/or the `modx.labels=`

to provide more informative names for the moderator variable labels. **Note: If your selection tool variable is dichotomous and is of type factor or character, you will need to convert it to a variable of type numeric (e.g., levels: 0 & 1) – a requirement of the probe_interaction function.**

```
# Visualize intercept differences
probe_interaction(model.2,
pred=c_interview,
modx=gender,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when gender = man:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.16 0.05 3.32 0.00
##
## Slope of c_interview when gender = woman:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.16 0.05 3.32 0.00
```

Because we were only testing an additive model, the figure shows the slopes for each subgroup (men, women) as equal with just the intercepts differing. In the following step, we will formally test whether a multiplicative interaction effect exists, which would suggest slope differences.

**Step Three:** As the final step, it’s time to add the interaction term between `c_interview`

and `gender`

to the model. Fortunately, this is very easy to do in R. All we need to do is change the `+`

symbol from our previous regression model script to the `*`

, where the `*`

implies an interaction in the context of a model. Conceptually, the `*`

is appropriate because we are estimating a multiplicative model when we have an interaction term, which is commonly known as a product term. Let’s name this model `model.3`

. As before, we’ll do a nested model comparison with the previous model and this one to determine whether there is a significant improvement in fit. In addition, we’ll estimate change in *R*^{2}.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (gender),
# and interaction between selection tool and protected class
.3 <- lm(perf_eval ~ c_interview * gender, data=DiffPred)
modelsummary(model.3)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview * gender, data = DiffPred)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.70489 -0.55275 0.02541 0.58815 2.84361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.25779 0.11055 47.559 < 0.0000000000000002 ***
## c_interview -0.02466 0.08466 -0.291 0.77099
## genderwoman -0.72437 0.13000 -5.572 0.0000000483 ***
## c_interview:genderwoman 0.28376 0.10378 2.734 0.00655 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9242 on 373 degrees of freedom
## Multiple R-squared: 0.1978, Adjusted R-squared: 0.1913
## F-statistic: 30.65 on 3 and 373 DF, p-value: < 0.00000000000000022
```

```
# Nested model comparison
anova(model.2, model.3)
```

```
## Analysis of Variance Table
##
## Model 1: perf_eval ~ c_interview + gender
## Model 2: perf_eval ~ c_interview * gender
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 374 324.95
## 2 373 318.56 1 6.3852 7.4762 0.00655 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Change in R-squared
summary(model.3)$r.squared - summary(model.2)$r.squared
```

`## [1] 0.01607944`

In our output, we now have a third regression term (`c_interview:genderwoman`

) that is our interaction term. You may find that the because the label for the interaction term is long-ish, the columns associated with regression coefficients, standard errors, etc. are shifted over a bit, so make sure that you are referencing the values that you think you are. The regression coefficient associated with the interaction term (`c_interview:genderwoman`

) is positive and statistically significant (*b* = .284, *p* = .007), but don’t pay too much attention to the sign of the interaction term as it is difficult to interpret it without graphing the entire regression equation. What we do care about is that the interaction term is statistically significant, which indicates that there is evidence of *slope differences*; in other words, there seems to be multiplicative between `c_interview`

and `gender`

. We already know that there is evidence of intercept differences, where men have a higher intercept, but now we have evidence that the criterion-related validities for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) are significantly different from one another. The amount of variance explained by `c_interview`

, `gender`

, *and* their interaction in relation to `perf_eval`

is medium-large in magnitude (*R*^{2} = .198, adjusted *R*^{2} = .191, *p* < .001). The significant *F*-value (7.476, *p* = .006) in from the nested model comparison indicates that adding the interaction term significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in *R*^{2} (*R*^{2} = .016), which indicates that adding the interaction term to the model explained an additional 1.6% of the variance in `perf_eval`

.

Given the statistically significant interaction term, it is appropriate (and helpful) to probe the interaction by plotting it. This will help us understand the form of the interaction. Just like we did for visualizing intercept differences, we will use the `probe_interaction`

function from the `interactions`

package to plot this model - except this one is multiplicative instead of additive. As the first argument, enter the name of the regression model we just created above (`model.3`

). As the second argument, type the name of the variable you are framing as the predictor after `pred=`

, which in our case is the selection tool variable (`c_interview`

). As the third argument, type the name of the variable you are framing as the moderator after `modx=`

, which in our case is the protected class variable (`gender`

). As the fourth argument, enter `johnson_neyman=FALSE`

, as we aren’t requesting the Johnson-Neyman test in this tutorial. Finally, if you choose to, you can use `x.label=`

and `y.label=`

to create more informative names for the predictor and outcome variables, respectively. You could also add, if you should choose to do so, the `legend.main=`

argument to provide a more informative name for the moderator variable, and/or the `modx.labels=`

to provide more informative names for the moderator variable labels.

```
# Probe the significant interaction (slope differences)
probe_interaction(model.3,
pred=c_interview,
modx=gender,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when gender = man:
##
## Est. S.E. t val. p
## ------- ------ -------- ------
## -0.02 0.08 -0.29 0.77
##
## Slope of c_interview when gender = woman:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.26 0.06 4.32 0.00
```

The plot output shows that the two slopes are indeed different, such that there doesn’t seem to be much of an association between the selection tool (`c_interview`

, Interview) and the criterion (`perf_eval`

, Job Performance) for men, but there seems to be a postive association for women. Now take a look at the text output in your console. The simple slopes analysis automatically estimates the regression coefficient (slope) for both levels of the moderator variable (`gender`

: man, woman). Corroborating what we see in the plot, we find that the simple slope for women is indeed statistically significant and positive (*b* = .26, *p* < .01), such that for every one unit increase in interview scores, job performance scores tend to go up by .26 units. In contrast, we find that the simple slope for men is nonsignificant (*b* = -.02, *p* = .77).

In sum, we found evidence of intercept *and* slope differences for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) that were conditional upon the levels/subgroups of the protected class variable (`gender`

). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

**Sample Write-Up:** Our HR analytics team recently found evidence in support of the criterion-related validity of a new structured interview selection procedure. As a next step, we investigated whether there might be evidence of differential prediction of the structured interview based on the U.S. protected class of gender. Based on a sample of 377 individuals, we applied a three-step process with moderated multiple linear regression, which is consistent with established principles (Society for Industrial & Organizational Psychology 2018; Cleary 1968). Based on our analyses, we found evidence differential prediction (i.e., predictive bias) based on gender for the new structured interview procedure, and specifically, we found evidence of both intercept and slope differences. In the first step, we found evidence of criterion-related validity based on a simple linear regression model, as structured interview scores were positively and significantly associated with job performance (*b* = .307, *p* < .001), thereby indicating some degree of job-relatedness and -relevance. In the second step, we added the protected class variable associated with gender to the model, resulting in a multiple linear regression model, and found that the model intercept for women was significantly lower than the intercept for men (*b* = -.602, *p* < .001). In the third step, we added the interaction term between the structured interview and gender variables, resulting in a moderated multiple linear regression model, and we found that gender moderated the association between interview scores and job performance scores to a statistically significant extent (*b* = .284, *p* = .007). Based on follow-up simple slopes analyses, we found that the association between structured interview scores and job performance scores was statistically significant for women (*b* = .26, *p* < .01), such that for every one unit increase in structured interview scores, job performance scores tended to increase by .26 units; in contrast, the associated between structured interview scores and job performance scores for men was nonsignificant (*b* = -.02, *p* = .77). In sum, we found evidence of both intercept and slope differences, which means that this interview tool shows predictive bias with respect to gender, making the application of a common regression line (i.e., equation) inappropriate according to U.S. guidelines. If possible, this structured interview should be redesigned or the interviewers should be trained to reduce the predictive bias based on gender. In the interim, we caution against using separate regression lines for men and women, as this may be interpreted as running afoul of the U.S. Civil Rights Act of 1991 (Saad and Sackett 2002). A more conservative approach would be to develop a structured interview process that allows for the use of a common regression line across legally protected groups.

#### 42.3.3.2 Test Differential Prediction Based on `c_age`

Investigating whether differential prediction exists for `c_interview`

in relation to `perf_eval`

based on the continuous moderator variable `c_age`

will unfold very similarly to the process we used for the dichotomous `gender`

variable from above. The only differences will occur when it comes to estimating and visualizing the intercept differences and simple slopes (assuming we find statistically significant effects). Given this, we will breeze through the steps, which means I will provide less explanation.

**Step One:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

.

```
# Regress perf_eval (criterion) on c_interview (selection tool)
.1 <- lm(perf_eval ~ c_interview, data=DiffPred)
modelsummary(model.1)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview, data = DiffPred)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.86274 -0.58012 0.01324 0.58922 2.74118
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.70504 0.04945 95.145 < 0.0000000000000002 ***
## c_interview 0.30665 0.04108 7.465 0.000000000000587 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9602 on 375 degrees of freedom
## Multiple R-squared: 0.1294, Adjusted R-squared: 0.1271
## F-statistic: 55.73 on 1 and 375 DF, p-value: 0.0000000000005866
```

The regression coefficient for `c_interview`

is statistically significant and positive (*b* = .307, *p* < .001), which provides evidence of criterion-related validity. The amount of variance explained by `c_interview`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .129, adjusted *R*^{2} = .127, *p* < .001).

**Step Two:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

and the protected class variable `c_age`

. In addition, do a nested model comparison, and estimate the change in *R*^{2}.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
.2 <- lm(perf_eval ~ c_interview + c_age, data=DiffPred)
modelsummary(model.2)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview + c_age, data = DiffPred)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9396 -0.5133 -0.0155 0.5659 2.5057
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.705040 0.045766 102.806 < 0.0000000000000002 ***
## c_interview 0.557615 0.049317 11.307 < 0.0000000000000002 ***
## c_age 0.059144 0.007404 7.988 0.0000000000000171 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8886 on 374 degrees of freedom
## Multiple R-squared: 0.2563, Adjusted R-squared: 0.2523
## F-statistic: 64.44 on 2 and 374 DF, p-value: < 0.00000000000000022
```

```
# Nested model comparison
anova(model.1, model.2)
```

```
## Analysis of Variance Table
##
## Model 1: perf_eval ~ c_interview
## Model 2: perf_eval ~ c_interview + c_age
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 375 345.72
## 2 374 295.33 1 50.392 63.816 0.00000000000001707 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Change in R-squared
summary(model.2)$r.squared - summary(model.1)$r.squared
```

`## [1] 0.1269002`

The regression coefficient associated with `c_age`

is positive and statistically signficant (*b* = .059, *p* < .001) when controlling for `c_interview`

, which indicates that there is evidence of intercept differences. Because the coefficient is positive, it implies that the intercept value is significantly higher for older workers relative to younger workers. The amount of variance explained by `c_interview`

*and* `c_age`

in `perf_eval`

is large in magnitude (*R*^{2} = .256, adjusted *R*^{2} = .252, *p* < .001), which is notably larger than when just `c_interview`

was included in the model. The significant *F*-value (63.816, *p* < .001) in from the nested model comparison indicates that adding `c_age`

significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in *R*^{2} (*R*^{2} = .127), which indicates that adding `c_age`

to the model explained an additional 12.7% of the variance in `perf_eval`

.

Using the `probe_interaction`

function from the `interactions`

package, we will visualize the intercept differences of the `model.2`

model we specified above. Note that the function automatically categorizes the continuous moderator variable by its mean and 1 standard deviation (SD) below and above the mean. As you can see, individuals whose age is 1 SD above the mean have the highest intercept value.

```
# Visualize intercept differences
probe_interaction(model.2,
pred=c_interview,
modx=c_age,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when c_age = -8.029929095154546558888 (- 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.56 0.05 11.31 0.00
##
## Slope of c_interview when c_age = 0.000000000000001243921 (Mean):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.56 0.05 11.31 0.00
##
## Slope of c_interview when c_age = 8.029929095154550111602 (+ 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.56 0.05 11.31 0.00
```

**Step Three:** Now add the interaction term between `c_interview`

and `c_age`

to the model using the `*`

symbol. In addition, do a nested model comparison, and estimate the change in *R*^{2}.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
.3 <- lm(perf_eval ~ c_interview * c_age, data=DiffPred)
modelsummary(model.3)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview * c_age, data = DiffPred)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.98160 -0.48292 -0.02569 0.54557 2.44720
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.795272 0.051339 93.404 < 0.0000000000000002 ***
## c_interview 0.572163 0.048683 11.753 < 0.0000000000000002 ***
## c_age 0.069471 0.007812 8.893 < 0.0000000000000002 ***
## c_interview:c_age 0.014671 0.004010 3.658 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8743 on 373 degrees of freedom
## Multiple R-squared: 0.2821, Adjusted R-squared: 0.2763
## F-statistic: 48.85 on 3 and 373 DF, p-value: < 0.00000000000000022
```

```
# Nested model comparison
anova(model.2, model.3)
```

```
## Analysis of Variance Table
##
## Model 1: perf_eval ~ c_interview + c_age
## Model 2: perf_eval ~ c_interview * c_age
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 374 295.33
## 2 373 285.10 1 10.23 13.384 0.0002902 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Change in R-squared
summary(model.3)$r.squared - summary(model.2)$r.squared
```

`## [1] 0.02576237`

The regression coefficient associated with the interaction term (`c_interview:c_age`

) is positive and statistically signficant (*b* = .015, *p* < .001), which indicates that there is evidence of *slope differences*. The amount of variance explained by `c_interview`

, `c_age`

, *and* their interaction in relation to `perf_eval`

is large in magnitude (*R*^{2} = .282, adjusted *R*^{2} = .276, *p* < .001). The significant *F*-value (13.384, *p* < .001) in from the nested model comparison indicates that adding the interactoin term significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in *R*^{2} (*R*^{2} = .026), which indicates that adding the interaction term to the model explained an additional 2.6% of the variance in `perf_eval`

.

Given the statistically significant interaction term, we will use the `probe_interaction`

function from the `interactions`

package to probe the interaction.
Now we’re ready to use the `probe_interaction`

to plot this multiplicative model that we named `model.3`

.

```
# Probe the significant interaction (slope differences)
probe_interaction(model.3,
pred=c_interview,
modx=c_age,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when c_age = -8.029929095154546558888 (- 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.45 0.06 8.09 0.00
##
## Slope of c_interview when c_age = 0.000000000000001243921 (Mean):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.57 0.05 11.75 0.00
##
## Slope of c_interview when c_age = 8.029929095154550111602 (+ 1 SD):
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.69 0.06 11.40 0.00
```

The plot and the simple slopes analyses output show that the two slopes are indeed different, such that the association between the selection tool (`c_interview`

, Interview) and the criterion (`perf_eval`

, Job Performance) is stronger for older workers (*b* = .69, *p* < .01), than for average-aged workers (*b* = .57, *p* < .01) and younger workers (*b* = .45, *p* < .01). Notably, all three simple slopes are statistically significant and postive, but the slope is stronger for older workers.

In sum, we found evidence of intercept *and* slope differences for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) that were conditional upon the protected class variable (`c_age`

). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

#### 42.3.3.3 Test Differential Prediction Based on `race`

Investigating whether differential prediction exists for `c_interview`

in relation to `perf_eval`

based on moderator variable `race`

will unfold in the much the same way as the dichotomous `gender`

variable from above; however, we will filter the `race`

variable within the regression functions to focus on just comparing the Asian and White individuals. The decision to focus on these two subgroups, specifically, is arbitrary and just for the sake of demonstration. Note that in the `DiffPred`

data frame object that both `asian`

and `white`

are lowercase category labels in the `race`

variable.

**Step One:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

. Add the `subset=(race=="asian" | race=="white")`

argument to subset the data such that only those individuals who are Asian and White are retained. In doing so, we effectively dichotomize the `race`

variable within the function. If you need to review conditional/logical statements when filtering data, check out the chapter on filtering data.

```
# Regress perf_eval (criterion) on c_interview (selection tool)
.1 <- lm(perf_eval ~ c_interview, data=DiffPred,
modelsubset=(race=="asian" | race=="white"))
summary(model.1)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview, data = DiffPred, subset = (race ==
## "asian" | race == "white"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.78941 -0.60097 -0.04034 0.58952 2.75581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.75118 0.05071 93.696 < 0.0000000000000002 ***
## c_interview 0.34840 0.04286 8.128 0.00000000000000796 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9395 on 344 degrees of freedom
## Multiple R-squared: 0.1611, Adjusted R-squared: 0.1587
## F-statistic: 66.06 on 1 and 344 DF, p-value: 0.000000000000007962
```

The regression coefficient for `c_interview`

is statistically significant and positive (*b* = .348, *p* < .001), which provides evidence of criterion-related validity. The amount of variance explained by `c_interview`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .161, adjusted *R*^{2} = .159, *p* < .001).

**Step Two:** Regress the criterion variable `perf_eval`

on the selection tool variable `c_interview`

and the protected class variable `race`

. In addition, do a nested model comparison, and estimate the change in *R*^{2}.

```
# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (race)
.2 <- lm(perf_eval ~ c_interview + race, data=DiffPred,
modelsubset=(race=="asian" | race=="white"))
summary(model.2)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview + race, data = DiffPred,
## subset = (race == "asian" | race == "white"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.76671 -0.61101 -0.04633 0.61482 2.67395
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.91778 0.06951 70.754 < 0.0000000000000002 ***
## c_interview 0.40664 0.04546 8.945 < 0.0000000000000002 ***
## racewhite -0.37273 0.10819 -3.445 0.000642 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.925 on 343 degrees of freedom
## Multiple R-squared: 0.1892, Adjusted R-squared: 0.1844
## F-statistic: 40.01 on 2 and 343 DF, p-value: 0.000000000000000241
```

```
# Nested model comparison
anova(model.1, model.2)
```

```
## Analysis of Variance Table
##
## Model 1: perf_eval ~ c_interview
## Model 2: perf_eval ~ c_interview + race
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 344 303.66
## 2 343 293.50 1 10.156 11.869 0.0006415 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Change in R-squared
summary(model.2)$r.squared - summary(model.1)$r.squared
```

`## [1] 0.0280581`

The regression coefficient associated with `race`

(or more specifically `racewhite`

) is negative and statistically signficant (*b* = -.373, *p* = .001) when controlling for `c_interview`

, which indicates that there is evidence of intercept differences. Because the coefficient is negative and because `white`

is appended to the `race`

variable name, it implies that the intercept value is significantly lower for white individuals compared to asian individuals. The amount of variance explained by `c_interview`

*and* `race`

in `perf_eval`

is medium-large in magnitude (*R*^{2} = .189, adjusted *R*^{2} = .184, *p* < .001). The significant *F*-value (11.869, *p* = .001) from the nested model comparison indicates that adding `race`

significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in *R*^{2} (*R*^{2} = .028), which indicates that adding `race`

to the model explained an additional 2.8% of the variance in `perf_eval`

.

Using the `probe_interaction`

function from the `interactions`

package, we will visualize the intercept differences. As input to the `probe_interaction`

function, use the the `model.2`

model object from above, and specify the arguments in the `probe_interaction`

function as you did before.

```
# Visualize intercept differences
probe_interaction(model.2,
pred=c_interview,
modx=race,
johnson_neyman=FALSE,
x.label="Interview Score",
y.label="Job Performance")
```

```
## SIMPLE SLOPES ANALYSIS
##
## Slope of c_interview when race = white:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.41 0.05 8.94 0.00
##
## Slope of c_interview when race = asian:
##
## Est. S.E. t val. p
## ------ ------ -------- ------
## 0.41 0.05 8.94 0.00
```

**Step Three:** Now add the interaction term between `c_interview`

and `race`

to the model using the `*`

symbol. In addition, do a nested model comparison, and estimate the change in *R*^{2}.

```
# Regress perf_eval (criterion) on c_interview (selection tool), protected class (race),
# and interaction between selection tool and protected class
.3 <- lm(perf_eval ~ c_interview * race, data=DiffPred,
modelsubset=(race=="asian" | race=="white"))
summary(model.3)
```

```
##
## Call:
## lm(formula = perf_eval ~ c_interview * race, data = DiffPred,
## subset = (race == "asian" | race == "white"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.80582 -0.60528 -0.06719 0.57676 2.63035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.89594 0.07214 67.863 < 0.0000000000000002 ***
## c_interview 0.36167 0.06054 5.974 0.00000000582 ***
## racewhite -0.37405 0.10815 -3.459 0.000612 ***
## c_interview:racewhite 0.10301 0.09163 1.124 0.261728
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9247 on 342 degrees of freedom
## Multiple R-squared: 0.1922, Adjusted R-squared: 0.1851
## F-statistic: 27.12 on 3 and 342 DF, p-value: 0.0000000000000009343
```

```
# Nested model comparison
anova(model.2, model.3)
```

```
## Analysis of Variance Table
##
## Model 1: perf_eval ~ c_interview + race
## Model 2: perf_eval ~ c_interview * race
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 343 293.50
## 2 342 292.42 1 1.0806 1.2638 0.2617
```

```
# Change in R-squared
summary(model.3)$r.squared - summary(model.2)$r.squared
```

`## [1] 0.002985186`

The regression coefficient associated with the interaction term (`c_interview:racewhite`

) is *not* statistically signficant (*b* = .103, *p* = .262), which indicates that there is no evidence of *slope differences*. Further, adding the interaction term did not significantly improve model fit, so we won’t interpret change in *R*^{2}. Given all this, we won’t probe the interaction because, well, there is not an interaction to probe.

In sum, we found evidence of intercept differences for the selection tool (`c_interview`

) in relation to the criterion (`perf_eval`

) that were conditional upon the protected class variable (`race`

). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

### 42.3.4 APA-Style Results Table

If you want to present the results of your moderated multiple linear regression (MMLR) model to a more statistically inclined audience, particularly an audience that prefers American Psychological Association (APA) style, consider using functions from the `apaTables`

package (Stanley 2021).

Using the `lm`

function from base R, as we did above, let’s begin by estimating a MMLR model and naming the model object (`model.3`

).

```
# Estimate multiple linear regression model
.3 <- lm(perf_eval ~ c_interview * gender, data=DiffPred) model
```

If you haven’t already, install and access the `apaTables`

package using the `install.packages`

and `library`

functions, respectively.

```
# Install package
install.packages("apaTables")
```

```
# Access package
library(apaTables)
```

The `apa.reg.table`

function from `apaTables`

is pretty straightforward. Simply enter your regression model object (`model.3`

) as the sole parenthetical argument. This will generate a table as output in your Console.

```
# Create APA-style regression table
apa.reg.table(model.3)
```

```
##
##
## Regression results using perf_eval as the criterion
##
##
## Predictor b b_95%_CI sr2 sr2_95%_CI Fit
## (Intercept) 5.26** [5.04, 5.48]
## c_interview -0.02 [-0.19, 0.14] .00 [-.00, .00]
## genderwoman -0.72** [-0.98, -0.47] .07 [.02, .11]
## c_interview:genderwoman 0.28** [0.08, 0.49] .02 [-.01, .04]
## R2 = .198**
## 95% CI[.13,.26]
##
##
## Note. A significant b-weight indicates the semi-partial correlation is also significant.
## b represents unstandardized regression weights.
## sr2 represents the semi-partial correlation squared.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## * indicates p < .05. ** indicates p < .01.
##
```

If we add a `filename=`

as a second argument, we can specify the name of a .doc file that we can write to our working directory. Here, I name the file “APA Multiple Linear Regression Table.doc”. The .doc file that appears in your working directory will be nicely formatted and include critical regression model results in an organized manner.

```
# Create APA-style regression table and write to working directory
apa.reg.table(model.3, filename="APA Moderated Multiple Linear Regression Table.doc")
```

```
##
##
## Regression results using perf_eval as the criterion
##
##
## Predictor b b_95%_CI sr2 sr2_95%_CI Fit
## (Intercept) 5.26** [5.04, 5.48]
## c_interview -0.02 [-0.19, 0.14] .00 [-.00, .00]
## genderwoman -0.72** [-0.98, -0.47] .07 [.02, .11]
## c_interview:genderwoman 0.28** [0.08, 0.49] .02 [-.01, .04]
## R2 = .198**
## 95% CI[.13,.26]
##
##
## Note. A significant b-weight indicates the semi-partial correlation is also significant.
## b represents unstandardized regression weights.
## sr2 represents the semi-partial correlation squared.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## * indicates p < .05. ** indicates p < .01.
##
```

### References

*Organizational Research Methods*1 (3): 296–314.

*Annuual Review of Organizational Psychology and Organizational Behavior*2 (1): 435–63.

*Applied Psychology in Human Resource Management (6th Ed.)*. Upper Saddle River, NJ: Pearson Prentice Hall.

*Journal of Educational Measurement*5 (2): 115–24.

*Human Resource Management Review*27 (2): 316–27.

*lessR: Less Code, More Results*. https://CRAN.R-project.org/package=lessR.

*Interactions: Comprehensive, User-Friendly Toolkit for Probing Interactions*. https://cran.r-project.org/package=interactions.

*Journal of Applied Psychology*87 (4): 667–74.

*Principles for the Validation and Use of Personnel Selection Procedures (5th Ed.)*. Bowling Green, OH: Pearson Prentice Hall.

*apaTables: Create American Psychological Association (APA) Style Tables*. https://CRAN.R-project.org/package=apaTables.

*Journal of Management*20 (1): 167–78.

*Readr: Read Rectangular Text Data*. https://CRAN.R-project.org/package=readr.