Chapter 43 Testing for Differential Prediction Using Moderated Multiple Linear Regression

In this chapter, we will learn how to test whether a selection tool shows evidence of differential prediction by using moderated multiple linear regression. We’ll begin with conceptual overviews of moderated multiple linear regression and differential prediction, and we’ll conclude with a tutorial.

43.1 Conceptual Overview

In this chapter, we will learn how to use an ordinary least squares (OLS) moderated multiple linear regression model to test for differential prediction of an employee selection tool due to protected class membership. Thus, the way in which we build and interpret our moderated multiple regression model will be couched in the context of differential prediction. Nonetheless, this tutorial should still provide you with the technical and interpretation skills to understand how to apply moderated multiple linear regression in other non-selection contexts.

43.1.1 Review of Moderated Multiple Linear Regression

Link to conceptual video: https://youtu.be/mkia_KLjJ0M

Moderated multiple linear regression (MMLR) is a specific type of multiple linear regression, which means that MMLR involves multiple predictor variables and a single outcome variable. For more information on multiple linear regression, check out the chapter on estimating incremental validity. In MMLR, we are in effect determining whether the association between a predictor variable and the outcome variable is conditional upon the level/value of another predictor variable, the latter of which we often refer to as a moderator or moderator variable.

Moderators: A moderator (i.e., moderator variable) is a predictor variable upon which an association between two other variables is conditional. A moderator variable can be continuous (e.g., age: measured in years) or categorical (e.g., location: Portland vs. Beaverton). When we find that one variable moderates the association between two other variables, we often refer to this as an interaction or interaction effect. For example, let’s imagine that we are interested in the association between job satisfaction and job performance in our organization. By introducing the categorical moderator variable of employee location (Portland vs. Beaverton), we can determine whether the magnitude and/or sign of the association between job satisfaction and job performance varies by the location at which employees work. Perhaps we find that the association between job satisfaction and job performance is significantly stronger for employees who work in Beaverton as compared to employees who work in Portland (see figure below), and thus we would conclude that there is a significant interaction between job satisfaction and location in relation to job performance.

This line and scatter plot illustrates pictorially how a variable like employee location might moderate the association between job satisfaction and job performance.

When investigating whether an interaction exists between two variables relative to the outcome variable, we need to create a new variable called an interaction term (i.e., product term). An interaction term is created by multiplying the scores of two predictor variables – or in other words, by computing the product of the two variables’ respective scores. In fact, the presence of an interaction term – and the associated multiplication – signals why MMLR is referred to as a multiplicative model. In terms of the language we use to describe an interaction and the associated interaction term, we often refer to one of the predictor variables involved in an interaction as the moderator. For an illustration of how an interaction term is created, please refer to the table below.

To create an interaction term (i.e., product term), we simply multiply the associated predictor variables’ scores (e.g., predictor variable and moderator variable) for each case (e.g., employee).
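
To make the table above concrete, here is a minimal sketch in R using a hypothetical data frame with made-up scores; the variable names are purely illustrative:

# Hypothetical mini data set with made-up scores (names are illustrative only)
toy <- data.frame(employee  = c("EE1", "EE2", "EE3"),
                  predictor = c(3, 5, 4),
                  moderator = c(0, 1, 1))

# Interaction (product) term: multiply the two variables' scores for each case
toy$interaction <- toy$predictor * toy$moderator
toy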

Centering Predictor & Moderator Variables: To improve the interpretability of the main effects (but not the interaction term) and to reduce collinearity between the predictor variables involved in the interaction term, we typically grand-mean center any continuous (interval, ratio) predictor variables; we do not need to center the outcome variable or a predictor or moderator variable that is categorical (e.g., dichotomous). As discussed in the chapter on centering and standardizing variables, grand-mean centering refers to the process of subtracting the overall sample mean for a variable from each case’s score on that variable. We center the predictor variables (i.e., predictor and moderator variables) before computing the interaction term based on those variables. To illustrate how grand-mean centering works, take a look at the table below, and note that we are just subtracting the (grand) mean for the sample from the specific value on the predictor variable for each case.

To grand-mean center a variable, we subtract the overall sample mean of the variable from each case’s score on that variable.
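
Here is a minimal sketch of grand-mean centering in R with a hypothetical vector of scores; the basic arithmetic mirrors the table above, and base R's scale function offers an equivalent shortcut:

# Hypothetical vector of predictor scores
x <- c(10, 20, 30, 40)

# Grand-mean centering by basic arithmetic
x - mean(x, na.rm=TRUE)

# Equivalent shortcut using base R's scale function (center without
# standardizing); scale returns a matrix, so we coerce back to a vector
as.numeric(scale(x, center=TRUE, scale=FALSE))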

When we grand-mean center the predictor variables in a regression model, it also changes our interpretation of the \(\hat{Y}\)-intercept value. If you recall from prior reviews of simple linear regression and multiple linear regression, the \(\hat{Y}\)-intercept value is the mean of the outcome variable when all predictor variables in the model are set to zero. Thus, when we center each predictor variable, we shift the mean of each variable to zero, which means that after grand-mean centering, the \(\hat{Y}\)-intercept value is the mean of the outcome variable when each predictor variable is set to its mean. Beyond moderated multiple linear regression, grand-mean centering can be useful in simple and multiple linear regression models when interpreting the regression coefficients associated with the intercept and the slope(s), as in some cases, a score of zero on a predictor variable is theoretically meaningless, implausible, or both. For example, if one of our predictor variables has an interval measurement scale that ranges from 10-100, then a score of zero is not even part of the scale range, which makes shifting the zero point to the grand mean more meaningful.

Regardless of whether you grand-mean center the predictor and moderator variables, the interaction term’s significance level (p-value) and model fit (i.e., R2) will remain the same. In fact, sometimes it makes more sense to plot the findings of your model based on the uncentered versions of the variables. That choice is really up to you and should be based on the context at hand.

Moderated Multiple Linear Regression: In moderated multiple linear regression (MMLR), we have three or more predictor variables and a single outcome variable, where one of the predictor variables is the interaction term (i.e., product term) between two of the predictor variables (e.g., predictor and moderator variables). Just like multiple linear regression, MMLR allows for a series of covariates (control variables) to be included in the model; accordingly, each regression coefficient (i.e., slope, weight) represents the relationship between the associated predictor and the outcome variable when statistically controlling for all other predictors in the model.

To better understand what MMLR is and how it works, I find that it’s useful to consider an MMLR model in equation form. Specifically, let’s consider a scenario in which we estimate an MMLR model with two predictor variables, their interaction term (i.e., product term), and a single outcome variable. The equation for such a model with unstandardized regression coefficients (\(b\)) would be as follows:

\(\hat{Y} = b_{0} + b_{1}X_1 + b_{2}X_2 + b_{3}X_1*X_2 + e\)

where \(\hat{Y}\) represents the predicted score on the outcome variable (\(Y\)), \(b_{0}\) represents the \(\hat{Y}\)-intercept value (i.e., model constant) when the predictor variables \(X_1\) and \(X_2\) and interaction term \(X_1*X_2\) are equal to zero, \(b_{1}\), \(b_{2}\), and \(b_{3}\) represent the unstandardized coefficients (i.e., weights, slopes) of the associations between the outcome variable and the predictor variables \(X_1\) and \(X_2\) and interaction term \(X_1*X_2\), respectively, and \(e\) represents the (residual) error in prediction. If the regression coefficient associated with the interaction term (\(b_{3}\)) is found to be statistically significant, then we would conclude that there is evidence of a significant interaction effect.

If we conceptualize one of the predictor variables as a moderator variable, then we might re-write the equation from above by swapping out \(X_2\) with \(W\) (or \(Z\)). The meaning of the equation remains the same, but changing the notation in this way may signal more clearly which variable we are labeling a moderator.

\(\hat{Y} = b_{0} + b_{1}X + b_{2}W + b_{3}X*W + e\)
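
As a preview of how such a model is estimated in R, here is a minimal sketch using base R's lm function and simulated data; the variable names X, W, and Y are purely illustrative:

# Simulated data for illustration only
set.seed(1)
df <- data.frame(X = rnorm(100), W = rnorm(100))
df$Y <- 0.5 * df$X + 0.3 * df$W + 0.4 * df$X * df$W + rnorm(100)

# In an R model formula, X * W expands to X + W + X:W, which mirrors the
# equation above; the X:W row of the output is the interaction coefficient (b3)
mmlr <- lm(Y ~ X * W, data=df)
summary(mmlr)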

An unstandardized regression coefficient indicates the “raw” regression coefficient (i.e., weight, slope) when controlling for the associations between all other predictors in the model and the outcome variable; accordingly, each unstandardized regression coefficient represents how many unstandardized units \(\hat{Y}\) increases/decreases as a result of a single-unit increase in a predictor variable, when statistically controlling for the effects of the other predictors in the regression model.

Typically, we are most interested in whether slope differences exist; with that being said, in the context of testing for differential prediction (described below), we are also interested in whether intercept differences exist. In essence, a significant interaction term indicates the presence of slope differences, where slope differences refer to significant variation in the association (i.e., slope) between the predictor variable and outcome variable across levels/values of the moderator variable. In the context of our first MMLR equation (see above), if the regression coefficient (\(b_{3}\)) associated with the interaction term (\(X_1*X_2\)) is statistically significant, we conclude that there is a significant interaction and thus significant slope differences. In fact, we might even use language like, “\(X_2\) moderates the association between \(X_1\) and \(Y\).”

When we find a statistically significant interaction, it is customary to plot the simple slopes and estimate whether each of the simple slopes is significantly different from zero. Simple slopes allow us to probe the form of the significant interaction so that we can describe it more accurately. Plus, the follow-up simple slopes analysis helps us understand at which levels of the moderator variable the association between the predictor and outcome variables is statistically significant. For a continuous moderator variable, we typically plot and estimate the simple slopes at 1 standard deviation (SD) above and below the mean for the variable – and sometimes at the mean of the variable as well; with that being said, if we want, we can choose more theoretically or empirically meaningful values for the moderator variable. For a categorical variable, we plot and estimate the simple slopes based on the different discrete levels (categories) of the variable. For example, if our moderator variable captures gender identity operationalized as man (0) and woman (1), then we would plot the simple slopes when gender is equal to man (0) and when gender is equal to woman (1).

43.1.1.1 Statistical Assumptions

The statistical assumptions that should be met prior to running and/or interpreting estimates from a moderated multiple linear regression model are the same as those for conventional multiple linear regression; they include:

  • Cases are randomly sampled from the population, such that the variable scores for one individual are independent of the variable scores of another individual;
  • Data are free of multivariate outliers;
  • The association between the predictor and outcome variables is linear;
  • There is no (multi)collinearity between predictor variables;
  • Average residual error value is zero for all levels of the predictor variables;
  • Variances of residual errors are equal for all levels of the predictor variables, which is referred to as the assumption of homoscedasticity;
  • Residual errors are normally distributed for all levels of the predictor variables.

The fourth statistical assumption refers to the concept of collinearity (multicollinearity). This can be a tricky concept to understand, so let’s take a moment to unpack it. When two or more predictor variables are specified in a regression model, as is the case with multiple linear regression, we need to be wary of collinearity. Collinearity refers to the extent to which predictor variables correlate with each other. Some level of intercorrelation between predictor variables is to be expected and is acceptable; however, if collinearity becomes substantial, it can affect the weights – and even the signs – of the regression coefficients in our model, which can be problematic from an interpretation standpoint. As such, we should avoid including predictors in a multiple linear regression model that correlate highly with one another. The tolerance statistic is commonly computed and serves as an indicator of collinearity. To compute the tolerance statistic for a given predictor, we regress that predictor on the remaining predictor variables (excluding the outcome variable), take the resulting shared variance (R2), and subtract that R2 value from 1 (i.e., 1 - R2). We typically grow concerned when the tolerance statistic falls below .20 and closer to .00. Ideally, we want the tolerance statistic to approach 1.00, as this indicates lower levels of collinearity. From time to time, you might also see the variance inflation factor (VIF) reported as an indicator of collinearity; the VIF is just the reciprocal of the tolerance (i.e., 1/tolerance), and in my opinion, it is redundant to report and interpret both the tolerance and the VIF. My recommendation is to focus just on the tolerance statistic when inferring whether the statistical assumption of no collinearity might have been violated.
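
To make that computation concrete, here is a minimal sketch with simulated (illustrative) data; the manual calculation follows the 1 - R2 definition above, and, as an assumption on my part, I also show the vif function from the car package as a shortcut (install it first if needed):

# Simulated predictors that correlate with each other (illustrative only)
set.seed(2)
x1 <- rnorm(200)
x2 <- 0.6 * x1 + rnorm(200)
y  <- x1 + x2 + rnorm(200)

# Tolerance for x1: regress x1 on the other predictor(s) and compute 1 - R2
1 - summary(lm(x1 ~ x2))$r.squared

# The car package's vif function returns the reciprocal (VIF = 1/tolerance)
# install.packages("car")
library(car)
1 / vif(lm(y ~ x1 + x2))  # converts each VIF back to a tolerance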

Finally, with respect to the assumption that cases are randomly sampled from the population, we will assume for this chapter’s data that this is not an issue. If we were to suspect, however, that there were some clustering or nesting of cases in units/groups (e.g., by supervisors, units, or facilities) with respect to our outcome variable, then we would need to run some type of multilevel model (e.g., hierarchical linear model, multilevel structural equation model), which is beyond the scope of this tutorial. An intraclass correlation (ICC) can be used to diagnose such nesting or clustering. Failing to account for clustering or nesting in the data can bias estimates of standard errors, which ultimately influences the p-values and inferences of statistical significance.
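
For readers who do suspect nesting, here is a minimal sketch of an ICC check using the lme4 package; note that lme4, the unit grouping variable, and the simulated data are all assumptions for illustration, as this chapter's data contain no grouping variable:

# install.packages("lme4")
library(lme4)

# Simulated nested data: 20 hypothetical units with 10 cases each
set.seed(3)
dat <- data.frame(unit = rep(1:20, each=10))
dat$y <- rnorm(20, sd=0.5)[dat$unit] + rnorm(200)  # unit effect + individual error

# Null (intercept-only) model with a random intercept for unit
null_mod <- lmer(y ~ 1 + (1 | unit), data=dat)

# ICC = between-unit variance / (between-unit variance + residual variance)
vc <- as.data.frame(VarCorr(null_mod))
vc$vcov[1] / sum(vc$vcov)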

43.1.1.2 Statistical Significance

Using null hypothesis significance testing (NHST), we interpret a p-value that is less than .05 (or whatever two- or one-tailed alpha level we set) to meet the standard for statistical significance, meaning that we reject the null hypothesis that the regression coefficient is equal to zero when controlling for the effects of the other predictor variable(s) and interaction term variable in the model. In other words, if a regression coefficient’s p-value is less than .05, we conclude that the regression coefficient differs from zero to a statistically significant extent when controlling for the effects of other predictor variables in the model. In contrast, if the regression coefficient’s p-value is equal to or greater than .05, then we fail to reject the null hypothesis that the regression coefficient is equal to zero when controlling for the effects of other predictor variable(s) and interaction term variable in the model. Put differently, if a regression coefficient’s p-value is equal to or greater than .05, we conclude that the regression coefficient does not differ from zero to a statistically significant extent, leading us to conclude that there is no association between the predictor variable (or interaction term variable) and the outcome variable in the population – when controlling for the effects of other predictor variables and interaction term variable in the model.

When setting an alpha threshold, such as the conventional two-tailed .05 level, sometimes the question comes up regarding whether borderline p-values signify significance or nonsignificance. For our purposes, let’s be very strict in our application of the chosen alpha level. For example, if we set our alpha level at .05, p = .049 would be considered statistically significant, and p = .050 would be considered statistically nonsignificant.

Because our regression model estimates are based on data from a sample that is drawn from an underlying population, sampling error will affect the extent to which our sample is representative of the population from which it is drawn. That is, a regression coefficient estimate (b) is a point estimate of the population parameter that is subject to sampling error. Fortunately, confidence intervals can give us a better idea of what the true population parameter value might be. If we apply an alpha level of .05 (two-tailed), then the equivalent confidence interval (CI) is a 95% CI. In terms of whether a regression coefficient is statistically significant, if the lower and upper limits of the 95% CI do not include zero, then this tells us the same thing as a p-value that is less than .05. Strictly speaking, a 95% CI indicates that if we were to hypothetically draw many more samples from the underlying population and construct CIs for each of those samples, then the true parameter (i.e., the true value of the regression coefficient in the population) would fall within the lower and upper bounds of 95% of the estimated CIs. In other words, the 95% CI gives us an indication of plausible values for the population parameter while taking into consideration sampling error. A wide CI (i.e., a large difference between the lower and upper limits) signifies more sampling error, and a narrow CI signifies less sampling error.
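
To illustrate, here is a minimal sketch of extracting 95% CIs for regression coefficients with base R's confint function; the data are simulated, so the variable names are purely illustrative:

# Simulated data for illustration only
set.seed(4)
d <- data.frame(x = rnorm(150))
d$y <- 0.4 * d$x + rnorm(150)

# 95% CIs for the intercept and slope; a coefficient is significant at the
# .05 level (two-tailed) if its interval excludes zero
mod <- lm(y ~ x, data=d)
confint(mod, level=0.95)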

43.1.1.3 Practical Significance

As a reminder, an effect size is a standardized metric that can be compared across samples. In a moderated multiple linear regression (MMLR) model, an unstandardized regression coefficient (\(b\)) is not an effect size. The reason is that an unstandardized regression coefficient estimate is based on the original scaling of the predictor and outcome variables, and thus the same effect will take on different regression coefficient values to the extent that the predictor and outcome variables have different scalings across samples.

A standardized regression coefficient (\(\beta\)) can be interpreted as an effect size (and thus an indicator of practical significance) given that it is standardized. With that being said, I suggest doing so with caution, as collinearity (i.e., correlation) between predictor variables in the model can bias our interpretation of \(\beta\) as an effect size. Thus, if your goal is just to understand the bivariate association between a predictor variable and an outcome variable (without introducing statistical control), then I recommend just estimating a correlation coefficient as an indicator of practical significance, which I discuss in the chapter on estimating criterion-related validity using correlations.

In an MMLR model, we can also describe the magnitude of the effect in terms of the proportion of variance explained in the outcome variable by the predictor variables (i.e., R2). That is, in a multiple linear regression model, R2 represents the proportion of collective variance explained in the outcome variable by all of the predictor variables. Conceptually, we can think of the overlap between the variability in the predictor variables and the outcome variable as the variance explained (R2), and R2 is a way to evaluate how well a model fits the data (i.e., model fit). I’ve found that the R2 is often readily interpretable by non-analytics audiences. For example, an R2 of .25 in an MMLR model can be interpreted as: the predictor variable and interaction term variable scores explain 25% of the variability in scores on the outcome variable. That is, to convert an R2 from a proportion to a percent, we just multiply by 100.

R2 Description
.01 Small
.09 Medium
.25 Large

43.1.2 Review of Differential Prediction

Differential prediction refers to differences in intercepts, slopes, or both based on subgroup (protected class) membership (Society for Industrial & Organizational Psychology 2018; Cleary 1968), and sometimes differential prediction is referred to as predictive bias. Assuming a single regression line is used to predict the criterion (e.g., job performance) scores of applicants, differential prediction is concerned with overpredicting or underpredicting criterion scores for one subgroup relative to another (Cascio and Aguinis 2005). A common regression line would be inappropriate when evidence of intercept differences, slope differences, or both exists.

In the U.S., the use of separate regression lines – one for each subgroup – is legally murky, with some interpreting separate regression lines as a violation of federal law and others cautioning that the practice might be interpreted as score adjustment, thereby running afoul of the Civil Rights Act of 1991 (Berry 2015; Saad and Sackett 2002). Specifically, Section 106 of the Civil Rights Act of 1991 amends Section 703 of the Civil Rights Act of 1964 to prohibit the discriminatory use of selection test scores:

“It shall be an unlawful employment practice for a respondent, in connection with the selection or referral of applicants or candidates for employment or promotion, to adjust the scores of, use different cutoff scores for, or otherwise alter the results of, employment related tests on the basis of race, color, religion, sex, or national origin.”

— Section 106, U.S. Civil Rights Act of 1991

Thus, a legally cautious approach would be to discontinue the use of a selection tool in its current form should evidence of differential prediction come to light.

We’ll begin by considering an example of intercept differences when predicting scores on the criterion (e.g., job performance) based on scores on a selection tool. Imagine a situation in which scores on a selection tool differentially predict scores on job performance for men and women, such that the slopes are the same, but the intercepts of each slope are different (see figure below). The dashed black line indicates what a common regression line would look like if it were used. As you can see, if we were to use a single common regression line when intercept differences exist between men and women, then we would tend to underpredict job performance scores for men and overpredict for women.

This simple slopes plot illustrates intercept differences by group but not slope differences. The dashed line indicates what a single regression line would look like if it were used.

Next, we’ll consider an example involving slope differences in the context of differential prediction. In the figure below, the slopes (i.e., criterion-related validities) are different when the selection tool is used for men as compared to women. Again, the dashed black line represents what would happen if a common regression line were used, which would clearly be inappropriate. Note that, in this particular example, the job performance scores would generally be underpredicted for men and overpredicted for women when using the same regression line. Further, the selection tool appears to show criterion-related validity (i.e., a significant slope) for men but not for women. Accordingly, based on this sample, the selection tool might only be a valid predictor of job performance for men. Clearly, this would be a problem.

This simple slopes plot illustrates slope differences by group (i.e., different criterion-related validities) but not intercept differences. The dashed line indicates what a single regression line would look like if it were used.

Finally, we’ll consider an example involving both intercept differences and slope differences. The figure below shows that use of a common regression line would pose problems in terms of predictive bias. In this example, for men, use of a common regression line would overpredict job performance scores for a certain range of selection tool scores and underpredict job performance scores for another range of selection tool scores; the opposite would be true for women.

This simple slopes plot illustrates both intercept differences and slope differences by group. The dashed line indicates what a single regression line would look like if it were used.

Words of Caution: In this tutorial, we will learn how to apply MMLR in the service of detecting differential prediction using an approach that is generally supported by the U.S. court system and researchers. With that said, MMLR should be used cautiously to detect differential prediction with small samples, as a failure to detect differential prediction that actually exists (i.e., Type II error, false negative) could be the result of low statistical power; relatedly, unreliability or bias in the selection tool or criterion and violations of certain statistical assumptions (e.g., homoscedasticity of residual error variances) can affect our ability to find differential prediction that actually exists (Aguinis and Pierce 1998). Further, when one subgroup makes up less than 30% of the total sample, statistical power is also diminished (Stone-Romero, Alliger, and Aguinis 1994). The take-home message is to remain cautious and thoughtful when investigating differential prediction.

Multistep MMLR Process for Differential Prediction: We will use a three-step “step-up” process for investigating differential prediction using MMLR (Society for Industrial & Organizational Psychology 2018; Cleary 1968). This process uses what is referred to as hierarchical regression, as additional predictor variables are added at each subsequent step, and their incremental validity above and beyond the other predictors in the model is assessed.

In the first step, we will regress the criterion variable on the selection tool variable, and in this step, we will verify whether the selection tool shows criterion-related validity by itself. If we find evidence of criterion-related validity for the selection tool in this first step, then we proceed to the second and third steps.

In the second step, we include the selection tool variable as before but add the dichotomous (binary) protected class/group variable to the model. If the protected class variable is statistically significant when controlling for the selection tool variable, then there is evidence of intercept differences. You might be asking yourself: What happens when a protected class variable is not inherently categorical or, specifically, dichotomous? When it comes to the protected class of race, many organizations have more than two races represented among their employees. In this situation, it is customary (at least within the U.S.) to compare two subgroups at a time (e.g., White and Black) or to create a dichotomy consisting of the majority race and members of other non-majority races (e.g., White and Non-White). For an inherently continuous variable like age, classically, the standard procedure was to dichotomize the age variable into those who are 40 years and older and those who are younger than 40 years, as this is consistent with the U.S. Age Discrimination in Employment Act (ADEA) of 1967. There are two potential issues with this: (a) some states have further age protections in place that protect workers under 40 as well, and (b) some jobs in organizations have very young workforces, leaving the question of whether people in their 30s are being favored over people in their 20s, or vice versa (for example). In addition, when we look at organizations with workers who are well into their 50s, 60s, and 70s, there might be evidence of age discrimination and differential prediction for those who are in their 50s compared to those who are in their 60s. Dichotomizing the age variable using a 40-and-above vs. under-40 split might miss these issues and, more generally, this nuance. (For a review of this topic, see Fisher et al. 2017.) As such, with a variable like age, there might be some advantages to keeping it as a continuous variable, as MMLR can handle this.

In the third step, we include the selection tool and protected class variables as before but add the interaction term between the selection tool and protected class variables. If the interaction term is statistically significant, then there is evidence of differential prediction in the form of slope differences.
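
Before we move on, here is a compact sketch of the three-step sequence using base R's lm and anova functions (the tutorial below uses lessR instead); it assumes the DiffPred data frame and the grand-mean centered c_interview variable created later in this chapter:

# Three-step "step-up" sequence (sketch; assumes the DiffPred data frame and
# centered selection tool score used in the tutorial below)
step1 <- lm(perf_eval ~ c_interview,          data=DiffPred)  # criterion-related validity
step2 <- lm(perf_eval ~ c_interview + gender, data=DiffPred)  # intercept differences
step3 <- lm(perf_eval ~ c_interview * gender, data=DiffPred)  # slope differences

# F-tests of the incremental contribution of each added term
anova(step1, step2, step3)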

43.1.2.1 Sample Write-Up

Our HR analytics team recently found evidence in support of the criterion-related validity of a new structured interview selection procedure. As a next step, we investigated whether there might be evidence of differential prediction for the structured interview based on the U.S. protected class of gender. Based on a sample of 320 individuals, we applied a three-step “step-up” process with moderated multiple linear regression, which is consistent with established principles (Society for Industrial & Organizational Psychology 2018; Cleary 1968). Based on our analyses, we found evidence of differential prediction (i.e., predictive bias) based on gender for the new structured interview procedure, and specifically, we found evidence of slope differences. In the first step, we found evidence of criterion-related validity based on a simple linear regression model, as structured interview scores were positively and significantly associated with job performance (b = .192, p < .001), thereby indicating some degree of job-relatedness and -relevance. In the second step, we added the protected class variable associated with gender to the model, resulting in a multiple linear regression model, but we did not find evidence of intercept differences (b = .166, p = .146). In the third step, we added the interaction term between the structured interview and gender variables, resulting in a moderated multiple linear regression model, and we found that gender moderated the association between interview scores and job performance scores to a statistically significant extent (b = -.373, p < .001). Based on follow-up simple slopes analyses, we found that the association between structured interview scores and job performance scores was statistically significant and positive for men (b = .39, p < .01), such that for every one unit increase in structured interview scores, job performance scores tended to increase by .39 units; in contrast, the association between structured interview scores and job performance scores for women was nonsignificant (b = .02, p = .72). In sum, we found evidence of slope differences, which means that this interview tool shows evidence of predictive bias with respect to gender, making the application of a common regression line (i.e., equation) legally problematic within the United States. If possible, this structured interview should be redesigned or the interviewers should be trained to reduce the predictive bias based on gender. In the interim, we caution against using separate regression lines for men and women, as this may be interpreted as running afoul of the U.S. Civil Rights Act of 1991 (Saad and Sackett 2002). A more conservative approach would be to develop a structured interview process that allows for the use of a common regression line across legally protected groups.

43.2 Tutorial

This chapter’s tutorial demonstrates how to use moderated multiple linear regression in the service of detecting differential prediction in a selection tool.

43.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/3n2vvMc4yHs

43.2.2 Functions & Packages Introduced

Function             Package
mean                 base R
Regression           lessR
reg_brief            lessR
probe_interaction    interactions

43.2.3 Initial Steps

If you haven’t already, save the file “DifferentialPrediction.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file “DifferentialPrediction.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
DiffPred <- read_csv("DifferentialPrediction.csv")
## Rows: 320 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): emp_id, gender, race
## dbl (3): perf_eval, interview, age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) objects
names(DiffPred)
## [1] "emp_id"    "perf_eval" "interview" "gender"    "age"       "race"
# View variable type for each variable in data frame
str(DiffPred)
## spc_tbl_ [320 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ emp_id   : chr [1:320] "EE200" "EE202" "EE203" "EE206" ...
##  $ perf_eval: num [1:320] 6.2 6.6 5 4 5.3 5.9 5.5 3.8 5.7 3.8 ...
##  $ interview: num [1:320] 8.5 5.7 7 6.8 9.6 6 7.5 5.9 9.6 8.3 ...
##  $ gender   : chr [1:320] "woman" "woman" "man" "woman" ...
##  $ age      : num [1:320] 30 45.6 39.2 36.9 24.5 41.5 35.1 43.2 25.7 27.8 ...
##  $ race     : chr [1:320] "black" "black" "asian" "black" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   emp_id = col_character(),
##   ..   perf_eval = col_double(),
##   ..   interview = col_double(),
##   ..   gender = col_character(),
##   ..   age = col_double(),
##   ..   race = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
# View first 6 rows of data frame
head(DiffPred)
## # A tibble: 6 × 6
##   emp_id perf_eval interview gender   age race 
##   <chr>      <dbl>     <dbl> <chr>  <dbl> <chr>
## 1 EE200        6.2       8.5 woman   30   black
## 2 EE202        6.6       5.7 woman   45.6 black
## 3 EE203        5         7   man     39.2 asian
## 4 EE206        4         6.8 woman   36.9 black
## 5 EE209        5.3       9.6 woman   24.5 white
## 6 EE214        5.9       6   woman   41.5 asian

There are 6 variables and 320 cases (i.e., employees) in the DiffPred data frame: emp_id, perf_eval, interview, gender, age, and race. Per the output of the str (structure) function above, the variables perf_eval, interview, and age are of type numeric (continuous: interval/ratio), and the variables emp_id, gender, and race are of type character (nominal/categorical). The variable emp_id is the unique employee identifier. Imagine that these data were collected as part of a criterion-related validation study – specifically, a concurrent validation design in which job incumbents were administered a rated structured interview (interview) 90 days after entering the organization. The structured interview (interview) variable was designed to assess individuals’ interpersonal skills, and ratings can range from 1 (very weak interpersonal skills) to 10 (very strong interpersonal skills). The interviews were scored by untrained raters who were often, but not always, the hiring managers. The perf_eval variable is the criterion (outcome) of interest, and it is a 90-day-post-hire measure of supervisor-rated job performance, with possible ratings ranging from 1-7, where 7 indicates high performance. The age variable represents the job incumbents’ ages (in years). The gender variable represents the job incumbents’ gender identity and is defined by two levels/categories/values: man and woman. Finally, the race variable represents the job incumbents’ race/ethnicity and is defined by three levels/categories/values: asian, black, and white.

43.2.4 Grand-Mean Center Continuous Predictor Variables

To improve the interpretability of our model and to reduce the collinearity between the predictor and moderator variables with their interaction term, we will grand-mean center continuous predictor and moderator variables. [Note: A moderator variable is just a predictor variable that we are framing as a moderator.] With regard to interpretability, grand-mean centering aids in our interpretation of the regression coefficients associated with our predictor and moderator variables in our MMLR by reducing collinearity between these variables and their interaction term and by estimating the y-intercept when all predictors are at their respective means (averages), as opposed to when they are at zero (which is not always a plausible or meaningful value for some variables). Further, when interpreting the association between one predictor variable and the outcome variable while statistically controlling for other variables in the model, grand-mean centering allows us to interpret the effect of the focal predictor variable when the other predictor variables are at their respective means (as opposed to zero). For more detailed information on centering, check out the chapter on centering and standardizing variables.

We only center predictor and moderator variables that are of type numeric and that we conceptualize as having a continuous (interval or ratio) measurement scale. In our current data frame, we will grand-mean center just the interview and age variables. We will use what is perhaps the most intuitive approach for grand-mean centering variables. We must begin by coming up with a new name for each of our soon-to-be grand-mean centered variables, and in this example, we will center the interview variable first. I typically like to name the centered variable by simply adding the c_ prefix to the existing variable’s name (e.g., c_interview). Type the name of the data frame object to which the new centered variable will be attached (DiffPred), followed by the $ operator and the name of the new variable we are creating (c_interview). Next, add the <- operator to indicate what values you will assign to this new variable. To create a vector of values to assign to this new c_interview variable, we will subtract the mean (average) score for the original variable (interview) from each case’s value on the variable. Specifically, enter the name of the data frame object, followed by the $ operator and the name of the original variable (interview). After that, enter the subtraction operator (-). And finally, type the name of the mean function from base R. As the first argument in the mean function, enter the name of the data frame object (DiffPred), followed by the $ operator and the name of the original variable (interview). As the second argument, enter na.rm=TRUE to indicate that you wish to drop missing values when calculating the grand mean for the sample. Now repeat those same steps for the age variable.

# Grand-mean centering: Using basic arithmetic and the mean function from base R
DiffPred$c_interview <- DiffPred$interview - mean(DiffPred$interview, na.rm=TRUE)
DiffPred$c_age <- DiffPred$age - mean(DiffPred$age, na.rm=TRUE)
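
As a quick, optional sanity check, the mean of each grand-mean centered variable should now be zero, give or take floating-point rounding:

# Verify that the centered variables have means of (essentially) zero
mean(DiffPred$c_interview, na.rm=TRUE)
mean(DiffPred$c_age, na.rm=TRUE)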

43.2.5 Estimate Moderated Multiple Linear Regression Model

As described above, we will use the standard three-step differential prediction assessment using MMLR. We will repeat this three-step process for all three protected class variables in our data frame: gender, c_age, and race. [Note that we are using the grand-mean centered age variable we created above.] Our selection tool of interest is c_interview (the grand-mean centered version of the interview variable we created above), and our criterion of interest is perf_eval. I will show you how to test differential prediction for each of these protected class variables using the Regression (or reg_brief) function from lessR (Gerbing 2021).

Note: To reduce the overall length of this tutorial, I have omitted the diagnostic tests of the statistical assumptions, and thus for the purposes of this tutorial, we will assume that the statistical assumptions have been met. For your own work, be sure to run diagnostic tests to evaluate whether your model meets these statistical assumptions. For more information on statistical assumption testing in this context, check out the chapter on estimating incremental validity using multiple linear regression.

If you haven’t already, install and access the lessR package using the install.packages and library functions, respectively (see above). This will allow us to access the Regression and reg_brief functions.

# Install lessR package if you haven't already
install.packages("lessR")
# Access lessR package
library(lessR)

43.2.5.1 Test Differential Prediction Based on gender

Let’s start by investigating whether differential prediction exists for our interview selection tool in relation to perf_eval based on gender subgroup membership (i.e., man, woman).

Step One: Let’s begin by regressing the criterion variable perf_eval on the selection tool variable c_interview. Here we are just establishing whether the selection tool shows evidence of criterion-related validity. See the chapters on predicting criterion scores using simple linear regression if you would like more information on regression-based approaches to assessing criterion-related validity, including determining whether statistical assumptions have been met. You can use either the Regression or reg_brief function from lessR, where the latter provides less output. To keep the amount of output in this tutorial to a minimum, I will use the reg_brief function, but when you evaluate whether statistical assumptions have been met, be sure to use the full Regression function.

# Regress perf_eval (criterion) on c_interview (selection tool)
reg_brief(perf_eval ~ c_interview, data=DiffPred)

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview, data=DiffPred, Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable: c_interview 
##  
## Number of cases (rows) of data:  320 
## Number of cases retained for analysis:  320 
## 
## 
##   BASIC ANALYSIS 
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept)     4.708      0.054   86.472    0.000       4.600       4.815 
## c_interview     0.192      0.041    4.629    0.000       0.110       0.273 
## 
## Standard deviation of perf_eval: 1.005 
##  
## Standard deviation of residuals:  0.974 for 318 degrees of freedom 
## 95% range of residual variation:  3.832 = 2 * (1.967 * 0.974) 
##  
## R-squared:  0.063    Adjusted R-squared:  0.060    PRESS R-squared:  0.051 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 21.425     df: 1 and 318     p-value:  0.000 
## 
## -- Analysis of Variance 
##  
##               df    Sum Sq   Mean Sq   F-value   p-value 
## Model          1    20.319    20.319    21.425     0.000 
## Residuals    318   301.583     0.948 
## perf_eval    319   321.902     1.009 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The unstandardized regression coefficient for c_interview is statistically significant and positive (b = .192, p < .001), which provides evidence of criterion-related validity. The amount of variance explained by c_interview in perf_eval is small-medium in magnitude (R2 = .063, adjusted R2 = .060, p < .001); that is, approximately 6% of the variability in perf_eval can be explained by c_interview. See the table below for common rules of thumb for qualitatively describing R2 values.

R2 Description
.01 Small
.09 Medium
.25 Large

Because Step One revealed a statistically significant association between c_interview and perf_eval – and thus evidence of criterion-related validity for the selection tool – we will proceed to Step Two and Step Three. Had we not found evidence of criterion-related validity in Step One, we would have stopped at that step; after all, it wouldn’t be worthwhile to look for evidence of differential prediction in a selection tool we won’t be using moving forward (given that it’s not job related).

Step Two: Next, let’s add the protected class variable gender to our model to investigate whether intercept differences exist. For more information on how to interpret and test the statistical assumptions of a multiple linear regression model, check out the chapter on estimating incremental validity using multiple linear regression. Note that this model can be called an additive model.

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (gender)
reg_brief(perf_eval ~ c_interview + gender, data=DiffPred)
## 
## >>>  gender is not numeric. Converted to indicator variables.

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview + gender, data=DiffPred, Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable 1: c_interview 
## Predictor Variable 2: genderwoman 
##  
## Number of cases (rows) of data:  320 
## Number of cases retained for analysis:  320 
## 
## 
##   BASIC ANALYSIS 
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept)     4.630      0.076   60.719    0.000       4.480       4.780 
## c_interview     0.211      0.043    4.861    0.000       0.125       0.296 
## genderwoman     0.166      0.114    1.458    0.146      -0.058       0.391 
## 
## Standard deviation of perf_eval: 1.005 
##  
## Standard deviation of residuals:  0.972 for 317 degrees of freedom 
## 95% range of residual variation:  3.825 = 2 * (1.967 * 0.972) 
##  
## R-squared:  0.069    Adjusted R-squared:  0.063    PRESS R-squared:  NA 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 11.813     df: 2 and 317     p-value:  0.000 
## 
## -- Analysis of Variance from Type II Sums of Squares 
##  
##               df    Sum Sq   Mean Sq   F-value   p-value 
## c_interview    1    22.328    22.328    23.626     0.000 
## genderwoman    1     2.009     2.009     2.125     0.146 
## Residuals    317   299.574     0.945 
## 
## -- Test of Interaction 
##   
## c_interview:gender  df: 1  df resid: 316  SS: 17.514  F: 19.622  p-value: 0 
##   
## -- Assume parallel lines, no interaction of gender with c_interview 
##  
## Level man: y^_perf_eval = 4.630 + 0.211(x_c_interview) 
## Level woman: y^_perf_eval = 4.796 + 0.211(x_c_interview) 
##   
## -- Visualize Separately Computed Regression Lines 
## 
## Plot(c_interview, perf_eval, by=gender, fit="lm") 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The unstandardized regression coefficient associated with gender (or more specifically genderwoman) is not statistically significant (b = .166, p = .146) when controlling for c_interview, which indicates that there is no evidence of intercept differences. Note that the name of the regression coefficient for gender has woman appended to it; this coefficient represents the difference in intercepts for women relative to men (i.e., woman minus man). Behind the scenes, the reg_brief function has converted the gender variable to a dichotomous indicator variable, where alphabetically man becomes 0 and woman becomes 1. Because the coefficient is positive, it implies that the intercept value is higher for women relative to men; however, please note that we wouldn’t interpret the direction of this effect because it was not statistically significant. The amount of variance explained by c_interview and gender in perf_eval is small-medium in magnitude (R2 = .069, adjusted R2 = .063, p < .001), which is slightly larger than when just c_interview was included in the model (see Step One above).
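
If you'd like to verify that indicator coding yourself, a quick, optional check (a sketch, not part of the required workflow) is to convert gender to a factor and inspect the treatment contrasts that R applies:

# Inspect the 0/1 indicator (dummy) coding for the dichotomous gender variable;
# the alphabetically first level ("man") serves as the reference (0)
contrasts(factor(DiffPred$gender))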

Because there is no evidence of intercept differences in Step Two, we won’t plot/graph our model, as doing so might mislead someone who sees such a plot without knowing or understanding the results of the model we just estimated. We will now proceed to Step Three.

Step Three: As the final step, it’s time to add the interaction term (also known as a product term) between c_interview and gender to the model, which creates what is referred to as a multiplicative model. Fortunately, this is very easy to do in R. All we need to do is change the + operator from our previous regression model script to the *, where the * implies an interaction in the context of a model formula; specifically, c_interview * gender expands to c_interview + gender + c_interview:gender, so the main effects remain in the model alongside the interaction term. Conceptually, the * is appropriate because we are estimating a multiplicative model containing an interaction term.

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (gender), 
# and interaction between selection tool and protected class
reg_brief(perf_eval ~ c_interview * gender, data=DiffPred)
## 
## >>>  gender is not numeric. Converted to indicator variables.

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview * gender, data=DiffPred, Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable 1: c_interview 
## Predictor Variable 2: genderwoman 
## Predictor Variable 3: c_interview.genderwoman 
##  
## Number of cases (rows) of data:  320 
## Number of cases retained for analysis:  320 
## 
## 
##   BASIC ANALYSIS 
## 
##                          Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
##             (Intercept)     4.562      0.076   60.298    0.000       4.413       4.711 
##             c_interview     0.394      0.059    6.672    0.000       0.278       0.511 
##             genderwoman     0.155      0.111    1.397    0.163      -0.063       0.373 
## c_interview.genderwoman    -0.373      0.084   -4.430    0.000      -0.539      -0.207 
## 
## Standard deviation of perf_eval: 1.005 
##  
## Standard deviation of residuals:  0.945 for 316 degrees of freedom 
## 95% range of residual variation:  3.718 = 2 * (1.967 * 0.945) 
##  
## R-squared:  0.124    Adjusted R-squared:  0.115    PRESS R-squared:  NA 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 14.879     df: 3 and 316     p-value:  0.000 
## 
## -- Analysis of Variance from Type II Sums of Squares 
##  
##                           df    Sum Sq   Mean Sq   F-value   p-value 
##             c_interview    1    22.328    22.328    23.626     0.000 
##             genderwoman    1     2.009     2.009     2.250     0.135 
## c_interview.genderwoman    1    17.514    17.514    19.622     0.000 
## Residuals                316   282.060     0.893 
## 
## -- Test of Interaction 
##   
## c_interview:gender  df: 1  df resid: 316  SS: 17.514  F: 19.622  p-value: 0 
##   
## -- Assume parallel lines, no interaction of gender with c_interview 
##  
## Level man: y^_perf_eval = 4.562 + 0.394(x_c_interview) 
## Level woman: y^_perf_eval = 4.717 + 0.394(x_c_interview) 
##   
## -- Visualize Separately Computed Regression Lines 
## 
## Plot(c_interview, perf_eval, by=gender, fit="lm") 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

In our output, we now have an unstandardized regression coefficient called c_interview:gender (or c_interview.genderwoman), which represents our interaction term for c_interview and gender. The regression coefficient associated with the interaction term is negative and statistically significant (b = -.373, p < .001), but we shouldn’t pay too much attention to the sign of the interaction term, as it is difficult to interpret without graphing the regression model in its entirety. We should, however, pay attention to the fact that the interaction term is statistically significant, which indicates that there is evidence of slope differences; in other words, there seems to be a multiplicative effect between c_interview and gender. Although we didn’t find evidence of intercept differences in Step Two, here in Step Three, we do find evidence that the criterion-related validities for the selection tool (c_interview) in relation to the criterion (perf_eval) differ by group; in other words, we found evidence that the slopes for men and women differ. The amount of variance explained by c_interview, gender, and their interaction in relation to perf_eval is medium in magnitude (R2 = .124, adjusted R2 = .115, p < .001), which is larger than when just c_interview and gender were included in the model (see Step Two above).
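
If you'd like to formalize the comparison between the Step Two (additive) and Step Three (multiplicative) models yourself, here is a sketch using base R's lm and anova functions (an alternative to the lessR output above, not a required step):

# Additive (Step Two) vs. multiplicative (Step Three) models
add_mod  <- lm(perf_eval ~ c_interview + gender, data=DiffPred)
mult_mod <- lm(perf_eval ~ c_interview * gender, data=DiffPred)

# F-test of the interaction term's incremental contribution
anova(add_mod, mult_mod)

# Change in R-squared between steps
summary(mult_mod)$r.squared - summary(add_mod)$r.squared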

Given the statistically significant interaction term, it is appropriate (and helpful) to probe the interaction by plotting it. This will help us understand the form of the interaction. We can do so by creating a plot of the simple slopes. To do so, we will use a package called interactions (Long 2019), so if you haven’t already, install and access the package.

# Install interactions package if you haven't already
install.packages("interactions")

# Note: You may need to install the sandwich package independently if you 
# receive an error when attempting to run the probe_interaction function
# install.packages("sandwich")
# Access interactions package
library(interactions)

As input to the probe_interaction function from the interactions package, we will need to use the lm function from base R to specify the same regression model we estimated above. We can just copy and paste the same arguments from the reg_brief function into the lm function parentheses. We need to name this model something so that we can reference it in the next function, so let’s name it Slope_Model (for slope differences model) using the <- operator.

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (gender), 
# and interaction between selection tool and protected class
Slope_Model <- lm(perf_eval ~ c_interview * gender, data=DiffPred)

Now we’re ready to apply the probe_interaction function in order to plot this multiplicative model. As the first argument, enter the name of the regression model we just created above (Slope_Model). As the second argument, type the name of the variable you are framing as the predictor after pred=, which in our case is the selection tool variable (c_interview). As the third argument, type the name of the variable you are framing as the moderator after modx=, which in our case is the protected class variable (gender). As the fourth argument, enter johnson_neyman=FALSE, as we aren’t requesting the Johnson-Neyman test in this tutorial. Finally, if you choose to, you can use x.label= and y.label= to create more informative names for the predictor and outcome variables, respectively. You can also add the legend.main= argument to provide a more informative name for the moderator variable, and/or the modx.labels= argument to provide more informative labels for the moderator variable’s levels.

# Probe the significant interaction (slope differences)
probe_interaction(Slope_Model, 
                  pred=c_interview, 
                  modx=gender,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when gender = man: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.39   0.06     6.67   0.00
## 
## Slope of c_interview when gender = woman: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.02   0.06     0.35   0.72

The plot output shows that the two slopes are indeed different, such that there doesn’t seem to be much of an association between the selection tool (c_interview) and the criterion (perf_eval) for women, but there seems to be a positive association for men. Now take a look at the text output in your console. The simple slopes analysis automatically estimates the regression coefficient (slope) for both levels of the moderator variable (gender: man, woman). Corroborating what we see in the plot, we find that the simple slope for men is indeed statistically significant and positive (b = .39, p < .01), such that for every one unit increase in interview scores, job performance scores tend to go up by .39 units. In contrast, we find that the simple slope for women is nonsignificant (b = .02, p = .72).
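If you would like to connect these simple slopes back to the regression coefficients, you can reproduce them by hand; the following is a minimal sketch that uses the coefficient names from the Slope_Model object we estimated above.

# Reproduce the simple slopes from the model coefficients by hand
b <- coef(Slope_Model)
b["c_interview"]                                 # simple slope for men (reference group)
b["c_interview"] + b["c_interview:genderwoman"]  # simple slope for women (.394 - .373 = .021)

This arithmetic shows where the near-zero slope for women comes from: the interaction coefficient offsets nearly all of the slope for men.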

In sum, we found evidence of slope differences – but not intercept differences – for the selection tool (c_interview) in relation to the criterion (perf_eval) that were conditional upon the levels/subgroups of the protected class variable (gender). Thus, we can conclude that there is evidence of differential prediction based on gender for this interview selection tool, which means that it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

Sample Write-Up: Our HR analytics team recently found evidence in support of the criterion-related validity of a new structured interview selection procedure. As a next step, we investigated whether there might be evidence of differential prediction of the structured interview based on the U.S. protected class of gender. Based on a sample of 320 individuals, we applied a three-step process with moderated multiple linear regression, which is consistent with established principles (Society for Industrial & Organizational Psychology 2018; Cleary 1968). Based on our analyses, we found evidence of differential prediction (i.e., predictive bias) based on gender for the new structured interview procedure, and specifically, we found evidence of slope differences. In the first step, we found evidence of criterion-related validity based on a simple linear regression model, as structured interview scores were positively and significantly associated with job performance (b = .192, p < .001), thereby indicating some degree of job-relatedness and -relevance. In the second step, we added the protected class variable associated with gender to the model, resulting in a multiple linear regression model, but did not find evidence of intercept differences based on gender (b = .166, p = .146). In the third step, we added the interaction term between the structured interview and gender variables, resulting in a moderated multiple linear regression model, and we found that gender moderated the association between structured interview scores and job performance scores to a statistically significant extent (b = -.373, p < .001). Based on follow-up simple slopes analyses, we found that the association between structured interview scores and job performance scores was statistically significant for men (b = .39, p < .01), such that for every one unit increase in structured interview scores, job performance scores tended to increase by .39 units; in contrast, the association between structured interview scores and job performance scores for women was nonsignificant (b = .02, p = .72). In sum, we found evidence of slope differences, which means that this structured interview tool shows predictive bias with respect to gender, making the application of a common regression line (i.e., equation) legally problematic within the United States. If possible, this structured interview should be redesigned or the interviewers should be trained to reduce the predictive bias based on gender. In the interim, we caution against using separate regression lines for men and women, as this may be interpreted as running afoul of the U.S. Civil Rights Act of 1991 (Saad and Sackett 2002). A more conservative approach would be to develop a structured interview process that allows for the use of a common regression line across legally protected groups.

43.2.5.2 Test Differential Prediction Based on c_age

Investigating whether differential prediction exists for c_interview in relation to perf_eval based on the continuous moderator variable c_age will unfold very similarly to the process we used for the dichotomous gender variable from above. The only differences will occur when it comes to estimating and visualizing the intercept differences and simple slopes (assuming we find statistically significant effects). Given this, we will breeze through the steps, which means I will provide less explanation.

Step One: Regress the criterion variable perf_eval on the selection tool variable c_interview.

# Regress perf_eval (criterion) on c_interview (selection tool)
reg_brief(perf_eval ~ c_interview, data=DiffPred)

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview, data=DiffPred, Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable: c_interview 
##  
## Number of cases (rows) of data:  320 
## Number of cases retained for analysis:  320 
## 
## 
##   BASIC ANALYSIS 
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept)     4.708      0.054   86.472    0.000       4.600       4.815 
## c_interview     0.192      0.041    4.629    0.000       0.110       0.273 
## 
## Standard deviation of perf_eval: 1.005 
##  
## Standard deviation of residuals:  0.974 for 318 degrees of freedom 
## 95% range of residual variation:  3.832 = 2 * (1.967 * 0.974) 
##  
## R-squared:  0.063    Adjusted R-squared:  0.060    PRESS R-squared:  0.051 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 21.425     df: 1 and 318     p-value:  0.000 
## 
## -- Analysis of Variance 
##  
##               df    Sum Sq   Mean Sq   F-value   p-value 
## Model          1    20.319    20.319    21.425     0.000 
## Residuals    318   301.583     0.948 
## perf_eval    319   321.902     1.009 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The regression coefficient for c_interview is statistically significant and positive (b = .192, p < .001), which provides evidence of criterion-related validity. The amount of variance explained by c_interview in perf_eval is small-medium in magnitude (R2 = .063, adjusted R2 = .060, p < .001).

Step Two: Regress the criterion variable perf_eval on the selection tool variable c_interview and the protected class variable c_age.

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
reg_brief(perf_eval ~ c_interview + c_age, data=DiffPred)

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview + c_age, data=DiffPred, Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable 1: c_interview 
## Predictor Variable 2: c_age 
##  
## Number of cases (rows) of data:  320 
## Number of cases retained for analysis:  320 
## 
## 
##   BASIC ANALYSIS 
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept)     4.708      0.050   94.164    0.000       4.609       4.806 
## c_interview     0.978      0.108    9.028    0.000       0.765       1.191 
##       c_age     0.129      0.017    7.752    0.000       0.096       0.161 
## 
## Standard deviation of perf_eval: 1.005 
##  
## Standard deviation of residuals:  0.894 for 317 degrees of freedom 
## 95% range of residual variation:  3.519 = 2 * (1.967 * 0.894) 
##  
## R-squared:  0.212    Adjusted R-squared:  0.207    PRESS R-squared:  0.192 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 42.749     df: 2 and 317     p-value:  0.000 
## 
## -- Analysis of Variance 
##  
##               df    Sum Sq   Mean Sq   F-value   p-value 
## c_interview    1    20.319    20.319    25.406     0.000 
##       c_age    1    48.059    48.059    60.092     0.000 
##  
## Model          2    68.378    34.189    42.749     0.000 
## Residuals    317   253.524     0.800 
## perf_eval    319   321.902     1.009 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The regression coefficient associated with c_age is positive and statistically significant (b = .129, p < .001) when controlling for c_interview, which indicates that there is evidence of intercept differences. Because the coefficient is positive, it implies that the intercept value is significantly higher for older workers relative to younger workers. The amount of variance explained by c_interview and c_age in perf_eval is medium-large in magnitude (R2 = .212, adjusted R2 = .207, p < .001), which is notably larger than when just c_interview was included in the model (see Step One above).

Using the probe_interaction function from the interactions package, we will visualize the intercept differences. As input to the probe_interaction function, we will use the lm function from base R to specify our regression model from above. Name this model something so that we can reference it in the next function (Int_Model).

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
Int_Model <- lm(perf_eval ~ c_interview + c_age, data=DiffPred)

Now, we will specify the arguments in the probe_interaction function. Note that the function automatically probes a continuous moderator variable at its mean and at 1 standard deviation (SD) below and above the mean. As you can see in the plot, individuals whose age is 1 SD above the mean have the highest intercept value, where the intercept is the value of perf_eval (Job Performance) when c_interview (Interview Score) is equal to zero, which corresponds to the mean of the original interview variable because we grand-mean centered it.

# Visualize intercept differences
probe_interaction(Int_Model, 
                  pred=c_interview, 
                  modx=c_age,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when c_age = -8.593122204199181268791 (- 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.98   0.11     9.03   0.00
## 
## Slope of c_interview when c_age = -0.000000000000003164136 (Mean): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.98   0.11     9.03   0.00
## 
## Slope of c_interview when c_age =  8.593122204199174163364 (+ 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.98   0.11     9.03   0.00
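Because c_age is continuous, you can also verify the intercept differences numerically by generating predicted perf_eval values at the mean interview score (c_interview = 0) for different ages. The following is a minimal sketch; the three age values simply mirror the mean and the 1 SD below/above values that probe_interaction used above.

# Predicted perf_eval at c_interview = 0 for ages 1 SD below the mean, at the mean, 
# and 1 SD above the mean; these predictions correspond to the three intercepts in the plot
sd_age <- sd(DiffPred$c_age, na.rm=TRUE)
predict(Int_Model, newdata=data.frame(c_interview=0, c_age=c(-sd_age, 0, sd_age)))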

Step Three: Now, we will add the interaction term between c_interview and c_age to the model using the * operator.

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (c_age), 
# and interaction between selection tool and protected class
reg_brief(perf_eval ~ c_interview * c_age, data=DiffPred)

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview * c_age, data=DiffPred, Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable 1: c_interview 
## Predictor Variable 2: c_age 
##  
## Number of cases (rows) of data:  320 
## Number of cases retained for analysis:  320 
## 
## 
##   BASIC ANALYSIS 
## 
##                    Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
##       (Intercept)     4.995      0.068   73.937    0.000       4.862       5.128 
##       c_interview     1.711      0.160   10.700    0.000       1.397       2.026 
##             c_age     0.263      0.027    9.590    0.000       0.209       0.317 
## c_interview:c_age     0.027      0.005    5.986    0.000       0.018       0.036 
## 
## Standard deviation of perf_eval: 1.005 
##  
## Standard deviation of residuals:  0.849 for 316 degrees of freedom 
## 95% range of residual variation:  3.340 = 2 * (1.967 * 0.849) 
##  
## R-squared:  0.293    Adjusted R-squared:  0.286    PRESS R-squared:  0.275 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 43.573     df: 3 and 316     p-value:  0.000 
## 
## -- Analysis of Variance 
##  
##                     df    Sum Sq   Mean Sq   F-value   p-value 
##       c_interview    1    20.319    20.319    28.198     0.000 
##             c_age    1    48.059    48.059    66.694     0.000 
## c_interview:c_age    1    25.817    25.817    35.827     0.000 
##  
## Model                3    94.195    31.398    43.573     0.000 
## Residuals          316   227.707     0.721 
## perf_eval          319   321.902     1.009 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The regression coefficient associated with the interaction term (c_interview:c_age) is positive and statistically significant (b = .027, p < .001), which indicates that there is evidence of slope differences. The amount of variance explained by c_interview, c_age, and their interaction in relation to perf_eval is large in magnitude (R2 = .293, adjusted R2 = .286, p < .001), which is larger than when just c_interview and c_age were included in the model.

Given the statistically significant interaction term, we will use the probe_interaction function from the interactions package to probe the interaction. As input to the probe_interaction function, we will need to use the lm function from base R to specify our regression model from above. We need to name this model something so that we can reference it in the next function, so let’s name it Slope_Model (for slope differences model) using the <- operator.

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (c_age), 
# and interaction between selection tool and protected class
Slope_Model <- lm(perf_eval ~ c_interview * c_age, data=DiffPred)

We’re ready to use the probe_interaction function to plot this multiplicative model.

# Probe the significant interaction (slope differences)
probe_interaction(Slope_Model, 
                  pred=c_interview, 
                  modx=c_age,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when c_age = -8.593122204199181268791 (- 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.48   0.13    11.16   0.00
## 
## Slope of c_interview when c_age = -0.000000000000003164136 (Mean): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.71   0.16    10.70   0.00
## 
## Slope of c_interview when c_age =  8.593122204199174163364 (+ 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.95   0.19    10.16   0.00

The plot and the simple slopes analysis output show that the slopes are indeed different, such that the association between the selection tool (c_interview, Interview Score) and the criterion (perf_eval, Job Performance) is more positive for older workers (b = 1.95, p < .01) than for average-aged workers (b = 1.71, p < .01) and younger workers (b = 1.48, p < .01). Notably, all three simple slopes are statistically significant and positive, but the slope becomes more positive as age increases.
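By default, probe_interaction probes a continuous moderator at the mean and at 1 SD below and above the mean. If you would rather probe the interaction at substantively meaningful ages, recent versions of the interactions package accept a modx.values= argument; treat the following as a hedged sketch, as the specific centered age values (-10, 0, and 10) are arbitrary choices for illustration.

# Probe the interaction at chosen (centered) age values rather than the default mean +/- 1 SD
# Note: the values -10, 0, and 10 are illustrative assumptions, not recommendations
probe_interaction(Slope_Model, 
                  pred=c_interview, 
                  modx=c_age,
                  modx.values=c(-10, 0, 10),
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")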

In sum, we found evidence of both intercept and slope differences for the selection tool (c_interview) in relation to the criterion (perf_eval) that were conditional upon the age protected class variable (c_age). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

43.2.5.3 Test Differential Prediction Based on race

To investigate whether differential prediction exists for c_interview in relation to perf_eval based on race, we will follow mostly the same process as we did with the dichotomous variable gender above. However, we will filter our data within the regression function to compare just Black individuals with White individuals, as it is customary to compare just two races/ethnicities at a time. The decision to focus on only Black and White individuals is arbitrary and just for demonstration purposes. Note that in the DiffPred data frame object, both black and white appear as lowercase category labels in the race variable.

Step One: Regress the criterion variable perf_eval on the selection tool variable c_interview. To subset the data such that only individuals who are Black or White are retained, we will add the following argument: rows=(race=="black" | race=="white"). This argument uses conditional statements to indicate that we want to retain those cases for which race is equal to black or for which race is equal to white. We have effectively dichotomized the race variable using this approach. If you need to review conditional/logical statements when filtering data, check out the chapter on filtering data.
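As an aside, if you would like to see the equivalent filter in base R, the following minimal sketch creates a hypothetical DiffPredBW data frame containing only the Black and White individuals; the object name is ours, purely for illustration.

# Equivalent base R filter (DiffPredBW is a hypothetical object name for illustration)
DiffPredBW <- DiffPred[DiffPred$race %in% c("black", "white"), ]
nrow(DiffPredBW)  # number of cases retained after filtering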

# Regress perf_eval (criterion) on c_interview (selection tool)
reg_brief(perf_eval ~ c_interview, data=DiffPred, 
          rows=(race=="black" | race=="white"))

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview, data=DiffPred, rows=(race == "black" | race == "white"), Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable: c_interview 
##  
## Number of cases (rows) of data:  212 
## Number of cases retained for analysis:  212 
## 
## 
##   BASIC ANALYSIS 
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept)     4.715      0.068   69.164    0.000       4.580       4.849 
## c_interview     0.140      0.052    2.672    0.008       0.037       0.243 
## 
## Standard deviation of perf_eval: 1.004 
##  
## Standard deviation of residuals:  0.989 for 210 degrees of freedom 
## 95% range of residual variation:  3.901 = 2 * (1.971 * 0.989) 
##  
## R-squared:  0.033    Adjusted R-squared:  0.028    PRESS R-squared:  0.014 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 7.139     df: 1 and 210     p-value:  0.008 
## 
## -- Analysis of Variance 
##  
##               df    Sum Sq   Mean Sq   F-value   p-value 
## Model          1     6.990     6.990     7.139     0.008 
## Residuals    210   205.600     0.979 
## perf_eval    211   212.590     1.008 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

We should begin by noting that our effective sample size is now 212, whereas previously it was 320 when we included individuals of all races. The regression coefficient for c_interview is statistically significant and positive (b = .140, p = .008) based on this reduced sample of 212 individuals, which provides evidence of criterion-related validity. The amount of variance explained by c_interview in perf_eval is small in magnitude (R2 = .033, adjusted R2 = .028, p = .008).
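If you would like to verify those subgroup counts yourself, a quick frequency table of the race variable (a minimal sketch, not part of the three-step process) shows how the cases break down by category.

# Frequency table of the race categories in the full data frame
table(DiffPred$race)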

Step Two: We will regress the criterion variable perf_eval on the selection tool variable c_interview and the protected class variable race.

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (race)
reg_brief(perf_eval ~ c_interview + race, data=DiffPred,
          rows=(race=="black" | race=="white"))
## 
## >>>  race is not numeric. Converted to indicator variables.

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview + race, data=DiffPred, rows=(race == "black" | race == "white"), Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable 1: c_interview 
## Predictor Variable 2: racewhite 
##  
## Number of cases (rows) of data:  212 
## Number of cases retained for analysis:  212 
## 
## 
##   BASIC ANALYSIS 
## 
##              Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
## (Intercept)     4.911      0.089   54.922    0.000       4.735       5.087 
## c_interview     0.127      0.051    2.469    0.014       0.026       0.228 
##   racewhite    -0.440      0.134   -3.287    0.001      -0.705      -0.176 
## 
## Standard deviation of perf_eval: 1.004 
##  
## Standard deviation of residuals:  0.967 for 209 degrees of freedom 
## 95% range of residual variation:  3.813 = 2 * (1.971 * 0.967) 
##  
## R-squared:  0.080    Adjusted R-squared:  0.072    PRESS R-squared:  NA 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 9.139     df: 2 and 209     p-value:  0.000 
## 
## -- Analysis of Variance from Type II Sums of Squares 
##  
##               df    Sum Sq   Mean Sq   F-value   p-value 
## c_interview    1     5.700     5.700     6.094     0.014 
##   racewhite    1    10.107    10.107    10.806     0.001 
## Residuals    209   195.493     0.935 
## 
## -- Test of Interaction 
##   
## c_interview:race  df: 1  df resid: 208  SS: 3.343  F: 3.618  p-value: 0.059 
##   
## -- Assume parallel lines, no interaction of race with c_interview 
##  
## Level black: y^_perf_eval = 4.911 + 0.127(x_c_interview) 
## Level white: y^_perf_eval = 4.470 + 0.127(x_c_interview) 
##   
## -- Visualize Separately Computed Regression Lines 
## 
## Plot(c_interview, perf_eval, by=race, fit="lm") 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The regression coefficient associated with race (or more specifically racewhite) is negative and statistically significant (b = -.440, p = .001) when controlling for c_interview, which indicates that there is evidence of intercept differences. Because the coefficient is negative and because white is appended to the race variable name, it implies that the intercept value is significantly lower for White individuals compared to Black individuals. The amount of variance explained by c_interview and race in perf_eval is small-medium in magnitude (R2 = .080, adjusted R2 = .072, p < .001).
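As a quick descriptive complement to the intercept-difference finding, you can compare the mean perf_eval scores for the two retained race subgroups; this is a minimal sketch using base R, intended only as a rough sanity check rather than a substitute for the regression model.

# Mean job performance by race for the two retained subgroups
aggregate(perf_eval ~ race, 
          data=subset(DiffPred, race %in% c("black", "white")), 
          FUN=mean)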

Using the probe_interaction function from the interactions package, we will visualize the intercept differences. As input to the probe_interaction function, use the lm function from base R to specify our regression model from above. The subset= argument functions the same way as the rows= argument from the reg_brief and Regression functions from lessR, so we’ll use subset= with the lm function. Name this model something so that we can reference it in the next function (Int_Model).

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (race)
Int_Model <- lm(perf_eval ~ c_interview + race, data=DiffPred,
                subset=(race=="black" | race=="white"))

Let’s specify the arguments in the probe_interaction function to plot the intercept differences.

# Visualize intercept differences
probe_interaction(Int_Model, 
                  pred=c_interview, 
                  modx=race,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when race = white: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.13   0.05     2.47   0.01
## 
## Slope of c_interview when race = black: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.13   0.05     2.47   0.01

Step Three: Finally, let’s add the interaction term between c_interview and race to the model using the * operator.

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (race), 
# and interaction between selection tool and protected class
reg_brief(perf_eval ~ c_interview * race, data=DiffPred,
          rows=(race=="black" | race=="white"))
## 
## >>>  race is not numeric. Converted to indicator variables.

## >>> Suggestion
## # Create an R markdown file for interpretative output with  Rmd = "file_name"
## reg(perf_eval ~ c_interview * race, data=DiffPred, rows=(race == "black" | race == "white"), Rmd="eg")  
## 
## 
##   BACKGROUND 
## 
## Data Frame:  DiffPred 
##  
## Response Variable: perf_eval 
## Predictor Variable 1: c_interview 
## Predictor Variable 2: racewhite 
## Predictor Variable 3: c_interview.racewhite 
##  
## Number of cases (rows) of data:  212 
## Number of cases retained for analysis:  212 
## 
## 
##   BASIC ANALYSIS 
## 
##                        Estimate    Std Err  t-value  p-value   Lower 95%   Upper 95% 
##           (Intercept)     4.912      0.089   55.275    0.000       4.737       5.087 
##           c_interview     0.218      0.070    3.112    0.002       0.080       0.357 
##             racewhite    -0.463      0.134   -3.466    0.001      -0.727      -0.200 
## c_interview.racewhite    -0.194      0.102   -1.902    0.059      -0.396       0.007 
## 
## Standard deviation of perf_eval: 1.004 
##  
## Standard deviation of residuals:  0.961 for 208 degrees of freedom 
## 95% range of residual variation:  3.790 = 2 * (1.971 * 0.961) 
##  
## R-squared:  0.096    Adjusted R-squared:  0.083    PRESS R-squared:  NA 
## 
## Null hypothesis of all 0 population slope coefficients:
##   F-statistic: 7.375     df: 3 and 208     p-value:  0.000 
## 
## -- Analysis of Variance from Type II Sums of Squares 
##  
##                         df    Sum Sq   Mean Sq   F-value   p-value 
##           c_interview    1     5.700     5.700     6.094     0.014 
##             racewhite    1    10.107    10.107    10.941     0.001 
## c_interview.racewhite    1     3.343     3.343     3.618     0.059 
## Residuals              208   192.150     0.924 
## 
## -- Test of Interaction 
##   
## c_interview:race  df: 1  df resid: 208  SS: 3.343  F: 3.618  p-value: 0.059 
##   
## -- Assume parallel lines, no interaction of race with c_interview 
##  
## Level black: y^_perf_eval = 4.912 + 0.218(x_c_interview) 
## Level white: y^_perf_eval = 4.448 + 0.218(x_c_interview) 
##   
## -- Visualize Separately Computed Regression Lines 
## 
## Plot(c_interview, perf_eval, by=race, fit="lm") 
## 
## 
##   K-FOLD CROSS-VALIDATION 
## 
## 
##   RELATIONS AMONG THE VARIABLES 
## 
## 
##   RESIDUALS AND INFLUENCE 
## 
## 
##   PREDICTION ERROR

The regression coefficient associated with the interaction term (c_interview:racewhite) is not statistically significant (b = -.194, p = .059), which indicates that there is no evidence of slope differences. Given this, we won’t probe the interaction because, well, there is not an interaction to probe.

In sum, we found evidence of intercept differences for the selection tool c_interview in relation to the criterion perf_eval that were conditional upon the protected class variable race. Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

43.2.6 Summary

In this chapter, we learned how to use moderated multiple linear regression (MMLR) to test whether evidence of differential prediction exists. To do so, we learned how to use the Regression function (well, technically we used the reg_brief function) from lessR. In addition, we used the probe_interaction function from the interactions package to probe the intercept differences and slope differences visually and using simple slopes analyses.

43.3 Chapter Supplement

In addition to the Regression function from the lessR package covered above, we can use the lm function from base R to estimate a moderated multiple linear regression (MMLR) model. Because this function comes from base R, we do not need to install and access an additional package. In this supplement, you will also have an opportunity to learn how to make an APA (American Psychological Association) style table of regression results.

43.3.1 Functions & Packages Introduced

Function Package
mean base R
lm base R
summary base R
anova base R
probe_interaction interactions
apa.reg.table apaTables

43.3.2 Initial Steps

If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.

# Set your working directory
setwd("H:/RWorkshop")
# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
DiffPred <- read_csv("DifferentialPrediction.csv")
## Rows: 320 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): emp_id, gender, race
## dbl (3): perf_eval, interview, age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) objects
names(DiffPred)
## [1] "emp_id"    "perf_eval" "interview" "gender"    "age"       "race"
# View variable type for each variable in data frame
str(DiffPred)
## spc_tbl_ [320 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ emp_id   : chr [1:320] "EE200" "EE202" "EE203" "EE206" ...
##  $ perf_eval: num [1:320] 6.2 6.6 5 4 5.3 5.9 5.5 3.8 5.7 3.8 ...
##  $ interview: num [1:320] 8.5 5.7 7 6.8 9.6 6 7.5 5.9 9.6 8.3 ...
##  $ gender   : chr [1:320] "woman" "woman" "man" "woman" ...
##  $ age      : num [1:320] 30 45.6 39.2 36.9 24.5 41.5 35.1 43.2 25.7 27.8 ...
##  $ race     : chr [1:320] "black" "black" "asian" "black" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   emp_id = col_character(),
##   ..   perf_eval = col_double(),
##   ..   interview = col_double(),
##   ..   gender = col_character(),
##   ..   age = col_double(),
##   ..   race = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
# View first 6 rows of data frame
head(DiffPred)
## # A tibble: 6 × 6
##   emp_id perf_eval interview gender   age race 
##   <chr>      <dbl>     <dbl> <chr>  <dbl> <chr>
## 1 EE200        6.2       8.5 woman   30   black
## 2 EE202        6.6       5.7 woman   45.6 black
## 3 EE203        5         7   man     39.2 asian
## 4 EE206        4         6.8 woman   36.9 black
## 5 EE209        5.3       9.6 woman   24.5 white
## 6 EE214        5.9       6   woman   41.5 asian

43.3.3 lm Function from Base R

We will use the lm function from base R to specify and estimate our regression functions. As I did above, I will show you how to apply moderated multiple linear regression (MMLR) in the service of detecting differential prediction of a selection tool based on protected class membership.

Grand-Mean Center Variables: Before we estimate the regression models, however, we need to grand-mean center certain variables. We only center predictor and moderator variables that are of type numeric and that we conceptualize as having a continuous (interval or ratio) measurement scale. In our current data frame, we will grand-mean center just the interview and age variables. We will use what is perhaps the most intuitive approach for grand-mean centering variables. We must begin by coming up with a new name for one of our soon-to-be grand-mean centered variables, and in this example, we will center the interview variable first. I typically like to name the centered variable by simply adding the c_ prefix to the existing variable’s name (e.g., c_interview). Type the name of the data frame object to which the new centered variable will be attached (DiffPred), followed by the $ operator and the name of the new variable we are creating (c_interview). Next, add the <- operator to indicate what values you will assign to this new variable. To create a vector of values to assign to this new c_interview variable, we will subtract the mean (average) score for the original variable (interview) from each case’s value on the variable. Specifically, enter the name of the data frame object, followed by the $ operator and the name of the original variable (interview). After that, enter the subtraction operator (-). Finally, type the name of the mean function from base R. As the first argument in the mean function, enter the name of the data frame object (DiffPred), followed by the $ operator and the name of the original variable (interview). As the second argument, enter na.rm=TRUE to indicate that you wish to drop missing values when calculating the grand mean for the sample. Now repeat those same steps for the age variable.

# Grand-mean centering: Using basic arithmetic and the mean function from base R
DiffPred$c_interview <- DiffPred$interview - mean(DiffPred$interview, na.rm=TRUE)
DiffPred$c_age <- DiffPred$age - mean(DiffPred$age, na.rm=TRUE)
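Alternatively, the scale function from base R accomplishes the same grand-mean centering in a single step; the following is a minimal equivalent sketch. Note that scale returns a matrix, so we wrap the call in as.numeric to keep each centered variable a plain numeric vector.

# Grand-mean centering: Equivalent approach using the scale function from base R
DiffPred$c_interview <- as.numeric(scale(DiffPred$interview, center=TRUE, scale=FALSE))
DiffPred$c_age <- as.numeric(scale(DiffPred$age, center=TRUE, scale=FALSE))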

43.3.3.1 Test Differential Prediction Based on gender

Let’s start by investigating whether differential prediction exists for our interview selection tool in relation to perf_eval based on gender subgroup membership (i.e., man, woman).

Step One: Let’s begin by regressing the criterion variable perf_eval on the selection tool variable c_interview. Here we are just establishing whether the selection tool shows evidence of criterion-related validity. See the chapters on criterion-related validity using correlation and predicting criterion scores using simple linear regression if you would like more information on regression-based approaches to assessing criterion-related validity, including determining whether statistical assumptions have been met. Let’s name our regression model for this first step model.1. Remember, we need to use the summary function from base R to view our model output.

# Regress perf_eval (criterion) on c_interview (selection tool)
model.1 <- lm(perf_eval ~ c_interview, data=DiffPred)
summary(model.1)
## 
## Call:
## lm(formula = perf_eval ~ c_interview, data = DiffPred)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7226 -0.6846  0.0007  0.7494  2.5842 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.70750    0.05444  86.472 < 0.0000000000000002 ***
## c_interview  0.19175    0.04143   4.629           0.00000537 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9738 on 318 degrees of freedom
## Multiple R-squared:  0.06312,    Adjusted R-squared:  0.06018 
## F-statistic: 21.43 on 1 and 318 DF,  p-value: 0.000005366

The regression coefficient for c_interview is statistically significant and positive (b = .192, p < .001), which provides evidence of criterion-related validity. The amount of variance explained by c_interview in perf_eval is small-medium in magnitude (R2 = .063, adjusted R2 = .060, p < .001). See the table below for common rules of thumb for qualitatively describing R2 values.

R2 Description
.01 Small
.09 Medium
.25 Large

Step Two: Next, let’s add the protected class variable gender to our model to investigate whether intercept differences exist. We’ll name this model model.2. For more information on how to interpret and test the statistical assumptions of a multiple linear regression model, check out the chapter on estimating incremental validity. In addition, let’s use the anova function from base R to do a nested model comparison between model.1 and model.2. This will tell us whether adding gender to the model significantly improved model fit. Finally, let’s reference the summary output for each model, and specifically the r.squared values, to determine the change in R2 from one model to the next. The change in R2 gives us an idea of how much (in terms of practical significance) the model fit improved when adding the additional variable. Note that we use the $ operator to indicate that we want the r.squared value from the summary(model.1) and summary(model.2) output, and then we do a simple subtraction to get the change in R2.

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (gender)
model.2 <- lm(perf_eval ~ c_interview + gender, data=DiffPred)
summary(model.2)
## 
## Call:
## lm(formula = perf_eval ~ c_interview + gender, data = DiffPred)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8125 -0.6629  0.0049  0.7578  2.5245 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.62953    0.07625  60.719 < 0.0000000000000002 ***
## c_interview  0.21058    0.04332   4.861           0.00000184 ***
## genderwoman  0.16634    0.11409   1.458                0.146    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9721 on 317 degrees of freedom
## Multiple R-squared:  0.06936,    Adjusted R-squared:  0.06349 
## F-statistic: 11.81 on 2 and 317 DF,  p-value: 0.00001127
# Nested model comparison
anova(model.1, model.2)
## Analysis of Variance Table
## 
## Model 1: perf_eval ~ c_interview
## Model 2: perf_eval ~ c_interview + gender
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    318 301.58                           
## 2    317 299.57  1    2.0086 2.1255 0.1459
# Change in R-squared
summary(model.2)$r.squared - summary(model.1)$r.squared
## [1] 0.006239901

The unstandardized regression coefficient associated with gender (or more specifically genderwoman) is not statistically significant (b = .166, p = .146) when controlling for c_interview, which indicates that there is no evidence of intercept differences. Note that the name of the regression coefficient for gender has woman appended to it; this means that the coefficient represents the difference in predicted perf_eval scores for women relative to men (i.e., the intercept difference). Behind the scenes, the lm function has converted the gender variable to a dichotomous factor, where alphabetically man becomes 0 and woman becomes 1. Because the coefficient is positive, it implies that the intercept value is higher for women relative to men; however, please note that we wouldn’t interpret the direction of this effect because it was not statistically significant. The amount of variance explained by c_interview and gender in perf_eval is small-medium in magnitude (R2 = .069, adjusted R2 = .063, p < .001), which is slightly larger than when just c_interview was included in the model (see Step One above). The nonsignificant F-value (2.126, p = .146) from the nested model comparison indicates that adding gender to the model did not significantly improve model fit, and thus we will not interpret the change in R2.
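If you ever want to confirm (or change) which gender level serves as the reference group, you can inspect the factor levels directly; the following is a minimal sketch, and the releveling line is optional and shown commented out.

# Inspect the factor levels; the first level listed is the reference group (coded 0)
levels(factor(DiffPred$gender))
# Optional: make "woman" the reference group instead, then re-estimate the model
# DiffPred$gender <- relevel(factor(DiffPred$gender), ref="woman")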

Because there is no evidence of intercept differences for Step Two, we won’t plot/graph our model, as doing so might mislead someone who sees such a plot without knowing or understanding the results of the model we just estimated. We will now proceed to Step Three.

Step Three: As the final step, it’s time to add the interaction term (also known as a product term) between c_interview and gender to the model, which creates what is referred to as a multiplicative model. Fortunately, this is very easy to do in R. All we need to do is change the + operator from our previous regression model script to the *, where the * implies an interaction in the context of a model. Conceptually, the * is appropriate because we are estimating a multiplicative model containing an interaction term. Let’s name this model model.3. As before, we’ll do a nested model comparison between the previous model and this model to determine whether there is a significant improvement in model fit. In addition, we’ll estimate the change in R2.
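In case it is helpful to see what the * operator expands to, the following line specifies the identical model with the main effects and interaction term written out explicitly; it is interchangeable with the version we estimate below.

# Equivalent specification: the * operator is shorthand for this fully written-out formula
lm(perf_eval ~ c_interview + gender + c_interview:gender, data=DiffPred)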

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (gender), 
# and interaction between selection tool and protected class
model.3 <- lm(perf_eval ~ c_interview * gender, data=DiffPred)
summary(model.3)
## 
## Call:
## lm(formula = perf_eval ~ c_interview * gender, data = DiffPred)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.71851 -0.67772  0.01742  0.69477  2.48590 
## 
## Coefficients:
##                         Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)              4.56191    0.07566  60.298 < 0.0000000000000002 ***
## c_interview              0.39426    0.05909   6.672       0.000000000113 ***
## genderwoman              0.15494    0.11091   1.397                0.163    
## c_interview:genderwoman -0.37306    0.08422  -4.430       0.000013026266 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9448 on 316 degrees of freedom
## Multiple R-squared:  0.1238, Adjusted R-squared:  0.1155 
## F-statistic: 14.88 on 3 and 316 DF,  p-value: 0.000000004387
# Nested model comparison
anova(model.2, model.3)
## Analysis of Variance Table
## 
## Model 1: perf_eval ~ c_interview + gender
## Model 2: perf_eval ~ c_interview * gender
##   Res.Df    RSS Df Sum of Sq      F     Pr(>F)    
## 1    317 299.57                                   
## 2    316 282.06  1    17.514 19.622 0.00001303 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change in R-squared
summary(model.3)$r.squared - summary(model.2)$r.squared
## [1] 0.05440861

In our output, we now have an unstandardized regression coefficient called c_interview:genderwoman, which represents our interaction term for c_interview and gender. The regression coefficient associated with the interaction term is negative and statistically significant (b = -.373, p < .001), but we shouldn’t pay too much attention to the sign of the interaction term, as it is difficult to interpret without graphing the regression model in its entirety. We should, however, pay attention to the fact that the interaction term is statistically significant, which indicates that there is evidence of slope differences; in other words, there seems to be a multiplicative effect between c_interview and gender. Although we didn’t find evidence of intercept differences in Step Two, here in Step Three, we do find evidence that the criterion-related validities of the selection tool (c_interview) in relation to the criterion (perf_eval) differ significantly across gender subgroups; in other words, we found evidence that the slopes for men and women differ. The amount of variance explained by c_interview, gender, and their interaction in relation to perf_eval is medium in magnitude (R2 = .124, adjusted R2 = .115, p < .001), which is larger than when just c_interview and gender were included in the model (see Step Two above). The significant F-value (19.622, p < .001) from the nested model comparison indicates that adding the interaction term significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in R2, which was .054; this indicates that adding the interaction term to the model explained an additional 5.4% of the variance in perf_eval.
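To make the link between the change in R2 and the nested model comparison explicit, you can compute the F-statistic for the R2 change by hand using the standard hierarchical regression formula; this minimal sketch should reproduce the F-value from the anova output above (the change consumes 1 df, and model.3 has 316 residual df).

# By-hand F-test for the change in R-squared; should match the anova F-value (~19.622)
R2_reduced <- summary(model.2)$r.squared
R2_full <- summary(model.3)$r.squared
F_change <- ((R2_full - R2_reduced) / 1) / ((1 - R2_full) / 316)
F_change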

Given the statistically significant interaction term, it is appropriate (and helpful) to probe the interaction by plotting it. This will help us understand the form of the interaction. We can do so by creating a plot of the simple slopes. To do so, we will use a package called interactions (Long 2019), so if you haven’t already, install and access the package.

# Install interactions package if you haven't already
install.packages("interactions")

# Note: You may need to install the sandwich package independently if you 
# receive an error when attempting to run the probe_interaction function
# install.packages("sandwich")
# Access interactions package
library(interactions)

As the first argument in the probe_interaction function from the interactions package, enter the name of the regression model we just created above (model.3). As the second argument, type the name of the variable you are framing as the predictor after pred=, which in our case is the selection tool variable (c_interview). As the third argument, type the name of the variable you are framing as the moderator after modx=, which in our case is the protected class variable (gender). As the fourth argument, enter johnson_neyman=FALSE, as we aren’t requesting the Johnson-Neyman test in this tutorial. Finally, if you choose to, you can use x.label= and y.label= to create more informative names for the predictor and outcome variables, respectively. You can also add the legend.main= argument to provide a more informative name for the moderator variable, and/or the modx.labels= argument to provide more informative labels for the moderator variable’s levels (an optional sketch using these two arguments appears after the output below).

# Probe the significant interaction (slope differences)
probe_interaction(model.3, 
                  pred=c_interview, 
                  modx=gender,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when gender = man: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.39   0.06     6.67   0.00
## 
## Slope of c_interview when gender = woman: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.02   0.06     0.35   0.72

The plot output shows that the two slopes are indeed different, such that there doesn’t seem to be much of an association between the selection tool (c_interview) and the criterion (perf_eval) for women, but there seems to be a positive association for men. Now take a look at the text output in your console. The simple slopes analysis automatically estimates the regression coefficient (slope) for both levels of the moderator variable (gender: man, woman). Corroborating what we see in the plot, we find that the simple slope for men is indeed statistically significant and positive (b = .39, p < .01), such that for every one unit increase in interview scores, job performance scores tend to go up by .39 units. In contrast, we find that the simple slope for women is nonsignificant (b = .02, p = .72).
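As noted above, the legend.main= and modx.labels= arguments can make the plot more readable; the following optional sketch shows how they might look, with the label text being our own illustrative assumptions. Note that the labels must be supplied in the same order as the factor levels (here, man then woman).

# Optional: add a legend title and more informative level labels (label text is illustrative)
probe_interaction(model.3, 
                  pred=c_interview, 
                  modx=gender,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance",
                  legend.main="Gender",
                  modx.labels=c("Men", "Women"))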

In sum, we found evidence of slope differences – but not intercept differences – for the selection tool (c_interview) in relation to the criterion (perf_eval) that were conditional upon the levels/subgroups of the protected class variable (gender). Thus, we can conclude that there is evidence of differential prediction based on gender for this interview selection tool, which means that it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

Sample Write-Up: Our HR analytics team recently found evidence in support of the criterion-related validity of a new structured interview selection procedure. As a next step, we investigated whether there might be evidence of differential prediction of the structured interview based on the U.S. protected class of gender. Based on a sample of 320 individuals, we applied a three-step process with moderated multiple linear regression, which is consistent with established principles (Society for Industrial & Organizational Psychology 2018; Cleary 1968). Based on our analyses, we found evidence of differential prediction (i.e., predictive bias) based on gender for the new structured interview procedure, and specifically, we found evidence of slope differences. In the first step, we found evidence of criterion-related validity based on a simple linear regression model, as structured interview scores were positively and significantly associated with job performance (b = .192, p < .001), thereby indicating some degree of job-relatedness and -relevance. In the second step, we added the protected class variable associated with gender to the model, resulting in a multiple linear regression model, but did not find evidence of intercept differences based on gender (b = .166, p = .146). In the third step, we added the interaction term between the structured interview and gender variables, resulting in a moderated multiple linear regression model, and we found that gender moderated the association between structured interview scores and job performance scores to a statistically significant extent (b = -.373, p < .001). Based on follow-up simple slopes analyses, we found that the association between structured interview scores and job performance scores was statistically significant for men (b = .39, p < .01), such that for every one unit increase in structured interview scores, job performance scores tended to increase by .39 units; in contrast, the association between structured interview scores and job performance scores for women was nonsignificant (b = .02, p = .72). In sum, we found evidence of slope differences, which means that this structured interview tool shows predictive bias with respect to gender, making the application of a common regression line (i.e., equation) legally problematic within the United States. If possible, this structured interview should be redesigned or the interviewers should be trained to reduce the predictive bias based on gender. In the interim, we caution against using separate regression lines for men and women, as this may be interpreted as running afoul of the U.S. Civil Rights Act of 1991 (Saad and Sackett 2002). A more conservative approach would be to develop a structured interview process that allows for the use of a common regression line across legally protected groups.

43.3.3.2 Test Differential Prediction Based on c_age

Investigating whether differential prediction exists for c_interview in relation to perf_eval based on the continuous moderator variable c_age will unfold very similarly to the process we used for the dichotomous gender variable from above. The only differences will occur when it comes to estimating and visualizing the intercept differences and simple slopes (assuming we find statistically significant effects). Given this, we will breeze through the steps, which means I will provide less explanation.

Step One: Regress the criterion variable perf_eval on the selection tool variable c_interview.

# Regress perf_eval (criterion) on c_interview (selection tool)
model.1 <- lm(perf_eval ~ c_interview, data=DiffPred)
summary(model.1)
## 
## Call:
## lm(formula = perf_eval ~ c_interview, data = DiffPred)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7226 -0.6846  0.0007  0.7494  2.5842 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.70750    0.05444  86.472 < 0.0000000000000002 ***
## c_interview  0.19175    0.04143   4.629           0.00000537 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9738 on 318 degrees of freedom
## Multiple R-squared:  0.06312,    Adjusted R-squared:  0.06018 
## F-statistic: 21.43 on 1 and 318 DF,  p-value: 0.000005366

The regression coefficient for c_interview is statistically significant and positive (b = .192, p < .001), which provides evidence of criterion-related validity. The amount of variance explained by c_interview in perf_eval is small-medium in magnitude (R2 = .063, adjusted R2 = .060, p < .001).

Step Two: Regress the criterion variable perf_eval on the selection tool variable c_interview and the protected class variable c_age. In addition, do a nested model comparison, and estimate the change in R2.

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (c_age)
model.2 <- lm(perf_eval ~ c_interview + c_age, data=DiffPred)
summary(model.2)
## 
## Call:
## lm(formula = perf_eval ~ c_interview + c_age, data = DiffPred)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2596 -0.6087  0.0444  0.6184  2.5522 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.70750    0.04999  94.164 < 0.0000000000000002 ***
## c_interview  0.97806    0.10833   9.028 < 0.0000000000000002 ***
## c_age        0.12863    0.01659   7.752    0.000000000000124 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8943 on 317 degrees of freedom
## Multiple R-squared:  0.2124, Adjusted R-squared:  0.2075 
## F-statistic: 42.75 on 2 and 317 DF,  p-value: < 0.00000000000000022
# Nested model comparison
anova(model.1, model.2)
## Analysis of Variance Table
## 
## Model 1: perf_eval ~ c_interview
## Model 2: perf_eval ~ c_interview + c_age
##   Res.Df    RSS Df Sum of Sq      F             Pr(>F)    
## 1    318 301.58                                           
## 2    317 253.52  1    48.059 60.092 0.0000000000001241 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change in R-squared
summary(model.2)$r.squared - summary(model.1)$r.squared
## [1] 0.1492979

The regression coefficient associated with c_age is positive and statistically significant (b = .129, p < .001) when controlling for c_interview, which indicates that there is evidence of intercept differences. Because the coefficient is positive, it implies that the intercept value is significantly higher for older workers relative to younger workers. The amount of variance explained by c_interview and c_age in perf_eval is medium-large in magnitude (R2 = .212, adjusted R2 = .207, p < .001), which is notably larger than when just c_interview was included in the model (see Step One above). The significant F-value (60.092, p < .001) from the nested model comparison indicates that adding c_age significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in R2, which was .149; this indicates that adding c_age to the model explained an additional 14.9% of the variance in perf_eval.

Using the probe_interaction function from the interactions package, we will visualize the intercept differences for the model.2 model we specified above. Note that the function automatically probes a continuous moderator variable at its mean and at 1 standard deviation (SD) below and above the mean. As you can see in the plot, individuals whose age is 1 SD above the mean have the highest intercept value, where the intercept is the value of perf_eval (Job Performance) when c_interview (Interview Score) is equal to zero, which corresponds to the mean of the original interview variable because we grand-mean centered it.

# Visualize intercept differences
probe_interaction(model.2, 
                  pred=c_interview, 
                  modx=c_age,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when c_age = -8.593122204199181268791 (- 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.98   0.11     9.03   0.00
## 
## Slope of c_interview when c_age = -0.000000000000003164136 (Mean): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.98   0.11     9.03   0.00
## 
## Slope of c_interview when c_age =  8.593122204199174163364 (+ 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.98   0.11     9.03   0.00
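
To make these intercept differences concrete, we can also request predicted perf_eval values at c_interview = 0 for the same three age levels that probe_interaction used. This is just an illustrative sketch, assuming the model.2 object and DiffPred data frame from above and that c_age is mean-centered (so its mean is approximately zero); the sd_age name is my own.

# Predicted job performance at c_interview = 0 for -1 SD, mean, and +1 SD of c_age
sd_age <- sd(DiffPred$c_age, na.rm=TRUE)
predict(model.2, newdata=data.frame(c_interview=0, c_age=c(-sd_age, 0, sd_age)))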

Step Three: Now, we will add the interaction term between c_interview and c_age to the model using the * operator. In addition, do a nested model comparison, and estimate the change in R2.
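
As a reminder, the * operator in an R model formula expands to the two main effects plus their product, so the two specifications sketched below are equivalent; the second spells out the interaction term explicitly using the : operator.

# These two model specifications are equivalent
lm(perf_eval ~ c_interview * c_age, data=DiffPred)
lm(perf_eval ~ c_interview + c_age + c_interview:c_age, data=DiffPred)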

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (c_age),
# and interaction between selection tool and protected class
model.3 <- lm(perf_eval ~ c_interview * c_age, data=DiffPred)
summary(model.3)
## 
## Call:
## lm(formula = perf_eval ~ c_interview * c_age, data = DiffPred)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6271 -0.5798  0.0261  0.5553  2.4924 
## 
## Coefficients:
##                   Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)       4.995359   0.067563  73.937 < 0.0000000000000002 ***
## c_interview       1.711274   0.159937  10.700 < 0.0000000000000002 ***
## c_age             0.263061   0.027431   9.590 < 0.0000000000000002 ***
## c_interview:c_age 0.027267   0.004555   5.986        0.00000000585 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8489 on 316 degrees of freedom
## Multiple R-squared:  0.2926, Adjusted R-squared:  0.2859 
## F-statistic: 43.57 on 3 and 316 DF,  p-value: < 0.00000000000000022
# Nested model comparison
anova(model.2, model.3)
## Analysis of Variance Table
## 
## Model 1: perf_eval ~ c_interview + c_age
## Model 2: perf_eval ~ c_interview * c_age
##   Res.Df    RSS Df Sum of Sq      F         Pr(>F)    
## 1    317 253.52                                       
## 2    316 227.71  1    25.817 35.827 0.000000005849 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change in R-squared
summary(model.3)$r.squared - summary(model.2)$r.squared
## [1] 0.08020031

The regression coefficient associated with the interaction term (c_interview:c_age) is positive and statistically significant (b = .027, p < .001), which indicates that there is evidence of slope differences. The amount of variance explained by c_interview, c_age, and their interaction in relation to perf_eval is large in magnitude (R2 = .293, adjusted R2 = .286, p < .001), which is larger than when just c_interview and c_age were included in the model. The significant F-value (35.827, p < .001) from the nested model comparison indicates that adding the interaction term significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in R2 (ΔR2 = .080), which indicates that adding the interaction term to the model explained an additional 8.0% of the variance in perf_eval.

Given the statistically significant interaction term, we will use the probe_interaction function from the interactions package to probe the interaction and plot this multiplicative model, which we named model.3.

# Probe the significant interaction (slope differences)
probe_interaction(model.3, 
                  pred=c_interview, 
                  modx=c_age,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when c_age = -8.593122204199181268791 (- 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.48   0.13    11.16   0.00
## 
## Slope of c_interview when c_age = -0.000000000000003164136 (Mean): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.71   0.16    10.70   0.00
## 
## Slope of c_interview when c_age =  8.593122204199174163364 (+ 1 SD): 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   1.95   0.19    10.16   0.00

The plot and the simple slopes analysis output show that the slopes do indeed differ, such that the association between the selection tool (c_interview, Interview) and the criterion (perf_eval, Job Performance) is more positive for older workers (b = 1.95, p < .001) than for average-aged workers (b = 1.71, p < .001) and younger workers (b = 1.48, p < .001). Notably, all three simple slopes are statistically significant and positive, but the slope grows more positive as age increases.
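
As a check on our understanding, these simple slopes can be reproduced by hand from the model.3 coefficients: each simple slope equals the c_interview coefficient plus the interaction coefficient multiplied by the focal value of c_age. Below is a minimal sketch, assuming the objects from above; the b and sd_age names are my own.

# Reproduce the simple slopes by hand: b_interview + b_interaction * value of c_age
b <- coef(model.3)
sd_age <- sd(DiffPred$c_age, na.rm=TRUE)
b["c_interview"] + b["c_interview:c_age"] * c(-sd_age, 0, sd_age)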

In sum, we found evidence of both intercept and slope differences for the selection tool (c_interview) in relation to the criterion (perf_eval) that were conditional upon the age protected class variable (c_age). Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.

43.3.3.3 Test Differential Prediction Based on race

To investigate whether differential prediction exists for c_interview in relation to perf_eval based on race, we will follow mostly the same process as we did with the dichotomous variable gender above. However, we will filter our data within the regression function to compare just Black individuals with White individuals, as it is customary to compare just two races/ethnicities at a time. The decision to focus on only Black and White individuals is arbitrary and just for demonstration purposes. Note that in the DiffPred data frame object, the category labels black and white in the race variable are lowercase.

Step One: Regress the criterion variable perf_eval on the selection tool variable c_interview. Add the subset=(race=="black" | race=="white") argument to subset the data such that only those individuals who are Black or White are retained. In doing so, we effectively dichotomize the race variable within the lm function. If you need to review conditional/logical statements when filtering data, check out the chapter on filtering data.

# Regress perf_eval (criterion) on c_interview (selection tool)
model.1 <- lm(perf_eval ~ c_interview, data=DiffPred,
              subset=(race=="black" | race=="white"))
summary(model.1)
## 
## Call:
## lm(formula = perf_eval ~ c_interview, data = DiffPred, subset = (race == 
##     "black" | race == "white"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.72580 -0.66775 -0.00208  0.74761  2.49792 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.71479    0.06817  69.164 < 0.0000000000000002 ***
## c_interview  0.13983    0.05233   2.672              0.00813 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9895 on 210 degrees of freedom
## Multiple R-squared:  0.03288,    Adjusted R-squared:  0.02827 
## F-statistic: 7.139 on 1 and 210 DF,  p-value: 0.008133

We should begin by noting that our effective sample size is now 212 (which we can compute by adding the residual degrees of freedom in our model, 210, to the number of parameters estimated, 2), whereas previously it was 320 when we included individuals of all races. The regression coefficient for c_interview is statistically significant and positive (b = .140, p = .008) based on this reduced sample of 212 individuals, which provides evidence of criterion-related validity. The amount of variance explained by c_interview in perf_eval is small in magnitude (R2 = .033, adjusted R2 = .028, p = .008).
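
If you want R to report the number of cases a model actually used, the nobs function from base R will do so. And if you prefer to filter the data before fitting the model rather than inside the lm function, the subset function offers an equivalent approach. Both are sketched below, with DiffPred_bw and model.1b being my own illustrative names.

# Number of cases used by the fitted model (should be 212 here)
nobs(model.1)
# Equivalent approach: filter to Black and White individuals first, then fit
DiffPred_bw <- subset(DiffPred, race %in% c("black", "white"))
model.1b <- lm(perf_eval ~ c_interview, data=DiffPred_bw)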

Step Two: Regress the criterion variable perf_eval on the selection tool variable c_interview and the protected class variable race. In addition, do a nested model comparison, and estimate the change in R2.

# Regress perf_eval (criterion) on c_interview (selection tool) and protected class (race)
model.2 <- lm(perf_eval ~ c_interview + race, data=DiffPred,
              subset=(race=="black" | race=="white"))
summary(model.2)
## 
## Call:
## lm(formula = perf_eval ~ c_interview + race, data = DiffPred, 
##     subset = (race == "black" | race == "white"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.51207 -0.64712 -0.03035  0.65707  2.28188 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)  4.91079    0.08941  54.922 < 0.0000000000000002 ***
## c_interview  0.12665    0.05131   2.469              0.01437 *  
## racewhite   -0.44041    0.13398  -3.287              0.00119 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9671 on 209 degrees of freedom
## Multiple R-squared:  0.08042,    Adjusted R-squared:  0.07162 
## F-statistic: 9.139 on 2 and 209 DF,  p-value: 0.0001567
# Nested model comparison
anova(model.1, model.2)
## Analysis of Variance Table
## 
## Model 1: perf_eval ~ c_interview
## Model 2: perf_eval ~ c_interview + race
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1    210 205.60                                
## 2    209 195.49  1    10.107 10.806 0.001187 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change in R-squared
summary(model.2)$r.squared - summary(model.1)$r.squared
## [1] 0.04754406

The regression coefficient associated with race (or more specifically racewhite) is negative and statistically significant (b = -.440, p = .001) when controlling for c_interview, which indicates that there is evidence of intercept differences. Because the coefficient is negative and because white is appended to the race variable name, it implies that the intercept value is significantly lower for White individuals compared to Black individuals. The amount of variance explained by c_interview and race in perf_eval is small-medium in magnitude (R2 = .080, adjusted R2 = .072, p < .001). The significant F-value (10.806, p = .001) from the nested model comparison indicates that adding race significantly improved model fit, and because there was a significant improvement in model fit, we should interpret the change in R2 (ΔR2 = .048), which indicates that adding race to the model explained an additional 4.8% of the variance in perf_eval.
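
A note on the racewhite label: by default, R dummy-codes a categorical predictor using its first level alphabetically (here, black) as the reference group. If you would rather treat White individuals as the reference group, base R's relevel function will re-set the reference level. The sketch below is illustrative; re-estimating the models afterward would simply flip the sign of the dummy coefficient.

# Make "white" the reference category so the dummy variable becomes raceblack
DiffPred$race <- relevel(factor(DiffPred$race), ref="white")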

Using the probe_interaction function from the interactions package, we will visualize the statistically significant intercept differences. As input to the probe_interaction function, use the model.2 model object from above, and specify the arguments in the probe_interaction function as you did before.

# Visualize intercept differences
probe_interaction(model.2, 
                  pred=c_interview, 
                  modx=race,
                  johnson_neyman=FALSE,
                  x.label="Interview Score",
                  y.label="Job Performance")
## SIMPLE SLOPES ANALYSIS 
## 
## Slope of c_interview when race = white: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.13   0.05     2.47   0.01
## 
## Slope of c_interview when race = black: 
## 
##   Est.   S.E.   t val.      p
## ------ ------ -------- ------
##   0.13   0.05     2.47   0.01

Step Three: Finally, let’s add the interaction term between c_interview and race to the model using the * operator. In addition, do a nested model comparison, and estimate the change in R2.

# Regress perf_eval (criterion) on c_interview (selection tool), protected class (race),
# and interaction between selection tool and protected class
model.3 <- lm(perf_eval ~ c_interview * race, data=DiffPred,
              subset=(race=="black" | race=="white"))
summary(model.3)
## 
## Call:
## lm(formula = perf_eval ~ c_interview * race, data = DiffPred, 
##     subset = (race == "black" | race == "white"))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7314 -0.6396  0.0227  0.6637  2.4205 
## 
## Coefficients:
##                       Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)            4.91180    0.08886  55.275 < 0.0000000000000002 ***
## c_interview            0.21843    0.07020   3.112             0.002122 ** 
## racewhite             -0.46345    0.13370  -3.466             0.000641 ***
## c_interview:racewhite -0.19428    0.10213  -1.902             0.058527 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9611 on 208 degrees of freedom
## Multiple R-squared:  0.09615,    Adjusted R-squared:  0.08311 
## F-statistic: 7.375 on 3 and 208 DF,  p-value: 0.0001015
# Nested model comparison
anova(model.2, model.3)
## Analysis of Variance Table
## 
## Model 1: perf_eval ~ c_interview + race
## Model 2: perf_eval ~ c_interview * race
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1    209 195.49                              
## 2    208 192.15  1    3.3426 3.6184 0.05853 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change in R-squared
summary(model.3)$r.squared - summary(model.2)$r.squared
## [1] 0.0157234

The regression coefficient associated with the interaction term (c_interview:racewhite) is not statistically significant (b = -.194, p = .059), which indicates that there is no evidence of slope differences. Further, adding the interaction term did not significantly improve model fit (F = 3.618, p = .059), so we won't interpret the change in R2. Given all this, we won't probe the interaction because, well, there is not an interaction to probe.

In sum, we found evidence of intercept differences (but not slope differences) for the selection tool c_interview in relation to the criterion perf_eval that were conditional upon the protected class variable race. Thus, it would be inappropriate to use a common regression line when predicting scores on the criterion based on the interview selection tool.
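
Because the retained model has separate intercepts but a common slope, we can recover the two group-specific prediction equations directly from the model.2 coefficients. Here is a minimal sketch, assuming the model.2 object from Step Two above; the b name is my own.

# Group-specific intercepts (and common slope) implied by model.2
b <- coef(model.2)
b["(Intercept)"]                  # intercept for Black individuals (reference group)
b["(Intercept)"] + b["racewhite"] # intercept for White individuals
b["c_interview"]                  # common slope for both groups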

43.3.4 APA-Style Results Table

If you want to present the results of your moderated multiple linear regression (MMLR) model to a more statistically inclined audience, particularly an audience that prefers American Psychological Association (APA) style, consider using functions from the apaTables package (Stanley 2021).

Using the lm function from base R, as we did above, let’s begin by estimating a MMLR model and naming the model object (model.3).

# Estimate multiple linear regression model
model.3 <- lm(perf_eval ~ c_interview * gender, data=DiffPred)

If you haven’t already, install and access the apaTables package using the install.packages and library functions, respectively.

# Install package
install.packages("apaTables")
# Access package
library(apaTables)

The apa.reg.table function from apaTables is pretty straightforward. Simply enter your regression model object (model.3) as the sole parenthetical argument. This will generate a table as output in your Console.

# Create APA-style regression table
apa.reg.table(model.3)
## 
## 
## Regression results using perf_eval as the criterion
##  
## 
##                Predictor       b       b_95%_CI sr2  sr2_95%_CI             Fit
##              (Intercept)  4.56**   [4.41, 4.71]                                
##              c_interview  0.39**   [0.28, 0.51] .12  [.06, .19]                
##              genderwoman    0.15  [-0.06, 0.37] .01 [-.01, .02]                
##  c_interview:genderwoman -0.37** [-0.54, -0.21] .05  [.01, .10]                
##                                                                     R2 = .124**
##                                                                 95% CI[.06,.19]
##                                                                                
## 
## Note. A significant b-weight indicates the semi-partial correlation is also significant.
## b represents unstandardized regression weights. 
## sr2 represents the semi-partial correlation squared.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## * indicates p < .05. ** indicates p < .01.
## 

If we add a second argument, filename=, we can specify the name of a .doc file that will be written to our working directory. Here, I name the file "APA Moderated Multiple Linear Regression Table.doc". The .doc file that appears in your working directory will be nicely formatted and will present the critical regression model results in an organized manner.

# Create APA-style regression table and write to working directory
apa.reg.table(model.3, filename="APA Moderated Multiple Linear Regression Table.doc")
## 
## 
## Regression results using perf_eval as the criterion
##  
## 
##                Predictor       b       b_95%_CI sr2  sr2_95%_CI             Fit
##              (Intercept)  4.56**   [4.41, 4.71]                                
##              c_interview  0.39**   [0.28, 0.51] .12  [.06, .19]                
##              genderwoman    0.15  [-0.06, 0.37] .01 [-.01, .02]                
##  c_interview:genderwoman -0.37** [-0.54, -0.21] .05  [.01, .10]                
##                                                                     R2 = .124**
##                                                                 95% CI[.06,.19]
##                                                                                
## 
## Note. A significant b-weight indicates the semi-partial correlation is also significant.
## b represents unstandardized regression weights. 
## sr2 represents the semi-partial correlation squared.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## * indicates p < .05. ** indicates p < .01.
## 
The apa.reg.table function from the apaTables package can table moderated multiple linear regression model results in a manner that is consistent with the American Psychological Association (APA) style guide. APA-style tables are useful when presenting to academic audiences or audiences with high levels of technical/statistical expertise.

References

Aguinis, Herman, and Charles A Pierce. 1998. “Heterogeneity of Error Variance and the Assessment of Moderating Effects of Categorical Variables: A Conceptual Review.” Organizational Research Methods 1 (3): 296–314.
Berry, Christopher M. 2015. “Differential Validity and Differential Prediction of Cognitive Ability Tests: Understanding Test Bias in the Employment Context.” Annual Review of Organizational Psychology and Organizational Behavior 2 (1): 435–63.
Cascio, Wayne F, and Herman Aguinis. 2005. Applied Psychology in Human Resource Management (6th Ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Cleary, T Anne. 1968. “Test Bias: Prediction of Grades of Black and White Students in Integrated Colleges.” Journal of Educational Measurement 5 (2): 115–24.
Fisher, Gwenith G, Donald M Truxillo, Lisa M Finkelstein, and Lauren E Wallace. 2017. “Age Discrimination: Potential for Adverse Impact and Differential Prediction Related to Age.” Human Resource Management Review 27 (2): 316–27.
Gerbing, David. 2021. lessR: Less Code, More Results. https://CRAN.R-project.org/package=lessR.
Long, Jacob A. 2019. Interactions: Comprehensive, User-Friendly Toolkit for Probing Interactions. https://cran.r-project.org/package=interactions.
Saad, Syed, and Paul R Sackett. 2002. “Investigating Differential Prediction by Gender in Employment-Oriented Personality Measures.” Journal of Applied Psychology 87 (4): 667–74.
Society for Industrial & Organizational Psychology. 2018. Principles for the Validation and Use of Personnel Selection Procedures (5th Ed.). Bowling Green, OH: Author.
Stanley, David. 2021. apaTables: Create American Psychological Association (APA) Style Tables. https://CRAN.R-project.org/package=apaTables.
Stone-Romero, Eugene F, George M Alliger, and Herman Aguinis. 1994. “Type II Error Problems in the Use of Moderated Multiple Regression for the Detection of Moderating Effects of Dichotomous Variables.” Journal of Management 20 (1): 167–78.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.