# Chapter 43 Statistically & Empirically Cross-Validating a Selection Tool

In this chapter, we will learn how to perform both statistical and empirical cross-validation, and we will do so in the context of validating an employee selection tool by estimating a multiple linear regression model. For a review of multiple linear regression, please refer to the chapter on estimating incremental validity.

## 43.1 Conceptual Overview

**Cross-validation** is a process through which we determine how well a regression model estimated using a sample drawn from a particular population generalizes to another sample from that same population. More specifically, the term **cross-validity** “refers to whether the [regression coefficients] derived from one sample can predict outcomes to the same degree in the population as a whole or in other samples drawn from the same population” (Cascio and Aguinis 2005, 173). At its core, cross-validation represents a step toward true *predictive analytics*, as it involves training a model on one sample and then testing that model on one or more other samples from the same population. In general, we can distinguish between two approaches to cross-validation: statistical and empirical.

### 43.1.1 Review of Statistical Cross-Validation

**Statistical cross-validation** is performed *without* collecting additional data from the underlying population from which the original regression model was estimated. There are a couple of different statistical indicators of a model’s cross-validity, which can be used in tandem to inform judgments regarding cross-validity: adjusted *R*^{2} and squared cross-validity (Raju et al. 1997).

The **adjusted *R*^{2}** (i.e., \(R^2_{adj}\), squared multiple correlation) (Ezekiel 1930; Wherry 1931) appears in most statistical platforms’ regression output (including many R functions), and it provides a more accurate estimate of the amount of variance explained by the predictor variable(s) in the outcome variable than the unadjusted *R*^{2} (\(R^2_{unadj}\)). Specifically, the \(R^2_{adj}\) corrects for the sample size relative to the number of predictor variables in the model, which results in a lower estimate than its unadjusted counterpart. The \(R^2_{adj}\) is a better indicator of the magnitude of the association in the underlying population than the \(R^2_{unadj}\). The formula for \(R^2_{adj}\) is:

\(R^2_{adj} = 1 - \frac{N-1}{N-k-1}(1-R^2_{unadj})\)

where \(N\) refers to the sample size, \(k\) refers to the number of predictor variables in the model, and \(R^2_{unadj}\) is the *un*adjusted *R*^{2}, which reflects the sample-specific variance explained by the predictor variables in relation to the outcome variable. Again, this \(R^2_{adj}\) is readily produced by most regression functions, so additional steps are not required to compute it.
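As a worked example, plugging in the values that appear later in this chapter’s tutorial (N = 182 cases, k = 3 predictors, and \(R^2_{unadj} = .359\)) yields:

\(R^2_{adj} = 1 - \frac{182-1}{182-3-1}(1-.359) = 1 - \frac{181}{178}(.641) \approx .348\)

which matches the adjusted *R*^{2} reported in the regression output later in this chapter.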

In contrast, the **squared cross-validity** (\(\rho^2_{c}\)) (Browne 1975) often requires additional computation beyond what our regression function computes. As you can see in the formula below, the \(\rho^2_{c}\) formula uses the \(R^2_{adj}\) as input.

\(\rho^2_{c} = \frac{(N-k-3)(R^2_{adj})^2+R^2_{adj}}{(N-2k-2)(R^2_{adj})+k}\)

Both \(R^2_{adj}\) and \(\rho^2_{c}\) can be used to inform our understanding of a model’s squared cross-validity; however, preference should be given to the \(\rho^2_{c}\), as \(R^2_{adj}\) by itself tends to overestimate the squared cross-validity. If we take the square root of \(\rho^2_{c}\), we get an estimate of **cross-validity** (\(\rho_{c}\)).
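Continuing the worked example from this chapter’s tutorial (\(R^2_{adj} = .348\), N = 182, k = 3):

\(\rho^2_{c} = \frac{(182-3-3)(.348)^2+.348}{(182-2(3)-2)(.348)+3} = \frac{21.66}{63.55} \approx .341\)

which, as expected, is slightly below the \(R^2_{adj}\) of .348.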

### 43.1.2 Review of Empirical Cross-Validation

**Empirical cross-validation** requires that we sample additional data from the same population, or that we split our original sample (if it’s large enough) into subsamples, where the subsamples are sometimes referred to as holdout samples. Regardless of which approach we use, the logic is similar: One (sub)sample serves as our training (i.e., initial, original, derivation) data, and the other (sub)sample serves as our test (i.e., cross-validation) data. Using either approach, we estimate the regression model from the first (sub)sample and then estimate the extent to which the model fits the second (sub)sample, as well as its predictive accuracy in that (sub)sample.
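As a sketch of the split-sample (holdout) approach, the following uses the built-in `mtcars` data purely for illustration; the variables, the 70/30 split proportion, and the seed are arbitrary choices for demonstration, not part of this chapter’s data.

```
# Illustrative holdout split using built-in mtcars data (not this chapter's data)
set.seed(2023)                        # make the random split reproducible
train_rows <- sample(nrow(mtcars), size = round(.70 * nrow(mtcars)))
train <- mtcars[train_rows, ]         # training (derivation) subsample
test  <- mtcars[-train_rows, ]        # test (cross-validation) subsample

# Estimate the model using only the training subsample
m <- lm(mpg ~ wt + hp, data = train)

# Evaluate fit in the test subsample: squared correlation between
# predicted and observed outcome values
preds <- predict(m, newdata = test)
cor(preds, test$mpg)^2
```

The same logic applies when the “test” data come from a second, independently collected sample, which is the approach demonstrated in this chapter’s tutorial.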

#### 43.1.2.1 Sample Write-Up

We performed a concurrent validation study based on a sample of 182 job incumbents. Given that all three selection tools (i.e., conscientiousness inventory, work sample, situational judgment test) showed evidence of incremental validity, we included all three in a multiple linear regression model, with performance evaluation scores as the criterion. All three selection tools showed evidence of incremental validity with respect to one another. To understand how well this estimated model might generalize to the broader population, we first performed statistical cross-validation. Specifically, we found that the adjusted *R*^{2} was .348 and the squared cross-validity (\(\rho^2_{c}\)) was .341. The latter indicates that in the population, the three selection tools collectively explain 34.1% of the variability in the criterion. Next, we performed empirical cross-validation by assessing the model’s performance in a new sample of data drawn from the same population. The squared cross-validity (\(\rho^2_{c}\)) was .341, which indicates that the model explained 34.1% of the variance in the criterion when applied to the second sample, which was only a slight decrease in performance compared to the original sample (\(R^2_{unadj}\) = .359, \(R^2_{adj}\) = .348). Further, estimates of root mean square error (RMSE) indicated that the RMSE for the second sample is 13.6% larger than the RMSE for the first sample, which indicates that prediction errors didn’t increase substantially when applying the model to the second sample. Overall, the model performed relatively well applied to a second sample of data, and it doesn’t appear as though we overfit the model when estimating it using the first sample.

## 43.2 Tutorial

This chapter’s tutorial demonstrates how to perform statistical and empirical cross-validation using R.

### 43.2.1 Functions & Packages Introduced

Function | Package |
---|---|
`lm` | base R |
`summary` | base R |
`sqrt` | base R |
`predict` | base R |
`R2` | `caret` |
`mean` | base R |
`RMSE` | `caret` |
`abs` | base R |
`MAE` | `caret` |
`cbind` | base R |

### 43.2.2 Initial Steps

If you haven’t already, save the files called **“Sample1.csv”** and **“Sample2.csv”** into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., `"H:/RWorkshop"`). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the `setwd` function, set your working directory to the folder in which you saved the data files for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to *Session > Set Working Directory > Choose Directory…*. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

```
# Set your working directory
setwd("H:/RWorkshop")
```

Next, read in the .csv data files called **“Sample1.csv”** and **“Sample2.csv”** using your choice of read function. In this example, I use the `read_csv` function from the `readr` package (Wickham, Hester, and Bryan 2022). If you choose to use the `read_csv` function, be sure that you have installed and accessed the `readr` package using the `install.packages` and `library` functions. *Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months.* For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

```
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
```

```
# Access readr package
library(readr)
# Read data and name data frame (tibble) objects
Sample1 <- read_csv("Sample1.csv")
```

```
## Rows: 182 Columns: 4
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): PerfEval, WorkSample, SJT, Consc
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

`Sample2 <- read_csv("Sample2.csv")`

```
## Rows: 166 Columns: 4
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): PerfEval, WorkSample, SJT, Consc
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
# Print the names of the variables in the data frame (tibble) objects
names(Sample1)
```

`## [1] "PerfEval" "WorkSample" "SJT" "Consc"`

`names(Sample2)`

`## [1] "PerfEval" "WorkSample" "SJT" "Consc"`

```
# View number of rows/cases in data frame (tibble) objects
nrow(Sample1)
```

`## [1] 182`

`nrow(Sample2)`

`## [1] 166`

```
# View variable type for each variable in data frame (tibble) objects
str(Sample1)
```

```
## spec_tbl_df [182 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ PerfEval : num [1:182] 5.1 5 3.5 5 4 4.4 5.2 5.1 3.8 3.1 ...
## $ WorkSample: num [1:182] 7.5 8.1 7.6 6.6 6 6 8 8.8 6.1 6.5 ...
## $ SJT : num [1:182] 7.9 5.6 4.4 5.7 8.5 5.8 6.1 4.7 4.5 5.1 ...
## $ Consc : num [1:182] 4.9 2.4 3.1 4.5 3.6 4.8 4.7 4.9 3.2 4.6 ...
## - attr(*, "spec")=
## .. cols(
## .. PerfEval = col_double(),
## .. WorkSample = col_double(),
## .. SJT = col_double(),
## .. Consc = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
```

`str(Sample2)`

```
## spec_tbl_df [166 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ PerfEval : num [1:166] 4 4.3 3.9 5.6 4.5 4.9 4.2 4.7 4.5 5.6 ...
## $ WorkSample: num [1:166] 6.7 9 6.8 7.6 4.5 8.1 6.5 7.1 6.1 5.3 ...
## $ SJT : num [1:166] 4.4 3.7 5.6 6.8 6.9 7.4 6.4 7 7.2 10 ...
## $ Consc : num [1:166] 4.3 1.7 4.5 5.2 3.8 4.3 3.7 5.1 4.1 4.3 ...
## - attr(*, "spec")=
## .. cols(
## .. PerfEval = col_double(),
## .. WorkSample = col_double(),
## .. SJT = col_double(),
## .. Consc = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
```

```
# View first 6 rows of data frame (tibble) objects
head(Sample1)
```

```
## # A tibble: 6 × 4
## PerfEval WorkSample SJT Consc
## <dbl> <dbl> <dbl> <dbl>
## 1 5.1 7.5 7.9 4.9
## 2 5 8.1 5.6 2.4
## 3 3.5 7.6 4.4 3.1
## 4 5 6.6 5.7 4.5
## 5 4 6 8.5 3.6
## 6 4.4 6 5.8 4.8
```

`head(Sample2)`

```
## # A tibble: 6 × 4
## PerfEval WorkSample SJT Consc
## <dbl> <dbl> <dbl> <dbl>
## 1 4 6.7 4.4 4.3
## 2 4.3 9 3.7 1.7
## 3 3.9 6.8 5.6 4.5
## 4 5.6 7.6 6.8 5.2
## 5 4.5 4.5 6.9 3.8
## 6 4.9 8.1 7.4 4.3
```

The first data frame (`Sample1`) has 182 cases and the following 4 variables: `PerfEval`, `WorkSample`, `SJT`, and `Consc`. Per the output of the `str` (structure) function above, all of the variables are of type *numeric* (continuous: interval/ratio). Imagine that the `Sample1` data were collected as part of a criterion-related validation study - specifically, a concurrent validation design in which job incumbents were administered the three selection tools 90 days after entering the organization. `PerfEval` is the criterion (outcome) of interest, and it is a 90-day-post-hire measure of supervisor-rated job performance, where scores could range from 1 (low performance) to 7 (high performance). The `WorkSample` variable contains scores from a work sample assessment, and scores could range from 1-10, with 10 indicating a strong performance on the assessment. `SJT` refers to a situational judgment test in which scores can range from 1-10, with 10 indicating higher proficiency. `Consc` refers to a conscientiousness personality inventory in which scores can range from 1-7, with 7 indicating higher conscientiousness.

### 43.2.3 Perform Statistical Cross-Validation

In order to perform statistical cross-validation, we first need to estimate our model. In this chapter, we’ll estimate a multiple linear regression in which we treat the three selection tool variables (`WorkSample`, `SJT`, `Consc`) as predictors and the variable `PerfEval` as our criterion (outcome). In other words, we are estimating the criterion-related and incremental validity of these three selection tools in relation to the outcome. In doing so, we will determine whether each selection tool explains unique variance in the outcome, and how much variance, collectively, all three selection tools explain in the outcome. For more information on multiple linear regression, check out the chapter on estimating incremental validity. Note that in this chapter we will use the `lm` function from base R to estimate our multiple linear regression model, which is discussed in the chapter supplement to the aforementioned chapter.

```
# Estimate multiple linear regression model
model1 <- lm(PerfEval ~ WorkSample + SJT + Consc, data=Sample1)

# Print summary of results
summary(model1)
```

```
##
## Call:
## lm(formula = PerfEval ~ WorkSample + SJT + Consc, data = Sample1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62345 -0.39876 0.01343 0.32258 1.51027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.30283 0.34833 3.740 0.000248 ***
## WorkSample 0.28430 0.03854 7.376 0.00000000000596 ***
## SJT 0.08118 0.03082 2.634 0.009190 **
## Consc 0.23241 0.04652 4.996 0.00000139215819 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.589 on 178 degrees of freedom
## Multiple R-squared: 0.3592, Adjusted R-squared: 0.3484
## F-statistic: 33.25 on 3 and 178 DF, p-value: < 0.00000000000000022
```

In the output, you can see that each of the regression coefficients associated with the predictor variables (i.e., selection tools) is statistically significant, which indicates that each predictor explains unique variance in the outcome (`PerfEval`) when controlling for the other predictors in the model. Further, the \(R^2_{unadj}\) (unadjusted *R*^{2}) is .359, and the \(R^2_{adj}\) (adjusted *R*^{2}) is .348, both of which provide us with estimates of model fit and the amount of variance explained in the outcome variable collectively by the three predictor variables.

Next, let’s define a function that will compute the **squared cross-validity** (\(\rho^2_{c}\)) based on the Browne (1975) formula when we type the name of a regression model object as the sole argument. Let’s name this function `rho2` for \(\rho^2_{c}\). As a reminder, the Browne formula is:

\(\rho^2_{c} = \frac{(N-k-3)(R^2_{adj})^2+R^2_{adj}}{(N-2k-2)(R^2_{adj})+k}\)

You don’t need to change anything in the function script below. Simply run it as is.

```
# Write function for squared cross-validity formula (Browne, 1975)
rho2 <- function(x) {
  N <- nobs(x) # model sample size
  k <- length(coef(x)) - 1 # number of predictor variables
  R2_adj <- summary(x)$adj.r.squared # adjusted R-squared
  ((N - k - 3) * (R2_adj^2) + R2_adj) / ((N - (2 * k) - 2) * R2_adj + k) # Browne formula
}
```

Using the new function called `rho2`, type the name of our regression model object as the sole parenthetical argument.

```
# Estimate squared cross-validity
rho2(model1)
```

`## [1] 0.3412245`

The output indicates that the squared cross-validity (\(\rho^2_{c}\)) is .341, which indicates that, in the population, the predictor variables included in the model explain 34.1% of the variability in the outcome variable. If you recall, the \(R^2_{unadj}\) was .359, and the \(R^2_{adj}\) was .348. Thus, compared to \(R^2_{unadj}\), the model fit didn’t shrink too much.

To estimate the **cross-validity** (\(\rho_{c}\)), just take the square root of \(\rho^2_{c}\).

```
# Estimate cross-validity
sqrt(rho2(model1))
```

`## [1] 0.5841442`

Thus, our cross-validity (multiple correlation) is .584.

**Technical Write-Up:** We performed a concurrent validation study based on a sample of 182 job incumbents. Given that all three selection tools (i.e., conscientiousness inventory, work sample, situational judgment test) showed evidence of incremental validity, we included all three in a multiple linear regression model, with performance evaluation scores as the criterion. All three selection tools showed evidence of incremental validity with respect to one another. To understand how well this estimated model might generalize to the broader population, we performed statistical cross-validation. Specifically, we found that the adjusted *R*^{2} was .348 and the squared cross-validity (\(\rho^2_{c}\)) was .341. The latter indicates that in the population, the three selection tools collectively explain 34.1% of the variability in the criterion.

### 43.2.4 Perform Empirical Cross-Validation

There are different ways that we can go about performing empirical cross-validation, but regardless of the approach, we are entering the realm of predictive analytics. Why? Well, we are testing the model fit and predictive accuracy of a model estimated using one sample in a different sample from the same population. To get things started, let’s estimate our multiple linear regression model that contains the three selection tool variables (`WorkSample`, `SJT`, `Consc`) as predictors and `PerfEval` as the outcome (criterion) variable. We will estimate our model using the first sample data frame, which we named `Sample1` above.

```
# Estimate multiple linear regression model
model1 <- lm(PerfEval ~ WorkSample + SJT + Consc, data=Sample1)

# Print summary of results
summary(model1)
```

```
##
## Call:
## lm(formula = PerfEval ~ WorkSample + SJT + Consc, data = Sample1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62345 -0.39876 0.01343 0.32258 1.51027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.30283 0.34833 3.740 0.000248 ***
## WorkSample 0.28430 0.03854 7.376 0.00000000000596 ***
## SJT 0.08118 0.03082 2.634 0.009190 **
## Consc 0.23241 0.04652 4.996 0.00000139215819 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.589 on 178 degrees of freedom
## Multiple R-squared: 0.3592, Adjusted R-squared: 0.3484
## F-statistic: 33.25 on 3 and 178 DF, p-value: < 0.00000000000000022
```

In the output, you can see that each of the regression coefficients associated with the predictor variables (i.e., selection tools) is statistically significant, which indicates that each predictor explains unique variance in the outcome (`PerfEval`) when controlling for the other predictors in the model. Further, the \(R^2_{unadj}\) (unadjusted *R*^{2}) is .359, and the \(R^2_{adj}\) (adjusted *R*^{2}) is .348, both of which provide us with estimates of model fit and the amount of variance explained in the outcome variable collectively by the three predictor variables.

Let’s kick things up a notch by applying the model that we named `model1` to a new data frame called `Sample2`. Using the `predict` function from base R, enter the name of the model object as the first argument and the name of the second data frame object as the second argument. Let’s assign these predicted values using the `<-` assignment operator to an object called `predictions`. The `predict` function uses our regression model equation from the first sample to estimate the fitted (predicted) values of the outcome variable in the second sample based on the values of the different predictor variables.

```
# Predict values on outcome using second sample
predictions <- predict(model1, Sample2)
```

Next, we will evaluate the quality of our model’s fit to the new dataset (`Sample2`). We’ll use several functions from the `caret` package (Kuhn 2021), so be sure to install and access that package if you haven’t already. The `caret` package is renowned for its predictive modeling and machine learning functions.

```
# Install caret package if you haven't already
install.packages("caret")
```

```
# Access caret package
library(caret)
```

For comparison, first, let’s extract the unadjusted `r.squared` value from our regression model summary by appending `r.squared` to the end of `summary(model1)` using the `$` operator. Second, using the `R2` function from the `caret` package, enter the name of the `predictions` object we created above as the first argument, and as the second argument, enter the name of the second data frame (`Sample2`), followed by the `$` operator and the name of the outcome variable (`PerfEval`). The first operation will extract our \(R^2_{unadj}\) from the original sample, and the second operation will estimate our \(\rho^2_{c}\) based on our model and its performance in the second sample.

```
# Unadjusted R-squared based on the first sample
summary(model1)$r.squared
```

`## [1] 0.359162`

```
# Adjusted R-squared based on the first sample
summary(model1)$adj.r.squared
```

`## [1] 0.3483614`

```
# Squared cross-validity based on the second sample
R2(predictions, Sample2$PerfEval)
```

`## [1] 0.3412069`

As one would expect, the \(\rho^2_{c}\) is slightly lower than the \(R^2_{unadj}\) when the model is applied to new data, but fortunately, it doesn’t decrease that much. Further, the \(\rho^2_{c}\) is only slightly lower than the \(R^2_{adj}\). This is our first indication that the model performed fairly well with the new data from the population.
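As a side note, by default the `R2` function from `caret` computes the squared correlation between the predicted and observed values, so you can reproduce it in base R with the `cor` function if you prefer. The toy vectors below are made up purely for illustration:

```
# Toy demonstration: caret's default R2 is the squared correlation
# between predicted and observed values (hypothetical values below)
obs  <- c(3.1, 4.2, 5.0, 3.8, 4.6)   # hypothetical observed outcome values
pred <- c(3.4, 4.0, 4.8, 3.5, 4.9)   # hypothetical predicted values
cor(pred, obs)^2
```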

To calculate the multiple *R* and cross-validity (\(\rho_c\)) values, we simply take the square root (`sqrt`) of the two values we calculated above.

```
# Multiple R for first sample
sqrt(summary(model1)$r.squared)
```

`## [1] 0.5993013`

```
# Cross-validity for second sample
sqrt(R2(predictions, Sample2$PerfEval))
```

`## [1] 0.5841292`

Similarly, the multiple *R* and cross-validity (\(\rho_c\)) values are comparable, with the cross-validity value only slightly lower than the multiple *R* value.

The **mean squared error (MSE)** is another way that we can test how well our model performed. As the label implies, the MSE is the mean of the squared error (residual) values. First, let’s estimate the MSE for the initial sample, which reflects the extent to which the fitted (predicted) values based on the model deviate from the actual outcome variable values in the initial sample (`Sample1`) on which the model was estimated. To estimate the MSE for the model based on the initial sample, we extract the `residuals` from the model (`model1`), square them, and then take the mean using the `mean` function from base R. Second, let’s estimate the MSE for the second sample based on the extent to which the fitted (predicted) values from our model (`model1`) deviate from the actual outcome variable values in the second sample (`Sample2`). To estimate the MSE for the model (`model1`) in relation to the second sample, we subtract the predicted values (`predictions`) from the outcome variable (`PerfEval`) values for the second sample (`Sample2`), square the differences, and then compute the mean using the `mean` function from base R.

```
# MSE (Mean Squared Error) for first sample
mean(model1$residuals^2)
```

`## [1] 0.3393501`

```
# MSE (Mean Squared Error) for second sample
mean((Sample2$PerfEval - predictions)^2)
```

`## [1] 0.4381229`

In isolation, each MSE value does not mean much; however, each MSE value takes on meaning when we compare it to another MSE value. As expected, the MSE is higher when we apply the model (`model1`) estimated from the first sample (`Sample1`) to the second sample (`Sample2`). With that said, we don’t want our MSE in the second, validation sample to be too much higher than in the original, training sample. Often, however, we opt for the root mean squared error for evaluating the quality of our model’s fit because it is in the same units as the outcome variable.

To calculate the root mean squared error (RMSE), we simply take the square root of the MSE. For our predicted (fitted) values based on our second, validation sample, we can simply apply the `RMSE` function from the `caret` package; as the first argument, type the name of the `predictions` object we created above, and as the second argument, type the name of the second sample (`Sample2`), followed by the `$` operator and the name of the outcome variable (`PerfEval`).

```
# RMSE (Root Mean Squared Error) for first sample
sqrt(mean(model1$residuals^2))
```

`## [1] 0.5825376`

```
# RMSE (Root Mean Squared Error) for second sample
RMSE(predictions, Sample2$PerfEval)
```

`## [1] 0.6619085`

Again, we expect the RMSE for our model (`model1`) to be larger when applied to the second sample, as we expect larger errors in prediction when we apply our model to new data. That said, we don’t want the difference in RMSE values to be too large. But how much is too much, you might ask? Good question, and there isn’t a great answer. What is considered a large difference depends in part on the context and what you’re predicting. That said, it’s sometimes helpful to calculate how much larger one RMSE is than the other. Below, I simply calculate how much larger (as a percentage) the RMSE from the second sample is than the RMSE from the first sample.

```
# Percentage difference in RMSE (Root Mean Squared Error) values
(RMSE(predictions, Sample2$PerfEval) - sqrt(mean(model1$residuals^2))) /
  sqrt(mean(model1$residuals^2)) * 100
```

`## [1] 13.62503`

We see that the RMSE for the second, validation sample is 13.6% larger than the RMSE for the first, training sample, which is actually pretty good. If it were me, I would allow my concern to grow when that percentage difference hits 30% or higher, but that’s not a strict rule.
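If you find yourself making this percentage comparison repeatedly, you could wrap the calculation in a small convenience function. Note that `pct_increase` is my own helper name, not a function from any package, and the example values below are hypothetical RMSE values:

```
# Convenience helper (my own, not from a package): percentage by which
# a new error estimate exceeds an old (baseline) error estimate
pct_increase <- function(new, old) {
  (new - old) / old * 100
}

# Example with hypothetical RMSE values for a validation and training sample
pct_increase(new = .662, old = .583)
```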

The **mean absolute error (MAE)** is another common indicator of a model’s fit to data. The MAE is just the average of the errors’ (residuals’) absolute values. We want the difference between MAE values to be as small as possible.

```
# MAE (Mean Absolute Error) for first sample
mean(abs(model1$residuals))
```

`## [1] 0.4535265`

```
# MAE (Mean Absolute Error) for second sample
MAE(predictions, Sample2$PerfEval)
```

`## [1] 0.5213717`

As we did with the RMSE values, let’s calculate how much larger the MAE is when the model is applied to the second sample than the first sample.

```
# Percentage difference in MAE (Mean Absolute Error) values
(MAE(predictions, Sample2$PerfEval) - mean(abs(model1$residuals))) /
  mean(abs(model1$residuals)) * 100
```

`## [1] 14.95948`

We see that the MAE for the second, validation sample is 15.0% larger than the MAE for the first, training sample, which again is actually pretty good.

Finally, let’s estimate the 95% prediction intervals around each fitted (predicted) value based on our model (`model1`) when applied to the second, validation sample (`Sample2`). We will use the `predict` function from base R. As the first argument, enter the name of your model object (`model1`). As the second argument, type `newdata=` followed by the name of the data frame associated with the second sample (`Sample2`). As the third argument, type `interval="prediction"` to indicate that you want to estimate prediction intervals; the default is the 95% prediction interval. Let’s assign the prediction interval and fitted values to an object called `pred.int` so that we can append those vectors to the second sample data frame, which we do in the following step using the `cbind` function from base R. Use the `head` function from base R to view the first 6 rows of your updated `Sample2` data frame and to marvel at your work.

```
# Estimate 95% prediction intervals
pred.int <- predict(model1, newdata=Sample2, interval="prediction")

# Join fitted (predicted) values and upper and lower prediction interval values to data frame
Sample2 <- cbind(Sample2, pred.int)

# View first 6 rows
head(Sample2)
```

```
## PerfEval WorkSample SJT Consc fit lwr upr
## 1 4.0 6.7 4.4 4.3 4.564170 3.396242 5.732099
## 2 4.3 9.0 3.7 1.7 4.556964 3.349621 5.764306
## 3 3.9 6.8 5.6 4.5 4.736498 3.570208 5.902788
## 4 5.6 7.6 6.8 5.2 5.224037 4.051437 6.396638
## 5 4.5 4.5 6.9 3.8 4.025462 2.840158 5.210766
## 6 4.9 8.1 7.4 4.3 5.205726 4.032809 6.378642
```

Unlike the confidence interval, the prediction interval is specific to each value and represents uncertainty around specific values of the outcome. In contrast, the confidence interval represents uncertainty around the point estimate of a regression coefficient associated with the relation between a predictor variable and the outcome variable. Thus, there is a single confidence interval for a regression coefficient, but there is a different prediction interval for each fitted (predicted) value of the outcome variable.
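Relatedly, base R’s `predict` function can also return confidence intervals for the mean predicted value, which are narrower than prediction intervals for individual values. Here is a small self-contained illustration using the built-in `mtcars` data (an arbitrary dataset chosen for demonstration purposes only):

```
# Fit a simple model on built-in data (for illustration only)
m <- lm(mpg ~ wt, data = mtcars)
new_obs <- data.frame(wt = 3.0)

# 95% confidence interval: uncertainty around the mean predicted value
conf_int <- predict(m, newdata = new_obs, interval = "confidence")

# 95% prediction interval: uncertainty around an individual new value
pred_int <- predict(m, newdata = new_obs, interval = "prediction")

conf_int
pred_int  # the prediction interval is wider than the confidence interval
```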

**Technical Write-Up:** We performed a concurrent validation study based on a sample of 182 job incumbents. Given that all three selection tools (i.e., conscientiousness inventory, work sample, situational judgment test) showed evidence of incremental validity, we included all three in a multiple linear regression model, with performance evaluation scores as the criterion. All three selection tools showed evidence of incremental validity with respect to one another. To understand how well this estimated model might generalize to the broader population, we performed empirical cross-validation by assessing the fit and predictive accuracy of the model in a second sample drawn from the same population. The squared cross-validity (\(\rho^2_{c}\)) was .341, which indicates that the model explained 34.1% of the variance in the criterion when applied to the second sample, which was only a slight decrease in performance compared to the original sample (\(R^2_{unadj}\) = .359, \(R^2_{adj}\) = .348). Further, estimates of root mean square error (RMSE) indicated that the RMSE for the second sample is 13.6% larger than the RMSE for the first sample, which indicates that prediction errors didn’t increase substantially when applying the model to the second sample. Overall, the model performed relatively well applied to a second sample of data, and it doesn’t appear as though we overfit the model when estimating it using the first sample.

### 43.2.5 Summary

In this chapter, we learned how to statistically and empirically cross-validate a regression model. To perform statistical cross-validation, we estimated the cross-validity based on the *R*-squared value, sample size, and number of predictor variables in the model. To perform empirical cross-validation, we fit our regression model using one sample, and then estimated how well that model fit data from a second sample.

### References

Browne, M. W. 1975. “Predictive Validity of a Linear Regression Equation.” *British Journal of Mathematical and Statistical Psychology* 28 (1): 79–87.

Cascio, W. F., and H. Aguinis. 2005. *Applied Psychology in Human Resource Management (6th Ed.)*. Upper Saddle River, NJ: Pearson Prentice Hall.

Kuhn, M. 2021. *Caret: Classification and Regression Training*. https://CRAN.R-project.org/package=caret.

Raju, N. S., R. Bilgic, J. E. Edwards, and P. F. Fleer. 1997. “Methodology Review: Estimation of Population Validity and Cross-Validity, and the Use of Equal Weights in Prediction.” *Applied Psychological Measurement* 21 (4): 291–305.

Wherry, R. J. 1931. “A New Formula for Predicting the Shrinkage of the Coefficient of Multiple Correlation.” *The Annals of Mathematical Statistics* 2 (4): 440–57.

Wickham, H., J. Hester, and J. Bryan. 2022. *Readr: Read Rectangular Text Data*. https://CRAN.R-project.org/package=readr.