Chapter 41 Applying a Noncompensatory Approach to Selection Decisions Using the Angoff Method

In this chapter, we will learn how to apply a noncompensatory approach to making selection decisions by using the Angoff Method. We’ll begin with a conceptual overview of the noncompensatory approach and Angoff Method, and we’ll conclude with a tutorial.

41.1 Conceptual Overview

In general, there are three overarching (and mutually non-exclusive) approaches to making selection decisions: (a) compensatory (e.g., multiple linear regression), (b) noncompensatory (e.g., multiple-cutoff), and (c) multiple-hurdle. These three approaches can be mixed and matched to fit the selection-decision needs of an organization. In the previous chapter, we focused on a compensatory approach using multiple linear regression, and in this chapter, we will focus on the multiple-cutoff noncompensatory approach using the Angoff Method.

41.1.1 Review of Noncompensatory Approach

Link to conceptual video: https://youtu.be/2BMJNLptVPw

A noncompensatory approach to making selection decisions involves the application of multiple cutoff scores, where cutoff scores are sometimes referred to as critical scores or just cut scores. A noncompensatory approach signals that an applicant’s score on one selection tool cannot compensate for their score on another selection tool. A common noncompensatory approach involves setting a separate cutoff score for each selection tool. Naive, judgmental, and empirical methods exist for setting cutoff scores (Mueller et al. 2007). Of the judgmental methods, the Angoff Method (Angoff 1971) is one of the better known. Briefly, this method requires that a set of subject matter experts (SMEs) estimate the probability that a minimally qualified applicant would respond correctly to an item (e.g., question) from a selection tool (e.g., assessment, test). Assuming a multi-item selection tool, the SMEs’ probability estimates are then averaged for each item, and the cutoff score is then computed by summing the average SME probability estimates across items. Conceptually, the resulting cutoff score is thought to represent the mean of the distribution of overall selection tool scores for minimally qualified applicants.

Only relatively simple arithmetic (e.g., mean, sum) is needed when applying the Angoff Method for two or more selection tools as part of a multiple-cutoff approach to making selection decisions. That being said, sometimes it helps to practice prepping the data and applying such arithmetic along with logical expressions to construct a simple algorithm.
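As a minimal sketch of that arithmetic, the example below uses a small, entirely hypothetical set of ratings (three SMEs rating four items); the object names and values are made up for illustration and are not part of this chapter's data files.

```r
# Hypothetical SME probability estimates: 3 SMEs (rows) by 4 items (columns)
example_ratings <- matrix(
  c(0.80, 0.60, 0.50, 0.90,
    0.70, 0.70, 0.40, 0.80,
    0.90, 0.50, 0.60, 0.70),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("SME", 1:3), paste0("item", 1:4))
)

# Step 1: average the SMEs' estimates for each item
example_item_means <- colMeans(example_ratings)

# Step 2: sum the item averages to obtain the cutoff score
example_cutoff <- sum(example_item_means)

print(example_item_means)  # item means: 0.8, 0.6, 0.5, 0.8
print(example_cutoff)      # approximately 2.7
```

With four items scored 0 or 1, a cutoff of about 2.7 means an applicant would need at least three correct answers to pass.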

41.2 Tutorial

This chapter’s tutorial demonstrates how to apply a noncompensatory approach to making selection decisions by using the Angoff Method.

41.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/OBiY73Pruao

41.2.2 Functions & Packages Introduced

Function Package
colMeans base R
sum base R
c base R
rowSums base R
ifelse base R
is.na base R

41.2.3 Initial Steps

If you haven’t already, save the file called “angoff_sme.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “angoff_sme.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
df1 <- read_csv("angoff_sme.csv")
## Rows: 5 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (9): SME, CogAb_i1, CogAb_i2, CogAb_i3, CogAb_i4, CogAb_i5, KnowTest_i1, KnowTest_i2, KnowTest_i3
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) object
names(df1)
## [1] "SME"         "CogAb_i1"    "CogAb_i2"    "CogAb_i3"    "CogAb_i4"    "CogAb_i5"    "KnowTest_i1" "KnowTest_i2"
## [9] "KnowTest_i3"
# View variable type for each variable in data frame
str(df1)
## spc_tbl_ [5 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ SME        : num [1:5] 1 2 3 4 5
##  $ CogAb_i1   : num [1:5] 0.78 0.85 0.76 0.75 0.81
##  $ CogAb_i2   : num [1:5] 0.72 0.72 0.7 0.69 0.69
##  $ CogAb_i3   : num [1:5] 0.67 0.6 0.59 0.59 0.63
##  $ CogAb_i4   : num [1:5] 0.53 0.61 0.52 0.56 0.59
##  $ CogAb_i5   : num [1:5] 0.44 0.46 0.49 0.47 0.45
##  $ KnowTest_i1: num [1:5] 0.69 0.8 0.69 0.67 0.75
##  $ KnowTest_i2: num [1:5] 0.7 0.72 0.69 0.69 0.79
##  $ KnowTest_i3: num [1:5] 0.72 0.76 0.78 0.77 0.73
##  - attr(*, "spec")=
##   .. cols(
##   ..   SME = col_double(),
##   ..   CogAb_i1 = col_double(),
##   ..   CogAb_i2 = col_double(),
##   ..   CogAb_i3 = col_double(),
##   ..   CogAb_i4 = col_double(),
##   ..   CogAb_i5 = col_double(),
##   ..   KnowTest_i1 = col_double(),
##   ..   KnowTest_i2 = col_double(),
##   ..   KnowTest_i3 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# View first 6 rows of data frame
head(df1)
## # A tibble: 5 × 9
##     SME CogAb_i1 CogAb_i2 CogAb_i3 CogAb_i4 CogAb_i5 KnowTest_i1 KnowTest_i2 KnowTest_i3
##   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>       <dbl>       <dbl>       <dbl>
## 1     1     0.78     0.72     0.67     0.53     0.44        0.69        0.7         0.72
## 2     2     0.85     0.72     0.6      0.61     0.46        0.8         0.72        0.76
## 3     3     0.76     0.7      0.59     0.52     0.49        0.69        0.69        0.78
## 4     4     0.75     0.69     0.59     0.56     0.47        0.67        0.69        0.77
## 5     5     0.81     0.69     0.63     0.59     0.45        0.75        0.79        0.73

The data frame contains 9 variables and 5 observations (i.e., SMEs). The SME variable contains unique identifiers for the subject matter experts (SMEs) who estimated the probability (for each item) that a minimally qualified applicant would answer correctly (i.e., Angoff Method). The variables labeled CogAb_i1 through CogAb_i5 correspond to five cognitive ability test items that were designed to have increasing levels of difficulty, with the first item (CogAb_i1) being the easiest and the fifth item (CogAb_i5) being the most difficult. The variables labeled KnowTest_i1 through KnowTest_i3 correspond to three knowledge test items.
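Because the Angoff Method treats each rating as a probability, it can be worth confirming that every SME estimate falls between 0 and 1 and that none are missing before computing cutoff scores. The sketch below is one possible check, not a required step; it rebuilds the ratings from the output shown above in a separate object (sme_check, a name chosen here for illustration) so that it runs on its own.

```r
# SME ratings copied from the output above into a stand-alone data frame
sme_check <- data.frame(
  SME         = 1:5,
  CogAb_i1    = c(0.78, 0.85, 0.76, 0.75, 0.81),
  CogAb_i2    = c(0.72, 0.72, 0.70, 0.69, 0.69),
  CogAb_i3    = c(0.67, 0.60, 0.59, 0.59, 0.63),
  CogAb_i4    = c(0.53, 0.61, 0.52, 0.56, 0.59),
  CogAb_i5    = c(0.44, 0.46, 0.49, 0.47, 0.45),
  KnowTest_i1 = c(0.69, 0.80, 0.69, 0.67, 0.75),
  KnowTest_i2 = c(0.70, 0.72, 0.69, 0.69, 0.79),
  KnowTest_i3 = c(0.72, 0.76, 0.78, 0.77, 0.73)
)

# All columns except the SME identifier hold probability estimates
rating_cols <- setdiff(names(sme_check), "SME")

# Are any ratings missing?
ratings_missing <- anyNA(sme_check[, rating_cols])

# Do all ratings fall within the valid probability range of 0 to 1?
ratings_in_range <- all(
  sme_check[, rating_cols] >= 0 & sme_check[, rating_cols] <= 1
)

print(ratings_missing)   # FALSE
print(ratings_in_range)  # TRUE
```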

41.2.4 Create Cutoff Scores

To create cutoff scores for the cognitive ability and knowledge tests, we’ll nest the colMeans function within the sum function, where both functions are from base R. Let’s begin by creating a cutoff score for the cognitive ability test based on its five items: CogAb_i1, CogAb_i2, CogAb_i3, CogAb_i4, and CogAb_i5.

  1. Specify a unique name for an object that we can subsequently assign the cognitive ability test cutoff score to. Below, I name this object CogAb_cutoff, but you could name it whatever makes sense to you.
  2. To the right of the object name you specified, type the <- operator, which will allow us to assign to the object the value that results from the operations we will type to the right of the <- operator (in the subsequent steps).
  3. Type the name of the sum function. The sum function computes the sum for a vector of scores.
  4. As the sole argument within the sum function, specify the colMeans function. The colMeans function computes the means for all columns within a data frame – or a subset of columns if we reference specific variables.
  5. As the sole argument in the colMeans function, we’ll use bracket (i.e., matrix) notation to specify the data frame name and the specific variables we wish to estimate the column means for. First, type the name of the data frame object (df1), and follow that object name up with brackets ([ ]). Using bracket notation, we can reference columns by typing a comma (,), followed by a vector containing the columns (i.e., variables) we wish to reference. To specify that vector of column (variable) names, we can use the c function from base R, and within that function, specify each variable’s name in quotation marks (" "), and separate variable names using commas.
# Create cutoff score for cognitive ability test
CogAb_cutoff <- sum(
  colMeans(
    df1[, c("CogAb_i1","CogAb_i2","CogAb_i3","CogAb_i4","CogAb_i5")]
    )
)
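To see what the nested call is doing, it may help to run the two steps separately. The sketch below copies the five cognitive ability item ratings from the output shown earlier into a stand-alone data frame (cog_ratings, a name used only for this illustration) so that it does not overwrite any of the tutorial's objects.

```r
# Cognitive ability item ratings copied from the SME data shown earlier
cog_ratings <- data.frame(
  CogAb_i1 = c(0.78, 0.85, 0.76, 0.75, 0.81),
  CogAb_i2 = c(0.72, 0.72, 0.70, 0.69, 0.69),
  CogAb_i3 = c(0.67, 0.60, 0.59, 0.59, 0.63),
  CogAb_i4 = c(0.53, 0.61, 0.52, 0.56, 0.59),
  CogAb_i5 = c(0.44, 0.46, 0.49, 0.47, 0.45)
)

# Inner step: mean SME estimate for each item
cog_item_means <- colMeans(cog_ratings)
print(cog_item_means)  # 0.790, 0.704, 0.616, 0.562, 0.462

# Outer step: sum of the item means, which is the cutoff score
cog_cutoff_check <- sum(cog_item_means)
print(cog_cutoff_check)  # approximately 3.134
```

The item means decrease from the first item to the fifth, which is consistent with the items having been designed to increase in difficulty.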

Next, let’s apply the same process as above to the three knowledge test items in order to create a cutoff score for the knowledge test: KnowTest_i1, KnowTest_i2, and KnowTest_i3.

# Create cutoff score for knowledge test
KnowTest_cutoff <- sum(
  colMeans(
    df1[, c("KnowTest_i1","KnowTest_i2","KnowTest_i3")]
    )
)
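If you find yourself repeating this pattern for several selection tools, one option is to wrap it in a small helper function. The function below (angoff_cutoff) is a hypothetical convenience wrapper written for this illustration, not a function from base R or any package; it assumes that all items belonging to a tool share a common name prefix, as they do in this chapter's data.

```r
# Hypothetical helper: sum of column means for all columns whose names
# start with a given prefix
angoff_cutoff <- function(data, prefix) {
  item_cols <- startsWith(names(data), prefix)
  stopifnot(any(item_cols))  # stop if no columns match the prefix
  sum(colMeans(data[, item_cols, drop = FALSE]))
}

# SME ratings copied from the output shown earlier
sme_ratings <- data.frame(
  SME         = 1:5,
  CogAb_i1    = c(0.78, 0.85, 0.76, 0.75, 0.81),
  CogAb_i2    = c(0.72, 0.72, 0.70, 0.69, 0.69),
  CogAb_i3    = c(0.67, 0.60, 0.59, 0.59, 0.63),
  CogAb_i4    = c(0.53, 0.61, 0.52, 0.56, 0.59),
  CogAb_i5    = c(0.44, 0.46, 0.49, 0.47, 0.45),
  KnowTest_i1 = c(0.69, 0.80, 0.69, 0.67, 0.75),
  KnowTest_i2 = c(0.70, 0.72, 0.69, 0.69, 0.79),
  KnowTest_i3 = c(0.72, 0.76, 0.78, 0.77, 0.73)
)

cog_cutoff_helper  <- angoff_cutoff(sme_ratings, "CogAb_")
know_cutoff_helper <- angoff_cutoff(sme_ratings, "KnowTest_")

print(cog_cutoff_helper)   # approximately 3.134
print(know_cutoff_helper)  # approximately 2.19
```

Selecting columns by prefix avoids retyping each item name, but it will silently include any other column that happens to share the prefix, so the explicit c() approach shown above remains the safer choice when variable names are less tidy.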

41.2.5 Apply Cutoff Scores to Make Selection Decisions

After creating the cutoff scores for our two selection tools, we’re ready to apply these cutoff scores to data collected from applicants, which will result in an algorithm of sorts. The name of the data file containing the applicant data on the cognitive ability and knowledge tests is “angoff_applicant.csv”. Below, I read in the data as a data frame and assign the data frame to an object that I’m calling df2.

# Read in data
df2 <- read.csv("angoff_applicant.csv")

Let’s print this data frame to our Console window by using the print function from base R.

# Print data frame object
print(df2)
##    Applicant_ID CogAb_i1 CogAb_i2 CogAb_i3 CogAb_i4 CogAb_i5 KnowTest_i1 KnowTest_i2 KnowTest_i3
## 1           AA1        0        0        1        0        0           0           0           1
## 2           AA2        1        1        1        1        0           1           1           1
## 3           AA3        1        1        0        0        0           1           0           1
## 4           AA4        1        0        0        1        0           0           0           0
## 5           AA5        1        0        1        0        1           1           1           0
## 6           AA6        1        1        1        0        0           0           1           1
## 7           AA7        0        0        0        0        0           0           1           1
## 8           AA8        1        1        0        1        0           0           0           1
## 9           AA9        0        1        1        0        0           1           1           1
## 10         AA10        1        1        1        1        1           1           1           1
## 11         AA11        1        0        0        0        0           1           0           1
## 12         AA12        1        0        1        0        0           1           0           0
## 13         AA13        1        1        1        0        1           1           1           1
## 14         AA14        1        1        0        0        0           0           0           0
## 15         AA15        1        0        1        0        0           0           1           1
## 16         AA16        1        1        0        0        1           1           0           1
## 17         AA17        1        1        0        1        0           1           1           1
## 18         AA18        1        1        0        0        1           1           0           1
## 19         AA19        1        0        0        0        0           1           0           1
## 20         AA20        0        1        0        1        0           1           1           0

The data frame contains 9 variables and 20 observations (i.e., applicants). The Applicant_ID variable contains unique identifiers for the applicants who completed each of the items for the cognitive ability and knowledge tests. The variables labeled CogAb_i1 through CogAb_i5 correspond to five cognitive ability test items that were designed to have increasing levels of difficulty, with the first item (CogAb_i1) being the easiest and the fifth item (CogAb_i5) being the most difficult. The variables labeled KnowTest_i1 through KnowTest_i3 correspond to three knowledge test items. Each of the items consists of scores of 1 and 0, where 1 indicates that an applicant answered the item correctly and 0 indicates that an applicant answered the item incorrectly.

Using the applicant data (df2), we need to compute the overall score for each applicant on each of the two selection tools. Let’s begin with the cognitive ability test items: CogAb_i1, CogAb_i2, CogAb_i3, CogAb_i4, and CogAb_i5.

  1. Type the name of the second data frame object containing the applicant data (df2). Next, insert the $ operator and then specify a unique name for a variable to which we can subsequently assign the cognitive ability test overall scores. Because we’re going to compute an overall score based on the sum of the number of items each applicant answered correctly, I decided to name this variable CogAb_sum.
  2. To the right of the new variable name, type the <- operator, which will allow us to assign to the object the vector that results from the operations we will type to the right of the <- operator (in the subsequent steps).
  3. Type the name of the rowSums function from base R. The rowSums function computes the sum for each row.
  4. As the first argument in the rowSums function, we’ll use bracket (i.e., matrix) notation to specify the data frame name and the specific variables we wish to compute the row sums for. First, type the name of the data frame object (df2), and follow that object name up with brackets ([ ]). Using bracket notation, we can reference columns by typing a comma (,), followed by a vector containing the columns (i.e., variables) we wish to reference. To specify that vector of column (variable) names, we can use the c function from base R, and within that function, specify each variable’s name in quotation marks (" "), and separate variable names using commas.
  5. As the second argument in the rowSums function, type the argument na.rm=TRUE, which will tell the function to ignore missing data when computing the sum for each row.
# Compute the overall (sum) cognitive ability test score for each applicant
df2$CogAb_sum <- rowSums(
  df2[,c("CogAb_i1","CogAb_i2","CogAb_i3","CogAb_i4","CogAb_i5")],
  na.rm=TRUE)

Next, repeat the same process for the three knowledge test items: KnowTest_i1, KnowTest_i2, and KnowTest_i3.

# Compute the overall (sum) knowledge test score for each applicant
df2$KnowTest_sum <- rowSums(
  df2[,c("KnowTest_i1","KnowTest_i2","KnowTest_i3")],
  na.rm=TRUE)
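The applicant data shown above contain no missing responses, but it is worth knowing how na.rm=TRUE behaves when they do occur: a skipped item contributes nothing to the sum, so it is effectively scored as incorrect, and an applicant who skipped every item receives a sum of 0 rather than NA. The sketch below illustrates this with a small hypothetical data frame (the object and column names are made up for this example).

```r
# Hypothetical item responses for three applicants, with some missing values
na_demo <- data.frame(
  i1 = c(1, 1, NA),
  i2 = c(1, NA, NA),
  i3 = c(0, 1, NA)
)

# Default behavior: any missing item makes the row sum NA
sums_default <- rowSums(na_demo)
print(sums_default)  # 2 NA NA

# With na.rm=TRUE: missing items are ignored (treated like 0)
sums_na_rm <- rowSums(na_demo, na.rm = TRUE)
print(sums_na_rm)    # 2 2 0

# Count of missing items per applicant, useful for flagging incomplete tests
n_missing <- rowSums(is.na(na_demo))
print(n_missing)     # 0 1 3
```

Whether a skipped item should count as incorrect is a scoring decision; counting missing responses per applicant, as in the last step, is one way to flag cases that deserve a closer look.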

We’re now ready to apply the cutoff scores to applicants’ scores on the two selection tools. Let’s begin by applying the cutoff score for the cognitive ability test to the applicants’ overall scores on the cognitive ability test.

  1. Type the name of the second data frame object containing the applicant data (df2). Next, insert the $ operator and then specify a unique name for a variable to which we can subsequently assign the cognitive ability test pass vs. fail scores (CogAb_pass).
  2. To the right of the new variable name, type the <- operator, which will allow us to assign to the object the vector that results from the operations we will type to the right of the <- operator (in the subsequent steps).
  3. Type the name of the ifelse function from base R. The ifelse function allows us to apply if/else logical arguments to a vector of values (such as a variable).
  4. As the first argument in the ifelse function, specify a logical expression that applies the cutoff score to the applicants’ overall scores for the cognitive ability test. Specifically, we want to make a logical expression in which the overall cognitive ability test scores are greater than or equal to the cutoff score object we created previously: df2$CogAb_sum >= CogAb_cutoff.
  5. As the second argument in the ifelse function, provide a value that will be generated should the logical expression in the first argument be true for a particular applicant. If an applicant’s overall cognitive ability test score is greater than or equal to the cutoff score, then let’s type "Pass" as the resulting value.
  6. As the third argument in the ifelse function, provide a value that will be generated should the logical expression in the first argument be false for a particular applicant. If an applicant’s overall cognitive ability test score is not greater than or equal to the cutoff score, then let’s type "Fail" as the resulting value.
# Create variable containing cognitive ability test pass/fail scores
df2$CogAb_pass <- ifelse(
  df2$CogAb_sum >= CogAb_cutoff,
  "Pass",
  "Fail"
  )

Next, let’s apply the same process to the knowledge test overall scores.

# Create variable containing knowledge test pass/fail scores
df2$KnowTest_pass <- ifelse(
  df2$KnowTest_sum >= KnowTest_cutoff,
  "Pass",
  "Fail"
  )
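Because Angoff cutoff scores are sums of averaged probabilities, they are rarely whole numbers, whereas applicants' sum scores always are. In practice, then, a cutoff behaves like the next whole number above it. The sketch below illustrates this, assuming a cognitive ability cutoff of roughly 3.134 (the sum of the five item means from the SME ratings shown earlier); the object names are used only for this illustration.

```r
# Assumed cutoff value (sum of the five cognitive ability item means)
cutoff_demo <- 3.134

# Every possible sum score on a five-item test scored 0/1
possible_scores <- 0:5

# Pass/fail using the raw cutoff
pass_raw <- ifelse(possible_scores >= cutoff_demo, "Pass", "Fail")

# Pass/fail using the cutoff rounded up to the next whole number
pass_rounded_up <- ifelse(
  possible_scores >= ceiling(cutoff_demo), "Pass", "Fail"
)

print(data.frame(possible_scores, pass_raw, pass_rounded_up))

# The two rules classify every possible score identically
print(identical(pass_raw, pass_rounded_up))  # TRUE
```

Under this assumed cutoff, an applicant needs at least four of the five cognitive ability items correct to pass.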

We are now ready to start making some overall selection decisions using what’s referred to as a multiple-cutoff approach. Our goal is to create a new variable that indicates which applicants passed both selection tools based on the cutoff scores we previously applied.

  1. Type the name of the second data frame object containing the applicant data (df2). Next, insert the $ operator and then specify a unique name for a variable to which we can subsequently assign the overall pass vs. fail scores (Overall_pass) based on our multiple-cutoff approach.
  2. To the right of the new variable name, type the <- operator, which will allow us to assign to the object the vector that results from the operations we will type to the right of the <- operator (in the subsequent steps).
  3. Type the name of the ifelse function from base R.
  4. As the first argument in the ifelse function, specify a logical argument with a logical & (AND) operator to specify that in order for an applicant to pass both selection tools, they need to earn passing scores on both. Specifically, we want to make a logical expression in which applicants’ scores on pass/fail variables we created above are used to flag those individuals who earned a score of “Pass” on both: df2$CogAb_pass == "Pass" & df2$KnowTest_pass == "Pass".
  5. As the second argument in the ifelse function, provide a value that will be generated should the logical expression in the first argument be true for a particular applicant. If an applicant satisfies both of those logical expressions in the previous argument, let’s assign them a "Pass" score.
  6. As the third argument in the ifelse function, provide a value that will be generated should one or both of the logical expressions in the first argument be false for a particular applicant. If an applicant does not satisfy both of those logical expressions in the first argument, let’s assign them a "Fail" score.
# Create variable containing overall pass/fail scores based on both tools
df2$Overall_pass <- ifelse(
  df2$CogAb_pass == "Pass" & df2$KnowTest_pass == "Pass",
  "Pass",
  "Fail"
  )
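The & operator works element by element, so the overall decision is "Pass" only when both tool-level decisions are "Pass". The sketch below shows all four possible combinations using short hypothetical vectors rather than the applicant data.

```r
# Hypothetical pass/fail results covering every combination
cog_demo  <- c("Pass", "Pass", "Fail", "Fail")
know_demo <- c("Pass", "Fail", "Pass", "Fail")

# Overall decision: pass only if both tools were passed
overall_demo <- ifelse(
  cog_demo == "Pass" & know_demo == "Pass",
  "Pass",
  "Fail"
)

# Display the combinations side by side
print(data.frame(cog_demo, know_demo, overall_demo))

# Tally the overall decisions
print(table(overall_demo))
```

Only the first combination yields an overall "Pass", which is what makes the approach noncompensatory: a strong result on one tool cannot offset a failing result on the other.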

Using the ifelse function once more, let’s create a vector containing the applicant unique identifier numbers for those who passed the multiple-cutoff selection process and NA (missing) for everyone else who failed.

  1. Come up with a unique name for an object to which we can assign the vector of applicant unique ID numbers and NAs so that we can reference it later. Here, I name the object Applicant_ID_pass.
  2. To the right of the new object name, type the <- operator, which will allow us to assign to the object the vector that results from the ifelse function.
  3. Type the name of the ifelse function from base R.
  4. As the first argument in the ifelse function, specify a logical expression in which scores on the Overall_pass variable we created above are equal to the value “Pass”: df2$Overall_pass == "Pass".
  5. As the second argument in the ifelse function, provide a value that will be generated should the logical expression in the first argument be true for a particular applicant. If an applicant satisfies the logical expression in the first argument, let’s reference their applicant unique identifier number (Applicant_ID) from the applicant data frame object.
  6. As the third argument in the ifelse function, simply enter NA. Should the logical expression in the first argument be false, then the applicant will receive a missing value.
# Create a vector of the applicant unique identifiers for those who passed
# and NAs for those who failed the multiple-cutoff selection process
Applicant_ID_pass <- ifelse(
  df2$Overall_pass == "Pass", 
  df2$Applicant_ID, 
  NA)

Finally, from the Applicant_ID_pass vector object we created above, let’s drop all NA values.

  1. Type the name of the vector we created above that contains the applicant unique identifier numbers for those who passed the multiple-cutoff selection process and NAs for those who failed (Applicant_ID_pass).
  2. Following the name of the Applicant_ID_pass vector object, insert brackets ([ ]).
  3. Within the brackets ([ ]), type the not (!) operator followed by the is.na function from base R. The is.na function returns a TRUE if a value is NA and a FALSE if the value is not NA. By preceding that function with the not (!) operator, we can flip that logic, such that those with an NA will effectively receive a FALSE value.
  4. As the sole parenthetical argument within the is.na function, type the exact name of the same vector object from the first step (Applicant_ID_pass).
# Retain only the applicant unique identifier numbers for those who passed
# the multiple-cutoff selection process
Applicant_ID_pass[!is.na(Applicant_ID_pass)]
## [1] "AA2"  "AA10" "AA13"

In your Console, you should see three applicant unique identifier numbers: AA2, AA10, and AA13. These are the applicants who passed the multiple-cutoff selection process. Depending on the overall design of the selection system, these are the three applicants who should either be given job offers or be passed along to the next phase of the selection system (e.g., interview).
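As an alternative to the two-step ifelse and is.na approach, the same list of identifiers can be obtained in one step by subsetting the identifier variable with a logical expression. The sketch below uses a small stand-alone data frame (pass_demo, a name chosen for this illustration) containing five of the applicants and their overall results, so that it runs without the rest of the tutorial's objects.

```r
# Five applicants and their overall pass/fail results
pass_demo <- data.frame(
  Applicant_ID = c("AA1", "AA2", "AA10", "AA13", "AA17"),
  Overall_pass = c("Fail", "Pass", "Pass", "Pass", "Fail")
)

# Keep only the identifiers for which the logical expression is TRUE
passed_ids <- pass_demo$Applicant_ID[pass_demo$Overall_pass == "Pass"]

print(passed_ids)  # "AA2" "AA10" "AA13"
```

One caveat with this shortcut: if the pass/fail variable itself contained NA values, the logical subset would return NA entries for those rows, so the is.na approach (or wrapping the expression in the which function) would still be useful in that situation.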

41.2.6 Summary

In this chapter, we learned how to apply common functions from base R to make selection decisions from a multiple-cutoff selection process that uses the Angoff Method, where the application of multiple cutoffs is a noncompensatory approach.

References

Angoff, W H. 1971. “Scales, Norms, and Equivalent Scores.” In Educational Measurement, edited by R L Thorndike, 508–600. Washington, DC: American Council on Education.
Mueller, Lorin, Dwayne Norris, Scott Oppler, and SM McPhail. 2007. “Implementation Based on Alternate Validation Procedures: Ranking, Cut Scores, Banding, and Compensatory Models.” Alternative Validation Strategies: Developing New and Leveraging Existing Validity Evidence, 349–405.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.