Chapter 25 Summarizing Two or More Categorical Variables Using Cross-Tabulations

In this chapter, we will learn how to summarize two or three categorical (nominal, ordinal) employee-demographic variables using cross-tabulations, and we’ll conclude the chapter with a tutorial.

25.1 Conceptual Overview

In this section, we’ll review the purpose of cross-tabulations and how they can be useful for summarizing data from two or three categorical (nominal, ordinal) variables, followed by a sample write-up of cross-tabulation results.

25.1.1 Review of Cross-Tabulation

A cross-tabulation is a specific type of table. In its simplest form, a table is an object in which data are stored in rows and columns; in the context of data visualization, a table is sometimes referred to as a tabular display. Broadly speaking, in the R environment you can think of a data frame as a specific type of data table. When we create a table involving two or more categorical variables, we often refer to the table as a cross-tabulation (or cross-tabs). More specifically, cross-tabulation is the process of creating a table from two or more categorical variables (i.e., dimensions) in which the frequencies (i.e., counts) of observations are displayed for each combination of the variables’ categories or levels. The table resulting from a cross-tabulation is sometimes referred to as a contingency table. Cross-tabulation is a relatively simple way to summarize data, but it serves as the foundation for the chi-square test of independence and allows us to derive insights via segmentation. Cross-tabulation is most useful when the variables involved are nominal or ordinal.
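As a minimal sketch of the idea, the R code below cross-tabulates two small, made-up vectors. The sex and veteran values are hypothetical and are not taken from this chapter’s data; each cell of the resulting contingency table counts the observations that share one combination of categories.

```r
# Hypothetical toy data (not from demodata): six employees, two categorical variables
sex     <- c("Female", "Male", "Female", "Male", "Male", "Female")
veteran <- c("No", "Yes", "No", "No", "Yes", "No")

# Cross-tabulate: each cell counts the employees in one Sex x Veteran combination
ct <- table(Sex = sex, Veteran = veteran)
ct

# The cell counts add back up to the number of observations
sum(ct)  # 6
```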

In this chapter, we focus on creating cross-tabulations to describe or summarize frequency (i.e., count) data from two or more categorical (i.e., nominal, ordinal) variables. Specifically, we will learn how to create different types of two-way and three-way cross-tabulations, where two-way means that we are summarizing two variables and three-way means that we are summarizing three variables.

As a side note, the Aggregating & Segmenting Data chapter provides additional approaches for creating descriptive or summary data tables (well, technically tibbles) using functions from the dplyr package; in that chapter, you can learn how to create tables when one variable is continuous (i.e., interval or ratio) and the other is categorical. Finally, please note that the data.table package has functions that convert data frames to a special kind of data table object that allows for faster and enhanced manipulations; if you’re interested, the data.table package documentation is a good place to learn more.

25.1.2 Sample Write-Up

Based on data stored in the organization’s HR information system, we sought to describe the organization’s employee demographics. The employee gender and race/ethnicity variables have nominal measurement scales, and thus we computed counts to describe these variables. Specifically, 321 employees identified as women and 300 as men. With respect to race/ethnicity, 192 employees identified as Hispanic/Latino and 429 as White. To describe how the gender and race/ethnicity variables relate to one another, we computed a two-way cross-tabulation. The results indicated that 77 (24%) women identified as Hispanic/Latino and 244 (76%) women identified as White. Additionally, 115 (38%) men identified as Hispanic/Latino and 185 (62%) men identified as White.
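The percentages in the write-up are row percentages: each count is divided by its gender total (321 women, 300 men). A minimal R sketch that rebuilds the 2x2 table from the counts reported above and recomputes those percentages might look like this; the object name writeup and the dimension labels are illustrative choices, not part of the original analysis.

```r
# Counts reported in the sample write-up (rows: gender; columns: race/ethnicity)
writeup <- as.table(matrix(c(77, 115, 244, 185), nrow = 2,
                           dimnames = list(Gender = c("Women", "Men"),
                                           RaceEthnicity = c("HispanicLatino", "White"))))
writeup

# Marginal totals: 321 women and 300 men; 192 Hispanic/Latino and 429 White
margin.table(writeup, 1)
margin.table(writeup, 2)

# Row percentages (each row sums to 100), rounded to whole numbers as in the write-up
pct <- round(100 * prop.table(writeup, 1))
pct
```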

25.2 Tutorial

This chapter’s tutorial demonstrates how to compute cross-tabulations for combinations of two and three categorical (nominal, ordinal) variables.

25.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/Ja_CM253oDQ

25.2.2 Functions & Packages Introduced

Function Package
table base R
prop.table base R
round base R
ftable base R
xtabs base R
CrossTable gmodels

25.2.3 Initial Steps

If you haven’t already, save the file called “EmployeeDemographics.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different from the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “EmployeeDemographics.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
demodata <- read_csv("EmployeeDemographics.csv")
## Rows: 163 Columns: 7
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Sex, RaceEthnicity, Veteran
## dbl (4): EmployeeID, OrgTenureYrs_2019, JobLevel, AgeYrs_2019
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) object
names(demodata)
## [1] "EmployeeID"        "OrgTenureYrs_2019" "JobLevel"          "Sex"               "RaceEthnicity"     "AgeYrs_2019"      
## [7] "Veteran"

Note in the data frame that the EmployeeID variable (i.e., column, field) is a unique identifier variable, and each row contains an individual employee’s demographic data on the following variables: organizational tenure (OrgTenureYrs_2019), job level (JobLevel), sex (Sex), race/ethnicity (RaceEthnicity), age (AgeYrs_2019), and veteran status (Veteran).

25.2.4 Two-Way Cross-Tabulation

A two-way cross-tabulation summarizes the association between two categorical (i.e., nominal, ordinal) variables. Using the data found in the data frame we named demodata, we will begin by creating two-way cross-tabulations using the categorical JobLevel and Sex variables. I’ll demonstrate how to create two-way cross-tabulations using three different functions (i.e., table, xtabs, CrossTable), and you can choose to follow along with all three or just one or two.

25.2.4.1 Option 1: Using the table Function

Using the table function from base R, let’s create a two-way cross-tabulation table containing frequencies based on the JobLevel and Sex variables contained in the demodata data frame object. To begin, type the name of the table function. As the first argument in the function, specify the name of the data frame object (demodata) followed by the $ operator and the name of the first variable (JobLevel). As the second argument, specify the name of the data frame object (demodata) followed by the $ operator and the name of the second variable (Sex).

# Create two-way cross-tabulation table from JobLevel and Sex variables
table(demodata$JobLevel, demodata$Sex)
##    
##     Female Male
##   1     47   35
##   2     22   22
##   3      7   11
##   4      3    7
##   5      1    8

As you can see, the levels of the JobLevel variable (i.e., 1-5) appear as the row labels, and the categories of the Sex variable (i.e., Female, Male) appear as the column labels. If we wanted the categories of the Sex variable to appear as the row labels and the levels of the JobLevel variable to appear as the column labels, we would reverse the order of the two variables in our table function.
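Two small refinements may help here. Calling table inside with (so the variables are referenced by bare name) labels the row and column dimensions with the variable names, and t transposes an existing table instead of rebuilding it. The sketch below uses a small hypothetical data frame named df, with invented values, as a stand-in for demodata.

```r
# Hypothetical stand-in for demodata (values invented for illustration)
df <- data.frame(JobLevel = c(1, 1, 2, 3, 1, 2),
                 Sex      = c("Female", "Male", "Female", "Male", "Female", "Male"))

# with() lets table() use the variable names as dimension labels
tab_js <- with(df, table(JobLevel, Sex))
tab_js

# Reversing the argument order puts Sex in the rows and JobLevel in the columns
tab_sj <- with(df, table(Sex, JobLevel))
tab_sj

# Transposing the first table with t() yields the same cell counts
all(t(tab_js) == tab_sj)  # TRUE
```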

Each “cell” in the cross-tabulation table contains the frequency (i.e., count) of employees who are in the intersecting categories. For example, the table shows that 47 female employees are in job level 1, whereas 35 male employees are in job level 1. Because there are five levels associated with the JobLevel variable (i.e., 1-5) and two levels associated with the Sex variable (i.e., Female, Male), the 5x2 cross-tabulation table has a total of 10 cells.
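To confirm the dimensions and look up individual cells programmatically, a sketch like the following rebuilds the table from the counts printed above and queries it with dim, length, and bracket indexing. The object is stored under the hypothetical name jl_sex so that it is not confused with the table_2 object created from demodata in the next step.

```r
# Rebuild the 5 x 2 table from the counts printed above
# (jl_sex is a hypothetical name; table_2 is created from demodata in the next step)
jl_sex <- as.table(matrix(c(47, 22, 7, 3, 1,    # Female, job levels 1-5
                            35, 22, 11, 7, 8),  # Male, job levels 1-5
                          nrow = 5,
                          dimnames = list(JobLevel = as.character(1:5),
                                          Sex = c("Female", "Male"))))

dim(jl_sex)              # 5 2 (five job levels by two sex categories)
length(jl_sex)           # 10 cells
jl_sex["1", "Female"]    # 47 female employees in job level 1
jl_sex["1", "Male"]      # 35 male employees in job level 1
```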

Using the same code as above, let’s assign the cross-tabulation table we created to an object that we’ll (arbitrarily) call table_2. We’ll use the <- operator to do this.

# Assign two-way cross-tabulation table to object
table_2 <- table(demodata$JobLevel, demodata$Sex)

Using the cross-tabulation table object we created above (table_2), we can apply the prop.table function from base R to compute the proportion of all observations that falls in each table cell. To calculate the cell proportions, simply type the name of the prop.table function, and as the only parenthetical argument, type the name of the cross-tabulation table object we created above (table_2).

# Present the cell proportions for the cross-tabulation table
prop.table(table_2)
##    
##          Female        Male
##   1 0.288343558 0.214723926
##   2 0.134969325 0.134969325
##   3 0.042944785 0.067484663
##   4 0.018404908 0.042944785
##   5 0.006134969 0.049079755

When inspecting the cell proportion table displayed above, you might find it challenging to read given the number of decimal places reported. To round values to 2 places after the decimal, let’s wrap the code from above in the round function from base R. As the first argument, let’s copy in our prop.table code from above, and as the second argument, let’s type the number of decimal places to which we wish to round (e.g., 2).

# Round values to 2 places after decimal
round(prop.table(table_2), 2)
##    
##     Female Male
##   1   0.29 0.21
##   2   0.13 0.13
##   3   0.04 0.07
##   4   0.02 0.04
##   5   0.01 0.05

If you were to sum all of the proportions in the table, you would get a total of 1 (100%). To demonstrate, let’s apply the sum function from base R by wrapping our prop.table function code in the sum function.

# Sum all of the cell proportions
sum(prop.table(table_2))
## [1] 1

As expected, all of the proportions in the table sum to 1 (100%).
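If you also want row and column totals alongside the cells, base R’s addmargins function (from the stats package, which R loads by default) appends a Sum row and a Sum column. The sketch below rebuilds table_2 from the counts printed earlier so that it runs on its own; if you already created table_2 from demodata, you can skip that first assignment. The object name m is an illustrative choice.

```r
# Stand-in for table_2, rebuilt from the counts printed above
table_2 <- as.table(matrix(c(47, 22, 7, 3, 1, 35, 22, 11, 7, 8), nrow = 5,
                           dimnames = list(JobLevel = as.character(1:5),
                                           Sex = c("Female", "Male"))))

# Append a "Sum" row (column totals) and a "Sum" column (row totals)
m <- addmargins(table_2)
m

# The same idea applied to the cell proportions: the grand total is 1
round(addmargins(prop.table(table_2)), 2)
```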

What if we wish to compute the proportions by row or by column? Well, to compute those row-by-row or column-by-column proportions, we simply add an argument to the prop.table function. To compute the row proportions, enter 1 as the second argument.

# Print the row proportions for the cross-tabulation table
prop.table(table_2, 1)
##    
##        Female      Male
##   1 0.5731707 0.4268293
##   2 0.5000000 0.5000000
##   3 0.3888889 0.6111111
##   4 0.3000000 0.7000000
##   5 0.1111111 0.8888889

In the output, you can see that the proportions in each row now sum to 1 (100%). Now, let’s round the row proportions to 2 places after the decimal.

# Round cross-tabulation table values to 2 places after decimal
round(prop.table(table_2, 1), 2)
##    
##     Female Male
##   1   0.57 0.43
##   2   0.50 0.50
##   3   0.39 0.61
##   4   0.30 0.70
##   5   0.11 0.89

Next, let’s convert those proportions to percentages by multiplying the output of the previous code by 100. Recall that the multiplication operator in R is the * symbol. Just be sure to remember that the values presented in the subsequent output represent percentages and not raw frequencies (i.e., counts).

# Convert proportions to percentages by multiplying by 100
100 * round(prop.table(table_2, 1), 2)
##    
##     Female Male
##   1     57   43
##   2     50   50
##   3     39   61
##   4     30   70
##   5     11   89

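Because the code above rounds the proportions before multiplying by 100, the percentages end up as whole numbers. We can retain two digits after the decimal by multiplying by 100 first and rounding second.

# Convert proportions to percentages, then round to 2 places after decimal
round(100 * prop.table(table_2, 1), 2)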
##    
##     Female  Male
##   1  57.32 42.68
##   2  50.00 50.00
##   3  38.89 61.11
##   4  30.00 70.00
##   5  11.11 88.89
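Under the hood, a row proportion is simply each cell count divided by its row total. The sketch below rebuilds table_2 from the printed counts so it runs on its own, checks this by hand, and confirms with rowSums that every row sums to 1. The names row_totals and by_hand are illustrative.

```r
# Stand-in for table_2, rebuilt from the counts printed above
table_2 <- as.table(matrix(c(47, 22, 7, 3, 1, 35, 22, 11, 7, 8), nrow = 5,
                           dimnames = list(JobLevel = as.character(1:5),
                                           Sex = c("Female", "Male"))))

# Row totals: the denominators behind prop.table(table_2, 1)
row_totals <- margin.table(table_2, 1)
row_totals

# Divide each cell by its row total; R recycles the length-5 vector down each column
by_hand <- table_2 / as.vector(row_totals)
all.equal(as.vector(by_hand), as.vector(prop.table(table_2, 1)))

# Each row of the row-proportion table sums to 1
rowSums(prop.table(table_2, 1))
```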

Alternatively, we can compute the column proportions by typing 2 instead of 1 in the second argument of the prop.table function. As you can see below, each column now adds up to 1 (100%).

# Print the column proportions for the cross-tabulation table
prop.table(table_2, 2)
##    
##         Female       Male
##   1 0.58750000 0.42168675
##   2 0.27500000 0.26506024
##   3 0.08750000 0.13253012
##   4 0.03750000 0.08433735
##   5 0.01250000 0.09638554

Now, let’s multiply by 100 to convert the proportions to percentages and round to 2 places after the decimal.

# Round table values to 2 places after decimal and convert to percentages
round(100 * prop.table(table_2, 2), 2)
##    
##     Female  Male
##   1  58.75 42.17
##   2  27.50 26.51
##   3   8.75 13.25
##   4   3.75  8.43
##   5   1.25  9.64
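The column version works the same way, with column totals as the denominators. As a sketch (again rebuilding table_2 from the printed counts), the base R sweep function divides every cell by the total of its column, which should match prop.table with margin = 2; the names col_totals and by_hand are illustrative.

```r
# Stand-in for table_2, rebuilt from the counts printed above
table_2 <- as.table(matrix(c(47, 22, 7, 3, 1, 35, 22, 11, 7, 8), nrow = 5,
                           dimnames = list(JobLevel = as.character(1:5),
                                           Sex = c("Female", "Male"))))

# Column totals: the denominators behind prop.table(table_2, 2)
col_totals <- margin.table(table_2, 2)
col_totals

# sweep() divides every cell by the total of its column (margin 2)
by_hand <- sweep(table_2, 2, as.vector(col_totals), "/")
all.equal(as.vector(by_hand), as.vector(prop.table(table_2, margin = 2)))

# Each column of the column-proportion table sums to 1
colSums(prop.table(table_2, margin = 2))
```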

25.2.4.2 Option 2: Using the xtabs Function

The xtabs function from base R can also be used to create a two-way cross-tabulation table. To begin, type the name of the xtabs function. As the first argument in the function parentheses, insert the tilde (~) operator followed by the name of the first variable (JobLevel), the addition (+) operator, and the name of the second variable (Sex). As the second argument, type data= followed by the name of the data frame object to which the aforementioned variables belong (demodata).

# Create cross-tabulation table from JobLevel and Sex variables 
xtabs(~ JobLevel + Sex, data=demodata)
##         Sex
## JobLevel Female Male
##        1     47   35
##        2     22   22
##        3      7   11
##        4      3    7
##        5      1    8

Using the same code as above, let’s assign the cross-tabulation table we created to an object that we’ll (arbitrarily) call table_2. We’ll use the <- operator to do this.

# Assign two-way cross-tabulation table to object
table_2 <- xtabs(~ JobLevel + Sex, data=demodata)

Using the cross-tabulation table object we created above (table_2), we can apply the prop.table function from base R to compute the proportion of all observations that falls in each table cell. To calculate the cell proportions, simply type the name of the prop.table function, and as the only parenthetical argument, type the name of the cross-tabulation table object we created above (table_2).

# Print the cell proportions for the cross-tabulation table
prop.table(table_2)
##         Sex
## JobLevel      Female        Male
##        1 0.288343558 0.214723926
##        2 0.134969325 0.134969325
##        3 0.042944785 0.067484663
##        4 0.018404908 0.042944785
##        5 0.006134969 0.049079755

When inspecting the cell proportion table displayed above, you might find it challenging to read given the number of decimal places reported. To round values to 2 places after the decimal, let’s wrap the code from above in the round function from base R. As the first argument, let’s copy in our prop.table code from above, and as the second argument, let’s type the number of decimal places to which we wish to round (e.g., 2).

# Round values to 2 places after decimal
round(prop.table(table_2), 2)
##         Sex
## JobLevel Female Male
##        1   0.29 0.21
##        2   0.13 0.13
##        3   0.04 0.07
##        4   0.02 0.04
##        5   0.01 0.05

If you were to sum all of the proportions in the table, you would get a total of 1 (100%). To demonstrate, let’s apply the sum function from base R by wrapping our prop.table function code in the sum function.

# Sum all of the cell proportions
sum(prop.table(table_2))
## [1] 1

As expected, all of the proportions in the table sum to 1 (100%).

What if we wish to compute the proportions by row or by column? Well, to compute those row-by-row or column-by-column proportions, we simply add an argument to the prop.table function. To compute the row proportions, enter 1 as the second argument.

# Print the row proportions for the cross-tabulation table
prop.table(table_2, 1)
##         Sex
## JobLevel    Female      Male
##        1 0.5731707 0.4268293
##        2 0.5000000 0.5000000
##        3 0.3888889 0.6111111
##        4 0.3000000 0.7000000
##        5 0.1111111 0.8888889

In the output, you can see that the proportions in each row now sum to 1 (100%). Now, let’s round the row proportions to 2 places after the decimal.

# Round cross-tabulation table values to 2 places after decimal
round(prop.table(table_2, 1), 2)
##         Sex
## JobLevel Female Male
##        1   0.57 0.43
##        2   0.50 0.50
##        3   0.39 0.61
##        4   0.30 0.70
##        5   0.11 0.89

Next, let’s convert those proportions to percentages by multiplying the output of the previous code by 100. Recall that the multiplication operator in R is the * symbol. Just be sure to remember that the values presented in the subsequent output represent percentages and not raw frequencies (i.e., counts).

# Convert proportions to percentages by multiplying by 100
100 * round(prop.table(table_2, 1), 2)
##         Sex
## JobLevel Female Male
##        1     57   43
##        2     50   50
##        3     39   61
##        4     30   70
##        5     11   89

Alternatively, we can compute the column proportions by typing 2 instead of 1 in the second argument of the prop.table function. As you can see below, each column now adds up to 1 (100%).

# Print the column proportions for the table
prop.table(table_2, 2)
##         Sex
## JobLevel     Female       Male
##        1 0.58750000 0.42168675
##        2 0.27500000 0.26506024
##        3 0.08750000 0.13253012
##        4 0.03750000 0.08433735
##        5 0.01250000 0.09638554

Now, let’s round to 2 places after the decimal and multiply by 100 to convert the proportions to percentages.

# Round table values to 2 places after decimal and convert to percentages
100 * round(prop.table(table_2, 2), 2)
##         Sex
## JobLevel Female Male
##        1     59   42
##        2     28   27
##        3      9   13
##        4      4    8
##        5      1   10
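The formula interface of xtabs has one more feature worth knowing: if a variable appears on the left-hand side of the ~, xtabs sums that variable within each cell instead of counting rows. That is handy when your data are already aggregated (one row per combination, plus a count column). The sketch below builds such a data frame from the counts printed above; the data frame name counts and the column name Freq are illustrative choices.

```r
# Already-aggregated data: one row per JobLevel x Sex combination plus a count column
counts <- data.frame(
  JobLevel = rep(1:5, times = 2),
  Sex      = rep(c("Female", "Male"), each = 5),
  Freq     = c(47, 22, 7, 3, 1, 35, 22, 11, 7, 8)
)

# With Freq on the left of ~, xtabs() sums Freq within each cell instead of counting rows
tab <- xtabs(Freq ~ JobLevel + Sex, data = counts)
tab

# as.data.frame() converts a table back into this long (one row per cell) format
as.data.frame(tab)
```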

25.2.4.3 Option 3: Using the CrossTable Function

The CrossTable function from the gmodels package is another option when it comes to creating a two-way cross-tabulation table. The function includes multiple arguments that can eliminate additional steps (e.g., rounding) that would be required when using the table or xtabs function from base R. Before using the CrossTable function, we must install and access the gmodels package (Warnes et al. 2018) using the install.packages and library functions, respectively. The downside of the CrossTable function is that it can only be used to create two-way cross-tabulation tables.

# Install gmodels package if you haven't already
install.packages("gmodels")
# Access gmodels package
library(gmodels)

To create a two-way cross-tabulation table using the JobLevel and Sex variables, type the name of the CrossTable function. As the first argument in the function parentheses, type the name of the first variable you wish to use to make the table, and use the $ operator to indicate that the variable (JobLevel) belongs to the data frame in question (demodata), which should look like this: demodata$JobLevel. As the second argument, type the name of the second variable, again using the $ operator to indicate that the variable (Sex) belongs to the data frame in question (demodata): demodata$Sex. Make sure you use a comma (,) to separate the two arguments.

# Create cross-tabulation table from JobLevel and Sex variables from demodata data frame
CrossTable(demodata$JobLevel, demodata$Sex)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  163 
## 
##  
##                   | demodata$Sex 
## demodata$JobLevel |    Female |      Male | Row Total | 
## ------------------|-----------|-----------|-----------|
##                 1 |        47 |        35 |        82 | 
##                   |     1.134 |     1.093 |           | 
##                   |     0.573 |     0.427 |     0.503 | 
##                   |     0.588 |     0.422 |           | 
##                   |     0.288 |     0.215 |           | 
## ------------------|-----------|-----------|-----------|
##                 2 |        22 |        22 |        44 | 
##                   |     0.008 |     0.007 |           | 
##                   |     0.500 |     0.500 |     0.270 | 
##                   |     0.275 |     0.265 |           | 
##                   |     0.135 |     0.135 |           | 
## ------------------|-----------|-----------|-----------|
##                 3 |         7 |        11 |        18 | 
##                   |     0.381 |     0.367 |           | 
##                   |     0.389 |     0.611 |     0.110 | 
##                   |     0.087 |     0.133 |           | 
##                   |     0.043 |     0.067 |           | 
## ------------------|-----------|-----------|-----------|
##                 4 |         3 |         7 |        10 | 
##                   |     0.742 |     0.715 |           | 
##                   |     0.300 |     0.700 |     0.061 | 
##                   |     0.037 |     0.084 |           | 
##                   |     0.018 |     0.043 |           | 
## ------------------|-----------|-----------|-----------|
##                 5 |         1 |         8 |         9 | 
##                   |     2.644 |     2.548 |           | 
##                   |     0.111 |     0.889 |     0.055 | 
##                   |     0.012 |     0.096 |           | 
##                   |     0.006 |     0.049 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |        80 |        83 |       163 | 
##                   |     0.491 |     0.509 |           | 
## ------------------|-----------|-----------|-----------|
## 
## 

The resulting cross-tabulation table is packed with information! Fortunately, a key titled “Cell Contents” explains how to interpret each value displayed within a cell. As you can see, by default, each cell displays the raw frequency (i.e., count), the cell’s contribution to the chi-square statistic, the proportion by row, the proportion by column, and the overall cell (table) proportion. The margins also show the row and column totals, along with their proportions of the grand total. And we only had to use a single function!
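The chi-square contribution row follows a simple formula: for each cell, (observed - expected)^2 / expected, where the expected count is (row total x column total) / N, the count we would anticipate if the two variables were unrelated. The base R sketch below rebuilds table_2 from the printed counts so that it runs without gmodels, should reproduce the 1.134 and 1.093 shown for job level 1, and sums the contributions into the Pearson chi-square statistic. The names expected, contrib, and chi_sq are illustrative.

```r
# Stand-in for table_2, rebuilt from the counts printed above
table_2 <- as.table(matrix(c(47, 22, 7, 3, 1, 35, 22, 11, 7, 8), nrow = 5,
                           dimnames = list(JobLevel = as.character(1:5),
                                           Sex = c("Female", "Male"))))
n <- sum(table_2)

# Expected count per cell under independence: (row total x column total) / N
expected <- outer(rowSums(table_2), colSums(table_2)) / n

# Each cell's chi-square contribution: (observed - expected)^2 / expected
contrib <- (table_2 - expected)^2 / expected
round(contrib, 3)

# Summing the contributions gives the Pearson chi-square statistic
chi_sq <- sum(contrib)
chi_sq

# Cross-check with chisq.test(); suppressWarnings() hides its warning about small expected counts
suppressWarnings(chisq.test(table_2)$statistic)
```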

In some cases, you may wish to reduce the amount of information displayed. To find additional documentation, use the help (?) feature for the CrossTable function.

# Access background information on function
?CrossTable

Once the Help window opens, you can explore the different arguments that can be used within the function to change the default settings. The default number of digits displayed after the decimal is 3, but let’s change it to 2 by adding the argument digits=2. Next, let’s add the following arguments: (a) prop.r=FALSE to hide the row proportions, (b) prop.c=FALSE to hide the column proportions, (c) prop.t=TRUE to keep the total proportions visible, and (d) prop.chisq=FALSE to hide the chi-square contribution of each cell.

# Create cross-tabulation table from JobLevel and Sex variables from demodata data frame
CrossTable(demodata$JobLevel, demodata$Sex, digits=2,
           prop.r=FALSE, prop.c=FALSE, prop.t=TRUE, prop.chisq=FALSE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  163 
## 
##  
##                   | demodata$Sex 
## demodata$JobLevel |    Female |      Male | Row Total | 
## ------------------|-----------|-----------|-----------|
##                 1 |        47 |        35 |        82 | 
##                   |      0.29 |      0.21 |           | 
## ------------------|-----------|-----------|-----------|
##                 2 |        22 |        22 |        44 | 
##                   |      0.13 |      0.13 |           | 
## ------------------|-----------|-----------|-----------|
##                 3 |         7 |        11 |        18 | 
##                   |      0.04 |      0.07 |           | 
## ------------------|-----------|-----------|-----------|
##                 4 |         3 |         7 |        10 | 
##                   |      0.02 |      0.04 |           | 
## ------------------|-----------|-----------|-----------|
##                 5 |         1 |         8 |         9 | 
##                   |      0.01 |      0.05 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |        80 |        83 |       163 | 
## ------------------|-----------|-----------|-----------|
## 
## 

25.2.5 Three-Way Cross-Tabulation

A three-way cross-tabulation table summarizes the association among three categorical (i.e., nominal, ordinal) variables. Using the data found in the data frame we named demodata, we will create three-way cross-tabulation tables using the categorical JobLevel, Sex, and RaceEthnicity variables. I’ll demonstrate how to create three-way cross-tabulation tables using two different functions (i.e., table, xtabs), and you can choose to follow along with both or just one.

25.2.5.1 Option 1: Using the table Function

Using the table function from base R, let’s create a three-way cross-tabulation table containing frequencies based on the JobLevel, Sex, and RaceEthnicity variables contained in the demodata data frame object. To begin, type the name of the table function. As the first argument in the function, specify the name of the data frame object (demodata) followed by the $ operator and the name of the first variable (JobLevel). As the second argument, specify the name of the data frame object (demodata) followed by the $ operator and the name of the second variable (Sex). As the third argument, specify the name of the data frame object (demodata) followed by the $ operator and the name of the third variable (RaceEthnicity).

# Create three-way cross-tabulation table from JobLevel, Sex, and RaceEthnicity variables
table(demodata$JobLevel, demodata$Sex, demodata$RaceEthnicity)
## , ,  = Asian
## 
##    
##     Female Male
##   1     14   11
##   2      6   10
##   3      1    6
##   4      0    5
##   5      0    3
## 
## , ,  = Black
## 
##    
##     Female Male
##   1      2    0
##   2      0    1
##   3      0    0
##   4      0    0
##   5      0    0
## 
## , ,  = HispanicLatino
## 
##    
##     Female Male
##   1     14   14
##   2      3    5
##   3      3    2
##   4      3    2
##   5      0    3
## 
## , ,  = White
## 
##    
##     Female Male
##   1     17   10
##   2     13    6
##   3      3    3
##   4      0    0
##   5      1    2

As you can see, when used to create a three-way cross-tabulation table, the table function creates a separate two-way cross-tabulation table based on the first two variables for each category or level of the third variable. Let’s assign this table to an object that we’ll call table_3.

# Assign three-way cross-tabulation table to object
table_3 <- table(demodata$JobLevel, demodata$Sex, demodata$RaceEthnicity)

If you would prefer to view the frequencies (i.e., counts) in a single table, then use the ftable function from base R. Simply type the name of the table object we created in the previous step (table_3) as the sole argument in the ftable function.

# Print three-way cross-tabulation table in a different format
ftable(table_3)
##           Asian Black HispanicLatino White
##                                           
## 1 Female     14     2             14    17
##   Male       11     0             14    10
## 2 Female      6     0              3    13
##   Male       10     1              5     6
## 3 Female      1     0              3     3
##   Male        6     0              2     3
## 4 Female      0     0              3     0
##   Male        5     0              2     0
## 5 Female      0     0              0     1
##   Male        3     0              3     2

Just as we did with the two-way tables above, you can also apply the prop.table and round functions to this three-way cross-tabulation table object.
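As a sketch of what that can look like, the code below rebuilds table_3 from the printed counts, collapses it back to the two-way JobLevel x Sex table with the base R margin.table function, and then uses prop.table with margin = 3 so that proportions are computed within each race/ethnicity category. The names two_way and race_props are illustrative.

```r
# Stand-in for table_3, rebuilt from the counts printed above
table_3 <- as.table(array(
  c(14,  6, 1, 0, 0,   11, 10, 6, 5, 3,   # Asian: Female (levels 1-5), then Male
     2,  0, 0, 0, 0,    0,  1, 0, 0, 0,   # Black
    14,  3, 3, 3, 0,   14,  5, 2, 2, 3,   # HispanicLatino
    17, 13, 3, 0, 1,   10,  6, 3, 0, 2),  # White
  dim = c(5, 2, 4),
  dimnames = list(JobLevel = as.character(1:5),
                  Sex = c("Female", "Male"),
                  RaceEthnicity = c("Asian", "Black", "HispanicLatino", "White"))))

# Summing over RaceEthnicity (dimension 3) collapses back to the two-way JobLevel x Sex table
two_way <- margin.table(table_3, c(1, 2))
two_way

# Proportions within each RaceEthnicity category (each 5 x 2 slice sums to 1)
race_props <- prop.table(table_3, margin = 3)
round(race_props[, , "Asian"], 2)

# Check: the proportions in each slice add up to 1
apply(race_props, 3, sum)
```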

25.2.5.2 Option 2: Using the xtabs Function

The xtabs function from base R can also be used to create a three-way cross-tabulation table. To begin, type the name of the xtabs function. As the first argument in the function parentheses, insert the tilde (~) operator followed by the name of the first variable (JobLevel), the addition (+) operator, the name of the second variable (Sex), another addition (+) operator, and the name of the third variable (RaceEthnicity). As the second argument, type data= followed by the name of the data frame object to which the aforementioned variables belong (demodata).

# Create three-way cross-tabulation table from JobLevel, Sex, and RaceEthnicity variables
xtabs(~ JobLevel + Sex + RaceEthnicity, data=demodata)
## , , RaceEthnicity = Asian
## 
##         Sex
## JobLevel Female Male
##        1     14   11
##        2      6   10
##        3      1    6
##        4      0    5
##        5      0    3
## 
## , , RaceEthnicity = Black
## 
##         Sex
## JobLevel Female Male
##        1      2    0
##        2      0    1
##        3      0    0
##        4      0    0
##        5      0    0
## 
## , , RaceEthnicity = HispanicLatino
## 
##         Sex
## JobLevel Female Male
##        1     14   14
##        2      3    5
##        3      3    2
##        4      3    2
##        5      0    3
## 
## , , RaceEthnicity = White
## 
##         Sex
## JobLevel Female Male
##        1     17   10
##        2     13    6
##        3      3    3
##        4      0    0
##        5      1    2

As you can see, when used to create a three-way cross-tabulation table, the xtabs function creates a separate two-way cross-tabulation table based on the first two variables for each category or level of the third variable. Let’s assign this table to an object that we’ll call table_3.

# Assign three-way cross-tabulation table to object
table_3 <- xtabs(~ JobLevel + Sex + RaceEthnicity, data=demodata)

If you would prefer to view the frequencies (i.e., counts) in a single table, then use the ftable function from base R. Simply type the name of the table object we created in the previous step (table_3) as the sole argument in the ftable function.

# Print three-way cross-tabulation table in a different format
ftable(table_3)
##                 RaceEthnicity Asian Black HispanicLatino White
## JobLevel Sex                                                  
## 1        Female                  14     2             14    17
##          Male                    11     0             14    10
## 2        Female                   6     0              3    13
##          Male                    10     1              5     6
## 3        Female                   1     0              3     3
##          Male                     6     0              2     3
## 4        Female                   0     0              3     0
##          Male                     5     0              2     0
## 5        Female                   0     0              0     1
##          Male                     3     0              3     2

Just as we did with the two-way tables above, you can also apply the prop.table and round functions to this three-way table object.
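For instance, a sketch like the one below rebuilds table_3 from the counts printed above and uses margin = c(1, 2), so that within each JobLevel x Sex combination the proportions across the four race/ethnicity categories sum to 1. Wrapping the result in ftable puts one such combination on each row. The names within_group and ft are illustrative.

```r
# Stand-in for table_3, rebuilt from the counts printed above
table_3 <- as.table(array(
  c(14,  6, 1, 0, 0,   11, 10, 6, 5, 3,   # Asian: Female (levels 1-5), then Male
     2,  0, 0, 0, 0,    0,  1, 0, 0, 0,   # Black
    14,  3, 3, 3, 0,   14,  5, 2, 2, 3,   # HispanicLatino
    17, 13, 3, 0, 1,   10,  6, 3, 0, 2),  # White
  dim = c(5, 2, 4),
  dimnames = list(JobLevel = as.character(1:5),
                  Sex = c("Female", "Male"),
                  RaceEthnicity = c("Asian", "Black", "HispanicLatino", "White"))))

# Within each JobLevel x Sex combination, the share of employees in each RaceEthnicity category
within_group <- prop.table(table_3, margin = c(1, 2))

# ftable() puts each JobLevel x Sex combination on its own row; each row sums to 1
ft <- ftable(round(within_group, 2))
ft
rowSums(ftable(within_group))
```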

25.2.6 Summary

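In this chapter, we learned how to create two-way cross-tabulation tables using the table and xtabs functions from base R and the CrossTable function from the gmodels package. We also learned how to convert frequencies to cell, row, and column proportions (or percentages) using the prop.table and round functions. In addition, we learned how to create three-way cross-tabulation tables using the table and xtabs functions and how to display them as a single flat table using the ftable function.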

References

Warnes, Gregory R., Ben Bolker, Thomas Lumley, and Randall C. Johnson. 2018. Gmodels: Various R Programming Tools for Model Fitting. https://CRAN.R-project.org/package=gmodels.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.