Chapter 14 Removing, Adding, & Changing Variable Names
In this chapter, we will learn how to remove, add, and change variable names in R.
14.1 Conceptual Overview
After reading data into R as a data frame object, you may encounter situations in which it makes sense to remove the variable names (and not the data associated with the variable names), to add or replace variable names, or to just rename (change) certain variables. For example, perhaps the variable names from the original data file don’t adhere to your preferred naming conventions, and thus you wish to change the variable names. As another example, sometimes the variable names are divorced from the associated data, and thus as an initial data management step, we need to add the variable names to the associated data in R. In the following tutorial, you will learn some simple techniques to achieve these objectives.
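To make the "divorced from the data" situation concrete, here is a minimal sketch using base R only. It assumes a hypothetical header-less data set typed inline (three rows borrowed from this chapter's example data); the object names raw_text and no_names are made up for illustration.

```r
# Hypothetical header-less data: three rows, no line of variable names
raw_text <- "153,Sanchez,Alejandro
154,McDonald,Ronald
155,Smith,John"

# header = FALSE tells read.csv that the first line is data, not names
no_names <- read.csv(text = raw_text, header = FALSE)

# Base R fills in placeholder names (V1, V2, V3) that we will likely want to replace
names(no_names)
```

Placeholder names like these are a common sign that the variable names need to be added as a data management step.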
14.2 Tutorial
This chapter’s tutorial demonstrates how to remove, add, and change variable names in a data frame object.
14.2.1 Video Tutorial
Link to video tutorial: https://youtu.be/3m32O9f8gAI
14.2.2 Functions & Packages Introduced
Function | Package |
---|---|
names | base R |
c | base R |
head | base R |
rename | dplyr |
14.2.3 Initial Steps
If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.
Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.
Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
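# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id" "lastname" "firstname" "startdate" "gender"
# Print the data frame (tibble) object
personaldata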
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
## 7 198 Morales Linda 1/7/2016 female
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male
As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.
14.2.4 Remove Variable Names from a Data Frame Object
In some instances, you may wish to remove the variable names from a data frame. For example, I sometimes write (i.e., export) a data frame object I’ve been cleaning in R so that I may use the data file with the statistical software program called Mplus (Muthén and Muthén 1998-2018). Because Mplus doesn’t accept variable names within its data files, I may drop the variable names from the data frame object prior to writing to my working directory.
To remove variable names, just apply the names function with the data frame name as the argument, and then use the <- operator to assign NULL, which removes the variable names.
# Remove variable names
names(personaldata) <- NULL
# Print just the first 6 rows of the data frame object in Console
head(personaldata)
If you get the following error message (see below), then you likely need to convert your object from a tibble to a data frame object prior to removing the variable names. When we use the read_csv function from the readr package to read in data, we technically read in the data as a tibble as opposed to a standard data frame object.
Error in names[old] <- names(x)[j[old]] : replacement has length zero
To convert the object to a data frame object, we can use the as.data.frame function from base R as follows and then retry the previous step.
# If error message appears, convert object to data frame
personaldata <- as.data.frame(personaldata)
# Remove variable names
names(personaldata) <- NULL
# Print just the first 6 rows of the data frame object in Console
head(personaldata)
##
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
As you can see, the variable names do not appear in the overwritten personaldata data frame object.
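If the only goal is to write a file without a header row, one alternative worth knowing is to leave the names in place in R and omit them at export time. The following is a minimal sketch using the base R write.table function, not the approach shown above; the toy data frame, its values, and the temporary file location are all assumptions made for illustration.

```r
# Toy data frame standing in for a cleaned data set (illustrative values)
toy <- data.frame(id = c(153, 154), score = c(4.5, 3.2))

# Hypothetical output location; tempfile() avoids touching your working directory
out_file <- tempfile(fileext = ".csv")

# col.names = FALSE omits the header row; row.names = FALSE omits row labels
write.table(toy, file = out_file, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Inspect the raw lines that were written to the file
written <- readLines(out_file)
written

# The data frame in R still has its variable names
names(toy)
```

With this approach, the variable names remain available in R for any further cleaning, while the exported file contains only the data values.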
14.2.5 Add Variable Names to a Data Frame Object
In other instances, you might find yourself with a dataset that lacks variable names (or has variable names that need to be replaced), which means that you will need to add those variable names to the data frame.
Let’s work with the personaldata data frame object from the previous section for practice. To add variable names, we can use the names function from base R, and enter the name of the data frame as the argument. Using the <- operator, we can assign the variable names with the c (combine) function, whose arguments are the variable names in quotation marks (" "). Remember to type a comma (,) between the function arguments, as commas are used to separate arguments from one another when there is more than one. Please note that it’s important that the vector of variable names contains the same number of names as the data frame object has columns.
# Add (or replace) variable names to data frame object
names(personaldata) <- c("id", "lastname", "firstname", "startdate", "gender")
# Print just the first 6 rows of data in Console
head(personaldata)
## id lastname firstname startdate gender
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
Now the data frame object has variable names!
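Because the number of names needs to match the number of columns, it can help to check that condition before assigning. Below is a minimal sketch under assumed inputs: toy and new_names are hypothetical objects, and the stopifnot guard is an optional safeguard rather than a required step.

```r
# Toy data frame with placeholder names (illustrative values)
toy <- data.frame(V1 = c(153, 154), V2 = c("Sanchez", "McDonald"))

# Proposed variable names, one per column
new_names <- c("id", "lastname")

# Guard: stop early if the number of names differs from the number of columns
stopifnot(length(new_names) == ncol(toy))

# Assign the names and confirm the result
names(toy) <- new_names
names(toy)
```

If the lengths differ, the guard halts the script before the assignment, which makes the mismatch easier to spot than a missing name or an error later on.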
14.2.6 Change Specific Variable Names in a Data Frame Object
Using the resulting data frame object from the previous section (personaldata), we can rename specific variables using the rename function from the dplyr package (Wickham et al. 2023). To get started, we’ll need to install the dplyr package so that we can access the rename function. If you haven’t already, install and access the dplyr package using the install.packages and library functions, respectively.
We’ll begin by specifying the name of our data frame object personaldata, followed by the <- operator so that we can overwrite the existing personaldata data frame object with one that contains the renamed variables. Next, type the name of the rename function. As the first argument in the function, type the name of the data frame object (personaldata). As the second argument, let’s change the lastname variable to Last_Name by typing the name of our new variable followed by = and, in quotation marks (" "), the name of the original variable (Last_Name="lastname"). As the third argument, let’s apply the same process and change the firstname variable to First_Name (First_Name="firstname").
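# Access dplyr package (install it first with install.packages("dplyr") if needed)
library(dplyr)
# Rename specific variables in data frame object (new name = "old name")
personaldata <- rename(personaldata,
                       Last_Name = "lastname",
                       First_Name = "firstname")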
Using the head function from base R, let’s verify that we renamed the two variables successfully.
# Print just the first 6 rows of data in Console
head(personaldata)
## id Last_Name First_Name startdate gender
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
As you can see, the lastname and firstname variables are now named Last_Name and First_Name. It worked!
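For comparison, the same kind of targeted renaming can be sketched in base R alone, which may be handy when dplyr is not installed. This is an alternative to the rename approach above, not a replacement for it; the toy data frame and its values are assumptions for illustration.

```r
# Toy data frame with the original variable names (illustrative values)
toy <- data.frame(id = c(153, 154),
                  lastname = c("Sanchez", "McDonald"),
                  firstname = c("Alejandro", "Ronald"))

# Locate the column currently called "lastname" and overwrite only that name
names(toy)[names(toy) == "lastname"] <- "Last_Name"

# Repeat for "firstname"
names(toy)[names(toy) == "firstname"] <- "First_Name"

# Confirm that only the two targeted names changed
names(toy)
```

The logical comparison inside the square brackets selects just the matching name, so the remaining variable names are left untouched.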
14.2.7 Summary
In this chapter, we reviewed how to remove variable names from a data frame object and how to add variable names to a data frame object using the names and c functions, which come standard with your base R installation. We also reviewed how to rename specific variables using the rename function from the dplyr package.