Chapter 14 Removing, Adding, & Changing Variable Names

In this chapter, we will learn how to remove, add, and change variable names in R.

14.1 Conceptual Overview

After reading data into R as a data frame object, you may encounter situations in which it makes sense to remove the variable names (and not the data associated with the variable names), to add or replace variable names, or to just rename (change) certain variables. For example, perhaps the variable names from the original data file don’t adhere to your preferred naming conventions, and thus you wish to change the variable names. As another example, sometimes the variable names are divorced from the associated data, and thus as an initial data management step, we need to add the variable names to the associated data in R. In the following tutorial, you will learn some simple techniques to achieve these objectives.

14.2 Tutorial

This chapter’s tutorial demonstrates how to remove, add, and change variable names in a data frame object.

14.2.1 Video Tutorial

Link to video tutorial: https://youtu.be/3m32O9f8gAI

14.2.2 Functions & Packages Introduced

Function Package
names base R
c base R
head base R
rename dplyr

14.2.3 Initial Steps

If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different from the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id"        "lastname"  "firstname" "startdate" "gender"
# Print data frame (tibble) object
print(personaldata)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   154 McDonald   Ronald    1/9/2016  male  
## 3   155 Smith      John      1/9/2016  male  
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male  
## 6   111 Newton     Isaac     1/9/2016  male  
## 7   198 Morales    Linda     1/7/2016  female
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.
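If you would rather start with a standard data frame object instead of a tibble, one option is to read the file with the read.csv function from base R. The following is a minimal sketch, not the approach used in the rest of this chapter: it writes three of the rows printed above to a temporary file (the object names csv_lines, tmp_file, and pd_base are hypothetical) and then reads that file back in.

```r
# Write three rows from the printout above to a temporary .csv file
# (a stand-in for "PersData.csv"; the temporary path is illustrative)
csv_lines <- c("id,lastname,firstname,startdate,gender",
               "153,Sanchez,Alejandro,1/1/2016,male",
               "154,McDonald,Ronald,1/9/2016,male",
               "155,Smith,John,1/9/2016,male")
tmp_file <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp_file)

# Read data with base R; the result is a standard data frame object
pd_base <- read.csv(tmp_file, stringsAsFactors = FALSE)

# Check the object class and the variable names
class(pd_base)
names(pd_base)
```

With your own data, you would replace tmp_file with the name of your file (e.g., "PersData.csv") after setting your working directory.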

14.2.4 Remove Variable Names from a Data Frame Object

In some instances, you may wish to remove the variable names from a data frame. For example, I sometimes write (i.e., export) a data frame object I’ve been cleaning in R so that I may use the data file with the statistical software program called Mplus (Muthén and Muthén 1998-2018). Because Mplus doesn’t accept variable names within its data files, I may drop the variable names from the data frame object prior to writing to my working directory.

To remove variable names, just apply the names function with the data frame name as the argument, and then use the <- operator with NULL to remove the variable names.

# Remove variable names
names(personaldata) <- NULL

# Print just the first 6 rows of the data frame object in Console
head(personaldata)

If you get the following error message (see below), then you likely need to convert your object from a tibble to a data frame object prior to removing the variable names. When we use the read_csv function from the readr package to read in data, we technically read in the data as a tibble as opposed to a standard data frame object.

Error in names[old] <- names(x)[j[old]] : replacement has length zero

To convert the object to a data frame object, we can use the as.data.frame function from base R as follows and then re-try the previous step.

# If error message appears, convert object to data frame
personaldata <- as.data.frame(personaldata)
# Remove variable names
names(personaldata) <- NULL

# Print just the first 6 rows of the data frame object in Console
head(personaldata)
##                                         
## 1 153  Sanchez Alejandro 1/1/2016   male
## 2 154 McDonald    Ronald 1/9/2016   male
## 3 155    Smith      John 1/9/2016   male
## 4 165      Doe      Jane 1/4/2016 female
## 5 125 Franklin  Benjamin 1/5/2016   male
## 6 111   Newton     Isaac 1/9/2016   male

As you can see, the variable names do not appear in the overwritten personaldata data frame object.
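If the only goal is to produce a data file without a header row, a possible alternative is to leave the variable names in place in R and omit them when writing the file. The sketch below assumes a comma-delimited output file is acceptable and uses the write.table function from base R; the data frame pd_example and the temporary file path are hypothetical stand-ins, so adapt the delimiter and file name to whatever your target software expects.

```r
# Small stand-in data frame built from rows shown earlier in the chapter
pd_example <- data.frame(id = c(153, 154, 155),
                         lastname = c("Sanchez", "McDonald", "Smith"),
                         stringsAsFactors = FALSE)

# Write the data without a header row (and without row names);
# the temporary file path is illustrative
out_file <- tempfile(fileext = ".csv")
write.table(pd_example, file = out_file, sep = ",",
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Inspect the raw lines of the written file
readLines(out_file)
```

Because the names are dropped only at the writing step, pd_example keeps its variable names for any further work in R.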

14.2.5 Add Variable Names to a Data Frame Object

In other instances, you might find yourself with a dataset that lacks variable names (or has variable names that need to be replaced), which means that you will need to add those variable names to the data frame.

Let’s work with the personaldata data frame object from the previous section for practice. To add variable names, we can use the names function from base R, and enter the name of the data frame as the argument. Using the <- operator, we can assign a vector of variable names created with the c (combine) function, with each name in quotation marks (" ") entered as an argument. Remember to type a comma (,) between the function arguments, as commas are used to separate arguments from one another when there is more than one. Please note that it’s important that the vector of variable names contains the same number of names as the data frame object has columns.

# Add (or replace) variable names to data frame object
names(personaldata) <- c("id", "lastname", "firstname", "startdate", "gender")

# Print just the first 6 rows of data in Console
head(personaldata)
##    id lastname firstname startdate gender
## 1 153  Sanchez Alejandro  1/1/2016   male
## 2 154 McDonald    Ronald  1/9/2016   male
## 3 155    Smith      John  1/9/2016   male
## 4 165      Doe      Jane  1/4/2016 female
## 5 125 Franklin  Benjamin  1/5/2016   male
## 6 111   Newton     Isaac  1/9/2016   male

Now the data frame object has variable names!
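Because the number of names needs to match the number of columns, it can help to check the two before assigning. The following is a minimal sketch using only base R; the objects pd_nohead and new_names are hypothetical, and the two rows are taken from the data shown earlier. When a file is read with header = FALSE, base R assigns placeholder names (V1, V2, and so on), which we then replace.

```r
# Read two header-less rows; base R assigns placeholder names V1, V2, V3
pd_nohead <- read.csv(text = "153,Sanchez,Alejandro\n154,McDonald,Ronald",
                      header = FALSE, stringsAsFactors = FALSE)

# Vector of intended variable names
new_names <- c("id", "lastname", "firstname")

# Confirm there is one name per column before assigning
if (length(new_names) == ncol(pd_nohead)) {
  names(pd_nohead) <- new_names
} else {
  stop("Number of names does not match number of columns")
}

# Print the variable names
names(pd_nohead)
```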

14.2.6 Change Specific Variable Names in a Data Frame Object

Using the resulting data frame object from the previous section (personaldata), we can rename specific variables using the rename function from the dplyr package (Wickham et al. 2023). To get started, we’ll need to install the dplyr package so that we can access the rename function. If you haven’t already, install and access the dplyr package using the install.packages and library functions, respectively.

# Install package
install.packages("dplyr")
# Access package
library(dplyr)

We’ll begin by specifying the name of our data frame object personaldata, followed by the <- operator so that we can overwrite the existing personaldata data frame object with one that contains the renamed variables. Next, type the name of the rename function. As the first argument in the function, type the name of the data frame object (personaldata). As the second argument, let’s change the lastname variable to Last_Name by typing the name of our new variable followed by = and, in quotation marks (" "), the name of the original variable (Last_Name="lastname"). As the third argument, let’s apply the same process and change the firstname variable to First_Name by typing the name of our new variable followed by = and, in quotation marks (" "), the name of the original variable (First_Name="firstname").

# Rename specific variables in data frame object
personaldata <- rename(personaldata,
                       Last_Name="lastname",
                       First_Name="firstname")

Using the head function from base R, let’s verify that we renamed the two variables successfully.

# View just the first 6 rows of data in Console
head(personaldata)
##    id Last_Name First_Name startdate gender
## 1 153   Sanchez  Alejandro  1/1/2016   male
## 2 154  McDonald     Ronald  1/9/2016   male
## 3 155     Smith       John  1/9/2016   male
## 4 165       Doe       Jane  1/4/2016 female
## 5 125  Franklin   Benjamin  1/5/2016   male
## 6 111    Newton      Isaac  1/9/2016   male

As you can see, the lastname and firstname variables are now named Last_Name and First_Name. It worked!
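If you prefer not to install dplyr, the same renaming can be sketched with the names function from base R by matching on the old variable name. This is offered as an alternative to the rename approach above, not a replacement for it; pd_example is a hypothetical data frame built from two rows shown earlier.

```r
# Small stand-in data frame with the original variable names
pd_example <- data.frame(id = c(153, 154),
                         lastname = c("Sanchez", "McDonald"),
                         firstname = c("Alejandro", "Ronald"),
                         stringsAsFactors = FALSE)

# Rename each variable by locating its current name
names(pd_example)[names(pd_example) == "lastname"] <- "Last_Name"
names(pd_example)[names(pd_example) == "firstname"] <- "First_Name"

# Print the variable names
names(pd_example)
```

Matching on the name, as opposed to a column position, means the code still targets the right variable if the column order changes.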

14.2.7 Summary

In this chapter, we reviewed how to remove variable names from a data frame object using the names function with NULL; how to add variable names to a data frame object using the names and c functions, which come standard with your base R installation; and how to rename specific variables using the rename function from the dplyr package.

References

Muthén, B O, and L K Muthén. 1998-2018. Mplus Version 8.3. Los Angeles, California: Muthén & Muthén.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data. Sebastopol, California: O’Reilly Media, Inc. https://r4ds.had.co.nz/.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.