Chapter 15 Writing Data from R

In this chapter, we will learn what “writing data” means in the context of the R language, and how to go about writing data from R so that we can share data with non-R users.

15.1 Conceptual Overview

Writing data refers to the process of exporting data from the R Environment to a file in a folder on your computer, such as your working directory folder. If you collaborate with others who do not work in R, writing data will allow them to use the data you cleaned, managed, or manipulated in the R Environment in other software programs. In the following tutorial, we will learn how to write a data frame object and a table object to our working directory folder as .csv files.

15.2 Tutorial

This chapter’s tutorial demonstrates how to write data from R into a .csv file that can be opened in programs like Microsoft Excel or Google Sheets – along with many other analytical software programs.

15.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/ORTe8vE7nzU

15.2.2 Functions & Packages Introduced

Function       Package
write.csv      base R
write.table    base R
table          base R

15.2.3 Initial Steps

If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files. Once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")
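Before reading in the data, it can help to confirm that the working directory is the one you expect and that the data file is actually in it. The following is a minimal sketch using the getwd and file.exists functions from base R; it assumes the file name used in this chapter.

```r
# Print the current working directory
getwd()

# Check whether the data file is in the working directory
# (returns TRUE if the file is found and FALSE otherwise)
file.exists("PersData.csv")
```

If file.exists returns FALSE, double-check the folder you set with setwd and the spelling of the file name.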

Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2022). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id"        "lastname"  "firstname" "startdate" "gender"
# Print data frame (tibble) object
print(personaldata)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   154 McDonald   Ronald    1/9/2016  male  
## 3   155 Smith      John      1/9/2016  male  
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male  
## 6   111 Newton     Isaac     1/9/2016  male  
## 7   198 Morales    Linda     1/7/2016  female
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.

15.2.4 Write Data Frame to Working Directory

The write.csv function from base R can be used to write a data frame object to your working directory or to a folder of your choosing. Let’s write the personaldata data frame (that we read in and named above) to our working directory. Before doing so, however, let’s make a minor change to the data frame to illustrate a scenario in which you clean your data in R and then write the data to a .csv file so that a colleague can work with the data in another program. Specifically, let’s remove the lastname variable from the data frame. To do so, type the name of the data frame (personaldata), followed by the $ symbol and then the name of the variable in question (lastname). Next, type the <- operator followed by NULL. This code will remove the variable from the data frame.

# Remove variable from data frame object
personaldata$lastname <- NULL

# Print data frame object
print(personaldata)
## # A tibble: 9 × 4
##      id firstname startdate gender
##   <dbl> <chr>     <chr>     <chr> 
## 1   153 Alejandro 1/1/2016  male  
## 2   154 Ronald    1/9/2016  male  
## 3   155 John      1/9/2016  male  
## 4   165 Jane      1/4/2016  female
## 5   125 Benjamin  1/5/2016  male  
## 6   111 Isaac     1/9/2016  male  
## 7   198 Linda     1/7/2016  female
## 8   201 Cindy     1/9/2016  female
## 9   282 John      1/9/2016  male

As you can see in your Console output, the variable called lastname is no longer present in the data frame object.
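If you would rather check for the variable with code than by reading the printed output, the names function can be combined with the %in% operator. The sketch below uses a small hypothetical data frame (example_df) in place of personaldata so that it can be run on its own.

```r
# Hypothetical data frame for illustration (not the chapter's data file)
example_df <- data.frame(
  id = c(153, 154),
  lastname = c("Sanchez", "McDonald"),
  firstname = c("Alejandro", "Ronald")
)

# Remove the lastname variable from the data frame
example_df$lastname <- NULL

# Print the names of the remaining variables
names(example_df)

# Check whether a variable called lastname is still present
"lastname" %in% names(example_df)
```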

To write our “cleaned” data frame (personaldata) to our working directory, we use the write.csv function from base R. As the first argument in the parentheses, type the name of the data frame (personaldata). Remember to type a comma (,) before the second argument, as this is how we separate arguments from one another when there is more than one. As the second argument, let’s type what we want to name the file that we will create in our working directory. Make sure that the name of the new .csv file is in quotation marks (" "). Here, I name the new file “Cleaned PersData.csv”; it is important that you keep the .csv extension at the end of the name you provide.

# Write data frame object to working directory
write.csv(personaldata, "Cleaned PersData.csv")

If you go to your working directory folder, you will find the file called “Cleaned PersData.csv” saved there.
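By default, the write.csv function also writes the row names (for a data frame like ours, the row numbers) as an unnamed first column in the file. If your colleague doesn’t need that column, the row.names=FALSE argument omits it. The sketch below is one way to try this out; it uses a small hypothetical data frame (example_df) and a temporary file created with the tempfile function so that it can be run without adding files to your working directory. In practice, you would supply personaldata and a file name of your choosing.

```r
# Hypothetical data frame for illustration
example_df <- data.frame(
  id = c(153, 154, 155),
  firstname = c("Alejandro", "Ronald", "John"),
  gender = c("male", "male", "male")
)

# Temporary file path, used here only so the example is self-contained
example_file <- tempfile(fileext = ".csv")

# Write the data frame without the row-name column
write.csv(example_df, example_file, row.names = FALSE)

# Read the file back in to check what was written
check_df <- read.csv(example_file)
names(check_df)
```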

We can also specify the folder to which we want to write our data by providing the full file path, which ends with the name we would like to give the new .csv file.

# Write data frame object to folder
write.csv(personaldata, "H:/RWorkshop/Cleaned PersData2.csv")

If you go to the folder named in the file path (which, in this example, is the same as our working directory folder), you will find the file called “Cleaned PersData2.csv”.
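If you find yourself typing long file paths, the file.path function from base R can join a folder name and a file name for you, and the dir.exists function can check that the folder exists before you write to it. The following is a minimal sketch; the object names (example_df, output_folder, output_path) are hypothetical, and tempdir() stands in for a folder such as "H:/RWorkshop".

```r
# Hypothetical data frame for illustration
example_df <- data.frame(id = c(153, 154), gender = c("male", "male"))

# Folder to write to; tempdir() stands in for a folder of your choosing
output_folder <- tempdir()

# Build the full path to the new file
output_path <- file.path(output_folder, "Cleaned PersData2.csv")

# Write the file only if the folder exists
if (dir.exists(output_folder)) {
  write.csv(example_df, output_path)
}

# Confirm that the file was created
file.exists(output_path)
```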

15.2.5 Write Table to Working Directory

Sometimes we work with table objects in R. If we wish to write a table to our working directory, we can use the write.table function from base R. Before doing so, we need to create a table object as an example, which we can do using the table function from base R.

To create a table, first, come up with a name for your new table object; in this example, I name the table table_example (because I’m so creative). Second, type the <- operator to the right of your new table name to tell R that you are creating a new object. Third, type the name of the table-creation function, which is table. Fourth, in the function’s parentheses, as the first argument, enter the name of the first variable you wish to use to make the table, and use the $ symbol to indicate that the variable (gender) belongs to the data frame in question (personaldata), which should look like this: personaldata$gender. Fifth, as the second argument, enter the name of the second variable you wish to use to make the table, and use the $ symbol to indicate that the variable (startdate) belongs to the data frame in question (personaldata), which should look like this: personaldata$startdate.

# Create table from gender and startdate variables from personaldata data frame
table_example <- table(personaldata$gender, personaldata$startdate)

# Print table object in Console
print(table_example)
##         
##          1/1/2016 1/4/2016 1/5/2016 1/7/2016 1/9/2016
##   female        0        1        0        1        1
##   male          1        0        1        0        4

The table above shows how many female versus male employees started working on a given date.
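To get a feel for how a table object is organized, you can check its class and look up a single cell by its row and column names. The sketch below uses two short hypothetical vectors (gender and startdate) in place of the variables from personaldata so that it can be run on its own.

```r
# Hypothetical vectors standing in for personaldata$gender and personaldata$startdate
gender <- c("male", "male", "female", "female", "male")
startdate <- c("1/1/2016", "1/9/2016", "1/4/2016", "1/9/2016", "1/9/2016")

# Cross-tabulate the two variables
small_table <- table(gender, startdate)

# Confirm the class of the new object
class(small_table)

# Look up a single cell: the number of male employees who started on 1/9/2016
small_table["male", "1/9/2016"]
```

The first variable entered into the table function supplies the rows, and the second variable supplies the columns.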

Now we are ready to write the table called table_example to our working directory using the write.table function. As the first argument, type the name of the table object (table_example). As the second argument, type what we would like to call the file when it is saved in our working directory (“Practice Table.csv”); be sure to include the .csv extension in the name and wrap it all in quotation marks. As the third argument, type sep="," to specify that the values in the table are separated by commas, as this will be a comma-separated values file. As the fourth argument, type col.names=NA so that the column names will be aligned with their respective values. The reason for this fourth argument is that the first column of the file will contain the row names of the table (here, the levels of the gender variable: female and male). If we don’t include this argument, the function will by default begin the top row with the first column name (1/1/2016) rather than leaving a cell for the column of row names, and as a result, every column name will be off by one column relative to its values. The col.names=NA argument simply leaves the first cell in the top row blank so that the first column name appears in the next column to the right. [To understand what the table would look like without this fourth argument, simply omit it, and open the resulting file in your working directory to see what happens.]

# Write table object to working directory
write.table(table_example, "Practice Table.csv", sep=",", col.names=NA)

If you go to your working directory, you will find the file called “Practice Table.csv”.
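To see the effect of the col.names=NA argument without opening the files in another program, you can write a table both ways and compare the first line of each file using the readLines function from base R. This is a minimal sketch using a small hypothetical table (small_table) and temporary files; the object and file names are for illustration only.

```r
# Hypothetical vectors for illustration
gender <- c("male", "female", "male")
startdate <- c("1/1/2016", "1/4/2016", "1/9/2016")
small_table <- table(gender, startdate)

# Temporary files, used here only so the example is self-contained
with_blank <- tempfile(fileext = ".csv")
without_blank <- tempfile(fileext = ".csv")

# Write the table with and without the col.names = NA argument
write.table(small_table, with_blank, sep = ",", col.names = NA)
write.table(small_table, without_blank, sep = ",")

# Print the top (header) row of each file
header_with <- readLines(with_blank, n = 1)
header_without <- readLines(without_blank, n = 1)
print(header_with)
print(header_without)

# Count the comma-separated fields in each header row
length(strsplit(header_with, ",")[[1]])
length(strsplit(header_without, ",")[[1]])
```

The header row written with col.names=NA should have one more field than the header row written without it, because it begins with an empty cell above the column of row names.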

15.2.6 Summary

Writing data from the R environment to your working directory or another folder can be useful, especially when collaborating with those who do not use R. The write.csv function writes a data frame object to a .csv file, whereas the write.table function, when used with the sep="," argument, writes a table object to a .csv file.
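The two functions are closely related: write.csv is a convenience wrapper around write.table that sets the comma separator (and the blank top-left cell for the row names) for us. As a rough check under that assumption, the sketch below writes the same small hypothetical data frame with both functions and compares the resulting files line by line; the object and file names are for illustration only.

```r
# Hypothetical data frame for illustration
example_df <- data.frame(id = c(153, 154), gender = c("male", "male"))

# Temporary files, used here only so the example is self-contained
file_csv <- tempfile(fileext = ".csv")
file_table <- tempfile(fileext = ".csv")

# Write the data frame with write.csv using its default settings
write.csv(example_df, file_csv)

# Write the same data frame with write.table, spelling out the settings
write.table(example_df, file_table, sep = ",", col.names = NA, qmethod = "double")

# Compare the contents of the two files
same_contents <- identical(readLines(file_csv), readLines(file_table))
print(same_contents)
```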

References

Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data. Sebastopol, California: O’Reilly Media, Inc. https://r4ds.had.co.nz/.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2022. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.