In this chapter, we will learn what “writing data” means in the context of the R language, and how to write data from R so that we can share data with non-R users.
Writing data refers to the process of exporting data from the R Environment to a (working directory) folder. If you collaborate with others who do not work in R, writing data will allow them to use the data you cleaned, managed, or manipulated in the R Environment in other software programs. In the following tutorial, we will learn how to write a data frame object and a table object to our working directory folder as .csv files.
This chapter’s tutorial demonstrates how to write data from R into a .csv file that can be opened in programs like Microsoft Excel or Google Sheets – along with many other analytical software programs.
As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.
Link to video tutorial: https://youtu.be/ORTe8vE7nzU
If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different from the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of simplicity, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.
Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.
# Set your working directory
setwd("H:/RWorkshop")
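To confirm that the working directory was set as intended, a quick check along the following lines can help. This is only a minimal sketch using the base R functions getwd and file.exists; it assumes the data file is named “PersData.csv”, as described above.

```r
# Print the path of the current working directory
getwd()

# Check whether the data file is in the working directory
# (returns TRUE if the file is found and FALSE otherwise)
file.exists("PersData.csv")
```

If the second line returns FALSE, the file is likely saved in a different folder than the one set as the working directory.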
Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2022). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ──────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id"        "lastname"  "firstname" "startdate" "gender"
# Print data frame (tibble) object
print(personaldata)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   154 McDonald   Ronald    1/9/2016  male
## 3   155 Smith      John      1/9/2016  male
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
## 7   198 Morales    Linda     1/7/2016  female
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.
The write.csv function from base R can be used to write a data frame object to your working directory or to a folder of your choosing. Let’s write the personaldata data frame (that we read in and named above) to our working directory. Before doing so, however, let’s make a minor change to the data frame to illustrate a scenario in which you clean your data in R and then write the data to a .csv file so that a colleague can work with the data in another program. Specifically, let’s remove the lastname variable from the data frame. To do so, type the name of the data frame (personaldata), followed by the $ symbol and then the name of the variable in question (lastname). Next, type the <- operator followed by NULL. This code will remove the variable from the data frame.
# Remove variable from data frame object
personaldata$lastname <- NULL

# Print data frame object
print(personaldata)
## # A tibble: 9 × 4
##      id firstname startdate gender
##   <dbl> <chr>     <chr>     <chr>
## 1   153 Alejandro 1/1/2016  male
## 2   154 Ronald    1/9/2016  male
## 3   155 John      1/9/2016  male
## 4   165 Jane      1/4/2016  female
## 5   125 Benjamin  1/5/2016  male
## 6   111 Isaac     1/9/2016  male
## 7   198 Linda     1/7/2016  female
## 8   201 Cindy     1/9/2016  female
## 9   282 John      1/9/2016  male
As you can see in your Console output, the variable called lastname is no longer present in the data frame object.
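Because assigning NULL changes the data frame in place, it can be handy to do the cleaning on a copy so that the original object stays available. The following is a minimal sketch of that idea; the demo_df and demo_clean objects and their values are hypothetical and are not part of the PersData.csv file.

```r
# Hypothetical data frame for illustration (not the PersData.csv data)
demo_df <- data.frame(
  id = c(1, 2, 3),
  lastname = c("Lee", "Park", "Cruz"),
  gender = c("female", "male", "female")
)

# Make a copy so that the original object is left untouched
demo_clean <- demo_df

# Remove the lastname variable from the copy by assigning NULL to it
demo_clean$lastname <- NULL

# Compare the variable names of the original and the cleaned copy
names(demo_df)
names(demo_clean)
```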
To write our “cleaned” data frame (personaldata) to our working directory, we use the write.csv function from base R. As the first argument in the parentheses, type the name of the data frame (personaldata). Remember to type a comma (,) before the second argument, as this is how we separate arguments from one another when there is more than one. As the second argument, let’s type what we want to name the file that we will create in our working directory. Make sure that the name of the new .csv file is in quotation marks (" "). Here, I name the new file “Cleaned PersData.csv”; it is important that you keep the .csv extension at the end of the name you provide.
# Write data frame object to working directory
write.csv(personaldata, "Cleaned PersData.csv")
If you go to your working directory folder, you will find the file called “Cleaned PersData.csv” saved there.
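One detail worth knowing is that write.csv writes the row names (by default, the row numbers) as an extra, unnamed first column. If a colleague does not need that column, the row.names argument of write.csv can be set to FALSE. The sketch below illustrates this with a small hypothetical data frame (demo_df) and a temporary file, so that nothing in your working directory is overwritten.

```r
# Hypothetical data frame for illustration
demo_df <- data.frame(
  id = c(153, 154),
  firstname = c("Alejandro", "Ronald")
)

# Use a temporary file path for this demonstration
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE omits the column of row numbers from the file
write.csv(demo_df, out_file, row.names = FALSE)

# Inspect the raw lines of the file that was written
readLines(out_file)

# Read the file back in to confirm the variable names
names(read.csv(out_file))
```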
We can also specify the folder to which we want to write our data by providing the full file path, which ends with the name we would like to give the new .csv file.
# Write data frame object to folder
write.csv(personaldata, "H:/RWorkshop/Cleaned PersData2.csv")
If you go to the folder you specified in the file path (which, in this example, is the same as our working directory), you will find the file called “Cleaned PersData2.csv”.
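Rather than typing the full path as a single string, the path can also be assembled from a folder and a file name with the base R function file.path. The following is a minimal sketch; the out_dir folder is a stand-in (here, R's temporary directory via tempdir) for a folder such as "H:/RWorkshop", and demo_df is a hypothetical data frame.

```r
# Stand-in output folder; replace tempdir() with your own folder path
out_dir <- tempdir()

# Build the full path from the folder and the file name
out_path <- file.path(out_dir, "Cleaned PersData2.csv")

# Hypothetical data frame for illustration
demo_df <- data.frame(id = c(1, 2), gender = c("female", "male"))

# Write the data frame to the full path
write.csv(demo_df, out_path)

# Confirm that the file now exists at that path
file.exists(out_path)
```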
Sometimes we work with table objects in R. If we wish to write a table to our working directory, we can use the write.table function from base R. Before doing so, we need to create a table object as an example, which we can do using the table function from base R.
To create a table, first, come up with a name for your new table object; in this example, I name the table table_example (because I’m so creative). Second, type the <- operator to the right of your new table name to tell R that you are creating a new object. Third, type the name of the table-creation function, which is table. Fourth, in the function’s parentheses, as the first argument, enter the name of the first variable you wish to use to make the table, and use the $ symbol to indicate that the variable (gender) belongs to the data frame in question (personaldata), which should look like this: personaldata$gender. Fifth, as the second argument, enter the name of the second variable you wish to use to make the table, and use the $ symbol to indicate that the variable (startdate) belongs to the data frame in question (personaldata), which should look like this: personaldata$startdate.
# Create table from gender and startdate variables from personaldata data frame
table_example <- table(personaldata$gender, personaldata$startdate)

# Print table object in Console
print(table_example)
##
##          1/1/2016 1/4/2016 1/5/2016 1/7/2016 1/9/2016
##   female        0        1        0        1        1
##   male          1        0        1        0        4
The table above shows how many female versus male employees started working on a given date.
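If you would like to try the table function without the PersData.csv file, the steps can be sketched with a small made-up data frame. The demo_df and demo_table objects and their values below are hypothetical and serve only as an illustration.

```r
# Hypothetical data frame for illustration
demo_df <- data.frame(
  gender = c("female", "male", "male", "female"),
  startdate = c("1/4/2016", "1/9/2016", "1/9/2016", "1/7/2016")
)

# Cross-tabulate gender (rows) by startdate (columns)
demo_table <- table(demo_df$gender, demo_df$startdate)

# Print table object in Console
print(demo_table)

# The counts in the table sum to the number of rows in the data frame
sum(demo_table)
```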
Now we are ready to write the table called table_example to our working directory using the write.table function. As the first argument, type the name of the table object (table_example). Second, type what we would like to call the file when it is saved in our working directory ("Practice Table.csv"); be sure to include the .csv extension in the name and wrap it all in quotation marks. Third, use the sep="," argument to specify that the values in the table are separated by commas, as this will be a comma-separated values file. Fourth, add the argument col.names=NA so that the column names line up with their respective values. This fourth argument is needed because the first column of the written file contains the row names of the table (here, the levels of the gender variable), and by default the function does not write a header entry for that column; as a result, the column names in the top row are shifted one column to the left of their values. The col.names=NA argument simply leaves the first cell in the top row blank so that the first column name appears in the next column to the right, directly above its values. [To understand what the table would look like without this fourth argument, simply omit it, and open the resulting file in your working directory to see what happens.]
# Write table object to working directory
write.table(table_example, "Practice Table.csv", sep=",", col.names=NA)
If you go to your working directory, you will find the file called “Practice Table.csv”.
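To see the effect of the col.names=NA argument without opening the files in another program, the header rows of the two versions can be compared directly in R. This is a minimal sketch using a hypothetical table (demo_table) and temporary files; it is meant only to illustrate the alignment issue described above.

```r
# Hypothetical data frame and table for illustration
demo_df <- data.frame(
  gender = c("female", "male", "male"),
  startdate = c("1/4/2016", "1/9/2016", "1/9/2016")
)
demo_table <- table(demo_df$gender, demo_df$startdate)

# Temporary file paths for the two versions
file_default <- tempfile(fileext = ".csv")
file_aligned <- tempfile(fileext = ".csv")

# Version 1: default header (no entry for the row-names column)
write.table(demo_table, file_default, sep = ",")

# Version 2: blank first header cell so names align with values
write.table(demo_table, file_aligned, sep = ",", col.names = NA)

# Compare the header row (first line) of each file
readLines(file_default)[1]
readLines(file_aligned)[1]

# Compare with a data row (second line), which includes the row name
readLines(file_aligned)[2]
```

In the default version, the header row has one fewer entry than the data rows, which is why the column names appear shifted when the file is opened in a spreadsheet program.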
Writing data from the R environment to your working directory or another folder can be useful, especially when collaborating with those who do not use R. The write.csv function writes a data frame object to a .csv file, whereas the write.table function, when used with the sep="," argument, writes a table object to a .csv file.