Chapter 16 Arranging (Sorting) Data
In this chapter, we will learn how to arrange (sort) data within a data frame object, which can be useful for identifying high or low numeric values or to alphabetize character values.
16.1 Conceptual Overview
Arranging (sorting) data refers to the process of ordering rows numerically or alphabetically in a data frame or table by the values of one or more variables. Sorting can make it easier to visually scan raw data, such as for the purposes of identifying extreme or outlier values. Sorting can also facilitate decision making, for example, when rank ordering applicants’ scores on different selection tools.
16.2 Tutorial
This chapter’s tutorial demonstrates how to arrange (sort) data in R.
16.2.1 Video Tutorial
As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below. Both versions of the tutorial will show you how to arrange (sort) data with or without the pipe (%>%) operator. If you’re unfamiliar with the pipe operator, no need to worry: I provide a brief explanation and demonstration of its purpose in both versions of the tutorial.
Link to video tutorial: https://youtu.be/wVwJQsLNbmw
16.2.3 Initial Steps
Please note that any function that appears in the Initial Steps section has been covered in a previous chapter. If you need a refresher, please view the relevant chapter. In addition, a previous chapter may show you how to perform the same action using different functions or packages.
If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.
Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.
Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
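# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id" "lastname" "firstname" "startdate" "gender"
# Print data frame (tibble) object
personaldata
## # A tibble: 9 × 5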
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
## 7 198 Morales Linda 1/7/2016 female
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male
As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.
16.2.4 Arrange (Sort) Data
There are different functions we could use to arrange (sort) the data in the data frame, and in this chapter, we will focus on the arrange function from the dplyr package (Wickham et al. 2023). If you’re interested in an alternative, in the Arranging (Sorting) Data: Chapter Supplement, I demonstrate how to use the order function from base R to carry out the same operations we will cover below.
The arrange function comes from the dplyr package, which is part of the tidyverse of R packages (Wickham 2023; Wickham et al. 2019). If you haven’t already, install and access the dplyr package using the install.packages and library functions, respectively.
# Install dplyr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("dplyr")
# Access dplyr package
library(dplyr)
Before diving into arranging the data, as a disclaimer, I will demonstrate two techniques for arranging (sorting) data using the arrange function.
The first technique uses a “pipe”, which in R is represented by the %>% operator. The pipe operator comes from a package called magrittr (Bache and Wickham 2022), on which the dplyr package is partially dependent. In short, a pipe allows a person to more efficiently write code and to improve the readability of the code and overall script. Specifically, a pipe forwards the result or value of one object or expression to a subsequent function. In doing so, one can avoid writing functions in which other functions are nested parenthetically. For more information on the pipe operator, check out Wickham and Grolemund’s (2017) chapter on pipes: https://r4ds.had.co.nz/pipes.html.
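To make the idea of “forwarding” concrete, here is a minimal sketch that compares a nested call with its piped equivalent. The small scores data frame and its variable names are hypothetical and are not part of PersData.csv; the sketch also assumes the dplyr package is already installed.

```r
# Access dplyr package (which also makes the %>% operator available)
library(dplyr)

# Hypothetical example data (not from PersData.csv)
scores <- data.frame(
  name  = c("Ana", "Bo", "Cy", "Di"),
  score = c(82, 95, 71, 88)
)

# Nested approach: functions are read from the inside out
nested <- head(arrange(scores, score), 2)

# Piped approach: functions are read from top to bottom
piped <- scores %>%
  arrange(score) %>%
  head(2)

# Check whether both approaches return the same two rows
identical(nested, piped)
```

In both versions, the data are sorted by score first and then trimmed to two rows; the pipe only changes how the steps are written, not what they do.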
This brings us to the second technique for arranging (sorting) data using the arrange function. The second technique uses a more traditional approach that some may argue lacks the efficiency and readability of the pipe. Conversely, others may argue against the use of pipes altogether. I’m not here to settle any “pipes versus no pipes” debate, and you’re welcome to use either technique. If you don’t want to learn how to use pipes (or would like to learn how to use them at a later date), feel free to skip to the section below called Without Pipe.
16.2.4.1 With Pipe
To use the “with pipe” technique, first, type the name of our data frame object, which we previously named personaldata, followed by the pipe (%>%) operator. This will “pipe” our data frame into the subsequent function. Second, either on the same line or on the next line, type the name of the arrange function, and within the parentheses, enter the variable name startdate as the argument to indicate that we want to arrange (sort) the data by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.
# Arrange (sort) data by variable in ascending order (single line) (with pipe)
personaldata %>% arrange(startdate)
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
## 7 111 Newton Isaac 1/9/2016 male
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male
Alternatively, we can write this script over two lines and achieve the same output in our Console.
# Arrange (sort) data by variable in ascending order (two lines) (with pipe)
personaldata %>%
  arrange(startdate)
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
## 7 111 Newton Isaac 1/9/2016 male
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male
Please note that the operations we have performed thus far have not changed anything in the personaldata data frame object itself; rather, the output in the Console simply shows what it looks like if the data are sorted by the variable in question. We can verify this by viewing the first six rows of data in our data frame object using the head function. As you can see below, nothing changed in the data frame itself.
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 154 McDonald Ronald 1/9/2016 male
## 3 155 Smith John 1/9/2016 male
## 4 165 Doe Jane 1/4/2016 female
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 111 Newton Isaac 1/9/2016 male
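The same point can be checked programmatically. The sketch below is only an illustration: toy_data is a hypothetical stand-in for personaldata, and the dplyr package is assumed to be installed.

```r
# Access dplyr package
library(dplyr)

# Hypothetical stand-in for personaldata
toy_data <- data.frame(
  id        = c(153, 154, 165),
  startdate = c("1/1/2016", "1/9/2016", "1/4/2016")
)

# Keep a record of the original row order
original_order <- toy_data$id

# Sort without assigning the result; the sorted rows are only printed
toy_data %>% arrange(startdate)

# Check whether the stored object still has its original row order
identical(toy_data$id, original_order)
```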
To change the ordering of data in the personaldata data frame object itself, we need to assign the sorted result to a data frame object using the <- assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.
# Arrange (sort) data by variable in ascending order and
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>% arrange(startdate)
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
As you can see in the Console output, now the personaldata data frame object has been changed such that the data are arranged (sorted) by the startdate variable.
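One caveat is worth keeping in mind: read_csv imported startdate as a character (chr) variable, so arrange orders it as text rather than chronologically. That happens to give the expected order here because every date has a single-digit day, but it would not in general. The sketch below uses hypothetical dates and assumes they are written as month/day/year, which the data file itself does not confirm; it converts them with the as.Date function from base R before sorting.

```r
# Access dplyr package
library(dplyr)

# Hypothetical dates, including one with a two-digit day
toy_dates <- data.frame(
  id        = c(1, 2, 3),
  startdate = c("1/4/2016", "1/10/2016", "1/9/2016")
)

# Sorting the text values puts "1/10/2016" first, because the
# character "1" sorts before the characters "4" and "9"
text_sorted <- toy_dates %>% arrange(startdate)
text_sorted$id

# Convert to Date class (assumed month/day/year format), then sort
toy_dates$startdate_as_date <- as.Date(toy_dates$startdate, format = "%m/%d/%Y")
date_sorted <- toy_dates %>% arrange(startdate_as_date)
date_sorted$id
```

If the dates in your own data are written as day/month/year instead, the format string would need to change accordingly.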
To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.
# Arrange (sort) data by variable in descending order and
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>% arrange(desc(startdate))
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 154 McDonald Ronald 1/9/2016 male
## 2 155 Smith John 1/9/2016 male
## 3 111 Newton Isaac 1/9/2016 male
## 4 201 Providence Cindy 1/9/2016 female
## 5 282 Legend John 1/9/2016 male
## 6 198 Morales Linda 1/7/2016 female
To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments. Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed. As shown below, startdate is sorted within the sorted levels of the gender variable. As a reminder, the default operation of the arrange function is to arrange (sort) the data in ascending order. Remember, we use commas to separate arguments used in a function (if there is more than one argument).
# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(gender, startdate)
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 165 Doe Jane 1/4/2016 female
## 2 198 Morales Linda 1/7/2016 female
## 3 201 Providence Cindy 1/9/2016 female
## 4 153 Sanchez Alejandro 1/1/2016 male
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 154 McDonald Ronald 1/9/2016 male
## 7 155 Smith John 1/9/2016 male
## 8 111 Newton Isaac 1/9/2016 male
## 9 282 Legend John 1/9/2016 male
Watch what happens when we switch the order of the two variables we are using to sort the data.
# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(startdate, gender)
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 201 Providence Cindy 1/9/2016 female
## 6 154 McDonald Ronald 1/9/2016 male
## 7 155 Smith John 1/9/2016 male
## 8 111 Newton Isaac 1/9/2016 male
## 9 282 Legend John 1/9/2016 male
As you can see, the order of the two sorting variables matters.
To arrange the data in descending order by both variables, apply the desc function from dplyr to each variable within the arrange function.
# Arrange (sort) data by two variables in descending order (with pipe)
personaldata %>% arrange(desc(gender), desc(startdate))
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 154 McDonald Ronald 1/9/2016 male
## 2 155 Smith John 1/9/2016 male
## 3 111 Newton Isaac 1/9/2016 male
## 4 282 Legend John 1/9/2016 male
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 153 Sanchez Alejandro 1/1/2016 male
## 7 201 Providence Cindy 1/9/2016 female
## 8 198 Morales Linda 1/7/2016 female
## 9 165 Doe Jane 1/4/2016 female
Or, we can sort one variable in the default ascending order and the other in descending order.
# Arrange (sort) data by two variables in ascending & descending order (with pipe)
personaldata %>% arrange(gender, desc(startdate))
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 201 Providence Cindy 1/9/2016 female
## 2 198 Morales Linda 1/7/2016 female
## 3 165 Doe Jane 1/4/2016 female
## 4 154 McDonald Ronald 1/9/2016 male
## 5 155 Smith John 1/9/2016 male
## 6 111 Newton Isaac 1/9/2016 male
## 7 282 Legend John 1/9/2016 male
## 8 125 Franklin Benjamin 1/5/2016 male
## 9 153 Sanchez Alejandro 1/1/2016 male
16.2.4.2 Without Pipe
We can achieve the same output without using the pipe (%>%) operator as with the pipe operator; again, your choice of using or not using the pipe operator is up to you.
To use the arrange function without the pipe operator, type the name of the arrange function, and within the parentheses, as the first argument, type the name of the personaldata data frame object, and as the second argument, type the startdate variable, where the latter indicates that we want to arrange (sort) the data frame object by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. Remember, we use commas to separate arguments used in a function (if there is more than one argument). If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.
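# Arrange (sort) data by variable in ascending order (without pipe)
arrange(personaldata, startdate)
## # A tibble: 9 × 5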
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
## 7 111 Newton Isaac 1/9/2016 male
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male
To change the ordering of data in the personaldata data frame object itself, we need to assign the sorted result to a data frame object using the <- assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.
# Arrange (sort) data by variable in ascending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, startdate)
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.
# Arrange (sort) data by variable in descending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, desc(startdate))
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 154 McDonald Ronald 1/9/2016 male
## 2 155 Smith John 1/9/2016 male
## 3 111 Newton Isaac 1/9/2016 male
## 4 201 Providence Cindy 1/9/2016 female
## 5 282 Legend John 1/9/2016 male
## 6 198 Morales Linda 1/7/2016 female
To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments (after the name of the data frame, which is the first argument). Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed.
# Arrange (sort) data by two variables in ascending order without pipe
personaldata <- arrange(personaldata, gender, startdate)
If you print the updated data frame object (e.g., using the head function), you will see that startdate is sorted within the sorted levels of the gender variable, which is consistent with the default operation of the arrange function: arranging (sorting) the data in ascending order.
To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below. You can use the desc function on one or both sorting variables.
# Arrange (sort) data by one variable in ascending order and
# the other in descending order without pipe
personaldata <- arrange(personaldata, gender, desc(startdate))
Or we can apply the desc function to both variables.
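The code for this last variation can be sketched as follows. To keep the example self-contained, it uses a hypothetical stand-in for personaldata called toy_data; with the real data, only the arrange line would change. The dplyr package is assumed to be installed.

```r
# Access dplyr package
library(dplyr)

# Hypothetical stand-in for personaldata
toy_data <- data.frame(
  id        = c(153, 165, 125, 198),
  startdate = c("1/1/2016", "1/4/2016", "1/5/2016", "1/7/2016"),
  gender    = c("male", "female", "male", "female")
)

# Arrange (sort) data by both variables in descending order without pipe
toy_data <- arrange(toy_data, desc(gender), desc(startdate))

# Print the sorted data frame in Console
toy_data
```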
16.3 Chapter Supplement
In addition to the arrange function from the dplyr package covered above, we can use the order function from base R to arrange (sort) data by values for one or more variables. Because this function comes from base R, we do not need to install and access an additional package like we do with the arrange function, which some may find advantageous.
16.3.2 Initial Steps
If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
16.3.3 order Function from Base R
To sort a data frame object in ascending order based on a single variable, we will use the order function from base R to do the following:
- Type the name of the data frame object that you wish to arrange (sort) (personaldata).
- Insert brackets ([ ]), which allow us to reference rows or columns depending on how we format the brackets. If we type a function or value before the comma, we are indicating that we wish to apply operations to row(s), and if we type a function or value after the comma, we are indicating that we wish to apply operations to column(s).
- To sort the data frame into ascending rows by the startdate variable, type the name of the order function before the comma in the brackets. As the sole parenthetical argument of the order function, type the name of the personaldata data frame object, followed by the $ operator and the name of the variable by which we wish to sort the data frame, which to reiterate is the startdate variable. The $ operator signals to R that a variable belongs to a particular data frame object. By default, the order function sorts in ascending order.
# Arrange (sort) data by variable in ascending order
personaldata[order(personaldata$startdate),]
## # A tibble: 9 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
## 7 111 Newton Isaac 1/9/2016 male
## 8 201 Providence Cindy 1/9/2016 female
## 9 282 Legend John 1/9/2016 male
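To see why placing the order function before the comma reorders the rows, it can help to look at what order returns on its own: a vector of row positions, which the brackets then use to pull rows in that sequence. The sketch below uses base R only, and toy_data is a hypothetical stand-in for personaldata.

```r
# Hypothetical stand-in for personaldata
toy_data <- data.frame(
  id        = c(153, 154, 165),
  startdate = c("1/1/2016", "1/9/2016", "1/4/2016")
)

# order() returns the row positions that would put startdate in ascending order
row_positions <- order(toy_data$startdate)
row_positions

# Placing those positions before the comma selects rows in that sequence;
# leaving the space after the comma empty keeps all columns
toy_data[row_positions, ]
```

Here the third row has the second-earliest start date, so its position appears second in the vector that order returns.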
To change the ordering of data in the personaldata data frame object itself, we need to assign the sorted result to a data frame object using the <- assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.
# Arrange (sort) data by variable in ascending order
# and overwrite existing data frame object
personaldata <- personaldata[order(personaldata$startdate),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 153 Sanchez Alejandro 1/1/2016 male
## 2 165 Doe Jane 1/4/2016 female
## 3 125 Franklin Benjamin 1/5/2016 male
## 4 198 Morales Linda 1/7/2016 female
## 5 154 McDonald Ronald 1/9/2016 male
## 6 155 Smith John 1/9/2016 male
To sort in descending order, add the argument decreasing=TRUE within the order function parentheses. Remember, we use commas to separate arguments used in a function (if there are two or more arguments).
# Arrange (sort) data by variable in descending order
personaldata <- personaldata[order(personaldata$startdate, decreasing=TRUE),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 154 McDonald Ronald 1/9/2016 male
## 2 155 Smith John 1/9/2016 male
## 3 111 Newton Isaac 1/9/2016 male
## 4 201 Providence Cindy 1/9/2016 female
## 5 282 Legend John 1/9/2016 male
## 6 198 Morales Linda 1/7/2016 female
If we wish to sort a data frame object by two variables, as the second argument in the order function parentheses, simply add the name of the data frame object, followed by the $ operator and the name of the second variable. We will sort the data frame by gender and startdate. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed. As shown below, startdate is sorted within the sorted levels of the gender variable. The default operation of the order function is to sort the data in ascending order.
# Arrange (sort) data by two variables in ascending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 165 Doe Jane 1/4/2016 female
## 2 198 Morales Linda 1/7/2016 female
## 3 201 Providence Cindy 1/9/2016 female
## 4 153 Sanchez Alejandro 1/1/2016 male
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 154 McDonald Ronald 1/9/2016 male
To sort by one of the variables in descending order and the other variable in the default ascending order, we need to add the decreasing= argument, but because we have two variables, we need to provide a vector containing logical values (TRUE, FALSE) to indicate to which variables we wish to apply a descending order. If the logical value is TRUE for the decreasing= argument, then we sort the corresponding variable in descending order. Using the c (combine) function from base R, we create a vector of two logical values whose order corresponds to the order in which we listed the two variables in the order function. For example, if the argument is decreasing=c(FALSE, TRUE), then we sort the first variable in the default ascending order and the second variable in descending order, which is what we do below. Just be sure to add the argument method="radix" to the order function whenever you supply a vector of logical values to decreasing=, as the other sorting methods accept only a single logical value.
# Arrange (sort) data by gender in ascending order and
# startdate in descending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate, decreasing=c(FALSE, TRUE), method="radix"),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 201 Providence Cindy 1/9/2016 female
## 2 198 Morales Linda 1/7/2016 female
## 3 165 Doe Jane 1/4/2016 female
## 4 154 McDonald Ronald 1/9/2016 male
## 5 155 Smith John 1/9/2016 male
## 6 111 Newton Isaac 1/9/2016 male
Or, you could sort by both variables in descending order by changing the argument to decreasing=c(TRUE, TRUE).
# Arrange (sort) data by gender and startdate variables in descending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate, decreasing=c(TRUE, TRUE), method="radix"),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
## id lastname firstname startdate gender
## <dbl> <chr> <chr> <chr> <chr>
## 1 154 McDonald Ronald 1/9/2016 male
## 2 155 Smith John 1/9/2016 male
## 3 111 Newton Isaac 1/9/2016 male
## 4 282 Legend John 1/9/2016 male
## 5 125 Franklin Benjamin 1/5/2016 male
## 6 153 Sanchez Alejandro 1/1/2016 male
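As an aside, a mixed ascending and descending sort can also be sketched without method="radix" by negating a numeric version of the variable to be reversed. This is an alternative idiom rather than the approach used above: it relies on the xtfrm function from base R to turn the character startdate values into numeric ranks, and toy_data is a hypothetical stand-in for personaldata.

```r
# Hypothetical stand-in for personaldata
toy_data <- data.frame(
  id        = c(153, 165, 125, 198),
  startdate = c("1/1/2016", "1/4/2016", "1/5/2016", "1/7/2016"),
  gender    = c("male", "female", "male", "female")
)

# Approach from this chapter: vector of logical values with method = "radix"
radix_sorted <- toy_data[order(toy_data$gender, toy_data$startdate,
                               decreasing = c(FALSE, TRUE), method = "radix"), ]

# Alternative: xtfrm() converts startdate to numeric ranks,
# and the minus sign reverses their order
xtfrm_sorted <- toy_data[order(toy_data$gender, -xtfrm(toy_data$startdate)), ]

# Check whether both approaches produce the same row order
identical(radix_sorted$id, xtfrm_sorted$id)
```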