Chapter 16 Arranging (Sorting) Data

In this chapter, we will learn how to arrange (sort) data within a data frame object, which can be useful for identifying high or low numeric values or for alphabetizing character values.

16.1 Conceptual Overview

Arranging (sorting) data refers to the process of ordering rows numerically or alphabetically in a data frame or table by the values of one or more variables. Sorting can make it easier to visually scan raw data, such as for the purposes of identifying extreme or outlier values. Sorting can also facilitate decision making when, for example, rank ordering applicants’ scores on different selection tools.

16.2 Tutorial

This chapter’s tutorial demonstrates how to arrange (sort) data in R.

16.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below. Both versions of the tutorial will show you how to arrange (sort) data with or without the pipe (%>%) operator. If you’re unfamiliar with the pipe operator, no need to worry: I provide a brief explanation and demonstration of its purpose in both versions of the tutorial.

Link to video tutorial: https://youtu.be/wVwJQsLNbmw

16.2.2 Functions & Packages Introduced

Function Package
arrange dplyr
desc dplyr

16.2.3 Initial Steps

Please note that any function that appears in the Initial Steps section has been covered in a previous chapter. If you need a refresher, please view the relevant chapter. In addition, a previous chapter may show you how to perform the same action using different functions or packages.

If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id"        "lastname"  "firstname" "startdate" "gender"
# Print data frame (tibble) object
personaldata
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   154 McDonald   Ronald    1/9/2016  male  
## 3   155 Smith      John      1/9/2016  male  
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male  
## 6   111 Newton     Isaac     1/9/2016  male  
## 7   198 Morales    Linda     1/7/2016  female
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.
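
As a quick check of the point above, the following sketch builds a small, hypothetical tibble (the object name and values are made up for illustration and are not part of PersData.csv) and confirms that R also recognizes it as a data frame. It assumes the tibble package is installed, which should be the case if you have installed readr or dplyr.

```r
# Access the tibble package (assumed to be installed)
library(tibble)

# Hypothetical example data (not from PersData.csv)
example_tbl <- tibble(id = c(1, 2, 3), name = c("A", "B", "C"))

# A tibble carries the "data.frame" class in addition to its tibble classes
class(example_tbl)

# Base R functions that check for a data frame will accept a tibble
is.data.frame(example_tbl)
```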

16.2.4 Arrange (Sort) Data

There are different functions we could use to arrange (sort) the data in the data frame, and in this chapter, we will focus on the arrange function from the dplyr package (Wickham et al. 2023). Please note that there are other functions we could use to sort data, and if you’re interested, in the Arranging (Sorting) Data: Chapter Supplement, I demonstrate how to use the order function from base R to carry out the same operations we will cover below.

The arrange function comes from the dplyr package, which is part of the tidyverse of R packages (Wickham 2023; Wickham et al. 2019). If you haven’t already, install and access the dplyr package using the install.packages and library functions, respectively.

# Install dplyr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("dplyr")
# Access dplyr package
library(dplyr)

Before diving into arranging the data, as a disclaimer, I will demonstrate two techniques for arranging (sorting) data using the arrange function.

The first technique uses a “pipe,” which in R is represented by the %>% operator. The pipe operator comes from a package called magrittr (Bache and Wickham 2022), on which the dplyr package is partially dependent. In short, a pipe allows a person to write code more efficiently and to improve the readability of the code and overall script. Specifically, a pipe forwards the result or value of one object or expression to a subsequent function. In doing so, one can avoid writing functions in which other functions are nested parenthetically. For more information on the pipe operator, check out Wickham and Grolemund’s (2017) chapter on pipes: https://r4ds.had.co.nz/pipes.html.
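
To make the idea of “forwarding” a value more concrete, here is a minimal sketch that compares a nested function call with its piped equivalent. The scores vector is hypothetical and used only for illustration, and the sketch assumes the magrittr package is installed (it is typically installed along with dplyr).

```r
# Access the magrittr package, which provides the %>% operator
library(magrittr)

# Hypothetical numeric vector used only for illustration
scores <- c(4, 9, 16)

# Nested approach: read from the inside out
nested_result <- sum(sqrt(scores))

# Piped approach: read from left to right
piped_result <- scores %>% sqrt() %>% sum()

# Both approaches return the same value
identical(nested_result, piped_result)
```

In the piped version, the value on the left of each %>% becomes the first argument of the function on the right, which is why the steps can be read in the order in which they are carried out.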

This brings us to the second technique for arranging (sorting) data using the arrange function. The second technique uses a more traditional approach that some may argue lacks the efficiency and readability of the pipe. Conversely, others may argue against the use of pipes altogether. I’m not here to settle any “pipes versus no pipes” debate, and you’re welcome to use either technique. If you don’t want to learn how to use pipes (or would like to learn how to use them at a later date), feel free to skip to the section below called Without Pipe.

16.2.4.1 With Pipe

To use the “with pipe” technique, first, type the name of our data frame object, which we previously named personaldata, followed by the pipe (%>%) operator. This will “pipe” our data frame into the subsequent function. Second, either on the same line or on the next line, type the name of the arrange function, and within the parentheses, enter the variable name startdate as the argument to indicate that we want to arrange (sort) the data by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.

# Arrange (sort) data by variable in ascending order (single line) (with pipe)
personaldata %>% arrange(startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male  
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male  
## 6   155 Smith      John      1/9/2016  male  
## 7   111 Newton     Isaac     1/9/2016  male  
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

Alternatively, we can write this script over two lines and achieve the same output in our Console.

# Arrange (sort) data by variable in ascending order (two lines) (with pipe)
personaldata %>% 
  arrange(startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male  
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male  
## 6   155 Smith      John      1/9/2016  male  
## 7   111 Newton     Isaac     1/9/2016  male  
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

Please note that the operations we have performed thus far have not changed anything in the personaldata data frame object itself; rather, the output in the Console simply shows what it looks like if the data are sorted by the variable in question. We can verify this by viewing the first six rows of data in our data frame object using the head function. As you can see below, nothing changed in the data frame itself.

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr> 
## 1   153 Sanchez  Alejandro 1/1/2016  male  
## 2   154 McDonald Ronald    1/9/2016  male  
## 3   155 Smith    John      1/9/2016  male  
## 4   165 Doe      Jane      1/4/2016  female
## 5   125 Franklin Benjamin  1/5/2016  male  
## 6   111 Newton   Isaac     1/9/2016  male

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

# Arrange (sort) data by variable in ascending order and 
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>% arrange(startdate)

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr> 
## 1   153 Sanchez  Alejandro 1/1/2016  male  
## 2   165 Doe      Jane      1/4/2016  female
## 3   125 Franklin Benjamin  1/5/2016  male  
## 4   198 Morales  Linda     1/7/2016  female
## 5   154 McDonald Ronald    1/9/2016  male  
## 6   155 Smith    John      1/9/2016  male

As you can see in the Console output, now the personaldata data frame object has been changed such that the data are arranged (sorted) by the startdate variable.
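
One caveat worth flagging: the read_csv function imported startdate as a character (<chr>) variable, so arrange sorts it as text rather than as calendar dates. For the dates in this file, the two orderings happen to agree, but that will not hold in general. The sketch below uses a small, hypothetical data frame (the object name and values are illustrative) along with the as.Date function from base R, and it assumes the dates follow a month/day/year format.

```r
# Access dplyr package
library(dplyr)

# Hypothetical start dates stored as text (month/day/year format assumed)
example_dates <- data.frame(
  id = c(1, 2, 3),
  startdate = c("1/9/2016", "1/10/2016", "1/2/2016")
)

# Sorting the text values places "1/10/2016" before "1/2/2016"
text_sorted <- example_dates %>% arrange(startdate)

# Converting to Date within arrange yields a chronological ordering
date_sorted <- example_dates %>%
  arrange(as.Date(startdate, format = "%m/%d/%Y"))

# Compare the resulting row orders
text_sorted$id
date_sorted$id
```

If your own data contain dates that span different days or months with varying numbers of digits, converting the variable to a date class before sorting is likely the safer choice.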

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.

# Arrange (sort) data by variable in descending order and 
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>% arrange(desc(startdate))

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   154 McDonald   Ronald    1/9/2016  male  
## 2   155 Smith      John      1/9/2016  male  
## 3   111 Newton     Isaac     1/9/2016  male  
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male  
## 6   198 Morales    Linda     1/7/2016  female

To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments. Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed. As shown below, startdate is sorted within the sorted levels of the gender variable. As a reminder, the default operation of the arrange function is to arrange (sort) the data in ascending order. Remember, we use commas to separate arguments used in a function (if there is more than one argument).

# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(gender, startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   165 Doe        Jane      1/4/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   201 Providence Cindy     1/9/2016  female
## 4   153 Sanchez    Alejandro 1/1/2016  male  
## 5   125 Franklin   Benjamin  1/5/2016  male  
## 6   154 McDonald   Ronald    1/9/2016  male  
## 7   155 Smith      John      1/9/2016  male  
## 8   111 Newton     Isaac     1/9/2016  male  
## 9   282 Legend     John      1/9/2016  male

Watch what happens when we switch the order of the two variables we are using to sort the data.

# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(startdate, gender)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male  
## 4   198 Morales    Linda     1/7/2016  female
## 5   201 Providence Cindy     1/9/2016  female
## 6   154 McDonald   Ronald    1/9/2016  male  
## 7   155 Smith      John      1/9/2016  male  
## 8   111 Newton     Isaac     1/9/2016  male  
## 9   282 Legend     John      1/9/2016  male

As you can see, the order of the two sorting variables matters.

To arrange the data in descending order, just use the desc function from dplyr within the arrange function.

# Arrange (sort) data by two variables in descending order (with pipe)
personaldata %>% arrange(desc(gender), desc(startdate))
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   154 McDonald   Ronald    1/9/2016  male  
## 2   155 Smith      John      1/9/2016  male  
## 3   111 Newton     Isaac     1/9/2016  male  
## 4   282 Legend     John      1/9/2016  male  
## 5   125 Franklin   Benjamin  1/5/2016  male  
## 6   153 Sanchez    Alejandro 1/1/2016  male  
## 7   201 Providence Cindy     1/9/2016  female
## 8   198 Morales    Linda     1/7/2016  female
## 9   165 Doe        Jane      1/4/2016  female

Or, we can sort one variable in the default ascending order and the other in descending order.

# Arrange (sort) data by two variables in ascending & descending order (with pipe)
personaldata %>% arrange(gender, desc(startdate))
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   201 Providence Cindy     1/9/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   165 Doe        Jane      1/4/2016  female
## 4   154 McDonald   Ronald    1/9/2016  male  
## 5   155 Smith      John      1/9/2016  male  
## 6   111 Newton     Isaac     1/9/2016  male  
## 7   282 Legend     John      1/9/2016  male  
## 8   125 Franklin   Benjamin  1/5/2016  male  
## 9   153 Sanchez    Alejandro 1/1/2016  male

16.2.4.2 Without Pipe

We can achieve the same output without using the pipe (%>%) operator as with the pipe operator; again, your choice of using or not using the pipe operator is up to you.

To use the arrange function without the pipe operator, type the name of the arrange function, and within the parentheses, as the first argument, type the name of the personaldata data frame object, and as the second argument, type the startdate variable, where the latter indicates that we want to arrange (sort) the data frame object by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. Remember, we use commas to separate arguments used in a function (if there is more than one argument). If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.

# Arrange (sort) data by variable in ascending order without pipe
arrange(personaldata, startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male  
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male  
## 6   155 Smith      John      1/9/2016  male  
## 7   111 Newton     Isaac     1/9/2016  male  
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
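
If you would like to confirm that the two techniques return the same result, one option is to compare them with the identical function from base R. The sketch below uses a small, hypothetical data frame (the object and variable names are made up for illustration) rather than the chapter’s data.

```r
# Access dplyr package
library(dplyr)

# Hypothetical data frame used only for illustration
example_df <- data.frame(
  name = c("Cruz", "Abbott", "Baker"),
  score = c(88, 92, 75)
)

# Same sort written with and without the pipe
with_pipe <- example_df %>% arrange(score)
without_pipe <- arrange(example_df, score)

# TRUE indicates the two results are the same
identical(with_pipe, without_pipe)
```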

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

# Arrange (sort) data by variable in ascending order and 
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, startdate)

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr> 
## 1   153 Sanchez  Alejandro 1/1/2016  male  
## 2   165 Doe      Jane      1/4/2016  female
## 3   125 Franklin Benjamin  1/5/2016  male  
## 4   198 Morales  Linda     1/7/2016  female
## 5   154 McDonald Ronald    1/9/2016  male  
## 6   155 Smith    John      1/9/2016  male

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.

# Arrange (sort) data by variable in descending order and 
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, desc(startdate))

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   154 McDonald   Ronald    1/9/2016  male  
## 2   155 Smith      John      1/9/2016  male  
## 3   111 Newton     Isaac     1/9/2016  male  
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male  
## 6   198 Morales    Linda     1/7/2016  female

To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments (after the name of the data frame, which is the first argument). Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed.

# Arrange (sort) data by two variables in ascending order without pipe
personaldata <- arrange(personaldata, gender, startdate)

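Because we assigned the result to personaldata without printing it, no output appears in the Console; if you print the object (e.g., using the head function), you will see that startdate is sorted within the sorted levels of the gender variable. As with a single variable, the default operation of the arrange function is to arrange (sort) the data in ascending order.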

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below. You can use the desc function on one or both sorting variables.

# Arrange (sort) data by one variable in ascending order and 
# the other in descending order without pipe
personaldata <- arrange(personaldata, gender, desc(startdate))

Or we can apply the desc function to both variables.

# Arrange (sort) data by both variables in descending order without pipe
personaldata <- arrange(personaldata, desc(gender), desc(startdate))

16.2.5 Summary

In this chapter, we learned how to arrange (sort) data by one or more variables using the arrange and desc functions from the dplyr package. This chapter also introduced the pipe (%>%) operator, which can help make code easier to read in some contexts.

16.3 Chapter Supplement

In addition to the arrange function from the dplyr package covered above, we can use the order function from base R to arrange (sort) data by values for one or more variables. Because this function comes from base R, we do not need to install and access an additional package like we do with the arrange function, which some may find advantageous.

16.3.1 Functions & Packages Introduced

Function Package
order base R
c base R

16.3.2 Initial Steps

If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.

# Set your working directory
setwd("H:/RWorkshop")
# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

16.3.3 order Function from Base R

To sort a data frame object in ascending order based on a single variable, we will use the order function from base R to do the following:

  1. Type the name of the data frame object that you wish to arrange (sort) (personaldata).
  2. Insert brackets ([ ]), which allow us to reference rows or columns depending on how we format the brackets. If we type a function or value before the comma, we are indicating that we wish to apply operations to row(s), and if we type a function or value after the comma, we are indicating that we wish to apply operations to column(s).
  3. To sort the rows of the data frame in ascending order by the startdate variable, type the name of the order function before the comma in the brackets. As the sole parenthetical argument of the order function, type the name of the personaldata data frame object, followed by the $ operator and the name of the variable by which we wish to sort the data frame, which to reiterate is the startdate variable. The $ operator signals to R that a variable belongs to a particular data frame object. By default, the order function sorts in ascending order.

# Arrange (sort) data by variable in ascending order 
personaldata[order(personaldata$startdate),]
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   153 Sanchez    Alejandro 1/1/2016  male  
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male  
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male  
## 6   155 Smith      John      1/9/2016  male  
## 7   111 Newton     Isaac     1/9/2016  male  
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
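
It may help to see what the order function returns on its own. Rather than returning sorted values, it returns the positions (row numbers) that would place the values in ascending order, which is why we can use it before the comma in the brackets. The sketch below uses a hypothetical numeric vector named ages for illustration only.

```r
# Hypothetical vector used only for illustration
ages <- c(30, 10, 20)

# order() returns the positions that would put the values in ascending order
positions <- order(ages)
positions

# Using those positions inside brackets rearranges the values
ages[positions]
```

Here, the smallest value (10) sits in the second position of the original vector, so the first element returned by order is 2.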

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

# Arrange (sort) data by variable in ascending order 
# and overwrite existing data frame object
personaldata <- personaldata[order(personaldata$startdate),]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr> 
## 1   153 Sanchez  Alejandro 1/1/2016  male  
## 2   165 Doe      Jane      1/4/2016  female
## 3   125 Franklin Benjamin  1/5/2016  male  
## 4   198 Morales  Linda     1/7/2016  female
## 5   154 McDonald Ronald    1/9/2016  male  
## 6   155 Smith    John      1/9/2016  male

To sort in descending order, add the argument decreasing=TRUE within the order function parentheses. Remember, we use commas to separate arguments used in a function (if there are two or more arguments).

# Arrange (sort) data by variable in descending order
personaldata <- personaldata[order(personaldata$startdate, decreasing=TRUE),]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   154 McDonald   Ronald    1/9/2016  male  
## 2   155 Smith      John      1/9/2016  male  
## 3   111 Newton     Isaac     1/9/2016  male  
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male  
## 6   198 Morales    Linda     1/7/2016  female

If we wish to sort a data frame object by two variables, as the second argument in the order function parentheses, simply add the name of the data frame object, followed by the $ operator and the name of the second variable. We will sort the data frame by gender and startdate. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed. As shown below, startdate is sorted within the sorted levels of the gender variable. The default operation of the order function is to arrange (sort) the data in ascending order.

# Arrange (sort) data by two variables in ascending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate),]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   165 Doe        Jane      1/4/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   201 Providence Cindy     1/9/2016  female
## 4   153 Sanchez    Alejandro 1/1/2016  male  
## 5   125 Franklin   Benjamin  1/5/2016  male  
## 6   154 McDonald   Ronald    1/9/2016  male

To sort by one of the variables in descending order and the other variable in the default ascending order, we need to add the decreasing= argument, but because we have two variables, we need to provide a vector containing logical values (TRUE, FALSE) to indicate the variable(s) to which we wish to apply a descending order. If the logical value is TRUE for the decreasing= argument, then we sort the corresponding variable in descending order. Using the c (combine) function from base R, we create a vector of two logical values whose order corresponds to the order in which we listed the two variables in the order function. For example, if the argument is decreasing=c(FALSE, TRUE), then we sort the first variable in the default ascending order and the second variable in descending order, which is what we do below. Just be sure to add the following argument to the order function when supplying a vector of logical values to the decreasing= argument: method="radix".

# Arrange (sort) data by gender in ascending order and 
# startdate in descending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate, decreasing=c(FALSE, TRUE), method="radix"),]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr> 
## 1   201 Providence Cindy     1/9/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   165 Doe        Jane      1/4/2016  female
## 4   154 McDonald   Ronald    1/9/2016  male  
## 5   155 Smith      John      1/9/2016  male  
## 6   111 Newton     Isaac     1/9/2016  male

Or, you could sort by both variables in descending order by changing the argument to decreasing=c(TRUE, TRUE).

# Arrange (sort) data by gender and startdate variables in descending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate, decreasing=c(TRUE, TRUE), method="radix"),]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr> 
## 1   154 McDonald Ronald    1/9/2016  male  
## 2   155 Smith    John      1/9/2016  male  
## 3   111 Newton   Isaac     1/9/2016  male  
## 4   282 Legend   John      1/9/2016  male  
## 5   125 Franklin Benjamin  1/5/2016  male  
## 6   153 Sanchez  Alejandro 1/1/2016  male

References

Bache, Stefan Milton, and Hadley Wickham. 2022. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data. Sebastopol, California: O’Reilly Media, Inc. https://r4ds.had.co.nz/.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.