Chapter 11 Basic Features and Operations of the R Language

In this chapter, you will learn about basic features of the R language along with key bits of terminology. Think of this chapter as the “gentle introduction to R” that nearly every book on R includes. Also, it is completely fine if you don’t fully grasp certain concepts and functions upon completing this chapter. We will revisit many of these concepts and functions in the HR context in subsequent chapters. Until then, use this chapter as an opportunity to practice writing R code.

11.1 Video Tutorial

When exploring the basic features, operations, and functions, feel free to follow along with the written tutorial below or to check out the video. In the video, I offer an abbreviated version of what’s covered in the written tutorial and focus on what I think most beginners need to know and understand early on about R. In the written tutorial, I get into some functions and operations that probably won’t become relevant until further along in your learning of R as a tool for HR analytics.

Link to video tutorial: https://youtu.be/yHbVbHEjhLQ

11.2 Functions & Packages Introduced

Function Package
print base R
class base R
str base R
install.packages base R
library base R
is.numeric base R
is.integer base R
is.character base R
is.logical base R
as.Date base R
as.POSIXct base R
c base R
data.frame base R
names base R

11.3 R as a Calculator

In its simplest form, R is a calculator. You can use R to carry out basic arithmetic, algebra, and other mathematical operations. The arithmetic operators in R include + (addition), - (subtraction), * (multiplication), / (division), and ^ (exponent); square roots are computed with the sqrt function rather than with an operator. Below, you will find an example of these operators and the sqrt function in action. In this book, lines of output are preceded by double hashtags (##); in your own R Console, however, you will not see the double hashtags before your output. (If you begin a line of your own script with ##, R simply treats that line as an annotation.)

3 + 2
## [1] 5
3 - 2
## [1] 1
3 * 2
## [1] 6
3 / 2
## [1] 1.5
3 ^ 2
## [1] 9
sqrt(3)
## [1] 1.732051

Note how the six lines of output we generated (see above) appear in your Console in the same order as the lines of code we ran; relatedly, remember that in R (like many other languages) the order of operations matters when an expression contains more than one operator.

In R it doesn’t matter whether there are spaces between the numeric values and the arithmetic operators. As such, we can write our code as follows and arrive at the same output.

3+2
## [1] 5
3-2
## [1] 1
3*2
## [1] 6
3/2
## [1] 1.5
3^2
## [1] 9
sqrt(3)
## [1] 1.732051
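To make the earlier point about the order of operations concrete, here is a small sketch. The numbers and the object names no_parens and with_parens are arbitrary and chosen only for illustration. It shows that multiplication and exponents are evaluated before addition unless parentheses say otherwise.

```r
# Multiplication is evaluated before addition
no_parens <- 3 + 2 * 2
no_parens
## [1] 7

# Parentheses force the addition to happen first
with_parens <- (3 + 2) * 2
with_parens
## [1] 10

# Exponents are evaluated before multiplication
2 * 3 ^ 2
## [1] 18
```

When in doubt, adding parentheses is a cheap way to make the intended order explicit.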

11.4 Functions

A function refers to an integrated set of instructions that can be applied consistently. Some functions also accept arguments, where an argument is used to further refine the instructions and resulting operations of the function. In R we can use functions that come standard from base R or functions that come from downloadable packages. Let’s take a look at the print function that comes standard with base R, which means that we don’t need a special package to access the function. This won’t be terribly exciting, but we can enter 3 as an argument within the print function parentheses; in general, arguments appear within the parentheses that follow the function name.

print(3)
## [1] 3

Note how the print function simply “printed” the numeric value 3 that we entered.
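As a hedged illustration of how an argument refines what a function does, the sketch below uses round, a base R function that is not otherwise covered in this chapter. The value 3.14159 and the object names are assumptions made purely for the example.

```r
# round() is a base R function; digits is a named argument that
# controls how many decimal places are kept
rounded <- round(3.14159, digits = 2)
print(rounded)
## [1] 3.14

# Arguments can also be matched by position instead of by name
rounded_by_position <- round(3.14159, 2)
print(rounded_by_position)
## [1] 3.14
```

Naming arguments (digits = 2) tends to make code easier to read, especially when a function accepts several arguments.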

We can also do the classic - yet super cliche - “Hello world!” example to illustrate how R and the print function handle text/character/string data; except, let’s change it to "Hello HR Analytics!".

print("Hello HR Analytics!")
## [1] "Hello HR Analytics!"

Note how we have to put text/character/string data in quotation marks. We can use double (" ") or single quotes (' '). Some people prefer double quotes and some prefer single quotes. I happen to prefer double quotes.

Now, let’s play around with the class function. The class function is used for determining the data type represented by a datum or by multiple data that are contained in a vector or variable. By entering 3 as an argument in the class function, we find that the data type is numeric.

class(3)
## [1] "numeric"

If you would like to learn more about a function and the types of arguments that can be used within the function, you can access the help feature in R to access documentation on the function. The easiest way to do this is to enter ? before the name of the function. Upon doing so, a help window will open; if you’re using RStudio, a specific window pane dedicated to Help will open.

?class

11.5 Packages

A package is a collection of functions with a common theme or that can be applied to address a similar set of problems. R packages typically go through a laborious development process and must pass CRAN’s submission checks before being posted on the CRAN website (https://cran.r-project.org/).

There are two functions that are important when it comes to installing and using packages. First, the install.packages function is used to install a package. The name of the package you wish to install should be surrounded with quotation marks (" " or ' ') and entered as an argument in the function. For example, if we wish to install the lessR package (Gerbing 2021), we type install.packages("lessR"), as shown below. Please note that the names of packages (and functions, arguments, and objects) are case sensitive in R.

install.packages("lessR")

Once you have installed a package, you use the library function to “check out” the package from your “library” of packages. To use the library function, enter the exact name of the package; quotation marks are not required.

library(lessR)
## Warning: package 'lessR' was built under R version 4.3.2
## 
## lessR 4.3.0                         feedback: gerbing@pdx.edu 
## --------------------------------------------------------------
## > d <- Read("")   Read text, Excel, SPSS, SAS, or R data file
##   d is default data frame, data= in analysis routines optional
## 
## Learn about reading, writing, and manipulating data, graphics,
## testing means and proportions, regression, factor analysis,
## customization, and descriptive statistics from pivot tables
##   Enter:  browseVignettes("lessR")
## 
## View changes in this and recent versions of lessR
##   Enter: news(package="lessR")
## 
## Interactive data analysis
##   Enter: interact()

We can also access a function from an installed package without using the library function. To do so, we can use the :: operator, placing the package name before it and the function name after it. For illustration purposes, I precede lessR::BoxPlot (which refers to the BoxPlot function from the lessR package) with ? to pull up the function documentation.

?lessR::BoxPlot
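The sketch below shows one possible way to install a package only when it is missing and then call one of its functions with the :: operator. To keep the example runnable without downloading anything, it assumes the stats package (which ships with R) as a stand-in for whatever package you actually need; the object names pkg, is_installed, and middle_value are hypothetical.

```r
# "stats" ships with R, so it stands in here for any package name
pkg <- "stats"

# requireNamespace() returns TRUE if the package is installed
is_installed <- requireNamespace(pkg, quietly = TRUE)

# Install only when the package is missing
if (!is_installed) {
  install.packages(pkg)
}

# Call a function with :: without first running library()
middle_value <- stats::median(c(1, 4, 7, 11, 19))
print(middle_value)
## [1] 7
```

Checking before installing avoids re-downloading a package every time a script is run.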

11.6 Variable Assignment

Variable assignment is the process of assigning a value or multiple values to a variable. There are two assignment operators that can be used for variable assignment as well as for (re)naming objects such as tables and data frames: <- and =. For the simple assignments in this chapter, both work the same way. I prefer to use <-, but others prefer =. In the example below, we assign the value 3 to a variable (i.e., object) we are naming x.

x <- 3
x = 3

Both lines achieved the same end, and the line that was run most recently overrides the previous assignment of 3 to x. Using the print function, we can check whether this worked.

print(x)
## [1] 3

Or, instead of using the print function, we can simply run x by itself.

x
## [1] 3
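To illustrate how a later assignment overrides an earlier one, here is a brief sketch. The variable y and the specific values are assumptions introduced only for this example.

```r
x <- 3        # assign 3 to x
y <- x * 2    # y is computed from the current value of x
x <- 10       # reassigning x overrides the earlier value

print(x)
## [1] 10
print(y)      # y keeps the value it was given earlier
## [1] 6
```

Note that y does not update automatically when x changes; it holds whatever value it was assigned at the time.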

11.7 Types of Data

In this chapter, we will focus on four types of data in R: numeric, character, Date, and logical.

11.7.1 numeric Data

numeric data are numbers or numeric values. This data type is ready-made for quantitative analysis. We can apply the is.numeric function to determine whether a value or variable is numeric; if the value or variable entered as an argument is numeric, R will return TRUE, and if it is not numeric, R will return FALSE. [Note that TRUE and FALSE don’t require quotation marks like text/character/string data, as they are handled differently in R.] Below, we check whether the value 3, the logical value TRUE, and the phrase "Hello data science!" are numeric.

is.numeric(3)
## [1] TRUE
is.numeric(TRUE)
## [1] FALSE
is.numeric("Hello data science!")
## [1] FALSE

An integer is a special type of numeric data. An integer does not have any decimals, and thus is a whole number. To specify that numeric data are of type integer, L must be appended to the value. For example, to specify that 3 is an integer, it should be written as 3L. To verify that a value is in fact of type integer, we can apply the is.integer function.

is.integer(3L)
## [1] TRUE
is.integer(3)
## [1] FALSE

Alternatively, we can use the class or str functions to determine whether a value or variable is integer or numeric. The function str is used to identify the structure of an object (e.g., data frame, variable, value).

class(3L)
## [1] "integer"
str(3L)
##  int 3
class(3)
## [1] "numeric"
str(3)
##  num 3

Finally, if we assign a numeric or integer value to a variable, the resulting variable will take on the numeric or integer data type (respectively).

x <- 3
class(x)
## [1] "numeric"
x <- 3L
class(x)
## [1] "integer"
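The following sketch explores how the integer and numeric types interact in arithmetic. It also uses as.integer, a base R conversion function; the object names are hypothetical and the values are arbitrary.

```r
# Adding two integers yields an integer
int_sum <- 3L + 2L
class(int_sum)
## [1] "integer"

# Division returns a numeric value, even when both inputs are integers
int_ratio <- 3L / 2L
class(int_ratio)
## [1] "numeric"

# as.integer() drops the decimal part; it does not round
truncated <- as.integer(3.9)
print(truncated)
## [1] 3
```

For most analyses the distinction between integer and numeric rarely matters, but it can explain why class returns a different label than expected.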

11.7.2 character Data

Data of type character do not explicitly or innately have quantitative properties. Sometimes this type of data is called “string” or “text” data. Data of type factor are similar to character data but are handled differently by R. Many analysis functions automatically convert character to factor, but the distinction becomes more important when working with vectors and when manipulating data frames. When data are of type character, we place quotation marks (" " or ' ') around the text. For example, if the character value of interest is old, then we write it as "old". Also note that character data are case sensitive, which means that "old" is not the same as "Old". Using the function is.character, we can determine whether data are in fact of type character.

is.character("old")
## [1] TRUE

Note how omitting the " " results in an error message.

is.character(old)
## Error in eval(expr, envir, enclos): object 'old' not found

Finally, if we assign a character value to a variable, the resulting variable will take on the character data type.

y <- "old"
class(y)
## [1] "character"
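As a rough illustration of case sensitivity and of the character versus factor distinction, consider the sketch below. It uses the base R functions factor and levels, which are not otherwise introduced in this chapter, and the object names age_chr and age_fct are hypothetical.

```r
# Character values are case sensitive
"old" == "Old"
## [1] FALSE

# A character vector and the factor created from it
age_chr <- c("old", "young", "young", "old", "young")
age_fct <- factor(age_chr)

class(age_chr)
## [1] "character"
class(age_fct)
## [1] "factor"

# A factor stores the set of distinct categories as its levels
levels(age_fct)
## [1] "old"   "young"
```

A factor treats the values as categories from a fixed set, which is why many analysis functions prefer it for grouping variables.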

11.7.3 Date Data

When working with dates in R, there are two different types: Date and POSIXct. Date captures just the date, whereas POSIXct captures the date and time. Behind the scenes, R treats Date numerically as the number of days since January 1, 1970, and POSIXct as the number of seconds since January 1, 1970. To specify a value as a date, we can use the as.Date function.

z <- as.Date("1970-03-01")
class(z)
## [1] "Date"

If we convert a variable of type Date to numeric using the as.numeric function, the result is the number of days since January 1, 1970.

z <- as.Date("1970-03-01")
as.numeric(z)
## [1] 59

Now we can use the as.POSIXct function to specify a value as a date and time. Note the very specific format in which the date and time are to be written.

z <- as.POSIXct("1970-03-01 13:10")
class(z)
## [1] "POSIXct" "POSIXt"

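If we convert a variable of type POSIXct to numeric using the as.numeric function, the result is the number of seconds since January 1, 1970 (measured from midnight UTC). Because as.POSIXct interprets the date and time in your computer’s local time zone unless a tz argument is supplied, the exact number you see may differ from the output below, which corresponds to a time zone eight hours behind UTC.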

z <- as.POSIXct("1970-03-01 13:10")
as.numeric(z)
## [1] 5173800
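The sketch below shows a few date operations that build on the ideas above: reading a date written in a different format, subtracting one date from another, and fixing the time zone so that the seconds count is reproducible. The format string, the tz = "UTC" choice, and the object names are assumptions made for illustration.

```r
# Dates written in another layout need a format string
hire <- as.Date("03/01/1970", format = "%m/%d/%Y")
print(hire)
## [1] "1970-03-01"

# Subtracting dates gives the number of days between them
days_elapsed <- as.numeric(hire - as.Date("1970-01-01"))
print(days_elapsed)
## [1] 59

# Supplying tz makes the result independent of the local time zone
z_utc <- as.POSIXct("1970-03-01 13:10", tz = "UTC")
as.numeric(z_utc)
## [1] 5145000
```

Setting tz explicitly is one way to make sure the same script produces the same numbers on computers in different time zones.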

11.7.4 logical Data

Data that are of type logical can take on values of either TRUE or FALSE, which correspond to the integers 1 and 0, respectively. As mentioned above, although TRUE and FALSE appear to be character or factor data, they are actually logical data, which means they do not require quotation marks (" " or ' ').

w <- FALSE
class(w)
## [1] "logical"
is.logical(w)
## [1] TRUE
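Because TRUE and FALSE correspond to 1 and 0, logical values can be counted and averaged. The sketch below is a minimal illustration; the vector values and the object names n_true and prop_true are hypothetical.

```r
# TRUE converts to 1 and FALSE converts to 0
as.integer(TRUE)
## [1] 1
as.integer(FALSE)
## [1] 0

# Summing a logical vector counts the TRUE values
flags <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
n_true <- sum(flags)
print(n_true)
## [1] 3

# Averaging a logical vector gives the proportion of TRUE values
prop_true <- mean(flags)
print(prop_true)
## [1] 0.6
```

This comes in handy later, for example when counting how many employees meet some condition.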

11.8 Vectors

A vector is a group of data elements in a particular order that are all the same data type. To create a vector, we can use the c function, which stands for “combine.” Within the c function parentheses, we can list the data elements and separate them by commas, as commas separate arguments within a function’s parentheses. We can also assign a vector to a variable using either the <- or = operator. We can create vectors for all of the data types: numeric, character, Date, and logical.

As an example, let’s create a vector of numeric values, and let’s call it a.

a <- c(1, 4, 7, 11, 19)

Using the class and print functions, we can determine the class of our new a object and print its values, respectively.

class(a)
## [1] "numeric"
print(a)
## [1]  1  4  7 11 19

Let’s repeat this process by creating vectors containing integer, character, Date, and logical values.

b <- c(3L, 10L, 2L, 5L, 5L)
class(b)
## [1] "integer"
print(b)
## [1]  3 10  2  5  5
c <- c("old", "young", "young", "old", "young")
class(c)
## [1] "character"
print(c)
## [1] "old"   "young" "young" "old"   "young"
d <- as.Date(c("2018-06-01", "2018-06-01", "2018-10-31", "2018-01-01", "2018-06-01"))
class(d)
## [1] "Date"
print(d)
## [1] "2018-06-01" "2018-06-01" "2018-10-31" "2018-01-01" "2018-06-01"
e <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
class(e)
## [1] "logical"
print(e)
## [1]  TRUE  TRUE  TRUE FALSE FALSE
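Since all elements of a vector must share one data type, it can be useful to see what R does when types are mixed, and how to pull out a single element. The sketch below is illustrative only; the object name mixed is hypothetical, and square-bracket indexing is not otherwise covered in this chapter.

```r
# Mixing types in c() converts every element to a common type
mixed <- c(1, "old", TRUE)
class(mixed)
## [1] "character"
print(mixed)
## [1] "1"    "old"  "TRUE"

# Square brackets extract an element by its position
a <- c(1, 4, 7, 11, 19)
a[2]
## [1] 4

# length() returns the number of elements
length(a)
## [1] 5
```

If a column of numbers unexpectedly shows up as character, a stray text value somewhere in the vector is a common cause.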

We can also perform mathematical operations on vectors. For instance, we can multiply vector a (which we created above) by a numeric value, and as a result each vector value will be multiplied by that value. This is an important type of operation to remember when it comes time to transform a variable.

a * 11
## [1]  11  44  77 121 209

Note that performing mathematical operations on a vector does not automatically change the properties of the vector itself. If you inspect the a vector, you will see that the original data (i.e., 1, 4, 7, 11, 19) remain.

print(a)
## [1]  1  4  7 11 19

If we want to overwrite a vector with new values based on our operations, we can use <- or = to assign the result to a name. If we reuse the name of the original vector, the new values will override the old ones.

a <- a * 11
print(a)
## [1]  11  44  77 121 209

To revert back to the original vector values for object a, we can simply specify the original values using the c function once more.

a <- c(1, 4, 7, 11, 19)

Let’s now apply addition, division, and subtraction operators to the vector. Note that R adheres to the standard mathematical order of operations.

(3 + a) / 2 - 1
## [1]  1.0  2.5  4.0  6.0 10.0

We can also perform mathematical operations on vectors of the same length (i.e., with the same number of data elements). In order, the mathematical operator will be applied to each pair of vector values from the respective vectors. Let’s begin by creating a new vector called f.

f <- c(3, 1, 3, 5, 3)

Both a and f are the same length, which means we can multiply, add, divide, subtract, and exponentiate them element by element.

a * f
## [1]  3  4 21 55 57
a + f
## [1]  4  5 10 16 22
a / f
## [1] 0.3333333 4.0000000 2.3333333 2.2000000 6.3333333
a - f
## [1] -2  3  4  6 16
a ^ f
## [1]      1      4    343 161051   6859
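The examples above use vectors of equal length. As a hedged aside, the sketch below shows what R does when the lengths differ: the shorter vector is reused ("recycled") to match the longer one. The object names and values are hypothetical.

```r
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20 to match long_vec
recycled <- long_vec + short_vec
print(recycled)
## [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter length, R still returns a result but issues a warning, so it is usually safer to work with vectors of matching lengths.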

11.9 Lists

If we wish to combine data elements with different data types into a single object, we can use the list function. The list function keeps the data elements in order and retains each element’s value and data type.

g <- list(1, "dog", TRUE, "2018-05-30")
print(g)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "dog"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] "2018-05-30"
class(g)
## [1] "list"
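To see that each list element keeps its own data type, the sketch below extracts one element with double square brackets and checks the class of every element with sapply. Both are base R features not otherwise covered in this chapter, and element_classes is a hypothetical object name.

```r
g <- list(1, "dog", TRUE, "2018-05-30")

# Double square brackets extract a single element
g[[2]]
## [1] "dog"

# sapply() applies class() to each element of the list
element_classes <- sapply(g, class)
print(element_classes)
## [1] "numeric"   "character" "logical"   "character"
```

Note that "2018-05-30" is stored as character here; it would only be of type Date if it were wrapped in as.Date.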

11.10 Data Frames

A data frame is a specific type of table in which columns represent variables (i.e., fields) and rows represent cases (i.e., observations). We can create a simple data frame object by combining vectors of the same length. Let’s begin by creating six vector objects, which we will label a through f.

a <- c(1, 4, 7, 11, 19) 
b <- c(3L, 10L, 2L, 5L, 5L) 
c <- c("old", "young", "young", "old", "young") 
d <- as.Date(c("2018-06-01", "2018-06-01", "2018-10-31", "2018-01-01", "2018-06-01")) 
e <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
f <- c(3, 1, 3, 5, 3)

Using the data.frame function from base R, we can combine the six vectors to create a data frame object. All we need to do is enter the names of the six vectors as separate arguments in the function parentheses. Just as we did with the vectors, we can name the data frame object using the <- operator (or = operator). Let’s name this data frame object r (lowercase, because object names are case sensitive).

r <- data.frame(a, b, c, d, e, f)

Using the print function, we can view the contents of our new data frame object called r.

print(r)
##    a  b     c          d     e f
## 1  1  3   old 2018-06-01  TRUE 3
## 2  4 10 young 2018-06-01  TRUE 1
## 3  7  2 young 2018-10-31  TRUE 3
## 4 11  5   old 2018-01-01 FALSE 5
## 5 19  5 young 2018-06-01 FALSE 3

We can also rename the columns (i.e., variables) of the data frame object by using the names function from base R along with the c function from base R.

names(r) <- c("TenureSup", "TenureOrg", "Age", "HireDate", "FTE", "NumEmp")

To view the changes to our data frame object, use the print function once more.

print(r)
##   TenureSup TenureOrg   Age   HireDate   FTE NumEmp
## 1         1         3   old 2018-06-01  TRUE      3
## 2         4        10 young 2018-06-01  TRUE      1
## 3         7         2 young 2018-10-31  TRUE      3
## 4        11         5   old 2018-01-01 FALSE      5
## 5        19         5 young 2018-06-01 FALSE      3

Finally, we can use the class function to verify that the object is in fact a data frame.

class(r)
## [1] "data.frame"
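Once a data frame exists, individual columns and rows can be pulled out of it. The sketch below is a minimal illustration that rebuilds a three-column version of the data frame with the column names supplied directly. The $ operator, row filtering with square brackets, and the mean and nrow functions are base R features not otherwise covered in this chapter, and the object names mean_tenure_sup and young are hypothetical.

```r
# Column names can be supplied directly when creating the data frame
r <- data.frame(
  TenureSup = c(1, 4, 7, 11, 19),
  TenureOrg = c(3L, 10L, 2L, 5L, 5L),
  Age = c("old", "young", "young", "old", "young")
)

# The $ operator extracts a single column as a vector
r$TenureOrg
## [1]  3 10  2  5  5

# Functions can then be applied to that column
mean_tenure_sup <- mean(r$TenureSup)
print(mean_tenure_sup)
## [1] 8.4

# Keep only the rows where Age equals "young"
young <- r[r$Age == "young", ]
nrow(young)
## [1] 3
```

Running str(r) afterward is a quick way to see each column’s data type at a glance.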

11.11 Annotations

Part of the value of using a code/script-based program like R is that you can leave notes and explain your decisions and operations. When preceding text, the # symbol indicates that all text that follows on that line is a comment or annotation; as a result, R knows not to interpret or analyze the text that follows. To illustrate annotations, let’s repeat the steps from the previous section; however, this time, let’s include annotations.

# Create six vectors 
a <- c(1, 4, 7, 11, 19) # Vector a
b <- c(3L, 10L, 2L, 5L, 5L) # Vector b
c <- c("old", "young", "young", "old", "young") # Vector c
d <- as.Date(c("2018-06-01", "2018-06-01", "2018-10-31", "2018-01-01", "2018-06-01")) # Vector d
e <- c(TRUE, TRUE, TRUE, FALSE, FALSE) # Vector e
f <- c(3, 1, 3, 5, 3) # Vector f

# Combine vectors into data frame
r <- data.frame(a, b, c, d, e, f)

# Print data frame
print(r)
##    a  b     c          d     e f
## 1  1  3   old 2018-06-01  TRUE 3
## 2  4 10 young 2018-06-01  TRUE 1
## 3  7  2 young 2018-10-31  TRUE 3
## 4 11  5   old 2018-01-01 FALSE 5
## 5 19  5 young 2018-06-01 FALSE 3
# Rename columns in data frame
names(r) <- c("TenureSup", "TenureOrg", "Age", "HireDate", "FTE", "NumEmp")

# Print data frame
print(r)
##   TenureSup TenureOrg   Age   HireDate   FTE NumEmp
## 1         1         3   old 2018-06-01  TRUE      3
## 2         4        10 young 2018-06-01  TRUE      1
## 3         7         2 young 2018-10-31  TRUE      3
## 4        11         5   old 2018-01-01 FALSE      5
## 5        19         5 young 2018-06-01 FALSE      3
# Determine class of object
class(r)
## [1] "data.frame"

Can you start to envision how annotated code might help to tell a story about data-related decision-making processes?

11.12 Summary

In this chapter, we learned the basics of working with the R statistical programming language. This chapter is by no means comprehensive, and there were probably some concepts and functions that still don’t quite make sense to you. Nonetheless, this chapter hopefully provided you with a working understanding of the core operations and building blocks of R. We’ll practice applying many of the operations and functions from this chapter in subsequent chapters, which means you’ll have many more opportunities to learn and practice.

References

Gerbing, David. 2021. lessR: Less Code, More Results. https://CRAN.R-project.org/package=lessR.