# Chapter 54 Investigating Processes Using Path Analysis

In this chapter, we will learn how to apply path analysis in order to investigate processes that influence employees’ performance of a particular behavior.

## 54.1 Conceptual Overview

In its simplest forms, **path analysis** can represent a single linear regression model (i.e., one outcome, one predictor), but in its more complex, path analysis can represent a system of multiple equations. Path analysis is often useful for testing theoretical or conceptual models with multiple outcome variables and/or outcome variables that also serve as predictor variables (e.g., mediators). Both conventional multiple linear regression modeling and structural equation modeling can be used for path analysis, and in this tutorial we will focus on the latter approach, as it allows for model estimation in a single step and the assessment of overall model fit to the data across the equations specified in the model.

Path analysis can be used to evaluate models with presumed causal chains of variables, and this is why the approach is sometimes called **causal modeling**. We should be careful with how we use the term *causal modeling*, however, as showing that a chain of variables are linked to together in a model does not necessarily mean that the variables are causally related.

In this chapter, we will focus on **recursive** models, which means the direct relations between two variables presumed to be causally related can only be unidirectional. A **nonrecursive** model allows for bidirectional associations between variables presumed to be causally related and for a predictor variable to be correlated with the residual error of its presumed outcome variable.

### 54.1.1 Path Diagram

It is customary to depict a path analysis model visually using a path diagram, and as mentioned above, path analysis can be used to test a theoretical or conceptual model of interest. Let’s use the Theory of Planned Behavior (Ajzen 1991). For a simplified version of the theory, please refer to the Figure 1 below.

In a simplified form, the Theory of Planned Behavior posits that an intention to perform a particular behavior influences the individual’s decision to enact that behavior, and attitude toward the behavior, perception of norms pertaining to that behavior, and perception of control over performing that behavior influence the individual’s intention to perform the behavior.

We can specify the Theory of Planned Behavior as a path diagram by first drawing the **(manifest, observed) variables** and the **directional (structural) relations (paths)** between the variables as implied by the theory, where rectangles represent the variables and directional arrows represent the directional relations between variables, which is depicted in the path diagram below (see Figure 2).

For **exogenous variables** in the model, on the one hand, we can choose to estimate their variances and the covariances between them; on the other hand, we can choose *not* to estimate their variances and covariances between them. For our purposes, we’ll practice specifying the variances and covariances between exogenous variables, but *note that when missing data are present and full information maximum likelihood is deployed, the decision to estimate the variances of exogenous variables can have additional implications*, which go beyond the scope of this tutorial. Exogenous variables serve only as predictor variables in the model, meaning that no other variables in the model to predict them and their causes exist outside of the model. As shown in the figure above, **variances** are typically represented as curved double-sided arrows, where both arrows point to the same variable. **Covariances** are also represented as double-sided arrows in which the arrows connect two distinct variables.

**(Residual) error terms** (which are sometimes called *disturbances*) are added to **endogenous variables**, where endogenous variables refer to those variables that are predicted by another variable in the model, meaning that at least one presumed cause is modeled. Note that an endogenous variable may also be specified as the predictor of another endogenous variable, as is the case in our example path diagram representing the Theory of Planned Behavior. In addition, please note that an error term represents variability in an endogenous variable that remains unexplained even after the effects of the specified predictor variables are accounted for. Note, for example, that when we have multiple endogenous variables at the same stage of the model (e.g., first-stage mediators, terminal outcomes), we may choose to allow their error terms to covary.

In sum, the conventional symbols used in a path diagram are shown Figure 3 below. With the exception of the manifest variable symbol, all of these symbols represent components of the model that can be estimated statistically using path analysis. These model components that can (and will) be estimated are commonly referred to as **parameters** or **free parameters**.

Importantly, please note that in our path diagram because there are no direct relations specified from Attitude to Behavior, Norms to Behavior, and Control to Behavior, which implies that those direct relations are constrained to zero. If we suspected that, in fact, those direct relations are non-zero, then we would draw direction relations (i.e., single-headed arrows) in our path diagram and estimate their values during path analysis.

### 54.1.2 Model Identification

**Model identification** has to do with the number of (free) parameters specified in the model relative to the number of unique (non-redundant) sources of information available, and model implication has important implications for assessing model fit and estimating parameter estimates.

**Just-identified:** In a just-identified model (i.e., *saturated* model), the number of parameters (e.g., structural relations, variances) is *equal to* the number of unique (non-redundant) sources of information, which means that the degrees of freedom (*df*) is equal to zero. In just-identified models, the model parameters can be estimated, but the model fit cannot be assessed in a meaningful way, aside from the *R*^{2} value. As a specific applications of path analysis, simple linear and multiple linear regression models are always just-identified.

**Over-identified:** In an over-identified model, the number of parameters (e.g., structural relations, variances) is *less than* the number of unique (non-redundant) sources of information, which means that the degrees of freedom (*df*) is greater than zero. In over-identified models, the model parameters can be estimated, and the model fit can be assessed.

**Under-identified:** In an under-identified model, the number of parameters (e.g., structural relations, variances) is *greater than* the number of unique (non-redundant) sources of information, which means that the degrees of freedom (*df*) is less than zero. In under-identified models, the model parameters and model fit cannot be estimated. Sometimes we might say that such models are called **overparameterized** because they have more parameters to be estimated than unique (non-redundant) sources of information.

Most (if not all) statistical software packages that allow structural equation modeling – and, by extension, path analysis – to automatically compute the degrees of freedom for a model or provide an error message if the model is under-identified. As such, we don’t need to count the number of sources of unique (non-redundant) sources of information and free parameters by hand. With that said, to understand model identification and its various forms at a deeper level, it is often helpful to practice calculating the degrees freedom by hand when first learning.

The formula for calculating the number of unique (non-redundant) sources of information available for a particular model is as follows:

\(i = \frac{p(p+1)}{2}\)

where \(p\) is the number of manifest (observed) variables to be modeled. This formula calculates the number of possible unique covariances and variances for the variables specified in the model – or in other words, it calculates the lower diagonal of a covariance matrix, including the variances.

In the path diagram we specified above, there are five manifest variables: Attitude, Norms, Control, Intention, and Behavior. Thus, in the following formula, \(p\) is equal to 5, and thus the number of unique (non-redundant) sources of information is 15.

\(i = \frac{5(5+1)}{2} = \frac{30}{2} = 15\)

To count the number of free parameters (\(k\)), simply add up the number of the specified direction relations, variances, covariances, and error terms in the path analysis model. As shown in Figure 4 below, our example path analysis model has 12 free parameters.

\(k = 12\)

To calculate the degrees of freedom (*df*) for the model, subtract the number of free parameters from the number unique (non-redundant) sources of information, which in this example is equal to 3, as shown below. Thus, the degrees of freedom for the model is 3, which means the model is over-identified.

\(df = i - k = 15 - 12 = 3\)

### 54.1.3 Model Fit

When a model is over-identified (*df* > 0), the extent to which the specified model fits the data can be assessed using various model fit indices, such as the chi-square test (\(\chi^{2}\)), comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). For a commonly cited reference on cutoffs for fit indices, please refer to Hu and Bentler (1999).

**Chi-square test:** The chi-square test can be used to assess whether the model fits the data, where a statistically significant chi-square value (e.g., *p* < .05) indicates that the model does not fit the data well and a nonsignificant chi-square value (e.g., *p* > .05) indicates that the model fits the data reasonably well. The null hypothesis for the chi-square test is that the model fits the data perfectly, and thus failing to reject the null model provides some confidence that the model fits the data reasonably close to perfectly. The chi-square test is sensitive to sample size and non-normal variable distributions.

**Comparative fit index (CFI):** As the name implies, the comparative fit index (CFI) is a type of comparative (or incremental) fit index, which means that the CFI compares the focal model to a baseline model, which is commonly referred to as the null or independence model. The CFI is generally less sensitive to sample size than the chi-square test. A CFI value *greater than or equal to* .90 generally indicates good model fit to the data.

**Tucker-Lewis index (CFI):** Like the CFI, the Tucker-Lewis index (TLI) is another type of comparative (or incremental) fit index. The TLI is generally less sensitive to sample size than the chi-square test and tends to work well with smaller sample sizes. A TLI value *greater than or equal to* .95 generally indicates good model fit to the data, although some might relax that cutoff to .90.

**Root mean square error of approximation (RMSEA):** The root mean square error of approximation (RMSEA) is an absolute fit index that penalizes model complexity (e.g., models with a larger number of estimated parameters) and thus ends up effectively rewarding more parsimonious models. RMSEA values tend to upwardly biased when the model degrees of freedom are fewer (i.e., when the model is closer to being just-identified). In general, an RMSEA value that is *less than or equal to* .08 indicates good model fit to the data, although some relax that cutoff to .10.

**Standardized root mean square residual (SRMR):** Like the RMSEA, the standardized root mean square residual (SRMR) is an example of an absolute fit index. An SRMR value that is *less than or equal to* .08 generally indicates good fit to the data.

**Summary of model fit indices:** The conventional cutoffs for the aforementioned model fit indices – like any rule of thumb – should be applied with caution and with good judgment and intention. Further, these indices don’t always agree with one another, which means that we often look across multiple fit indices and come up with our best judgment of whether the model adequately fits the data. Generally, it is not advisable to interpret model parameter estimates unless the model fits the data adequately. Below is a table of the conventional cutoffs for the model fit indices.

Fit Index | Cutoff for Adequate Fit |
---|---|

\(\chi^{2}\) | \(\ge .05\) |

CFI | \(\ge .90\) |

TLI | \(\ge .95\) |

RMSEA | \(\le .08\) |

SRMR | \(\le .08\) |

### 54.1.4 Parameter Estimates

In path analysis, parameter estimates (e.g., direct relations, covariances, variances) can be interpreted like those from a regression model, where the associated *p*-values or confidence intervals can be used as indicators of statistical significance.

### 54.1.5 Statistical Assumptions

The statistical assumptions that should be met prior to running and/or interpreting a path analysis model overlap with those associated with multiple linear regression, and thus for a review of those assumptions please see the chapter on incremental validity using multiple linear regression.

### 54.1.6 Conceptual Video

For a more in-depth review of path analysis, please check out the following conceptual video.

Link to conceptual video: https://youtu.be/UGIVPtFKwc0

## 54.2 Tutorial

This chapter’s tutorial demonstrates estimate a path analysis model using R.

### 54.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/vMStRfsUTBg

### 54.2.3 Initial Steps

If you haven’t already, save the file called **“PlannedBehavior.csv”** into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., `"H:/RWorkshop"`

). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from the my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the `setwd`

function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to *Session > Set Working Directory > Choose Directory…*. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

```
# Set your working directory
setwd("H:/RWorkshop")
```

Next, read in the .csv data file called **“PlannedBehavior.csv”** using your choice of read function. In this example, I use the `read_csv`

function from the `readr`

package (Wickham, Hester, and Bryan 2022). If you choose to use the `read_csv`

function, be sure that you have installed and accessed the `readr`

package using the `install.packages`

and `library`

functions. *Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once ever 1-3 months.* For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

```
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
```

```
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
<- read_csv("PlannedBehavior.csv") df
```

```
## Rows: 199 Columns: 5
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): attitude, norms, control, intention, behavior
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```
# Print the names of the variables in the data frame (tibble) object
names(df)
```

`## [1] "attitude" "norms" "control" "intention" "behavior"`

```
# Print variable type for each variable in data frame (tibble) object
str(df)
```

```
## spec_tbl_df [199 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ attitude : num [1:199] 2.31 4.66 3.85 4.24 2.91 2.99 3.96 3.01 4.77 3.67 ...
## $ norms : num [1:199] 2.31 4.01 3.56 2.25 3.31 2.51 4.65 2.98 3.09 3.63 ...
## $ control : num [1:199] 2.03 3.63 4.2 2.84 2.4 2.95 3.77 1.9 3.83 5 ...
## $ intention: num [1:199] 2.5 3.99 4.35 1.51 1.45 2.59 4.08 2.58 4.87 3.09 ...
## $ behavior : num [1:199] 2.62 3.64 3.83 2.25 2 2.2 4.41 4.15 4.35 3.95 ...
## - attr(*, "spec")=
## .. cols(
## .. attitude = col_double(),
## .. norms = col_double(),
## .. control = col_double(),
## .. intention = col_double(),
## .. behavior = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
```

```
# Print first 6 rows of data frame (tibble) object
head(df)
```

```
## # A tibble: 6 × 5
## attitude norms control intention behavior
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2.31 2.31 2.03 2.5 2.62
## 2 4.66 4.01 3.63 3.99 3.64
## 3 3.85 3.56 4.2 4.35 3.83
## 4 4.24 2.25 2.84 1.51 2.25
## 5 2.91 3.31 2.4 1.45 2
## 6 2.99 2.51 2.95 2.59 2.2
```

```
# Print number of rows in data frame (tibble) object
nrow(df)
```

`## [1] 199`

There are 5 variables and 199 cases (i.e., employees) in the `df`

data frame: `attitude`

, `norms`

, `control`

, `intention`

, and `behavior`

. Per the output of the `str`

(structure) function above, all of the variables are of type *numeric* (continuous: interval/ratio). All of the variables are self-reports from a survey, where respondents rated their level of agreement (1 = strongly disagree, 5 = strongly agree) for the items associated with the `attitude`

, `norms`

, `control`

, and `intention`

scales, and where respondents rated the frequency with which they enact the behavior associated with the `behavior`

variable (1 = never, 5 = all of the time). The `attitude`

variable reflects an employee’s attitude toward the behavior in question. The `norms`

variable reflects an employee’s perception of norms pertaining to the enactment of the behavior. The `control`

variable reflects an employee’s feeling of control over being able to perform the behavior. The `intention`

variable reflects an employee’s intention to enact the behavior. The `behavior`

variable reflects an employee’s perception of the frequency with which they actually engage in the behavior.

### 54.2.4 Specify & Estimate Path Analysis Models

We will use functions from the `lavaan`

package (Rosseel 2012) to specify and estimate our path analysis model. The `lavaan`

package also allows structural equation modeling with latent variables, but we won’t cover that in this tutorial. If you haven’t already install and access the `lavaan`

package using the `install.packages`

and `library`

functions, respectively. For background information on the `lavaan`

package, check out the package website.

```
# Install package
install.packages("lavaan")
```

```
# Access package
library(lavaan)
```

Let’s begin by showing how a multiple linear regression model can be estimated using `lavaan`

and it’s functions. Specifically, let’s focus on the first part of our Theory of Planned Behavior model, where `attitude`

, `norms`

, and `control`

are proposed as predictors `intention`

.

- Using the
`<-`

operator, name the specified model object something of your choosing (`specmod`

). To the right of the`<-`

operator, enter quotation marks (`" "`

), and within them, specify your model, which in this case is a multiple linear regression equation:`intention ~ attitude + norms + control`

. Just like we would with other regression functions in R, we use the tilde (`~`

) to separate our outcome variable from our predictor variable, where the outcome variable goes to the left of the tilde, and the predictor variables go to the right. - Using the
`sem`

function from`lavaan`

, type the name of the specified regression model (`specmod`

) as the first argument and`data=`

followed by the name of the data from which the variables belong as the second argument. Using the`<-`

operator, name the estimated model object something, and here I name it`fitmod`

for fitted model. - Using the
`summary`

function from base R, request a summary of the model fit and parameter estimate results.

- As the first argument in the
`summary`

function, type the name of the estimated model function from the previous step (`fitmod`

). - As the second argument, type
`fit.measures=TRUE`

to request the model fit indices as part of the output. - As the third argument, type
`rsquare=TRUE`

to request the unadjusted*R*^{2}value for the model.

```
# Specify path analysis model
<- "
specmod intention ~ attitude + norms + control
"
# Estimate model
<- sem(specmod, data=df)
fitmod
# Request summary of results
summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
```

```
## lavaan 0.6-11 ended normally after 1 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 4
##
## Number of observations 199
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 91.633
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -219.244
## Loglikelihood unrestricted model (H1) -219.244
##
## Akaike (AIC) 446.489
## Bayesian (BIC) 459.662
## Sample-size adjusted Bayesian (BIC) 446.990
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## intention ~
## attitude 0.352 0.058 6.068 0.000
## norms 0.153 0.059 2.577 0.010
## control 0.275 0.058 4.740 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .intention 0.530 0.053 9.975 0.000
##
## R-Square:
## Estimate
## intention 0.369
```

Towards the top of the output, alongside “Number of observations”, the number 199 appears, which indicates that 199 employees’ data were included in this analysis. Further below, you will see a zero (0) next to “Degrees of freedom”, which indicates that the model is just-identified. Thus, we will ignore the model fit information and skip down to the “Regressions” section. The path coefficient (i.e., regression coefficient) between `attitude`

and `intention`

is statistically significant and positive (*b* = .352, *p* < .001). The path coefficient between `norms`

and `intention`

is statistically significant and positive (*b* = .153, *p* = .010). The path coefficient between `control`

and `intention`

is statistically significant and positive (*b* = .275, *p* < .001). Under the “R-Square” section, we see the unadjusted *R*^{2} value for the outcome variable `intention`

, which is equal to .369; that is, collectively, `attitude`

, `norms`

, and `control`

explain 36.9% of the variance in `intention`

.

We can also explicitly model the covariances/correlations between the predictor (exogenous) variables in the model by using the double tilde (`~~`

) operator. Note that on separate lines under the regression equation script, we can specify that we want `attitude`

to be able to freely covary with `norms`

and `control`

and for `norms`

to be able to freely covary with `control`

. Everything else in your script remains the same as in our previously specified model. As noted above, in a real-life situation, we should think carefully about what it means to add these covariances to exogenous variables, particularly in the case of estimating the model using full information maximum likelihood and in the presence of missing data. We are specifying these covariances here for demonstration purposes.

```
# Specify path analysis model
<- "
specmod # Direct relations
intention ~ attitude + norms + control
# Covariances
attitude ~~ norms + control
norms ~~ control
"
# Estimate model
<- sem(specmod, data=df)
fitmod
# Request summary of results
summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
```

```
## lavaan 0.6-11 ended normally after 18 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 10
##
## Number of observations 199
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 136.306
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1011.828
## Loglikelihood unrestricted model (H1) -1011.828
##
## Akaike (AIC) 2043.656
## Bayesian (BIC) 2076.589
## Sample-size adjusted Bayesian (BIC) 2044.908
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## intention ~
## attitude 0.352 0.058 6.068 0.000
## norms 0.153 0.059 2.577 0.010
## control 0.275 0.058 4.740 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## attitude ~~
## norms 0.200 0.064 3.128 0.002
## control 0.334 0.070 4.748 0.000
## norms ~~
## control 0.220 0.065 3.411 0.001
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .intention 0.530 0.053 9.975 0.000
## attitude 0.928 0.093 9.975 0.000
## norms 0.830 0.083 9.975 0.000
## control 0.939 0.094 9.975 0.000
##
## R-Square:
## Estimate
## intention 0.369
```

Just as before, the degrees of freedom (*df*) for the model is equal to zero (0), which again indicates that this model is just-identified, which is what we would expect from a multiple linear regression model as well. In addition, note that the path coefficients remain the same (along with their associated *p*-values), and the *R*^{2} value remains the same. In the new, output, however, the covariances are estimated, and when we estimate the covariances between variables, the variances of the variables involved in the covariances are also estimated by default. Often the covariances and variances are not of substantive interest when interpreting a path analysis model, though.

Using the `lm`

(linear model) function from base R we can verify that our path analysis results using the `sem`

function from `lavaan`

are equivalent to multiple linear regression results from a standard linear regression function. For a review of the `lm`

function, please refer to the chapter supplement for estimating incremental validity using multiple linear regression.

```
# Estimate multiple linear regression model
<- lm(intention ~ attitude + norms + control, data=df)
fitmod
# Request summary of results
summary(fitmod)
```

```
##
## Call:
## lm(formula = intention ~ attitude + norms + control, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.80282 -0.52734 -0.06018 0.51228 1.85202
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.58579 0.23963 2.445 0.0154 *
## attitude 0.35232 0.05866 6.006 0.00000000913 ***
## norms 0.15250 0.05979 2.550 0.0115 *
## control 0.27502 0.05862 4.692 0.00000509027 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7356 on 195 degrees of freedom
## Multiple R-squared: 0.369, Adjusted R-squared: 0.3593
## F-statistic: 38.01 on 3 and 195 DF, p-value: < 0.00000000000000022
```

As you can see, aside from the number of digits reported after the decimal, the `lm`

regression coefficients are equivalent to the `sem`

path coefficients. In addition, the unadjusted *R*^{2} value is the same.

Now it’s time to test a full model of the Theory of Planned Behavior by adding the equation in which `intention`

predicts `behavior`

. We will build on our previous path analysis model script by specifying that `intention`

is a predictor of `behavior`

(i.e., `behavior ~ intention`

). Everything else in our script can remain the same.

```
# Specify path analysis model
<- "
specmod # Direct relations
intention ~ attitude + norms + control
behavior ~ intention
# Covariances
attitude ~~ norms + control
norms ~~ control
"
# Estimate model
<- sem(specmod, data=df)
fitmod
# Request summary of results
summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
```

```
## lavaan 0.6-11 ended normally after 18 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 199
##
## Model Test User Model:
##
## Test statistic 2.023
## Degrees of freedom 3
## P-value (Chi-square) 0.568
##
## Model Test Baseline Model:
##
## Test statistic 182.295
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.019
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1258.517
## Loglikelihood unrestricted model (H1) -1257.506
##
## Akaike (AIC) 2541.035
## Bayesian (BIC) 2580.555
## Sample-size adjusted Bayesian (BIC) 2542.538
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.103
## P-value RMSEA <= 0.05 0.735
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.019
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## intention ~
## attitude 0.352 0.058 6.068 0.000
## norms 0.153 0.059 2.577 0.010
## control 0.275 0.058 4.740 0.000
## behavior ~
## intention 0.453 0.065 7.014 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## attitude ~~
## norms 0.200 0.064 3.128 0.002
## control 0.334 0.070 4.748 0.000
## norms ~~
## control 0.220 0.065 3.411 0.001
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .intention 0.530 0.053 9.975 0.000
## .behavior 0.699 0.070 9.975 0.000
## attitude 0.928 0.093 9.975 0.000
## norms 0.830 0.083 9.975 0.000
## control 0.939 0.094 9.975 0.000
##
## R-Square:
## Estimate
## intention 0.369
## behavior 0.198
```

Again, we have 199 observations (employees) used in this analysis, but this time, the degrees of freedom is equal to 3, which indicates that our model is over-identified. Because the model is over-identified, we can interpret the model fit indices to assess the how well the model we specified fits the data. First, the chi-square test value appears to the right of the text “Model Fit Test Statistic”, and the associated *p*-value appears below. The chi-square test is nonsignificant (\(\chi^{2}(df=3)=2.023, p=.568\)), which indicates that the model does not fit significantly worse than a model that fits the data perfectly; thus, we have our first indicator that the model fits the data reasonably well. Second, the comparative fit index (CFI) is 1.000, which is greater than the conventional .90 cutoff, which also indicates that the model fits the data reasonably well. Third, the Tucker-Lewis index (TLI) is 1.019, which is greater than the conventional .95 cutoff, which also indicates that the model fits the data reasonably well. Fourth, the root mean square error of approximation (RMSEA) value of .000 is less than the conventional cutoff of .08, and thus we have further evidence that the model fits the data reasonably well. Fifth, the standardized root mean square residual (SRMR) value of .019 is less than the conventional cutoff of .08, and thus we have further evidence that the model fits the data reasonably well. This is one of the relatively rare occasions in which all of the model fit indices are in agreement with one another, leading us to conclude that the specified model fits the data well. We can now feel comfortable proceeding forward interpreting the parameter estimates.

The path coefficient (i.e., direct relation, presumed causal relation) between `attitude`

and `intention`

is statistically significant and positive (*b* = .352, *p* < .001); the path coefficient between `norms`

and `intention`

is also statistically significant and positive (*b* = .153, *p* = .010); and the path coefficient between `control`

and `intention`

is also statistically significant and positive (*b* = .275, *p* < .001). Further, the path coefficient between `intention`

and `behavior`

is statistically significant and positive (*b* = .453, *p* < .001). The covariances and variances are not of substantive interest, so we will ignore them. The unadjusted *R*^{2} value for the outcome variable `intention`

is .369, which indicates that, collectively, `attitude`

, `norms`

, and `control`

explain 36.9% of the variance in `intention`

, and the unadjusted *R*^{2} value for the outcome variable `behavior`

is .198, which indicates that `intention`

explains 19.8% of the variance in `behavior`

. Overall, the results lends support to the propositions of the Theory of Planned Behavior, at least based on this sample of employees.

### 54.2.5 Additional Information on Model Specification Notation

Thus far we have focused on using the tilde (`~`

) operator to note directional relations (i.e., path coefficients) and the double tilde (`~~`

) to specify a covariance. The plus (`+`

) operator allows us to add variables to one side of the directional relation or covariance equation.

There are equivalent approaches for specifying *directional relations*. If we want to specify that `attitude`

, `norms`

, and `control`

are predictors of `intention`

we could use either approach shown below when specifying our model.

```
# Equivalent approaches to specifying directional relations
### Approach 1
<- "
specmod intention ~ attitude + norms + control
"
### Approach 2
<- "
specmod intention ~ attitude
intention ~ norms
intention ~ control
"
```

The same logical applies if we hypothetically wanted to specify `attitude`

, `norms`

, and `control`

as predictors of both `intention`

and `behavior`

. It’s really up to you how you decide to specify your model when equivalent approaches are possible.

```
# Equivalent approaches to specifying directional relations
### Approach 1
<- "
specmod intention + behavior ~ attitude + norms + control
"
### Approach 2
<- "
specmod intention ~ attitude + norms + control
behavior ~ attitude + norms + control
"
### Approach 3
<- "
specmod intention ~ attitude
intention ~ norms
intention ~ control
behavior ~ attitude
behavior ~ norms
behavior ~ control
"
```

Further, there are equivalent approaches for specifying *covariances*. If we want to specify that `attitude`

, `norms`

, and `control`

are permitted to covary with one another, we could use either approach below.

```
# Equivalent approaches to specifying covariances
### Approach 1
<- "
specmod attitude ~~ norms + control
norms ~~ control
"
### Approach 2
<- "
specmod attitude ~~ norms
attitude ~~ control
norms ~~ control
"
```

If you would like to explicitly specify a *variance* component in your model (even if the model will do this by default for variables with specified covariances), you would use the double tilde (`~~`

) operator with the same variable’s name on either side of the double tilde.

```
# Specifying variances
<- "
specmod attitude ~~ attitude
norms ~~ norms
control ~~ control
"
```

Finally, if you have a single exogenous variable in your model (that does not share a covariance with any other variable), it is up to you whether you wish to specify the variance component for that exogenous variable as a free parameter (or not). The model fit and parameter estimates will remain the same.

### 54.2.6 Summary

In this chapter, we explored the building blocks of path analysis, which allowed us to simultaneously fit a model with more than one outcome variable and with a variable that acts as both a predictor and an outcome. To do so, we used the `sem`

function from the `lavaan`

package and evaluated model fit indices and parameter estimates.

### References

*Organizational Behavior and Human Decision Processes*50: 179–211.

*Structural Equation Modeling*6 (1): 1–55.

*Journal of Statistical Software*48 (2): 1–36. https://www.jstatsoft.org/v48/i02/.

*Readr: Read Rectangular Text Data*. https://CRAN.R-project.org/package=readr.