Chapter 55 Investigating Processes Using Path Analysis

In this chapter, we will learn how to apply path analysis in order to investigate processes that influence employees’ performance of a particular behavior.

55.1 Conceptual Overview

In its simplest forms, path analysis can represent a single linear regression model (i.e., one outcome, one predictor), but in its more complex, path analysis can represent a system of multiple equations. Path analysis is often useful for testing theoretical or conceptual models with multiple outcome variables and/or outcome variables that also serve as predictor variables (e.g., mediators). Both conventional multiple linear regression modeling and structural equation modeling can be used for path analysis, and in this tutorial we will focus on the latter approach, as it allows for model estimation in a single step and the assessment of overall model fit to the data across the equations specified in the model.

Path analysis can be used to evaluate models with presumed causal chains of variables, and this is why the approach is sometimes called causal modeling. We should be careful with how we use the term causal modeling, however, as showing that a chain of variables are linked to together in a model does not necessarily mean that the variables are causally related.

In this chapter, we will focus on recursive models, which means the direct relations between two variables presumed to be causally related can only be unidirectional. A nonrecursive model allows for bidirectional associations between variables presumed to be causally related and for a predictor variable to be correlated with the residual error of its presumed outcome variable.

55.1.1 Path Diagram

It is customary to depict a path analysis model visually using a path diagram, and as mentioned above, path analysis can be used to test a theoretical or conceptual model of interest. Let’s use the Theory of Planned Behavior (Ajzen 1991). For a simplified version of the theory, please refer to the Figure 1 below.

Figure 1: Conceptual representation of the Theory of Planned Behavior (Ajzen, 1991)

In a simplified form, the Theory of Planned Behavior posits that an intention to perform a particular behavior influences the individual’s decision to enact that behavior, and attitude toward the behavior, perception of norms pertaining to that behavior, and perception of control over performing that behavior influence the individual’s intention to perform the behavior.

We can specify the Theory of Planned Behavior as a path diagram by first drawing the (manifest, observed) variables and the directional (structural) relations (paths) between the variables as implied by the theory, where rectangles represent the variables and directional arrows represent the directional relations between variables, which is depicted in the path diagram below (see Figure 2).

Figure 2: Path diagram depicting Theory of Planned Behavior

For exogenous variables in the model, on the one hand, we can choose to estimate their variances and the covariances between them; on the other hand, we can choose not to estimate their variances and covariances between them. For our purposes, we’ll practice specifying the variances and covariances between exogenous variables, but note that when missing data are present and full information maximum likelihood is deployed, the decision to estimate the variances of exogenous variables can have additional implications, which go beyond the scope of this tutorial. Exogenous variables serve only as predictor variables in the model, meaning that no other variables in the model to predict them and their causes exist outside of the model. As shown in the figure above, variances are typically represented as curved double-sided arrows, where both arrows point to the same variable. Covariances are also represented as double-sided arrows in which the arrows connect two distinct variables.

(Residual) error terms (which are sometimes called disturbances) are added to endogenous variables, where endogenous variables refer to those variables that are predicted by another variable in the model, meaning that at least one presumed cause is modeled. Note that an endogenous variable may also be specified as the predictor of another endogenous variable, as is the case in our example path diagram representing the Theory of Planned Behavior. In addition, please note that an error term represents variability in an endogenous variable that remains unexplained even after the effects of the specified predictor variables are accounted for. Note, for example, that when we have multiple endogenous variables at the same stage of the model (e.g., first-stage mediators, terminal outcomes), we may choose to allow their error terms to covary.

In sum, the conventional symbols used in a path diagram are shown Figure 3 below. With the exception of the manifest variable symbol, all of these symbols represent components of the model that can be estimated statistically using path analysis. These model components that can (and will) be estimated are commonly referred to as parameters or free parameters.

Figure 3: Conventional path diagram symbols and their meanings

Importantly, please note that in our path diagram because there are no direct relations specified from Attitude to Behavior, Norms to Behavior, and Control to Behavior, which implies that those direct relations are constrained to zero. If we suspected that, in fact, those direct relations are non-zero, then we would draw direction relations (i.e., single-headed arrows) in our path diagram and estimate their values during path analysis.

55.1.2 Model Identification

Model identification has to do with the number of (free) parameters specified in the model relative to the number of unique (non-redundant) sources of information available, and model implication has important implications for assessing model fit and estimating parameter estimates.

Just-identified: In a just-identified model (i.e., saturated model), the number of parameters (e.g., structural relations, variances) is equal to the number of unique (non-redundant) sources of information, which means that the degrees of freedom (df) is equal to zero. In just-identified models, the model parameters can be estimated, but the model fit cannot be assessed in a meaningful way, aside from the R² value. As a specific applications of path analysis, simple linear and multiple linear regression models are always just-identified.

Over-identified: In an over-identified model, the number of parameters (e.g., structural relations, variances) is less than the number of unique (non-redundant) sources of information, which means that the degrees of freedom (df) is greater than zero. In over-identified models, the model parameters can be estimated, and the model fit can be assessed.

Under-identified: In an under-identified model, the number of parameters (e.g., structural relations, variances) is greater than the number of unique (non-redundant) sources of information, which means that the degrees of freedom (df) is less than zero. In under-identified models, the model parameters and model fit cannot be estimated. Sometimes we might say that such models are called overparameterized because they have more parameters to be estimated than unique (non-redundant) sources of information.

Most (if not all) statistical software packages that allow structural equation modeling – and, by extension, path analysis – to automatically compute the degrees of freedom for a model or provide an error message if the model is under-identified. As such, we don’t need to count the number of sources of unique (non-redundant) sources of information and free parameters by hand. With that said, to understand model identification and its various forms at a deeper level, it is often helpful to practice calculating the degrees freedom by hand when first learning.

The formula for calculating the number of unique (non-redundant) sources of information available for a particular model is as follows:

\(i = \frac{p(p+1)}{2}\)

where \(p\) is the number of manifest (observed) variables to be modeled. This formula calculates the number of possible unique covariances and variances for the variables specified in the model – or in other words, it calculates the lower diagonal of a covariance matrix, including the variances.

In the path diagram we specified above, there are five manifest variables: Attitude, Norms, Control, Intention, and Behavior. Thus, in the following formula, \(p\) is equal to 5, and thus the number of unique (non-redundant) sources of information is 15.

\(i = \frac{5(5+1)}{2} = \frac{30}{2} = 15\)

To count the number of free parameters (\(k\)), simply add up the number of the specified direction relations, variances, covariances, and error terms in the path analysis model. As shown in Figure 4 below, our example path analysis model has 12 free parameters.

\(k = 12\)

To calculate the degrees of freedom (df) for the model, subtract the number of free parameters from the number unique (non-redundant) sources of information, which in this example is equal to 3, as shown below. Thus, the degrees of freedom for the model is 3, which means the model is over-identified.

\(df = i - k = 15 - 12 = 3\)

Figure 4: Counting the number of (free) parameters in specified path analysis model

55.1.3 Model Fit

When a model is over-identified (df > 0), the extent to which the specified model fits the data can be assessed using various model fit indices, such as the chi-square test (\(\chi^{2}\)), comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). For a commonly cited reference on cutoffs for fit indices, please refer to Hu and Bentler (1999).

Chi-square test: The chi-square test can be used to assess whether the model fits the data, where a statistically significant chi-square value (e.g., p < .05) indicates that the model does not fit the data well and a nonsignificant chi-square value (e.g., p > .05) indicates that the model fits the data reasonably well. The null hypothesis for the chi-square test is that the model fits the data perfectly, and thus failing to reject the null model provides some confidence that the model fits the data reasonably close to perfectly. The chi-square test is sensitive to sample size and non-normal variable distributions.

Comparative fit index (CFI): As the name implies, the comparative fit index (CFI) is a type of comparative (or incremental) fit index, which means that the CFI compares the focal model to a baseline model, which is commonly referred to as the null or independence model. The CFI is generally less sensitive to sample size than the chi-square test. A CFI value greater than or equal to .90 generally indicates good model fit to the data.

Tucker-Lewis index (TLI): Like the CFI, the Tucker-Lewis index (TLI) is another type of comparative (or incremental) fit index. The TLI is generally less sensitive to sample size than the chi-square test and tends to work well with smaller sample sizes. A TLI value greater than or equal to .95 generally indicates good model fit to the data, although some might relax that cutoff to .90.

Root mean square error of approximation (RMSEA): The root mean square error of approximation (RMSEA) is an absolute fit index that penalizes model complexity (e.g., models with a larger number of estimated parameters) and thus ends up effectively rewarding more parsimonious models. RMSEA values tend to upwardly biased when the model degrees of freedom are fewer (i.e., when the model is closer to being just-identified). In general, an RMSEA value that is less than or equal to .08 indicates good model fit to the data, although some relax that cutoff to .10.

Standardized root mean square residual (SRMR): Like the RMSEA, the standardized root mean square residual (SRMR) is an example of an absolute fit index. An SRMR value that is less than or equal to .08 generally indicates good fit to the data.

Summary of model fit indices: The conventional cutoffs for the aforementioned model fit indices – like any rule of thumb – should be applied with caution and with good judgment and intention. Further, these indices don’t always agree with one another, which means that we often look across multiple fit indices and come up with our best judgment of whether the model adequately fits the data. Generally, it is not advisable to interpret model parameter estimates unless the model fits the data adequately. Below is a table of the conventional cutoffs for the model fit indices.

Fit Index	Cutoff for Adequate Fit
\(\chi^{2}\)	\(p \ge .05\)
CFI	\(\ge .90\)
TLI	\(\ge .95\)
RMSEA	\(\le .08\)
SRMR	\(\le .08\)

55.1.4 Parameter Estimates

In path analysis, parameter estimates (e.g., direct relations, covariances, variances) can be interpreted like those from a regression model, where the associated p-values or confidence intervals can be used as indicators of statistical significance.

55.1.5 Statistical Assumptions

The statistical assumptions that should be met prior to running and/or interpreting a path analysis model overlap with those associated with multiple linear regression, and thus for a review of those assumptions please see the chapter on incremental validity using multiple linear regression.

55.1.6 Conceptual Video

For a more in-depth review of path analysis, please check out the following conceptual video.

Link to conceptual video: https://youtu.be/UGIVPtFKwc0

55.2 Tutorial

This chapter’s tutorial demonstrates estimate a path analysis model using R.

55.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/vMStRfsUTBg

55.2.2 Functions & Packages Introduced

Function	Package
`sem`	`lavaan`
`summary`	base R
`lm`	base R

55.2.3 Initial Steps

If you haven’t already, save the file called “PlannedBehavior.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from the my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “PlannedBehavior.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once ever 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every 
# time you wish to access it]
install.packages("readr")

# Access readr package
library(readr)

# Read data and name data frame (tibble) object
df <- read_csv("PlannedBehavior.csv")

## Rows: 199 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): attitude, norms, control, intention, behavior
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Print the names of the variables in the data frame (tibble) object
names(df)

## [1] "attitude"  "norms"     "control"   "intention" "behavior"

# Print variable type for each variable in data frame (tibble) object
str(df)

## spc_tbl_ [199 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ attitude : num [1:199] 2.31 4.66 3.85 4.24 2.91 2.99 3.96 3.01 4.77 3.67 ...
##  $ norms    : num [1:199] 2.31 4.01 3.56 2.25 3.31 2.51 4.65 2.98 3.09 3.63 ...
##  $ control  : num [1:199] 2.03 3.63 4.2 2.84 2.4 2.95 3.77 1.9 3.83 5 ...
##  $ intention: num [1:199] 2.5 3.99 4.35 1.51 1.45 2.59 4.08 2.58 4.87 3.09 ...
##  $ behavior : num [1:199] 2.62 3.64 3.83 2.25 2 2.2 4.41 4.15 4.35 3.95 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   attitude = col_double(),
##   ..   norms = col_double(),
##   ..   control = col_double(),
##   ..   intention = col_double(),
##   ..   behavior = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

# Print first 6 rows of data frame (tibble) object
head(df)

## # A tibble: 6 × 5
##   attitude norms control intention behavior
##      <dbl> <dbl>   <dbl>     <dbl>    <dbl>
## 1     2.31  2.31    2.03      2.5      2.62
## 2     4.66  4.01    3.63      3.99     3.64
## 3     3.85  3.56    4.2       4.35     3.83
## 4     4.24  2.25    2.84      1.51     2.25
## 5     2.91  3.31    2.4       1.45     2   
## 6     2.99  2.51    2.95      2.59     2.2

# Print number of rows in data frame (tibble) object
nrow(df)

## [1] 199

There are 5 variables and 199 cases (i.e., employees) in the df data frame: attitude, norms, control, intention, and behavior. Per the output of the str (structure) function above, all of the variables are of type numeric (continuous: interval/ratio). All of the variables are self-reports from a survey, where respondents rated their level of agreement (1 = strongly disagree, 5 = strongly agree) for the items associated with the attitude, norms, control, and intention scales, and where respondents rated the frequency with which they enact the behavior associated with the behavior variable (1 = never, 5 = all of the time). The attitude variable reflects an employee’s attitude toward the behavior in question. The norms variable reflects an employee’s perception of norms pertaining to the enactment of the behavior. The control variable reflects an employee’s feeling of control over being able to perform the behavior. The intention variable reflects an employee’s intention to enact the behavior. The behavior variable reflects an employee’s perception of the frequency with which they actually engage in the behavior.

55.2.4 Specifying & Estimating Path Analysis Models

We will use functions from the lavaan package (Rosseel 2012) to specify and estimate our path analysis model. The lavaan package also allows structural equation modeling with latent variables, but we won’t cover that in this tutorial. If you haven’t already install and access the lavaan package using the install.packages and library functions, respectively. For background information on the lavaan package, check out the package website.

# Install package
install.packages("lavaan")

# Access package
library(lavaan)

55.2.4.1 Estimating a Multiple Linear Regression Model Using Path Analysis

Let’s begin by demonstrating how a multiple linear regression model can be estimated using lavaan and it’s functions. Specifically, let’s focus on the first part of our Theory of Planned Behavior model, where attitude, norms, and control are proposed as predictors intention.

Using the <- operator, name the specified model object something of your choosing (specmod). To the right of the <- operator, enter quotation marks (" "). Within the quotation marks, specify the model, which in this case is a multiple linear regression equation:

Regress the outcome variable (intention) onto the predictor variables (attitude, norms, control). Just like we would with other regression functions in R, we use the tilde (~) to separate our outcome variable from our predictor variable, where the outcome variable goes to the left of the tilde, and the predictor variables go to the right, resulting in the following equation (and specified directional paths/relations): intention ~ attitude + norms + control.
To estimate the model intercept associated with our outcome variable (intention), we need to explicitly specify the intercept of of intention, as the sem function will not provide these by default with these model specifications. To do so, specify the outcome variable followed by ~ 1, which should look like this: intention ~ 1.

Using the sem function from lavaan, type the name of the specified regression model (specmod) as the first argument and data= followed by the name of the data from which the variables belong as the second argument. Using the <- operator, name the estimated model object something, and here I name it fitmod for fitted model.
Using the summary function from base R, request a summary of the model fit and parameter estimate results.

As the first argument in the summary function, type the name of the estimated model function from the previous step (fitmod).
As the second argument, type fit.measures=TRUE to request the model fit indices as part of the output.
As the third argument, type rsquare=TRUE to request the unadjusted R² value for the model.

# Specify path analysis model & assign to object
specmod <- "
# Specify directional paths/relations
intention ~ attitude + norms + control

# Specify model intercept for outcome variable
intention ~ 1
"

# Estimate path analysis model & assign to fitted model object
fitmod <- sem(specmod,  # name of specified model object 
              data=df)  # name of data frame object

# Print summary of model results
summary(fitmod,              # name of fitted model object 
        fit.measures=TRUE,   # request model fit indices 
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
## 
##   Number of observations                           199
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                91.633
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -219.244
##   Loglikelihood unrestricted model (H1)       -219.244
##                                                       
##   Akaike (AIC)                                 448.489
##   Bayesian (BIC)                               464.955
##   Sample-size adjusted Bayesian (SABIC)        449.115
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   intention ~                                         
##     attitude          0.352    0.058    6.068    0.000
##     norms             0.153    0.059    2.577    0.010
##     control           0.275    0.058    4.740    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.586    0.237    2.470    0.014
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.530    0.053    9.975    0.000
## 
## R-Square:
##                    Estimate
##     intention         0.369

Towards the top of the output, alongside “Number of observations”, the number 199 appears, which indicates that 199 employees’ data were included in this analysis. Because there were no missing data, all 199 employees were retained in the sample for analysis (instead of being listwise deleted by default). Further below, you will see a zero (0) next to “Degrees of freedom”, which indicates that the model is just-identified. Thus, we will ignore the model fit information and skip down to the “Regressions” section. The path coefficient (i.e., regression coefficient) between attitude and intention is statistically significant and positive (b = .352, p < .001). The path coefficient between norms and intention is statistically significant and positive (b = .153, p = .010). The path coefficient between control and intention is statistically significant and positive (b = .275, p < .001). Beneath the “Intercepts” section, you will find the model intercept value for intention, which is statistically significantly different from zero and positive (b = .589, p = .014). Under the “R-Square” section, we see the unadjusted R² value for the outcome variable intention, which is equal to .369; that is, collectively, attitude, norms, and control explain 36.9% of the variance in intention.

Using the lm (linear model) function from base R we can verify that our path analysis results using the sem function from lavaan are equivalent to multiple linear regression results from a conventional linear regression function. For a review of the lm function, please refer to the chapter supplement for estimating incremental validity using multiple linear regression.

# Specify & estimate multiple linear regression model
fitmod <- lm(intention ~ attitude + norms + control, data=df)

# Print summary of results
summary(fitmod)

## 
## Call:
## lm(formula = intention ~ attitude + norms + control, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.80282 -0.52734 -0.06018  0.51228  1.85202 
## 
## Coefficients:
##             Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)  0.58579    0.23963   2.445        0.0154 *  
## attitude     0.35232    0.05866   6.006 0.00000000913 ***
## norms        0.15250    0.05979   2.550        0.0115 *  
## control      0.27502    0.05862   4.692 0.00000509027 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7356 on 195 degrees of freedom
## Multiple R-squared:  0.369,  Adjusted R-squared:  0.3593 
## F-statistic: 38.01 on 3 and 195 DF,  p-value: < 0.00000000000000022

As you can see, aside from the number of digits reported after the decimal, the lm regression coefficients, model intercept estimate unadjusted R² value are the same as the corresponding estimates from the model estimated using the sem function. With that said, the standard errors and p-values differ slightly across the two models. The reason being, the sem function uses maximum likelihood (ML) estimation, whereas the lm function uses a type of least squares estimation called ordinary least squares (OLS) estimation. Although a full review and comparison of the differences between ML and OLS estimators is beyond the scope of this chapter, simply put, the OLS estimator and ML estimator carry different assumptions and estimation approaches. Thus, while the substantive results remain the same, there are differences in the standard errors and p-values due to the different estimators applied by the lm and sem functions.

55.2.4.2 Estimating Covariances in a Path Analysis Model

Using path analysis, we can explicitly model the covariances (correlations) between the predictor (exogenous) variables in the model by using the double tilde (~~) operator. As noted above, in a real-life situation, we should carefully consider what it means to add these covariances to exogenous variables, particularly when estimating models in the presence of missing data by using full information maximum likelihood. Briefly, estimating the covariances between exogenous variables assumes the exogenous variables are random variables, with their own freely estimated intercepts and variances. Conversely, by omitting the covariances (i.e., by default, constraining the covariances to zero), the exogenous variables are assumed to be fixed variables, with their intercepts, variances, and covariances constrained to sample estimates. Thus, in this chapter, we are merely specifying the covariances between these exogenous variables to illustrate how to apply double tilde (~~) operator.

Within our model specification quotation marks (" "), we additionally specify the covariances between the exogenous variables (attitude, norms, control). In doing so, we will freely estimate not only the covariances between the exogenous variables, but also their intercepts and variances. To do so, we simply apply the double tilde (~~) operator to capture all three possible covariances between the three exogenous variables. Finally, in this model, we will drop the model intercept (intention ~ 1) that we freely estimated in the path analysis model from the previous section, as that model intercept won’t be of substantive interest for us going forward.

# Specify path analysis model & assign to object
specmod <- "
# Specify directional paths/relations
intention ~ attitude + norms + control

# Specify covariances between exogenous variables
attitude ~~ norms + control
norms ~~ control
"

# Estimate path analysis model & assign to fitted model object
fitmod <- sem(specmod,  # name of specified model object 
              data=df)  # name of data frame object

# Print summary of model results
summary(fitmod,              # name of fitted model object 
        fit.measures=TRUE,   # request model fit indices 
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
## 
##   Number of observations                           199
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                               136.306
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1011.828
##   Loglikelihood unrestricted model (H1)      -1011.828
##                                                       
##   Akaike (AIC)                                2043.656
##   Bayesian (BIC)                              2076.589
##   Sample-size adjusted Bayesian (SABIC)       2044.908
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   intention ~                                         
##     attitude          0.352    0.058    6.068    0.000
##     norms             0.153    0.059    2.577    0.010
##     control           0.275    0.058    4.740    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   attitude ~~                                         
##     norms             0.200    0.064    3.128    0.002
##     control           0.334    0.070    4.748    0.000
##   norms ~~                                            
##     control           0.220    0.065    3.411    0.001
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.530    0.053    9.975    0.000
##     attitude          0.928    0.093    9.975    0.000
##     norms             0.830    0.083    9.975    0.000
##     control           0.939    0.094    9.975    0.000
## 
## R-Square:
##                    Estimate
##     intention         0.369

First, note that the model degrees of freedom (df) is equal to zero (0), which indicates that this model is just-identified. Second, note that the parameter estimates for the covariances now appear in the output under the section titled “Covariances.” Third, note that the intercepts of the exogenous variables now appear in the section titled “Intercepts.” Fourth, note that the variances of the exogenous variables now appear in the section titled “Variances.”

55.2.4.3 Specifying a More Complex Path Analysis Model

Now it’s time to test a full model of the Theory of Planned Behavior by adding the equation in which intention predicts behavior. We will build on our previous path analysis model script by specifying that intention is a predictor of behavior (i.e., behavior ~ intention). Everything else in our script can remain the same.

# Specify path analysis model & assign to object
specmod <- "
# Specify directional paths/relations
intention ~ attitude + norms + control
behavior ~ intention

# Specify covariances between exogenous variables
attitude ~~ norms + control
norms ~~ control
"

# Estimate path analysis model & assign to fitted model object
fitmod <- sem(specmod,  # name of specified model object 
              data=df)  # name of data frame object

# Print summary of model results
summary(fitmod,              # name of fitted model object 
        fit.measures=TRUE,   # request model fit indices 
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##   Number of observations                           199
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 2.023
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.568
## 
## Model Test Baseline Model:
## 
##   Test statistic                               182.295
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.019
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1258.517
##   Loglikelihood unrestricted model (H1)      -1257.506
##                                                       
##   Akaike (AIC)                                2541.035
##   Bayesian (BIC)                              2580.555
##   Sample-size adjusted Bayesian (SABIC)       2542.538
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.103
##   P-value H_0: RMSEA <= 0.050                    0.735
##   P-value H_0: RMSEA >= 0.080                    0.120
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.019
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   intention ~                                         
##     attitude          0.352    0.058    6.068    0.000
##     norms             0.153    0.059    2.577    0.010
##     control           0.275    0.058    4.740    0.000
##   behavior ~                                          
##     intention         0.453    0.065    7.014    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   attitude ~~                                         
##     norms             0.200    0.064    3.128    0.002
##     control           0.334    0.070    4.748    0.000
##   norms ~~                                            
##     control           0.220    0.065    3.411    0.001
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.530    0.053    9.975    0.000
##    .behavior          0.699    0.070    9.975    0.000
##     attitude          0.928    0.093    9.975    0.000
##     norms             0.830    0.083    9.975    0.000
##     control           0.939    0.094    9.975    0.000
## 
## R-Square:
##                    Estimate
##     intention         0.369
##     behavior          0.198

Again, we have 199 observations (employees) used in this analysis, but this time, the degrees of freedom is equal to 3, which indicates that our model is over-identified. Because the model is over-identified, we can interpret the model fit indices to assess the how well the model we specified fits the data. First, the chi-square test value appears to the right of the text “Model Fit Test Statistic”, and the associated p-value appears below. The chi-square test is nonsignificant (\(\chi^{2}(df=3)=2.023, p=.568\)), which indicates that the model does not fit significantly worse than a model that fits the data perfectly; thus, we have our first indicator that the model fits the data reasonably well. Second, the comparative fit index (CFI) is 1.000, which is greater than the conventional .90 cutoff, which also indicates that the model fits the data reasonably well. Third, the Tucker-Lewis index (TLI) is 1.019, which is greater than the conventional .95 cutoff, which also indicates that the model fits the data reasonably well. Fourth, the root mean square error of approximation (RMSEA) value of .000 is less than the conventional cutoff of .08, and thus we have further evidence that the model fits the data reasonably well. Fifth, the standardized root mean square residual (SRMR) value of .019 is less than the conventional cutoff of .08, and thus we have further evidence that the model fits the data reasonably well. This is one of the relatively rare occasions in which all of the model fit indices are in agreement with one another, leading us to conclude that the specified model fits the data well. We can now feel comfortable proceeding forward interpreting the parameter estimates.

The path coefficient (i.e., direct relation, presumed causal relation) between attitude and intention is statistically significant and positive (b = .352, p < .001); the path coefficient between norms and intention is also statistically significant and positive (b = .153, p = .010); and the path coefficient between control and intention is also statistically significant and positive (b = .275, p < .001). Further, the path coefficient between intention and behavior is statistically significant and positive (b = .453, p < .001). The covariances and variances are not of substantive interest, so we will ignore them. The unadjusted R² value for the outcome variable intention is .369, which indicates that, collectively, attitude, norms, and control explain 36.9% of the variance in intention, and the unadjusted R² value for the outcome variable behavior is .198, which indicates that intention explains 19.8% of the variance in behavior. Overall, the results lends support to the propositions of the Theory of Planned Behavior, at least based on this sample of employees.

In the following chapter, we will learn how to estimate the indirect effects of the predictor variable(s) on the outcome variable(s) via a mediator variable by applying mediation analysis in a path analysis framework.

55.2.5 Obtaining Standardized Parameter Estimates

Just like traditional linear regression models, we sometimes obtain standardized parameter estimates in path analysis models to get a better understanding of the magnitude of the effects. Fortunately, requesting standardized parameter estimates just requires adding the standardized=TRUE to the summary function.

# Specify path analysis model & assign to object
specmod <- "
# Specify directional paths/relations
intention ~ attitude + norms + control
behavior ~ intention

# Specify covariances between exogenous variables
attitude ~~ norms + control
norms ~~ control
"

# Estimate path analysis model & assign to fitted model object
fitmod <- sem(specmod,  # name of specified model object 
              data=df)  # name of data frame object

# Print summary of model results
summary(fitmod,              # name of fitted model object 
        fit.measures=TRUE,   # request model fit indices
        standardized=TRUE,   # request standardized estimates
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##   Number of observations                           199
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 2.023
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.568
## 
## Model Test Baseline Model:
## 
##   Test statistic                               182.295
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.019
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1258.517
##   Loglikelihood unrestricted model (H1)      -1257.506
##                                                       
##   Akaike (AIC)                                2541.035
##   Bayesian (BIC)                              2580.555
##   Sample-size adjusted Bayesian (SABIC)       2542.538
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.103
##   P-value H_0: RMSEA <= 0.050                    0.735
##   P-value H_0: RMSEA >= 0.080                    0.120
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.019
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   intention ~                                                           
##     attitude          0.352    0.058    6.068    0.000    0.352    0.370
##     norms             0.153    0.059    2.577    0.010    0.153    0.152
##     control           0.275    0.058    4.740    0.000    0.275    0.291
##   behavior ~                                                            
##     intention         0.453    0.065    7.014    0.000    0.453    0.445
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   attitude ~~                                                           
##     norms             0.200    0.064    3.128    0.002    0.200    0.227
##     control           0.334    0.070    4.748    0.000    0.334    0.357
##   norms ~~                                                              
##     control           0.220    0.065    3.411    0.001    0.220    0.249
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .intention         0.530    0.053    9.975    0.000    0.530    0.631
##    .behavior          0.699    0.070    9.975    0.000    0.699    0.802
##     attitude          0.928    0.093    9.975    0.000    0.928    1.000
##     norms             0.830    0.083    9.975    0.000    0.830    1.000
##     control           0.939    0.094    9.975    0.000    0.939    1.000
## 
## R-Square:
##                    Estimate
##     intention         0.369
##     behavior          0.198

The standardized parameter estimates of interest appear in the column labeled “Std.all” within the “Parameter Estimates” section.

55.2.6 Alternative Approaches to Model Specifications

Thus far we have focused on using the tilde (~) operator to note directional relations (i.e., path coefficients), the intercept (~ 1) operator to specify a model intercept or mean, and the double tilde (~~) to specify a covariance. Further, the plus (+) operator allowed us to add variables to one side of the directional relation and covariance equations.

Of note, there are equivalent approaches for specifying directional paths (directional relations). If we want to specify that attitude, norms, and control are predictors of intention we could use either approach shown below when specifying our model.

# Equivalent approaches to specifying directional relations

### Approach 1
specmod <- "
intention ~ attitude + norms + control
"

### Approach 2
specmod <- "
intention ~ attitude
intention ~  norms 
intention ~ control
"

The same logical applies if we hypothetically wanted to specify attitude, norms, and control as predictors of both intention and behavior. It’s really up to you how you decide to specify your model when equivalent approaches are possible.

# Equivalent approaches to specifying directional relations

### Approach 1
specmod <- "
intention + behavior ~ attitude + norms + control
"

### Approach 2
specmod <- "
intention ~ attitude + norms + control
behavior ~ attitude + norms + control
"

### Approach 3
specmod <- "
intention + behavior ~ attitude + norms + control
"

### Approach 4
specmod <- "
intention ~ attitude
intention ~  norms 
intention ~ control
behavior ~ attitude
behavior ~  norms 
behavior ~ control
"

Further, there are equivalent approaches for specifying covariances. If we want to specify that attitude, norms, and control are permitted to covary with one another, we could use either approach below.

# Equivalent approaches to specifying covariances

### Approach 1
specmod <- "
attitude ~~ norms + control
norms ~~ control
"

### Approach 2
specmod <- "
attitude ~~ norms
attitude ~~ control
norms ~~ control
"

If you would like to explicitly specify a variance component in your model (even if the model will do this by default for variables with specified covariances), you would use the double tilde (~~) operator with the same variable’s name on either side of the double tilde.

# Specifying variances
specmod <- "
attitude ~~ attitude
norms ~~ norms
control ~~ control
"

Finally, if you have a single exogenous variable in your model (that does not share a covariance with any other variable), it is up to you whether you wish to specify the variance component for that exogenous variable as a free parameter (or not). The model fit and parameter estimates will remain the same.

55.2.7 Estimating Models with Missing Data

When missing data are present, we must carefully consider how we handle the missing data before or during the estimations of a model. In the chapter on missing data, I provide an overview of relevant concepts, particularly if the data are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR); I suggest reviewing that chapter prior to handling missing data.

As a potential method for addressing missing data, the lavaan model functions, such as the sem function, allow for full-information maximum likelihood (FIML). Further, the functions allow us to specify specific estimators given the type of data we wish to use for model estimation (e.g., ML, MLR).

To demonstrate how missing data are handled using FIML, we will need to first introduce some missing data into our data. To do so, we will use a multiple imputation package called mice and a function called ampute that “amputates” existing data by creating missing data patterns. For our purposes, we’ll replace 10% (.1) of data frame cells with NA (which signifies a missing value) such that the missing data are missing completely at random (MCAR).

# Install package
install.packages("mice")

# Access package
library(mice)

# Create a new data frame object
df_missing <- df

# Remove non-numeric variable(s) from data frame object
df_missing$EmployeeID <- NULL

# Set a seed
set.seed(2024)

# Remove 10% of cells so missing data are MCAR
df_missing <- ampute(df_missing, prop=.1, mech="MCAR")

# Extract the new missing data frame object and overwrite existing object
df_missing <- df_missing$amp

Implementing FIML when missing data are present is relatively straightforward. For example, for a path analysis from a previous section, we can apply FIML in the presence of missing data by adding the missing="fiml" argument to the sem function.

# Specify path analysis model & assign to object
specmod <- "
# Direct relations
intention ~ attitude + norms + control
behavior ~ intention
# Covariances
attitude ~~ norms + control
norms ~~ control
"

# Estimate path analysis model & assign to fitted object
fitmod <- sem(specmod,         # name of specified model object
              data=df_missing, # name of data frame object
              missing="fiml")  # specify FIML

# Print summary of model results
summary(fitmod,              # name of fitted model object
        fit.measures=TRUE,   # request model fit indices
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 24 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        17
## 
##   Number of observations                           199
##   Number of missing patterns                         6
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 2.227
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.527
## 
## Model Test Baseline Model:
## 
##   Test statistic                               168.863
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.016
##                                                       
##   Robust Comparative Fit Index (CFI)             1.000
##   Robust Tucker-Lewis Index (TLI)                1.016
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1230.989
##   Loglikelihood unrestricted model (H1)      -1229.876
##                                                       
##   Akaike (AIC)                                2495.979
##   Bayesian (BIC)                              2551.965
##   Sample-size adjusted Bayesian (SABIC)       2498.108
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.107
##   P-value H_0: RMSEA <= 0.050                    0.703
##   P-value H_0: RMSEA >= 0.080                    0.139
##                                                       
##   Robust RMSEA                                   0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.109
##   P-value H_0: Robust RMSEA <= 0.050             0.702
##   P-value H_0: Robust RMSEA >= 0.080             0.144
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.018
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   intention ~                                         
##     attitude          0.360    0.061    5.892    0.000
##     norms             0.141    0.061    2.325    0.020
##     control           0.278    0.060    4.599    0.000
##   behavior ~                                          
##     intention         0.444    0.066    6.740    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   attitude ~~                                         
##     norms             0.200    0.065    3.083    0.002
##     control           0.336    0.071    4.725    0.000
##   norms ~~                                            
##     control           0.221    0.066    3.368    0.001
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.582    0.251    2.324    0.020
##    .behavior          1.780    0.207    8.617    0.000
##     attitude          3.182    0.069   46.307    0.000
##     norms             2.907    0.066   44.154    0.000
##     control           3.104    0.069   44.971    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.539    0.056    9.674    0.000
##    .behavior          0.709    0.072    9.846    0.000
##     attitude          0.922    0.094    9.846    0.000
##     norms             0.839    0.085    9.836    0.000
##     control           0.937    0.095    9.913    0.000
## 
## R-Square:
##                    Estimate
##     intention         0.368
##     behavior          0.191

The FIML approach uses all observations in which data are missing on one or more observed variables in the model. As you can see in the output, all 199 observations were retained when estimating the model. Please note that had we not specified the covariances between the exogenous variables (i.e., attitude, norms, control), we would have seen observations listwise deleted if they had missing data on one or more of those variables. Importantly, as noted earlier in the chapter, in a real-life situation, we would think carefully about what it means both conceptually and statistically to add covariances between these exogenous variables.

Now watch what happens when we remove the missing="fiml" argument in the presence of missing data.

# Specify path analysis model & assign to object
specmod <- "
# Direct relations
intention ~ attitude + norms + control
behavior ~ intention
# Covariances
attitude ~~ norms + control
norms ~~ control
"

# Estimate path analysis model & assign to fitted object
fitmod <- sem(specmod,         # name of specified model object
              data=df_missing) # name of data frame object

# Print summary of model results
summary(fitmod,              # name of fitted model object
        fit.measures=TRUE,   # request model fit indices
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##                                                   Used       Total
##   Number of observations                           173         199
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 1.351
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.717
## 
## Model Test Baseline Model:
## 
##   Test statistic                               145.704
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.041
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1100.040
##   Loglikelihood unrestricted model (H1)      -1099.364
##                                                       
##   Akaike (AIC)                                2224.080
##   Bayesian (BIC)                              2261.919
##   Sample-size adjusted Bayesian (SABIC)       2223.920
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.093
##   P-value H_0: RMSEA <= 0.050                    0.827
##   P-value H_0: RMSEA >= 0.080                    0.080
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.018
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   intention ~                                         
##     attitude          0.359    0.063    5.712    0.000
##     norms             0.146    0.062    2.356    0.018
##     control           0.273    0.062    4.414    0.000
##   behavior ~                                          
##     intention         0.405    0.070    5.793    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   attitude ~~                                         
##     norms             0.197    0.069    2.836    0.005
##     control           0.323    0.075    4.325    0.000
##   norms ~~                                            
##     control           0.200    0.070    2.835    0.005
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.533    0.057    9.301    0.000
##    .behavior          0.708    0.076    9.301    0.000
##     attitude          0.912    0.098    9.301    0.000
##     norms             0.869    0.093    9.301    0.000
##     control           0.940    0.101    9.301    0.000
## 
## R-Square:
##                    Estimate
##     intention         0.364
##     behavior          0.162

As you can see in the output, the sem function defaults to listwise deletion when we do not specify that FIML be applied. This results in the number of observations dropping from 199 to 173 for model estimation purposes. A reduction in sample size can negatively affect our statistical power to detect associations or effects that truly exist in the underlying population. To learn more about statistical power, please refer to the chapter on power analysis.

Within the sem function, we can also specify a specific estimator if we choose to override the default. For example, we could specify the MLR (maximum likelihood with robust standard errors) estimator if we had good reason to. To do so, we would add this argument: estimator="MLR". For a list of other available estimators, you can check out the lavaan package website.

# Specify path analysis model & assign to object
specmod <- "
# Direct relations
intention ~ attitude + norms + control
behavior ~ intention
# Covariances
attitude ~~ norms + control
norms ~~ control
"

# Estimate path analysis model & assign to fitted object
fitmod <- sem(specmod,         # name of specified model object
              data=df_missing, # name of data frame object
              estimator="MLR", # request MLR estimator
              missing="fiml")  # specify FIML

# Print summary of model results
summary(fitmod,              # name of fitted model object
        fit.measures=TRUE,   # request model fit indices
        rsquare=TRUE)        # request R-squared estimates

## lavaan 0.6.15 ended normally after 24 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        17
## 
##   Number of observations                           199
##   Number of missing patterns                         6
## 
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                                 2.227       2.502
##   Degrees of freedom                                 3           3
##   P-value (Chi-square)                           0.527       0.475
##   Scaling correction factor                                  0.890
##     Yuan-Bentler correction (Mplus variant)                       
## 
## Model Test Baseline Model:
## 
##   Test statistic                               168.863     177.353
##   Degrees of freedom                                10          10
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  0.952
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000       1.000
##   Tucker-Lewis Index (TLI)                       1.016       1.010
##                                                                   
##   Robust Comparative Fit Index (CFI)                         1.000
##   Robust Tucker-Lewis Index (TLI)                            1.009
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1230.989   -1230.989
##   Scaling correction factor                                  0.907
##       for the MLR correction                                      
##   Loglikelihood unrestricted model (H1)      -1229.876   -1229.876
##   Scaling correction factor                                  0.905
##       for the MLR correction                                      
##                                                                   
##   Akaike (AIC)                                2495.979    2495.979
##   Bayesian (BIC)                              2551.965    2551.965
##   Sample-size adjusted Bayesian (SABIC)       2498.108    2498.108
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000       0.000
##   90 Percent confidence interval - lower         0.000       0.000
##   90 Percent confidence interval - upper         0.107       0.117
##   P-value H_0: RMSEA <= 0.050                    0.703       0.641
##   P-value H_0: RMSEA >= 0.080                    0.139       0.186
##                                                                   
##   Robust RMSEA                                               0.000
##   90 Percent confidence interval - lower                     0.000
##   90 Percent confidence interval - upper                     0.107
##   P-value H_0: Robust RMSEA <= 0.050                         0.676
##   P-value H_0: Robust RMSEA >= 0.080                         0.147
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.018       0.018
## 
## Parameter Estimates:
## 
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   intention ~                                         
##     attitude          0.360    0.058    6.155    0.000
##     norms             0.141    0.055    2.587    0.010
##     control           0.278    0.061    4.543    0.000
##   behavior ~                                          
##     intention         0.444    0.065    6.832    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   attitude ~~                                         
##     norms             0.200    0.063    3.181    0.001
##     control           0.336    0.064    5.288    0.000
##   norms ~~                                            
##     control           0.221    0.064    3.443    0.001
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.582    0.207    2.812    0.005
##    .behavior          1.780    0.209    8.531    0.000
##     attitude          3.182    0.069   46.299    0.000
##     norms             2.907    0.066   44.216    0.000
##     control           3.104    0.069   44.926    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .intention         0.539    0.051   10.541    0.000
##    .behavior          0.709    0.059   11.922    0.000
##     attitude          0.922    0.079   11.644    0.000
##     norms             0.839    0.073   11.557    0.000
##     control           0.937    0.080   11.739    0.000
## 
## R-Square:
##                    Estimate
##     intention         0.368
##     behavior          0.191

55.2.8 Summary

In this chapter, we explored the building blocks of path analysis, which allowed us to simultaneously fit a model with more than one outcome variable and with a variable that acts as both a predictor and an outcome. To do so, we used the sem function from the lavaan package and evaluated model fit indices and parameter estimates.

References

Ajzen, Icek. 1991. “The Theory of Planned Behavior.” Organizational Behavior and Human Decision Processes 50: 179–211.

Hu, Li-tze, and Peter M Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling 6 (1): 1–55.

Rosseel, Yves. 2012. “lavaan: An R Package for Structural Equation Modeling.” Journal of Statistical Software 48 (2): 1–36. https://www.jstatsoft.org/v48/i02/.

Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.