# Chapter 33 Evaluating a Post-Test-Only with Control Group Design Using Independent-Samples *t*-test

In this chapter, we learn about the post-test-only with control group training evaluation design and how an independent-samples *t*-test can be used to analyze the data acquired from this design. We’ll begin with conceptual overviews of this training evaluation design and of the independent-samples *t*-test, and then we’ll conclude with a tutorial.

## 33.1 Conceptual Overview

In this section, we’ll begin by describing the post-test-only with control group training evaluation design, and we’ll conclude by reviewing the independent-samples *t*-test, with discussions of statistical assumptions, statistical significance, and practical significance; the section wraps up with a sample write-up of an independent-samples *t*-test used to evaluate data from a post-test-only with control group training evaluation design.

### 33.1.1 Review of Post-Test-Only with Control Group Design

In a **post-test-only with control group** training evaluation design (i.e., research design), employees are assigned (randomly or non-randomly) to either a treatment group (e.g., new training program) or a control group (e.g., comparison group, old training program), and every participating employee is assessed on selected training outcomes (i.e., measures) after the training has concluded. If *random assignment* to groups is used, then a post-test-only with control group design is considered *experimental*. Conversely, if *non-random assignment* to groups is used, then the design is considered *quasi-experimental*. Regardless of whether random or non-random assignment is used, an independent-samples *t*-test can be used to analyze the data from a post-test-only with control group design, provided key statistical assumptions are satisfied.

Like any evaluation design, there are limitations to the inferences and conclusions we can draw from a post-test-only with control group design. As a major strength, this design includes a control group, and if coupled with random assignment to groups, then the design qualifies as a true experimental design. With that being said, if we use *non*-random assignment to the treatment and control groups, then we are less likely to have equivalent groups of individuals who enter each group, which may bias how they engage in the demands of their respective group and how they complete the outcome measures. Further, because this design lacks a pre-test (i.e., assessment of initial performance on the outcome measures), we cannot be confident that employees in the treatment and control groups “started in the same place” with respect to the outcome(s) we might measure at post-test. Consequently, any differences we observe between the two groups on a post-test outcome measure may reflect pre-existing differences – meaning, the training may not have “caused” the differences that are apparent at post-test.

### 33.1.2 Review of Independent-Samples *t*-test

Link to conceptual video: https://youtu.be/9uLd4nzGyGQ

The **independent-samples *t*-test** is an inferential statistical analysis that can be used to compare the means of two independent groups, such as a treatment group and a control group. That is, this analysis compares means when we have two separate groups of cases drawn from two populations; critically, each case must appear in only one of the two samples, hence the name *independent-samples* *t*-test. In the context of a post-test-only training evaluation design, we can conceptualize group membership (e.g., treatment vs. control) as a categorical (nominal, ordinal) predictor variable with just two categories (i.e., levels), and the post-test outcome measure as a continuous (interval, ratio) outcome variable. Importantly, the outcome variable must be continuous for an independent-samples *t*-test to be an appropriate analysis. The independent-samples *t*-test is sometimes called a between-subjects *t*-test or a two-groups *t*-test.

The formula for an independent-samples *t*-test can be written as follows:

\(t = \frac{\overline{X}_1 - \overline{X}_2}{s_{{X}_{1}{X}_2} \sqrt{\frac{2}{n}}}\)

where \(\overline{X}_{1}\) is the mean of the first group and \(\overline{X}_{2}\) is the mean of the second group, \(s_{{X}_{1}{X}_2}\) is the pooled standard deviation of \(X_{1}\) and \(X_{2}\), and \(n\) refers to the number of cases in each group (note that this version of the formula assumes the two groups are of equal size).
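To make the formula concrete, here is a small hand computation in R; the two score vectors below are hypothetical values invented purely for illustration, and the result is checked against base R’s `t.test` function with equal variances assumed:

```
# Hypothetical scores for two equal-sized groups (made-up values)
x1 <- c(70, 75, 68, 80, 72)
x2 <- c(60, 65, 58, 70, 62)
n <- length(x1)   # the formula above assumes equal group sizes

# Pooled standard deviation and t-value, following the formula above
s_pooled <- sqrt((var(x1) + var(x2)) / 2)
t_value <- (mean(x1) - mean(x2)) / (s_pooled * sqrt(2 / n))
t_value

# Should match base R's equal-variances t-test
t.test(x1, x2, var.equal=TRUE)$statistic
```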

#### 33.1.2.1 Statistical Assumptions

The statistical assumptions that should be met prior to running and/or interpreting estimates from an independent-samples *t*-test include:

- The outcome (dependent, response) variable has a univariate normal distribution in each of the two underlying populations (e.g., samples, groups, conditions), which correspond to the two categories (levels) of the predictor (independent, explanatory) variable;
- The variances of the outcome (dependent, response) variable are equal across the two populations (e.g., samples, groups, conditions), which is often called the equality of variances or homogeneity of variances assumption.

#### 33.1.2.2 Statistical Significance

If we wish to know whether the two means we are comparing differ to a statistically significant extent, we can compare our calculated *t*-value to a table of critical values of a *t*-distribution. If our calculated value is larger than the critical value given the number of degrees of freedom (*df* = *n* - 2) and the desired alpha level (i.e., significance level, *p*-value threshold), we would conclude that there is evidence of a significant *difference* in means between the two independent samples. Alternatively, we can calculate the exact *p*-value if we know the *t*-value and the degrees of freedom. Fortunately, modern statistical software calculates the *t*-value, degrees of freedom, and *p*-value for us.
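For example, the exact two-tailed *p*-value can be computed in R from a *t*-value and its degrees of freedom using the `pt` function (the values below are illustrative):

```
t_value <- 4.798   # illustrative t-value
df <- 48           # degrees of freedom (n - 2)
2 * pt(-abs(t_value), df)   # exact two-tailed p-value
```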

Using null hypothesis significance testing (NHST), we interpret a *p*-value that is *less than .05* (or whatever two- or one-tailed alpha level we set) to meet the standard for statistical significance, meaning that we reject the null hypothesis that the difference between the two means is equal to zero. In other words, if the *p*-value is less than .05, we conclude that the two means differ from each other to a statistically significant extent. In contrast, if the *p*-value is *equal to or greater than .05*, then we fail to reject the null hypothesis that the difference between the two means is equal to zero. Put differently, if the *p*-value is equal to or greater than .05, we conclude that the two means do *not* differ from each other to a statistically significant extent, meaning we lack evidence of a difference between the two means in the population.

When setting an alpha threshold, such as the conventional two-tailed .05 level, sometimes the question comes up regarding whether borderline *p*-values signify significance or nonsignificance. For our purposes, let’s be very strict in our application of the chosen alpha level. For example, if we set our alpha level at .05, *p* = .049 would be considered statistically significant, and *p* = .050 would be considered statistically nonsignificant.

Because our independent-samples *t*-test is estimated using data from a sample drawn from an underlying population, sampling error will affect the extent to which our sample is representative of the population from which it is drawn. That is, the observed difference between the two means is a *point estimate* of the population parameter and is subject to sampling error. Fortunately, confidence intervals can give us a better idea of what the true population parameter value might be. If we apply an alpha level of .05 (two-tailed), then the equivalent confidence interval (CI) is a 95% CI. In terms of whether the difference between two means is statistically significant, if the lower and upper limits of the 95% CI do *not* include zero, then this tells us the same thing as a *p*-value that is less than .05. Strictly speaking, a 95% CI indicates that if we were to hypothetically draw many more samples from the underlying populations and construct CIs for each of those samples, then the true parameter (i.e., true value of the difference in means in the population) would fall within the lower and upper bounds of 95% of the estimated CIs. In other words, the 95% CI gives us an indication of plausible values for the population parameter while taking into consideration sampling error. A wide CI (i.e., large difference between the lower and upper limits) signifies more sampling error, and a narrow CI signifies less sampling error.
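As a sketch, assuming we already have a mean difference and its standard error, a 95% CI can be constructed in R as follows (the values below are illustrative):

```
mean_diff <- 11.04   # illustrative difference between two means
se_diff <- 2.301     # illustrative standard error of the difference
df <- 48
t_crit <- qt(.975, df)   # critical t-value for a two-tailed 95% CI

# Lower and upper limits of the 95% CI
c(mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)
```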

#### 33.1.2.3 Practical Significance

A significant independent-samples *t*-test and associated *p*-value only tell us that the two means are statistically different from one another. They do not, however, tell us about the magnitude of the difference between means – or in other words, the practical significance. The standardized mean difference (Cohen’s *d*) is an effect size, which means that it is a standardized metric that can be used to compare the magnitude of effects across samples and studies. In essence, Cohen’s *d* indicates the magnitude of the difference between means in standard deviation units. A *d*-value of .00 indicates that there is no difference between the two means, while the following are some generally accepted qualitative-magnitude labels we can attach to the absolute value of *d*.

| Cohen’s *d* | Description |
|---|---|
| .20 | Small |
| .50 | Medium |
| .80 | Large |

Here is the formula for computing *d*:

\(d = t \sqrt{\frac{n_1 + n_2} {n_1n_2}}\)

where \(t\) refers to the calculated \(t\)-value, \(n_1\) refers to the sample size of the first independent sample, and \(n_2\) refers to the sample size of the second independent sample.
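For example, plugging a *t*-value of 4.798 with 25 cases per group into this formula in R:

```
t_value <- 4.798
n1 <- 25
n2 <- 25

# Standardized mean difference (Cohen's d)
d <- t_value * sqrt((n1 + n2) / (n1 * n2))
d   # approximately 1.36
```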

#### 33.1.2.4 Sample Write-Up

We conducted a study using employees from our organization to determine whether a new safety training program outperformed the old safety training program. Fifty employees were randomly assigned to complete either the new safety training program (treatment group) or the old safety training program (control group), resulting in 25 employees per condition. Both groups of participants completed a safety knowledge test one week after completing their respective training programs. In other words, these participants were part of a randomized post-test-only with control group training evaluation design. Given that separate groups of employees completed the new and old safety training programs, an independent-samples *t*-test was used to determine whether the average safety knowledge test score for the group that completed the new training program was significantly different from the average safety knowledge test score for the group that completed the old training program. Possible test scores could range from 1 to 100 points. We found a statistically significant difference between the means on the safety knowledge test for the treatment group that participated in the new training program (*M* = 72.36, *SD* = 6.98, *n* = 25) and the control group that participated in the old training program (*M* = 61.32, *SD* = 9.15, *n* = 25), such that those who participated in the new training program performed better on the safety knowledge test (*t* = 4.80, *p* < .01, 95% CI[6.41, 15.67]). Further, because we found a statistically significant difference in means, we then interpreted Cohen’s *d* as an indicator of practical significance for this effect, leading us to conclude that the effect was large (*d* = 1.36). In sum, with respect to the safety knowledge test, the new safety training program outperformed the old safety training program to a large extent.

## 33.2 Tutorial

This chapter’s tutorial demonstrates how to estimate an independent-samples *t*-test, test the associated statistical assumptions, and present the findings in writing and visually.

### 33.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below.

Link to video tutorial: https://youtu.be/oATcHuMZtuo

### 33.2.3 Initial Steps

If you haven’t already, save the file called **“TrainingEvaluation_PostControl.csv”** into a folder that you will subsequently set as your working directory. Your working directory will likely be different than the one shown below (i.e., `"H:/RWorkshop"`). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the `setwd` function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to *Session > Set Working Directory > Choose Directory…*. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.
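For example, if your data file were saved in the folder referenced above, the command would look like this (substitute your own folder path):

```
# Set working directory (your folder path will differ)
setwd("H:/RWorkshop")
```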

Next, read in the .csv data file called **“TrainingEvaluation_PostControl.csv”** using your choice of read function. In this example, I use the `read_csv` function from the `readr` package (Wickham, Hester, and Bryan 2024). If you choose to use the `read_csv` function, be sure that you have installed and accessed the `readr` package using the `install.packages` and `library` functions. *Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months.* For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

```
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
```

```
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
td <- read_csv("TrainingEvaluation_PostControl.csv")
```

```
## Rows: 50 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Condition
## dbl (2): EmpID, PostTest
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

`## [1] "EmpID" "Condition" "PostTest"`

```
## spc_tbl_ [50 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ EmpID : num [1:50] 26 27 28 29 30 31 32 33 34 35 ...
## $ Condition: chr [1:50] "New" "New" "New" "New" ...
## $ PostTest : num [1:50] 66 74 62 84 78 73 60 61 71 83 ...
## - attr(*, "spec")=
## .. cols(
## .. EmpID = col_double(),
## .. Condition = col_character(),
## .. PostTest = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
```

```
## # A tibble: 6 × 3
## EmpID Condition PostTest
## <dbl> <chr> <dbl>
## 1 26 New 66
## 2 27 New 74
## 3 28 New 62
## 4 29 New 84
## 5 30 New 78
## 6 31 New 73
```

There are 50 cases (i.e., employees) and 3 variables in the `td` data frame: `EmpID` (unique identifier for employees), `Condition` (training condition: *New* = new training program, *Old* = old training program), and `PostTest` (post-training scores on the training assessment, ranging from 1-100, where higher scores indicate better performance). Regarding participation in the training conditions, 25 employees participated in the old training program, and 25 employees participated in the new training program.

### 33.2.4 Estimate Independent-Samples *t*-test

Different functions are available that will allow us to estimate an independent-samples *t*-test in R. In this chapter, we will review how to run an independent-samples *t*-test using the `ttest` function from the `lessR` package (Gerbing, Business, and University 2021), as it produces a generous amount of output automatically. Using this function, we will evaluate whether the means on the post-test variable (`PostTest`) for the two levels of the training `Condition` variable (*New*, *Old*) differ from one another to a statistically significant extent; in other words, let’s find out if we should treat the means as being different from one another.

To access and use the `ttest` function, we need to install and/or access the `lessR` package. If you haven’t already, be sure to install the `lessR` package; if you’ve recently installed the package, then you likely don’t need to install it in this session, and you can skip that step to save time. You *will* need to run the `library` function to access the package (assuming it’s been installed), so don’t forget that important step.

Please note that when you use the `library` function to access the `lessR` package in a new R or RStudio session, you will likely receive a message in red font in your Console. Typically, red font in your Console is not a good sign; however, accessing the `lessR` package is one of the unique situations in which a red-font message is not a warning or error message. You may also receive a warning message that indicates that certain “objects are masked.” For our purposes, we can ignore that message.

Now we’re ready to run an independent-samples *t*-test. To begin, type the name of the `ttest` function. As the first argument in the parentheses, specify the statistical model that we wish to estimate. To do so, type the name of the continuous outcome (dependent) variable (`PostTest`) to the left of the `~` operator and the name of the categorical predictor (independent) variable (`Condition`) to the right of the `~` operator. For the second argument, use `data=` to specify the name of the data frame where the outcome and predictor variables are located (`td`). For the third argument, enter `paired=FALSE` to inform R that the data are *not* paired (i.e., you are *not* requesting a paired-samples *t*-test).
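Assembling those three arguments, the command looks like this:

```
# Estimate independent-samples t-test
ttest(PostTest ~ Condition, data=td, paired=FALSE)
```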

```
##
## Compare PostTest across Condition with levels New and Old
## Response Variable: PostTest, PostTest
## Grouping Variable: Condition,
##
##
## ------ Describe ------
##
## PostTest for Condition New: n.miss = 0, n = 25, mean = 72.360, sd = 6.975
## PostTest for Condition Old: n.miss = 0, n = 25, mean = 61.320, sd = 9.150
##
## Mean Difference of PostTest: 11.040
##
## Weighted Average Standard Deviation: 8.136
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of PostTest.
## Group New Shapiro-Wilk normality test: W = 0.950, p-value = 0.253
## Group Old Shapiro-Wilk normality test: W = 0.969, p-value = 0.621
##
## Null hypothesis is equal variances of PostTest, homogeneous.
## Variance Ratio test: F = 83.727/48.657 = 1.721, df = 24;24, p-value = 0.191
## Levene's test, Brown-Forsythe: t = -0.955, df = 48, p-value = 0.344
##
##
## ------ Infer ------
##
## --- Assume equal population variances of PostTest for each Condition
##
## t-cutoff for 95% range of variation: tcut = 2.011
## Standard Error of Mean Difference: SE = 2.301
##
## Hypothesis Test of 0 Mean Diff: t-value = 4.798, df = 48, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 4.627
## 95% Confidence Interval for Mean Difference: 6.413 to 15.667
##
##
## --- Do not assume equal population variances of PostTest for each Condition
##
## t-cutoff: tcut = 2.014
## Standard Error of Mean Difference: SE = 2.301
##
## Hypothesis Test of 0 Mean Diff: t = 4.798, df = 44.852, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 4.635
## 95% Confidence Interval for Mean Difference: 6.405 to 15.675
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of PostTest for each Condition
##
## Standardized Mean Difference of PostTest, Cohen's d: 1.357
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Condition New: 4.175
## Density bandwidth for Condition Old: 5.464
```

As you can see in the output, the `ttest` function provides descriptive statistics, assumption tests regarding the distributions and the variances, a statistical significance test of the mean comparison, and an indicator of practical significance in the form of the standardized mean difference (Cohen’s *d*). In addition, the default data visualization depicting the two density distributions is helpful for understanding the difference between the two means. Let’s now review each section of the output.

**Description**: The *Description* section includes basic descriptive statistics about the sample. In the output, we can see that there are 25 employees in each condition (*n* = 25), and descriptively, the mean `PostTest` score for the *New* condition is 72.36 (*SD* = 6.98), and the mean `PostTest` score for the *Old* condition is 61.32 (*SD* = 9.15). The difference between the two means is 11.04.

**Assumptions:** As noted in the output: “These hypothesis tests can perform poorly, and the *t*-test is typically robust to violations of assumptions. Use as heuristic guides instead of interpreting literally.” Given that, we shouldn’t put too much emphasis on these statistical assumption tests, but nevertheless they can provide some guidance when it comes to detecting potential statistical-assumption violations that might preclude us from choosing to interpret the results of the independent-samples *t*-test itself.

Let’s dive into interpreting these assumption tests. The **Shapiro-Wilk normality test** is used to test the null hypothesis that a distribution is normal; as such, if the *p*-value associated with the test statistic (*W*) is *less than* the conventional alpha level of .05, then we *reject* the null hypothesis and assume that the distribution is *not* normal. If, however, the *p*-value associated with the test statistic (*W*) is *equal to or greater than* .05, then we *fail to reject* the null hypothesis, which means that we do not have statistical evidence that the distribution is anything other than normal; in other words, if the *p*-value is equal to or greater than our alpha level (.05), then we can assume the variable is normally distributed and can feel satisfied that we have met the statistical assumption of normality for that particular distribution. In the output, we can see that the `PostTest` variable for those in the *New* training condition is normally distributed (*W* = .950, *p* = .253), and for those in the *Old* training condition, the `PostTest` variable is also normally distributed (*W* = .969, *p* = .621).

*Note:* In accordance with the central limit theorem, a sampling distribution will tend to approximate a normal distribution when it is based on more than 30 cases (*N* > 30). Thus, if there are more than 30 cases in each independent sample, then the `ttest` function won’t report the Shapiro-Wilk test, as univariate normality will be assumed.

As for the equal variances assumption, **Levene’s test** (i.e., homogeneity of variances test) is commonly used. The null hypothesis of this test is that the variances are equal. Thus, if the *p*-value is *less* than the conventional alpha level of .05, then we reject the null hypothesis and assume the variances are different. If, however, the *p*-value is *equal to or greater than* .05, then we fail to reject the null hypothesis and assume that the variances are equal (i.e., variances are homogeneous). In the output, we see that the test is nonsignificant (*t* = -.955, *p* = .344), which suggests that, based on this test, we have no reason to believe that the two variances are anything but equal. All in all, we found evidence to support that we met the two statistical assumptions necessary to proceed forward with making inferences.

**Inference:** The *Inference* section of the output is where you will find the independent-samples *t*-test itself. This section is called *Inference* because this is where we’re making statistical inferences about the underlying population of employees. Specifically, the *t*-test and its associated *p*-value represent the statistical test of the null hypothesis (i.e., the two means are equal). If we have evidence that the variances are equal (which we do based on Levene’s test), then we should interpret the sub-section titled *Assume equal population variances of PostTest for each Condition*. If we had instead found evidence that the variances were *not* equal, then we would want to interpret the sub-section titled *Do not assume equal population variances of PostTest for each Condition*.

First, take a look at the line prefaced with *Hypothesis Test of 0 Mean Diff*; this line contains the results of the independent-samples *t*-test (*t* = 4.798, *p* < .001). Because the *p*-value is less than our conventional two-tailed alpha cutoff of .05, we reject the null hypothesis and conclude that the two means are different from one another. How do we know which mean is greater (or less) than the other? Well, we need to look back to the *Description* section; in that section, we see that the mean `PostTest` score for the *New* training condition is greater than the mean for the *Old* training condition; as such, we can conclude the following: The average post-training assessment score for employees who participated in the new training program (*M* = 72.36, *SD* = 6.98) was significantly higher than the average score for those who participated in the old training program (*M* = 61.32, *SD* = 9.15) (*t* = 4.798, *p* < .001). Regarding the 95% confidence interval, we can conclude that the true mean difference in the population is likely between 6.41 and 15.67 (on a 1-100 point assessment).

**Effect Size:** In the *Effect Size* section, the standardized mean difference (Cohen’s *d*) is provided as an indicator of practical significance. In the output, *d* is equal to 1.36, which is considered to be a (very) large effect, according to conventional rules-of-thumb (see table below). *Please note that typically we only interpret practical significance when a difference has been found to be statistically significant (see Inference section).*

| Cohen’s *d* | Description |
|---|---|
| .20 | Small |
| .50 | Medium |
| .80 | Large |

**Sample Write-Up:** In total, 25 employees participated in the new training program and 25 employees participated in the old training program. After completing their respective training programs, employees completed an assessment of their knowledge, where scores could range from 1-100. The average post-training assessment score for employees who participated in the new training program (*M* = 72.36, *SD* = 6.98) was significantly higher than the average score for those who participated in the old training program (*M* = 61.32, *SD* = 9.15) (*t* = 4.798, *p* < .001, 95% CI[6.41, 15.67]). This difference can be considered to be very large (*d* = 1.36).

### 33.2.5 Visualize Results Using Bar Chart

When you find a statistically significant difference between two means based on an independent-samples *t*-test, you may decide to present the two means in a bar chart. We will use the `BarChart` function from `lessR` to do so.

Type the name of the `BarChart` function. As the first argument, type `x=` followed by the name of the categorical predictor variable (`Condition`). As the second argument, type `y=` followed by the name of the continuous outcome variable (`PostTest`). As the third argument, specify `stat="mean"` to request the application of the mean function to the `PostTest` variable based on the levels of the `Condition` variable. As the fourth argument, type `data=` followed by the name of the data frame object to which our predictor and outcome variables belong (`td`). As the fifth argument, use `xlab=` to provide the x-axis label (`"Training Condition"`). As the sixth argument, use `ylab=` to provide the y-axis label (`"Post-Test Score"`).

```
# Create bar chart
BarChart(x=Condition, y=PostTest,
stat="mean",
data=td,
xlab="Training Condition",
ylab="Post-Test Score")
```

```
## PostTest
## - by levels of -
## Condition
##
## n miss mean sd min mdn max
## New 25 0 72.36 6.98 60.00 73.00 84.00
## Old 25 0 61.32 9.15 42.00 61.00 79.00
```

```
## >>> Suggestions
## Plot(PostTest, Condition) # lollipop plot
##
## Plotted Values
## --------------
## New Old
## 72.360 61.320
```

### 33.2.6 Summary

In this chapter, we learned to use the independent-samples *t*-test to compare two means from independent groups of cases, which is the situation when we evaluate a training program using a post-test-only *with* control group design. The `ttest` function from `lessR` can be used to run an independent-samples *t*-test, and the `BarChart` function from `lessR` can be used to present the results of a significant difference visually.

## 33.3 Chapter Supplement

In addition to the `ttest` function from the `lessR` package covered above, we can use the `t.test` function from base R to estimate an independent-samples *t*-test. Further, a statistically equivalent approach to evaluating a post-test-only with control group design is to estimate a simple linear regression model, which can be done using the `lm` function from base R. Because these functions both come from base R, we do not need to install and access an additional package.
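As a preview, assuming the `td` data frame from above has been read in, the two base R approaches look like this; note that `var.equal=TRUE` requests the equal-variances version of the test:

```
# Independent-samples t-test using base R
t.test(PostTest ~ Condition, data=td, var.equal=TRUE)

# Statistically equivalent simple linear regression
summary(lm(PostTest ~ Condition, data=td))
```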

### 33.3.1 Functions & Packages Introduced

| Function | Package |
|---|---|
| `tapply` | base R |
| `shapiro.test` | base R |
| `leveneTest` | `car` |
| `t.test` | base R |
| `cohen.d` | `effsize` |
| `mean` | base R |
| `lm` | base R |
| `summary` | base R |
| `options` | base R |
| `as.factor` | base R |
| `relevel` | base R |

### 33.3.2 Initial Steps

If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.

```
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")
```

```
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
td <- read_csv("TrainingEvaluation_PostControl.csv")
```

```
## Rows: 50 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Condition
## dbl (2): EmpID, PostTest
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

`## [1] "EmpID" "Condition" "PostTest"`

```
## spc_tbl_ [50 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ EmpID : num [1:50] 26 27 28 29 30 31 32 33 34 35 ...
## $ Condition: chr [1:50] "New" "New" "New" "New" ...
## $ PostTest : num [1:50] 66 74 62 84 78 73 60 61 71 83 ...
## - attr(*, "spec")=
## .. cols(
## .. EmpID = col_double(),
## .. Condition = col_character(),
## .. PostTest = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
```

```
## # A tibble: 6 × 3
## EmpID Condition PostTest
## <dbl> <chr> <dbl>
## 1 26 New 66
## 2 27 New 74
## 3 28 New 62
## 4 29 New 84
## 5 30 New 78
## 6 31 New 73
```

### 33.3.3 `t.test` Function from Base R

By itself, the `t.test` function from base R does not generate statistical assumption tests and estimates of effect size (practical significance) like the `ttest` function from `lessR` does. Thus, we will need to apply additional functions to obtain all of the necessary output.

Before running our independent-samples *t*-test using the `t.test` function from base R, we should test two assumptions (see Statistical Assumptions section). We will begin by estimating the *Shapiro-Wilk normality test* and *Levene's test* of equal variances (i.e., homogeneity of variances), which are formal statistical tests of these two assumptions. Both of these tests were covered in detail above when we applied the `ttest` function from `lessR`, so we will breeze through the interpretation in this section.

To compute the Shapiro-Wilk normality test to test the assumption of normal distributions, we will use the `shapiro.test` function from base R. Because we need to test the assumption of normality of the outcome variable (`PostTest`) for both levels of the predictor variable (`Condition`), we also need to use the `tapply` function from base R. The `tapply` function can be quite useful, as it allows us to apply a function to a variable for each level of another categorical (nominal, ordinal) variable. To begin, type the name of the `tapply` function. As the first argument, type the name of the data frame (`td`), followed by the `$` symbol and the name of the outcome variable (`PostTest`). As the second argument, type the name of the data frame (`td`), followed by the `$` symbol and the name of the categorical predictor variable (`Condition`). Finally, as the third argument, type the name of the `shapiro.test` function.

*Note:* In accordance with the central limit theorem, a sampling distribution will tend to approximate a normal distribution when it is based on more than 30 cases (*N* > 30). Thus, if there are more than 30 cases in each independent sample, then we can assume that the assumption of univariate normality has been met, which means we won’t formally test the assumption for the two independent samples.

```
# Compute Shapiro-Wilk normality test for normal distributions
tapply(td$PostTest, td$Condition, shapiro.test)
```

```
## $New
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.95019, p-value = 0.2533
##
##
## $Old
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.96904, p-value = 0.6208
```

The output indicates that the *p*-values associated with both tests are equal to or greater than the conventional alpha of .05; therefore, we *fail to reject* the null hypothesis that the values are normally distributed. In other words, we have evidence that the outcome variable is normally distributed for both conditions, which suggests that we have met the first statistical assumption.

To test the equality (homogeneity) of variances assumption, we will use the `leveneTest` function from the `car` package. More than likely, the `car` package is already installed, as many other packages depend on it. That being said, you may still need to install the package prior to accessing it using the `library` function.

Type the name of the `leveneTest` function. As the first argument, specify the statistical model. To do so, type the name of the outcome variable (`PostTest`) to the left of the `~` symbol and the name of the predictor variable (`Condition`) to the right of the `~` symbol. For the second argument, use `data=` to specify the name of the data frame (`td`).
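Putting those arguments together, the call looks like this (note that the `car` package must be accessed with the `library` function first):

```r
# Access car package
library(car)

# Compute Levene's test for homogeneity (equality) of variances
leveneTest(PostTest ~ Condition, data=td)
```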

```
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.9121 0.3443
## 48
```

The output indicates that the *p*-value (i.e., `Pr(>F)` = .3443) associated with Levene's test is equal to or greater than an alpha of .05; thus, we *fail to reject* the null hypothesis that the variances are equal and conclude that the variances are equal. We have satisfied the second statistical assumption.

Now that we seem to have satisfied the two statistical assumptions, we are ready to apply the `t.test` function from base R. To begin, type the name of the `t.test` function. As the first argument, type the name of the outcome (dependent) variable (`PostTest`) to the left of the `~` symbol and the name of the predictor (independent) variable (`Condition`) to the right of the `~` symbol. For the second argument, use `data=` to specify the name of the data frame (`td`). As the third argument, type `paired=FALSE` to indicate that the data are *not* paired, which means that we are *not* requesting a paired-samples *t*-test. As the final argument, type `var.equal=TRUE` to indicate that we found evidence that the variances were equal; if you had found evidence that the variances were *not* equal, then you would use the argument `var.equal=FALSE`.

```
# Independent-samples t-test using t.test function from base R
t.test(PostTest ~ Condition, data=td, paired=FALSE, var.equal=TRUE)
```

```
##
## Two Sample t-test
##
## data: PostTest by Condition
## t = 4.7976, df = 48, p-value = 0.000016
## alternative hypothesis: true difference in means between group New and group Old is not equal to 0
## 95 percent confidence interval:
## 6.413209 15.666791
## sample estimates:
## mean in group New mean in group Old
## 72.36 61.32
```

Note that the output provides you with the results of the independent-samples *t*-test in terms of a formal statistical test (*t* = 4.798, *p* < .001), the 95% confidence interval (95% CI[6.413, 15.667]), and the mean of the outcome variable for each level of the categorical predictor variable (*New* condition: *M* = 72.36; *Old* condition: *M* = 61.32). Thus, we found evidence that the mean `PostTest` score for the *New* training condition was statistically significantly higher than the mean for the *Old* training condition. The output, however, does *not* include an estimate of practical significance (i.e., effect size).
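The `t.test` output also does not report the group standard deviations, which are useful for a complete write-up. Assuming the same `td` data frame, they can be obtained by pairing the `tapply` function with the `mean` and `sd` functions from base R:

```r
# Compute mean and standard deviation of PostTest by Condition
tapply(td$PostTest, td$Condition, mean)
tapply(td$PostTest, td$Condition, sd)
```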

To compute Cohen’s *d* as an indicator of practical significance, we will use the `cohen.d` function from the `effsize` package (Torchiano 2020). If you haven’t already, install the `effsize` package. Make sure to access the package using the `library` function.

As the first argument in the `cohen.d` function parentheses, type the name of the outcome variable (`PostTest`) to the left of the `~` symbol and the name of the predictor variable (`Condition`) to the right of the `~` symbol. For the second argument, use `data=` to specify the name of the data frame (`td`). As the third argument, type `paired=FALSE` to indicate that the data are *not* paired (i.e., dependent).
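Assembled from the arguments described above, the call looks like this:

```r
# Access effsize package
library(effsize)

# Compute Cohen's d as an estimate of practical significance
cohen.d(PostTest ~ Condition, data=td, paired=FALSE)
```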

```
##
## Cohen's d
##
## d estimate: 1.356961 (large)
## 95 percent confidence interval:
## lower upper
## 0.7262066 1.9877157
```

The output indicates that Cohen’s *d* is 1.357, which would be considered large by conventional cutoff standards.

| Cohen’s *d* | Description |
|---|---|
| .20 | Small |
| .50 | Medium |
| .80 | Large |
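As a sanity check, Cohen’s *d* for two independent groups can be computed by hand from the group summary statistics (mean difference divided by the pooled standard deviation); the sketch below uses the sample sizes, means, and standard deviations reported in this section:

```r
# Hand-compute Cohen's d from the group summary statistics
m_new <- 72.36; sd_new <- 6.98; n_new <- 25   # "New" condition
m_old <- 61.32; sd_old <- 9.15; n_old <- 25   # "Old" condition

# Pooled standard deviation across the two independent samples
sd_pooled <- sqrt(((n_new - 1)*sd_new^2 + (n_old - 1)*sd_old^2) /
                  (n_new + n_old - 2))

# Cohen's d: standardized mean difference
d <- (m_new - m_old) / sd_pooled
round(d, 3)  # 1.357, matching the cohen.d output above
```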

**Sample Write-Up:** In total, 25 employees participated in the new training program and 25 employees participated in the old training program. After completing their respective training programs, employees completed an assessment of their knowledge, where scores could range from 1 to 100. The average post-training assessment score for employees who participated in the new training program (*M* = 72.36, *SD* = 6.98) was significantly higher than the average score for those who participated in the old training program (*M* = 61.32, *SD* = 9.15) (*t* = 4.798, *p* < .001, 95% CI[6.41, 15.67]). This difference can be considered very large (*d* = 1.36).

### 33.3.4 `lm` Function from Base R

An independent-samples *t*-test can alternatively be specified as a simple linear regression model; thus, a post-test-only with control group design can also be evaluated using a simple linear regression model. Doing so offers some advantages if introducing covariates (i.e., control variables) is of interest, which would expand the model to a multiple linear regression model. For more comprehensive reviews, please refer to the chapters that introduce simple linear regression models and multiple linear regression models in the context of selection tool validation.

To evaluate a post-test-only with control group design using simple linear regression and the `lm` function from base R, we must do the following.

- Specify the name of an object to which we can assign our regression model using the `<-` assignment operator. Here, I decided to name the object `reg_mod`.
- Type the name of the `lm` function.
- As the first argument, specify a simple linear regression model, which will use the same syntax as the `t.test` function above: `PostTest ~ Condition`. That is, we need to regress the post-test variable (`PostTest`) onto the condition variable (`Condition`) using the `~` operator.
- As the second argument, type `data=` followed by the name of the data frame object to which the variables in our model belong (`td`).
- Type the name of the `summary` function from base R, and as the sole parenthetical argument, specify the name of the object we created above (`reg_mod`).

```
# Estimate simple linear regression model using lm function from base R
reg_mod <- lm(PostTest ~ Condition, data=td)
# Print summary of results
summary(reg_mod)
```

```
##
## Call:
## lm(formula = PostTest ~ Condition, data = td)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.32 -3.36 0.64 3.68 17.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.360 1.627 44.470 < 0.0000000000000002 ***
## ConditionOld -11.040 2.301 -4.798 0.000016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.136 on 48 degrees of freedom
## Multiple R-squared: 0.3241, Adjusted R-squared: 0.31
## F-statistic: 23.02 on 1 and 48 DF, p-value: 0.000016
```

*Note:* Your results may default to scientific notation, which is completely fine if you’re comfortable interpreting scientific notation. If you wish to view the output in traditional notation, and not scientific notation, you can use the `options` function from base R to adjust the penalty R applies to scientific notation. After that, you can re-estimate your model from above to see the results in traditional notation.
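For example, raising the `scipen` option makes R strongly prefer fixed (traditional) notation when printing; the exact penalty value shown here is a matter of preference:

```r
# Increase the penalty R applies to scientific notation so that
# output prints in traditional (fixed) notation
options(scipen=9999)
```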

In the output, we will take note of the section called *Coefficients*. The intercept coefficient estimate (`(Intercept)`) represents the mean score on the `PostTest` variable for the default reference group (i.e., comparison group). Behind the scenes, the `lm` function converts the `Condition` variable to a dummy-coded factor, where the two levels of the variable – “New” and “Old” – are converted to values of 0 and 1, respectively. By default, the level that comes first alphabetically, which in this example is the “New” level, is assigned a value of 0, whereas the second level alphabetically, which is the “Old” level, is assigned a value of 1. Thus, in this example, when the `Condition` variable is equal to 0 (i.e., “New”), the model intercept is 72.360; in other words, the mean post-test score for those who participated in the “New” training condition is 72.360. The `ConditionOld` variable represents the `Condition` variable with the focal group being those who participated in the “Old” training condition; accordingly, the coefficient estimate of -11.040 represents the difference between the “Old” condition post-test mean and the “New” condition post-test mean (i.e., “Old” minus “New”). Based on the intercept estimate, we already know the mean post-test score for the “New” condition is 72.360, which means that the “Old” condition post-test mean is equal to 61.320 (72.360 - 11.040 = 61.320). Further, the *p*-value (`Pr(>|t|)`) associated with the `ConditionOld` variable is less than .05, which indicates that the difference between the two post-test means (11.040) is statistically significantly different from zero and in favor of the “New” training condition. In fact, the absolute value of the *t*-value associated with the `ConditionOld` variable is 4.798, which is equal to the *t*-value we estimated using the `t.test` function in the previous section. Finally, the unadjusted *R*^{2} (i.e., multiple R-squared) associated with the model is .3241, which indicates that the `ConditionOld` variable explains 32.41% of the variance in the `PostTest` variable, which can be considered a large effect given the thresholds shown below.

| *R*^{2} | Description |
|---|---|
| .01 | Small |
| .09 | Medium |
| .25 | Large |
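The dummy-coding logic described above can be verified arithmetically: plugging the two dummy values into the fitted equation recovers both group means.

```r
# Fitted equation: PostTest = b0 + b1 * ConditionOld,
# where ConditionOld is 0 for "New" and 1 for "Old"
b0 <- 72.360   # intercept: mean for reference group ("New")
b1 <- -11.040  # coefficient for the "Old" dummy variable

b0 + b1 * 0    # predicted mean for "New" condition: 72.36
b0 + b1 * 1    # predicted mean for "Old" condition: 61.32
```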

If we wish to change the default reference group in our model from “New” to “Old”, we can apply the `as.factor` and `relevel` functions from base R within the `lm` function. First, we need to wrap the `Condition` variable in the `as.factor` function to convert the `Condition` variable to a variable of type factor. Second, we need to treat `as.factor(Condition)` as the first argument in the `relevel` function; in the second argument of the `relevel` function, we need to specify `ref=` followed by the level that we wish to serve as our reference group, which is “Old” in this case. Once we’ve done that, we can re-estimate our model and summarize the results.

```
# Re-level and re-estimate model with "Old" as reference group
reg_mod <- lm(PostTest ~ relevel(as.factor(Condition), ref="Old"),
data=td)
# Print summary of results
summary(reg_mod)
```

```
##
## Call:
## lm(formula = PostTest ~ relevel(as.factor(Condition), ref = "Old"),
## data = td)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.32 -3.36 0.64 3.68 17.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.320 1.627 37.685 < 0.0000000000000002 ***
## relevel(as.factor(Condition), ref = "Old")New 11.040 2.301 4.798 0.000016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.136 on 48 degrees of freedom
## Multiple R-squared: 0.3241, Adjusted R-squared: 0.31
## F-statistic: 23.02 on 1 and 48 DF, p-value: 0.000016
```

In the updated output, our intercept estimate is 61.320, which represents the post-test mean for the “Old” condition. The updated coefficient estimate associated with the `Condition` variable is now equal to 11.040, which again represents the difference between the mean post-test scores for the “Old” and “New” conditions, except this time the reference group is the “Old” condition. Accordingly, the positive coefficient estimate of 11.040 indicates that the “New” condition post-test mean is 11.040 points higher than the “Old” condition mean of 61.320 – or in other words, 72.360 (i.e., 61.320 + 11.040 = 72.360). The absolute value of the *t*-value and the value of the *p*-value remain the same, as does the value of the unadjusted *R*^{2}.

### References

*lessR: Less Code, More Results*. https://CRAN.R-project.org/package=lessR.

Torchiano, Marco. 2020. *effsize: Efficient Effect Size Computation*. https://doi.org/10.5281/zenodo.1480624.

*readr: Read Rectangular Text Data*. https://CRAN.R-project.org/package=readr.