The two-way ANOVA is an extension of the one-way ANOVA. The name "two-way" comes from the fact that each item is classified in two ways rather than one.

For example, one-way classifications might be gender, political party, religion, or race. Two-way classifications might be gender and political party, gender and race, or religion and race.

Each classification variable is called a factor, so there are two factors, each with several levels. The factors are called the "row factor" and the "column factor" because the data are usually arranged in a table. Each combination of a row level and a column level is called a treatment.

The two-way ANOVA that we're going to discuss requires a balanced design, which is one where each treatment has the same sample size.

A pharmaceutical company is testing a new drug to see if it helps reduce the time to recover from a fever. They decide to test the drug on three different races (Caucasian, African American, and Hispanic) and both genders (male and female). This makes six treatments (3 races × 2 genders = 6 treatments). They randomly select five test subjects for each of those six treatments, so altogether they have 3 × 2 × 5 = 30 test subjects.

The response variable is the time in minutes after taking the medicine before the fever is reduced.

The data might look something like this.

| | Male | Female |
|---|---|---|
| Caucasian | 54, 49, 59, 39, 55 | 25, 29, 47, 26, 28 |
| African American | 53, 72, 43, 56, 52 | 46, 51, 33, 47, 41 |
| Hispanic | 33, 30, 26, 25, 29 | 18, 21, 34, 40, 24 |

Here are the summary statistics for each of the six treatments as well as for each level within each factor.

| | Male | Female | All |
|---|---|---|---|
| Caucasian | mean = 51.2, stdev = 7.694 | mean = 31.0, stdev = 9.083 | mean = 41.1, stdev = 13.279 |
| African American | mean = 55.2, stdev = 10.569 | mean = 43.6, stdev = 6.914 | mean = 49.4, stdev = 10.405 |
| Hispanic | mean = 28.6, stdev = 3.209 | mean = 27.4, stdev = 9.263 | mean = 28.0, stdev = 6.566 |
| All | mean = 45.0, stdev = 14.097 | mean = 34.0, stdev = 10.650 | mean = 39.5, stdev = 13.490 |
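As a rough check, the summary statistics can be recomputed from the raw data with Python's standard library. This is only a sketch: the dictionary layout and the names `data`, `pooled`, and `stats` are illustrative choices, and `stdev` is the sample standard deviation (dividing by n − 1).

```python
from statistics import mean, stdev

# Recovery times in minutes, keyed by (race, gender), copied from the data table.
data = {
    ("Caucasian", "Male"): [54, 49, 59, 39, 55],
    ("Caucasian", "Female"): [25, 29, 47, 26, 28],
    ("African American", "Male"): [53, 72, 43, 56, 52],
    ("African American", "Female"): [46, 51, 33, 47, 41],
    ("Hispanic", "Male"): [33, 30, 26, 25, 29],
    ("Hispanic", "Female"): [18, 21, 34, 40, 24],
}
races = ["Caucasian", "African American", "Hispanic"]
genders = ["Male", "Female"]


def pooled(select):
    """Collect every observation whose (race, gender) key satisfies select."""
    return [x for key, values in data.items() if select(key) for x in values]


# Build the groups: six treatments, three row levels, two column levels, everyone.
groups = dict(data)
for race in races:
    groups[(race, "All")] = pooled(lambda key: key[0] == race)
for gender in genders:
    groups[("All", gender)] = pooled(lambda key: key[1] == gender)
groups[("All", "All")] = pooled(lambda key: True)

# Mean and sample standard deviation for every group.
stats = {key: (mean(values), stdev(values)) for key, values in groups.items()}

for (race, gender), (m, s) in stats.items():
    print(f"{race:17s} {gender:7s} mean = {m:5.1f}  stdev = {s:6.3f}")
```

The "All" entries pool the ten observations in a row, the fifteen in a column, or all thirty, rather than averaging the cell statistics.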

There are three sets of hypothesis tests for the Two-Way ANOVA.

- H_{0}: The means of each row (race) are equal

  H_{1}: The mean of at least one row (race) is different
- H_{0}: The means of each column (gender) are equal

  H_{1}: The mean of at least one column (gender) is different
- H_{0}: There is no interaction between the row (race) and column (gender)

  H_{1}: There is interaction between the row (race) and column (gender)

The first two hypotheses are essentially one-way ANOVAs for the row (race) or column (gender) variables.

The third hypothesis is similar to a chi-squared test for independence, where no interaction means the two factors do not depend on each other: the effect of one factor on the response is the same at every level of the other factor. The degrees of freedom are also calculated the same way as in the test for independence: df(interaction) = df(Row) × df(Column).

There are extra rows in the Two-Way ANOVA table. There are three sources besides the error (unexplained), so we have a row for each of those sources. The variations (SS) are best found using technology.

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Row (race) | 2328.2 | | | |
| Column (gender) | 907.5 | | | |
| Interaction (race × gender) | 452.6 | | | |
| Error | 1589.2 | | | |
| Total | 5277.5 | | | |
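For readers who want to see where the SS values come from, here is a minimal sketch in plain Python. It assumes the usual sums-of-squares decomposition for a balanced two-way design, which the text above does not spell out; the variable names (`ss_row`, `ss_col`, and so on) are illustrative only.

```python
# Recovery times in minutes, keyed by (race, gender), copied from the data table.
data = {
    ("Caucasian", "Male"): [54, 49, 59, 39, 55],
    ("Caucasian", "Female"): [25, 29, 47, 26, 28],
    ("African American", "Male"): [53, 72, 43, 56, 52],
    ("African American", "Female"): [46, 51, 33, 47, 41],
    ("Hispanic", "Male"): [33, 30, 26, 25, 29],
    ("Hispanic", "Female"): [18, 21, 34, 40, 24],
}
races = ["Caucasian", "African American", "Hispanic"]
genders = ["Male", "Female"]
n = 5  # observations per treatment (balanced design)


def mean(xs):
    return sum(xs) / len(xs)


all_values = [x for values in data.values() for x in values]
grand_mean = mean(all_values)

# Means for each row level, each column level, and each treatment (cell).
row_means = {r: mean([x for g in genders for x in data[(r, g)]]) for r in races}
col_means = {g: mean([x for r in races for x in data[(r, g)]]) for g in genders}
cell_means = {key: mean(values) for key, values in data.items()}

# Between-level variation, weighted by the number of observations per level.
ss_row = sum(n * len(genders) * (m - grand_mean) ** 2 for m in row_means.values())
ss_col = sum(n * len(races) * (m - grand_mean) ** 2 for m in col_means.values())

# Variation among the six cell means; what the two factors do not
# account for on their own is attributed to the interaction.
ss_cells = sum(n * (m - grand_mean) ** 2 for m in cell_means.values())
ss_interaction = ss_cells - ss_row - ss_col

# Within-cell (unexplained) variation and total variation.
ss_error = sum((x - cell_means[key]) ** 2 for key, values in data.items() for x in values)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

for name, value in [("Row (race)", ss_row), ("Column (gender)", ss_col),
                    ("Interaction", ss_interaction), ("Error", ss_error),
                    ("Total", ss_total)]:
    print(f"{name:16s} {value:8.1f}")
```

The four component sums add up to the total, which is a handy sanity check on any hand or software calculation.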

We're back to that "one less than" rule for df.

- There are 3 races, so there are 2 df for the races
- There are 2 genders, so there is 1 df for the gender
- Interaction is race × gender, so we multiply df(row) × df(column) = 2 × 1 = 2 df there
- The error df is the sum of the individual df for each treatment. There were 5 subjects in each treatment group, so there are 4 df for each. There are 6 treatment groups of 4 df each, so there are 24 df for the error term.
- The total df is one less than the sample size, so 30 − 1 = 29 df. This also equals the sum of the other df: 2 + 1 + 2 + 24 = 29.
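The df rules in this list can be written out as a few lines of arithmetic. This is only an illustrative sketch; the variable names are made up for the example.

```python
rows = 3        # levels of the row factor (race)
cols = 2        # levels of the column factor (gender)
n_per_cell = 5  # subjects in each treatment

df_row = rows - 1                          # one less than the number of races
df_col = cols - 1                          # one less than the number of genders
df_interaction = df_row * df_col           # df(row) x df(column)
df_error = rows * cols * (n_per_cell - 1)  # 4 df from each of the 6 treatments
df_total = rows * cols * n_per_cell - 1    # one less than the sample size

print(df_row, df_col, df_interaction, df_error, df_total)

# The component df always add up to the total df.
assert df_row + df_col + df_interaction + df_error == df_total
```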

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Row (race) | 2328.2 | 2 | | |
| Column (gender) | 907.5 | 1 | | |
| Interaction (race × gender) | 452.6 | 2 | | |
| Error | 1589.2 | 24 | | |
| Total | 5277.5 | 29 | | |

Once you have the df, the table works like it did before.

The MS = SS / df.

- MS(race) = 2328.2 / 2 = 1164.1
- MS(gender) = 907.5 / 1 = 907.5
- MS(interaction) = 452.6 / 2 = 226.3
- MS(error) = 1589.2 / 24 = 66.22 [this is the pooled estimate of the variance]
- MS(total) = 5277.5 / 29 = 181.98 [this is technically not part of the table, but it is the variance of the response variable]

The F test statistic is found by dividing the MS for each source by the MS for the error source.

- F(race) = 1164.1 / 66.22 = 17.58
- F(gender) = 907.5 / 66.22 = 13.71
- F(interaction) = 226.3 / 66.22 = 3.42

There is no F for the error or total sources.
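The MS and F calculations can be reproduced from the SS and df columns with a short script. This is a sketch with illustrative dictionary names. One detail worth noticing: the division uses the unrounded MS(error) of about 66.2167, which is what gives 13.71 for gender after rounding; dividing by the rounded 66.22 gives about 13.70.

```python
# SS and df values taken from the ANOVA table.
ss = {"race": 2328.2, "gender": 907.5, "interaction": 452.6, "error": 1589.2}
df = {"race": 2, "gender": 1, "interaction": 2, "error": 24}

# Mean square = sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# F = MS(source) / MS(error); there is no F for the error source itself.
f_stat = {source: ms[source] / ms["error"]
          for source in ("race", "gender", "interaction")}

for source in ss:
    f_text = f"{f_stat[source]:6.2f}" if source in f_stat else ""
    print(f"{source:12s} MS = {ms[source]:8.2f}  F = {f_text}")
```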

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Row (race) | 2328.2 | 2 | 1164.10 | 17.58 |
| Column (gender) | 907.5 | 1 | 907.50 | 13.71 |
| Interaction (race × gender) | 452.6 | 2 | 226.30 | 3.42 |
| Error | 1589.2 | 24 | 66.22 | |
| Total | 5277.5 | 29 | 181.98 | |

The table is complete.

To make a decision about the hypothesis test, you really need a p-value. These test statistics have F distributions. The numerator df is the df for the source and the denominator df is the df for the error.

- Race: F has 2 numerator and 24 denominator df
- Gender: F has 1 numerator and 24 denominator df
- Interaction: F has 2 numerator and 24 denominator df

The p-value is the area to the right of the test statistic since this is always a right tail test.

- The p-value for the Race factor is the area to the right of F = 17.58 using 2 numerator and 24 denominator df.
- The p-value for the Gender factor is the area to the right of F = 13.71 using 1 numerator and 24 denominator df.
- The p-value for the Interaction is the area to the right of F = 3.42 using 2 numerator and 24 denominator df.
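These right-tail areas are normally read off software. As a dependency-free sketch, the F right-tail area can be computed from the regularized incomplete beta function, here evaluated with a standard continued-fraction method. The helper names `_beta_cf`, `reg_inc_beta`, and `f_right_tail` are made up for this example. If SciPy is installed, `scipy.stats.f.sf(F, dfn, dfd)` should return the same areas.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-30
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f, df_num, df_den):
    """Area to the right of f under the F(df_num, df_den) distribution."""
    x = df_den / (df_den + df_num * f)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


# F statistics and degrees of freedom from the ANOVA table.
tests = {"race": (17.58, 2, 24), "gender": (13.71, 1, 24), "interaction": (3.42, 2, 24)}
p_values = {name: f_right_tail(*args) for name, args in tests.items()}

for name, p in p_values.items():
    print(f"{name:12s} p = {p:.3f}")
```

When the numerator df is 2, the right-tail area has the closed form (1 + 2F / df_den)^(−df_den / 2), which makes a convenient spot check for the race and interaction values.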

| Source | SS | df | MS | F | P |
|---|---|---|---|---|---|
| Row (race) | 2328.2 | 2 | 1164.10 | 17.58 | 0.000 |
| Column (gender) | 907.5 | 1 | 907.50 | 13.71 | 0.001 |
| Interaction (race × gender) | 452.6 | 2 | 226.30 | 3.42 | 0.049 |
| Error | 1589.2 | 24 | 66.22 | | |
| Total | 5277.5 | 29 | 181.98 | | |

Each test statistic and its p-value are used to answer the hypothesis dealing with the corresponding source. That is:

- F = 17.58 is for the race source, so it would be used to determine if there is a difference in the mean fever-reduction times of the different races.
- F = 13.71 is for the gender source, so it would be used to determine if there is a difference in the mean fever-reduction times of the different genders.
- F = 3.42 is for the interaction source, so it would be used to determine if there is interaction between race and gender.