Two-Way Analysis of Variance

Introduction

The two-way ANOVA is an extension of the one-way ANOVA. It is called "two-way" because each item is classified in two ways rather than one.

For example, one-way classifications might be by gender, political party, religion, or race. Two-way classifications might be by gender and political party, gender and race, or religion and race.

Each classification variable is called a factor, so there are two factors, each having several levels. The factors are called the "row factor" and the "column factor" because the data are usually arranged in a table. Each combination of a row level and a column level is called a treatment.

The two-way ANOVA that we're going to discuss requires a balanced design, in which each treatment has the same sample size.

Example: Drug Testing

A pharmaceutical company is testing a new drug to see if it helps reduce the time to recover from a fever. They decide to test the drug on three different races (Caucasian, African American, and Hispanic) and both genders (male and female). This makes six treatments (3 races × 2 genders = 6 treatments). They randomly select five test subjects from each of those six treatments, so all together, they have 3 × 2 × 5 = 30 test subjects.

The response variable is the time in minutes after taking the medicine before the fever is reduced.

The data might look something like this.

Data

	Male	Female
Caucasian	54, 49, 59, 39, 55	25, 29, 47, 26, 28
African American	53, 72, 43, 56, 52	46, 51, 33, 47, 41
Hispanic	33, 30, 26, 25, 29	18, 21, 34, 40, 24

Summary Statistics

Here are the summary statistics for each of the six treatments as well as for each level within each factor.

	Male	Female	All
Caucasian	mean = 51.2 stdev = 7.694	mean = 31.0 stdev = 9.083	mean = 41.1 stdev = 13.279
African American	mean = 55.2 stdev = 10.569	mean = 43.6 stdev = 6.914	mean = 49.4 stdev = 10.405
Hispanic	mean = 28.6 stdev = 3.209	mean = 27.4 stdev = 9.263	mean = 28.0 stdev = 6.566
All	mean = 45.0 stdev = 14.097	mean = 34.0 stdev = 10.650	mean = 39.5 stdev = 13.490
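
As a cross-check, the summary statistics can be recomputed from the raw data. The following is a minimal sketch using only Python's standard library; the names `data`, `summarize`, `cell_stats`, `race_stats`, `gender_stats`, and `overall_stats` are illustrative choices, not part of the original example.

```python
from statistics import mean, stdev

# Minutes until the fever is reduced, copied from the data table above.
data = {
    ("Caucasian", "Male"): [54, 49, 59, 39, 55],
    ("Caucasian", "Female"): [25, 29, 47, 26, 28],
    ("African American", "Male"): [53, 72, 43, 56, 52],
    ("African American", "Female"): [46, 51, 33, 47, 41],
    ("Hispanic", "Male"): [33, 30, 26, 25, 29],
    ("Hispanic", "Female"): [18, 21, 34, 40, 24],
}
races = ["Caucasian", "African American", "Hispanic"]
genders = ["Male", "Female"]


def summarize(values):
    """Return (mean, sample standard deviation), rounded as in the table."""
    return round(mean(values), 1), round(stdev(values), 3)


# One entry per treatment, that is, per (race, gender) cell.
cell_stats = {key: summarize(values) for key, values in data.items()}
# Pool both genders within each race, and all races within each gender.
race_stats = {r: summarize([x for g in genders for x in data[(r, g)]]) for r in races}
gender_stats = {g: summarize([x for r in races for x in data[(r, g)]]) for g in genders}
overall_stats = summarize([x for values in data.values() for x in values])

for r in races:
    print(r, [cell_stats[(r, g)] for g in genders], race_stats[r])
print("All", [gender_stats[g] for g in genders], overall_stats)
```

The standard deviations here are sample standard deviations (dividing by one less than the group size), which is what the table reports.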

Hypothesis Tests

There are three sets of hypothesis tests for the Two-Way ANOVA.

H₀: The means of each row (race) are equal
H₁: The mean of at least one row (race) is different
H₀: The means of each column (gender) are equal
H₁: The mean of at least one column (gender) is different
H₀: There is no interaction between the row (race) and column (gender)
H₁: There is interaction between the row (race) and column (gender)

The first two hypotheses are essentially one-way ANOVAs for the row (race) or column (gender) variables.

The third hypothesis is similar to a chi-squared test for independence: no interaction means the effect of one factor does not depend on the level of the other. It also resembles the test for independence in how the degrees of freedom are calculated, since the df for the interaction is df(Row) × df(Column).
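
One informal way to picture interaction is to compare the male-minus-female gap in mean time within each race: with no interaction, the gap would be roughly the same in every row. The sketch below uses the treatment means from the summary statistics table. It is a descriptive look only, not the hypothesis test itself, and `cell_mean` and `gender_gap` are illustrative names.

```python
# Treatment means in minutes, taken from the summary statistics table.
cell_mean = {
    ("Caucasian", "Male"): 51.2, ("Caucasian", "Female"): 31.0,
    ("African American", "Male"): 55.2, ("African American", "Female"): 43.6,
    ("Hispanic", "Male"): 28.6, ("Hispanic", "Female"): 27.4,
}
races = ["Caucasian", "African American", "Hispanic"]

# Male-minus-female difference in mean time within each race.
gender_gap = {
    r: round(cell_mean[(r, "Male")] - cell_mean[(r, "Female")], 1) for r in races
}

for r in races:
    print(f"{r}: gap = {gender_gap[r]} minutes")
```

Whether gaps that differ this much are more than chance variation is what the interaction F test decides.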

Two-Way ANOVA Table

There are extra rows in the Two-Way ANOVA table. There are three sources besides the error (unexplained), so we have a row for each of those sources. The variations (SS) are best found using technology.

Source	SS	df	MS	F
Row (race)	2328.2
Column (gender)	907.5
Interaction (race × gender)	452.6
Error	1589.2
Total	5277.5
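
For a balanced design, the sums of squares can be computed directly from the group means. The following is a minimal sketch under that assumption, using the standard balanced-design formulas and only the standard library; the variable names are illustrative, and in practice a statistics package would normally produce these values.

```python
from statistics import mean

# Minutes until the fever is reduced, copied from the data table.
data = {
    ("Caucasian", "Male"): [54, 49, 59, 39, 55],
    ("Caucasian", "Female"): [25, 29, 47, 26, 28],
    ("African American", "Male"): [53, 72, 43, 56, 52],
    ("African American", "Female"): [46, 51, 33, 47, 41],
    ("Hispanic", "Male"): [33, 30, 26, 25, 29],
    ("Hispanic", "Female"): [18, 21, 34, 40, 24],
}
races = ["Caucasian", "African American", "Hispanic"]
genders = ["Male", "Female"]
n = 5  # subjects per treatment (balanced design)

everything = [x for values in data.values() for x in values]
grand = mean(everything)
cell = {key: mean(values) for key, values in data.items()}
race_mean = {r: mean([x for g in genders for x in data[(r, g)]]) for r in races}
gender_mean = {g: mean([x for r in races for x in data[(r, g)]]) for g in genders}

# Variation of the row means and column means around the grand mean.
ss_race = n * len(genders) * sum((race_mean[r] - grand) ** 2 for r in races)
ss_gender = n * len(races) * sum((gender_mean[g] - grand) ** 2 for g in genders)
# What remains in each treatment mean after removing row and column effects.
ss_interaction = n * sum(
    (cell[(r, g)] - race_mean[r] - gender_mean[g] + grand) ** 2
    for r in races for g in genders
)
# Variation of individual observations around their own treatment mean.
ss_error = sum((x - cell[key]) ** 2 for key, values in data.items() for x in values)
ss_total = sum((x - grand) ** 2 for x in everything)

for name, value in [("race", ss_race), ("gender", ss_gender),
                    ("interaction", ss_interaction), ("error", ss_error),
                    ("total", ss_total)]:
    print(f"{name:12s} {value:8.1f}")
```

In a balanced design the four component sums of squares add up to the total, which gives a quick consistency check on the table.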

How to find the degrees of freedom

We're back to that "one less than" rule for df.

There are 3 races, so there are 2 df for the races
There are 2 genders, so there is 1 df for the gender
Interaction is race × gender, so we multiply df(row) × df(column) = 2 × 1 = 2 df there
The error df is the sum of the individual df's for each treatment. There were 5 in each treatment group and so there are 4 df for each. There are 6 treatment groups of 4 df each, so there are 24 df for the error term.
The total df is one less than the total sample size: 30 − 1 = 29, which also equals the sum of the other df (2 + 1 + 2 + 24).

Source	SS	df
Row (race)	2328.2	2
Column (gender)	907.5	1
Interaction (race × gender)	452.6	2
Error	1589.2	24
Total	5277.5	29
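
The df rules above reduce to a few lines of arithmetic. A small sketch, assuming a balanced design; the variable names are illustrative.

```python
n_rows = 3       # races
n_cols = 2       # genders
n_per_cell = 5   # subjects per treatment (balanced design)

df_row = n_rows - 1
df_col = n_cols - 1
df_interaction = df_row * df_col
# Each treatment contributes (n_per_cell - 1) df to the error term.
df_error = n_rows * n_cols * (n_per_cell - 1)
df_total = n_rows * n_cols * n_per_cell - 1

# The component df should add up to the total df.
assert df_row + df_col + df_interaction + df_error == df_total
print(df_row, df_col, df_interaction, df_error, df_total)  # prints: 2 1 2 24 29
```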

Finishing the Table

Once you have the df, the table works like it did before.

The MS = SS / df.

MS(race) = 2328.2 / 2 = 1164.1
MS(gender) = 907.5 / 1 = 907.5
MS(interaction) = 452.6 / 2 = 226.3
MS(error) = 1589.2 / 24 = 66.22 [this is the pooled estimate of the variance]
MS(total) = 5277.5 / 29 = 181.98 [this is technically not part of the table, but it is the variance of the response variable]

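The F test statistic for each source is found by dividing that source's MS by the MS for the error source. The quotients are computed with the unrounded MS(error) = 1589.2 / 24 ≈ 66.2167; the 66.22 shown is rounded for display.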

F(race) = 1164.1 / 66.22 = 17.58
F(gender) = 907.5 / 66.22 = 13.71
F(interaction) = 226.3 / 66.22 = 3.42

There is no F for the error or total sources.

Source	SS	df	MS	F
Row (race)	2328.2	2	1164.10	17.58
Column (gender)	907.5	1	907.50	13.71
Interaction (race × gender)	452.6	2	226.30	3.42
Error	1589.2	24	66.22
Total	5277.5	29	181.98

The table is complete.
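
The MS and F columns can be filled in programmatically from the SS and df columns. A minimal sketch using the values from the table; the dictionary names `ss`, `df`, `ms`, and `f_stat` are illustrative.

```python
# SS and df values taken from the ANOVA table.
ss = {"race": 2328.2, "gender": 907.5, "interaction": 452.6, "error": 1589.2}
df = {"race": 2, "gender": 1, "interaction": 2, "error": 24}

# Mean square = sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}
# Each F divides a source's MS by the (unrounded) error MS; error has no F.
f_stat = {source: ms[source] / ms["error"] for source in ss if source != "error"}

for source in ss:
    f_text = f"{f_stat[source]:.2f}" if source in f_stat else ""
    print(f"{source:12s} MS = {ms[source]:8.2f}  F = {f_text}")
```

Keeping MS(error) unrounded matters slightly for the gender row: dividing by the rounded 66.22 gives about 13.704, while the unrounded value gives about 13.705.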

Finding the p-values

To make a decision about the hypothesis test, you really need a p-value. These test statistics have F distributions. The numerator df is the df for the source and the denominator df is the df for the error.

Race: F has 2 numerator and 24 denominator df
Gender: F has 1 numerator and 24 denominator df
Interaction: F has 2 numerator and 24 denominator df

The p-value is the area to the right of the test statistic since this is always a right tail test.

The p-value for the Race factor is the area to the right of F = 17.58 using 2 numerator and 24 denominator df.
The p-value for the Gender factor is the area to the right of F = 13.71 using 1 numerator and 24 denominator df.
The p-value for the Interaction is the area to the right of F = 3.42 using 2 numerator and 24 denominator df.

Source	SS	df	MS	F	P
Row (race)	2328.2	2	1164.10	17.58	0.000
Column (gender)	907.5	1	907.50	13.71	0.001
Interaction (race × gender)	452.6	2	226.30	3.42	0.049
Error	1589.2	24	66.22
Total	5277.5	29	181.98
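
The right-tail areas are normally read from software. As an illustration only, the sketch below computes them with the standard library, using the identity that the area to the right of f under an F distribution equals a regularized incomplete beta function, evaluated here with Simpson's rule. The function name `f_right_tail` and its `steps` parameter are hypothetical; a statistics package would normally supply this calculation.

```python
import math


def f_right_tail(f_value, df_num, df_den, steps=2000):
    """Area to the right of f_value under the F(df_num, df_den) curve.

    Uses P(F > f) = I_x(df_den / 2, df_num / 2) with
    x = df_den / (df_den + df_num * f), where I_x is the regularized
    incomplete beta function, integrated numerically by Simpson's rule.
    Assumes df_den >= 2, f_value > 0, and an even number of steps.
    """
    a, b = df_den / 2, df_num / 2
    x = df_den / (df_den + df_num * f_value)
    h = x / steps

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule: endpoints once, odd points x4, even interior points x2.
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3) / math.exp(log_beta)


p_race = f_right_tail(17.58, 2, 24)
p_gender = f_right_tail(13.71, 1, 24)
p_interaction = f_right_tail(3.42, 2, 24)

for name, p in [("race", p_race), ("gender", p_gender),
                ("interaction", p_interaction)]:
    print(f"{name:12s} p = {p:.3f}")
```

A p-value displayed as 0.000 is not exactly zero; it is simply smaller than 0.0005 and rounds to zero at three decimal places.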

Each test statistic and its p-value are used to answer the hypothesis for the corresponding source. That is:

F = 17.58 is for the race source, so it would be used to determine if there is a difference in the mean recovery times of the different races.
F = 13.71 is for the gender source, so it would be used to determine if there is a difference in the mean recovery times of the different genders.
F = 3.42 is for the interaction source, so it would be used to determine if there is interaction between the race and gender.